Basti's Scratchpad on the Internet
07 Sep 2016


In a recent project, I tried to parse MATLAB code. During this trying exercise, I stumbled upon a few… unique design decisions of the MATLAB language:

Use of apostrophes (')

Apostrophes can mean one of two things: If applied as a unary postfix operator, it means transpose. If used as a unary prefix operator, it marks the start of a string. While not a big problem for human readers, this makes code surprisingly hard to parse. The interesting bit about this, though, is the fact that there would have been a much easier way to do this: Why not use double quotation marks for strings, and apostrophes for transpose? The double quotation mark is never used in MATLAB, so this would have been a very easy choice.

Use of parens (())

If Parens follow a variable name, they can mean one of two things: If the variable is a function, the parens denote a function call. If the variable is anything else, this is an indexing operation. This can actually be very confusing to readers, since it makes it entirely unclear what kind of operation foo(5) will execute without knowledge about foo (which might not be available until runtime). Again, this could have been easily solved by using brackets ([]) for indexing, and parens (()) for function calls.

Use of braces ({}) and cell arrays

Cell arrays are multi-dimensional, ordered, heterogeneous collections of things. But in contrast to every other collection (structs, objects, maps, tables, matrices), they are not indexed using parens, but braces. Why? I don't know. In fact, you can index cell arrays using parens, but this only yields a new cell array with only one value. Why would this ever be useful? I have no explanation. This constantly leads to errors, and for the life of me I can not think of a reason for this behavior.

Use of line breaks

In MATLAB, your line can end on one of three characters: A newline character, a semicolon (;), and a comma (,). As we all know, the semicolon suppresses program output, while the newline character does not. The comma ends the logical line, but does not suppress program output. This is a relatively little-known feature, so I thought it would be useful to share it. Except, the meaning of ; and , changes in literals (like [1, 2; 3, 4] or {'a', 'b'; 3, 4}). Here, commas separate values on the same row and are optional, and semicolons end the current row. Interestingly, literals also change the meaning of the newline character: Inside a literal, a newline acts just like a semicolon, overrides a preceding comma, and you don't have to use ellipsis (...) for line continuations.

Syntax rules for commands

Commands are function calls without the parenthesis, like help disp, which is syntactically equivalent to help('disp'). You see, if you just specify a function name (can't be a compound expression or a function handle), and don't use parenthesis, all following words will be interpreted as strings, and passed to the function. This is actually kind of a neat feature. However, how do you differentiate between variable_name + 5 and help + 5? The answer is: Commands are actually a bit more complex. A command starts with a function name, followed by a space, which is not followed by an operator and a space. Thus, help +5 + 4 is a command, while help + 5 + 4 is an addition. Tricky!

The more-than-one-value value

If you want to save more than one value in a variable, you can use a collection (structs, matrices, maps, tables, cell arrays). In addition though, MATLAB knows another way of handling more than one values at once: The thing you get when you index a cell array with {:} or assign a function call with more than one result. In that case, you get something that is assignable to several variables, but that is not itself a collection. Just another quirk of MATLAB's indexing logic. However, you can capture these values into matrices or cell arrays using brackets or braces, like this: {x{:}} or [x{:}]. Note that this also works in assignments in a confusing way: [z{:}] = x{:} (if both x and z have the same length). Incidentally, this is often a neat way of converting between different kinds of collections (but utterly unreadable, because type information is hopelessly lost).

26 Jun 2016

P⋅O⋅L⋅L⋅E⋅N and Walk 'em ups

I like boring games. I like games that give me time to think. Like flight simulators, truck simulators, history simulators, and (don't call them walking simulators) walk 'em ups. And this weekend, this year's Steam Summer Sale started, and thus it was time to get some gaming done!

First, I played Everybody's Gone to the Rapture, a walk 'em up that got good reviews, and was especially praised for it's story. And it dutifully enraptured me, with its British landscapes, and personal story lines. But somehow it didn't quite connect. Maybe I'm not British enough, and I certainly didn't get all of the story. Still, this is well worth picking up, and visiting a digital Shropshire after just having visited the real one earlier this year was a real treat!

Second, I played P⋅O⋅L⋅L⋅E⋅N. It did not start well: The download took ages, I had to unplug all that fancy flight simulation gear to get it to recognize my controller, and performance was rather lackluster. Oh, and reviews were rather mediocre, too. (Also, here's a free tip for you if you games developers: putting fancy Unicode characters into your game title does not improve SEO).

But, I entirely fell in love with P⋅O⋅L⋅L⋅E⋅N! You arrive on Titan in your dinky near-future space capsule, take a short walk on the surface of the moon, and then explore the local space station. Of course, something went wrong, and the space station is deserted. Oh, and there is something about space bees.

What really worked for me here was the environmental story telling. This is a space station much in the tradition of 2001: A Space Odyssey and Alien. It's that perfect utilitarian and inhumanely sterile work place that human corporation always strive for, and human employees always mess up. The stark industrial bleakness of these space corridors is the perfect backdrop to make those little human irregularities really stand out.

There is no conflict, only light puzzles, and the story is developed only through voice diaries and environmental clutter. But in contrast to Rapture, it all makes sense. The futurist setting, the deep humanity of the people dealing with it. The creepiness of machines without operators. The struggle of the scientists and engineers against the forces of nature and greedy corporations. The nods —nay— genuflections to everything that has ever been great in Scifi. Even that Kubrick ending. I loved every inch of this! ★★★★★

21 Jun 2016

Transplant, revisited

A few months ago, I talked about the performance of calling Matlab from Python. Since then, I implemented a few optimizations that make working with Transplant a lot faster:


The workload consisted of generating a bunch of random numbers (not included in the times), and sending them to Matlab for computation. This task is entirely dominated by the time it takes to transfer the data to Matlab (see table at the end for intra-language benchmarks of the same task).

As you can see, the new Transplant is significantly faster for small workloads, and still a factor of two faster for larger amounts of data. It is now almost always a faster solution than the Matlab Engine for Python (MEfP) and Oct2Py. For very large datasets, Oct2Py might be preferable, though.

This improvement comes from three major changes: Matlab functions are now returned as callable objects instead of ad-hoc functions, Transplant now uses MsgPack instead of JSON, and loadlibrary instead of a Mex file to call into libzmq. All of these changes are entirely under the hood, though, and the public API remains unchanged.

The callable object thing is the big one for small workloads. The advantage is that the objects will only fetch documentation if __doc__ is actually asked for. As it turns out, running help('funcname') for every function call is kind of a big overhead.

Bigger workloads however are dominated by the time it takes Matlab to decode the data. String parsing is very slow in Matlab, which is a bad thing indeed if you're planning to read a couple hundred megabytes of JSON. Thus, I replaced JSON with MsgPack, which eliminates the parsing overhead almost entirely. JSON messaging is still available, though, if you pass msgformat='json' to the constructor. Edit: Additionally, binary data is no longer encoded as base64 strings, but passed directly through MsgPack. This yields about a ten-fold performance improvement, especially for larger data sets.

Lastly, I rewrote the ZeroMQ interaction to use loadlibrary instead of a Mex file. This has no impact on processing speed at all, but you don't have to worry about compiling that C code any more.

Oh, and Transplant now works on Windows!

Here is the above data again in tabular form:

Task New Transplant Old Transplant Oct2Py MEfP Matlab Numpy Octave
startup 4.8 s 5.8 s 11 ms 4.6 s      
sum(randn(1,1)) 3.36 ms 34.2 ms 29.6 ms 1.8 ms 9.6 μs 1.8 μs 6 μs
sum(randn(1,10)) 3.71 ms 35.8 ms 30.5 ms 1.8 ms 1.8 μs 1.8 μs 9 μs
sum(randn(1,100)) 3.27 ms 33.9 ms 29.5 ms 2.06 ms 2.2 μs 1.8 μs 9 μs
sum(randn(1,1000)) 4.26 ms 32.7 ms 30.6 ms 9.1 ms 4.1 μs 2.3 μs 12 μs
sum(randn(1,1e4)) 4.35 ms 34.5 ms 30 ms 72.2 ms 25 μs 5.8 μs 38 μs
sum(randn(1,1e5)) 5.45 ms 86.1 ms 31.2 ms 712 ms 55 μs 38.6 μs 280 μs
sum(randn(1,1e6)) 44.1 ms 874 ms 45.7 ms 7.21 s 430 μs 355 μs 2.2 ms
sum(randn(1,1e7)) 285 ms 10.6 s 643 ms 72 s 3.5 ms 5.04 ms 22 ms
15 Jun 2016

Teaching with Matlab Live Scripts

For a few years now, I have been teaching programming courses using notebooks. A notebook is an interactive document that can contain code, results, graphs, math, and prose. It is the perfect teaching tool:

You can combine introductory resources with application examples, assignments, and results. And after the lecture, students can refer to these notebooks at their leisure, and re-run example code, or try different approaches with known data.

The first time I saw this was with the Jupyter notebook (née IPython notebook). I immediately used it for teaching an introductory programming course in Python.

Later, I took over a Matlab course, but Matlab lacked a notebook. So for the next two years of teaching Matlab, I hacked up a small IPython extension that allowed me to run Matlab code in an Jupyter notebook as a cell magic.

Now, with 2016a, Matlab introduced Live Scripts, which is Mathworkian for notebook. This blog post is about how Live Scripts compare to Jupyter notebooks.

First off, Live Scripts work. The basic functionality is there: Code, prose, figures, and math can be saved in one document; The notebook can be exported as PDF and HTML, and Students can download the notebook and play with it. This latter part was not possible with my homegrown solution earlier.

However, Live Scripts are new, and still contain a number of bugs. You can't customize figure sizes, formatting options are very basic, image rendering is terrible, and math rendering using LaTeX is of poor quality and limited. Also, using Live Scripts on a retina Mac is borderline impossible: Matlab crashes on screen resolution changes (i.e. connecting a projector), Live Scripts render REALLY slowly (type a word, watch the characters crawl onto the screen one by one), and all figures export in twice their intended size (fixed in 2016b). You can work around some of these issues by starting Matlab in Low Resolution Mode.

No doubt some of these issues are going to get addressed in future releases. 2016b added script-local functions, which I read mostly as "you can now write functions in Live Scripts", and autocorrection-like text replacements that convert Markdown formatting into formatted text. This is highly appreciated.

Additionally though, here are a few features I would love to see:

Still, all griping aside, I want to reiterate that Live Scripts work. They aren't quite as nice as Jupyter notebooks, but they serve their purpose, and are a tremendously useful teaching tool.

31 May 2016

Matlab has an FFI and it is not Mex

Sometimes, you just have to use C code. There's no way around it. C is the lingua franca and bedrock of our computational world. Even in Matlab, sometimes, you just have to call into a C library.

So, you grab your towel, you bite the bullet, you strap into your K&R, and get down to it: You start writing a Mex file. And you curse, and you cry, because writing C is hard, and Mex doesn't exactly make it any better. But you know what? There is a better way!

Because, unbeknownst to many, Matlab includes a Foreign Function Interface. The technique was probably pioneered by Common Lisp, and has since been widely adopted everywhere: calling functions in a C library without writing any C code and without invoking a compiler!

Mind you, there remains a large and essential impedance mismatch between C's statically typed calling conventions and the vagaries of a dynamically typed language such as Matlab, and even the nicest FFI can't completely hide that fact. But anything is better than the abomination that is Mex.

So here goes, a very simple C library that adds two arrays:

// test.c:
#include test.h

void add_arrays(float *out, float* in, size_t length) {
    for (size_t n=0; n<length; n++) {
        out[n] += in[n];

// test.h:
#include <stddef.h>
void add_arrays(float *out, float *in, size_t length);

Let's compile it! gcc -shared -std=c99 -o test.c will do the trick.

Now, let's load that library into Matlab with loadlibrary:

if not(libisloaded('test'))
    [notfound, warnings] = loadlibrary('', 'test.h');
    assert(isempty(notfound), 'could not load test library')

Note that loadlibrary can't parse many things you would commonly find in header files, so you will likely have to strip them down to the bare essentials. Additionally, loadlibrary doesn't throw errors if it can't load a library, so we always have to check the notfound output argument to see if the library was actually loaded successfully.

With that, we can call functions in that library using calllib. But we can't just pass in Matlab vectors, that would be too easy. We first have to convert them to something C can understand: Pointers

vector1 = [1 2 3 4 5];
vector2 = [9 8 7 6 5];

vector1ptr = libpointer('singlePointer', vector1);
vector2ptr = libpointer('singlePointer', vector2);

What is nice about this is that this automatically converts the vectors from double to float. What is less nice is that it uses its weird singlePtr notation instead of the more canonical float* that you would expect from a self-respecting C header.

Then, finally, let's call our function:

calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1));

If you see no errors, everything went smoothly, and you will now have changed the content of vector1ptr, which we can have a look at like this:

added_vectors = vector1ptr.Value;

Note that this didn't change the contents of vector1, only of the newly created pointer. So there will always be some memory overhead to this technique in comparison to Mex files. However, runtime overhead seems pretty fine:

timeit(@() calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1)))
%   ans = 1.9155e-05
timeit(@() the_same_thing_but_as_a_mex_file(single(vector1), single(vector2)))
%   ans = 4.6262e-05
timeit(@() the_same_thing_plus_argument_conversion(vector1, vector2))
%   ans = 1.2326e-04

So as you can see, the calllib is plenty fast. However, if you add the Matlab code for converting the double arrays to pointers and extracting the summed data afterwards, the FFI is noticeably slower than a Mex file.

However, If I ask myself whether I would sacrifice 0.00007 seconds of computational overhead for hours of my life not spent with Mex, there really is no competition. I will choose Matlab's FFI over writing Mex files every time.

Older posts