Basti's Scratchpad on the Internet
26 Jun 2016

P⋅O⋅L⋅L⋅E⋅N and Walk 'em ups

I like boring games. I like games that give me time to think. Like flight simulators, truck simulators, history simulators, and (don't call them walking simulators) walk 'em ups. And this weekend, this year's Steam Summer Sale started, and thus it was time to get some gaming done!

First, I played Everybody's Gone to the Rapture, a walk 'em up that got good reviews, and was especially praised for it's story. And it dutifully enraptured me, with its British landscapes, and personal story lines. But somehow it didn't quite connect. Maybe I'm not British enough, and I certainly didn't get all of the story. Still, this is well worth picking up, and visiting a digital Shropshire after just having visited the real one earlier this year was a real treat!

Second, I played P⋅O⋅L⋅L⋅E⋅N. It did not start well: The download took ages, I had to unplug all that fancy flight simulation gear to get it to recognize my controller, and performance was rather lackluster. Oh, and reviews were rather mediocre, too. (Also, here's a free tip for you if you games developers: putting fancy Unicode characters into your game title does not improve SEO).

But, I entirely fell in love with P⋅O⋅L⋅L⋅E⋅N! You arrive on Titan in your dinky near-future space capsule, take a short walk on the surface of the moon, and then explore the local space station. Of course, something went wrong, and the space station is deserted. Oh, and there is something about space bees.

What really worked for me here was the environmental story telling. This is a space station much in the tradition of 2001: A Space Odyssey and Alien. It's that perfect utilitarian and inhumanely sterile work place that human corporation always strive for, and human employees always mess up. The stark industrial bleakness of these space corridors is the perfect backdrop to make those little human irregularities really stand out.

There is no conflict, only light puzzles, and the story is developed only through voice diaries and environmental clutter. But in contrast to Rapture, it all makes sense. The futurist setting, the deep humanity of the people dealing with it. The creepiness of machines without operators. The struggle of the scientists and engineers against the forces of nature and greedy corporations. The nods —nay— genuflections to everything that has ever been great in Scifi. Even that Kubrick ending. I loved every inch of this! ★★★★★

21 Jun 2016

Transplant, revisited

A few months ago, I talked about the performance of calling Matlab from Python. Since then, I implemented a few optimizations that make working with Transplant a lot faster:

timings.png

The workload consisted of generating a bunch of random numbers (not included in the times), and sending them to Matlab for computation. This task is entirely dominated by the time it takes to transfer the data to Matlab (see table at the end for intra-language benchmarks of the same task).

As you can see, the new Transplant is significantly faster for small workloads, and still a factor of two faster for larger amounts of data. It is now almost always a faster solution than the Matlab Engine for Python (MEfP) and Oct2Py. For very large datasets, Oct2Py might be preferable, though.

This improvement comes from three major changes: Matlab functions are now returned as callable objects instead of ad-hoc functions, Transplant now uses MsgPack instead of JSON, and loadlibrary instead of a Mex file to call into libzmq. All of these changes are entirely under the hood, though, and the public API remains unchanged.

The callable object thing is the big one for small workloads. The advantage is that the objects will only fetch documentation if __doc__ is actually asked for. As it turns out, running help('funcname') for every function call is kind of a big overhead.

Bigger workloads however are dominated by the time it takes Matlab to decode the data. String parsing is very slow in Matlab, which is a bad thing indeed if you're planning to read a couple hundred megabytes of JSON. Thus, I replaced JSON with MsgPack, which eliminates the parsing overhead almost entirely. JSON messaging is still available, though, if you pass msgformat='json' to the constructor. Edit: Additionally, binary data is no longer encoded as base64 strings, but passed directly through MsgPack. This yields about a ten-fold performance improvement, especially for larger data sets.

Lastly, I rewrote the ZeroMQ interaction to use loadlibrary instead of a Mex file. This has no impact on processing speed at all, but you don't have to worry about compiling that C code any more.

Oh, and Transplant now works on Windows!

Here is the above data again in tabular form:

Task New Transplant Old Transplant Oct2Py MEfP Matlab Numpy Octave
startup 4.8 s 5.8 s 11 ms 4.6 s      
sum(randn(1,1)) 3.36 ms 34.2 ms 29.6 ms 1.8 ms 9.6 μs 1.8 μs 6 μs
sum(randn(1,10)) 3.71 ms 35.8 ms 30.5 ms 1.8 ms 1.8 μs 1.8 μs 9 μs
sum(randn(1,100)) 3.27 ms 33.9 ms 29.5 ms 2.06 ms 2.2 μs 1.8 μs 9 μs
sum(randn(1,1000)) 4.26 ms 32.7 ms 30.6 ms 9.1 ms 4.1 μs 2.3 μs 12 μs
sum(randn(1,1e4)) 4.35 ms 34.5 ms 30 ms 72.2 ms 25 μs 5.8 μs 38 μs
sum(randn(1,1e5)) 5.45 ms 86.1 ms 31.2 ms 712 ms 55 μs 38.6 μs 280 μs
sum(randn(1,1e6)) 44.1 ms 874 ms 45.7 ms 7.21 s 430 μs 355 μs 2.2 ms
sum(randn(1,1e7)) 285 ms 10.6 s 643 ms 72 s 3.5 ms 5.04 ms 22 ms
15 Jun 2016

Teaching with Matlab Live Scripts

For a few years now, I have been teaching programming courses using notebooks. A notebook is an interactive document that can contain code, results, graphs, math, and prose. It is the perfect teaching tool:

You can combine introductory resources with application examples, assignments, and results. And after the lecture, students can refer to these notebooks at their leisure, and re-run example code, or try different approaches with known data.

The first time I saw this was with the Jupyter notebook (née IPython notebook). I immediately used it for teaching an introductory programming course in Python.

Later, I took over a Matlab course, but Matlab lacked a notebook. So for the next two years of teaching Matlab, I hacked up a small IPython extension that allowed me to run Matlab code in an Jupyter notebook as a cell magic.

Now, with 2016a, Matlab introduced Live Scripts, which is Mathworkian for notebook. This blog post is about how Live Scripts compare to Jupyter notebooks.

First off, Live Scripts work. The basic functionality is there: Code, prose, figures, and math can be saved in one document; The notebook can be exported as PDF and HTML, and Students can download the notebook and play with it. This latter part was not possible with my homegrown solution earlier.

However, Live Scripts are new, and still contain a number of bugs. You can't customize figure sizes, formatting options are very basic, image rendering is terrible, and math rendering using LaTeX is of poor quality and limited. Also, using Live Scripts on a retina Mac is borderline impossible: Matlab crashes on screen resolution changes (i.e. connecting a projector), Live Scripts render REALLY slowly (type a word, watch the characters crawl onto the screen one by one), and all figures export in twice their intended size (fixed in 2016b). You can work around some of these issues by starting Matlab in Low Resolution Mode.

No doubt some of these issues are going to get addressed in future releases. 2016b added script-local functions, which I read mostly as "you can now write functions in Live Scripts", and autocorrection-like text replacements that convert Markdown formatting into formatted text. This is highly appreciated.

Additionally though, here are a few features I would love to see:

Still, all griping aside, I want to reiterate that Live Scripts work. They aren't quite as nice as Jupyter notebooks, but they serve their purpose, and are a tremendously useful teaching tool.

31 May 2016

Matlab has an FFI and it is not Mex

Sometimes, you just have to use C code. There's no way around it. C is the lingua franca and bedrock of our computational world. Even in Matlab, sometimes, you just have to call into a C library.

So, you grab your towel, you bite the bullet, you strap into your K&R, and get down to it: You start writing a Mex file. And you curse, and you cry, because writing C is hard, and Mex doesn't exactly make it any better. But you know what? There is a better way!

Because, unbeknownst to many, Matlab includes a Foreign Function Interface. The technique was probably pioneered by Common Lisp, and has since been widely adopted everywhere: calling functions in a C library without writing any C code and without invoking a compiler!

Mind you, there remains a large and essential impedance mismatch between C's statically typed calling conventions and the vagaries of a dynamically typed language such as Matlab, and even the nicest FFI can't completely hide that fact. But anything is better than the abomination that is Mex.

So here goes, a very simple C library that adds two arrays:

// test.c:
#include test.h

void add_arrays(float *out, float* in, size_t length) {
    for (size_t n=0; n<length; n++) {
        out[n] += in[n];
    }
}

// test.h:
#include <stddef.h>
void add_arrays(float *out, float *in, size_t length);

Let's compile it! gcc -shared -std=c99 -o test.so test.c will do the trick.

Now, let's load that library into Matlab with loadlibrary:

if not(libisloaded('test'))
    [notfound, warnings] = loadlibrary('test.so', 'test.h');
    assert(isempty(notfound), 'could not load test library')
end

Note that loadlibrary can't parse many things you would commonly find in header files, so you will likely have to strip them down to the bare essentials. Additionally, loadlibrary doesn't throw errors if it can't load a library, so we always have to check the notfound output argument to see if the library was actually loaded successfully.

With that, we can call functions in that library using calllib. But we can't just pass in Matlab vectors, that would be too easy. We first have to convert them to something C can understand: Pointers

vector1 = [1 2 3 4 5];
vector2 = [9 8 7 6 5];

vector1ptr = libpointer('singlePointer', vector1);
vector2ptr = libpointer('singlePointer', vector2);

What is nice about this is that this automatically converts the vectors from double to float. What is less nice is that it uses its weird singlePtr notation instead of the more canonical float* that you would expect from a self-respecting C header.

Then, finally, let's call our function:

calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1));

If you see no errors, everything went smoothly, and you will now have changed the content of vector1ptr, which we can have a look at like this:

added_vectors = vector1ptr.Value;

Note that this didn't change the contents of vector1, only of the newly created pointer. So there will always be some memory overhead to this technique in comparison to Mex files. However, runtime overhead seems pretty fine:

timeit(@() calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1)))
%   ans = 1.9155e-05
timeit(@() the_same_thing_but_as_a_mex_file(single(vector1), single(vector2)))
%   ans = 4.6262e-05
timeit(@() the_same_thing_plus_argument_conversion(vector1, vector2))
%   ans = 1.2326e-04

So as you can see, the calllib is plenty fast. However, if you add the Matlab code for converting the double arrays to pointers and extracting the summed data afterwards, the FFI is noticeably slower than a Mex file.

However, If I ask myself whether I would sacrifice 0.00007 seconds of computational overhead for hours of my life not spent with Mex, there really is no competition. I will choose Matlab's FFI over writing Mex files every time.

30 May 2016

Updating the Matplotlib Font Cache

When publishing papers or articles, I want my plots to integrate with the text surrounding them. I want them to use the correct font size, and the correct font.

This is easy to do with Matplotlib:

import matplotlib
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = 'Calibri'

However, sometimes, Matplotlib won't find the correct, even though it is clearly installed. This happens when Matplotlib's internal font cache is out of date.

To refresh the font cache, use

matplotlib.font_manager._rebuild()

Happy Plotting!

Older posts