Posts tagged "matlab":

10 Apr 2018

Appending to Matlab Arrays

The variable $var appears to change size on every loop iteration. Consider preallocating for speed.

So sayeth Matlab.

Let's try it:

x_prealloc = cell(10000, 1);
x_end = {};
x_append = {};
for n=1:10000
    % variant 1: preallocate
    x_prealloc(n) = {42};
    % variant 2: end+1
    x_end(end+1) = {42};
    % variant 3: append
    x_append = [x_append {42}];
end

Which variant do you think is fastest?

Unsurprisingly, preallocation is indeed faster than growing an array. What is surprising is that it is faster by a constant factor of about 2 instead of scaling with the array length. Only appending by x = [x {42}] actually becomes slower for larger arrays. (The same thing happens for numerical arrays, struct arrays, and object arrays.)

TL;DR: Do not use x = [x $something], ever. Instead, use x(end+1) = $something. Preallocation is generally overrated.

Tags: matlab

12 Apr 2017

Matlab Metaprogramming

Why is it, that I find Matlab to be a fine teaching tool and a fine tool for solving engineering problems, but at the same time, extremely cumbersome for my own work? Recently, the answer struck me: metaprogramming in Matlab sucks.

Matlab is marketed as a tool for engineers to solve engineering problems. There are convenient data structures for numerical data (arrays, tables), less convenient data structures for non-numeric data (cells, structs, chars), and a host of expensive but powerful functions and methods for working with this kind of data. This is the happy path.

But don't stray too far from the happy path; horrors lurk where The Mathworks don't dare going. Basic stuff like talking to sockets or interacting with other programs is very cumbersome in Matlab, and sometimes even downright impossible for the lack of threads, pipes, and similar infrastructure. But this is common knowledge, and consistent with Matlab's goals as an engineering tool, not a general purpose programming language. These are first-order problems, and they are rarely insurmountable.

The more insidious problem is metaprogramming, i.e. when the objects of your code are code objects themselves. The first order use of programming is to solve real-world problems. If these problems are numeric in nature, Matlab has got you covered. But as every programmer discovers at some point, the second order use of programming is to solve programming problems. And by golly, will Matlab let you down when you try that!

As soon as you climb that ladder of abstraction, and the objects of your code become code objects themselves, you will enter weird country. You think exist will tell you whether a variable name is taken? Try calling it on a method. You think nargout will always give you a number? Again, methods will enlighten you. Quick, how do you capture all output arguments of a function call in a variable? x = cell(nargout(fun), 1); [x{:}] = fun(...), obviously (this sometimes fails). And don't even think of trying to overload subsref to create something generic. Those subsref semantics are crazy talk!

I could go on. The real power of code is abstraction. We use programming to repeatedly and reliably solve similar problems. The logical next step is to use those same programming tools to solve similar kinds of problems. This happens to all of us, and Matlab makes it extremely hard to deal with. Thus, it puts a ceiling on the level of abstraction that is reasonably achievable, and limits engineers to first-order solutions. And after a few years of acclimatization, it will put that same ceiling on those engineers' thinking, because you can't reason about what you can't express.

Tags: matlab

07 Sep 2016

MATLAB Syntax

In a recent project, I tried to parse MATLAB code. During this trying exercise, I stumbled upon a few… unique design decisions of the MATLAB language:

Use of apostrophes (`'`)

Apostrophes can mean one of two things: If applied as a unary postfix operator, it means transpose. If used as a unary prefix operator, it marks the start of a string. While not a big problem for human readers, this makes code surprisingly hard to parse. The interesting bit about this, though, is the fact that there would have been a much easier way to do this: Why not use double quotation marks for strings, and apostrophes for transpose? The double quotation mark is never used in MATLAB, so this would have been a very easy choice.

Use of parens (`()`)

If Parens follow a variable name, they can mean one of two things: If the variable is a function, the parens denote a function call. If the variable is anything else, this is an indexing operation. This can actually be very confusing to readers, since it makes it entirely unclear what kind of operation foo(5) will execute without knowledge about foo (which might not be available until runtime). Again, this could have been easily solved by using brackets ([]) for indexing, and parens (()) for function calls.

Use of braces (`{}`) and cell arrays

Cell arrays are multi-dimensional, ordered, heterogeneous collections of things. But in contrast to every other collection (structs, objects, maps, tables, matrices), they are not indexed using parens, but braces. Why? I don't know. In fact, you can index cell arrays using parens, but this only yields a new cell array with only one value. Why would this ever be useful? I have no explanation. This constantly leads to errors, and for the life of me I can not think of a reason for this behavior.

Use of line breaks

In MATLAB, your line can end on one of three characters: A newline character, a semicolon (;), and a comma (,). As we all know, the semicolon suppresses program output, while the newline character does not. The comma ends the logical line, but does not suppress program output. This is a relatively little-known feature, so I thought it would be useful to share it. Except, the meaning of ; and , changes in literals (like [1, 2; 3, 4] or {'a', 'b'; 3, 4}). Here, commas separate values on the same row and are optional, and semicolons end the current row. Interestingly, literals also change the meaning of the newline character: Inside a literal, a newline acts just like a semicolon, overrides a preceding comma, and you don't have to use ellipsis (...) for line continuations.

Syntax rules for commands

Commands are function calls without the parenthesis, like help disp, which is syntactically equivalent to help('disp'). You see, if you just specify a function name (can't be a compound expression or a function handle), and don't use parenthesis, all following words will be interpreted as strings, and passed to the function. This is actually kind of a neat feature. However, how do you differentiate between variable_name + 5 and help + 5? The answer is: Commands are actually a bit more complex. A command starts with a function name, followed by a space, which is not followed by an operator and a space. Thus, help +5 + 4 is a command, while help + 5 + 4 is an addition. Tricky!

The more-than-one-value value

If you want to save more than one value in a variable, you can use a collection (structs, matrices, maps, tables, cell arrays). In addition though, MATLAB knows another way of handling more than one values at once: The thing you get when you index a cell array with {:} or assign a function call with more than one result. In that case, you get something that is assignable to several variables, but that is not itself a collection. Just another quirk of MATLAB's indexing logic. However, you can capture these values into matrices or cell arrays using brackets or braces, like this: {x{:}} or [x{:}]. Note that this also works in assignments in a confusing way: [z{:}] = x{:} (if both x and z have the same length). Incidentally, this is often a neat way of converting between different kinds of collections (but utterly unreadable, because type information is hopelessly lost).

Tags: matlab

21 Jun 2016

Transplant, revisited

A few months ago, I talked about the performance of calling Matlab from Python. Since then, I implemented a few optimizations that make working with Transplant a lot faster:

The workload consisted of generating a bunch of random numbers (not included in the times), and sending them to Matlab for computation. This task is entirely dominated by the time it takes to transfer the data to Matlab (see table at the end for intra-language benchmarks of the same task).

As you can see, the new Transplant is significantly faster ~~for small workloads, and still a factor of two faster for larger amounts of data~~. It is now almost always a faster solution than the Matlab Engine for Python (MEfP) and Oct2Py. ~~For very large datasets, Oct2Py might be preferable, though.~~

This improvement comes from three major changes: Matlab functions are now returned as callable objects instead of ad-hoc functions, Transplant now uses MsgPack instead of JSON, and loadlibrary instead of a Mex file to call into libzmq. All of these changes are entirely under the hood, though, and the public API remains unchanged.

The callable object thing is the big one for small workloads. The advantage is that the objects will only fetch documentation if __doc__ is actually asked for. As it turns out, running help('funcname') for every function call is kind of a big overhead.

Bigger workloads however are dominated by the time it takes Matlab to decode the data. String parsing is very slow in Matlab, which is a bad thing indeed if you're planning to read a couple hundred megabytes of JSON. Thus, I replaced JSON with MsgPack, which eliminates the parsing overhead almost entirely. JSON messaging is still available, though, if you pass msgformat='json' to the constructor. Edit: Additionally, binary data is no longer encoded as base64 strings, but passed directly through MsgPack. This yields about a ten-fold performance improvement, especially for larger data sets.

Lastly, I rewrote the ZeroMQ interaction to use loadlibrary instead of a Mex file. This has no impact on processing speed at all, but you don't have to worry about compiling that C code any more.

Oh, and Transplant now works on Windows!

Here is the above data again in tabular form:

Task	New Transplant	Old Transplant	Oct2Py	MEfP	Matlab	Numpy	Octave
startup	4.8 s	5.8 s	11 ms	4.6 s
sum(randn(1,1))	3.36 ms	34.2 ms	29.6 ms	1.8 ms	9.6 μs	1.8 μs	6 μs
sum(randn(1,10))	3.71 ms	35.8 ms	30.5 ms	1.8 ms	1.8 μs	1.8 μs	9 μs
sum(randn(1,100))	3.27 ms	33.9 ms	29.5 ms	2.06 ms	2.2 μs	1.8 μs	9 μs
sum(randn(1,1000))	4.26 ms	32.7 ms	30.6 ms	9.1 ms	4.1 μs	2.3 μs	12 μs
sum(randn(1,1e4))	4.35 ms	34.5 ms	30 ms	72.2 ms	25 μs	5.8 μs	38 μs
sum(randn(1,1e5))	5.45 ms	86.1 ms	31.2 ms	712 ms	55 μs	38.6 μs	280 μs
sum(randn(1,1e6))	44.1 ms	874 ms	45.7 ms	7.21 s	430 μs	355 μs	2.2 ms
sum(randn(1,1e7))	285 ms	10.6 s	643 ms	72 s	3.5 ms	5.04 ms	22 ms

Tags: matlab python

15 Jun 2016

Teaching with Matlab Live Scripts

For a few years now, I have been teaching programming courses using notebooks. A notebook is an interactive document that can contain code, results, graphs, math, and prose. It is the perfect teaching tool:

You can combine introductory resources with application examples, assignments, and results. And after the lecture, students can refer to these notebooks at their leisure, and re-run example code, or try different approaches with known data.

The first time I saw this was with the Jupyter notebook (née IPython notebook). I immediately used it for teaching an introductory programming course in Python.

Later, I took over a Matlab course, but Matlab lacked a notebook. So for the next two years of teaching Matlab, I hacked up a small IPython extension that allowed me to run Matlab code in an Jupyter notebook as a cell magic.

Now, with 2016a, Matlab introduced Live Scripts, which is Mathworkian for notebook. This blog post is about how Live Scripts compare to Jupyter notebooks.

First off, Live Scripts work. The basic functionality is there: Code, prose, figures, and math can be saved in one document; The notebook can be exported as PDF and HTML, and Students can download the notebook and play with it. This latter part was not possible with my homegrown solution earlier.

However, Live Scripts are new, and still contain a number of bugs. You can't customize figure sizes, formatting options are very basic, image rendering is terrible, and math rendering using LaTeX is of poor quality and limited. Also, using Live Scripts on a retina Mac is borderline impossible: Matlab crashes on screen resolution changes (i.e. connecting a projector), Live Scripts render REALLY slowly (type a word, watch the characters crawl onto the screen one by one), and all figures export in twice their intended size (fixed in 2016b). You can work around some of these issues by starting Matlab in Low Resolution Mode.

No doubt some of these issues are going to get addressed in future releases. 2016b added script-local functions, which I read mostly as "you can now write functions in Live Scripts", and autocorrection-like text replacements that convert Markdown formatting into formatted text. This is highly appreciated.

Additionally though, here are a few features I would love to see:

Nested lists, and lists entries that contain newlines (i.e. differentiate between line breaks and paragraph breaks).
Indented text, for quoting things, or to work around the lack of multi-line list entries.
More headline levels.
Magics. This is probably a long shot, but line/cell magics in Jupyter notebooks are really useful.

Still, all griping aside, I want to reiterate that Live Scripts work. They aren't quite as nice as Jupyter notebooks, but they serve their purpose, and are a tremendously useful teaching tool.

Tags: matlab

31 May 2016

Matlab has an FFI and it is not Mex

Sometimes, you just have to use C code. There's no way around it. C is the lingua franca and bedrock of our computational world. Even in Matlab, sometimes, you just have to call into a C library.

So, you grab your towel, you bite the bullet, you strap into your K&R, and get down to it: You start writing a Mex file. And you curse, and you cry, because writing C is hard, and Mex doesn't exactly make it any better. But you know what? There is a better way!

Because, unbeknownst to many, Matlab includes a Foreign Function Interface. The technique was probably pioneered by Common Lisp, and has since been widely adopted everywhere: calling functions in a C library without writing any C code and without invoking a compiler!

Mind you, there remains a large and essential impedance mismatch between C's statically typed calling conventions and the vagaries of a dynamically typed language such as Matlab, and even the nicest FFI can't completely hide that fact. But anything is better than the abomination that is Mex.

So here goes, a very simple C library that adds two arrays:

// test.c:
#include test.h

void add_arrays(float *out, float* in, size_t length) {
    for (size_t n=0; n<length; n++) {
        out[n] += in[n];
    }
}

// test.h:
#include <stddef.h>
void add_arrays(float *out, float *in, size_t length);

Let's compile it! gcc -shared -std=c99 -o test.so test.c will do the trick.

Now, let's load that library into Matlab with loadlibrary:

if not(libisloaded('test'))
    [notfound, warnings] = loadlibrary('test.so', 'test.h');
    assert(isempty(notfound), 'could not load test library')
end

Note that loadlibrary can't parse many things you would commonly find in header files, so you will likely have to strip them down to the bare essentials. Additionally, loadlibrary doesn't throw errors if it can't load a library, so we always have to check the notfound output argument to see if the library was actually loaded successfully.

With that, we can call functions in that library using calllib. But we can't just pass in Matlab vectors, that would be too easy. We first have to convert them to something C can understand: Pointers

vector1 = [1 2 3 4 5];
vector2 = [9 8 7 6 5];

vector1ptr = libpointer('singlePointer', vector1);
vector2ptr = libpointer('singlePointer', vector2);

What is nice about this is that this automatically converts the vectors from double to float. What is less nice is that it uses its weird singlePtr notation instead of the more canonical float* that you would expect from a self-respecting C header.

Then, finally, let's call our function:

calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1));

If you see no errors, everything went smoothly, and you will now have changed the content of vector1ptr, which we can have a look at like this:

added_vectors = vector1ptr.Value;

Note that this didn't change the contents of vector1, only of the newly created pointer. So there will always be some memory overhead to this technique in comparison to Mex files. However, runtime overhead seems pretty fine:

timeit(@() calllib('test', 'add_arrays', vector1ptr, vector2ptr, length(vector1)))
%   ans = 1.9155e-05
timeit(@() the_same_thing_but_as_a_mex_file(single(vector1), single(vector2)))
%   ans = 4.6262e-05
timeit(@() the_same_thing_plus_argument_conversion(vector1, vector2))
%   ans = 1.2326e-04

So as you can see, the calllib is plenty fast. However, if you add the Matlab code for converting the double arrays to pointers and extracting the summed data afterwards, the FFI is noticeably slower than a Mex file.

However, If I ask myself whether I would sacrifice 0.00007 seconds of computational overhead for hours of my life not spent with Mex, there really is no competition. I will choose Matlab's FFI over writing Mex files every time.