Basti's Scratchpad on the Internet
24 Nov 2016

Pip is great

Installing Python packages used to be a pain. Very surprisingly, this is no longer the case. Nowadays, pip install whatever will reliably install pretty much anything without any trouble.

To recap, the problem is that many Python packages rely on C code, which needs to be compiled before installation. In the past, this burden was mostly on the user. Depending on the user's knowledge of C and compilers, and the user's operating system, this could become almost arbitrarily hairy.

This problem was solved, to some extent, using pre-compiled packages, first as binary installers on the package websites, then pre-packaged Python distributions, and later through third-party package managers such as conda.

This worked well, but it fractured the ecosystem into several different mostly-compatible package sources. This was no big problem, but users had to decide on a package-by-package basis whether to download an installer, use conda, or use pip.

But I am happy to announce that these dark days are behind us now. Pip now automatically installs binary Python packages called wheels, and most package developers do now provide wheels on PyPI. This is so obviously superior to the past that many high-profile packages don't even provide binary installers any more.
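To make "binary wheel" a little more concrete: a wheel's file name already says which Python version and platform it was compiled for, which is how pip can pick a matching pre-built file instead of compiling anything. The sketch below splits such a file name into its parts. The function name and the two example file names are only illustrations, and this is not how pip itself is implemented.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated tags.

    Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


# A compiled wheel names a concrete interpreter and platform ...
binary = parse_wheel_filename("numpy-1.11.2-cp35-cp35m-manylinux1_x86_64.whl")
# ... while a pure-Python wheel uses the generic "none" and "any" tags.
pure = parse_wheel_filename("requests-2.12.1-py2.py3-none-any.whl")
print(binary["platform_tag"], pure["platform_tag"])
```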

For me, this means I don't have to rely on conda any longer. I don't have to keep a mental list of which packages to install through conda and which packages to install through pip any longer. I can go back to using virtualenv instead of conda-envs [1] again. I don't have to tell students to download pre-packaged Python distributions any longer. Pip is great!

Footnotes:

[1] Not because I dislike conda, just to simplify my development environment.

22 Nov 2016

Soma

For a long time, I was afraid of picking up Soma, since it came from the developers who did Amnesia. Amnesia was the first game that scared me so thoroughly that I just couldn't bring myself to pick it up again after the first session. I admired it for its amazing world design, interactivity, and story, but it was too scary for me. I don't enjoy being scared.

I was thus of two minds when I heard about Soma: Soma was supposed to be less scary, and SciFi. But still, let me get this out of the way: this game scared me witless. I had to use an online game guide to warn me of monsters and tell me how to evade them. I was too scared to figure this stuff out by myself, all alone in this creepy, nasty, underwater space station.

But at the same time, this game delivered one of the best stories I have ever seen in a video game. The story left me shaken and thoughtful for days, and the story was what dragged me back into the game even though I was really struggling with the monster sections. Even the delivery through conversations, environmental clues, and audio logs, was truly outstanding. I just love grimy Alien-style SciFi, and this is one of the very few video game stories that could fill more than a paragraph or two in a well-written book. I love it.

Apparently, if you like horror games, this is a very mild one. Boring even, if you can believe my favorite online critics. But believe me, it was right on the edge of what I was able to take. I still can't quite decide whether I would have preferred this game without the monsters, or whether the monsters were a necessary tool for evoking this intense sense of dread on the lonely SciFi ocean floor. Maybe the monsters lent some much-needed life and interactivity to the walk-em-up formula.

At any rate, I thoroughly enjoyed this experience, and can't wait for what this developer will create next. I wholeheartedly recommend this, as one of the best SciFi stories of this year, in any medium. ★★★★★

07 Sep 2016

MATLAB Syntax

In a recent project, I tried to parse MATLAB code. During this trying exercise, I stumbled upon a few… unique design decisions of the MATLAB language:

Use of apostrophes (')

An apostrophe can mean one of two things: If applied as a unary postfix operator, it means transpose. If used in prefix position, it marks the start of a string. While not a big problem for human readers, this makes code surprisingly hard to parse. The interesting bit about this, though, is the fact that there would have been a much easier way to do this: Why not use double quotation marks for strings, and apostrophes for transpose? The double quotation mark is never used in MATLAB, so this would have been a very easy choice.
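To show why this is awkward for a parser, here is a minimal Python sketch of one common heuristic: an apostrophe directly after a name, a number, a closing bracket, a dot, or another transpose is a transpose, and anything else starts a string. The function name is made up for this example, and the heuristic is a simplification: it ignores comments and the whitespace rules inside matrix literals.

```python
def classify_apostrophes(line):
    """Label each apostrophe in one line of MATLAB code.

    Returns (index, kind) pairs, where kind is "transpose" or "string".
    For strings, only the opening apostrophe is reported.
    """
    labels = []
    i = 0
    while i < len(line):
        if line[i] != "'":
            i += 1
            continue
        prev = line[i - 1] if i > 0 else ""
        # Postfix position: directly after an operand or another transpose.
        if prev and (prev.isalnum() or prev in "_)]}'."):
            labels.append((i, "transpose"))
            i += 1
            continue
        # Otherwise a string starts here; skip ahead to its closing quote.
        labels.append((i, "string"))
        j = i + 1
        while j < len(line):
            if line[j] == "'":
                if j + 1 < len(line) and line[j + 1] == "'":
                    j += 2  # '' inside a string is an escaped apostrophe
                    continue
                break
            j += 1
        i = j + 1
    return labels


print(classify_apostrophes("x = a' + 'it''s';"))
```

Note that the decision depends on the character before the apostrophe, so the tokenizer needs context that a dedicated string delimiter would make unnecessary.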

Use of parens (())

If parens follow a variable name, they can mean one of two things: If the variable is a function, the parens denote a function call. If the variable is anything else, this is an indexing operation. This can actually be very confusing to readers, since it makes it entirely unclear what kind of operation foo(5) will execute without knowledge about foo (which might not be available until runtime). Again, this could have been easily solved by using brackets ([]) for indexing, and parens (()) for function calls.

Use of braces ({}) and cell arrays

Cell arrays are multi-dimensional, ordered, heterogeneous collections of things. But in contrast to every other collection (structs, objects, maps, tables, matrices), they are not indexed using parens, but braces. Why? I don't know. In fact, you can index cell arrays using parens, but this only yields a new cell array with only one value. Why would this ever be useful? I have no explanation. This constantly leads to errors, and for the life of me I cannot think of a reason for this behavior.
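One loose way to picture the difference, borrowed from Python and offered only as an analogy rather than as MATLAB's rationale: indexing a cell array with parens resembles slicing a list (you get a smaller container back), while indexing with braces resembles plain element access (you get the contents). The variable names below are made up for the illustration.

```python
cell = [1, "two", [3.0, 4.0]]

# Roughly like c(2) in MATLAB: the result is still a container with one element.
sub = cell[1:2]

# Roughly like c{2} in MATLAB: the result is the element itself.
item = cell[1]

print(type(sub).__name__, type(item).__name__)
```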

Use of line breaks

In MATLAB, your line can end on one of three characters: A newline character, a semicolon (;), or a comma (,). As we all know, the semicolon suppresses program output, while the newline character does not. The comma ends the logical line, but does not suppress program output. This is a relatively little-known feature, so I thought it would be useful to share it. Except, the meaning of ; and , changes in literals (like [1, 2; 3, 4] or {'a', 'b'; 3, 4}). Here, commas separate values on the same row and are optional, and semicolons end the current row. Interestingly, literals also change the meaning of the newline character: Inside a literal, a newline acts just like a semicolon, overrides a preceding comma, and you don't have to use ellipsis (...) for line continuations.
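The rules for literals can be sketched in a few lines of Python. This is a deliberately small illustration with a made-up function name: it only handles plain numbers in a bracketed literal, with no strings, no nested literals, and no expressions such as 1 - 2.

```python
import re


def split_numeric_literal(text):
    """Split a numeric MATLAB matrix literal into rows of floats."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed literal")
    rows = []
    # Inside a literal, semicolons and newlines both end the current row.
    for row in re.split(r"[;\n]", body[1:-1]):
        # Commas are optional: values are separated by commas or whitespace.
        # A trailing comma before a newline leaves an empty cell, which is dropped.
        cells = [cell for cell in re.split(r"[,\s]+", row.strip()) if cell]
        if cells:
            rows.append([float(cell) for cell in cells])
    return rows


print(split_numeric_literal("[1, 2; 3, 4]"))
print(split_numeric_literal("[1, 2,\n 3, 4]"))
```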

Syntax rules for commands

Commands are function calls without the parentheses, like help disp, which is syntactically equivalent to help('disp'). You see, if you just specify a function name (can't be a compound expression or a function handle), and don't use parentheses, all following words will be interpreted as strings, and passed to the function. This is actually kind of a neat feature. However, how do you differentiate between variable_name + 5 and help + 5? The answer is: Commands are actually a bit more complex. A command starts with a function name, followed by a space, which is not followed by an operator and a space. Thus, help +5 + 4 is a command, while help + 5 + 4 is an addition. Tricky!
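The rule as stated above fits in a short Python sketch. The function name and the operator list are assumptions for illustration (the list is incomplete), and the sketch only implements the rule from this post: it does not know whether a name is a variable, and it skips the extra cases the real MATLAB parser handles.

```python
import re

# An illustrative, incomplete list of operators; longer ones come first.
OPERATORS = ["==", "~=", "<=", ">=", "&&", "||", ".*", "./", ".^",
             "+", "-", "*", "/", "\\", "^", "<", ">", "&", "|", "=", ":"]


def looks_like_command(line):
    """Apply the rule: name, space, then NOT (operator followed by a space)."""
    match = re.match(r"([A-Za-z]\w*)[ \t]+(\S.*)$", line.strip())
    if not match:
        return False  # no "name, space, something" shape at all
    rest = match.group(2)
    for op in OPERATORS:
        if rest.startswith(op):
            after = rest[len(op):]
            # An operator followed by a space (or nothing) means an expression.
            if after == "" or after[0] in " \t":
                return False
    return True


for example in ["help disp", "help +5 + 4", "help + 5 + 4", "help('disp')"]:
    print(example, "->", looks_like_command(example))
```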

The more-than-one-value value

If you want to save more than one value in a variable, you can use a collection (structs, matrices, maps, tables, cell arrays). In addition though, MATLAB knows another way of handling more than one value at once: The thing you get when you index a cell array with {:} or assign a function call with more than one result. In that case, you get something that is assignable to several variables, but that is not itself a collection. Just another quirk of MATLAB's indexing logic. However, you can capture these values into matrices or cell arrays using brackets or braces, like this: {x{:}} or [x{:}]. Note that this also works in assignments in a confusing way: [z{:}] = x{:} (if both x and z have the same length). Incidentally, this is often a neat way of converting between different kinds of collections (but utterly unreadable, because type information is hopelessly lost).
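For readers who know Python, star-unpacking is a loose analogy for this not-quite-a-collection: *x is also not a value by itself, but it can be captured into a new collection, spread over several variables, or passed as several arguments. The comparison is only an illustration, the two languages differ in the details, and all names below are made up.

```python
x = [1, "two", 3.0]

as_list = [*x]           # capture into a new collection, a bit like {x{:}}
as_tuple = (*x,)         # capture into a different kind of collection
first, second, third = x  # spread over several variables

z = [None, None, None]
z[:] = x                 # element-wise assignment, a bit like [z{:}] = x{:}


def count_args(*args):
    """Count how many separate arguments were received."""
    return len(args)


print(count_args(*x), count_args(x))
```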

26 Jun 2016

P⋅O⋅L⋅L⋅E⋅N and Walk 'em ups

I like boring games. I like games that give me time to think. Like flight simulators, truck simulators, history simulators, and (don't call them walking simulators) walk 'em ups. And this weekend, this year's Steam Summer Sale started, and thus it was time to get some gaming done!

First, I played Everybody's Gone to the Rapture, a walk 'em up that got good reviews, and was especially praised for its story. And it dutifully enraptured me, with its British landscapes, and personal story lines. But somehow it didn't quite connect. Maybe I'm not British enough, and I certainly didn't get all of the story. Still, this is well worth picking up, and visiting a digital Shropshire after just having visited the real one earlier this year was a real treat!

Second, I played P⋅O⋅L⋅L⋅E⋅N. It did not start well: The download took ages, I had to unplug all that fancy flight simulation gear to get it to recognize my controller, and performance was rather lackluster. Oh, and reviews were rather mediocre, too. (Also, here's a free tip for you game developers: putting fancy Unicode characters into your game title does not improve SEO).

But, I entirely fell in love with P⋅O⋅L⋅L⋅E⋅N! You arrive on Titan in your dinky near-future space capsule, take a short walk on the surface of the moon, and then explore the local space station. Of course, something went wrong, and the space station is deserted. Oh, and there is something about space bees.

What really worked for me here was the environmental storytelling. This is a space station much in the tradition of 2001: A Space Odyssey and Alien. It's that perfect utilitarian and inhumanly sterile work place that human corporations always strive for, and human employees always mess up. The stark industrial bleakness of these space corridors is the perfect backdrop to make those little human irregularities really stand out.

There is no conflict, only light puzzles, and the story is developed only through voice diaries and environmental clutter. But in contrast to Rapture, it all makes sense. The futurist setting, the deep humanity of the people dealing with it. The creepiness of machines without operators. The struggle of the scientists and engineers against the forces of nature and greedy corporations. The nods, nay, genuflections to everything that has ever been great in SciFi. Even that Kubrick ending. I loved every inch of this! ★★★★★

21 Jun 2016

Transplant, revisited

A few months ago, I talked about the performance of calling Matlab from Python. Since then, I implemented a few optimizations that make working with Transplant a lot faster:

[Figure: timings.png]

The workload consisted of generating a bunch of random numbers (not included in the times), and sending them to Matlab for computation. This task is entirely dominated by the time it takes to transfer the data to Matlab (see table at the end for intra-language benchmarks of the same task).

As you can see, the new Transplant is significantly faster for small workloads, and still a factor of two faster for larger amounts of data. It is now almost always a faster solution than the Matlab Engine for Python (MEfP) and Oct2Py. For very large datasets, Oct2Py might be preferable, though.

This improvement comes from three major changes: Matlab functions are now returned as callable objects instead of ad-hoc functions, Transplant now uses MsgPack instead of JSON, and it now uses loadlibrary instead of a Mex file to call into libzmq. All of these changes are entirely under the hood, though, and the public API remains unchanged.

The callable object thing is the big one for small workloads. The advantage is that the objects will only fetch documentation if __doc__ is actually asked for. As it turns out, running help('funcname') for every function call is kind of a big overhead.
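The idea can be sketched in plain Python. This is not Transplant's actual code: the class name and the call and fetch_doc callbacks are hypothetical stand-ins for "send a call to Matlab" and "run help('funcname')". The point is only that the documentation lookup moves from every call to the first access of __doc__.

```python
class LazyDocFunction:
    # Hypothetical proxy for a remote function; documentation is fetched
    # only when __doc__ is actually read, and then cached.

    def __init__(self, name, call, fetch_doc):
        self._name = name
        self._call = call            # stand-in for the remote call
        self._fetch_doc = fetch_doc  # stand-in for help('funcname')
        self._doc = None

    def __call__(self, *args):
        # Calling the function never touches the documentation.
        return self._call(self._name, *args)

    @property
    def __doc__(self):
        if self._doc is None:
            self._doc = self._fetch_doc(self._name)
        return self._doc


doc_requests = []


def fake_call(name, *args):
    """Pretend remote call: only knows how to sum a list."""
    return sum(args[0])


def fake_fetch_doc(name):
    """Pretend help lookup that records how often it is used."""
    doc_requests.append(name)
    return "help text for " + name


remote_sum = LazyDocFunction("sum", fake_call, fake_fetch_doc)
total = remote_sum([1, 2, 3])
requests_after_call = len(doc_requests)
doc_text = remote_sum.__doc__
```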

Bigger workloads however are dominated by the time it takes Matlab to decode the data. String parsing is very slow in Matlab, which is a bad thing indeed if you're planning to read a couple hundred megabytes of JSON. Thus, I replaced JSON with MsgPack, which eliminates the parsing overhead almost entirely. JSON messaging is still available, though, if you pass msgformat='json' to the constructor. Edit: Additionally, binary data is no longer encoded as base64 strings, but passed directly through MsgPack. This yields about a ten-fold performance improvement, especially for larger data sets.
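A small, standard-library-only sketch of the size side of this argument: the same thousand doubles as raw bytes, as a base64 string inside JSON, and as a JSON list of numbers. It does not use MsgPack (a third-party package) and it does not measure Matlab's decoding speed, so it illustrates the overhead of text encodings rather than reproducing the benchmark.

```python
import array
import base64
import json

# One thousand 8-byte doubles as example data.
values = array.array("d", [i * 0.5 for i in range(1000)])
raw = values.tobytes()

# Old approach in spirit: binary data wrapped in base64 inside a JSON message.
as_base64_json = json.dumps({"data": base64.b64encode(raw).decode("ascii")})

# Naive alternative: every number spelled out as JSON text.
as_number_json = json.dumps({"data": values.tolist()})

# Decoding the base64 variant takes two extra passes over the data.
decoded = array.array("d")
decoded.frombytes(base64.b64decode(json.loads(as_base64_json)["data"]))

print("raw bytes:     ", len(raw))
print("base64 in JSON:", len(as_base64_json))
print("numbers as JSON:", len(as_number_json))
```

A binary format can carry the raw bytes as they are, so the receiver avoids both the base64 inflation of about one third and the string parsing.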

Lastly, I rewrote the ZeroMQ interaction to use loadlibrary instead of a Mex file. This has no impact on processing speed at all, but you don't have to worry about compiling that C code any more.

Oh, and Transplant now works on Windows!

Here is the above data again in tabular form:

| Task               | New Transplant | Old Transplant | Oct2Py  | MEfP    | Matlab | Numpy   | Octave |
|--------------------|----------------|----------------|---------|---------|--------|---------|--------|
| startup            | 4.8 s          | 5.8 s          | 11 ms   | 4.6 s   |        |         |        |
| sum(randn(1,1))    | 3.36 ms        | 34.2 ms        | 29.6 ms | 1.8 ms  | 9.6 μs | 1.8 μs  | 6 μs   |
| sum(randn(1,10))   | 3.71 ms        | 35.8 ms        | 30.5 ms | 1.8 ms  | 1.8 μs | 1.8 μs  | 9 μs   |
| sum(randn(1,100))  | 3.27 ms        | 33.9 ms        | 29.5 ms | 2.06 ms | 2.2 μs | 1.8 μs  | 9 μs   |
| sum(randn(1,1000)) | 4.26 ms        | 32.7 ms        | 30.6 ms | 9.1 ms  | 4.1 μs | 2.3 μs  | 12 μs  |
| sum(randn(1,1e4))  | 4.35 ms        | 34.5 ms        | 30 ms   | 72.2 ms | 25 μs  | 5.8 μs  | 38 μs  |
| sum(randn(1,1e5))  | 5.45 ms        | 86.1 ms        | 31.2 ms | 712 ms  | 55 μs  | 38.6 μs | 280 μs |
| sum(randn(1,1e6))  | 44.1 ms        | 874 ms         | 45.7 ms | 7.21 s  | 430 μs | 355 μs  | 2.2 ms |
| sum(randn(1,1e7))  | 285 ms         | 10.6 s         | 643 ms  | 72 s    | 3.5 ms | 5.04 ms | 22 ms  |