Basti's Scratchpad on the Internet
19 Sep 2017

Multi-Font Themes in Emacs

Traditionally, text editor themes are all about colors, right? In programming, we use color to tell variables from type declarations, comments, or strings. Almost any other kind of text document, however, uses typography instead of color to distinguish between headlines, list items, and keywords. I think that our current approach to highlighting code is misguided.

I think that color themes are an accident of history. Terminal text editors are unable to switch fonts. All they can do is switch colors, so colors are all we use. But in today's graphical world, we are no longer bound by the shackles of terminals, and Emacs can switch fonts just as easily as colors1.

It all started about a year ago, when I fell in love with a font called PragmataPro. One of the coolest features of Pragmata was its native bold and italic letters, and its wide support for unicode symbols. I had to find a use for these features! And so, down the rabbit hole I went. Keywords should be bold! # Comments should be italic! And while we're at it, why not add some underlines? And so on.

The logical next step was then to get rid of colors altogether. At first, as an experiment. Do I really need colors in code? The very pretty eink theme seemed to claim otherwise. After a few months of this lunacy, I realized that while I didn't strictly need colors, the stylistic variations of just one font aren't quite sufficient for source code. In particular, it wasn't always easy to distinguish between italic and roman type in PragmataPro, which led to some confusion.

But then, inspiration hit me: Who says I can only use one single font? No one!

[Screenshot: color theme.png (source code rendered with the multi-font theme)]

The tricky bit is to find fonts that work well together. In this example, I'm using PragmataPro for all regular code, Iosevka Slab2 for strings, and oblique3 Iosevka for comments. Ubuntu Mono and InputCompressed work well, too. You can find my current theme on GitHub. The only downside is that while these fonts share the same character width, their heights differ slightly, which sometimes leads to uneven line heights.

Still, I love the look of this! (Of course it won't work in a terminal, or most other text editors.)

Footnotes:

1: Can other text editors do this? I honestly have no idea.

2: I just love the look of the slab-serif characters in Iosevka! Look at the beautiful "}" in the screenshot!

3: Not to be confused with italic, which changes glyphs in addition to slanting.

01 Sep 2017

EuroScipy 2017

The second conference I attended this year was EuroScipy 2017 in Erlangen. I gave a talk about audio in Python and a lightning talk about my Python/Matlab bridge.

My most striking impression of EuroScipy is that every person I talked to was working on something interesting, and could talk about his/her topic clearly and with enthusiasm. This mirrors my feelings from last year's Chaos Communication Congress, where the short scientific section stood out for its clarity and passion. I also enjoyed the fact that attendees were international and diverse, and exuded a heart-warming sense of community.

Even though each scientific discipline has its own data sets, features, and models, everyone seemed to use a common set of methods (statistics, signal processing, machine learning) for working with that data. And absolutely everybody used Jupyter Notebooks for tutorials and teaching, and almost all data analyses were done in Pandas. This is particularly heartening since these technologies are geared towards reproducible research and open data.

The hot topic of the conference clearly was machine learning and neural networks. However, the current confusion of competing frameworks and network architectures does not seem to be a good long-term solution. I hope that this ecosystem will eventually reach its NumPy moment, and collapse into a single, unified package. Then, neural networks might find their place as just another machine learning method with a few reusable parametrized prototype implementations in scikit-learn. Tools like Keras look like good steps towards this goal.

Finally, there was a lot of talk about “the reproducibility crisis” in science, and possible steps to improve the scientific process. In particular, I learned that it is absolutely necessary to not look at your test data before the final evaluation, to avoid overfitting your brain. You need to split your data into a development set for training, a validation set for parameter tuning, and a totally separate evaluation set for the final evaluation. In a similar way, it is important to state your hypotheses in writing before you test them, to avoid “HARKing” (Hypothesis After the Results are Known; aka “Noise Mining”, “P-Hacking”, or “Procedural Overfitting”). I dearly hope that Registered Reports will catch on, and absolve us from these all too human biases.

In conclusion, EuroScipy 2017 was a ton of fun, and educational in many ways that I did not expect. If you are a scientific programmer, or if you maintain a scientific Python module, or if you are plain interested in scientific Python, I can highly recommend going to EuroScipy next year.

27 Aug 2017

Web Audio Conference and JavaScript

I am not much of a web programmer. I have written the odd website, I have supervised a few student projects, but I have never worked on any nontrivial JavaScript code base. Nevertheless, last week, I attended the Web Audio Conference 2017 in London. To put it succinctly: The web is home to fascinating people, but the technology is full of problems.

Those people sure were amazing, though. I talked to a musician/programmer, who spent the last few years writing his own sequencer web app, and did an amazing live performance in his web browser! I witnessed another guy live coding a synthesizer in a web app on stage as a music piece. We attended a gastronomical concert with smartphone-synchronized distributed olfactory and audible experiences. And we played the piano, with each participant controlling one key from his smartphone. It was all truly inspiring!

All of this was built on JavaScript, however. One musical performance reached its climax when the musician queued too many samples and synthesizers, and JavaScript crashed. The crowd cheered, but it still highlights that garbage collection pauses and heap exhaustion are real problems in JavaScript. And there was not a single talk that did not long for dedicated Audio Workers to fix some of these problems.

But more than that, I learned that most of these apps have to pile frameworks upon frameworks just to get a simple demo app going. This is inescapable, since JavaScript is just not usable as an application platform without these frameworks. I take this as a warning: One should build a project from scratch every now and then, just to keep a realistic feel of how deep one's technology stack really is.

All of these people are using JavaScript as an application platform. I would wager that an application platform is basically something like ①: layout, ②: widgets, ③: data bindings, and ④: event handling. If you think about it, JavaScript without libraries currently covers at most ① (CSS) and maybe ② (using HTML Custom Elements), though not at all at the same level as true GUI frameworks such as Qt or Cocoa. For ③ and ④, you have to add something like Ember, React, or Angular. It is strange to me that the JavaScript ecosystem does not seem to gravitate towards integrated frameworks but seems to prefer this wild west of codependent libraries.

Then again, limitations breed genius, and the limits of the platform no doubt fueled the amazing feats these web audio people pulled off at WAC. It is truly inspiring what miracles dedicated people can pull off if they set their mind on them. And let's not forget that the Web Audio API itself is the work of a single person (per browser) as well.

So, in summary, I learned a lot at WAC. I learned that web technologies are not ideal as an application platform. I learned that deep framework stacks are not desirable, yet often necessary. But above all, I learned that all this does not stop people from writing astonishing applications with those technologies. More power to the crazy ones!

10 Jul 2017

Audio APIs, Part 3: WASAPI / Windows

This is part three of a three-part series on the native audio APIs for Windows, Linux, and macOS. This third part is about WASAPI on Windows.

It has long been a major frustration for my work that Python does not have a great package for playing and recording audio. My first step to improve this situation was a small contribution to PyAudio, a CPython extension that exposes the C library PortAudio to Python. However, I soon realized that PyAudio mirrors PortAudio's C API a bit too closely for comfort. Thus, I set out to write PySoundCard, which is a higher-level wrapper for PortAudio that tries to be more pythonic and uses NumPy arrays instead of untyped bytes buffers for audio data. However, I then realized that PortAudio itself had some inherent problems that a wrapper would not be able to solve, and a truly great solution would need to do it the hard way:

Instead of relying on PortAudio, I would have to use the native audio APIs of the three major platforms directly, and implement a simple, cross-platform, high-level, NumPy-aware Python API myself. This effort resulted in PythonAudio, a new pure-Python package that uses CFFI to talk to PulseAudio on Linux, Core Audio on macOS, and WASAPI1 on Windows.

This series of blog posts summarizes my experiences with these three APIs and outlines the basic structure of how to use them. For reference, the singular use case in PythonAudio is block-wise playing/recording of float data at arbitrary sampling rates and block sizes. All available sound cards should be listable and selectable, with correct detection of the system default sound cards (a feature that is very unreliable in PortAudio).


WASAPI

WASAPI is one of several native audio libraries in Windows. PortAudio actually supports five of them: Windows Multimedia (MME), the first built-in audio API for Windows 3.1x; DirectSound, the audio subsystem of DirectX for Windows 95; Windows Driver Model / Kernel Streaming (WDM/KS), the improved audio system for Windows 98; ASIO, a third-party API developed by Steinberg to make pro audio possible on Windows; and finally, Windows Audio Session API (WASAPI), introduced in Windows Vista to bring a modern audio API to Windows.

In other words, audio on Windows has a long and troubled history, and has had a lot of opportunity for experimentation. It should then be no surprise that WASAPI is a clean and well-documented audio API that avoids many of the pitfalls of its predecessors and brethren. After having experienced the audio APIs of Windows, Linux, and macOS, I am beginning to understand why some programmers love Windows.

But let's take a step back, and give an overview of the API. First of all, this is a cross-language API that is meant to be used from C#, with a solid bridge for C++, and a somewhat funky bridge for C. This is crucial to understand. The whole API is designed for a high-level, object-oriented runtime, but I am accessing it from a low-level language that has no concept of objects, methods, or exceptions.

Objects are implemented as pointers to opaque structs, with an associated list of function pointers to methods. Every method accepts the object pointer as its first argument, and returns an error value if an exception occurred. Both inputs and outputs are function arguments, with outputs being implemented as pointer-to-pointer values. While this looks convoluted to a C programmer, it is actually a very clean mapping of object-oriented concepts to C that never gave me any headaches.
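
To make this mapping concrete, here is a minimal, untested sketch of what such a method call looks like from C. It fetches the default playback device through the IMMDeviceEnumerator discussed below; PythonAudio itself does the equivalent through CFFI rather than in C, so take this purely as an illustration of the calling convention.

    /* Sketch of the COM-in-C calling convention: every method call goes
       through the object's vtable, takes the object pointer as its first
       argument, returns an HRESULT error code, and writes its output through
       a pointer-to-pointer argument. Link with ole32.lib and uuid.lib;
       error handling is kept to a minimum. */
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <stdio.h>

    int main(void)
    {
        HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
        if (FAILED(hr)) return 1;

        /* "construct" an object; its type is specified as UUID structs */
        IMMDeviceEnumerator *enumerator = NULL;
        hr = CoCreateInstance(&CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL,
                              &IID_IMMDeviceEnumerator, (void **)&enumerator);
        if (FAILED(hr)) { CoUninitialize(); return 1; }

        /* a method call: object pointer first, output as pointer-to-pointer */
        IMMDevice *device = NULL;
        hr = enumerator->lpVtbl->GetDefaultAudioEndpoint(enumerator, eRender,
                                                         eConsole, &device);
        if (SUCCEEDED(hr)) {
            printf("got the default playback device\n");
            device->lpVtbl->Release(device);     /* manual reference counting */
        }
        enumerator->lpVtbl->Release(enumerator);
        CoUninitialize();
        return 0;
    }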

However, there are a few edge cases that did take me a while to understand: Since the C API is inherently not polymorphic, you sometimes have to manually specify types as cryptic UUID structs. Figuring out how to convert the UUID strings from the header files to such structs was not easy. Similarly, it took me a while to reverse-engineer that strings in Windows are actually uint16, despite being declared char. But issues such as these are to be expected in a cross-language API.
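
For illustration, the conversion itself follows a fixed pattern; the UUID string and interface name below are made up, but the field splitting works the same for the real UUIDs from the headers:

    /* Hypothetical example: a header-file UUID string such as
       "12345678-9abc-def0-1234-56789abcdef0" becomes a GUID struct by
       splitting it into one 32-bit, two 16-bit, and eight 8-bit hex fields. */
    #include <guiddef.h>

    static const GUID IID_ISomeInterface =
        { 0x12345678, 0x9abc, 0xdef0,
          { 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0 } };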

In general, I did not find a good overview on how to interpret high-level C# concepts in C. For example, it took a long time until I learned that objects in C# are reference counted, and that I would have to manage reference counts manually. Similarly, I had one rather thorny issue with memory allocations: on rare occasions (PROPVARIANT), C# is expected to re-allocate the memory of an object if the object does not have enough memory when passed into a method. This does not work as intended if you don't use C#'s memory allocator to create that memory. This was really painful to figure out.

Another result of the API's cross-language heritage is its headers: there are hundreds of them, and they all contain both the C API and the C++ API, separated by the occasional #ifdef __cplusplus and extern "C". Worse yet, pretty much every data type and declaration is wrapped in multiple levels of preprocessor macros and typedefs. There are no doubt good reasons and a rich history behind this, but it took me many hours to assemble all the necessary symbols from dozens of header files to even begin to call WASAPI functions.

Nevertheless, once these hurdles are overcome, the actual WASAPI API itself is well-structured and reasonably simple. You acquire an IMMDeviceEnumerator, which returns IMMDeviceCollections for microphones and speakers. These contain IMMDevices, which represent sound cards and their properties. You activate an IMMDevice with a desired data format to get an IAudioClient, which in turn produces an IAudioRenderClient or IAudioCaptureClient for playback or recording, respectively. Playback and recording themselves are done by requesting a buffer, and reading or writing raw data to that buffer. This is about as straightforward as APIs get.
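
Sketched in C, and assuming a device obtained from an IMMDeviceEnumerator as in the earlier snippet, the playback path might look roughly like this. It is an untested sketch with error handling and cleanup omitted, and the function name is made up:

    /* Rough sketch of the playback chain: IMMDevice -> IAudioClient ->
       IAudioRenderClient -> shared buffer. All error checking is omitted;
       link with ole32.lib and uuid.lib as before. */
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>

    void start_playback(IMMDevice *device)
    {
        /* activate the device to get an IAudioClient */
        IAudioClient *client = NULL;
        device->lpVtbl->Activate(device, &IID_IAudioClient, CLSCTX_ALL,
                                 NULL, (void **)&client);

        /* ask for the shared-mode sample format, then initialize with it;
           the buffer duration is in REFERENCE_TIME (100 ns) units */
        WAVEFORMATEX *format = NULL;
        client->lpVtbl->GetMixFormat(client, &format);
        client->lpVtbl->Initialize(client, AUDCLNT_SHAREMODE_SHARED, 0,
                                   10000000, 0, format, NULL);

        /* the IAudioClient in turn produces an IAudioRenderClient */
        IAudioRenderClient *render = NULL;
        client->lpVtbl->GetService(client, &IID_IAudioRenderClient,
                                   (void **)&render);

        /* playback: request the buffer, write raw data (here: silence), release */
        UINT32 frames = 0;
        client->lpVtbl->GetBufferSize(client, &frames);
        BYTE *data = NULL;
        render->lpVtbl->GetBuffer(render, frames, &data);
        render->lpVtbl->ReleaseBuffer(render, frames, AUDCLNT_BUFFERFLAGS_SILENT);

        client->lpVtbl->Start(client);
    }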

The documentation deserves even more praise: I have rarely seen such a well-documented API. There are high-level overview articles, there is commented example code, every object is described abstractly, and every method is described in detail and in reference to related methods and example code. There is no corner case that is left undescribed, and no error code without a detailed explanation. Truly, this is exceptional documentation that is a joy to work with!

In conclusion, WASAPI leaves me in a situation I am very unfamiliar with: praising Windows. There is a non-trivial impedance mismatch between C and C# that has to be overcome to use WASAPI from C. But once I understood this, the API itself and its documentation were easy to use and understand. Impressive!

Footnotes:

1: WASAPI is part of the Windows Core Audio APIs. To avoid confusion with the macOS API of the same name, I will always refer to it as WASAPI.

27 Jun 2017

Audio APIs, Part 2: Pulseaudio / Linux

This is part two of a three-part series on the native audio APIs for Windows, Linux, and macOS. This second part is about PulseAudio on Linux.

It has long been a major frustration for my work that Python does not have a great package for playing and recording audio. My first step to improve this situation was a small contribution to PyAudio, a CPython extension that exposes the C library PortAudio to Python. However, I soon realized that PyAudio mirrors PortAudio a bit too closely for comfort. Thus, I set out to write PySoundCard, which is a higher-level wrapper for PortAudio that tries to be more pythonic and uses NumPy arrays instead of untyped bytes buffers for audio data. However, I then realized that PortAudio itself had some inherent problems that a wrapper would not be able to solve, and a truly great solution would need to do it the hard way:

Instead of relying on PortAudio, I would have to use the native audio APIs of the three major platforms directly, and implement a simple, cross-platform, high-level, NumPy-aware Python API myself. This effort resulted in PythonAudio, a new pure-Python package that uses CFFI to talk to PulseAudio on Linux, Core Audio on macOS, and WASAPI1 on Windows.

This series of blog posts summarizes my experiences with these three APIs and outlines the basic structure of how to use them. For reference, the singular use case in PythonAudio is block-wise playing/recording of float data at arbitrary sampling rates and block sizes. All available sound cards should be listable and selectable, with correct detection of the system default sound cards (a feature that is very unreliable in PortAudio).


PulseAudio

PulseAudio is not the only audio API on Linux. There is the grandfather OSS, the more modern ALSA, the more pro-focused JACK, and the user-focused PulseAudio. Under the hood, PulseAudio uses ALSA for its actual audio input/output, but presents the user and applications with a much nicer API and UI.

The very nice thing about PulseAudio is that it is a native C API. It provides several levels of abstraction, the highest of which takes only a handful of lines of C to get audio playing. For the purposes of PythonAudio, however, I had to look at the more in-depth asynchronous API. Still, the API itself is relatively simple, and compactly defined in one simple header file.
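
For comparison, that highest abstraction level is the synchronous "simple" API from pulse/simple.h; a minimal, untested sketch that plays a second of silence looks something like this:

    /* PulseAudio's "simple" API: a handful of calls are enough to play audio.
       Most error handling is omitted; link with -lpulse-simple -lpulse. */
    #include <pulse/simple.h>

    int main(void)
    {
        pa_sample_spec spec = { .format = PA_SAMPLE_FLOAT32LE,
                                .rate = 44100, .channels = 2 };
        int error = 0;
        pa_simple *stream = pa_simple_new(NULL, "demo", PA_STREAM_PLAYBACK, NULL,
                                          "playback", &spec, NULL, NULL, &error);
        if (!stream) return 1;

        static float silence[44100 * 2];          /* one second of silence */
        pa_simple_write(stream, silence, sizeof(silence), &error);
        pa_simple_drain(stream, &error);
        pa_simple_free(stream);
        return 0;
    }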

It all starts with a mainloop and an associated context. While the mainloop is running, you can query the context for sources and sinks (aka microphones and speakers). The context can also create a stream that can be read or written (aka recorded or played). From a high level, this is all there is to it.
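
A rough, untested sketch of that structure, using the asynchronous API to list the available sinks and then quit, might look like this (error handling omitted; link with -lpulse):

    /* Sketch of the asynchronous API: a mainloop drives a context, and results
       arrive in callbacks. This lists the available sinks (speakers), then quits. */
    #include <pulse/pulseaudio.h>
    #include <stdio.h>

    static pa_mainloop *mainloop;

    static void sink_cb(pa_context *ctx, const pa_sink_info *info, int eol, void *userdata)
    {
        if (eol) { pa_mainloop_quit(mainloop, 0); return; }   /* end of list */
        printf("sink: %s\n", info->name);
    }

    static void state_cb(pa_context *ctx, void *userdata)
    {
        /* once the context is connected, query the sinks */
        if (pa_context_get_state(ctx) == PA_CONTEXT_READY)
            pa_operation_unref(pa_context_get_sink_info_list(ctx, sink_cb, NULL));
    }

    int main(void)
    {
        mainloop = pa_mainloop_new();
        pa_context *ctx = pa_context_new(pa_mainloop_get_api(mainloop), "demo");
        pa_context_set_state_callback(ctx, state_cb, NULL);
        pa_context_connect(ctx, NULL, PA_CONTEXT_NOFLAGS, NULL);

        int retval = 0;
        pa_mainloop_run(mainloop, &retval);       /* runs until pa_mainloop_quit() */
        pa_context_disconnect(ctx);
        pa_context_unref(ctx);
        pa_mainloop_free(mainloop);
        return retval;
    }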

Most PulseAudio functions are asynchronous: Function calls return immediately, and call user-provided callback functions when they are ready to return results. While this may be a good structure for high-performance multithreaded C code, it is somewhat cumbersome in Python. For PythonAudio, I wrapped this structure in regular Python functions that wait for the callback and return its data as normal return values.

Doing this shows just how old Python really is. Python is old-school in that it still thinks that concurrency is better solved with multiple communicating processes than with shared-memory threads. With such a mindset, there is a certain impedance mismatch to overcome when using PulseAudio: every function call has to lock the main loop, block while waiting for the callback to be called, and afterwards clean up by decrementing a reference count. This procedure is cumbersome, but not difficult.
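
Sketched in C, the pattern looks roughly like this. It uses the threaded mainloop variant; the function name, the globals, and the done flag are made up for this example, but the lock/wait/signal/unref calls are the real PulseAudio API (PythonAudio does the equivalent through CFFI):

    /* Sketch of the lock / wait-for-callback / unref pattern described above.
       `mainloop` and `context` are assumed to be a connected
       pa_threaded_mainloop and pa_context created elsewhere. */
    #include <pulse/pulseaudio.h>

    static pa_threaded_mainloop *mainloop;
    static pa_context *context;

    static void server_info_cb(pa_context *ctx, const pa_server_info *info, void *userdata)
    {
        int *done = userdata;
        /* ... copy whatever is needed out of `info` here ... */
        *done = 1;
        pa_threaded_mainloop_signal(mainloop, 0);   /* wake up the waiting thread */
    }

    /* wraps one asynchronous call in a blocking function */
    void get_server_info_blocking(void)
    {
        int done = 0;
        pa_threaded_mainloop_lock(mainloop);              /* 1. lock the main loop */
        pa_operation *op =
            pa_context_get_server_info(context, server_info_cb, &done);
        while (!done)
            pa_threaded_mainloop_wait(mainloop);          /* 2. block until the callback fires */
        pa_operation_unref(op);                           /* 3. decrement the reference count */
        pa_threaded_mainloop_unlock(mainloop);
    }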

What is difficult, however, is the documentation. The API documentation is fine, as far as it goes. It could go into more detail with regard to edge cases and error conditions, but it truly lacks high-level overviews and examples. It took an unnecessarily long time to figure out the code path for audio playback and recording, simply because there is no document anywhere that details the sequence of events needed to get there. In the end, I followed a marginally related example on the internet to get to that point, because the two examples provided by PulseAudio don't even use the asynchronous API.

Perhaps I am missing something, but it strikes me as strange that an API meant for audio recording and playback would not include an example that plays back and records audio.

On an application level, it can be problematic that PulseAudio seems to treat block sizes and latency requirements only as approximate targets. In particular, if computing resources become scarce, PulseAudio would rather increase latency and block sizes in the background than risk skipping. This might be convenient for a desktop application, but it is not ideal for signal processing, where latency can be crucial. It seems that I can work around these issues to an extent, but it is an inconvenience nonetheless.

In general, I found PulseAudio reasonably easy to use, though. The documentation could use some work, and I don't particularly like the asynchronous programming style, but the API is simple and functional. Of the three APIs (WASAPI/Windows, Core Audio/macOS, and PulseAudio/Linux), this one was probably the easiest to get working.

Footnotes:

1: WASAPI is part of the Windows Core Audio APIs. To avoid confusion with the macOS API of the same name, I will always refer to it as WASAPI.
