Posts tagged "windows":
OS Customization and MacOS
I was an Apple fanboy some years ago. Back then, whenever something was odd on my computer, I was surely just using it wrong. Nowadays, I see things the other way around: We're not "holding it wrong", the computer is just defective. Computers should do our bidding, not vice versa. So here's a bunch of things that I do to my computers to mold them to my way of working.
Keyboard Layout
I switch constantly between a German and English keyboard layout, and regularly between various machines. My physical keyboards are German, and my fingers are used to the Windows-default German and (international) US keyboard layout. These are available by default on Windows and Linux, but MacOS goes its own way.
However, keyboard layouts on MacOS are saved in relatively simple text files, and can be modified
with relative ease. The process goes like this: Download Ukelele (free) to create a new keyboard
layout bundle for your base layout1. Inside that bundle, there's a *.keylayout file,
which is an XML file that defines the characters that each key-modifier combination produces. I
changed that to create a Windows-like US keyboard layout. And I replaced the keyboard icon with
something sane (not "A") by creating a 256x256 pixel PNG, opening it in Preview, holding alt while
saving to select the ICNS format. Save the keyboard bundle to ~/Library/Keyboard Layouts
and
reboot. Then I remove the unremovable default ("A") German layout by selecting another one, then
plutil -convert xml1 ~/Library/Preferences/com.apple.HIToolbox.plist
, and delete the entry from
AppleEnabledInputSources
. Now reboot. Almost easy. Almost.
One the one hand, this was quite the ordeal. On the other, I have tried to do this sort of thing on Windows and Linux before, and for the life of me could not do it. So I actually think this is great!
Keyboard Shortcuts
My main text editor is Emacs, and I am very used to its keyboard shortcuts. Of particular note are CTRL-A/E for going to the beginning/end of a line, and Alt-B/F for navigating forward/backwards by word. I have long wanted to use these shortcuts not just in Emacs and readline-enabled terminal applications, but everywhere else, too. And with MacOS, this is finally possible: Install BetterTouchTool ($22), and create keyboard shortcuts that maps, e.g. Alt-B/F to Alt-←/→. Ideally, put this in a new activation group that excludes Emacs. It may be necessary to remove the keyboard character for Alt-B/F from your keyboard layout before this works. I've spent an embarrassing number of hours trying to get this to work on Windows and Linux, and really got nowhere2. Actually, however, most readline shortcuts such as Ctrl-A/E/B/F/K/Y already work out of the box on MacOS!
Mouse Configuration
I generally use a trackpad, and occasionally a traditional mouse for image editing, and have used a trackball. I find that any one specific device will lead to wrist pain if used constantly, so I switch it up every now and then. The trackpad and trackball, however, need configuration to be usable.
After experimenting with many a trackpad device, I have found Apple touch pads the best trackpads on the market3. On MacOS, they lacks a middle mouse click. So I created a trackpad shortcut in the aforementioned BetterTouchTool ($22) to map the middle click on a three-finger tap (can also be had for free with MiddleClick (OSS)). For Windows, Magic Utilities ($17/y) provides a wonderful third-party driver for Apple devices that also supports the three-finger tap. I have not gotten the Apple touch pad to pair reliably on Linux, and have generally found their touch pad driver libinput a bit lacking.
My trackball is a Kensington SlimBlade. To scroll, you rotate the ball around its vertical axis. This is tedious for longer scroll distances, however. But there's an alternative scrolling method called "button scrolling", where you hold one button on the trackball, and move the ball to scroll. You need to install the Kensington driver to enable this on Windows and MacOS. Button scrolling is available on Linux as well using xinput4, but I haven't gotten the setting to stick across reboots or sleep cycles. So I wrote a background task that checks xinput every five seconds, which does the trick.
Window Management
Frankly, Windows does window management correctly. Win-←/→ moves windows to the left and right edge of the screen, as does dragging the window to the screen border. Further Win-←/→ then moves the window to the next half-screen in that direction, even across display boundaries. KDE does this correctly out of the box as well, Gnome does not do the latter, and it drives me mad. MacOS doesn't do any of these things. But Rectangle (OSS) does. Easy fix. (BetterTouchTool can do it, too, but Rectangle is prettier)
Furthermore, I want Alt-Tab to switch between windows. Again, MacOS is the odd one out, which uses CMD-Tab to switch between apps, now windows. And then there's another shortcut for switching between windows of the same app, but the shortcut really doesn't work at all on a German keyboard. Who came up with this nonsense? Anyway, Witch ($14) implements window switching properly.
Application Management
In Windows and Linux, I hit the Windows key and start typing to select and start an app. In MacOS, this is usually Cmd-Space, but BetterTouchTool can map it to a single short Cmd, if you prefer.
More annoying are the various docks and task bars. I always shove the dock off to the right edge of the screen, where it stays out of the way. Windows 10 had a sane dock, but then 11 came and forced it to the bottom of the screen. Dear OS makers, every modern screen has plenty of horizontal space. But vertical space is somewhat limited. So why on earth would you make a rarely used menu such as the dock consume that precious vertical space by default? And Microsoft, specifically, why not make it movable? Thankfully, there's StartAllBack ($5), which replaces the Windows task bar with something sensible, and additionally cleans up the start menu if you so desire. On KDE, I fractionally prefer Latte (OSS) over KDE's native dock. The MacOS dock is uniquely dumb, offering no start menu, and allowing no window selections. But it's unobtrusive and can be moved to the right edge, so it's not much of a bother.
File Management
One of the most crucial tasks in computer work in general is file management. I am not satisfied with most file managers. Dolphin on KDE works pretty well, it has tabs, can bulk-rename files, can display large directories without crashing, and updates in real time when new files are added to the current directory. Gnome Nautilus is so bad it is the main reason I switched to KDE on my Linux machines. Finder on MacOS is passable, I suppose, although the left sidebar is unnecessarily restrictive (why can't I add a shortcut to a network drive?). Windows Explorer is really rather terrible, lacking a bulk-rename tool, and crucially, tabs. In Windows 10, these can be added with Groupy ($12) (set it to only apply to explorer.exe). Windows 11 has very recently added native tabs, which work OK, but can't be detached from the window.
The sad thing is that there are plenty of very good file manager replacements out there, but none of the OSs have a mechanism for replacing their native file manager in a consistent way, so we're mostly stuck with the defaults.
Oh, and I always remove the iCloud/OneDrive sidebar entries, which is surprisingly tedious on Windows.
Hardware Control
On laptops, you can control screen brightness from your keyboard. On desktops, you can not. However, some clever hackers have put together BetterDisplay (OSS for screen brightness), which adds this capability to MacOS. That's actually a capability I have wanted for quite a while, and apparently it is only available in MacOS. Great stuff!
Less great is that MacOS does not allow volume control on external sound cards. SoundSource ($47) adds this rather crucial functionality back, once you go through the unnecessarily excruciating process of enabling custom kernel extensions. Windows and Linux of course natively support this.
Another necessary functionality for me is access to a non-sucky (i.e. no FAT) cross-platform file system. At the moment, the most portable file system seems to be NTFS, of all things. Regrettably, MacOS only supports reading NTFS, but no writing. Paragon NTFS (€20) adds this with another kernel extension, and promptly kernel-panicked my computer. Oh joy. At least it's only panicking for file transfers initiated by DigiKam, which I can work around. Paragon Support says they're working on it. I'm not holding my breath. Windows and Linux of course natively support NTFS.
System Management
I have learned from experience not to trust graphical backup programs. TimeMachine in particular has eaten my backups many times already, and can not be trusted. But I have used Borg (OSS) for years, and it has so far performed flawlessly. Even more impressive, my Borg backups have a continuous history despite moving operating systems several times. It truly is wonderful software!
On Windows, I run Borg inside the WSL, and schedule its backups with the Windows Task Scheduler. On Linux, I schedule them with systemd units. On MacOS, I install Borg with Homebrew (OSS) and schedule the backups with launchd tasks. It's all pretty equivalent. One nice thing about launchd, however, is how the OS immediately pops up a notification if there's a new task file added, and adds the task to the graphical system settings.
I have to emphasize what a game-changer the WSL is on Windows. Where previously, such simple automations where a pain in the neck to do reliably, they're now the same simple shell scripts as on other OSes. And it perfectly integrates with Windows programs as well, including passing pipes between Linux and Windows programs. It's truly amazing! At the moment, I'd rate Windows a better Unix system than MacOS for this reason. Homebrew is a passable package manager on MacOS, but the way it's ill-integrated into the main system (not in system PATH) is a bit off-putting.
App Compatibility
I generally use my computer for three tasks: General document stuff, photo editing, and video games.
One major downside of Apple computers is that video games aren't available. This has become less of a problem to me since I bought a Steam Deck, which has taken over gaming duties from my main PC. Absolutely astonishingly, the Steam Deck runs Windows games on Linux through emulation, which works almost flawlessly, making video games no longer a Windows-only proposition.
What doesn't work well on Linux are commercial applications. Wine generally does not play well with them, and frustratingly for my photo editing, neither VMWare Workstation Player (free) nor VirtualBox (OSS) support hardware-accelerated VMs on up-to-date Linux5. So where MacOS lacks games, Linux lacks Photoshop. Desktop applications in general tend to be unnecessarily cumbersome to manage and update on Linux. Flatpak is helping in this regard, by installing user-facing applications outside of the OS package managers, but it remains more work than on Windows or MacOS. The occasional scanner driver or camera interface app can also be troublesome on Linux, but that's easily handled with a VirtualBox VM (with the proprietary Extension Pack for USB2 support), and hasn't really bothered me too much.
Luckily for me, my most-used apps are generally OSS tools such as Darktable (OSS) and DigiKam (OSS), or cross-platform programs like Fish (OSS), Git (OSS), and Emacs (OSS). This is however, where Windows has a bit of a sore spot, as these programs tend to perform noticeably worse on Windows than on other platforms. Emacs and git in particular are just terribly slow on Windows, taking several seconds for routine operations that are instant on other platforms. That's probably due to Windows' rather slow file system and malware scanner for the many-small-files style of file management that these Unix tools implement. It is very annoying.
Conclusions
So there's just no perfect solution. MacOS can't do games, Linux can't run commercial applications, and Windows is annoyingly slow for OSS applications. Regardless, I regularly use all three systems productively. My job is mostly done on Windows, my home computer runs MacOS, and my Steam Deck and automations run Linux.
Overall, I currently prefer MacOS as my desktop OS. It is surprisingly flexible, and more scriptable than I thought, and in some ways is actually more functional than Linux or Windows. The integrated calendar and contacts apps are nice, too, and not nearly as terrible as their Windows/Linux counterparts. To say nothing of the amazing M1 hardware with its minuscule power draw and total silence, while maintaining astonishing performance.
Linux is where I prefer to program, due to its sane command line and tremendously good compiler/debugger/library infrastructure. As a desktop OS, it does have some rough edges, however, especially for its lack of access to commercial applications. While Linux should be the most customizable of these three, I find things tend to break too easily, and customizations are often scattered widely between many different subsystems, making them very hard to get right.
Theoretically, Windows is the most capable OS, supporting apps, OSS, and games. But it also feels the most user-hostile of these three, and the least performant. And then there's the intrusive ads everywhere, its spying on my every move, and at work there's inevitably a heavy-handed administrator security setup that gets in the way of productivity. It's honestly fine on my home computer, at least since they introduced the WSL. But using it for work every day is quite enough, so I don't want to use it at home, too.
Footnotes:
if there's an easier way to create a keyboard layout bundle, please let me know. I didn't use ukelele for anything but the bundle creation.
It occurs to me that it might actually be possible to do something like on Windows this with AutoHotkey (free). I'll have to try that!
The Logitech T650 is actually not bad, but the drivers are a travesty. Some Wacom tablets include touch, too, but they palm rejection is abysmal and they don't support gestures.
xinput set-prop "Kensington Slimblade Trackball" "libinput Scroll Method Enabled" 0, 0, 1 # button scrolling
xinput set-prop "Kensington Slimblade Trackball" "libinput Button Scrolling Button" 8 # top-right button
VirtualBox does not have a good accelerated driver, and VMWare does not support recent kernels. Qemu should be able to solve this problem, but I couldn't get it to work reliably.
Fixing AMD OpenCL on Windows
When I recently installed Windows 11 on my desktop, my photo editor Darktable suddenly got much slower than it used to be. When I looked into its preferences, I noticed OpenCL was no longer available.
As it turns out, some versions of the AMD graphics driver apparently no longer ship with OpenCL support on Windows. However, they do ship with the necessary libraries, it's just that these libraries are not registered any longer.
To register them, open the Registry Editor (aka regedit.exe), navigate to the key HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors
(if the key does not exist, create it), and create a new DWORD of Value 0
. Now rename the DWORD to be the path to your amdocl64.dll. Mine is C:\Windows\System32\DriverStore\FileRepository\u0344717.inf_amd64_d38cec78c83eee99\B343886\amdocl64.dll
. Search C:\Windows\System32\ for amdocl64.dll to find the correct path on your computer.
With that, OpenCL was once again recognized by Darktable and the rest of my photo editing programs.
(source)
Dear Computer, We Need to Talk
After years of using Linux on my desktop, I decided to install Windows on my computer, to get access to a few commercial photo editing applications. I'll go into my grievances with Linux later, but for now:
I tried to install Windows, you won't believe what happened next
Like I have done many times with Linux, I download a Windows image from my university, and write it to a USB drive, then reboot into the USB drive. The USB drive can't be booted. A quick internet search leads me to a Microsoft Support page on how to Install Windows from a USB Flash Drive, which says that
Instead, one has to format the stick as FAT32, make it active, then copy the files from the image to it. So I follow the instructions and open the Disk Management program. It does not offer an option of FAT32, nor for making the partition active. I settle on (inactive) exFAT instead. It doesn't boot.
I switch over to Linux, where I can indeed make a FAT32 partition, and I can mark it as bootable, which I take as the equivalent of active. But Linux can not open the Windows image to copy the files onto the USB stick. So back to Windows, for copying the files. Except they can't be copied, because some of them are larger than 4Gb, which can't be written to a FAT32 partition. What now?
While researching how to download a different version of Windows 10, I stumble upon the Media Creation Tool, which automatically downloads Windows 10 and writes it to the USB stick correctly. Why was this not pointed out in the article above? Who knows. At any rate, it works. I can finally install Windows.
The installation process requires the usual dozen-or-so refusals of tracking, ads, privacy intrusions, and voice assistants. I wish I could simply reject them all at once. And then the install hangs, while "polishing up a few things". Pressing the helpful back button, and then immediately the forward button unhangs it, and the installation completes.
Next up are drivers. It feels anachronistic to have to install drivers manually in this day and age, but oh well. The new GPU driver to make screen tearing go away, a driver for my trackball to recognize the third mouse button, a wacom driver, ten or so Intel drivers of unknown utility. The trackball driver is not signed. I install it anyway. The GPU driver does not recognize my GPU and can't be installed. A quick Internet search reveals that my particular AMD/Intel GPU/CPU was discontinued from support by both AMD and Intel, and does not have a current driver. But fora suggest that up to version 20.2.1 of the AMD driver work fine. They don't, the driver crashes when I open images in my photo editor. An even older version published by Intel does work correctly. So now I am running an AMD GPU with an Intel driver from 2018.
Installing and setting up Firefox and my photo editors works without issue, thank goodness. Emacs has a Windows installer now, which is greatly appreciated. OpenCL and network shares just work. This is why I'm installing Windows next to my Linux.
But Windows is still not activated. I copy my university's product key in the appropriate text box, but hesitate: That's for Windows Enterprise, and I'd be just fine with Home. So I cancel the activation without activating. A helpful link in the activation systems sends me to the Microsoft Store to get my very own version of Windows Home for €145, which normally retails for around €95, so that's a no-go. Whatever, I'll go with my university's Enterprise edition. Except the activation box now says my product key is invalid. And the Store now literally says "We don't know how you got here but you shouldn't be here" instead of selling me Windows. After a restart it installs and activates Windows Enterprise, even though I never actually completed the activation.
I install Git, but in order to access my Github I need to copy over my SSH key from the Linux install. Which I can't boot at the moment, because installing Windows overwrites the boot loader. This is normal. So I download Ubuntu, write it to the USB stick, boot into it, recover the bootloader, boot into my old install, reformat the stick, copy the files to the stick, boot back into Windows, and the files aren't on the stick. Tough. Boot back into Linux, copy the files onto the stick, eject the stick, boot back into Windows, copy the files to the computer. Great user experience.
Now that I have my SSH key, I open a Git Bash to download a project. It says my credentials are incorrect. I execute the same commands in a regular CMD instead of Git Bash, and now my credentials are correct. Obviously.
There are several programs that claim to be able to read Linux file systems from Windows. They do not work. But Microsoft has just announced that you will be able to mount Linux file systems from WSL in a few weeks or months. So maybe that will work!
I set my lock screen to a slideshow of my pictures. Except my pictures do not show up, and I get to see Window's default pictures instead. An internet search reveals that this is a wide-spread problem. Many "solutions" are offered in the support fora. What works for me is to first set the lock screen to "Windows Spotlight", then to "Slideshow". Only in that order will my pictures be shown.
I will stop here. I could probably go on ad infinitum if I wanted to. This was my experience of using Windows for one day. I consider these problems relatively benign, in that all of them had solutions, if non-obvious ones.
Why install Windows in the first place?
Part of the reason for installing Windows was my growing frustration with Linux. I have been a happy user of KDE of various flavors for about seven years now. But ever since I got into photo editing, things began to become problematic:
My photo editor requires OpenCL, but the graphics driver situation on Linux is problematic, to say the least. I generally managed to get RocM running most of the time, but kernel updates frequently broke it, or required down- or upgrading RocM. It was a constant struggle.
I wanted to work with some of my data on a network share, but KDE's implementation of network shares does not simply mount them for applications to use, but instead requires each application to be able to open network locations on their own. Needless to say, this worked almost never, requiring many unnecessary file copies. Perhaps Gnome handles network shares better, but let's not open that can of worms.
Printing photos simply never worked right for me. The colors were off, photo papers were not supported, the networked printer was rarely recognized. Both for a Samsung printer and an Epson and a Canon. One time a commercial printer driver for Linux printed with so much ink it dripped off the paper afterwards. Neither Darktable nor Gimp nor Digikam have a robust printing mode. I generally resorted to Windows for printing.
I ran that Windows in a virtual machine. With Virtualbox, the virtual machine would be extremely slow, to the point where it had a delay of several seconds between typing and seeing letters on the screen. VMWare did better, but would suddenly freeze and hang for minutes at a time. Disabling hugepages helped sometimes, for a short while. The virtual machine network was extremely unreliable. Some of these issues were probably related to my using a 4K screen.
Speaking of screens, I have two screens, one HighDPI 4k and one normal 1440p. Using X, the system can be either in HighDPI mode, or in normal mode. But it can't drive the two displays in different modes. Thus the second monitor was almost useless and I generally worked only on the 4k screen. With Wayland I would have been able to use both screens in different modes, but not be able to color-calibrate them or record screen casts. Which is completely unacceptable. So I stuck with one screen and X. In Windows, I can use both screens and calibrate them.
Additionally, Linux hardware support is still a bit spotty. My SD card reader couldn't read some SD cards because of driver issues. It would sometimes corrupt the SD card's file systems. USB-connected cameras were generally not accessible. The web cam did not work reliably. The CPU fan ran too hot most of the time.
So there had been numerous grievances in Linux that had no solutions. Still I stuck with it because so many more smaller issues were actually fixable if I put in the work. In fact I had accumulated quite a number of small hacks and scripts for various issues. I feared that Windows would leave me without recourse in these situations. And it doesn't. But at least the bigger features generally work as advertised.
Where do we go from here?
Just for completion's sake, I should really find an Apple computer and run it through its paces. From my experience of occasionally using a Macbook for teaching over the last few years, I am confident that it fares no better than Linux or Windows.
Were things always this broken? How are normal people expected to deal with these things? No wonder every sane person now prefers a smartphone or tablet to their computers. Limited as they may be, at least they generally work.
There is no joy in technology any more.
Installing Emacs on Windows
The official website states that you need to download Emacs from a nearby GNU mirror. However, this does not install gnutls, which is required for installing packages from melpa or marmalade. The documentation says that this can be obtained from ezwinports.
However, I have found that this does not work any more: As of Emacs 25, Emacs is available in 64 bit, but ezwinport only supplies 32 bit binaries. I had to search a bit, but the (in retrospect, obvious) solution is to download the required binaries from GnuTLS's website, directly. Then unpack all *.dll
from the bin
directory to your Emacs's bin
directory, and you are good to go.
This situation is really a bit sad. Installing Emacs should not be this hard. But there is talk about providing a Windows installer for Emacs in one of the next versions, which will hopefully fix these issues.
Audio APIs, Part 3: WASAPI / Windows
This is part three of a three-part series on the native audio APIs for Windows, Linux, and macOS. This third part is about WASAPI on Windows.
It has long been a major frustration for my work that Python does not have a great package for playing and recording audio. My first step to improve this situation was a small contribution to PyAudio, a CPython extension that exposes the C library PortAudio to Python. However, I soon realized that PyAudio mirrors PortAudio's C API a bit too closely for comfort. Thus, I set out to write PySoundCard, which is a higher-level wrapper for PortAudio that tries to be more pythonic and uses NumPy arrays instead of untyped bytes
buffers for audio data. However, I then realized that PortAudio itself had some inherent problems that a wrapper would not be able to solve, and a truly great solution would need to do it the hard way:
Instead of relying on PortAudio, I would have to use the native audio APIs of the three major platforms directly, and implement a simple, cross-platform, high-level, NumPy-aware Python API myself. This effort resulted in PythonAudio, a new pure-Python package that uses CFFI to talk to PulseAudio on Linux, Core Audio on macOS, and WASAPI[1] on Windows.
This series of blog posts summarizes my experiences with these three APIs and outlines the basic structure of how to use them. For reference, the singular use case in PythonAudio is block-wise playing/recording of float
data at arbitrary sampling rates and block sizes. All available sound cards should be listable and selectable, with correct detection of the system default sound cards (a feature that is very unreliable in PortAudio).
[1]: WASAPI is part of the Windows Core Audio APIs. To avoid confusion with the macOS API of the same name, I will always to refer to it as WASAPI.
WASAPI
WASAPI is one of several native audio libraries in Windows. PortAudio actually supports five of them: Windows Multimedia (MME), the first built-in audio API for Windows 3.1x; DirectSound, the audio subsystem of DirectX for Windows 95; Windows Driver Model / Kernel Streaming (WDM/KS), the improved audio system for Windows 98; ASIO, a third-party API developed by Steinberg to make pro audio possible on Windows; and finally, Windows Audio Session API (WASAPI), introduced in Windows Vista to bring a modern audio API to Windows.
In other words, audio on Windows has a long and troubled history, and has had a lot of opportunity for experimentation. It should then be no surprise that WASAPI is a clean and well-documented audio API that avoids many of the pitfalls of its predecessors and brethren. After having experienced the audio APIs of Windows, Linux, and macOS, I am beginning to understand why some programmers love Windows.
But let's take a step back, and give an overview over the API. First of all, this is a cross-language API that is meant to be used from C#, with a solid bridge for C++, and a somewhat funky bridge for C. This is crucial to understand. The whole API is designed for a high-level, object-oriented runtime, but I am accessing it from a low-level language that has no concept of objects, methods, or exceptions.
Objects are implemented as pointers to opaque structs, with an associated list of function pointers to methods. Every method accepts the object pointer as its first argument, and returns an error value if an exception occurred. Both inputs and outputs are function arguments, with outputs being implemented as pointer-to-pointer values. While this looks convoluted to a C programmer, it is actually a very clean mapping of object oriented concepts to C that never gave me any headaches.
However, there are a few edge cases that did take me a while to understand: Since the C API is inherently not polymorphic, you sometimes have to manually specify types as cryptic UUID structs. Figuring out how to convert the UUID strings from the header files to such structs was not easy. Similarly, it took me a while to reverse-engineer that strings in Windows are actually uint16
, despite being declared char
. But issues such as these are to be expected in a cross-language API.
In general, I did not find a good overview on how to interpret high-level C#-concepts in C. For example, it took a long time until I learned that objects in C# are reference counted, and that I would have to manage reference counts manually. Similarly, I had one rather thorny issue with memory allocations: in rare occasions (PROPVARIANT
), C# is expected to re-allocate memory of an object if the object does not have enough memory when passed into a method. This does not work as intended if you don't use C#'s memory allocator to create the memory. This was really painful to figure out.
Another result of the API's cross-language heritage are its headers: There are hundreds. And they all contain both the C API and the C++ API, separated by the occasional #ifdef __cplusplus
and extern C
. Worse yet, pretty much every data type and declaration is wrapped in multiple levels of preprocessor macros and typedef
. There are no doubt good reasons and a rich history for this, but it took me many hours to assemble all the necessary symbols from dozens of header files to even begin to call WASAPI functions.
Nevertheless, once these hurdles are overcome, the actual WASAPI API itself is well-structured and reasonably simple. You acquire an IMMDeviceEnumerator
, which returns IMMDeviceCollections
for microphones and speakers. These contain IMMDevices
, which represent sound cards and their properties. You activate an IMMDevice
with a desired data format to get an IAudioClient
, which in turns produces an IAudioRenderClient
or IAudioCaptureClient
for playback or recording, respectively. Playback and recording themselves are done by requesting a buffer, and reading or writing raw data to that buffer. This is about as straight-forward as APIs get.
The documentation deserves even more praise: I have rarely seen such a well-documented API. There are high-level overview articles, there is commented example code, every object is described abstractly, and every method is described in detail and in reference to related methods and example code. There is no corner case that is left undescribed, and no error code without a detailed explanation. Truly, this is exceptional documentation that is a joy to work with!
In conclusion, WASAPI leaves me in a situation I am very unfamiliar with: praising Windows. There is a non-trivial impedance mismatch between C and C# that has to be overcome to use WASAPI from C. But once I understood this, the API itself and its documentation were easy to use and understand. Impressive!
Debugging und GCC auf Windows
So, jetzt habe ich mein Mex-File zum Einlesen beliebiger Audiodateien endlich lauffähig auf Windows und Mac. Leider werde ich nicht dafür bezahlt, auch noch eine Linux-Version zu bauen, aber falls Interesse besteht, versuche ich mich vielleicht einmal daran.
The State of The Union: Kleine Dateien einlesen, kein Problem. Exotische Formate einlesen, kein Problem. Metadaten auslesen, kein Problem. Dateigröße, Bitrate und Samplerate auslesen, ein kleines Problem, da diese Parameter bei komprimierten Formaten nicht unbedingt fest stehen. Große Dateien einlösen, auf dem Mac kein Problem, auf Windows… nun ja, es dauert. Eine WAV-Datei von 5:30 min einzulesen, dauert mit Windows momentan ca. eine Stunde. Das kann nicht sein, in der Zeit habe ich die Datei dem Programm vorgelesen, wenn es sein muss.
Also, was ist da faul? Jetzt heißt es debuggen: GDB ist mein Freund, aber leider spreche ich seine Sprache nicht, also Oldschool-Debugging mit printf() (bzw. mexPrintf(); Aber da `#define printf mexPrintf` ist das das selbe). Blöd nur, dass Matlab selbst entscheidet, wann es meine Printfs auf den Bildschirm schreibt und es sich dazu entschlossen hat, dies immer erst nach dem Ausführen der Datei, also erst nachdem es bereits eine Stunde gearbeitet hat, zu tun. Einiges Hirnen später konnte ich Matlab endlich über eine Kombination aus Typecasts, sprintf und mexWarnMsgTxt dazu überreden, wenigstens sporadisch ein paar Informationen herauszugeben.
Das Ergebnis:
- Die Datei funktioniert tadellos, ist nur ein wenig langsam (s.o.)
- Wer ist schuld? Realloc ist schuld!
Das kam überraschend! Offenbar ist realloc auf dem Mac um mehrere Größenordnungen performanter als auf MinGW/Windows, denn die selbe Anwendung, die auf dem Mac ca. eine Sekunde braucht, braucht auf Windows eine Stunde! Und das allein wegen realloc! (Eigentlich: eine halbe Stunde wegen realloc, der Rest ist der Tatsache geschuldet, dass Windows in einer VM läuft)
Bei WAV-Dateien werden immer 2048 Samples an einem Stück ausgelesen. Danach verwende ich ein realloc, um meinen haupt-Speicherpuffer um diese Größe zu vergrößern und kopiere die neuen Daten dort hinein. Bei meinen 5:30 min macht das bei einer Samplerate von 44100 kHz und zwei Kanälen ca. 15000 Aufrufe von realloc. Komprimierte Datenformate haben üblicherweise kleinere Frames und damit noch einmal wesentlich mehr realloc-Aufrufe. Der Plan ist also, jetzt statt häufiger, kleiner realloc-Aufrufe, seltenere, größere Aufrufe zu machen. Zeit für ein paar Experimente:
realloc()-Größe | realloc()-Aufrufe | benötigte Zeit |
211 = 2048 | 15000 | ~1 h |
216 = 65536 | 470 | ~2 min |
217 = 131072 | 240 | ~1 min |
218 = 262144 | 120 | 30 s |
219 = 524288 | 60 | 18 s |
220 = 1048576 | 30 | 10.5 s |
221 = 2097152 | 15 | 7.3 s |
222 = 4194304 | 7 | 5.1 s |
223 = 8388608 | 3 | 4.2 s |
Das Spannende ist: Ich ändere durch meine Methodik praktisch nichts außer der Anzahl und Größe der realloc-Aufrufe, aber man erkennt einen eindeutigen Zusammenhang zwischen Performance und Anzahl der Aufrufe, ergo ist realloc der alleinige Schuldige für mein Performanceproblem auf Windows.
An dieser Stelle fiel mir ein, dass ich bereits an früherer Stelle einmal die gesamte Länge des Audio-Streams anhand der Metadaten geschätzt hatte. Durch eine somit vorgenommene Prä-Allokation des gesamten Speichers lässt sie die Laufzeit weiter auf 2.2 s drücken. Das ist immernoch nicht einmal halb so schnell wie auf OSX (0.9 s), aber das mag auch an der virtuellen Maschine liegen.
Mehr als diesen anecdotal Evidence kann ich nicht anbieten, aber ich bin mir sicher, dass ich ab jetzt die Finger von inkrementiellen Speichervergrößerungen auf MinGW/Windows lassen werde. Ist das in MSVC ähnlich schlimm, oder habe ich da etwa einen Bug entdeckt?
Kompilieren auf Windows
Seit einigen Wochen arbeite ich an einem kleinen Projekt: Eine Matlab-Funktion, die, ähnlich wie die standard-Funktion wavread(), Audiodateien einlesen kann. Aber nicht irgendwelche Audiofiles, sondern ALLE MÖGLICHEN Audiofiles. Wie geht das? Jeder kennt VLC, den Video-Player, der so ziemlich jedes Video öffnen kann, das man ihm vorsetzt, selbst wenn man überhaupt keine Codecs installiert hat. VLC basiert auf FFmpeg, einem Open-Source Programm, welches Funktionen bereit stellt, um eben alle möglichen Mediendaten zu öffnen.
Und da FFmpeg freie Software ist, kann man sie auch für andere Dinge verwenden, etwa, um mit Matlab Audiodateien zu öffnen. Fehlt noch eine Verbindung zwischen Matlab und den FFmpeg-C-Bibliotheken, und die gibt es in Form von Mex, der C-Schnittstelle von Matlab. Feine Sache, zwar hat es eine Weile gedauert, bis ich mich in libavformat und libavcodec eingearbeitet hatte (die beiden wichtigsten FFmpeg-Bibliotheken), aber im Endeffekt lief das alles sehr schmerzfrei – und das, obwohl ich bisher Mex-Kompilieren mit Matlab immer als eine grausige Beschäftigung in Erinnerung hatte, gespickt von kryptischen Kompiler-Fehlern und hässlichen Notlösungen.
Bumms, Zack, kaum hatte ich mich versehen, hatte ich ein lauffähiges, tadellos funktionierendes Mex-File auf meinem Mac liegen. Damit hatte ich nicht gerechnet. Also sofort die momentane Euphorie ausnutzen und weiter zu Schritt 2, das Ganze nochmal auf Windows. Meine Probleme, Windows so einzurichten, dass ich endlich Kompilieren kann, hatte ich ja schon berichtet. Ich hatte also Visual Studio 2005 installiert, um Matlab zufrieden zu stellen und einen anständigen Kompiler auf dem System zu haben. Aber war ja klar, MSVC macht wieder sein eigenes Ding und nichts ist mit Standardkonformität und Trallalla: Keine C99-Unterstützung, also keine Variablendeklarationen mitten im Code und keine stdint.h oder inttype.h. Ein Glück, es gibt wieder ein wenig mehr Free Software, die wenigstens letztere Lücke schließt. Dennoch; Ich bekomme mein mex-File nicht zum Laufen. Es ist wie verflucht, kaum setze ich mich an eine Windows-Maschine zum Programmieren, fällt meine Produktivität auf das Niveau eines Backsteins.
Enter gnumex, noch ein weiteres Stück FOSS, das es ermöglicht, GCC als Mex-Kompiler zu verwenden, AUF WINDOWS. Um die Dinge zu vereinfachen, verwendete ich die MinGW-Variante und kaum war diese Hürde genommen… lief alles. Einfach so. Wahrscheinlich bin ich ein Dickschädel und habe einfach nicht die Geistesschärfe, mit Windows-Kompilern zu arbeiten, aber mir scheint, alles was ich diesbezüglich anfasse und das nicht GCC heißt ist zum Scheitern verurteilt. Ein Glück, dass es die vielen klugen Jungen und Mädchen gibt, die so wunderbare freie Software schreiben, die mir das Leben so viel einfacher macht!
Eine Fortsetzung kommt noch…