08 Nov 2018

# Cool Python Libraries: TQDM and Resampy

In my recent post about appreciation for open source software, I mentioned that we should praise our open source heros more often. So here are two lesser-known libraries that I use daily, and which are unabashedly awesome:

## TQDM

TQDM draws text progress bars for long-running processes, simply by wrapping your iterator in tqdm(iterator). And this, alone, would be awesome. But, TQDM is one of those libraries that aren't just a good idea, but then go the extra mile, and add fantastic documentation, contingencies for all kinds of weird use cases, and integration with notebooks and GUIs.

I use TQDM all the time, for running my scientific experiments and data analysis, and it just works. For long-running tasks, I recommend using tqdm(iterator, smoothing=0, desc='calculating'), which adds a meaningful description to the progress bar, and an accurate runtime estimate.

## Resampy

Resampy resamples numpy signals. Resample your data with resample(signal, old_samplerate, new_samplerate). Just like with TQDM, this simple interface hides a lot of complexity and flexibility under the hood, yet remains conceptually simple and easy to use.

But beyond simplicity, resampy uses a clever implementation that is a far cry better than scipy.signal.resample, while still being easy to install and fast. For a more thorough comparison of resampling algorithms, visit Joachim Thiemann's blog.

26 Oct 2018

# Dealing with Unreliable Software

About a year ago, I started working on a big comparison study between a bunch of scientific algorithms. Many of these have open-source software available, and I wanted to evaluate them with a large variety of input signals. The problem is, this is scientific code, i.e. the worst code imaginable.

Things this code has done to my computer:

• it crashed, throwing an error, and shutting down nicely
• it crashed with a segfault, taking the owning process with it
• it crashed, leaving temporary files lying around
• it crashed, leaving zombie processes lying around
• it spin-locked, and never returned
• it spin-locked, and forked uncontrollably until all process decriptors were exhausted
• it spin-locked, and ate memory uncontrollably until all memory was consumed
• it crashed other, unrelated programs (no idea how it managed that)
• it corrupted the root file system (no idea how it managed that)

Note that the code did not do any of this intentionally. It was merely code written by non-expert programmers, the problems often a side effect of performance optimizations. The code mostly works fine if called only once or twice. My problems only become apparent if I ran it, say, a few hundred thousand times, with dozens of processes in parallel.

So, how do you deal with this? Multi-threading is not an option, since a segfault would kill the whole program. So it has to be multi-processing. But all the multi-processing frameworks I know will lose all progress if one of the more sinister scenarios from the above list hard-crashed one or more of its processes. I needed a more robust solution.

Basically, the only hope of survival at this point is the kernel. Only the kernel has enough power to rein in rogue processes, and deal with hard crashes. So in my purpose-built multi-processing framework, every task runs in its own process, with inputs and outputs written to unique files. And crucially, if any task does not finish within a set amount of time, it and all of its children are killed.

It took me quite a while to figure out how to do this, so here's the deal:

# start your process with a new process group:
process = Popen(..., start_new_session=True)

# after a timeout, kill the whole process group:
process_group_id = os.getpgid(process.pid)
os.killpg(process_group_id, signal.SIGKILL)


This is the nuclear option. I tried SIGTERM and SIGHUP instead, but programs would happily ignore it. I tried killing or terminating only the process, but that would leave zombie children. Sending SIGKILL to the process group does not take prisoners. The processes do not get a chance to respond or clean up after themselves. But you know what, after months of dealing with this stuff, this is the first time that my experiments actually run reliably for a few days without crashing or exhausting some resource. If that's what it takes, so be it.

14 Oct 2018

# Appreciation for Open Source and Commercial Software

I recently released my first-ever piece of commercial software, a plugin for the X-Plane flight simulator. I wrote this primarily to scratch my own itch, but thought other users might like it, too, so I put it up on the store. What struck me however, were the stark difference between the kinds of responses I got to this, as compared to my open source projects: They were astonishingly, resoundingly, positive!

You see, I have a bunch of open source projects, with a few thousand downloads per month, and a dozen or so issues on Github per week. Most of my interactions with my users are utilitarian, and efficient. Someone reports a bug or asks for help, I ask for clarification or a pull request, we iterate a few times until the issue is resolved. The process is mechanical and the tone of our conversation is equally unemotional. This is as it should be.

After having released my flight simulator plugin, however, people thanked me! They congratulated me! They extolled about the greatness of what I had built! And they did this despite the fact that the initial release had quite a few major bugs, and even flat-out did not work for some people. Yet even people who couldn't get it to work were grateful for my help in resolving their issue!

This blew my mind, in comparison with the drab "I found a bug", "Could you implement…" I was used to from my open source work. There, the feedback I got was mostly neutral (bug reports, feature requests), and sometimes even negative ("You broke something!"). So I release my software for free, as a gift, and get average-negative feedback. My commercial work, in contrast, costs money, and yet the feedback I get is resoundingly positive! I can not overstate how motivating it is to get praise, and love, from my users.

I think this is a huge problem for our Open Source community. I had my run-ins with burnout, when all the pull requests came to be too much, and I started dreading the little notification icon on Github. And I think the negativity inherent in bug reports and feature requests has a huge part to do with this. In the future, I will try to add more praise to my bug reports from now on, just to put things into perspective.

But I think we should go further than that. We should create tools for praising stuff, beyond the impersonal Stars on Github. We should be able to write reviews on Github, and recommendations, and blog posts about cool libraries we find.

I recently got my first github issue that was just a thank-you note. I loved it! We need more positivity like that.

03 Jun 2018

# Syncing Org-Journal with your Calendar

A month ago, org-journal learned to deal with future journal entries. I use future journal entries for appointments or not-yet-actionable tasks that I don't want in my current TODO list just yet. This works really well while I am at my computer, and really does not work at all when I am not (Orgzly does not work with my 1k-file journal directory).

But, as I keep re-discovering, org-mode already has a solution for this: org-mode can export your agenda to an iCalendar file! Most calendar applications can then subscribe to that file, and show your future journal entries right in your calendar. And if you set it up right, this will even sync changes to your calendar!

First, you need to set up some kind of regular export job. I use a cron job that regularly runs an Emacs batch job emacs --batch --script ~/bin/calendar_init.el with the following code in calendar​_init.el:

;; no init file is loaded, so provide everything here:
(setq org-journal-dir "~/journal/"            ; where my journal files are
org-journal-file-format "%Y-%m-%d.org"  ; their file names
org-journal-enable-agenda-integration t ; so entries are on the agenda
org-icalendar-store-UID t               ; so changes sync correctly
org-icalendar-include-todo "all"        ; include TODOs and DONEs
org-icalendar-combined-agenda-file "~/calendar/org-journal.ics")

(require 'org-journal)
(org-journal-update-org-agenda-files) ; put future entries on the agenda
(org-icalendar-combine-agenda-files)  ; export the ICS file
(save-buffers-kill-emacs t)           ; save all modified files and exit


It is important to set org-icalendar-store-UID, as otherwise every change to a future entry would result in a duplicated calendar entry. It will clutter up your journal entries with an UID property, though.

I do this on my web server, with my journal files syncthinged from my other computers. With that, I can subscribe to the calendar file from any internet-connected computer or mobile phone (using ICSdroid). But you could just as well sync only the ICS file, or just subscribe to the local file, if you don't want to upload your complete yournal to a web server.

(Incidentally, I first implemented my own ICS export, before realizing that this functionality already existed in org-mode. It was a fun little project, and I learned a lot about org-mode's internal data structures and the weirdness that are iCalendar files.)

25 May 2018

# GDPR Compliance

Personal email is dead. The signal-to-noise ratio of my personal email account has been deeply negative for years. But the last few days have been especially riveting, with a torrent of GDPR-compliance emails from just about every company that has ever gotten their hands on my email address. Anecdotally, if spam makes up about 90% of all email traffic, and the last few days have seen a ten-fold increase in traffic due to GDPR emails, we might even have "defeated spam" for a few days! Yay internet!

But sadly, I can't add to the signal-to-noise ratio, because this website does not collect any email addresses. And even more sadly, it doesn't even collect IP addresses, or use any kind of analytics at all. Sorry about that. I honestly do not know how many people read my stuff, and I can not rank my blog posts by their popularity. And I like it that way. It prevents me from getting on any kind of treadmill to please any kind of imagined audience.

That is… with one exception: Comments on this website are powered by Disqus, and I'm sure Disqus collects all kinds of data about all kinds of things. That's why comments are now hidden behind a "Load Disqus Comments" button. I promise you that no foreign Javascript is executed unless you press that button1. And honestly, GDPR didn't even have anything to do with that. I just didn't want any foreign JavaScript.

1