Basti's Scratchpad on the Internet
20 Nov 2017

PyEnv is the new Conda

How to install Python? If your platform has a package manager, you might be tempted to use that to install Python. But I don't like that: the packaged version is often outdated, and you risk messing with an integral part of your operating system. Instead, I like to install a separate Python in my home directory. I used to use Anaconda (or WinPython, or EPD) for this. But now there is a better way: PyEnv.

The thing is: PyEnv installs (any version of) Python. That's all it does.

So why would I choose PyEnv over the more popular Anaconda? Because Anaconda is a Python distribution, a package manager, an environment manager, and a platform for paid packages. They once broke pip because they wanted to promote conda instead. Some features of conda require a login, some require a paid subscription. When you install packages through conda, you get binaries and source code from Anaconda's servers, not the official packages from PyPI, and those might or might not be up-to-date and feature-complete. For every package you install, you have to choose between pip and conda, and the same goes for specifying your dependencies.

As an aside, many of these complaints are just as true for package-manager-provided Python packages (which often break pip, too!). Just like Anaconda, package managers want to be the true and only source of packages, and don't like to interact with Python's own package manager.

In contrast, with PyEnv, you install a Python. This can be a version of CPython, PyPy, IronPython, Jython, Pyston, Stackless, Miniconda, or even Anaconda. It downloads the sources from the official repositories and compiles them on your machine1. Plus, it provides an easy and transparent way of switching between installed versions (including any system-installed versions). After that, you use Python's own venv and pip.
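For illustration, here is a minimal sketch of that workflow (the version number, paths, and package are arbitrary examples, not recommendations):

    # install and activate a Python version with pyenv
    pyenv install 3.6.3    # download and compile CPython 3.6.3
    pyenv global 3.6.3     # make it the default "python"

    # from here on, plain venv and pip
    python -m venv ~/envs/myproject       # create a virtual environment
    source ~/envs/myproject/bin/activate  # activate it
    pip install numpy                     # install packages from PyPI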

I find this process much simpler and easier to manage, because it relies on small, orthogonal tools (pyenv, venv, pip) instead of one integrated conda that kind of does everything. I also like using the official tools and packages instead of conda's parallel universe of mostly-open, mostly-free, mostly-standard replacements.

Mind you, conda solved real problems back in the day (binary package distributions, Python version management, and environment management), and arguably still does (MKL et al., paid packages). But now that wheels have become ubiquitous and painless, virtualenv has been integrated into Python, and PyEnv has matured, these problems have better solutions, and conda is no longer needed for my applications.

Footnotes:

1

The downside of compilation: no Windows support.

19 Nov 2017

Installing Emacs on Windows

The official website states that you need to download Emacs from a nearby GNU mirror. However, this does not install GnuTLS, which is required for installing packages from MELPA or Marmalade. The documentation says that it can be obtained from ezwinports.

However, I have found that this no longer works: as of Emacs 25, Emacs is available in 64-bit, but ezwinports only supplies 32-bit binaries. I had to search a bit, but the (in retrospect, obvious) solution is to download the required binaries directly from GnuTLS's website. Then unpack all *.dll files from the bin directory into your Emacs bin directory, and you are good to go.
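In other words, something like this (a sketch from a Unix-style shell such as Git Bash; the archive name and Emacs location are examples, adjust them to your download and installation):

    # copy the GnuTLS libraries into the Emacs installation
    cp gnutls-3.x-w64/bin/*.dll "/c/Program Files/emacs/bin/"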

This situation is really a bit sad. Installing Emacs should not be this hard. But there is talk about providing a Windows installer for Emacs in one of the next versions, which will hopefully fix these issues.

24 Oct 2017

The Long Game

Here's a thing I do: I write open source libraries that solve my problems. Bad libraries. And then, over the course of a year or two, I slowly improve them, until they are not so bad any more.

I have gone through this pattern a few times now. SoundFile started out buggy and slow. Transplant crashed and froze very often. Org-Journal deleted journal entries every now and then. But then users found errors, and errors were fixed. And slowly, over time, these libraries transformed from buggy prototypes into dependable tools. After a year or so, I saw GitHub issues gradually drying up, and feedback changing from "X is broken" to "How can I do X?". It is an oddly gratifying process.

Interestingly, though, I rarely hear people talk about this: building a thing is only the beginning, and the meat of open source development is sticking with it for a long time and gradually, slowly, weeding out the bugs. All the great open source libraries I know evolved this way, yet it is rarely written about.

Playing the long game means anticipating and managing the amount of maintenance I am going to do on a project. When I release a new library, the first few months and the first few users will shake out bugs and create work for the maintainer. This can be a dangerous source of stress and anxiety. I've had my brush with burnout in the past, and my only defense against this is to be honest about my availability. Often, this means stating outright that I probably won't have the time to work on a thing.

However, I have also learned that it is OK to defer issues to the future. There have been many problems that I didn't have the time to address when they first came up, but did address a few months later, probably in response to a separate bug report. Important problems have a way of getting fixed eventually. Other times, a solution just takes a long time to mature. There have been multiple issues that lay dormant for months until I stumbled upon a good solution.

And good solutions are important for open source software: if a bug fix adds too much complexity, it will inevitably lead to more bugs. This can easily snowball into a situation where each hour I invest in a project creates more hours of future maintenance burden. A situation like this feels hopeless and terrible. It feels like work. It is therefore important for my sanity to control the complexity of my projects, even if that means not adding features or not addressing problems. Open source development is not beholden to the dollar sign, and gives you the freedom to do it right.

This is not universally understood by programmers or users. And it doesn't just involve technical questions: some users have expected me to provide commercial-grade support, and got surprisingly unpleasant when I didn't. But it is crucial to realize that publishing open source software does not imply an obligation to fix every issue right now. Quite the contrary: many open source developers necessarily work on their own schedule, and part of the enjoyment of open source development is derived from this exact freedom!

At this point, my GitHub profile lists a bunch of open source projects. Some of them have gotten attention, which led to bug reports and, over time, let them grow to maturity. Some turned out to be useful to me, but not to others. Some are useful to others, but not to me. Some turned out to be not useful at all, and are now largely abandoned. And some of them, like RunForrest and SoundCard, are still new and raw and terrible, but will mature over time.

That's a thing I do: I write open source libraries. Good libraries, but as a wise man once said: "When you feel how depressingly / slowly you climb, / it's well to remember that / Things Take Time". Working on open source grants you this freedom, and is joyful for it.

19 Sep 2017

Multi-Font Themes in Emacs

Traditionally, text editor themes are all about colors, right? In programming, we use color to tell variables from type declarations, comments, or strings. However, any other text document uses typography instead of color to distinguish between headlines, list items, and keywords. I think that our current approach to highlighting code is misguided.

I think that color themes are an accident of history. Terminal text editors cannot switch fonts; all they can do is switch colors, so colors are all we use. But in today's graphical world, we are no longer bound by the shackles of terminals, and Emacs can switch fonts just as easily as colors1.

It all started about a year ago, when I fell in love with a font called PragmataPro. Among the coolest features of Pragmata are its native bold and italic letters and its wide support for Unicode symbols. I had to find a use for these features! And so, down the rabbit hole I went. Keywords should be bold! # Comments should be italic! And while we're at it, why not add some underlines? And so on.

The logical next step was to get rid of colors altogether. At first as an experiment: do I really need colors in code? The very pretty eink theme seemed to claim otherwise. After a few months of this lunacy, I realized that while I didn't strictly need colors, the stylistic variations of just one font aren't quite sufficient for source code. In particular, it wasn't always easy to distinguish between italic and roman type in PragmataPro, which led to some confusion.

But then, inspiration hit me: who says I can only use one single font? No one!

[Screenshot: source code rendered in a multi-font theme]

The tricky bit is finding fonts that work well together. In this example, I'm using PragmataPro for all regular code, Iosevka Slab2 for strings, and oblique3 Iosevka for comments. Ubuntu Mono and InputCompressed work well, too. You can find my current theme on GitHub. The only downside is that while these fonts share the same character width, their heights differ slightly, which sometimes leads to uneven line heights.
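At its core, such a theme boils down to a few set-face-attribute calls. Here is a minimal sketch using the fonts named above (the full theme on GitHub covers many more faces):

    ;; use different font families for different syntactic elements
    (set-face-attribute 'default nil
                        :family "PragmataPro")
    (set-face-attribute 'font-lock-string-face nil
                        :family "Iosevka Slab")
    (set-face-attribute 'font-lock-comment-face nil
                        :family "Iosevka" :slant 'oblique)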

Still, I love the look of this! (Of course it won't work in a terminal, or most other text editors.)

Footnotes:

1

Can other text editors do this? I honestly have no idea.

2

I just love the look of the slab-serif characters in Iosevka! Look at the beautiful "}" in the screenshot!

3

Not to be confused with italic, which changes glyphs in addition to slanting.

01 Sep 2017

EuroScipy 2017

The second conference I attended this year was EuroScipy 2017 in Erlangen. I gave a talk about audio in Python, and a lightning talk about my Python/Matlab bridge.

My most striking impression of EuroScipy is that every person I talked to was working on something interesting, and could talk about their topic clearly and with enthusiasm. This mirrors my experience at last year's Chaos Communication Congress, where the short scientific section stood out for its clarity and passion. I also enjoyed the fact that the attendees were international and diverse, and exuded a heart-warming sense of community.

Even though each scientific discipline has its own data sets, features, and models, everyone seemed to use a common set of methods (statistics, signal processing, machine learning) for working with that data. And absolutely everybody used Jupyter Notebooks for tutorials and teaching, and almost all data analyses were done in Pandas. This is particularly heartening, since these technologies are geared towards reproducible research and open data.

The hot topic of the conference clearly was machine learning and neural networks. However, the current confusion of competing frameworks and network architectures does not seem to be a good long-term solution. I hope that this ecosystem will eventually reach its NumPy moment, and collapse into a single, unified package. Then, neural networks might find their place as just another machine learning method with a few reusable parametrized prototype implementations in scikit-learn. Tools like Keras look like good steps towards this goal.

Finally, there was a lot of talk about “the reproducibility crisis” in science, and about possible steps to improve the scientific process. In particular, I learned that it is absolutely necessary to not look at your test data before the final evaluation, to avoid overfitting your brain. You need to split your data into a development set for training, a validation set for parameter tuning, and a totally separate evaluation set for the final evaluation. In a similar vein, it is important to state your hypotheses in writing before you test them, to avoid “HARKing” (Hypothesizing After the Results are Known; also known as “Noise Mining”, “P-Hacking”, or “Procedural Overfitting”). I dearly hope that Registered Reports will catch on, and absolve us of these all too human biases.
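As a sketch of such a three-way split, using scikit-learn's train_test_split (the data and proportions here are made-up examples):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.randn(1000, 10), np.random.randint(0, 2, 1000)

    # set aside a final evaluation set, untouched until the very end:
    X_dev, X_eval, y_dev, y_eval = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # split the rest into a training set and a validation set
    # for parameter tuning:
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.25, random_state=0)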

In conclusion, EuroScipy 2017 was a ton of fun, and educational in many ways that I did not expect. If you are a scientific programmer, or if you maintain a scientific Python module, or if you are plain interested in scientific Python, I can highly recommend going to EuroScipy next year.
