01 Sep 2017

EuroScipy 2017

The second conference I attended this year was the EuroScipy 2017 in Erlangen. I gave a Talk about Audio in Python and a Lightning Talk about my Python/Matlab bridge.

My most striking impression of EuroScipy is that every person I talked to was working on something interesting, and could talk about his/her topic clearly and with enthusiasm. This mirrors my feelings from last year's Chaos Communication Congress, where the short scientific section stood out for its clarity and passion. I also enjoyed the fact that attendees were international and diverse, and exuded a heart-warming sense of community.

Even though each scientific discipline has their own data sets, features, and models, everyone seemed to use a common set of methods (statistics, signal processing, machine learning) for working with that data. And, absolutely everybody used Jupyter Notebooks for tutorials and teaching, and almost all data analyses were done in Pandas. This is particularly heartening since these technologies are geared towards reproducible research and open data.

The hot topic of the conference clearly was machine learning and neural networks. However, the current confusion of competing frameworks and network architectures does not seem to be a good long-term solution. I hope that this ecosystem will eventually reach its NumPy moment, and collapse into a single, unified package. Then, neural networks might find their place as just another machine learning method with a few reusable parametrized prototype implementations in scikit-learn. Tools like Keras look like good steps towards this goal.

Finally, there was a lot of talk about “the reproducibility crisis” in science, and possible steps to improve the scientific process. In particular, I learned that it is absolutely necessary to not look at your test data before the final evaluation, to avoid overfitting your brain. You need to split your data into a development set for training, a validation set for parameter tuning, and a totally separate evaluation set for the final evaluation. In a similar way, it is important to state your hypotheses in writing before you test them, to avoid “HARKing” (Hypothesis After the Results are Known; aka “Noise Mining”, “P-Hacking”, or “Procedural Overfitting”). I dearly hope that Registered Reports will catch on, and absolve us from these all too human biases.

In conclusion, EuroScipy 2017 was a ton of fun, and educational in many ways that I did not expect. If you are a scientific programmer, or if you maintain a scientific Python module, or if you are plain interested in scientific Python, I can highly recommend going to EuroScipy next year.