Posts tagged "workflow":
Assembling scientific documents is a complex task. My documents are a combination of graphs, data, and text, written in LaTeX. This post is about combining these elements and keeping them up to date without losing your mind. My techniques work on any Unix system: Linux, macOS, or WSL.
For engineering or science work, my deliverables are PDFs, typically rendered from LaTeX. But LaTeX is not the most pleasant of writing environments. So I've tried my hand at org-mode and Markdown, compiled them to LaTeX, and then to PDF. In general, this worked well, but there always came a point where the abstraction broke and the LaTeX leaked up the stack into my document, at which point I was essentially writing LaTeX anyway, just with a different syntax. After a few years of this, I decided to cut out the middleman, bite the bullet, and just write LaTeX.
That said, modern LaTeX is not so bad any more: XeLaTeX supports normal OpenType fonts, mixed languages, and proper Unicode, and renders to PDF natively. It is also pretty fast. My entire dissertation renders in less than three seconds, which is plenty fast enough for me.
To render, I run make in an infinite loop with a simple makefile that recompiles my PDF whenever the TeX source changes, giving live feedback while writing:
diss.pdf: diss.tex makefile $(graph_pdfs)
	xelatex -interaction nonstopmode diss.tex
We'll get back to $(graph_pdfs) in a second.
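The post doesn't show the loop itself, so here is a minimal sketch of one way to do it in Python, assuming simple polling: it checks file modification times once per second and only invokes `make` when one of the watched files has changed, leaving it to make to decide what actually needs rebuilding. The function names `snapshot` and `watch` are hypothetical.

```python
import os
import subprocess
import time
from typing import Dict, Iterable, Optional, Sequence


def snapshot(paths: Iterable[str]) -> Dict[str, int]:
    """Map each watched file that exists to its modification time (ns)."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def watch(paths: Sequence[str], command: Sequence[str] = ("make",),
          interval: float = 1.0, max_checks: Optional[int] = None) -> None:
    """Poll `paths` and re-run `command` whenever a file appears, disappears or changes."""
    last = None  # no snapshot yet, so the first check always runs the command
    checks = 0
    while max_checks is None or checks < max_checks:
        current = snapshot(paths)
        if current != last:
            # make decides what is actually out of date; the exit code is ignored
            # on purpose so that a LaTeX error doesn't end the loop.
            subprocess.run(list(command))
            last = current
        checks += 1
        time.sleep(interval)
```

For example, `watch(["diss.tex", "makefile"] + glob.glob("graphs/*.py"))` would cover the document and the graph scripts (the file list is only read once, so newly added scripts need a restart). The `max_checks` parameter exists only so the loop can be stopped, e.g. in tests.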
A major challenge in writing a technical document is keeping all the source data in sync with the document. To make sure that all graphs are up to date, I plug them into the same makefile as above, but with a twist: all my graphs are created from Python scripts of the same name in the graphs directory.
But you don't want to simply execute all the scripts in graphs, as some of them might be shared dependencies that do not produce PDFs. So instead, I only execute scripts whose names start with a chapter number, which conveniently also sorts them by chapter in the file manager.
Thus all graphs render into the main PDF and update automatically, just like the main document:
graph_sources = $(shell find graphs -regex "graphs/[0-9]-.*\.py")
graph_pdfs = $(patsubst %.py,%.pdf,$(graph_sources))

graphs/%.pdf: graphs/%.py
	cd graphs; .venv/bin/python $(notdir $<)
The first two lines build a list of all graph scripts in the graphs directory and their matching PDFs. The last two lines are a pattern rule that compiles any graph script into a PDF, using the virtualenv in graphs/.venv/. Makefiles are elegant this way: the recipe is defined once, independently of any particular target.
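For illustration, the same selection rule expressed in Python, e.g. for a hypothetical helper script that lists which graphs make will build. It mirrors the `find -regex` pattern (which must match the whole path) and the `patsubst` substitution; the function name `graph_targets` and the file names are made up.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable

# Same pattern as the makefile's `find graphs -regex "graphs/[0-9]-.*\.py"`.
GRAPH_SCRIPT = re.compile(r"graphs/[0-9]-.*\.py")


def graph_targets(paths: Iterable[str]) -> Dict[str, str]:
    """Map each chapter-numbered graph script to the PDF make should build from it."""
    targets = {}
    for path in sorted(paths):
        # find's -regex must match the whole path, hence fullmatch rather than search.
        if GRAPH_SCRIPT.fullmatch(path):
            # Equivalent of $(patsubst %.py,%.pdf,...): swap the extension.
            targets[path] = str(PurePosixPath(path).with_suffix(".pdf"))
    return targets
```

Given `["graphs/1-overview.py", "graphs/common.py"]`, only `graphs/1-overview.py` is selected and mapped to `graphs/1-overview.pdf`; the shared helper `common.py` is skipped because it does not start with a chapter number.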
This system is surprisingly flexible, and absolutely trivial to debug. For example, I sometimes use those graph scripts as glorified shell scripts, for converting an SVG to PDF with Inkscape or similar tasks. Or I compile some intermediate data before actually building the graph, and cache it for later use. Just make sure the graph script exits with an appropriate exit code, to signal to make whether the graph was successfully created. An additional makefile target, graphs: $(graph_pdfs), also comes in handy if you want to ignore the TeX side of things for a bit.
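A minimal sketch of how a graph script could report success or failure to make, assuming the plotting code lives in a function that writes the PDF to a given path. `run_graph` is a hypothetical helper, not part of any library. Besides returning a non-zero exit code on failure, it deletes any half-written PDF; otherwise make would see a PDF newer than its script and consider the target up to date on the next run.

```python
import os
import sys
import traceback
from typing import Callable


def run_graph(build: Callable[[str], None], output: str) -> int:
    """Run one graph build and translate the outcome into a process exit code.

    Returns 0 only if `build` finished and left a non-empty `output` file.
    """
    try:
        build(output)
    except Exception:
        traceback.print_exc()  # show the real error in make's output
    else:
        if os.path.isfile(output) and os.path.getsize(output) > 0:
            return 0
        print(f"error: {output} was not created", file=sys.stderr)
    # Remove partial output so make does not mistake it for an up-to-date target.
    if os.path.isfile(output):
        os.remove(output)
    return 1
```

A graph script such as the hypothetical `graphs/3-spectra.py` would then end with `sys.exit(run_graph(plot_spectra, "3-spectra.pdf"))`, where `plot_spectra` draws the figure; the bare file name works because make runs the script from inside graphs/. Alternatively, GNU make's special target `.DELETE_ON_ERROR:` makes make itself delete a target whose recipe failed.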
All of the graph scripts and TeX are of course backed by a Git repository. But my dissertation also contains a number of databases that are far too big for Git. Instead, I rely on git-annex to synchronize data across machines via a simple WebDAV host.
To set up a new writing environment from scratch, all I need is the following series of commands:
git clone git://mygitserver/dissertation.git dissertation
cd dissertation
git annex init
env WEBDAV_USERNAME=xxx WEBDAV_PASSWORD=yyy git annex enableremote mywebdavserver
git annex copy --from mywebdavserver
(cd graphs; pipenv install)
make all
This will download my graphs and text from mygitserver, download my databases from mywebdavserver, build my Python environment with pipenv, recreate all the graph PDFs, and compile the TeX. The whole process can take a few hours, but it is completely automated and reliable.
And that is truly the key part: the last thing you want while writing is to be distracted by technical issues such as "where did I put that database again?", "didn't that graph show something different the other day?", or "I forgot my database file at work and now I'm stuck at home during the pandemic and can't make progress". Not that any of those would ever have happened to me, of course.
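As a safeguard against the "where did I put that database again?" problem, here is a small sketch that lists annexed files whose content is missing on the current machine. It assumes locked annexed files, which git-annex stores as symlinks into .git/annex/objects; such a symlink dangles until the content has been fetched. git-annex can also report this itself (`git annex find --not --in=here`); `missing_annexed_files` is a hypothetical helper that only illustrates the underlying idea.

```python
import os
from typing import List

ANNEX_OBJECTS = ".git/annex/objects"  # where locked annexed files point (POSIX paths assumed)


def missing_annexed_files(root: str) -> List[str]:
    """List annexed files under `root` whose content is not present locally."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip the object store itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            if (os.path.islink(path)
                    and ANNEX_OBJECTS in os.readlink(path)
                    and not os.path.exists(path)):  # exists() follows the link
                missing.append(os.path.relpath(path, root))
    return sorted(missing)
```

Running `missing_annexed_files(".")` in the repository root after `git annex copy --from mywebdavserver` should return an empty list; any path it does return still needs to be fetched.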
I want to consume the news, both because it is genuinely relevant for my work, and because conversations about news are part of my social life. But I do not want to be consumed by news, and end up scanning news websites over and over for new content, even though I know that the likelihood of finding anything interesting is small.
Over the last few months, I have tried hard to find all instances of this repeated-scanning behavior, and eliminate it. The key is to automate the scanning so that I am only ever presented with new content, without getting hooked on the addictive variable-reward cycle of checking websites for changes over and over again.
And it all works thanks to the magic of RSS:
- News Sources: I read several blogs, newspapers, and webcomics. All of them have RSS feeds. Easy.
- Hacker News: The brilliant service hnrss.org provides RSS feeds for Hacker News, and can filter them, for example to include only posts that made it to the front page and have accumulated at least 100 points.
- Reddit: Every subreddit has its own feed, at reddit.com/r/subreddit.rss. Sadly there is no way to filter for a minimum number of upvotes.
- YouTube: Again, every YouTube channel has its own RSS feed, but Google is trying very hard to make it as cumbersome as possible to get at those feeds. You need to go to your Subscription Manager, then scroll all the way down, and "Export Subscriptions". The resulting file helpfully does not have a file extension, which you will have to add before you can import it into your RSS reader. I honestly can't reconstruct how I found that subscription manager, either, but presumably there is some series of clicks that would take you there.
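Since all of these sources boil down to feed URLs, the subscription list can also be generated. A sketch, assuming your reader imports OPML subscription lists (Feedbin does): the Reddit and YouTube channel feed URL patterns are the public ones, while the `points` query parameter for hnrss.org is my assumption about its filter interface, so check hnrss.org before relying on it. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def reddit_feed(subreddit: str) -> str:
    """Per-subreddit feed, following the reddit.com/r/<name>.rss pattern."""
    return f"https://www.reddit.com/r/{subreddit}.rss"


def hn_frontpage_feed(min_points: int) -> str:
    """hnrss.org front-page feed; the `points` filter parameter is an assumption."""
    return f"https://hnrss.org/frontpage?points={min_points}"


def youtube_feed(channel_id: str) -> str:
    """Feed of a single YouTube channel, addressed by its channel ID."""
    return f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def to_opml(feeds: Iterable[Tuple[str, str]]) -> str:
    """Build a minimal OPML 2.0 subscription list from (title, feed URL) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Subscriptions"
    body = ET.SubElement(opml, "body")
    for title, url in feeds:
        # Keyword arguments become XML attributes; ElementTree escapes them.
        ET.SubElement(body, "outline", type="rss", text=title, title=title, xmlUrl=url)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")
```

Writing the result of `to_opml([("Hacker News", hn_frontpage_feed(100)), ("r/LaTeX", reddit_feed("LaTeX"))])` to a file with an `.opml` extension should give a list that can be imported in one go.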
With all this settled, I have a veritable firehose of news every day. I estimate that only 1 % of this is actually worth reading. So in the next step, I filter this list for spam. For this, I use Feedbin, which aggregates all these feeds and remembers whether I have read an article. The remaining ham I either read immediately or forward to Pinboard for later consumption.
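Feedbin does the bookkeeping of what I have already read. The underlying idea, only ever presenting entries that have not been seen before, fits in a few lines. A sketch, assuming plain RSS 2.0 feeds (Atom feeds use namespaced `<entry>` elements and would need separate handling) and an in-memory set of seen IDs that a real reader would persist; `new_items` is a hypothetical name, and this is not a description of how Feedbin works internally.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def new_items(rss_xml: str, seen: Set[str]) -> List[Tuple[str, str]]:
    """Return (id, title) for RSS 2.0 items not in `seen`, and mark them as seen.

    An item's id is its <guid>, falling back to its <link>.
    """
    fresh = []
    for item in ET.fromstring(rss_xml).iter("item"):
        item_id = (item.findtext("guid") or item.findtext("link") or "").strip()
        if not item_id or item_id in seen:
            continue  # already read, or no usable identifier
        seen.add(item_id)
        fresh.append((item_id, item.findtext("title", default="")))
    return fresh
```

Calling `new_items` twice on the same feed returns its entries the first time and an empty list the second time, which is the "done for today" state described below.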
With this system, I never miss anything, but once I have consumed all the news in my feed reader, I know I am done, and there is no point in checking and re-checking various websites over and over again.