October 30, 2025
Most important: Reminder that publications aren’t your only or biggest metric of success
Share a bit of my own career path
Highlight gaps that you can specialise in for your own career
In addition to my research…
PROMISE data: Organized data into an R package structure.
UofTCoders: Graduate student-led group to teach R and Python to other peers.
We are woefully behind on basic data engineering, reproducibility, and programming practices.
ukbAid: R package and website for UK Biobank data management and analysis.
DARTER Project: Website with the application to, and documentation on, a Statistics Denmark project.
registers2parquet: Package to convert Danish health registries into Parquet format for easier use with big data tools.
Introductory course: r-cubed-intro.rostools.org
Intermediate course: r-cubed-intermediate.rostools.org
Advanced course: r-cubed-advanced.rostools.org
After my PhD, no first-author publications.
Too many deep problems that needed fixing.
Focused on building software, tools, and teaching.
It wasn’t a hard choice: I couldn’t ignore the problems.
For the Data Infrastructure grant from the Novo Nordisk Foundation
Seedcase Project, a framework for building infrastructure for research data.
Why? We have big problems that need fixing!


Some trust is needed, but we shouldn’t depend on it.
As in, their main work is programming, not publishing papers.
“But the code runs!”
From this academic presentation.
With massive data, you really need to know how to code and program.
But researchers are not trained in these skills.
So many research-specific problems that could be fixed with software.
Data is hard to build, manage, and maintain. It takes time and effort.
And they care even less about publications.
Even if that means fewer publications.
Licensed under CC-BY 4.0.
Slides at slides.lwjohnst.com