We'll go through this outline in order, though we don't expect to cover everything.
Slides: slides.lwjohnst.com/misc/2022-08-17 Licensed under CC-BY
NNF-funded Data Infrastructure Framework (DIF) Project
Reproducible Research in R hands-on courses with Danish Diabetes Academy
Setting the stage
Imagine that you are a new professor, just starting to get a group and research programme going. Or you are a solo researcher or a small research group starting a study to collect data for your PhD students, but you have limited funds and technical expertise.
Or, you are a small startup company trying to get investment and build income quickly. You work in the research realm, so you need to follow best practices and requirements for data management, and your business relies on data collection. You need to get operational quickly but don't yet have the funds to hire technical personnel.
Or, you are a large, multi-national, multi-center consortium that wants to keep better track of who's working on what and to make it easier to discover and share data added to the project, or that aims to disseminate its data widely for maximal, cost-effective use by collaborators and others.
All of these could use the framework to abide by the best practices in FAIR data management.
We're still working out a better name, but for now we're calling it DIF
These aims may seem vague, but bear with me.
Check out the DIF Project website for more details.
Just for some clarification, "infrastructure" here means the computational structure of the data and all its support structures: for instance, how the files and folders are structured, where the data files are saved and in what file format, and how to connect to the data. In many ways it is like the roads and buildings of a city, where data is the people moving about.
"Framework", on the other hand, is the bundle or package containing the instructions to create an infrastructure, which someone can take and use to build that infrastructure somewhere else. You can think of this as the blueprint for building a city.
In short: build a software product that makes it easier to find, store, and use data in research projects while abiding by best practices, and make it easy and free for others to use.
Again, it might be hard to grasp what this actually means in tangible terms.
Large trends across science in computing, data quantity, accountability, transparency
Increasing need in science for...
Questions like:
Which led to this DIF Project and getting the funding for it 🤩
Development of new ... methods and technologies within data science, ..., data engineering, ...
Follow and enable FAIR principles
Openly licensed and re-usable (e.g. CC-BY, MIT)
State-of-the-art principles and tools in software and UI design
Built from software that may be more familiar to researchers/academia
Friendly to beginner and non-technical users
FAIR = Findable Accessible Interoperable Reusable
Check out the DIF Project website.
There are few studies on the extent of code and data availability and on whether study results can be reproduced. The figure shows results from some of them: 1) 10.1177/2515245920918872, 2) 10.1007/s11306-017-1299-3, 3) 10.1371/journal.pone.0251194.
While I've been teaching these general topics since my Masters, I started this specific course during my postdoc for two reasons: first, there was a need for more computational skills in my field, and second, awareness of reproducibility and open science was very lacking.
The course teaches reproducible research in R to PhD students and postdocs who do biomedical research, largely diabetes research. Participants are working, full-time researchers (including PhD students), so the setting is not necessarily an undergraduate one, and the focus is on learning data analysis and other practical skills.
This course runs for 3 full days and consists of 5 code-along sessions, where the instructor types and the learners follow along, a few lectures, and a final group project. For more info on the course, check out the links below.
Multiple learning activities in class (reading, doing, listening, discussing, teaching, group, and solo)
Openly licensed and easily accessible online
Written not just for participants but also (future) instructors
Largely hands-on (code-along), limit lectures and slides
Briefly discuss before showing website.
And to end, please, if you try out the material, let us know! We'd love more feedback on it! Thanks for listening!