class: center, middle, inverse, title-slide

.title[
# Implementing an open and scalable infrastructure for the DD2 data
]

---
layout: true

<style type="text/css">
.footer-right {
  background-color: #FFFFFF;
  position: absolute;
  bottom: 10px;
  right: 8px;
  height: 60px;
  width: 30%;
  font-size: 11pt;
}
</style>

.footer-right[
Website: [steno-aarhus.github.io/dif-project](https://steno-aarhus.github.io/dif-project)

Slides: [slides.lwjohnst.com/misc/2022-08-15](https://slides.lwjohnst.com/misc/2022-08-15/)
]

---
???

<!--
Details:

- ~20 min presentation
- Specific to DD2
- Informal, ask questions
- More discussion based

Outline:

- Aims
- General description of DIF (image)
- General timeline
- Short-term plan
- Questions
    - In discussion with company to help, Jens give details?
    - How can we fit in with those plans?
    - Timeline for hiring data manager?
        - Data manager to map all data, resources, and documentation
-->

I assume you are all familiar enough with the general purpose of this project, that is, to help maximize the utility and general usage of the DD2 resource for researchers and eventually for diabetes patients. So I'll get into the aims right away, briefly describe what this project looks like at a conceptual level, go briefly over the timeline, and cover where we are right now and the next immediate steps. Then I have a couple of questions that I'd like us to discuss a bit. We'll keep this really informal, so just jump in and ask questions whenever you want.

---

## Aims of the Data Infrastructure Framework (DIF) Project

???

We're still working out a better name, but for now we're calling it DIF.

These aims are for the full project itself, and may seem vague, but bear with me.

--

1. **Primary aim**: Create and implement an efficient, scalable, and open source data infrastructure framework that connects multiple stakeholders with the data, documentation, and findings

???

For clarification, "infrastructure" here means the computational structure of the data and all its support structures, for instance, how the files and folders are structured, where the data files are saved and in what file format, and how to connect to the data. In many ways it is like the roads and buildings of a city, where the data is the people moving about.

"Framework", on the other hand, is the bundle or package that contains the instructions to create an infrastructure, which someone can take and use to create the infrastructure somewhere else. You can think of this as the blueprint for building a city.

--

2. **Secondary aim**: Create this framework so that *other research groups and companies*, who are unable to build something similar, can relatively easily implement it and modify it as needed for their own purposes.

--

> In short: Make a software product that makes it easier to find, store, and use data for research projects that abide by best practices, and make it easy and free for others to use.

---

## Guiding principles

1. Follow and enable FAIR principles
2. Openly licensed and re-usable (e.g. CC-BY, MIT)
3. State-of-the-art principles and tools in software and UI design
4. Built from software that may be more familiar to researchers/academia
5. Friendly to beginner and non-technical users

???

FAIR = Findable, Accessible, Interoperable, Reusable

---

<img src="images/detailed-schematic.png" width="58%" style="display: block; margin: auto;" />

---

<img src="images/layers.png" width="58%" style="display: block; margin: auto;" />

---

<img src="images/user-1.png" width="58%" style="display: block; margin: auto;" />

---

<img src="images/user-2.png" width="58%" style="display: block; margin: auto;" />

---

## General timeline

> [Full 5-year timeline found on website.](https://steno-aarhus.github.io/dif-project/#fig-gantt-chart)

???

By User 1 I mean any process to get the data into the format needed for the backend. I'm aware there are already well-developed pipelines for getting DD2 data into a database format, right? So we'll need some way of connecting a pipeline to feed into the DIF's backend.

This timeline was stretched to account for various potential delays, so estimates could realistically be shortened by maybe 30-40%... but best to keep it conservative.

Once we've gotten to the User 1 MVP, we'd like to start testing it out on DD2, to find any potential issues and so on. So within the next year to year and a half, we could begin meaningfully contributing to the DD2 database.

---

## Next steps

- Already hired RSE and DBA starting Sept
- Onboard the team, have two-day welcome and brainstorming session in Sept
- Detail and agree on tasks for next several months
- Agree on longer-term plan
- Aim for "Minimum Viable Product" of first component within ~1 to 1.5 years

---

## General questions on next steps

???

Thinking of questions for next steps, some big, immediate ones that come to mind are:

--

- Company helping out with this?
- How can we fit in with those plans?
- Co-hiring data manager for DD2 and this project, any timelines or things to discuss/consider?

???

Data manager to map all data, resources, and documentation.