Implementing an open and scalable infrastructure for the DD2 data

class: center, middle, inverse, title-slide

.title[
# Implementing an open and scalable infrastructure for the DD2 data
]

---

layout: true

.footer-right[
Website: [steno-aarhus.github.io/dif-project](https://steno-aarhus.github.io/dif-project)
Slides: [slides.lwjohnst.com/misc/2022-08-15](https://slides.lwjohnst.com/misc/2022-08-15/)
]

---

<div>
<style type="text/css">.xaringan-extra-logo {
width: 60px;
height: 128px;
z-index: 0;
background-image: url(../../common/sdca-logo.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:1em;right:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('a')
          logo.classList = 'xaringan-extra-logo'
          logo.href = 'https://www.stenoaarhus.dk/'
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

???

<!--
Details:

- ~20 min presentation
- Specific to DD2
- Informal, ask questions
- More discussion based

Outline:

- Aims
- General description of DIF (image)
- General timeline
- Short-term plan
- Questions
    - In discussion with company to help, Jens give details?
        - How can we fit in with those plans?
    - Timeline for hiring data manager?
        - Data manager to map all data, resources, and documentation
-->

I assume you are all familiar enough with the general purpose of this project:
to help maximize the utility and general use of the DD2 resource for
researchers and, eventually, for people with diabetes. So I'll get straight into
the aims, briefly describe what this project looks like at a conceptual level,
go quickly over the timeline, where we are right now, and the next immediate steps.
Then I have a couple of questions that I'd like us to discuss.

We'll keep this really informal, so just jump in and ask questions whenever you want.

---

## Aims of the Data Infrastructure Framework (DIF) Project

???

We're still working out a better name, but for now we're calling it DIF.

These aims are for the full project itself and may seem vague, but bear with me.

--

1. **Primary aim**: Create and implement an efficient, scalable, and open source
data infrastructure framework that connects multiple stakeholders with the data,
documentation, and findings

???

To clarify: by "infrastructure" we mean the computational structure of the data
and everything that supports it, for instance how files and folders are
organized, where the data files are stored and in what file format, and how to
connect to the data. It is much like the roads and buildings of a city, with
the data as the people moving about.

A "framework", on the other hand, is the bundle or package of instructions for
creating an infrastructure, which someone can take and use to build that
infrastructure somewhere else. Think of it as the blueprint for building a city.

--

2. **Secondary aim**: Create this framework so that *other research groups and
companies* that cannot build something similar themselves can relatively
easily implement it and modify it as needed for their own purposes.

> In short: Build software that makes it easier to find, store, and use data
for research projects in line with best practices, and that is easy and free
for others to use.

---

## <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#da9100;overflow:visible;position:relative;"><path d="M288 256C288 273.7 273.7 288 256 288C238.3 288 224 273.7 224 256C224 238.3 238.3 224 256 224C273.7 224 288 238.3 288 256zM0 256C0 114.6 114.6 0 256 0C397.4 0 512 114.6 512 256C512 397.4 397.4 512 256 512C114.6 512 0 397.4 0 256zM325.1 306.7L380.6 162.4C388.1 142.1 369 123.9 349.6 131.4L205.3 186.9C196.8 190.1 190.1 196.8 186.9 205.3L131.4 349.6C123.9 369 142.1 388.1 162.4 380.6L306.7 325.1C315.2 321.9 321.9 315.2 325.1 306.7V306.7z"/></svg> Guiding principles

1. Follow and enable FAIR principles

2. Openly licensed and re-usable (e.g. CC-BY, MIT)

3. State-of-the-art principles and tools in software and UI design

4. Built with software that is likely to be familiar to researchers and academics

5. Friendly to beginners and non-technical users

???

FAIR = Findable, Accessible, Interoperable, Reusable

---

## General timeline

> [Full 5 year timeline found on website.](https://steno-aarhus.github.io/dif-project/#fig-gantt-chart)

???

By "User 1" I mean any process that gets the data into the format the backend needs.
As I understand it, there are already well-developed pipelines for getting DD2 data
into a database format, right? So we'll need some way of connecting such a pipeline
so that it feeds into the DIF backend.

We stretched this timeline to account for potential delays, so the estimates
could realistically be shortened by maybe 30-40%, but it's best to stay conservative.

Once we reach the User 1 MVP (minimum viable product), we'd like to start testing
it on DD2 to find any potential issues. So within the next year to year and a half,
we could begin contributing meaningfully to the DD2 database.

---

## Next steps

- Already hired an RSE and a DBA, both starting in Sept

- Onboard the team with a two-day welcome and brainstorming session in Sept
    - Detail and agree on tasks for the next several months
    - Agree on a longer-term plan

- Aim for a "Minimum Viable Product" (MVP) of the first component within ~1 to 1.5 years

---

## General questions on next steps

???

Thinking about next steps, a few big, immediate questions come to mind:

--

- Company helping out with this?
    - How can we fit in with those plans?

- Co-hiring a data manager for DD2 and this project: any timelines or things to
discuss or consider?

???

The data manager would map all data, resources, and documentation.