A framework for an open and scalable infrastructure for health data exemplified by the DD2 initiative

Setting the stage

Imagine that you are a new professor just starting to build a group and research programme... or a solo researcher or small research group starting a study to generate data for your PhD students, but with limited funds and technical expertise.

  • Or, you are a small startup company trying to get investment and build income quickly... Working in the research realm, you need to follow best practices and requirements for data management, and your business relies on data collection. You need to get operational quickly but don't yet have the funds to hire technical personnel.

  • Or, you are a large multi-national/multi-center consortium that wants to keep better track of who's working on what and how to discover and share data added to the project... or that aims to disseminate its data widely for maximal, cost-effective use by collaborators and others.

All of these could use the framework to follow best practices in FAIR data management.

Data Infrastructure Framework (DIF) Project

  1. Primary aim: Create and implement an efficient, scalable, and open source data infrastructure framework that connects data collectors, researchers, clinicians, and other stakeholders with the data, documentation, and findings (starting within the DD2 study)

  2. Secondary aim: Create this framework so that other research groups and companies that can't build something similar themselves can relatively easily implement it and modify it as needed for their own purposes.

In short: Make a software product that makes it easier to find, store, and use research data in line with best practices, and make it easy and free for others to use.

We're still working out a better name, but for now we're calling it DIF

These aims may seem vague, but bear with me.

Again, it might be hard to grasp what this actually means in practice.

Why is this important? 🤔

Large trends across science in computing, data quantity, accountability, and transparency

Increasing need in science for...

  • Computational tools and technologies
  • Secure and reliable IT infrastructure
  • Greater openness and transparency
  • More reproducibility of studies
  • Highly technical skills and knowledge ... especially in relation to data management.

Questions like:

  • How do you store your data? In what file format?
  • Where do you store your data and how do you name the files?
  • How do you keep track of changes to the data?
  • (For multi-center studies) Who has which datasets and how do you combine them?
  • How do you or your collaborators find out which variables are in the data and what they mean?
  • When there are errors or problems in your data and you've already analyzed or published with it, how can you easily determine which publications used the incorrect data, and how can you easily update them with the corrected data?
  • How can you easily share your data with colleagues or reviewers to check your findings?

Past and current barriers: lack of funding, awareness, understanding, skill, and knowledge

  • Funding agencies don't fully recognize these challenges, so they don't provide funding to address them
  • Researchers aren't aware of or don't understand the issues, or don't have the skills to tackle them
  • People with needed technical skills leave for industry

Recent new funding 💰: NNF Data Science Research Infrastructure 5-year grant

This led to the DIF Project and to getting funding for it 🤩

Development of new ... methods and technologies within data science, ..., data engineering, ...

Guiding principles

  1. Follow and enable FAIR principles

  2. Openly licensed and re-usable (e.g. CC-BY, MIT)

  3. State-of-the-art principles and tools in software and UI design

  4. Friendly to beginner and non-technical users

What similar infrastructures exist?

Found in most large companies and some research-based projects (e.g. UK Biobank)...

... but few have the product be the infrastructure itself

Show it off?

One plan is to search as thoroughly as possible for similar projects. Unlike scientific papers, software projects are not easy to find.

We know of two similar projects, one in Oslo related to a brain mapping project and another in the US called gen3 that's managed by the University of Chicago. Depending on how they fit our needs and aims, we might "fork" their projects and contribute back to them. (Explain "forking").

Short-term plan

The full 5-year timeline is on the website.

  • Hire software/data engineers and build team as soon as possible

  • Develop a "Minimum Viable Product" of the first component within ~2 years

  • Emphasize making training and documentation targeted to non-technical users throughout project

Interested in being involved or learning more? 🤓 Let us know! 🙋
