The forgotten child in research: Data engineering and infrastructure

Luke W. Johnston

September 13, 2023


Some questions before starting 🤔

🙋 How many of you have worked with or tried to discover data for projects?

🙋 How many of you wish you spent less time on menial/manual tasks related to data?

Who am I? 👋

Two main goals of this (informal) talk 🔈

Spreading awareness…

… on the vital importance of the foundation of our data-driven world

… that innovation and commercialization can come from anywhere

The stages and lifecycle of research are like a big family

Unmet and (often) unrecognized basic needs in health research

Personal past experience: Data management tasks are often given to untrained MSc/PhD students

In research organizations: Focus is on the beginning and end of the lifecycle (collecting data and publishing), not the middle

In small- to medium-sized companies: Lack the maturity and/or funds to have an internal data engineering team

NovoNordiskFonden: Increase the impact of funding by expanding the use of data from funded projects

Many substantial negative effects of this unmet need

Examples often boil down to wasted time and money

Limited options and solutions for data infrastructure within the research world

… they are often custom-built

… they are often designed for industry, expensive, or “over-engineered”

… they are often heavy on the tech jargon

Our solution: A framework for building a modern data infrastructure

Seedcase: Improving discoverability, structure, and management of research data

Designed for typical research use cases

Central philosophies and values

Who we are: The team

Future steps: Ensuring financial sustainability

Creating a company around Seedcase