Workshop on reproducibility and open science

Luke W. Johnston

Outline

  • Learning objectives

  • Main take-home message

  • Introduce and describe reproducibility and open science

  • Activities to discuss and brainstorm (~40 min)

  • Resources

Learning objectives of this workshop:

  1. Become aware of what reproducible and open practices are

  2. Hear and learn about what the experiences and thoughts of your colleagues are

  3. Learn about some simple ways to become more open and reproducible

  4. Identify ways to adopt some of these practices at NCRR (1)

(1): NCRR and any work on DST have unique challenges.

Take-home message:

Publicly share your code from your research

And I will add: share your code at any stage of your research, whether at the beginning, the middle, or the end.

Why are reproducible and open practices important right now? 🤔

They are part of multiple large trends like team science, computing, meta-research, higher quality/rigor

(Show of hands) How many of you could describe the difference between reproducibility and replicability?

Reproducibility: Same data + analysis = same result?

  • Also called analytic/computational reproducibility or reproducible data analysis.
  • Independently reproduce results in paper based on same analysis and data.

Reproducibility is a spectrum

Just as much about inspectability as it is about actual reproducibility.

Part of this workshop, and a lot of my teaching and work, is about getting us as a community to move closer to this end.

Reproducibility is about HOW EXACTLY a finding was found in a study.

6 min activity: Think 💭 about your research workflow, then share

  • For 4 min, to yourself, think about exactly what your workflow for doing research looks like. As you think of things you do, add them to the Mentimeter until the time runs out.

    • Where do you save your files? How do you name your files?
    • What apps do you use?
    • How do you know which files to work on and where you left off?
    • How do you keep track of tasks to do?
    • How do you collaborate and coordinate with others? Email? Shared folders?
  • For 2 min, as the whole group we'll briefly go over some of the workflow items.

This helps to get us thinking and mentally primed for later activities.

Reproducibility as a spectrum

+ One folder per manuscript (with associated files)
+ Relative file paths used (data/project-data.csv vs C:/User1/some-data-file.csv)
+ Version controlled (like with Git)
+ Automated and explicit pipeline management (re-generate results with single command)
+ Reproducible document system
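
To make the relative-path item concrete, here is a minimal Python sketch. The helper `project_path` and the folder name `my-manuscript` are hypothetical and only for illustration; the idea is that every file is addressed relative to the project folder, so the project still works when it is moved to another computer.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def project_path(project_root: Path, relative: str) -> Path:
    """Build a file path from the project folder and a relative path.

    Absolute paths (Windows or POSIX style) are rejected, because they
    only work on the computer where they were written.
    """
    if PureWindowsPath(relative).is_absolute() or PurePosixPath(relative).is_absolute():
        raise ValueError(
            f"Use a path relative to the project folder, not {relative!r}"
        )
    return project_root / relative


# Hypothetical project folder; the data path is the one from the slide.
root = Path("my-manuscript")
data_file = project_path(root, "data/project-data.csv")
print(data_file.as_posix())  # my-manuscript/data/project-data.csv
```

Calling `project_path(root, "C:/User1/some-data-file.csv")` raises a `ValueError`, which is one simple way to catch machine-specific paths early.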

  • I wanted you to think about your workflow so you can start appreciating how even small things can be changed to improve reproducibility.

  • And the more reproducible your workflow is, the easier it is to update later work when things change, literally with the push of a button (or a single command)

  • Some things are easier to do than others, and some require more technical knowledge and skill than most researchers have the time or motivation to acquire. (More about encouraging these skills).

  • But we're missing a key component here...

Key component: Using the same data and analysis to independently get the same result

If you were to give me your code and I had access to your data, would I know how you got any given result presented in your paper?
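
As a rough illustration of "same data + same analysis = same result", the Python sketch below runs a toy analysis twice and compares a fingerprint of the results. Everything here is an assumption made for the example: the data are made up, and `run_analysis` and `fingerprint` are hypothetical helpers, not part of any particular workflow tool.

```python
import hashlib
import json
import random
import statistics


def run_analysis(values, seed=2024):
    """Toy analysis: the mean plus a bootstrap standard error.

    The fixed seed makes the random resampling repeatable, so the same
    data and the same code always give the same numbers.
    """
    rng = random.Random(seed)
    boot_means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(200)
    ]
    return {
        "mean": round(statistics.fmean(values), 6),
        "bootstrap_se": round(statistics.stdev(boot_means), 6),
    }


def fingerprint(result):
    """Short, comparable summary (SHA-256 hash) of a result."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [4.1, 5.3, 3.8, 6.0, 5.5]  # made-up values for illustration
original = run_analysis(data)
rerun = run_analysis(list(data))  # an independent re-run on the same data

# Same data + same analysis should give an identical fingerprint.
assert fingerprint(original) == fingerprint(rerun)
```

If the seed were left out, the bootstrap step would generally give slightly different numbers on every run, and a reader with the same data and code could not confirm the reported result exactly.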

Open science occurs at all stages of research

Components of open science are in all stages of research.

Open science is also a spectrum

+ Open access (like preprints)
+ Open protocol
+ Open data/data format
+ Open analysis plan/code
+ Open source (like software used) (1)

(1): Example, AU institutionally approves and supports a closed source software (Stata), reducing reproducibility.

Focus down to reproducibility side (data, code).

E.g. AU has institutionally approved a closed source statistical software (Stata), which, by definition, makes work less open and reproducible (someone else needs a Stata license to run your code), unlike open source software like R, which anyone can install and run.

5 min activity: Think 💭 and share about potential benefits

What might be some benefits to being more reproducible and open at NCRR at the organizational, group, and/or individual level (from the trainee to those more established)?

  • For 1 min, to yourself, think about the question.

  • For 2 min, discuss with your neighbour what you've thought about.

  • For 2 min, we'll have a group-wide sharing of some of the thoughts.

5 min activity: Think 💭 and share about potential barriers

What might be some barriers to being more reproducible and open at NCRR at the organizational, group, and/or individual level (from knowledge and technical capacity, to what gets supported and what doesn't)?

  • For 1 min, to yourself, think about the question.

  • For 2 min, discuss with your neighbour what you've thought about.

  • For 2 min, we'll have a group-wide sharing of some of the thoughts.

Possible social and technical actions to moving toward openness and reproducibility

  • Do code reviews, pair programming/analysis

  • Reviewing (pre-analysis)

    • Data cleaning plan
    • Data analysis plan
  • Discuss with DST to support reproducible/open systems (within law)

  • Publicly link your code with your research output using GitHub and Zenodo

  • Decide on standard folder and file structure for each project/manuscript
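
One way to make a standard folder structure stick is to create it with a small script rather than by hand. The Python sketch below is only an illustration: the folder names in `STANDARD_FOLDERS` and the helper `create_project` are hypothetical, and a group would agree on its own names.

```python
import tempfile
from pathlib import Path

# Hypothetical standard layout; a group would agree on its own names.
STANDARD_FOLDERS = ("data", "code", "doc", "output")


def create_project(parent: Path, name: str) -> Path:
    """Create a project folder with the agreed sub-folders and a README."""
    project = parent / name
    for folder in STANDARD_FOLDERS:
        (project / folder).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            f"# {name}\n\nDescribe the project here.\n", encoding="utf-8"
        )
    return project


# Usage: build the structure inside a temporary folder and list it.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp), "my-manuscript")
    created = sorted(p.name for p in project.iterdir())
    print(created)  # ['README.md', 'code', 'data', 'doc', 'output']
```

Because the function can be run again without overwriting anything, it can also be used to bring an older project in line with the agreed structure.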

Now that we're getting more into thinking about benefits and barriers, we can get into thinking about potential actions to take to be more reproducible and open. Before getting into brainstorming activities, I want to give you some ideas for actions to help prime your thinking.

  • Work with/pressure DST to make it easier to download code from server to eventually share (something we'll start doing more at SDCA)

10 min activity: Think 💭, brainstorm, then share short-term, easier options

What are some potential options that you, your group, and NCRR could implement relatively easily and within the short-term (<1 year) to be more open and reproducible?

  • For 2 min, think of ideas for this question (and write them down if you want to)

  • For 3 min, brainstorm with your neighbour(s) on what you've all thought about

  • For 2 min, decide on one person to write down some of the ideas in the Mentimeter

  • For 3 min, we'll briefly go over and discuss some ideas from the Mentimeter

10 min activity: Think 💭, brainstorm, then share long-term, more difficult options

What are some potential options that you, your group, and NCRR could implement that are more difficult and longer-term (>1 year) to be more open and reproducible?

  • For 2 min, think of ideas for this question (and write them down if you want to)

  • For 3 min, brainstorm with your neighbour(s) on what you've all thought about

  • For 2 min, decide on one person to write down some of the ideas in the Mentimeter

  • For 3 min, we'll briefly go over and discuss some ideas from the Mentimeter

Resources
