class: center, middle, inverse, title-slide

# Workshop on reproducibility and open science
### Luke W. Johnston

---
layout: true

<style type="text/css">
.footer-left {
    background-color: #FFFFFF;
    position: absolute;
    bottom: 8px;
    left: 40px;
    height: 60px;
    width: 30%;
    font-size: 12pt;
}
</style>

.footer-left[
Slides: [slides.lwjohnst.com/au/2022-04-25](https://slides.lwjohnst.com/au/2022-04-25/)

Licensed under CC-BY
]

---
<!--
70 min: 15 + 6 (activity) = 21 + 10 = 31 + 5*2 (activity) = 41 + 5 = 46 + 10*2 (activity) = 66

- 2 * 10 min activity
- 2 * Introduce myself and my work.
- My work and R courses?
- DDA
- I know that several people from NCRR have taken my R course
-->

## Outline

- Learning objectives
- Main take-home message
- Introduce and describe reproducibility and open science
- Activities to discuss and brainstorm (~40 min)
- Resources

---

## Learning objectives of this workshop:

1. Become aware of what reproducible and open practices are

--

2. Hear and learn about the experiences and thoughts of your colleagues

--

3. Learn about some simple ways to become more open and reproducible

--

4. Identify ways to adopt some of these practices at NCRR (1)

.footnote[(1): NCRR and any work on DST have unique challenges.]

---
class: middle

**Take-home message:**

## Publicly share your code from your research

???

And I will add, share your code at *any* stage of your research. From beginning, to end, to middle.

---
class: middle

## Why are reproducible and open practices important right now? 🤔

**They are part of multiple large trends like team science, computing, meta-research, higher quality/rigor**

???

(Show of hands) How many could describe the difference between reproducibility and replicability?

---

## Reproducibility: Same data + analysis = same result?

<img src="images/fig-reproducibility-1.png" width="85%" style="display: block; margin: auto;" />

- Also called analytic/computational reproducibility or reproducible data analysis.
- *Independently* reproduce the results in a paper based on the same analysis and data.

---

## Reproducibility is a spectrum

<img src="images/spectrum-reproducibility.png" width="80%" style="display: block; margin: auto;" />

.footnote[
Just as much about *inspectability* as it is about *actual* reproducibility.
]

???

Part of this workshop and a lot of my teaching and work is to get us as a community to move closer to this end.

Reproducibility is about HOW EXACTLY a finding was found in a study.

---

## 6 min activity: Think 💭 about your research workflow, then share
- For 4 min, *to yourself*, think about what your workflow is *exactly* like for doing research. As you think of the things you do, add them to the Mentimeter until the time runs out.
    - Where do you save your files? How do you name your files?
    - What apps do you use?
    - How do you know which files to work on and where you left off?
    - How do you keep track of tasks to do?
    - How do you collaborate and coordinate with others? Email? Shared folders?
- For 2 min, as the whole group, we'll briefly go over some of the workflow items.

???

This helps to get us thinking and mentally primed for later activities.

---

## Reproducibility as a spectrum

<img src="images/spectrum-reproducibility.png" width="80%" style="display: block; margin: auto;" />

--

\+ One folder per manuscript (with associated files)

\+ Relative file paths used (`data/project-data.csv` vs `C:/User1/some-data-file.csv`)

\+ Version controlled (like with Git)

\+ Automated and explicit pipeline management (re-generate results with a single command)

\+ Reproducible document system

???

- I wanted you to think about your workflow so you can start appreciating how even small things can be changed to improve reproducibility.
- And if your work is more reproducible, then when things change, it's easier to update later work with literally a push of a button (or a single command).
- Some things are easier to do than others, and some require more technical knowledge and skill than most researchers have the time or motivation to acquire. (More about encouraging these skills.)
- But we're missing a key component here...

---
class: middle

## Key component: Using the same data and analysis to *independently* get the same result

???

If you were to give me your code and I had access to your data, would I know how you got any given result presented in your paper?

---

## Open science occurs at all stages of research

<img src="images/fig-open-science-1.png" width="80%" style="display: block; margin: auto;" />

???

Components of open science are in all stages of research.

---

## Open science is also a spectrum

<img src="images/spectrum-open-science.png" width="80%" style="display: block; margin: auto;" />

\+ Open access (like preprints)

\+ Open protocol

\+ **Open data/data format**

\+ **Open analysis plan/code**

\+ **Open source (like software used)** (1)

.footnote[(1): For example, AU institutionally approves and supports closed source software (Stata), reducing reproducibility.]

???

Focus down to the reproducibility side (data, code).

E.g. AU has institutionally approved a closed source statistical software (Stata), which, by definition, makes work less open and reproducible (someone else needs a Stata license to run your code), unlike open source software like R, which anyone can install and run.

---

## 5 min activity: Think 💭 and share about potential *benefits*

> What might be some ***benefits*** to being more reproducible and open at NCRR at the organizational, group, and/or individual level (from the trainee to those more established)?

- For 1 min, *to yourself*, think about the question.
- For 2 min, discuss with your neighbour what you've thought about.
- For 2 min, we'll have a group-wide sharing of some of the thoughts.

---

## 5 min activity: Think 💭 and share about potential *barriers*

> What might be some ***barriers*** to being more reproducible and open at NCRR at the organizational, group, and/or individual level (from knowledge and technical capacity, to what gets supported and what doesn't)?

- For 1 min, *to yourself*, think about the question.
- For 2 min, discuss with your neighbour what you've thought about.
- For 2 min, we'll have a group-wide sharing of some of the thoughts.

---

## Possible social and technical actions for moving toward openness and reproducibility

--

.pull-left[
- Do code reviews, pair programming/analysis
- Reviewing (pre-analysis)
    - Data cleaning plan
    - Data analysis plan
- Discuss with DST to support reproducible/open systems (within law)
]

--

.pull-left[
- **Publicly link your code with your research output** using GitHub and Zenodo
- Decide on a standard folder and file structure for each project/manuscript
]

???

Now that we're getting more into thinking about benefits and barriers, we can get into thinking about potential actions to take to be more reproducible and open. Before getting into brainstorming activities, I want to give you some ideas for actions to help prime your thinking.

- Work with/pressure DST to make it easier to download code from the server to eventually share (something we'll start doing more at SDCA)

---

## 10 min activity: Think 💭, brainstorm, then share *short-term, easier* options

> What are some potential options that you, your group, and NCRR could implement ***relatively easily and within the short-term (<1 year)*** to be more open and reproducible?

- For 2 min, think about ideas for this question (and write them down if you want to)
- For 3 min, brainstorm with your neighbour(s) on what you've all thought about
- For 2 min, decide on one person to write down some of the ideas in the Mentimeter
- For 3 min, we'll briefly go over and discuss some ideas from the Mentimeter

---

## 10 min activity: Think 💭, brainstorm, then share *long-term, more difficult* options

> What are some potential options that you, your group, and NCRR could implement that are ***more difficult and within the longer-term (>1 year)*** to be more open and reproducible?

- For 2 min, think about ideas for this question (and write them down if you want to)
- For 3 min, brainstorm with your neighbour(s) on what you've all thought about
- For 2 min, decide on one person to write down some of the ideas in the Mentimeter
- For 3 min, we'll briefly go over and discuss some ideas from the Mentimeter

---

## Resources

- Introductory course on Reproducible Research in R: [r-cubed.rostools.org](https://r-cubed.rostools.org/)
    - [Further learning resources](https://r-cubed.rostools.org/resources.html#further-learning)
- Intermediate course on Reproducible Research in R: [r-cubed-intermediate.rostools.org](https://r-cubed-intermediate.rostools.org/)
    - [Further learning resources](https://r-cubed-intermediate.rostools.org/resources.html#for-continued-learning)