class: center, middle, inverse, title-slide

# Reproducibility and code sharing in science

## Why it’s important and how to do it

---

layout: true

<!-- Knit from document directory -->

<div class="my-footer">
<span>
<img src="../../common/dda-logo.png" alt="DDA" width="75">
<img src="../../common/sdca-logo.png" alt="SDCA" width="55">
<a href="https://slides.lwjohnst.com/steno/2021-02-08/">slides.lwjohnst.com/steno/2021-02-08</a>
</span>
</div>

---

## Talk overview

- Reproducibility
    - Basic principle of the scientific method
    - Same data, same code, different analyst, same results
    - Tools: R Markdown, documentation
- Code sharing
    - Tied to reproducibility
    - For learning and critical review
    - Tools: GitHub and Zenodo

---

class: middle, center

# Reproducibility: Importance and practice

---

<img src="index_files/figure-html/reproducibility-figure-1.png" style="display: block; margin: auto;" />

.footnote[
[Patil, 2019](https://doi.org/10.1038/s41562-019-0629-z); [Plesser, 2018](https://www.frontiersin.org/articles/10.3389/fninf.2017.00076/full#B9); American Statistical Association [statement](https://www.amstat.org/asa/files/pdfs/POL-ReproducibleResearchRecommendations.pdf)
]

???

- Detailed description
- Includes *exactly how* the analysis was done, i.e., based on the analysis code

---

## Reproducibility in biomedical science

.pull-left[
- We already know *replication* is a major problem
    - e.g., Many Labs Project, OSC Project, Reproducibility Project
- We don't know the extent of *reproducibility*
    - Few studies share data [1]
    - Almost no studies provide code [2]
]

--

.pull-right[
<img src="index_files/figure-html/reg-reports-figure-1.png" style="display: block; margin: auto;" />
]

.footnote[
1. [Wallach, 2018](https://pubmed.ncbi.nlm.nih.gov/30457984/)
2. [Leek, 2017](https://www.annualreviews.org/doi/10.1146/annurev-statistics-060116-054104), [Considine, 2017](https://link.springer.com/article/10.1007/s11306-017-1299-3)
3. [Obels, 2020](https://doi.org/10.1177/2515245920918872)
]

???
OSC Project: Open Science Collaboration Project

The exception may be bioinformatics, where about 60% of studies do share code.

---

## Why is it important? 🧐

- *Simplest*: It's a key pillar of the scientific method
- With modern technology, it's easier to implement than in the past

--

- In biomedical research, it's poorly implemented or not done at all

.footnote[
1. Selected articles: [Goldacre, 2019](https://doi.org/10.1136/bmj.l6365); [Munafò, 2017](https://www.nature.com/articles/s41562-016-0021); [TOP guidelines](https://science.sciencemag.org/content/348/6242/1422.full); [Transparency Checklist](https://www.nature.com/articles/s41562-019-0772-6); [Peng, 2011](https://science.sciencemag.org/content/334/6060/1226)
]

???

(About implementation) I have no formal training in these tools (my PhD is in Nutrition and my BSc in Kinesiology), yet I was able to learn them. Though I am a bit obsessive about learning these things, so...

There are likely many reasons for this, including:

- Lack of awareness and training
- Difficulty of adoption
- No incentive or reward
- Little to no culture of doing it

---

## Keep in mind: *Reproducibility* is a spectrum

.center[
<img src="../../steno/2020-11-26/images/reproducibility-spectrum.jpg" width="85%" height="85%" style="display: block; margin: auto;" />
]

.footnote[
- The figure should say "Full reproducibility".
]

---

## Practical ways of doing reproducibility

.pull-left[
- Generic ways:
    - Documenting scripts and their run order, and keeping them together (in the same folder)
    - Reproducible document systems
    - Pipeline management
    - Can't share the data? Make a fake version of the original dataset.
]

--

.pull-right[
- R-specific (easy to hard):
    - Documenting R scripts, ordering them, and using R Projects
    - R Markdown documents
    - Pipeline tools (📦: drake, targets)
]

---

## Demonstrations

- [Code for a paper from my PhD](https://github.com/lwjohnst86/tagDiabetes)
    - [Documenting scripts approach](https://github.com/lwjohnst86/tagDiabetes/blob/code/R/generate_results.R)
    - [R Markdown approach](https://github.com/lwjohnst86/tagDiabetes/blob/code/doc/manuscript.Rmd)
- [Pipelines approach](https://books.ropensci.org/targets/walkthrough.html#file-structure)

---

class: middle, center

# Code sharing: Importance and practice

---

## Why is it important? 🧐

- Tightly tied to reproducibility
    - Need the code to reproduce the results

--

- Critically review a study's exact analysis

--

- Builds common standards and best practices

--

- Read others' code to learn how to write better code
    - Code is written to be read by yourself and others [1]

.footnote[
[1]: Otherwise we'd all write in [Assembly](https://upload.wikimedia.org/wikipedia/commons/f/f3/Motorola_6800_Assembly_Language.png).
]

???

- Reproducibility:
    - Code is the exact steps applied to the data to get the results
    - Transparent and clear
    - Easily accessible to *any* researcher
    - Inspectable (linked to accessibility, but also: common language, simple to read, logical, well-reasoned)
- Standards: can't build them with hidden code
- How do we get better at writing? By first reading. To get better at coding, we need to read others' code to learn how to write it. Like writing text, writing code is done for a reader. If it were purely for the computer, we'd all be writing in Assembly or binary machine code (the lowest-level languages).

---

## Practical ways of sharing code

- Main and most commonly used:
    - [GitHub](https://github.com/)\*
    - [Zenodo](https://zenodo.org/)\*
    - [figshare](https://figshare.com/)
    - [OSF](https://osf.io/)

.footnote[\*GitHub and Zenodo can be [connected](https://guides.github.com/activities/citable-code/)!]
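
---

## Sketch: A fake dataset to share alongside code

The "make a fake version of the original dataset" idea can be sketched in R as follows, so that shared code can still run when the real data can't be shared. This is a minimal sketch, not the approach used in the demonstrations: `make_fake_data()` is a hypothetical helper (not from any package), and it assumes numeric, character, or factor columns and draws each column independently.

```r
# Hypothetical helper (not from the talk): simulate a fake dataset with the
# same column names, types, and rough distributions as `data`.
# Assumes numeric, character, or factor columns; each column is drawn
# independently, so relationships between variables are NOT preserved.
make_fake_data <- function(data, n = nrow(data), seed = 2021) {
  # Fixing the seed makes the fake dataset itself reproducible
  # (note: this also resets the session's random number state)
  set.seed(seed)
  fake_columns <- lapply(data, function(column) {
    if (is.numeric(column)) {
      # Numbers: draw from a normal distribution with the observed mean and SD
      rnorm(n, mean = mean(column, na.rm = TRUE), sd = sd(column, na.rm = TRUE))
    } else {
      # Categories: draw observed values in proportion to their frequency
      counts <- table(column)
      values <- sample(names(counts), size = n, replace = TRUE,
                       prob = as.vector(counts))
      if (is.factor(column)) factor(values, levels = levels(column)) else values
    }
  })
  as.data.frame(fake_columns, stringsAsFactors = FALSE)
}

# Usage with a made-up toy data frame
original <- data.frame(
  age = c(54, 61, 47, 70, 58),
  sex = factor(c("F", "M", "F", "M", "F")),
  smoker = c("no", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)
fake <- make_fake_data(original)
str(fake)
```

Drawing each column independently keeps the names and types the analysis code needs, so it can run end to end, but any results from the fake data are meaningless. R packages such as synthpop aim to preserve more of the original structure.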