class: center, middle, inverse, title-slide

# NetCoupler: Inferring causal pathways between high-dimensional metabolic data and external factors
### Luke W. Johnston
### Steno Diabetes Center Aarhus, Denmark
---
layout: true

<div class="my-footer">
<span>
<img src="../../common/dda-logo.png" alt="DDA" width="75">
<img src="../../common/sdca-logo.png" alt="SDCA" width="55">
<a href="https://slides.lwjohnst.com/iarc/2020-12-16/">slides.lwjohnst.com/iarc/2020-12-16</a>
</span>
</div>

<!-- <div class="my-header"> -->
<!-- </div> -->

---

# Outline:

- Background on analytical problem
- NetCoupler history and implementation
- R package work and usage
- Examples using NetCoupler
- Current challenges

???

Overall though, I'm very much looking forward to feedback, thoughts, or comments on NetCoupler to help make it better.

---
class: center, middle

# Background: Analytic problem

---

## Modern studies generate a mass of metabolic data

.pull-left[
- -omics type data
- Metabolic biomarkers
]

.pull-right[
- High dimensionality
- Complex networks
]

---

## "Traditional" analysis may use: Dimensionality reduction

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#../../au-ph/2019-08-15/images/pca.png" alt="Reducing number of variables with PCA." width="280" />
<p class="caption">Reducing number of variables with PCA.</p>
</div>

???

This has the advantage of making things simpler while trying to keep as much of the variance in the data as possible. Afterward you can do modelling on each principal component. The disadvantage of this approach is that it loses a lot of information, since the interdependence and connections between variables are not maintained.

---

## "Traditional" analysis may use: Many regression-type models

.center[
```
O1 = M1 + covariates
O1 = M2 + covariates
...
O1 = M7 + covariates
O1 = M8 + covariates
```
]

???

One way you might go about analyzing this data is by running many regression models, for instance one for each metabolic variable. This of course has problems, since you're simply running a bunch of separate models and not taking into account the inherent interdependencies between variables.
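
As a minimal sketch of this one-model-per-metabolite approach: the data below are simulated, and the names `O1`, `M1` to `M8`, `age`, and `sex` are made up for illustration rather than taken from any real dataset.

```r
# Simulated data purely for illustration; all variable names are hypothetical.
set.seed(123)
n <- 200
dataset <- data.frame(
  age = rnorm(n, mean = 50, sd = 10),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Eight simulated "metabolites", M1 to M8.
metabolites <- paste0("M", 1:8)
for (metabolite in metabolites) {
  dataset[[metabolite]] <- rnorm(n)
}

# Outcome that, by construction, depends only on M1 and age.
dataset$O1 <- 0.5 * dataset$M1 + 0.02 * dataset$age + rnorm(n)

# Fit one linear model per metabolite, adjusting for the same covariates.
estimates <- lapply(metabolites, function(metabolite) {
  model_formula <- reformulate(c(metabolite, "age", "sex"), response = "O1")
  model <- lm(model_formula, data = dataset)
  coefs <- summary(model)$coefficients
  data.frame(
    metabolite = metabolite,
    estimate = coefs[metabolite, "Estimate"],
    p_value = coefs[metabolite, "Pr(>|t|)"]
  )
})
estimates <- do.call(rbind, estimates)
estimates
```

Each model is fit in isolation, so nothing in `estimates` reflects how the metabolites relate to one another.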

---

## "Traditional" analysis may use: Network analysis

<img src="data:image/png;base64,#index_files/figure-html/img-traditional-network-analysis-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

???

This approach is nice in that you can extract information about the connections between metabolic variables. But there is no way to incorporate the disease outcome with this approach, and in order to construct the network properly, most methods require you to provide a prespecified base network, which you might not know.

---

## But, what if we...

- want info about network structure?

--

- don't know the network structure?

--

- have an exposure, metabolites, and outcome?

--

- are interested in causal links?

---

## ... if what we want to know is something like this?

<img src="data:image/png;base64,#../../au-ph/2019-08-15/images/network.png" width="75%" style="display: block; margin: auto;" />

---
class: middle, center

# (Potential) solution: NetCoupler

## History and implementation

---

## Initial development

.pull-left[
- Developed by Clemens Wittenbecher for his [PhD thesis](https://publishup.uni-potsdam.de/opus4-ubp/frontdoor/deliver/index/docId/40459/file/wittenbecher_diss.pdf)
- Algorithm that:
    - Finds most likely network structure
    - Allows inclusion of exposure and outcome
    - Identifies causal links between and within network
]

.pull-right[
![Clemens Wittenbecher](data:image/png;base64,#https://avatars3.githubusercontent.com/u/33724052?size=200)
]

---

## Four basic phases of the algorithm

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/netcoupler-process.svg" width="90%" style="display: block; margin: auto;" />

---

## Infer (potentially) causal pathways with graphical model output

<img src="data:image/png;base64,#../../au-ph/2019-08-15/images/nc-causal-pathways.png" width="85%" style="display: block; margin: auto;" />

---
class: center, middle

# Developing R package and usage

???

I met Clemens at EDEG and asked whether NetCoupler was an R package.
We started working together after that.

---

## Current state of NetCoupler as <svg style="height:0.8em;top:.04em;position:relative;fill:#214c78;" viewBox="0 0 581 512"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg> 📦

.center[
<img src="data:image/png;base64,#../../iarc/2020-12-16/images/netcoupler-github.png" width="70%" style="display: block; margin: auto;" />
]

.footnote[
- [github.com/NetCoupler](https://github.com/NetCoupler/NetCoupler)
- **Goal**: Submit to CRAN by mid-2021.
]

---

## General framework and features

- Pipe ([magrittr](https://magrittr.tidyverse.org/) `%>%` or base R `|>`) friendly

--

- Uses [tidyselect](https://tidyselect.r-lib.org/) helpers (e.g. `starts_with()`, `contains()`)

--

- Auto-complete friendly (e.g.
start function names with `nc_`)

--

- Inputs/outputs generally [tibbles](https://tibble.tidyverse.org/)/dataframes or [tidygraph tibbles](https://tidygraph.data-imaginist.com/)

--

- Flexible with type of model

--

- Website with beginner-focused documentation

---

## Visual demo of input and output

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/netcoupler-input-output.png" width="395" height="70%" style="display: block; margin: auto;" />

---

## Example code

```r
std_data <- dataset %>%
  nc_standardize(starts_with("metabolite"))

network <- std_data %>%
  nc_estimate_network(starts_with("metabolite")) %>%
  as_edge_tbl()
```

---

## Example code

```r
outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
    outcome = "HbA1c",
    model_function = lm
  )
```

---

## Example code

```r
outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
    outcome = "HbA1c",
    model_function = lm,
*   adjustment_vars = c("age", "sex")
  )
```

---

## Example code

```r
outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
*   outcome = "incident_diabetes",
*   model_function = glm,
    adjustment_vars = c("age", "sex"),
*   model_arg_list = list(
*     family = binomial("logit")
*   ),
*   exponentiate = TRUE
  )
```

---

## Example plot of output

<img src="data:image/png;base64,#https://netcoupler.github.io/NetCoupler/articles/NetCoupler_files/figure-html/plot-outcome-estimation-networks-1.png" width="50%" style="display: block; margin: auto;" />

---
class: center, middle

# Example projects using NetCoupler

---

## Clemens' PhD research

- **Aim**: Identify potential metabolic links between exposure to dietary risk factors and later type 2 diabetes incidence based on metabolomics networks.

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/aim-clemens-phd.png" width="60%" style="display: block; margin: auto;" />

---

## Red meat, acylcarnitines, and incident diabetes in EPIC-Potsdam

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/carnitines-clemens-phd.png" width="50%" style="display: block; margin: auto;" />

---

## UK Biobank: Metabolic pathways between components of stature and HbA1c

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/aim-ukbiobank.svg" height="90%" style="display: block; margin: auto;" />

---

## UK Biobank characteristics

.pull-left[
- Basics:
    - ~480,000 participants
    - Cross-sectional
    - Stature measures
    - Various demographics
]

.pull-right[
- Metabolic variables:
    - Cholesterol
    - Albumin
    - Alanine Aminotransferase
    - Apolipoprotein A and B
    - Aspartate Aminotransferase
    - C-reactive Protein
    - Gamma Glutamyltransferase
    - HDL and LDL Cholesterol
    - Triglycerides
]

---

## Link between leg length, liver markers, HbA1c

<img src="data:image/png;base64,#../../iarc/2020-12-16/images/ukbiobank-netcoupler-results.png" width="70%" style="display: block; margin: auto;" />

---
class: middle, center

# Current challenges

---

## Several limitations or things to improve

- **Mainly**, performance can be slow
    - E.g. larger data or networks

--

- Untested on larger networks

--

- Untested on non-cross-sectional/time-to-event data

--

- Visualizing can be tricky

--

- Interpreting estimates can be tricky

???

Before addressing many of these, I want the API to be stable and the general interface to be well established.

---
class: middle, center

# Thanks! Comments, thoughts, feedback?