
NetCoupler: Inferring causal pathways between high-dimensional metabolic data and external factors

Luke W. Johnston

Steno Diabetes Center Aarhus, Denmark

1 / 31

Outline:

  • Background on analytical problem

  • NetCoupler history and implementation

  • R package work and usage

  • Examples using NetCoupler

  • Current challenges

2 / 31

Overall though, I'm very much looking forward to feedback, thoughts, or comments on NetCoupler to help make it better.

Background: Analytic problem

3 / 31

Modern studies generate a mass of metabolic data

  • -omics type data

  • Metabolic biomarkers

  • High dimensionality

  • Complex networks

4 / 31

"Traditional" analysis may use: Dimensionality reduction

Reducing number of variables with PCA.

5 / 31

This approach has the advantage of simplifying the data into fewer variables while retaining as much of the variance as possible. Afterward, you can model each principal component separately. The disadvantage is that a lot of information is lost, since the interdependence and connections between the variables are not maintained.
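As a rough illustration of the PCA-then-model workflow described above, here is a minimal base R sketch. The data are simulated and all variable names are hypothetical; it only shows that the final model sees a single component, not how the individual metabolites relate to each other.

```r
# Simulated (hypothetical) data: 200 people, 8 correlated metabolites, 1 outcome
set.seed(2021)
n <- 200
latent <- rnorm(n)
metabolites <- sapply(1:8, function(i) latent + rnorm(n, sd = 0.5))
colnames(metabolites) <- paste0("metabolite_", 1:8)
outcome <- 0.5 * latent + rnorm(n)

# Reduce the 8 metabolites to principal components (centred and scaled)
pca <- prcomp(metabolites, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)

# Model the outcome on the first component only; how the individual
# metabolites connect to one another is no longer visible in this model
pc1 <- pca$x[, 1]
pc_model <- lm(outcome ~ pc1)
coef(summary(pc_model))
```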

"Traditional" analysis may use: Many regression-type models

O1 = M1 + covariates
O1 = M2 + covariates
...
O1 = M7 + covariates
O1 = M8 + covariates
6 / 31

One way you might analyze this data is by running many regression models, for instance one for each metabolic variable.

This of course has problems, since you're simply running a series of separate models without taking into account the inherent interdependencies between the variables.
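The one-model-per-metabolite pattern (O1 = Mi + covariates) can be sketched in base R as below. The data, the covariates `age` and `sex`, and the metabolite names `M1` to `M8` are simulated placeholders. Each model sees only one metabolite, so any dependence between metabolites is ignored; the Benjamini-Hochberg adjustment at the end is one common, assumed way of handling the many tests.

```r
# Simulated (hypothetical) data: outcome O1, two covariates, 8 metabolites
set.seed(2021)
n <- 200
age <- rnorm(n, mean = 55, sd = 8)
sex <- rbinom(n, 1, 0.5)
metabolites <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
names(metabolites) <- paste0("M", 1:8)
O1 <- 0.3 * metabolites$M1 + 0.02 * age + rnorm(n)
study <- data.frame(O1 = O1, age = age, sex = sex, metabolites)

# One model per metabolite: O1 ~ Mi + age + sex
fits <- lapply(names(metabolites), function(m) {
  model_formula <- reformulate(c(m, "age", "sex"), response = "O1")
  # Keep only the row for the metabolite's coefficient
  coef(summary(lm(model_formula, data = study)))[m, ]
})
results <- do.call(rbind, fits)
rownames(results) <- names(metabolites)

# Correct for running many separate tests
results <- cbind(results, p_adj_BH = p.adjust(results[, "Pr(>|t|)"], method = "BH"))
results
```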

"Traditional" analysis may use: Network analysis

7 / 31

This approach is nice in that you can extract information about the connections between metabolic variables. However, there is no way to incorporate the disease outcome, and to construct the network properly, most methods require you to provide a prespecified base network, which you might not know.
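To make "information about the connections between metabolic variables" concrete, here is a minimal sketch of one simple, swapped-in technique: a partial-correlation network estimated directly from the data. It is not the method used by NetCoupler, the chain structure of the simulated data is hypothetical, and the 0.1 cut-off is an arbitrary assumption. Note that the disease outcome has no place anywhere in it.

```r
# Simulated (hypothetical) chain M1 -> M2 -> M3, plus an unrelated M4
set.seed(2021)
n <- 500
M1 <- rnorm(n)
M2 <- 0.7 * M1 + rnorm(n, sd = 0.5)
M3 <- 0.7 * M2 + rnorm(n, sd = 0.5)
M4 <- rnorm(n)
metabolites <- cbind(M1, M2, M3, M4)

# Partial correlations from the inverse of the correlation matrix
precision <- solve(cor(metabolites))
partial_cor <- -precision / sqrt(diag(precision) %o% diag(precision))
diag(partial_cor) <- 1

# Keep each pair once (upper triangle) if |partial correlation| > cut-off
threshold <- 0.1
idx <- which(upper.tri(partial_cor) & abs(partial_cor) > threshold, arr.ind = TRUE)
edges <- data.frame(
  source = colnames(metabolites)[idx[, "row"]],
  target = colnames(metabolites)[idx[, "col"]],
  partial_cor = partial_cor[idx]
)
edges
```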

But, what if we...

  • want info about network structure?

  • don't know the network structure?

  • have an exposure, metabolites, and outcome?

  • are interested in causal links?

8 / 31

... if what we want to know is something like this?

9 / 31

(Potential) solution: NetCoupler

History and implementation

10 / 31

Initial development

  • Developed by Clemens Wittenbecher for his PhD thesis

  • Algorithm that:

    • Finds most likely network structure
    • Allows inclusion of exposure and outcome
    • Identifies causal links between and within the network

Clemens Wittenbecher

11 / 31
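The slides do not spell out how causal links are identified, so the following is only a heavily simplified, hypothetical sketch of one way to classify a metabolite-outcome link once the network is known. It refits the outcome model with every subset of the metabolite's network neighbours as extra adjustment variables, then labels the link "direct" if the association holds in all models, "ambiguous" if it holds in only some, and "none" otherwise. The helper `classify_link()`, the 0.05 threshold, and the simulated chain M1 -> M2 -> M3 are all assumptions, not the NetCoupler implementation.

```r
# Hypothetical helper (not the NetCoupler API)
classify_link <- function(data, outcome, metabolite, neighbours, alpha = 0.05) {
  # All subsets of the neighbours, including the empty set
  subsets <- c(
    list(character(0)),
    unlist(lapply(seq_along(neighbours), function(k) {
      combn(neighbours, k, simplify = FALSE)
    }), recursive = FALSE)
  )
  # p-value of the metabolite's coefficient in each adjusted model
  p_values <- vapply(subsets, function(adjust_for) {
    model_formula <- reformulate(c(metabolite, adjust_for), response = outcome)
    fit <- lm(model_formula, data = data)
    coef(summary(fit))[metabolite, "Pr(>|t|)"]
  }, numeric(1))
  if (all(p_values < alpha)) {
    "direct"     # association holds whichever neighbours are adjusted for
  } else if (any(p_values < alpha)) {
    "ambiguous"  # association depends on which neighbours are adjusted for
  } else {
    "none"
  }
}

# Simulated (hypothetical) chain M1 -> M2 -> M3; only M2 affects the outcome
set.seed(2021)
n <- 1000
sim <- data.frame(M1 = rnorm(n))
sim$M2 <- 0.8 * sim$M1 + rnorm(n, sd = 0.6)
sim$M3 <- 0.8 * sim$M2 + rnorm(n, sd = 0.6)
sim$y <- 0.5 * sim$M2 + rnorm(n)

link_M2 <- classify_link(sim, "y", "M2", neighbours = c("M1", "M3"))
link_M1 <- classify_link(sim, "y", "M1", neighbours = "M2")
```

In this setup M1 is associated with the outcome only through M2, so its link is expected to weaken once M2 is adjusted for, whereas M2's link should persist under every adjustment set.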

Four basic phases of the algorithm

12 / 31

Infer (potentially) causal pathways with graphical model output

13 / 31

Developing R package and usage

14 / 31

I met him at EDEG and asked whether it was available as an R package. We started working together after that.

Current state of NetCoupler as 📦

15 / 31

General framework and features

  • Pipe (magrittr %>% or base R |>) friendly

  • Uses tidyselect helpers (e.g. starts_with(), contains())

  • Auto-complete friendly (e.g. function names start with nc_)

  • Inputs/outputs generally tibbles/dataframes or tidygraph tibbles

  • Flexible with type of model

  • Website with beginner-focused documentation

16 / 31

Visual demo of input and output

17 / 31

Example code

library(NetCoupler)
library(dplyr) # provides %>% and the tidyselect helper starts_with()

std_data <- dataset %>%
  nc_standardize(starts_with("metabolite"))

network <- std_data %>%
  nc_estimate_network(starts_with("metabolite")) %>%
  as_edge_tbl()
18 / 31
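The slides do not describe what `nc_standardize()` does internally. A common choice for skewed metabolite data, assumed here, is a log transform followed by centring and scaling to unit standard deviation. The base R helper `standardize_metabolites()` and the small dataset below are hypothetical and only illustrate that idea; they are not NetCoupler code.

```r
# Hypothetical helper: log-transform, then centre and scale (z-score) every
# column whose name starts with `prefix`; other columns are left untouched
standardize_metabolites <- function(data, prefix = "metabolite") {
  cols <- startsWith(names(data), prefix)
  data[cols] <- lapply(data[cols], function(x) as.numeric(scale(log(x))))
  data
}

# Made-up example data (values must be positive for log())
example_data <- data.frame(
  id = 1:5,
  metabolite_a = c(1, 2, 4, 8, 16),
  metabolite_b = c(0.5, 1.1, 2.3, 4.0, 9.7)
)
example_std <- standardize_metabolites(example_data)
example_std
```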

Example code

outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
    outcome = "HbA1c",
    model_function = lm
  )
19 / 31

Example code

outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
    outcome = "HbA1c",
    model_function = lm,
    adjustment_vars = c("age", "sex")
  )
20 / 31

Example code

outcome_estimates <- std_data %>%
  nc_estimate_outcome_links(
    edge_tbl = network,
    outcome = "incident_diabetes",
    model_function = glm,
    adjustment_vars = c("age", "sex"),
    model_arg_list = list(
      family = binomial("logit")
    ),
    exponentiate = TRUE
  )
21 / 31
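With a logistic model, the raw coefficients are on the log-odds scale, and exponentiating them gives odds ratios, which is presumably what `exponentiate = TRUE` requests. The base R sketch below shows that conversion for a single model fitted directly with `glm()`; the data and the single `metabolite` variable are simulated placeholders, and Wald confidence intervals from `confint.default()` are one assumed choice of interval.

```r
# Simulated (hypothetical) data for one metabolite and a binary outcome
set.seed(2021)
n <- 1000
metabolite <- rnorm(n)
age <- rnorm(n, mean = 55, sd = 8)
sex <- rbinom(n, 1, 0.5)
incident_diabetes <- rbinom(n, 1, plogis(-2 + 0.4 * metabolite + 0.03 * (age - 55)))

fit <- glm(incident_diabetes ~ metabolite + age + sex, family = binomial("logit"))

# Coefficients are log-odds; exponentiating turns them into odds ratios
log_odds <- coef(fit)
odds_ratios <- exp(log_odds)

# Wald confidence intervals, transformed to the odds-ratio scale the same way
ci <- exp(confint.default(fit))
cbind(odds_ratio = odds_ratios, ci)
```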

Example plot of output

22 / 31

Example projects using NetCoupler

23 / 31

Clemens' PhD research

  • Aim: Identify potential metabolic links between exposure to dietary risk factors and later type 2 diabetes incidence based on metabolomics networks.

24 / 31

Red meat, acylcarnitines, and incident diabetes in EPIC-Potsdam

25 / 31

UK Biobank: Metabolic pathways between components of stature and HbA1c

26 / 31

UK Biobank characteristics

  • Basics:
    • ~480,000 participants
    • Cross-sectional
    • Stature measures
    • Various demographics
  • Metabolic variables:
    • Cholesterol
    • Albumin
    • Alanine Aminotransferase
    • Apolipoprotein A and B
    • Aspartate Aminotransferase
    • C-reactive Protein
    • Gamma Glutamyltransferase
    • HDL and LDL Cholesterol
    • Triglycerides
27 / 31

28 / 31

Current challenges

29 / 31

Several limitations or things to improve

  • Mainly, performance can be slow
    • E.g. with larger data or networks

  • Untested on larger networks

  • Untested on non-cross-sectional/time-to-event data

  • Visualizing can be tricky

  • Interpreting estimates can be tricky

30 / 31

Before addressing many of these, I want the API to be stable and the general interface to be well established.
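One plausible reason performance drops with larger networks, assuming (as a simplification not stated on the slides) that each metabolite's link is re-estimated with every subset of its network neighbours as adjustment variables, is that the number of models grows as 2^k for a metabolite with k neighbours. The neighbour counts below are hypothetical.

```r
# Hypothetical neighbour counts for four metabolites in an estimated network
neighbour_counts <- c(2, 5, 10, 15)

# Every subset of k neighbours (including none) gives one model: 2^k in total
models_per_metabolite <- 2^neighbour_counts

# Same count via the binomial sum over subset sizes 0..k
check <- sapply(neighbour_counts, function(k) sum(choose(k, 0:k)))

data.frame(neighbours = neighbour_counts, models = models_per_metabolite)
```

Going from 10 to 15 neighbours multiplies the work by 32, which is why densely connected metabolites would dominate run time under this assumption.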

Thanks! Comments, thoughts, feedback?

31 / 31
