NetCoupler: Inferring causal pathways between high-dimensional metabolic data and external factors

Luke W. Johnston

Steno Diabetes Center Aarhus, Denmark

Outline:

Background on analytical problem
NetCoupler history and implementation
R package work and usage
Examples using NetCoupler
Current challenges

Overall though, I'm very much looking forward to feedback, thoughts, or comments on NetCoupler to help make it better.

Background: Analytic problem

Modern studies generate a mass of metabolic data

-omics type data
Metabolic biomarkers

High dimensionality
Complex networks

"Traditional" analysis may use: Dimensionality reduction

Reducing the number of variables with principal component analysis (PCA).

This has the advantage of simplifying the data while retaining as much of its variance as possible. Afterward, you can model each principal component. The disadvantage is that a lot of information is lost, since the interdependence and connections between variables are not maintained.
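A minimal base-R sketch of this PCA-then-model workflow, using `prcomp()` and `lm()` on small simulated data. The metabolite and outcome variables are hypothetical and exist only for illustration; this is not part of NetCoupler.

```r
# Hypothetical, simulated data purely for illustration
set.seed(2021)
n <- 200
metabolite_1 <- rnorm(n)
metabolites <- data.frame(
  metabolite_1 = metabolite_1,
  # metabolite_2 is made to correlate with metabolite_1
  metabolite_2 = 0.8 * metabolite_1 + rnorm(n, sd = 0.5),
  metabolite_3 = rnorm(n)
)
outcome <- 0.5 * metabolites$metabolite_1 + rnorm(n)

# Centre and scale each metabolite, then compute the principal components
pca <- prcomp(metabolites, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Model the outcome on the first component instead of the original metabolites
pc_data <- data.frame(outcome = outcome, pca$x)
pc_model <- lm(outcome ~ PC1, data = pc_data)
summary(pc_model)
```

The coefficient on `PC1` describes the component as a whole; relating it back to individual metabolites means reading the loadings in `pca$rotation`, which is where the loss of interpretable structure shows up.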

"Traditional" analysis may use: Many regression-type models

O1 = M1 + covariates
O1 = M2 + covariates
...
O1 = M7 + covariates
O1 = M8 + covariates

One way you might analyze this data is to run many regression models, for instance one for each metabolic variable.

This of course has problems, since you're simply running a bunch of separate models without accounting for the inherent interdependencies between variables.
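A minimal base-R sketch of this "one model per metabolite" pattern (O1 = Mi + covariates), using hypothetical simulated variables named after the notation on the slide. Each model is fit independently, so nothing links the metabolites to one another.

```r
# Hypothetical, simulated data purely for illustration
set.seed(2021)
n <- 200
study_data <- data.frame(
  age = rnorm(n, mean = 55, sd = 8),
  sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  M1 = rnorm(n), M2 = rnorm(n), M3 = rnorm(n)
)
study_data$O1 <- 0.3 * study_data$M1 + 0.02 * study_data$age + rnorm(n)

metabolite_vars <- c("M1", "M2", "M3")

# Fit O1 ~ Mi + age + sex separately for each metabolite and keep its estimate
per_metabolite_results <- do.call(rbind, lapply(metabolite_vars, function(m) {
  model_formula <- reformulate(c(m, "age", "sex"), response = "O1")
  fit <- lm(model_formula, data = study_data)
  coefs <- summary(fit)$coefficients
  data.frame(
    metabolite = m,
    estimate = coefs[m, "Estimate"],
    p_value = coefs[m, "Pr(>|t|)"]
  )
}))
per_metabolite_results
```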

"Traditional" analysis may use: Network analysis

This approach is nice in that you can extract information about the connections between metabolic variables. But it has no way to incorporate the disease outcome, and to construct the network properly most methods require that you provide a prespecified base network, which you might not know.
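To illustrate the kind of information network analysis extracts, here is a minimal base-R sketch that estimates a partial-correlation network (a simple Gaussian graphical model) from simulated metabolites. This is a generic technique swapped in for illustration, not NetCoupler's own estimation method; the variable names and the 0.1 cut-off are arbitrary assumptions.

```r
# Hypothetical chain of metabolites: m1 -> m2 -> m3
set.seed(2021)
n <- 500
m1 <- rnorm(n)
m2 <- 0.7 * m1 + rnorm(n, sd = 0.5)
m3 <- 0.7 * m2 + rnorm(n, sd = 0.5)
metabolites <- cbind(m1 = m1, m2 = m2, m3 = m3)

# Partial correlations from the precision (inverse correlation) matrix:
# rho_ij = -P_ij / sqrt(P_ii * P_jj)
precision <- solve(cor(metabolites))
partial_cor <- -precision / sqrt(diag(precision) %o% diag(precision))
diag(partial_cor) <- 1

# Keep an edge when the absolute partial correlation exceeds an arbitrary cut-off
threshold <- 0.1
adjacency <- abs(partial_cor) > threshold & upper.tri(partial_cor)
edges <- which(adjacency, arr.ind = TRUE)
edge_list <- data.frame(
  from = colnames(metabolites)[edges[, "row"]],
  to = colnames(metabolites)[edges[, "col"]]
)
edge_list
```

Because `m1` affects `m3` only through `m2` in this simulation, the partial correlation between `m1` and `m3` should be close to zero. Nothing in the sketch involves an exposure or outcome, which is the limitation described above.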

But, what if we...

want info about network structure?
don't know the network structure?
have an exposure, metabolites, and outcome?
are interested in causal links?

... if what we want to know is something like this?

(Potential) solution: NetCoupler

History and implementation

Initial development

Developed by Clemens Wittenbecher for his PhD thesis
Algorithm that:
- Finds most likely network structure
- Allows inclusion of exposure and outcome
- Identifies causal links between and within the network

Clemens Wittenbecher

Four basic phases of the algorithm

Infer (potentially) causal pathways with graphical model output

Developing R package and usage

I met him at EDEG and asked if it was an R package. We started working together after that.

Current state of NetCoupler as 📦

github.com/NetCoupler
Goal: Submit to CRAN by mid-2021.

General framework and features

Pipe (magrittr %>% or base R |>) friendly
Uses tidyselect helpers (e.g. starts_with(), contains())
Auto-complete friendly (e.g. function names start with nc_)
Inputs/outputs are generally tibbles/data frames or tidygraph tibbles
Flexible with type of model
Website with beginner-focused documentation

Visual demo of input and output

Example code

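library(NetCoupler)
library(magrittr)
# Standardize the metabolic variables
std_data <- dataset %>% 
    nc_standardize(starts_with("metabolite"))
# Estimate the metabolic network and convert it to an edge table
network <- std_data %>% 
    nc_estimate_network(starts_with("metabolite")) %>% 
    as_edge_tbl()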

Example code

# Estimate links between the network's metabolites and a continuous outcome
outcome_estimates <- std_data %>%
    nc_estimate_outcome_links(
        edge_tbl = network,
        outcome = "HbA1c",
        model_function = lm
    )

Example code

# Same as before, but also adjusting for age and sex
outcome_estimates <- std_data %>%
    nc_estimate_outcome_links(
        edge_tbl = network,
        outcome = "HbA1c",
        model_function = lm,
        adjustment_vars = c("age", "sex")
    )

Example code

# Binary outcome: logistic regression via glm, with exponentiated estimates
outcome_estimates <- std_data %>%
    nc_estimate_outcome_links(
        edge_tbl = network,
        outcome = "incident_diabetes",
        model_function = glm,
        adjustment_vars = c("age", "sex"),
        model_arg_list = list(
            family = binomial("logit")
        ),
        exponentiate = TRUE
    )
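This example sets `exponentiate = TRUE`. Assuming that option exponentiates the logistic-regression estimates (as its name suggests), the reported values are odds ratios rather than log-odds. A minimal base-R sketch of that relationship, using plain `glm()` rather than NetCoupler and hypothetical simulated variables, shows that the exponentiated coefficient of a single binary predictor equals the odds ratio from the 2x2 table.

```r
# Hypothetical binary exposure (e.g. high vs low metabolite level) and outcome
set.seed(2021)
n <- 1000
high_metabolite <- rbinom(n, size = 1, prob = 0.5)
incident_diabetes <- rbinom(n, size = 1, prob = plogis(-2 + 0.7 * high_metabolite))

# Logistic regression: the coefficient is a log-odds ratio
fit <- glm(incident_diabetes ~ high_metabolite, family = binomial("logit"))
log_odds_ratio <- coef(fit)[["high_metabolite"]]

# Exponentiating turns the log-odds ratio into an odds ratio
odds_ratio <- exp(log_odds_ratio)

# Cross-check against the odds ratio computed directly from the 2x2 table
tab <- table(high_metabolite, incident_diabetes)
manual_or <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
c(glm = odds_ratio, table = manual_or)
```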

Example plot of output

Example projects using NetCoupler

Clemens' PhD research

Aim: Identify potential metabolic links between exposure to dietary risk factors and later type 2 diabetes incidence based on metabolomics networks.

Red meat, acylcarnitines, and incident diabetes in EPIC-Potsdam

UK Biobank: Metabolic pathways between components of stature and HbA1c

UK Biobank characteristics

Basics:
- ~480,000 participants
- Cross-sectional
- Stature measures
- Various demographics

Metabolic variables:
- Cholesterol
- Albumin
- Alanine Aminotransferase
- Apolipoprotein A and B
- Aspartate Aminotransferase
- C-reactive Protein
- Gamma Glutamyltransferase
- HDL and LDL Cholesterol
- Triglycerides

Link between leg length, liver markers, HbA1c

Current challenges

Several limitations or things to improve

Mainly, performance can be slow
- E.g. with larger data or networks

Untested on larger networks
Untested on non-cross-sectional/time-to-event data
Visualizing can be tricky
Interpreting estimates can be tricky

Before addressing many of these, I want the API to be stable and the general interface to be well established.

Thanks! Comments, thoughts, feedback?
