Causality in branching time series

Theoretical Systems Biology Retreat (German Cancer Research Center, Heidelberg)
Ellwangen · 22 June 2016



F. Alexander Wolf | falexwolf.de

Institute of Computational Biology

Helmholtz Zentrum München


Gene expression data for branching lineages

  • branching pseudotime evolution of gene expression
  • here: ordered by switching time

Learn about gene regulation?

  • Gene that switches early and does not branch: candidate regulator?
  • Gene that branches into different fates: candidate regulated gene?

Haghverdi ... Theis, bioRxiv (2016)
Moignard, Woodhouse ... Göttgens, Nat. Biotechnol. (2015)

(Causal) time series analysis

Model fitting

  • ODE-based model Ocone, Haghverdi, Müller & Theis, Bioinformatics (2015)
  • Boolean network Moignard, Woodhouse ... Göttgens, Nat. Biotechnol. (2015)

Granger Causality/Transfer Entropy Granger (1969) / Schreiber, Phys. Rev. Lett. (2000)

  • Granger Causality $\simeq$ Transfer Entropy (for Gaussian variables) Barnett, Phys. Rev. Lett. (2009)

Causal Inference Methods

  • PC algorithm Spirtes et al. (2000)
  • Structural equation models e.g. Peters, ..., Schölkopf, NIPS (2013)

Convergent Cross Mapping Sugihara et al., Science (2012)

  • Takens' theorem Takens (1980)

Model fitting

  • ODE-based model Ocone, Haghverdi, Müller & Theis, Bioinformatics (2015)
     ▷  infer edges using GENIE3, then optimize for $ u_g(\hat{\boldsymbol{x}})$
    $$\frac{d}{dt} \hat x_g = \alpha u_g(\hat{\boldsymbol{x}}) - \lambda \hat x_g,~~ \textstyle p(\mathcal{D}|\theta) \propto \prod_{t=1}^T \exp\big(-\frac{(x_{tg} - \hat x_g(t,\theta))^2}{2 \sigma^2}\big) $$
  • Discrete state space model Moignard, Woodhouse ... Göttgens, Nat. Biotechnol. (2015)
    • generate discrete state graph of 1-gene transitions
    • check consistency of trial Boolean networks
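
A minimal sketch of the ODE fit above for a single gene. It assumes a Hill-type activation for $u_g$ driven by one externally given regulator and forward-Euler integration; neither is specified on the slide, and `hill`, `simulate`, `log_likelihood` and all parameter values are hypothetical. It only illustrates that the Gaussian likelihood prefers parameters close to those that generated the data.

```python
import numpy as np

def hill(x, k=0.5, n=4):
    """Hypothetical activation u_g; the slide leaves u_g unspecified."""
    return x**n / (k**n + x**n)

def simulate(alpha, lam, x0, regulator, dt=0.01, steps_per_obs=10):
    """Euler-integrate d/dt x = alpha * u(regulator) - lam * x, one sample per observation."""
    x, out = x0, []
    for r in regulator:
        out.append(x)
        for _ in range(steps_per_obs):
            x = x + dt * (alpha * hill(r) - lam * x)
    return np.array(out)

def log_likelihood(data, prediction, sigma):
    """Gaussian log-likelihood up to an additive constant, as in p(D|theta)."""
    return -np.sum((data - prediction) ** 2) / (2.0 * sigma**2)

rng = np.random.default_rng(0)
regulator = np.linspace(0.0, 1.0, 50)   # toy regulator switching on along pseudotime
truth = simulate(alpha=2.0, lam=1.0, x0=0.0, regulator=regulator)
data = truth + rng.normal(0.0, 0.05, size=truth.shape)

ll_true = log_likelihood(data, truth, sigma=0.05)
ll_wrong = log_likelihood(data, simulate(0.5, 1.0, 0.0, regulator), sigma=0.05)
print(ll_true, ll_wrong)
```

In an actual fit, `alpha`, `lam` and the parameters of $u_g$ would be optimized rather than compared at two hand-picked values.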

Granger Causality / Causal Inference Methods

Conditional independence tests: $$ X_{ti} \perp\!\!\!\perp X_{(t-1)j} \;|\; X_{(t-1)k_1}, X_{(t-1)k_2}, ... $$

  • Only works well if there is a large amount of dynamic information and no error in the time variable. Neither is the case here!
  • Instead: the time series carry geometric constraints associated with branchings, which local independence tests cannot exploit.
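
For reference, one common way to carry out such a conditional independence test is a partial-correlation test with a Fisher z-transform. This is a sketch under a linear-Gaussian assumption that the slide does not make; `partial_corr_test` and the toy autoregressive data are hypothetical.

```python
import math
import numpy as np

def partial_corr_test(x, y, z):
    """Test whether x is independent of y given z (linear-Gaussian assumption).

    Returns the partial correlation and a two-sided p-value from the Fisher z-transform.
    """
    design = np.column_stack([np.ones(len(x)), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    dof = len(x) - (design.shape[1] - 1) - 3
    stat = math.sqrt(dof) * math.atanh(r)
    return r, math.erfc(abs(stat) / math.sqrt(2.0))

# Toy linear system in which X_0 drives X_1 but not vice versa.
rng = np.random.default_rng(1)
T = 500
X = np.zeros((T, 2))
for t in range(1, T):
    X[t, 0] = 0.5 * X[t - 1, 0] + rng.normal()
    X[t, 1] = 0.5 * X[t - 1, 1] + 0.8 * X[t - 1, 0] + rng.normal()

# X_{t1} vs X_{(t-1)0} given X_{(t-1)1}: a true coupling
_, p_coupled = partial_corr_test(X[1:, 1], X[:-1, 0], X[:-1, 1])
# X_{t0} vs X_{(t-1)1} given X_{(t-1)0}: no coupling
_, p_uncoupled = partial_corr_test(X[1:, 0], X[:-1, 1], X[:-1, 0])
print(p_coupled, p_uncoupled)
```

The toy series is long and has exact time stamps, which is precisely the regime the first bullet describes.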

Convergent Cross Mapping

Idea: Reconstruct regulating gene from the history of the regulated gene.

▷ Can exploit branching geometry!

But: no statistical formulation!
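
A rough sketch of the cross-mapping idea, assuming a delay embedding of dimension 2 and exponentially weighted nearest neighbours. The function `cross_map_skill`, the embedding dimension and the toy coupled logistic maps are illustrative choices, not taken from the slides. Here `x` plays the regulating gene and `y` the regulated gene.

```python
import numpy as np

def cross_map_skill(source, target, E=2):
    """Estimate `target` from the delay embedding of `source`; return the correlation."""
    n = len(source) - (E - 1)
    # Delay vectors (s[t], s[t-1], ..., s[t-E+1])
    M = np.column_stack([source[E - 1 - k : E - 1 - k + n] for k in range(E)])
    y = target[E - 1 :]
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, : E + 1]   # E + 1 nearest neighbours
    d = np.take_along_axis(dist, nbrs, axis=1)
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))  # weights relative to nearest distance
    est = np.sum(w * y[nbrs], axis=1) / np.sum(w, axis=1)
    return np.corrcoef(y, est)[0, 1]

# Toy system: x evolves on its own, y is driven by x.
T = 1600
x, y = np.empty(T), np.empty(T)
x[0], y[0] = 0.4, 0.2
for t in range(T - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.7 - 3.7 * y[t] - 0.2 * x[t])
x, y = x[100:], y[100:]                           # drop the transient

skill_x_from_y = cross_map_skill(y, x)  # regulator from history of regulated gene
skill_y_from_x = cross_map_skill(x, y)  # reverse direction
print(skill_x_from_y, skill_y_from_x)
```

The two skills are plain correlations without a null distribution, which is the gap the slide points to.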

New method: statistical test for coupling functions

Rationale

\begin{align} X_{tj} & = f_j(X_{(t-1)j}, X_{(t-1)i_1}, X_{(t-1)i_2}, ...) \\ \Leftrightarrow \quad & F_j(X_{tj},X_{(t-1)j}, X_{(t-1)i_1}, X_{(t-1)i_2}, ...) = 0 \end{align}

Statistically test whether $X_{(t-1)i_1}$ contributes to $F_j$ by testing for the existence of a function $g$ $$ g(X_{tj},X_{(t-1)j}, X_{(t-1)i_2}, ...) = X_{(t-1)i_1}. $$

Implicit function theorem: if no such $g$ exists, $F_j$ is constant w.r.t. $X_{(t-1)i_1}$.
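
Spelled out as a contrapositive, assuming $F_j$ is continuously differentiable (an assumption not stated on the slide):

\begin{align} \frac{\partial F_j}{\partial X_{(t-1)i_1}} \neq 0 \text{ at a point with } F_j = 0 \quad & \Rightarrow \quad \text{locally } X_{(t-1)i_1} = g(X_{tj},X_{(t-1)j}, X_{(t-1)i_2}, ...) \\ \text{no such } g \text{ exists at any point} \quad & \Rightarrow \quad \frac{\partial F_j}{\partial X_{(t-1)i_1}} \equiv 0 \end{align}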

Definition of test: bivariate case

Consider $X_{tj} = f_j(X_{(t-1)j},X_{(t-1)i})$. How to statistically test for the existence of the mapping $g : \mathcal{X}_j \rightarrow \mathcal{X}_i$?

$$ z_{ji} = \frac{\overline{\Delta}_{ji} - \overline{\delta}_{i}}{\sqrt{(\hat\sigma_{ji}^{\Delta})^2/n + (\hat\sigma_{ji}^{\delta})^2/n}}$$

where

$$ \Delta_{tji}^{(a)} = \frac{1}{|{\mathcal{N}}_{tj}^{(a)}|} \sum_{t' \in {\mathcal{N}}_{tj}^{(a)}} d_{tt'i} $$

is the variation across realizations for a fixed value in the domain $\mathcal{X}_j$, and

$$ {\delta}_{ti}^{(a)} = \frac{1}{2}({d}_{t(t-1)i}^{(a)} + {d}_{(t+1)ti}^{(a)}) $$

is the dynamic variation in the codomain.
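
One possible reading of this statistic as code. The slide does not define $d$, the neighbourhood $\mathcal{N}_{tj}$ or the index $(a)$, so the sketch makes several assumptions: $d_{tt'i} = |X_{ti} - X_{t'i}|$, the domain point is the pair $(X_{tj}, X_{(t-1)j})$, the neighbourhood consists of the `k` nearest domain points, and a single realization is used. The name `coupling_z` and the toy maps are hypothetical.

```python
import numpy as np

def coupling_z(domain_series, codomain_series, k=3):
    """z statistic comparing neighbourhood variation (Delta) with dynamic variation (delta)."""
    xj = np.asarray(domain_series)
    xi = np.asarray(codomain_series)
    D = np.column_stack([xj[1:], xj[:-1]])       # assumed domain points (X_tj, X_(t-1)j)
    c = xi[:-1]                                  # codomain values X_(t-1)i
    dist = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]       # assumed neighbourhood: k nearest points
    Delta = np.mean(np.abs(c[nbrs] - c[:, None]), axis=1)[1:-1]
    step = np.abs(np.diff(c))                    # distances between successive time points
    delta = 0.5 * (step[:-1] + step[1:])         # average of backward and forward step
    n = len(delta)
    se = np.sqrt(Delta.var(ddof=1) / n + delta.var(ddof=1) / n)
    return (Delta.mean() - delta.mean()) / se

# Toy system: x evolves on its own, y is driven by x.
T = 1600
x, y = np.empty(T), np.empty(T)
x[0], y[0] = 0.4, 0.2
for t in range(T - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.7 - 3.7 * y[t] - 0.2 * x[t])
x, y = x[100:], y[100:]

z_map = coupling_z(y, x)     # domain: regulated gene, codomain: regulator
z_no_map = coupling_z(x, y)  # reverse direction, where no mapping exists
print(z_map, z_no_map)
```

Under these assumptions, a mapping $g$ shows up as neighbourhood variation well below the dynamic variation, that is, a strongly negative $z$.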

Numerical experiment

\begin{align} X_{ti} = \; & (\alpha_{ii} X_{(t-1)i} + \alpha_{ij} X_{(t-1)j}) (X_{(t-1)i}-1) X_{(t-1)i} + N_{ti}, \nonumber\\ \text{where} ~ & i,j \in \{0,1\}, ~t \in \{1,2,\dots,t_\text{max} \}, \nonumber\\ & N_{ti} \sim \mathcal{N}(0,\sigma^2), \nonumber\\ & X_{0i} \sim \mathcal{U}(0.1,0.9), \nonumber\\ & \alpha_{ij} \sim \mathcal{U}(-1,1) ~\forall ~(i,j) \neq (1,0),~\alpha_{10}=0. \end{align}
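
The model above can be simulated directly. In this sketch the function name `simulate`, the defaults for $t_\text{max}$ and $\sigma$, and the seed are arbitrary choices; the update rule and the distributions follow the equations.

```python
import numpy as np

def simulate(t_max=200, sigma=0.01, seed=0):
    """Simulate the two-variable system; returns (alpha, X) with X of shape (t_max + 1, 2)."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(-1.0, 1.0, size=(2, 2))
    alpha[1, 0] = 0.0                     # X_0 has no influence on X_1
    X = np.empty((t_max + 1, 2))
    X[0] = rng.uniform(0.1, 0.9, size=2)  # initial condition X_{0i}
    for t in range(1, t_max + 1):
        prev = X[t - 1]
        for i in (0, 1):
            j = 1 - i
            drive = alpha[i, i] * prev[i] + alpha[i, j] * prev[j]
            X[t, i] = drive * (prev[i] - 1.0) * prev[i] + rng.normal(0.0, sigma)
    return alpha, X

alpha, X = simulate()
print(alpha)
print(X[:5])
```

Because $\alpha_{10} = 0$, the ground truth is known: a coupling from $X_1$ to $X_0$ may be present, one from $X_0$ to $X_1$ is not.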

Summary

  • Goal: infer gene regulation from pseudotime order.
  • Problem: very little dynamic information, so traditional approaches fail; instead, use branching geometry.
  • Convergent Cross Mapping: can exploit branching geometry, but has a serious flaw: no statistical formulation.
  • New method: statistical test for coupling functions via implicit function theorem.

Thanks to you, Fabian, and the ICB-ML group.