
The Measuring Technology

Presentation Transcript


  1. Causal Discovery from Mass Cytometry Data. Presenters: Ioannis Tsamardinos and Sofia Triantafillou. Institute of Computer Science, Foundation for Research and Technology, Hellas; Computer Science Department, University of Crete; in collaboration with the Computational Medicine Unit, Karolinska Institutet

  2. The Measuring Technology

  3. Mass Cytometry • Single-cell measurements • Sample sizes in the millions at minimal cost • Public data available • Up to ~30 proteins measured at a time • Applications: cell counting; cell sorting (gating); identifying signaling responses; drug screening; de novo, personalized pathway / causal discovery (?)

  4. Mass Cytometry [Image by Bendall et al., Science 2011]

  5. Cell Sorting (Gating) • Immune system cells can be distinguished based on specific surface markers. • Process resembles a decision tree [Image by Bodenmiller et al., Nat. Biotech. 2012]

  6. Identifying Signaling Responses • Immune responses are triggered by specific activators • Signaling responses are sub-population specific • Mass cytometry for identifying signaling effects: functional (non-surface) proteins are also marked (e.g., pSTAT3 and pSTAT5); activators are applied to stimulate a response to disease; cells are sorted by sub-population; changes in protein abundance/phosphorylation in each sub-population are quantified [Figure: columns correspond to different subpopulations; values are the difference in log2 mean intensity of the stimulated condition compared with the unstimulated control. Image by Bendall et al., Science 2011]

  7. Drug Screening • Unwanted signaling responses should be suppressed for disease treatment • Mass cytometry for drug screening: after stimulation, cells are treated with potential drugs (inhibitors); cells are sorted by sub-population; dose-response curves are identified per activator, per sub-population, and per inhibitor [Image by Bodenmiller et al., Nat. Biotech. 2012]

  8. The Public Data

  9. Bendall Data • 2 donors • 13 activators plus a no-activator control • 13 surface and 18 functional variables per donor • Several subpopulations [Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum, Bendall et al., Science 332, 687 (2011)]

  10. Bodenmiller Data: Time Course • A plate with 96 wells; each well produces a data set • 11 activators plus a no-activator control • Time points: 0, 1, 5, 15, 30, 60, 120, and 240 min • 10 surface and 14 functional variables • Several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

  11. Bodenmiller Data: 8 Donors • A plate with 96 wells; each well produces a data set • 11 activators plus a no-activator control • 8 donors • 10 surface and 14 functional variables • Several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

  12. Bodenmiller Data: Inhibitors • 27 inhibitors • 11 activators • Each inhibitor (drug) in 7 dosages • 10 surface and 14 functional variables • Several subpopulations [Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

  13. Data Summary • Collection of datasets with: all activators; 1 time point (30 min); 1 donor; all inhibitors; all subpopulations; all 10+14 markers measured

  14. Data Summary [Figure: functional proteins and activators, marked as present in Bendall, in Bodenmiller, or in both]

  15. Causal Discovery in Mass Cytometry • Feedback loops • Latent variables • Non-linear relations • Unfaithfulness [Image: "A typical day in the cell", courtesy of Dr. Brad Marsh]

  16. A Basic Approach

  17. Local Causal Discovery • Use the stimulus as an instrumental binary variable • Assumptions: Causal Markov Condition; Reichenbach’s Common Cause Principle; no feedback cycles; nothing causes X [Figure: candidate causal structures over X, Y, and Z]

  18. Issue #1: Signaling is Sub-Population Specific • Gate data: data were gated by the initial researchers in Cytobank.org • Analyze sub-populations independently • Gated sub-populations differ between Bodenmiller and Bendall; the cd4+, cd8+, and nk sub-populations are in common.

  19. Issue #2: Dormant Relations • Relations may appear only during signaling: pool together unstimulated and stimulated data • Different parts of the pathway may be activated by different activators: analyze data from different activators independently

  20. Issue #3: Testing Independence • Check (in)dependencies • Choosing a test of conditional independence: one binary and two continuous variables; relations are typically non-linear • Options: discretization, BUT it does not preserve conditional independencies • Rejected but promising candidates: Maximal Information Coefficient (Reshef et al., Science 334, 2011); Kernel-based Conditional Independence test (Zhang et al., UAI 2011) • Chosen: Fisher z-test of independence + logistic regression [Figure: variables P1, S, P2]
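
The Fisher z-test named on this slide can be sketched as follows. This is a minimal illustration of the Fisher z component only, assuming a single continuous conditioning variable and roughly linear relations; the logistic regression part used for the binary stimulus is not shown. The function names and the simulated data are illustrative assumptions, not taken from the presentation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Partial correlation of x and y given one conditioning variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_pvalue(r, n, n_cond=0):
    """Two-sided p-value for H0: (partial) correlation is zero."""
    r = max(min(r, 0.999999), -0.999999)       # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))      # Fisher z-transform
    stat = math.sqrt(n - n_cond - 3) * abs(z)  # approx. standard normal under H0
    return math.erfc(stat / math.sqrt(2))      # equals 2 * (1 - Phi(stat))

# Simulated example (assumption): z drives both x and y,
# so x and y are dependent marginally but independent given z.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

p_marginal = fisher_z_pvalue(pearson(x, y), n)
p_conditional = fisher_z_pvalue(partial_corr(x, y, z), n, n_cond=1)
print(p_marginal, p_conditional)
```

Because the test rests on (partial) correlation, it detects linear association; strongly non-linear relations may require a transformation of the measurements first.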

  21. Issue #4: Make Reliable Predictions • Check ALL (in)dependencies • Two p-value thresholds: 0.05 for dependence, 0.15 for independence [Figure: p-value axis with "dependent" and "independent" regions; variables P1, S, P2]
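
The two-threshold rule can be sketched as a small decision function. The thresholds 0.05 and 0.15 come from the slide; the strictness of the inequalities and the "undecided" label for p-values between the thresholds are assumptions made for illustration.

```python
def classify_pvalue(p_value, dep_threshold=0.05, indep_threshold=0.15):
    """Map a p-value to a decision using two thresholds.

    Assumption: p < dep_threshold means dependence, p > indep_threshold
    means independence, and anything in between is left undecided.
    """
    if p_value < dep_threshold:
        return "dependent"
    if p_value > indep_threshold:
        return "independent"
    return "undecided"

for p in (0.01, 0.10, 0.40):
    print(p, classify_pvalue(p))
```

Leaving a gap between the thresholds trades coverage for reliability: a prediction is made only when every required test falls clearly on one side.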

  22. Issue #5: Identify “Outlier” Experiments • Inhibitor data for “zero” dosage and 8 donor data should represent the same joint distribution • Do they? [Figure: inhibitor plates with 27 inhibitors, each inhibitor (drug) in 7 dosages, and 11 stimuli]

  23. Issue #5: Identify “Outlier” Experiments • Inhibitor data for “zero” dosage and 8 donor data should represent the same joint distribution • Do they? • Distance, given a pair of plates: for each activator, rank the correlations (of markers) in each plate and compute the Spearman correlation of the two rankings; distance = 1 − (minimum correlation over activators)
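
One reading of this plate distance is sketched below. It assumes that "correlations (of markers)" means the Pearson correlations of all marker pairs within a well, and that both plates measure the same markers; the data layout (`plate[activator][marker] -> list of values`), the function names, and the toy numbers are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def marker_correlations(well):
    """Pearson correlation of every marker pair, in a fixed pair order."""
    markers = sorted(well)
    return [pearson(well[a], well[b]) for a, b in combinations(markers, 2)]

def plate_distance(plate_a, plate_b):
    """1 minus the minimum, over shared activators, of the Spearman
    correlation between the two plates' marker-correlation vectors."""
    shared = sorted(set(plate_a) & set(plate_b))
    sims = [spearman(marker_correlations(plate_a[act]),
                     marker_correlations(plate_b[act])) for act in shared]
    return 1 - min(sims)

# Toy data (hypothetical): three markers, five cells per well.
base = {"m1": [1, 2, 3, 4, 5], "m2": [2, 1, 4, 3, 6], "m3": [5, 3, 4, 1, 2]}
swapped = {"m1": base["m1"], "m2": base["m3"], "m3": base["m2"]}
plate_a = {"act1": base, "act2": base}
plate_b = {"act1": base, "act2": swapped}
print(plate_distance(plate_a, plate_a), plate_distance(plate_a, plate_b))
```

Taking the minimum over activators makes the distance sensitive to a single badly matching condition, which suits the goal of flagging outlier experiments.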

  24. Pipeline for making causal predictions [Flowchart with Yes/No branches] • Inhibitor data: for every inhibitor dataset with zero inhibitor dosage and activator i, plus the dataset with zero inhibitor dosage and no activator • Check: […] is activated • Check: all necessary dependencies and independencies hold • Time course data check: first activation of […] occurs before first activation of […] • Return PREDICTIONS ranked by […]

  25. Causal Postulates • A list of predicted causal pairs, each “tagged” for a specific population and activator, ranked according to a score quantifying the frequency of appearance. • 288 predictions in 14 sub-populations [Table: top-ranked predicted pairs among pSTAT3, pPlcg2, pSHP2, pS6, pSlp76, pP38, pZap70, pErk, and pBtk, with their scores, activator (PVO or BCR), and sub-population (cd14-hladr-, cd14-hladrmid, cd14+hladr-, dendritic, igm-)]

  26. Internal Validation • Check whether a predicted triplet has also been reported • 42% of the predicted triplets are also reported • Despite strict thresholds and multiple testing • Theory + algorithms: [Tillman et al. 2008, Triantafillou et al. 2010, Tsamardinos et al. 2012] [Figure: causal structures over Activator, Protein1, Protein2, and Protein3]

  27. Validation on Bendall Data • Run FCI with […] • Bootstrap for robustness • Report: conflicting structures (structures where […]) and confirming structures (structures where […]) • Measurements in Bendall data are taken 15 minutes after activation [Figure: scores of predicted pairs among ERK, S6, P38, SHP2, STAT5, BTK, ZAP70, STAT3, and NFKb under activators LPS, PMA, and IFNa in CD4+, CD8+, and NK cells]

  28. Validation on Bendall Data [Table with columns Confirming, Conflicting, and Correlation for each predicted pair among S6, ERK, P38, BTK, SHP2, STAT5, NFKb, STAT3, and ZAP70 under activators PMA, IFNa, and LPS in CD4+, CD8+, and NK cells]

  29. Results • Hundreds of predictions to be tested; experiments under way! • Internal validation using non-trivial inferences • Promising validation on another collection of datasets (Bendall) • Evidence of batch effects and/or biological reasons for variability • Method based on the most basic causal discovery assumptions

  30. A Not So Basic Approach

  31. Co-analyzing data sets from different experimental conditions with overlapping variable sets • Data cannot be pooled together because they come from different distributions • Principles of causality link them to the underlying causal graph • Different experimental conditions • Different variable sets [Figure: data sets under Conditions A, B, C, and D]

  32. Co-analyzing data sets from different experimental conditions with overlapping variable sets • Identify a single causal graph that simultaneously fits all data [Figure: data sets under Conditions A, B, C, and D]

  33. What type of causal graph? • Semi-Markov causal models (SMCMs). • A → B: A causes B directly in the context of the observed variables. • A ↔ B: A and B share a latent common cause. • Under faithfulness, m-separation entails all and only the conditional independencies that stem from the Causal Markov Condition. • No learning algorithm. [Figure: example SMCM over A, B, C, D]

  34. Manipulations in SMCMs • Values of the manipulated variable are set solely by the manipulation procedure • Graph surgery: remove all edges into the manipulated node. [Figure: a graph (SMCM) over A, B, C, D and the corresponding manipulated SMCM]
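
Graph surgery can be sketched on a small edge list. The representation (a set of `(u, v, kind)` triples with kinds `"->"` and `"<->"`) and the example graph are assumptions for illustration; the slide's own figure is not reproduced here. A bidirected edge has an arrowhead at both endpoints, so it counts as an edge into the manipulated node.

```python
def manipulate(edges, target):
    """Return the edges that survive graph surgery on `target`.

    edges: set of (u, v, kind), where kind is "->" (u causes v directly)
    or "<->" (u and v share a latent common cause).
    """
    kept = set()
    for u, v, kind in edges:
        into_target = (kind == "->" and v == target) or \
                      (kind == "<->" and target in (u, v))
        if not into_target:
            kept.add((u, v, kind))
    return kept

# Hypothetical SMCM over A, B, C, D.
smcm = {("A", "B", "->"), ("B", "C", "->"), ("B", "D", "<->"), ("C", "D", "->")}
print(sorted(manipulate(smcm, "B")))
```

Edges out of the manipulated node are kept, since the manipulation still propagates to its effects.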

  35. Reverse Engineering [Figure: an unknown true SMCM over A, B, C, D, E gives rise, under manipulation and marginalization (with different variables latent in each data set), to the observed (in)dependencies]

  36. Independencies as constraints • Suppose you don’t know anything about the structure of the three variables A, B, C. • You find out that in […]: […] • In path terms: there is no path in […] that is m-connecting A and C given […] • In SAT terms: A-C does not exist AND (A-B does not exist OR A-B is into B OR B-C does not exist OR B-C is into B)
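
To make the SAT reading concrete, the clause quoted on this slide can be evaluated over every candidate structure on the three variables. The encoding below (one Boolean for the A-C edge and a three-valued state for each edge touching B) is an assumption for illustration and is not the encoding used by the authors' algorithm.

```python
from itertools import product

# Each edge touching B is absent, present with an arrowhead at B,
# or present without an arrowhead at B (hypothetical encoding).
EDGE_STATES = ("absent", "into_B", "not_into_B")

def clause_holds(ac_exists, ab_state, bc_state):
    """A-C does not exist AND (A-B does not exist OR A-B is into B
    OR B-C does not exist OR B-C is into B)."""
    ab_ok = ab_state in ("absent", "into_B")
    bc_ok = bc_state in ("absent", "into_B")
    return (not ac_exists) and (ab_ok or bc_ok)

# Enumerate all candidate structures and keep those satisfying the clause.
models = [
    (ac, ab, bc)
    for ac, ab, bc in product((False, True), EDGE_STATES, EDGE_STATES)
    if clause_holds(ac, ab, bc)
]
print(len(models), "of", 2 * len(EDGE_STATES) ** 2, "candidate structures remain")
```

Each observed (in)dependence prunes the candidate set in this way; a SAT solver performs the same pruning symbolically instead of by enumeration.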

  37. Statistical errors • What happens with statistical errors? Conflicts make the SAT instance unsatisfiable! • Constraints correspond to dependencies and independencies* (e.g., […]) • Sort constraints! • Compare a dependence to an independence. How? • A low p-value suggests dependence • A high p-value suggests independence (in the respective data set) *well, not really

  38. Comparing p-values • […]: the proportion of p-values coming from […] • If you know […], you can find the MAP ratio • E0 = […], E1 = 1/E0 • If […], independence is more likely than dependence • Sort p-values by max(E0, E1) • Use the method of Storey and Tibshirani (2003) to identify […] • Minimize the negative log likelihood of […] to identify […] • Rank constraints according to MAP ratio and satisfy them, if possible, in the given order.
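
One way to make this ranking concrete is sketched below. Several specifics are assumptions, since the slide's symbols did not survive: p-values under the null are taken as Uniform(0, 1), p-values under the alternative as Beta(ξ, 1) with 0 < ξ < 1, the null proportion π0 is estimated with a fixed-λ estimator in the style of Storey and Tibshirani, and ξ is found by grid search on the negative log likelihood. All names and the example p-values are illustrative.

```python
import math

def estimate_pi0(pvalues, lam=0.5):
    """Fixed-lambda estimate of the null proportion: #{p > lam} / (m * (1 - lam)).
    Clamped to [0.01, 0.99] so that both ratios below stay finite."""
    raw = sum(1 for p in pvalues if p > lam) / (len(pvalues) * (1.0 - lam))
    return min(max(raw, 0.01), 0.99)

def alt_density(p, xi):
    """Beta(xi, 1) density (assumed model for p-values under dependence)."""
    return xi * max(p, 1e-12) ** (xi - 1.0)

def neg_log_likelihood(pvalues, pi0, xi):
    """NLL of the mixture pi0 * Uniform(0, 1) + (1 - pi0) * Beta(xi, 1)."""
    return -sum(math.log(pi0 + (1.0 - pi0) * alt_density(p, xi)) for p in pvalues)

def fit_xi(pvalues, pi0, grid_size=999):
    """Grid search over xi in (0, 1) for the minimum NLL."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return min(grid, key=lambda xi: neg_log_likelihood(pvalues, pi0, xi))

def map_ratios(p, pi0, xi):
    """E0: posterior odds of independence over dependence; E1 = 1 / E0."""
    e0 = pi0 / ((1.0 - pi0) * alt_density(p, xi))
    return e0, 1.0 / e0

pvalues = [0.001, 0.002, 0.004, 0.01, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
pi0 = estimate_pi0(pvalues)
xi = fit_xi(pvalues, pi0)

# Rank constraints by confidence: the larger max(E0, E1), the earlier it is satisfied.
ranked = sorted(pvalues, key=lambda p: max(map_ratios(p, pi0, xi)), reverse=True)
for p in ranked:
    e0, e1 = map_ratios(p, pi0, xi)
    print(p, "independence" if e0 > 1 else "dependence", round(max(e0, e1), 3))
```

Under these assumptions a dependence and an independence become comparable on one scale, so conflicting constraints can be resolved by keeping the one with the larger ratio.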

  39. “COmbINE” Algorithm • Input: data sets measuring overlapping variables under different experimental conditions • COmbINE: algorithm that transforms independence constraints into a SAT instance • Output: summary of the semi-Markov causal models that best fit all data sets simultaneously Eric Ellis

  40. Similar Algorithms • SBCSD [Hyttinen et al., UAI 2013]: inherently less compact representation of path constraints; does not handle conflicts, so not applicable to real data; in addition, it admits cycles; scales up to 14 variables • Lininf [Hyttinen et al., UAI 2012, JMLR 2012]: linear relations only; scales up poorly (6 variables in total with overlapping variables, 10 without); in addition, it admits cycles [Figure: execution time in seconds]

  41. Performance on Simulated Data

  42. Application on Mass Cytometry Data [Figure: response to PMA in cd4+ T-cells and cd8+ T-cells]

  43. Summary and Conclusions • Mass cytometry data are a good domain for causal discovery • Hundreds of robust causal postulates • Approach: conservative (local discovery, performing all tests, independent analysis of populations); opportunistic (using 2 thresholds for (in)dependency) • New algorithm that can handle different experimental conditions, overlapping variable subsets, and statistical errors • Numerous directions open for future work on this collection of data • Experiments under way!

  44. Acknowledgements and Credit • Ioannis Tsamardinos (Associate Prof., Lab Head) • Sofia Triantafillou (Ph.D. Candidate) • Vincenzo Lagani (Research Fellow) • Jesper Tegnér (Prof., Unit Head) • Angelika Schmidt (Post-Doc) • David Gomez-Cabrero (Project Leader) • Funded by: STATegra EU project (stategra.eu)
