Detecting active subnetworks in metabolic interaction graphs with missing data

Detecting active subnetworks in metabolic interaction graphs with missing data • PI: Fritz Roth • Student: Luke Hunter

Introduction • What is systems biology? • Enumerate components • Enumerate relationships • Study with a model (computational/mathematical) • Analyze, predict, and interpret results • Why is systems biology important? • Increased amount of data • High throughput technology • Fast sharing of information (internet) • Understand emergent properties

Project Goal and Applications • Goal • Find regions of metabolism that are altered by a drug, environment, or disease • Applications • Determine what causes a disease phenotype • Study the effect of drugs on metabolism • Study action and signaling pathways • Determine how biological modules communicate • Remove human bias from “pathway” assignment

Step 1: Enumerate Components (1) • Determine differential expression of metabolites • Use mass spectrometry • Use t-test to obtain p-values • Null hypothesis is that control and disease patient metabolite concentrations are identical • Sabatine, M., et al. (2005)

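Step 1: Enumerate Components (2) • The inverse cumulative normal distribution function converts p-values to z-scores: $z = \Phi^{-1}(1 - p)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function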
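The conversion on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the convention $z = \Phi^{-1}(1 - p)$ used by Ideker et al. (2002), so that small p-values map to large positive z-scores; the function name `p_to_z` is hypothetical.

```python
from statistics import NormalDist

def p_to_z(p):
    """Convert a p-value to a z-score via the inverse standard normal CDF.

    Assumes the convention z = inv_cdf(1 - p); p must lie strictly
    between 0 and 1, otherwise inv_cdf raises StatisticsError.
    """
    return NormalDist().inv_cdf(1.0 - p)

# Hypothetical p-values for three metabolites.
for p in (0.5, 0.05, 0.001):
    print(p, round(p_to_z(p), 3))
```

A p-value of 0.5 maps to a z-score of 0, and smaller p-values map to increasingly positive z-scores.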

Step 2: Enumerate Relationships (1) • What is a graph? • A graph is an organizational structure made up of nodes and edges: • Represent metabolism as graph • Nodes are metabolites (with z-scores) • Edges are reactions connecting those metabolites
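One simple way to hold such a graph in memory is an adjacency map plus a table of z-scores. The sketch below is illustrative only: the reactions and z-scores are made up, and marking an unmeasured metabolite with `None` is an assumption about how missing data might be represented, not the project's actual encoding.

```python
from collections import defaultdict

# Hypothetical reactions, each linking two metabolites (illustrative only).
reactions = [
    ("glucose", "glucose-6-phosphate"),
    ("glucose-6-phosphate", "fructose-6-phosphate"),
    ("fructose-6-phosphate", "pyruvate"),
]

# Hypothetical z-scores; None marks a metabolite that was not measured.
z_scores = {
    "glucose": 2.4,
    "glucose-6-phosphate": 1.1,
    "fructose-6-phosphate": None,
    "pyruvate": -0.3,
}

def build_graph(reactions):
    """Build an undirected adjacency map: metabolite -> set of neighbours."""
    adj = defaultdict(set)
    for a, b in reactions:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

graph = build_graph(reactions)
# Nodes that carry a score, as opposed to nodes with missing data.
measured = {m for m in graph if z_scores.get(m) is not None}
```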

Step 2: Enumerate Relationships (2) • Kyoto Encyclopedia of Genes and Genomes (KEGG) • Compounds • Glycans • Reactions

Step 3: Modeling (1): Scoring Functions • What is an active subnetwork? • Scoring functions: • Naïve • Ideker et al. (2002) • Whitlock (2005) • Geometric mean
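The formulas themselves did not survive in the transcript. As a hedged sketch, two of the named scoring functions can be written from their published definitions: the aggregate z-score of Ideker et al. (2002), $z_A = \frac{1}{\sqrt{k}} \sum_{i \in A} z_i$, and the weighted Z-method discussed by Whitlock (2005), $Z_w = \sum_i w_i z_i / \sqrt{\sum_i w_i^2}$. The function names are hypothetical, and the exact forms of the "naïve" and "geometric mean" scores used in this project are not recoverable from the slide, so they are not shown.

```python
import math

def ideker_score(z):
    """Aggregate z-score of a subnetwork: sum of z-scores over sqrt(k)."""
    return sum(z) / math.sqrt(len(z))

def weighted_z(z, w):
    """Weighted Z-method: weighted sum of z-scores over the root of
    the sum of squared weights."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    den = math.sqrt(sum(wi * wi for wi in w))
    return num / den

# Hypothetical z-scores for a three-node subnetwork.
z = [2.1, 1.4, 0.3]
print(ideker_score(z))
print(weighted_z(z, [1.0, 1.0, 1.0]))  # equal weights reduce to the Ideker score
```

Both scores are again standard normal under the null hypothesis when the individual z-scores are independent, which is what makes subnetworks of different sizes comparable.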

Step 3: Modeling (2): Simulated Annealing (SA) • Ideker et al. (2002)
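The slide gives no detail beyond the citation, so the following is only a rough sketch of simulated annealing in the spirit of Ideker et al. (2002), not the project's implementation. It assumes every node has a z-score, toggles one node per step, scores a state by its best connected component using the aggregate z-score, and uses a geometric cooling schedule. All names and parameter defaults are hypothetical.

```python
import math
import random

def aggregate_z(zs):
    return sum(zs) / math.sqrt(len(zs))

def components(active, adj):
    """Yield connected components of the subgraph induced by `active`."""
    seen = set()
    for start in active:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(m for m in adj[n] if m in active and m not in comp)
        seen |= comp
        yield comp

def best_component(active, adj, z):
    """Return (score, nodes) of the highest-scoring active component."""
    scored = [(aggregate_z([z[n] for n in c]), c) for c in components(active, adj)]
    if not scored:
        return 0.0, set()  # assumption: an empty state scores zero
    return max(scored, key=lambda sc: sc[0])

def anneal(adj, z, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adj)
    active = {n for n in nodes if rng.random() < 0.5}
    score, comp = best_component(active, adj, z)
    best_score, best_set = score, set(comp)
    for i in range(steps):
        temp = t_start * (t_end / t_start) ** (i / steps)  # geometric cooling
        node = rng.choice(nodes)
        active ^= {node}  # toggle the node on or off
        new_score, comp = best_component(active, adj, z)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score
            if score > best_score:
                best_score, best_set = score, set(comp)
        else:
            active ^= {node}  # reject: undo the toggle
    return best_score, best_set

# Hypothetical five-node chain a-b-c-d-e with made-up z-scores.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
z = {"a": 3.0, "b": 2.5, "c": -2.0, "d": 0.1, "e": -1.0}
result_score, result_nodes = anneal(adj, z)
```

Accepting some score-decreasing moves at high temperature is what lets the search escape local optima; as the temperature falls, the search becomes increasingly greedy.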

Step 3: Modeling (3) • Show example: presentation data\0 • [Figure: corrected score vs. iteration number]

Step 4: Predictions (1) • Simulated annealing predictions for glucose data: nodes3d-win32

Step 4: Predictions (2) • Predictions are in agreement with the literature

Acknowledgements • Dr. Fritz Roth & Dr. Gabriel Berriz • Dr. Jocelyn Spragg & Deborah Milstein • NSF REU Program • Everyone else

References • Papers • [1] Ahloulay, M., Schmitt, F., Dehaux, M., and Bankir, L. (1999). Vasopressin and urinary concentrating activity in diabetes mellitus. Diabetes Metab. 25(3), 213-222. • [2] Pappa, K.I., Vlachos, G., Theodora, M., Roubelaki, M., Angelidou, K., and Antsaklis, A. (2007). Intermediate metabolism in association with the amino acid profile during the third trimester of normal pregnancy and diet-controlled gestational diabetes. Am. J. Obstet. Gynecol. 196(1), 65.e1-5. • [3] Ma, X.R., Zhou, C.F., Wang, S.Q., Wang, W.Q., Liu, Y.X., Wang, S.X., Wang, F.F., Zhang, J.H., and Li, Y.Y. (2007). Effects of ganoderma lucidum spores on mitochondrial calcium ion and cytochrome C in epididymal cells of type 2 diabetes rats. Zhonghua Nan Ke Xue 13(5), 400-402. • [4] Ideker, T., Ozier, O., Schwikowski, B., and Siegel, A.F. (2002). Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18, S233-S240. • [5] Whitlock, M. (2005). Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J. Evol. Biol. 18, 1368-1373. • [6] Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., and Hirakawa, M. (2006). From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34, D354-357. • [7] Sabatine, M., et al. (2005). Metabolomic identification of novel biomarkers of myocardial ischemia. Circulation 112, 3868-3875. • Books • [8] Palsson, B.Ø. Systems Biology: Properties of Reconstructed Networks. • Websites • [9] http://www.aber.ac.uk/compsci/Research/bio/robotsci/tech/ml/models.shtml • [10] http://search.cpan.org/~lbrocard/GraphViz-2.02/lib/GraphViz.pm • [11] http://brainmaps.org/index.php?p=desktop-apps-nodes3d

Questions?

Step 3: Modeling (2) • Corrected z-score (background distributions) • [Figures: average z_agg vs. subset size; standard deviation of z_agg vs. subset size]
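A background correction of this kind can be sketched as follows, assuming the approach described by Ideker et al. (2002): for each subset size k, draw random node sets, record the mean and standard deviation of their aggregate z-scores, and standardise an observed score against them, $s_A = (z_A - \mu_k) / \sigma_k$. Function names, the number of samples, and the z-scores are hypothetical.

```python
import math
import random
import statistics

def aggregate_z(zs):
    return sum(zs) / math.sqrt(len(zs))

def background(z_values, k, n_samples=2000, seed=0):
    """Estimate mean and std. dev. of the aggregate score for random
    node sets of size k, sampled without regard to graph structure."""
    rng = random.Random(seed)
    scores = [aggregate_z(rng.sample(z_values, k)) for _ in range(n_samples)]
    return statistics.mean(scores), statistics.stdev(scores)

def corrected_score(subset_z, z_values, **kwargs):
    """Standardise a subnetwork's aggregate score against the size-k background.

    Note: if k equals len(z_values), every sample is identical, the
    background std. dev. is zero, and this raises ZeroDivisionError.
    """
    mu, sigma = background(z_values, len(subset_z), **kwargs)
    return (aggregate_z(subset_z) - mu) / sigma

# Hypothetical z-scores for all measured metabolites.
all_z = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
mu3, sigma3 = background(all_z, 3)
print(corrected_score([1.0, 2.0, 3.0], all_z))
```

The correction matters when the z-scores across the whole graph do not average to zero, because the uncorrected aggregate score would then drift with subset size.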

Step 4: Analyze Results • Exp #3: Does a biological signal exist? (glucose data) • Compare high scores of scrambled vs. non-scrambled data • This is a very low p-value • We reject the null hypothesis • [Figure: high scores for scrambled vs. non-scrambled data]
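The slide does not say which statistical test produced the p-value. As one possible sketch, the z-scores can be scrambled by reassigning them at random to nodes while keeping the graph fixed, and an empirical p-value taken as the fraction of scrambled runs that score at least as high as the real data. The `search` argument stands in for whatever subnetwork search is used; here a toy function that returns the best aggregate score over single edges is assumed purely for illustration.

```python
import math
import random

def scramble(z, rng):
    """Return a copy of z with the scores randomly reassigned to nodes."""
    nodes = list(z)
    values = [z[n] for n in nodes]
    rng.shuffle(values)
    return dict(zip(nodes, values))

def empirical_p(z, search, n_scrambles=200, seed=0):
    """Fraction of scrambled data sets whose best score reaches the observed
    one, with a +1 correction so the estimate is never exactly zero."""
    rng = random.Random(seed)
    observed = search(z)
    hits = sum(search(scramble(z, rng)) >= observed for _ in range(n_scrambles))
    return (hits + 1) / (n_scrambles + 1)

# Toy stand-in for the real search: best aggregate score over single edges.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

def best_edge_score(z):
    return max((z[u] + z[v]) / math.sqrt(2) for u, v in edges)

# Hypothetical z-scores with the two highest values on adjacent nodes.
z = {"a": 3.0, "b": 2.5, "c": -2.0, "d": 0.1, "e": -1.0}
p_value = empirical_p(z, best_edge_score)
```

Scrambling preserves the overall distribution of z-scores and the graph topology, so any excess score in the real data reflects how the scores are arranged on the graph.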
