Detecting active subnetworks in interaction graphs with missing data PI: Fritz Roth Student: Luke Hunter
Project Goal and Applications • Goal • Find regions of biological networks that are altered by a drug, environment, or disease • Applications • Determine what causes a disease phenotype • Study the effect of drugs on the network • Study action and signaling pathways • Determine how biological modules communicate • Remove human bias from “pathway” assignment
Application #1 • Metabolic interaction data • Measure metabolites in blood plasma • Chronic ischemia • Glucose tolerance • Planned myocardial infarction
Get Metabolite p-values • Determine differential levels of metabolites • Use mass spectrometry • Use a t-test to obtain p-values • Null hypothesis: mean metabolite concentrations are equal in control and disease patients (Sabatine, M., et al. (2005))
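Obtain Metabolite z-scores • The inverse cumulative normal distribution function converts p-values to z-scores: $z_i = \Phi^{-1}(1 - p_i)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function and $p_i$ is the p-value of metabolite $i$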
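A minimal sketch of this conversion, assuming the convention of Ideker et al. (2002), $z = \Phi^{-1}(1 - p)$, so that smaller p-values give larger positive z-scores. It uses only the standard library's `statistics.NormalDist`; the metabolite names and p-values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def p_to_z(p: float) -> float:
    """Convert a p-value to a z-score with z = Phi^{-1}(1 - p)."""
    if not 0.0 < p < 1.0:
        # p = 0 or p = 1 would map to +/- infinity
        raise ValueError("p must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - p)

# Hypothetical p-values for three metabolites (illustration only).
pvals = {"met_a": 0.5, "met_b": 0.05, "met_c": 0.001}
zscores = {name: p_to_z(p) for name, p in pvals.items()}
```

A p-value of 0.5 maps to $z = 0$, and the z-score grows as the p-value shrinks.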
Build Metabolic Graph • Kyoto Encyclopedia of Genes and Genomes (KEGG) • Compounds • Glycans • Reactions
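Scoring Subnetworks: Part 1 • What is an active subnetwork? A connected set of nodes whose z-scores, taken together, indicate a significant change between case and control • Scoring functions: • Naïve • Ideker et al. (2002) • Whitlock (2005) • Geometric mean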
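Two of the listed scoring functions have standard published forms, sketched below: the aggregate score of Ideker et al. (2002), $z_A = \frac{1}{\sqrt{k}}\sum_{i \in A} z_i$, and Whitlock's (2005) weighted Z-method, $z_w = \sum_i w_i z_i / \sqrt{\sum_i w_i^2}$. The naïve and geometric-mean scores are not defined on the slides, so they are omitted; the weights passed to `whitlock_score` are left to the caller and their choice is an assumption.

```python
import math
from typing import Sequence

def ideker_score(z: Sequence[float]) -> float:
    """Aggregate z-score of Ideker et al. (2002): sum of member z-scores / sqrt(k)."""
    if not z:
        raise ValueError("subnetwork must contain at least one node")
    return sum(z) / math.sqrt(len(z))

def whitlock_score(z: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Z-method (Whitlock 2005): sum(w_i * z_i) / sqrt(sum(w_i^2))."""
    if not z or len(z) != len(w):
        raise ValueError("z and w must be non-empty and of equal length")
    return sum(zi * wi for zi, wi in zip(z, w)) / math.sqrt(sum(wi * wi for wi in w))
```

With equal weights the weighted Z-method reduces to the Ideker score, so the two differ only when nodes carry different weights.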
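Scoring Subnetworks: Part 2 • Corrected z-score (background distributions): corrected score $= (z_{sig} - \mu_k)/\sigma_k$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation of $z_{sig}$ over random subsets of size $k$ • [Figures: average $z_{sig}$ vs. subset size; standard deviation of $z_{sig}$ vs. subset size]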
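A sketch of the background correction, assuming the Monte Carlo approach of Ideker et al. (2002): score many random node subsets of the same size $k$, then standardize the observed score against their mean and standard deviation. The sample count, seed, and toy z-scores are illustrative assumptions.

```python
import math
import random
from statistics import fmean, stdev

def ideker_score(z):
    """Aggregate score: sum of z-scores / sqrt(k)."""
    return sum(z) / math.sqrt(len(z))

def background(all_z, k, n_samples=1000, seed=0):
    """Mean and SD of the aggregate score over random k-node subsets."""
    rng = random.Random(seed)
    scores = [ideker_score(rng.sample(all_z, k)) for _ in range(n_samples)]
    return fmean(scores), stdev(scores)

def corrected_score(subset_z, all_z, n_samples=1000, seed=0):
    """(score - mu_k) / sigma_k; requires len(subset_z) < len(all_z) so sigma_k > 0."""
    mu_k, sigma_k = background(all_z, len(subset_z), n_samples, seed)
    return (ideker_score(subset_z) - mu_k) / sigma_k

# Toy network: 50 unchanged nodes (z = 0) and 5 strongly altered nodes (z = 3).
all_z = [0.0] * 50 + [3.0] * 5
score = corrected_score([3.0] * 5, all_z)
```

In practice $\mu_k$ and $\sigma_k$ can be precomputed once per subset size, which is what the background-distribution plots summarize.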
Search the Graph • Ideker et al. (2002)
Corrected Score Timecourse • [Figure: corrected score vs. iteration number]
Subnetwork Analysis • Look at the active subnetwork for the glucose data (visualized in nodes3d-win32)
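Betweenness & Predictions • Quantify bottleneck nodes with betweenness*: $C_B(v) = \sum_{s \neq v \neq t} \sigma_{st}(v)/\sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through $v$ • Predict that unmeasured metabolites with high betweenness differ (flux, concentration, etc.) between case and control *Algorithm by Ulrik Brandes (2001)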
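A sketch of Brandes' (2001) algorithm for an unweighted, undirected graph stored as `{node: set(neighbors)}`, plus a hypothetical helper that ranks unmeasured nodes by betweenness as candidate predictions. Breadth-first search counts shortest paths from each source; dependencies are then accumulated in reverse BFS order.

```python
from collections import deque

def betweenness(adj):
    """Brandes (2001) betweenness; every node must appear as a key of adj."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS: count shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2.0 for v, c in cb.items()}  # each unordered pair counted twice

def rank_unmeasured(adj, measured, top=3):
    """Hypothetical helper: unmeasured nodes sorted by decreasing betweenness."""
    bc = betweenness(adj)
    candidates = [n for n in adj if n not in measured]
    return sorted(candidates, key=lambda n: bc[n], reverse=True)[:top]

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
star = {"hub": {"x", "y", "w"}, "x": {"hub"}, "y": {"hub"}, "w": {"hub"}}
```

On the path a-b-c-d, node b lies on the shortest paths for the pairs (a, c) and (a, d), so its betweenness is 2.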
Predictions • Predictions are in agreement with the literature
Ischemia Active Subnetwork & Predictions • Agrees with Sabatine, M., et al. (2005).
Application #2 • Human Papillomavirus (HPV) • DNA-based virus • Over 100 types identified • Can cause precancerous lesions • Viral oncogenes E6 & E7 modify cell cycle to aid viral reproduction
Get Data & Assign Signals • HPV array data • Regions of HPV DNA were introduced into human cells • Signal values for 54,673 Affymetrix probe set IDs across 23 experimental conditions • Map average Affymetrix signal to Entrez Gene ID • Form two groups: • Class A: all mutants and negative controls (p53 not degraded) • Class B: early region, E6/E7, and E6 (p53 degraded)
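A sketch of the mapping and grouping steps, assuming that "average signal" means the per-condition mean over all probe sets annotated to the same gene, and that probe sets without a gene annotation are dropped. Probe IDs, gene IDs, and condition labels below are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import fmean

def average_by_gene(probe_signals, probe_to_gene):
    """Average per-condition signals of all probe sets mapping to the same gene ID."""
    grouped = defaultdict(list)
    for probe, signals in probe_signals.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # unannotated probe sets are dropped
            grouped[gene].append(signals)
    return {gene: [fmean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

def split_classes(signals, conditions, class_b):
    """Split one gene's per-condition signals into (class A, class B) lists."""
    a = [s for s, c in zip(signals, conditions) if c not in class_b]
    b = [s for s, c in zip(signals, conditions) if c in class_b]
    return a, b

# Hypothetical toy data: 3 probe sets measured under 4 conditions.
probe_signals = {
    "probe_1": [1.0, 2.0, 5.0, 6.0],
    "probe_2": [3.0, 4.0, 7.0, 8.0],
    "probe_3": [0.0, 0.0, 0.0, 0.0],
}
probe_to_gene = {"probe_1": "gene_X", "probe_2": "gene_X"}
conditions = ["mutant_1", "neg_ctrl", "E6", "E6E7"]
gene_signals = average_by_gene(probe_signals, probe_to_gene)
class_a, class_b = split_classes(gene_signals["gene_X"], conditions, class_b={"E6", "E6E7"})
```

The two lists per gene are the inputs for the per-gene t-test in the next step.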
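Obtain p-values • Use Welch’s t-test to obtain the t-statistic • Allows for unequal sample sizes and variances • Integrate the t distribution function to find the p-value • Use the regularized incomplete beta function ($I$): for a two-sided test, $p = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$, where $\nu$ is the Welch-Satterthwaite degrees of freedom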
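A standard-library sketch of this step, assuming a two-sided test. The regularized incomplete beta function is evaluated with the continued-fraction method (modified Lentz) described in Numerical Recipes; this is an illustrative implementation, not the code used for the analysis. Both samples need nonzero variance.

```python
import math
from statistics import fmean, variance

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b  # symmetry relation converges faster here

def two_sided_t_pvalue(t, df):
    """Two-sided p-value of a t statistic: I_{df/(df+t^2)}(df/2, 1/2)."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))

def welch_t_test(x, y):
    """Welch's t-test. Returns (t, Welch-Satterthwaite df, two-sided p)."""
    n1, n2 = len(x), len(y)
    se1, se2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (fmean(x) - fmean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df, two_sided_t_pvalue(t, df)

t, df, p = welch_t_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The degrees of freedom are generally non-integer, which is one reason to integrate through the incomplete beta function rather than look up a t table.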
Generate Human Protein-Protein Interaction Graph • Human Protein Reference Database (HPRD) • Created two protein-protein interaction graphs, keeping interactions supported by: • 1 or more citations • 2 or more citations
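A sketch of the citation threshold, assuming each HPRD interaction has already been annotated with the number of citations that support it. The records below are hypothetical, and dropping self-interactions is an added modeling choice.

```python
from collections import defaultdict

# Hypothetical records: (protein_a, protein_b, number_of_supporting_citations).
INTERACTIONS = [("P1", "P2", 3), ("P2", "P3", 1), ("P3", "P4", 2), ("P1", "P1", 5)]

def ppi_graph(interactions, min_citations=1):
    """Undirected PPI graph keeping interactions with >= min_citations citations."""
    adj = defaultdict(set)
    for a, b, n_cit in interactions:
        if n_cit >= min_citations and a != b:  # self-interactions dropped
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

graph_1 = ppi_graph(INTERACTIONS, min_citations=1)
graph_2 = ppi_graph(INTERACTIONS, min_citations=2)
```

Raising the threshold trades coverage for confidence: the 2-citation graph loses the single-citation P2-P3 edge.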
HPV Results • View results: nodes3d-win32 • Gene Ontology (GO) enrichment • Use FuncAssociate (Berriz et al. 2003)
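FuncAssociate ranks GO terms by Fisher's exact test p-values for over-representation in a gene list. The sketch below computes that one-sided p-value as a hypergeometric upper tail; FuncAssociate's own implementation, including its resampling-based multiple-testing correction, is not reproduced here, and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N total genes, K annotated, n selected).

    This equals the one-sided Fisher's exact test p-value for over-representation
    of a GO term among the n selected genes.
    """
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical: 10 genes in total, 5 carry the term, 5 are in the subnetwork, all 5 annotated.
p_all_hits = hypergeom_upper_tail(5, 5, 5, 10)
```

Here only 1 of the C(10, 5) = 252 possible 5-gene selections contains all five annotated genes, giving p = 1/252.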
Future Ideas • Metabolic: Sample predicted metabolites • HPV: Look at p53 null vs. p53 + E6
Acknowledgements • Dr. Fritz Roth & Dr. Gabriel Berriz • Dr. Jocelyn Spragg & Deborah Milstein • NSF REU Program • Harvard Visiting Student Program • NHGRI & NHLBI • Data • Robert Gerszten (MGH) • Marc Sabatine (Brigham & Women's Hosp.) • Thomas Wang (MGH) • Greg Lewis (MGH) • Xu Shi (Broad) • Steve Carr (Broad) • Terri Addona (Broad)
References • Papers • [1] Ahloulay M, Schmitt F, Dehaux M, Bankir L. 1999. Vasopressin and urinary concentrating activity in diabetes mellitus. Diabetes Metab. 25(3):213-22. • [2] Pappa KI, Vlachos G, Theodora M, Roubelaki M, Angelidou K, Antsaklis A. 2007. Intermediate metabolism in association with the amino acid profile during the third trimester of normal pregnancy and diet-controlled gestational diabetes. Am J Obstet Gynecol. 196(1):65.e1-5. • [3] Ma XR, Zhou CF, Wang SQ, Wang WQ, Liu YX, Wang SX, Wang FF, Zhang JH, Li YY. 2007. Effects of ganoderma lucidum spores on mitochondrial calcium ion and cytochrome C in epididymal cells of type 2 diabetes rats. Zhonghua Nan Ke Xue. 13(5):400-2. • [4] Ideker T, Ozier O, Schwikowski B, Siegel AF. 2002. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 18(Suppl 1):S233-S240. • [5] Whitlock MC. 2005. Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J Evol Biol. 18(5):1368-1373. • [6] Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. 2006. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 34(Database issue):D354-D357. • [7] Sabatine MS, et al. 2005. Metabolomic identification of novel biomarkers of myocardial ischemia. Circulation. 112:3868-3875. • [8] Mishra GR, et al. 2006. Human protein reference database--2006 update. Nucleic Acids Res. 34(Database issue):D411-4. • [9] Brandes U. 2001. A faster algorithm for betweenness centrality. J Math Sociol. 25(2):163-177. • Books • [10] Palsson BØ. 2006. Systems Biology: Properties of Reconstructed Networks. Cambridge University Press. • Websites • [11] http://www.aber.ac.uk/compsci/Research/bio/robotsci/tech/ml/models.shtml • [12] http://search.cpan.org/~lbrocard/GraphViz-2.02/lib/GraphViz.pm • [13] http://brainmaps.org/index.php?p=desktop-apps-nodes3d
Predictions • Glucose: • Glycine (C00037) • Iminoglycine (C15809) • O-Acetyl-L-homoserine (C01077) • Ischemia: • L-Homocysteine (C00155) • Sulfite (C00094) • 4-Aminobutanal (C00555) • Planned MI: • Taurine (C00245) • Pyridoxal (C00250) • Uridine (C00299)