
Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops. www.bioinformatics.ca. Module 7 – Part III: Pathway and Network Analysis. Lincoln Stein. Bioinformatics for Cancer Genomics, May 27-31, 2013. Classes of Gene Set Analysis: DAVID, GSEA, Reactome FI network, PARADIGM.

Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  3. Module 7 – Part III: Pathway and Network Analysis. Lincoln Stein. Bioinformatics for Cancer Genomics, May 27-31, 2013

  4. Classes of Gene Set Analysis: DAVID, GSEA, Reactome FI network, PARADIGM. Khatri et al. PLOS Comp Bio. 8:1 2012

  5. Limitations of Gene Set Enrichment Analysis. Many possible gene sets: diseases, molecular function, biological process, cellular compartment, pathways... Gene sets are heavily overlapping; need to sort through lists of enriched gene sets! “Bags of genes” obscure regulatory relationships among them.

  6. Pathway Databases. Advantages: usually curated; biochemical view of biological processes; cause and effect captured; human-interpretable visualizations. Disadvantages: sparse coverage of genome; different databases disagree on boundaries of pathways.

  7. KEGG

  8. Reactome

  9. Reactome. Hand-curated pathways in human. Rigorous curation standards: every reaction traceable to primary literature. Automatically projected pathways to non-human species. 22 species; 1112 human pathways; 5078 proteins. Features: Google-map style reaction diagrams with overlays; find pathways containing your gene list; calculate gene overrepresentation in pathways; find corresponding pathways in other species. Open access.

  10. Pathway Commons

  11. Pathway Colorization. Main feature offered by all pathway databases. Upload a gene list; the database calculates an enrichment score for each pathway and displays a ranked list. Browse into pathways of interest; download colorized pictures.
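
The slide does not say which statistic a given database uses for its enrichment score. A common choice for gene-list overrepresentation is the one-sided hypergeometric test, sketched below under that assumption. The gene and pathway names are hypothetical placeholders, and `rank_pathways` is an illustrative helper, not any database's API.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    max_overlap = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_pathways(gene_list, pathways, universe):
    """Return (pathway, hits, p-value) tuples sorted by ascending p-value."""
    universe = set(universe)
    genes = set(gene_list) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        hits = genes & members
        p = hypergeom_pvalue(len(universe), len(members), len(genes), len(hits))
        results.append((name, len(hits), p))
    return sorted(results, key=lambda r: r[2])


# Toy data: 20 hypothetical genes, two hypothetical pathways.
universe = [f"G{i}" for i in range(20)]
pathways = {
    "pathway_A": ["G0", "G1", "G2", "G3", "G4"],
    "pathway_B": ["G10", "G11", "G12", "G13", "G14", "G15"],
}
uploaded = ["G0", "G1", "G2", "G10"]
ranking = rank_pathways(uploaded, pathways, universe)
```

In practice the p-values would also need multiple-testing correction, since many overlapping pathways are tested at once.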

  12. Example from Reactome

  13. Example from Reactome

  14. Networks. Pathways capture only the “well understood” portion of biology. Networks cover less well understood relationships: genetic interactions; physical interactions; coexpression; GO term sharing; adjacency in pathways.

  15. Biological Networks are Scale Free. Properties: The degree (# connections) of nodes follows a power law, P(k) ∝ k^-γ. Nodes of higher degree are progressively rarer, but the decline is polynomial rather than exponential, so a few very highly connected nodes persist. The local clustering coefficient (tendency of a node's neighbours to interconnect) is independent of the degree of the node. Nature Reviews Genetics 5, 101-113 (February 2004) | doi:10.1038/nrg1272

  16. Biological Networks are Scale Free. Implications: A small number of genes have a large number of connections (chokepoints). A large number of genes have a small number of connections (leaves). Genes cluster (functional groups). The cluster sizes are also scale-free (many small clusters, few large clusters). Nature Reviews Genetics 5, 101-113 (February 2004) | doi:10.1038/nrg1272
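
The two quantities these slides rely on, node degree and local clustering coefficient, can be computed directly from an edge list. The following is a minimal sketch on a toy undirected graph; the node names are hypothetical and the helper functions are illustrative, not taken from any network library.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from (a, b) pairs; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have degree k."""
    return Counter(len(neighbours) for neighbours in adj.values())


def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Toy network: one hub with four partners, two of which also interact.
edges = [("hub", "A"), ("hub", "B"), ("hub", "C"), ("hub", "D"), ("A", "B")]
adj = build_adjacency(edges)
dist = degree_distribution(adj)
```

On a real interaction network, plotting `dist` on log-log axes is the usual first check for an approximately straight (power-law) relationship.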

  17. Network Databases. Can be built automatically or via curation. Popular sources of curated networks: BioGRID – Curated interactions from literature; 529,000 genes, 167,000 interactions. IntAct – Curated interactions from literature; 60,000 genes, 203,000 interactions. MINT – Curated interactions from literature; 31,000 genes, 83,000 interactions.

  18. Uncurated Interaction Sources: Text mining approaches. Computationally extract gene relationships from text, such as PubMed abstracts. Much faster than hand curation. Not perfect: problems recognizing gene names (is hedgehog a gene or a species?); natural language processing is difficult. Popular resources: iHOP, PubGene.

  19. Uncurated Interaction Sources: Experimental techniques. Yeast two-hybrid (Y2H) protein interactions. Protein complex pulldowns/mass spec. Genetic screens, such as synthetic lethals, enhancer/suppressor screens. Not perfect: Y2H interactions take proteins out of their natural context; physical interaction != biological interaction. Protein complex pulldowns are plagued by “sticky” proteins such as actin. Genetic screens are highly sensitive to genetic background (“network effects”).

  20. Integrative Approaches. Combine multiple sources of evidence to increase accuracy. Simple example: “Party hubs” are Y2H interactions that have been filtered for those partners that share the same temporal-spatial location. Complex example: Combine multiple sources of curated and uncurated evidence.

  21. Example: Reactome FI Network. Curated Human Data – Version 35: 5078 proteins; 4166 reactions; 3870 complexes; 1112 pathways. Only ~25% of genome! Goal: add a “corona” of uncurated interaction data around scaffold of curated pathway data.

  22. Expanding Reactome’s Coverage. Diagram: curated pathways yield annotated functional interactions; uncurated information (human PPI; PPI inferred from fly, worm & yeast; PPI from text mining; gene co-expression; GO annotation on biological processes; protein domain-domain interactions) is passed through a Naïve Bayes classifier to yield predicted functional interactions. Additional sources shown: GeneWays, CellMap, TRED. Wu et al. (2010) Genome Biology
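
To illustrate how a naïve Bayes classifier can combine several weak evidence sources into one interaction score, here is a minimal sketch with binary features and Laplace smoothing. It is not the implementation from Wu et al.; the feature names, training pairs, and function names are hypothetical.

```python
from math import exp, log


def train_naive_bayes(examples, features):
    """examples: list of (feature_dict, label); label is True for a known
    functional interaction. Both classes must be present in the examples."""
    counts = {True: 0, False: 0}
    on = {True: dict.fromkeys(features, 0), False: dict.fromkeys(features, 0)}
    for feats, label in examples:
        counts[label] += 1
        for f in features:
            if feats.get(f, False):
                on[label][f] += 1
    prior = counts[True] / (counts[True] + counts[False])
    # Laplace smoothing keeps every probability strictly between 0 and 1.
    prob = {
        label: {f: (on[label][f] + 1) / (counts[label] + 2) for f in features}
        for label in (True, False)
    }
    return prior, prob


def predict_fi(model, feats):
    """Posterior probability that a protein pair is a functional interaction."""
    prior, prob = model
    log_odds = log(prior / (1 - prior))
    for f, p_true in prob[True].items():
        p_false = prob[False][f]
        if feats.get(f, False):
            log_odds += log(p_true / p_false)
        else:
            log_odds += log((1 - p_true) / (1 - p_false))
    return 1 / (1 + exp(-log_odds))


# Hypothetical evidence types and a tiny hypothetical training set.
features = ["human_ppi", "coexpression", "shared_go"]
training = [
    ({"human_ppi": True, "coexpression": True, "shared_go": True}, True),
    ({"human_ppi": True, "shared_go": True}, True),
    ({"coexpression": True, "shared_go": True}, True),
    ({}, False),
    ({"coexpression": True}, False),
    ({}, False),
    ({"shared_go": True}, False),
]
model = train_naive_bayes(training, features)
strong = predict_fi(model, {"human_ppi": True, "coexpression": True, "shared_go": True})
weak = predict_fi(model, {})
```

The "naïve" part is the assumption that evidence sources are independent given the class, which is why each feature contributes its own additive log-odds term.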

  23. Integrated Functional Interaction (FI) Network • 10,956 proteins (9,542 genes). • 209,988 FIs. • ~50% coverage of genome. • False (+) rate < 1% • False (-) rate ~80% 5% of network shown here

  24. Active Network Extraction. Diagram: uncurated interaction evidence (via machine learning) plus curated pathway DBs feed the Reactome Functional Interaction Network (~11,000 proteins; 200,000 interactions); extract and cluster altered genes to obtain disease “modules” (10-30).
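
The extract-and-cluster step can be pictured as restricting the FI network to the altered genes and then grouping what remains. The sketch below uses connected components as a deliberately simple stand-in for a real network clustering algorithm; the slide does not specify the method, and the gene names and edges are hypothetical.

```python
from collections import defaultdict


def extract_modules(fi_edges, altered_genes):
    """Keep only FI edges whose two endpoints are both altered, then return
    connected components (largest first) as candidate modules.
    Altered genes with no altered interaction partner are dropped."""
    altered = set(altered_genes)
    adj = defaultdict(set)
    for a, b in fi_edges:
        if a in altered and b in altered and a != b:
            adj[a].add(b)
            adj[b].add(a)

    seen = set()
    modules = []
    for gene in sorted(adj):
        if gene in seen:
            continue
        component = set()
        stack = [gene]
        while stack:  # depth-first traversal of one component
            g = stack.pop()
            if g in component:
                continue
            component.add(g)
            stack.extend(adj[g] - component)
        seen |= component
        modules.append(sorted(component))
    return sorted(modules, key=len, reverse=True)


# Hypothetical FI edges and altered gene list.
fi_edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5"), ("G1", "G9")]
altered = ["G1", "G2", "G3", "G4", "G5", "G6"]
modules = extract_modules(fi_edges, altered)
```

Here `G9` is excluded because it is not altered, and `G6` because it has no altered partner in the network.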

  25. Clustering of TCGA Breast Cancer Mutations. Cluster labels: Cadherin signaling; Signaling by Tyrosine Kinase receptors; NOTCH and Wnt signaling; Focal adhesion; ECM-Receptor interaction; Neuroactive ligand-receptor interaction; Mucin cluster; Cell adhesion molecules; Ubiquitin-mediated proteolysis; Metabolism of proteins; Signaling by Rho GTPases; DNA repair; Cell cycle; Axon guidance; M phase; G2/M Transition; Calcium signaling.

  26. 256 Pancreatic Cancer Mutations. Figure axes: patient samples, genes.

  27. Pancreatic Mutation Modules. Module 0: MAPK, Hedgehog, TGFβ signaling; Module 4: ECM, focal adhesion, integrin signaling; Module 5: Wnt & Cadherin signaling; Module 3: Translation; Module 2: B-cell receptor, ERBB, FGFR, EGFR signaling; Module 9: Axon guidance; Module 10: Muscle contraction; Module 1: Heterotrimeric G-protein signaling; Module 7: Axon guidance; Module 6: Ca2+ signaling; Module 8: MHC class II antigen presentation.

  28. Modules After Hierarchical Clustering. Figure axes: patient samples, modules.

  29. Network-Based Clustering Algorithms • Reactome FI network (Wu & Stein, Genome Biol. 2012 13(12):R112): expression or SNV analysis; online analysis via Cytoscape plugin (lab). • HotNet (Vandin et al. J Comput Biol. 2011 Mar;18(3):507-22): expression or SNV analysis; local installation with Python & MATLAB; Cytoscape visualization. • WGCNA (Langfelder et al. 2008 BMC Bioinformatics 9: 559): expression analysis; local installation as R package.

  30. Classification of Tumors via Molecular Phenotype. Diagram labels: Genomics; Transcriptomics; Proteomics; Test; Classify.

  31. Risk Stratification. Flowchart labels: Don’t treat; Treat; 10-20% progress; Relapse; No relapse; TEST; Low risk – reduce treatment; High risk – treat aggressively.

  32. Challenges in Biomarker Discovery • Overtraining: 22,000 genes; any given cancer may show alterations in 1000s of them; patient cohorts are in 100s. Can find a set of gene alterations that nicely predicts survival in a single cohort by chance. Field is littered with biomarkers that didn’t replicate in independent cohorts. • Disease Heterogeneity: If there are many subtypes of disease then need even larger cohorts. • Tumor Heterogeneity: A single primary tumor may carry high-risk and low-risk subclones simultaneously.
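
The overtraining point can be demonstrated with pure noise. The simulation below is a sketch with assumed sizes (1000 genes, 40 patients, Gaussian values, a continuous outcome standing in for survival); none of these numbers come from the slides. It picks the gene that best correlates with outcome in one random cohort, then checks the same gene in a second, independent random cohort.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_cohort(rng, n_genes, n_patients):
    """Random expression matrix (genes x patients) and a random outcome."""
    expression = [[rng.gauss(0, 1) for _ in range(n_patients)]
                  for _ in range(n_genes)]
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    return expression, outcome


def best_gene_by_chance(seed=0, n_genes=1000, n_patients=40):
    """Return |r| of the top gene in a discovery cohort and |r| of that same
    gene in an independent replication cohort. All data are noise."""
    rng = random.Random(seed)
    discovery, outcome = simulate_cohort(rng, n_genes, n_patients)
    scores = [abs(pearson(gene, outcome)) for gene in discovery]
    best = max(range(n_genes), key=scores.__getitem__)
    replication, outcome2 = simulate_cohort(rng, n_genes, n_patients)
    return scores[best], abs(pearson(replication[best], outcome2))


discovery_r, replication_r = best_gene_by_chance()
print(f"discovery |r| = {discovery_r:.2f}, replication |r| = {replication_r:.2f}")
```

Because the "best" gene was selected from many candidates, its discovery correlation is inflated purely by selection, which is the mechanism behind biomarkers that fail to replicate.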

  33. Using Network Architecture to Accelerate Biomarker Selection. Workflow labels: expression analysis of tumours from multiple patients; principal component analysis on modules; disease module map; correlate principal components with clinical parameters. Guanming Wu, Genome Biol. 2012 Dec 10;13(12):R112
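
Principal component analysis on a module reduces the expression of all its genes to one score per patient. As a minimal sketch, assuming a small patients-by-genes matrix for a single module, the first principal component can be found by power iteration on the covariance matrix. The function name and toy values are hypothetical, and a real analysis would normally use a numerical library.

```python
from math import sqrt


def first_principal_component(samples, iterations=100):
    """samples: rows are patients, columns are the genes of one module.
    Returns (loadings, scores) for the first principal component.
    Uses power iteration, which assumes the starting vector is not
    orthogonal to the leading eigenvector."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]

    v = [1.0 / sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance in the data
        v = [x / norm for x in w]

    scores = [sum(r[j] * v[j] for j in range(m)) for r in centered]
    return v, scores


# Toy module of two genes that vary together across three patients.
module_expression = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
loadings, scores = first_principal_component(module_expression)
```

The per-patient `scores` are what would then be correlated with a clinical parameter such as survival.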

  34. Samples Used. Built the network using NEJM: van de Vijver et al. 2002; 295 samples, ~12,000 genes; event: death. Validated with GSE4922: Ivshina et al. Cancer Res. 2006; 249 samples, ~13,000 genes; event: recurrence or death.

  35. PC Analysis Identifies Module 2 as Explaining Much of Variation in Survival

  36. Same Signature Predicts Survival in Independent Data Set

  37. And Three More Data Sets as Well…

  38. Module 2: Kinetochore + Aurora B Signaling

  39. Integration of Multiple Data Sets • Experimental samples can be interrogated many ways: RNA expression; genome/exome sequencing; copy number changes/loss of heterozygosity; shRNA knockdown screens. • Integrate multiple functional data types using network/pathway relationships?

  40. PARADIGM Vaske, Benz et al. Bioinformatics 26:i237 2010

  41. Factor graph: directed graph connecting genes; each gene is activated, inactivated, or unchanged in a single patient. Vaske, Benz et al. Bioinformatics 26:i237 2010
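
To make the factor-graph idea concrete, the sketch below computes exact marginals by brute-force enumeration over a tiny graph in which each gene is inactivated (-1), unchanged (0), or activated (1). It is not PARADIGM's model or inference procedure; the two-gene graph, factor weights, and function names are hypothetical, and enumeration only works for very small graphs.

```python
from itertools import product

STATES = (-1, 0, 1)  # inactivated, unchanged, activated


def marginals(variables, factors):
    """Exact marginals by enumerating every joint assignment.
    factors: list of (scope, fn), where scope is a tuple of variable names
    and fn maps their states to a non-negative weight."""
    totals = {v: dict.fromkeys(STATES, 0.0) for v in variables}
    z = 0.0
    for assignment in product(STATES, repeat=len(variables)):
        state = dict(zip(variables, assignment))
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(state[v] for v in scope))
        z += weight
        for v in variables:
            totals[v][state[v]] += weight
    return {v: {s: t / z for s, t in totals[v].items()} for v in variables}


def evidence_activated(state):
    """Hypothetical observation: the data suggest the regulator is activated."""
    return 0.8 if state == 1 else 0.1


def activation_edge(regulator, target):
    """Hypothetical pathway edge: the target tends to follow its regulator."""
    return 0.9 if target == regulator else 0.05


result = marginals(
    ["regulator", "target"],
    [(("regulator",), evidence_activated),
     (("regulator", "target"), activation_edge)],
)
```

Evidence observed on one gene shifts the inferred state of its downstream target through the shared factor, which is the basic mechanism for integrating several data types over a pathway.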

  42. Vaske, Benz et al. Bioinformatics 26:i237 2010

  43. PARADIGM: The Bad News • Distributed in source code form only: requires several third-party math/graph libraries (all open source); tedious to compile! • Scant documentation. • No repositories of formatted pathway data. • No examples of converting experimental data into input files. • Good news: we are working on a web service for a Reactome-based implementation.

  44. Take Home Messages • Pathway/network analysis can provide context to altered gene lists. • Pathway/network analysis methods differ greatly in complexity, power, and usability: SIMPLE: pathway diagram colorization; MODERATE: Reactome FI network extraction; COMPLEX: PARADIGM. • This type of analysis is a work in progress, but promises the ability to integrate data across many dimensions.
