Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis Bio-Trac 40 (Protein Bioinformatics) October 9, 2008 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center
Overview • Introduction • What are large-scale omics data? • What do they tell you? How to interpret? • Approaches • Omics data integration • Resources: databases and tools • Case studies • Systems biology • Top-down, bottom-up • Pathway, network modeling
Genomics, Proteomics Bioinformatics focus is changing… • Individual molecules • DNA, RNA, proteins • Sequence, structure, function • Evolutionary analysis • Population of molecules • Genome, proteome and other “-omes” • Interactions, complexes • Pathways, processes • High level organizations
From One Gene: multiple genetic variants, multiple transcripts, multiple protein products… and PTMs…
To Global Knowledge: The “-ome” and “-omics” Genome Transcriptome Proteome Metabolome • Other “-omes”: • ORFeome • Promoterome • Interactome • Receptome • Phenome • more…
Gastric Cancer: Global analysis of genes • Potential gene markers: SPARC, COL3A1, SULF1, YARS, ABCA5, THY1, SIDT2 • ECM cluster: corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)
Identification of novel MAP kinase pathway signaling targets (PMA/TPA K562 cells MAPK pathway targets) • Digest of U-24 • ~3500 spots • ~91 spot changes reproducible • Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation. -- Mol Cell. 6:1343-54, 2000
Drosophila Embryo Interaction Map • Using Y2H technology: 102 bait proteins homologous to human cancer genes, 2300 interactions detected, 710 high confidence. • The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow). • The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene. Genome Res. 15:376-84, 2005.
Strategy for Functional Analyses of Omics Data • Omics data: microarray, 2D, IP, MS, etc. • Protein mapping and data integration with bioinformatics databases (gene, protein, PPI, pathway, PTM, etc.) • Functional annotation, with text mining of the literature (MEDLINE) • Functional analysis: GO profiling (molecular function, biological process, cellular component; ~50% GO annotations), biological pathways (e.g. KEGG, Reactome, PID, BioCarta; <10% pathway annotations), molecular networks (e.g. interaction, association) • Pathway, network, biomarker discovery; biological insights
Methods for Functional Analysis • Omics data integration • Functional profiling • Pathway analysis • Resources/knowledgebases • Molecular databases • Omics data repositories • Bioinformatics tools • Open source: DAVID, FatiGO, iProXpress • Commercial: Ingenuity, GeneGO • Literature • Text mining
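Functional profiling tools typically ask whether a category (a GO term, a pathway) is over-represented in a hit list relative to a background. The block below is a minimal sketch of that idea using a plain hypergeometric tail probability. It is not the specific statistic used by DAVID, FatiGO, or iProXpress, and the function name and parameters are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: proteins in the background
    K: background proteins annotated to the category
    n: proteins in the hit list
    k: hit-list proteins annotated to the category
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing k, k+1, ..., upper annotated hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative numbers only (not from the slides):
# 40 of 2000 background proteins are in a pathway, 6 of 50 hits are.
p_value = hypergeom_enrichment_p(k=6, n=50, K=40, N=2000)
```

A small `p_value` suggests the category is over-represented in the hit list. When many categories are tested, a multiple-testing correction would be needed on top of this.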
Protein-Centric -Omics Analysis: principles of multi-omics data integration for Systems Biology (iProXpress) • Genomics: dbSNP/HapMap: NS-SNP • Epigenomics: DNA methylation profiling: coding genes • Transcriptomics: mRNA microarray; dbEST coding EST; splicing forms • Proteomics: protein, peptide • Peptidomics: protein precursor, natural peptides, protease/peptidase • Metabolomics: metabolites: HMDB; Enzyme1, Enzyme2 • Functional profiling and analysis: function, sites, signaling pathways, metabolic pathways, biological processes
Functional profiling: ID Mapping • Batch gene/protein retrieval and profiling • Enter ID, gi # • Information matrix • http://pir.georgetown.edu/pirwww/search/idmapping.shtml
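The batch ID mapping step can be pictured as a table lookup that converts mixed input identifiers to one protein accession and reports what could not be mapped. The sketch below uses a tiny in-memory table. It does not call the PIR ID mapping service, and `ID_TABLE` and `map_ids` are hypothetical names introduced here.

```python
# Toy lookup table: (ID type, ID) -> UniProt accession.
# A real run would use a full mapping file or service instead.
ID_TABLE = {
    ("gene_symbol", "TP53"): "P04637",   # human p53 (P53_HUMAN)
    ("entrez_gene", "7157"): "P04637",
    ("gene_symbol", "BRCA1"): "P38398",
}


def map_ids(id_type, ids, table=ID_TABLE):
    """Map a batch of IDs of one type; return (mapped, unmapped)."""
    mapped, unmapped = {}, []
    for raw in ids:
        key = (id_type, raw.strip())
        if key in table:
            mapped[raw] = table[key]
        else:
            unmapped.append(raw)  # keep track of IDs that need manual review
    return mapped, unmapped


mapped, unmapped = map_ids("gene_symbol", ["TP53", "BRCA1", "NOT_A_GENE"])
```

Keeping the unmapped list is a deliberate choice: identifiers silently dropped at this step would bias every later profile.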
Protein annotations • Well-annotated entry: human p53 (P53_HUMAN) • Comments (CC line) • Features (FT line) • References (RX line) • Cross References (DR line) • GO • 21 years!
what molecular function? what biological process? what cellular component?
Biological Pathways and Networks Signaling pathways Metabolic pathways Organelle biogenesis Molecular networks
Pathways: Human metabolic maps • Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward. • General trend after surgery: up-regulated anaerobic metabolism; down-regulated oxidative phosphorylation • Color key: green, down-regulated genes; red, up-regulated genes; white, no data available • Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82 http://www.pnas.org/cgi/data/0610772104/DC1/30
Databases of Protein Functions • Metabolic Pathways • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways • EcoCyc: Encyclopedia of E. coli Genes and Metabolism • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways) • Inter-Molecular Interactions and Regulatory Pathways • IntAct: Protein interaction data from literature and user submission • BIND: Descriptions of interactions, molecular complexes and pathways • DIP: Catalogs experimentally determined interactions between proteins • Reactome - A curated knowledgebase of biological pathways • Pathway Interaction Database (PID) • BioCarta: Biological pathways of human and mouse • Pathway Commons • GO and GO annotation projects
Gene Ontology (GO) (http://www.geneontology.org/) • Molecular Function • Biological Process • Cellular Component
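GO profiling of a protein list amounts to counting, per GO aspect, how many proteins carry each term, and noting the proteins with no GO annotation at all. The following is a minimal sketch with made-up annotations. The protein names, term labels, and the `go_profile` function are assumptions for illustration.

```python
from collections import Counter

ASPECTS = ("molecular_function", "biological_process", "cellular_component")


def go_profile(annotations, proteins):
    """Count proteins per GO term within each aspect.

    annotations: dict protein -> list of (aspect, term) pairs
    Returns (profile, unannotated).
    """
    profile = {aspect: Counter() for aspect in ASPECTS}
    unannotated = []
    for protein in proteins:
        terms = annotations.get(protein, [])
        if not terms:
            unannotated.append(protein)
        # set() so a duplicated annotation counts a protein only once
        for aspect, term in set(terms):
            profile[aspect][term] += 1
    return profile, unannotated


# Made-up annotations for three hypothetical proteins.
toy_annotations = {
    "prot1": [("biological_process", "apoptosis"),
              ("cellular_component", "nucleus")],
    "prot2": [("biological_process", "apoptosis"),
              ("molecular_function", "kinase activity")],
}
profile, unannotated = go_profile(toy_annotations, ["prot1", "prot2", "prot3"])
```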
GO Slim http://www.geneontology.org/GO.slims.shtml
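A GO slim groups detailed terms under a small set of high-level terms. One simple way to picture the mapping is to walk up the parent links of a term until a slim term is reached. The sketch below does that on a toy hierarchy. The term names are invented placeholders rather than real GO identifiers, and the official GO slim mapping tools apply more detailed rules than this.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the nearest slim terms reachable from `term` via parent links."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing this path at the first slim term
            continue
        stack.extend(parents.get(current, ()))
    return found


# Toy hierarchy (child -> parents); names are placeholders.
toy_parents = {
    "t:caspase_activation": ["t:apoptosis"],
    "t:apoptosis": ["t:cell_death"],
    "t:cell_death": ["t:biological_process"],
    "t:dna_repair": ["t:dna_metabolism"],
    "t:dna_metabolism": ["t:biological_process"],
}
toy_slim = {"t:cell_death", "t:dna_metabolism"}

slim_hit = map_to_slim("t:caspase_activation", toy_parents, toy_slim)
```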
Biological Pathway Resource Collection http://www.pathguide.org/ • Protein-protein interactions • Metabolic pathways • Signaling pathways • Pathway diagrams • Transcription factors / gene regulatory networks • Protein-compound interactions • Genetic interaction networks
KEGG Metabolic & Regulatory Pathways • KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/pathway.html)
BioCarta Cellular Pathways (http://www.biocarta.com/index.asp) Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Transforming Growth Factor (TGF) beta signaling [Homo sapiens] (http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&) Reactome: events and objects (including modified forms and complexes) • Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens] • Object -> REACT_7364.1: Phospho-R-SMAD [cytosol] • Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens] • Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol] • Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus • Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] • ……
PID Transforming Growth Factor beta signaling
Transforming Growth Factor (TGF) beta signaling: Reactome vs. PID • About 26 proteins in PID are not defined in Reactome, while only 2 proteins in Reactome are not defined in PID
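A comparison like the one above reduces to set operations once both pathway member lists use the same identifiers. The sketch below shows the bookkeeping on placeholder members. The sets are invented and do not reproduce the actual PID or Reactome TGF-beta content, and `compare_pathways` is a hypothetical helper.

```python
def compare_pathways(members_a, members_b):
    """Summarize overlap between two pathway member lists."""
    a, b = set(members_a), set(members_b)
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        # Jaccard index: shared members as a fraction of all members
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }


# Placeholder member lists, not real database content.
toy_pid = {"protein_1", "protein_2", "protein_3", "protein_4", "protein_5"}
toy_reactome = {"protein_1", "protein_2", "protein_6"}

summary = compare_pathways(toy_pid, toy_reactome)
```

The comparison is only meaningful after both lists have been mapped to a common identifier, since the same protein may appear under different names in each database.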
TGF-beta signaling: comparison between PID and Reactome (pathway diagram) • Components shown include LAP, TGF-b, TGF-beta receptor (types I and II), Furin, STRAP, Smad 2, Smad 4, Smad 7, Shc, TAK1, MEKK1, MAPKKK, ERK1/2, XIAP, CaM, Ski; PRO:000000616, PRO:000000523, PRO:000000410, PRO:000000650 • Inputs and outputs: growth signals, stress signals, Ca2+; P38 MAPK pathway, JNK cascade, degradation; DNA binding and transcription regulation • Compartments: cytoplasm, nucleus • Modifications: Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y); Ubiquitination (U) at Lysine (K) • Legend: common in both Reactome & PID; X: only reported in Reactome; * all others are in PID. Not all components in the pathway from both databases are listed
• GEO: a gene expression/molecular abundance repository (http://www.ncbi.nlm.nih.gov/geo/) • PRIDE: centralized, standards compliant, public data repository for proteomics data (http://www.ebi.ac.uk/pride/) • IntAct: open source database system and analysis tools for protein interaction data
Analysis Tools • iProXpress • http://pir.georgetown.edu/iproxpress/ • DAVID • http://david.abcc.ncifcrf.gov/ • Babelomics - FatiGO • http://babelomics.bioinfo.cipf.es/ • Commercial: • Ingenuity: http://www.ingenuity.com/ • GeneGO: http://www.genego.com/ • Visual tools: • Cytoscape: http://www.cytoscape.org/ • CellDesigner: http://www.celldesigner.org/
iProXpress: Integrative analysis of proteomic and gene expression data Data MS spectrum Peptide ident. Protein ident. http://pir.georgetown.edu/iproxpress/ Information Function Pathway Family Categorize Statistics Association Knowledge
iProXpress: Pathway Profiling • Organelle proteome data sets (ER, Mit) profiled against KEGG pathways • Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway… • Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
iProXpress Analysis Interface: Cross-data groups comparative profiling
A Literature-Derived Network for Yeast • All MEDLINE abstracts processed using statistical co-occurrence and NLP methods: • Functional association (co-occurrence): grey shades • Physical interaction: green • Regulation of expression: red • Phosphorylation: dark blue • Dephosphorylation: light blue • Inference: Ssn3 -> Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions • Jensen et al., 2006
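The co-occurrence part of such a literature network can be sketched as counting how often two protein names appear in the same abstract. The block below is a deliberately naive version with exact token matching and invented sentences in place of MEDLINE abstracts. It is not the method of Jensen et al., which also uses NLP and statistical scoring, and `cooccurrence_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, names):
    """Count abstracts in which each pair of names appears together."""
    counts = Counter()
    for text in abstracts:
        cleaned = text.lower().replace(",", " ").replace(".", " ")
        tokens = set(cleaned.split())
        present = sorted(n for n in names if n.lower() in tokens)
        counts.update(combinations(present, 2))  # each pair once per abstract
    return counts


# Invented sentences standing in for abstracts.
toy_abstracts = [
    "Ssn3 and Hsp104 were both measured in this toy sentence.",
    "Ume6, Ino2 and Erg9 appear together here.",
    "Hsp104 and Ssn3 appear again.",
]
toy_names = ["Ssn3", "Hsp104", "Ume6", "Ino2", "Erg9"]

pair_counts = cooccurrence_counts(toy_abstracts, toy_names)
```

Pairs are stored in sorted order, so `("Hsp104", "Ssn3")` is the key regardless of which name comes first in the text. Raw counts like these would still need normalization against how often each name occurs on its own.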
Case Studies • Pathway studies: analysis of proteomics and gene expression data from cancer research • I. Estrogen Signaling Pathways (estrogen-induced apoptosis): breast cancer cells (+E2), IP (AIB1, pY), 1D-gel, MS/MS • II. Purine Metabolic Pathways (radiation-induced DNA repair): human fibroblast (AT patient) + irradiation, 2D-gel MS, DNA microarray • III. Melanosome Biogenesis (comparative organelle proteomic profiling): melanoma cells, isolation of stage-specific melanosomes, MS
I. Estrogen Signaling Pathways (estrogen-induced apoptosis) • Breast cancer cells: MCF-7 and MCF-7/5C (estrogen deprived condition) • Mimicking clinical condition: 2nd phase anti-estrogen drug resistance • E2: 200nM for 2h • Signaling pathway: early events? • AIB1: growth, apoptosis • pY-IP and AIB1-IP, MS proteomics • Integrated bioinformatics: expression profiling, pathway/network mapping • Hu ZZ, et al. (2008) US HUPO
Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP GO profiling (biological process) Transcription Cell communication Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis
Pathway Mapping: G(o) alpha-2 subunit (pY/5C +E2) RAP1GAP (AIB1/5C+E2)
Hypothesized E2-induced Apoptosis Pathways (pathway diagram) • Components shown: GPR30, E2, ERa, AIB1, GNAO2, Gas, Rap1GAP, Rap1a, CDK1, MEK, ERK, BAD, TLE3, RUNX3, Sirt3, CIP29; compartments: cytoplasm, nucleus; outcomes: apoptosis, cell growth • Proteins from pY-IP and AIB1-IP and their functions: • GNAO2: G(o) alpha-2, GPCR signaling • Rap1GAP: growth inhibition/apoptosis • CDK1: BAD-mediated apoptosis • Sirt3: histone modification, apoptosis • TLE3: co-repression, apoptosis • CIP29: cell cycle arrest/apoptosis
Text mining for protein-protein interaction (PPI) information
II. Purine Metabolic Pathways (radiation-induced DNA repair) • AT patient fibroblasts + ionizing radiation • AT5BIVA: ATM-mutated, sensitive to IR damage • ATCL8: ATM introduced (ATM-wild type), resistant to IR damage • 2D-gel/MS: proteins differentially expressed (1093) • DNA microarray: mRNAs differentially expressed (231) • Intersection: 13 proteins/genes • Integrated bioinformatics: expression profiling, pathway/network mapping • Hu ZZ, et al. (2008) J Prot. Bioinfo.
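Intersecting the proteomics and microarray hit lists can be sketched as a key intersection over two tables of changes, followed by a check on whether protein and mRNA move in the same direction. The sketch assumes both lists are already mapped to a shared gene identifier and hold non-zero log2 fold changes. The gene names and values are invented, and `concordant_hits` is a hypothetical helper.

```python
def concordant_hits(protein_changes, mrna_changes):
    """Intersect two {gene: log2 fold change} tables.

    Returns rows of (gene, protein_change, mrna_change, same_direction),
    sorted by gene name.
    """
    shared = protein_changes.keys() & mrna_changes.keys()
    rows = []
    for gene in sorted(shared):
        p, m = protein_changes[gene], mrna_changes[gene]
        rows.append((gene, p, m, (p > 0) == (m > 0)))
    return rows


# Invented values for hypothetical genes.
toy_protein = {"gene_a": 1.8, "gene_b": -1.2, "gene_c": 2.1}
toy_mrna = {"gene_b": -0.9, "gene_c": -1.5, "gene_d": 1.1}

hits = concordant_hits(toy_protein, toy_mrna)
```

Discordant rows (here `gene_c`) are not necessarily errors; protein and mRNA levels can diverge, so such cases are worth flagging rather than discarding.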
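Purine metabolic pathway (pathway diagram) • GDP -> dGDP and ADP -> dADP, each labeled 1.17.4.1: Ribonucleoside diphosphate reductase subunit M2 (RRM2) • Other nodes: GTP, dGTP, ATP, dATP (X marked between GTP and dGTP and between ATP and dATP) • Outputs: DNA synthesis, DNA repair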
Functional Association Networks: RRM2 is connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1, and HDAC1.
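An association network of this kind can be represented as an undirected graph and queried for neighbors and connecting paths. The sketch below builds a small graph from the RRM2 associations named on the slide, plus an ATM to p53 edge added here for illustration. The edge list is not a curated network, and `build_graph` and `shortest_path` are hypothetical helpers.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Illustrative edges only.
TOY_EDGES = [("RRM2", "p53"), ("RRM2", "BRCA1"), ("RRM2", "HDAC1"), ("ATM", "p53")]
network = build_graph(TOY_EDGES)
rrm2_neighbors = sorted(network["RRM2"])
atm_to_rrm2 = shortest_path(network, "ATM", "RRM2")
```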
RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway (pathway diagram; nodes: ATM, p53, HDAC1, BRCA1, RRM1, RRM2, RR complex, DNA repair)
III. Organelle Proteomes • Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis • Schematic drawing of the melanosome biogenesis pathway and key proteins involved in each stage. Chi A, et al. (2006) J. Prot. Res.
Towards Systems Biology (Nature 422:193, 2003) • Genomics, Transcriptomics, Proteomics, Metabolomics, Bibliomics (literature mining), Bioinformatics, and other -omics • Integrated knowledge and tools are needed for systems biology research
What is Systems Biology? • ‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’ -- Leroy Hood (Systems Biology, 2004, 1(1):19-27) • How an organism works from an overall perspective. • Interactions of parts of biological systems: • how molecules work together to serve a regulatory function in cells or between cells. • how cells work to make organs, how organs work to make a person. • Systems biology is the converse of reductionist biology.
Reductionist vs. Systems Biology • The driving force for 21st century biology will be integration: • Integrating the activity of genes and regulators into regulatory networks • Integrating the interactions of amino acids into protein folding predictions • Integrating the interactions of metabolites into metabolic networks • Integrating the interactions of cells into organisms • Integrating the interactions of individuals into ecosystems • The driving force in 20th century biology has been reductionism: • From the population to the individual • From the individual to the cell • From the cell to the biomolecule • From the biomolecule to the genome • From the genome to the genome sequence • With the publication of genome sequences, reductionist biology has reached its endpoint