
MN-B-C 2 Analysis of High Dimensional (-omics) Data

Week 5: Proteomics 3. Kay Hofmann – Protein Evolution Group, http://www.genetik.uni-koeln.de/groups/Hofmann


Presentation Transcript


  1. MN-B-C 2 Analysis of High Dimensional (-omics) Data. Week 5: Proteomics 3. Kay Hofmann – Protein Evolution Group, http://www.genetik.uni-koeln.de/groups/Hofmann

  2. Mapping gene/protein sets to biological groups/pathways
     Pathway-centric analysis: consider one single pathway at a time; map genes to pathway components; visualize experimental data in the pathway diagram.
     Gene set centric analysis: consider a group of genes with an interesting experimental finding; find all pathway associations; apply a statistical test for pathways that are over-represented in the group.

  3. Example of a known 'biological pathway'
     [Pathway diagram with the nodes Fas-L, Fas, FLIP, FADD, Casp8, Diablo, APAF1, cIAP, Casp9, Casp3]
     • Classical network/pathway representation
     • Implies upstream/downstream ordering
     Advantages: rich information; familiar to biologists; easy to interpret.
     Disadvantages: not always known; difficult in a multi-experiment context; statistical evaluation problematic; often not regulated as a whole.
     Mainly used for pathway-centric analysis.

  4. Example of pathway-centric analysis
     Red/green color indicates up/down-regulation.

  5. Example of pre-defined gene/protein category
     • If statistics is more important than graphics: use of 'categorical' data
     • Examples: Fas pathway; Apoptosis inducers; SNARE complex; p53 target; Chromosome 12q13.1; Plasma membrane protein; NK-cell marker
     [Figure: the same proteins shown as an unordered set: Fas-L, FADD, FLIP, Casp8, Diablo, Fas, APAF1, Casp3, cIAP, Casp9]
     Advantages: suitable for non-network data; better amenable to statistics; many data sources available.
     Disadvantages: less information; less intuitive; more tedious interpretation.
     Mainly used for gene set centric analysis.
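The gene set centric step "find all pathway associations" can be sketched with plain set operations on categorical data. This is only an illustration: the "Fas pathway" members are the node names from the slides, while the other two memberships, the query group, and the function names are assumptions made up for the example.

```python
from collections import defaultdict

# Toy category -> members table. "Fas pathway" uses the node names from the
# slides; the other two memberships are assumed for illustration only.
CATEGORIES = {
    "Fas pathway": {"Fas-L", "Fas", "FLIP", "FADD", "Casp8",
                    "Diablo", "APAF1", "cIAP", "Casp9", "Casp3"},
    "Apoptosis inducers": {"Fas-L", "Fas", "FADD", "Casp8"},  # assumed
    "Plasma membrane protein": {"Fas-L", "Fas"},              # assumed
}


def invert(categories):
    """Turn category -> genes into gene -> categories."""
    gene_to_categories = defaultdict(set)
    for category, members in categories.items():
        for gene in members:
            gene_to_categories[gene].add(category)
    return dict(gene_to_categories)


def count_associations(gene_group, categories):
    """Count how many genes of the group fall into each category."""
    counts = {}
    for category, members in categories.items():
        overlap = gene_group & members
        if overlap:
            counts[category] = len(overlap)
    return counts


# Hypothetical group of genes with an interesting experimental finding.
group = {"Fas", "FADD", "Casp9"}
print(count_associations(group, CATEGORIES))
```

The per-category counts are the input for an over-representation test against the background of all genes.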

  6. Fisher's exact test for gene set enrichment
     The group of 100 top-regulated proteins contains 20 cMyc targets. Is this significant?
     There are 25 000 proteins in total, among them 200 cMyc targets.

                            cMyc target   no cMyc target    total
     top-regulated               20              80           100
     not top-regulated          180           24720         24900
     total                      200           24800         25000

     Fisher's exact test ≈ χ2 test = Hypergeometric test
     http://www.langsrud.com/fisher.htm
     p-Value = 1.34E-22
     Enrichment = (20*24720)/(80*180) = 34.3-fold
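The calculation on this slide can be reproduced with the standard library alone. The sketch below computes the one-sided hypergeometric tail probability P(X ≥ 20) and the enrichment ratio from the 2×2 table. The function names are illustrative, and the exact p-value may differ from the figure quoted on the slide depending on whether the online calculator reports a one-sided or two-sided result.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(total, annotated, drawn, hits):
    """P(X >= hits) when `drawn` proteins are sampled without replacement
    from `total` proteins, of which `annotated` belong to the gene set."""
    max_hits = min(annotated, drawn)
    favourable = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(hits, max_hits + 1)
    )
    # Exact rational arithmetic avoids underflow in the tiny tail terms.
    return Fraction(favourable, comb(total, drawn))


def enrichment(a, b, c, d):
    """Ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]], as on the slide."""
    return (a * d) / (b * c)


# Numbers from the slide: 25 000 proteins, 200 cMyc targets,
# 100 top-regulated proteins, 20 of which are cMyc targets.
p_value = float(hypergeom_upper_tail(25000, 200, 100, 20))
fold = enrichment(20, 80, 180, 24720)
print(f"one-sided p = {p_value:.3g}, enrichment = {fold:.1f}-fold")
```

With an expected overlap of only 100 × 200 / 25 000 = 0.8 proteins, observing 20 gives a vanishingly small tail probability.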

  7. Frequently used sources for pathway annotation
     • Gene Ontology (GO): Comprehensive; ontologies defined by consortium, gene assignments by EBI. Three different ontologies: "biological process", "molecular function", "cellular component".
     • Sequence motifs: Functional domains and other conserved sequence regions. PROSITE, Pfam, etc.
     • UniPROT keywords: Keywords plus words from the publication titles, from the protein name and description.
     • Chromosomal localization: Derived from EnsEMBL, useful for tumor analysis, etc.
     • Cell markers: Collected from the literature and multiple published expression projects.
     • KEGG: "Kyoto Encyclopedia of Genes and Genomes", mainly metabolic pathways.
     • Complex membership: From publications (largely high-throughput experiments).
     • TF targets: Collected from various databases including MSigDB.
     • Curated pathways: Collected from various databases including NetPath, PathWiki, Reactome.

  8. GO is the most widely used resource
     "The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner"
     • Ontologies defined by consortium (covering all of biology in all organisms)
     • Gene assignments by 'genome authorities' (human: EBI, mouse: MGD)
     • Three ontologies: "biological process", "molecular function", "cellular component"
     • Organized as 'directed acyclic graph' (DAG)
     [Diagram, example terms. Biological process: Apoptosis, Cell cycle, Response to pathogen. Molecular function: Protein kinase, Receptor, Transcription factor. Cellular component: Cell, Organelle, Membrane, Mitochondrion, Nucleus, Ribosome, Mitochondrial membrane, Inner mito. membrane, Intermembrane space, Outer membrane, Inner membrane]
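Because the ontology is a DAG rather than a tree, a term can have more than one parent, and a gene annotated with a specific term is implicitly annotated with all of its ancestors. The following sketch shows that propagation on a toy graph. The term names come from the slide diagram, but the edges are assumptions chosen for illustration and do not reproduce the actual GO relationships.

```python
# Hypothetical child -> parents edges, loosely based on the cellular
# component terms in the slide diagram. Not the real GO graph.
PARENTS = {
    "Cell": [],
    "Organelle": ["Cell"],
    "Membrane": ["Cell"],
    "Mitochondrion": ["Organelle"],
    "Nucleus": ["Organelle"],
    "Ribosome": ["Organelle"],
    # Two parents: this is what makes the structure a DAG, not a tree.
    "Mitochondrial membrane": ["Mitochondrion", "Membrane"],
    "Intermembrane space": ["Mitochondrion"],
    "Outer membrane": ["Mitochondrial membrane"],
    "Inner membrane": ["Mitochondrial membrane"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upwards."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(direct_terms, parents=PARENTS):
    """Expand a gene's direct annotations to include all ancestor terms."""
    expanded = set(direct_terms)
    for term in direct_terms:
        expanded |= ancestors(term, parents)
    return expanded


print(sorted(propagate({"Inner membrane"})))
```

Propagating annotations this way before an enrichment test lets broad categories count genes that were annotated only at a more specific level.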

  9. GO is braindead at multiple levels
     II. Automatic mass-annotations
     • Good coverage in broad 'boring' categories:
       properties that can be gleaned from protein classes;
       properties that are associated with sequence domains/motifs;
       properties that can be guessed from the protein name.
     • Poor coverage in more specific categories.
     Example 1: All keratins (type I, II, cytokeratins, hair keratins, follicular keratins) have the same set of annotations: 'epidermis development', 'intermediate filament', 'keratin filament', 'structural constituent of epidermis', 'structural molecule'.
     Annotators often fall for misleading names: the KCTD family is wrongly classified as 'potassium transporters' (with a whole group of associated annotations, e.g. 'plasma membrane associated') just because its members contain a domain called 'potassium channel tetramerization domain'. There are lots of similar examples.

  10. GO is getting better: this problem from two years ago has disappeared
     [Diagram with the terms Cytokine Activity, Cytokine Receptor Binding, Prolactin Receptor Binding, Interleukin-10 Receptor Binding and the proteins SOCS2, IL-10, Prolactin, GH]
     • Number of false negatives greatly reduced
     • Number of inconsistencies between human and mouse greatly reduced

  11. Useful outside resources for PA
     • GSEA: http://www.broadinstitute.org/gsea/index.jsp Gene set enrichment analysis. Similar concept to TreeRanker.
     • DAVID: http://david.abcc.ncifcrf.gov/ Several services, including annotation enrichment.
     • Cytoscape: http://www.cytoscape.org/ Network designer/editor, extensible through modules. Useful for protein interaction networks, coloring pathways by expression, etc.
     • Genemania: http://genemania.org/ Useful for finding connections within gene sets. Also available as a Cytoscape module.
