MN-B-C 2 Analysis of High Dimensional (-omics) Data

Week 5: Proteomics 3
Kay Hofmann, Protein Evolution Group
http://www.genetik.uni-koeln.de/groups/Hofmann

Mapping gene/protein sets to biological groups/pathways

Pathway-centric analysis
• Consider one single pathway at a time
• Map genes to pathway components
• Visualize experimental data in pathway diagram

Gene set centric analysis
• Consider a group of genes with interesting experimental finding
• Find all pathway associations
• Statistical test for pathways that are over-represented in group

Example of a known 'biological pathway'

[Figure: pathway diagram with the components Fas-L, Fas, FLIP, FADD, Casp8, Diablo, APAF1, cIAP, Casp9, Casp3]

• Classical network/pathway representation
• Implies upstream/downstream ordering

Advantages:
• Rich information
• Familiar to biologists
• Easy to interpret

Disadvantages:
• Not always known
• Difficult in multi-experiment context
• Statistical evaluation problematic
• Often not regulated as a whole

Mainly used for pathway-centric analysis

Example of pathway-centric analysis

Red/green color indicates up/down-regulation

Example of pre-defined gene/protein category

• If statistics is more important than graphics:
• Use of 'categorical' data
• Examples
  • Fas pathway
  • Apoptosis inducers
  • SNARE complex
  • p53 target
  • Chromosome 12q13.1
  • Plasma membrane protein
  • NK-Cell marker

[Figure: unordered gene set containing Fas-L, FADD, FLIP, Casp8, Diablo, Fas, APAF1, Casp3, cIAP, Casp9]

Advantages:
• Suitable for non-network data
• Better amenable to statistics
• Many data sources available

Disadvantages:
• Less information
• Less intuitive
• More tedious interpretation

Mainly used for gene set centric analysis
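Categorical data can be handled as plain sets, which is what makes it easy to count and test. The following is a minimal sketch of the "find all pathway associations" step. The category memberships in `CATEGORIES` and the function name `category_overlap` are assumptions made for illustration, not annotations taken from any database.

```python
# Illustrative category memberships (assumed for this sketch, not curated data).
CATEGORIES = {
    "Fas pathway": {"Fas-L", "Fas", "FADD", "FLIP", "Casp8"},
    "Apoptosis inducers": {"Fas-L", "Fas", "FADD", "Casp8", "Casp9",
                           "Casp3", "Diablo", "APAF1"},
    "Plasma membrane protein": {"Fas-L", "Fas"},
}


def category_overlap(gene_group, categories):
    """Return, for each category hit by the group, the sorted list of shared genes."""
    group = set(gene_group)
    hits = {}
    for name, members in categories.items():
        shared = group & members
        if shared:  # skip categories without any member in the group
            hits[name] = sorted(shared)
    return hits


if __name__ == "__main__":
    # "XYZ" stands for a gene without any annotation in CATEGORIES.
    for name, genes in category_overlap(["Fas", "Casp9", "XYZ"], CATEGORIES).items():
        print(name, genes)
```

The overlap counts produced this way are the input for an over-representation test.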

Fisher's exact test for gene set enrichment

The group of 100 top-regulated proteins contains 20 cMyc targets. Is this significant?
There are 25 000 proteins in total, among them 200 cMyc targets.

                      cMyc target    no cMyc target    total
in top-100 group           20               80            100
not in group              180            24720          24900
total                     200            24800          25000

Fisher's exact test ≈ χ² test = Hypergeometric test
http://www.langsrud.com/fisher.htm

p-Value = 1.34E-22
Enrichment = (20*24720)/(80*180) = 34.3-fold
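The cMyc example can be reproduced with a few lines of code. The sketch below computes the one-sided hypergeometric tail probability (the chance of drawing at least 20 targets in a group of 100) and the enrichment factor from the 2x2 table. The function names are chosen here for illustration, and the sidedness of the test is an assumption: the online calculator cited on the slide may report a differently defined p-value, so the result should agree with the quoted figure in order of magnitude rather than digit for digit.

```python
from math import comb


def hypergeom_upper_tail(k, group_size, n_targets, n_total):
    """P(X >= k) for X = number of targets among group_size proteins
    drawn without replacement from n_total proteins with n_targets targets."""
    denom = comb(n_total, group_size)
    max_k = min(group_size, n_targets)
    numer = sum(
        comb(n_targets, i) * comb(n_total - n_targets, group_size - i)
        for i in range(k, max_k + 1)
    )
    # Exact integer arithmetic up to this point; a single division at the end.
    return numer / denom


def enrichment(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]], as used on the slide."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    p = hypergeom_upper_tail(20, 100, 200, 25000)
    print(f"one-sided p-value: {p:.3g}")
    print(f"enrichment: {enrichment(20, 80, 180, 24720):.1f}-fold")
```

Working with exact integers avoids the underflow that would occur when multiplying many small floating-point probabilities.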

Frequently used sources for pathway annotation

• Gene Ontology (GO): Comprehensive; ontologies defined by consortium, gene assignments by EBI. Three different ontologies: "biological process", "molecular function", "cellular component".
• Sequence motifs: Functional domains and other conserved sequence regions. PROSITE, Pfam, etc.
• UniProt keywords: Keywords plus words from the publication titles, from the protein name and description.
• Chromosomal localization: Derived from Ensembl, useful for tumor analysis, etc.
• Cell markers: Collected from the literature and multiple published expression projects.
• KEGG: "Kyoto Encyclopedia of Genes and Genomes", mainly metabolic pathways.
• Complex membership: From publications (largely high-throughput experiments).
• TF targets: Collected from various databases including MSigDB.
• Curated pathways: Collected from various databases including NetPath, PathWiki, Reactome.

GO is the most widely used resource

"The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner"

• Ontologies defined by consortium (covering all of biology in all organisms)
• Gene assignments by 'genome authorities' (human: EBI, mouse: MGD)
• Three ontologies: "biological process", "molecular function", "cellular component"
• Organized as 'directed acyclic graph' (DAG)

[Figure: example GO terms arranged as a graph: Apoptosis, Cell cycle, Response to pathogen; Protein Kinase, Receptor, Transcription factor; Cell, Organelle, Membrane, Mitochondrium, Nucleus, Ribosome, Mitochondrial Membrane, Inner Mito. Membrane, Intermembrane space, Outer Membrane, Inner Membrane]
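What separates a DAG from a simple tree is that a term may have more than one parent, so a protein annotated with a specific term is implicitly associated with every term reachable upwards. The sketch below illustrates this with some of the cellular-component terms named on the slide. The parent links in `PARENTS` are assumptions made for illustration and are not copied from the actual GO files.

```python
# child term -> list of parent terms (illustrative edges, assumed for this sketch)
PARENTS = {
    "Inner Membrane": ["Mitochondrial Membrane"],
    "Outer Membrane": ["Mitochondrial Membrane"],
    "Mitochondrial Membrane": ["Membrane", "Mitochondrium"],  # two parents
    "Intermembrane space": ["Mitochondrium"],
    "Mitochondrium": ["Organelle"],
    "Nucleus": ["Organelle"],
    "Ribosome": ["Organelle"],
    "Organelle": ["Cell"],
    "Membrane": ["Cell"],
    "Cell": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a DAG can reach the same term via two routes
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("Inner Membrane")))
```

For enrichment analysis this means a gene counted under a specific term is usually also counted under all of its ancestor terms.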

GO is braindead at multiple levels

II. Automatic mass-annotations
• good coverage in broad 'boring' categories
• properties that can be gleaned from protein classes
• properties that are associated with sequence domains/motifs
• properties that can be guessed from the protein name
• poor coverage in more specific categories

Example 1: All keratins (type I, II, cytokeratins, hair keratins, follicular keratins) have the same set of annotations: 'epidermis development', 'intermediate filament', 'keratin filament', 'structural constituent of epidermis', 'structural molecule'.

Annotators often fall for misleading names: the KCTD family is wrongly classified as 'potassium transporters' (with a whole group of associated annotations, e.g. 'plasma membrane associated') just because its members contain a domain called 'potassium channel tetramerization domain'. There are lots of similar examples.

GO is getting better: this problem from two years ago has disappeared

[Figure: GO terms Cytokine Activity, Cytokine Receptor Binding, Prolactin Receptor Binding and Interleukin-10 Receptor Binding, with the proteins SOCS2, IL-10, Prolactin and GH]

• Number of false negatives greatly reduced
• Number of inconsistencies between human and mouse greatly reduced

Useful outside resources for PA

• GSEA: http://www.broadinstitute.org/gsea/index.jsp
  Gene set enrichment analysis. Similar in concept to TreeRanker.
• DAVID: http://david.abcc.ncifcrf.gov/
  Several services, including annotation enrichment.
• Cytoscape: http://www.cytoscape.org/
  Network designer/editor, extensible through modules. Useful for protein interaction networks, coloring pathways by expression, etc.
• Genemania: http://genemania.org/
  Useful for finding connections within gene sets. Also available as a Cytoscape module.
