Where we are and where we are going: From biology to data and back again. Chris Evelo, Department of Bioinformatics - BiGCaT, Maastricht University. Existing knowledge carefully hidden in: Computers aren’t good at: listening, reading. There is a lot of knowledge to structure.
Where we are and where we are going: From biology to data and back again. Chris Evelo, Department of Bioinformatics - BiGCaT, Maastricht University
Existing Knowledge Carefully Hidden in:
Computers aren’t good at: Listening Reading
There is a lot of knowledge to structure
Cardiomyopathy: Downregulated genes Fatty Acid Degradation? Other pathways / processes?
Find the pathways: Biological processes in duodenal mucosa affected by glutamine administration
Understand genomics. Example WikiPathways pathway: pathway on glycolysis, using modern systems biology annotation, with genes and metabolites connected to major databases.
PathVisio (www.pathvisio.org)
• Visualize data on biological pathways
• It can use gene expression, proteomics and metabolomics data
• Identify significantly changed processes
Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399
Adding data = adding colour. Example PathVisio result: showing proteomics and transcriptomics results on the glycolysis pathway in mouse liver after starvation. [Data from Kaatje Lenaerts and Milka Sokolovic, analysis by Martijn van Iersel]
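The idea that adding data means adding colour can be sketched as a function that turns a measured change into a fill colour for a pathway element. This is a minimal illustration only: the function name `fold_change_to_rgb`, the blue-white-red gradient, and the saturation limit of 2.0 are assumptions made here, not PathVisio's actual colour rules.

```python
def fold_change_to_rgb(log2_fc, limit=2.0):
    """Map a log2 fold change to an (r, g, b) tuple.

    Hypothetical gradient: blue for decreases, white for no change,
    red for increases. Values beyond +/- limit saturate.
    """
    # Clamp to [-limit, limit], then rescale to [-1, 1].
    x = max(-limit, min(limit, log2_fc)) / limit
    # The further from zero, the less white remains in the colour.
    fade = int(round(255 * (1 - abs(x))))
    if x >= 0:
        return (255, fade, fade)  # white towards red
    return (fade, fade, 255)      # white towards blue


# Illustrative values only (not data from the slides).
for value in (-3.0, 0.0, 2.0):
    print(value, fold_change_to_rgb(value))
```

A diverging gradient with a neutral midpoint is one common choice for expression data, because unchanged genes stay visually quiet while changes in either direction stand out.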
dbNP Architecture [diagram; labels grouped by module]
• Simple Assay module: body weight, BMI, etc.
• GSCF: templates, groups, subjects, events, protocols, samples, assays
• Query module: full-text querying, structured querying, profile-based analysis, study comparison
• Transcriptomics module: raw data (cell files); clean data (gene expression); result data (p-values, z-values); pathways, GO, metabolite profiles
• Epigenetics module: raw data (Nimblegen, Illumina); clean CpG island data; resulting genome feature data
• Web user interface
Generic Study Capture Framework: Data input / output [diagram; labels grouped]
• GSCF: templates, groups, subjects, protocols, events, samples, assays
• NCBO ontologies
• Data import: xls, csv, text; web interface
• Output: xls, ISAtab
• API: custom programs, Molgenis, custom dbs, EBI repository
Epigenetics DNA Methylation Pipeline [diagram; labels grouped]
• Inputs: raw data Nimblegen; raw data Illumina; raw sequencing data (MeDIP, BIS-Seq)
• Processing: R QC, processing; sequence QC, processing; statistical analysis
• Outputs: clean DNA methylation data (GenomeFeatureFormat); result data with p-values (GFF)
• Other labels: RA6, RA12
WikiPathways
WikiPathways: Pathway Editing for the People. Alexander R. Pico, Thomas Kelder, Martijn P. van Iersel, Kristina Hanspers, Bruce R. Conklin, Chris Evelo. PLoS Biology 2008: 6: 7. e184
Commentaries:
Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25
We the curators. Allison Doerr. Nature Methods 2008: 5, 754–755
No rest for the bio-wikis. Ewen Callaway. Nature 2010: 468, 359-360
• Public resource for biological pathways
• Anyone can contribute and curate
• More up-to-date representation of biological knowledge
www.wikipathways.org Search: “One carbon”
Editing Click Login needed Registration by e-mail address All edits logged
PPS1 Liver: Cytoscape visualization used to group pathways, with high z-score pathways grouped together. This explains why there are relatively few significant genes, but many pathways with a high z-score. All pathways
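How a pathway can reach a high z-score with only a few changed genes can be made concrete with a small calculation. The slides do not state which statistic was used; the sketch below assumes the hypergeometric-style z-score that is common in pathway over-representation analysis, and the function name and example numbers are hypothetical.

```python
import math


def pathway_z_score(N, R, n, r):
    """Over-representation z-score for one pathway (assumed formula).

    N: genes measured in total      R: of those, genes that changed
    n: measured genes on pathway    r: of those, genes that changed
    """
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    p = R / N                      # overall fraction of changed genes
    expected = n * p               # changed genes expected on the pathway
    # Variance with finite-population correction (sampling without replacement).
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0
    return (r - expected) / math.sqrt(variance)


# Made-up numbers: 8 changed genes on a 50-gene pathway, when only
# 5% of all measured genes changed, already scores well above 3.
print(round(pathway_z_score(N=10000, R=500, n=50, r=8), 2))
```

Because the score compares each pathway against the overall fraction of changed genes, a dataset with few significant genes overall can still yield many pathways with high scores when those genes share pathways.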
Existing Knowledge Carefully Hidden in:
Problem: Identifier Mapping Entrez Gene 3643 ? Affymetrix probeset 100234_at
BridgeDB: Abstraction Layer
• interface IDMapper
• class IDMapperRdb: relational database
• class IDMapperFile: tab-delimited text
• class IDMapperBiomart: web service
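The abstraction-layer idea, one mapping interface with interchangeable back ends, can be sketched in a few lines. This is not BridgeDB's actual API: the class names are borrowed from the slide, while the method `map_id`, the four-column file layout, and the placeholder identifiers are assumptions for illustration.

```python
import csv
from abc import ABC, abstractmethod


class IDMapper(ABC):
    """Hypothetical common interface: callers never see the back end."""

    @abstractmethod
    def map_id(self, source_system, identifier, target_system):
        """Return a sorted list of identifiers in target_system."""


class IDMapperFile(IDMapper):
    """Back end for tab-delimited text.

    Assumed columns: source system, source id, target system, target id.
    """

    def __init__(self, lines):
        self._table = {}
        for row in csv.reader(lines, delimiter="\t"):
            if len(row) != 4:
                continue  # skip blank or malformed rows
            src_sys, src_id, tgt_sys, tgt_id = row
            # Store both directions so lookups work either way.
            self._table.setdefault((src_sys, src_id, tgt_sys), set()).add(tgt_id)
            self._table.setdefault((tgt_sys, tgt_id, src_sys), set()).add(src_id)

    def map_id(self, source_system, identifier, target_system):
        key = (source_system, identifier, target_system)
        return sorted(self._table.get(key, set()))


# Placeholder identifiers, not real database entries.
mapper = IDMapperFile(["Entrez Gene\tgene-A\tAffymetrix\tprobe-1_at"])
print(mapper.map_id("Entrez Gene", "gene-A", "Affymetrix"))
```

A relational-database or web-service back end would subclass `IDMapper` in the same way, so analysis code written against the interface would not need to change when the data source does.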
Can we show SNPs? Using dbSNP links in ENSEMBL as part of BridgeDB libs
Gene/Protein Y Metabolite X TF Gene/Protein Z RS00001 RS00002 RS00003 RS00004 mi999 Metabolite Y RS00005
Functionalize SNPs
• Unknown function (attribute to gene)
• Changing protein functionality (coding)
• Changing protein interactions (coding)
• In miRNA binding site
• In TF binding site
Gene/Protein Y Metabolite X TF RS00005 RS00002 Gene/Protein Z RS00001 RS00003 RS00004 mi999 Metabolite Y
Many more SNPs in one interaction (which is one reason HapMap-based approaches don’t work well) Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z
Give them (predicted) direction Which helps in evaluating epidemiology studies Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z
Give them quantities (from Biochemistry and Epidemiology), which makes them usable in SBML models. But then the interactions in the model also need to have directions and quantities. Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z
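The progression in these slides (attach SNPs to an interaction, give them a functional category, then a direction, then a quantity) can be sketched as a small data structure. All class and field names below are hypothetical, and the numeric values are placeholders rather than measured effects.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SnpEffect(Enum):
    """Functional categories as listed on the 'Functionalize SNPs' slide."""
    UNKNOWN = "unknown function (attribute to gene)"
    CODING_FUNCTION = "changing protein functionality (coding)"
    CODING_INTERACTION = "changing protein interactions (coding)"
    MIRNA_BINDING_SITE = "in miRNA binding site"
    TF_BINDING_SITE = "in TF binding site"


@dataclass
class SnpAnnotation:
    rs_id: str
    effect: SnpEffect = SnpEffect.UNKNOWN
    direction: Optional[int] = None   # assumed encoding: +1 enhances, -1 reduces
    quantity: Optional[float] = None  # effect size; placeholder units


@dataclass
class Interaction:
    source: str
    target: str
    direction: Optional[int] = None
    quantity: Optional[float] = None
    snps: List[SnpAnnotation] = field(default_factory=list)

    def usable_in_model(self):
        """True only if the interaction and all its SNPs carry direction and quantity."""
        parts = [self] + self.snps
        return all(p.direction is not None and p.quantity is not None for p in parts)


link = Interaction("Gene/Protein Y", "Gene/Protein Z")
link.snps.append(SnpAnnotation("RS00001", SnpEffect.CODING_INTERACTION, direction=-1))
print(link.usable_in_model())  # False: quantities are still missing
```

The check mirrors the point made on the slide: SNP annotations only become usable in a quantitative model once both they and the interaction they sit on have a direction and a quantity.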