
Where we are and where we are going From biology to data and back again

Chris Evelo, Department of Bioinformatics - BiGCaT, Maastricht University.


Presentation Transcript


  1. Where we are and where we are going: From biology to data and back again. Chris Evelo, Department of Bioinformatics - BiGCaT, Maastricht University

  2. Existing Knowledge Carefully Hidden in:

  3. Computers aren’t good at: Listening, Reading

  4. There is a lot of knowledge to structure

  5. Cardiomyopathy: Downregulated genes

  6. Cardiomyopathy: Downregulated genes Fatty Acid Degradation? Other pathways / processes?

  7. What do we really need? Well…

  8. Find the pathways: Biological processes in duodenal mucosa affected by glutamine administration

  9. Understand genomics. Example WikiPathways pathway: pathway on glycolysis, using modern systems biology annotation, with genes and metabolites connected to major databases.

  10. PathVisio www.pathvisio.org • Visualize data on biological pathways • It can use gene expression, proteomics and metabolomics data • Identify significantly changed processes Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399

  11. Adding data = adding colour. Example PathVisio result: proteomics and transcriptomics results shown on the glycolysis pathway in mouse liver after starvation. [Data from Kaatje Lenaerts and Milka Sokolovic, analysis by Martijn van Iersel]
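
A minimal sketch of the "adding data = adding colour" idea, assuming a simple blue-white-red gradient over log2 fold changes. The function name `value_to_colour`, the `limit` parameter and the gradient itself are illustrative assumptions; they are not taken from PathVisio.

```python
def value_to_colour(value, limit=2.0):
    """Map a log2 fold change to a hex colour (assumed blue-white-red scale).

    Values at or beyond +limit become pure red, values at or beyond -limit
    become pure blue, and 0 becomes white.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Clip so that extreme values do not fall outside the gradient.
    clipped = max(-limit, min(limit, value))
    # The further from zero, the less white is mixed in.
    fade = round(255 * (1 - abs(clipped) / limit))
    if clipped >= 0:
        red, green, blue = 255, fade, fade
    else:
        red, green, blue = fade, fade, 255
    return f"#{red:02x}{green:02x}{blue:02x}"


if __name__ == "__main__":
    # Hypothetical fold changes for three pathway elements.
    for label, change in [("gene A", -2.0), ("gene B", 0.0), ("gene C", 3.5)]:
        print(label, value_to_colour(change))
```

Clipping at `limit` keeps a single outlier from washing out the colours of every other element on the pathway.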

  12. Process the data…

  13. dbNP Architecture: Simple Assay module Body weight, BMI, etc. GSCF Query module Templates Full-text querying Templates Templates Transcriptomics module Groups Subjects Raw data cell files Structured querying Result data p-values z-values Clean data gene expression Pathways, GO, metabolite profiles Events Protocols Profile-based analysis Epigenetics module Samples Assays Study comparison Raw data Nimblegen Illumina Resulting Genome Feature data Clean CpG island data Web user interface

  14. Generic Study Capture Framework: Data input / output. GSCF Templates Templates Templates Groups Subjects NCBO Ontologies Data import xls, csv, text Protocols Events web interface Samples Assays Output xls ISAtab custom programs API custom programs Molgenis custom programs custom dbs custom dbs EBI repository custom dbs

  15. Epigenetics DNA Methylation Pipeline Raw data Nimblegen R QC, processing R QC, processing Clean DNA methylation data (GenomeFeatureFormat) Result data with p-values (GFF) Statistical analysis R QC, processing Raw data Illumina Sequence QC, processing Raw sequencing data MeDIP, BIS-Seq RA6 RA12

  16. Now we just need the Pathways

  17. WikiPathways. WikiPathways: Pathway Editing for the People. Alexander R. Pico, Thomas Kelder, Martijn P. van Iersel, Kristina Hanspers, Bruce R. Conklin, Chris Evelo. PLoS Biology 2008: 6: 7. e184. Commentaries: Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25. We the curators. Allison Doerr. Nature Methods 2008: 5, 754–755. No rest for the bio-wikis. Ewen Callaway. Nature 2010: 468, 359-360. Public resource for biological pathways. Anyone can contribute and curate. More up-to-date representation of biological knowledge.

  18. www.wikipathways.org Search: “One carbon”

  19. Click

  20. Editing: Click. Login needed. Registration by e-mail address. All edits logged.

  21. Draw the proteins and interactions

  22. How to ever do data visualization?

  23. Connect to Genome Databases

  24. Double click to annotate the proteins

  25. Add reference to literature Click

  26. Download

  27. PPS1Liver. Cytoscape visualization of all pathways, used to group pathways with a high z-score together. This explains why there are relatively few significant genes, but many pathways with a high z-score.
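
The slides do not state how the pathway z-score is computed. The sketch below assumes the hypergeometric-based z-score that is commonly used for pathway over-representation, with N measured genes, R changed genes, n measured genes in the pathway and r changed genes in the pathway. The function name and the toy gene sets are made up for illustration; they show how a few changed genes shared by overlapping pathways can give several pathways a high z-score.

```python
import math


def pathway_z_score(r, n, R, N):
    """Z-score of seeing r changed genes among the n measured pathway genes.

    Assumed formula: (r - n*p) / sqrt(n*p*(1-p)*(1-(n-1)/(N-1))), p = R/N.
    """
    if N < 2 or not (0 <= r <= n <= N) or not (r <= R <= N):
        raise ValueError("inconsistent counts")
    p = R / N
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0  # nothing can vary, so there is no evidence either way
    return (r - n * p) / math.sqrt(variance)


# Toy data (invented): 100 measured genes, only 3 of them changed.
measured = {f"g{i}" for i in range(1, 101)}
changed = {"g1", "g2", "g3"}
pathways = {
    "A": {"g1", "g2", "g3", "g4"},        # overlaps with B
    "B": {"g1", "g2", "g3", "g5", "g6"},  # overlaps with A
    "C": {"g50", "g51", "g52", "g53"},    # contains no changed gene
}

scores = {
    name: pathway_z_score(
        len(genes & changed), len(genes & measured), len(changed), len(measured)
    )
    for name, genes in pathways.items()
}

if __name__ == "__main__":
    for name, z in sorted(scores.items()):
        print(name, round(z, 2))
```

In this toy set the same three genes drive the score of both overlapping pathways, which is the kind of redundancy that grouping pathways in a network view makes visible.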

  28. Existing Knowledge Carefully Hidden in:

  29. Backpages link to databases

  30. Problem: Identifier Mapping Entrez Gene 3643 ? Affymetrix probeset 100234_at

  31. BridgeDB: Abstraction Layer. Interface IDMapper, implemented by: class IDMapperRdb (relational database), class IDMapperFile (tab-delimited text), class IDMapperBiomart (web service)
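
The abstraction layer on this slide can be sketched as one interface with interchangeable backends. The Python below is only an illustration of that pattern and is not BridgeDB's actual API: the class names `FileIDMapper` and `DictIDMapper`, the method `map_id`, the four-column file layout and the identifiers are all assumptions.

```python
from abc import ABC, abstractmethod
import io


class IDMapper(ABC):
    """Common interface; callers do not need to know which backend answers."""

    @abstractmethod
    def map_id(self, source, identifier, target):
        """Return the set of identifiers in `target` linked to `identifier`."""


class FileIDMapper(IDMapper):
    """Backend reading tab-delimited lines: system, id, system, id (assumed layout)."""

    def __init__(self, handle):
        self._links = {}
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            sys_a, id_a, sys_b, id_b = line.split("\t")
            # Store each link in both directions so lookups work either way.
            self._links.setdefault((sys_a, id_a, sys_b), set()).add(id_b)
            self._links.setdefault((sys_b, id_b, sys_a), set()).add(id_a)

    def map_id(self, source, identifier, target):
        return set(self._links.get((source, identifier, target), ()))


class DictIDMapper(IDMapper):
    """In-memory backend, standing in for a database or web-service backend."""

    def __init__(self, links):
        self._links = links

    def map_id(self, source, identifier, target):
        return set(self._links.get((source, identifier, target), ()))


def map_all(mapper, source, identifiers, target):
    """Client code written against the interface only."""
    return {i: mapper.map_id(source, i, target) for i in identifiers}


# Invented identifiers in two invented identifier systems.
toy_file = io.StringIO("SystemA\tA1\tSystemB\tB1\nSystemA\tA1\tSystemB\tB2\n")
file_mapper = FileIDMapper(toy_file)
dict_mapper = DictIDMapper({("SystemA", "A1", "SystemB"): {"B1", "B2"}})

if __name__ == "__main__":
    print(map_all(file_mapper, "SystemA", ["A1", "A9"], "SystemB"))
    print(map_all(dict_mapper, "SystemA", ["A1", "A9"], "SystemB"))
```

Because `map_all` depends only on `IDMapper`, swapping the file backend for another one requires no change to the analysis code, which is the point of the abstraction layer.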

  32. Can we show SNPs? Using dbSNP links in ENSEMBL as part of BridgeDB libs

  33. But it will look like this….

  34. Gene/Protein Y Metabolite X TF Gene/Protein Z RS00001 RS00002 RS00003 RS00004 mi999 Metabolite Y RS00005

  35. Gene/Protein Y Metabolite X TF RS00005 RS00002 Gene/Protein Z RS00001 RS00003 RS00004 mi999 Metabolite Y. Functionalize SNPs: Unknown function (attribute to gene); Changing protein functionality (coding); In miRNA binding site; Changing protein interactions (coding); In TF binding site
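
One way to represent the functional classes listed on this slide is sketched below. The five classes come from the slide; everything else is an assumption made for illustration, including the names `SnpEffect`, `Snp` and `group_by_target`, the rule for which classes sit on an interaction rather than on a node, and which placeholder RS identifier gets which class.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SnpEffect(Enum):
    UNKNOWN = "unknown function (attribute to gene)"
    PROTEIN_FUNCTION = "changing protein functionality (coding)"
    MIRNA_BINDING_SITE = "in miRNA binding site"
    PROTEIN_INTERACTION = "changing protein interactions (coding)"
    TF_BINDING_SITE = "in TF binding site"


# Assumption: these classes describe an effect on an interaction (an edge),
# the remaining ones an effect on the gene/protein itself (a node).
EDGE_EFFECTS = {
    SnpEffect.MIRNA_BINDING_SITE,
    SnpEffect.PROTEIN_INTERACTION,
    SnpEffect.TF_BINDING_SITE,
}


@dataclass(frozen=True)
class Snp:
    rs_id: str
    effect: SnpEffect
    target: str  # label of the pathway node or interaction it is drawn on

    @property
    def on_interaction(self):
        return self.effect in EDGE_EFFECTS


def group_by_target(snps):
    """Collect SNP identifiers per pathway element, ready for drawing."""
    grouped = defaultdict(list)
    for snp in snps:
        grouped[snp.target].append(snp.rs_id)
    return dict(grouped)


# Placeholder identifiers from the slide; the class assignments are invented.
snps = [
    Snp("RS00001", SnpEffect.UNKNOWN, "Gene/Protein Z"),
    Snp("RS00002", SnpEffect.PROTEIN_FUNCTION, "Gene/Protein Z"),
    Snp("RS00005", SnpEffect.TF_BINDING_SITE, "TF -> Gene/Protein Z"),
]

if __name__ == "__main__":
    print(group_by_target(snps))
    print([s.rs_id for s in snps if s.on_interaction])
```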

  36. Many more SNPs in one interaction (which is one reason HapMap-based approaches don’t work well). Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z

  37. Give them (predicted) direction Which helps in evaluating epidemiology studies Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z

  38. Give them quantities (from Biochemistry and Epidemiology). Which makes them usable in SBML models. But then also the interactions in the model need to have directions and quantities. Gene/Protein Y RS00001 RS00002 RS00003 RS00011 RS00004 RS00012 RS00005 RS00013 RS00014 RS00015 Gene/Protein Z

  39. So we just have to color the jellies, ehhrm SNPs

  40. Thanks!
