Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery

Presentation Transcript


  1. Integration and analysis of multi-type high-throughput data for biomolecular knowledge discovery • Dr. Erik Bongcam-Rudloff • SGBC-SLU • Uppsala, Sweden

  2. Biologists' modus operandi • Observing a phenomenon that is in some way interesting or puzzling. • Making a guess as to the explanation of the phenomenon. • Devising a test to show how likely this explanation is to be true or false. • Carrying out the test and, on the basis of the results, deciding whether the explanation is a good one or not. In the latter case, a new explanation will (with luck) 'spring to mind' as a result of the first test. http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress2.html

  3. The observed phenomenon

  4. Selection of test times

  5. But what is the real event?

  6. Sometimes you could be lucky • "Positive" results are used, "negative" ones are rejected. Why? Only positive results are publishable.

  7. Next Generation techniques

  8. New challenges: 1 TB of data

  9. Gbases produced at Sanger

  10. World NGS Map http://omicsmaps.com/

  11. But this is wonderful! Or is it? • Sequence without knowledge connected to it is worth 0. • The deluge of data produced by these hordes of machines worldwide demands automatic workflows • Completely new systems to move data around • Storage for unprecedented amounts of data • Machines with gigantic amounts of RAM

  12. COSTS

  13. PROBLEMS • Nomenclature • Publishing culture • Moving-target development • Old ways of working and resistance to cultural change

  14. Publishing culture as an example • We get taxpayers' money, we pay publishers to publish, and the publishers sell the articles and obtain the copyrights. • To connect knowledge to sequences we need automatic methods, workflows and text mining. Most of this is limited by closed database systems. The only freely available resource is PubMed, but PubMed has only short abstracts, with NO information about conditions, materials and methods (M&M), etc. • We need to change this culture

  15. The BLAST analogy... • By far the most used tool by biologists • It would not be possible if the databases were not Open Access and freely searchable • Imagine if the nucleotide and protein databases followed the life science publishing model

  16. BLAST

  17. BLAST

  18. BLAST

  19. BLAST

  20. BLAST

  21. Human-centric • What about all the other areas of the Life Sciences? • Most genes are named by sequence similarity, but are their functions the same?

  22. Microbiome A microbiome is the totality of microbes, their genetic elements (genomes), and environmental interactions in a particular environment. http://www.secondgenome.com

  23. Fat and lean • Metabolic effects of transplanting gut microbiota from lean donors to subjects with metabolic syndrome. A. Vrieze et al., EASD abstracts, 24 September 2012. • The result: lean donor faecal infusion improves hepatic and peripheral insulin resistance as well as fasting lipid levels in obese individuals with the metabolic syndrome

  24. Genome sizes

  25. How many species? • Several orders of magnitude • Some estimates: 3-50 million species of arthropods; 1-100 million species of nematodes • Only a portion of bacteria have been identified; 99% of bacteria cannot be cultured. • "Once the diversity of the microbial world is catalogued, it will make astronomy look like a pitiful science." Julian Davies, Professor Emeritus, UBC

  26. New research strategies • Microbial • Livestock • Plants

  27. Typical Sources of Metagenomics • Soil samples • Sea water samples • Air samples • Medical samples • Farm animal samples • Ancient bones • Human microbiome

  28. Ion Proton: "Personal Genome Machine" • LIFE TECHNOLOGIES CORPORATION • Real tests of transcriptome sequencing on the Proton: using 500 ng of input poly-A RNA, it was possible to generate 50 million reads from a melanoma cancer sample (Joe Boland of the National Cancer Institute, according to GenomeWeb).

  29. Oxford Nanopore http://www.nanoporetech.com/

  30. High technology everywhere!

  31. New applications • Only imagination limits what it is possible to do with Next Generation Technologies!

  32. The big challenge: • Open Access, Open Source, collaborative networks • Data sharing • Common language • Tool systems to glue it all together!

  33. SeqAhead • COST Action BM1006: Next Generation Sequencing Data Analysis Network, 2011-2014 • COST Action: 25 countries • http://www.seqahead.eu/

  34. ALLBIO • 10 partners, 8 countries • FP7 project • Broadening the Bioinformatics Infrastructure to unicellular, animal, and plant science • www.allbioinformatics.eu

  35. THANKS!! Como 2012