Less is more: Approaches to biologist-driven analysis and next-generation sequencing data • Paul Gordon • Genome Canada Bioinformatics Platform • University of Calgary
What am I doing here? Genome Canada Bioinformatics Platform • Next Generation Sequencing • Next Generation Web • Future challenges
Better tech: less DNA, more sequence 44μm 70nm
Sprockets: Hierarchical Gene Models from ESTs Developed in collaboration with BASF Plant Sciences
CAVEman • Java 3D-based, world-first complete 3D human body atlas (adult male) • 2,335 organs, hierarchical organization following Terminologia Anatomica • Numerous applications involving mapping of genetic and disease data • More information: http://cave.ucalgary.ca/caveman • Pharmacokinetics visualization (absorption, distribution, metabolism and excretion of Aspirin) • Patient MRI stack mapped onto atlas and registered by landmarks • Exploring gene expression patterns
Basic Research • ING-protein interactions (cancer and ageing-related proteins) • Archaeal UV-light response • Large-scale human genome organization
Research Applications • Kidney transplants: improved rejection diagnostics in Edmonton • Desulf.: mechanisms of oil pipeline corrosion and its prevention • Mad cow disease/chronic wasting disease: live diagnostics
DNA Diagnostics Discovery for Mad Cow Preinoculation Preclinical Clinical Control animal #6 Ball toy Controls Photo: S. Czub, CFIA Lethbridge
Next-gen motif finding (elk dataset) • 61 blood samples • 107 million base pairs • 432 billion pairwise alignments (657431²) • Decypher hardware accelerator • 1082019 25mers or smaller • Decypher hardware accelerator • Uninfected 152317 • Infected 132417 • Thousands of animal coverage/timepoint combos (CPU intensive) • Infected 3 universal
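The scale on the slide above can be sanity-checked with a little arithmetic: reading the parenthesised figure as 657431 squared (an assumption about a lost superscript) gives roughly 432 billion all-against-all comparisons. The sketch below also shows one simple way to look for short motifs that separate infected from uninfected samples. The function names, the toy sequences, and the set-difference approach are illustrative assumptions, not the pipeline actually run on the Decypher accelerator.

```python
def pairwise_count(n: int) -> int:
    """All-against-all comparisons among n sequences (n squared)."""
    return n * n


def kmers(seq: str, k: int) -> set:
    """Every distinct substring of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def infected_only_kmers(infected: list, uninfected: list, k: int) -> set:
    """k-mers found in every infected sample and in no uninfected sample."""
    infected_sets = [kmers(s, k) for s in infected]
    common = set.intersection(*infected_sets) if infected_sets else set()
    seen_in_controls = set().union(*(kmers(s, k) for s in uninfected))
    return common - seen_in_controls


if __name__ == "__main__":
    # 657431 is read from the slide; the squaring is the assumed exponent.
    print(f"{pairwise_count(657431):,} pairwise alignments")
    # Toy sequences, purely hypothetical.
    print(sorted(infected_only_kmers(["ACGTAC", "TTACGT"], ["GGGTAC"], 3)))
```

Real data would need tolerance for sequencing errors and uneven coverage, which is presumably where the CPU-intensive coverage/timepoint combinations come in.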
Possible mode of action? • Diagram labels: feedback, activation, integration; virus particles? (~25nm); protected promoters (Motifs A & B); vacuole; CNA export; cell death; PrP; amyloid fibres; nucleoprotein complexes; PrPsc(+?); infectious agent; retrovirus; endogenous retrovirus?; neurovirulent? (e.g. M.L. Labat 1999); ↑ EVI1, ↑ PLZF, ↓ PLZF-controlled genes; circulating nucleic acids • Consistent with protein-only evidence… • Cited: Manuelidis et al., PNAS 2007; Carp et al., EMBO J. 2006; Leblanc et al., EMBO J. 2006; Stengel et al., Biochem. Biophys. Res. Commun. 2006; Lee et al., Biochem. Biophys. Res. Commun. 2006; etc.
Better tech: less input, more results Better tech: less DNA, more sequence Generate Manuscript Now
Bioinformatics Semantic Web Where are we at? Life Sciences Emerging Technologies Web Source: Gartner Inc.
How software works… Parameters/Input (gene name, DNA sequence, QTL…) → Functions/Rules → Results/Output (article, allele,…)
The problem with the Web (1998 vs. now) • "Once you label me, you negate me." Søren Kierkegaard
Bluejay http://bluejay.ucalgary.ca Comparative genomics Waypoints Gene expression integration BioMoby linking
The task at hand (biologist) ACCGT… Sequencer Data File (Binary) Known Proteins BLAST Report (related proteins) (computer scientist)
DNASequence NCBI_gi Sequence_Alignment
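The three datatype names above suggest how typed data lets software propose the next analysis step: each service declares what it consumes and what it produces, and a client chains services until the desired type is reached. The following is a minimal sketch of that idea. The `Service` class, the registry, and both service names are hypothetical illustrations and are not the BioMoby API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str       # hypothetical service name
    consumes: str   # input datatype
    produces: str   # output datatype


# Hypothetical registry built from the datatype names on the slide.
REGISTRY = [
    Service("fetch_sequence_by_gi", "NCBI_gi", "DNASequence"),
    Service("blast_against_known_proteins", "DNASequence", "Sequence_Alignment"),
]


def services_for(datatype, registry=REGISTRY):
    """Services that accept the given datatype as input."""
    return [s for s in registry if s.consumes == datatype]


def plan(start, goal, registry=REGISTRY):
    """Breadth-first search for the shortest service chain from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        datatype, path = queue.popleft()
        if datatype == goal:
            return [s.name for s in path]
        for s in services_for(datatype, registry):
            if s.produces not in visited:
                visited.add(s.produces)
                queue.append((s.produces, path + [s]))
    return None  # no chain of services connects the two types


if __name__ == "__main__":
    print(plan("NCBI_gi", "Sequence_Alignment"))
```

The biologist only has to supply data of a known type; the matching of inputs to services is left to the software.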
Audience • Axis: self-perception of computer skills, from "Amoeba" to "God" • Groups: willing to take training; capable but fearful; Taverna self-starters
The need for shoehorns • The current vision of the Semantic Web intends to create a new structure starting up with no reference to its vast, functioning, but more primitive predecessor … things just don’t happen like that
All the Web as Workflows Seahawk prompting Proxied Web page Drag ‘n’ drop Seahawk
What’s Ahead? The more a man learns, the more he realizes how little he knows
http://www.uniprot.org/tissues/229 http://purl.uniprot.org/po/0009009
Take home messages • As tech improves, we can ask better questions • We will need shoehorns to access existing resources for the foreseeable future