Future Challenges in Bioinformatics
Introduction • Introduction: How RRX got involved … • Life sciences context: How bioinformatics came to be important… • The past half century: How bioinformatics has “evolved”…
Introduction • Categories of Bioinformatics Tools • Why We Need Supercomputers • Software Development Issues • Future Challenges • Tools for Biotech Projects • Summary
How RRX got involved … • Submitted a Canada Foundation for Innovation (CFI) proposal for Advanced Bioinformatics Collaborative Computing (ABioCC)
How RRX got involved … • Developed an SVG based visualization front end • Paper will be presented at SVG Open 2003 in Vancouver on July 17th
How bioinformatics came to be important… • After the structure of DNA was reverse engineered with X-ray diffraction in 1953, the focus shifted to nucleic acid sequence analysis • DNA/RNA/protein sequence data accumulated, with computer programs used for storage and analysis
How bioinformatics came to be important… • Bioinformatics algorithms in development for the last half century came into widespread use by researchers • The ability to compare sequences created a homology context for unknown sequences of interest, leading to advances…
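To make "the ability to compare sequences" concrete, here is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. The slides do not name a specific algorithm; the function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (no traceback).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A higher score suggests closer similarity between the two sequences.
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("GATTACA", "GATTAA"))
```

Production tools such as BLAST or FASTA rely on heuristics and substitution matrices rather than this exhaustive dynamic program, which grows with the product of the two sequence lengths.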
How bioinformatics came to be important… • Improved sequencing technology enabled the complete deciphering of the human genome >>> 1999 • About 3.18 billion base pairs • Celera used 300 PE Biosystems ABI Prism 3700 DNA Analysers
How bioinformatics has “evolved”… • Central dogma of molecular biology • DNA sequences are transcribed into mRNA sequences • mRNA sequences are translated into protein sequences • Proteins fold into 3D structures whose functions are subject to statistical selection for survival >>> affecting the prevalence of the underlying DNA sequences in a population
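The transcription and translation steps of the central dogma can be sketched in a few lines. This is a simplified illustration under stated assumptions: the input is taken to be the coding strand (so transcription reduces to replacing T with U), the standard genetic code is used, and translation starts at the first base and stops at the first stop codon. The names `transcribe`, `translate` and `CODON_TABLE` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA coding strand -> mRNA (assumption: input is the coding strand)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, reading from position 0 until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

Folding and selection, the later steps on the slide, are not modeled here; structure prediction is a far harder problem than this table lookup.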
How bioinformatics has “evolved”… • This created a supporting information flow • Organization and control of genes in the DNA sequence • Identification of transcriptional units in the DNA sequence • Prediction of protein structure from sequence • Analysis of molecular function
How bioinformatics has “evolved”… • Another covariant information flow was created based on the scientific method • Create a hypothesis about biological activity • Design experiments to test the hypothesis • Evaluate resulting data for compatibility with the hypothesis • Extend/modify the hypothesis in response
How bioinformatics has “evolved”… • IT is used to handle the explosion of data from high-throughput techniques, which is too complex for manual analysis • X-ray diffraction
How bioinformatics has “evolved”… • Automated DNA sequencing • Amersham Biosciences • Applied Biosystems • Beckman Coulter • LI-COR • SpectruMedix Corp. • Visible Genetics Corp.
How bioinformatics has “evolved”… • Microarray expression analysis
How bioinformatics has “evolved”… • Rapid emergence of 3D macromolecular structure databases • New subdiscipline: structural bioinformatics • Atomic and subcellular spatial scales • Representation/physics • Storage/retrieval/source data correlation/interpretation • Analysis/simulation • Display/visualization
Categories of Bioinformatics Tools… • Databases >>> search/compare • Sequence Analysis - Clusters • Genomics • Phylogenetics • Structure Prediction • Molecular Modelling • Microarrays • Packages, Misc Apps, Graphics, Scripts
Categories of Bioinformatics Tools… • Database >>> search/compare • aceperl BLAST Blastall Blastpgp BLAT Blimps Entrez FASTA fastacmd formatdb getz HMMER IMPALA InterProScan PHI-BLAST ProSearch PSI-BLAST PSI-BLASTN Sequin Swat tace xace
Sequence Analysis • Artemis Bl2seq BLAST Clustal W, X consed/autofinish Cross_match Dotter EMBOSS FASTA Glimmer HMMER InterProScan MEME View Paracel Transcript Assem Phrap Phred Primers ProSearch Readseq2 Rnabob RRTree SAPS seals Seqsblast STADEN Swat T-Coffee
Genomics • Calc_primers Cross_match FPC GENSCAN Glimmer Image Mzef Phrap Phred STADEN Swat tace tace_celegans tRNAscan-SE xace xace_celegans
Phylogenetics • Clustal W Clustal X MOLPHY MrBayes PHYLIP RRTree T-Coffee TREE-PUZZLE TreeViewX
Structure Prediction • EMBOSS • MEME • Modeller • Mzef • PHI-BLAST
Molecular Modelling • Modeller • homology modeling from an alignment of a sequence to be modeled with known related structures • Rasmol • a molecular graphics program intended for 3D visualisation of proteins and nucleic acids • Raster3D (publishing images) • X3DNA • analyzing and rebuilding 3D structures
Microarrays • Dapple • a program for quantitating spots on a two-colour DNA microarray image • OligoArray • a program that computes gene-specific oligonucleotides that are free of secondary structure for genome-scale oligonucleotide microarray construction
Packages, Useful Scripts/Source Code, Graphics, PERL • BioPERL BioJava boxshade mvscf seg Split_fasta • povRay • Raster3D • MOLPHY
Why We Need Supercomputers… • Some commercial packages run on “supercomputers” • Accelrys: modeling and simulation • Materials Studio • Cerius2 (SGI Unix only) • Homology modeling to catalyst design • Insight II (SGI Unix only) • 3D graphical environment for physics based molecular modeling • Catalyst (high end Unix servers) • database management valuable in drug discovery research • QUANTA (high end Unix servers) • crystallographic 2D/3D protein structure solution • Discovery Studio
Why We Need Supercomputers… • Supercomputer advantages • Multiple processors • Large shared memory • Handle very large files • Large/fast RAID arrays • Terabyte tape backup systems • Power backup systems • High performance networks
Why We Need Supercomputers… • Common bioinformatics requirements • Computationally intensive tasks • Large memory models • Intensive/complex database searches • Large experimental database sets • Large derived database sets • Large persistent intermediate data structures • Teamwork data sharing and visualization
Why We Need Supercomputers… • Network requirements • Driving gigE/10gigE NICs • Moving large files/data sets rapidly • Visualization streams/Access GRID • Coordinating Cluster/GRID computing • Dynamic provisioning of light paths
Software Development Issues… • Collaboration contexts/barriers • Teamwork … collaboration spaces • Standards development … DTDs • Integration issues… • experimental data to homology to 3D model • platform issues… • network issues – 9k MTU – jumbo frames • Licensing issues – public vs. private
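The "9k MTU – jumbo frames" item can be illustrated with a rough per-frame overhead calculation. This is a back-of-the-envelope sketch, not a measurement: it assumes IPv4 and TCP headers without options (40 bytes inside the MTU), untagged Ethernet framing (38 bytes per frame for preamble, header, FCS and interframe gap), and no retransmissions. The function names are hypothetical.

```python
import math

# Assumed per-frame costs (bytes); no VLAN tag, no IP or TCP options.
ETHERNET_OVERHEAD = 38  # preamble+SFD 8, header 14, FCS 4, interframe gap 12
IP_TCP_HEADERS = 40     # IPv4 header 20 + TCP header 20, carried inside the MTU


def payload_efficiency(mtu):
    """Fraction of on-the-wire bytes that is application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(file_bytes, mtu):
    """Number of frames required to carry file_bytes of payload."""
    return math.ceil(file_bytes / (mtu - IP_TCP_HEADERS))


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, round(payload_efficiency(mtu), 4), frames_needed(10**9, mtu))
```

Under these assumptions the efficiency gain is a few percent; the larger practical benefit is that roughly six times fewer frames have to be processed by the NIC and host when moving large data sets.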
Future Challenges… • Creating developer infrastructure for building up structural models from component parts … • components from macromolecule libraries ported to object models • Understanding the design principles of systems of macromolecules and harnessing them to create new functions … • specialized molecular machines
Future Challenges… • Learning to design drugs efficiently and cost effectively based on knowledge of the target … • target generation automation • validation automation • Development of enhanced simulation models that give insight into context based function from knowledge of structure … • possible use of artificial intelligence to limit scope of search
Summary • Bioinformatics • well positioned to assist with application development • exploring novel bioinformatics software development • proceeding with supporting access GRID and optical switching technology