Explore the past evolution of bioinformatics, its pivotal role in life sciences, and upcoming challenges in this field. Discover the categories of bioinformatics tools, the need for supercomputers, software development issues, and tools for biotech projects.
Introduction • Introduction: How RRX got involved … • Life sciences context: How bioinformatics came to be important… • The past half century: How bioinformatics has “evolved”…
Introduction • Categories of Bioinformatics Tools • Why We Need Supercomputers • Software Development Issues • Future Challenges • Tools for Biotech Projects • Summary
How RRX got involved … • Submitted a Canada Foundation for Innovation (CFI) proposal for Advanced Bioinformatics Collaborative Computing (ABioCC)
How RRX got involved … • Developed an SVG-based visualization front end • Paper will be presented at SVG Open 2003 in Vancouver on July 17th
How bioinformatics came to be important… • After the structure of DNA was reverse engineered with X-ray diffraction in 1953, focus shifted to nucleic acid sequence analysis • DNA/RNA/protein sequence data accumulated, with computer programs used for storage and analysis
How bioinformatics came to be important… • Bioinformatics algorithms in development for the last half century came into widespread use by researchers • The ability to compare sequences created a homology context for unknown sequences of interest, leading to advances…
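The idea of comparing an unknown sequence against known ones can be sketched with a small global alignment scorer. This is a minimal illustration of the Needleman-Wunsch scoring recurrence, not the method used by any tool named in these slides; the function name, the scoring values (match +1, mismatch -1, gap -2), and the example sequences are all assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score (score only, no traceback)."""
    # prev[j] is the best score for aligning the first i-1 letters of a
    # with the first j letters of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical query and candidate sequences, ranked by similarity to the query.
query = "GATTACA"
candidates = {"seq_a": "GATTACA", "seq_b": "GATACA", "seq_c": "CCCCCCC"}
ranked = sorted(candidates,
                key=lambda name: global_alignment_score(query, candidates[name]),
                reverse=True)
print(ranked)
```

Ranking candidates by score is the simplest form of a "homology context": the closer a known sequence scores to the query, the more likely its annotation is relevant. Production tools such as BLAST rely on heuristics and substitution matrices rather than this exhaustive recurrence.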
How bioinformatics came to be important… • Improved sequencing technology enabled the complete deciphering of the human genome >>> 1999 • About 3.18 billion base pairs • Celera used 300 PE Biosystems ABI Prism 3700 DNA Analysers
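To give a sense of scale for the base-pair count above, the following sketch estimates the raw storage needed for a sequence of that length. The two encodings (one 8-bit character per base, and a 2-bit packed code for A, C, G, T) are assumptions for illustration; real sequencing projects also store quality values, reads, and assembly data, which are far larger.

```python
def storage_bytes(base_pairs: int, bits_per_base: int) -> int:
    """Bytes needed to store a sequence at a fixed number of bits per base."""
    return (base_pairs * bits_per_base + 7) // 8  # round up to whole bytes


GENOME_BASE_PAIRS = 3_180_000_000  # "about 3.18 billion base pairs" from the slide

as_text = storage_bytes(GENOME_BASE_PAIRS, 8)    # one ASCII character per base
as_packed = storage_bytes(GENOME_BASE_PAIRS, 2)  # A, C, G, T packed into 2 bits each

print(f"plain text:   {as_text / 1e9:.2f} GB")
print(f"2-bit packed: {as_packed / 1e9:.3f} GB")
```

Under these assumptions the finished sequence alone occupies about 3.18 GB as text and about 0.8 GB packed.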
How bioinformatics has “evolved”… • Central dogma of molecular biology • DNA sequences are transcribed into mRNA sequences • mRNA sequences are translated into protein sequences • Proteins fold into 3D structures whose functions are statistically selected for survival >>> affecting the prevalence of the underlying DNA sequences in a population
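The DNA to mRNA to protein flow can be sketched in a few lines. This is a simplified illustration: it assumes the input is the coding strand (so transcription reduces to replacing T with U), uses the standard genetic code, and translates in a single reading frame from the first base. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order;
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGA")  # hypothetical 3-codon example
protein = translate(mrna)
print(mrna, protein)
```

The sketch stops at sequence: predicting how the resulting protein folds and functions is the much harder problem that the structure prediction tools later in the deck address.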
How bioinformatics has “evolved”… • This created a supporting information flow • Organization and control of genes in the DNA sequence • Identification of transcriptional units in the DNA sequence • Prediction of protein structure from sequence • Analysis of molecular function
How bioinformatics has “evolved”… • Another covariant information flow was created based on the scientific method • Create hypothesis with respect to biological activity • Design experiments to test the hypothesis • Evaluate resulting data for compatibility with the hypothesis • Extend/modify hypothesis in response
How bioinformatics has “evolved”… • IT is used to handle the explosion of data from high-throughput techniques, which is too complex for manual analysis • X-ray diffraction
How bioinformatics has “evolved”… • Automated DNA sequencing • Amersham Biosciences • Applied Biosystems • Beckman Coulter • LI-COR • SpectruMedix Corp. • Visible Genetics Corp.
How bioinformatics has “evolved”… • Microarray expression analysis
How bioinformatics has “evolved”… • Rapid emergence of 3D macromolecular structure databases • New subdiscipline: structural bioinformatics • Atomic and subcellular spatial scales • Representation/physics • Storage/retrieval/source data correlation/interpretation • Analysis/simulation • Display/visualization
Categories of Bioinformatics Tools… • Databases >>> search/compare • Sequence Analysis - Clusters • Genomics • Phylogenetics • Structure Prediction • Molecular Modelling • Microarrays • Packages, Misc Apps, Graphics, Scripts
Categories of Bioinformatics Tools… • Database >>> search/compare • aceperl BLAST Blastall Blastpgp BLAT Blimps Entrez FASTA fastacmd formatdb getz HMMER IMPALA InterProScan PHI-BLAST ProSearch PSI-BLAST PSI-BLASTN Seguin Swat tace xace
Sequence Analysis • Artemis Bl2seq BLAST Clustal W, X consed/autofinish Cross_match Dotter EMBOSS FASTA Glimmer HMMER InterProScan MEME View Paracel Transcript Assem Phrap Phred Primers ProSearch Readseq2 Rnabob RRTree SAPS seals Seqsblast STADEN Swat T-Coffee
Genomics • Calc_primers Cross_match FPC GENSCAN Glimmer Image Mzef Phrap Phred STADEN Swat tace tace_celegans tRNAscan-SE xace xace_celegans
Phylogenetics • Clustal W Clustal X MOLPHY MrBayes PHYLIP RRTree T-Coffee TREE-PUZZLE TreeViewX
Structure Prediction • EMBOSS • MEME • Modeller • Mzef • PHI-BLAST
Molecular Modelling • Modeller • homology modeling from an alignment of the sequence to be modeled with known related structures • Rasmol • a molecular graphics program intended for 3D visualisation of proteins and nucleic acids • Raster3D • generating images for publication • X3DNA • analyzing and rebuilding 3D structures
Microarrays • Dapple • a program for quantitating spots on a two-colour DNA microarray image • OligoArray • a program that computes gene-specific oligonucleotides that are free of secondary structure for genome-scale oligonucleotide microarray construction
Packages, Useful Scripts/Source Code, Graphics, PERL • BioPERL BioJava boxshade mvscf seg Split_fasta • povRay • Raster3D • MOLPHY
Why We Need Supercomputers… • Some commercial packages run on “supercomputers” • Accelrys: modeling and simulation • Materials Studio • Cerius2 (SGI Unix only) • Homology modeling to catalyst design • Insight II (SGI Unix only) • 3D graphical environment for physics based molecular modeling • Catalyst (high end Unix servers) • database management valuable in drug discovery research • QUANTA (high end Unix servers) • crystallographic 2D/3D protein structure solution • Discovery Studio
Why We Need Supercomputers… • Supercomputer advantages • Multiple processors • Large shared memory • Handle very large files • Large/fast RAID arrays • Terabyte tape backup systems • Power backup systems • High performance networks
Why We Need Supercomputers… • Common bioinformatics requirements • Computationally intensive tasks • Large memory models • Intensive/complex database searches • Large experimental database sets • Large derived database sets • Large persistent intermediate data structures • Teamwork data sharing and visualization
Why We Need Supercomputers… • Network requirements • Driving gigE/10gigE NICs • Moving large files/data sets rapidly • Visualization streams/Access GRID • Coordinating Cluster/GRID computing • Dynamic provisioning of light paths
Software Development Issues… • Collaboration contexts/barriers • Teamwork … collaboration spaces • Standards development … DTDs • Integration issues … • experimental data to homology to 3D model • platform issues … • network issues: 9k MTU (jumbo frames) • Licensing issues: public vs. private
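The 9k MTU point can be made concrete with a small calculation of how many packets it takes to move a large file at the standard 1500-byte Ethernet MTU versus a 9000-byte jumbo frame. This is a rough sketch that assumes TCP over IPv4 with no header options (40 bytes of headers per packet) and ignores Ethernet framing, retransmissions, and acknowledgements; the 1 GB file size and the function name are hypothetical.

```python
HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of packets required to carry payload_bytes at a given MTU."""
    per_packet = mtu - HEADER_BYTES          # usable payload per packet
    return (payload_bytes + per_packet - 1) // per_packet  # integer ceiling


file_size = 10**9  # hypothetical 1 GB data set

standard = packets_needed(file_size, 1500)
jumbo = packets_needed(file_size, 9000)

print(f"1500-byte MTU: {standard:,} packets")
print(f"9000-byte MTU: {jumbo:,} packets")
print(f"reduction:     {standard / jumbo:.1f}x")
```

Under these assumptions jumbo frames cut the packet count roughly sixfold, which reduces per-packet processing on hosts moving large data sets. The catch, and the reason it appears under development issues, is that every device on the path has to support the larger MTU.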
Future Challenges… • Creating developer infrastructure for building up structural models from component parts … • components from macromolecule libraries ported to object models • Understanding the design principles of systems of macromolecules and harnessing them to create new functions … • specialized molecular machines
Future Challenges… • Learning to design drugs efficiently and cost effectively based on knowledge of the target … • target generation automation • validation automation • Development of enhanced simulation models that give insight into context based function from knowledge of structure … • possible use of artificial intelligence to limit scope of search
Summary • Bioinformatics • well positioned to assist with application development • exploring novel bioinformatics software development • proceeding with supporting access GRID and optical switching technology