Developing a Bioinformatics Grid-aware Client • Steven Stones-Havas (Biomatters Ltd) and Allen Rodrigo (The Bioinformatics Institute, New Zealand)
Talk Overview • Overview of BeSTGrid and NZBioGrid • Motivation and design philosophy • The Geneious platform • Workflows • Future developments
Kiwi Advanced Research and Education Network (KAREN) • Established by REANNZ Ltd • “REANNZ (Research and Education Advanced Network New Zealand Ltd) is the Crown-owned company set up to establish, own and operate a high-speed telecommunications network for the research and education sectors.” • www.karen.net.nz • High-speed connection (up to 10 Gbit/s) between NZ universities and CRIs • High-speed connection to Australia (up to XX Mbit/s) and the rest of the world
Broadband enabled Science and Technology GRID (BeSTGRID) • “BeSTGRID started in 2006 as a Tertiary Education Commission Innovation and Development Fund Project 2006-2008, focused on how to make eResearch work, to create a fully-functional eResearch ecosystem for New Zealand. BeSTGRID delivered mechanisms, methods and tools that facilitate collaboration on shared information, sharing of computational resources and online visualization of instruments and experiments.” www.bestgrid.org • Consists of • Data Grid • Collaboration Grid • Computational Grid
NZBioGrid • To implement a grid-enabled platform for biological science researchers that will deliver access to biologically relevant databases and applications. • Funded by the Ministry of Research, Science and Technology and TelstraClear Ltd, through REANNZ. • Project started in May 2007. • The Bioinformatics Institute (New Zealand), University of Auckland, in partnership with: • Biomatters Ltd • NetValue Ltd
So what exactly is “Bioinformatics”? • The computational organization and analysis of biological information • Bioinformatics is an interdisciplinary science. It integrates: • Biology • Computer Science • Mathematics • Statistics
2002 NCBI, National Library of Medicine, NIH www.ncbi.nlm.nih.gov “There are approximately 65,369,091,950 bases in 61,132,599 sequence records in the traditional GenBank divisions and 80,369,977,826 bases in 17,960,667 sequence records in the WGS division as of August 2006.”
New Sequencing Technologies • Roche, Illumina, and Applied Biosystems have released next-generation sequencers that produce large quantities of sequence information. • Millions of shotgun fragments, each 25 to 250 nt long • 10^6 to 10^9 nt in a single run (within days or weeks) • Other technologies will follow.
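As a rough check on the scale quoted on this slide, the yield of a run is just the number of fragments multiplied by the fragment length. The sketch below is illustrative only: the function name and the read counts are assumptions chosen to match "millions of fragments", not figures from any vendor.

```python
def run_yield(n_reads, read_length_nt):
    """Total nucleotides produced by a run: reads x read length."""
    return n_reads * read_length_nt

# Assumed, illustrative read counts; lengths are the 25 nt and 250 nt bounds from the slide
short_read_run = run_yield(1_000_000, 25)    # 25,000,000 nt
long_read_run = run_yield(4_000_000, 250)    # 1,000,000,000 nt

print(short_read_run, long_read_run)
```

Both values fall inside the 10^6 to 10^9 nt range given above.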
Databases • Nucleotide sequence databases • RNA sequence databases • Protein sequence databases • Structure databases • Genomics databases (non-vertebrate) • Metabolic and signaling pathways • Human and other vertebrate genomes • Human genes and diseases • Microarray data and other gene expression databases • Proteomics resources • Other molecular biology databases • Organelle databases • Plant databases • Immunological databases • Total: 1062 • Source: NAR Database Categories List
NZBioGrid -- Motivation • 21st century biology relies on bioinformatics. • Many biologists are mathematically or computationally challenged, but • They need access to data • They need access to tools • They have solved these problems in an ad hoc manner • Many computational biologists and bioinformaticists (who are not so challenged) write programs that • Are difficult to run (e.g., command-line input) • Have different input/output formats • Focus on one or a few analyses • Many computational tasks take a great deal of time to execute.
NZBioGrid – Design Philosophy • To develop a tool that • Is easy to use • Can sit on a desktop • Is available for different OSs (principally, Windows and MacOS) • Has a GUI • Can integrate I/O across different analyses • Can be extended as more analyses/software become available • Can use the resources on BeSTGRID to relieve computational burden on individual computers • Requires minimal “culture change”: focuses on OUTCOMES as well as ANALYSES
NZBioGrid and Geneious • The Bioinformatics Institute has teamed up with Biomatters Ltd to deliver a grid-enabled platform for computational analysis. • The platform is built on Biomatters’ existing product, Geneious. • Written in Java • E-mail-like interface • Standard bioinformatics tools • Consistent GUI • API permits plug-ins • Runs on a desktop, but is internet-aware
Comprehensive toolset • Sequence and structure alignment • Primer design and restriction analysis • Phylogenetic and taxonomic tree building • Contig assembly • Publication searching • Automatic search agents • Collaboration
Grid-enabled Geneious client -- Development • Grid Plug-in • Plug-ins for existing programs on BeSTGRID • Plug-ins for generic command-line programs on BeSTGRID • Workflows
Getting Started • You need a security certificate • GRIX (http://grix.vpac.org/downloads/)
Software Available with Native Plug-ins • Now available • ClustalW • MrBayes • LAMARC • PAUP* • Soon to be added • BLAST • BEAST
Command-line Programs: Command Line Interface Creator (CLIC) • As more programs are added, there needs to be a facility that permits these programs to be integrated into the platform. • CLIC is a plug-in that permits a user to specify command line syntax and switches for any program that permits command-line input • CLIC generates an XML script that Geneious uses to create a dialog box.
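The CLIC idea can be sketched as follows: a description of a program's switches is written out as XML, and the same XML is later read back to fill a dialog and assemble the command line. This is a minimal illustration, not the actual CLIC format. The element and attribute names (`program`, `option`, `flag`, `label`, `type`, `default`), the program name `exampletool`, and its flags are all hypothetical.

```python
import xml.etree.ElementTree as ET

def build_descriptor(program, options):
    """Return an XML string describing a command-line program and its switches."""
    root = ET.Element("program", name=program)
    for opt in options:
        ET.SubElement(
            root, "option",
            flag=opt["flag"], label=opt["label"],
            type=opt["type"], default=str(opt["default"]),
        )
    return ET.tostring(root, encoding="unicode")

def build_command(descriptor_xml, values):
    """Combine a descriptor with user-entered values into an argument list.

    `values` maps option labels (as a dialog box would show them) to the
    values the user typed; options left untouched fall back to their default.
    """
    root = ET.fromstring(descriptor_xml)
    argv = [root.get("name")]
    for opt in root.findall("option"):
        value = values.get(opt.get("label"), opt.get("default"))
        argv.append(f"{opt.get('flag')}={value}")
    return argv

# Hypothetical program and switches, not taken from any real tool's manual
options = [
    {"flag": "-in", "label": "Input file", "type": "file", "default": "input.fasta"},
    {"flag": "-n", "label": "Iterations", "type": "int", "default": 100},
]
descriptor = build_descriptor("exampletool", options)
command = build_command(descriptor, {"Iterations": 500})

print(descriptor)
print(command)
```

Keeping the description in a data file rather than in code is what lets new programs be added without writing a new plug-in for each one.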
Workflows • A great deal of work has been done on workflows • “Support basic research in computer science to create a science of workflows.” Recommendation by the NSF-sponsored Workshop on the Challenges of Scientific Workflows, May 06 • In biology, there are two broad types of workflows: • Repetitive application of routine tasks • Well-defined, generally accepted workflows • “Program-splicing” • Permits different combinations of programs • Users need to be able to interact with the workflow
• Opportunity for the user to review the alignment with summary diagnostics. • Development of alignment quality scores that permit automatic progression through the workflow.
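One way such a quality gate might work is sketched below. The talk does not say which score is intended, so mean pairwise identity is used purely as a stand-in, and the 0.8 threshold, the function names, and the example sequences are assumptions.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Fraction of matching characters over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)

def alignment_score(alignment):
    """Mean pairwise identity over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("an alignment needs at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

def next_step(alignment, threshold=0.8):
    """Proceed automatically on a good score, otherwise hand over to the user."""
    return "proceed" if alignment_score(alignment) >= threshold else "review"

# Made-up alignments for illustration
good = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
poor = ["ACGTACGT", "TGCA-CGT", "AC--TTGA"]

print(next_step(good))  # proceed
print(next_step(poor))  # review
```

A workflow engine could call `next_step` after the alignment stage and pause for the user only when it returns "review".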
Program-splicing • We want to use an outcome-driven approach to workflow development. • We want the users to tell us what type of data they have, and what they want to get out of the data. • Different from analysis-driven approach. • Plan to create “The Advisor” • A simple expert system that will design workflows based on user’s input. • Input delivered using questions/answers about outcomes. • Output is a workflow.
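A very small version of the outcome-driven idea could be a lookup from answers to a workflow. The Advisor is only planned in this talk, so this is a sketch of the concept rather than its design. The rule table, the wording of the data types and outcomes, and the step names are all hypothetical.

```python
# Hypothetical rule table: (data the user has, outcome the user wants) -> ordered steps
RULES = {
    ("unaligned sequences", "phylogenetic tree"): [
        "multiple alignment", "review alignment", "tree building",
    ],
    ("aligned sequences", "phylogenetic tree"): ["tree building"],
    ("sequence reads", "contigs"): ["contig assembly"],
}

def advise(data_type, outcome):
    """Return a workflow (list of steps) for the user's answers, or None if no rule fits."""
    key = (data_type.strip().lower(), outcome.strip().lower())
    steps = RULES.get(key)
    return list(steps) if steps is not None else None

# The user answers two questions; the output is a workflow
print(advise("Unaligned sequences", "Phylogenetic tree"))
print(advise("Aligned sequences", "Protein structure"))
```

The user never names a program: the questions are about data and outcomes, and the choice of analyses is left to the rules.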
Future plans • More plug-ins • More workflows • Work with NetValue Ltd • Delivery of rapid database searching based on their SlimSearch platform (several orders of magnitude faster than BLAST) • The Advisor • Publicity blitz
What we have learnt so far • User uptake depends on: • Ease of use • Ease of access • Reliability of grid services • Publicity • Need two-tier platform: • For biologists (focus on outcomes) • For bioinformaticists/computational biologists (focus on analyses)
Acknowledgements • Ministry of Research, Science and Technology • TelstraClear • REANNZ • The NZ BioGrid Design Team: David Bryant, Alexei Drummond, Stephane Guindon, Howard Ross • www.bioinformatics.org.nz