BioJava in 2002: An Open-Source Java Library for Bioinformatics
Matthew Pocock, BioJava Consulting LTD
What is BioJava?
• Java code (requires Java 2, i.e. version 1.2 or higher)
• Open-Source
• Bioinformatics
• Library for building applications
• Sequence-centric (we’d love to do more)
• Part of the Open Bioinformatics Foundation (OBF)
• Drop biojava.jar into your CLASSPATH and go
Where is BioJava?
• http://www.biojava.org
• mailto:biojava-l@biojava.org
• #biojava on irc.openprojects.net
Who is BioJava?
• 35+ developers on most continents and in most time zones
• Core team of more than 5 individuals
• Ever-expanding user group
What’s Been There for a While?
• Sequences with hierarchical features
• Sequence databases
• Sequence IO
  • Various sequence formats (EMBL, GenBank, GFF, Swiss-Prot…)
• Object model can be bypassed for high-performance scanning
• Probability distributions over symbols and dynamic programming toolkit
• BLAST parsers
What’s Reasonably New?
• TagValue parser API
• Sequence search APIs
  • Interoperable with BioJava XML-based parsers for many common sequence search algorithms
• Pure-Java SSAHA implementation
• Bit-packed sequence storage
• Taxonomies
• Literature references
• Phred
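To make "bit-packed sequence storage" concrete: unambiguous DNA has four symbols, so each base fits in 2 bits and 32 bases fit in one 64-bit word. The sketch below is an illustration only. The class name `PackedDna` and its methods are hypothetical and are not BioJava's packing API, and ambiguity symbols such as n are deliberately rejected.

```java
// Hypothetical illustration; this is not BioJava's packing implementation.
final class PackedDna {
    private static final String ALPHABET = "ACGT"; // index in this string = 2-bit code
    private final long[] words;                    // 32 bases per 64-bit word
    private final int length;

    private PackedDna(long[] words, int length) {
        this.words = words;
        this.length = length;
    }

    /** Packs an unambiguous DNA string (A, C, G, T only) at 2 bits per base. */
    static PackedDna pack(String dna) {
        long[] words = new long[(dna.length() + 31) / 32];
        for (int i = 0; i < dna.length(); i++) {
            int code = ALPHABET.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not A, C, G or T at position " + i);
            }
            words[i / 32] |= ((long) code) << ((i % 32) * 2);
        }
        return new PackedDna(words, dna.length());
    }

    /** Returns the base at position i (0-based) by shifting and masking its 2-bit code. */
    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i / 32] >>> ((i % 32) * 2)) & 3L);
        return ALPHABET.charAt(code);
    }

    int length() { return length; }

    /** Number of 64-bit words used for storage. */
    int wordCount() { return words.length; }

    String unpack() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(i));
        }
        return sb.toString();
    }
}
```

With this layout a 33-base sequence occupies two words (16 bytes) instead of 33 two-byte Java chars, and `PackedDna.pack("ACGTTGCA").baseAt(2)` returns 'G'.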
What’s Recently Improved?
• Gap handling
  • Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
• DAS client is now very robust
• Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
• More ‘framey’ annotation bundles
• Sequence rendering
  • Looks much better now
  • Handles ‘dotter-style’ 2D rendering
• We now actually write JUnit tests!
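One way to picture a single algebra covering ambiguities, compound symbols and gaps is to treat every symbol as the set of atomic symbols it matches: a plain base is a one-element set, n is the full set, a gap is the empty set, and a compound symbol such as a codon is one set per position. The sketch below works under that assumption. `SymbolSets` and its methods are hypothetical names, only a handful of IUPAC codes are mapped, and this is not BioJava's symbol API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a set-based symbol algebra; not BioJava's API.
final class SymbolSets {
    enum Base { A, C, G, T }

    /** Maps a few IUPAC codes to base sets: n = any, r = purine, y = pyrimidine, '-' = gap. */
    static Set<Base> of(char code) {
        switch (Character.toLowerCase(code)) {
            case 'a': return EnumSet.of(Base.A);
            case 'c': return EnumSet.of(Base.C);
            case 'g': return EnumSet.of(Base.G);
            case 't': return EnumSet.of(Base.T);
            case 'r': return EnumSet.of(Base.A, Base.G);
            case 'y': return EnumSet.of(Base.C, Base.T);
            case 'n': return EnumSet.allOf(Base.class);
            case '-': return EnumSet.noneOf(Base.class); // gap: matches no atomic symbol
            default: throw new IllegalArgumentException("Unsupported code: " + code);
        }
    }

    /** Builds a compound symbol: one base set per position, e.g. "gcn" for a codon. */
    static List<Set<Base>> compound(String codes) {
        List<Set<Base>> positions = new ArrayList<>();
        for (char c : codes.toCharArray()) {
            positions.add(of(c));
        }
        return positions;
    }

    /** Expands a compound symbol into every concrete string it matches (cross product). */
    static List<String> expand(List<Set<Base>> positions) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (Set<Base> position : positions) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (Base b : position) {
                    next.add(prefix + b);
                }
            }
            results = next;
        }
        return results;
    }
}
```

Under this model `SymbolSets.expand(SymbolSets.compound("gcn"))` yields GCA, GCC, GCG and GCT, while any compound symbol containing a gap expands to nothing, so ambiguity, compounding and gaps all fall out of the same set operations.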
Java 1.4-reliant Source
• Java 1.4 offers APIs that are really useful for bioinformatics
  • Logging
  • NIO interfaces for fast IO and raw data access
  • Regular expressions
  • Cascading exceptions
• BioJava code that relies on 1.4 APIs is conditionally built
  • SSAHA implementation
  • Some parsers and handlers for TagValue
  • Restriction enzyme digests
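As a rough illustration of why the 1.4 APIs matter here, the sketch below uses `java.util.regex` to locate restriction sites and exception chaining (a cause passed to the new exception) to keep the original parse error. `DigestSketch`, `digest` and `parseLength` are hypothetical names, not BioJava's restriction enzyme classes. The code is written in modern Java syntax (generics), so it would not compile on 1.4 itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not BioJava's restriction enzyme API.
final class DigestSketch {
    /**
     * Cuts seq at every occurrence of a recognition site given as a regular expression.
     * cutOffset is the number of site bases that stay on the upstream fragment.
     */
    static List<String> digest(String seq, String site, int cutOffset) {
        if (site.isEmpty() || cutOffset < 0) {
            throw new IllegalArgumentException("Need a non-empty site and a non-negative offset");
        }
        Matcher m = Pattern.compile(site, Pattern.CASE_INSENSITIVE).matcher(seq);
        List<String> fragments = new ArrayList<>();
        int start = 0; // start of the fragment currently being built
        int from = 0;  // where the next search begins
        while (from <= seq.length() && m.find(from)) {
            int cut = Math.min(m.start() + cutOffset, seq.length());
            if (cut > start) {
                fragments.add(seq.substring(start, cut));
                start = cut;
            }
            from = m.start() + 1; // step one base so overlapping sites are found too
        }
        fragments.add(seq.substring(start));
        return fragments;
    }

    /** Parses a numeric field, chaining the low-level error as the cause. */
    static int parseLength(String field) {
        try {
            return Integer.parseInt(field.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad length field: '" + field + "'", e);
        }
    }
}
```

For example, EcoRI recognises GAATTC and cuts after the first G, so `DigestSketch.digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)` returns the fragments AAG, AATTCTTG and AATTCGG.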
OBDA and Fun Trips
• Sponsored by O’Reilly and Electric Genetics
• Developers attended a two-part Hackathon in Tucson, AZ, USA and Cape Town, South Africa
• Representatives from BioJava, BioPerl, BioPython, BioRuby, Ensembl, EMBOSS and others
• We hammered out and implemented a range of standards designed from the ground up to be
  • Interoperable between the Bio* projects
  • Relatively easy to implement from scratch
• We drank lots of red wine
OBDA Support
• BioCORBA – CORBA sequence interfaces
• BioSQL – relational tables and standard semantics for storing sequences
• BioFetch – cgi-bin-based sequence fetching
• XEMBL – XML-based sequence fetching
• Bio Directories – configuration file for resolving resources
• Flat-file Indexing – fetch records by ID and secondary ID from multiple ASCII files
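The flat-file indexing idea can be sketched in a few lines: scan an ASCII file once, remember the byte offset of each record by ID, then seek straight to a record on demand. The example below assumes FASTA-style records whose header line starts with '>' and whose ID is the first word of that header. `FlatFileIndexSketch` is a hypothetical name, the index lives only in memory, and secondary IDs and multiple files are left out. It does not reproduce the OBDA index file format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; does not implement the OBDA flat-file index format.
final class FlatFileIndexSketch {
    /** Scans the file once and records the byte offset of each '>' header line by ID. */
    static Map<String, Long> buildIndex(Path file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    String id = line.substring(1).trim().split("\\s+")[0];
                    index.put(id, offset);
                }
                offset = raf.getFilePointer(); // offset of the next line
            }
        }
        return index;
    }

    /** Seeks to the indexed offset and reads one record; returns null for an unknown ID. */
    static String fetch(Path file, Map<String, Long> index, String id) throws IOException {
        Long offset = index.get(id);
        if (offset == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            StringBuilder record = new StringBuilder(raf.readLine()).append('\n');
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                record.append(line).append('\n');
            }
            return record.toString();
        }
    }
}
```

`RandomAccessFile.readLine` is unbuffered, which keeps the offsets exact but makes the scan slow on large files. A production index would buffer reads and persist the offsets to disk.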
Things We’d Like To Do in the Near Future
• Support non-DNA areas of bioinformatics
  • Cladistics, evolutionary trees, clusters
  • Expression data
  • Proteomics
  • Networks/pathways
  • Biochemical reactions
• Integrate pre- and post-1.4 exception systems
• Modify the change notification system
  • Better synchronization and transaction support
  • Easier to optimize events that don’t have listeners
  • More robust handling of event cascades
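The wish to optimize events that have no listeners can be illustrated with a lazy firing pattern: the caller passes a factory instead of a ready-made event, so nothing is allocated or dispatched unless someone is subscribed. This is only a sketch of that general idea in modern Java. `LazyChangeSupport` is a hypothetical class, not BioJava's change notification API, and it ignores synchronization and event cascades.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration; not BioJava's change notification classes.
final class LazyChangeSupport<E> {
    private final List<Consumer<E>> listeners = new ArrayList<>();

    void addListener(Consumer<E> listener) {
        listeners.add(listener);
    }

    /** Builds the event only when someone is listening; returns true if it was delivered. */
    boolean fire(Supplier<E> eventFactory) {
        if (listeners.isEmpty()) {
            return false; // fast path: no event object is created, nothing is dispatched
        }
        E event = eventFactory.get();
        // Iterate over a copy so a listener may register others while being notified.
        for (Consumer<E> listener : new ArrayList<>(listeners)) {
            listener.accept(event);
        }
        return true;
    }
}
```

A mutator would then call something like `support.fire(() -> describeChange(oldValue, newValue))`, where `describeChange` stands for whatever potentially expensive event construction the object needs.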
What Will We See in BioJava 2?
• Pervasive use of ontologies
  • Storing annotating data
  • Definition of processing pipelines (e.g. customizing parsers)
  • Bindings between BioJava interfaces and external data sources
    • DAS, BioSQL, BioCORBA
• Pervasive querying, making any BioJava application an object data store with easy routes for data providers to optimize searches
• Much more code generation
  • Push most repetitive code into code generators
  • Auto-generate much of the event notification web
• Much better transactionality
• Reduce implementation cost for developers
• Expose any/all BioJava instances through SOAP
• Naming and directory services
And the Biggest Change of All?
• Make the library accessible to casual developers writing throw-away scripts, as well as to system architects
  • Documentation
  • Tutorials
  • Training
  • Utility classes (e.g. SeqIOTools)