DAS developer workshop
Tim Hubbard (th@sanger.ac.uk)
26th February 2007
Wellcome Trust Sanger Institute
Distributed Annotation System
• Origins:
  • XML client/server specification (http://biodas.org/)
  • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier
  • acedb based prototype server
  • Java based prototype client
  • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.
• Genome campus adoption
  • Initially via Ensembl becoming a DAS client (now also a DAS server)
  • Code: Dazzle and Proserver servers; Bio::DASLite and biojava client libraries
  • Hosts DAS registry
DAS in a nutshell
• Standardized set of web services
  • Reference servers (the sequence)
  • Annotation servers (features: chr:start-end)
  • Alignment servers (chr:start-end matches chr:start-end)
  • Identifier based servers (reference item X rather than a coordinate)
• Standardization allows clients to connect to different DAS sources without additional programming
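As a rough illustration of what "standardized web services" means for a client, the sketch below builds a DAS 1.x-style features request for a chr:start-end region and parses a DASGFF-style reply. The host das.example.org, the source name "demo" and the embedded sample reply are assumptions made up for this example, not taken from the slides; a real source may return more elements per feature.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def das_features_url(server, source, ref, start, end):
    """Build a DAS 1.x-style 'features' request for the region ref:start,end."""
    segment = f"{ref}:{start},{end}"
    return f"{server.rstrip('/')}/das/{quote(source)}/features?segment={segment}"


# Hypothetical, hand-written reply in the DASGFF layout (illustrative only).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="exon1" label="Exon 1">
        <TYPE id="exon">exon</TYPE>
        <START>1100</START>
        <END>1250</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return one dict per FEATURE element found in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "ref": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE", default="").strip(),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features
```

Because every annotation server answers the same request with the same document layout, the two functions above should work unchanged against a different source; only the server and source arguments change.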
Data integration
• Complete genomes provide the framework to pull all biological data together such that each piece says something about biology as a whole
• Biology is too complex for any organisation to have a monopoly of ideas or data
• The more organisations provide data or analysis separately, the harder it becomes for anyone to make use of the results
Utility of bioinformatics
[Figure. Labels: Scientific impact; Too little bioinformatics; Too many databases; Too diverse interfaces]
Split data and presentation
• Databases are responsible for curating data and serving it as primitive datatypes defined by open standards (high cost)
• Different front ends, or components of front ends, compete for users (each is cheap to develop), cf. web browsers
Campus DAS systems
[Diagram. Servers: Dazzle, Proserver, LDAS. Other labels: Clients; Sources; Registry; e! contigview; e! geneview; epigenome; Apollo; otterlace; 3D structure; Ensembl; Pfam; Swissprot; PubMed; Genome Coordinates; CDS Coordinates; Protein Coordinates; Stable Identifiers; Sequence Alignments]
DAS infrastructure status
• Lots of progress
  • Servers: Dazzle, Proserver, Bio::Daslite
  • Clients: Ensembl, Vega, Dasty, SPICE, Pfam, Jalview, Pepper, IGB
  • >200 sources in DAS registry (http://www.dasregistry.org/)
  • Broadly adopted by Ensembl, biosapiens, efamily, ZF-models, eProtein
• Lots still to do…
  • Slow adoption rate, particularly in US: upload still easier than distributed…
  • Lack of searching, write back: slow development of DAS2
  • Encourage/facilitate programming against DAS servers
• Opportunities
  • Source ranking, credit, social networking
  • Inter-client communications protocol
  • Async delivery/caching; servers built on servers/workflows
  • Alternative entry points from servers? Next left/right? Date of addition?
New synteny-aware vertebrate curation environment, based on a rewrite of acedb (zmap)
Servers of data derived from other servers, tracing back evidence
[Diagram. Labels: Consensus Annotation; Assembly; DAS viewer; Annotation]
Acknowledgements
• Ewan Birney, Tony Cox, Thomas Down, Rob Finn, Stefan Graf, David Jackson, Andreas Kahari, Eugene Kulesha, Roger Pettett, Matt Pocock, Andreas Prlic, James Smith, Jim Stalker
• Ensembl/Sanger Web team
• efamily, biosapiens, eProtein
• Zebrafish analysis (ZF-models)
• Anacode/Acedb (otterlace/Zmap)
Coordinate Synchronisation
[Diagram: Distributed Annotation. Labels: Server; Sequence; Programs; Annotation; Viewer; External Contributors; Database providers; Users; html; xml]
Hubbard & Birney, Open annotation offers a democratic solution to genome sequencing (2000) Nature, 403, 825.
BioJava DAS implementation
[Diagram. Labels: WWW browser; Ensembl MySQL Database; Ensembl WWW server; http; BioJava DAS viewer; Apollo viewer/editor; BioJava DAS client library; DASGFF (http); XFF; Dazzle BioJava DAS server; Data Adaptor; AceDB; GFF files; Local GFF files]
Data from DAS servers integrated into web displays
[Diagram. Labels: WWW browser; Ensembl MySQL Database; Ensembl WWW server; http; BioJava DAS client library; Dazzle BioJava DAS server; Data Adaptor; AceDB; GFF files]
DAS v Web
• Web model (links): different web sites, different interfaces, no integration
• DAS model: different DAS sites, automatic integration, single interface
[Diagram: several DAS servers feeding one viewer]
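The "single interface" idea can be sketched as a client-side merge: features already fetched from several DAS sources are filtered to one region and combined into one ordered list. The Feature class, the integrate function and the source names used with them are hypothetical names chosen for this sketch; real clients such as Ensembl do considerably more (styling, caching, coordinate system checks).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str   # name of the DAS source the feature came from
    ref: str      # reference sequence, e.g. a chromosome name
    start: int
    end: int
    label: str


def integrate(sources, ref, start, end):
    """Merge features from every source that overlap ref:start-end.

    `sources` maps a source name to a list of Feature objects.
    """
    merged = [
        feature
        for features in sources.values()
        for feature in features
        if feature.ref == ref and feature.start <= end and feature.end >= start
    ]
    # One coordinate-ordered list, regardless of where each feature came from.
    return sorted(merged, key=lambda f: (f.start, f.end, f.source))
```

Adding another source means adding one more entry to the mapping; the viewer code that consumes the merged list does not change.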
Distributed Annotation System
• XML client/server specification (http://biodas.org/)
  • Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier
  • acedb based prototype server
  • Java based prototype client
  • Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.
• Ensembl (http://www.ensembl.org/das/)
  • das mailing list
  • server/client combination available (alpha release)
  • Based on BioJava, with BioJava viewer
  • Interface to Apollo, as an alternative viewer
Data integration with Ensembl
[Screenshot: external data from DAS sources. Labels: User data (upload from flat file); NCBI data (DAS server)]
Virtual data integration
[Screenshot: all data from DAS sources. Labels: User data; Vega genes; Ensembl]
DAS-like model applied to other data types
• Features on a linear sequence
  • DNA, protein sequences, protein structures
  • Campus-wide MRC 'grid' protein family integration project (SCOP, CATH, Pfam, InterPro, MSD) will develop DAS for protein structures
• Annotation connected to stable identifiers
  • References, experimental observations
  • Sanger notebook, attached to genes
• Group relationships between identifiers
  • Protein-protein interactions; protein families, orthologues
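The second and third cases above (annotation attached to stable identifiers, and group relationships between identifiers) can be pictured with a small in-memory model. This is only a sketch under assumptions: the class name, its methods and identifiers such as "human:geneA" are invented for illustration and are not part of any DAS specification.

```python
from collections import defaultdict


class IdentifierAnnotationStore:
    """Annotation keyed by stable identifier rather than by coordinate."""

    def __init__(self):
        self._notes = defaultdict(list)   # identifier -> list of notes
        self._groups = defaultdict(set)   # group name -> member identifiers

    def annotate(self, identifier, note):
        self._notes[identifier].append(note)

    def group(self, name, identifiers):
        """Record a relationship (e.g. an orthologue set) between identifiers."""
        self._groups[name].update(identifiers)

    def related(self, identifier):
        """Identifiers that share at least one group with `identifier`."""
        found = set()
        for members in self._groups.values():
            if identifier in members:
                found |= members
        found.discard(identifier)
        return sorted(found)

    def lookup(self, identifier, include_related=False):
        """Notes for an identifier, optionally with notes on related identifiers."""
        ids = [identifier]
        if include_related:
            ids += self.related(identifier)
        return {i: list(self._notes[i]) for i in ids if i in self._notes}
```

With include_related=True, a note attached to one member of a group becomes visible from any other member, which is roughly what a page combining annotation across orthologues needs.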
Data from DAS servers integrated into web displays
[Diagram, three builds. Common labels: Data mapped to Genome Sequence; Ensembl MySQL Database; Dazzle BioJava DAS server; Sanger]
• Build 1: Upload to Sanger DAS server, or set up a local DAS server and load data into it; Ensembl WWW server; WWW browser
• Build 2: Set up a local DAS server and load data into it; Ensembl WWW server; WWW browser
• Build 3: Set up a local DAS server and load data into it; virtual server using Ensembl WWW code; custom WWW views
Orthologueview pages
[Diagram: DAS annotation from other research projects. Labels: Human ENSGxxx; Mouse ENSMUSGxxx; Zebrafish; Worm?; Yeast?; OTTOxxxxx1]
Identifier Synchronisation
[Diagram: Distributed Annotation. Labels: Server; xml; 2D; Viewer; Users; External Contributors; Database providers]
Component models
• Do one thing, but do it well
• Would rely on databases providing public APIs to components of their services
• Interoperability: standardised return format (e.g. XML) as well as standardised query interface
• Example: OpenDoc
  • Apple's attempt to split desktop applications into components, which users would mix and match. It would have allowed competition at component level. Failed. (Microsoft? Poor implementation?)
Database apoptosis
• Software developers think nothing of rewriting software and throwing the old version away
• More features mean more complexity and more confusion (different, incompatible ways of getting the same or a worse result)
• Retire a feature if another database does it better and it can be used as a component?
Solution 3: integrate using DAS
• Many Ensembl web views are DAS clients
• Whole of Ensembl is a DAS server (from release 38)
• Ensembl site integrated with other DAS clients (e.g. SPICE for protein structure)
Integration using DAS
• Whole of Ensembl is a DAS server (from release 38)
• Viewing Ensembl annotation on PDB
• SPICE DAS client linked to contigview