BioMOBY: An architecture for interoperability
Benjamin Good
Wilkinson Laboratory, iCAPTURE Centre, University of British Columbia
Acknowledgements
• Mark Wilkinson, Edward Kawas, Nina Opushneva – iCAPTURE @ UBC
• Phillip Lord, Martin Senger – myGrid @ U Manchester
• Heiko Schoof, Rebecca Ernst – MIPS
• Paul Gordon – University of Calgary
• Carole Goble – myGrid @ U Manchester
• Lincoln Stein – CSHL
• Damian Gessler, Andrew Farmer, Gary Schiltz – NCGR
• Bill Crosby, Matthew Links, Luke McCarthy – U of S
• Midori Harris – EBI & GO Consortium
• Mike Niemi – IBM
• Fiona Cunningham, Shuly Avraham – CSHL
• Ken Stuebe – SDSC
• Richard Bruskiewich – IRRI
Outline
• What BioMOBY is
• Why it was needed
• How it works
• Current status
• Works in progress
What BioMOBY is
A generic solution for sharing distributed computational resources
Why it was/is needed
High-throughput biology
[Figure: animated build. A field of identical SGD databases is progressively replaced by TAIR, MIPS, IRRI, Gramene, GO and IPGRI, ending in "?!?!?"]
An Architecture for Dis-Integration?
[Figure labels: DB1, Program, DB2]
Web Services: Another architecture for Dis-Integration?
[Figure labels: API1, API2, API3; WuBlast, Genbank, NCI]
BioMOBY: An architecture for Integration
[Figure labels: Program, DB2, DB1]
Note the Target Audience
• Not NCBI
• Small to medium-sized resource providers
  • First priority is to support their own users
  • Limited time and money
• Makes certain options impossible
  • No massive data warehouse
  • No standardization of implementation (database, programming language)
The Moby plan
• Design an ontological framework for data-type creation
• Let independent service providers build data-types using this framework
• Use these data-types to define web service interfaces
• Register these interfaces in a “yellow pages”
• Machines can find an appropriate service
• Machines can execute that service unattended
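
The register-then-discover steps of the plan can be illustrated with a toy, in-memory "yellow pages". This is only a sketch under stated assumptions: the names `ServiceInterface`, `YellowPages`, `register` and `find_by_input`, the example services and the authorities are all hypothetical, and none of this is the MOBY Central API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceInterface:
    """A service described only by the data-types it consumes and produces."""
    name: str
    authority: str      # who registered the service (hypothetical values below)
    input_type: str     # name of a data-type from the shared ontology
    output_type: str


@dataclass
class YellowPages:
    """Toy registry: stores interfaces and answers discovery queries."""
    services: list = field(default_factory=list)

    def register(self, service: ServiceInterface) -> None:
        self.services.append(service)

    def find_by_input(self, data_type: str) -> list:
        # Exact-name matching only; a fuller registry could also use the ontology.
        return [s for s in self.services if s.input_type == data_type]


registry = YellowPages()
registry.register(ServiceInterface("getSequence", "example.org", "Object", "DNASequence"))
registry.register(ServiceInterface("alignSequences", "example.net", "DNASequence", "SequenceAlignment"))

# A client holding a DNASequence asks which services can consume it.
matches = registry.find_by_input("DNASequence")
print([s.name for s in matches])
```

Because interfaces are described purely by shared data-type names, a client needs no prior knowledge of a provider to discover what it can do with the data it holds.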
Object Ontology
• Data types defined in an open, shared GO-like ontology
• Nodes define data Classes
• Edges define the relationships between Classes
• Each edge defines one of three relationships
  • ISA: inheritance relationship; all properties of the parent are present in the child
  • HASA: container relationship of ‘exactly 1’
  • HAS: container relationship with ‘1 or more’
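
The three edge types can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the BioMOBY implementation: `DataClass`, `define`, `lineage`, `is_a` and `members` are hypothetical helpers, the sequence type names are borrowed from the data-type examples elsewhere in this deck, and `SequenceCollection` is invented purely to show a HAS edge.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataClass:
    name: str
    isa: Optional[str] = None                  # single parent (ISA edge)
    hasa: dict = field(default_factory=dict)   # articleName -> class, exactly 1
    has: dict = field(default_factory=dict)    # articleName -> class, 1 or more


ONTOLOGY = {}


def define(name, isa=None, hasa=None, has=None):
    ONTOLOGY[name] = DataClass(name, isa, dict(hasa or {}), dict(has or {}))


def lineage(name):
    """Return the class followed by its ancestors, nearest parent first."""
    chain = []
    current = name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].isa
    return chain


def is_a(name, ancestor):
    return ancestor in lineage(name)


def members(name):
    """Collect contained members, including those inherited through ISA."""
    collected = {}
    for class_name in reversed(lineage(name)):  # root first, so children extend parents
        data_class = ONTOLOGY[class_name]
        for article, member_type in data_class.hasa.items():
            collected[article] = ("HASA", member_type)
        for article, member_type in data_class.has.items():
            collected[article] = ("HAS", member_type)
    return collected


define("Object")
define("Integer", isa="Object")
define("String", isa="Object")
define("VirtualSequence", isa="Object", hasa={"length": "Integer"})
define("GenericSequence", isa="VirtualSequence", hasa={"SequenceString": "String"})
define("DNASequence", isa="GenericSequence")
# Hypothetical class, used only to demonstrate a HAS ('1 or more') edge.
define("SequenceCollection", isa="Object", has={"sequences": "GenericSequence"})

print(lineage("DNASequence"))
print(members("DNASequence"))
```

Walking the ISA chain from the root downwards is what makes "all properties of the parent are present in the child" hold: `DNASequence` declares no members of its own, yet ends up with both `length` and `SequenceString`.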
Data-typing is the key
• Each Object in the ontology maps to a simple, concise XML Schema
• This rigid yet easily extensible structure facilitates serialization and parsing in any language
• Sharing a framework for creating data-types turns out to be largely sufficient to achieve interoperability
The Simplest Data-Type: Object
  <Object namespace="NCBI_gi" id="111076"/>
The combination of a namespace and an identifier within that namespace uniquely identifies a data ‘entity’ (not its representation).
MOBY Primitives
• Integer, Float, String and DateTime each ISA Object
  <Integer namespace="" id="">38</Integer>
A MOBY Data-Type
  <VirtualSequence namespace="NCBI_gi" id="111076">
    <Integer namespace="" id="" articleName="length">38</Integer>
  </VirtualSequence>
[Diagram: Integer, String and VirtualSequence each ISA Object; VirtualSequence HASA Integer]
A MOBY Data-Type
  <GenericSequence namespace="NCBI_gi" id="111076">
    <Integer namespace="" id="" articleName="length">38</Integer>
    <String namespace="" id="" articleName="SequenceString">
      ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
    </String>
  </GenericSequence>
[Diagram: as before, plus GenericSequence ISA VirtualSequence; GenericSequence HASA String]
A MOBY Data-Type
  <DNASequence namespace="NCBI_gi" id="111076">
    <Integer namespace="" id="" articleName="length">38</Integer>
    <String namespace="" id="" articleName="SequenceString">
      ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
    </String>
  </DNASequence>
[Diagram: as before, plus DNASequence ISA GenericSequence]
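
To illustrate how little code such a data-type needs, here is a minimal sketch that builds the DNASequence element with Python's standard `xml.etree.ElementTree` and parses it back. The helper `moby_element` is hypothetical, the sketch covers only the data-type fragment shown on the slides (not a complete service message), and the length value is computed from the sequence string rather than copied from the slide.

```python
import xml.etree.ElementTree as ET


def moby_element(class_name, namespace="", id_="", article_name=None, text=None):
    """Build one element carrying the namespace/id pair used on the slides."""
    attributes = {"namespace": namespace, "id": id_}
    if article_name is not None:
        attributes["articleName"] = article_name
    element = ET.Element(class_name, attributes)
    element.text = text
    return element


sequence = "ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC"

# DNASequence inherits its two members (length, SequenceString) via ISA.
dna = moby_element("DNASequence", namespace="NCBI_gi", id_="111076")
dna.append(moby_element("Integer", article_name="length", text=str(len(sequence))))
dna.append(moby_element("String", article_name="SequenceString", text=sequence))

xml_text = ET.tostring(dna, encoding="unicode")
print(xml_text)

# Parsing is the mirror image: read the tag, the identity, and the named members.
parsed = ET.fromstring(xml_text)
identity = (parsed.get("namespace"), parsed.get("id"))
members = {child.get("articleName"): child.text for child in parsed}
print(parsed.tag, identity, members["length"])
```

The same two steps (attributes for identity, named child elements for members) apply to every class in the ontology, which is why the structure is straightforward to serialize and parse in other languages too.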
A portion of the MOBY-S Object Ontology
… community-built! 170 registered by 34 authorities
MOBY-S follows the typical Web Service Paradigm
[Figure labels: MOBY hosts & services: Sequence, Express., Protein, Alleles, …; MOBY Central (yellow pages): Align, Phylogeny, Primers, Sequence Alignment, Gene names]
Moby Stats
• Mailing list: 162 members
• Google Scholar
  • ‘BioMOBY’: 103
  • Citations of original BioMOBY paper: 52
• Google links to biomoby.org: 322
Deployed Moby Services
• Services registered: 478
• Service developers (by contact email): 69
[Figure from http://castor.brc.mcw.edu/files/mobysphere/; legend: > 10, < 10. Thanks to Simon Twigger]
Major Implementations
• PlaNet consortium
  • European consortium of plant databases
  • 121 services
• European Bioinformatics Institute: SOAPLab, myGrid
• National Bioinformatics Institute of Spain
  • Nationwide initiative
  • 35 public services (plus many more on a private registry)
It seems to be working! Why?
• It provides useful functionality for the target audience
  • Functionality not currently available from any other WS/SWS project
• It is not difficult to deploy services
Is it useful outside of these consortia?
• Many public services are now available (via passive altruism)
• As a result, interesting clients are emerging
Client style 1, 2, 3
• Power User: when you want to do what you already know how to do
• Taverna
  • Produced by the myGrid Consortium
  • Graphical workflow composer and invoker
  • Supports BioMOBY services (and many others)
Client style 1, 2, 3
• Quick and Dirty: you know what you have and what you want, but you don’t know how to make it happen
• MobyGraphs
  • Martin Senger of myGrid
  • Discovers service connectivity between two datatypes
• PlaNet Service Aggregator
  • Precomputes all possible workflows starting from a single input
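
Finding a chain of services between two datatypes is, at heart, a path search over a graph whose nodes are datatypes and whose edges are services. The sketch below uses a generic breadth-first search to show the idea; it is not a description of how MobyGraphs or the PlaNet Service Aggregator are implemented, and the service names and most datatype names in `SERVICES` are made up for illustration.

```python
from collections import deque

# (service name, input datatype, output datatype); all entries are hypothetical.
SERVICES = [
    ("getSequence", "Object", "DNASequence"),
    ("translate", "DNASequence", "AminoAcidSequence"),
    ("findPrimers", "DNASequence", "PrimerSet"),
    ("blastProtein", "AminoAcidSequence", "BlastReport"),
]


def find_workflow(services, source, target):
    """Return the shortest list of service names turning source into target, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        data_type, path = queue.popleft()
        if data_type == target:
            return path
        for name, input_type, output_type in services:
            if input_type == data_type and output_type not in seen:
                seen.add(output_type)
                queue.append((output_type, path + [name]))
    return None  # no chain of registered services connects the two datatypes


print(find_workflow(SERVICES, "Object", "BlastReport"))
print(find_workflow(SERVICES, "PrimerSet", "Object"))
```

Breadth-first search returns a shortest chain first; enumerating every workflow reachable from a single input, as the aggregator bullet describes, would instead continue the traversal and record each path found.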
Client style 1, 2, 3
• Exploration Mode
• Gbrowse_moby
• Ahab
[Figure label: Starting Data]
Ahab
• Java Server Pages
• Simultaneous service invocations
• Session stored as RDF graph
• Results displayed with clickable graph
• 0_1: runs all possible services
• 0_2: gives user control
http://bioinfo.icapture.ubc.ca/bgood/Ahab.html
Core Development
• Make service development even easier
• Expand myGrid collaboration
  • Migrate to their registry & service ontology
  • Enhance support for BioMOBY in Taverna
    • Validation of workflows
    • Workflow construction “wizards”
• Continue development of Ahab
  • Visualization
Conclusions
• BioMOBY was designed to allow distributed communities to share their computational resources, and it seems to be working
• Many new opportunities for real distributed data integration are starting to appear
Sponsors
• BC Bioinformatics Training Program
• BioMOBY
  • National Science Foundation (NSF), USA
  • Canadian Bioinformatics Resource, NRC, Halifax
  • Open-Bio Foundation
  • IBM