BioHaystack: Gateway to the Biological Semantic Web • Dennis Quan, dennisq@us.ibm.com
Problems in bioinformatics • Myriad of public databases have specific facets of information about biological objects of interest (e.g., proteins, genes, etc.) • Databases have their own access protocols, data formats, naming conventions, and means of describing relationships between objects in different databases • Different software required to view information from different databases • User must be keenly aware of which tool or site to use • Relevant information comes in fragments • Exploration process is discontinuous
A common naming convention: LSID URNs • Life Sciences Identifiers (LSIDs) are URNs for biological objects that are backed by RDF metadata: • E.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240 • LSID and LSID protocol (SOAP-based) specification sponsored by I3C and undergoing standardization by OMG • Most of the publicly available bioinformatics databases are available via LSID today • PDB LSID authority online; “proxy” LSID authorities, hosted by I3C, for databases such as the NIH databases and SwissProt • Really easy to set up LSID clients and servers • IBM Internet Technology group provides Open Source LSID client and server software for a variety of languages and platforms
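To make the naming convention concrete, here is a minimal sketch of splitting an LSID URN into its parts. It assumes the usual LSID layout of urn:lsid:authority:namespace:object with an optional trailing revision; the names Lsid and parse_lsid are hypothetical and are not taken from the IBM LSID toolkit, and resolving the identifier over the SOAP-based protocol is not shown.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID URN (hypothetical helper type for illustration)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(urn: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its fields."""
    parts = urn.split(":")
    # An LSID has five colon-separated fields, or six when a revision is given.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID URN: {urn!r}")
    # The 'urn' and 'lsid' prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID URN has an empty field: {urn!r}")
    # Treat a missing or empty sixth field as "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority, namespace, object_id, revision)


if __name__ == "__main__":
    # The example identifier from the slide.
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240"))
```

For the slide's example, the authority is the I3C proxy host ncbi.nlm.nih.gov.lsid.i3c.org, the namespace is genbank, and the object is nm_001240.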
RDF/XML: on demand data integration • GenBank: human hemoglobin LSID, derives from, atagccgtacctgcgagtctagaagct • Gene Ontology: human hemoglobin LSID, is a, oxygen transport protein • PDB: human hemoglobin LSID, has 3D structure • Unified view: because all three sources describe the same LSID, their statements combine into a single graph
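The integration idea on this slide can be sketched with plain subject/predicate/object triples. This is an illustration only, not BioHaystack's implementation: the LSID, the predicate names, and the PDB placeholder value are all hypothetical, and a real client would parse RDF/XML returned by each LSID authority instead of using hard-coded sets.

```python
from collections import defaultdict

# Hypothetical LSID standing in for "human hemoglobin LSID" on the slide.
HEMOGLOBIN = "urn:lsid:example.org:proteins:human_hemoglobin"

# One small set of triples per source database (predicate names are made up).
genbank = {(HEMOGLOBIN, "derivesFrom", "atagccgtacctgcgagtctagaagct")}
gene_ontology = {(HEMOGLOBIN, "isA", "oxygen transport protein")}
pdb = {(HEMOGLOBIN, "has3DStructure", "<3D structure record>")}  # placeholder


def merge(*graphs):
    """Union the triple sets; statements about the same LSID line up by subject."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every property of one subject into a predicate -> values mapping."""
    view = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            view[p].add(o)
    return dict(view)


if __name__ == "__main__":
    unified = merge(genbank, gene_ontology, pdb)
    for predicate, values in sorted(describe(unified, HEMOGLOBIN).items()):
        print(predicate, sorted(values))
```

The merge is a plain set union: no schema mapping is needed because every source names the object with the same identifier, which is the role the shared LSID plays here.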
Haystack: letting users interact with their data • Haystack is a tool for creating, exploring, and organizing information: • Personal information: e-mails, contacts, documents, etc. • Bioinformatics: proteins, publications, genes, etc. • Research project originating from MIT CSAIL • Uses RDF as an underlying data model • Built on Java and Eclipse, IBM’s Open Source rich client platform http://haystack.lcs.mit.edu/
Browsing highly interconnected information • Single screen presents multiple facets of a single object originating from separate databases • Users navigate space like a Web browser: hyperlinking, drag and drop, etc.
Personalization • People keep track of their information by personalizing their workspaces: • Grouping paperwork into folders • Highlighting important text in documents • Attaching sticky notes as reminders • Jotting down lists of related items • Haystack has pervasive support for annotation and allows users to group related objects together arbitrarily for their own purposes
BioHaystack • BioHaystack: application of Haystack technologies to bioinformatics problems • Integrated environment for working with biological data • Intended for end users, i.e., non-programmers • Builds on LSID, RDF, and Haystack • Integration offers the promise of lowering barriers to access to different backend systems (e.g., LSID servers, Grids, Web Services, relational databases, annotation servers) • Just as the Web browser acts as a client for Web content, BioHaystack can act as a client for biological Semantic Web content and services
Real world collaboration: myGrid • UK-funded joint project with the University of Manchester and other UK research institutions • RDF-based platform for supporting e-Science experiments • Real use cases; developed in collaboration with bioinformaticians • myGrid creates LSIDs and RDF metadata in the process of enacting experiments for scientists • Using BioHaystack as a browser for metadata
myGrid Architecture • [Architecture diagram. Labels shown: Bioinformaticians, Scientists, Taverna WF Builder, FreeFluo Enactor, Pedro Annotation tool, Haystack Provenance Browser, Registry, mIR, Ontology Store, Vocabulary, Soaplab, WSDL, Service Providers, Annotation providers, Others; Workflow Execution, Discovery View, Query & Retrieve, Query & register, Annotation/description, Interface Description, Data descriptions, Store data/knowledge, invoking] • Courtesy of Professor Carole Goble, University of Manchester
BioHaystack + myGrid • Courtesy of Professor Carole Goble, University of Manchester
Thank you for your attention • Dennis Quan, dennisq@us.ibm.com (IBM Watson Research) • Haystack project home page (download available May 24) • http://haystack.lcs.mit.edu/ • IBM LSID home page • http://www.ibm.com/developerworks/oss/lsid/ • myGrid home page • http://www.mygrid.org.uk/ • See also our session on constructing Haystack applications: • Developer’s Day, Saturday, 4:30pm