
BioHaystack: Gateway to the Biological Semantic Web

BioHaystack is an integrated environment for working with biological data. It addresses a core problem in bioinformatics: information about a single biological object is scattered across public databases that each have their own access protocols, data formats, and naming conventions. BioHaystack lets users browse this highly interconnected information in one place, personalize their workspaces, and collaborate with other scientists. It is built on LSID, RDF, and Haystack technologies and is intended for end users rather than programmers.


Presentation Transcript


  1. BioHaystack: Gateway to the Biological Semantic Web • Dennis Quan, dennisq@us.ibm.com

  2. Problems in bioinformatics • A myriad of public databases each hold specific facets of information about biological objects of interest (e.g., proteins, genes, etc.) • Databases have their own access protocols, data formats, naming conventions, and means of describing relationships between objects in different databases • Different software is required to view information from different databases • User must be keenly aware of which tool or site to use • Relevant information comes in fragments • Exploration process is discontinuous

  3. A common naming convention: LSID URNs • Life Sciences Identifiers (LSIDs) are URNs for biological objects that are backed by RDF metadata: • E.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240 • LSID and LSID protocol (SOAP-based) specification sponsored by I3C and undergoing standardization by OMG • Most publicly available bioinformatics databases are accessible via LSID today • PDB LSID authority online; “proxy” LSID authorities for databases such as the NIH databases and SwissProt are hosted by I3C • Really easy to set up LSID clients and servers • IBM Internet Technology group provides Open Source LSID client and server software for a variety of languages and platforms
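To make the naming convention concrete, here is a minimal sketch that splits an LSID URN into its colon-separated parts (authority, namespace, object, and an optional revision). The `LSID` type and `parse_lsid` function are hypothetical names introduced for illustration; they are not part of the IBM LSID client software, and the sketch does no resolution against an LSID authority.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of an LSID URN (hypothetical helper type, for illustration only)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]' into parts."""
    parts = urn.split(":")
    # Expect the two fixed prefixes plus three required parts and one optional part.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID URN: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {urn!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


# The example identifier from the slide.
example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240")
print(example.authority, example.namespace, example.object_id)
```

The authority part names the server responsible for the identifier, which is how a "proxy" authority such as lsid.i3c.org can front a database it does not own.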

  4. RDF/XML: on demand data integration • [Diagram: three sources each contribute one RDF statement about the same subject, the human hemoglobin LSID] • GenBank: human hemoglobin LSID "derives from" atagccgtacctgcgagtctagaagct • Gene Ontology: human hemoglobin LSID "is a" oxygen transport protein • PDB: human hemoglobin LSID "has 3D structure" • Combined (+), the statements form a unified view
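The integration shown on this slide can be sketched as a set union over subject-predicate-object triples: because every source describes the same LSID, merging the statements needs no schema mapping. This is a plain-Python illustration, not the RDF/XML machinery BioHaystack uses. The subject URN, the predicate spellings, and the PDB placeholder value are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical LSID standing in for "human hemoglobin LSID" on the slide.
HEMOGLOBIN = "urn:lsid:example.org:proteins:human_hemoglobin"

# One statement per source, as (subject, predicate, object) triples.
genbank = {(HEMOGLOBIN, "derivesFrom", "atagccgtacctgcgagtctagaagct")}
gene_ontology = {(HEMOGLOBIN, "isA", "oxygen transport protein")}
# The slide does not show the structure itself, so the object is a placeholder.
pdb = {(HEMOGLOBIN, "has3DStructure", "<3D structure record>")}


def merge(*sources):
    """Merge triple sets; identical statements collapse into one."""
    return set().union(*sources)


def describe(graph, subject):
    """Collect every statement about one subject, grouped by predicate."""
    facets = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            facets[p].append(o)
    return {p: sorted(objs) for p, objs in facets.items()}


unified = merge(genbank, gene_ontology, pdb)
for predicate, objects in sorted(describe(unified, HEMOGLOBIN).items()):
    print(predicate, "->", objects)
```

Sharing one identifier across sources is what makes the merge trivial; without a common name, each database's record would first have to be matched to the others.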

  5. Haystack: letting users interact with their data • Haystack is a tool for creating, exploring, and organizing information: • Personal information: e-mails, contacts, documents, etc. • Bioinformatics: proteins, publications, genes, etc. • Research project originating from MIT CSAIL • Uses RDF as an underlying data model • Built on Java and Eclipse, IBM’s Open Source rich client platform http://haystack.lcs.mit.edu/

  6. Browsing highly interconnected information • Single screen presents multiple facets of a single object originating from separate databases • Users navigate space like a Web browser: hyperlinking, drag and drop, etc.

  7. Personalization • People keep track of their information by personalizing their workspaces: • Grouping paperwork into folders • Highlighting important text in documents • Attaching sticky notes as reminders • Jotting down lists of related items • Haystack has pervasive support for annotation and allows users to group related objects together arbitrarily for their own purposes

  8. BioHaystack • BioHaystack: application of Haystack technologies to bioinformatics problems • Integrated environment for working with biological data • Intended for end users, i.e., non-programmers • Builds on LSID, RDF, and Haystack • Integration offers the promise of lowering barriers to access to different backend systems (e.g., LSID servers, Grids, Web Services, relational databases, annotation servers) • Just as the Web browser acts as a client for Web content, BioHaystack can act as a client for biological Semantic content and services

  9. Real world collaboration: myGrid • UK-funded joint project with the University of Manchester and other UK research institutions • RDF-based platform for supporting e-Science experiments • Real use cases; developed in collaboration with bioinformaticians • myGrid creates LSIDs and RDF metadata in the process of enacting experiments for scientists • Using BioHaystack as a browser for metadata

  10. myGrid Architecture • [Architecture diagram; labels shown: Bioinformaticians, Scientists, Taverna WF Builder, FreeFluo Enactor, Workflow Execution, Registry, Query & register, Query & Retrieve, Discovery View, Annotation/description, Annotation providers, Pedro Annotation tool, Interface Description, Store data/knowledge, mIR, Haystack Provenance Browser, Vocabulary, Ontology Store, Data descriptions, invoking, Service Providers, WSDL, Soaplab, Others] • Courtesy of Professor Carole Goble, University of Manchester

  11. BioHaystack + myGrid Courtesy of Professor Carole Goble, University of Manchester

  12. Thank you for your attention • Dennis Quan, dennisq@us.ibm.com (IBM Watson Research) • Haystack project home page (download available May 24) • http://haystack.lcs.mit.edu/ • IBM LSID home page • http://www.ibm.com/developerworks/oss/lsid/ • myGrid home page • http://www.mygrid.org.uk/ • See also our session on constructing Haystack applications: • Developer’s Day, Saturday, 4:30pm
