EnVisioning Data Integration

EnVisioning Data Integration. SME forum 2009, Vienna. Henning Hermjakob, hhe@ebi.ac.uk.

Presentation Transcript


  1. EnVisioning Data Integration. SME forum 2009, Vienna. Henning Hermjakob, hhe@ebi.ac.uk

  2. EnCore Enfin Experiment Model

  3. Use cases
     • Target user group: bioinformaticians, programmatic access
     • Simple
        • Set of “interesting” Affymetrix IDs
        • Get the relevant UniProt accession numbers
        • Get the surrounding interaction networks from IntAct
     • A bit more
        • Set of differentially expressed proteins in PRIDE
        • Find experiments with a “similar” set of regulated genes
        • Get Reactome pathways
        • Expand the protein set with IntAct, then get Reactome pathways
     • Even more: EnVision

  4. Red edges: Bouwmeester et al., 2005. Green edges: Rual et al., 2005. Violet edges: Stelzl et al., 2005.

  5. Infrastructure
     • Shallow integration
        • easy addition of resources
        • independent resources
        • minimal centralisation
        • easier to maintain
        • very flexible
     • Common Service Interface
        • established standards
        • well-defined schema

  6. Diverse web service world
     Diagram: external services (database access, analysis tools) reached via SOAP, XML, REST, CSV, plain text, Perl API, Java API.
     • Multiple manual connections, possibly with multiple technologies
     • Multiple result files which have to be combined manually
     • Difficult to keep an audit trail
     • Much work to reproduce

  7. EnCORE
     Diagram: External services (heterogeneous external world); EnCORE services (standardised EnCORE world), connected by Enfin XML; EnVISION (user interface & representation).
     • Single entry point
     • One technology
     • No manual combination of results
     • Audit trail built in
     • Visualisation built in
     • Easy to reproduce

  8. enXml – the EnCORE data exchange format (ENFIN XML)
     • XML schema
        • standard interface to services
        • simple and easy to understand structure
        • generic to allow various data types
        • stores service results and keeps an audit trail
     • minimal restrictions for data representation
        • high degree of freedom modelling user data
        • need for modelling guidelines to ensure service interoperability

  9. enXml document graph (sources: EnsMart, IntAct)
     • Molecules: 1993_s_at, BRCA1, BRAP, Q5ST83, H2AFX
     • Experiments: start, toUniProt, ppiExpand
     • Sets: s2, s12, s26, s27, s28, s29

  10. enXml document graph with source relation (sources: EnsMart, IntAct)
     • Molecules: 1993_s_at, BRCA1, BRAP, Q5ST83, H2AFX
     • Experiments: start, toUniProt, ppiExpand
     • Sets: s2, s12, s26, s27, s28, s29
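
The document graph on slides 9 and 10 links sets of molecules through the experiments (service calls) that produced them, which is what makes the audit trail possible. The following is a minimal Python sketch of that idea, not the enXml schema itself. The class names, the `add_step` and `audit_trail` helpers, and the assignment of molecules to sets are all assumptions made for illustration; the transcript does not preserve the edges of the original figure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Experiment:
    """One processing step: which sets went in and which came out."""
    name: str
    inputs: list[str]
    results: list[str]


@dataclass
class Document:
    """Hypothetical in-memory stand-in for an enXml document."""
    sets: dict[str, list[str]] = field(default_factory=dict)
    experiments: list[Experiment] = field(default_factory=list)

    def add_step(self, name, inputs, results):
        """Store the result sets and record the step that produced them."""
        for set_id, molecules in results.items():
            self.sets[set_id] = list(molecules)
        self.experiments.append(Experiment(name, list(inputs), list(results)))

    def audit_trail(self, set_id):
        """Walk back from a set to the first step, following the first input only."""
        trail = []
        seen = set()
        current = set_id
        while current is not None and current not in seen:
            seen.add(current)
            producer = next(
                (e for e in self.experiments if current in e.results), None
            )
            if producer is None:
                break
            trail.append(producer.name)
            current = producer.inputs[0] if producer.inputs else None
        return list(reversed(trail))


def build_example():
    """Set contents below are invented; only the labels come from the slide."""
    doc = Document()
    doc.add_step("start", [], {"s2": ["1993_s_at"]})
    doc.add_step("toUniProt", ["s2"], {"s12": ["Q5ST83"]})
    doc.add_step("ppiExpand", ["s12"], {"s26": ["Q5ST83", "H2AFX"]})
    return doc
```

Calling `build_example().audit_trail("s26")` walks back through `ppiExpand` and `toUniProt` to `start`, so the way a result set was produced can be recovered from the document alone.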

  11. Existing EnCore web services
     • AffyMetrix: probe set ID to protein ID mapping
     • ArrayExpress: micro array data
     • BioModels: search for biological models
     • CellMINT: protein localization information
     • g:GOSt: protein grouping, functional profiling
     • IntAct: protein interactions
     • KEGG pathway: pathway search
     • PICR: Protein Identifier Cross Reference
     • PRIDE: protein identification
     • Reactome: pathway search
     • UniProt: protein information retrieval
     • Utility: generation of ENFIN XML from protein IDs

  12. EnCORE
     Diagram: External services (heterogeneous external world); EnCORE services (standardised EnCORE world), connected by Enfin XML; EnVISION (user interface & representation).
     • Single entry point
     • One technology
     • No manual combination of results
     • Audit trail built in
     • Visualisation built in
     • Easy to reproduce

  13. Synchronous communication: the client calls the service, exchanging ENFIN XML
     • doService: performs service with standard parameters
     • doServiceAdv: performs service with custom parameters
     • doServiceTest: only echoes the input
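
The three synchronous operations can be illustrated with a small in-memory stand-in. This is only a sketch of the calling pattern: the class, the Python method names, the `max_results` parameter, and the "processing" it performs are hypothetical, and the real services are web services that exchange ENFIN XML rather than local objects.

```python
class InMemoryService:
    """Hypothetical stand-in for an EnCORE service; not the real endpoint."""

    # Assumed default parameter, invented for this sketch.
    default_params = {"max_results": 10}

    def _run(self, document, params):
        # Placeholder "processing": append a comment recording the parameters.
        settings = ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
        return f"{document}<!-- processed with {settings} -->"

    def do_service(self, document):
        """Mirror of doService: run with the standard parameters."""
        return self._run(document, self.default_params)

    def do_service_adv(self, document, params):
        """Mirror of doServiceAdv: run with caller-supplied parameters."""
        return self._run(document, {**self.default_params, **params})

    def do_service_test(self, document):
        """Mirror of doServiceTest: echo the input unchanged."""
        return document
```

In each case the client blocks until the call returns, which is why this pattern suits quick lookups better than long-running analyses.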

  14. Domaination
     • protein domain prediction tool, http://www.ibi.vu.nl/programs/domainationwww/
     • analysis tool, not only data retrieval service
     • possible long run times
     • sync communication inadequate
     • initiator for async communication model

  15. Asynchronous web services (client and service exchange ENFIN XML, a ticket number, and a status)
     • submit: doServiceAsync submits service with standard parameters and returns a job ticket
     • loop: getStatus reports the status of the job with the specified ticket
     • retrieve (if status OK): retrieveResult returns the result of the job with the specified ticket
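
The submit, poll, retrieve cycle on slide 15 can be sketched as a simple client loop. The service below is an in-memory stub written for this example. Only the three operation names and the "OK" status come from the slide; the "RUNNING" status, the `polls_until_done` knob, and the `run_job` helper are assumptions.

```python
import itertools
import time


class InMemoryAsyncService:
    """Hypothetical stub: a job reports OK after a fixed number of status polls."""

    def __init__(self, polls_until_done=2):
        self._tickets = itertools.count(1)
        self._jobs = {}
        self._polls_until_done = polls_until_done

    def do_service_async(self, document):
        """Mirror of doServiceAsync: accept the job and return a ticket."""
        ticket = next(self._tickets)
        self._jobs[ticket] = {"remaining": self._polls_until_done, "result": document}
        return ticket

    def get_status(self, ticket):
        """Mirror of getStatus: 'RUNNING' (assumed name) until the job is done."""
        job = self._jobs[ticket]
        if job["remaining"] > 0:
            job["remaining"] -= 1
            return "RUNNING"
        return "OK"

    def retrieve_result(self, ticket):
        """Mirror of retrieveResult: hand back the finished job's result."""
        if self._jobs[ticket]["remaining"] > 0:
            raise RuntimeError("job not finished")
        return self._jobs.pop(ticket)["result"]


def run_job(service, document, poll_interval=0.0, max_polls=100):
    """Submit a job, poll its status, and retrieve the result once it is OK."""
    ticket = service.do_service_async(document)
    for _ in range(max_polls):
        if service.get_status(ticket) == "OK":
            return service.retrieve_result(ticket)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {ticket} did not finish after {max_polls} polls")
```

Capping the number of polls keeps a client from waiting indefinitely on a job that never completes; a real client would likely choose a non-zero `poll_interval`.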

  16. EnCore use
     • Primarily designed as a framework for bioinformaticians
     • Write your own client to access one or multiple services (example clients available in different programming languages)
     • Very flexible access, can be tailored to your specific needs
     • Full control over the client and its functionality
     • Create your own services to extend the functionality of EnCORE
     • Semi-automatic WSDL wrapper generation for services
     • Workflow control with Taverna (prototype)

  17. EnVision
     • EnVision: Application of EnCore in a semi-fixed data flow
     • Easier to demonstrate functionality than by showing a bunch of WSDLs
     • Production application for the analysis of (proteomics) datasets
     • Source for biologist feedback
     • EnVision(1): Technically oriented demonstrator, access to XML configuration files, XSLT output generation
     • EnVision2: “Friendly” end user application
     • Beta version
     • http://www.ebi.ac.uk/enfin-srv/envision2/

  18. Protein Identifier Space Translation
     • PICR translates between ca. 20 protein identifier spaces
     • Based on sequence identity
     • Shows all known sequence-identifier associations, both historic and current
     • Based on UniParc archive of 18 million public protein sequences
     • Interactive use and computational access (web service, REST)
     • Côté RG, et al.: The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinformatics. 2007 Oct 18;8:401.
     • http://www.ebi.ac.uk/Tools/picr

  19. Protein Identifier Space Translation http://www.ebi.ac.uk/Tools/picr

  20. Ontology Lookup Service http://www.ebi.ac.uk/ols

  21. Côté RG, Jones P, Martens L, Apweiler R, Hermjakob H.: The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res. 2008 May 8. http://www.ebi.ac.uk/ols

  22. Tying databases together: DAS
     Diagram: User; DAS infrastructure (DAS Proxy, DAS Registry, DAS Servers).
     http://www.ebi.ac.uk/dasty

  23. - Lightweight integration http://www.ebi.ac.uk/dasty

  24. Acknowledgements: EU FP6, LSHG-CT-2005-518254; Pascal Kahlem

  25. ? http://www.ebi.ac.uk/enfin-srv/envision2/

  26. Examples of data modelled in enXml

     <experiment id="ID57">
       <names>
         <fullName>Enfin IntAct service: find interaction partners</fullName>
         <shortLabel>enfin-intact</shortLabel>
       </names>
       <input>ID2</input>
       <result>ID56</result>
     </experiment>

     <experiment id="ID15">
       <names>
         <fullName>Enfin Reactome service: find pathways from protein list</fullName>
         <shortLabel>enfin-reactome</shortLabel>
       </names>
       <input>ID8</input>
       <result>ID13</result>
       <result>ID14</result>
       <parameter factor="3" term="enfin-reactome-max-pathways"/>
       <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
       <attribute name="enfin-reactome-add-coverage">true</attribute>
     </experiment>
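
As a rough illustration of how a client might read these experiment records, the sketch below parses the two elements from slide 26 with Python's standard `xml.etree.ElementTree`. The `<entries>` wrapper element and the `summarise_experiments` helper are assumptions added so that the fragment parses as one document; the slide does not show the real enXml root element or any namespaces.

```python
import xml.etree.ElementTree as ET

# The two <experiment> elements are copied from the slide; the <entries>
# wrapper is a hypothetical root added only to make the fragment well-formed.
SNIPPET = """
<entries>
  <experiment id="ID57">
    <names>
      <fullName>Enfin IntAct service: find interaction partners</fullName>
      <shortLabel>enfin-intact</shortLabel>
    </names>
    <input>ID2</input>
    <result>ID56</result>
  </experiment>
  <experiment id="ID15">
    <names>
      <fullName>Enfin Reactome service: find pathways from protein list</fullName>
      <shortLabel>enfin-reactome</shortLabel>
    </names>
    <input>ID8</input>
    <result>ID13</result>
    <result>ID14</result>
    <parameter factor="3" term="enfin-reactome-max-pathways"/>
    <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
    <attribute name="enfin-reactome-add-coverage">true</attribute>
  </experiment>
</entries>
"""


def summarise_experiments(xml_text):
    """Return one dict per <experiment> with its label, inputs, results and settings."""
    root = ET.fromstring(xml_text.strip())
    summaries = []
    for exp in root.iter("experiment"):
        summaries.append({
            "id": exp.get("id"),
            "label": exp.findtext("names/shortLabel"),
            "inputs": [e.text for e in exp.findall("input")],
            "results": [e.text for e in exp.findall("result")],
            # Values are kept as strings; their exact semantics are not given on the slide.
            "parameters": {p.get("term"): p.get("factor") for p in exp.findall("parameter")},
            "attributes": {a.get("name"): a.text for a in exp.findall("attribute")},
        })
    return summaries
```

Each experiment names its input and result sets by ID, so a list of these summaries is enough to reconstruct which service produced which set and with which settings.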

  27. EnCORE
     Diagram: External services (heterogeneous external world); EnCORE services (standardised EnCORE world), connected by Enfin XML; EnVISION (user interface & representation).
     • Single entry point
     • One technology
     • No manual combination of results
     • Audit trail built in
     • Visualisation built in
     • Easy to reproduce

  28. EnVision
