EnVisioning Data Integration. SME forum 2009, Vienna. Henning Hermjakob, hhe@ebi.ac.uk. Topics: EnCore, Enfin, and use cases for bioinformaticians who need programmatic access.
EnVisioning Data Integration SME forum 2009, Vienna Henning Hermjakob hhe@ebi.ac.uk
EnCore Enfin Experiment Model
Use cases
• Target user group: bioinformaticians, programmatic access
• Simple
  • Set of “interesting” Affymetrix IDs
  • Get the relevant UniProt accession numbers
  • Get the surrounding interaction networks from IntAct
• A bit more
  • Set of differentially expressed proteins in PRIDE
  • Find experiments with a “similar” set of regulated genes
  • Get Reactome pathways
  • Expand the protein set via IntAct, then get Reactome pathways
• Even more: EnVision
Red edges: Bouwmeester et al, 2005 Green edges: Rual et al, 2005 Violet edges: Stelzl et al, 2005
Infrastructure
• Shallow integration
  • easy addition of resources
  • independent resources
  • minimal centralisation
  • easier to maintain
  • very flexible
• Common Service Interface
  • established standards
  • well defined schema
Diverse web service world
Diagram labels: database access, analysis tools; five external services; access technologies SOAP, XML, REST, CSV, plain text, Perl API, Java API
• Multiple manual connections, possibly with multiple technologies
• Multiple result files which have to be combined manually
• Difficult to keep an audit trail
• Much work to reproduce
EnCORE
Diagram labels: heterogeneous external world (external services); standardised EnCORE world (EnCORE services exchanging Enfin XML); EnVISION (user interface & representation)
• Single entry point
• One technology
• No manual combination of results
• Audit trail built in
• Visualisation built in
• Easy to reproduce
enXml – the EnCORE data exchange format (ENFIN XML)
• XML schema
  • standard interface to services
  • simple and easy to understand structure
  • generic to allow various data types
  • stores service results and keeps an audit trail
• Minimal restrictions for data representation
  • high degree of freedom in modelling user data
  • need for modelling guidelines to ensure service interoperability
enXml document graph
Sources: EnsMart, IntAct
Legend: Molecules, Experiments, Sets, Source relation
Molecules: 1993_s_at, BRCA1, BRAP, Q5ST83, H2AFX
Experiments: start, toUniProt, ppiExpand
Sets: s2, s12, s26, s27, s28, s29
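The document graph can be pictured as sets of molecules linked by experiments, where each experiment records which set it consumed and which set it produced. The following is a minimal sketch of that idea, not the enXml schema itself: the class names, the `apply` helper, and the molecule pairings passed as `mapping` are illustrative assumptions, and the edges only loosely follow the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str        # service step, e.g. "toUniProt"
    source: str      # resource behind the step, e.g. "EnsMart"
    input_set: str   # id of the set that was consumed
    result_set: str  # id of the set that was produced


@dataclass
class Document:
    sets: dict = field(default_factory=dict)         # set id -> list of molecule ids
    experiments: list = field(default_factory=list)  # audit trail, in call order

    def apply(self, name, source, input_set, result_set, mapping):
        """Stand-in for a service call: map every input molecule and log the step."""
        members = []
        for molecule in self.sets[input_set]:
            for mapped in mapping.get(molecule, []):
                if mapped not in members:
                    members.append(mapped)
        self.sets[result_set] = members
        self.experiments.append(Experiment(name, source, input_set, result_set))
        return members


def build_example():
    doc = Document()
    doc.sets["s2"] = ["1993_s_at"]  # the "start" set of probe set ids
    # The mappings below are illustrative pairings, not verified biology.
    doc.apply("toUniProt", "EnsMart", "s2", "s26", {"1993_s_at": ["Q5ST83"]})
    doc.apply("ppiExpand", "IntAct", "s26", "s27", {"Q5ST83": ["BRAP", "H2AFX"]})
    return doc
```

Because every step appends to `experiments`, the list doubles as the audit trail: replaying it in order reproduces each result set from the start set.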
Existing EnCore web services
• Affymetrix: probe set ID to protein ID mapping
• ArrayExpress: microarray data
• BioModels: search for biological models
• CellMINT: protein localization information
• g:GOSt: protein grouping, functional profiling
• IntAct: protein interactions
• KEGG pathway: pathway search
• PICR: Protein Identifier Cross Reference
• PRIDE: protein identification
• Reactome: pathway search
• UniProt: protein information retrieval
• Utility: generation of ENFIN XML from protein IDs
Synchronous communication
The client calls the service with ENFIN XML and receives ENFIN XML back.
- doService: performs the service with standard parameters
- doServiceAdv: performs the service with custom parameters
- doServiceTest: only echoes the input
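To make the difference between the three synchronous operations concrete, here is a minimal in-memory sketch. It is not the EnCORE API: `StubService`, the `max-results` parameter, the `stub-1` id, and the way the result is appended as an `experiment` element are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


class StubService:
    """In-memory stand-in for one EnCORE-style service (hypothetical)."""

    default_params = {"max-results": "3"}  # hypothetical parameter name

    def do_service_test(self, enxml: str) -> str:
        # Mirrors doServiceTest: only echoes the input.
        return enxml

    def do_service(self, enxml: str) -> str:
        # Mirrors doService: run with the standard parameters.
        return self.do_service_adv(enxml, self.default_params)

    def do_service_adv(self, enxml: str, params: dict) -> str:
        # Mirrors doServiceAdv: run with caller-supplied parameters and
        # record them in the returned document.
        root = ET.fromstring(enxml)
        experiment = ET.SubElement(root, "experiment", {"id": "stub-1"})
        for term, factor in sorted(params.items()):
            ET.SubElement(experiment, "parameter", {"term": term, "factor": factor})
        return ET.tostring(root, encoding="unicode")
```

A client would typically call the echo operation first to check connectivity, then switch to the standard or custom-parameter call.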
Domaination
• protein domain prediction tool
• http://www.ibi.vu.nl/programs/domainationwww/
• analysis tool, not only a data retrieval service
• possible long run times
• synchronous communication inadequate
• initiator for the asynchronous communication model
Asynchronous web services
- submit: doServiceAsync submits the service with standard parameters (ENFIN XML) and returns a job ticket
- loop: getStatus reports the status of the job with the specified ticket
- retrieve: if the status is OK, retrieveResult returns the result (ENFIN XML) of the job with the specified ticket
Messages between client and service: ENFIN XML → ticket number; ticket number → status; ticket number → ENFIN XML
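The submit / poll / retrieve pattern above can be sketched as follows. This is an assumption-laden illustration rather than a real EnCORE client: `StubAsyncService` is an in-memory stand-in whose jobs finish after a fixed number of status polls, the `"RUNNING"` status string is invented (only `OK` appears on the slide), and `run_job` is a hypothetical helper.

```python
import itertools
import time


class StubAsyncService:
    """In-memory stand-in; a job finishes after a fixed number of status polls."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._tickets = itertools.count(1)
        self._jobs = {}  # ticket -> [remaining polls, input document]

    def do_service_async(self, enxml):
        # Mirrors doServiceAsync: accept the job and hand back a ticket.
        ticket = next(self._tickets)
        self._jobs[ticket] = [self._polls_until_done, enxml]
        return ticket

    def get_status(self, ticket):
        # Mirrors getStatus: report progress for the given ticket.
        job = self._jobs[ticket]
        if job[0] > 0:
            job[0] -= 1
            return "RUNNING"  # hypothetical status value
        return "OK"

    def retrieve_result(self, ticket):
        # Mirrors retrieveResult: only valid once the job has finished.
        if self._jobs[ticket][0] > 0:
            raise RuntimeError(f"job {ticket} is not finished")
        return self._jobs.pop(ticket)[1]  # the stub just returns its input


def run_job(service, enxml, poll_interval=0.0, max_polls=10):
    """Submit a document, poll until the status is OK, then fetch the result."""
    ticket = service.do_service_async(enxml)
    for _ in range(max_polls):
        if service.get_status(ticket) == "OK":
            return service.retrieve_result(ticket)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {ticket} not finished after {max_polls} polls")
```

Bounding the loop with `max_polls` keeps a client from waiting forever on a long-running analysis such as a domain prediction job.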
EnCore use
• Primarily designed as a framework for bioinformaticians
• Write your own client to access one or multiple services (example clients available in different programming languages)
  • Very flexible access, can be tailored to your specific needs
  • Full control over the client and its functionality
• Create your own services to extend the functionality of EnCORE
  • Semi-automatic WSDL wrapper generation for services
• Workflow control with Taverna (prototype)
EnVision
• EnVision: application of EnCore in a semi-fixed data flow
  • Easier to demonstrate functionality than by showing a bunch of WSDLs
  • Production application for the analysis of (proteomics) datasets
  • Source for biologist feedback
• EnVision(1): technically oriented demonstrator, access to XML configuration files, XSLT output generation
• EnVision2: “friendly” end user application
  • Beta version
  • http://www.ebi.ac.uk/enfin-srv/envision2/
Protein Identifier Space Translation
• PICR translates between ca. 20 protein identifier spaces
• Based on sequence identity
• Shows all known sequence-identifier associations, both historic and current
• Based on the UniParc archive of 18 million public protein sequences
• Interactive use and computational access (web service, REST)
• Côté RG, et al.: The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinformatics. 2007 Oct 18;8:401.
http://www.ebi.ac.uk/Tools/picr
Ontology Lookup Service http://www.ebi.ac.uk/ols
Côté RG, Jones P, Martens L, Apweiler R, Hermjakob H.: The Ontology Lookup Service: more data and better tools for controlled vocabulary queries. Nucleic Acids Res. 2008 May 8. http://www.ebi.ac.uk/ols
Tying databases together: DAS
DAS infrastructure: User, DAS Proxy, DAS Registry, DAS Servers
• Lightweight integration
http://www.ebi.ac.uk/dasty
Acknowledgements EU FP6 LSHG-CT-2005-518254 Pascal Kahlem
? http://www.ebi.ac.uk/enfin-srv/envision2/
Examples of data modelled in enXml
<experiment id="ID57">
  <names>
    <fullName>Enfin IntAct service: find interaction partners</fullName>
    <shortLabel>enfin-intact</shortLabel>
  </names>
  <input>ID2</input>
  <result>ID56</result>
</experiment>
<experiment id="ID15">
  <names>
    <fullName>Enfin Reactome service: find pathways from protein list</fullName>
    <shortLabel>enfin-reactome</shortLabel>
  </names>
  <input>ID8</input>
  <result>ID13</result>
  <result>ID14</result>
  <parameter factor="3" term="enfin-reactome-max-pathways"/>
  <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
  <attribute name="enfin-reactome-add-coverage">true</attribute>
</experiment>
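Since each experiment element names its inputs, results, and parameters, a client can read the audit trail back out of a document. The sketch below does this for the Reactome experiment shown on the slide, using Python's standard `xml.etree.ElementTree`. The wrapping `<entries>` root element and the `summarise` helper are assumptions made to get a parseable document; a real enXml file may use a different root and XML namespaces.

```python
import xml.etree.ElementTree as ET

# The <entries> wrapper is hypothetical; the experiment is copied from the slide.
FRAGMENT = """
<entries>
  <experiment id="ID15">
    <names>
      <fullName>Enfin Reactome service: find pathways from protein list</fullName>
      <shortLabel>enfin-reactome</shortLabel>
    </names>
    <input>ID8</input>
    <result>ID13</result>
    <result>ID14</result>
    <parameter factor="3" term="enfin-reactome-max-pathways"/>
    <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
    <attribute name="enfin-reactome-add-coverage">true</attribute>
  </experiment>
</entries>
"""


def summarise(xml_text):
    """Return one dict per experiment: id, label, inputs, results, parameters."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for exp in root.iter("experiment"):
        steps.append({
            "id": exp.get("id"),
            "label": exp.findtext("names/shortLabel"),
            "inputs": [e.text for e in exp.findall("input")],
            "results": [e.text for e in exp.findall("result")],
            "parameters": {p.get("term"): p.get("factor")
                           for p in exp.findall("parameter")},
        })
    return steps
```

Calling `summarise(FRAGMENT)` yields a list with one entry whose `inputs` and `results` hold the set ids that link this step to the rest of the document.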