European School of Bioinformatics. Data integration with EnCORE. Florian Reisinger florian@ebi.ac.uk. background problems working with web resources infrastructure existing EnCORE services communication model (sync / async) enXml + example data service combination (workflows)
European School of Bioinformatics. Data integration with EnCORE. Florian Reisinger florian@ebi.ac.uk
Overview
• background
• problems working with web resources
• infrastructure
• existing EnCORE services
• communication model (sync / async)
• enXml + example data
• service combination (workflows)
• how to use the system
• user interfaces
Infrastructure
• Shallow integration
  - easy addition of resources
  - independent resources
  - minimal centralisation
  - easier to maintain
  - very flexible
• Common Service Interface
  - established standards
  - well defined schema
Common problems
• different ways to access data
  - human interfaces: web page based forms, clients, queries
  - programmatic interfaces: SOAP, REST, APIs, XML, text, …
• different programming languages
  - Perl, Java, C#/C++, …
• different data models
  - e.g. sequence as FASTA or as a plain string
  - various ways to model proteins, genes, …
  - multiple identifiers for one biological entity (UniProt, IPI, …)
Diverse web service world
[Diagram: database access and analysis tools connecting to five external services via SOAP, XML, REST, CSV, plain text, Perl API and Java API]
• Multiple manual connections with possibly multiple technologies
• Multiple result files which have to be combined manually
• Difficult to keep audit trail
• Much work to reproduce
EnCORE
[Diagram: external services of the heterogeneous external world are wrapped by EnCORE services; in the standardised EnCORE world the services exchange Enfin XML; EnVISION provides user interface & representation]
• Single entry point
• One technology
• No manual combination of results
• Audit trail built in
• Visualisation built in
• Easy to reproduce
Web Service Interface: WSDL
Web Service Definition Language
Similar to a Java interface, it represents a contract between the service requestor and the service provider, but in contrast it is designed to be language and platform independent. Mostly used for SOAP services.
Definitions:
• Interface information for all publicly available functions
• Data types for all requests and responses
• Binding information about the transport protocol to be used
• Address information for locating the specified service
It contains all the information a client needs to use the service, which allows auto-generation of clients.
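To make the definitions listed above more concrete, here is a minimal sketch that reads a WSDL 1.1 document with Python's standard library and lists its operations and service address. The embedded WSDL is a hypothetical, heavily trimmed fragment written for this example (the `example.org` namespace and address are placeholders, not a real EnCORE endpoint); only the operation names are taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WSDL 1.1 fragment for illustration only.
# The target namespace and the address are placeholders, not an EnCORE endpoint.
WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:tns="http://example.org/encore"
             name="ExampleService"
             targetNamespace="http://example.org/encore">
  <portType name="ExamplePortType">
    <operation name="doService"/>
    <operation name="doServiceAdv"/>
    <operation name="doServiceTest"/>
  </portType>
  <service name="ExampleService">
    <port name="ExamplePort" binding="tns:ExampleBinding">
      <soap:address location="http://example.org/encore/service"/>
    </port>
  </service>
</definitions>
"""

# Standard WSDL 1.1 and WSDL-SOAP binding namespaces.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}


def describe_wsdl(wsdl_text):
    """Return (operation names, service addresses) found in a WSDL document."""
    root = ET.fromstring(wsdl_text)
    operations = [op.get("name")
                  for op in root.findall("wsdl:portType/wsdl:operation", NS)]
    addresses = [addr.get("location")
                 for addr in root.findall("wsdl:service/wsdl:port/soap:address", NS)]
    return operations, addresses


operations, addresses = describe_wsdl(WSDL)
print(operations)
print(addresses)
```

In practice a SOAP toolkit would generate the client from the WSDL; the sketch only shows which pieces of information such a generator reads.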
Service technologies
Why use SOAP, XML and XML Schema?
• platform and language independent
• well established technologies
• already used in various standards
• very good support in nearly all languages
• well defined structure
• can be validated
  - syntactically according to the schema
  - semantically using a validator tool
Existing EnCORE web services
• AffyMetrix: probe set ID to protein ID mapping
• ArrayExpress: microarray data
• BioModels: search for biological models
• CellMINT: protein localization information
• g:GOSt: protein grouping, functional profiling
• IntAct: protein interactions
• KEGG pathway: pathway search
• PICR: Protein Identifier Cross Reference
• PRIDE: protein identification
• Reactome: pathway search
• UniProt: protein information retrieval
• Utility: generation of ENFIN XML from protein IDs
Synchronous communication
[Diagram: client calls service; ENFIN XML is sent as the request and ENFIN XML is returned as the response]
- doService: performs service with standard parameters
- doServiceAdv: performs service with custom parameters
- doServiceTest: only echoes the input
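As a rough illustration of a synchronous call, the sketch below wraps an enXml payload in a SOAP 1.1 envelope for one of the three operations. The service namespace is a placeholder, and embedding the payload as a child element of the operation element is an assumption; the real message layout is defined by each service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/encore"  # hypothetical service namespace


def build_request(operation, enxml_payload):
    """Build a SOAP envelope calling `operation` with an enXml payload.

    Assumption: the payload is embedded as a child element of the operation
    element. A real service may expect a different layout.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    call.append(ET.fromstring(enxml_payload))
    return ET.tostring(envelope, encoding="unicode")


# doServiceTest only echoes the input, which makes it a convenient first call
# when checking that a client is wired up correctly.
request = build_request("doServiceTest", '<molecule id="ID1"/>')
print(request)
```

The request would then be sent in a single HTTP POST, and the response to that same request carries the resulting enXml.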
Domaination
• protein domain prediction tool
• http://www.ibi.vu.nl/programs/domainationwww/
• analysis tool, not only data retrieval service
• possible long run times
• sync communication inadequate
• initiator for async communication model
Asynchronous web services
[Diagram: client and service exchange ENFIN XML, ticket numbers and status in three steps]
- submit: doServiceAsync submits service with standard parameters and returns a job ticket
- loop: getStatus reports the status of the job with the specified ticket
- retrieve (if status OK): retrieveResult returns the result of the job with the specified ticket
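The submit / poll / retrieve cycle can be sketched as a small polling loop. The class below is an in-memory stand-in for a remote service, written only so the example runs without a network. The operation names come from the slide, while the `"RUNNING"` status value, the ticket format and the `max_polls` limit are assumptions.

```python
import time


class FakeAsyncService:
    """In-memory stand-in for a remote asynchronous service (illustration only)."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._jobs = {}

    def doServiceAsync(self, enxml):
        # Submit a job and hand back a ticket identifying it.
        ticket = f"job-{len(self._jobs) + 1}"
        self._jobs[ticket] = {"input": enxml, "polls": 0}
        return ticket

    def getStatus(self, ticket):
        # Report "OK" once the job has been polled often enough.
        job = self._jobs[ticket]
        job["polls"] += 1
        return "OK" if job["polls"] >= self._polls_until_done else "RUNNING"

    def retrieveResult(self, ticket):
        # The fake service simply echoes its input back as the result.
        return self._jobs[ticket]["input"]


def run_async(service, enxml, poll_interval=0.0, max_polls=10):
    """Submit a job, poll its status, and retrieve the result once it is OK."""
    ticket = service.doServiceAsync(enxml)
    for _ in range(max_polls):
        if service.getStatus(ticket) == "OK":
            return service.retrieveResult(ticket)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {ticket} did not finish after {max_polls} status checks")


print(run_async(FakeAsyncService(polls_until_done=3), '<molecule id="ID1"/>'))
```

Bounding the number of status checks keeps a client from waiting forever on a job that never completes, which matters for long-running analysis tools.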
FuncNet: protein function comparison
http://www.funcnet.eu
• Distributed protein function comparison pipeline
• Given a set of proteins with some shared function, which of these other proteins also share that function?
• Aggregation of pairwise functional similarity predictions between query and reference proteins
• Example for test case: predicting mitotic spindle proteins
• Many other uses, for example: finding proteins related to the LKB1 tumor suppressor
Use case FuncNet
[Diagram: front-end service with protein lists IN and protein pairs OUT, connected to CODA, GECO, hiPPI, JACOP etc.]
All communication uses SOAP.
EnCORE
[Diagram: the heterogeneous external world (NCBI, g:GOSt, PICR, Reactome, UniProt and further external services) is wrapped by EnCORE services; in the standardised EnCORE world the services exchange Enfin XML; EnVISION provides user interface & representation]
enXml (ENFIN XML) – the EnCORE data exchange format
• XML defined by XML schema
• standard interface to services
• simple and easy to understand structure
• generic to allow various data types
• stores service results and keeps an audit trail
• minimal restrictions for data representation
  - high degree of freedom modelling user data
  - need for modelling guidelines to ensure service interoperability
Examples of data modelled in enXml

<molecule id="ID1">
  <names>
    <fullName>Breast cancer type 1 susceptibility protein</fullName>
  </names>
  <xrefs>
    <primaryRef refTypeAc="MI:0358" refType="primary-reference" id="P38398" dbAc="MI:0486" db="UniProt"/>
  </xrefs>
  <moleculeType termAc="MI:0326" term="protein"/>
  <attribute name="UniProt keywords">
    Zinc-finger;Zinc;Repeat;Polymorphism;Phosphorylation;Nuclear protein;
    Metal-binding;DNA-binding;DNA repair;DNA damage;Disease mutation;
    Cell cycle;Anti-oncogene;3D-structure
  </attribute>
</molecule>
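A client typically needs to pull a few fields out of such a molecule entry. The following is a minimal sketch using Python's standard library. It assumes the element carries no XML namespace, as in the slide snippet (the real schema may declare one), and the keyword list is shortened for brevity.

```python
import xml.etree.ElementTree as ET

# Molecule entry as on the slide, with the keyword list shortened for brevity.
MOLECULE = """<molecule id="ID1">
  <names><fullName>Breast cancer type 1 susceptibility protein</fullName></names>
  <xrefs>
    <primaryRef refTypeAc="MI:0358" refType="primary-reference"
                id="P38398" dbAc="MI:0486" db="UniProt"/>
  </xrefs>
  <moleculeType termAc="MI:0326" term="protein"/>
  <attribute name="UniProt keywords">
    Zinc-finger;Zinc;Repeat
  </attribute>
</molecule>"""


def summarise_molecule(xml_text):
    """Extract identifier, name, primary cross-reference and keywords."""
    mol = ET.fromstring(xml_text)
    ref = mol.find("xrefs/primaryRef")
    keywords = mol.findtext("attribute[@name='UniProt keywords']", default="")
    return {
        "id": mol.get("id"),
        "name": mol.findtext("names/fullName"),
        "db": ref.get("db"),
        "accession": ref.get("id"),
        "type": mol.find("moleculeType").get("term"),
        # Keywords are stored as one semicolon-separated string.
        "keywords": [k.strip() for k in keywords.split(";") if k.strip()],
    }


print(summarise_molecule(MOLECULE))
```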
Examples of data modelled in enXml

<set id="ID12">
  <participant moleculeRef="ID1"/>
  <participant moleculeRef="ID2"/>
</set>

<set id="ID33">
  <names>
    <fullName>IntAct interaction</fullName>
  </names>
  <xrefs>
    <primaryRef id="EBI-1263051" db="intact" dbAc="MI:0469"/>
  </xrefs>
  <setType id="SO:0001093" db="SO" term="protein_protein_interaction"/>
  <participant moleculeRef="ID1"/>
  <participant moleculeRef="ID7"/>
  <attribute nameAc="MI:0001" name="interaction detection method">MI:0006</attribute>
</set>
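Sets group molecules by reference, so reading them amounts to collecting the `moleculeRef` values of each set. The sketch below does this with the standard library. The `<entry>` wrapper element is a hypothetical container added so the two sets form one well-formed document, and no XML namespace is assumed.

```python
import xml.etree.ElementTree as ET

# The <entry> wrapper is hypothetical; it only makes the snippet well-formed.
DOC = """<entry>
  <set id="ID12">
    <participant moleculeRef="ID1"/>
    <participant moleculeRef="ID2"/>
  </set>
  <set id="ID33">
    <names><fullName>IntAct interaction</fullName></names>
    <setType id="SO:0001093" db="SO" term="protein_protein_interaction"/>
    <participant moleculeRef="ID1"/>
    <participant moleculeRef="ID7"/>
  </set>
</entry>"""


def read_sets(xml_text):
    """Map each set id to its type term (if any) and its participant references."""
    root = ET.fromstring(xml_text)
    sets = {}
    for set_elem in root.iter("set"):
        set_type = set_elem.find("setType")
        sets[set_elem.get("id")] = {
            "type": set_type.get("term") if set_type is not None else None,
            "participants": [p.get("moleculeRef")
                             for p in set_elem.findall("participant")],
        }
    return sets


for set_id, info in read_sets(DOC).items():
    print(set_id, info)
```

An untyped set such as ID12 is just a list of molecules, while ID33 carries a type term that marks it as a protein-protein interaction.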
Examples of data modelled in enXml

<experiment id="ID57">
  <names>
    <fullName>Enfin IntAct service: find interaction partners</fullName>
    <shortLabel>enfin-intact</shortLabel>
  </names>
  <input>ID2</input>
  <result>ID56</result>
</experiment>

<experiment id="ID15">
  <names>
    <fullName>Enfin Reactome service: find pathways from protein list</fullName>
    <shortLabel>enfin-reactome</shortLabel>
  </names>
  <input>ID8</input>
  <result>ID13</result>
  <result>ID14</result>
  <parameter factor="3" term="enfin-reactome-max-pathways"/>
  <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
  <attribute name="enfin-reactome-add-coverage">true</attribute>
</experiment>
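Because every experiment records its input and result identifiers, the audit trail can be read back from the document itself. The sketch below builds a lookup from each result id to the experiment that produced it. The `<entry>` wrapper is hypothetical, no XML namespace is assumed, and the parameters are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The <entry> wrapper is hypothetical; parameters are omitted for brevity.
DOC = """<entry>
  <experiment id="ID57">
    <names><shortLabel>enfin-intact</shortLabel></names>
    <input>ID2</input>
    <result>ID56</result>
  </experiment>
  <experiment id="ID15">
    <names><shortLabel>enfin-reactome</shortLabel></names>
    <input>ID8</input>
    <result>ID13</result>
    <result>ID14</result>
  </experiment>
</entry>"""


def audit_trail(xml_text):
    """Map each result id to the experiment, service label and inputs behind it."""
    root = ET.fromstring(xml_text)
    produced_by = {}
    for exp in root.iter("experiment"):
        label = exp.findtext("names/shortLabel")
        inputs = [i.text for i in exp.findall("input")]
        for result in exp.findall("result"):
            produced_by[result.text] = {
                "experiment": exp.get("id"),
                "service": label,
                "inputs": inputs,
            }
    return produced_by


trail = audit_trail(DOC)
print(trail["ID13"])
```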
How to use the system
• Primarily designed as framework for bioinformaticians
  - Write your own client to access one or multiple services (example clients available in different programming languages)
  - Very flexible access, can be tailored to your specific needs
  - Full control over the client and its functionality
  - Create your own services to extend the functionality of EnCORE
• Additional “instant” usage and end user access
  - Working example clients
  - User interfaces
    - EnVision / EnVision2, web interface to EnCORE services
    - Taverna, workflow management tool
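Since every service consumes and produces the same XML format, a client that combines several services can simply pass the output of one call to the next. The sketch below illustrates that idea with stand-in services that only append an audit record. The helper `make_stub_service`, the `<entry>` wrapper and the shape of the appended record are assumptions made for this example; a real client would call the remote services instead.

```python
import xml.etree.ElementTree as ET


def make_stub_service(short_label):
    """Return a stand-in service that only appends an <experiment> audit record."""
    def service(enxml):
        root = ET.fromstring(enxml)
        experiment = ET.SubElement(root, "experiment")
        names = ET.SubElement(experiment, "names")
        ET.SubElement(names, "shortLabel").text = short_label
        return ET.tostring(root, encoding="unicode")
    return service


def run_workflow(enxml, services):
    """Feed the document through each service in turn."""
    for service in services:
        enxml = service(enxml)
    return enxml


workflow = [make_stub_service("enfin-intact"), make_stub_service("enfin-reactome")]
result = run_workflow('<entry><molecule id="ID1"/></entry>', workflow)
print(result)
```

The document grows with each step, so the final result carries both the data and the record of which services touched it.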
User Interfaces
• EnVISION
  - simple one page web interface
  - flexible mechanism of service connection
  - possibility to specify options for service calls
  - simple XSL transform of resulting Enfin XML
• EnVISION 2 (prototype)
  - web application for end user
  - structured, user friendly representation of the data
  - supports multiple datasets
  - links to source databases for more detailed information
• Taverna (external project)
  - powerful workflow design & management tool
  - easy to use with EnCORE services
EnVision 2
• detail views for human readable presentation of service results
• simple start page
• implicit EnCORE web service calls
• supports multiple datasets
EnVision 2
• detail views for human readable presentation of service results
• modular for custom visualisation
• simple start page
• implicit EnCORE web service calls
• externally run workflows (Taverna)
• supports multiple datasets
Taverna …