Recycling Services and Workflows through Discovery and Reuse. Chris Wroe (1), Phillip Lord (1), Simon Miles (2), Juri Papay (2), Luc Moreau (2), Carole Goble (1). (1) University of Manchester, (2) University of Southampton.
Workflows • myGrid has concentrated on making workflows as straightforward to build from existing web services as possible. [Figure: Taverna workflow diagram]
But…. • Doesn’t mean they are disposable • Scientists build more complex protocols • Time investment • Workflows become a useful resource capturing what services work well together to perform a goal
Common bioinformatics tasks • Task 1 • I’ve got a sequence: what genes may be in it? • Task 2 • I’ve got a gene: what is known about this gene? • Task 3 • I’ve got a gene: what is known about the protein coded for by that gene? • Task 4 • I’ve got a microarray result: analyse it for up-regulated genes, down-regulated genes and clusters of gene regulation.
Promoting sharing • Workflows intended to make it more straightforward to share • Re-locatable • Explainable • How do you promote sharing: • How do you find them and component services in the first place? • Need a way of describing what they do • Need a place to put descriptions • Need a way of searching for them
Requirements • User centric model for describing services and workflows (bioinformatics focus) • Architecture & middleware components • User applications for describing and searching for workflows and services
myGrid’s model of services operation name, description input output task method resource application • Compatible with UDDI & WSDL • Compatible with OWL-s • Compatible with bioMoby • User centric • Data centric • Operation centric
myGrid’s model of services operation name, description input output task method resource application • Simple description of workflows • overall inputs and outputs • component operations • not the sequencing of operations contains workflow subClassOf WSDL operation Soaplab service bioMoby service
myGrid’s model of services operation name, description input output task method resource application input parameter name, description semantic type format transport type collection type collection format output workflow WSDL operation Soaplab service bioMoby service
myGrid’s model of services operation name, description input output task method resource application service name, description author, organisation input parameter name, description semantic type format transport type collection type collection format output workflow WSDL operation WSDL service Soaplab service bioMoby service
A Blast Description
Service Name: Blast
  Operation: execute
    task: pairwise_local_aligning
    resource: EMBL
    application: blastn
    Parameter:
      Input:
        Name: accession
        semantic type: EMBL Nucleotide sequence id
        transport data type: string
      Output:
        Name: Result
        semantic type: sequence alignment report
        transport data type: string
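The Blast description above can be written down as a small data structure. The following is only a minimal sketch: the class names (`Service`, `Operation`, `Parameter`) and the helper `operations_accepting` are hypothetical illustrations of the service, operation and parameter model on the slides, not the myGrid schema or API. The field values are taken from the Blast slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """An input or output parameter (hypothetical class, mirrors the slide)."""
    name: str
    semantic_type: str
    transport_type: str
    description: str = ""


@dataclass
class Operation:
    """An operation with its task, resource, application and parameters."""
    name: str
    task: str
    resource: Optional[str] = None
    application: Optional[str] = None
    method: Optional[str] = None
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


@dataclass
class Service:
    """A service that groups one or more operations."""
    name: str
    operations: List[Operation] = field(default_factory=list)


# The Blast description from the slide, expressed with the classes above.
blast = Service(
    name="Blast",
    operations=[
        Operation(
            name="execute",
            task="pairwise_local_aligning",
            resource="EMBL",
            application="blastn",
            inputs=[Parameter("accession", "EMBL Nucleotide sequence id", "string")],
            outputs=[Parameter("Result", "sequence alignment report", "string")],
        )
    ],
)


def operations_accepting(service: Service, semantic_type: str) -> List[str]:
    """Return names of operations with an input of exactly this semantic type."""
    return [
        op.name
        for op in service.operations
        if any(p.semantic_type == semantic_type for p in op.inputs)
    ]
```

Keeping the semantic type separate from the transport type lets a search distinguish "a string" from "a string that holds an EMBL nucleotide sequence id", which is the point of the data-centric description.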
View Service Architecture • Components: Semantic Find Component, Workflow Registry, Taverna Workbench, Discovery Client, Service Registry, Personalised View Component • Discovery by describing services required • Extract service descriptions to reason over • Personalised discovery using UDDI clients and publishing of personal metadata • Pull service adverts from global registries
Registries and views • Registry • Stores structured description of service or workflow • UDDI & WSDL data model with extensibility • Represented as RDF in Jena repository • Query using RDQL • Views • Local views aggregate a filtered set of registrations from multiple registries
Views over registries • Notification of workflows and services with a performance indicator > 90 • External registry 1: Blast @ NCBI • External registry 2: Blast @ DDBJ • Local registry 2: Blast @ Soton • Organisational view: Blast @ Soton, Blast @ NCBI
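A view that aggregates a filtered set of registrations from several registries can be sketched as follows. This is an illustration only, not the myGrid View service: the `Registration` class, the `build_view` function and the performance values are assumptions made up for the example. Only the registry names, the Blast entries and the "> 90" threshold come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Registration:
    """A registered service or workflow (hypothetical structure)."""
    name: str
    provider: str
    performance: int  # assumed performance indicator; values below are invented


def build_view(
    registries: Dict[str, List[Registration]],
    keep: Callable[[Registration], bool],
) -> List[Registration]:
    """Aggregate registrations from all registries, keeping those that pass the filter."""
    seen = set()
    view = []
    for registrations in registries.values():
        for reg in registrations:
            if keep(reg) and reg not in seen:
                seen.add(reg)  # avoid duplicates advertised in several registries
                view.append(reg)
    return view


registries = {
    "External registry 1": [Registration("Blast", "NCBI", 95)],
    "External registry 2": [Registration("Blast", "DDBJ", 80)],
    "Local registry 2": [Registration("Blast", "Soton", 92)],
}

# Organisational view: only entries with a performance indicator above 90.
organisational_view = build_view(registries, lambda reg: reg.performance > 90)
```

With these invented values the organisational view holds Blast @ NCBI and Blast @ Soton, the same two entries the slide shows in the view.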
FETA • Registries are domain independent • But many queries and indexes of services and workflows are domain dependent • FETA provides a domain dependent indexing component that works in concert with the registry. • Uses ontologies as a source of domain knowledge
FETA Example • Domain dependent query • “Find a workflow or service that performs nucleotide sequence alignment” • = performs task aligning or more specific • + accepts input nucleotide sequence or more general
Data terms shown: Biological data, Bio Sequence data, Nucleotide sequence data, Protein sequence data, …
Task terms shown: Task, Aligning, Local aligning, Pairwise local aligning, Global aligning, …
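The query above combines two subsumption checks: the advertised task must be "aligning" or a more specific term, and the advertised input type must be "nucleotide sequence data" or a more general term. The sketch below shows that logic over a hand-written parent table. It is not the FETA implementation (which uses ontologies as its source of domain knowledge): the exact parent/child links are an assumed reading of the slide's hierarchy, and the service names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed hierarchy: each term maps to its parent (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "task": None,
    "aligning": "task",
    "local aligning": "aligning",
    "global aligning": "aligning",
    "pairwise local aligning": "local aligning",
    "biological data": None,
    "bio sequence data": "biological data",
    "nucleotide sequence data": "bio sequence data",
    "protein sequence data": "bio sequence data",
}


def is_a(term: str, ancestor: str) -> bool:
    """True if term equals ancestor or lies below it in the hierarchy."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def find(descriptions: List[dict], task: str, input_type: str) -> List[str]:
    """Names of entries whose task is `task` or more specific and whose
    input is `input_type` or more general."""
    return [
        d["name"]
        for d in descriptions
        if is_a(d["task"], task) and is_a(input_type, d["input"])
    ]


# Hypothetical service descriptions used only to exercise the query.
descriptions = [
    {"name": "blast", "task": "pairwise local aligning", "input": "nucleotide sequence data"},
    {"name": "generic_aligner", "task": "aligning", "input": "bio sequence data"},
    {"name": "protein_aligner", "task": "global aligning", "input": "protein sequence data"},
    {"name": "fetch_record", "task": "retrieving", "input": "nucleotide sequence data"},
]

matches = find(descriptions, task="aligning", input_type="nucleotide sequence data")
```

The two checks run in opposite directions. A service advertising a more specific task still performs the requested task, whereas a service accepting a more general input can still consume the requested data.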
Annotation Service Providers Ontologists Others Ontology Store Description extraction WSDL Interface Description Vocabulary Soaplab Pedro Annotation tool Annotation providers Annotation/description Taverna Workbench Registry (Personalised View) Registry Registry plug-in Registry
Pedro Data Entry Tool
Discovery in Taverna • User chooses services • A common ontology is used to annotate and query any myGrid object including services. • Discover workflows and services described in the registry via Taverna. • Look for all workflows that accept an input of semantic type nucleotide sequence • Aim to have semantic discovery over public view on the Web.
Reuse in Taverna • Drag a workflow entry into the explorer pane and the workflow loads. • Drag a service or workflow to the scavenger window to include it in the workflow.
Uptake & availability • Have we succeeded in promoting uptake? • Too early to say • Plan to deploy a public registry in the Autumn • Software availability • Taverna: available on SourceForge • Ontology: available on http://www.mygrid.org.uk • View: available on myGrid website • FETA: prototype version from myGrid CVS • PEDRo: available on SourceForge • CAUTION: we are currently reassessing and re-implementing the communication between these components. If you are interested in deploying this system, please wait or contact us for advice.
Acknowledgements An EPSRC funded UK eScience Program Pilot Project Particular thanks to the other members of the Taverna project, http://taverna.sf.net
myGrid People
Core • Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users • Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK • Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK • Steve Kemp, Liverpool, UK
Postgraduates • Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial • Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM) • Robin McEntire (GSK)
Collaborators • Keith Decker