Seven Bottlenecks to Workflow Reuse and Repurposing. Antoon Goderis, Ulrike Sattler, Phillip Lord, Carole Goble (University of Manchester). Take home message: a new problem. Workflow reuse and repurposing is happening; how do we make it scale?
Seven Bottlenecks to Workflow Reuse and Repurposing
Antoon Goderis, Ulrike Sattler, Phillip Lord, Carole Goble
University of Manchester
ISWC 2005, Galway
Take home message
• New problem
  • Workflow reuse and repurposing is happening; how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  • Creating a pool of process knowledge
  • Accessing this pool
e-Science
• Support sharing and collaboratories in science
• The world of distributed web services
  • A boom in services: e.g. 1800+ bio services in the myGrid project
• Pulled together as in silico experiments
  • Scientist-friendly workflow languages
  • Hard to build (>1 year!)
• A boom in workflows? 100 workflows in myGrid, up to 50 services
Evolving e-Science to a Web of Science?
• In silico experiments as commodities and know-how
• Share, reuse, repurpose
• Authoring time, quality and provenance collection
[Figure labels: Manchester, Biology; Manchester, CS; Newcastle, CS]
Workflow by example
[Figure: workflow reuse cycle. Step labels: Discover existing work; Edit workflow (repurposing actions); Try out workflow; Deploy workflow; Register and annotate workflow and new services for reuse; Maintain reuse/repurpose history. Actor labels: Scientists & developers; Scientists; 3rd party annotation providers.]
Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005
Analyze This
Analyze This
× #scientists × #workflows × #versions × #runs
[Figure labels: Workflow; Web service]
[Figure labels: Workflow reuse; Web service reuse]
Repurposing, discovery and composition
• Discovery
  • The process of finding, ranking and selecting existing resources
• Composition
  • The process of combining resources into a new working assembly
  • (Auto-)discovery + (auto-)integration
• Repurposing
  • Auto discovery + manual integration
• Need techniques for composition-oriented discovery
  • Discovery supporting integration through rankings
A field report of six projects
• www.myGrid.org.uk
  • reuse by collaborators
  • personal reuse (versioning)
• www.kepler-project.org
  • 10 complex workflows
  • reuse of distributed execution models
• www.inforsense.com
  • intranet exchanges within large pharmas
• www.geodise.org
  • 150 Matlab functions, 10 scripts
  • reuse of function combinations
No support for comparing workflows!
No third party reuse!
7 bottlenecks to reuse & repurposing
[Figure labels: Web of Science realm; Ranking; Process KA; Discovery model; Workflow interoperability; Workflow rigidity; IP rights; Service availability. Marker: "We are here".]
Step 1: Collect as many workflows as possible
[Figure labels: Ranking; Process KA; Discovery model; Workflow interoperability; Workflow rigidity; IP rights; Service availability]
Step 2: Make this collection usable
[Figure labels: Ranking; Process KA; Discovery model; Workflow interoperability; Workflow rigidity; IP rights; Service availability]
Wanted: technology providers
[Figure labels: Semantic Web community?; Ranking; Process KA; Discovery model; Workflow interoperability; e-Science community; Workflow rigidity; IP rights; Service availability]
The bottlenecks, in more detail
• Service availability
  • Web services: Kepler actors, myGrid processors, InforSense services
  • Local services: Web enable, encode, repository
• Intellectual property rights
  • Anonymization; journal policies
• Workflow rigidity
  • Evolution and adaptation: parametrisation
4. The nice thing about workflow standards…
[Figure labels: Benesh notation; Labanotation]
• Workflow languages abound
  • Out of 6 projects, 5 do not use BPEL
  • Behavioural semantics left implicit, as a feature
• Repurposing in case of multiple workflow systems
  • outside system boundaries
  • and across
4. The nice thing about workflow standards…
• Bring out the behavioural semantics
• Comparing 3 projects through workflow patterns
  • E.g. simple merge
• Scientific workflows use functional programming patterns
  • How do these combine into different distributed execution models?
• WSMO/SWSI/OWL-S?
5. What belongs in the discovery model?
• How to retrieve existing scientific workflows?
• Scientists & developers facing distributed programs
• For scientists? Data flow discovery (in jargon), largely abstracting from control
• For developers? Control flow discovery, largely abstracting from data
  • Workflow patterns, Kepler distributed execution models
  • Process networks, process algebra, Petri nets…
[Figure labels: ACAAGATGCCATTGT; ? = ?]
5. What belongs in the discovery model?
• For scientists
  • WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries
  • OWL DL: A-Box based workflow queries [Goderis+DL’05]
• For developers
  • Workflow patterns, Kepler distributed execution models
  • Pattern example based retrieval
  • An early table of combined execution models
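To make the idea of data flow discovery "largely abstracting from control" concrete, here is a minimal sketch. It is not the OWL DL A-Box approach cited on the slide; it assumes, purely for illustration, that each workflow is stored as a list of data links between services, and it asks which workflows let data from one service reach another. The workflow and service names (`wf_annotate`, `fetch_sequence`, and so on) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Assumption: a data link is a (producer service, consumer service) pair.
Link = Tuple[str, str]


def feeds_into(links: List[Link], start: str, goal: str) -> bool:
    """Return True if data produced by `start` can reach `goal` via data links.

    Control constructs (loops, conditionals) are deliberately ignored.
    """
    successors: Dict[str, Set[str]] = {}
    for producer, consumer in links:
        successors.setdefault(producer, set()).add(consumer)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the data links
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def discover_by_dataflow(
    workflows: Dict[str, List[Link]], start: str, goal: str
) -> List[str]:
    """Names of workflows in which `start` feeds (directly or not) into `goal`."""
    return sorted(
        name for name, links in workflows.items() if feeds_into(links, start, goal)
    )


# Hypothetical workflow pool.
POOL: Dict[str, List[Link]] = {
    "wf_annotate": [("fetch_sequence", "blast"), ("blast", "render_report")],
    "wf_align": [("fetch_sequence", "clustalw")],
}

print(discover_by_dataflow(POOL, "fetch_sequence", "render_report"))
```

A control flow query for developers would need a different representation (for example workflow patterns or Petri nets, as the slide suggests), which this sketch does not attempt.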
6. New challenges in Knowledge Acquisition
• Who does the annotation?
• What should be in the annotation?
  • Workflow fragments
  • Task aggregation/prediction
  • “Service decomposition”
• The things that went wrong!
6. New challenges in Knowledge Acquisition
• Who does the annotation?
  • Updated service ontology learning and automated service annotation techniques
• What should be in the annotation?
  • Workflow fragments
  • “Service decomposition”
    • Cutting up service webs
    • Social network analysis (services as users!)
• The things that went wrong
  • Web site usability mining
7. Ranking workflow relevance
• Repurposing: measuring integration effort
• Ranking data flow (in jargon)
  • Structural edit distance
  • E.g. services to remove/add/replace to make 2 workflows equal
  • For OWL workflow ontology, need abduction or off-line processing
• Ranking control flow
  • Relationship between control flow constructs
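The "services to remove/add/replace" example can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes each workflow is reduced to its set of service names, so links between services are ignored and the measure is weaker than a true structural edit distance over workflow graphs. A replace is counted as one operation that pairs a removal with an addition. All workflow and service names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def service_edit_distance(source: Iterable[str], target: Iterable[str]) -> int:
    """Number of remove/add/replace operations that make two service sets equal.

    Assumption: workflows are compared as sets of service names only.
    """
    src, tgt = set(source), set(target)
    to_remove = len(src - tgt)  # services only in the source workflow
    to_add = len(tgt - src)     # services only in the target workflow
    replacements = min(to_remove, to_add)  # one replace = one remove + one add
    return replacements + (to_remove - replacements) + (to_add - replacements)


def rank_by_effort(
    wanted: Iterable[str], candidates: Dict[str, Iterable[str]]
) -> List[Tuple[str, int]]:
    """Rank candidate workflows by estimated effort to repurpose them.

    Lower distance first; ties are broken by workflow name.
    """
    wanted_set = set(wanted)
    scored = [
        (name, service_edit_distance(services, wanted_set))
        for name, services in candidates.items()
    ]
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Hypothetical target and candidate workflows.
WANTED = {"fetch_sequence", "blast", "render_report"}
CANDIDATES = {
    "wf_a": {"fetch_sequence", "blast"},                      # add 1 service
    "wf_b": {"fetch_sequence", "clustalw", "render_report"},  # replace 1 service
    "wf_c": {"parse_pdb"},                                    # replace 1, add 2
}

print(rank_by_effort(WANTED, CANDIDATES))
```

Ranking control flow would require comparing control constructs rather than service sets, which this sketch leaves out.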
Take home message
• Problem: Workflow reuse and repurposing is happening; how do we make it scale?
• Data: Survey of 6 e-Science middleware projects
• Requirements analysis: 7 bottlenecks
  • Creating a pool of process knowledge
    • Workflow interoperability
  • Accessing this pool of knowledge
    • Workflow discovery, KA and ranking
Acknowledgements
• This work is supported by the UK e-Science programme EPSRC GR/R67743.
• The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams’ syndrome workflow and is supported by The Wellcome Foundation (G/R:1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna).
• Sean Bechhofer provided useful comments on the draft.