Seven Bottlenecks to Workflow Reuse and Repurposing

Presentation Transcript


  1. Seven Bottlenecks to Workflow Reuse and Repurposing • Antoon Goderis, Ulrike Sattler, Phillip Lord, Carole Goble • University of Manchester • ISWC 2005, Galway

  2. Take home message • New problem • Workflow reuse and repurposing is happening, how do we make it scale? • Data: Survey of 6 e-Science middleware projects • Requirements analysis: 7 bottlenecks • Creating a pool of process knowledge • Accessing this pool

  3. e-Science • Support sharing and col-laboratories in science • The world of distributed web services • A boom in services: e.g. 1800+ bio services in the myGrid project • Pulled together as in silico experiments • Scientist-friendly workflow languages • Hard to build (>1 year!) • A boom in workflows? 100 workflows in myGrid, up to 50 services

  4. Evolving e-Science to a Web of Science? • In silico experiments as commodities and know-how • Share, reuse, repurpose • authoring time, quality and provenance collection [Figure labels: Manchester, Biology; Manchester, CS; Newcastle, CS]

  5. Workflow by example • Scientists & developers • Edit workflow (repurposing actions) • Discover existing work • Maintain reuse/repurpose history • Try out workflow • Register and annotate workflow and new services for reuse • Deploy workflow • 3rd party annotation providers • Scientists & developers • Scientists • Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005

  6. Analyze This

  7. Analyze This × #scientists × #workflows × #versions × #runs

  8. Workflow Web service

  9. Workflow reuse Web service reuse

  10. Repurposing, discovery and composition • Discovery • The process of finding, ranking and selecting existing resources • Composition • The process of combining resources into a new working assembly • (auto-) discovery + (auto-) integration • Repurposing • Auto discovery + manual integration • Need techniques for composition-oriented discovery • Discovery supporting integration through rankings
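Slide 10 defines discovery as finding, ranking and selecting existing resources, with repurposing leaving the integration step to a person. A minimal sketch of that three-step pipeline follows. It is an illustration only: the `Workflow` record, the `discover` function, the service names and the ranking rule (count of shared services) are all assumptions made for the example, not anything the surveyed projects are reported to implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical record: a workflow reduced to the services it invokes."""
    name: str
    services: frozenset


def discover(pool, wanted_services, top_k=3):
    """Find, rank and select workflows that share services with the request."""
    wanted = frozenset(wanted_services)
    # Find: keep workflows that share at least one service with the request.
    found = [w for w in pool if w.services & wanted]
    # Rank: more shared services first; ties broken by name for determinism.
    ranked = sorted(found, key=lambda w: (-len(w.services & wanted), w.name))
    # Select: return the top candidates; integration is left to a person.
    return ranked[:top_k]


# Made-up pool of three workflows.
pool = [
    Workflow("wf_a", frozenset({"fetch_sequence", "blast", "render_report"})),
    Workflow("wf_b", frozenset({"fetch_sequence", "align"})),
    Workflow("wf_c", frozenset({"plot"})),
]
candidates = discover(pool, {"fetch_sequence", "blast"})
print([w.name for w in candidates])
```

The ranking rule is deliberately naive. Composition-oriented discovery, as the slide asks for, would rank by how much work remains to integrate a candidate rather than by raw overlap.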

  11. A field report of six projects • www.myGrid.org.uk • reuse by collaborators • personal reuse (versioning) • www.kepler-project.org • 10 complex workflows • reuse of distributed execution models • www.inforsense.com • intranet exchanges within large pharmas • www.geodise.org • 150 Matlab functions, 10 scripts • reuse of function combinations

  12. A field report of six projects • www.myGrid.org.uk • reuse by collaborators • personal reuse (versioning) • www.kepler-project.org • 10 complex workflows • reuse of distributed execution models • www.inforsense.com • intranet exchanges within large pharmas • www.geodise.org • 150 Matlab functions, 10 scripts • reuse of function combinations • No support for comparing workflows! • No third party reuse!

  13. 7 bottlenecks to reuse & repurposing • Web of Science realm • Ranking • Process KA • Discovery model • Workflow interoperability • Workflow rigidity • We are here • IP rights • Service availability

  14. Step 1: Collect as many workflows as possible • Ranking • Process KA • Discovery model • Workflow interoperability • Workflow rigidity • IP rights • Service availability

  15. Step 2: Make this collection usable • Ranking • Process KA • Discovery model • Workflow interoperability • Workflow rigidity • IP rights • Service availability

  16. Wanted: technology providers • Semantic Web community? • Ranking • Process KA • Discovery model • Workflow interoperability • e-Science community • Workflow rigidity • IP rights • Service availability

  17. The bottlenecks, in more detail • Service availability • Web services: Kepler actors, myGrid processors, InforSense services • Local services: Web enable, encode, repository • Intellectual property rights • Anonymization; journal policies • Workflow rigidity • Evolution and adaptation: parametrisation

  18. Bottleneck 4: The nice thing about workflow standards… [Figure labels: Benesh notation; Labanotation] • Workflow languages abound • Out of 6 projects, 5 do not use BPEL • Behavioural semantics left implicit, as a feature • Repurposing in case of multiple workflow systems • outside system boundaries • and across

  19. Bottleneck 4: The nice thing about workflow standards… • Bring out the behavioural semantics • Comparing 3 projects through workflow patterns • E.g. simple merge • Scientific workflows use functional programming patterns • How do these combine into different distributed execution models? • WSMO/SWSI/OWL-S?

  20. Bottleneck 5: What belongs in the discovery model? • How to retrieve existing scientific workflows? • Scientists & developers facing distributed programs • For scientists? Data flow discovery, in jargon, largely abstracting from control • For developers? Control flow discovery, largely abstracting from data • Workflow patterns, Kepler distributed execution models • Process networks, process algebra, Petri nets… [Figure labels: ACAAGATGCCATTGT; ? = ?]

  21. Bottleneck 5: What belongs in the discovery model? • For scientists • WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries • OWL DL: A-Box based workflow queries [Goderis+DL’05] • For developers • Workflow patterns, Kepler distributed execution models • Pattern example based retrieval • An early table of combined execution models
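Slides 20 and 21 describe data flow discovery for scientists as a query that largely abstracts from control. The sketch below shows one way such a query could look: it asks only whether data of one type can reach another type through a workflow's services, ignoring ordering, branching and iteration. It is not the OWL DL A-Box approach cited on slide 21. The tuple encoding of a workflow, the function name `transforms` and the type and service names are hypothetical.

```python
from collections import deque


def transforms(workflow, source_type, target_type):
    """True if source_type data can reach target_type via the workflow's services.

    A workflow is a list of (service, input_type, output_type) tuples.
    Control constructs are deliberately ignored: only data links matter.
    """
    # Build a graph over data types; each service is an edge input -> output.
    graph = {}
    for _service, in_type, out_type in workflow:
        graph.setdefault(in_type, set()).add(out_type)
    # Breadth-first search from the source type.
    seen, queue = {source_type}, deque([source_type])
    while queue:
        current = queue.popleft()
        if current == target_type:
            return True
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Made-up pool: two workflows described only by their data links.
pool = {
    "wf_a": [
        ("fetch", "gene_id", "dna_sequence"),
        ("blast", "dna_sequence", "similarity_report"),
    ],
    "wf_b": [("translate", "dna_sequence", "protein_sequence")],
}
hits = sorted(
    name for name, wf in pool.items()
    if transforms(wf, "gene_id", "similarity_report")
)
print(hits)
```

A control flow query for developers would need the opposite abstraction: keep the merge, split and iteration constructs and drop the data types.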

  22. Bottleneck 6: New challenges in Knowledge Acquisition • Who does the annotation? • What should be in the annotation? • Workflow fragments • Task aggregation/prediction • “Service decomposition” • The things that went wrong!

  23. Bottleneck 6: New challenges in Knowledge Acquisition • Who does the annotation? • Updated service ontology learning and automated service annotation techniques • What should be in the annotation? • Workflow fragments • “Service decomposition” • Cutting up service webs • Social network analysis (services as users!) • The things that went wrong • Web site usability mining

  24. Bottleneck 7: Ranking workflow relevance • Repurposing → measuring integration effort • Ranking data flow (in jargon) • Structural edit distance • E.g. services to remove/add/replace to make 2 workflows equal • For OWL workflow ontology, need abduction or off-line processing • Ranking control flow • Relationship between control flow constructs
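The structural edit distance on slide 24 counts the services to remove, add or replace to turn one workflow into another. A minimal sketch under a strong simplifying assumption: each workflow is reduced to the set of service names it uses, so links between services are ignored. A replace is counted as one operation that pairs a removal with an addition. The function names and the example data are hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of remove/add/replace operations between two service sets."""
    to_remove = source - target  # services only in the source workflow
    to_add = target - source     # services only in the target workflow
    # Pair removals with additions as replaces; the remainder are plain
    # removes or adds, so the total equals the larger of the two counts.
    return max(len(to_remove), len(to_add))


def rank_by_effort(pool, target):
    """Order candidate workflows by ascending edit distance to the target."""
    scored = [(name, edit_distance(services, target)) for name, services in pool.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Made-up pool and target workflow.
pool = {
    "wf_a": {"fetch_sequence", "blast", "render_report"},
    "wf_b": {"fetch_sequence", "align"},
    "wf_c": {"plot"},
}
target = {"fetch_sequence", "align", "render_report"}
ranking = rank_by_effort(pool, target)
print(ranking)
```

Here `wf_a` needs one replace (`blast` for `align`) and `wf_b` needs one add (`render_report`), so both score 1, while `wf_c` scores 3. A distance that also accounts for data links would need a graph edit distance, which is considerably more expensive to compute.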

  25. Take home message • Problem: Workflow reuse and repurposing is happening, how do we make it scale? • Data: Survey of 6 e-Science middleware projects • Requirements analysis: 7 bottlenecks • Creating a pool of process knowledge • Workflow interoperability • Accessing this pool of knowledge • Workflow discovery, KA and ranking

  26. Acknowledgements • This work is supported by the UK e-Science programme EPSRC GR/R67743. • The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams’ syndrome workflow and is supported by The Wellcome Foundation (G/R:1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna). • Sean Bechhofer provided useful comments on the draft.