Workflow discovery in e-science • Antoon Goderis, Peter Li, Carole Goble • University of Manchester, UK • www.cs.man.ac.uk/~goderisa
Agenda • Web services in science • Workflow re-use • Workflow discovery • Is workflow discovery a new problem? • How do people match up workflows? • Can we replicate the behaviour with tools? • Conclusions
Scientific workflows • e-science = supporting scientists to encode, enact, explain and share experimental procedures featuring lots of specialised data • Case study: bioinformatics • Understanding the DNA to behaviour link • 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna • Re-use and repurposing of workflows • +/- 200 Taverna workflows shared at www.myExperiment.org
Manchester, CS dept • Manchester, Biology dept • Newcastle, CS dept
Scientific workflows • e-science = supporting scientists to encode, enact, explain and share experimental procedures • Case study: bioinformatics • Understanding the DNA to life link • 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna • Re-use and repurposing of workflow fragments • +/- 200 Taverna workflows shared at www.myExperiment.org
One + Three questions • Can’t we just do it with ? • Keyword search doesn’t seem to cut it • Is workflow discovery a new problem? • How do people match up workflows? • Can we replicate the behaviour with tools?
[Figure: my current workflow and myExperiment.org]
1. Is workflow discovery a new problem? Source: survey of 21 myGrid/Taverna users
1. Is workflow discovery a new problem? • Yes: workflow discovery subsumes service discovery
2. How do people match up workflows?
3. Can we replicate the behaviour with tools?
A user experiment with bioinformatics workflows
Workflow discovery task • Can I sensibly adapt an existing experimental procedure (workflow) using another one? • Extend • Replace
Workflow corpus • 66 similar workflows for Graves’ disease, all by a single author • 1 + 5 workflows used in the experiment (1 input workflow, 5 candidates) • Workflow diagram • No documentation • No annotation
By the experts, for the experts • 9 bioinformaticians and 4 developers at a Taverna training day
Matching strategies • Matching input workflow with 5 others
Human on-line matching strategies! • Traits • Scores of attraction • Yes or no
Matching strategy: traits From an analysis of 30 000 profiles
Matching strategy: scoring • Score • Percentile • Confidence level • www.AmIHotOrNot.com
Traits • Predicted trait
Traits and score • Predicted trait • Scores for similarity, usefulness and confidence, e.g. [1 Identical – 9 Not similar]
The gold standard • The collection of workflow similarity assessments • Predictive traits, possibly interacting
2. How do people match up workflows? • Difficulty of task • Biological relationship very difficult for 6 out of 9 • Shape similarity difficult for 4 out of 13 • Medium confidence • Consistency • Inter-participant disagreement on how to order biological similarity and shape similarity [Spearman rank order test] • Predictive traits • No one trait dominant between or within participants [Levene homogeneity of variance test]
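The Spearman rank order test mentioned above compares how two participants order the same candidate workflows. A minimal sketch, assuming tie-free rankings; the two rankings below are hypothetical illustration data, not results from the experiment, and `spearman_rho` is a name introduced here:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared rank differences, item by item.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data: two participants rank the same 5 candidate workflows
# from most similar (1) to least similar (5).
participant_1 = [1, 2, 3, 4, 5]
participant_2 = [2, 1, 3, 5, 4]

rho = spearman_rho(participant_1, participant_2)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the two participants order the workflows alike; values near 0 or below indicate the kind of disagreement reported on the slide.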
Can we do better? • Simpler tasks and workflows • Taverna-experienced users • Workflow documentation and annotation • Other factors in use, e.g. size difference • Fix allowed factors • Adopt black box approach: yes/no matching
Automated discovery technique • Unattributed graph matcher implementation by Messmer and Bunke • Subgraph isomorphism detection; exponential time complexity • DAGs and optimization for repository of graphs • Workflows parsed as graphs • Workflow input, workflow output and intermediate services as nodes • Data links as edges • Example workflow nodes: probeSetid, databaseid, AffyMapper_seq, Blastx, Results_Blastx
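To make the graph view concrete, here is a brute-force sketch of structure-only (unattributed) subgraph matching. It is not the Messmer and Bunke implementation: it simply tries every injective node mapping, which illustrates why the problem has exponential cost. The node names come from the slide; the data links between them are an assumption made for illustration, and `has_subgraph_match` is a name introduced here.

```python
from itertools import permutations


def has_subgraph_match(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Return True if the pattern's edges can be embedded in the target.

    Node labels are ignored (unattributed matching): only structure counts.
    Brute force over all injective mappings, so cost grows exponentially.
    """
    target_edge_set = set(target_edges)
    for image in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[u], mapping[v]) in target_edge_set
               for u, v in pattern_edges):
            return True
    return False


# Workflow parsed as a directed graph: inputs, outputs and services are nodes.
workflow_nodes = ["probeSetid", "databaseid", "AffyMapper_seq",
                  "Blastx", "Results_Blastx"]
# Assumed data links (edges); the slide does not list them explicitly.
workflow_edges = [("probeSetid", "AffyMapper_seq"),
                  ("databaseid", "AffyMapper_seq"),
                  ("AffyMapper_seq", "Blastx"),
                  ("Blastx", "Results_Blastx")]

# Query 1: a chain of three steps.
chain_nodes = ["a", "b", "c"]
chain_edges = [("a", "b"), ("b", "c")]
found_chain = has_subgraph_match(chain_nodes, chain_edges,
                                 workflow_nodes, workflow_edges)

# Query 2: one node fed by three separate inputs.
fan_nodes = ["x", "y", "z", "w"]
fan_edges = [("x", "w"), ("y", "w"), ("z", "w")]
found_fan = has_subgraph_match(fan_nodes, fan_edges,
                               workflow_nodes, workflow_edges)

print(found_chain, found_fan)
```

Under the assumed edges, the chain is found and the three-input fan is not, since no node in the example workflow has three incoming data links.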
Automated discovery technique • Ranking based on • shared nodes • difference in size between input graph and repository graphs
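The ranking criteria above can be sketched as follows. The slide does not say how the two criteria are combined, so this sketch assumes a simple ordering: more shared nodes first, then smaller size difference. It also simplifies "shared nodes" to overlap of node names, whereas the actual matcher works on graph structure. The repository contents and the name `rank_workflows` are hypothetical.

```python
def rank_workflows(query_nodes, repository):
    """Rank repository workflows against a query workflow.

    Assumed ordering: most shared nodes first, then smallest difference
    in size, then name (to keep the result deterministic).
    """
    query = set(query_nodes)
    scored = []
    for name, nodes in repository.items():
        candidate = set(nodes)
        shared = len(query & candidate)
        size_diff = abs(len(query) - len(candidate))
        scored.append((name, shared, size_diff))
    return sorted(scored, key=lambda row: (-row[1], row[2], row[0]))


query = ["probeSetid", "databaseid", "AffyMapper_seq",
         "Blastx", "Results_Blastx"]

# Hypothetical repository of three workflows, each given as its node names.
repository = {
    "wf_blast_only": ["sequence", "Blastx", "Results_Blastx"],
    "wf_mapper_blast": ["probeSetid", "AffyMapper_seq",
                        "Blastx", "Results_Blastx"],
    "wf_unrelated": ["geneId", "PathwayLookup", "pathways",
                     "report", "summary"],
}

ranking = rank_workflows(query, repository)
for name, shared, size_diff in ranking:
    print(name, shared, size_diff)
```

A weighted score would be an equally plausible way to combine the two criteria; the lexicographic ordering is used here only because it avoids inventing weights.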
3. Can we replicate the behaviour with tools? Kind of... • Average similarity assessments across participants
Current work • OWL workflow ontology • Precision / recall • Graph matching • Text clustering • Yes/no
Take home • Scientists compose Web services for real – and share their results • Workflow discovery is a real problem, which subsumes service discovery • A range of matching strategies and techniques apply • Evaluation is a challenge: gold standards are hard to build • Come and play at myExperiment.org • References at www.cs.man.ac.uk/~goderisa