Scientific Workflow Requirements Carole Goble, University of Manchester, UK Bertram Ludaescher, SDSC, USA
Attendees included • Bob Mann • Anthony Mayer • Austin Tate • Bertram Ludäscher • Geoffrey Fox • Jeffrey Grethe • Matthew Shields • Mike Wilde • Simon Cox • Carole Goble • Antoon Goderis • Earl Ecklund • Alan Bundy • Albert Burger • Jessica Chen-Burger • And a bunch more whose names we didn’t get
Scientific Workflow Requirements • characterise scientific workflows, • identify their requirements • compare/contrast with business workflow requirements. Some science stakeholders • neuroscience, astronomy, engineering Few business stakeholders
A Scientist Writes • “Work in my problem solving environment so that I don’t need to change the way I work.”
User facing • Reflect the modelling paradigm of the scientist. • Varies between experiments, disciplines • Which user would that be then? • Creators, users, auditors, validators (I know it’s right if I see it, but I can’t write it) • Biologists compared to bioinformaticians, and transitioning between • Different users, different environments • Appropriate levels of abstraction. • User models -> workflow models • Simple to use & intuitive creation, deployment, execution and debugging environments
Supporting Scientific Practice • Incrementally exploratory prototypical TYPE A • Got the data, now get the Nature paper before the next guy • Large scale production TYPE B • Got the idea, get the data for very many experiments, and even many teams, communities blah blah • Migration from TYPE A to TYPE B. • Capture of TYPE A for later non-interactive replay in a parameterised fashion. • Workflow creation paradigms • by example, plagiarism, drag and drop • Provenance tracking
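The "capture TYPE A, replay as TYPE B" idea, together with provenance tracking, can be sketched roughly as follows. This is only an illustration, not a design from the workshop: the `Recorder` class, the `scale` and `threshold` steps, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical recorder: captures interactive steps for later replay."""
    steps: list = field(default_factory=list)       # (function, parameter name) pairs
    provenance: list = field(default_factory=list)  # one record per executed step

    def run(self, func, value, param_name, params):
        """Run one step interactively and remember how it was invoked."""
        self.steps.append((func, param_name))
        return self._execute(func, value, param_name, params)

    def _execute(self, func, value, param_name, params):
        result = func(value, params[param_name])
        # Provenance: which step ran, with which parameter, on which data.
        self.provenance.append({
            "step": func.__name__,
            "param": param_name,
            "value": params[param_name],
            "input": value,
            "output": result,
        })
        return result

    def replay(self, value, params):
        """Re-run the captured steps non-interactively with new parameters."""
        for func, param_name in self.steps:
            value = self._execute(func, value, param_name, params)
        return value


def scale(data, factor):
    return [x * factor for x in data]


def threshold(data, cutoff):
    return [x for x in data if x >= cutoff]


# TYPE A: exploratory, step by step.
rec = Recorder()
params = {"factor": 2, "cutoff": 5}
out = rec.run(scale, [1, 2, 3, 4], "factor", params)
out = rec.run(threshold, out, "cutoff", params)

# TYPE B: the same captured steps, replayed with different parameters.
batch = rec.replay([1, 2, 3, 4], {"factor": 3, "cutoff": 5})
```

Every executed step, interactive or replayed, lands in `rec.provenance`, so the record of how a result was produced comes for free with the capture mechanism.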
Cool tools, right tools • I love my VI editor • Diagramming tools, text tools • Works on all workflows, use which you like when you like. • Good tools! Easy tools! Friendly tools! For the domain user (which user?) not the computer scientist • Cat skinning • Multiple scripting language support • Multiple ways to write a workflow
Transparency and control • Looking under the hood and inside the box • observe, trace, compare, muse, fettle & fiddle. • What should be transparent? • Do users need to know what format data is in or just that it is an image? • Unveil at different levels of detail, through the wedding cakes, stacks • Opaque to some users some of the time, drillable by others some of the time • Role, authorisation, policy • Scientist knows best
User interaction • Creation, Discovery, Enactment • Single User interaction with workflow execution • Choice between paths of execution in specific states • Parameter modification mid-run • Collaborative multi-user interaction in creation • Reusing workflows -> Modularisation • Reusing wfs with different parameters and datasets • Joining up wfs from different areas, different disciplines and across scales • E-science crosses disciplines!! • No support for “extreme team wf creation” • Collaborative multi-user interaction in execution?
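One possible way to picture single-user interaction during execution (choosing a path, changing a parameter mid-run) is a workflow that pauses at named points and waits for input. The sketch below uses a Python generator for this; `workflow`, `drive`, the checkpoint names, and the parameters are assumptions made up for illustration.

```python
def workflow(data, params):
    """Hypothetical workflow: each `yield` is a point where the user may intervene."""
    scaled = [x * params["gain"] for x in data]

    # Pause: show intermediate state and accept optional parameter updates.
    updates = yield ("after_gain", scaled)
    if updates:
        params.update(updates)

    # Pause: let the user choose between two paths of execution.
    choice = yield ("choose_path", ["sum", "max"])
    if choice == "max":
        return max(scaled) + params["offset"]
    return sum(scaled) + params["offset"]


def drive(gen, responses):
    """Feed the workflow with scripted user responses and return its result."""
    try:
        state = next(gen)
        while True:
            name, _payload = state
            # A missing response means "carry on with the defaults".
            state = gen.send(responses.get(name))
    except StopIteration as stop:
        return stop.value


interactive = drive(
    workflow([1, 2, 3], {"gain": 2, "offset": 0}),
    {"after_gain": {"offset": 10}, "choose_path": "max"},
)
unattended = drive(workflow([1, 2, 3], {"gain": 2, "offset": 0}), {})
```

The same workflow runs unattended when no responses are supplied, which hints at how an interactive prototype could later run as a batch job.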
Legacy and Extensibility • Ingesting legacy and external applications & services • May not run on every platform, may need an emulator. • Heterogeneity – of types, platforms etc • Include arbitrary services available within the user’s domain or hacked up by the users. • Simon’s piece of Matlab hackery – dark matter services. • On the fly development and assimilation • Suspending the workflow, or prompting the user • For the prototypical exploratory workflows largely. • Massaging, lubrication, facilitating, gluing without programming ! • Easy to extend to meet specific or unique requirements
More on workflow sorts • Batch vs interactive • Dataflow vs control flow vs state driven • Incrementally exploratory prototypical vs large scale production (and migration from former to latter).
Workflow lifecycles • Prototypical workflow development to production run • Different parts of the lifecycle might need different environments and policies • Different sorts of users will interact at different points in the lifecycle.
Security, trust and validation • Guarantees • That a provisioned service is what it says it is and follows all notification mandates. • Models of soundness at different levels, well-behavedness • 500 lambs follow 10-15 shepherds (or wolves?) • Validate at the right time not every time. • Confidence in someone else’s stuff • I can look at it to check it but I can’t write it.
Business vs Scientific • It’s all the same and it’s all different • Use cases and scenarios needed. • Classify business and scientific workflow against Matthew’s Stack • Drivers • Science workflow driven by scientific questions, outcomes and vanity. • Business workflow driven by business processes & goals and $£€ • Granularity • Business languages for coarse grain of swf • Scientists hack at fine grain level
Business vs Scientific • Individualism vs Corporations • Ratios -- more creators than users in science? • What is the Scientific Business Process?
A techy writes • Formal underpinning in CS theory • What is the underlying formal theoretic model? What is the natural scripting language? • Dataflow is functional & parallel • Control flow is imperative & sequential? • SWF creation as programming. • What are the languages?
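The dataflow versus control-flow contrast can be made concrete with one small computation written both ways. This is a sketch under assumptions, not a proposed workflow language: the step functions, the `GRAPH` layout, and `run_dataflow` are invented for illustration, and the scheduler relies on Python's standard `graphlib` and `concurrent.futures` modules.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def load_a():
    return [1, 2, 3]


def load_b():
    return [10, 20, 30]


def combine(a, b):
    return [x + y for x, y in zip(a, b)]


# Control flow: the author fixes the order of the steps.
def run_control_flow():
    a = load_a()
    b = load_b()
    return combine(a, b)


# Dataflow: the author only declares what each node depends on.
GRAPH = {
    "a": (load_a, []),
    "b": (load_b, []),
    "sum": (combine, ["a", "b"]),
}


def run_dataflow(graph, target):
    """Run every node once its inputs exist; independent nodes run in parallel."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # All nodes returned here have their dependencies satisfied.
            futures = {
                name: pool.submit(graph[name][0], *(results[d] for d in graph[name][1]))
                for name in sorter.get_ready()
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results[target]
```

Both `run_control_flow()` and `run_dataflow(GRAPH, "sum")` compute the same result; the difference is that in the dataflow version the ordering and the parallelism of `load_a` and `load_b` are derived from the declared dependencies rather than written by hand.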
Next Steps • Write this up! • Harvest some business use cases from Forrester report style sources (and get Tony Hey to pay) • Collect scientific workflow examples • Develop matrixes of system, functional and language requirements against these examples. • Er … that’s it!