Taverna workflows in caGrid
caGrid Architecture Face-to-face meeting
Stian Soiland-Reyes & Aleksandra Nenadic, myGrid, University of Manchester, UK
Boston, 2009-05-11
http://www.mygrid.org.uk/dev/wiki/display/caGrid
Agenda
• What is a Taverna workflow?
• Abstract caGrid workflow example
• Actual Taverna workflow
• caGrid plugin for Taverna
• Current work
• Where do we go next?
What is a Taverna workflow?
• A set of services (web services, RESTful services, local scripts, other workflows, etc.)
• A set of data links between services: “put output X from service A as input Y to service B”
• If needed: list handling, control links
• This is called a data-oriented workflow (dataflow)
  • Say where you want the data to flow, instead of what you want to do
  • Compare with more procedural workflow languages such as BPEL
  • A useful way of thinking for much data-driven scientific research
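To make the dataflow idea concrete, here is a minimal Python sketch. It is not Taverna's engine or API: services are assumed to be plain functions that return named outputs, and each data link is a hypothetical (source service, output port, target service, input port) tuple. The engine derives the execution order from the links alone, which is the "say where the data flows" style described above.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical service type: named inputs in, dict of named outputs back.
Service = Callable[..., dict[str, Any]]

def run_dataflow(
    services: dict[str, Service],
    links: list[tuple[str, str, str, str]],
    inputs: dict[tuple[str, str], Any],
) -> dict[str, dict[str, Any]]:
    """Run services in data-dependency order.

    links: (source_service, output_port, target_service, input_port), i.e.
    "put output X from service A as input Y to service B".
    inputs: constant values for (service, input_port) pairs not fed by a link.
    """
    # Each service depends on every service that feeds one of its inputs.
    deps: dict[str, set[str]] = {name: set() for name in services}
    for src, _, dst, _ in links:
        deps[dst].add(src)

    results: dict[str, dict[str, Any]] = {}
    for name in TopologicalSorter(deps).static_order():
        # Start from constant inputs, then add values arriving over data links.
        kwargs = {port: v for (svc, port), v in inputs.items() if svc == name}
        for src, out_port, dst, in_port in links:
            if dst == name:
                kwargs[in_port] = results[src][out_port]
        results[name] = services[name](**kwargs)
    return results

# Usage: A's "upper" output feeds B's "word" input.
services = {
    "A": lambda text: {"upper": text.upper()},
    "B": lambda word: {"length": len(word)},
}
links = [("A", "upper", "B", "word")]
out = run_dataflow(services, links, {("A", "text"): "taverna"})
print(out["B"]["length"])  # 7
```

This sketch runs one service at a time; a dataflow engine such as Taverna is free to run independent services concurrently and to iterate implicitly over list inputs.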
Abstract caGrid workflow
• Query the CPAS data service to find a protein sequence
• Use (parts of) the result to query the GridPIR and caBIO data services for matching sequences
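The abstract workflow above can be sketched as plain Python to show its shape: one query whose result fans out to two independent queries. The three query functions are hypothetical stubs standing in for the real CPAS, GridPIR and caBIO data services, and the "sequence=..." description format is an assumption made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the caGrid data services; real calls would send
# CQL queries to CPAS, GridPIR and caBIO over the grid.
def query_cpas(protein_id: str) -> dict:
    return {"id": protein_id, "description": "sequence=MKTAYIAKQR"}

def query_gridpir(sequence: str) -> list[str]:
    return [f"GridPIR match for {sequence}"]

def query_cabio(sequence: str) -> list[str]:
    return [f"caBIO match for {sequence}"]

def extract_sequence(record: dict) -> str:
    # Use part of the CPAS result: here, the text after "sequence=".
    return record["description"].split("sequence=", 1)[1]

def abstract_workflow(protein_id: str) -> dict[str, list[str]]:
    sequence = extract_sequence(query_cpas(protein_id))
    # GridPIR and caBIO depend only on the sequence, not on each other,
    # so they can be called concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pir = pool.submit(query_gridpir, sequence)
        cabio = pool.submit(query_cabio, sequence)
        return {"GridPIR": pir.result(), "caBIO": cabio.result()}

result = abstract_workflow("P12345")  # hypothetical protein identifier
print(result["caBIO"])  # ['caBIO match for MKTAYIAKQR']
```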
Actual Taverna workflow
• Looks very similar to the abstract workflow
• Introduces shim services to build and parse data elements
Colour key:
• Blue: constant CQL query
• Purple: build/parse complex types for web service input/output
• Orange: local scripts that parse the description string and build CQL queries
• Green: caGrid WSDL services
http://www.myexperiment.org/workflows/752
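The orange shim scripts parse a description string and turn it into a CQL query. The Python sketch below shows that idea under stated assumptions: the "key=value; key=value" description format is hypothetical, the query uses the caGrid CQL 1.0 element names `CQLQuery`, `Target` and `Attribute` with an `EQUAL_TO` predicate, and the namespace URI should be checked against the caGrid version in use. The target class `gov.nih.nci.cabio.domain.Gene` and its `symbol` attribute are used only as an illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of caGrid CQL 1.0 query documents (assumed; check your caGrid version).
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"
# Serialise CQL elements with CQL_NS as the default (unprefixed) namespace.
ET.register_namespace("", CQL_NS)

def parse_description(description: str) -> dict[str, str]:
    """Parse a hypothetical 'key=value; key=value' description string."""
    fields = {}
    for part in description.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def build_cql(target_class: str, attribute: str, value: str) -> str:
    """Build a CQL query for target_class objects whose attribute equals value."""
    query = ET.Element(f"{{{CQL_NS}}}CQLQuery")
    target = ET.SubElement(query, f"{{{CQL_NS}}}Target", name=target_class)
    ET.SubElement(
        target,
        f"{{{CQL_NS}}}Attribute",
        name=attribute,
        value=value,
        predicate="EQUAL_TO",
    )
    return ET.tostring(query, encoding="unicode")

fields = parse_description("gene=BRCA1; organism=Homo sapiens")
print(build_cql("gov.nih.nci.cabio.domain.Gene", "symbol", fields["gene"]))
```

Building the XML with a library rather than by string concatenation keeps values such as `<` or `&` correctly escaped in the generated query.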
caGrid plugin for Taverna (1)
• Listing all services:
  • Discover/browse services registered in the caGrid Index Service
• Easy to install into Taverna:
caGrid plugin for Taverna (2)
• …or by semantic search:
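The difference between browsing, keyword search and semantic search can be illustrated with an in-memory registry. This is a hypothetical Python sketch, not the plugin's code or the caGrid Index Service API: the `ServiceRecord` fields, the example URLs and the `concept:...` annotation strings are placeholders for the richer metadata (including concept annotations) that caGrid services advertise.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    # Hypothetical subset of the metadata a registered service advertises.
    name: str
    url: str
    description: str
    concepts: set[str] = field(default_factory=set)  # placeholder concept annotations

def browse(registry: list[ServiceRecord]) -> list[str]:
    """List every registered service, sorted by name."""
    return sorted(s.name for s in registry)

def keyword_search(registry: list[ServiceRecord], term: str) -> list[str]:
    """Match the term against free-text name and description."""
    term = term.lower()
    return sorted(
        s.name for s in registry
        if term in s.name.lower() or term in s.description.lower()
    )

def semantic_search(registry: list[ServiceRecord], concept: str) -> list[str]:
    """Match on annotated concepts rather than on free text."""
    return sorted(s.name for s in registry if concept in s.concepts)

registry = [
    ServiceRecord("ServiceA", "https://example.org/services/ServiceA",
                  "Mass spectrometry results", {"concept:Protein"}),
    ServiceRecord("ServiceB", "https://example.org/services/ServiceB",
                  "Gene annotations", {"concept:Gene"}),
]
print(keyword_search(registry, "protein"))            # []
print(semantic_search(registry, "concept:Protein"))   # ['ServiceA']
```

The keyword search misses ServiceA because its description never says "protein", while the concept annotation still finds it. That gap is what semantic search over service metadata is meant to close.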
Current work by myGrid & caGrid
• Develop Taverna support for GAARDS-secured caGrid services
• Wrap existing third-party services (already used by Taverna users) for caGrid, and annotate them to match the Silver-level compatibility guidelines
• Taverna workflow as a caGrid service
• Service discovery improvements
• Documentation, building example workflows
Real example: Lymphoma type prediction
• Scientific value
  • Use gene-expression patterns associated with diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) to predict the lymphoma type of an unknown sample
  • Use a support vector machine (SVM) to classify the data and predict the tumor type of unknown samples
• Main steps
  • Query training data from experiments stored in caArray
  • Preprocess (normalize) the microarray data
  • Feed the training and test data into the SVM service to get classification results
*Figure from M. A. Shipp et al., "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning", Nature Medicine, 2002.
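The preprocess and classify steps can be sketched locally in pure Python. The real workflow calls a remote SVM service on caArray data; here, as assumptions, normalization is a per-gene z-score using training-set statistics, and the SVM is a soft-margin linear SVM trained by full-batch subgradient descent on the hinge loss. The two-gene expression values and labels are hypothetical toy data (+1 standing for DLBCL, -1 for FL).

```python
import statistics

def zscore_fit(rows):
    """Per-gene mean and standard deviation from the training samples."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid divide-by-zero
    return means, stds

def zscore_apply(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM trained by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))), y_i in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: hinge loss contributes
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(X, w, b):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]

# Hypothetical toy data: two genes per sample; +1 = DLBCL, -1 = FL.
train = [[8.0, 2.0], [9.0, 1.5], [8.5, 2.5], [2.0, 7.0], [1.5, 8.0], [2.5, 7.5]]
labels = [1, 1, 1, -1, -1, -1]
test = [[8.8, 2.2], [1.8, 7.7]]

means, stds = zscore_fit(train)  # normalize with training statistics only
X_train = zscore_apply(train, means, stds)
w, b = train_linear_svm(X_train, labels)
predictions = predict(zscore_apply(test, means, stds), w, b)
print(predictions)  # [1, -1]
```

Fitting the normalization on the training samples alone, then applying it to the unknown samples, keeps test data from leaking into the preprocessing step.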
Lymphoma type prediction workflow
Wei Tan, http://www.myexperiment.org/workflows/746
Stages: Query • Preprocess • Classify & predict
Lymphoma type prediction results
• The (few) classification errors are highlighted
Acknowledgements: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT); Wei Tan
Where do we go next?
Just some ideas:
• Tighter integration with caDSR
• Partial rerun of workflows
• Improve Taverna’s support for complex XML types
• Workflow sharing
• Workflows in the caGrid Portal
• Guided workflow building using caGrid metadata
• Easily build CQL queries from Taverna
• Google Summer of Code 2009