An Introduction to Designing, Executing and Sharing Workflows with Taverna

An Introduction to Designing, Executing and Sharing Workflows with Taverna. Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011. Exercise 1: Exploring the Workbench. Taverna can be downloaded from http://www.taverna.org.uk/


Presentation Transcript


  1. An Introduction to Designing, Executing and Sharing Workflows with Taverna. Katy Wolstencroft, myGrid, University of Manchester. IMPACT/Taverna Hackathon 2011.

  2. Exercise 1: Exploring the Workbench. Taverna can be downloaded from http://www.taverna.org.uk/. Go to the page and find the latest version (2.3). Download the correct version for your operating system and follow the instructions in the Taverna installer. The following page shows a screenshot of Taverna and the different panels that make up the workbench.

  3. Taverna Workbench (screenshot with three labelled panels): Workflow Diagram, Services Panel, Workflow Explorer.

  4. 1. Workflow Diagram. The visual representation of the workflow. • Shows inputs/outputs, services and control flows • Allows editing of the workflow by dragging, dropping and connecting services together • Enables saving of workflow diagrams for publishing and sharing

  5. 1. Workflow Explorer • The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs and it shows where remote services are located. It also shows configuration details, such as iteration and looping (we will come back to these things later). • Workflow validation details can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available.

  6. 1. Available Services Panel. Lists the services available by default in Taverna: • Local Java services • WSDL Web Services (secure and public) • RESTful services • R Processor services (for statistical analyses) • Beanshell scripts • XPath scripts. Allows the user to add new services or workflows from the web or from file systems. There are loads more available!

  7. Exercise 2: Building a Simple Workflow • In the Services panel, type ‘image’ into the search box. • Select ‘Get Image from URL’ • This is a local service, but web services work the same way • Many historical documents are stored as images on the web. This is a simple, but useful service to help gather data • Drag this service across to the workflow diagram panel

  8. Exercise 2: Building a Simple Workflow • In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port” • Type a name (e.g. URL) for this input in the pop-up window and click “ok” • Do the same to create a new workflow output. Call this output “image”

  9. Exercise 2: Building a Simple Workflow. You now have 3 boxes in the diagram, and we need to connect them up into a workflow. First, we need to find out how many inputs and outputs the ‘get image from URL’ service has. At the top of the workflow diagram, select the ‘Show Ports’ icon.

  10. Exercise 2: Building a Simple Workflow. Click on the workflow input box and drag the linking arrow across to the URL input of the ‘get_image_from_URL’ service. Then link the image output of ‘get_image_from_URL’ to the workflow output port.

  11. Exercise 2: Building a Simple Workflow. You have now built your first workflow! It should look something like this. In many cases, you have to supply input data for EVERY service input port. In this case, however, the ‘base’ input is optional, so we will leave it. Save the workflow by going to “File -> Save workflow”.

  12. Exercise 2: Building a Simple Workflow Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

  13. Exercise 2: Building a Simple Workflow. An input window will appear. As you can see, we have not yet added a description of the workflow or of the input. Click on ‘New Value’ in the input window and, where it says “some input data goes here”, add the URL http://www.archives.gov/exhibits/featured_documents/magna_carta/images/magna_carta.jpg

  14. Exercise 2: Building a Simple Workflow. Click “run workflow”. In the bottom left of the results window, click on the results. You will now see the image fetched from the specified URL. Workflow results can be saved here if required by clicking on ‘save all values’.
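If it helps to see the same data flow outside the workbench, here is a minimal Python sketch of the workflow you just built: one input port (URL), one service, one output port (image). It is not how Taverna implements the service. The function names are made up for illustration, and treating ‘base’ as a base address that a relative URL is resolved against is an assumption.

```python
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen


def get_image_from_url(url: str, base: Optional[str] = None) -> bytes:
    """Stand-in for the 'Get Image from URL' service (hypothetical helper)."""
    # Assumption: when 'base' is given, 'url' is resolved relative to it.
    # When 'base' is left out, as in the exercise, 'url' is used unchanged.
    full_url = urljoin(base, url) if base else url
    with urlopen(full_url) as response:
        return response.read()


def simple_workflow(URL: str) -> dict:
    """Workflow input port 'URL' -> service -> workflow output port 'image'."""
    return {"image": get_image_from_url(URL)}
```

Calling `simple_workflow(...)` with the Magna Carta address from the exercise would return the raw image bytes under the key `image`, provided the address is still reachable.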

  15. 2. Adding a Workflow Description. Right-click on a blank part of the workflow diagram and select “show details”. In the workflow explorer panel, the details page will open up. Add some details about the workflow (e.g. who the author is and what the workflow does). You can also add examples and descriptions for the workflow inputs by selecting them in the explorer panel and selecting “details”. Adding this metadata makes the workflow much more reusable. Save the workflow by going to “File -> Save workflow”.

  16. Exercise 3: Adding New Services • New services can be gathered from anywhere on the web • We will find a new service and add it to the workbench • IMPACT and SCAPE have a whole suite of services. We will add one (you will be using it later on today) • Go to https://fue.onb.ac.at/synapse. Here you will find a list of IMPACT services • Click on IMPACTTesseractV3Proxy and copy the link you are directed to • This is the WSDL address, which is what Taverna needs to run the service

  17. 3. Adding New Services. Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service. Select ‘WSDL service…’. A window will pop up asking for a web address.

  18. 3. Adding New Services. Enter the service address you just copied. Scroll down the Services list and you will see your new service there.

  19. Exercise 4: Sharing and Reusing Workflows. Go to http://www.myexperiment.org. myExperiment is a social networking site for sharing workflows and workflow expertise and experiences. Browse around the site and see what it contains. Find everything that has been tagged with ‘text mining’, for example. Look at the text mining workflows. You will see some that are specific to biology, some that are generally applicable, and some that are specific to other scientific disciplines.

  20. 4. Sharing and Reusing Workflows. IMPACT have many workflows on myExperiment, but they are not public. You must join an IMPACT group before you can see them and use them. Create yourself an account and join the group called ‘IMPACT-myGrid-Hackathon’ (NOTE: you need to join this group to access content for future exercises). Explore the shared items in this group. These are examples of the types of tasks IMPACT workflows can perform.

  21. 5. Using Workflows from myExperiment. You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna. To use workflows from the website, you can either download them or copy the workflow file location into the ‘open workflows from the web’ option in Taverna’s file menu.

  22. 5. Using Workflows from myExperiment. Go back to Taverna and click on the myExperiment icon at the top of the workbench. Go to ‘my stuff’ and log in (using the same credentials as on the web page). Find the IMPACT-myGrid-Hackathon group by using the ‘search’ option. Look at the shared items and find the workflow called ‘Text to List’. Click on ‘open’ and this workflow will be automatically imported into your Taverna design window.

  23. 5. Validate your Workflow. Taverna checks that everything is connected properly and that all the required services are available. Go to the workflow explorer and click on ‘validation report’. See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run. If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab. If there are none, run the workflow.

  24. 5. Using Workflows from myExperiment. Use the default input suggested to run the workflow. The workflow will collect and list some example data stored at the given URL. It returns a list of image files. We can now combine this workflow with the one we made earlier to return the actual images. In Taverna, you can add workflows as if they were any other kind of service. These are called ‘Nested Workflows’.

  25. 6. Reusing and Connecting Workflows. From the current workflow design window, go to ‘Insert -> Nested workflow’. Import the workflow you made earlier by selecting ‘import from file’. You can see a small version of each workflow, so you can check you are importing the correct one.

  26. 6. Reusing and Connecting Workflows. We now need to connect the two workflows together. Connect the Text2List service to the input of the nested workflow by dragging an arrow across. Make a new workflow output port (by right-clicking and adding a workflow output port). Connect the output of the nested workflow to the new workflow output port.

  27. 6. Reusing and Connecting Workflows. Your new workflow should look something like this. Save and run the workflow. This time, as it runs, you will see that Taverna automatically iterates over the list of data produced by Text2List. NOTE: some of the iterations will fail. See if you can tell which. Look at one of the resulting images.
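The automatic iteration can be pictured as a loop that calls a single-value service once per list item and records a failure for an item without stopping the rest. The sketch below only illustrates that idea. The service, its failure rule and the file names are all invented, and they say nothing about why the real iterations fail.

```python
from typing import Callable, List, Union


def iterate(service: Callable[[str], str], items: List[str]) -> List[Union[str, Exception]]:
    """Call a single-value service once per item, keeping errors per item."""
    results: List[Union[str, Exception]] = []
    for item in items:
        try:
            results.append(service(item))
        except Exception as error:
            # One failed item does not stop the remaining iterations.
            results.append(error)
    return results


def fake_fetch(name: str) -> str:
    """Invented stand-in for the nested workflow; the failure rule is made up."""
    if not name.endswith(".jpg"):
        raise ValueError(f"not an image: {name}")
    return f"image bytes for {name}"


names = ["a.jpg", "notes.txt", "b.jpg"]  # invented sample list
results = iterate(fake_fetch, names)
failed = [n for n, r in zip(names, results) if isinstance(r, Exception)]
```

Here `failed` lists the items whose iteration raised an error, which is the same question the exercise asks you to answer from the results view.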

  28. 7. Looking at Intermediate Results You can track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from. On the diagram, click the Text2List service and look at its inputs and outputs in the results. You can save the workflow in myExperiment if you wish, but make sure you give credit to the nested workflow author and make sure you ONLY share it with the IMPACT-myGrid-Hackathon group

  29. Controlling Data Flow in Workflows: Advanced Exercises

  30. 8. Iteration. As you have already seen, Taverna can automatically iterate over sets of data. When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have: a cross product, which combines every item from list 1 with every item from list 2 (all against all); or a dot product, which only combines item 1 from list 1 with item 1 from list 2, and so on (line against line).
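The two strategies map onto two familiar Python idioms: `zip` pairs items line against line, and `itertools.product` pairs them all against all. This is a sketch with invented data and an invented combining function. The results are kept as flat lists for simplicity, and how Taverna nests its own output lists is not modelled here.

```python
from itertools import product

# Invented sample lists, one per iterated input.
list_1 = ["red", "green", "blue"]
list_2 = ["cat", "dog", "fish"]


def combine(first: str, second: str) -> str:
    """Hypothetical service that takes one item from each list."""
    return f"{first} {second}"


# Dot product: item 1 with item 1, item 2 with item 2, and so on.
dot = [combine(a, b) for a, b in zip(list_1, list_2)]

# Cross product: every item of list 1 with every item of list 2.
cross = [combine(a, b) for a, b in product(list_1, list_2)]
```

With two lists of three items each, the dot product calls the service 3 times and the cross product calls it 9 times.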

  31. 8. Iteration. Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment. Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’). Select the ‘ColourAnimals’ service, then select ‘Details’ in the workflow explorer and ‘configure list handling’. Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product.

  32. 8. Iteration. Run the workflow twice, once with ‘dot product’ and once with ‘cross product’. Save the first results so you can compare them. What is the difference? What does it mean to specify dot or cross product?

  33. 9. Retries: Making your Workflow Robust. Web services can sometimes fail due to network connectivity problems. If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow. Load the ‘Retry-Example’ workflow from the IMPACT-myGrid-Hackathon group. This workflow is designed to fail sometimes. Run the workflow as it is and count the number of failed iterations.

  34. 9. Retries: Making your Workflow Robust. Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel. Click on ‘advanced’ and then ‘configure’ for retries. In the pop-up box, change it so that it retries each service iteration 2 times. Run the workflow again. How many failures do you get this time? Change the workflow to retry 5 times. Does it work every time now?
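The effect of a retry setting can be sketched as a wrapper that calls the service again whenever it raises an error, up to a fixed number of extra attempts. The code below assumes that "retry 2 times" means one initial call plus up to 2 further calls. The `SometimesFails` class is an invented, deterministic stand-in for the ‘sometimes_fails’ service, not its real behaviour.

```python
from typing import Callable


def call_with_retries(service: Callable[[], str], retries: int) -> str:
    """Call `service`, retrying up to `retries` extra times on failure."""
    if retries < 0:
        raise ValueError("retries must be zero or more")
    last_error = None
    for _ in range(retries + 1):  # 1 initial attempt + `retries` retries
        try:
            return service()
        except Exception as error:
            last_error = error
    raise last_error  # every attempt failed


class SometimesFails:
    """Invented service that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining = failures_before_success
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("temporary failure")
        return "ok"
```

A service that fails twice in a row succeeds with `retries=2` but still fails with `retries=1`, which is why raising the retry count in the exercise reduces the number of failed iterations.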

  35. 10. Looping. From myExperiment, download and open the workflow “dummy_example_of_looping”. This workflow is asynchronous. This means that when you submit data (by running the workflow), it will return a job ID and place your job in a queue. This is very useful if your job will take a long time! The ‘CheckStatus’ service will query your job ID to find out if it is complete.

  36. 10. Looping. The default behaviour in a workflow is to call each service only once for each item of data, so what if your job has not finished when the ‘CheckStatus’ service asks? Run the workflow. Almost every time, the workflow will ‘fail’ (in this case, that means it will return 0) because the results have not been returned before the workflow reaches the ‘getResults’ service.

  37. 10. Looping. This is where looping is useful. Taverna can keep running the ‘CheckStatus’ service until it reports that the job is done. Select the ‘CheckStatus’ service and click on the ‘details’ tab in the workflow explorer. Select ‘advanced’ and click on ‘add looping’. Use the drop-down boxes in the looping window to set ‘state’ ‘is_not_equal_to’ RUNNING.
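The looping condition amounts to a polling loop: call the status service repeatedly and stop as soon as the state is not equal to RUNNING. The sketch below illustrates only that idea. The sequence of states is made up, and the `delay_seconds` and `max_polls` parameters are additions of this sketch rather than settings described in the exercise.

```python
import time
from typing import Callable


def loop_until_not_running(
    check_status: Callable[[], str],
    delay_seconds: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Poll `check_status` until the state is not equal to RUNNING."""
    for _ in range(max_polls):
        state = check_status()
        if state != "RUNNING":  # the 'state is_not_equal_to RUNNING' condition
            return state
        time.sleep(delay_seconds)  # wait before asking again
    raise TimeoutError("job was still RUNNING after the last poll")


# Made-up status sequence standing in for a queued job.
states = iter(["RUNNING", "RUNNING", "COMPLETE"])
final_state = loop_until_not_running(lambda: next(states))
```

Here the loop asks three times and stops at the first state that is not RUNNING.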

  38. 10. Looping. Save the workflow and run it again. This time, the workflow will run until the ‘CheckStatus’ service reports that it is either COMPLETE or has an ERROR. You will see results for ‘getResults’, but you will still get an error for ‘getResults2’. This is because there is one more configuration to change: we also need ‘Control Links’.

  39. 11. Control Links. A control link specifies that one service depends on another even though no data flows between them. A control link is drawn as a line with a white circle at the end that connects two services (see the link between ‘CheckStatus’ and ‘getResults’).

  40. 11. Control Links. We will add a control link to getResults2. Right-click on getResults2 and select ‘Run after’ from the drop-down menu. Set it to ‘Run after’ -> ‘CheckStatus’. Save and run the workflow. Now you will see both results returned.
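A ‘Run after’ setting is an ordering constraint with no data attached, so it can be sketched as a dependency graph that is sorted before anything runs. The example below uses Python's standard `graphlib` module (Python 3.9 or later) to show the idea. It is an illustration of run-after ordering in general, not a description of how Taverna schedules services.

```python
from graphlib import TopologicalSorter

# Each key must run after every service named in its set.
run_after = {
    "getResults": {"CheckStatus"},   # the control link already in the workflow
    "getResults2": {"CheckStatus"},  # the control link added in this exercise
}

# static_order() yields each service only after everything it must run after.
order = list(TopologicalSorter(run_after).static_order())
```

In the resulting order, ‘CheckStatus’ always comes before both ‘getResults’ and ‘getResults2’, while the relative order of those two is left open because nothing links them.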
