Creating and Sharing Re-usable Workflows in Cardiovascular Research: Lessons learned using Taverna. Ravi Madduri, University of Chicago and Argonne National Laboratory
About me • Research Fellow at the Computation Institute, University of Chicago • Lead architect for Workflow technologies in the caBIG project • Workflow Working Group Chair and a key person in the BIRN project • Interested in Informatics, Applications of High throughput data transfer, computing in Biomedical informatics
Agenda • Introduction to Service Oriented Science (SoS) • Introduction to caBIG as an example of SoS • Introduce caGrid as an enabler of the SoS vision • Introduce Workflow concepts • Talk about our implementation using Taverna • Show a few Taverna workflows including the AutoQRS workflow from CVRG • Lessons learned and future directions.
Service-Oriented Science • People create services (data, code, instr.) … which I discover (& decide whether to use) … & compose to create a new function … & then publish as a new service. • I find “someone else” to host services, so I don’t have to become an expert in operating services & computers! • I hope that this “someone else” can manage security, reliability, scalability, …! • “Service-Oriented Science”, Science, 2005
caBIG Goal and Vision caBIG is a virtual web of interconnected data, individuals and organizations that redefines how research is conducted, care is provided, and patients/participants interact with the biomedical enterprise. • Connect the cancer research community through a shareable, interoperable infrastructure • Deploy and extend standard rules and a common language to more easily share information • Build or adapt tools for collecting, analyzing, integrating and disseminating information associated with cancer research and care
caBIG function dimensions • Clinical Data and Trials Management • Biospecimen Management • In Vivo Imaging • Molecular Characterization • caGrid
What is caGrid? • Biomedical applications that share data all have common needs for syntactic and semantic interoperability • caGrid is a software toolkit aimed at software developers creating Grid applications
caGrid provides • Metadata services that add semantic information to all Grid services • The GAARDS toolkit, a standard security platform • Introduce: the ‘Eclipse’ for services development • Index Service: A service registry for advertisement and discovery of capabilities
A scientific workflow • precisely defines a multi-step procedure, to seamlessly integrate and streamline local and remote heterogeneous computational and data resources to perform in silico scientific exploration.
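The definition above can be illustrated with a minimal sketch of a multi-step procedure expressed as a data-flow graph. The step names (`fetch`, `normalize`, `summarize`) and their bodies are hypothetical placeholders, not services from the talk, and the runner is not how Taverna itself executes workflows; it only shows the idea that each step runs once its upstream steps have produced results.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step after all of the steps it depends on.

    steps: step name -> function taking a dict of upstream results
    dependencies: step name -> set of upstream step names
    """
    results = {}
    # static_order() yields every step after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](upstream)
    return results


# Hypothetical three-step procedure: fetch -> normalize -> summarize.
steps = {
    "fetch": lambda up: [3.0, 1.0, 2.0],
    "normalize": lambda up: [v / max(up["fetch"]) for v in up["fetch"]],
    "summarize": lambda up: sum(up["normalize"]) / len(up["normalize"]),
}
dependencies = {
    "fetch": set(),
    "normalize": {"fetch"},
    "summarize": {"normalize"},
}

results = run_workflow(steps, dependencies)
```

In a real scientific workflow each step would typically be a call to a local tool or a remote service rather than an in-process function.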
Workflow Requirements • Service discovery • Data access • Service interaction • Security enforcement • Knowledge sharing
Overview of caGrid Workflow • Workflow as consumer • Easily reuse services for complex experiments. • Workflow as contributor • Workflow as “best practice” wrapped as services. • Workflow providing RoI for SOA [Diagram labels: Composition, Discovery, Orchestration, Connectivity, Analysis, Virtualization, Security; community, data, instruments, computation resource, caGrid; reuse, generate]
caGrid Workflow Suite • Service discovery • Data access • Service interaction • Security enforcement • Knowledge sharing
The caBIG Workflow System • Data-flow modeling flavor • caGrid activity • State management (WSRF) • Security (GSI) • Service discovery based on cancer research metadata. • Implicit iteration: handle parallel execution • WSRF and GSI enforcement • Workflow Execution Service • Workflows in caGrid Portal • A “Facebook” for caGrid workflows [Diagram labels: Composition, Discovery, Execution, Reuse; community, reuse, generate, caGrid]
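The "implicit iteration" item can be sketched as follows: when a step that expects a single value receives a list, the step is invoked once per element and the results are collected in order. This is a simplified illustration under stated assumptions, not Taverna's implementation; in particular it ignores how multiple list-valued inputs are combined. The function `gene_length` is a hypothetical stand-in for a remote service call.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke_with_implicit_iteration(service, value, max_workers=4):
    """Call `service` once per element if `value` is a list, else once."""
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in input order, even though the
            # individual calls may run concurrently.
            return list(pool.map(service, value))
    return service(value)


def gene_length(sequence):
    """Hypothetical stand-in for a service that expects one sequence."""
    return len(sequence)


single = invoke_with_implicit_iteration(gene_length, "ACGT")
many = invoke_with_implicit_iteration(gene_length, ["A", "ACG", "ACGTAC"])
```

Because each element is processed independently, the per-element calls are natural candidates for parallel execution, which is the point the slide makes.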
Semantic Service Discovery • Semantic search – searches Index Service for registered caGrid services matching various search criteria: • Service name, inputs, outputs, research center, class names, concept codes, etc.
Semantic Service Discovery Service metadata • Types of query • String based. • Property based. • Semantic based.
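The three query types can be contrasted with a small sketch over an in-memory list of service metadata. Everything here is an assumption made for illustration: the records, the field names, and the concept codes are invented, and the functions do not call the caGrid Index Service or any caGrid API.

```python
# Hypothetical service metadata records (not real caGrid registrations).
SERVICES = [
    {"name": "ArrayDataService", "research_center": "Center A",
     "concept_codes": {"C0001"}},
    {"name": "PredictionService", "research_center": "Center B",
     "concept_codes": {"C0001", "C0002"}},
    {"name": "ImagingService", "research_center": "Center A",
     "concept_codes": {"C0003"}},
]


def string_search(services, text):
    """String based: case-insensitive substring match on text fields."""
    text = text.lower()
    return [s["name"] for s in services
            if text in s["name"].lower()
            or text in s["research_center"].lower()]


def property_search(services, prop, value):
    """Property based: exact match on one named metadata property."""
    return [s["name"] for s in services if s.get(prop) == value]


def semantic_search(services, concept_code):
    """Semantic based: match services annotated with a concept code."""
    return [s["name"] for s in services
            if concept_code in s["concept_codes"]]


by_string = string_search(SERVICES, "data")
by_property = property_search(SERVICES, "research_center", "Center A")
by_concept = semantic_search(SERVICES, "C0001")
```

The semantic query finds `PredictionService` even though its name shares no text with the search term, which is the practical difference from a string query.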
caBIG services palette • As a result of semantic search or direct adding • caBIG services appear in Taverna’s Service Panel • Ready to be dragged and dropped into caGrid workflows
Security enforcement • Authentication • Ability to invoke services secured by Grid Security Infrastructure (GSI) • Integrated caGrid Security framework (GAARDS) with Taverna’s Credential manager • Transport Level Security • Authorization • This is done on the service side upon looking at User’s credentials • Credential Delegation Service Integration
Secure Grid services • Taverna can invoke secure Grid services that require user to log in to caGrid • Taverna interacts with caGrid’s GAARDS infrastructure to obtain user’s proxy: • Authenticate the user with user’s affiliated Authentication Service • Obtain user’s proxy from Dorian Service • Default proxy lifetime: 12 hours
Using secure caGrid services • Involves: • Discovering a secure caGrid service from Taverna • Logging onto selected caGrid to obtain a proxy certificate • Saving and managing caGrid proxies and username and passwords
Configuring secure services (1/2) • Authentication Service and Dorian Service URLs are required in order to obtain the user’s proxy • Can be configured globally for all services from the same caGrid (in preferences) • Can be configured individually for a particular caGrid service (overrides configuration from preferences)
Configuring secure services (2/2) • View a secure service’s details • Configure the service’s security properties
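The override rule described above (a per-service setting wins over the global preference) can be sketched as a small resolution function. The class, field names, and URLs are hypothetical and chosen only for illustration; they are not Taverna's or caGrid's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecurityConfig:
    """Hypothetical holder for the two URLs named on the slide."""
    authentication_service_url: Optional[str] = None
    dorian_service_url: Optional[str] = None


def effective_config(global_prefs, service_override=None):
    """Return per-service values where set, else the global preferences."""
    if service_override is None:
        return global_prefs
    return SecurityConfig(
        authentication_service_url=(
            service_override.authentication_service_url
            or global_prefs.authentication_service_url),
        dorian_service_url=(
            service_override.dorian_service_url
            or global_prefs.dorian_service_url),
    )


# Placeholder URLs; real deployments would use their own endpoints.
prefs = SecurityConfig("https://auth.example.org", "https://dorian.example.org")
override = SecurityConfig(dorian_service_url="https://dorian2.example.org")

resolved = effective_config(prefs, override)
```

Here only the Dorian URL is overridden for the individual service, so the Authentication Service URL still comes from the global preferences.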
Logging onto caGrid • User is prompted for his caGrid username and password when any secure service is invoked from a workflow for the first time
Credential management • Taverna obtains proxy for user from Dorian Service using user’s caGrid username and password • Proxies are saved and managed by Credential Manager • caGrid username and password can also be remembered
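A minimal sketch of the caching behavior this implies: obtain a proxy the first time it is needed, reuse it while it is still valid, and fetch a new one after it expires. The 12-hour lifetime is the default stated earlier in the deck; the `CredentialCache` class, the `fetch_proxy` callback, and the string proxies are assumptions for illustration and do not reflect the API of Taverna's Credential Manager or of Dorian.

```python
from datetime import datetime, timedelta, timezone

PROXY_LIFETIME = timedelta(hours=12)  # default lifetime quoted in the slides


class CredentialCache:
    """Hypothetical cache: one proxy per grid, refreshed after expiry."""

    def __init__(self, fetch_proxy, clock=None):
        self._fetch_proxy = fetch_proxy
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._proxies = {}  # grid name -> (proxy, expiry time)

    def get_proxy(self, grid, username, password):
        now = self._clock()
        cached = self._proxies.get(grid)
        if cached is not None and cached[1] > now:
            return cached[0]  # still valid, no new login needed
        proxy = self._fetch_proxy(grid, username, password)
        self._proxies[grid] = (proxy, now + PROXY_LIFETIME)
        return proxy


# Usage with a fake fetcher and a controllable clock.
calls = []


def fake_fetch(grid, username, password):
    calls.append(grid)
    return f"proxy-{len(calls)}"


current = [datetime(2010, 1, 1, tzinfo=timezone.utc)]
cache = CredentialCache(fake_fetch, clock=lambda: current[0])

first = cache.get_proxy("training", "alice", "secret")
second = cache.get_proxy("training", "alice", "secret")  # reused
current[0] += timedelta(hours=13)                        # past expiry
third = cache.get_proxy("training", "alice", "secret")   # fetched again
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.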
Workflow execution service • Taverna Workflow Service wraps the Taverna execution engine into a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus, and getOutput for user submitted workflows. [Diagram labels: Workflow Portlet, Taverna Workbench, Client API; Workflow Service, Taverna Engine; EPR, Stateful Resources (Resource Properties); Data Services, Analytical Services, caGrid & Other Services]
Workflow execution service • Taverna Workflow Service • Provides stateful resources that execute the workflows. • Supports caGrid security architecture (GSI Security). • Allows programmatic submission of workflows.
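The programmatic submission pattern can be sketched as create, start, poll, fetch. The four operation names come from the slides; everything else is an assumption: the in-memory class below is a stand-in rather than the real service or its client API, the status strings are invented, and the stand-in runs the workflow synchronously where a real service would run it asynchronously.

```python
import itertools
import time


class InMemoryWorkflowService:
    """Stand-in exposing the four operations named in the slides."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._resources = {}

    def createResource(self, workflow):
        epr = f"resource-{next(self._ids)}"  # placeholder for a real EPR
        self._resources[epr] = {"workflow": workflow,
                                "status": "Pending", "output": None}
        return epr

    def startWorkflow(self, epr, inputs):
        resource = self._resources[epr]
        # Runs immediately here; a real service would return at once
        # and execute the workflow in the background.
        resource["output"] = resource["workflow"](inputs)
        resource["status"] = "Done"

    def getStatus(self, epr):
        return self._resources[epr]["status"]

    def getOutput(self, epr):
        return self._resources[epr]["output"]


def run_and_wait(service, workflow, inputs, poll_interval=0.0, max_polls=10):
    """Submit a workflow, poll until it reports Done, return its output."""
    epr = service.createResource(workflow)
    service.startWorkflow(epr, inputs)
    for _ in range(max_polls):
        if service.getStatus(epr) == "Done":
            return service.getOutput(epr)
        time.sleep(poll_interval)
    raise TimeoutError(f"workflow {epr} did not finish")


output = run_and_wait(
    InMemoryWorkflowService(),
    lambda inputs: {"echo": inputs["experiment_id"]},  # hypothetical workflow
    {"experiment_id": "EXP-1"},
)
```

Keeping the state on a per-submission resource is what lets a client disconnect and later ask for status or output using only the resource reference.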
Access Taverna workflow via caGrid portal Taverna Workflow Portlet is deployed in the caGrid Portal on the training Grid: URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow View : 1 • The Portlet currently lists a few workflows with their descriptions that can be browsed from the above URL • Users can select a workflow they are interested in running.
Access Taverna workflow via caGrid portal URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow View : 2 • Based on the number of input ports in the workflow, the portlet prompts the user to enter the input values in a textbox. • For example, the Lymphoma workflow takes only one input, in the form of an Experiment ID that identifies the experiment that caArray uses for data collection. • Hit submit after entering the data.
Access Taverna workflow via caGrid portal URL : http://portal-demo.training.cagrid.org/web/guest/tools/taverna-workflow Views : 3, 4, & 5 • The portlet stores the user submitted workflows in the current session of the portal. • Users can view all the Active and Completed Workflows in the session. • Clicking the Output button shows the output of the workflow. • The portlet provides workflow specific view-resolvers to render the outputs. For example, the Lymphoma workflow currently displays the output in an HTML table.
Knowledge Sharing • Search ‘cabig’ in myExperiment or • Type http://www.myexperiment.org/search?type=workflows&query=cabig • Type http://tinyurl.com/cabig-workflow
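The search link above is an ordinary query-string URL, so it can be built for other search terms. The sketch below only constructs the URL from the pattern shown on the slide; it makes no network request, and the helper name `workflow_search_url` is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://www.myexperiment.org/search"


def workflow_search_url(query):
    """Build a workflow search URL following the pattern on the slide."""
    return f"{SEARCH_BASE}?{urlencode({'type': 'workflows', 'query': query})}"


url = workflow_search_url("cabig")
```

`urlencode` also takes care of escaping, so a multi-word query is encoded safely instead of being pasted into the URL by hand.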
Lymphoma Prediction Workflow • Microarray from tumor tissue • Microarray preprocessing • Lymphoma prediction
Lymphoma type prediction • Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)
AutoQRS Analysis Workflow [Diagram labels: WFDB binary and Patient ID; Store WFDB; WFDB data service; Retrieve WFDB Patient Record; Analysis Execution Record; JSDL service; AutoQRS Analytical Service; Invoke Processing; AutoQRS Output Data Service; AutoQRS XML Results]
Accomplishments • Lymphoma workflow – Among the top 20 most viewed/downloaded Workflows in myExperiment • This is more impressive given that this workflow was uploaded much later than the other workflows • Our BMC-Bioinformatics Article on “caGrid Workflow Toolkit: A Taverna based workflow tool for cancer Grid” achieved “Highly Accessed” relative to its age • We are part of the CVRG Project that recently got renewed
Lessons Learned • Lower the barriers to entry for sharing data and analytics • Software is surprisingly hard to use for end users – more so if the benefit is not all too clear • Return on Investment of a SOA is in creating reusable workflows (LEGO blocks) • Workflows are only as good as the services we create • Traditional SDLC does not always work in the favor of the end users • 80-20 and KISS
Goals of Workflow Project in CVRG • Deploy existing technology on the CVRG that can be used to store and execute workflows generated locally using the Taverna workbench • Develop new technology that allows non-expert users to graphically compose and execute workflows via a web interface. • Extend the Taverna Engine and add support for invoking REST-style services so that users can annotate workflow inputs and outputs using ontology terms from NCBO Bioportal and other ontology repositories • Develop specifications describing how workflows should be designed, validated, and documented, and support user development of workflows. • Extend the technology so that workflows can be executed in a cloud-computing environment
Suggested Direction • Hosted Workflow Solution: SaaS workflow tools • Globus Online • Galaxy
Acknowledgements • Univ. Chicago / ANL: Ian Foster, Dinanath Sulakhe, Bo Liu • Univ. Manchester, UK: Carole Goble, Stian Soiland-Reyes, Alexandra Nenadic • Inventrio: Shannon Hastings, Stephen Langella, Scott Oster • Other colleagues from Ohio State University, National Cancer Institute, JHU …
Journal papers & book chapters • Composition as a Service. IEEE Internet Computing. 2010 • A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid. CCPE. 2010. • Data-driven Service Composition in Building SOA Solutions: A Petri Net Approach. IEEE T-ASE, 2010 • Scientific workflows that enable Web-scale collaboration: combining the power of Taverna and caGrid. IEEE Internet Computing. 2008 • Workflow in a Service Oriented Cyberinfrastructure Environment. in: Junwei Cao (Ed.). Cyberinfrastructure Technologies and Applications. Nova Science Publishers, 2008. (book chapter)
Conference papers • Scientific workflows as services in caGrid: a Taverna and gRAVI approach. ICWS 2009 • Wrap Scientific Applications as WSRF Grid Services using gRAVI. ICWS 2009 • Orchestrating caGrid Services in Taverna. ICWS 2008 • Building Scientific Workflow with Taverna and BPEL: a Comparative Study in caGrid. WESOA 2008 • Build Grid Enabled Scientific Workflows using gRAVI and Taverna. SWBES 2008
Contact information • Ravi Madduri • madduri@mcs.anl.gov • Computation Institute, Univ. Chicago • http://www.ci.uchicago.edu/