Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example
Nandita Mangal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi
USC Information Sciences Institute, Marina del Rey, CA
Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu
Motivation
• Many workflow systems exist today
• The choice of a particular system is often dictated by who you know
• Different workflow systems have different capabilities
• Application components versus services
• Visual vs. scripting workflow descriptions
• Performance optimization, etc.
• Can you combine two separate systems?
• What are the issues?
Kepler (UCSD and UC Davis)
• Scientific workflow management system based on Ptolemy II
• Allows scientists to visually design and execute scientific workflows
• Actor-oriented model with directors acting as the main workflow engine
• Enables different models of computation
Pegasus (USC/ISI)
• Based on programming language principles
• Leverages abstraction for workflow description to obtain ease of use, scalability, and portability
• Provides a compiler to map from high-level descriptions to executable workflows
• Correct mapping
• Performance-enhanced mapping
• Relies on a runtime engine to carry out the instructions
• Scalable manner
• Reliable manner
Combining Kepler & Pegasus
• Integration of the Kepler visual programming environment with the grid mapping abilities of Pegasus
• Giving Kepler users the ability to map their large workflows onto the grid
• Giving Pegasus users a visual workflow composition tool
• Differences in the level of abstraction of the workflow description
Kepler Provenance Challenge Workflow
Concrete Workflow Generation and Mapping
Implementation Strategy
• Develop Pegasus-specific entities
• Abstract jobs
• Directors and actors
• “Pegasus Director” and “Pegasus Jobs (Actor Entities)”
• Act as the main grid components to execute a given grid computation
• Focus mainly on abstract jobs in the Kepler environment
• Portable workflow descriptions that require no knowledge of resources
Integration
Pegasus Actor & Director Entities
Visual Abstract Workflow Creation
Users can create visual models of abstract workflows and specify logical transformations without specifying grid resources.
Job Abstract Configuration: Integration with the Transformation Catalog
Resultant Abstract Job on Kepler Canvas
A Pegasus abstract job can take multiple input files and produce multiple output files. Grid resource information is not expected in such an actor.
Support for Concrete Jobs: useful for monitoring and debugging
A concrete job requires specific grid resource information from the scientist. It allows the scientist to execute jobs directly on the grid.
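The difference between abstract and concrete jobs can be illustrated with a small Python sketch. This is not the Kepler actor code: the class names and the `site` and `executable` fields are hypothetical, chosen only to show that a concrete job carries resource details that an abstract job omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AbstractJob:
    """Logical description of a job: no grid resource information."""
    name: str                                          # logical transformation name
    inputs: List[str] = field(default_factory=list)    # logical input files
    outputs: List[str] = field(default_factory=list)   # logical output files


@dataclass
class ConcreteJob(AbstractJob):
    """The same job bound to a resource (field names are hypothetical)."""
    site: Optional[str] = None         # hypothetical: handle of the execution site
    executable: Optional[str] = None   # hypothetical: physical path on that site

    def is_runnable(self) -> bool:
        # A concrete job is usable only once the resource details are filled in.
        return bool(self.site and self.executable)


# Job1 from the sample DAX, first as an abstract job, then bound to a made-up site.
abstract = AbstractJob("Job1", inputs=["FileA.png"],
                       outputs=["FileB.txt", "FileC.txt"])
concrete = ConcreteJob(abstract.name, abstract.inputs, abstract.outputs,
                       site="example_site", executable="/path/to/job1")
```

Keeping the resource fields out of `AbstractJob` mirrors the point of the slides: the abstract description stays portable, and only the concrete form depends on a particular grid site.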
Pegasus Director / DAX Generator
• Controls the execution of all the job (actor) entities and creates a resulting directed acyclic graph in XML format
• Generates a DAX
• Gives it to DAGMan for execution
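As a rough illustration of what a DAX generator has to do, the Python sketch below derives parent/child dependencies from the files each job reads and writes, then emits a simplified `<adag>` document. It is a sketch under assumptions, not the Pegasus Director's code: the function names are hypothetical, the job list is copied from the sample DAX in this deck, and the XML namespace and `<argument>` elements are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# (job name, input files, output files), copied from the sample DAX in this deck.
JOBS = [
    ("Job1", ["FileA.png"], ["FileB.txt", "FileC.txt"]),
    ("Job2", ["FileB.txt"], ["FileE.png"]),
    ("Job3", ["FileC.txt"], ["FileD.xml"]),
    ("Job4", ["FileE.png", "FileD.xml"], ["FileF.png"]),
]


def derive_edges(jobs):
    """Return sorted (parent, child) pairs: the child reads a file the parent writes."""
    producer = {f: name for name, _, outs in jobs for f in outs}
    edges = set()
    for name, ins, _ in jobs:
        for f in ins:
            if f in producer and producer[f] != name:
                edges.add((producer[f], name))
    return sorted(edges)


def build_dax(jobs, workflow_name="example"):
    """Build a simplified <adag> element with <job> and <child>/<parent> entries."""
    adag = ET.Element("adag", {"name": workflow_name, "version": "1.10"})
    for name, ins, outs in jobs:
        job = ET.SubElement(adag, "job", {
            "id": f"{name}_id", "namespace": "keplerdax",
            "name": name, "version": "1",
        })
        for f in ins:
            ET.SubElement(job, "uses", {"file": f, "link": "input"})
        for f in outs:
            ET.SubElement(job, "uses", {"file": f, "link": "output"})
    # One <child> element per dependency, as in the sample DAX.
    for parent, child in derive_edges(jobs):
        c = ET.SubElement(adag, "child", {"ref": f"{child}_id"})
        ET.SubElement(c, "parent", {"ref": f"{parent}_id"})
    return adag


edges = derive_edges(JOBS)
dax_xml = ET.tostring(build_dax(JOBS), encoding="unicode")
```

For the four sample jobs, the derived edges match the control-flow section of the sample DAX: Job1 feeds Job2 and Job3, and both feed Job4.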
Sample DAX Generated:
<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2006-12-03T19:27:27-08:00 -->
<!-- generated by: Nandita [??] -->
<adag xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.10.xsd" xmlns="http://www.griphyn.org/chimera/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.10" count="1" index="0" name="WorkflowTemplate_1_1156808715390">
  <!-- part 1: list of all referenced files (may be empty) -->
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="Job1_id" namespace="keplerdax" name="Job1" version="1">
    <argument><filename file="FileA.png"/> <filename file="FileB.txt"/></argument>
    <uses file="FileA.png" link="input"/>
    <uses file="FileB.txt" link="output"/>
    <uses file="FileC.txt" link="output"/>
  </job>
  <job id="Job2_id" namespace="keplerdax" name="Job2" version="1">
    <argument><filename file="FileB.txt"/> <filename file="FileE.png"/></argument>
    <uses file="FileB.txt" link="input"/>
    <uses file="FileE.png" link="output"/>
  </job>
  <job id="Job3_id" namespace="keplerdax" name="Job3" version="1">
    <argument><filename file="FileC.txt"/><filename file="FileD.xml"/></argument>
    <uses file="FileC.txt" link="input"/>
    <uses file="FileD.xml" link="output"/>
  </job>
  <job id="Job4_id" namespace="keplerdax" name="Job4" version="1">
    <argument><filename file="FileE.png"/> <filename file="FileF.png"/></argument>
    <uses file="FileE.png" link="input"/>
    <uses file="FileD.xml" link="input"/>
    <uses file="FileF.png" link="output"/>
  </job>
  <!-- part 3: list of control-flow dependencies (may be empty) -->
  <child ref="Job2_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job3_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job2_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job3_id"/>
  </child>
</adag>
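To show how the control-flow section of such a DAX can be read back, here is a minimal Python sketch that parses the `<child>`/`<parent>` elements and computes one valid execution order. It assumes Python 3.9+ for `graphlib`; the `read_dependencies` helper is hypothetical, and the embedded DAX is a trimmed copy of the sample (jobs and dependencies only), not a complete document.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The sample DAX declares this default namespace, so lookups must be qualified.
NS = {"dax": "http://www.griphyn.org/chimera/DAX"}

# Trimmed copy of the sample: only job ids and the dependency section.
DAX = """<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.10">
  <job id="Job1_id" name="Job1"/>
  <job id="Job2_id" name="Job2"/>
  <job id="Job3_id" name="Job3"/>
  <job id="Job4_id" name="Job4"/>
  <child ref="Job2_id"><parent ref="Job1_id"/></child>
  <child ref="Job3_id"><parent ref="Job1_id"/></child>
  <child ref="Job4_id"><parent ref="Job2_id"/></child>
  <child ref="Job4_id"><parent ref="Job3_id"/></child>
</adag>"""


def read_dependencies(dax_text):
    """Map each job id to the set of parent job ids it must wait for."""
    root = ET.fromstring(dax_text)
    deps = {job.get("id"): set() for job in root.findall("dax:job", NS)}
    for child in root.findall("dax:child", NS):
        for parent in child.findall("dax:parent", NS):
            deps.setdefault(child.get("ref"), set()).add(parent.get("ref"))
    return deps


deps = read_dependencies(DAX)
# static_order() yields every job after all of its parents.
order = list(TopologicalSorter(deps).static_order())
```

Because Job4 appears in two separate `<child>` elements, the parser accumulates parents per job rather than overwriting them. Job2 and Job3 are independent of each other, so their relative position in `order` is not significant.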
Provenance Challenge Workflow in Kepler/Pegasus
In Kepler, each node needs a unique name, so the Transformation Catalog (TC) needs many duplicate entries.
Integration Benefit for Pegasus Users
• Visualizing/debugging existing models:
• Support a scientist trying to redo, visualize, or easily reconfigure an existing DAX
• Provide an option to upload existing DAX files into the workspace
• Convert the specified DAX file into MoML (Kepler’s format) by passing it through an XSLT processor and generating the required directors and actors on the canvas
• Issues of scalability (only small workflows can be visualized)
• Scoping may need to be applied
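The DAX-to-MoML conversion can be sketched in Python as a rough stand-in for the XSLT step. Several things here are assumptions rather than the actual implementation: the conversion uses `xml.etree.ElementTree` instead of XSLT, the actor class name `example.PegasusJobActor` and the port names `input` and `output` are hypothetical, and the MoML DOCTYPE and director entity are left out. Only the general MoML shape (nested `<entity>`, `<relation>`, `<link>`) follows Ptolemy II conventions.

```python
import xml.etree.ElementTree as ET

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}
JOB_ACTOR_CLASS = "example.PegasusJobActor"  # hypothetical actor class name

DAX = """<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.10">
  <job id="Job1_id" name="Job1"/>
  <job id="Job2_id" name="Job2"/>
  <child ref="Job2_id"><parent ref="Job1_id"/></child>
</adag>"""


def dax_to_moml(dax_text, name="ImportedWorkflow"):
    """Turn DAX jobs into MoML entities and DAX dependencies into relations."""
    root = ET.fromstring(dax_text)
    top = ET.Element("entity", {
        "name": name, "class": "ptolemy.actor.TypedCompositeActor"})
    names = {}
    # One actor entity per DAX job.
    for job in root.findall("dax:job", NS):
        names[job.get("id")] = job.get("name")
        ET.SubElement(top, "entity", {
            "name": job.get("name"), "class": JOB_ACTOR_CLASS})
    # One relation plus two links (parent output, child input) per dependency.
    count = 0
    for child in root.findall("dax:child", NS):
        for parent in child.findall("dax:parent", NS):
            count += 1
            rel = f"relation{count}"
            ET.SubElement(top, "relation", {
                "name": rel, "class": "ptolemy.actor.TypedIORelation"})
            ET.SubElement(top, "link", {
                "port": f"{names[parent.get('ref')]}.output", "relation": rel})
            ET.SubElement(top, "link", {
                "port": f"{names[child.get('ref')]}.input", "relation": rel})
    return top


moml = dax_to_moml(DAX)
moml_text = ET.tostring(moml, encoding="unicode")
```

Each dependency becomes its own relation, so the number of canvas elements grows with the edge count. That is consistent with the scalability concern above: large workflows quickly become too dense to visualize.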
Integration Issues
• Kepler acts as a visual programming environment
• Actors represent single units of computation with data flow among each other
• Some configuration is not intuitive (TC entries)
• Kepler has no concept of representing files separately
• Each job has multiport I/O ports
• The user can connect any number of files going into and coming out of a port
• Potential use of the integrated environment for debugging
• Not done
• Integration with Pegasus data registry
• No monitoring of execution in Kepler
• Use of Kepler’s workflow execution engine
• Support for Kepler actors in Pegasus
Relevant Links
• Kepler: http://kepler-project.org
• Pegasus: http://pegasus.isi.edu
• DAGMan: www.cs.wisc.edu/condor/dagman/
• Provenance challenge: http://twiki.ipaw.info/bin/view/Challenge/
• Workshop on Tuesday
• NSF workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06