
Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example


Presentation Transcript


  1. Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example
      Nandita Mangal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi
      USC Information Sciences Institute, Marina del Rey, CA
      Ewa Deelman, deelman@isi.edu, www.isi.edu/~deelman, pegasus.isi.edu

  2. Motivation
      • Many workflow systems exist today
      • The choice of a particular system is often dictated by who you know
      • Different workflow systems have different capabilities:
         • Application components versus services
         • Visual vs. scripting workflow descriptions
         • Performance optimization, etc.
      • Can two separate systems be combined?
         • What are the issues?

  3. Kepler (UCSD and UC Davis)
      • Scientific workflow management system based on Ptolemy II
      • Allows scientists to visually design and execute scientific workflows
      • Actor-oriented model, with directors acting as the main workflow engine
      • Enables different models of computation
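The actor/director split can be illustrated with a toy model. The sketch below is not the Ptolemy II or Kepler API: the classes `Actor` and `SequentialDirector` are hypothetical, and this director simply fires each actor once in data-dependency order. It only illustrates the idea that actors compute while the director decides when they run, so swapping in a different director changes the model of computation without touching the actors.

```python
from graphlib import TopologicalSorter


class Actor:
    """A unit of computation that consumes named input tokens and produces named output tokens."""

    def __init__(self, name, func, inputs=(), outputs=()):
        self.name = name
        self.func = func              # dict of input tokens -> dict of output tokens
        self.inputs = list(inputs)
        self.outputs = list(outputs)

    def fire(self, tokens):
        return self.func({port: tokens[port] for port in self.inputs})


class SequentialDirector:
    """Hypothetical director: fires each actor once, in data-dependency order."""

    def run(self, actors):
        producer = {out: a.name for a in actors for out in a.outputs}
        graph = {a.name: {producer[p] for p in a.inputs if p in producer} for a in actors}
        by_name = {a.name: a for a in actors}
        tokens = {}
        for name in TopologicalSorter(graph).static_order():
            tokens.update(by_name[name].fire(tokens))
        return tokens


source = Actor("Source", lambda _: {"x": 3}, outputs=["x"])
square = Actor("Square", lambda t: {"y": t["x"] ** 2}, inputs=["x"], outputs=["y"])
print(SequentialDirector().run([square, source]))  # {'x': 3, 'y': 9}
```

The actors are passed in the "wrong" order on purpose: the director, not the list order, determines that `Source` fires before `Square`.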

  4. Pegasus (USC/ISI)
      • Based on programming-language principles
      • Leverages abstraction in the workflow description to obtain ease of use, scalability, and portability
      • Provides a compiler to map from high-level descriptions to executable workflows
         • Correct mapping
         • Performance-enhanced mapping
      • Relies on a runtime engine to carry out the instructions
         • In a scalable manner
         • In a reliable manner
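To make the compiler analogy concrete, here is a minimal Python sketch of one mapping step under stated assumptions: each abstract job names only a logical transformation, a hypothetical transformation catalog maps that name to a physical path per site, and the planner picks the first preferred site where it is installed. The names `plan`, `AbstractJob`, `ConcreteJob`, the sites, and the paths are all hypothetical. The real Pegasus planner does considerably more (for example, adding data staging); this shows only the correctness side, where a job is never mapped to a site that cannot run it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractJob:
    job_id: str
    transformation: str   # logical name only; no host or path


@dataclass(frozen=True)
class ConcreteJob:
    job_id: str
    site: str
    executable: str       # physical path on the chosen site


def plan(jobs, transformation_catalog, preferred_sites):
    """Map each abstract job to the first preferred site where its transformation is installed.

    Raises ValueError when no site can run a job, so an incorrect mapping is never emitted.
    """
    concrete = []
    for job in jobs:
        installs = transformation_catalog.get(job.transformation, {})
        site = next((s for s in preferred_sites if s in installs), None)
        if site is None:
            raise ValueError(f"no site provides {job.transformation!r} for {job.job_id}")
        concrete.append(ConcreteJob(job.job_id, site, installs[site]))
    return concrete


# Hypothetical catalog: logical transformation -> {site: physical path}
tc = {
    "align": {"siteB": "/opt/bin/align"},
    "convert": {"siteA": "/usr/bin/convert", "siteB": "/bin/convert"},
}
for cj in plan([AbstractJob("j1", "align"), AbstractJob("j2", "convert")], tc, ["siteA", "siteB"]):
    print(cj)
```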

  5. Combining Kepler & Pegasus
      • Integration of Kepler's visual programming environment with the grid mapping abilities of Pegasus
         • Giving Kepler users the ability to map their large workflows onto the grid
         • Giving Pegasus users a visual workflow composition tool
      • Issue: the two systems describe workflows at different levels of abstraction

  6. Kepler Provenance Challenge Workflow

  7. Concrete Workflow Generation and Mapping

  8. Implementation Strategy
      • Develop Pegasus-specific entities
         • Abstract jobs
         • Directors and actors
      • A “Pegasus Director” and “Pegasus Jobs” (actor entities) act as the main grid components that execute a given grid computation
      • Focus mainly on abstract jobs in the Kepler environment
         • Portable, resource-independent workflow descriptions

  9. Integration

  10. Pegasus Actor & Director Entities

  11. Visual Abstract Workflow Creation
      Users can create visual models of abstract workflows and specify logical transformations without specifying grid resources.

  12. Abstract Job Configuration: Integration with the Transformation Catalog

  13. Resulting Abstract Job on the Kepler Canvas
      A Pegasus abstract job can take in multiple input files and can produce multiple output files. Grid resource information is not expected in such an actor.
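Because an abstract job carries only logical file names, the parent/child structure of a workflow can in principle be recovered from which job produces and which job consumes each file. The sketch below assumes that reading; the slides do not say how the Pegasus Director actually derives edges (on the Kepler canvas they come from port connections). The `AbstractJobSpec` class and `infer_dependencies` function are hypothetical; the job and file names are taken from the sample DAX in slides 16 and 17.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractJobSpec:
    name: str
    inputs: list = field(default_factory=list)   # logical file names only
    outputs: list = field(default_factory=list)  # no hosts, paths, or sites


def infer_dependencies(jobs):
    """Return sorted (parent, child) pairs where the child consumes a file the parent produces."""
    producer = {}
    for job in jobs:
        for f in job.outputs:
            producer[f] = job.name
    return sorted({(producer[f], job.name) for job in jobs for f in job.inputs if f in producer})


jobs = [
    AbstractJobSpec("Job1", ["FileA.png"], ["FileB.txt", "FileC.txt"]),
    AbstractJobSpec("Job2", ["FileB.txt"], ["FileE.png"]),
    AbstractJobSpec("Job3", ["FileC.txt"], ["FileD.xml"]),
    AbstractJobSpec("Job4", ["FileE.png", "FileD.xml"], ["FileF.png"]),
]
print(infer_dependencies(jobs))
# [('Job1', 'Job2'), ('Job1', 'Job3'), ('Job2', 'Job4'), ('Job3', 'Job4')]
```

The inferred pairs match the four control-flow dependencies listed in part 3 of that sample DAX.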

  14. Support for Concrete Jobs (useful for monitoring and debugging)
      A concrete job requires specific grid resource information from the scientist. It allows the scientist to execute jobs directly on the grid.
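A concrete job, by contrast, must say where it runs. As an illustration only, and assuming Condor-G as the submission mechanism (the slides name DAGMan, which runs on Condor), the following sketch renders one concrete job into a Condor-G submit description for a Globus GT2 gatekeeper. The function name, the gatekeeper contact, and the paths are hypothetical; `universe`, `grid_resource`, `executable`, `transfer_executable`, `arguments`, `log`, and `queue` are standard HTCondor submit commands.

```python
def submit_description(executable, arguments, gatekeeper, log="job.log"):
    """Render a Condor-G submit description for one concrete grid job (hypothetical helper).

    executable: physical path on the remote site
    gatekeeper: Globus GT2 contact string, e.g. "host/jobmanager-pbs"
    """
    if not gatekeeper:
        raise ValueError("a concrete job needs a grid resource (gatekeeper contact)")
    commands = [
        ("universe", "grid"),
        ("grid_resource", f"gt2 {gatekeeper}"),
        ("executable", executable),
        ("transfer_executable", "false"),  # the executable path refers to the remote site
        ("arguments", " ".join(arguments)),
        ("log", log),
    ]
    return "".join(f"{key} = {value}\n" for key, value in commands) + "queue\n"


print(submit_description("/usr/local/bin/convert", ["FileE.png", "FileF.png"],
                         "grid.example.org/jobmanager-pbs"))
```

The abstract job from the previous slide has nothing to fill `grid_resource` or the physical `executable` path with, which is exactly the information the scientist must supply for a concrete job.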

  15. Pegasus Director / DAX Generator
      • Controls the execution of all the job (actor) entities and creates the resulting directed acyclic graph in XML format
         • Generates a DAX
         • Gives it to DAGMan for execution
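The slides do not show the Pegasus Director's code. A minimal sketch of the DAX-generation step, assuming the director already knows each job's logical input and output files and its parent/child edges, could build the same element structure as the sample DAX with Python's standard `xml.etree.ElementTree`. The functions `generate_dax` and `q` are hypothetical; for brevity the sketch omits the `schemaLocation`, `count`, and `index` attributes and the optional file list (part 1), and it places every input and output file in `<argument>`.

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://www.griphyn.org/chimera/DAX"


def q(tag):
    """Qualify a tag name with the DAX namespace."""
    return f"{{{DAX_NS}}}{tag}"


def generate_dax(name, jobs, edges):
    """Build a DAX-style <adag> document as a string.

    jobs:  list of (job_name, input_files, output_files)
    edges: list of (parent_name, child_name)
    """
    ET.register_namespace("", DAX_NS)  # serialize DAX as the default namespace
    adag = ET.Element(q("adag"), {"version": "1.10", "name": name})
    for job_name, inputs, outputs in jobs:
        job = ET.SubElement(adag, q("job"), {
            "id": f"{job_name}_id", "namespace": "keplerdax", "name": job_name, "version": "1"})
        argument = ET.SubElement(job, q("argument"))
        files = [ET.SubElement(argument, q("filename"), {"file": f}) for f in inputs + outputs]
        for element in files[:-1]:
            element.tail = " "  # keep arguments separated on the command line
        for f in inputs:
            ET.SubElement(job, q("uses"), {"file": f, "link": "input"})
        for f in outputs:
            ET.SubElement(job, q("uses"), {"file": f, "link": "output"})
    for parent, child in edges:
        # One <child> element per edge, mirroring part 3 of the sample DAX.
        child_el = ET.SubElement(adag, q("child"), {"ref": f"{child}_id"})
        ET.SubElement(child_el, q("parent"), {"ref": f"{parent}_id"})
    return ET.tostring(adag, encoding="unicode")


dax_text = generate_dax(
    "WorkflowTemplate_demo",
    [("Job1", ["FileA.png"], ["FileB.txt"]), ("Job2", ["FileB.txt"], ["FileE.png"])],
    [("Job1", "Job2")],
)
print(dax_text)
```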

  16. Sample DAX Generated

      <?xml version="1.0" encoding="UTF-8"?>
      <!-- generated: 2006-12-03T19:27:27-08:00 -->
      <!-- generated by: Nandita [??] -->
      <adag xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.10.xsd"
            xmlns="http://www.griphyn.org/chimera/DAX"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            version="1.10" count="1" index="0" name="WorkflowTemplate_1_1156808715390">
        <!-- part 1: list of all referenced files (may be empty) -->
        <!-- part 2: definition of all jobs (at least one) -->
        <job id="Job1_id" namespace="keplerdax" name="Job1" version="1">
          <argument><filename file="FileA.png"/> <filename file="FileB.txt"/></argument>
          <uses file="FileA.png" link="input"/>
          <uses file="FileB.txt" link="output"/>
          <uses file="FileC.txt" link="output"/>
        </job>
        <job id="Job2_id" namespace="keplerdax" name="Job2" version="1">
          <argument><filename file="FileB.txt"/> <filename file="FileE.png"/></argument>
          <uses file="FileB.txt" link="input"/>
          <uses file="FileE.png" link="output"/>
        </job>

  17. Sample DAX Generated (continued)

        <job id="Job3_id" namespace="keplerdax" name="Job3" version="1">
          <argument><filename file="FileC.txt"/><filename file="FileD.xml"/></argument>
          <uses file="FileC.txt" link="input"/>
          <uses file="FileD.xml" link="output"/>
        </job>
        <job id="Job4_id" namespace="keplerdax" name="Job4" version="1">
          <argument><filename file="FileE.png"/> <filename file="FileF.png"/></argument>
          <uses file="FileE.png" link="input"/>
          <uses file="FileD.xml" link="input"/>
          <uses file="FileF.png" link="output"/>
        </job>
        <!-- part 3: list of control-flow dependencies (may be empty) -->
        <child ref="Job2_id">
          <parent ref="Job1_id"/>
        </child>
        <child ref="Job3_id">
          <parent ref="Job1_id"/>
        </child>
        <child ref="Job4_id">
          <parent ref="Job2_id"/>
        </child>
        <child ref="Job4_id">
          <parent ref="Job3_id"/>
        </child>
      </adag>
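To illustrate the dependency semantics of part 3, the following sketch parses the control-flow section of the sample DAX with the standard library and groups jobs into waves that could run concurrently once their parents finish. It assumes unlimited execution slots and no failures; DAGMan itself runs the concrete workflow that Pegasus produces from the DAX, with its own scheduling and retry behavior. `release_waves` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

NS = {"d": "http://www.griphyn.org/chimera/DAX"}

# Part 3 of the sample DAX; the <job> elements are omitted for brevity.
SAMPLE = """<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.10"
      name="WorkflowTemplate_1_1156808715390">
  <child ref="Job2_id"> <parent ref="Job1_id"/> </child>
  <child ref="Job3_id"> <parent ref="Job1_id"/> </child>
  <child ref="Job4_id"> <parent ref="Job2_id"/> </child>
  <child ref="Job4_id"> <parent ref="Job3_id"/> </child>
</adag>"""


def release_waves(dax_text):
    """Group job ids into waves; every job's parents appear in earlier waves."""
    root = ET.fromstring(dax_text)
    graph = {job.get("id"): set() for job in root.findall("d:job", NS)}
    for child in root.findall("d:child", NS):
        graph.setdefault(child.get("ref"), set()).update(
            parent.get("ref") for parent in child.findall("d:parent", NS))
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(release_waves(SAMPLE))
# [['Job1_id'], ['Job2_id', 'Job3_id'], ['Job4_id']]
```

The two `<child ref="Job4_id">` elements are merged into one node with two parents, so Job4 waits for both Job2 and Job3.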

  18. Provenance Challenge Workflow in Kepler/Pegasus
      In Kepler each node needs a unique name, so the transformation catalog (TC) needs many duplicate entries.
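The duplication arises because a node's unique name doubles as its logical transformation name. The sketch below, under stated assumptions, expands one catalog entry per program into one entry per uniquely named node. The naming convention `<program>_<n>`, the tuple layout (loosely modeled on a catalog's site, logical name, and physical path), the site name, and the paths are hypothetical, and this is not Pegasus's actual TC file format. The program names `align_warp` and `reslice` are two of the steps in the Provenance Challenge workflow.

```python
import re

# Hypothetical base catalog: one entry per real program.
BASE_TC = {
    "align_warp": "/usr/local/air/bin/align_warp",
    "reslice": "/usr/local/air/bin/reslice",
}


def per_node_entries(node_names, base_tc, site="local"):
    """Expand one entry per program into one (site, node_name, path) entry per node.

    Assumes node names follow the hypothetical '<program>_<n>' convention.
    """
    entries = []
    for node in node_names:
        program = re.sub(r"_\d+$", "", node)  # strip the uniqueness suffix
        entries.append((site, node, base_tc[program]))
    return entries


nodes = [f"align_warp_{i}" for i in range(1, 5)] + [f"reslice_{i}" for i in range(1, 5)]
for entry in per_node_entries(nodes, BASE_TC):
    print(*entry)
```

Eight catalog entries end up pointing at just two executables. One possible remedy, not described in the slides, would be to store the logical transformation as a separate actor parameter so that all instances share a single TC entry.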

  19. Integration Benefit for Pegasus Users
      • Visualizing/debugging existing models:
         • Support a scientist trying to redo, visualize, or easily reconfigure an existing DAX
         • Provide an option to upload existing DAX files into the workspace
         • Convert the specified DAX file into MoML (Kepler's format) by passing it through an XSLT processor, generating the required directors and actors on the canvas
      • Issues of scalability (only small workflows can be visualized)
         • Scoping may need to be applied
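The slides describe the DAX-to-MoML conversion as an XSLT transformation. The sketch below swaps in a plain Python tree walk with `xml.etree.ElementTree` to show the same mapping idea: each DAX job becomes a MoML `entity`, the director becomes a `property`, and each parent/child edge becomes a `relation` with two `link`s. `ptolemy.actor.TypedCompositeActor` and `ptolemy.actor.TypedIORelation` are Ptolemy II classes; the Pegasus job and director class names and the `input`/`output` port names are hypothetical placeholders, and a real MoML file would also carry a DOCTYPE and layout information.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://www.griphyn.org/chimera/DAX"}
# Hypothetical class names: the slides do not give the Kepler/Pegasus actor classes.
JOB_CLASS = "example.pegasus.PegasusJob"
DIRECTOR_CLASS = "example.pegasus.PegasusDirector"


def dax_to_moml(dax_text):
    """Map DAX jobs to MoML entities and each parent->child edge to a relation plus two links."""
    dax = ET.fromstring(dax_text)
    top = ET.Element("entity", {"name": dax.get("name", "workflow"),
                                "class": "ptolemy.actor.TypedCompositeActor"})
    ET.SubElement(top, "property", {"name": "Pegasus Director", "class": DIRECTOR_CLASS})
    names = {}
    for job in dax.findall("d:job", NS):
        names[job.get("id")] = job.get("name")
        ET.SubElement(top, "entity", {"name": job.get("name"), "class": JOB_CLASS})
    count = 0
    for child in dax.findall("d:child", NS):
        for parent in child.findall("d:parent", NS):
            count += 1
            relation = f"relation{count}"
            ET.SubElement(top, "relation", {"name": relation, "class": "ptolemy.actor.TypedIORelation"})
            # Hypothetical port names "output" and "input" on each job actor.
            ET.SubElement(top, "link", {"port": f"{names[parent.get('ref')]}.output", "relation": relation})
            ET.SubElement(top, "link", {"port": f"{names[child.get('ref')]}.input", "relation": relation})
    return ET.tostring(top, encoding="unicode")


SAMPLE = """<adag xmlns="http://www.griphyn.org/chimera/DAX" name="demo">
  <job id="Job1_id" name="Job1"/>
  <job id="Job2_id" name="Job2"/>
  <child ref="Job2_id"><parent ref="Job1_id"/></child>
</adag>"""
print(dax_to_moml(SAMPLE))
```

Every job becomes a canvas element, which is one reason only small workflows remain readable once converted.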

  20. Integration Issues
      • Kepler acts as a visual programming environment
         • Actors represent single units of computation, with data flowing between them
      • Some configuration is not intuitive (TC entries)
      • Kepler has no separate representation of files
         • Each job has multiport I/O ports
         • The user can connect as many files into and out of each port as needed
      • Potential use of the integrated environment for debugging
      • Not done:
         • Integration with the Pegasus data registry
         • Monitoring of execution in Kepler
         • Use of Kepler's workflow execution engine
         • Support for Kepler actors in Pegasus

  21. Relevant Links
      • Kepler: http://kepler-project.org
      • Pegasus: http://pegasus.isi.edu
      • DAGMan: www.cs.wisc.edu/condor/dagman/
      • Provenance challenge: http://twiki.ipaw.info/bin/view/Challenge/
         • Workshop on Tuesday
      • NSF workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06
