Workflow Topics for the Next-Generation SDM-Center
Ilkay Altintas (altintas@SDSC.edu), San Diego Supercomputer Center
Bertram Ludäscher (ludaesch@UCDAVIS.edu), UC DAVIS Department of Computer Science
SciDAC SDM AHM, Oct 5-6, 2005, NCSU Raleigh, NC
[Image: Sir Walter Raleigh]
Overview
• Kepler/SPA:
  • What we have (The GOOD)
  • What we don’t (yet) have (The BAD)
  • What we really need?? (The UGLY)
• Things we might do; prioritization
Macro Definitions …
• #define KEPLER KEPLER/SPA
• #define KEPLER KEPLER*SPA
• By the end:
  • #define SPA KEPLERHPC
What we have – The GOOD
• Big Heritage from Ptolemy II
  • Vergil GUI for design and (some) execution monitoring
  • Actor-Oriented Modeling & Design
    • Director / Actor Separation
    • Models of Computation: PN, SDF, DE, …
    • Nested Workflows & Hierarchical Modeling
  • Research Results on Modeling Complex Systems
    • modal models, mobile models, reconfig’able models, model lifecycle management, higher-order actors, …
  • head-start for CCA Extensions, e.g.
    • SciRUN-2 Extensions (Steve P. et al.)
    • Self-Managing, Dynamically-Adaptive, Autonomous Components (Manish et al.)
What we have – The GOOD
• Kepler Extensions (to Ptolemy II)
  • Mostly: loosely coupled, e.g. WS (web service) workflows
  • Many generic actors
    • ssh, scp, cmd-line, SRB, Globus, …
    • new R expression actor
  • Many custom actors
    • e.g. in PIW, TSI-1, TSI-2, GEON, SEEK, Resurgence, …
  • Several ad-hoc extensions & (initial) research, e.g.
    • External job scheduling (e.g. NIMROD, …)
    • Director extensions (fault tolerance via WS “retry”)
    • WF-Templates (structured combination of dataflow & control-flow: fault-tolerance, reusability)
    • Higher-order functions (map/3, iterate-over-array, … : simpler control-flow, optimization potential, …)
What we have – The GOOD
• Kepler Extensions (Cont’d)
  • Some generic extensions
    • Metadata-based (EML/ADN) Dataset Search
    • Concept-based Actor Search (OWL)
    • Documentation Framework
    • Authentication & Authorization Framework (GAMA from GEON)
    • Improved component/WF archival & plug-in (KAR, …)
    • Provenance Recorder (“Listener”)
PS: … a growing open-source developers community … and some scientific users … (TSI-1/2, PIW, GEON, SEEK, …)
Concept-based Actor Search
• Implemented as proof-of-concept
• Additional operations slated for next Kepler Release (data search, port-based actor search, etc.)
Biggest Challenges
• Building/searching a repository …
• Making changes to MoML (see KAR)
• GUI changes
• Ontology management
[Figure labels: Workflow Components (MoML/KAR); Ontologies (OWL), Default + Other; Semantic Annotations: instance expressions, urn ids]
The GOOD: Kepler Archives
• Purpose: Encapsulate WF data and actors in an archive file
  • … inlined or by reference
  • … version control
• More robust workflow exchange
• Easy management of semantic annotations
• Plug-in architecture (Drop in and use)
• Easy documentation updates
• A jar-like archive file (.kar) including a manifest
• All entities have unique ids (LSID)
• Custom object manager and class loader
• UI and API to create, define, search and load .kar files
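To make the "jar-like archive with a manifest and unique ids" idea concrete, here is a minimal sketch using only Python's standard `zipfile` module. It is not the Kepler implementation: the function names `build_archive` and `list_ids` and the `Object-Id` manifest field are assumptions made for illustration, and the actual .kar manifest layout may differ.

```python
import io
import zipfile

def build_archive(entries):
    """Build an in-memory jar-like archive.

    entries maps an LSID string to (path inside the archive, payload bytes).
    """
    manifest_lines = ["Manifest-Version: 1.0"]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for lsid, (path, payload) in entries.items():
            zf.writestr(path, payload)
            # Hypothetical per-entry manifest section: file name plus its id.
            manifest_lines += ["", f"Name: {path}", f"Object-Id: {lsid}"]
        zf.writestr("META-INF/MANIFEST.MF", "\n".join(manifest_lines) + "\n")
    return buf.getvalue()

def list_ids(archive_bytes):
    """Return the ids recorded in the archive's manifest."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        text = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
    return [line.split(": ", 1)[1]
            for line in text.splitlines()
            if line.startswith("Object-Id: ")]

kar = build_archive({
    "urn:lsid:localhost:actor:80:1": ("MultiplyDivide.xml", b"<entity/>"),
})
ids = list_ids(kar)
```

Keeping the ids in the manifest means a repository can index an archive without unpacking every entry.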
KAR File Example
<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="documentation" class="org.kepler.moml.DocumentationAttribute"></property>
  <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute">
    <property name="id" value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="divide" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>
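As a rough illustration of how a tool could consume such an entity description, the sketch below extracts the id, the ports and the semantic type with Python's standard `xml.etree.ElementTree`. The embedded XML is an abbreviated copy of the example above; `summarize_entity` is a hypothetical helper, not a Kepler API.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the entity description from the slide.
ENTITY_XML = """
<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1"
            class="org.kepler.moml.NamedObjId"/>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00"
            value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor"
            class="org.kepler.sms.SemanticType"/>
</entity>
"""

PORT_CLASS = "org.kepler.moml.PortAttribute"
SEMTYPE_CLASS = "org.kepler.sms.SemanticType"

def summarize_entity(xml_text):
    """Collect id, ports and semantic types from one <entity> element."""
    root = ET.fromstring(xml_text)
    summary = {"name": root.get("name"), "id": None,
               "ports": {}, "semantic_types": []}
    for prop in root.findall("property"):  # direct children only
        if prop.get("name") == "entityId":
            summary["id"] = prop.get("value")
        elif prop.get("class") == PORT_CLASS:
            # Nested properties describe the port (direction, multiport, ...).
            summary["ports"][prop.get("name")] = {
                p.get("name"): p.get("value") for p in prop.findall("property")
            }
        elif prop.get("class") == SEMTYPE_CLASS:
            summary["semantic_types"].append(prop.get("value"))
    return summary

info = summarize_entity(ENTITY_XML)
```

A concept-based or port-based actor search could be built on summaries like `info`, since they expose exactly the fields such a search would match on.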
Kepler Object Manager
• Designed to access local and distributed objects
• Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc. archived in kar files
• Advantages:
  • Reduce the size of Kepler distribution
    • Only ship the core set of generic actors and domains
  • Easy exchange of full or partial workflows for collaborations
  • Publish full workflows with their bound data
    • Becomes a provenance system for derived data objects
=> Separate SPA workflow repository and distribution
Provenance Framework
• Provenance
  • Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata, …)
• Need for Provenance
  • Association of process and results
  • reproduce results
  • “explain & debug” results (via lineage tracing, parameter settings, …)
  • optimize: “Smart Re-Runs”
• Types of Provenance Information:
  • Data provenance
    • Intermediate and end results including files and db references
  • Process (= workflow instance) provenance
    • Keep the wf definition with data and parameters used in the run
  • Error and execution logs
  • Workflow design provenance (quite different)
    • WF design is a (little supported) process (art, magic, …)
    • for free via cvs: edit history
    • need more “structure” (e.g. templates) for individual & collaborative workflow design
Kepler Provenance Recording Utility
• Parametric and customizable
  • Different report formats
  • Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  • Multiple cache destinations
• Saves information on
  • User name, Date, Run, etc.
Joint work with Oscar Barney
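The "listener with variable levels of detail" idea can be sketched as follows. This is an illustrative Python model, not the Kepler recorder: the class `ProvenanceListener`, its `notify` method and the record fields are assumptions; only the four level names and the saved fields (user, date, run) come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Detail levels named on the slide, ordered from least to most verbose.
LEVELS = ["on-error", "medium", "verbose-some", "verbose-all"]

@dataclass
class ProvenanceListener:
    user: str
    run_id: str
    level: str = "medium"
    records: list = field(default_factory=list)

    def notify(self, actor, event, detail="", min_level="medium"):
        """Record the event if the configured level is verbose enough."""
        if event == "error":
            min_level = "on-error"  # errors are recorded at every level
        if LEVELS.index(self.level) >= LEVELS.index(min_level):
            self.records.append({
                "user": self.user,
                "run": self.run_id,
                "date": datetime.now(timezone.utc).isoformat(),
                "actor": actor,
                "event": event,
                "detail": detail,
            })

quiet = ProvenanceListener(user="alice", run_id="run-1", level="on-error")
quiet.notify("Reader", "fired")                    # dropped: too detailed
quiet.notify("Reader", "error", "file not found")  # kept

chatty = ProvenanceListener(user="alice", run_id="run-2", level="verbose-all")
chatty.notify("Reader", "fired")
chatty.notify("Reader", "token", "42", min_level="verbose-all")
```

Because actors only call `notify`, the level of detail can be changed per run without touching the workflow itself.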
Provenance: Next Steps
• .kar file generation, registration and search for provenance information
• Possible data/metadata formats
• Automatic report generation from accumulated data
• A relational schema for the provenance info in addition to the existing XML
• Smart re-runs
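One way to read "smart re-runs" is as memoization keyed on recorded provenance: if a step is invoked again with the same parameters and inputs, its stored result is reused. The sketch below shows that idea only; `RerunCache` and the step function are hypothetical names, and it assumes parameters and inputs are JSON-serializable.

```python
import hashlib
import json

class RerunCache:
    """Reuse a step's earlier result when its name, parameters and inputs match."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    @staticmethod
    def _key(step_name, params, inputs):
        # Stable fingerprint of everything that determines the result.
        blob = json.dumps([step_name, params, inputs], sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def run(self, step_name, func, params, inputs):
        key = self._key(step_name, params, inputs)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = func(inputs, **params)
        self._results[key] = result
        return result

calls = []

def normalize(inputs, scale=1.0):
    calls.append(inputs)  # track how often real work is done
    return [x * scale for x in inputs]

cache = RerunCache()
first = cache.run("normalize", normalize, {"scale": 2.0}, [1, 2, 3])
again = cache.run("normalize", normalize, {"scale": 2.0}, [1, 2, 3])    # reused
changed = cache.run("normalize", normalize, {"scale": 3.0}, [1, 2, 3])  # recomputed
```

This only covers deterministic steps; a step with side effects or external state would need additional provenance in the key.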
The Future
• From GOOD via BAD to UGLY
• The good news (about ‘bad’ and ‘ugly’)
  • Lots of interesting challenges!
  • … so ‘ugly’ is actually good!
What we don’t (yet) have … THE BAD
• Much is still to do (or still ongoing)
  • Detached execution
    • many options; depend on requirements
  • Kepler WF repository w/ dynamic actor plug-in
  • Smart Reruns
    • avoid doing (old) work twice
  • Smarter Reruns (too smart?)
    • reuse previous results for speed-up of (new) work
  • NIMROD Director, CONDOR Director, …
  • Task manager / monitor
  • Support for WF design & reuse
    • Semantic extensions
    • “Design Patterns”, Templates
What we don’t have … THE BAD cont’d
• Vertical SDM Integration
  • Workflow layer could be used to embed other SDM components and glue them together
  • Scope & Architecture unclear
    • Data Mining tools → new WF actors
    • Parallel-R → new WF actors !?
    • SEA, Bitmap tools → new !?
    • MPI-IO → alternative to current Kepler data access !?
    • …
  • Not only a technical problem
    • e.g. need for driving use-cases that require a combination of several SDM layers
Challenges
• Easier said …
  • “We’re not going to reinvent the wheel …”
  • “We just use XYZ …”
    • XYZ in {CCA, HDF5, PnetCDF, Ccaffeine, Condor, MPI-IO, parallel-R, …}
• … than done …
  • Incompatible, isolated solutions and frameworks
  • Can’t use workflow/actor/director A with B
  • Coming up with a coherent, overall architecture is hard!
HTC Example (using: NIMROD)
• need to make Kepler NIMROD/Condor/… “aware”
• similar need for HPC support
Another Distribution Approach
• Simulation is orchestrated in a centralized manner
[Figure labels: Client; Servers; Service Locator (Peer Discovery); Computer Network]
Source: Daniel Lázaro Cuadrado, Aalborg University
What we don’t have … THE UGLY
• Workflow Design & (Re-)Usability
  • Difficult Marriage of Dataflow and Control-flow
    • e.g. PIW, TSI-1/2, GEON-A-type-WF, …
  • WF development, deployment, maintenance, use
    • from (Mess…) to Art to Commodity (→ next presentation)
    • support for the whole WF life-cycle
• Fault Tolerance
  • current embedding of control-flow into dataflow yields non-maintainable workflows!
• Close Coupling of Components for HPC
  • CCA-style
  • MPI-style
  • Memory-to-Memory (on single nodes)
  • large, efficient data transfer
  • …
WF-Design: Adapters for Semantic & Structural Incompatibility
Adapters may:
• be abstract (no impl.)
• be concrete
• bridge a semantic gap
• fix a structural mismatch
• be generated automatically (e.g., Taverna’s “list mismatch”)
• be reused components (based on signatures)
[Figure: adapters inserted between ports C and D (C1/D1, C2/D2); map-based adapters lifting f1 and f2 from S → T to [S] → [T] and, via nested map, to [[S]] → [[T]]]
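The map-based adapters in the figure can be generated mechanically: a function from S to T is lifted to work on [S] or [[S]] by wrapping it in one `map` per nesting level. A minimal Python sketch, with `lift` as a hypothetical helper name:

```python
def lift(f, depth):
    """Adapt f : S -> T so it applies to lists nested `depth` levels deep."""
    if depth == 0:
        return f
    inner = lift(f, depth - 1)
    # One level of list structure is handled here; the rest by `inner`.
    return lambda items: [inner(item) for item in items]

def f1(s):
    """Example component of type S -> T (here: str -> int)."""
    return len(s)

flat = lift(f1, 1)(["ab", "cde"])              # [S]   -> [T]
nested = lift(f1, 2)([["a"], ["bc", "def"]])   # [[S]] -> [[T]]
```

An automatic "list mismatch" repair could compute `depth` as the difference between the nesting levels of the producer's output type and the consumer's input type.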
Additional Design Primitives for Semantic Types
[Figure: table with columns Starting Workflow, Extended Transformations, Resulting Workflow]
• t9: Actor Semantic Type Refinement
• t10: Port Semantic Type Refinement
• t11: Annotation Constraint Refinement
• t12: I/O Constraint Strengthening
• t13: Data Connection Refinement
• t14: Adapter Insertion
• t15: Actor Replacement
• t16: Workflow Combination (Map)
Workflow Design Primitives
End-to-End Workflow Design and Implementation
• Viewed as a series of primitive “transformations”
  • Each takes a WF and produces a new WF
  • Can be combined to form design “strategies”
[Figure: chain W0 → W1 → W2 → … → Wm → … → Wn of transformations t, from Workflow Design to Workflow Implementation; strategies: Top-Down, Bottom-Up, Task Driven, Data Driven, Structure Driven, Semantic Driven, Input Driven, Output Driven]
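Treating each primitive as a function from workflow to workflow makes "strategies" plain function composition. The sketch below models that with an immutable `Workflow` value; the class, the three primitives and `strategy` are illustrative names and only loosely mirror primitives such as adapter insertion.

```python
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Workflow:
    actors: frozenset = frozenset()
    links: frozenset = frozenset()  # (source, target) pairs

def add_actor(name):
    return lambda wf: replace(wf, actors=wf.actors | {name})

def connect(src, dst):
    def transform(wf):
        if not {src, dst} <= wf.actors:
            raise ValueError("both actors must exist before connecting")
        return replace(wf, links=wf.links | {(src, dst)})
    return transform

def insert_adapter(src, dst, adapter):
    """Replace the link src -> dst by src -> adapter -> dst."""
    def transform(wf):
        if (src, dst) not in wf.links:
            raise ValueError("no such link to adapt")
        links = (wf.links - {(src, dst)}) | {(src, adapter), (adapter, dst)}
        return replace(wf, actors=wf.actors | {adapter}, links=links)
    return transform

def strategy(*steps):
    """Compose primitive transformations into one WF -> WF function."""
    return lambda wf: reduce(lambda acc, step: step(acc), steps, wf)

w0 = Workflow()
design = strategy(
    add_actor("reader"),
    add_actor("plot"),
    connect("reader", "plot"),
    insert_adapter("reader", "plot", "to_list"),
)
wn = design(w0)
```

Since every step returns a new value, the intermediate workflows W1, W2, … remain available, which is one possible basis for recording design provenance.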
Workflow Templates and Patterns
• New Ingredients
• Proposed Layered Architecture
work w/ Anne Ngu, Shawn Bowers, Terence Critchlow
Use Ideas from Fault Tolerant Shell
• Good ideas in ftsh; some might be (semi-)low hanging fruits for Kepler …
Source: Douglas Thain, Miron Livny, The Ethernet Approach to Grid Computing
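The central idea usually associated with ftsh is a bounded "try" block: repeat an action with growing delays until it succeeds or an attempt or time budget runs out. The following Python sketch shows that pattern in general terms; it is not ftsh syntax and not a Kepler feature, and `try_for` with its parameters is a hypothetical name.

```python
import time

def try_for(action, max_attempts=5, time_limit=60.0, base_delay=1.0,
            sleep=time.sleep, clock=time.monotonic):
    """Retry `action` with exponential backoff within attempt and time budgets."""
    start = clock()
    delay = base_delay
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as err:  # any failure triggers a retry
            last_error = err
        # Stop if attempts are used up or the next wait would exceed the budget.
        if attempt == max_attempts or clock() - start + delay > time_limit:
            break
        sleep(delay)
        delay *= 2
    raise RuntimeError(f"gave up after {attempt} attempt(s)") from last_error

# Usage with a service call that fails twice before succeeding.
outcomes = iter([IOError("busy"), IOError("busy"), "ok"])

def flaky_call():
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item

delays = []
result = try_for(flaky_call, sleep=delays.append)  # record waits instead of sleeping
```

Wrapping an actor's firing in such a block keeps the retry logic out of the dataflow graph, which is the maintainability concern raised for fault tolerance.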
Kepler Coupling Components & Codes
• Types of Coupling …
  • Loosely coupled (“1st Phase”)
    • Web Services (SPA, GEON, SEEK, …)
    • ssh actors, …
    • + reusability (behavioral polymorphism)
    • + scalability (# components)
    • – efficiency
  • Tight(er) coupling (“2nd Phase”)
    • Via CCA (SciRUN-2, Ccaffeine, …) (Cipres uses CORBA)
    • HPC needs: code-coupling as efficient & flexible as possible (e.g. Scott’s challenges …)
      • memory-to-memory (single node or shared memory)
      • MPI (multiple nodes)
      • optimizations for transfer of data & control (streaming, socket-based connections)
Accord-CCA: Ccaffeine w/ Self-Managed Behavior
• cf. w/ mobile models, reconfiguration in Ptolemy II
• … begging for a Kepler design and implementation …
Source: Hua Liu and Manish Parashar
Different “Directors” for Different Concerns
• Example:
  • Ptolemy Directors – “factoring out” the concern of workflow “orchestration” (MoC)
  • common aspects of overall execution not left to the actors
• Similarly:
  • “Black Box” (“flight recorder”)
    • a kind of “recording central” to avoid wiring 100’s of components to recording-actor(s)
  • “Red Box” (error handling, fault tolerance)
    • use ftsh ideas; templates
  • “Yellow Box” (type checking)
    • for workflow design
  • “Blue Box” (shipping-and-handling)
    • central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …)
  • “CCA++ Boxes”
    • Change behavior (e.g. algorithm) of a component
    • Change behavior (i.e., wiring) of a workflow in-flight
[Figure labels: SDF/PN/DE/…; Provenance Recorder; On Error; Static Analysis; SHA @; Component Mgr; Composition Mgr]
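The point of the "boxes" is that cross-cutting concerns live in the execution engine rather than in extra actors wired into the graph. The toy Python director below runs a linear pipeline and handles recording ("Black Box") and error handling ("Red Box") centrally. It is a sketch under strong simplifying assumptions (one token, sequential firing) and does not reflect the Ptolemy II director API; all names are hypothetical.

```python
class Director:
    """Fires actors in sequence; recording and error handling are centralized."""

    def __init__(self, recorder=None, error_handler=None):
        self.recorder = recorder            # "Black Box": flight recorder
        self.error_handler = error_handler  # "Red Box": fault tolerance

    def run(self, actors, token):
        for actor in actors:
            try:
                result = actor(token)
            except Exception as err:
                if self.error_handler is None:
                    raise
                # The handler decides what token to pass on instead.
                result = self.error_handler(actor, token, err)
            if self.recorder is not None:
                self.recorder(actor.__name__, token, result)
            token = result
        return token

def parse(text):
    return int(text)

def double(n):
    return 2 * n

log = []
director = Director(
    recorder=lambda name, given, produced: log.append((name, given, produced)),
    error_handler=lambda actor, token, err: 0,  # substitute a default token
)
good = director.run([parse, double], "21")
bad = director.run([parse, double], "oops")
```

Neither `parse` nor `double` knows about logging or recovery, so the same actors can be reused under a director with different policies.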
Summary
• The GOOD:
  • lots to build upon
• The BAD:
  • no common / integrated architecture → use Kepler/SPA as a glue
  • this might be harder than it sounds
  • needs a mix of end-to-end, application-driven work and serious design effort for the integration architecture
• The UGLY:
  • HPC challenges: close coupling, fault tolerance, …
• The good news: there’s work to be done!
Use of Semantics in SWF …
• “Smart” Search
  • Concept-based, e.g., “find all datasets containing biomass measurements”
• Improved Linking, Merging, Integration
  • Establishing links between data through semantic annotations & ontologies
  • Combining heterogeneous sources based on annotations
    • Concatenate, Union (merge), Join, etc.
• Transforming
  • Construct mappings from schema S1 to S2 based on annotations
• Semantic Propagation
  • “Pushing” semantic annotations through transformations/queries
Helping with “shims” / adapters
• Services can be semantically compatible, but structurally incompatible
[Figure: desired connection from port Ps of the Source Actor to port Pt of the Target Actor; per the Ontologies (OWL), SemanticType Ps and SemanticType Pt are compatible (⊑), while StructuralType Ps and StructuralType Pt are incompatible (⋠)]
Source: [Bowers-Ludaescher, DILS’04]
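The two-level check in the figure can be sketched as follows: a connection is considered only if the source port's concept is subsumed by the target port's concept, and an adapter is needed when the structural types still differ. Everything here is illustrative: the tiny subclass table stands in for an OWL ontology, structural compatibility is simplified to equality of type names, and the port dictionaries are a made-up representation.

```python
# Hypothetical mini-ontology: concept -> direct superconcept.
SUBCLASS_OF = {
    "Biomass": "Measurement",
    "Measurement": "Observation",
}

def is_subconcept(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def check_connection(source_port, target_port):
    """Classify a desired connection Ps -> Pt."""
    if not is_subconcept(source_port["semantic"], target_port["semantic"]):
        return "incompatible"
    # Simplification: structural types must match exactly.
    if source_port["structural"] == target_port["structural"]:
        return "connect"
    return "needs adapter"

ps = {"semantic": "Biomass", "structural": "list<float>"}
pt = {"semantic": "Measurement", "structural": "float"}
verdict = check_connection(ps, pt)
```

In the "needs adapter" case, the structural types of the two ports are exactly the signature an adapter search or generator would have to satisfy.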