Flexible Scientific Workflows Using Dynamic Embedding Anne H.H. Ngu, Nicholas Haasch, Terence Critchlow, Shawn Bowers, Timothy McPhillips, Bertram Ludaescher
Outline • Scientific Workflow • Problems with static scientific workflow • Frame actor • Dynamic Embedding • Implementation of Dynamic Embedding • TSI case study • Conclusion
Scientific Workflows A model of the way a scientist works with their data and tools • Mentally coordinate data export, import, analysis and visualization via various software tools Goals: • Design • Automation • Component reuse … make data analysis and management tasks easier for the scientist!
SPA/Kepler Scientific Workflow System • Scientific Process Automation (SPA) • Modeling Workflows using actor-oriented framework • Executing and Monitoring Workflows • Built on top of Ptolemy II (Berkeley) • Graphical User Interface • Similar to a charting program (Visio) • Drag-and-Drop Components • Connect components • Execute workflows • Monitor Execution
Actor-Oriented Modeling Actors • single component or task • well-defined interface (signature) • given input data, produces output data
Actor-Oriented Modeling Parameter • Input that is configured statically • User changes it at design time • Produces data
Actor-Oriented Modeling Sub-Workflows (aka Composite Actors) • composite actors "wrap" sub-workflows • like actors, have signatures (i/o ports of sub-workflow) • hierarchical workflows (arbitrary nesting levels) for abstraction • versus "Atomic Actors"
Actor-Oriented Modeling Directors • define the execution semantics of workflow graphs • schedule and execute workflow graphs • sub-workflows may be governed by different directors • Examples: Synchronous Data-Flow (SDF), Process Networks (PN), Discrete Event (DE), Finite State Machine (FSM)
Problems in current SPA/Kepler modeling framework • A workflow is static and must be completely specified before orchestration • The specific tools used in the workflow must be picked at design time (so the best tool may not be picked) • All alternative tools must be enumerated exhaustively (resulting in complex workflows)
Problems in current SPA/Kepler modeling framework (cont.) • Our work aims to provide an abstraction of a tool, a resource, or an algorithm • The specific tool or resource is selected during execution
Different TSI Workflows • Three workflows (A, B, and C), each containing a Submit Job actor
TSI Submit Job Actors: a look at the sub-workflows • Each workflow (A, B, C) does job submission in a different way • Each contains a remote execution step
Goals • There exist actors (atomic and composite) that perform a similar task • Submit Job • Transfer Files • Process Files • RemoteExecution • SSH2Exec • SSHWrapper • InvokeSubmitJob • Can we provide an abstraction for encapsulating different implementations of a specific task that can be reused across different workflows? • Can we execute the workflow flexibly? • Choosing a specific implementation depending on run time conditions
Requirements • Select and execute actors based on run time conditions • Execute with data process networks • Built-in capability for streaming data • Built-in concurrency • Selected actors instantiated on demand • Reusable actors can be nested • Actors are usable by scientists
Frames • Actors are concrete • Correspond to particular implementations • Frames are abstract • Placeholder for an actor / composite actor • input, output, and parameter ports • An embedding occurs when an actor is placed in a frame • A refinement is an actor that can be embedded in a frame • Example: embedding the SSH actor a in the RemoteExecutionFrame F gives the embedding F[a]
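The frame and refinement vocabulary above can be pictured with a small data model. This is only a sketch: `FrameSketch`, `RefinementSketch`, `embed`, and `describe` are hypothetical names chosen for illustration and are not part of the Kepler or Ptolemy II API.

```java
// Hypothetical names throughout; this is not the Kepler/Ptolemy II API.

/** A concrete implementation of a task (an actor that can be embedded). */
final class RefinementSketch {
    final String name;

    RefinementSketch(String name) {
        this.name = name;
    }
}

/** An abstract placeholder that has no implementation until a refinement is embedded. */
final class FrameSketch {
    private final String name;
    private RefinementSketch embedded; // null while the frame is still abstract

    FrameSketch(String name) {
        this.name = name;
    }

    /** Embedding F[a]: place refinement a in frame F. */
    void embed(RefinementSketch a) {
        this.embedded = a;
    }

    boolean isConcrete() {
        return embedded != null;
    }

    /** Renders the slide's F[a] notation once a refinement is embedded. */
    String describe() {
        return isConcrete() ? name + "[" + embedded.name + "]" : name;
    }
}
```

Embedding an SSH actor into a `FrameSketch` named `RemoteExecutionFrame` makes `isConcrete()` true and `describe()` return the name in F[a] form.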
Static Embedding Frames • A refinement to a Frame is embedded during design • Frames become concrete and cannot be reconfigured during workflow execution (Figure: frame F with candidate refinements SSH2 Exec and Web Service, shown at design time and at run time)
Static Embedding Frames (cont.) • Execute with data process networks • Refinement instantiated as needed • Can be nested • Actors are usable by scientists • Not supported, because the frame is already concrete at run time: select refinements based on run time conditions (Figure: frame F with SSH2 Exec and Web Service at run time)
Dynamic Embedding Frames • Refinements to frames are embedded during execution • Frames are not concrete during workflow execution (Figure: frame F with candidate refinements SSH2 Exec and Web Service at run time)
Why Dynamic Embedding? • Select refinements based on run time conditions • Execute with data process networks • Refinement instantiated as needed • Can be nested • Actors are usable by scientists
Implementation of Dynamic Embedding • Construct a new workflow to execute the actor (Figure: the Remote Execution Frame generates the workflow that runs the selected actor)
Dynamic Embedding Process (Remote Execution Frame) • Wait for inputs to arrive at the Frame. • Select a refinement. • Transfer input tokens from the Frame to the refinement. • Select mappings • Input Port • Output Port • Parameter • Construct the internal workflow. • Run the internal workflow. • Transfer output tokens from the refinement to the Frame. (Figure: model generated by the dynamic Frame)
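One firing of the process above can be sketched in plain Java. This is a simplified illustration under several assumptions: a refinement is modeled as a function over named string tokens, the "internal workflow" is reduced to a single function call, and the class name, the `webServiceUp` flag, and the web service's `result` port are all hypothetical. It does not use the Kepler or Ptolemy II API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the Kepler/Ptolemy II API.
final class DynamicEmbeddingSketch {

    /** A refinement modeled as a function from named input tokens to named output tokens. */
    interface Refinement extends Function<Map<String, String>, Map<String, String>> {}

    /** Copies tokens to new port names according to mapping (source port to target port). */
    static Map<String, String> rename(Map<String, String> tokens, Map<String, String> mapping) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            if (tokens.containsKey(e.getKey())) {
                out.put(e.getValue(), tokens.get(e.getKey()));
            }
        }
        return out;
    }

    /** One firing of the frame, called once its input tokens have arrived. */
    static Map<String, String> fire(Map<String, String> frameInputs, boolean webServiceUp) {
        Refinement chosen;
        Map<String, String> inputMapping;
        Map<String, String> outputMapping;

        // Select a refinement and its port mappings from a run time condition.
        if (webServiceUp) {
            chosen = in -> Map.of("result", "ws:" + in.get("method") + "@" + in.get("url"));
            inputMapping = Map.of("hostname", "url", "command", "method");
            outputMapping = Map.of("result", "out"); // "result" is an invented port name
        } else {
            chosen = in -> Map.of("stdout", "ssh:" + in.get("cmd") + "@" + in.get("hostname"));
            inputMapping = Map.of("hostname", "hostname", "command", "cmd");
            outputMapping = Map.of("stdout", "out");
        }

        // Transfer input tokens, run the stand-in internal workflow, transfer outputs back.
        Map<String, String> actorInputs = rename(frameInputs, inputMapping);
        Map<String, String> actorOutputs = chosen.apply(actorInputs);
        return rename(actorOutputs, outputMapping);
    }
}
```

The caller sees the same frame ports (`hostname`, `command`, `out`) whichever refinement runs, which is the point of keeping selection and port mapping inside the frame.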
ModelReference Actor • A higher-order actor that can execute a model (workflow) given to it through its input port. • It meets most of the requirements for dynamic embedding, except: • The user must pre-construct the given model • Output tokens from the given model are transferred only after the internal workflow completes. • Our implementation of dynamic embedding leverages the capability of the ModelReference actor with two major improvements: • The given model is constructed automatically • Output tokens are transferred synchronously • Frame is thus implemented as a subclass of the ModelReference actor with four additional components: SelectActor(), FrameSourceActor(), FrameSinkActor(), and PortWiring()
Implementing a Type of Frame • Subclass Frame and Implement • Selection Process • selectActor() • Configure Ports and Parameters • getInputMappings() • getOutputMappings() • getParameterMappings()
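The shape of such a subclass might look like the sketch below. The base class here is a hypothetical stand-in, not the real Frame or ModelReference class. The return types, the `BooleanSupplier` probe, and the web service output port names (`response`, `fault`) are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the Frame base class; not the Kepler/Ptolemy II API.
abstract class FrameBaseSketch {
    /** Selection policy: returns the name of the refinement to embed. */
    abstract String selectActor();

    /** Each mapping is a {framePort, actorPort} pair. */
    abstract List<String[]> getInputMappings();

    abstract List<String[]> getOutputMappings();

    abstract List<String[]> getParameterMappings();
}

final class RemoteExecutionFrameSketch extends FrameBaseSketch {
    private final BooleanSupplier webServiceUp; // assumed run time probe
    private String selected;

    RemoteExecutionFrameSketch(BooleanSupplier webServiceUp) {
        this.webServiceUp = webServiceUp;
    }

    @Override
    String selectActor() {
        selected = webServiceUp.getAsBoolean() ? "webservice" : "ssh2exec";
        return selected;
    }

    @Override
    List<String[]> getInputMappings() {
        return isWebService()
            ? List.of(pair("hostname", "url"), pair("command", "method"))
            : List.of(pair("hostname", "hostname"), pair("command", "cmd"));
    }

    @Override
    List<String[]> getOutputMappings() {
        // Output port names for the web service are invented for illustration.
        return isWebService()
            ? List.of(pair("out", "response"), pair("errors", "fault"))
            : List.of(pair("out", "stdout"), pair("errors", "errors"));
    }

    @Override
    List<String[]> getParameterMappings() {
        return List.of(); // no parameters are mapped in this sketch
    }

    private boolean isWebService() {
        if (selected == null) {
            throw new IllegalStateException("call selectActor() first");
        }
        return selected.equals("webservice");
    }

    private static String[] pair(String framePort, String actorPort) {
        return new String[] {framePort, actorPort};
    }
}
```

In this sketch the mapping methods depend on the last selection, so `selectActor()` has to run first; a real frame would enforce that ordering itself.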
Selection Process: selectActor() • Implements selection policy • Returns a refinement • Refinement that is returned gets automatically embedded
Remote Execution:
String selectedActor;
// The return type is not shown on the slide; Actor is assumed here.
Actor selectActor() {
    if (testWebService()) {
        selectedActor = "webservice";
        return getWebServiceActor();
    } else {
        selectedActor = "ssh2exec";
        return getSSH2ExecActor();
    }
}
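Port Wiring • Transfer tokens: Frame Input Port → Actor Input Port, Actor Output Port → Frame Output Port • Expressed as a list containing pairs of strings, e.g. {"hostname","hostname"}, {"command","cmd"} (Figure: Frame F with ports hostname, command, errors, and out wired to SSH2 Exec ports hostname, cmd, errors, and stdout)
Remote Execution:
String selectedActor;
// The return type is not shown on the slide; String[][] is assumed here.
String[][] getInputPortMapping() {
    // Compare strings with equals(), using the names set in selectActor().
    if (selectedActor.equals("ssh2exec")) {
        return new String[][] {{"hostname", "hostname"}, {"command", "cmd"}};
    } else if (selectedActor.equals("webservice")) {
        return new String[][] {{"hostname", "url"}, {"command", "method"}};
    }
    throw new IllegalStateException("no mapping for " + selectedActor);
}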
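To show how a list of string pairs could drive the token transfer, the sketch below models each port as a queue of string tokens and drains every source port into its mapped target port. The class, the `transfer` and `ports` helpers, and the queue-per-port model are assumptions for illustration only, not how Kepler or Ptolemy II represents ports.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; ports are modeled as named queues of string tokens.
final class PortWiringSketch {

    /** Moves every queued token from each source port to its mapped target port, in order. */
    static void transfer(Map<String, Deque<String>> source,
                         Map<String, Deque<String>> target,
                         String[][] pairs) {
        for (String[] pair : pairs) {
            Deque<String> from = source.get(pair[0]);
            if (from == null) {
                continue; // no such port on the source side
            }
            Deque<String> to = target.computeIfAbsent(pair[1], k -> new ArrayDeque<>());
            while (!from.isEmpty()) {
                to.addLast(from.pollFirst());
            }
        }
    }

    /** Builds ports from alternating (portName, token) arguments. */
    static Map<String, Deque<String>> ports(String... portAndToken) {
        Map<String, Deque<String>> result = new HashMap<>();
        for (int i = 0; i + 1 < portAndToken.length; i += 2) {
            result.computeIfAbsent(portAndToken[i], k -> new ArrayDeque<>())
                  .addLast(portAndToken[i + 1]);
        }
        return result;
    }
}
```

The same `transfer` call works in the other direction, with the actor's output ports as the source and the frame's output ports as the target, given a pair list written in that order.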
TSI Case Study (Figure: the TSI-A and TSI-B workflows, each using a SubmitJobFrame)
TSI Case Study (cont.) (Figure: the SubmitJobFrame, with the TSI-A and TSI-B sub-workflows as refinements and a Remote Execution Frame)
Benefits • Select refinements based on run time conditions • Execute with data process networks • Refinement instantiated as needed • Can be nested • Actors are usable by scientists
Limitations • Unable to type check the internal workflow before execution • There is overhead in creating an additional workflow to execute a refinement • A change in the selection process requires recoding/recompiling • Cannot monitor the internal workflow (which would be useful for debugging)
Future Work • Semantic binding of Ports and Parameters • Configurable selection criteria • Intelligent brokering • Ptolemy Expression Language • Python • Perl • … • Simplified refinement creation • Caching of generated workflows • Design time type checking of internal workflow
UCRL-ABS-226047 Work performed under the auspices of the U. S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48