A Hybrid Decomposition Scheme for Building Scientific Workflows Wei Lu Indiana University
Our work
Application Decomposition
• Large scientific applications require
  • Decomposing the problem into manageable units
• Units need to be
  • Self-described
  • Self-encapsulated
  • Independently developed and deployed
  • Composable
• Two decomposition dimensions
  • Functional decomposition (a.k.a. spatial decomposition)
    • C/C++, Java
    • Component
  • Temporal decomposition
    • Unix pipe
    • Workflow
• However, most problem-solving environments (PSEs) provide only one approach, to the exclusion of the other
Common Component Architecture (CCA)
• Scientific computing imposes special requirements
  • Support for legacy software
  • Performance is crucial
  • Multiple languages and data types
    • Fortran, C/C++, Python, Java, etc.
    • Complex numbers and arrays (as first-class objects)
  • Support for the various parallel run-time platforms
• CCA
  • Component framework specification
  • Designed for scientific high-performance computing
  • Aims to improve the reuse of scientific software
CCA Component
• Each component describes
  • What functionality it fulfills: provides port
  • What functionality it needs to fulfill its task: uses port
• Use-provide pattern
  • Plug-and-play
• A port is described in SIDL
  • Scientific Interface Definition Language
  • Partially derived from CORBA IDL
  • With constructs to describe complex numbers, arrays, etc.
• Babel: language interoperability tool
[Figure: components MidpointIntegrator, NonlinearFunction, LinearFunction; ports IntegratorPort, FunctionPort; languages C, Fortran, Python]
Example of the CCA Composition
interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}
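The use-provide pattern behind this example can be sketched in a few lines of Python. This is only an illustration, not CCA or Babel code: the class layout, the `connect_function` method, and the choice of f(x) = 2x + 1 are assumptions made for the sketch, while the `integrate` signature follows the SIDL above.

```python
from typing import Optional, Protocol


class FunctionPort(Protocol):
    """Port interface that a function component provides."""

    def evaluate(self, x: float) -> float: ...


class LinearFunction:
    """Component providing a FunctionPort; f(x) = 2x + 1 is an arbitrary choice."""

    def evaluate(self, x: float) -> float:
        return 2.0 * x + 1.0


class MidpointIntegrator:
    """Component that provides integrate() and uses a FunctionPort."""

    def __init__(self) -> None:
        self._function: Optional[FunctionPort] = None  # unconnected uses port

    def connect_function(self, provider: FunctionPort) -> None:
        # Hypothetical stand-in for the framework wiring a uses port
        # to a matching provides port.
        self._function = provider

    def integrate(self, low_bound: float, up_bound: float, count: int) -> float:
        if self._function is None:
            raise RuntimeError("FunctionPort is not connected")
        width = (up_bound - low_bound) / count
        # Midpoint rule: sample the function at the centre of each sub-interval.
        return sum(
            self._function.evaluate(low_bound + (i + 0.5) * width) * width
            for i in range(count)
        )


integrator = MidpointIntegrator()
integrator.connect_function(LinearFunction())
result = integrator.integrate(0.0, 1.0, 100)
```

Swapping `LinearFunction` for another provider of `evaluate` changes the computation without touching the integrator, which is the plug-and-play property the slides describe.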
Ccaffeine
• Parallel implementation of the CCA framework
• SCMD (Single Component Multiple Data)
• Inter-component communication
  • Virtual function call in the same address space
• Intra-component communication
  • Could be MPI, PVM, etc.
Kepler
• Scientific workflow environment
• Data-flow oriented
• Basic unit: Actor
  • Input, Output
  • Typed dataflow structure
• Many domain-specific actors supporting
  • Biology, ecology, astronomy
• General facility actors
  • Grid service actor
  • Web service actor
• Wire the actors by piping
[Figure labels: URL, GridFtp, Classifier, localFilePath, Credential]
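Wiring actors by piping can be pictured with a small Python sketch. It does not use Kepler's API: the `Actor` class, `pipe_to`, `send`, and the two toy actors are hypothetical names chosen only to show tokens flowing from producer to consumer.

```python
from typing import Any, Callable, List


class Actor:
    """One function with an input and an output, as on the slide."""

    def __init__(self, name: str, fire: Callable[[Any], Any]) -> None:
        self.name = name
        self.fire = fire
        self.consumers: List["Actor"] = []

    def pipe_to(self, consumer: "Actor") -> "Actor":
        # Connect this actor's output to the consumer's input.
        self.consumers.append(consumer)
        return consumer

    def send(self, token: Any) -> None:
        # Process one token and push the result downstream.
        output = self.fire(token)
        for consumer in self.consumers:
            consumer.send(output)


# Hypothetical actors: one derives a file name from a URL, one labels it.
fetch = Actor("fetch", lambda url: url.rsplit("/", 1)[-1])
classify = Actor(
    "classify",
    lambda path: (path, "text" if path.endswith(".txt") else "binary"),
)
collected: List[Any] = []
sink = Actor("sink", collected.append)

fetch.pipe_to(classify).pipe_to(sink)
fetch.send("gsiftp://example.org/data/run1.txt")
```

Each actor knows only the shape of its tokens, not who produced them, which is the loose coupling attributed to the workflow scheme.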
Compare Side by Side
              Actor                                 Component
Stands for    One function                          One class
Port          Input/Output                          Provide/Use
              A data-structure definition           An interface signature
Connection    Producer to consumer                  Caller to callee
Composition   Defines “how”                         Defines “what”
Advantages    Loosely coupled                       Good performance
              Supports distributed resource         Supports the parallel programming
              sharing                               model
A Hybrid Solution
• Typical scientific applications
  • Involve multiple distributed data-processing phases
  • Among those phases there are a number of computationally intensive cores
    • Often classical numerical algorithms
    • Need a high-performance execution environment
• The hybrid scheme
  • Use the workflow scheme to decompose the application based on the distribution of resources
  • Then use the component scheme to further decompose the computationally intensive sub-problems into a parallel solution
• Benefits from both schemes
Service over Components
• Building web services over the CCA
  • Web service = good interoperability
  • Kepler supports web services as actors
  • More resources and protocols (e.g., WS-BPEL)
• Façade pattern
  • External view through the coarse-grained web service
  • Internal functionality through the fine-grained components
• Factory pattern
  • A workflow needs a task-specific service rather than a meta-level service
  • The task-specific service
    • Should be created dynamically and on demand
    • But a service is not instantiable!
[Figure labels: service, create, Task-specific service]
Architecture
• Job
  • A specific task performed by a group of wired components
• Two-phase execution
  • Compose the job
  • Run the job
• Two explicitly separated web services (CCA-Services)
  • Factory Service
  • Job Proxy
[Figure labels: User, Job description, Invocation, Factory Service, Job Proxy, IPC, Composer, Ccaffeine Framework]
Job Factory Service
• A façade for the Ccaffeine framework
  • Connects to the Ccaffeine muxer via a socket
  • Maintains the job table and the job lifecycle
• Create
  • Parameters
    • Gateway port: the task-specific interface
    • Composition description: how the components are wired to support the Gateway port
  • Converts SIDL to WSDL
    • Gateway port definition to the equivalent WSDL
  • Forwards the composition commands to the Ccaffeine muxer
    • They will be executed in parallel
  • Maintains job records internally
  • Creates the Job Proxy service
    • Returns its WSDL URL
• Modify
  • Changes the composition without impacting the service interface
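The Create and Modify steps can be sketched as a small in-memory job table. This is an illustration under stated assumptions: `JobFactory`, `JobRecord`, the URL layout, and the command strings are all hypothetical, and the muxer socket is replaced by a plain callable so the sketch runs without Ccaffeine.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JobRecord:
    job_id: int
    gateway_port: str
    composition: List[str]
    wsdl_url: str


class JobFactory:
    """Keeps the job table and forwards composition commands."""

    def __init__(self, base_url: str, send_to_muxer: Callable[[str], None]) -> None:
        self._base_url = base_url
        self._send = send_to_muxer  # stand-in for the socket to the muxer
        self._jobs: Dict[int, JobRecord] = {}
        self._ids = itertools.count(1)

    def create(self, gateway_port: str, composition: List[str]) -> str:
        job_id = next(self._ids)
        for command in composition:
            self._send(command)
        # Hypothetical URL scheme for the generated Job Proxy WSDL.
        wsdl_url = f"{self._base_url}/jobs/{job_id}?wsdl"
        self._jobs[job_id] = JobRecord(job_id, gateway_port, list(composition), wsdl_url)
        return wsdl_url

    def modify(self, job_id: int, composition: List[str]) -> str:
        record = self._jobs[job_id]
        for command in composition:
            self._send(command)
        record.composition = list(composition)
        return record.wsdl_url  # the service interface is unchanged


sent: List[str] = []
factory = JobFactory("http://localhost:8080", sent.append)
# Placeholder command text, not verified Ccaffeine script syntax.
commands = ["wire MidpointIntegrator.FunctionPort -> LinearFunction.FunctionPort"]
url = factory.create("integrator.IntegratorPort", commands)
rewire = ["wire MidpointIntegrator.FunctionPort -> NonlinearFunction.FunctionPort"]
same_url = factory.modify(1, rewire)
```

Because `modify` returns the same WSDL URL, a workflow that already holds the URL keeps working after the components behind the Gateway port are rewired.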
Job Proxy Service
• Façade for the wired components
  • With a task-specific WSDL interface
• On receiving a SOAP message
  • Extract the arguments from the message
  • Pass the arguments to Ccaffeine
  • Invoke Ccaffeine
  • Get the result from the Driver and send the SOAP response
[Figure labels: User, SOAP request, Job Proxy, Arguments, Driver]
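The request handling listed above can be sketched with Python's standard XML parser. This is a simplified illustration, not the actual proxy: `handle_soap_request`, `ARG_TYPES`, and `stub_driver` are hypothetical, the message layout is a plain SOAP 1.1 body without operation namespaces, and a real proxy would hand the arguments to Ccaffeine rather than to a local function.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Argument names and converters for the integrate operation of the Gateway port.
ARG_TYPES = [("lowBound", float), ("upBound", float), ("count", int)]


def handle_soap_request(request_xml: str, driver: Callable[..., float]) -> str:
    envelope = ET.fromstring(request_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP body with one operation element expected")
    operation = body[0]
    op_name = operation.tag.split("}")[-1]

    # Extract the arguments from the message.
    raw = {child.tag.split("}")[-1]: child.text for child in operation}
    args = [convert(raw[name]) for name, convert in ARG_TYPES]

    # Pass the arguments on and invoke the job (here: a local stub).
    result = driver(*args)

    # Wrap the result in a SOAP response.
    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    response_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    response_op = ET.SubElement(response_body, f"{op_name}Response")
    ET.SubElement(response_op, "return").text = repr(result)
    return ET.tostring(response, encoding="unicode")


def stub_driver(low_bound: float, up_bound: float, count: int) -> float:
    # Stand-in for the wired components: integrates f(x) = 1.
    return up_bound - low_bound


request_xml = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
    "<soap:Body><integrate>"
    "<lowBound>0.0</lowBound><upBound>2.0</upBound><count>10</count>"
    "</integrate></soap:Body></soap:Envelope>"
)
response_xml = handle_soap_request(request_xml, stub_driver)
```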
Example
[Figure labels: User, SOAP, Job Proxy, Job WSDL, Go, Gateway port, Factory Service, Job table, socket, Composer, Gateway port composition]
Convert SIDL to WSDL
• SIDL
  • Port interface (methods)
  • Object oriented
  • Port interface
    • A virtual interface
    • Inheritance, polymorphism
    • Can be used as a function parameter type
  • No data structures so far
• WSDL
  • PortType (operations)
  • Wire-format description
  • PortType
    • A group of message exchanges
    • No inheritance, no polymorphism
    • Cannot be used as a method parameter type
  • Every type is essentially a data structure (defined by XML Schema)
• Challenge
  • There is no way to derive the structural information from a SIDL port interface!
  • Introducing structures into SIDL would alleviate the problem reasonably well
• Current workaround
  • Only allow methods with primitive argument types
Example
interface IntegratorPort extends gov.cca.Port {
    double integrate(in double lowBound, in double upBound, in int count);
}

<wsdl:message name="integrateInput">
  <wsdl:part name="lowBound" type="xsd:double"/>
  <wsdl:part name="upBound" type="xsd:double"/>
  <wsdl:part name="count" type="xsd:integer"/>
</wsdl:message>
<wsdl:message name="integrateOutput">
  <wsdl:part name="return" type="xsd:double"/>
</wsdl:message>
<wsdl:portType name="integrator.IntegratorPort_PortType">
  <wsdl:operation name="integrate">
    <wsdl:input message="integrateInput"/>
    <wsdl:output message="integrateOutput"/>
  </wsdl:operation>
</wsdl:portType>
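A converter restricted to primitive argument types, in the spirit of the workaround, might look like the following Python sketch. It is not the authors' converter: the regular expressions, the `sidl_method_to_wsdl` name, and the type table are assumptions, and only `in` arguments with a non-void return type are handled.

```python
import re

# Assumed primitive mapping. "int" follows the slide's xsd:integer;
# xsd:int would be the 32-bit alternative.
XSD_TYPES = {
    "double": "xsd:double",
    "float": "xsd:float",
    "int": "xsd:integer",
    "long": "xsd:long",
    "bool": "xsd:boolean",
    "string": "xsd:string",
}

METHOD_RE = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")
ARG_RE = re.compile(r"in\s+(\w+)\s+(\w+)")


def sidl_method_to_wsdl(signature: str, port_type: str) -> str:
    match = METHOD_RE.search(signature)
    if match is None:
        raise ValueError("not a SIDL method signature")
    return_type, method, arg_list = match.groups()
    if return_type not in XSD_TYPES:
        raise ValueError(f"unsupported return type: {return_type}")

    parts = []
    for arg in filter(None, (a.strip() for a in arg_list.split(","))):
        arg_match = ARG_RE.fullmatch(arg)
        if arg_match is None or arg_match.group(1) not in XSD_TYPES:
            # Arrays, objects, and out/inout arguments carry no structure
            # that this sketch could map to XML Schema.
            raise ValueError(f"only primitive 'in' arguments are supported: {arg}")
        sidl_type, name = arg_match.groups()
        parts.append(f'  <wsdl:part name="{name}" type="{XSD_TYPES[sidl_type]}"/>')

    lines = [f'<wsdl:message name="{method}Input">', *parts, "</wsdl:message>"]
    lines += [
        f'<wsdl:message name="{method}Output">',
        f'  <wsdl:part name="return" type="{XSD_TYPES[return_type]}"/>',
        "</wsdl:message>",
        f'<wsdl:portType name="{port_type}">',
        f'  <wsdl:operation name="{method}">',
        f'    <wsdl:input message="{method}Input"/>',
        f'    <wsdl:output message="{method}Output"/>',
        "  </wsdl:operation>",
        "</wsdl:portType>",
    ]
    return "\n".join(lines)


wsdl = sidl_method_to_wsdl(
    "double integrate(in double lowBound, in double upBound, in int count);",
    "integrator.IntegratorPort_PortType",
)
```

A signature such as `double norm(in array<double,1> v);` is rejected with a `ValueError`, which mirrors the challenge on the previous slide: the port interface alone does not say how a non-primitive value is laid out on the wire.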
Kepler Web Service Actor
• Kepler provides a general web service actor
• For a method defined in the WSDL
  • The actor dynamically adjusts its input/output settings
Kepler CCA-Service Actor
• For a CCA-Service
  • Recall that we have two explicit steps
  • The JobProxy service is dynamically created
  • We need to hide the creation of the JobProxy service from the user
• CCA-Service Actor
  • Extended from the web service actor
  • First calls the JobFactory service to create the JobProxy service
  • With the WSDL of the JobProxy, it then behaves as a general web service actor does
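The two-step behaviour of the CCA-Service actor can be sketched as a subclass that creates the job before configuring itself. Kepler actors are Java classes, so this Python outline is only a schematic: `WebServiceActor`, `CCAServiceActor`, `fetch_ports`, `create_job`, and the stub URL are all hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Ports = Tuple[List[str], List[str]]


class WebServiceActor:
    """General actor: derives its ports from an operation in a WSDL."""

    def __init__(self, fetch_ports: Callable[[str, str], Ports]) -> None:
        self._fetch_ports = fetch_ports
        self.wsdl_url: Optional[str] = None
        self.input_ports: List[str] = []
        self.output_ports: List[str] = []

    def configure(self, wsdl_url: str, operation: str) -> None:
        self.wsdl_url = wsdl_url
        self.input_ports, self.output_ports = self._fetch_ports(wsdl_url, operation)


class CCAServiceActor(WebServiceActor):
    """Hides the creation of the JobProxy service from the user."""

    def __init__(
        self,
        fetch_ports: Callable[[str, str], Ports],
        create_job: Callable[[str, List[str]], str],
    ) -> None:
        super().__init__(fetch_ports)
        self._create_job = create_job

    def configure_job(self, gateway_port: str, composition: List[str], operation: str) -> None:
        # Step 1: ask the JobFactory service for a task-specific JobProxy.
        wsdl_url = self._create_job(gateway_port, composition)
        # Step 2: from here on, behave like the general web service actor.
        self.configure(wsdl_url, operation)


# Stubs standing in for the network calls.
def stub_create_job(gateway_port: str, composition: List[str]) -> str:
    return "http://localhost:8080/jobs/1?wsdl"


def stub_fetch_ports(wsdl_url: str, operation: str) -> Ports:
    return (["lowBound", "upBound", "count"], ["return"])


actor = CCAServiceActor(stub_fetch_ports, stub_create_job)
actor.configure_job("integrator.IntegratorPort", [], "integrate")
```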
Change the GUI from socket-stream based to SOAP-message based.
Conclusion
• A hybrid decomposition scheme for scientific applications
  • The workflow scheme is used first, based on the resource distribution
  • The component scheme is used to further decompose the core parts
• The web service interface is the key to the integration
• CCA integrates into Kepler as a special actor, with a GUI supporting a unified visual environment
• Converting SIDL to WSDL is inherently challenging
  • Structures are useful for distributed systems, so we need to introduce structures into SIDL
Thanks
• Thanks to the reviewers for their valuable comments