A Survey of Programming Frameworks for Dynamic Grid Workflow Applications November 2nd, 2007 Taura Lab Ken Hironaka
Background • Attempts to analyze databases of enormous size • Genetic sequence database • BLAST (Basic Local Alignment Search Tool) library • MEDLINE journal abstract database • Enju (a syntactic parser for English) • Improvements in algorithms alone are not enough to handle the overwhelming amount of data • Need to be able to parallelize the computation
Basic Demands • Express the workload with ease • No need to write complex configuration files • Parallel computation • No distributed-computing expertise required!
Well-known frameworks • Batch schedulers • A solution for cluster computers • Submit each task as a "Job" with its input file(s) • The job is scheduled to an idle node • Good for embarrassingly parallel tasks • Tasks with no inter-task dependencies • Data sharing via NFS • Easy data collection [Diagram: a central manager receives submitted jobs and assigns them to idle nodes in the cluster]
Arising Problems • Handling workflows • Coping with Grid (multi-cluster) environments • Creation of tasks / aggregation of results
Handling Workflows • Most tasks are not so embarrassingly parallel • Blindly scheduling jobs is not good enough • Workflows: dependencies between tasks • Output files of one task are passed as input files to the next • E.g., a Natural Language Processing pipeline: task file → phonological analysis → morphological analysis → syntactic analysis → semantic analysis (a sketch of driving such a pipeline follows below)
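To make the dependency idea concrete, below is a minimal plain-Python sketch of running such a file-passing pipeline in dependency order. It is not taken from any framework in this survey; the stage names and shell commands (phono.sh, morph.sh, syntax.sh, sem.sh) are hypothetical placeholders.

### workflow_sketch.py ###
# Minimal sketch: run a file-passing workflow in dependency order.
# Stage names and commands are hypothetical placeholders.
import subprocess

# stage name -> (command template, stages it depends on)
stages = {
    "phono":  ("./phono.sh {inp} {out}",  []),
    "morph":  ("./morph.sh {inp} {out}",  ["phono"]),
    "syntax": ("./syntax.sh {inp} {out}", ["morph"]),
    "sem":    ("./sem.sh {inp} {out}",    ["syntax"]),
}

def run_pipeline(task_file):
    done = {}                                   # finished stage -> its output file
    pending = dict(stages)
    while pending:
        for name, (cmd, deps) in list(pending.items()):
            if all(d in done for d in deps):    # all parent stages finished
                inp = done[deps[0]] if deps else task_file
                out = name + ".out"
                subprocess.run(cmd.format(inp=inp, out=out), shell=True, check=True)
                done[name] = out                # this output becomes the next stage's input
                del pending[name]
    return done["sem"]

A workflow framework has to do essentially this bookkeeping, plus scheduling, file transfer, and fault tolerance, across many machines.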
Coping with Grid environments • Multiple clusters • A single huge cluster is rare • Connectivity in WANs • Firewalls, NATs • File-sharing problems • Independent file systems • Dynamics • Nodes joining and leaving (failure) [Diagram: nodes behind a firewall, with nodes joining and leaving the pool]
Task creation/data collection • A "task" in the conventional sense • Simply computes on the data it is given • Manual task creation • Splitting the problem into sub-problems • Tedious manual work for large inputs/databases (see the sketch after this slide) • Manual data collection • Collecting results afterwards • What if they are dispersed all over the Grid? • Not so trivial in modern settings • Such steps need to be built into the framework
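The sketch below illustrates the kind of manual task creation meant above: splitting one large input (e.g. a FASTA sequence database for BLAST) into fixed-size chunks, one input file per job. This is a hypothetical helper, not part of any surveyed framework; file names and the chunk size are made up.

### split_db.py ###
# Split a FASTA-style database into per-job chunk files (hypothetical example).
def write_chunk(lines, chunk_id):
    name = "chunk_%04d.fasta" % chunk_id
    with open(name, "w") as f:
        f.writelines(lines)
    return name

def split_input(path, records_per_chunk=10000):
    chunks, chunk, n_records = [], [], 0
    with open(path) as f:
        for line in f:
            if line.startswith(">"):                 # a new sequence record begins
                if n_records == records_per_chunk:
                    chunks.append(write_chunk(chunk, len(chunks)))
                    chunk, n_records = [], 0
                n_records += 1
            chunk.append(line)
    if chunk:
        chunks.append(write_chunk(chunk, len(chunks)))
    return chunks   # one file per job; submission and result collection are still manual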
Detailed summary of the demands • Modern parallelization frameworks ⇒ frameworks that facilitate workflow applications on dynamic Grid environments • Handle workflows with grace • Cope with WAN connectivity problems • Handle data transfers • Cope with dynamic changes in resources • Automatically create tasks / collect results • EASY TO USE!
The Organization of this Presentation • Background/Introduction • Existing Programming Frameworks • Conclusion/Future Work
Condor • One of many available batch schedulers • Maintains a pool of idle nodes in a cluster • Goes beyond a regular scheduler • Allows workflow expression for submitted tasks • File-transfer extension to handle data-driven jobs • Grid-enabled (Condor-G) • Uses the Globus toolkit to run on multiple clusters • Allows jobs to be scheduled on different pools on different clusters
DAGMan: Expressing Workflows • Extension to Condor • Executes a set of tasks with DAG dependencies • DAG (Directed Acyclic Graph) • An expression of workflows • A→B : A must finish before B starts • E.g., A→C, B→C, C→D, C→E • Can express general workflows • Fault tolerance • In case of failure, restarts from the job that failed • E.g., if task C fails, tasks A and B are not redone [Diagram: DAG with A, B → C → D, E]
DAGMan: Howto • Create a script defining jobs and dependencies • Submit the DAG • It is automatically translated into Condor jobs, which are scheduled accordingly by Condor

### sample.dag ###
# define jobs
Job A A.sh
Job B B.py
Job C C.pl
Job D D.sh
Job E E.sh
# define dependencies
PARENT A B CHILD C
PARENT C CHILD D E

$ condor_submit_dag sample.dag
Stork • Data placement scheduler for Condor • File transfer across file systems • ftp, scp, http, etc. • A transfer is treated as a DAGMan job • Allows jobs to pass files without a shared FS • Direct inter-task data passing is not possible • Must use a third-party server to pass data

### trans.stork ###
[
  dest_url = "file:/tmp/hoge.tar.gz";
  src_url = "ftp://www.foo.com/hoge.tar.gz";
  dap_type = transfer;
]

### sample2.dag ###
DATA INPUT0 trans.stork
JOB A A.sh
PARENT INPUT0 CHILD A
Review of Condor • Strong in areas related to classical batch queuing systems • Pros • Handles workflows and fault tolerance • Possible to deploy on multiple clusters • Cons • Condor and its extensions must be installed by the system administrator on all nodes of each cluster • Big initial overhead • Cannot add more nodes dynamically • Limited file-transfer options • Inter-task data passing is not possible • Task creation and result collection are done manually
Ibis (Satin) • Java-based parallel computation library • Distributed object oriented • Transparent location of distributed objects • Offers RMI (Remote Method Invocation) • Transparent delegation of computation • Divide-and-conquer type applications (Satin) [Diagram: foo.doJob(args) is invoked via RMI on a remote object foo, which performs the computation]
Divide-and-Conquer • One large problem may be recursively split into numerous smaller sub-problems • E.g., quick sort, Fibonacci: Fib(20) = Fib(19) + Fib(18) • SPAWN • Creates sub-problems ("children") • SYNC • Called by the parent • Waits for sub-problem results • Can express DAG workflows [Diagram: parent-child relationship between Fib(20) and its sub-problems]
Divide-and-Conquer: HowTo

### fib.java ###
import ibis.satin.SatinObject;

class Fib extends SatinObject implements … {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);   // implicit spawn
        int y = fib(n - 2);   // implicit spawn
        sync();               // wait for the sub-problem results
        return x + y;
    }
}

• Import the library • Define a user class extending the SatinObject library class • Define computation methods • Use recursion • Allows creation of sub-problems • sync() • Implicit definition of dependencies
Random Work Stealing • A strategy to load-balance among participating nodes • An idle node steals an unfinished sub-problem from a random node • The result is returned to the victim node • Adapts to joining nodes • New nodes automatically acquire tasks (a sketch of the idea follows below) [Diagram: idle nodes stealing work from Node 0, Node 1, Node 2]
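Below is a minimal thread-based sketch of random work stealing, meant only to illustrate the idea; it is not Satin's actual implementation, and the Worker class, its deques, and the task list are all made up for this example.

### work_stealing_sketch.py ###
# Each worker owns a deque of tasks; when it runs dry it steals
# a task from a randomly chosen victim's deque.
import random, threading, collections

class Worker:
    def __init__(self, wid, tasks):
        self.wid = wid
        self.deque = collections.deque(tasks)
        self.lock = threading.Lock()

    def steal(self):
        with self.lock:
            return self.deque.popleft() if self.deque else None

    def run(self, all_workers, results):
        while True:
            with self.lock:
                task = self.deque.pop() if self.deque else None
            if task is None:                        # local deque empty: try to steal
                victim = random.choice(all_workers)
                task = victim.steal() if victim is not self else None
            if task is None:
                if not any(w.deque for w in all_workers):
                    return                          # no work left anywhere
                continue
            results.append(task())                  # run the (sub-)problem

tasks = [(lambda i=i: i * i) for i in range(100)]
workers = [Worker(0, tasks), Worker(1, []), Worker(2, [])]   # deliberately uneven load
results = []
threads = [threading.Thread(target=w.run, args=(workers, results)) for w in workers]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))   # 100: the two initially idle workers acquired work by stealing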
Dealing with Failures • When a node fails, its sub-problems need to be restarted • Orphan tasks • Sub-problems that lose the parent that would use their results • Orphan task results are cached and circulated among nodes so they can be reused [Diagram: Node 0 fails; the results of its orphaned sub-problems on Node 1 and Node 2 are cached and circulated]
Review of Ibis (Satin) • Pros • Benefits from targeting divide-and-conquer applications • Able to handle workflows using spawn and sync • Automatically creates tasks / collects results • Handles dynamic joining/leaving of nodes • Random work stealing • Recycling of orphan sub-problem results • Cons • Currently supports only direct communication among nodes (not suitable behind firewalls or NATs) • Targeted at CPU-intensive applications • No primitives for file transfer over the network
Map-Reduce • Framework for processing large homogeneous data on clusters • Used at Google for handling large databases • The user defines 2 functions: Map and Reduce [Diagram: input data flows through Map() workers to Reduce() workers, producing one output per reducer]
Map-Reduce: Howto

### word count ###
# key: document name
# value: document contents
def Map(key, value):
    # emit 1 for each word
    for w in value.split():
        emit(w, 1)

# key: a word
# values: list of 1s
def Reduce(key, values):
    result = 0
    # add up the 1s
    for i in values:
        result += 1
    emit(key, result)

• Abstraction • A data file → a set of key/value pairs • Map • (k1, v1) → (k2, v2) • Values with the same key are combined • Reduce • (k2, list of v2) → list of v2 (a sequential driver sketch follows below)
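To show what the framework does with these two functions, here is a minimal sequential sketch of a driver; it is not Google's implementation. The emit() calls are replaced by yield/return, and the shuffle phase is simulated by grouping values by key in memory.

### mapreduce_sketch.py ###
from collections import defaultdict

def run_mapreduce(documents, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    # Map phase: apply map_fn to every (name, contents) pair
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)
    # "Shuffle": values with the same key are grouped together above
    # Reduce phase: one call per distinct key
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

# word-count Map/Reduce with emit expressed as yield/return
def map_fn(key, value):
    for w in value.split():
        yield (w, 1)

def reduce_fn(key, values):
    return (key, sum(1 for _ in values))

docs = {"doc1": "to be or not to be"}
print(run_mapreduce(docs, map_fn, reduce_fn))   # {'to': 2, 'be': 2, 'or': 1, 'not': 1}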
Implementation • Master – Worker model • Workers: • Map workers and Reduce workers • Data is transferred directly from Map → Reduce • Master: coordinates the flow of data between workers • Work on failed workers is restarted (see the sketch below) • Distributed file system • Collection of results is made simple
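The re-execution idea can be sketched as follows; this is an illustration only, not Google's code. The dispatch() function and the failure rate are simulated placeholders.

### master_sketch.py ###
# The master tracks the state of each map task and re-dispatches
# any task whose worker failed.
import random

def run_master(map_tasks):
    state = {t: "idle" for t in map_tasks}      # idle / in-progress / done
    while not all(s == "done" for s in state.values()):
        for task, s in state.items():
            if s == "idle":
                state[task] = "in-progress"
                ok = dispatch(task)             # hypothetical: send the task to a worker
                # if the worker fails, the task simply becomes idle and is retried
                state[task] = "done" if ok else "idle"
    return state

def dispatch(task):
    return random.random() > 0.3                # simulate a 30% worker-failure rate

print(run_master(["split-%d" % i for i in range(5)]))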
Review of Map-Reduce • Abstracts data files into key/value sets • Computes on them using the user-defined functions • Pros • Automatic task creation / result collection • Automatic file transfers between Map and Reduce • Fault tolerant • Cons • The Map-Reduce model is still restrictive for many real-life applications • Not for WANs • Cannot add nodes dynamically
Comparison of Frameworks - Each has its own strengths and weaknesses - It is not so trivial to make Grid workflow applications easy for scientific computing users
Conclusion • We have presented a series of viable choices for running parallel workflow applications in a Grid environment • File transfer, task creation / data collection • Tasks need to be able to interact with external entities • Ibis: parent – child relationships • Map-Reduce: master – worker, worker – worker
Future Work • Workflow tasks cannot be isolated entities • Need a means of interaction among them • Are raw sockets enough? • Must be compatible with WANs and dynamic resources • Goal: a Grid-enabled workflow framework with the following properties • RMI and file-transfer primitives between tasks
Adding Nodes • Possible to add more nodes at runtime • Uses a global server that is accessible from everywhere • A new node uses this server to bootstrap itself and join the already participating nodes • Random work stealing • Automatically load-balances as new nodes arrive [Diagram: a new node contacts the bootstrap server, then joins the Satin system and starts stealing work]