A Framework for Flexible Programming in Complex Grid Environments 04/24/08 Taura Lab. 2nd Year 76426 Ken Hironaka
New Context for Grid Computing • Grid Computing: computation across multiple clusters over a WAN • Conventionally the domain of high-performance computing and parallel-programming experts • Demands and needs are broadening • Natural language processing • Genetic sequence analysis • The user base now extends to non-experts in parallel programming
More Applications for Grid Computing • Computation alone is no longer the whole story; it is only one part • “Cloud Computing”: publicly accessible applications with a Grid-computing backend • Handles intensive computation • Load balancing • e.g., the backend of a web application • A simple job submitter is not enough
Problems with Conventional Frameworks • Conventional Grid computing: distributed task-computation frameworks with no interaction between tasks • Answering broader applications and demands requires complex, fine-grained interaction • Need for flexible workflow coordination without loss of simplicity • Need for programming support for the Grid
Problems with Grid Computing • Deployment complexity in Grid environments • Dynamically joining and leaving nodes • Node/network failures (faulty links) • Network environment • Prevalence of NAT/firewalls • Unreliable WAN connections • Who handles configuration, communication (sockets), and error handling? • Need for simple deployment on complex environments
Our Contribution • A distributed object-oriented programming framework that alleviates the burden of Grid environments • Flexibility of programming without loss of simplicity • Simplicity of deployment: runs on the Grid with minimal configuration • Real-life applications • Deployed an application on over 900 cores across 9 clusters • A “trouble-shooting” search engine on the Grid, as an example of Cloud Computing
Agenda • Introduction • Related Work • Proposal • Preliminary Experiments • Conclusion and Future Work
Distributed Objects and RMI • ProActive [Huet ‘04] • Distributed object-oriented: objects live on remote nodes, and work is delegated via RMI (Remote Method Invocation), e.g., foo.doJob(args) • Parallel computation via asynchronous RMI • Possible race conditions • Need for synchronization ⇒ code becomes cluttered with locks and complex • Active Objects: 1 object = 1 thread • Induces deadlocks (e.g., a invokes b.f() while b invokes a.g())
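The asynchronous-RMI style above can be mimicked with futures: a call on a remote proxy returns immediately, and the caller blocks only when it needs the result. This is a minimal local sketch; RemoteProxy, async_call, and Worker are illustrative names, not ProActive’s or our framework’s real API.

```python
from concurrent.futures import ThreadPoolExecutor

class RemoteProxy:
    """Stand-in for a distributed-object proxy (illustrative)."""
    def __init__(self, executor, target):
        self._executor = executor
        self._target = target

    def async_call(self, method, *args):
        # An asynchronous RMI returns a future immediately;
        # the caller blocks only when it asks for the result.
        return self._executor.submit(getattr(self._target, method), *args)

class Worker:
    def do_job(self, n):
        return n * n

executor = ThreadPoolExecutor(max_workers=4)
foo = RemoteProxy(executor, Worker())

futures = [foo.async_call("do_job", i) for i in range(4)]  # all in flight
results = [f.result() for f in futures]                    # block only here
executor.shutdown()
print(results)  # [0, 1, 4, 9]
```

If the futures’ callbacks mutated shared state, the race conditions named above would appear; that is the synchronization burden the ownership model on the next slides removes.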
Handling Joins and Failures • JoJo [Nakada ‘04] • Master-Worker framework • Event-driven coding: a handler is invoked for each event • Task completion • Node joins • Node failures • Synchronization issues remain • With event-driven programming, coding for more complex problems easily becomes unreadable
Resolving Connectivity on the Grid • ProActive [Huet ‘04] • Overlay network for communication across NATs/firewalls • Resorts to manual network-configuration files: each connection must be specified individually • Configuration overhead becomes enormous at Grid scale
Agenda • Introduction • Related Work • Proposal • Preliminary Experiments • Conclusion and Future Work
Our Proposal • A distributed object-oriented framework for the Grid • Distributed objects with Grid programming support • Deadlock-free synchronization • Additional constructs to cope with node joins/failures • Automatic and adaptive overlay construction by the Grid runtime • Object-oriented with support for race conditions, joins, and failures: flexibility and simplicity • Deployment requires minimal configuration: simplicity
Object Synchronization Model • Parallel programming with minimal use of explicit locks • Distributed objects with ownership • An object’s methods can be executed by only 1 thread at a time: the owner thread; other threads wait • Eliminates data races • The owner gives up ownership for blocking operations • Other waiting threads may contest for ownership; on unblocking, the original thread re-contests • Eliminates deadlocks in common cases
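The ownership rule above can be sketched with a single lock standing in for ownership: a method holds the lock through its atomic sections but releases it around a blocking operation, so other threads can run the object’s methods in the meantime. OwnedObject and method are hypothetical names for illustration, not the framework’s API.

```python
import threading
import time

class OwnedObject:
    """Sketch of the ownership model (illustrative names)."""
    def __init__(self):
        self._ownership = threading.Lock()
        self.counter = 0

    def method(self):
        with self._ownership:            # contest for ownership
            self.counter += 1            # atomic section
            self._ownership.release()    # give up ownership...
            time.sleep(0.01)             # ...during a blocking operation
            self._ownership.acquire()    # re-contest for ownership
            self.counter += 1            # atomic section resumes

obj = OwnedObject()
threads = [threading.Thread(target=obj.method) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(obj.counter)  # 16: every increment is serialized by ownership
```

Because ownership is never held across a blocking call, a thread blocked inside one object cannot hold up threads contesting for that object, which is what removes the common deadlock patterns of Active Objects.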
Adaptation to Dynamic Resources • Programming support for joining/leaving nodes • Decentralized object lookup • Allows a new object on a joining node to look up objects already in the computation and join in • Node failure ⇒ RMI failure • Failure is returned as an exception to the method invocation • The user can catch the exception and perform rollback procedures if necessary
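The failure-as-exception pattern looks roughly like this in user code: a failed RMI raises, and the caller rolls back by requeuing the task. RMIFailure, dispatch, and Node are illustrative stand-ins, not the framework’s real classes.

```python
class RMIFailure(Exception):
    """Stands in for the exception an RMI to a failed node raises."""

class Node:
    def __init__(self, name, failed=False):
        self.name, self.failed = name, failed
    def run(self, task):
        return task * 2

def dispatch(task, node):
    # Pretend this is a synchronous RMI to a remote worker object.
    if node.failed:
        raise RMIFailure(f"node {node.name} unreachable")
    return node.run(task)

pending = list(range(5))
nodes = [Node("a"), Node("b", failed=True), Node("c")]
results, i = [], 0
while pending:
    task = pending.pop()
    node = nodes[i % len(nodes)]
    i += 1
    try:
        results.append(dispatch(task, node))
    except RMIFailure:
        pending.append(task)   # rollback: requeue the task for another node
print(sorted(results))  # [0, 2, 4, 6, 8] -- no task is lost
```
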
Automatic Overlay Construction (1) • Automatic/transparent communication • Configuration needed ONLY for firewalled clusters • Adapts to dynamic joins/leaves • Nodes cooperatively create a TCP overlay network • Each node picks a small number of nodes to which it attempts connections (global-IP, NAT, and firewalled nodes alike) • Creates a connected graph
Automatic Overlay Construction (2) • NAT clusters • NAT nodes can connect to global nodes • Firewalled clusters • Automatic SSH port-forwarding for firewall traversal • The user specifies the points • Transparent communication • Point-to-point communication is routed over the overlay network • Ad-hoc routing protocol: AODV [Perkins ‘97] • Adapts to node joins/leaves
Failure Detection on the Overlay • How do we detect failures on the overlay? • An intermediate/end node failure appears as a link failure and must surface as an RMI failure • Path pointers • Forwarding nodes remember the next hop; the RMI reply is returned along the same path • On a link failure along the pointers, the failure is back-propagated to the invoker’s RMI handler
Agenda • Introduction • Related Work • Proposal • Preliminary Experiments • Conclusion and Future Work
Experiment Cluster Settings • 900 cores over 9 clusters: istbs, tsubame, hongo, okubo, chiba, suzuk, kyoto, kototoi, imade • A mixture of global and private IPs (NAT) • Firewalled clusters; on some links all packets are dropped
Overlay Construction Simulation • Evaluates the overlay-construction scheme • For different cluster configurations, varied the number of attempted connections per peer • 1,000 trials for each cluster/attempted-connection configuration • Even for the pathological case, 20 attempted connections per peer are enough
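A simulation trial of the kind above can be sketched as follows: each node attempts connections to k random peers, and we check whether the resulting undirected graph is connected. The node count, k, and seed are illustrative parameters, not the paper’s actual simulation setup.

```python
import random

def build_overlay(nodes, k, seed=0):
    """Each node attempts connections to k random peers (sketch)."""
    rng = random.Random(seed)
    edges = set()
    for n in nodes:
        peers = [m for m in nodes if m != n]
        for peer in rng.sample(peers, min(k, len(peers))):
            edges.add(frozenset((n, peer)))   # undirected link
    return edges

def is_connected(nodes, edges):
    """Breadth-first reachability check from the first node."""
    adj = {n: set() for n in nodes}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)

nodes = list(range(100))
edges = build_overlay(nodes, k=20)   # 20 attempts/peer, as in the slide
print(is_connected(nodes, edges))    # True
```

Repeating this over many seeds and over restricted connectivity (NAT nodes that can only reach global nodes) is essentially what the 1,000-trial simulation measures.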
Dynamic Master-Worker • A master object distributes work to worker objects • 10,000 tasks altogether, each dispatched as an RMI • Worker nodes join/leave at runtime • New tasks for newly joined nodes • Reassignment of tasks on failed nodes • No tasks were lost during the computation
Dynamic Master-Worker • As the number of workers changes, the number of assigned tasks changes accordingly • The master adaptively distributes, rolls back, and redistributes tasks
A Real-Life Application • Solving a combinatorial optimization problem: the Permutation Flow Shop Problem • Parallel branch-and-bound in Master-Worker style, with periodic bound updates • Work distribution: the search space is divided evenly into subtasks • Load balancing: unfinished tasks are subdivided and redistributed • Wasteful (redundant) computation is quite possible
Master-Worker Coordination • The master does RMIs to workers (doJob()) • Each worker periodically exchanges bounds with the master (exchange_bound()) • Not a straightforward Master-Worker application: it requires a flexible framework like ours
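The bound exchange can be sketched as a single method on the master: a worker reports its best local bound and receives the global best back, which it then uses to prune its search. This Master class and its exchange_bound signature are a hypothetical sketch, not the application’s actual code.

```python
class Master:
    """Holds the global best bound for branch-and-bound (sketch)."""
    def __init__(self):
        self.best_bound = float("inf")

    def exchange_bound(self, local_bound):
        # Called periodically by each worker (as an RMI in the real system):
        # merge the worker's bound and return the current global best.
        self.best_bound = min(self.best_bound, local_bound)
        return self.best_bound

master = Master()
b1 = master.exchange_bound(42)  # first bound becomes the global best
b2 = master.exchange_bound(50)  # worse bound is ignored
b3 = master.exchange_bound(17)  # better bound replaces the global best
print(b1, b2, b3)  # 42 42 17
```

Because workers call into the master concurrently, this is exactly where the ownership-based synchronization of the framework pays off: exchange_bound stays a plain method with no explicit locks.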
Runtime Speedup Lacks scalability with over 900 cores
Cumulative Computation Time • The growth in cumulative computation time is attributed to increased re-execution of tasks • Taking cumulative computation time into account, scaling from 169 cores to 948 cores (5.64 times the cores) yields a 4.94-times effective speedup
Troubleshoot Search Engine • Ever been stuck debugging or troubleshooting? • Re-ranks Google queries, giving weight to web-forum and solution pages • Natural language processing and machine learning • Parallel computation on a Grid backend enables real-time response • Example query: “vmware kernel panic”
Agenda • Introduction • Related Work • Proposal • Preliminary Experiments • Conclusion and Future Work
Conclusion • A distributed object-oriented programming framework for Grid environments • A novel distributed object-oriented programming model • Grid-enabled via automatic overlay construction • Showed that real-life Grid application needs can be addressed by our framework • Deployed actual parallel applications on over 900 cores across 9 clusters with NAT/firewalls, joins, and failures • Implemented a Grid-computing backend for a troubleshooting search engine
Future Work • Reliable WAN communication for Grid overlays • Node failures and connection failures • Weaknesses of WAN connections: WAN links are more vulnerable, and failures will occur • Router policies that close connections after a given period • Obscure kernel bugs involving NAT • Connection resets
Some Related Work • Robust Tree Topologies for Sensor Networks [England ‘06] • Creates a spanning tree for data reduction • A flat tree (fewest hops): high reliability, but high power usage • A tree with short distances (shortest paths): low power usage, but low reliability • ⇒ A spanning tree that merges the two metrics for the best of both worlds
Possible Future Direction • In our context, Grid computing, communication latency serves as a metric for link reliability: short links are reliable, long links are faulty • Fewest hops ⇒ reliability against node failure • Shortest distance ⇒ reliability against link failure • Can we construct an overlay connection topology that takes the best of both worlds?
Publications • 1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (poster paper, to appear). • 2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems (conditionally accepted). • 3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of the 2008 Symposium on Advanced Computing Systems and Infrastructures (to appear). • 4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing, August 2007. • 5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming, Vol. 48, No. SIG 12 (PRO 34), pp. 28-40, August 2007.
Problems with Grid Computing (2) • Complexity of programming on the Grid • Low-level programming (sockets) for communication • Multi-threaded computing (synchronization) • A heavy burden on non-experts • Flexibility and integration • Grid frameworks for task distribution and independent parallel-programming languages fall short: computing is not just the execution of many independent tasks • Need for finer-grained communication • Poor interfaces with user applications written in Java, Ruby, Python, or PHP
Related Work • Discussed with respect to criteria necessary for modern Grid computing • Workflow Coordination • Flexibility without putting the burden on the user • Joining Nodes / Failure of resources • Handling these events should not dominate the programming overhead • Connectivity in Wide-Area Networks • Adaptation to networks with NAT/firewall with little manual settings
Workflow Coordination (1) • Condor / DAGMan [Thain ‘05] • “Tasks” are expressed as script files and distributed to idle nodes by a central manager • Dependencies between tasks can be expressed as a DAG (Directed Acyclic Graph) • Ibis / Satin [Wrzesinska ‘06] • A framework for divide-and-conquer problems • Tasks can be broken into smaller sub-tasks, on which they depend • However, many computations cannot be expressed as “tasks” with dependencies • A task’s communication is limited to the tasks with which it has dependency relationships
Object Synchronization Example • In method f(), instance a invokes blocking method g() on object b:

    class A:
        def __init__(self, x):
            self.x = x

        def f(self, b):
            self.x += 1
            b.g()  # blocking RMI: give up ownership during the call
            self.x -= 1
            return

• Only 1 thread at a time runs a’s methods; ownership is given up during the blocking RMI b.g(), but the atomic sections before and after it keep the value of x consistent.
Adaptation to Dynamic Resources • Signal delivery to objects • A signal unblocks any thread that is blocking in the object’s context • Can be used to notify asynchronous events, e.g., a joining node • Node failure ⇒ RMI failure • Failure is returned as an exception to the method invocation • The user can catch the exception and perform rollback procedures if necessary
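Signal delivery can be sketched with a condition variable: a thread blocks in the object’s context until another thread signals an event (such as a node join), which unblocks it. SignalableObject, wait_for_event, and signal are illustrative names, not the framework’s actual API.

```python
import threading

class SignalableObject:
    """Sketch: a signal unblocks threads blocking in this object."""
    def __init__(self):
        self._cond = threading.Condition()
        self.events = []

    def wait_for_event(self, timeout=5):
        # Block in the object's context until an event is signaled.
        with self._cond:
            while not self.events:
                if not self._cond.wait(timeout):
                    return None   # timed out with no signal
            return self.events.pop(0)

    def signal(self, event):
        # Deliver a signal, unblocking any waiting thread.
        with self._cond:
            self.events.append(event)
            self._cond.notify_all()

obj = SignalableObject()
received = []
waiter = threading.Thread(target=lambda: received.append(obj.wait_for_event()))
waiter.start()
obj.signal("node-joined")   # e.g., announce a joining node
waiter.join()
print(received)  # ['node-joined']
```
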
Preliminary Experiments • Overlay construction simulation • A simple Master-Worker application with dynamically joining/leaving nodes • A real-life parallel application on the Grid • A troubleshoot-search engine