320 likes | 473 Views
Using Topology-Aware Communication Services in Grid Environments. Craig A. Lee, Eric Coe 1 , B. Scott Michel 2 , James Stepanek 2 , Ignacio Solis 3 , Matt Clark, Brooks Davis { lee | ecoe | scottm | stepanek | mclark | brooks }@aero.org The Aerospace Corporation
E N D
Using Topology-Aware Communication Servicesin Grid Environments Craig A. Lee, Eric Coe1, B. Scott Michel2, James Stepanek2, Ignacio Solis3, Matt Clark, Brooks Davis { lee | ecoe | scottm | stepanek | mclark | brooks }@aero.org The Aerospace Corporation 1Also at the University of Southern California 2Also at the University of California, Los Angeles 3isolis@cse.ucsc.edu, University of California, Santa Cruz Supported in part by a subcontract in the DARPA Active Networks Program Grids and Advanced Networks Workshop/CCGrid 2003 Tokyo, Japan, May 15, 2003
Introduction: Why Topology-Aware Communication Services? • Grids promise an unprecedented degree of distributed computing • A fabric of network-connected sites and resources • The network topology connecting these sites and resources can be exploited • Improve performance • Enable new functionality • As processors and networks get faster, grid computations will become increasingly latency-sensitive • Topology-awareness will become essential GAN/CCGrid 2003 Tokyo, Japan
Many Types of Services Improved or Enabled • Augmented Semantics • Caching (web caching), filtering, compression, encryption, quality of service, data-transcoding, etc. • Collective Operations • Accomplished “in the network” rather than using point-to-point msgs across the diameter of the grid • Communication Scope • Named topologies can denote a communication scope to limit problem size and improve performance • Content and Policy-based networking • Publish/subscribe, interest management, event services, tuple spaces, quality of service GAN/CCGrid 2003 Tokyo, Japan
Topo-Aware Comm Services Can Be Similar to an Overlay GAN/CCGrid 2003 Tokyo, Japan
Service Organization & Mgmt • Organization • Network of Servers • Middleware Layer – peer-to-peer virtual overlay • Active Networks – real overlays • Management • Creation, Configuration, Discovery, Termination • Natural application of grid tools! • This slide glosses over a multitude of issues! GAN/CCGrid 2003 Tokyo, Japan
A Case Study: Time Mgmt • Time Management enables temporal causality to be enforced in Distributed Simulations • Maintaining agreements about time within a “simulated world” is essential to its apparent reality and to the validity of observed results • Strict temporal synchronization is not necessary • Time must be synchronization only to the point where temporal causality is preserved • Topology-Aware Communication is a natural • Eliminates point-to-point communication • Increase performance for LBTS, the key TM algorithm GAN/CCGrid 2003 Tokyo, Japan
Fire and Destroy events observed out of order Observer Destroy Event Target Shooter Fire Event Wallclock Time A Simple Case ofViolated Causality Explicit references to simulated time, e.g., “the attack begins at 0300”, requires LBTS. GAN/CCGrid 2003 Tokyo, Japan
LBTS: the Heart of TM • Lower Bound Time Stamp algorithm • The lower time bound of all simulated entities (hosts) and all in-transit messages • Including in-transit messages means reduction of a variable number of values • Computing the reduction is the easy part • Knowing when you’re done is the hard part • LBTS is an instance of the Distributed Termination Detection (DTD) problem GAN/CCGrid 2003 Tokyo, Japan
Initiation Tags or “colors” used to distinguish multiple, simultaneous LBTSi computations Merging Multiple LBTSiinitiations must be merged Reduction Min time stamp of hosts and in-transit messages Announcement All hosts eventually need to know final LBTS value LBTS Algorithm Phases GAN/CCGrid 2003 Tokyo, Japan
Distinguished Root Node Single Spanning Tree In Grid Cloud H H DRN H H H GAN/CCGrid 2003 Tokyo, Japan
Active Time Mgmt Daemons implement DRN TM-Kit Provides modular place to implement TM Application requests TM-Kit to compute LBTS (through the HLA/RTI API) Several end-to-end comm topologies available Completely connected, Star, Butterfly All implemented using the same select() DRN was added as just another option Opens a single socket to the first-hop ATMD ATMD must sniff timestamps in data packets sent by the GaTech RTI Integrating DRN/ATMD w/ GaTech RTI GAN/CCGrid 2003 Tokyo, Japan
APSimScript Globus APSim APSim APSim HLA/RTI HLA/RTI HLA/RTI ATMD ATMD overlay Managing the Ensemble w/ Grid Tools discover start overlay request overlay config MDS-2 XBone Manager register GAN/CCGrid 2003 Tokyo, Japan
m Airplanes fly between n Airports Airplanes randomly fly from Airporti to Airportj Airplanes wait until runway available for take-off Flights last 1.0 hr + some + circling time Airplanes wait until runway available for landing Small random time for passenger handling, etc. Airports assigned to processors Airports process take-off, arrival, landing events Look-ahead of 1.0 hr. An HLA Application: Airport Sim GAN/CCGrid 2003 Tokyo, Japan
Part of six processor Aerospace Active Network Testbed 100 planes and 10 airports on 3 procs Run about 1000 simulated minutes Airport Sim Testbed Configuration GAN/CCGrid 2003 Tokyo, Japan
ATMD Instrumented with NetLogger Groups of Events for Each LBTS Computation Connected into ‘Lifelines’ GAN/CCGrid 2003 Tokyo, Japan
Metropolitan Testbed USC ISI Aerospace GAN/CCGrid 2003 Tokyo, Japan
Metro Testbed NetLogger GAN/CCGrid 2003 Tokyo, Japan
Perf. Results: LBTS Makespan (ms) Lab Testbed Metropolitan Testbed • Performance gap dramatically smaller! • DRN performance only slight affected by additional latency • Non-active performance 3x-4x slower GAN/CCGrid 2003 Tokyo, Japan
So, What To Do? • This configuration is too small to demonstrate real advantage • But going from the lab to across town made a huge difference in performance difference • Graph properties promise better performance but per node overhead is significant • Compared to “native” network hardware • Need to test larger configurations • Emulation • EmuLab – University of Utah • Simulation • Parsec – UCLA GAN/CCGrid 2003 Tokyo, Japan
EmuLab • Specialized cluster • 168 PCs (at the time) • Each with 5 x 100Mb ethernet interfaces connected to a programmable switch • Each experiment: • Has exclusive access to node subset • Can run NS scripts that program the switch to customize physical topology among the node subset • Can boot a custom OS • University of Utah, Network Emulation Testbed • http://www.emulab.net GAN/CCGrid 2003 Tokyo, Japan
EmuLab Experiments • Set of quasi-random, tree topologies generated • 4, 8, 16, 32, and 64 end-hosts • 9, 15, 22, 29, 34 interior service hosts • Trees topologies constrained to have an average 2.5 degree of connectedness • 550 LBTS calculations done for each topology • Just running the DRN algorithm GAN/CCGrid 2003 Tokyo, Japan
Example Graph: 32 end-hosts GAN/CCGrid 2003 Tokyo, Japan
LBTS Makespan on EmuLab (ms) GAN/CCGrid 2003 Tokyo, Japan
Parsec Experiments • Under EmuLab, both algos used same topology • Rebooting is time-consuming • NTP must settle down before experiment can begin • In real world, algos would use different paths • Simulations done with Parsec 1.1 • Using same tool, 50 random topologies generated • Shortest path routing determined for TM-Kit • Minimum latency spanning tree determined for DRN • 50 LBTS calculations done for each case • 100 Mb/sec w/ 1 ms per-hop latency GAN/CCGrid 2003 Tokyo, Japan
LBTS Makespan from Parsec (ms) GAN/CCGrid 2003 Tokyo, Japan
Questions, Questions, Questions • Clearly topology-aware communication wins • For the right application under the right conditions • What are the design issues? • What are the deployment issues? • What other capabilities are possible? • How many grid apps will really be able use it for a significant advantage? • Not many right now • But later? GAN/CCGrid 2003 Tokyo, Japan
Collective Operationsfor Message-Passing • We’ve just seen an example of min reduction • Over a variable number of elements! • Broadcasts • Typically requires multicast support • Topology-aware middleware could provide similar capability • Scatter/Gather • Scatter is similar to bcast • One-to-many communication but with different data • Aggregate messages get partitioned en route to destination • Barrier and Reductions • Split phase or fuzzy barrier/reductions could also be used with topology-aware middleware • Scans • Similar to a progressive reduction • Similar performance benefits possible GAN/CCGrid 2003 Tokyo, Japan
Communication Scope • Providing service on a per instance or per application basis greatly reduces problem size • Provides isolation for late composition of components with private communication needs • Such as MPI communicators • Issues • Creation, termination overhead & latency • Splitting, merging GAN/CCGrid 2003 Tokyo, Japan
Content-Based Networking • Content-Based Routing • Message-Passing with Associative Addressing • Requires an associative matching operation • A fundamental and powerful capability • Enables a number of very useful capabilities and services • Event services, resource discovery, coordination programming models • But notoriously expensive to implement • How can matching be done efficiently in a wide-area grid env? • Can users and apps find a “sweet-spot” where content-based routing is constrained enough to be practical and provide capabilities that can’t be accomplished any other way? • Scale of deployability GAN/CCGrid 2003 Tokyo, Japan
Implementation Challenges • Constrain/aggregate application comm space • Enhance scalability/deployability • Per application basis? Per instance basis? • Use of soft-state to enhance fault tolerance? • Effective use of network topology will often be an instance of the Steiner Tree Problem • For a graph with weighted edges, find a minimum weighted subgraph through a subset of the vertices • Practical topology construction • Current multicast group construction and routing techniques? • PIM Sparse and Dense Modes, i.e., single tree vs. tree per source • Network overlays? • IP tunnels among “active” sites managed via X-Bone • Peer-to-Peer group construction? • Hierarchical name space GAN/CCGrid 2003 Tokyo, Japan
Needed: A General Interface to Isolate Applications from the Details Applications API TACS Middleware Grid Services Any Impl. Any Deploy. “Hard-wired” Infrastructure GAN/CCGrid 2003 Tokyo, Japan
Conclusion • Topology-aware communication useful for many capabilities • Collective operations, associative operations, scope • Emulation and simulation results confirm expectations • Superior performance with as few as 8 end-hosts • Middleware layer seems to be most promising implementation approach at this time • Many outstanding issues… • Future Work • Develop prototype APIs • Evaluate with real apps on real grids GAN/CCGrid 2003 Tokyo, Japan