B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, A. Joglekar An Integrated Experimental Environment for Distributed Systems and Networks Presented by Sunjun Kim & Jonathan di Costanzo 2009/04/13
Outline Motivation Netbed structure Validation and testing Netbed contribution Conclusion
Background • Researchers need a platform in which they can develop, debug, and evaluate their systems • One lab is not enough: it lacks resources • More computers are needed • Scalability in terms of distance and number of nodes cannot be reached • Developing large-scale experiments requires a huge amount of time
Previous approaches • Simulation (ns): controlled, repeatable environment; loses accuracy due to abstraction • Live networks (PlanetLab): achieves realism; experiments are not easy to repeat • Emulation (Dummynet, nse): controlled packet loss and delay; manual configuration is tedious
Netbed ideas • Derives from "Emulab Classic" • A universally available time- and space-shared network emulator • Automatic configuration from an ns script • Adds virtual topologies for network experimentation • Integrates simulation, emulation, and live-network experimentation with wide-area nodes in a single framework
Netbed goals • Accuracy • Provide an artifact-free environment • Universality • Anyone can use anything the way they want • Conservative policy for resource allocation • No multiplexing (virtual machines) • The resources of one node can be fully utilized
Resources • Local-area resources • Distributed resources • Simulated resources • Emulated resources • WAN emulator (integrated) • PlanetLab • ModelNet (still in the works)
Outline Motivation Netbed structure Validation and testing Netbed contribution Conclusion
Netbed structure Resource Life cycle
Local-area resources • 3 clusters • 168 PCs in Utah, 48 in Kentucky & 40 in Georgia • Each node can be used as • An edge node, router, traffic-shaping node, or traffic generator • Exclusive use of a machine during an experiment • The OS is provided but entirely replaceable
Distributed resources • Also called wide-area resources • 50-60 nodes in approximately 30 sites • Provide characteristic live-network conditions • Very few nodes • These nodes are shared between many users • FreeBSD Jail mechanism (a kind of virtual machine) • Non-root access
Simulated resources • Based on nse (ns emulation) • Enables interaction with real traffic • Provides scalability beyond physical resources • Many simulated nodes can be multiplexed on one physical node
Emulated resources • VLANs • Emulate wide-area links within a local area • Dummynet • Emulates queues & bandwidth limitations, introducing delays and packet loss between physical nodes • Nodes act as Ethernet bridges • Transparent to experimental traffic
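The Dummynet behavior above (bandwidth cap, fixed delay, probabilistic loss) can be illustrated with a toy model. This is only a sketch: real Dummynet runs inside the FreeBSD kernel, and every name and number below is invented for illustration.

```python
import random

class EmulatedLink:
    """Toy model of a Dummynet-style pipe: bandwidth cap, fixed delay, random loss."""
    def __init__(self, bandwidth_bps, delay_s, loss_rate, seed=0):
        self.bandwidth_bps = bandwidth_bps
        self.delay_s = delay_s
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)
        self.busy_until = 0.0  # time when the link finishes its current packet

    def send(self, now, size_bytes):
        """Return the packet's arrival time, or None if it is dropped."""
        if self.rng.random() < self.loss_rate:
            return None
        start = max(now, self.busy_until)           # queue behind earlier packets
        tx_time = size_bytes * 8 / self.bandwidth_bps
        self.busy_until = start + tx_time
        return self.busy_until + self.delay_s       # propagation delay added last

# The deck's example link: 1.5 Mbps with 20 ms delay
link = EmulatedLink(bandwidth_bps=1_500_000, delay_s=0.020, loss_rate=0.0)
print(link.send(0.0, 1500))  # 8 ms transmit + 20 ms delay = 0.028
```

A second packet sent at the same instant queues behind the first, arriving 8 ms later, which is exactly the serialization effect a real shaping node produces.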
Netbed structure Resource Life cycle
Life cycle (diagram): Specification ($ns duplex-link $A $B 1.5Mbps 20ms) → Parsing → database (DB) → Global Resource Allocation → Swap In → Node Self-Configuration → Experiment Control → Swap Out
Accessing Netbed • Experiment creation • A project leader proposes a project on the web • Netbed staff accept or reject the project • All experiments are accessible from the web • Experiment management • Log on to allocated nodes or to the users' host (fileserver) • The fileserver sends the OS images, home and project directories to the other nodes
Specification • Experimenters use ns scripts written in Tcl • They can use as many functions & loops as they want • Netbed defines a small set of ns extensions • Possibility of choosing specific hardware • Simulation, emulation, or real implementation • Program objects can be defined using a Netbed-specific ns extension • Possibility of using a graphical UI
Parsing • Front-end Tcl/ns parser • Recognizes the subset of ns relevant to topology & traffic generation • Database • Stores an abstraction of everything about the experiment • Statically generated events • Information about hardware, users & experiments • Procedures
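A front end that recognizes only the topology-relevant subset of ns could be sketched as below. The `$ns duplex-link` syntax is real ns; the function name and the output format are assumptions for illustration.

```python
import re

# Matches lines like: $ns duplex-link $A $B 1.5Mbps 20ms
LINK_RE = re.compile(
    r'\$ns\s+duplex-link\s+\$(\w+)\s+\$(\w+)\s+([\d.]+)(Mb|Kb)p?s?\s+(\d+)ms')

def parse_links(script):
    """Extract an abstract topology (link list) from an ns script fragment."""
    links = []
    for m in LINK_RE.finditer(script):
        a, b, rate, unit, delay = m.groups()
        bps = float(rate) * (1_000_000 if unit == 'Mb' else 1_000)
        links.append({'nodes': (a, b), 'bandwidth_bps': bps,
                      'delay_ms': int(delay)})
    return links

print(parse_links("$ns duplex-link $A $B 1.5Mbps 20ms"))
# [{'nodes': ('A', 'B'), 'bandwidth_bps': 1500000.0, 'delay_ms': 20}]
```

The real parser is far richer (it handles loops, procedures, and Netbed's ns extensions), but the idea is the same: reduce the script to abstract records that go into the database.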
Global Resource Allocation • Binds abstractions from the database to physical or simulated entities • Best effort to match the specifications • On-demand allocation (no reservations) • 2 different algorithms for local and distributed nodes (different constraints) • Simulated annealing • Genetic algorithm
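The simulated-annealing side of the mapping can be sketched as follows. This is a toy version: the cost function (count of virtual links crossing between switches), the names, and the cooling schedule are invented, not the actual `assign` implementation.

```python
import math
import random

def anneal(vnodes, vlinks, phys, switch_of, steps=2000, seed=1):
    """Map virtual nodes one-to-one onto physical nodes so that as few
    virtual links as possible cross between switches."""
    rng = random.Random(seed)
    placement = dict(zip(vnodes, phys))          # initial one-to-one placement

    def cost(p):
        # number of virtual links whose endpoints land on different switches
        return sum(1 for a, b in vlinks if switch_of[p[a]] != switch_of[p[b]])

    cur = cost(placement)
    best, best_cost = dict(placement), cur
    temp = 2.0
    for _ in range(steps):
        a, b = rng.sample(vnodes, 2)             # propose swapping two nodes
        placement[a], placement[b] = placement[b], placement[a]
        new = cost(placement)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new                            # accept (sometimes uphill)
            if cur < best_cost:
                best, best_cost = dict(placement), cur
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo
        temp *= 0.999                            # cool down
    return best, best_cost

# pc1/pc2 hang off switch S1, pc3/pc4 off S2; the links below cannot all
# avoid crossing switches, so the best achievable cost is 1
switch_of = {'pc1': 'S1', 'pc2': 'S1', 'pc3': 'S2', 'pc4': 'S2'}
vlinks = [('v1', 'v2'), ('v3', 'v4'), ('v1', 'v3')]
placement, crossings = anneal(['v1', 'v2', 'v3', 'v4'], vlinks,
                              ['pc1', 'pc3', 'pc2', 'pc4'], switch_of)
print(crossings)  # 1
```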
Global Resource Allocation • Over-reservation of the bottleneck • Inter-switch bandwidth is too small (2 Gbps) • Against their conservative policy • Dynamic changes of the topology are allowed • Add and remove nodes • Consistent naming across instantiations • Virtualization of IP addresses and host names
Node Self-Configuration • Dynamic linking and loading from the DB • Gives each node its proper context (hostname, disk image, script to start the experiment) • No persistent configuration state • Only volatile memory on the node • If required, the current soft state can be stored in the DB as hard state • Swap out / swap in
Node Self-Configuration • Local nodes • All nodes are rebooted in parallel • Contact the masterhost, which loads the kernel directed by the database • A second-level boot may be required • Distributed nodes • Boot from a CD-ROM, then contact the masterhost • A new FreeBSD Jail is instantiated • Testbed Master Control Client
Experiment Control • Netbed supports dynamic experiment control • Start, stop and resume processes, traffic generators and network monitors • Signals between nodes • Uses a publish/subscribe event routing system • Static events are retrieved from the DB • Dynamic events are possible
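A publish/subscribe event router of the kind described above can be sketched in a few lines. This is a toy in-process model; the real system routes events between distributed nodes over the network, and all names here are illustrative.

```python
from collections import defaultdict

class EventRouter:
    """Minimal publish/subscribe router keyed on an event type string."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self.subscribers[event_type].append(callback)

    def publish(self, event_type, **payload):
        # deliver the payload to every subscriber of this event type
        for cb in self.subscribers[event_type]:
            cb(payload)

log = []
router = EventRouter()
# a traffic generator subscribes to its start signal
router.subscribe('start-traffic', lambda e: log.append(('gen1', e['rate'])))
# a static event retrieved from the DB is published at its scheduled time
router.publish('start-traffic', rate='1.5Mbps')
print(log)  # [('gen1', '1.5Mbps')]
```

Dynamic events fit the same model: an experimenter simply publishes a new event at run time instead of replaying one stored in the DB.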
Experiment Control • ns configuration files provide only high-level control • Experimenters can perform some low-level control • On local nodes: root privileges • Kernel modification & access to raw sockets • On distributed nodes: Jail-restricted root privileges • Access to raw sockets with a specific IP address • Each local node is attached to a separate control network, isolated from the experimental one • Enables controlling a node through a tunnel, as if logged in directly, without interfering with the experiment
Preemption and Scheduling • Netbed tries to prevent idling • 3 metrics: traffic, use of pseudo-terminal devices & CPU load average • To be safe, a message is sent to the user, who can disapprove manually • A challenge for distributed nodes with several Jails • Netbed offers automated batch experiments • When no interaction is required • Allows waiting for available resources
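The three-metric idleness test can be sketched as a simple threshold check; the metric names and threshold values below are invented for illustration, and the real system additionally asks the user before swapping out.

```python
def is_idle(samples, thresholds):
    """An experiment counts as idle only if every metric is at or below
    its threshold; any single active metric keeps it alive."""
    return all(samples[m] <= thresholds[m] for m in thresholds)

# hypothetical thresholds for the three metrics the slide lists
thresholds = {'pkts_per_min': 10, 'tty_activity': 0, 'cpu_load': 0.05}

quiet = {'pkts_per_min': 3, 'tty_activity': 0, 'cpu_load': 0.01}
busy = {'pkts_per_min': 900, 'tty_activity': 0, 'cpu_load': 0.01}
print(is_idle(quiet, thresholds), is_idle(busy, thresholds))  # True False
```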
Outline Motivation Netbed structure Validation and testing Netbed contribution Conclusion
Validation • 1st row: emulation overhead • Dummynet gives better results than nse
Validation • They expect to have better results with future improvements of nse
Validation • 5 nodes communicating over 10 links • Evaluation of a derivative of DOOM • Their goal is to send 30 tics/sec
Testing • Challenges • Depends on physical artifacts (cannot be cloned) • Should evaluate arbitrary programs • Must run continuously • Minibed: 8 separate Netbed nodes • Test mode: prevents hardware modifications • Full-test mode: provides isolated hardware
Outline Motivation Netbed structure Validation and testing Netbed contribution Conclusion
Practical benefits • All-in-one set of tools • Automated and efficient realization of virtual topologies • Efficient use of resources through time-sharing and space-sharing • Increased fault tolerance (through resource virtualization)
Practical benefits • Examples • The "dumbbell" network: 3h15 → 3 min • Improved utilization of a scarce and expensive infrastructure: 12 months & 168 PCs in Utah • Time-sharing (swapping): 1064 nodes • Space-sharing (isolation): 19.1 years • Virtualization of names and IP addresses • No problem with swapping
Key services • Experiment creation and swapping • Mapping • Reservation • Reboot issuing • Reboot • Miscellaneous • Boot time doubles with a custom disk image
Key services • Mapping local resources: assign • Matches the user's requirements • Based on simulated annealing • Tries to minimize the number of switches and inter-switch bandwidth • Runs in less than 13 seconds
Key services • Mapping distributed resources: wanassign • Different constraints • Nodes are fully connected via the Internet • "Last mile": type instead of topology • Specific topologies may be guaranteed by requesting particular network characteristics (bandwidth, latency & loss) • Based on a genetic algorithm
Key services • Mapping distributed resources: wanassign • 16 nodes & 100 edges: ~1 sec • 256 nodes & 40 edges/node: 10 min ~ 2 h
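A genetic-algorithm mapper in the spirit of wanassign might be sketched like this. It is a toy version that only matches requested "last-mile" link types; the fitness function, the operators (elitist selection, swap mutation, random immigrants), and all names are assumptions, not the paper's actual algorithm.

```python
import random

def evolve(vnodes, wanted_type, node_type, phys, gens=60, pop=20, seed=2):
    """Assign virtual nodes to wide-area physical nodes, maximizing how many
    requested last-mile link types the chosen nodes actually have."""
    rng = random.Random(seed)

    def random_genome():
        return dict(zip(vnodes, rng.sample(phys, len(vnodes))))

    def fitness(g):
        return sum(1 for v in vnodes if node_type[g[v]] == wanted_type[v])

    population = [random_genome() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop // 2]                 # elitist selection
        children = [random_genome() for _ in range(2)]  # immigrants: diversity
        while len(parents) + len(children) < pop:
            child = dict(rng.choice(parents))           # copy a parent
            a, b = rng.sample(vnodes, 2)                # mutate: swap two slots
            child[a], child[b] = child[b], child[a]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)

# hypothetical wide-area nodes and their last-mile link types
node_type = {'w1': 'dsl', 'w2': 'cable', 'w3': 'dsl', 'w4': 'isdn'}
wanted = {'v1': 'dsl', 'v2': 'dsl', 'v3': 'cable'}
best, score = evolve(['v1', 'v2', 'v3'], wanted, node_type, list(node_type))
print(score)  # 3 -> every requested link type is satisfied
```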
Key services • Disk reloading • 2 possibilities • Complete disk image loading • Incremental synchronization (hash tables on files or blocks) • Good • Faster (in their specific case) • No corruption • Bad • Waste of time when similar images are needed repeatedly • Paced reloading of freed nodes (reserved for 1 user)
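The block-level incremental synchronization mentioned above can be sketched with block hashes: hash fixed-size blocks on both sides and resend only the blocks whose hashes differ. The block size and helper names are illustrative.

```python
import hashlib

BLOCK = 4096  # assumed block size for illustration

def block_hashes(data):
    """Hash each fixed-size block of a disk image."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old, new):
    """Indices of blocks in `new` that differ from `old` and must be resent."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

old = b'a' * BLOCK + b'b' * BLOCK + b'c' * BLOCK
new = b'a' * BLOCK + b'X' * BLOCK + b'c' * BLOCK
print(changed_blocks(old, new))  # [1] -> only one block must be resent
```

This also shows the trade-off from the slide: when the target disk already holds a similar image, only the differing blocks move; when it does not, hashing everything first is wasted work compared to a straight full-image load.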
Key services • Disk reloading • Frisbee • Performance techniques: • Uses a domain-specific algorithm to skip unused blocks • Delivers images via a custom reliable multicast protocol • 117 sec for 80 nodes, write 550MB instead of 3GB
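Frisbee's first saving, skipping unused blocks, can be approximated as follows. This sketch treats all-zero blocks as free space; real Frisbee consults filesystem metadata to find genuinely unallocated blocks, and the 4 KB block size is an assumption.

```python
BLOCK = 4096  # assumed block size for illustration

def image_chunks(disk):
    """Yield (offset, data) only for blocks that appear to be in use."""
    zero = bytes(BLOCK)
    for off in range(0, len(disk), BLOCK):
        block = disk[off:off + BLOCK]
        if block != zero[:len(block)]:   # skip free (all-zero) blocks
            yield off, block

# three-block toy disk: only the middle block holds data
disk = b'\x00' * BLOCK + b'data' + b'\x00' * (BLOCK - 4) + b'\x00' * BLOCK
chunks = list(image_chunks(disk))
print([off for off, _ in chunks])  # [4096] -> only 1 of 3 blocks stored
```

Storing and multicasting only the in-use blocks is what lets Frisbee write 550 MB instead of 3 GB in the figure the slide quotes.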
Key services • Scaling of simulated resources • Simulated nodes are multiplexed on 1 physical node • Must keep up with real time, taking into account the user's specified event rate • Test of live TCP at 2Mb CBR • 850MHz PC with UDP background traffic at 2Mb CBR / 50ms • Able to handle 150 links for 300 nodes • Routing becomes a problem in very complex topologies
Example of a new possibility • Batch experiments can be programmed to vary one parameter at a time • The Armada file system from Oldfield & Kotz • 7 bandwidths x 5 latencies x 3 application settings x 4 configs of 20 nodes • 420 tests in 30 hrs (~4.3 min per experiment)
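Generating such a batch sweep is mechanical, as the sketch below shows. Only the counts (7 x 5 x 3 x 4) come from the slide; the specific parameter values are invented for illustration.

```python
from itertools import product

# hypothetical parameter values; only the counts match the Armada sweep
bandwidths = ['1Mb', '2Mb', '5Mb', '10Mb', '20Mb', '50Mb', '100Mb']  # 7
latencies = ['5ms', '10ms', '20ms', '50ms', '100ms']                 # 5
app_settings = ['small', 'medium', 'large']                          # 3
configs = ['cfg1', 'cfg2', 'cfg3', 'cfg4']                           # 4

# one batch experiment per combination, each varying one parameter at a time
experiments = list(product(bandwidths, latencies, app_settings, configs))
print(len(experiments))  # 7 * 5 * 3 * 4 = 420 batch experiments
```

Each tuple would become one swappable batch experiment, queued until resources free up, which is how 420 runs fit into 30 hours without manual intervention.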
Outline Motivation Netbed structure Validation and testing Netbed contribution Conclusion
Summary • Netbed unifies 3 test environments • Reuse of ns scripts • Quick setup of the test environment • Virtualization techniques provide an artifact-free environment • Enables qualitatively new experimental techniques