Emulab and its lessons and value for A Distributed Testbed • Jay Lepreau • University of Utah • March 18, 2002
What? • A configurable Internet emulator in a room • Today: 168+160 nodes, 1646 cables, 4x BFS (switch) • Virtualizable topology, links, software • Bare hardware with lots of tools: Management Software • An instrument for experimental CS research • Universally available to any remote experimenter • Simple to use
Points • Programmable, automated mgmt, complete virtualization: • Qualitatively new environment • Most of it will work in wide area
New Stuff • Integrated event system • Underlying pub/sub system • Integrated into ‘ns’ (statically scheduled) • Start/stop programs • Replayable • Dynamic events • User-accessible • Traffic generation • Automatic, from ns script • New generators: • TG (tcp, udp) • ‘nse’ with udp, tcp, ftp, telnet
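A minimal sketch of the idea behind a statically scheduled, replayable event system built on publish/subscribe. The class and method names (`EventBus`, `subscribe`, `at`, `run`) are hypothetical and are not the testbed's actual API.

```python
from collections import defaultdict

class EventBus:
    """Tiny publish/subscribe bus with a static schedule (hypothetical API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.schedule = []                    # (time, seq, topic, payload)
        self.seq = 0                          # tie-breaker keeps insertion order

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def at(self, time, topic, payload):
        """Statically schedule an event for a given time."""
        self.schedule.append((time, self.seq, topic, payload))
        self.seq += 1

    def run(self):
        """Deliver events in time order; the schedule is kept, so run() can replay it."""
        log = []
        for time, _, topic, payload in sorted(self.schedule):
            for callback in self.subscribers[topic]:
                callback(time, payload)
            log.append((time, topic, payload))
        return log

bus = EventBus()
bus.subscribe("program", lambda t, action: print(f"t={t}: {action} program"))
bus.at(10.0, "program", "stop")
bus.at(1.0, "program", "start")
bus.run()  # delivers "start" before "stop"; calling run() again replays the same events
```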
New Stuff (cont’d) • 4 node types: • Real, running in the local rack, controlled env. • Real, running ‘nse’ • [Simulated] • [Real, in wide-area] • Link configuration and monitoring • Latency, bw, plr, RED, queue size • Link monitoring and capture • GUI network config applet • Full-day SIGCOMM tutorial Aug’02
[Figure: testbed architecture. Labels: Internet, Users, Web/DB/SNMP, Switch Mgmt, PowerCntl, Serial, Control Switch/Router, PC, Sharks, 168, 160, “Programmable Patch Panel”]
Fundamental Leverage: • Extremely Configurable • Easy to Use • Power • Performance • Virtualization
Key Design Aspects • Allow experimenter complete control • Configurable link bandwidth, latency, and loss rates, via transparently interposed “traffic shaping” nodes that provide WAN emulation • … but provide fast tools for common cases • OS’s, state mgmt tools, IP, batch, ... • Disk loading – 6GB disk image FreeBSD+Linux • Unicast tool: 88 seconds to load • Multicast tool: 40 nodes simultaneously in < 5 minutes • Virtualization • of all experimenter-visible resources • node names, network interface names, network addrs • Allows swapin/swapout, easily scriptable
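A rough sketch of what an interposed traffic-shaping node does to a single packet, given the three configurable parameters named above. `ShapedLink` and its fields are hypothetical names, and the model ignores queueing; it is an illustration, not the testbed's implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class ShapedLink:
    bandwidth_bps: float  # link bandwidth in bits per second
    latency_s: float      # one-way delay in seconds
    loss_rate: float      # packet loss probability, 0.0 to 1.0

    def delivery_time(self, send_time, size_bytes, rng):
        """Return the packet's arrival time, or None if the packet is dropped."""
        if rng.random() < self.loss_rate:
            return None
        transmit = size_bytes * 8 / self.bandwidth_bps  # serialization delay
        return send_time + transmit + self.latency_s

# Example: a 1 Mb/s link with 50 ms latency and 1% loss.
link = ShapedLink(bandwidth_bps=1_000_000, latency_s=0.05, loss_rate=0.01)
arrival = link.delivery_time(send_time=0.0, size_bytes=1250, rng=random.Random(0))
```

With no loss, a 1250-byte packet takes 10 ms to serialize at 1 Mb/s, so it arrives about 60 ms after it is sent.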
Key Design Aspects (cont’d) • Flexible, extensible, powerful allocation algorithm • Matches desired “virtual” topology to currently available physical resources • Persistent state maintenance: • none on nodes, all in database • work from known state at boot time • Familiar, powerful, extensible configuration language: ns • Separate, isolated control network
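To make the virtual-to-physical matching concrete, here is a naive first-fit sketch. It only matches node types and ignores links and switch capacity, so it is far simpler than the allocation algorithm the slide describes; the function name and the node names are hypothetical.

```python
def map_topology(virtual_nodes, free_physical):
    """First-fit: give each virtual node a free physical node of the requested type.

    virtual_nodes: dict of virtual name -> required node type
    free_physical: dict of physical name -> node type
    Returns a dict of virtual name -> physical name, or None if the request cannot be met.
    """
    available = dict(free_physical)  # copy, so the caller's view is untouched
    mapping = {}
    for vname, vtype in sorted(virtual_nodes.items()):
        match = next((p for p, t in sorted(available.items()) if t == vtype), None)
        if match is None:
            return None              # not enough free nodes of this type
        mapping[vname] = match
        del available[match]
    return mapping

free = {"pc1": "pc", "pc2": "pc", "shark1": "shark"}
print(map_topology({"client": "pc", "router": "pc"}, free))
# {'client': 'pc1', 'router': 'pc2'}
```

Because experimenters see only the virtual names, the same request can be mapped to different physical nodes on each swapin.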
Lessons for wide area testbed • Central control: at this scale (1000s) it’s easy • Database! • Control node for each site: great benefits, cheap marginal cost • Trusted, firewall, local disk cache, power control, console line • Ease of use is dominant driver
Lessons… • Generalized resource alloc/mapping algorithm is great (e.g., vs Grid) • Get it going quickly, keep it going while adding new stuff • Like a startup • Use feedback and demand • 2.5 years in • Simple authorization model • Most of our model and code will work in wide-area
Lessons… • Freedom for users is freedom for the management software and people • “You’ve got root, use it.” • Over-provision • FreeBSD Jail, or Eclipse/BSD, or VMware, or ….
Testing is tricky • Have real hardware that can’t virtualize • Test suite part of build • Clone DB works some… • 8-node minibed • Nightly regression testing • Schema evolution script/diff/check • Developers use/test 3 diff. browsers
Code Base Today • 24,100 Web front end • 23,900 Back end • 2000 ns front end • 4200 Resource mapping • 4900 Diskimg compression/casting/load • 8400 Scripts/daemons from nodes to DB • 5000 Event system • 6200 Remote console interaction/logging • 3300 Regression testing harness and tests • 700 Node health monitoring • 3700 Documentation of internals
More stats • 21 “programs” • 318 “scripts” (including 90 php scripts, 71 small boot-time scripts) • 35% Perl • 32% C • 19% php • 12% html, Java, tcl, other
The Database Today • Started with ~18 tables • 54 tables, 413 columns • General categories • Physical world: 11 tables, 65 cols • Virtual world: 7 tables, 83 cols • Operational state: 22 tables, 180 cols • Admin data: 14 tables, 85 cols • Note how much operational state shows how much work needs to be done
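A quick arithmetic check of the figures on this slide: the four categories add up to the stated totals, and operational state accounts for the largest share of columns.

```python
# (tables, columns) per category, copied from the slide
categories = {
    "Physical world": (11, 65),
    "Virtual world": (7, 83),
    "Operational state": (22, 180),
    "Admin data": (14, 85),
}

total_tables = sum(tables for tables, _ in categories.values())
total_columns = sum(cols for _, cols in categories.values())
operational_share = categories["Operational state"][1] / total_columns

print(total_tables, total_columns, round(operational_share * 100))
# 54 413 44
```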
Testbed Users • 30 active projects • more registered • 25 External • About 40/30/30% dist sys/active nets/traditional networking • ~110 users • 990 “experiments” in last 8 months • 7.5/day recently • 40% testbed development
More Sites • More Emulabs under construction: • Kentucky • UMass • Duke, CMU, Cornell, Stuttgart • Others stated intent: MIT, WUSTL, Princeton, HPLabs, Intel/UCB, Mt. Holyoke, …
Ongoing and Future Work • Federation: heterogeneous sites, resource allocation • Wireless nodes, mobile nodes • IXP1200 nodes, tools, code fragments • Routers, high-capacity shapers • Simulation/emulation transparency • Event system • Scheduling system • Topology generation tools and GUI • Data capture, logging, visualization tools • Microsoft OSs, high speed links, more nodes!
A Global-scale Testbed • Federation key • Bottom-up “organic” growth • Local autonomy and priority • Existing hardware resources • Provides diverse hardware • PCs • Wireless, mobile • Real routers, switches (Wisconsin, …) • Network processors (IXP’s) • Research switches (WUSTL) • But, top-down is much easier: a good start
NSF ITR Proposal (Nov 01) • Global-scale testbed • Utah primary • Research emphasis: software component for heterogeneity; resource allocation/mapping • Collaborators: • Brown, co-PI (resource allocation) • MIT (RON overlay, wireless) • Duke (ModelNet muxing, early adopter) • Mt. Holyoke (education)
Types of Sites • High-end facilities • Generic clusters • Generic labs • “Virtual machines” • Internet2 links between some sites
Result… • Loosely coupled distributed system • Controlled isolation • “Internet Petri Dish”
New Stuff: Extending to Wireless and Mobile Problems with existing approaches • Same problems as wired domain • But worse (simulation scaling, ...) • And more (no models for new technologies, ...)
Available for universities, labs, and companies, for research and teaching, at: www.emulab.net
A Few Research Issues and Challenges • Network management of unknown and untrusted entities • Security (root!) • Scheduling of experiments • Calibration, validation, and scaling • Artifact detection and control • NP-hard virtual --> physical mapping problem • Providing a reasonable user interface • ….
How To Use It ... • Submit ns script or GUI via web form • Behind the scenes: • Generates config from script & stores in DB • Maps specified virtual topology to physical nodes • Allocates resources • Provides user accounts for node access • Assigns IP addresses and host names • Configures VLANs • Loads disks, reboots nodes, configures OS’s • Starts event system, traffic generators, link monitoring/control • Yet more odds and ends ... • User does his/her experiment • [Reports results if batch] • Takes ~3 min to set up 25 nodes, 5 secs/node
An “Experiment” • emulab’s central operational entity • Directly generated by an ns script, • … then represented entirely by database state • Steps: Web, compile ns script, map, allocate, provide access, assign IP addrs, host names, configure VLANs, load disks, reboot, configure OS’s, run, report
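One way to picture an experiment "represented entirely by database state" is a single row whose step column records how far setup has progressed. The step names follow the slide; the table layout and function names are hypothetical and are not the testbed's actual schema.

```python
import sqlite3

# Step names follow the slide; the table layout is a hypothetical illustration.
STEPS = ["web", "compile ns script", "map", "allocate", "provide access",
         "assign IP addrs, host names", "configure VLANs", "load disks",
         "reboot", "configure OS's", "run", "report"]

def create_experiment(db, name):
    """Record a new experiment at the first step."""
    db.execute("CREATE TABLE IF NOT EXISTS experiments "
               "(name TEXT PRIMARY KEY, step INTEGER NOT NULL)")
    db.execute("INSERT INTO experiments (name, step) VALUES (?, 0)", (name,))

def current_step(db, name):
    """Read the experiment's progress back from the database."""
    row = db.execute("SELECT step FROM experiments WHERE name = ?", (name,)).fetchone()
    return STEPS[row[0]]

def advance(db, name):
    """Move to the next step, stopping at the last one."""
    db.execute("UPDATE experiments SET step = MIN(step + 1, ?) WHERE name = ?",
               (len(STEPS) - 1, name))
    return current_step(db, name)

db = sqlite3.connect(":memory:")
create_experiment(db, "demo")
print(advance(db, "demo"))  # compile ns script
```

Since nothing about the experiment lives on the nodes, any tool that can read the row can tell where setup stands or resume it.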