Reducing Allocation Errors in Network Testbeds. Jelena Mirkovic, Hao Shi, Alefiya Hussain. USC/ISI. GSS 2012. IMC 2012, Boston, USA. National Science Foundation Grant No. 1049758.
Overview • Scenario • What is a testbed and how do people use it? • Problem Statement • Emulab-based practice • Allocation Errors • A large portion can be avoided • Improvement • Deterministic-search-based method
Scenario – a use case • How people launch multiple experiment instances in a testbed
Scenario – features of resources • Limited quantities (until Jan 2011) • Heterogeneity: no resource type has an absolute advantage • Network Testbed Mapping Problem • How to allocate resources efficiently?
Problem Statement – Illustration • Network Testbed Mapping Problem
Problem Statement – Goals/Challenges • Economize inter-switch bandwidth • Accommodate heterogeneous nodes • Maximize the chance that future mappings succeed • Generate one solution in a timely fashion
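The first goal can be made concrete with a small sketch. It assumes a hypothetical data model that is not in the slides: virtual links are `(node_a, node_b, bandwidth)` tuples, `mapping` assigns virtual nodes to physical machines, and `switch_of` names the switch each machine is attached to. Under those assumptions, the inter-switch bandwidth of a mapping is the total bandwidth of virtual links whose endpoints land on different switches.

```python
from typing import Dict, List, Tuple

# Hypothetical data model (an assumption, not the testbed's real format):
# a virtual link is (node_a, node_b, bandwidth_mbps).
VirtualLink = Tuple[str, str, int]


def inter_switch_bandwidth(
    links: List[VirtualLink],
    mapping: Dict[str, str],    # virtual node -> physical machine
    switch_of: Dict[str, str],  # physical machine -> switch
) -> int:
    """Sum the bandwidth of virtual links whose endpoints sit on different switches."""
    total = 0
    for a, b, bandwidth in links:
        if switch_of[mapping[a]] != switch_of[mapping[b]]:
            total += bandwidth
    return total


# Toy topology: two machines on sw1, one on sw2.
links = [("n1", "n2", 1000), ("n2", "n3", 100)]
switch_of = {"pc1": "sw1", "pc2": "sw1", "pc3": "sw2"}

# Keeps the fat n1-n2 link inside sw1; only the thin link crosses switches.
compact = {"n1": "pc1", "n2": "pc2", "n3": "pc3"}
# Puts n2 alone on sw2, so both links cross switches.
spread = {"n1": "pc1", "n2": "pc3", "n3": "pc2"}

print(inter_switch_bandwidth(links, compact, switch_of))
print(inter_switch_bandwidth(links, spread, switch_of))
```

Both mappings use the same three machines, yet they differ in how much scarce inter-switch capacity they consume, which is why the mapping step matters.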
Current Practice – Emulab’s Algorithm (assign) • Simulated Annealing • A heuristic that performs a cost-function-guided exploration • Starts from a random solution and scores it using a cost function • Perturbs the solution using a generation function to find the next one • If better: accept • If worse: accept with a small probability controlled by the temperature • The cooling schedule makes the algorithm converge to a single “best” solution • No guarantee that the best solution is found
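The loop described above can be sketched generically. This is a minimal illustration of simulated annealing, not Emulab's assign code: the geometric cooling schedule, the parameter defaults, the `exp(-delta / t)` acceptance rule, and the choice to remember the best solution seen are all assumptions made for the example.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(
    initial: S,
    cost: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],  # the "generation function"
    rng: random.Random,
    t_start: float = 10.0,     # assumed starting temperature
    t_end: float = 1e-3,       # assumed stopping temperature
    cooling: float = 0.95,     # assumed geometric cooling factor
    steps_per_temp: int = 50,  # assumed moves tried per temperature
) -> Tuple[S, float]:
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Better (or equal) moves are always accepted; worse moves are
            # accepted with probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


# Toy problem: find an integer close to 7 by stepping +1 or -1.
best, best_cost = simulated_annealing(
    initial=0,
    cost=lambda x: (x - 7) ** 2,
    neighbor=lambda x, r: x + r.choice((-1, 1)),
    rng=random.Random(0),
)
print(best, best_cost)
```

Because worse moves are sometimes accepted and the search is randomized, two runs with different seeds can return different answers, and neither is guaranteed to be optimal.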
Current Practice – Performance • Allocation Errors • 11,176 TEMP errors (out of a total of 24,206 errors, about 46%) • Substantial room for improvement!
Our Strategy – assign+ • Deterministic fashion • Explore 5 possible solution spaces using expert knowledge of possible network testbed architecture • 1) PART: minimizes partitions in the virtual topology • 2) SCORE: minimizes the score of the allocation strategy • 3) ISW: prefers physical machine classes (pclasses) that have high-bandwidth inter-switch links • 4) PREF: prefers pclasses that share a switch with the pclasses hosting neighbors of the node being allocated • 5) FRAG: tries to use the smallest number of pclasses • Choose the solution with the lowest inter-switch bandwidth as the best
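The final selection step can be sketched as follows. The `Candidate` record, the numbers, and the tie-breaking by strategy name are hypothetical, and the sketch assumes a strategy may fail to produce a mapping; how each of the five strategies actually builds its mapping is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """Result of one search strategy (hypothetical record for illustration)."""
    strategy: str
    mapping: Optional[Dict[str, str]]  # None when the strategy found no allocation
    isw_bandwidth: int = 0             # inter-switch bandwidth used by the mapping


def choose_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the feasible candidate with the lowest inter-switch bandwidth."""
    feasible = [c for c in candidates if c.mapping is not None]
    if not feasible:
        return None  # every strategy failed: report an allocation error
    # Ties are broken by strategy name purely to keep the result deterministic.
    return min(feasible, key=lambda c: (c.isw_bandwidth, c.strategy))


# Illustrative values only, not measurements from the paper.
candidates = [
    Candidate("PART", {"n1": "pc1", "n2": "pc2"}, isw_bandwidth=200),
    Candidate("SCORE", {"n1": "pc1", "n2": "pc5"}, isw_bandwidth=1000),
    Candidate("ISW", {"n1": "pc3", "n2": "pc4"}, isw_bandwidth=0),
    Candidate("PREF", None),
    Candidate("FRAG", {"n1": "pc2", "n2": "pc1"}, isw_bandwidth=200),
]

winner = choose_best(candidates)
print(winner.strategy if winner else "no feasible mapping")
```

Since every step is deterministic, the same testbed state and the same request always yield the same answer, unlike the randomized search used by assign.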
Our Strategy – Evaluation • Reconstruct DeterLab state on Jan 1, 2011 • Use virtual topology and state snapshot data from file system • hardware types, OS supported, switch connectivity, … • 255 available machines in the pool • Replay all successful and failed allocations in 2011 • start time, end time, experiment size, … • Failed allocations: generate their duration based on the distribution of past successful allocations • Keep only the first instance if overlapping
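The two preparation steps for failed allocations might look like the sketch below. It is one possible reading of the slide, not the authors' procedure: durations are drawn uniformly from a list of observed successful durations, and "overlapping" is taken to mean instances of the same experiment whose time ranges intersect. The `Request` record and its field names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    experiment: str
    start: float               # hours since the start of the replay
    duration: Optional[float]  # None for a failed allocation (duration unknown)


def fill_failed_durations(
    requests: List[Request], observed: List[float], rng: random.Random
) -> List[Request]:
    """Give each failed request a duration sampled from observed successful ones."""
    return [
        Request(r.experiment, r.start,
                r.duration if r.duration is not None else rng.choice(observed))
        for r in requests
    ]


def drop_overlaps(requests: List[Request]) -> List[Request]:
    """Among overlapping instances of one experiment, keep only the earliest."""
    kept: List[Request] = []
    last_end: Dict[str, float] = {}
    for r in sorted(requests, key=lambda r: r.start):
        if r.start >= last_end.get(r.experiment, float("-inf")):
            kept.append(r)
            last_end[r.experiment] = r.start + r.duration
    return kept


requests = [
    Request("expA", 0.0, 5.0),   # successful, runs from hour 0 to hour 5
    Request("expA", 2.0, None),  # failed retry that overlaps the first instance
    Request("expB", 1.0, None),  # failed, different experiment
    Request("expA", 6.0, 1.0),   # starts after the first instance has ended
]
observed = [1.0, 5.0]  # durations of past successful allocations (toy data)

filled = fill_failed_durations(requests, observed, random.Random(0))
replay = drop_overlaps(filled)
print([(r.experiment, r.start) for r in replay])
```

Dropping overlapping instances avoids counting a user's repeated retries of the same experiment as separate demand during the replay.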
Our Strategy – Performance • Allocation failure rates and running time
Other key components in the paper • Relaxing virtual topology requirements can get better results • OS, node type, hardware, … • Most testbed usage patterns show heavy-tail distributions • experiment sizes, duration, … • due to human dynamics based on priorities • Potential improvements for allocation policy • Take-a-Break: release a long-running instance and queue it • Borrow-and-Return: borrow from long-running instance for 4 hours • For more details: • http://www-net.cs.umass.edu/imc2012/papers/p495.pdf • http://steel.isi.edu/TestbedUsageData