Software-defined Measurement



  1. Software-defined Measurement Minlan Yu University of Southern California Joint work with Lavanya Jose, Rui Miao, Masoud Moshref, Ramesh Govindan, Amin Vahdat

  2. Management = Measurement + Control • Accounting • Count resource usage for tenants • Traffic engineering • Identify large traffic aggregates, traffic changes • Understand flow characteristics (flow size, etc.) • Performance diagnosis • Why does my application have high delay or low throughput?

  3. Yet, measurement is underexplored • Measurement is an afterthought in network device • Control functions are optimized w/ many resources • Limited, fixed measurement support with NetFlow/sFlow • Traffic analysis is incomplete and indirect • Incomplete: May not catch all the events from samples • Indirect: Offline analysis based on pre-collected logs • Network-wide view of traffic is especially difficult • Data are collected at different times/places

  4. Software-defined Measurement [Figure: a controller hosting measurement tasks (Heavy Hitter detection, Change detection) that (1) configures switch resources, (2) fetches statistics, and then (re)configures resources] • SDN offers unique opportunities for measurement • Simple, reusable primitives at switches • Diverse and dynamic analysis at controller • Network-wide view

  5. Challenges • Diverse measurement tasks • Generic measurement primitives for diverse tasks • Measurement library for easy programming • Limited resources at switches • New data structures to reduce memory usage • Multiplexing across many tasks

  6. Software-defined Measurement • Data-plane primitives: OpenSketch (NSDI'13) is sketch-based, built from commodity switch components; DREAM (SIGCOMM'14) is flow-based, built on OpenFlow TCAM • Resource allocation: OpenSketch optimizes within a task with provable resource-accuracy bounds; DREAM dynamically allocates resources across tasks with an accuracy estimator • Prototypes: both open source; OpenSketch on NetFPGA with a sketch library, DREAM on networks of hardware switches and Open vSwitch

  7. Software-defined Measurement with Sketches (NSDI'13)

  8. Software Defined Networking [Figure: a controller configures devices and collects measurements through an API to the data plane (OpenFlow); switches forward and measure packets] • An OpenFlow rule maps header fields to an action and counters, e.g., Src=1.2.3.4 → drop, #packets, #bytes • Rethink the abstractions for measurement

  9. Tradeoff of Generality and Efficiency • Generality • Supporting a wide variety of measurement tasks • Who’s sending a lot to 23.43.0.0/16? • Is someone being DDoS-ed? • How many people downloaded files from 10.0.2.1? • Efficiency • Enabling high link speed (40 Gbps or larger) • Ensuring low cost (Cheap switches with small memory) • Easy to implement with commodity switch components

  10. NetFlow: General, Not Efficient • Cisco NetFlow/sFlow • Log sampled packets, or flow-level counters • General • Ok for many measurement tasks • Not ideal for any single task • Not efficient • It’s hard to determine the right sampling rate • Measurement accuracy depends on traffic distribution • Turned off or not even available in datacenters

  11. Streaming Algo: Efficient, Not General [Figure: a Count-Min sketch; in the data plane, three hash functions map source 23.43.12.1 into three rows of counters; the control plane answers the query "# bytes from 23.43.12.1" by picking the minimum of the three counters, here 3] • Streaming algorithms • Summarize packet information with sketches • E.g., Count-Min Sketch: who's sending a lot to host A? • Not general: each algorithm solves just one question • Require customized hardware or network processors • Hard to implement every solution in practice
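The Count-Min sketch described on this slide can be illustrated in a few lines of Python. This is a software sketch of the idea, not switch code: the class name is made up, and the salted built-in `hash` stands in for the independent hash functions a switch would implement in hardware.

```python
import random

class CountMinSketch:
    """A minimal Count-Min sketch: `depth` hash rows of `width` counters."""
    def __init__(self, width=1024, depth=3, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Per-row salts stand in for independent hash functions.
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, salt, key):
        return hash((salt, key)) % self.width

    def update(self, key, count=1):
        # Data plane: each packet increments one counter per row.
        for salt, row in zip(self.salts, self.rows):
            row[self._index(salt, key)] += count

    def query(self, key):
        # Control plane: the minimum over rows bounds the true count
        # from above, since collisions can only inflate counters.
        return min(row[self._index(salt, key)]
                   for salt, row in zip(self.salts, self.rows))
```

Because collisions only inflate counters, `query` never underestimates; widening the rows shrinks the expected overestimate.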

  12. Where is the Sweet Spot? [Figure: generality vs. efficiency; NetFlow/sFlow is general but too expensive, streaming algorithms are efficient but not practical] • OpenSketch • General, and efficient data plane based on sketches • Modularized control plane with automatic configuration

  13. Flexible Measurement Data Plane • Picking the packets to measure • Hashes to represent a compact set of flows • A set of blacklisting IPs • Classify flows with different resources/accuracy • Filter out traffic for 23.43.0.0/16 • Storing and exporting the data • A table with flexible indexing • Complex indexing using hashes and classification • Diverse mappings between counters and flows

  14. A three-stage pipeline [Figure: the Count-Min sketch example from slide 11 expressed as the three pipeline stages] • Hashing: a few hash functions on the packet source • Classification: based on hash values or packet fields • Counting: update a few counters with simple calculations

  15. Build on Existing Switch Components • A few simple hash functions • 4-8 three-wise or five-wise independent hash functions • Leverage traffic diversity to approx. truly random func. • A few TCAM entries for classification • Match on both packets and hash values • Avoid matching on individual micro-flow entries • Flexible counters in SRAM • Many logical tables for different sketches • Different numbers and sizes of counters • Access counters by addresses

  16. Modularized Measurement Library • A measurement library of sketches • Bitmap, Bloom filter, Count-Min Sketch, etc. • Easy to implement with the data plane pipeline • Support diverse measurement tasks • Implement Heavy Hitters with OpenSketch • Who's sending a lot to 23.43.0.0/16? • count-min sketch to count volume of flows • reversible sketch to identify flows with heavy counts in the count-min sketch
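The heavy-hitter recipe above can be sketched in Python. Counting follows the count-min idea; for brevity, candidate keys are kept in an explicit set here, whereas OpenSketch uses a reversible sketch to recover heavy keys without storing them. The function name and parameters are illustrative.

```python
from collections import defaultdict

def heavy_hitters(packets, threshold, width=512, depth=3):
    """Sources whose estimated byte volume exceeds `threshold`.
    `packets` is an iterable of (source, byte count) pairs."""
    rows = [defaultdict(int) for _ in range(depth)]
    seen = set()  # simplification: OpenSketch uses a reversible sketch
    for src, nbytes in packets:
        seen.add(src)
        for d in range(depth):
            rows[d][hash((d, src)) % width] += nbytes

    def estimate(src):
        # Minimum over rows: an upper bound on the true volume.
        return min(rows[d][hash((d, src)) % width] for d in range(depth))

    return {src for src in seen if estimate(src) > threshold}
```

Since the count-min estimate only overestimates, this reports every true heavy hitter and may add a few false positives when small flows collide in all rows.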

  17. Support Many Measurement Tasks

  18. Resource management • Automatic configuration within a task • Pick the right sketches for measurement tasks • Allocating resources across sketches • Based on provable resource-accuracy curves • Resource allocation across tasks • Operators simply specify relative importance of tasks • Minimizing weighted error using convex optimization • Decompose to optimization problem of individual tasks

  19. OpenSketch Architecture

  20. Evaluation • Prototype on NetFPGA • No effect on data plane throughput • Line speed measurement performance • Trace Driven Simulators • OpenSketch, NetFlow, and streaming algorithm • One-hour CAIDA packet traces on a backbone link • Tradeoff between generality and efficiency • How efficient is OpenSketch compared to NetFlow? • How accurate is OpenSketch compared to specific streaming algorithms?

  21. Heavy Hitters: false positives/negatives • Identify flows taking > 0.5% of bandwidth • OpenSketch requires less memory with higher accuracy

  22. Tradeoff Efficiency for Generality • In theory, OpenSketch requires 6 times more memory than a complex streaming algorithm

  23. OpenSketch Conclusion • OpenSketch: • Bridging the gap between theory and practice • Leveraging good properties of sketches • Provable accuracy-memory tradeoff • Making sketches easy to implement and use • Generic support for different measurement tasks • Easy to implement with commodity switch hardware • Modularized library for easy programming

  24. Dynamic Resource Allocation for TCAM-based Measurement (SIGCOMM'14)

  25. SDM Challenges [Figure: many management tasks (heavy hitter detection, change detection, ...) run against a dynamic resource allocator in the controller, which configures resources, fetches statistics, and reconfigures switches that have limited resources (TCAM)]

  26. Dynamic Resource Allocator • Recall = detected true HHs / all true HHs • Diminishing returns of resources • More resources yield smaller accuracy gains • More resources find less significant outputs • Operators can accept an accuracy bound < 100%

  27. Dynamic Resource Allocator • Recall = detected true HHs / all true HHs • Temporal and spatial resource multiplexing • Traffic varies over time and across switches • The resource needed for an accuracy bound depends on traffic

  28. Challenges • No ground truth for the resource-accuracy relationship • Hard to apply traditional convex optimization • Need new ways to estimate accuracy on the fly • Adaptively increase/decrease resources accordingly • Spatial & temporal changes • Task and traffic dynamics • Coordinate multiple switches to keep a task accurate • Spatial and temporal resource adaptation

  29. Dynamic Resource Allocator [Figure: tasks (heavy hitter detection, change detection) report estimated accuracy to the controller's dynamic resource allocator, which returns allocated resources] • Decompose the resource allocator to each switch • Each switch separately increases/decreases resources • When and how should resources change?

  30. Per-switch Resource Allocator: When? [Figure: heavy hitter detection on switches A and B; A detects 5 of 20 local HHs (local accuracy 25%), B detects 9 of 10 (local accuracy 90%), and the controller sees 14 of 30 detected overall (global accuracy 47%)] • When does a task on a switch need more resources? • A's local accuracy (25%) alone is not enough: if the bound is 40%, there is no need to increase A's resources • The global accuracy (47%) alone is not enough either: if the bound is 80%, increasing B's resources is not helpful • Conclusion: increase when max(local, global) < accuracy bound
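The decision rule on this slide is simple enough to write down directly (the function name is illustrative):

```python
def needs_more_resources(local_recall, global_recall, bound):
    """A task instance on a switch asks for more resources only when
    both its local recall and the task's global recall fall below the
    operator's accuracy bound: max(local, global) < bound."""
    return max(local_recall, global_recall) < bound
```

With the slide's numbers: under a 40% bound, switch A (local 25%, global 47%) needs no change because the global accuracy already meets the bound; under an 80% bound, A's resources are increased but B's (local 90%) are not.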

  31. Per-Switch Resource Allocator: How? • How to adapt resources? • Take from rich tasks, give to poor tasks • How much resource to take/give? • Adaptive change step for fast convergence • Small steps close to bound, large steps otherwise
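One round of this take-from-the-rich policy might look like the following sketch. The step sizes, the `near` window, and the dict layout are assumptions, and for simplicity this version adjusts each task independently rather than moving entries within a shared switch pool.

```python
def rebalance(tasks, bound, small=4, large=32, near=0.1):
    """One allocation round: tasks below the accuracy bound gain TCAM
    entries, tasks above it give entries back, with small steps near
    the bound (for stability) and large steps far from it (for fast
    convergence)."""
    for t in tasks:
        step = small if abs(t["recall"] - bound) < near else large
        if t["recall"] < bound:
            t["entries"] += step                          # poor task: grow
        elif t["recall"] > bound:
            t["entries"] = max(1, t["entries"] - step)    # rich task: shrink
    return tasks
```

Repeating this round as traffic changes lets allocations track the accuracy bound without ever computing a full resource-accuracy curve.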

  32. Task Implementation [Figure: the same architecture as slide 29; task objects on the controller configure resources on switches, fetch statistics, and reconfigure resources through the dynamic resource allocator]

  33. Flow-based algorithms using TCAM [Figure: a prefix trie of TCAM counters over source IPs; the current configuration monitors coarse prefixes (e.g., 0** with count 26), and the new configuration divides them into finer ones (e.g., 00* and 01* with counts 12 and 14)] • Goal: maximize accuracy given limited resources • A general resource-aware algorithm • Different tasks: e.g., HH, HHH, change detection • Multiple switches: e.g., HHs from different switches • Assumption: each flow is seen at one switch (e.g., at its source)

  34. Divide & Merge at Multiple Switches [Figure: prefix 0** (count 26), seen by switches {A,B,C}, is divided into 00* (count 12, seen by {A,B}) and 01* (count 14, seen by {B,C})] • Divide: monitor children to increase accuracy • Requires more resources on a set of switches • Example: needs an additional entry on switch B • Merge: monitor the parent to free resources • Each node keeps the switch set it frees after a merge • Finding the least important prefixes to merge is a minimum set cover problem
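The divide step on a single switch can be sketched as iterative refinement over the prefix trie. This is illustrative Python, not the DREAM algorithm itself: `counts` plays the role of ground-truth traffic, `budget` the TCAM entry limit, the budget check is all-or-nothing for simplicity, and the multi-switch merge with its set-cover step is omitted.

```python
def refine(counts, threshold, width=3, budget=4):
    """Iteratively divide monitored prefixes whose volume exceeds
    `threshold`. `counts` maps full-width integer keys to volumes;
    prefixes are (bits, length) pairs, starting from the root."""
    def volume(bits, length):
        shift = width - length
        return sum(v for k, v in counts.items() if k >> shift == bits)

    monitored = [(0, 0)]  # the root prefix covers all traffic
    while True:
        to_divide = [(b, l) for b, l in monitored
                     if l < width and volume(b, l) > threshold]
        if not to_divide or len(monitored) + len(to_divide) > budget:
            break  # converged, or the next round would exceed the budget
        for b, l in to_divide:
            monitored.remove((b, l))  # replace the parent ...
            monitored += [(b << 1, l + 1),
                          ((b << 1) | 1, l + 1)]  # ... with its children
    return sorted(monitored)
```

With the slide-35 leaf volumes and threshold 10, a budget of 4 entries refines down to the four length-2 prefixes, while a budget of 2 stops at 0** and 1**.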

  35. Accuracy Estimation: Heavy Hitter Detection [Figure: a prefix trie with threshold 10; e.g., an internal counter of size 26 at level 2 can hide at most 2 missed HHs] • Any monitored leaf with volume > threshold is a true HH • Recall: estimate the number of missed HHs from the volume and level of each counter
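The estimator on this slide can be sketched as follows (illustrative Python; `monitored` maps (prefix bits, prefix length) pairs to observed volumes, and the bound on hidden HHs per internal counter follows the slide's reasoning):

```python
def recall_lower_bound(monitored, threshold, width=3):
    """Lower-bound the recall of prefix-based heavy hitter detection.
    A full-width leaf with volume > threshold is a certain true HH.
    An internal counter of volume v covering 2**(width - length)
    leaves can hide at most min(v // threshold, 2**(width - length))
    undetected HHs, since each HH carries at least `threshold`."""
    detected = missed = 0
    for (bits, length), volume in monitored.items():
        if length == width:
            if volume > threshold:
                detected += 1
        else:
            missed += min(volume // threshold, 2 ** (width - length))
    return detected / max(detected + missed, 1)
```

This matches the slide's example: a counter of size 26 at level 2 of a 3-bit trie can hide at most min(26 // 10, 2) = 2 heavy hitters.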

  36. DREAM Overview • A task is specified by: task type (heavy hitter, hierarchical heavy hitter, change detection), task-specific parameters (e.g., HH threshold), a packet header field (e.g., source IP), a filter (e.g., srcIP=10/24, dstIP=10.2/16), and an accuracy bound (e.g., 80%) • Control loop: (1) instantiate task, (2) accept/reject, (3) configure counters, (4) fetch counters, (5) report, (6) estimate accuracy, (7) allocate/drop task objects via the resource allocator • Prototype: DREAM algorithms implemented on Floodlight and Open vSwitches

  37. Evaluation • Evaluation goals • How accurate are tasks in DREAM? • Satisfaction: fraction of a task's lifetime spent above its accuracy bound • How many more accurate tasks can DREAM support? • % of rejected/dropped tasks • How fast is the DREAM control loop? • Compared to • Equal: divide resources equally at each switch, no rejection • Fixed: 1/n of resources to each task, reject extra tasks

  38. Prototype Results • Setup: 256 tasks (various task types) on 8 switches; satisfaction reported for the mean and 5th-percentile task • DREAM: high satisfaction for both, with low rejection • Equal: only keeps small tasks satisfied • Fixed: high rejection, as it over-provisions for small tasks

  39. Prototype Results • DREAM: high satisfaction for the mean and 5th-percentile task, at the expense of more rejection • Equal & Fixed: only keep small tasks satisfied

  40. Control Loop Delay • Allocation delay is negligible compared to the other delays • Incremental saving reduces the save delay

  41. DREAM Conclusion • Challenges with software-defined measurement • Diverse and dynamic measurement tasks • Limited resources at switches • Dynamic resource allocation across tasks • Accuracy estimators for TCAM-based algorithms • Spatial and temporal resource multiplexing

  42. Summary • Software-defined measurement • Measurement is important, yet underexplored • SDN brings new opportunities to measurement • Time to rebuild the entire measurement stack • Our work • OpenSketch: generic, efficient measurement based on sketches • DREAM: dynamic resource allocation for many tasks

  43. Thanks!
