1 / 31

DREAM: D ynamic Re source A llocation for Software-defined M easurement

DREAM: D ynamic Re source A llocation for Software-defined M easurement. (SIGCOMM’14). Masoud Moshref , Minlan Yu, Ramesh Govindan, Amin Vahdat. Measurement is Crucial for Network Management. Tenant:. Netflix. Expedia. Reddit. Management:. Traffic Engineering. Accounting.

Download Presentation

DREAM: D ynamic Re source A llocation for Software-defined M easurement

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DREAM: Dynamic Resource Allocation for Software-defined Measurement (SIGCOMM’14) Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat

  2. Measurement is Crucial for Network Management Tenant: Netflix Expedia Reddit Management: Traffic Engineering Accounting Anomaly Detection Accounting Failure Detection Traffic Engineering Measurement: Heavy Hitter detection Change detection Change detection Heavy Hitter detection Change detection Heavy Hitter detection Network:

  3. High Level Contribution: Flexible Measurement Management: Users dynamically instantiate complex measurements on network state Measurement: DREAM supports the largest number of measurement tasks while maintaining measurement accuracy, by dynamically leveraging tradeoffs between switch resource consumption and measurement accuracy Network: We leverage unmodified hardware and existing switch interfaces

  4. Prior Work: Software Defined Measurement (SDM) Controller Heavy Hitter detection Change detection 1 2 3 Fetch counters Update rules Install rules #Bytes=1M Source IP: 10.0.1.128/30 Source IP: 10.0.1.130/31 #Bytes=5M Source IP: 55.3.4.34/31 Source IP: 55.3.4.32/30

  5. Our Focus: Measurement Using TCAMs Existing OpenFlow switches use TCAMs which permit counting traffic for a prefix Focus on TCAMs enables immediate deployability Prior work has explored other primitives such as hash-based counters

  6. Challenge: Limited TCAM Memory Find source IPs sending > 10Mbps 31 Controller 26 5 Heavy Hitter detection 13 13 2 3 11 00 01 10 00 13MB 01 13MB 2 1 Fetch counters Install rules 10 3MB 11 2MB Problem: Requires too many TCAMs 64K IPs to monitor a /16 prefix >> ~4K TCAMs at switches

  7. Reducing TCAM Usage Monitor internal nodes to reduce TCAM usage 31 31 26 26 5 5 Monitoring 1* is enough because a node with size 5 cannot have leaves >10 13 13 13 13 2 2 3 3 00 00 01 01 10 10 11 11

  8. Challenge: Loss of Accuracy Fixed configuration misses heavy hitters as traffic changes 31 26 5 39 9 30 13 13 2 3 4 5 15 15 Missed heavy hitters

  9. Dynamic Configuration to Avoid Loss of Accuracy Find leaves >10Mbps using 3 TCAMs 39 9 30 4 5 15 15 Monitor children to detect HHs but using 2 TCAMs Merge Divide 39 Monitor parent to save a TCAM 9 30 4 5 15 15

  10. Reducing TCAM Usage: Temporal Multiplexing Required TCAM changes over time Task 1 Task 2 # TCAMs Required Time

  11. Reducing TCAM Usage: Spatial Multiplexing Required TCAMs varies across switches Switch A Switch B # TCAMs Required Time Only needs more TCAMs at switch A

  12. Reducing TCAM Usage: Diminishing Returns 7% Accuracy Bound 12% Can accept an accuracy bound <100% to save TCAMs

  13. Key Insight Leverage spatial and temporal multiplexing and diminishing returns to dynamically adapt the configuration and allocation of TCAM entries per task to achieve sufficient accuracy

  14. DREAM Contributions System Supports concurrent instances of three task types: Heavy Hitter, Hierarchical HH and Change Detection Algorithm Dynamicallyadapts tasks TCAM allocations andconfiguration over time and acrossswitches, while maintainingsufficient accuracy Evaluation Significantly outperforms fixed allocation and scales well to larger networks

  15. DREAM Tasks Anomaly detection Traffic engineering Network provisioning Accounting Management Network visualization DDoS detection Heavy Hitter detection Hierarchical HH detection Measurement Change detection DREAM Network

  16. DREAM Workflow • Task type • Task parameters • Task filter • Accuracy bound Instantiate task Report Task Instance 1 TCAM Allocation and Configuration Task Instance n DREAM SDN Controller Fetch counters Configure counters

  17. Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configurationover time and across switches, while maintaining sufficient accuracy Dynamically adapts tasks TCAM allocationsand configuration over time and across switches, while maintaining sufficient accuracy Dynamically adapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy Dynamically adapts tasks TCAM allocationsand configuration over time and across switches, while maintaining sufficient accuracy allocations allocations configuration switches switches sufficient accuracy How to allocateTCAMs for sufficient accuracy? Diminishing Return Which switchesto allocate? Temporal Multiplexing How to adapt TCAM configurationon multiple switches? Spatial Multiplexing

  18. Dynamic TCAM Allocation Allocate TCAM Estimate accuracy Measure Why iterative approach? Enough TCAMs  High accuracy  Satisfied We cannot know the curve for every traffic and task instance     Thus we cannot formulate a one-shot optimization Not enough TCAMs  Low accuracy  Unsatisfied Why estimating accuracy? We don’t have ground-truth Thus we must estimate accuracy

  19. Estimate Accuracy: Heavy Hitter Detection True detected HH True detected + Missed HHs True detected HH Detected HHs Recall = Precision = Is 1 because any detected HH is a true HH Estimate missed HHs

  20. Estimate Recall for Heavy Hitter Detection True detected HH True detected + Missed HHs Recall = Find an upper bound of missed HHs using size and level of internal nodes Threshold=10Mbps 76 With size 26: missed <=2 HHs At level 2: missed <=2 HH 26 50 12 14 15 35 5 7 12 2 0 15 20 15

  21. Allocate TCAM Goal: maintain high task satisfaction Fraction of task’s lifetime with sufficient accuracy How many TCAMs to exchange? Accuracy Accuracy Small  Slow convergence Large  Oscillations Time Time

  22. Avoid Overloading Not enough TCAMs to satisfy all tasks Solutions Reject new tasks Drop existing tasks

  23. Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configurationover time and across switches, while maintaining sufficient accuracy How to allocate TCAMs for sufficient accuracy? Diminishing Returns Which switchesto allocate? Temporal Multiplexing How to adapt TCAM configuration on multiple switches? Spatial Multiplexing

  24. Allocate TCAM: Multiple Switches A task can have traffic from multiple switches Controller 30 HHs Heavy Hitter detection 20 HHs 10 HHs A B • Local accuracy is important • If a task is globally unsatisfied, increasing B’s TCAMs is expensive (diminishing returns) • Global accuracy is important • If a task is globally satisfied, no need to increase A’s TCAMs Use both local and global accuracy

  25. DREAM Modularity Task Independent Task Dependent TCAM Configuration: Divide & Merge Accuracy Estimation TCAM Allocation DREAM

  26. Evaluation: Accuracy and Overhead Accuracy Satisfaction of a task: Fraction of task’s lifetime with sufficient accuracy % of rejected/dropped tasks Overhead How fast is the DREAM control loop?

  27. Evaluation: Alternatives Equal: divide TCAMs equally at each switch, no reject Fixed: fixed fraction of TCAMs, reject extra tasks

  28. Evaluation Setting • Prototype on 8 Open vSwitches • 256 tasks (HH, HHH, CD, combination) • 5 min tasks arriving in 20 mins • Accuracy bound=80% • 5 hours CAIDA trace • Validate simulator using prototype • Large scale simulation (4096 tasks on 32 switches) • accuracy bounds • task loads (arrival rate, duration, switch size) • tasks (task types, task parameters e.g., threshold) • # switches per tasks

  29. Prototype Results: Average Satisfaction DREAM: High satisfaction of tasks at the expense of more rejection for small switches Average Satisfaction # TCAMs in Switch # TCAMs in Switch Fixed: High rejection as over-provisions for small tasks

  30. Prototype Results: 95th Percentile Satisfaction DREAM: High 95th percentile satisfaction 95th Percentile Satisfaction # TCAMs in Switch Equal and Fixed only keep small tasks satisfied

  31. Conclusion Measurement is crucial for SDN management in a resource-constrained environment • Dynamic TCAM allocation across measurement tasks • Diminishing returns in accuracy • Spatial and temporal multiplexing • Future work • More TCAM-based measurement tasks (quintiles for load balancing, entropy detection) • Hash-based measurements DREAM is available at github.com/USC-NSL/DREAM

More Related