
Empowering Sketches with Machine Learning for Network Measurements

This presentation explores how applying a machine learning framework to sketches can improve the accuracy of network flow measurements. It covers the framework, its implementation, and three case studies on flow size estimation, top-k (heavy hitter) detection, and cardinality estimation. The work targets the accuracy limitations of existing sketches while keeping the memory efficiency and speed that network management requires.

Presentation Transcript


  1. Empowering Sketches with Machine Learning for Network Measurements. Tong Yang, Lun Wang, Peking University, China; Yulong Shen, Xidian University, China; Muhammad Shahzad, North Carolina State University, USA; Qun Huang, ICT, CAS, China; Xiaohong Jiang, Future University Hakodate, Japan; Kun Tan, Huawei, China; Xiaoming Li, Peking University, China. Contact: Tong Yang, Peking University, yangtongemail@gmail.com, http://net.pku.edu.cn/~yangtong

  2. Outline • PART 01 Background • PART 02 Machine Learning Framework • PART 03 Implementations • PART 04 Case Studies • PART 05 Experimental Results • PART 06 Conclusion

  3. 01 PART ONE Background

  4. 01 Background • Network management needs • Accurate and timely estimates of flow-level metrics: • 1) Flow size • 2) Top-k elephant (large) flows • 3) Cardinality: # flows • Best technique: sketches • 1) Memory efficient • 2) Fast, constant-time processing • 3) High accuracy

  5. 01 Background • Existing solutions focus on the trade-off among • 1) accuracy of estimates, • 2) memory usage, • 3) and speed. • There is a large body of work on sketches: • 1) Flow size: CM, CU, Count, CSM, Pyramid, TCM • 2) Top-k: sketch+heap, SS, HashPipe, CSS, Cold Filter • 3) Cardinality: bitmap, virtual bitmap, FM • 4) Systems: UnivMon, FlowRadar, SketchVisor, OpenSketch

  6. 01 Background • There is room for improvement because • sketch accuracy varies widely • across datasets and parameter settings. • Reason: • 1) A sketch first inserts packets and then estimates. • 2) The estimation step makes assumptions about network traffic. • 3) These assumptions do not always hold. • Rationale: improve the estimates where they are inaccurate.

  7. 02 PART TWO Machine Learning Framework

  8. 02 Machine Learning Framework • Two steps: • 1) Sampling • 2) Machine learning • [Figure: framework workflow with the components Packet set (P), Sampling, Sampled set (S), Building regular sketch, Regular sketch, Feature extraction, Training set generation, Training set (T), Training, Building learning sketch, Learning sketch, Learning, and Querying.]
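The sampling step can be illustrated with a minimal sketch of training-set generation, assuming hash-based flow sampling (every packet of a sampled flow is kept, so its exact size is known) and a caller-supplied `counters_of(flow_id)` that returns the regular sketch's features for a flow. Both helper names and the sampling scheme are hypothetical; the slides do not specify how flows are sampled.

```python
import zlib
from collections import Counter

def is_sampled(flow_id, sample_rate):
    """Hypothetical hash-based flow sampling: the decision depends only on the
    flow ID, so either all or none of a flow's packets are sampled."""
    bucket = zlib.crc32(str(flow_id).encode()) % 10_000
    return bucket < sample_rate * 10_000

def build_training_set(packets, counters_of, sample_rate=0.05):
    """Pair each sampled flow's sketch features (counters_of) with its exact
    size, which serves as the training label."""
    exact = Counter(fid for fid in packets if is_sampled(fid, sample_rate))
    return [(counters_of(fid), size) for fid, size in exact.items()]
```

With `sample_rate=1.0` every flow is sampled, which is convenient for testing; in practice only a small fraction would be kept.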

  9. 03 PART Three Implementations

  10. 03 Implementation • [Figure: implementation interfaces of the MLSketch server. Packets arrive from a switch/router at the hardware NIC (HW ring, I/O RX) and are held in a mempool for packets (pointers). Core 1 (Thread 1), Core 2 (Thread 2), and Core 3 (Thread 3) exchange flow IDs or sketches (parameters) through SW Ring 1 and SW Ring 2. The components include the original CM, CU, CSM, FM, … sketches, a machine learning module working on the sampled set or sketches, and an interface for MLSketch that answers queries.]
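The ring-based hand-off between cores can be sketched with Python's `queue.Queue` standing in for a bounded software ring and two threads standing in for cores. This is a minimal sketch under stated assumptions: the function names are hypothetical, only one ring is modeled, and a plain `Counter` stands in for the real sketches.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # marks the end of the packet stream

def rx_thread(flow_ids, ring):
    """Stand-in for the RX core: pushes flow IDs onto the software ring."""
    for fid in flow_ids:
        ring.put(fid)  # blocks when the ring is full
    ring.put(SENTINEL)

def sketch_thread(ring, table):
    """Stand-in for the sketch core: drains the ring and updates a table."""
    while True:
        fid = ring.get()
        if fid is SENTINEL:
            break
        table[fid] += 1

def run_pipeline(flow_ids, ring_size=1024):
    ring = queue.Queue(maxsize=ring_size)  # bounded queue models a fixed-size ring
    table = Counter()
    workers = [threading.Thread(target=rx_thread, args=(flow_ids, ring)),
               threading.Thread(target=sketch_thread, args=(ring, table))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return table
```

Decoupling reception from sketch updates this way lets each stage run on its own core, with the ring absorbing short bursts.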

  11. 04 PART Four Case Studies

  12. 04 Case Studies • 1) Flow Size Estimation • 2) Heavy Hitter Detection • 3) Cardinality: # flows

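  13. 04 Case Study I: Estimating Flow Size • [Figure: insertion into the CM, CU, and CSM sketches; each packet adds +1 to its flow's hashed counters, which reach 19, 24, 26, and 18 in the example.] • Query (frequency): 18 = min{19, 24, 26, 18}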
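The insertion and query in the example follow the Count-Min (CM) sketch, sketched minimally below. The per-row hash (CRC32 of the flow ID salted with the row number) and the default sizes are assumptions for illustration; CU and CSM differ in how they update and query.

```python
import zlib

class CountMinSketch:
    """Count-Min sketch: d rows of w counters, one hash function per row."""

    def __init__(self, d=4, w=1024):
        self.d, self.w = d, w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row, flow_id):
        # Assumed per-row hash: CRC32 of the flow ID salted with the row number.
        return zlib.crc32(f"{row}:{flow_id}".encode()) % self.w

    def counters(self, flow_id):
        """The d hashed counters of a flow."""
        return [self.rows[r][self._index(r, flow_id)] for r in range(self.d)]

    def insert(self, flow_id, count=1):
        for r in range(self.d):
            self.rows[r][self._index(r, flow_id)] += count

    def query(self, flow_id):
        # Collisions only add to counters, so the minimum is the tightest bound.
        return min(self.counters(flow_id))
```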

  14. 04 Case Study I: Estimating Flow Size • For flows that are already estimated accurately, do nothing. • For error-prone flows, machine learning can help. • For the CM, CU, and CSM sketches: • 1) Each flow maps to $d$ hashed counters. • 2) Observation: a flow is error-prone when the two smallest of its counters, $v_1$ and $v_2$, differ by more than a threshold $T$, i.e., $|v_1 - v_2| > T$.
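The error-prone test $|v_1 - v_2| > T$ is a one-liner over a flow's $d$ counters; the function name is hypothetical and `threshold` plays the role of $T$.

```python
def is_error_prone(counters, threshold):
    """True if the two smallest hashed counters differ by more than T."""
    if len(counters) < 2:
        raise ValueError("need at least two hashed counters")
    v1, v2 = sorted(counters)[:2]
    return abs(v1 - v2) > threshold
```

For the counters {19, 24, 26, 18} from the previous slide, the two smallest are 18 and 19, so the flow is not error-prone for any $T \ge 1$.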

  15. 04 Case Study I: Estimating Flow Size • We use linear regression as our machine learning algorithm for two reasons: • 1) In our many tests, the actual flow size is almost a linear combination of the $d$ hashed counters. • 2) It has low time and space overhead.

  16. 04 Case Study I: Estimating Flow Size • Feature selection: the $d$ hashed counters. • Hypothesis function: the smallest counter minus the noise.
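Since the slide's exact hypothesis is not reproduced here, the sketch below assumes a plain linear model with an intercept, $h_\theta(\mathbf{x}) = \theta_0 + \sum_{i=1}^{d} \theta_i x_i$, where $x_i$ are the $d$ hashed counters, fitted by least squares with NumPy. The function names are hypothetical.

```python
import numpy as np

def fit_linear(features, sizes):
    """Least-squares fit of size ~ theta_0 + sum_i theta_i * x_i over the
    d hashed counters of each sampled flow."""
    X = np.asarray(features, dtype=float)
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column for theta_0
    theta, *_ = np.linalg.lstsq(X, np.asarray(sizes, dtype=float), rcond=None)
    return theta

def predict(theta, counters):
    """Apply the fitted hypothesis to one flow's counters."""
    return float(theta[0] + np.dot(theta[1:], counters))
```

The training pairs would come from the sampled flows, with the exact sizes as labels.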

  17. 04 Case Study I: Estimating Flow Size • Query: • First, check whether the flow is error-prone. • 1) If not, use the original method, e.g., report the smallest counter. • 2) If so, use the hypothesis function to remove the noise.
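The two-branch query can be sketched as follows, assuming `theta` holds an intercept followed by one weight per counter (as a linear-regression fit would produce). The function name and weight layout are assumptions for illustration.

```python
def estimate_size(counters, threshold, theta):
    """Accurate flows keep the regular answer (the smallest counter);
    error-prone flows use the learned linear hypothesis instead."""
    v1, v2 = sorted(counters)[:2]
    if abs(v1 - v2) <= threshold:
        return v1  # original method: report the smallest counter
    # theta[0] is the intercept, theta[1:] weight the d hashed counters.
    return theta[0] + sum(t * c for t, c in zip(theta[1:], counters))
```

Because only error-prone flows go through the model, accurate flows pay no extra cost and cannot be made worse by it.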

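  18. 04 Case Study II: Finding Top-k Flows • [Figure: insertion into a sketch (+1 to the flow's hashed counters) combined with a min-heap and a hash table that hold the current top-k flows.]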
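The sketch+heap scheme can be illustrated with a minimal sketch that pairs a Count-Min sketch with Python's `heapq` min-heap and a dict acting as the hash table from flow ID to heap entry. The class name, hashing, and sizes are assumptions; re-heapifying after an in-place update costs O(k) and is chosen for brevity, not speed.

```python
import heapq
import zlib

class SketchHeapTopK:
    """CM sketch estimates every flow; a min-heap keeps the current top-k."""

    def __init__(self, k, d=4, w=1024):
        self.k, self.d, self.w = k, d, w
        self.rows = [[0] * w for _ in range(d)]
        self.heap = []   # entries are [estimate, flow_id]; smallest at heap[0]
        self.where = {}  # hash table: flow_id -> its heap entry

    def _insert_and_estimate(self, flow_id):
        vals = []
        for r in range(self.d):
            i = zlib.crc32(f"{r}:{flow_id}".encode()) % self.w
            self.rows[r][i] += 1
            vals.append(self.rows[r][i])
        return min(vals)

    def insert(self, flow_id):
        est = self._insert_and_estimate(flow_id)
        if flow_id in self.where:
            self.where[flow_id][0] = est
            heapq.heapify(self.heap)  # restore heap order after the update
        elif len(self.heap) < self.k:
            entry = [est, flow_id]
            self.where[flow_id] = entry
            heapq.heappush(self.heap, entry)
        elif est > self.heap[0][0]:
            entry = [est, flow_id]
            evicted = heapq.heapreplace(self.heap, entry)  # drop the smallest
            del self.where[evicted[1]]
            self.where[flow_id] = entry

    def top(self):
        """Current top-k as (flow_id, estimate), largest first."""
        return sorted(((f, e) for e, f in self.heap),
                      key=lambda fe: fe[1], reverse=True)
```

A small flow whose counters collide with large ones can get an inflated estimate and enter the heap, which is the misclassification error discussed next.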

  19. 04 Case Study II: Finding Top-k Flows • Two types of errors: • 1) Estimation error: the same as for flow sizes. • 2) Misclassification error: small flows are mistaken for large flows and kept in the heap. • Two tasks: • 1) Classification task • 2) Estimation task

  20. 04 Case Study II: Finding Top-k Flows • Suspicious flow: a flow that is significantly overestimated and kept in the heap. • Observation: • The size of a suspicious flow rarely increases after it is inserted into the heap. • Method: • Add an extra counter to every node of the min-heap. • The memory overhead is very small.
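One way to realize the extra per-node counter is sketched below, assuming it counts packets seen after the flow entered the heap; a suspicious flow would then show a small count relative to its estimate. The `HeapNode` type and `features` helper are hypothetical, and the feature layout (hashed counters followed by the extra counter) follows the feature list on the next slide.

```python
from dataclasses import dataclass

@dataclass
class HeapNode:
    flow_id: str
    estimate: int          # sketch estimate, as in a regular min-heap node
    since_insert: int = 0  # extra counter: packets seen after entering the heap

    def on_packet(self, new_estimate):
        """Update the node when another packet of this flow arrives."""
        self.estimate = new_estimate
        self.since_insert += 1

def features(node, counters):
    """Classifier features: the d hashed counters plus the extra counter."""
    return list(counters) + [node.since_insert]
```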

  21. 04 Case Study II: Estimating Top-k Flows • Classification task • Regular sketch, Learning sketch, learning min-heap • Algorithm: logistic regression/SVM • Features: hashed counters, additional counters in heap • class label: whether the flow should be in heap.

  22. 04 Case Study II: Estimating Top-k Flows • Hypothesis function:
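The slide names logistic regression or an SVM; the minimal sketch below assumes logistic regression with the standard hypothesis $h_\theta(\mathbf{x}) = 1 / (1 + e^{-(\theta_0 + \sum_i \theta_i x_i)})$, read as the probability that a flow belongs in the heap, trained by batch gradient descent. Function names, learning rate, and epoch count are hypothetical; features are assumed to be scaled to a small range.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """Estimated probability that the flow should be in the heap."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the cross-entropy loss."""
    theta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, label in zip(X, y):
            err = hypothesis(theta, x) - label
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta
```

A flow would be labeled "keep in heap" when `hypothesis(theta, x) >= 0.5`.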

  23. 04 Case Study II: Estimating Top-k Flows • Estimation task • Algorithm: linear regression • Feature Selection: the d hashed counters

  24. 04 Case Study II: Estimating Top-k Flows • Queries have two steps: • Eliminate suspicious flows • Estimate the size of the remaining flows using linear regression model
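The two-step query can be sketched with the classifier and the regression model passed in as callables, so any trained models can be plugged in. The function name and the dict-of-features input are assumptions for illustration.

```python
def query_top_k(candidates, is_suspicious, estimate):
    """candidates: flow_id -> feature vector for each flow in the heap.
    Step 1 drops suspicious flows; step 2 re-estimates the rest."""
    kept = {f: x for f, x in candidates.items() if not is_suspicious(x)}
    sized = [(f, estimate(x)) for f, x in kept.items()]
    return sorted(sized, key=lambda fe: fe[1], reverse=True)
```

For example, with features laid out as hashed counters followed by the extra heap counter, a hypothetical rule that flags flows with fewer than 5 packets since insertion would remove flow `b` below.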

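  25. 04 Case Study III: Estimating #Flows • The FM sketch: [Figure: $d$ bitmaps, one per hash function ($h_1$, $h_2$, $h_3$, …), drawn from the most significant bit (left) to the least significant bit (right). A flow $e$ sets one bit in each bitmap, landing on bit 0, 1, 2, … with probability 0.5, 0.25, 0.125, …. Example bitmaps: 0001011111, 0001001111, 0010100111, with the low bit $L_i$ and the high bit $H_i$ marked on each.] • Estimate: $\#\text{flows} \approx 1.2928 \cdot 2^{\frac{1}{d}\sum_{i=1}^{d} L_i}$, where $1.2928 \approx 1/0.77351$ is the FM correction factor.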
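A minimal FM sketch under stated assumptions: $d$ bitmaps each updated by its own seeded 32-bit hash (BLAKE2b from `hashlib`, truncated to 4 bytes), $L_i$ taken as the index of the lowest 0 bit, and the estimate $1.2928 \cdot 2^{\text{average}(L_i)}$ from the slide. The class and helper names are hypothetical.

```python
import hashlib

PHI_INV = 1.2928  # about 1 / 0.77351, the Flajolet-Martin correction factor

def _hash32(seed, flow_id):
    data = f"{seed}:{flow_id}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "little")

def rho(value, bits):
    """Index of the least-significant 1 bit: bit i is hit with prob. 2^-(i+1)."""
    if value == 0:
        return bits - 1
    return min((value & -value).bit_length() - 1, bits - 1)

class FMSketch:
    """d FM bitmaps, each with its own seeded hash function."""

    def __init__(self, d=64, bits=32):
        self.d, self.bits = d, bits
        self.maps = [0] * d

    def insert(self, flow_id):
        for i in range(self.d):
            self.maps[i] |= 1 << rho(_hash32(i, flow_id), self.bits)

    def low_bits(self):
        """L_i: index of the lowest 0 bit of bitmap i."""
        return [(~m & (m + 1)).bit_length() - 1 for m in self.maps]

    def estimate(self):
        """#flows ~ 1.2928 * 2^(average of L_i)."""
        return PHI_INV * 2 ** (sum(self.low_bits()) / self.d)
```

Re-inserting a flow sets bits that are already set, so duplicates do not change the estimate; this is why the FM sketch counts distinct flows.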

  26. 04 Case Study III: Estimating #Flows • Algorithm: linear regression • Feature selection: • 1) $L_i$: the $d$ low bits • 2) $H_i$: the $d$ high bits • Hypothesis function:
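Feature extraction for the cardinality regression can be sketched as below, assuming $L_i$ is the index of the lowest 0 bit and $H_i$ the index of the highest 1 bit of bitmap $i$; the slides mark both positions but do not define $H_i$ in text, so that reading is an assumption. The function names are hypothetical.

```python
def low_bit(bitmap):
    """L_i: index of the lowest 0 bit."""
    return (~bitmap & (bitmap + 1)).bit_length() - 1

def high_bit(bitmap):
    """H_i (assumed): index of the highest 1 bit, -1 for an empty bitmap."""
    return bitmap.bit_length() - 1

def cardinality_features(bitmaps):
    """Feature vector [L_1..L_d, H_1..H_d] for a linear-regression estimator."""
    return [low_bit(b) for b in bitmaps] + [high_bit(b) for b in bitmaps]
```

Applied to the three example bitmaps from the previous slide (0001011111, 0001001111, 0010100111), this yields $L = (5, 4, 3)$ and $H = (6, 6, 7)$.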

  27. 05 PART Five Experimental Results

  28. 05 Experiments (Setup) • Real traffic from a tier-1 router • Flow ID: 5-tuple • 10 minutes of traffic captured every hour over two days • Packets per trace: [876,303, 1,124,480] • 41.81% of flows have only one packet • Large flows have about 30,000 packets • Average flow size: [8.67, 10.04], deviation: [102.7, 146.9] • Metrics: AAE (average absolute error), ARE (average relative error)
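For reference, the two metrics in their usual form are $\text{AAE} = \frac{1}{n}\sum_{f} |\hat{s}_f - s_f|$ and $\text{ARE} = \frac{1}{n}\sum_{f} |\hat{s}_f - s_f| / s_f$ over the $n$ evaluated flows; the slides do not spell them out, so the sketch below assumes these standard definitions.

```python
def aae(true_sizes, estimates):
    """Average absolute error: mean of |estimate - true| over flows."""
    return sum(abs(estimates[f] - s) for f, s in true_sizes.items()) / len(true_sizes)

def are(true_sizes, estimates):
    """Average relative error: mean of |estimate - true| / true over flows."""
    return sum(abs(estimates[f] - s) / s for f, s in true_sizes.items()) / len(true_sizes)
```

ARE weights every flow equally regardless of size, so with 41.81% single-packet flows, small overestimates on tiny flows dominate it.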

  29. 05 Experimental Results • 1) Results on Flow Size Estimation • the CM sketch • the CSM sketch • the CU sketch • 2) Results on Heavy Hitter Detection • 3) Results on Cardinality: # flows


  31. 05 Results on Flow Size (CM sketch) CM: ARE improvement: [5.22, 7.41], mean: 6.32

  32. 05 Results on Flow Size (CM sketch) CM: AAE improvement: [2.65, 3.31], mean: 2.95

  33. 05 Results on Flow Size (CSM sketch) CSM: ARE improvement: [26.89, 44.98], mean: 35.75

  34. 05 Results on Flow Size (CSM sketch) CSM: AAE improvement: [10.68, 12.42], mean: 11.47

  35. 05 Results on Flow Size (CU sketch) CU: ARE improvement: [2.89, 3.27], mean: 3.15

  36. 05 Results on Flow Size (CU sketch) CU: AAE improvement: [2.04, 2.22], mean: 2.13

  37. 05 Results on Flow size (CU sketch) Training once vs. Training always


  39. 05 Results on Top-k Elephant Flows Suspicious flows: ARE: 100 to 350

  40. 05 Results on Top-k Elephant Flows AAE improvement: [1.11, 2.59], mean: 1.75

  41. 05 Results on Top-k Elephant Flows ARE improvement: [14.89, 202.15], mean: 75.36


  43. 05 Results on Cardinality ARE improvement: up to 200 times

  44. 05 Results on Cardinality ARE improvement: mean up to 49%

  45. 06 PART SIX Conclusion

  46. 06 Conclusion • 1) We propose, for the first time, the idea of applying ML to sketches, together with a generic framework. • 2) We present three case studies: flow sizes, top-k flows, and cardinality. • 3) We implement our framework and perform extensive experiments using real traffic.

  47. THANKS Tong Yang Peking University, China Email: yangtongemail@gmail.com Homepage: http://net.pku.edu.cn/~yangtong/
