This paper explores optimizing network flow measurement accuracy using machine learning frameworks on sketches. It discusses techniques, implementations, and case studies to enhance estimation, detection, and classification of flow attributes. The research aims to address existing challenges in scalability and efficiency for network management needs.
Empowering Sketches with Machine Learning for Network Measurements • Tong Yang, Lun Wang, Peking University, China • Yulong Shen, Xidian University, China • Muhammad Shahzad, North Carolina State University, USA • Qun Huang, ICT, CAS, China • Xiaohong Jiang, Future University Hakodate, Japan • Kun Tan, Huawei, China • Xiaoming Li, Peking University, China • Tong Yang, Peking University, yangtongemail@gmail.com, http://net.pku.edu.cn/~yangtong
Outline • PART 01 Background • PART 02 Machine Learning Framework • PART 03 Implementations • PART 04 Case Studies • PART 05 Experimental Results • PART 06 Conclusion
01 PART ONE Background
01 Background • Network management needs accurate and timely estimates of flow-level metrics: • 1) Flow size • 2) Top-k elephant (large) flows • 3) Cardinality: # flows • Best technique: sketches • 1) Memory efficient • 2) Constant, fast processing speed • 3) High accuracy
01 Background • Existing solutions focus on a trade-off among • 1) accuracy of estimates • 2) memory usage • 3) speed • A great number of works on sketches • 1) Flow size: CM, CU, Count, CSM, Pyramid, TCM • 2) Top-k: sketch+heap, SS, HashPipe, CSS, Cold Filter • 3) Cardinality: bitmap, virtual bitmap, FM • 4) Systems: UnivMon, FlowRadar, SketchVisor, OpenSketch
01 Background • Room for improvement exists because • sketch accuracy varies a lot • with different datasets and parameters • Reason: • 1) Process: insert and then estimate • 2) The estimate makes assumptions about the network traffic • 3) These assumptions do not always hold • Rationale: improve the estimate in the cases where the sketch is not accurate.
02 PART TWO Machine Learning Framework
02 Machine Learning Framework • Two steps: • 1) Sampling • 2) Machine learning • [Figure: framework diagram. Labels: packet set (P), sampling, sampled set (S), building learning sketch, learning sketch, feature extraction, training set generation, training set (T), training, learning, building regular sketch, regular sketch, querying]
03 PART THREE Implementations
03 Implementation • [Figure: MLSketch server architecture. Labels: switch/router, hardware NIC, I/O RX, HW ring, mempool for packets (pointers), SW ring 1, SW ring 2, Core 1 (Thread 1), Core 2 (Thread 2), Core 3 (Thread 3), interface for MLSketch, machine learning, flow IDs, sampled set or sketches, flow IDs or sketches (parameters), original sketches (CM, CU, CSM, FM, …), answer queries]
04 PART FOUR Case Studies
04 Case Studies • 1) Flow Size Estimation • 2) Heavy Hitter Detection • 3) Cardinality: # flows
04 Case Study I: Estimating Flow Size • Sketches: CM, CU, CSM • [Figure: insertion applies +1 to the flow's hashed counters; example counter values 19, 24, 26, 18] • Query: estimated frequency 18 = min{19, 24, 26, 18}
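The insert-then-query process on this slide can be illustrated with a minimal Count-Min (CM) sketch. This is a toy under stated assumptions, not the authors' implementation: the class name `CountMinSketch`, the SHA-256 based row hashes, the sizes `d=4` and `w=64`, and the flow names are all hypothetical choices made for illustration. CU and CSM update or query the counters differently and are not shown.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min sketch: d rows of w counters, one hash function per row."""

    def __init__(self, d=4, w=1024):
        self.d = d
        self.w = w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row, flow_id):
        # Assumption: SHA-256 keyed by the row number stands in for d hash functions.
        digest = hashlib.sha256(f"{row}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def insert(self, flow_id):
        # Insertion: add 1 to the flow's counter in every row.
        for row in range(self.d):
            self.rows[row][self._index(row, flow_id)] += 1

    def counters(self, flow_id):
        # The d hashed counters of this flow, one per row.
        return [self.rows[row][self._index(row, flow_id)] for row in range(self.d)]

    def query(self, flow_id):
        # Query: report the smallest hashed counter.
        return min(self.counters(flow_id))


sketch = CountMinSketch(d=4, w=64)
for _ in range(18):
    sketch.insert("flow-A")            # a flow whose true size is 18
for i in range(200):
    sketch.insert(f"background-{i}")   # other flows that collide with it
estimate = sketch.query("flow-A")
```

Because collisions only ever add to a counter, the CM estimate is never below the true size; the gap between the counters and the true size is the noise that the later slides try to remove.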
04 Case Study I: Estimating Flow Size • For flows with high accuracy, do nothing. • For error-prone flows, machine learning can help. • For the sketches CM, CU, CSM • 1) each flow has $d$ hashed counters • 2) Observation: a flow is error-prone when its two smallest counters v1 and v2 satisfy |v1-v2| > T, for a threshold T
04 Case Study I: Estimating Flow Size • We use linear regression as the machine learning algorithm for two reasons: • 1) In our tests, the actual size is almost a linear combination of the $d$ hashed counters. • 2) It has low time and space overhead.
04 Case Study I: Estimating Flow Size • Feature selection: the $d$ hashed counters. • Hypothesis function: • Smallest counter - noise
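As a rough illustration of fitting a linear model on the $d$ hashed counters, the sketch below solves the least-squares normal equations in plain Python. The slide's own hypothesis function is not preserved in this transcript, so the model form (an intercept plus one weight per counter), the function names `fit_linear` and `predict`, and the synthetic training data are assumptions made for illustration only.

```python
def fit_linear(features, targets):
    """Least-squares fit of target ~ theta[0] + sum(theta[j + 1] * x[j])."""
    rows = [[1.0] + [float(v) for v in x] for x in features]
    n = len(rows[0])
    # Normal equations (X^T X) theta = X^T y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(theta, x):
    # Estimated flow size for one feature vector of d hashed counters.
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


# Synthetic training set (hypothetical): d = 3 counters per sampled flow, and
# "true sizes" generated from a known linear rule so the fit can be checked.
train_x = [(10, 30, 5), (20, 5, 40), (5, 9, 60), (40, 41, 8), (7, 70, 11), (30, 3, 31)]
train_y = [-1 + 0.5 * c1 + 0.25 * c2 + 0.25 * c3 for c1, c2, c3 in train_x]
theta = fit_linear(train_x, train_y)
```

In the framework, the training pairs would come from the sampled flows, whose true sizes are known, rather than from a synthetic rule as here.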
04 Case Study I: Estimating Flow Size • Query: • First check whether the flow is error-prone. • 1) No: use the original method, e.g., report the smallest counter. • 2) Yes: use the hypothesis function to minimize the noise.
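The two-branch query can be sketched as a single function. The error-prone test follows the |v1-v2| > T observation from the earlier slide; the function name `estimate_flow_size`, the weight vector `theta`, and the threshold values are hypothetical, and the weights here are picked by hand rather than learned.

```python
def estimate_flow_size(counters, theta, threshold):
    """Return a flow-size estimate from the flow's d hashed counters.

    counters  : the d hashed counters of the queried flow (d >= 2)
    theta     : assumed linear model, intercept first, then one weight per counter
    threshold : T in the |v1 - v2| > T test
    """
    v1, v2 = sorted(counters)[:2]          # the two smallest counters
    if abs(v1 - v2) <= threshold:
        return v1                          # not error-prone: original method
    # Error-prone: apply the linear hypothesis function to the counters.
    return theta[0] + sum(t * c for t, c in zip(theta[1:], counters))


example = [19, 24, 26, 18]
hand_picked_theta = [-1.0, 0.25, 0.25, 0.25, 0.25]   # illustrative weights only
plain = estimate_flow_size(example, hand_picked_theta, threshold=5)
corrected = estimate_flow_size(example, hand_picked_theta, threshold=0)
```

With `threshold=5` the two smallest counters (18 and 19) are close, so the smallest counter is reported; with `threshold=0` the flow is treated as error-prone and the model output is used instead.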
04 Case Study II: Finding Top-k Flows • [Figure: insertion applies +1 to the hashed counters; a min-heap and a hash table sit alongside the sketch]
04 Case Study II: Finding Top-k Flows • Two types of errors: • 1) Estimation error: same as that of flow sizes • 2) Misclassification error: small flows mistaken for large flows and kept in the heap • Two tasks: • 1) Classification task • 2) Estimation task
04 Case Study II: Finding Top-k Flows • Suspicious flow: a flow that is significantly overestimated and kept in the heap. • Observation: • The size of a suspicious flow rarely increases after it is inserted into the heap. • Method: • Add another counter to every node of the min-heap. • The memory overhead is very small.
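One way to picture the additional per-node counter is sketched below. The slide does not say exactly what the counter records; this sketch assumes it counts how often a flow is seen again after admission, which matches the observation that suspicious flows rarely grow once in the heap. For brevity a dict with a linear scan for the minimum replaces a real min-heap, and the name `TopKStore` is hypothetical.

```python
class TopKStore:
    """Top-k candidates, each with an estimated size and an additional counter."""

    def __init__(self, k):
        self.k = k
        # flow_id -> [estimated_size, hits_since_admission]
        self.entries = {}

    def offer(self, flow_id, estimate):
        """Call once per packet with the sketch's current estimate for the flow."""
        entry = self.entries.get(flow_id)
        if entry is not None:
            entry[0] = estimate
            entry[1] += 1          # the flow kept growing after admission
            return
        if len(self.entries) < self.k:
            self.entries[flow_id] = [estimate, 0]
            return
        # Evict the smallest candidate if the newcomer's estimate is larger.
        smallest = min(self.entries, key=lambda f: self.entries[f][0])
        if estimate > self.entries[smallest][0]:
            del self.entries[smallest]
            self.entries[flow_id] = [estimate, 0]


store = TopKStore(k=2)
store.offer("a", 10)
store.offer("a", 11)   # "a" is seen again: additional counter becomes 1
store.offer("b", 5)
store.offer("c", 7)    # evicts "b", the smallest candidate
```

An entry whose additional counter stays near zero while its estimate is large is a candidate suspicious flow under this assumption.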
04 Case Study II: Estimating Top-k Flows • Classification task • Regular sketch, Learning sketch, learning min-heap • Algorithm: logistic regression/SVM • Features: hashed counters, additional counters in heap • class label: whether the flow should be in heap.
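The classification task can be sketched with a small logistic regression trained by batch gradient descent. Everything concrete here is an assumption for illustration: the two scaled features (a hashed counter and the additional heap counter), the toy labels, the learning rate, and the function names. The slide's own hypothesis function and training setup are not reproduced.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(theta, x):
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic loss; returns [bias, w1, ..., wn]."""
    n = len(features[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(features, labels):
            err = sigmoid(score(theta, x)) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        theta = [t - lr * g / len(features) for t, g in zip(theta, grad)]
    return theta


def keep_in_heap(theta, x):
    # Class label 1: the flow should stay in the heap.
    return sigmoid(score(theta, x)) >= 0.5


# Toy data: (scaled hashed counter, scaled additional heap counter).
# Label 1 = genuine large flow, label 0 = suspicious flow that stopped growing.
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.8, 0.0), (0.9, 0.1), (0.7, 0.0)]
train_y = [1, 1, 1, 0, 0, 0]
theta = train_logistic(train_x, train_y)
```

In this toy set the hashed counter alone cannot separate the classes, so the learned boundary relies on the additional counter, which is the role the slides assign to it.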
04 Case Study II: Estimating Top-k Flows • Hypothesis function:
04 Case Study II: Estimating Top-k Flows • Estimation task • Algorithm: linear regression • Feature Selection: the d hashed counters
04 Case Study II: Estimating Top-k Flows • Queries have two steps: • 1) Eliminate suspicious flows • 2) Estimate the size of the remaining flows using the linear regression model
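04 Case Study III: Estimating #Flows • The FM sketch: • [Figure: hash functions h1, h2, h3, … map an element e into bit arrays, e.g., 0001011111, 0001001111, 0010100111, drawn from the most significant (high) bit to the least significant (low) bit; mapping probabilities 0.125, 0.25, 0.5; $L_1$ marks the low bit of the first array] • Estimate: $1.2928 \times 2^{\mathrm{average}(L_i)}$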
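The FM estimate above can be reproduced with a small Flajolet-Martin style sketch. This is a minimal illustration under assumptions: `FMSketch`, the SHA-256 based hashes, and the sizes `d=64` and `bits=32` are hypothetical choices, and $L_i$ is taken to be the index of the lowest zero bit in bitmap $i$. The constant 1.2928 is approximately 1/0.77351, the usual Flajolet-Martin correction factor.

```python
import hashlib

CORRECTION = 1.2928  # approximately 1 / 0.77351


class FMSketch:
    """d bitmaps; an element sets bit j of each bitmap with probability 2^-(j+1)."""

    def __init__(self, d=64, bits=32):
        self.d = d
        self.bits = bits
        self.bitmaps = [0] * d

    def _hash(self, i, flow_id):
        # Assumption: SHA-256 keyed by the bitmap index stands in for h1, h2, ...
        digest = hashlib.sha256(f"{i}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def insert(self, flow_id):
        for i in range(self.d):
            h = self._hash(i, flow_id)
            # Number of trailing zero bits of the hash value.
            rho = (h & -h).bit_length() - 1 if h else self.bits - 1
            self.bitmaps[i] |= 1 << min(rho, self.bits - 1)

    def low_bits(self):
        # L_i: index of the lowest zero bit in bitmap i.
        return [((~b) & (b + 1)).bit_length() - 1 for b in self.bitmaps]

    def estimate(self):
        levels = self.low_bits()
        return CORRECTION * 2 ** (sum(levels) / len(levels))


fm = FMSketch(d=64)
for i in range(1000):
    fm.insert(f"flow-{i}")
cardinality = fm.estimate()
```

Inserting the same flow again sets the same bits, so duplicates do not change the estimate, which is what makes the structure suitable for counting distinct flows.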
04 Case Study III: Estimating #Flows • Algorithm: linear regression • Feature selection: • 1) $L_i$: the $d$ low bits • 2) $H_i$: the $d$ high bits • Hypothesis function:
05 PART FIVE Experimental Results
05 Experiments (Setup) • Real traffic: a tier-1 router • Flow ID: 5-tuple • 10 minutes of traffic every hour over two days • # packets in each trace: [876303, 1124480] • 41.81% of flows have one packet • Large flows have ~30000 packets • Average flow size: [8.67, 10.04], deviation: [102.7, 146.9] • Metrics: AAE, ARE
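The slide names the two metrics without defining them. The sketch below uses the definitions common in the sketch literature, which is an assumption about what the authors computed: AAE is the mean of |true - estimated| over the queried flows, and ARE is the mean of |true - estimated| / true. The function names and sample values are hypothetical.

```python
def average_absolute_error(true_sizes, estimates):
    """AAE: mean of |true - estimated| over all flows in true_sizes."""
    return sum(abs(true_sizes[f] - estimates[f]) for f in true_sizes) / len(true_sizes)


def average_relative_error(true_sizes, estimates):
    """ARE: mean of |true - estimated| / true over all flows in true_sizes."""
    return sum(abs(true_sizes[f] - estimates[f]) / true_sizes[f]
               for f in true_sizes) / len(true_sizes)


truth = {"a": 10, "b": 20}
guess = {"a": 12, "b": 20}
aae = average_absolute_error(truth, guess)   # (2 + 0) / 2
are = average_relative_error(truth, guess)   # (2/10 + 0/20) / 2
```

Since 41.81% of the flows in these traces have a single packet, ARE is dominated by small flows: overestimating a one-packet flow by 5 contributes a relative error of 5.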
05 Experimental Results • 1) Results on Flow Size Estimation • the CM sketch • the CSM sketch • the CU sketch • 2) Results on Heavy Hitter Detection • 3) Results on Cardinality: # flows
05 Results on Flow Size (CM sketch) • CM: ARE improvements: [5.22, 7.41], mean: 6.32
05 Results on Flow Size (CM sketch) • CM: AAE improvements: [2.65, 3.31], mean: 2.95
05 Results on Flow Size (CSM sketch) • CSM: ARE improvements: [26.89, 44.98], mean: 35.75
05 Results on Flow Size (CSM sketch) • CSM: AAE improvements: [10.68, 12.42], mean: 11.47
05 Results on Flow Size (CU sketch) • CU: ARE improvements: [2.89, 3.27], mean: 3.15
05 Results on Flow Size (CU sketch) • CU: AAE improvements: [2.04, 2.22], mean: 2.13
05 Results on Flow size (CU sketch) Training once vs. Training always
05 Results on Top-k elephant flows Suspicious flows: ARE: 100 ~ 350
05 Results on Top-k elephant flows • AAE improvements: [1.11, 2.59], mean: 1.75
05 Results on Top-k elephant flows • ARE improvements: [14.89, 202.15], mean: 75.36
05 Results on Cardinality • ARE improvements: up to 200 times
05 Results on Cardinality • ARE improvements: mean: up to 49%
06 PART SIX Conclusion
06 Conclusion • 1) We propose the idea of applying ML to sketches for the first time, and propose a generic framework. • 2) We present three case studies: flow sizes, top-k flows, and cardinality. • 3) We implement our framework and perform many experiments using real traffic.
THANKS Tong Yang Peking University, China Email: yangtongemail@gmail.com Homepage: http://net.pku.edu.cn/~yangtong/