Empowering Sketches with Machine Learning for Network Measurements

Tong Yang, Lun Wang, Peking University, China
Yulong Shen, Xidian University, China
Muhammad Shahzad, North Carolina State University, USA
Qun Huang, ICT, CAS, China
Xiaohong Jiang, Future University Hakodate, Japan
Kun Tan, Huawei, China
Xiaoming Li, Peking University, China
Contact: Tong Yang, Peking University, yangtongemail@gmail.com, http://net.pku.edu.cn/~yangtong

Outline
PART 01 Background
PART 02 Machine Learning Framework
PART 03 Implementations
PART 04 Case Studies
PART 05 Experimental Results
PART 06 Conclusion

01 PART ONE Background

01 Background
• Network management needs accurate and timely estimates of flow-level metrics:
  1) Flow size
  2) Top-k elephant (large) flows
  3) Cardinality: # flows
• Best technique: sketches
  1) Memory efficient
  2) Constant, fast processing speed
  3) High accuracy

01 Background
• Existing solutions focus on a trade-off among:
  1) accuracy of estimates
  2) memory usage
  3) speed
• A great number of works on sketches:
  1) Flow size: CM, CU, Count, CSM, Pyramid, TCM
  2) Top-k: sketch+heap, SS, HashPipe, CSS, Cold Filter
  3) Cardinality: bitmap, virtual bitmap, FM
  4) Systems: UnivMon, FlowRadar, SketchVisor, OpenSketch

01 Background
• Room for improvement exists because sketch accuracy varies widely across datasets and parameters.
• Reason:
  1) The process is: insert, then estimate.
  2) The estimate makes assumptions about the network traffic.
  3) Those assumptions do not always hold.
• Rationale: improve the estimate where the sketch is not accurate.

02 PART TWO Machine Learning Framework

02 Machine Learning Framework
• Two steps:
  1) Sampling
  2) Machine learning
[Figure: framework diagram. Labels: Packet set (P), Sampling, Sampled set (S), Building learning sketch, Learning sketch, Feature extraction, Training set generation, Training set (T), Training, Learning, Building regular sketch, Regular sketch, Querying]

03 PART THREE Implementations

03 Implementation
[Figure: implementation architecture. Labels: Switch/Router, Hardware NIC, I/O RX, HW Ring, SW Ring 1, SW Ring 2, Mempool for packets (pointers), Server/MLSketch, Interface for MLSketch, Core 1 (Thread 1), Core 2 (Thread 2), Core 3 (Thread 3), Flow IDs, Flow IDs or sketches (parameters), Sampled set or sketches, Machine learning, CM, CU, CSM, FM … sketches, Original sketches, Answer queries]

04 PART FOUR Case Studies

04 Case Studies
  1) Flow Size Estimation
  2) Heavy Hitter Detection
  3) Cardinality: # flows

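04 Case Study I: Estimating Flow Size
[Figure: insertion into the CM, CU, and CSM sketches. Each insertion adds 1 to the flow's hashed counters; the example counters hold 19, 24, 26, and 18.]
• Query: frequency = Min{19, 24, 26, 18} = 18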
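The insert-then-take-the-minimum behavior shown for the CM sketch can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `CountMinSketch`, the defaults `d=4` and `w=1024`, and the salted BLAKE2b hashing are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min sketch: d rows of w counters, one hash per row."""

    def __init__(self, d=4, w=1024):
        self.d = d
        self.w = w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, flow_id, row):
        # Assumption: derive a per-row hash by salting the flow ID with the row number.
        digest = hashlib.blake2b(f"{row}:{flow_id}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.w

    def counters(self, flow_id):
        # The d hashed counters of a flow (later used as regression features).
        return [self.rows[r][self._index(flow_id, r)] for r in range(self.d)]

    def insert(self, flow_id):
        # Add 1 to each of the flow's d hashed counters.
        for r in range(self.d):
            self.rows[r][self._index(flow_id, r)] += 1

    def query(self, flow_id):
        # CM reports the smallest hashed counter, e.g. min(19, 24, 26, 18) = 18.
        return min(self.counters(flow_id))
```

Because counters only grow and collisions only add to them, `query` can overestimate a flow but never underestimate it; that overestimate is the noise the learning step tries to remove.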

04 Case Study I: Estimating Flow Size
• For flows with high accuracy, do nothing.
• For error-prone flows, machine learning can help.
• For the CM, CU, and CSM sketches:
  1) Each flow maps to $d$ hashed counters.
  2) Observation: a flow is error-prone when its two smallest counters v1 and v2 satisfy |v1 - v2| > T, for a threshold T.

04 Case Study I: Estimating Flow Size
• We use linear regression as the machine learning algorithm for two reasons:
  1) In our tests, the actual size is almost a linear combination of the $d$ hashed counters.
  2) It has low time and space overhead.

04 Case Study I: Estimating Flow Size
• Feature selection: the $d$ hashed counters.
• Hypothesis function:
• Smallest counter - noise
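The hypothesis function itself did not survive on this slide. One plausible form, assumed here rather than taken from the deck, is an ordinary linear model over the $d$ hashed counters, $h_\theta(c) = \theta_0 + \sum_{i=1}^{d} \theta_i c_i$, fitted by least squares on flows whose true sizes are known. The helper names `fit_linear` and `predict` are hypothetical, and the solver is a plain normal-equation solve using only the standard library.

```python
def fit_linear(features, targets):
    """Least-squares fit of targets ~ theta[0] + sum(theta[i + 1] * x[i])."""
    rows = [[1.0] + [float(v) for v in x] for x in features]
    n = len(rows[0])
    # Normal equations (X^T X) theta = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict(theta, x):
    """Evaluate the assumed linear hypothesis on one feature vector."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))
```

In this reading, each training row would be the $d$ counters of a sampled flow and the target its exact size; the model is small enough to evaluate per query at negligible cost, which matches the low-overhead argument for linear regression.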

04 Case Study I: Estimating Flow Size
• Query: first check whether the flow is error-prone.
  1) No: use the original method, e.g., report the smallest counter.
  2) Yes: use the hypothesis function to minimize noise.
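The two-branch query can be sketched as follows. This is an illustration under assumptions: the function name `query_flow_size` is hypothetical, `theta` is taken to be a linear model (intercept first) trained on counters sorted in ascending order, and the error-prone test is the |v1 - v2| > T rule on the two smallest counters.

```python
def query_flow_size(counters, theta, threshold):
    """Return a flow-size estimate from a flow's d hashed counters."""
    ordered = sorted(counters)
    v1, v2 = ordered[0], ordered[1]
    if abs(v1 - v2) <= threshold:
        # Not error-prone: keep the sketch's original answer (smallest counter).
        return v1
    # Error-prone: fall back to the learned linear hypothesis.
    return theta[0] + sum(t * c for t, c in zip(theta[1:], ordered))
```

For the counters 19, 24, 26, 18 the two smallest values differ by 1, so any threshold of 1 or more leaves the original answer of 18 untouched.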

04 Case Study II: Finding Top-k Flows
[Figure: insertion. Labels: min-heap, hash table; each insertion adds 1 to the flow's hashed counters.]

04 Case Study II: Finding Top-k Flows
• Two types of errors:
  1) Estimation error: same as that of flow sizes
  2) Misclassification error: small flows mistaken for large flows and kept in the heap
• Two tasks:
  1) Classification task
  2) Estimation task

04 Case Study II: Finding Top-k Flows
• Suspicious flow: a flow that is significantly overestimated and kept in the heap.
• Observation: a suspicious flow's size rarely increases after it is inserted into the heap.
• Method: add another counter to every node of the min-heap.
• The memory overhead is very small.
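The slide does not say what the additional per-node counter records. The sketch below assumes it counts how many packets of the flow arrive after the flow enters the heap, which would directly expose the "rarely increases" behavior of suspicious flows. `TopKTracker` is a hypothetical name, and a dictionary with a linear minimum scan stands in for a real min-heap to keep the example short.

```python
class TopKTracker:
    """Top-k candidate set whose nodes carry an extra per-flow counter."""

    def __init__(self, k):
        self.k = k
        # flow_id -> [estimated_size, hits_since_entry]; the second field is
        # the assumed meaning of the slide's additional counter.
        self.nodes = {}

    def update(self, flow_id, estimate):
        """Call once per packet with the sketch's current estimate for the flow."""
        node = self.nodes.get(flow_id)
        if node is not None:
            node[0] = estimate
            node[1] += 1
            return
        if len(self.nodes) < self.k:
            self.nodes[flow_id] = [estimate, 0]
            return
        # Evict the current minimum only if the newcomer's estimate is larger.
        smallest = min(self.nodes, key=lambda f: self.nodes[f][0])
        if estimate > self.nodes[smallest][0]:
            del self.nodes[smallest]
            self.nodes[flow_id] = [estimate, 0]
```

Under this assumption, a flow that entered the tracker only because of hash collisions would keep a large estimate but a near-zero second counter, which is a cheap signal for the classifier.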

04 Case Study II: Estimating Top-k Flows
• Classification task
• Regular sketch, learning sketch, learning min-heap
• Algorithm: logistic regression/SVM
• Features: hashed counters, additional counters in the heap
• Class label: whether the flow should be in the heap

04 Case Study II: Estimating Top-k Flows • Hypothesis function:
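The formula on this slide was lost in extraction. For logistic regression the standard hypothesis is the sigmoid of a linear score, $h_\theta(x) = 1 / (1 + e^{-\theta^T x})$, and the sketch below assumes that form. The function names, the learning rate, the epoch count, and the 0.5 decision threshold are all illustrative choices; features are assumed to be scaled to roughly [0, 1].

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the logistic loss; theta[0] is the bias."""
    theta = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(features, labels):
            row = [1.0] + list(x)
            err = sigmoid(sum(t * v for t, v in zip(theta, row))) - y
            for i, v in enumerate(row):
                grad[i] += err * v
        theta = [t - lr * g / len(features) for t, g in zip(theta, grad)]
    return theta


def should_stay_in_heap(theta, x):
    """Class label: True if the flow is predicted to belong in the heap."""
    score = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return sigmoid(score) >= 0.5
```

A feature vector here would combine a flow's hashed counters with its additional heap counter; the toy training data used to exercise this code is synthetic and not drawn from the deck.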

04 Case Study II: Estimating Top-k Flows
• Estimation task
• Algorithm: linear regression
• Feature selection: the $d$ hashed counters

04 Case Study II: Estimating Top-k Flows
• Queries have two steps:
  1) Eliminate suspicious flows.
  2) Estimate the sizes of the remaining flows using the linear regression model.
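The two query steps compose naturally as a filter followed by a map. In this sketch the classifier and the regressor are passed in as plain callables; the name `query_top_k` and the node layout (a pair of hashed counters and the extra heap counter) are assumptions for illustration.

```python
def query_top_k(heap_nodes, keep_flow, estimate_size):
    """Step 1: drop suspicious flows. Step 2: re-estimate the rest.

    heap_nodes: dict mapping flow_id -> (hashed_counters, extra_counter)
    keep_flow: callable(hashed_counters, extra_counter) -> bool (classifier)
    estimate_size: callable(hashed_counters) -> float (regression model)
    """
    results = {}
    for flow_id, (counters, extra) in heap_nodes.items():
        if keep_flow(counters, extra):
            results[flow_id] = estimate_size(counters)
    return results
```

Keeping the two models behind callables means the same query routine works whether the classifier is logistic regression or an SVM.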

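04 Case Study III: Estimating #Flows
• The FM sketch:
[Figure: an item e is hashed by h1, h2, h3, … into bit arrays, drawn from the most significant bit (left) to the least significant bit (right). Example arrays: 0001011111, 0001001111, 0010100111. Bit positions are hit with probabilities 0.5, 0.25, 0.125, …, starting from the least significant bit. The low bit L1 and the high bit are marked.]
• Estimate: $1.2928 \times 2^{\mathrm{average}(L_i)}$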
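A compact version of the FM estimate can be written as below. Several details are assumptions for this example rather than facts from the slide: $L_i$ is taken to be the index of the lowest zero bit in bitmap $i$, each of the `d` bitmaps uses its own salted BLAKE2b hash, and the class name and default sizes are hypothetical. The constant 1.2928 is the one shown on the slide (approximately 1 / 0.77351).

```python
import hashlib


class FMSketch:
    """Flajolet-Martin style cardinality sketch with d independent bitmaps."""

    SCALE = 1.2928  # constant from the slide

    def __init__(self, d=32, bits=32):
        self.d = d
        self.bits = bits
        self.bitmaps = [0] * d

    def _hash(self, flow_id, i):
        digest = hashlib.blake2b(f"{i}:{flow_id}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def insert(self, flow_id):
        for i in range(self.d):
            h = self._hash(flow_id, i)
            # Number of trailing zero bits: position p is hit with probability 2^-(p+1).
            pos = (h & -h).bit_length() - 1 if h else self.bits - 1
            self.bitmaps[i] |= 1 << min(pos, self.bits - 1)

    def low_bits(self):
        # Assumed definition of L_i: index of the lowest zero bit in bitmap i.
        result = []
        for bitmap in self.bitmaps:
            pos = 0
            while bitmap & (1 << pos):
                pos += 1
            result.append(pos)
        return result

    def estimate(self):
        lows = self.low_bits()
        return self.SCALE * 2 ** (sum(lows) / len(lows))
```

Because setting a bit is idempotent, inserting the same flow ID again leaves the bitmaps, and therefore the estimate, unchanged, which is what makes the structure count distinct flows rather than packets.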

04 Case Study III: Estimating #Flows
• Algorithm: linear regression
• Feature selection:
  1) $L_i$: the $d$ low bits
  2) $H_i$: the $d$ high bits
• Hypothesis function:

05 PART FIVE Experimental Results

05 Experiments (Setup)
• Real traffic: a tier-1 router
• Flow ID: 5-tuple
• 10 minutes of traffic every hour over two days
• # packets in each 10-minute trace: [876303, 1124480]
• 41.81% of flows have one packet
• Large flows have ~30000 packets
• Average flow size: [8.67, 10.04], deviation: [102.7, 146.9]
• Metrics: AAE, ARE
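The deck does not spell out the two metrics. The sketch below uses their usual definitions in the sketch literature, which is an assumption here: AAE is the average absolute error, the mean of |estimate - true size| over the queried flows, and ARE is the average relative error, the mean of |estimate - true size| / true size. The function names are hypothetical.

```python
def aae(true_sizes, estimates):
    """Average absolute error over all flows in true_sizes."""
    total = sum(abs(estimates[f] - n) for f, n in true_sizes.items())
    return total / len(true_sizes)


def are(true_sizes, estimates):
    """Average relative error over all flows in true_sizes (sizes must be > 0)."""
    total = sum(abs(estimates[f] - n) / n for f, n in true_sizes.items())
    return total / len(true_sizes)
```

With 41.81% of flows holding a single packet, ARE is dominated by small flows, since an absolute error of 1 on a one-packet flow already counts as 100% relative error.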

05 Experimental Results
  1) Results on Flow Size Estimation
     • the CM sketch
     • the CSM sketch
     • the CU sketch
  2) Results on Heavy Hitter Detection
  3) Results on Cardinality: # flows

05 Results on Flow size (CM sketch) CM: ARE improvements: [5.22, 7.41], mean: 6.32

05 Results on Flow size (CM sketch) CM: AAE improvements: [2.65, 3.31], mean: 2.95

05 Results on Flow size (CSM sketch) CSM: ARE improvements: [26.89, 44.98], mean: 35.75

05 Results on Flow size (CSM sketch) CSM: AAE improvements: [10.68, 12.42], mean: 11.47

05 Results on Flow size (CU sketch) CU: ARE improvements: [2.89, 3.27], mean: 3.15

05 Results on Flow size (CU sketch) CU: AAE improvements: [2.04, 2.22], mean: 2.13

05 Results on Flow size (CU sketch) Training once vs. Training always

05 Results on Top-k elephant flows Suspicious flows: ARE: 100 ~ 350

05 Results on Top-k elephant flows AAE improvements: [1.11, 2.59], mean: 1.75

05 Results on Top-k elephant flows ARE improvements: [14.89, 202.15], mean: 75.36

05 Results on Cardinality ARE improvements: up to 200 times

05 Results on Cardinality ARE improvements: mean: up to 49%

06 PART SIX Conclusion

06 Conclusion
  1) We propose the idea of applying ML to sketches for the first time, and propose a generic framework.
  2) We present three case studies: flow sizes, top-k flows, and cardinality.
  3) We implement our framework and perform many experiments using real traffic.

THANKS Tong Yang Peking University, China Email: yangtongemail@gmail.com Homepage: http://net.pku.edu.cn/~yangtong/
