Traffic Classification through Simple Statistical Fingerprinting

M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli
ACM SIGCOMM Computer Communication Review, 2007
Networking Journal Club, 9th July 2010

Outline
• Introduction
• (Related Work)
• Protocol Fingerprints
• Classification Algorithm
• Experimental Analysis
• Discussion
• Future Work and Conclusions

Introduction
• Motivation: traffic classification is needed for
  • Allocation, control and management of resources
  • Intrusion detection
  • QoS-aware mechanisms
  • …
• Methods:
  • Port-based
  • Deep packet inspection (DPI)
  • …

Protocol Fingerprints
• TCP flows (HTTP, SMTP, SSH, …)
• Unidirectional
• Statistical properties of the flows:
  • Size of packets
  • Inter-arrival times
  • Order of arrivals
• PDFi: probability density function of the i-th packet of a flow on the (size, inter-arrival time) plane
• PDF: vector of the L PDFi, one per packet position
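The construction of the PDFi vector can be sketched as one 2-D histogram per packet position. This is a minimal illustration, not the authors' implementation: the bin widths `SIZE_BIN` and `IAT_BIN_MS`, the function name `build_fingerprint`, and the toy training flows are all assumptions, since the slides do not state the quantization used.

```python
from collections import Counter

# Hypothetical bin widths; the slides do not give the quantization.
SIZE_BIN = 100     # bytes per packet-size bin
IAT_BIN_MS = 10    # milliseconds per inter-arrival-time bin


def build_fingerprint(flows, L):
    """Estimate one empirical PDF_i for each packet position i < L.

    `flows` is a list of flows; each flow is a list of
    (size_bytes, inter_arrival_ms) pairs in arrival order.
    Returns a list of L dicts mapping (size_bin, iat_bin) -> probability.
    """
    counts = [Counter() for _ in range(L)]
    for flow in flows:
        for i, (size, iat_ms) in enumerate(flow[:L]):
            cell = (size // SIZE_BIN, iat_ms // IAT_BIN_MS)
            counts[i][cell] += 1

    fingerprint = []
    for c in counts:
        total = sum(c.values())
        # Normalize counts to probabilities; empty if no flow reached position i.
        fingerprint.append({cell: n / total for cell, n in c.items()} if total else {})
    return fingerprint


# Toy pre-classified training flows for a single protocol (made-up values).
training = [
    [(60, 0), (1460, 20), (1460, 1)],
    [(60, 0), (1400, 25), (520, 2)],
]
pdf = build_fingerprint(training, L=3)
```

Keeping one histogram per position is what preserves the order of arrivals: the same (size, inter-arrival) pair contributes to a different PDFi depending on where it occurs in the flow.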

Protocol Fingerprints
• Anomaly score: “how statistically far” an unknown flow F is from a given protocol PDF
• To smooth each PDFi, apply a Gaussian filter, yielding the mask Mi
• Preliminary anomaly score:
• Anomaly score:
• Anomaly threshold: upper bound on the anomaly score for a flow to be attributed to this protocol
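The smoothing and scoring steps can be illustrated as follows. The score formulas are not preserved in this transcript, so the `anomaly_score` below is a stand-in chosen for illustration only (one minus the mean mask value at the flow's packets, each rescaled by the mask peak); it is not the formula from the paper. The kernel radius, `sigma`, and all function names are likewise assumptions.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(pdf_i, radius=1, sigma=1.0):
    """Spread each cell's probability over its neighbors to obtain a mask M_i.

    Cells with negative indices are kept for simplicity.
    """
    kernel = gaussian_kernel(radius, sigma)
    mask = {}
    for (s, t), p in pdf_i.items():
        for ds in range(-radius, radius + 1):
            for dt in range(-radius, radius + 1):
                cell = (s + ds, t + dt)
                weight = kernel[ds + radius] * kernel[dt + radius]
                mask[cell] = mask.get(cell, 0.0) + p * weight
    return mask


def anomaly_score(flow_cells, masks):
    """Stand-in score (an assumption, NOT the paper's formula).

    Returns 0.0 when every packet falls on the peak of its mask and 1.0 when
    no packet falls inside any mask. Positions the flow never reaches count
    as misses. Assumes len(masks) >= 1.
    """
    total = 0.0
    for cell, mask in zip(flow_cells, masks):
        peak = max(mask.values(), default=0.0)
        total += mask.get(cell, 0.0) / peak if peak > 0 else 0.0
    return 1.0 - total / len(masks)


masks = [smooth({(14, 2): 1.0})]
on_peak = anomaly_score([(14, 2)], masks)
neighbor = anomaly_score([(15, 2)], masks)
far_away = anomaly_score([(0, 0)], masks)
```

The point of the smoothing shows up in the `neighbor` case: a packet that lands one bin away from the training data still gets partial credit instead of being treated as completely foreign.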

Classification algorithm

Classification algorithm
• Collect traffic traces (training set)
• Pre-classify the traces (the accuracy of the pre-classification tool is critical)
• Build protocol fingerprints
• Start the classification engine
• Periodically update the fingerprints
• Low computational load
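Once the fingerprints exist, the decision made by the classification engine for each flow can be sketched as below. The slides do not say how a flow that falls under several thresholds is resolved; this sketch assumes the protocol with the lowest anomaly score wins and that a flow matching no fingerprint is labeled "unknown". The function name and the example numbers are hypothetical.

```python
def classify(scores, thresholds):
    """Return the best-matching protocol for one flow, or "unknown".

    `scores` maps protocol name -> anomaly score of the flow against that
    protocol's fingerprint; `thresholds` maps protocol name -> anomaly
    threshold. Assumption: the lowest score among the protocols whose
    threshold is met wins.
    """
    candidates = [(score, proto) for proto, score in scores.items()
                  if score <= thresholds[proto]]
    if not candidates:
        return "unknown"
    return min(candidates)[1]


# Made-up scores and thresholds for illustration.
thresholds = {"HTTP": 0.5, "SMTP": 0.5, "POP3": 0.5}
label_a = classify({"HTTP": 0.2, "SMTP": 0.7, "POP3": 0.9}, thresholds)
label_b = classify({"HTTP": 0.8, "SMTP": 0.7, "POP3": 0.9}, thresholds)
```

Each decision costs one score evaluation per protocol, which is consistent with the low computational load claimed on the slide.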

Experimental Analysis
• Traffic traces collected on campus: 24 Mbps link
• >60% TCP port: 80, 110, 25
• >40 GB, 20K flows of HTTP, POP3, SMTP
• Performance parameters:
  • Hit rate
  • False positive rate
• 4th packet
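The two performance parameters can be computed per protocol as sketched below. The slides do not define them, so this assumes the common definitions: hit rate is the fraction of flows truly belonging to a protocol that are classified as that protocol, and false positive rate is the fraction of flows of other protocols that are wrongly classified as it. The labels in the example are made up.

```python
def rates(true_labels, predicted_labels, protocol):
    """Return (hit_rate, false_positive_rate) for one protocol.

    Assumed definitions (not stated on the slides):
      hit rate            = flows of `protocol` classified as `protocol`
                            / all flows of `protocol`
      false positive rate = flows of other protocols classified as `protocol`
                            / all flows of other protocols
    """
    pairs = list(zip(true_labels, predicted_labels))
    own = [pred for truth, pred in pairs if truth == protocol]
    others = [pred for truth, pred in pairs if truth != protocol]
    hit = sum(pred == protocol for pred in own) / len(own) if own else 0.0
    fp = sum(pred == protocol for pred in others) / len(others) if others else 0.0
    return hit, fp


# Toy ground truth and classifier output (made-up labels).
truth = ["HTTP", "HTTP", "SMTP", "POP3"]
predicted = ["HTTP", "unknown", "HTTP", "POP3"]
http_hit, http_fp = rates(truth, predicted, "HTTP")
pop3_hit, pop3_fp = rates(truth, predicted, "POP3")
```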

Sensitivity to parameters

Discussion
• Accuracy of training sets
• Complexity of the technique
• Fclient or Fserver? Where’s the classifier?
• On the precision of the measuring devices

Future Work
• Application to a larger data set: VoIP, P2P…
• Behavior in different networks
• How does the classifier respond to an imprecise training set?
• Complexity of the algorithm:
  • Memory occupation
  • Amenability to HW-assisted implementation
  • Computational costs of the training phase
