Traffic Classification through Simple Statistical Fingerprinting. M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli. ACM SIGCOMM Computer Communication Review, 2007. Networking Journal Club, 9th July 2010.
Outline • Introduction • (Related Work) • Protocol Fingerprints • Classification Algorithm • Experimental Analysis • Discussion • Future work and Conclusions
Introduction • Motivation: Traffic classification: • Allocation, control and management of resources • Intrusion detection • QoS-aware mechanisms • … • Methods: • Port-based • DPI • …
Protocol Fingerprints • TCP flows (HTTP, SMTP, SSH, …) • Unidirectional • Statistical properties of the flows: • Size of packets • Inter-arrival times • Order of arrivals • PDFi: • Probability density function of the i-th packet on the (size, inter-arrival time) plane • PDF: vector of the L PDFi
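A minimal sketch of how the per-packet PDFi could be estimated from pre-classified flows, assuming a simple fixed-width binning of the (size, inter-arrival time) plane. The function names, the bin counts and the `max_size`/`max_iat` limits are illustrative assumptions, not values taken from the paper.

```python
from typing import List, Tuple

# A flow is the ordered list of its packets: (size in bytes, inter-arrival time in seconds).
Flow = List[Tuple[float, float]]
Grid = List[List[float]]


def bin_index(value: float, upper: float, n_bins: int) -> int:
    """Map a value in [0, upper] to a bin index, clamping out-of-range values."""
    idx = int(value / upper * n_bins)
    return min(max(idx, 0), n_bins - 1)


def build_fingerprint(flows: List[Flow], num_packets: int,
                      size_bins: int = 4, iat_bins: int = 4,
                      max_size: float = 1500.0, max_iat: float = 1.0) -> List[Grid]:
    """Return one normalized 2-D histogram (an estimate of PDFi) per packet position."""
    pdfs: List[Grid] = []
    for i in range(num_packets):
        hist = [[0.0] * iat_bins for _ in range(size_bins)]
        count = 0
        for flow in flows:
            if i < len(flow):  # shorter flows simply do not contribute to later positions
                size, iat = flow[i]
                row = bin_index(size, max_size, size_bins)
                col = bin_index(iat, max_iat, iat_bins)
                hist[row][col] += 1.0
                count += 1
        if count:
            hist = [[c / count for c in r] for r in hist]
        pdfs.append(hist)
    return pdfs


training = [[(100, 0.0), (1400, 0.1)], [(120, 0.0), (1400, 0.6)]]
fingerprint = build_fingerprint(training, num_packets=2)
```

Keeping one histogram per packet position is what preserves the order of arrivals: the first packet of a flow is only ever compared against PDF1, the second against PDF2, and so on.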
Protocol Fingerprints • Anomaly score: “how statistically far” an unknown flow F is from a given protocol PDF • To smooth each PDFi, apply a Gaussian filter, obtaining Mi • Preliminary anomaly score: • Anomaly score: • Anomaly threshold: upper bound on the anomaly score for a flow to be attributed to this protocol
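The smoothing step can be sketched as a small 2-D Gaussian convolution over each PDFi. The slide's score formulas did not survive in this transcript, so the `anomaly_score` below is a stand-in chosen for illustration only (one minus the average mask value hit by the flow's packets), not the paper's definition. The kernel radius, `sigma` and the peak normalization are likewise assumptions.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def gaussian_kernel(radius: int, sigma: float) -> Grid:
    """Build a (2*radius+1)^2 Gaussian kernel whose entries sum to 1."""
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dy in range(-radius, radius + 1)]
         for dx in range(-radius, radius + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def smooth(pdf: Grid, radius: int = 1, sigma: float = 1.0) -> Grid:
    """Convolve a PDFi with a Gaussian kernel and rescale so the peak equals 1 (mask Mi)."""
    kernel = gaussian_kernel(radius, sigma)
    rows, cols = len(pdf), len(pdf[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:  # cells outside the grid count as 0
                        acc += pdf[rr][cc] * kernel[dr + radius][dc + radius]
            out[r][c] = acc
    peak = max(max(row) for row in out)
    return [[v / peak for v in row] for row in out] if peak > 0 else out


def anomaly_score(cells: List[Tuple[int, int]], masks: List[Grid]) -> float:
    """Stand-in score (assumption): 0 when every packet hits a mask peak, 1 when none match."""
    n = min(len(cells), len(masks))
    if n == 0:
        raise ValueError("need at least one packet and one mask")
    return sum(1.0 - masks[i][cells[i][0]][cells[i][1]] for i in range(n)) / n


mask = smooth([[0, 0, 0], [0, 1.0, 0], [0, 0, 0]])
on_peak = anomaly_score([(1, 1)], [mask])
off_peak = anomaly_score([(0, 0)], [mask])
```

Smoothing matters because a raw histogram gives zero probability to any (size, inter-arrival time) cell that never appeared in training, so a flow that lands one bin away from a training sample would otherwise look maximally anomalous.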
Classification algorithm • Collect traffic traces (training set) • Pre-classify the traces (the accuracy of the pre-classification tool is critical) • Build protocol fingerprints • Start the classification engine • Periodically update the fingerprints • Low computational load
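Once fingerprints exist, the decision made by the classification engine can be sketched as a threshold check per protocol. The sketch assumes that anomaly scores have already been computed for the unknown flow, that lower scores mean a closer match, and that ties between several matching protocols are broken by taking the lowest score. The tie-breaking rule and the `classify` name are assumptions for illustration.

```python
from typing import Dict, Optional


def classify(scores: Dict[str, float], thresholds: Dict[str, float]) -> Optional[str]:
    """Return the best-matching protocol, or None if no score is within its threshold."""
    candidates = {
        proto: score
        for proto, score in scores.items()
        if proto in thresholds and score <= thresholds[proto]
    }
    if not candidates:
        return None  # flow is left unclassified ("unknown")
    return min(candidates, key=candidates.get)


thresholds = {"http": 0.3, "smtp": 0.6}
matched = classify({"http": 0.2, "smtp": 0.5}, thresholds)
unmatched = classify({"http": 0.4, "smtp": 0.9}, thresholds)
```

The per-flow work here is one table lookup per packet and one comparison per protocol, which is consistent with the low computational load claimed on the slide.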
Experimental Analysis • Traffic traces collected on campus: 24 Mbps link • >60% TCP port: 80, 110, 25 • >40 GB, 20K flows, of HTTP, POP3, SMTP • Performance parameters: • Hit rate • False positive rate • 4th packet
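The two performance parameters can be computed from labeled test flows as sketched below. The definitions used are the conventional ones and are an assumption here: the hit rate is the fraction of flows of a protocol that are classified as that protocol, and the false positive rate is the fraction of flows of other protocols that are wrongly classified as it.

```python
from typing import List, Optional, Tuple


def hit_and_false_positive_rate(true_labels: List[str],
                                predicted: List[Optional[str]],
                                protocol: str) -> Tuple[float, float]:
    """Return (hit rate, false positive rate) for one protocol."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted))
    positives = sum(1 for t, _ in pairs if t == protocol)
    negatives = len(pairs) - positives
    hits = sum(1 for t, p in pairs if t == protocol and p == protocol)
    false_pos = sum(1 for t, p in pairs if t != protocol and p == protocol)
    hit_rate = hits / positives if positives else 0.0
    fp_rate = false_pos / negatives if negatives else 0.0
    return hit_rate, fp_rate


truth = ["http", "http", "smtp", "pop3"]
guess = ["http", "smtp", "smtp", "http"]
http_hit, http_fp = hit_and_false_positive_rate(truth, guess, "http")
```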
Discussion • Accuracy of training sets • Complexity of the technique • Fclient or Fserver? Where’s the classifier? • On the precision of the measuring devices
Future Work • Application to a larger data set: VoIP, P2P… • Behavior in different networks • How does the classifier respond to an imprecise training set? • Complexity of the algorithm: • memory occupation • amenability to HW-assisted implementation • computational costs of the training phase