Machine Learning for Identification of P2P Traffic
Victor Gau, Yi-Hsien Wang
2007.12.07
Performance Metrics
• False positive rate: X/N
  • X: number of non-P2P flows that were detected as P2P flows
  • N: number of P2P flows
• Detection rate: Y/N
  • Y: number of P2P flows that were detected correctly
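The two metrics above can be sketched in a few lines of Python. This is only an illustration: the function name `p2p_metrics` and the boolean label encoding (True means P2P) are assumptions, not part of the slides. Note that, following the slide's definition, the false positive rate is divided by N, the number of P2P flows, and not by the number of non-P2P flows.

```python
def p2p_metrics(true_labels, predicted_labels):
    """Return (detection_rate, false_positive_rate) as defined on the slide.

    Labels are booleans where True means P2P (an assumed encoding).
    Both rates are divided by N, the number of true P2P flows.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    n = sum(1 for truth, _ in pairs if truth)                    # N
    if n == 0:
        raise ValueError("need at least one P2P flow")
    x = sum(1 for truth, pred in pairs if not truth and pred)    # X
    y = sum(1 for truth, pred in pairs if truth and pred)        # Y
    return y / n, x / n
```

For example, with four P2P flows of which three are detected, and one non-P2P flow wrongly flagged, the function returns a detection rate of 0.75 and a false positive rate of 0.25.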
Review
• T. Karagiannis et al. (UC Riverside)
Methodology
• Based on the five-tuple key {source IP, destination IP, protocol, source port, destination port} and a 64-second flow timeout, examine two primary heuristics:
  • TCP/UDP IP pairs
  • {IP, port} pairs
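A minimal sketch of grouping packets into flows by the five-tuple key might look like the following. The `Packet` record and the function `group_into_flows` are hypothetical names. The sketch also assumes that the 64-second timeout is an idle timeout (a gap longer than 64 seconds starts a new flow) and that flows are unidirectional; the slides do not state either point.

```python
from collections import namedtuple

# Hypothetical packet record; field names are assumptions for illustration.
Packet = namedtuple("Packet", "ts src_ip dst_ip proto src_port dst_port size")

FLOW_TIMEOUT = 64.0  # seconds, from the slide


def group_into_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into flows keyed by the five-tuple.

    A packet arriving more than `timeout` seconds after the previous packet
    with the same key starts a new flow. Returns a list of (key, [packets]).
    """
    flows = []    # all flows, in order of creation
    active = {}   # five-tuple key -> index of its most recent flow
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src_ip, pkt.dst_ip, pkt.proto, pkt.src_port, pkt.dst_port)
        idx = active.get(key)
        if idx is not None and pkt.ts - flows[idx][1][-1].ts <= timeout:
            flows[idx][1].append(pkt)
        else:
            flows.append((key, [pkt]))
            active[key] = len(flows) - 1
    return flows
```

With this sketch, three packets on the same five-tuple at 0 s, 10 s, and 100 s form two flows, because the 90-second gap exceeds the timeout.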
TCP/UDP IP Pairs
• Look for pairs of source-destination hosts that use both TCP and UDP,
• Excluding
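The heuristic can be sketched as follows. The function name `tcp_udp_ip_pairs` is hypothetical, and host pairs are treated as unordered, which is an assumption. The slide does not list what is excluded, so the sketch only exposes a placeholder hook, `is_excluded`, that excludes nothing by default.

```python
def tcp_udp_ip_pairs(flow_keys, is_excluded=lambda key: False):
    """Return the set of host pairs that use both TCP and UDP.

    `flow_keys` is an iterable of five-tuples
    (src_ip, dst_ip, proto, src_port, dst_port) with proto "TCP" or "UDP".
    `is_excluded` is a hypothetical hook: the source does not specify the
    exclusions, so by default no flow is excluded.
    """
    protocols = {}  # unordered host pair -> set of protocols seen
    for key in flow_keys:
        if is_excluded(key):
            continue
        src_ip, dst_ip, proto = key[0], key[1], key[2]
        pair = frozenset((src_ip, dst_ip))
        protocols.setdefault(pair, set()).add(proto)
    return {pair for pair, seen in protocols.items() if {"TCP", "UDP"} <= seen}
```

A caller would pass its own exclusion predicate once the exclusion criteria are known.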
{IP, Port} Pairs
• For the advertised destination {IP, port} pair of host A, the number of distinct IPs connected to host A will be equal to the number of distinct ports used to connect to host A.
• Example: {B, 15} and {C, 10} connect to host A, so 2 IPs = 2 ports.
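One way to sketch this check is to count, for each destination {IP, port}, the distinct source IPs and the distinct source ports. The names `ip_port_counts` and `looks_like_p2p` are hypothetical, and reading the numbers in {B, 15} and {C, 10} as the source ports used by hosts B and C is an assumption about the slide's figure. Strict equality is used here for simplicity.

```python
from collections import defaultdict


def ip_port_counts(connections):
    """Count distinct source IPs and source ports per destination {IP, port}.

    `connections` is an iterable of (src_ip, src_port, dst_ip, dst_port).
    Returns {(dst_ip, dst_port): (num_distinct_ips, num_distinct_ports)}.
    """
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in connections:
        ips[(dst_ip, dst_port)].add(src_ip)
        ports[(dst_ip, dst_port)].add(src_port)
    return {dst: (len(ips[dst]), len(ports[dst])) for dst in ips}


def looks_like_p2p(counts):
    """Apply the heuristic: distinct IPs equal distinct ports."""
    num_ips, num_ports = counts
    return num_ips == num_ports
```

For the slide's example, connections from {B, 15} and {C, 10} to one advertised port on host A give the counts (2, 2), so the pair is flagged.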
Exclusion
• For an HTTP server, a client will usually initiate more than one concurrent connection in order to download objects in parallel.
• This gives a higher ratio of the number of distinct ports to the number of distinct IPs.
• Example: {B, 15}, {B, 30}, {C, 10}, {C, 20} gives 4 ports / 2 IPs = 2.
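The ratio used for this exclusion can be computed as in the sketch below. The function name `port_to_ip_ratio` is hypothetical, and the slides give no cut-off value above which a destination is treated as a web server, so none is hard-coded here.

```python
def port_to_ip_ratio(connections):
    """Return distinct source ports divided by distinct source IPs.

    `connections` is an iterable of (src_ip, src_port) pairs that all
    target the same destination {IP, port}.
    """
    connections = list(connections)
    if not connections:
        raise ValueError("need at least one connection")
    ips = {ip for ip, _ in connections}
    ports = {port for _, port in connections}
    return len(ports) / len(ips)
```

The slide's web-server example, {B, 15}, {B, 30}, {C, 10}, {C, 20}, yields 2.0, while the P2P example, {B, 15}, {C, 10}, yields 1.0.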
Using Machine Learning
• Backpropagation Neural Network (BPNN)
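To make the term concrete, here is a minimal backpropagation network in plain Python. It is a sketch under stated assumptions, not the authors' model: the slides do not give the architecture, activation function, loss, or learning rate, so the single hidden layer, sigmoid units, squared-error loss, and the names `TinyBPNN` and `train` are all illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One hidden layer, sigmoid units, per-sample gradient descent."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # One row of weights per hidden unit; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip stops at the shorter sequence, so the bias is added separately.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Error term at the output for squared error with a sigmoid unit.
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back to the hidden units before any update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]
        return 0.5 * (out - target) ** 2


def train(net, samples, epochs=500, lr=0.5):
    """Run `epochs` passes over `samples`; return the mean loss of the last pass."""
    mean_loss = 0.0
    for _ in range(epochs):
        mean_loss = sum(net.train_step(x, t, lr) for x, t in samples) / len(samples)
    return mean_loss
```

In use, each sample would be a vector of per-flow features scaled to a comparable range, with target 1.0 for P2P and 0.0 otherwise; the output of `predict` can then be thresholded to obtain a class.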
Feature Selection
• Src IP, port
• Dest IP, port
• Service type
• TCP flags field (ACK, SYN, FIN, …)
• Time To Live (TTL)
• Flow duration
• Packet size per flow
• Number of packets per flow
• Packet rate per flow
• …
Packet Size Std. Dev. Per Flow
[Figure: packet size standard deviation per flow; labels: Short duration, Long duration]
Observation
• Packet size switching frequency
  • The number of times that the difference between the current packet size and the previous packet size exceeds a threshold.
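The definition above translates directly into a short function. The name `switching_frequency` is hypothetical, the use of the absolute difference is an assumption, and the threshold value is left as a parameter because the slides do not give one.

```python
def switching_frequency(packet_sizes, threshold):
    """Count consecutive packets whose sizes differ by more than `threshold`.

    `packet_sizes` is a list of packet sizes in arrival order. The absolute
    difference is used, which is an assumption about the slide's definition.
    """
    return sum(
        1
        for prev, cur in zip(packet_sizes, packet_sizes[1:])
        if abs(cur - prev) > threshold
    )
```

For the sizes 100, 1500, 1480, 60, 64 and a threshold of 500, the differences are 1400, 20, 1420, and 4, so the switching frequency is 2.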
Feature Selection
• Average packet size per flow
• Packet size switching frequency per flow
• Packet size Std. Dev. per flow
• Number of packets per flow
• Total bytes per flow
• Flow duration
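As a rough illustration, the six selected features could be computed per flow as below. The function name `flow_features`, the dictionary keys, and the (timestamp, size) input format are hypothetical. The sketch uses the population standard deviation and measures duration from the first to the last packet; the slides do not specify either choice.

```python
import statistics


def flow_features(packets, switch_threshold):
    """Compute the six per-flow features from (timestamp, size) pairs.

    `packets` must be in arrival order. `switch_threshold` is the size
    difference that counts as a switch; its value is not given in the source.
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")
    times = [ts for ts, _ in packets]
    sizes = [size for _, size in packets]
    switches = sum(
        1 for prev, cur in zip(sizes, sizes[1:])
        if abs(cur - prev) > switch_threshold
    )
    return {
        "avg_packet_size": statistics.fmean(sizes),
        "size_switching_frequency": switches,
        "packet_size_std_dev": statistics.pstdev(sizes),  # population std. dev.
        "num_packets": len(sizes),
        "total_bytes": sum(sizes),
        "duration": max(times) - min(times),
    }
```

The resulting values would typically be scaled before being fed to a neural network, since byte counts and durations span very different ranges.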
Performance Comparison
• T. Karagiannis et al.
  • Detection rate: 95%
  • False positive rate: 8% to 12%
• BPNN
  • Detection rate: 96%
  • False positive rate: 2.7%
Datasets
• From NLANR (http://pma.nlanr.net/Special/sdsc1.html)