Detecting P2P Traffic from the P2P Flow Graph Jonghyun Kim Khushboo Shah Stephen Bohacek Electrical and Computer Engineering
Outline • Introduction and Objectives • Flow Data • Identification Methods • Class A-1 : Degree-Based P2P Detection • Class A-2 : Known Port • Class B-1 : Repeated Communication • Class B-2 : P2P Port-Based Identification • Class B-3 : Triggered P2P Detection • Results • Conclusion • Future Work
Introduction • Why detect P2P traffic? • Helpful for network capacity planning, provisioning, traffic shaping/policing, etc. • How to detect P2P traffic? • Port-based • Signature-based • Behavior-based • Machine-learning-based • Host-graph-based
Objectives • No deep packet inspection • Simpler, but still effective • P2P flow graph based
Flow Data • SIP : source IP • DIP : destination IP • SP : source port • DP : destination port • PR : protocol (tcp or udp) • ST : flow start time • EID : event ID (info for signature matching)
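Flow Data • Pictorial view [Figure: a TCP flow between hosts A and B with ports 6881 and 60355, annotated with SIP, SP, PR, DP, DIP; the flow starts at time ST with a SYN on the time axis] • Mathematical expression : each flow is a tuple of its components (SIP, DIP, SP, DP, PR, ST, EID)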
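As a minimal sketch of the flow record described on the Flow Data slides, the components can be held in a small named tuple. The field names, the time unit, and the example addresses are assumptions made for illustration; only the seven component meanings come from the slides.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """One flow record with the components listed on the Flow Data slide."""
    sip: str    # source IP
    dip: str    # destination IP
    sp: int     # source port
    dp: int     # destination port
    pr: str     # protocol, "tcp" or "udp"
    st: float   # flow start time (seconds is an assumed unit)
    eid: int    # event ID (info for signature matching)


# Hypothetical record: the addresses are invented, and which port is the
# source and which the destination is also an assumption.
example = Flow("10.0.0.1", "10.0.0.2", 60355, 6881, "tcp", 0.0, 0)
print(example.sip, example.dp, example.pr)
```

A tuple keeps each flow hashable, which is convenient when flows later become vertices of a graph or members of a set.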
Identification Methods • P2P flow graph built by the methods • Class A methods detect flow 1 (an initial P2P flow) • Class B methods connect flow 1 to flow 2
Class A-1 : Degree-based P2P Detection [Figure: host A and its TCP/UDP flows with hosts X1 to X13, each labeled with its port pair, observed within a time window T. Out-degree hosts: X1, X2, X3, X7, X10, X11, X12, X13 (8). In-degree hosts: X4, X5, X6, X8, X9 (5).]
Class A-1 : Degree-based P2P detection • Out-degree • In-degree • P2P active time (ID is not considered) • Detector
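The detector formula did not survive on this slide, so the following is only a sketch of one plausible reading: count the distinct peers a host contacts (out-degree) and is contacted by (in-degree) inside a time window of length T, and flag the host when both counts reach a threshold R. The function names, the simplified `(sip, dip, st)` flow tuples, and the rule "both degrees at least R" are assumptions, not the authors' definition.

```python
def degrees(flows, host, t0, T):
    """Count distinct peers of `host` among flows that start in [t0, t0 + T).

    Each flow is a simplified (sip, dip, st) tuple.
    Returns (out_degree, in_degree).
    """
    out_peers, in_peers = set(), set()
    for sip, dip, st in flows:
        if not (t0 <= st < t0 + T):
            continue
        if sip == host:
            out_peers.add(dip)   # host initiated the flow
        elif dip == host:
            in_peers.add(sip)    # host received the flow
    return len(out_peers), len(in_peers)


def degree_detector(flows, host, t0, T, R):
    """Assumed rule: flag `host` when both degrees reach the threshold R."""
    out_deg, in_deg = degrees(flows, host, t0, T)
    return out_deg >= R and in_deg >= R


flows = [
    ("A", "X1", 1.0), ("A", "X2", 2.0),    # outgoing flows inside the window
    ("X4", "A", 3.0), ("X5", "A", 4.0),    # incoming flows inside the window
    ("A", "X3", 100.0),                    # outside the window
]
print(degrees(flows, "A", 0.0, 10.0))
print(degree_detector(flows, "A", 0.0, 10.0, R=2))
```

Shrinking T or raising R makes the rule harder to satisfy, which matches the Summary slide's remark that T ↓, R ↑ increases conservativeness.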
Class A-2 : Known Port • P2P active time • Detector
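A known-port check can be sketched as a lookup against the port list given on the "Known P2P port" slide at the end of the deck. The function name is hypothetical, and matching on either the source or the destination port is an assumption, since the detector formula is missing from the slide.

```python
# Port numbers copied from the "Known P2P port" slide.
KNOWN_P2P_PORTS = {
    "BitTorrent": set(range(6881, 6890)) | {6969, 2710},
    "Gnutella": set(range(6346, 6350)),
    "eDonkey": {2323, 3306, 4242, 4500, 4501, 4677, 4678, 7778}
               | set(range(4661, 4675)),
    "FastTrack": {1214, 1215, 1331},
    "Freenet": {19114, 8081},
    "Soulseek": {2234, 5534},
}


def known_port_app(sp, dp):
    """Return the application whose known port matches sp or dp, else None."""
    for app, ports in KNOWN_P2P_PORTS.items():
        if sp in ports or dp in ports:
            return app
    return None


print(known_port_app(60355, 6881))
print(known_port_app(50000, 80))
```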
Identification Methods • P2P flow graph by methods (flow 1, flow 2) • Done with Class A methods • Next : take a look at Class B methods
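Class B-1 : Repeated Communication between Known P2P Peers [Figure: a known P2P flow between hosts A and X (ports 52334 and 63234, TCP), followed by further flows between the same two hosts A and X]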
Class B-1 : Repeated Communication between Known P2P Peers • Detector given an initial P2P flow P2P peers = • Detector given a set of P2P flows
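The peer-set and detector expressions are missing from this slide. As a hedged sketch of what the slide title suggests, the code below treats the two IPs of every known P2P flow as a peer pair and marks any other flow between the same pair as P2P. The function name and the `(sip, dip, sp, dp, pr)` tuple layout are assumptions.

```python
def repeated_communication(seed_flows, all_flows):
    """Return flows whose two hosts already appear together in a seed flow.

    Flows are (sip, dip, sp, dp, pr) tuples. Direction is ignored, which is
    an assumption: a flow from X to A counts as a repeat of a flow from A to X.
    """
    peer_pairs = {frozenset((f[0], f[1])) for f in seed_flows}
    return [
        f for f in all_flows
        if f not in seed_flows and frozenset((f[0], f[1])) in peer_pairs
    ]


seed = [("A", "X", 52334, 63234, "tcp")]
observed = [
    ("A", "X", 52334, 63234, "tcp"),   # the seed itself
    ("X", "A", 40001, 40002, "udp"),   # same peers, reverse direction
    ("A", "Y", 40003, 80, "tcp"),      # different peer
]
print(repeated_communication(seed, observed))
```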
Class B-2 : P2P Port Identification and Port-Based P2P Detection [Figure: TCP and UDP flows between host A and hosts X1, X2, X3, X7, X10, X11, X12, X13, each labeled with its port pair; the two UDP flows with X12 and X13 share port 55038]
Class B-2 : P2P Port Identification and Port-Based P2P Detection [Figure: incoming and outgoing TCP or UDP flows at a P2P IP and port, observed within time windows T]
Class B-2 : P2P Port Identification and Port-Based P2P Detection • Detector given a P2P flow
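Since the detector expression is missing, this is only a rough sketch of the idea in the slide title: take the (IP, port) endpoints of a known P2P flow as candidate P2P ports, skip ports on the white list from the "Port white list" slide, and mark other flows that use one of those endpoints. Treating both endpoints as candidates and leaving out the time window T are simplifying assumptions, and all function names are hypothetical.

```python
# Ports from the "Port white list" slide; ports below 1024 are also excluded.
WHITE_LIST = {1025, 1755, 2967, 3268, 3724, 5050, 5190, 5351, 8080}


def is_white_listed(port):
    return port < 1024 or port in WHITE_LIST


def p2p_endpoints(seed):
    """Candidate (IP, port) endpoints of a known P2P flow (sip, dip, sp, dp, pr)."""
    sip, dip, sp, dp, _pr = seed
    return {(ip, port) for ip, port in ((sip, sp), (dip, dp))
            if not is_white_listed(port)}


def port_based(seed, all_flows):
    """Return the other flows that use one of the seed's P2P endpoints."""
    endpoints = p2p_endpoints(seed)
    hits = []
    for f in all_flows:
        sip, dip, sp, dp, _pr = f
        if f != seed and ((sip, sp) in endpoints or (dip, dp) in endpoints):
            hits.append(f)
    return hits


seed = ("X12", "A", 18636, 55038, "udp")
observed = [
    seed,
    ("X13", "A", 26675, 55038, "udp"),   # reuses A:55038
    ("A", "W", 40000, 80, "tcp"),        # unrelated web flow
]
print(port_based(seed, observed))
```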
Class B-3 : Triggered P2P Detection • Nearby flows tend to be P2P flows [Figure: timeline of flows around a known P2P flow between A and X, with a 1 sec window on each side]
Class B-3 : Triggered P2P Detection • Detector given a P2P flow P2P peers =
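The peer definition is cut off on this slide, so the sketch below rests on an assumed reading of "nearby flows tend to be P2P flows": a flow is marked when it shares a host with a known P2P flow and starts within 1 second before or after it. The function name, the `(sip, dip, st)` tuples, and the shared-host condition are assumptions.

```python
def triggered(seed, all_flows, window=1.0):
    """Return flows that share a host with `seed` and start within `window` seconds.

    Flows are simplified (sip, dip, st) tuples; the 1 sec default follows the slide.
    """
    seed_hosts = {seed[0], seed[1]}
    return [
        f for f in all_flows
        if f != seed
        and (f[0] in seed_hosts or f[1] in seed_hosts)
        and abs(f[2] - seed[2]) <= window
    ]


seed = ("A", "X", 10.0)
observed = [
    seed,
    ("A", "Y", 10.4),   # same host A, 0.4 s later
    ("Z", "X", 9.5),    # same host X, 0.5 s earlier
    ("A", "W", 12.0),   # same host, but 2 s later
    ("P", "Q", 10.1),   # close in time, unrelated hosts
]
print(triggered(seed, observed))
```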
Summary • Class A • T : time window offset • R : threshold for # of peers connected • T ↓, R ↑ ⇒ conservativeness ↑ [Figure: R peers connected within a time window T]
Summary • Class A : • Class B : • superscript K (as in GH1, GH2) : Kth iteration • superscript ∞ (as in GH∞) : until convergence
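Applying Class B methods "until convergence" can be sketched as a fixed-point loop: start from the flows found by Class A, repeatedly apply each Class B method to the current set, and stop when an iteration adds nothing. The function names are hypothetical, and `shares_host` is only a toy stand-in for a Class B method, not one of the methods in the deck.

```python
def iterate_until_convergence(initial, all_flows, methods):
    """Grow the detected set with `methods` until no new flow is found.

    Each method takes (detected_set, all_flows) and returns an iterable of flows.
    Returns (detected_set, number_of_iterations_that_added_flows).
    """
    detected = set(initial)
    k = 0
    while True:
        new = set()
        for method in methods:
            new |= set(method(detected, all_flows))
        new -= detected
        if not new:
            return detected, k
        detected |= new
        k += 1


def shares_host(detected, all_flows):
    """Toy stand-in for a Class B method: flows sharing a host with a detected flow."""
    hosts = {h for f in detected for h in f}
    return [f for f in all_flows if f[0] in hosts or f[1] in hosts]


all_flows = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
detected, k = iterate_until_convergence([("A", "B")], all_flows, [shares_host])
print(sorted(detected), k)
```

In this toy run one seed flow pulls in a chain of two more flows over two iterations, while the unconnected flow is never reached.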
Results : Number of P2P Flows Detected [Figure: two panels over combinations C1, C2, C3; y-axes: # of flows (×10^7) and fraction of flows (0 to 1); legend: KPF480, 250; AC15,100; GH∞; TGH∞]
Results : Vertex Degree • type1 = any • type2 = UDP • type3 = TCP, DIP = internal IP • type4 = TCP, DIP = external IP [Figure: a single P2P flow linked by GH1 to flows F1 to F8, giving degree = 8]
Results : Vertex Degree • type1 = any • type2 = UDP • type3 = TCP, DIP = internal IP • type4 = TCP, DIP = external IP [Figure: CCDF of vertex degree for type1 to type4 on log-log axes; x-axis: degree (10^0 to 10^6); y-axis: CCDF (10^-3 to 10^0)]
Results : Vertex Degree [Figure: graph neighborhood of a single P2P flow; vertices are labeled with IP:port endpoints, almost all in 131.118.x.x (many of them on different ports of 131.118.59.241), plus 72.20.34.145:6881]
Results : Large Connected Component [Figure: flows reached from a single P2P flow, first by GH1 and then by GH2]
Results : Large Connected Component • type1 = any • type2 = UDP • type3 = TCP, DIP = internal IP • type4 = TCP, DIP = external IP [Figure: CCDF (0 to 1) of the # of flows reachable for type1 to type4]
Visualization of P2P Flow Graph [Figure: P2P flow graph drawn with GH links and TA links, showing the large connected component and small connected components]
Conclusion • Even if Class A methods detect only a small number of P2P flows because their parameters are set conservatively, the recursive Class B methods identify almost all of the remaining P2P flows. • The P2P flow graph contains a large connected component (LCC), so identifying a single P2P flow in the LCC leads to detection of all flows in the LCC.
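The second conclusion can be illustrated with a plain breadth-first search: if flows are vertices and the links created by Class B methods are edges, every flow in the same connected component as one detected flow is reachable from it. The adjacency-dictionary layout, the function name, and the toy graph are assumptions for illustration only.

```python
from collections import deque


def reachable(graph, seed):
    """Return all vertices reachable from `seed` in an undirected graph.

    `graph` maps each flow ID to the list of flow IDs it is linked to.
    """
    seen = {seed}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


# Toy graph: one larger component (F1..F4) and one small one (F5, F6).
graph = {
    "F1": ["F2", "F3"], "F2": ["F1"], "F3": ["F1", "F4"], "F4": ["F3"],
    "F5": ["F6"], "F6": ["F5"],
}
print(sorted(reachable(graph, "F1")))
print(sorted(reachable(graph, "F5")))
```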
Future Work • Real-time Identification • Complexity Analysis
Port white list • < 1024 : well-known port • 1025 : NFS • 1755 : MMS • 2967 : Symantec AntiVirus • 3268 : msft-gc • 3724 : World of Warcraft • 5050 : Yahoo! Messenger • 5190 : AOL Instant Messenger • 5351 : NAT Port Mapping Protocol • 8080 : HTTP alternate
Known P2P port • BitTorrent : 6881~6889, 6969, 2710 • Gnutella : 6346~6349 • eDonkey : 2323, 3306, 4242, 4500, 4501, 4661~4674, 4677, 4678, 7778 • FastTrack : 1214, 1215, 1331 • Freenet : 19114, 8081 • Soulseek : 2234, 5534