A fast identification method for P2P flow based on nodes connection degree
LING XING, WEI-WEI ZHENG, JIAN-GUO MA, WEI-DONG MA
Apperceiving Computing and Intelligence Analysis (ICACIA), 2010
Reporter: 廖楚喬
Outline
• Introduction
• Traffic identification model based on nodes connection degree
• Experiment verification
• Conclusion
Introduction
• P2P traffic has become one of the most significant portions of network traffic.
• P2P traffic identification methods can be classified into four categories:
  • Port based classification
    • TCP or UDP port number
  • Payload based classification
    • The special content of the datagram
  • Behavior patterns based classification
    • Flow properties and behavioral characteristics of flows
  • Machine learning based classification
    • Model training
Introduction
• As networks develop, port number based and payload based methods increasingly show their limitations.
• Machine learning based methods require a training step, and they generally have complex rules and high time and space complexity.
• Finding characteristic behavior patterns of P2P flows is essential for behavior-based identification methods.
• The authors present a new classification method based on connection degree, which can improve identification accuracy within a short time.
Traffic identification model based on nodes connection degree
• Compared with other network applications, P2P traffic is characterized by connections to multiple IP addresses.
• In a P2P network, nodes continually exchange information with each other.
• Super nodes are nodes that have more connections with other nodes.
Traffic identification model based on nodes connection degree
• Definition 1. "Link" is defined as the number of IP addresses connected with other IP addresses in unit time.
• Definition 2. "D_link" is defined as the number of different IP addresses connected with other IP addresses in unit time.
• Definition 3. The ratio "P" is defined as P = D_link / Link.
• The parameters Link, D_link and P are used as the judging conditions for traffic identification.
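The three definitions can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes one reading of Definitions 1 and 2: Link counts every connection a host makes within one time window (repeats included), and D_link counts the distinct peer addresses among them. The function name `connection_degree` and the input format (a list of (source IP, destination IP) pairs for one window) are hypothetical.

```python
from collections import defaultdict


def connection_degree(connections):
    """Return {host: (link, d_link, p)} for one time window.

    `connections` is an iterable of (src_ip, dst_ip) pairs observed in the
    window. Assumption (not confirmed by the slides): Link counts all
    connections of a host, D_link counts its distinct peers, and
    P = D_link / Link.
    """
    peers = defaultdict(list)
    for src, dst in connections:
        peers[src].append(dst)

    result = {}
    for host, dsts in peers.items():
        link = len(dsts)         # all connections, repeats included
        d_link = len(set(dsts))  # distinct peer addresses only
        result[host] = (link, d_link, d_link / link)
    return result


# Hypothetical window: one host contacts three distinct peers in four connections.
window = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.4"),
]
stats = connection_degree(window)
print(stats["10.0.0.1"])  # (4, 3, 0.75)
```

Under this reading, a host that keeps contacting new peers has P close to 1, while a host that repeatedly talks to the same few servers has a low P.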
Experiment verification
• Experiment environment
  • The active users' IP range is 222.196.33.0/24.
  • The users connect to the external network through the campus core router (Cisco).
  • The traffic of the active nodes is gathered by a collection server.
Experiment verification
• Two steps of data collecting and preprocessing:
  • Wireshark [17] is used as the traffic collection tool to collect network traffic from well-known ports.
  • IP flows are filtered and the S space is formed from IP packets, with the fields S_IP, D_IP, S_Port, D_Port and Class, which represent the source IP address, destination IP address, source port number, destination port number and flow type respectively.
[17] http://sourceforge.net/projects/weka/files/
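The preprocessing step can be sketched as a record type plus a filter. This is a sketch under stated assumptions, not the authors' tooling: the class `FlowRecord`, the helper names, and the rule "keep a record when its source address lies in the active-user range 222.196.33.0/24" are hypothetical, and the sample addresses are documentation-only values.

```python
import ipaddress
from dataclasses import dataclass

# Active-user range taken from the experiment environment slide.
ACTIVE_NET = ipaddress.ip_network("222.196.33.0/24")


@dataclass(frozen=True)
class FlowRecord:
    """One element of the S space (field names follow the slide)."""
    s_ip: str        # S_IP: source IP address
    d_ip: str        # D_IP: destination IP address
    s_port: int      # S_Port: source port number
    d_port: int      # D_Port: destination port number
    flow_class: str  # Class: flow type label


def from_active_user(record):
    """Assumed filter rule: the source address belongs to the active range."""
    return ipaddress.ip_address(record.s_ip) in ACTIVE_NET


def build_s_space(records):
    """Keep only the records that pass the assumed filter."""
    return [r for r in records if from_active_user(r)]


records = [
    FlowRecord("222.196.33.10", "198.51.100.7", 51000, 80, "HTTP"),
    FlowRecord("192.0.2.5", "198.51.100.7", 51001, 53, "DNS"),
]
s_space = build_s_space(records)
print(len(s_space))  # 1
```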
Experiment verification
• Results (figure panels: HTTP, OICQ, DNS, Edonkey)
Experiment verification
• By adjusting the parameters, the authors find that good classification results are obtained when Link > 5, D_link > 5 and P > 30%.
• Results
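The reported thresholds can be expressed as a simple decision rule. This is a minimal sketch, assuming the three conditions are combined with a logical AND (the slide lists them together but does not state how they are combined). The function name `looks_like_p2p` and the sample values are hypothetical.

```python
# Thresholds reported on the slide: Link > 5, D_link > 5, P > 30%.
LINK_MIN = 5
D_LINK_MIN = 5
P_MIN = 0.30


def looks_like_p2p(link, d_link):
    """Apply the thresholds to one host's Link and D_link values.

    Assumption: all three conditions must hold at the same time.
    """
    if link == 0:
        return False  # no connections in the window, nothing to classify
    p = d_link / link  # Definition 3: P = D_link / Link
    return link > LINK_MIN and d_link > D_LINK_MIN and p > P_MIN


print(looks_like_p2p(40, 30))  # True:  P = 0.75
print(looks_like_p2p(40, 6))   # False: P = 0.15
print(looks_like_p2p(4, 4))    # False: Link is not above 5
```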
Conclusion
• The identification method based on nodes connection degree brings two improvements:
  • It focuses on the essential features of P2P traffic, so it has higher identification accuracy.
  • Experimental results show that it can solve the problems caused by dynamic ports and content encryption.