A P2P flow Identification Model Based On Bayesian Network

A P2P Flow Identification Model Based on Bayesian Network. Published in: Wireless Communications, Networking and Mobile Computing (WiCOM), 2011 7th International Conference on. Date of Conference: 23-25 Sept. 2011. 102062626 黃柏勛 (first-year master's student, Computer Science and Information Engineering).

Presentation Transcript


  1. A P2P Flow Identification Model Based on Bayesian Network • Published in: Wireless Communications, Networking and Mobile Computing (WiCOM), 2011 7th International Conference on • Date of Conference: 23-25 Sept. 2011 • 102062626 黃柏勛 (first-year master's student, Computer Science and Information Engineering) • Authors: JIN Fenglin, Department of Computer Science and Technology, Nanjing University; ICA, PLA University of Science and Technology, Nanjing, China, fljin@sina.com • DUAN Yifeng, Institute of Command Automation, PLA University of Science and Technology, Nanjing, China, dyf.rhy@263.com

  2. Abstract • 1. Construct a uniform P2P flow identification model, UFIM (Uniform Flow Identification Model). • 2. Propose describing UFIM abstractly with a Bayesian network model. • 3. Define 6 measurements of identification performance. • 4. Theoretical analysis and experimental comparison show that UFIM can abstractly describe various types of P2P flow identification methods. • 5. This work lays the foundation for proposing new identification methods.

  3. Introduction • P2P flow identification methods fall into 4 classes: 1. port identification 2. application-layer characteristic-word identification 3. transport-layer heuristic identification 4. machine-learning identification • Erman et al. used two datasets to compare 3 unsupervised clustering algorithms: K-means, DBSCAN, and AutoClass. • They compared accuracy and time consumption, but not processing rate, real-time behavior, or CPU and memory consumption. • Although many P2P flow identification methods exist, a detailed comparison and analysis of the different methods is lacking.

  4. Introduction • This paper proposes UFIM (Uniform Flow Identification Model) to describe different P2P flow identification methods, and gives a theory for describing UFIM abstractly using a Bayesian network. • It groups current flow identification characteristics into two categories, "basic characteristics" and "statistical characteristics", to reduce implementation complexity. => A Bayesian network modeling method for constructing specific identification approaches. • It gives 6 measurements for analyzing identification methods.

  5. II. P2P FLOW IDENTIFICATION MODEL • Current P2P flow identification methods differ in implementation but share the same essential characteristic: set mapping. • Let flows denote the set of flows to be identified and classified, and let Y denote the set of known application protocols. Any identification method can then be written as F : flows → Y, namely the mapping from the flow set flows to the application protocol set Y.

  6. II. P2P FLOW IDENTIFICATION MODEL

  7. II. P2P FLOW IDENTIFICATION MODEL • UFIM consists of 3 main parts: • (1) Characteristic set X = {A1, A2, ..., Am}, where each Ai is a random variable denoting a flow identification characteristic; • (2) Application protocol set Y = {y1, y2, ..., yn}, where each yi is a vector of m values, corresponding to the m random variables in X, that identifies one application protocol; • (3) Mapping function F: for a given flow flow_i, F determines the application protocol yk to which the flow belongs.

  8. II. P2P FLOW IDENTIFICATION MODEL

  9. II. P2P FLOW IDENTIFICATION MODEL • We (1) take a flow record flow_i from the flows set and (2) construct a value vector a = {a1^0, a2^0, ..., am^0} corresponding to the m characteristics in X. Then we (3) compare a with the n vectors in Y and (4) output the application protocol yk with the highest similarity as the result.
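
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the slides do not say how similarity is measured, so the `similarity` function (fraction of matching components) is an assumed stand-in, and the protocol names and characteristic values in `Y` are hypothetical.

```python
from typing import Dict, Sequence


def similarity(a: Sequence, y: Sequence) -> float:
    """Fraction of positions where the two vectors agree (an assumed measure)."""
    if len(a) != len(y):
        raise ValueError("vectors must have the same length m")
    if not a:
        return 0.0
    return sum(1 for ai, yi in zip(a, y) if ai == yi) / len(a)


def identify(a: Sequence, protocols: Dict[str, Sequence]) -> str:
    """Mapping function F: return the protocol whose vector is most similar to a."""
    return max(protocols, key=lambda name: similarity(a, protocols[name]))


# Hypothetical protocol set Y with m = 3 characteristics per vector.
Y = {
    "bittorrent": (6881, "tcp", "large"),
    "edonkey": (4662, "tcp", "small"),
    "http": (80, "tcp", "small"),
}

# Value vector a built from one flow record (steps 1 and 2).
a = (4662, "tcp", "small")

# Steps 3 and 4: compare a with every vector in Y and output the best match.
result = identify(a, Y)
```

Ties are resolved here by dictionary order, which is an arbitrary choice; a real method would need an explicit tie-breaking or "unknown" rule.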

  10. II. P2P FLOW IDENTIFICATION MODEL • The accuracy A_UFIM of UFIM is determined by 3 parts: • ① (1) Ay, the accuracy of set Y, which depends on the application protocol classification and on the accuracy of the protocol vectors; • ② (2) Aflow, the accuracy of the characteristic value vector a constructed for an unknown flow flow_i; • ③ (3) Af, the accuracy of the mapping function F.

  12. III. Bayesian network description

  13. III. Bayesian network description • Background • 1. Bayesian network • 2. Basic and statistical characteristics • (Characteristic selection is important for identification.)

  14. Bayesian Network • A Bayesian network is a probabilistic graphical model whose structure is a directed acyclic graph (DAG).
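
For reference, the standard factorization that a Bayesian network encodes is given below. The symbols X_i (the variables) and Pa(X_i) (the parents of X_i in the DAG) are generic textbook notation, not taken from the slides.

```latex
P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each node stores only a conditional distribution given its parents, which is what keeps the joint model compact.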

  15. III. Bayesian network description

  16. III. Bayesian network description • Basic characteristics are characteristics that can be extracted directly from a single message, denoted by Ai^0; the basic characteristic set is denoted as … • Statistical characteristics are characteristics that can be extracted from the basic characteristics of multiple messages, denoted as Ai^j, where i indicates the basic characteristic Ai^0 from which they are derived.
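
The distinction can be illustrated with a small Python sketch. It assumes, purely for illustration, that per-packet size is one of the basic characteristics (the contents of Table I are not reproduced here) and derives a few statistical characteristics from it; the function name and dictionary keys are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def statistical_characteristics(packet_sizes: List[int]) -> Dict[str, float]:
    """Derive statistical characteristics from one basic characteristic.

    packet_sizes holds the basic characteristic (assumed here to be the
    size in bytes) observed on each message of a single flow.
    """
    if not packet_sizes:
        raise ValueError("need at least one packet")
    return {
        "mean_size": mean(packet_sizes),
        "max_size": max(packet_sizes),
        "min_size": min(packet_sizes),
    }


# Basic characteristic values taken from four messages of one flow.
sizes = [60, 1500, 1500, 40]
stats = statistical_characteristics(sizes)
```

The basic value is available as soon as one message arrives, while each statistical value needs several messages, which is why the number of packets required matters for real-time identification.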

  17. III. Bayesian network description • By studying existing identification methods and the 248 characteristics mentioned in [10], we selected 7 basic characteristics, as Table I shows.

  18. III. Bayesian network description

  19. III. Bayesian network description

  21. IV. Performance measurements • Def. 1: flow identification rate T: the maximum number of packets needed for flow identification, that is, the number of captured packets f needs in order to construct all identification characteristics, where ni denotes the number of packets needed to construct Xi.
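
A minimal Python sketch of Def. 1, assuming T is simply the largest n_i over all characteristics. The formula itself is not shown on the slide, so this reading is an assumption, and the characteristic names and packet counts are hypothetical.

```python
from typing import Dict


def flow_identification_rate(packets_needed: Dict[str, int]) -> int:
    """Return T, read here as max(n_i).

    Assumed reading: f can only decide once the characteristic that
    needs the most packets has been constructed.
    """
    if not packets_needed:
        raise ValueError("at least one characteristic is required")
    return max(packets_needed.values())


# Hypothetical n_i values: packets needed to construct each characteristic X_i.
n = {"port": 1, "payload_signature": 3, "mean_packet_size": 10}
T = flow_identification_rate(n)
```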

  22. IV. Performance measurements • Def. 2: protocol distinguishing rate I: suppose f determines the application protocol to which a flow belongs by conditional probability; then I denotes the probability of misidentification, that is, the proportion of misidentified flows in total flows.

  23. IV. Performance measurements • Def. 3: characteristic offset W: packets belonging to χ are regarded as unknown flows; W denotes the proportion of unknown flows in total flows. • Def. 4: identification robustness H: whether the correctness of f depends on the packet arrival order.
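
As a rough illustration of Def. 2 and Def. 3, the Python sketch below computes I and W as plain proportions over a labelled test set. The input format (pairs of true and predicted protocol, with `None` meaning "unknown") and the function name are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def error_and_offset(
    results: Iterable[Tuple[str, Optional[str]]]
) -> Tuple[float, float]:
    """Return (I, W) for (true_protocol, predicted_protocol) pairs.

    A prediction of None means the flow was reported as unknown.
    """
    counts: Counter = Counter()
    total = 0
    for truth, predicted in results:
        total += 1
        if predicted is None:
            counts["unknown"] += 1      # contributes to W
        elif predicted != truth:
            counts["wrong"] += 1        # contributes to I
    if total == 0:
        raise ValueError("no flows to evaluate")
    return counts["wrong"] / total, counts["unknown"] / total


# Hypothetical test set of four flows.
flows = [
    ("bittorrent", "bittorrent"),
    ("bittorrent", "http"),     # misidentified
    ("edonkey", None),          # unknown
    ("http", "http"),
]
I, W = error_and_offset(flows)
```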

  24. IV. Performance measurements • Def. 5: flow identification consumption L: the time needed for flow identification, equal to the time complexity of f. • Def. 6: flow identification space S: the memory space f needs to identify a flow, equal to the space complexity of f.

  25. IV. Performance measurements • T reflects the real-time performance of f, • I and W reflect the correctness of f, • H reflects the robustness of f, • L and S reflect the complexity of f.

  27. V. Experiment Analysis • F denotes the number of erroneously identified P2P flows, including false negatives and false positives. • U denotes the number of unknown flows. • [A,B] denotes a set of testing data.

  28. V. Experiment Analysis • I and W denote the protocol misjudgment rate and the characteristic offset. • [A,B] denotes a set of testing data.

  29. V. Experiment Analysis • From the identification results we conclude that the proportions of F and U in P2P traffic match I and W, so I and W can be used to denote identification accuracy.

  31. VI. Summary and My report • With a uniform model, we can analyze and compare the performance of different identification methods.

  32. VI. Summary and My report • With a uniform model, we can analyze and compare the performance of different identification methods. • Math is important.

  33. Question • ?

  34. By Bee
