A P2P flow Identification Model Based On Bayesian Network

• Published in: Wireless Communications, Networking and Mobile Computing (WiCOM), 2011 7th International Conference on
• Date of Conference: 23-25 Sept. 2011
• Authors: JIN Fenglin, Department of Computer Science and Technology, Nanjing University; ICA, PLA University of Science and Technology, Nanjing, China (fljin@sina.com). DUAN Yifeng, Institute of Command Automation, PLA University of Science and Technology, Nanjing, China (dyf.rhy@263.com)
• 102062626 黃柏勛 (Computer Science, first-year master's student)

Abstract
• 1. Construct a uniform P2P flow identification model: UFIM (Uniform Flow Identification Model).
• 2. Propose describing UFIM abstractly with a Bayesian network model.
• 3. Define 6 measurements to characterize identification performance.
• 4. Comparing theoretical analysis with experiments shows that UFIM can abstractly describe various types of P2P flow identification methods.
• 5. Together, this work lays the foundation for developing new identification methods.

Introduction
• P2P flow identification methods fall into 4 classes:
• 1. Port identification 2. Application-layer characteristic word identification 3. Transport-layer heuristic identification 4. Machine learning identification
• Erman et al. used two datasets to compare 3 unsupervised clustering algorithms: K-means, DBSCAN, and AutoClass.
• They compared accuracy and running time, but not processing rate, real-time behavior, or CPU and memory consumption.
• Although many P2P flow identification methods exist, a detailed comparison and analysis of the different methods is lacking.

• This paper proposes UFIM (Uniform Flow Identification Model) to describe different P2P flow identification methods, and gives a theory for describing UFIM abstractly with a Bayesian network.
• It groups the current flow identification characteristics into two categories, "basic characteristics" and "statistical characteristics", to reduce implementation complexity. => A Bayesian network model is used to construct a specific identification approach.
• It defines 6 measurements for analyzing identification methods.

II. P2P FLOW IDENTIFICATION MODEL
• Current P2P flow identification methods differ in how they are implemented but share the same essential characteristic: set mapping.
• Let flows denote the set of flows to be identified and classified, and let Y denote the set of known application protocols. Any identification method can then be written as F : flows → Y, namely a mapping from the flow set flows to the application protocol set Y.
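The set-mapping view can be illustrated with a minimal Python sketch. The flow record fields, the protocol labels, and the port table below are all hypothetical; a port lookup is used only because it is the simplest of the four method classes listed in the introduction, not because the paper prescribes it.

```python
from typing import Callable, Dict, List, NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; the fields are illustrative only."""
    src_port: int
    dst_port: int


# Hypothetical protocol set Y, plus a label for flows that F cannot place.
PORT_TABLE: Dict[int, str] = {6881: "BitTorrent", 4662: "eDonkey"}
UNKNOWN = "unknown"


def port_identifier(flow: Flow) -> str:
    """One possible F: look up either port in a fixed table."""
    for port in (flow.dst_port, flow.src_port):
        if port in PORT_TABLE:
            return PORT_TABLE[port]
    return UNKNOWN


def identify_all(flows: List[Flow], f: Callable[[Flow], str]) -> List[str]:
    """Apply any mapping F : flows -> Y to every flow in the set."""
    return [f(flow) for flow in flows]
```

Because `identify_all` accepts any callable, a different identification method can be swapped in without changing the caller, which is the sense in which every method is "just" a mapping.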

• UFIM consists of 3 main parts:
• (1) Characteristic set X = {A1, A2, ..., Am}: each Ai is a random variable that denotes a flow identification characteristic.
• (2) Application protocol set Y = {y1, y2, ..., yn}: each yi is a vector whose m components correspond to the m random variables in X and together identify one application protocol.
• (3) Mapping function F: for a given flow flow_i, F determines the application protocol yk to which the flow belongs.

• Identification proceeds in four steps: (1) take a flow record flow_i from the set flows; (2) construct a value vector a = {a1^0, a2^0, ..., am^0} corresponding to the m characteristics in X; (3) compare a with the n vectors in Y; (4) output the application protocol yk with the highest similarity as the result.
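The four steps can be sketched as follows. The slides do not say how similarity is measured, so the count of matching components used here is an assumption, and the protocol names and vectors are made up for illustration.

```python
from typing import Dict, Tuple

Vector = Tuple[int, ...]


def similarity(a: Vector, y: Vector) -> int:
    """Assumed similarity: the number of positions where the values agree."""
    return sum(1 for ai, yi in zip(a, y) if ai == yi)


def identify(a: Vector, protocols: Dict[str, Vector]) -> str:
    """Steps (3) and (4): compare a with every vector in Y, return the best."""
    return max(protocols, key=lambda name: similarity(a, protocols[name]))


# Hypothetical protocol set Y with m = 3 discretised characteristics.
Y: Dict[str, Vector] = {
    "proto_a": (1, 0, 2),
    "proto_b": (0, 1, 2),
    "proto_c": (1, 1, 0),
}

# Step (2) would build this vector from a captured flow; here it is given.
flow_vector: Vector = (1, 0, 1)
result = identify(flow_vector, Y)
```

With this choice of similarity, ties are resolved in favour of the protocol that appears first in the dictionary; a real method would need an explicit tie-breaking rule or a threshold below which the flow is reported as unknown.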

• The accuracy A_UFIM of UFIM is determined by 3 parts:
• ① Ay: the accuracy of set Y, which depends on the application protocol classification and on the accuracy of the vectors.
• ② Aflow: depends on the accuracy of the characteristic values a constructed for an unknown flow flow_i.
• ③ Af: depends on the accuracy of the mapping function F.

III. Bayesian network description

Background
• 1. Bayesian network
• 2. Basic and statistical characteristics (characteristic selection is important for identification)

Bayesian Network
• A Bayesian network is a probabilistic graphical model whose structure is a directed acyclic graph.
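As a small illustration of how a directed acyclic graph defines a joint distribution, the sketch below uses a toy network with one class node C and two binary characteristic nodes A1 and A2 that both depend on C. The structure and every probability are invented for the example and are not taken from the paper.

```python
from typing import Dict

# Hypothetical network: class node C with two child nodes A1 and A2.
PRIOR: Dict[str, float] = {"p2p": 0.3, "other": 0.7}
# P(Ai = 1 | C); both characteristics are binary in this toy example.
P_A1: Dict[str, float] = {"p2p": 0.8, "other": 0.1}
P_A2: Dict[str, float] = {"p2p": 0.6, "other": 0.2}


def bernoulli(p_one: float, value: int) -> float:
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c: str, a1: int, a2: int) -> float:
    """DAG factorisation: P(C, A1, A2) = P(C) * P(A1 | C) * P(A2 | C)."""
    return PRIOR[c] * bernoulli(P_A1[c], a1) * bernoulli(P_A2[c], a2)


def posterior(a1: int, a2: int) -> Dict[str, float]:
    """P(C | A1, A2), obtained by normalising the joint over the classes."""
    scores = {c: joint(c, a1, a2) for c in PRIOR}
    total = sum(scores.values())
    return {c: score / total for c, score in scores.items()}
```

Calling `posterior(1, 1)` returns the class probabilities for a flow in which both characteristics are observed; choosing the class with the largest posterior is one way a Bayesian network can play the role of the mapping function F.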

III. Bayesian network description
• Basic characteristics are characteristics that can be extracted directly from a single block, denoted by Ai^0; the basic characteristic set is denoted as
• Statistical characteristics are characteristics that can be extracted from the basic characteristics of multiple messages, denoted by Ai^j, where i indicates the basic characteristic Ai from which they are derived.
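The relationship between the two categories can be sketched in Python. The two basic characteristics used here (packet length and arrival time) and the three derived statistics are assumed examples, chosen only to show that statistical characteristics are computed from per-packet basic ones.

```python
from statistics import mean
from typing import Dict, List

# Each packet carries basic characteristics; "length" (bytes) and
# "time" (seconds) are assumed examples, not the paper's selection.
Packet = Dict[str, float]


def statistical_characteristics(packets: List[Packet]) -> Dict[str, float]:
    """Derive statistical characteristics from per-packet basic ones.

    Assumes at least one packet has been captured for the flow.
    """
    lengths = [p["length"] for p in packets]
    times = sorted(p["time"] for p in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }
```

Only the basic characteristics need to be extracted from the traffic itself; everything else is computed from them, which is how the two-category split can reduce implementation complexity.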

• By studying existing identification methods and the 248 characteristics described in [10], we selected 7 basic characteristics, as Table I shows.

IV. Performance measurements
• Def. 1: flow identification rate T: the maximum number of packets that f needs to capture in order to construct all the identification characteristics, where ni denotes the number of packets needed to construct Xi.

• Def. 2: protocol distinguishing rate I: suppose f determines the application protocol to which a flow belongs according to a conditional probability; then I denotes the probability of misidentification, that is, the proportion of misidentified flows among all flows.

• Def. 3: characteristic offset W: packets that belong to χ are regarded as unknown flow; W denotes the proportion of unknown flows among all flows.
• Def. 4: identification robustness H: whether the correctness of f depends on the order in which packets arrive.

• Def. 5: flow identification consuming L: the time needed for flow identification, equal to the time complexity of f.
• Def. 6: flow identification space S: the memory space that f needs to identify a flow, equal to the space complexity of f.

• T reflects the real-time performance of f,
• I and W reflect the correctness of f,
• H reflects the robustness of f,
• L and S reflect the complexity of f.
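Three of the measurements are simple enough to compute directly, as the sketch below shows. It rests on two assumptions that the slides do not state outright: T is read as the largest ni over all characteristics, and a flow reported as unknown counts towards W only, not towards I. The function names and the use of None for an unknown flow are also choices made for this example.

```python
from typing import Dict, List, Optional


def identification_rate(packets_needed: Dict[str, int]) -> int:
    """T under the assumed reading: the largest n_i over all characteristics."""
    return max(packets_needed.values())


def distinguishing_rate(predicted: List[Optional[str]], actual: List[str]) -> float:
    """I: proportion of flows given a wrong protocol label (None = unknown)."""
    wrong = sum(
        1 for p, a in zip(predicted, actual) if p is not None and p != a
    )
    return wrong / len(actual)


def characteristic_offset(predicted: List[Optional[str]]) -> float:
    """W: proportion of flows that are left as unknown."""
    return sum(1 for p in predicted if p is None) / len(predicted)
```

For example, with four flows of which one is mislabelled and one is left unknown, both `distinguishing_rate` and `characteristic_offset` return 0.25. H, L and S are properties of the identification function itself and would be assessed by analysis or benchmarking rather than by a formula over labelled results.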

V. Experiment Analysis
• F denotes the number of incorrectly identified P2P flows, including false negatives and false positives.
• U denotes the number of unknown flows.
• [A,B] denotes a set of testing data.

• I and W denote the protocol misjudgment rate and the characteristic offset.
• [A,B] denotes a set of testing data.

• From the identification results we can conclude that the proportions of F and U in the P2P traffic are the same as I and W, so I and W can be used to denote identification accuracy.

VI. Summary and My Report
• We can analyze and compare the performance of different identification methods within a uniform model.
• Math is important.

Questions?

By Bee
