Internet Traffic Classification Using Bayesian Analysis Techniques

Overview: a statistical method that uses supervised machine learning. It uses only flow records and is based on discriminators of the flows (port, inter-packet gap, etc.).

Presentation Transcript


    1. Internet Traffic Classification Using Bayesian Analysis Techniques. Presentation by Umamaheswararao K

    2. Overview: Statistical method. Uses supervised machine learning. Uses only flow records. Based on discriminators of the flows (port, inter-packet gap, etc.). Applies Naïve Bayesian techniques. Reasonably high accuracy.

    3. Machine-Learned Classification. Deterministic approach: assigns each data point to exactly one of a set of mutually exclusive classes. Probabilistic approach: assigns each flow a probability of belonging to each class. The current technique falls into this category.

    4. Probabilistic Approach: Can identify similar characteristics of flows after their probabilistic class assignment. Robust to measurement error. Provides a mechanism for quantifying class assignment probabilities. Available in many implementations.

    5. Terminology. Objects: the entities to be classified, here traffic flows, each identified by a tuple of src/dst IP, protocol, and src/dst port. Discriminators: characteristics that parameterize flow behaviour, such as flow duration and TCP port. Here only complete TCP connections are considered.
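
The terminology on this slide maps naturally onto a small data structure. The following is a minimal sketch, not the authors' representation: the class names `FlowKey` and `Flow`, the discriminator names, and the label `"web"` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FlowKey:
    """The tuple that identifies one traffic flow (the object to classify)."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


@dataclass
class Flow:
    """A flow plus the discriminators measured for it."""
    key: FlowKey
    discriminators: Dict[str, float]  # hypothetical names, e.g. "duration_s"
    label: Optional[str] = None       # known class for training flows, else None


# Example: one complete TCP connection with three illustrative discriminators.
flow = Flow(
    key=FlowKey("10.0.0.1", "10.0.0.2", "tcp", 51234, 80),
    discriminators={
        "duration_s": 1.8,
        "server_port": 80.0,
        "mean_inter_packet_gap_s": 0.02,
    },
    label="web",
)
```

Making the key immutable lets it serve as a dictionary key when flows are assembled from packet records.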

    6. Discriminators/Categories

    7. Analysis Tools: Naïve Bayesian Classifier

    8. Bayes Tech (contd.): Assumptions. (1) Discriminators are independent, although in practice, e.g., TCP header length is proportional to packet length or vice versa. (2) Each discriminator's distribution is assumed to be normal (Gaussian), although the real distribution can be multimodal.
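
Under these two assumptions the classifier needs only a prior, a mean, and a variance per class and discriminator. The sketch below is one possible plain-Python rendering, not the presenter's code: the function names `fit_gaussian_nb` and `posterior`, the variance floor, and the toy data and class names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-discriminator mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for cls, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)  # floor is an assumption
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(samples), means, variances)
    return model


def posterior(model, x):
    """Return P(class | x), treating discriminators as independent Gaussians."""
    log_scores = {}
    for cls, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for v, m, var in zip(x, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[cls] = log_p
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


# Toy training set: two made-up discriminators per flow, two made-up classes.
train_x = [[1.0, 0.10], [1.2, 0.20], [0.8, 0.15], [5.0, 1.0], [5.5, 1.2], [4.5, 0.9]]
train_y = ["web", "web", "web", "mail", "mail", "mail"]
probs = posterior(fit_gaussian_nb(train_x, train_y), [1.1, 0.12])
```

The result is a probability for every class rather than a single hard label, which is what makes this a probabilistic approach.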

    9. Example

    10. Example: contd

    11. Naïve Bayes: Kernel Estimation. Used when the discriminator distribution is not Gaussian.
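
Kernel estimation replaces the single fitted Gaussian with a sum of small kernels, one centred on each training value. The sketch below uses a Gaussian kernel and contrasts it with a single Gaussian fit on bimodal data. The function names, the sample values, and the bandwidth of 0.3 are hypothetical, and the slides do not say how the bandwidth was chosen.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_density(t, training_values, bandwidth):
    """Kernel estimate of the class-conditional density at t."""
    n = len(training_values)
    total = sum(gaussian_kernel((t - v) / bandwidth) for v in training_values)
    return total / (n * bandwidth)


def gaussian_density(t, training_values):
    """Density at t under a single Gaussian fitted to the same values."""
    n = len(training_values)
    mean = sum(training_values) / n
    var = sum((v - mean) ** 2 for v in training_values) / n
    return math.exp(-((t - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# A bimodal discriminator: values cluster around 1 and around 5.
values = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
kde_at_mode = kernel_density(1.0, values, bandwidth=0.3)
kde_at_gap = kernel_density(3.0, values, bandwidth=0.3)
gauss_at_mode = gaussian_density(1.0, values)
gauss_at_gap = gaussian_density(3.0, values)
```

On this data the kernel estimate is high near the two clusters and close to zero in the gap, whereas the single Gaussian puts its peak in the gap where no training value lies.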

    12. Naïve Bayes vs Kernel

    13. Discriminator selection. Remove irrelevant discriminators: they cannot differentiate the classes (same distribution for all classes). Remove redundant discriminators: those highly correlated with another discriminator.

    14. Discriminator reduction. Filter: uses characteristics of the training data to judge how relevant a discriminator is to the class (degree of correlation between discriminator and class). Wrapper: uses the results of a classifier to build an optimal set.

    15. FCBF: Fast Correlation-Based Filter for discriminator filtering. Two-stage process: (1) identifying the relevance of a discriminator; (2) identifying the redundancy of a discriminator with respect to the other discriminators.
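
The two stages can be sketched with symmetric uncertainty, SU(X, Y) = 2 · (H(X) + H(Y) − H(X, Y)) / (H(X) + H(Y)), as the correlation measure. This is an illustrative sketch under assumptions rather than the exact procedure behind the slides: it expects discriminators that are already discrete (continuous ones would need discretizing first), and the function names, the threshold of 0.01, and the toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """Normalized information gain between two discrete variables, in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * info_gain / (hx + hy)


def fcbf(columns, labels, threshold=0.01):
    """columns maps a discriminator name to its list of discrete values."""
    # Stage 1: relevance of each discriminator to the class.
    relevance = {name: symmetric_uncertainty(col, labels) for name, col in columns.items()}
    ranked = sorted(
        (name for name in relevance if relevance[name] >= threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Stage 2: drop a discriminator if an already kept one predicts it
    # at least as well as it predicts the class.
    selected = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data: "port_copy" duplicates "port", and "noise" is unrelated to the class.
toy_labels = ["a", "a", "b", "b"]
toy_columns = {
    "port": [80, 80, 25, 25],
    "port_copy": [1, 1, 0, 0],
    "noise": [0, 1, 0, 1],
}
kept = fcbf(toy_columns, toy_labels)
```

Here the irrelevant column fails the relevance threshold in the first stage, and the duplicate is removed as redundant in the second.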

    16. Results

    17. Results (contd.): Accuracy: correctly classified flows / total number of flows. Trust: the probability that a flow classified into some class is in fact from that class.
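
Both metrics follow directly from a list of true and predicted labels. In this minimal sketch, trust is computed per class as the fraction of flows assigned to that class that really belong to it. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Correctly classified flows divided by the total number of flows."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def trust_per_class(true_labels, predicted_labels):
    """For each predicted class, the fraction of its flows that truly belong to it."""
    assigned = Counter(predicted_labels)
    correct = Counter(p for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / assigned[cls] for cls in assigned}


# Toy example with made-up class names.
truth = ["web", "web", "mail", "mail", "p2p"]
predicted = ["web", "web", "web", "mail", "mail"]
overall_accuracy = accuracy(truth, predicted)
trust = trust_per_class(truth, predicted)
```

A class that is never predicted has no trust value in this sketch, since there are no assignments to evaluate.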

    18. Naïve Bayes: Trust

    19. Trust: Kernel est.

    20. Results for new data set

    21. Identification of discriminators

    22. Strengths: Payload access not needed. Some mentioned in earlier slides. High accuracy and trust with FCBF. Easily implementable. Single-flow based (a strength and a weakness). Allows any categorization.

    23. Weaknesses: A bunch of them, but then? Accuracy/trust depends mainly on how good the training set is. Trust for some classes is really poor. Works on a single-flow basis, while characterizing some traffic (e.g. attacks) requires seeing many flows. Temporal stability is not really good. Discriminators depend on network dynamics.

    24. Weaknesses (contd.): Training is not automatic. Assumes discriminator independence. The Gaussian distribution assumption is inaccurate.

    25. Future Work: A significantly new approach, hence it can lead to many ideas. Spatial independence of traffic classification. Check the weaknesses section.
