
Internet Traffic Classification KISS

Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi


Presentation Transcript


  1. Internet Traffic Classification: KISS. Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

  2. Traffic Classification & Measurement • Why? • Identify normal and anomalous behavior • Characterize the network and its users • Quality of service • Filtering • … • How? • By means of passive measurement • Using Tstat

  3. Tstat (http://tstat.tlc.polito.it) [Figure: Tstat at the edge router, between internal clients and external servers] • Traffic classifier • Deep packet inspection • Statistical methods • Persistent and scalable monitoring platform • Round Robin Database (RRD) • Histograms

  4. Tstat at a Glance

  5. Worms and Viruses? Did someone open a Christmas card? Happy new year to Windows!!

  6. Anomalies (Good!): Spammers Disappear. McColo SpamNet shut off on Tuesday, November 11th, 2008

  7. New Applications – P2PTV Fiorentina 4 - Udinese 2 Inter 1 - Juventus 0

  8. Traffic classification: look at the packets and tell me what protocol and/or application generated them [Figure: packets crossing an Internet Service Provider]

  9. Typical approach: Deep Packet Inspection (DPI) • It fails more and more: • P2P • Encryption • Proprietary solutions • Many different flavours • Examples [Figure: Internet Service Provider]: • Skype (port: ?, payload: ?) • Bittorrent (port: ?, payload: “bittorrent”) • Gtalk (port: ?, payload: RTP protocol) • eMule (port: 4662/4672, payload: E4/E5)

  10. The Failure of DPI • 11.05.2008 12:29: eMule 0.49a released • 1.08.2008 20:25: eMule 0.49b released

  11. Possible Solution: Behavioral Classifier • Phase 1, Feature (training on known traffic): statistical characterization of traffic (given source) • Phase 2, Decision (operation): look for the behaviour of unknown traffic and assign the class that best fits it • Phase 3, Verify: check for possible classification mistakes

  12. Our Approach [Phase 1: Feature (known traffic), Phase 2: Decision, Phase 3: Verify] • Statistical characterization of bits in a flow • Do NOT look at the SEMANTIC and TIMING • … but rather look at the protocol FORMAT • χ² test

  13. Chunking and χ² [Figure: UDP header followed by the first N payload bytes, split into C chunks, each of b bits; observed distribution compared with the expected (uniform) distribution] • Vector of statistics: [χ²_1, …, χ²_C] • The χ² provides an implicit measure of entropy or randomness
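The chunking step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `chi_square_vector` and the defaults `n_bytes=12` and `chunk_bits=4` are assumptions made for the example, and each packet of the flow is treated as one sample. It uses the standard Pearson statistic, the sum over all chunk values of (O_i − E)² / E with E uniform.

```python
from collections import Counter


def chi_square_vector(payloads, n_bytes=12, chunk_bits=4):
    """Return [chi2_1, ..., chi2_C] for one flow.

    payloads: list of bytes objects, one per packet (payload after the
    transport header). n_bytes and chunk_bits are illustrative defaults,
    not values taken from the slides.
    """
    if chunk_bits not in (1, 2, 4, 8):
        raise ValueError("this sketch only handles chunk sizes that divide 8")
    per_byte = 8 // chunk_bits          # chunks carved out of each byte
    n_chunks = n_bytes * per_byte       # C in the slides
    n_values = 1 << chunk_bits          # 2^b possible values per chunk
    mask = n_values - 1

    counts = [Counter() for _ in range(n_chunks)]
    n_samples = 0
    for payload in payloads:
        if len(payload) < n_bytes:
            continue                    # skip packets shorter than N bytes
        n_samples += 1
        for c in range(n_chunks):
            byte = payload[c // per_byte]
            # Chunks are taken from the most significant bits first.
            shift = 8 - chunk_bits * (c % per_byte + 1)
            counts[c][(byte >> shift) & mask] += 1

    if n_samples == 0:
        raise ValueError("no packet with at least n_bytes of payload")

    expected = n_samples / n_values     # uniform expected count E
    return [
        sum((counts[c][v] - expected) ** 2 / expected for v in range(n_values))
        for c in range(n_chunks)
    ]


# Toy flow: a constant first byte followed by a one-byte counter.
flow = [bytes([0x12, i]) for i in range(256)]
vector = chi_square_vector(flow, n_bytes=2, chunk_bits=4)
```

In the toy flow the two chunks of the constant byte get a large χ², while the two chunks of the counter byte are perfectly uniform and get χ² = 0.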

  14. Consider a chunk of 2 bits and different behaviours: • Random values • Deterministic value • Counter [Figure: observed counts O_i over the values 0, 1, 2, 3 for each behaviour]

  15. 4-bit-long chunks: χ² evolution • Random (x x x x)

  16. 4-bit-long chunks: χ² evolution • Deterministic (0 0 0 1) • Random

  17. 4-bit-long chunks: χ² evolution • Deterministic • Mixed (x 0 0 0, x 0 x 0, 0 x x x) • Random

  18. Chi Square Classifier • Split the payload into groups • Apply the test on the groups at the flow end: each message is a sample • Some groups will contain • Random bits • Mixed bits • Deterministic bits [Figure: example message layout with bit offsets 0, 8, 16, 24 and fields ID and FUNC]

  19. CSC

  20. And the counter example? • 2-byte-long counter split into groups: MSG (Most Significant Group), L2, L1, LSG (Least Significant Group)
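A small Python sketch of the counter case, under stated assumptions: the 2-byte counter is split into four 4-bit groups (labelled MSG, L2, L1, LSG here), the counter starts at 0, and one message is one sample. The helper names are hypothetical.

```python
def chi_square(values, n_values):
    """Pearson chi-square of observed values against a uniform distribution."""
    expected = len(values) / n_values
    counts = [0] * n_values
    for v in values:
        counts[v] += 1
    return sum((o - expected) ** 2 / expected for o in counts)


def counter_group_chi_squares(n_messages, group_bits=4):
    """Chi-square of each group of a 16-bit counter, most significant first."""
    counters = [i & 0xFFFF for i in range(n_messages)]
    mask = (1 << group_bits) - 1
    shifts = range(16 - group_bits, -1, -group_bits)   # 12, 8, 4, 0
    return [
        chi_square([(c >> s) & mask for c in counters], 1 << group_bits)
        for s in shifts
    ]


msg, l2, l1, lsg = counter_group_chi_squares(1024)
```

With 1024 messages, the most significant group never changes and behaves like a deterministic field (very large χ²), L2 takes only a few values (intermediate χ²), and the two low groups cycle through all values equally often (χ² = 0). A group of truly random bits would instead give a χ² close to the number of degrees of freedom, 15 for 4-bit groups.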

  21. Protocol format as seen from the χ²

  22. Our Approach [Phase 1: Feature (known traffic), Phase 2: Decision, Phase 3: Verify] • Statistical characterization of bits in a flow • χ² test • Decision process • Minimum distance / maximum likelihood

  23. C-dimension space • Each flow is a point [χ²_1, …, χ²_C] in a hyperspace • Each class has its classification region • To which class does my point belong? • Euclidean Distance • Support Vector Machine [Figure axes: χ²_i, χ²_j]

  24. Example considering the χ²

  25. Euclidean Distance Classifier • Centroid • Center of mass [Figure axes: χ²_i, χ²_j]

  26. Euclidean Distance Classifier • Centroid • Center of mass • True positives are “nearby” • True negatives are “far” [Figure axes: χ²_i, χ²_j]

  27. Euclidean Distance Classifier • Centroid • Center of mass • Hyper-sphere • False positives [Figure axes: χ²_i, χ²_j]

  28. Euclidean Distance Classifier • Centroid • Center of mass • Hyper-sphere • Radius • False negatives [Figure axes: χ²_i, χ²_j]

  29. Euclidean Distance Classifier • Centroid • Center of mass • Hyper-sphere • min { False Pos. } • min { False Neg. } • Confidence • The distance is a measure of the confidence of the decision [Figure axes: χ²_i, χ²_j]

  30. How to define the sphere radius? [Plot: True Positive – False Positive as a function of the radius]
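The centroid and hyper-sphere idea can be sketched as a nearest-centroid classifier with a distance threshold. This is an illustration only: the function names, the single radius shared by all classes, and the toy two-dimensional vectors are assumptions, and the sketch does not show how the radius is actually tuned.

```python
import math


def centroid(points):
    """Center of mass of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def train(training_set):
    """training_set: dict mapping class name -> list of chi-square vectors."""
    return {label: centroid(vectors) for label, vectors in training_set.items()}


def classify(point, centroids, radius):
    """Return (label, distance) for the nearest centroid.

    Points farther than `radius` from every centroid fall outside all
    hyper-spheres and are labelled "other". The distance doubles as a
    rough confidence measure: the smaller, the more confident.
    """
    label, dist = min(
        ((lbl, math.dist(point, c)) for lbl, c in centroids.items()),
        key=lambda item: item[1],
    )
    if dist > radius:
        return "other", dist
    return label, dist


# Toy training data (made-up 2-dimensional chi-square vectors).
centroids = train({
    "dns": [[0.0, 0.0], [2.0, 0.0]],
    "rtp": [[10.0, 10.0], [10.0, 12.0]],
})
near = classify([1.0, 1.0], centroids, radius=3.0)
far = classify([50.0, 50.0], centroids, radius=3.0)
```

A larger radius accepts more points as known traffic, which lowers false negatives but raises false positives; a smaller radius does the opposite.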

  31. Support Vector Machine • Kernel functions • Move points so that borders are simple [Figure: space of samples (dim. C), mapped by the kernel function to the space of features (dim. ∞)]

  32. Support Vector Machine • Kernel functions • Move points so that borders are simple • Borders are planes • Simple surface! • Nice math • Support Vectors • LibSVM [Figure: support vectors on the borders]

  33. Support Vector Machine • Kernel functions • Borders are planes • Simple surface! • Nice math • Support Vectors • LibSVM • Decision • Distance from the border • Confidence is a probability, p(class)

  34. Our Approach [Phase 1: Feature (known traffic), Phase 2: Decision, Phase 3: Verify] • Statistical characterization of bits in a flow • χ² test • Decision process • Minimum distance / maximum likelihood • Performance evaluation • How accurate is all this?

  35. Per flow and per endpoint • What are we going to classify? • It can be applied to both single flows • And to endpoints • It is robust to sampling • Does not require monitoring all packets, nor the first packets

  36. Real traffic traces • Internet trace from Fastweb • 1-day-long trace • 20 GByte of UDP traffic • Oracle (DPI + manual) labels: RTP, eMule, DNS, other, unknown traffic • Training: Known + Other • False negatives: measured on known traffic • False positives: measured on unknown traffic

  37. Definition of false positive/negative • The oracle (DPI) splits traffic into known classes (DNS, eMule, RTP) and other • Classifying “known” with KISS: true positives and false negatives • Classifying “other” with KISS: true negatives and false positives
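These definitions can be written down as a short Python sketch. The function name and the sample labels are hypothetical, and one detail is an assumption not stated in the slides: a known flow assigned to a different known class is counted here as a false negative.

```python
def error_rates(samples, known_classes):
    """Compute false negative and false positive percentages.

    samples: iterable of (oracle_label, classifier_label) pairs.
    known_classes: set of labels the classifier was trained to recognise.
    """
    tp = fn = tn = fp = 0
    for oracle, predicted in samples:
        if oracle in known_classes:
            # Known traffic: correct label is a true positive,
            # anything else (assumption) is a false negative.
            if predicted == oracle:
                tp += 1
            else:
                fn += 1
        else:
            # Other traffic: any known label is a false positive.
            if predicted in known_classes:
                fp += 1
            else:
                tn += 1
    return {
        "false_neg_pct": 100.0 * fn / (tp + fn) if tp + fn else 0.0,
        "false_pos_pct": 100.0 * fp / (tn + fp) if tn + fp else 0.0,
    }


rates = error_rates(
    [
        ("dns", "dns"), ("dns", "other"),
        ("rtp", "rtp"), ("rtp", "emule"),
        ("other", "other"), ("other", "dns"),
        ("other", "other"), ("other", "other"),
    ],
    known_classes={"dns", "rtp", "emule"},
)
```

False negatives are normalised by the amount of known traffic and false positives by the amount of other traffic, so the two percentages are computed on disjoint sets of samples.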

  38. Results [Table: false negatives on known traffic [%] and false positives on other traffic [%], for Euclidean Distance and SVM]

  39. Real traffic trace • RTP errors are oracle mistakes (it does not identify RTP v1) • DNS errors are due to an impure training set (for the oracle, all port 53 is DNS traffic) • EDK errors are (maybe) Xbox Live (proper training for “other”) • FN are always below 3%!!!

  40. Tuning training set size • Small training set • For “known”: 70-80 MByte • For “other”: 300 MByte [Plot: true positives and false positives [%] (confidence 5%) versus samples per class]

  41. Tuning number of packets for χ² • Protocols with volumes of at least 70-80 pkts per flow [Plot: true positives and false positives [%] (confidence 5%) versus packets]

  42. P2P-TV applications • P2P-TV applications are becoming popular • They heavily rely on UDP as the transport protocol • They are based on proprietary protocols • They are evolving over time very quickly • How to identify them? • ... After 6 hours, KISS gives you results

  43. The Failure of DPI

  44. And for TCP?

  45. Chunking and χ² [Figure: TCP or UDP header followed by the first N payload bytes, split into C chunks, each of b bits; observed distribution compared with the expected (uniform) distribution] • Vector of statistics: [χ²_1, …, χ²_C] • The χ² provides an implicit measure of entropy or randomness

  46. Results

  47. Results

  48. Pros and Cons • KISS is good because… • Blind approach • Completely automated • Works with many protocols • Works even with small training • Statistics can start at any point • Robust w.r.t. packet drops • Bypasses some DPI problems • but… • Learn (other) properly • Needs volumes of traffic • May require memory (for now) • Only UDP (for now) • Only offline (for now)

  49. Papers • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing Skype traffic: when randomness plays with you”, ACM SIGCOMM, Kyoto, JP, August 2007 • D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008 • D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15-17 April 2008 • D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009 • A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009

  50. And for TCP
