Revealing Skype Traffic: When Randomness Plays with You. D. Bonfiglio (1), M. Mellia (1), M. Meo (1), D. Rossi (2), P. Tofanelli (3). (1) Dipartimento di Elettronica, Politecnico di Torino; (2) ENST Télécom Paris; (3) Motorola Inc. ACM SIGCOMM 2007. Presented by Te-Yuan Huang.
Outline • Goal • Contribution • Know More about Skype • Classifiers • Experiments • Conclusions
Goal • Identify Skype traffic • Within aggregated traffic • In direct sessions • Over either UDP or TCP • The algorithm should • Work in real time • Be reliable • Be able to detect short flows (lasting only a few seconds)
Importance of Skype Traffic Identification • Of interest to network operators for • Network design and provisioning • Traffic and performance monitoring • Tariff policies • Traffic differentiation
Difference from Related Work • K.T. Chen et al., “Quantifying Skype USI” • Only identifies UDP traffic • Needs the Skype login phase to be monitored • Fails on backbone links • Fails if the Skype login procedure is modified in any way • K. Suh et al., “Characterizing and Detecting Relayed Traffic: A Case Study Using Skype” • Only identifies relayed Skype traffic
Let’s get our hands dirty – Know more about Skype traffic sources • A Skype message
Skype Parameters • Rate • Codec rate • Delta T • Skype message framing time • The time between two subsequent Skype messages • RF (Redundancy Factor) • The number of past blocks that Skype retransmits
Skype Communication Modes • End-to-End (E2E) • A Skype user calls another Skype user • End-to-Out (E2O) • Skype-in/Skype-out • PSTN involved • Only voice data • No video / file transfer / IM
Skype Codec • Codecs • Automatically selected • ISAC • The preferred codec for E2E • G.729 • The preferred codec for E2O
More on Skype Messages • Skype encrypts its messages • TCP: • Reliable transport • Packets are received in the correct sequence (from the application layer's point of view) • The whole content of the message is encrypted • UDP: • Unreliable • Packets may arrive out of order • An application-layer header is needed • to restore the correct order • The header can only be obfuscated • Only part of the message is encrypted
TCP E2E Message • Layout: bytes 1, 2, 3, … = Frame • All ciphered
UDP E2E Message • Layout: bytes 1-2 = ID, byte 3 = Fun, bytes 4, … = Frame • Identified fields • ID: 16-bit identifier • Randomly selected • Fun: 5-bit field, extracted with the mask 0x8f • States the payload type • 0x02, 0x03, 0x07, 0x0f: signaling message • 0x0d: data message (all 4 types of data) • Not random, but obfuscated (mixed) • Frame: ciphered information
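The field layout above can be turned into a small parser. The following is a minimal sketch, not the authors' code: the function name `parse_udp_e2e` and the big-endian reading of the ID are assumptions made for illustration (the ID is random, so its byte order does not affect classification). Only the mask 0x8f and the Fun values come from the slide.

```python
# Fun values listed on the slide.
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D
FUN_MASK = 0x8F  # keeps the 5 bits that make up the Fun field


def parse_udp_e2e(payload: bytes):
    """Return (msg_id, fun, kind) for a candidate UDP E2E message.

    Returns None when the payload is too short to hold ID and Fun.
    The name and return shape are hypothetical, chosen for this sketch.
    """
    if len(payload) < 3:
        return None
    msg_id = int.from_bytes(payload[0:2], "big")  # byte order is an assumption
    fun = payload[2] & FUN_MASK                   # third byte, masked
    if fun in SIGNALING_FUN:
        kind = "signaling"
    elif fun == DATA_FUN:
        kind = "data"
    else:
        kind = "unknown"
    return msg_id, fun, kind
```

For example, a third byte of 0x7d masks down to 0x0d, so the message is treated as a data message regardless of the obfuscated bits.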
E2O Message • Layout: bytes 1-4 = CID, bytes 5, … = Frame • Identified field • CID: 4 bytes • Connection Identifier (CID) of the PSTN gateway • Deterministic • After the initial signaling
How to Identify Skype Traffic? • Chi-Square Classifier (CSC) • Uses knowledge of the ciphering mechanism • Naïve Bayes Classifier (NBC) • Uses the general characteristics of VoIP traffic • Payload-Based Classifier (PBC) • Looks into the non-ciphered SoM • Only used for traffic over UDP
Chi-Square Classifier (CSC) • Purpose: • To determine which portions of a message are encrypted • Rationale • Given a message, • If only the third byte is not random • Probably an E2E Skype flow over UDP • If the first four bytes are deterministic and the others are ciphered • Probably an E2O Skype flow over UDP • If the whole message is ciphered • Probably a Skype flow transported over TCP
Chi-Square Classifier (CSC) – Cont. • Chi-square distribution • Observe an object's output nTOT times • There are n possible outputs • The i-th output is expected to occur Ei times out of nTOT, and is observed to occur Oi times • Then $X = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$ is (approximately) chi-square distributed with n-1 degrees of freedom
Chi-Square Classifier (CSC) – Cont. • For each flow, take the first G groups of b bits of each message • For each group g, there are $2^b$ possible outputs • If the content of the flow is random, then Ei for each group is $n_{TOT} / 2^b$ • [Figure: a message split into consecutive b-bit groups numbered 1, 2, 3, …, G, …]
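Chi-Square Classifier (CSC) – Cont. • Evaluate the test statistic for each group g as: $\chi^2_g = \sum_{i=0}^{2^b-1} \frac{(O_i^g - E_i)^2}{E_i}$, where $O_i^g$ is the number of times value i is observed in group g • Define the thresholds by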
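The per-group test described on the preceding slides can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `group_values` and `chi_square_per_group` are hypothetical, groups are read most-significant-bit first, and every payload is assumed to contain at least G·b bits. The defaults G = 16 and b = 4 are the values used later in the slides.

```python
def group_values(payload: bytes, G: int = 16, b: int = 4):
    """Split the first G*b bits of a payload into G values of b bits each."""
    nbytes = (G * b + 7) // 8
    head = payload[:nbytes]
    if len(head) < nbytes:
        raise ValueError("payload shorter than G*b bits")
    bits = int.from_bytes(head, "big")
    total_bits = nbytes * 8
    mask = (1 << b) - 1
    # Group 0 is the most significant b bits (an assumption of this sketch).
    return [(bits >> (total_bits - (g + 1) * b)) & mask for g in range(G)]


def chi_square_per_group(payloads, G: int = 16, b: int = 4):
    """Return one chi-square statistic per group, tested against uniformity."""
    n_tot = len(payloads)
    expected = n_tot / 2 ** b          # E_i under the random hypothesis
    counts = [[0] * (2 ** b) for _ in range(G)]
    for payload in payloads:
        for g, value in enumerate(group_values(payload, G, b)):
            counts[g][value] += 1      # O_i for group g
    return [
        sum((observed - expected) ** 2 / expected for observed in row)
        for row in counts
    ]
```

A flow whose messages all start with the same bytes yields large statistics for the corresponding groups, while ciphered groups stay near the value expected for random data.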
Chi-Square Classifier (CSC) – Cont. • G = 16 groups of b = 4 bits are used • E2E over UDP • Blocks g = 5 and g = 6 are mixed • The others are random • Classification criteria
Chi-Square Classifier (CSC) – Cont. • E2O over UDP • E2E or E2O over TCP • Not Skype • Otherwise
Chi-Square Classifier (CSC) – Cont. • Deterministic block: • The test statistic grows linearly with nTOT
Chi-Square Classifier (CSC) – Cont. • Mixed block: • If one bit is fixed and the others are random • The test statistic also increases linearly with nTOT
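The linear growth for deterministic and mixed blocks follows directly from the test statistic. The derivation below is a worked sketch using expected counts only (sampling fluctuations are ignored); the symbols k and E are shorthand introduced here, with k = 2^b possible values per block and E = nTOT / k.

```latex
% Deterministic block: one value occurs n_{TOT} times, the other k-1 values never occur.
\chi^2_{\mathrm{det}}
  = \frac{(n_{TOT} - E)^2}{E} + (k-1)\,\frac{(0 - E)^2}{E}
  = n_{TOT}\,(k-1)

% Mixed block with one fixed bit: k/2 values never occur,
% the remaining k/2 values each occur about 2E times.
\chi^2_{\mathrm{mix}}
  \approx \frac{k}{2}\cdot\frac{(2E - E)^2}{E} + \frac{k}{2}\cdot\frac{(0 - E)^2}{E}
  = kE = n_{TOT}
```

With b = 4 and nTOT = 100, this gives 1500 for a deterministic block and about 100 for a mixed block with one fixed bit, against an expected value of about k - 1 = 15 for a fully random block.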
Chi-Square Classifier (CSC) – Cont. • The chi-square test works only if the number of observations is large enough, that is $E_i = n_{TOT}/2^b \geq 5$ • With b = 4, this means nTOT >= 80 • Choose nTOT = 100 • Also, set
Naïve Bayes Classifier • Feature vector x = [xi] • P{C|x}: the probability that the object belongs to class C, given that the feature x is observed • P{x|C}: the probability that the feature x is observed, given that the object belongs to class C • Bayes' rule • P{C|x} = P{x|C}P{C} / P{x}
Naïve Bayes Classifier – cont. • Naïve: features are assumed to be independent, so $P\{x|C\} = \prod_i P\{x_i|C\}$ • P{x|C} is called the belief
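Bayes' rule with the independence assumption can be sketched in a few lines. This is a generic illustration, not the paper's classifier: the Gaussian per-feature model, the function names, and the class parameters in the usage note are all hypothetical. Log-probabilities are used to avoid numerical underflow when many features are multiplied.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def naive_bayes_posteriors(x, classes):
    """Return P{C|x} for every class C.

    classes maps a class name to (prior, [(mu, sigma), ...]) with one
    (mu, sigma) pair per feature; this layout is an assumption of the sketch.
    """
    log_joint = {}
    for name, (prior, params) in classes.items():
        # Naive assumption: log P{x|C} is the sum of per-feature log-likelihoods.
        log_belief = sum(gaussian_log_pdf(xi, mu, s) for xi, (mu, s) in zip(x, params))
        log_joint[name] = math.log(prior) + log_belief
    # Normalising by P{x} = sum over classes of P{x|C} P{C}.
    top = max(log_joint.values())
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return {name: math.exp(v - top) / norm for name, v in log_joint.items()}
```

Calling it with made-up parameters such as `{"voip": (0.5, [(50, 10), (20, 5)]), "data": (0.5, [(1000, 300), (200, 100)])}` and the observation `[55, 22]` assigns almost all posterior mass to the "voip" class.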
NBC – Feature Selection • VoIP • Small message size • Less bursty than data traffic • Features • Message size • Observe a window of w messages at a time: x = [s1, s2, …, sw] • Average inter-packet gap (average-IPG)
NBC – Feature Selection • Belief • How to determine • P{si|C} &
NBC – Feature Characterization • For each codec, the message size is determined by • Rate • Header length • Redundancy factor (RF) • Message framing time (delta T) • The message size can be modeled by a Gaussian distribution
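One plausible way these four quantities combine is sketched below. The formula is an assumption made for illustration, not taken from the slides: each message is taken to carry the current voice block plus RF past blocks, each of Rate × delta T bits, on top of a fixed header. The function name and the example values are hypothetical.

```python
def expected_message_size_bytes(rate_bps, delta_t_s, rf, header_bytes):
    """Estimate the mean message size in bytes.

    Assumed model: header + (1 + RF) voice blocks, where one block
    holds rate_bps * delta_t_s bits of codec output.
    """
    block_bytes = rate_bps * delta_t_s / 8
    return header_bytes + (1 + rf) * block_bytes
```

For instance, a codec running at 8 kbit/s (the nominal rate of G.729) with an assumed framing time of 20 ms produces 20-byte blocks; with RF = 1 and an assumed 4-byte header, the estimate is 44 bytes. Such an estimate could serve as the mean of the Gaussian for that codec.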
NBC – Feature Characterization • Map each codec to a Gaussian distr. • Model average-IPG to a Gaussian distr. with For Constant Bit Rate Codec For variable Bit Rate Codec
NBC – Make Decision • Let • Define a threshold Bmin • If B > Bmin • Valid Skype flow • Otherwise • Not Skype flow
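The thresholding step can be sketched as below. The exact definition of B is not preserved in this transcript, so the sketch assumes one: B is the average Gaussian log-likelihood of the message sizes in the window, taken for the best-matching codec. The function names, the codec parameters, and the threshold value in the usage note are hypothetical.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def window_belief(sizes, codec_models):
    """Assumed belief B: best per-codec average log-likelihood over the window.

    codec_models maps a codec name to the (mu, sigma) of its message size.
    """
    best = -math.inf
    for mu, sigma in codec_models.values():
        avg = sum(gaussian_log_pdf(s, mu, sigma) for s in sizes) / len(sizes)
        best = max(best, avg)
    return best


def is_skype_like(sizes, codec_models, b_min):
    """Apply the decision rule from the slide: valid Skype flow if B > Bmin."""
    return window_belief(sizes, codec_models) > b_min
```

With a made-up model `{"codecA": (40, 5)}` and `b_min = -5`, a window of 40-byte messages passes the test, while a window of 1400-byte messages is rejected.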
Payload-Based Classifier (PBC) • Used as a cross-check for the previous two classifiers • Only useful for UDP traffic • Two parts • Per-flow identification • Per-host identification
PBC - Per-flow Identification • Uses the knowledge about the UDP E2E message • Layout: bytes 1-2 = ID, byte 3 = Fun, bytes 4, … = Frame • Fun: 5-bit field, extracted with the mask 0x8f • States the payload type • 0x02, 0x03, 0x07, 0x0f: signaling message • 0x0d: data message (all 4 types of data)
PBC - Per-flow Identification • Terminology • nTOT: the total number of packets in the flow • nsig: the number of Skype signaling messages • nE2E: the number of Skype E2E data/video/chat/voice messages • nE2O: the number of Skype E2O voice messages
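The counters defined above can be collected in one pass over the UDP payloads of a flow. This is a minimal sketch with hypothetical names; it counts only nTOT, nsig and nE2E from the masked third byte. nE2O is left out because matching E2O messages depends on the gateway CID, which this sketch does not model.

```python
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D
FUN_MASK = 0x8F


def count_flow_messages(payloads):
    """Return a dict with n_tot, n_sig and n_e2e for one flow."""
    counts = {"n_tot": 0, "n_sig": 0, "n_e2e": 0}
    for payload in payloads:
        counts["n_tot"] += 1
        if len(payload) < 3:
            continue  # too short to carry the Fun field
        fun = payload[2] & FUN_MASK
        if fun in SIGNALING_FUN:
            counts["n_sig"] += 1
        elif fun == DATA_FUN:
            counts["n_e2e"] += 1
    return counts
```

The resulting counts are the inputs to the per-flow criteria; comparing them against nTOT indicates what share of the flow looks like Skype signaling or E2E data.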
PBC - Per-flow Identification • Criteria
PBC - Per-host Identification • Known: a Skype client always uses the same UDP port to send and receive traffic • Before a conversation starts, • Signaling messages are exchanged between the two clients • This makes it possible to identify a Skype client running at a specific IP address and port
PBC - Per-host Identification • Criteria to identify the Skype client IP/port
Experiments • Two data sets • Campus – 95 hours, taken on 2006/5/29 • No P2P traffic is allowed • Most traffic consists of TCP data flows • ISP – one day, taken on 2006/5/15 • All traffic is allowed • More heterogeneous • Little Skype traffic is expected