Dealing with P2P traffic in modern networks: measurement, identification and control
Silvio Valenti, Télécom ParisTech, France
21 September 2011
Thesis advisor: Dario Rossi
Outline
• Context and motivation
  • P2P applications
  • P2P traffic diffusion
• Contributions of this thesis
  • Traffic classification
  • Data reduction
  • Congestion control for P2P traffic
• Traffic classification
  • State of the art
  • Behavioral classification for P2P traffic: Abacus
  • Methodology
  • Experimental campaign
  • Dataset & metrics
  • Abacus vs KISS
  • Abacus & sampling
• Summary and Conclusion
S. Valenti, "Dealing with P2P traffic in modern networks", PhD Thesis
P2P applications
• Client-server systems
  • resources on the server
  • contents on the server
  • clients exploit server resources
• Peer-to-peer systems
  • hosts share their resources with the others
  • clients talk directly to each other and collaborate
  • robust, scalable, autonomous
  • many services: file-sharing, VoIP, live-streaming
P2P timeline
[Figure: timeline from 1999 to 2011 of P2P systems by category (file-sharing, live streaming, VoIP, search, web-based, music streaming, P2P in browsers). Systems shown: Napster, Gnutella, Kazaa, eMule, Limewire, Bittorrent, Kademlia, Chord, Skype, Coolstreaming, PPLive, Sopcast, TVAnts, Joost, Spotify, Megaupload, uTorrent 3.0. The PhD thesis is marked at 2011.]
P2P traffic in modern networks
• High volumes: in 2009, 40-70% of total traffic
• Concerns among ISPs: especially for P2P-TV [1]
• Decline in the last few years
  • Video traffic (YouTube)
  • Web hosting (MegaUpload)
• …but unlikely to disappear
  • Absolute volume increases
  • Users go back to P2P [2]
  • New services still emerging
Source: Ipoque, Internet studies 2008-2009
Content of this thesis
• 1. Traffic classification (P2P? File-sharing? Bittorrent?)
• 2. Data reduction (sampling)
• 3. Congestion control for P2P
Goal: Develop tools and protocols to help operators deal with P2P traffic
P2P traffic classification
• Problem: identify P2P traffic in the network
  • …to better manage it
  • Management: QoS, differential queuing
  • Security: intrusion detection, lawful intercept
• Technical challenges
  • encryption, tunneling, proprietary protocols
  • CPU power
Abacus classifier
• Contribution: Abacus
  • Behavioral classifier tailored for P2P-TV applications
  • Later generalized to P2P in general
  • Open source demo software
• Features
  • Behavioral approach
  • Based only on flow-level data
    • counts of packets and bytes
  • Fine-grained classification
  • Robust, portable
  • As accurate as a payload-based classifier
Data reduction and classification
• Sampling is common practice among ISPs
  • reduces load, amount of data…
  • …and information!
• Goal: is traffic classification possible
  • with flow-level data? (Netflow)
  • with flow-sampling? (routing)
  • with packet-sampling?
• Contributions:
  • studied Abacus with Netflow data and flow-sampling
Packet Sampling
• Studied impact of packet sampling on classification
  • tstat flow monitor modified to apply sampling
  • different sampling policies (systematic, random…) and rates
• Findings
  • heavy distortion no matter the policy
  • information content of features less impacted
  • classification possible when sampled data used for training (homogeneous policy)
[Figure: flow accuracy vs. sampling step, for training on sampled and on unsampled data]
Congestion control for P2P
• Goal: a low-priority protocol for P2P applications
• Requirements:
  • efficient use of bandwidth, detect congestion early
  • automatically yield to other traffic (interactive, web)
• Contributions:
  • implemented new BitTorrent protocol (LEDBAT or uTP)
  • delay-based low-priority congestion control
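The delay-based, low-priority behavior listed above can be sketched with the window update that was later standardized for LEDBAT in RFC 6817. This is a minimal sketch, not the implementation evaluated in the thesis: the function name `ledbat_on_ack` and the constant values are illustrative assumptions (the RFC only bounds TARGET by 100 ms and GAIN by 1).

```python
# Illustrative constants (assumptions, not values taken from the slides).
TARGET = 0.100        # target queuing delay in seconds
GAIN = 1.0            # at most one MSS of growth per RTT
MSS = 1452            # sender maximum segment size in bytes
MIN_CWND = 2 * MSS    # never shrink below two segments


def ledbat_on_ack(cwnd, base_delay, current_delay, bytes_acked):
    """Return the new congestion window (bytes) after one acknowledgment.

    base_delay is the minimum one-way delay observed so far, current_delay
    the latest one-way delay sample; both are in seconds.
    """
    # Estimated queuing delay: how much the path delay exceeds its minimum.
    queuing_delay = current_delay - base_delay
    # Positive below the target (grow), negative above it (yield).
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, MIN_CWND)
```

With an empty queue the window grows; once the estimated queuing delay exceeds the target, the window shrinks, which is how the flow yields to interactive and web traffic before loss occurs.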
Congestion control for P2P
• Contributions:
  • Evaluated through measurements and simulation
  • Discovered a fairness issue: latecomer advantage
  • Proposed effective solution
  • Verified also analytically
Classifiers families (1)
[Figure: three families of classifiers: Deep Packet Inspection (DPI), statistical classification, and behavior analysis (Abacus). Panel labels: specific keyword (GET, MAIL FROM:, BT), flow properties (signed packet sequences such as +s1 -s2 +s3 -s4 -s5 +s6), algorithm design.]
Taxonomy of traffic classification
Abacus: the idea
• Different kinds of people at a party
  • chat briefly with many others
  • talk at length with few others
• …and different kinds of applications
  • download small pieces of video from many peers
  • download all video from almost the same peers
• Leverage this to classify traffic
  • Observe a host for a given time
  • Count the packets received from each of the other hosts
  • What kind of application?
Classification process
• Phase 1: Signature
  • Statistical characterization of traffic
• Phase 2: Decision
  • Assign traffic to the class that best fits it
  • Support Vector Machines (or other learning tool)
• Phase 3: Verify
  • Validate the classification accuracy
  • Compare with an “oracle” that knows the truth
[Figure: known traffic feeds the training; the decision stage is used in operation]
Abacus: Signature definition
• Procedure
  • Observe host X for ∆T = 5 s
  • Count packets received from each peer Yi
  • Divide peers into bins of exponential size (1, 2, 3-4, 5-8, 9-16 packets)
  • Normalize over total number of peers
  • Repeat for bytes
  • The distribution is the Abacus signature
• Pros
  • Only lightweight operations
  • No access to packet payloads
  • Focus on incoming traffic
    • more stable throughput for video
[Figure: host X receives from peers Y1 to Y5; distribution = [1, 1, 3, 0], signature = [0.2, 0.2, 0.6, 0]]
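The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released Abacus code: the function name `abacus_signature`, the dictionary input, and the choice to fold counts beyond the last bin into that bin are all hypothetical.

```python
def abacus_signature(packets_per_peer, num_bins=4):
    """Build a packet-count signature for one host and one time window.

    packets_per_peer maps each peer to the number of packets received
    from it during the window (hypothetical input format).
    """
    counts = [0] * num_bins
    for n in packets_per_peer.values():
        if n <= 0:
            continue  # peers that sent nothing do not contribute
        # Exponential bins: 1 -> bin 0, 2 -> bin 1, 3-4 -> bin 2, 5-8 -> bin 3, ...
        # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
        bin_index = min((n - 1).bit_length(), num_bins - 1)
        counts[bin_index] += 1
    total = sum(counts)
    # Normalize over the number of contributing peers.
    return [c / total for c in counts] if total else [0.0] * num_bins


# Five peers: one sends 1 packet, one sends 2, three send 3 or 4.
window = {"Y1": 1, "Y2": 2, "Y3": 3, "Y4": 4, "Y5": 3}
print(abacus_signature(window))  # [0.2, 0.2, 0.6, 0.0]
```

The same function applied to per-peer byte counts (with suitably scaled bin edges) would give the byte half of the signature.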
Signature comparison
[Figure: signature pmf over time (steps of 5 s) for PPLive, TVAnts, Sopcast and Joost]
Support Vector Machines
• Signatures are points in a multi-dimensional space
  • complex surfaces separating regions
• SVM training phase
  • starting from a set of labeled points
  • kernel maps points to a higher-dimensional space
  • simple hyperplanes separating points
  • Support Vectors identify the planes
• Decision phase
  • map the new sample into the higher-dimensional space
  • label the point according to the region it falls into
• Unknown traffic
  • rejection criterion or additional class
[Figure: training maps the space of samples (dim. C) through the kernel trick to the space of features (dim. ∞), where the classification decision is taken]
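The decision phase can be illustrated with the standard kernel form of a two-class SVM decision function, f(x) = Σ αᵢ yᵢ k(svᵢ, x) + b, using a Gaussian kernel. This is a minimal sketch: the support vectors, coefficients and bias below are made-up values standing in for the output of a training phase, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2); the feature map is implicit."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, coeffs, bias, gamma=1.0):
    """Signed score of sample x; coeffs[i] stands for alpha_i * y_i."""
    return sum(
        c * gaussian_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coeffs)
    ) + bias


# Toy model (assumed values): one support vector per class.
support_vectors = [[0.8, 0.2, 0.0, 0.0], [0.1, 0.1, 0.8, 0.0]]
coeffs = [1.0, -1.0]   # positive -> class A, negative -> class B
bias = 0.0

sample = [0.7, 0.3, 0.0, 0.0]
label = "A" if svm_decision(sample, support_vectors, coeffs, bias) > 0 else "B"
```

A multi-class classifier is commonly built from several such two-class decisions (for instance one per application against the rest); which scheme the thesis used is not stated on this slide.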
Overview of experiments
• Dataset and metrics
• Experimental results
  • accuracy results
  • portability analysis
  • Abacus with Netflow
  • Abacus in the core
Dataset
• Known issues
  • Ground-truth vs representativeness
• Our dataset
  • Active traces from European testbed (2008)
    • P2P-TV apps, 40 hosts, 26 GB of data
    • Reliable ground-truth
    • High heterogeneity (access network, location)
  • Passive traces from ISP, Campus (2006–2009)
    • Other P2P apps, and generic traffic, ~4 GB of data
    • Ground-truth with DPI or GT [8]
    • Representative of generic environment
• Applications: Sopcast, Bittorrent, PPLive, eMule, Joost, TVAnts, Skype
Metrics
• Metrics
  • True Positive Rate (TPR)
    • percentage of traffic correctly classified
  • Misclassified (Mis)
    • percentage of traffic classified as the wrong application
  • Other (Ot)
    • percentage of traffic classified as “unknown”
• Percentages are computed…
  • signature-wise
    • related to the performance of the classification engine
  • byte-wise
    • related to the bulk of traffic (interesting for ISPs)
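The difference between signature-wise and byte-wise percentages can be made concrete with a small sketch. The input format (one tuple per classified signature) and the function name `tpr_mis_other` are assumptions for illustration, and the sample data is invented.

```python
def tpr_mis_other(results):
    """Return TPR, Mis and Ot percentages, signature-wise and byte-wise.

    results is a list of (true_app, predicted_app, n_bytes) tuples, one per
    signature; a prediction of "unknown" counts as Other.
    """
    sig = {"tpr": 0, "mis": 0, "ot": 0}
    byt = {"tpr": 0, "mis": 0, "ot": 0}
    for true_app, predicted_app, n_bytes in results:
        if predicted_app == true_app:
            key = "tpr"
        elif predicted_app == "unknown":
            key = "ot"
        else:
            key = "mis"
        sig[key] += 1          # every signature weighs the same
        byt[key] += n_bytes    # signatures weigh as much as the bytes they carry
    n_sig = len(results)
    n_bytes_total = sum(r[2] for r in results)
    return (
        {k: 100 * v / n_sig for k, v in sig.items()},
        {k: 100 * v / n_bytes_total for k, v in byt.items()},
    )


# Invented example: the two errors fall on signatures carrying few bytes.
results = [
    ("sopcast", "sopcast", 900),
    ("sopcast", "tvants", 50),
    ("sopcast", "unknown", 50),
    ("sopcast", "sopcast", 1000),
]
signature_wise, byte_wise = tpr_mis_other(results)
```

In this example the signature-wise TPR is 50% while the byte-wise TPR is 95%, because the errors fall on signatures that carry few bytes.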
Baseline results
• TPR higher than 95% in terms of signatures and 98% in terms of bytes
• Misclassification affects signatures carrying fewer bytes
• FPR for unknown traffic 0.1%
• Effective rejection criterion
Portability
• Are Abacus signatures portable across…
  • Networks?
    • train on one network, test on another one
    • loss 6% worst case
  • Access technologies?
    • divide peers into High Bandwidth (10 Mbps) and ADSL (2 Mbps)
    • ok, train=HB has some difficulty when test=ADSL
  • Channel popularity? (# of peers in swarm)
    • 2nd experiment with unpopular channel
    • problems when train=popular and test=unpopular
  • Time?
    • traces of P2P-TV from 2006 as test (train 2008)
    • classification possible unless software version changes
Abacus and Netflow (1)
• Netflow is the de facto standard for flow monitoring
  • routers export data on flows
  • when a flow terminates (explicitly or by timeout)
• Netflow data has coarser time granularity (minutes)
• For each flow, the Netflow router sends to the collector:
  • IP src, dst
  • port src, dst
  • IP protocol
  • #packets
  • #bytes
  • begin, end time
  • …
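Since the per-flow fields listed above already include packet and byte counters, the per-peer counts that Abacus needs can be derived from flow records alone. The sketch below assumes a simplified in-memory record, not the NetFlow wire format; `FlowRecord`, its field names and `per_peer_counts` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Simplified flow record holding only the fields named on the slide."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    packets: int
    n_bytes: int
    start: float   # flow begin time, seconds
    end: float     # flow end time, seconds


def per_peer_counts(records, target_host):
    """Sum packets and bytes received by target_host, grouped by sending peer."""
    packets = defaultdict(int)
    byte_counts = defaultdict(int)
    for r in records:
        if r.dst_ip == target_host:   # incoming traffic only
            packets[r.src_ip] += r.packets
            byte_counts[r.src_ip] += r.n_bytes
    return dict(packets), dict(byte_counts)


records = [
    FlowRecord("10.0.0.2", "10.0.0.1", 4000, 5000, 17, 3, 3600, 0.0, 60.0),
    FlowRecord("10.0.0.3", "10.0.0.1", 4001, 5000, 17, 1, 120, 5.0, 40.0),
    FlowRecord("10.0.0.2", "10.0.0.1", 4002, 5000, 17, 1, 1200, 10.0, 50.0),
    FlowRecord("10.0.0.1", "10.0.0.3", 5000, 4001, 17, 9, 900, 5.0, 40.0),
]
packets, byte_counts = per_peer_counts(records, "10.0.0.1")
```

The counts then cover the whole flow export interval (minutes) rather than a 5 s window, which is the coarser granularity noted on the slide.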
Abacus and Netflow (2)
• Most significant signatures are correctly classified
Abacus in the core (1)
• Abacus needs all traffic for one host (only on the edge)
• In the core, it is no longer possible due to routing
[Figure: the classifier sees only some of the flows from Host1 to Host4 toward the target host]
Abacus in the core (2)
• Abacus signatures are normalized
  • if there is no bias in peer selection, classification is possible
• Randomly sampled network with rate 1/2, 1/4, 1/8
  • train with unsampled, test with sampled traffic
• Results
  • Byte and signature accuracy degrade smoothly
  • Test with real routing tables agrees with our experiments
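The argument that normalization protects the signature from unbiased sampling can be tried out with a small simulation. This is a sketch under an explicit assumption: the "sampled network" is modeled as keeping each peer independently with probability `rate`, which may differ from the exact procedure used in the experiments. The names `signature` and `sample_peers` and the synthetic peer population are hypothetical.

```python
import random


def signature(packets_per_peer, num_bins=5):
    """Normalized count of peers per exponential packet bin (1, 2, 3-4, 5-8, 9-16)."""
    counts = [0] * num_bins
    for n in packets_per_peer.values():
        if n > 0:
            counts[min((n - 1).bit_length(), num_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins


def sample_peers(packets_per_peer, rate, rng):
    """Keep each peer independently with probability rate (no selection bias)."""
    return {p: n for p, n in packets_per_peer.items() if rng.random() < rate}


rng = random.Random(0)   # fixed seed so the run is repeatable
# Synthetic population: 200 peers sending between 1 and 16 packets each.
peers = {f"Y{i}": (i % 16) + 1 for i in range(200)}

full = signature(peers)
for rate in (1 / 2, 1 / 4, 1 / 8):
    sampled = signature(sample_peers(peers, rate, rng))
    # Largest per-bin deviation from the unsampled signature.
    print(rate, max(abs(a - b) for a, b in zip(full, sampled)))
```

Because each bin loses peers in roughly the same proportion, the normalized signature stays close to the unsampled one, with noise growing as fewer peers survive the sampling.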
Conclusion
• P2P has a central role in today’s Internet traffic
• Operators need tools to manage such traffic
• Abacus is our contribution to traffic classification
  • behavioral classifier as accurate as payload-based algorithms (byte accuracy > 98%)
  • portable (time, space)
  • robust (low false alarm rate <0.1%)
  • works with Netflow data
  • may be deployed in the core
Future work
• Behavioral classification
  • test Abacus with TCP and other kinds of traffic
• Data reduction
  • test Abacus with packet sampling
  • evaluate other smart policies
  • evaluate portability of sampled flow records
• Congestion control
  • evaluate LEDBAT in the real world
  • evaluate Bittorrent+LEDBAT in the real world
  • improve LEDBAT definition
References
[1] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross. A Measurement Study of a Large-Scale P2P IPTV System. IEEE Transactions on Multimedia, Dec. 2007.
[2] A. Finamore, M. Mellia, M. Meo, M. Munafo, and D. Rossi. Experiences of internet traffic monitoring with tstat. IEEE Network Magazine, Special Issue on Network Traffic Monitoring and Analysis, May 2011.
[3] V. Paxson. Bro: a system for detecting network intruders in real-time. Elsevier Comput. Netw., 31:2435–2463, December 1999.
[4] A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for UDP traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.
[5] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli. Traffic classification through simple statistical fingerprinting. ACM SIGCOMM Computer Communication Review, 37(1):5–16, January 2007.
[6] A. Finamore, M. Mellia, M. Meo, and D. Rossi. Kiss: Stochastic packet inspection classifier for UDP traffic. IEEE/ACM Trans. Netw., 18(5):1505–1515, 2010.
[7] T. Z. J. Fu, Y. Hu, X. Shi, D.-M. Chiu, and J. C. S. Lui. PBS: Periodic Behavioral Spectrum of P2P Applications. In Proc. of PAM ’09, Seoul, South Korea, Apr. 2009.
[8] F. Gringoli, L. Salgarelli, M. Dusi, N. Cascarano, F. Risso, and k. c. claffy. GT: picking up the truth from the ground for internet traffic. SIGCOMM Comput. Commun. Rev., 39(5), 2009.
Publications
• S. Valenti, D. Rossi, Fine-grained behavioral classification in the core: the issue of flow sampling, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2011
• P. Bermolen, M. Mellia, M. Meo, D. Rossi, S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Elsevier Computer Networks, 55(6):1394-1411, April 2011
• S. Valenti, D. Rossi, Identifying key features for P2P traffic classification, In IEEE ICC'11, Kyoto, Japan, June 2011
• G. Carofiglio, L. Muscariello, D. Rossi and S. Valenti, The quest for LEDBAT fairness, In IEEE Globecom'10
• A. Finamore, M. Mellia, M. Meo, D. Rossi and S. Valenti, Peer-to-peer traffic classification: exploiting human communication dynamics, In IEEE Globecom'10, Demo Session
• A. Pescape, D. Rossi, D. Tammaro and S. Valenti, On the impact of sampling on traffic monitoring and analysis, In Proceedings of the 22nd International Teletraffic Congress (ITC22), 2010
• D. Rossi, C. Testa, S. Valenti and L. Muscariello, LEDBAT: the new BitTorrent congestion control protocol, In International Conference on Computer Communication Networks (ICCCN'10)
• D. Rossi, S. Valenti, Fine-grained traffic classification with Netflow data, In TRaffic Analysis and Classification (TRAC) Workshop at IWCMC 2010
• A. Finamore, M. Meo, D. Rossi, S. Valenti, Kiss to Abacus: a comparison of P2P-TV traffic classifiers, In Traffic Measurement and Analysis (TMA) Workshop at PAM'10
• D. Rossi, C. Testa, S. Valenti, Yes, we LEDBAT: Playing with the new BitTorrent congestion control algorithm, In Passive and Active Measurement (PAM) 2010
• D. Rossi, E. Sottile, S. Valenti and P. Veglia, Gauging the network friendliness of P2P applications, In SIGCOMM Demo Session
• S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, Accurate and Fine-Grained Classification of P2P-TV Applications by Simply Counting Packets, In Traffic Measurement and Analysis (TMA) Workshop at IFIP Networking'09
• S. Valenti, D. Rossi, M. Meo, M. Mellia and P. Bermolen, An Abacus for P2P-TV traffic classification, In IEEE INFOCOM 2009, Demo
Thank you for your attention!
Abacus and KISS
• KISS [6] recognizes protocol syntax
  • analyze first payload bytes
  • use a χ²-like test to recognize fields
• Abacus has the same accuracy
• Abacus outperforms KISS in computational cost
[Figure: first payload bytes of packets pkt1 to pkt3 of flows F1 and F2, with byte groups labeled as counter, ID, deterministic or random]
Performance metrics
• Confusion matrix representation
• Indexes
  • TP rate (or Recall) = TP / (TP + FN)
    • recognizing the application traffic
  • FP rate = FP / (FP + TN)
    • recognizing other traffic
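The two indexes follow directly from the four cells of a per-application confusion matrix. A minimal sketch, with a hypothetical function name `tp_fp_rates` and invented labels:

```python
def tp_fp_rates(true_labels, predicted_labels, app):
    """Return (TP rate, FP rate) for one application treated as the positive class."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == app:
            if predicted == app:
                tp += 1   # application traffic recognized
            else:
                fn += 1   # application traffic missed
        else:
            if predicted == app:
                fp += 1   # other traffic wrongly attributed to the application
            else:
                tn += 1   # other traffic correctly kept out
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


truth = ["pplive", "pplive", "pplive", "pplive", "joost", "joost", "skype", "skype"]
guess = ["pplive", "pplive", "pplive", "joost", "joost", "pplive", "skype", "skype"]
tp_rate, fp_rate = tp_fp_rates(truth, guess, "pplive")
```

Here three of four PPLive samples are recognized (TP rate 0.75) and one of four non-PPLive samples is wrongly labeled PPLive (FP rate 0.25).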
Sensitivity
• Impact of classifier parameters
  • Time interval
    • shorter windows (1 s) -> difficult
    • longer windows (10, 15, 30, 60 s) -> similar performance
  • Training set size
    • we used 20% of dataset (4000 signatures per app)
    • with 300 signatures -> 10% reduction for some apps
  • Training set diversity
    • 1 or 2 peers per network are enough for robust training
  • SVM kernel and bin size
    • Gaussian kernel is better than linear
    • Exponential binning is more efficient than linear binning
Packet Sampling
• Studied impact of packet sampling on classification
  • Tstat exports flow-level features
  • Modified to apply sampling
  • Different policies and rates
• Findings
  • heavy distortion in the measurement, no matter the policy
  • information content of metrics less impacted
  • classification possible when sampled data used for training
[Figure: flow accuracy vs. sampling step for systematic, random, stratified and SYN sampling, with homogeneous and heterogeneous training]
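Three of the policies named above have common textbook definitions, sketched below for a sampling step N (one packet kept out of N on average). This is an illustration, not the modified Tstat code; the function names are hypothetical, and SYN sampling is left out because the slide does not define it.

```python
import random


def systematic_sampling(packets, step):
    """Keep every step-th packet, starting from the first."""
    return packets[::step]


def random_sampling(packets, step, rng):
    """Keep each packet independently with probability 1/step."""
    return [p for p in packets if rng.random() < 1.0 / step]


def stratified_sampling(packets, step, rng):
    """Keep one packet chosen at random inside each window of step packets."""
    return [
        rng.choice(packets[i:i + step])
        for i in range(0, len(packets), step)
    ]


rng = random.Random(42)        # fixed seed for repeatability
packets = list(range(20))      # stand-in for a packet trace
print(systematic_sampling(packets, 5))   # [0, 5, 10, 15]
print(len(stratified_sampling(packets, 5, rng)))   # 4
```

Systematic and stratified sampling keep exactly one packet per window, whereas random sampling only does so on average, which is one reason the policies distort per-flow counters differently.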
Rejection criterion selection
• For R~0: low TPR, low FPR
• For R=0.5: high TPR, low FPR
• For R~1: high TPR, high FPR
Rejection criterion
• Hyper-space is partitioned
  • every point is given a label
  • even “unknown” apps
• Need a way to recognize them
  • Define a center for each class
  • Define a threshold R
  • d = distance between the new sample and the center of the assigned class
  • If d > R mark the new point as unknown
• Bhattacharyya distance BD
  • Distance between p.d.f.
[Figure: training points and new points around the center of the class; new points within distance R of the center are labeled as “green”, those farther away as “unknown”]
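The rejection step can be sketched as a distance check applied after the SVM has assigned a label. The slide does not give the formula, so this sketch assumes the bounded form d = sqrt(1 − Σ sqrt(pᵢ qᵢ)), which keeps d between 0 and 1 and so fits a threshold R chosen in that range; the classical Bhattacharyya distance is −ln Σ sqrt(pᵢ qᵢ) instead. The class centers, function names and sample values are hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    """Bounded distance between two discrete distributions (assumed form)."""
    # Bhattacharyya coefficient: 1 for identical pmfs, 0 for disjoint ones.
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - coefficient))


def apply_rejection(sample, assigned_label, centers, threshold):
    """Keep the assigned label only if the sample is within threshold of its class center."""
    d = bhattacharyya_distance(sample, centers[assigned_label])
    return assigned_label if d <= threshold else "unknown"


# Hypothetical class center, e.g. the mean signature of the training points.
centers = {"green": [0.2, 0.2, 0.6, 0.0]}

near = [0.25, 0.15, 0.6, 0.0]   # close to the center: label kept
far = [0.0, 0.0, 0.0, 1.0]      # disjoint from the center: rejected
print(apply_rejection(near, "green", centers, 0.5))   # green
print(apply_rejection(far, "green", centers, 0.5))    # unknown
```

Raising the threshold accepts more points (higher TPR but also higher FPR); lowering it rejects more, which is the trade-off behind the choice of R.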
Signature comparison (mean)
[Figure: mean signature pmf per bin (y-axis 0.0 to 0.5) for PPLive, TVAnts, Sopcast and Joost]