Lightweight Application Classification for Network Management
Hongbo Jiang (Case Western Reserve University), Andrew W. Moore (University of Cambridge), Zihui Ge (Adverplex Inc.), Shudong Jin (Case Western Reserve University), Jia Wang (AT&T Labs - Research)
ACM SIGCOMM Workshop on Internet Network Management (INM), Kyoto, Japan, August 31, 2007
Why Classify Network Traffic? • Network planning • Traffic engineering • Accounting and billing • Security profiling • …
Our Contribution • A lightweight application classification scheme based on NetFlow data • Evaluation & Sensitivity Analysis • Trivial features • Derivative features • Training-set size • Packet sampling
Flow-level Traffic Classification • Previous traffic classification schemes use features derived from streams of packets • They can achieve good accuracy (e.g., 95%) • But they have high complexity and cost • Flow-level statistics are commonly available (Cisco NetFlow, Juniper cflowd, Huawei NetStream, …) • Sampling further reduces the cost
Probabilistic Method Example [Figure: diagram with two panels, "In Training" and "In Use". Labels: Object Characteristics, Class of membership, Prior, Probability box, Training Set, Probability of membership (estimate of membership). Example values: Pr = .15, Pr = .33, Pr = .97]
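The training and in-use phases of a probabilistic classifier can be sketched as follows. This is a minimal illustration that assumes a Gaussian naive Bayes model, which this slide does not name; the function names, the two features (mean packet size and duration), and the toy flows are all hypothetical.

```python
import math
from collections import defaultdict


def train(samples):
    """Training phase: samples is a list of (feature_vector, class_label)."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        prior = n / len(samples)  # class prior estimated from the training set
        model[label] = (prior, means, variances)
    return model


def posterior(model, features):
    """In-use phase: return {label: P(label | features)} via Bayes' rule."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            # Log of the Gaussian density for one feature.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical training flows: (mean packet size in bytes, duration in seconds).
training_set = [
    ((1200.0, 0.5), "web"), ((1100.0, 0.7), "web"), ((1300.0, 0.4), "web"),
    ((300.0, 2.0), "mail"), ((350.0, 2.5), "mail"), ((280.0, 1.8), "mail"),
]
model = train(training_set)
probabilities = posterior(model, (1150.0, 0.6))
best_class = max(probabilities, key=probabilities.get)
```

The classifier returns a probability of membership for every class rather than a single hard label, so the caller can pick the most likely class or apply a confidence threshold.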
Our Approach (cont.) • Features ranked by importance • Use Symmetric Uncertainty (based on entropy); see the paper and references therein for details • Ranked features allow for • a sensitivity analysis, and • the removal of irrelevant and redundant features.
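As a rough illustration of entropy-based ranking, the sketch below computes symmetric uncertainty for discrete features using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The function names and the toy flow data are hypothetical and are not taken from the paper; continuous features would need to be discretized first.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def symmetric_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature names sorted from most to least relevant."""
    scores = {name: symmetric_uncertainty(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up data: four flows and two candidate features.
labels = ["web", "web", "mail", "mail"]
columns = {
    "low_server_port": [80, 80, 25, 25],  # perfectly predicts the class
    "tos_byte": [0, 0, 0, 0],             # constant, carries no information
}
ranking = rank_features(columns, labels)
```

Features at the bottom of such a ranking are candidates for removal, which is what makes the sensitivity analysis possible.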
Evaluation • Dataset (not from AT&T!) • Full-duplex 1 Gbps access link; 1000 researchers • Data was hand-classified into a number of application classes, e.g. web-browsing, email, FTP, attack, P2P, … • Focused on TCP/IP flows only • 800,000 simplex TCP/IP application-level flows (97% of traffic by byte volume) • NetFlow generation • Software simulation of the Cisco NetFlow v5 engine • Independent training and test sets • Flows randomly assigned to each
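The random assignment of flows to independent training and test sets might look like the sketch below. The 50/50 split, the fixed seed, and the function name are assumptions for illustration; the slides do not state the actual proportions.

```python
import random


def split_flows(flows, train_fraction=0.5, seed=0):
    """Randomly assign each flow to exactly one of two disjoint sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(flows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Flow identifiers stand in for real flow records here.
train_set, test_set = split_flows(range(10))
```

Because every flow lands in exactly one set, accuracy measured on the test set is not inflated by flows the classifier has already seen.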
Baseline and Derivative Features • For comparison: port-based classification achieves 50-70% accuracy; packet-based classification achieves 95%
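To make the idea of derivative features concrete, the sketch below computes a few values from fields that a NetFlow v5 record carries (ports, packet and byte counts, first and last timestamps, cumulative TCP flags). The particular derived features, the class name, and the field names are illustrative assumptions; the slide does not list the features the authors used.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """A subset of the fields exported in a NetFlow v5 flow record."""
    src_port: int
    dst_port: int
    packets: int    # dPkts
    octets: int     # dOctets
    first_ms: int   # uptime at the first packet, in milliseconds
    last_ms: int    # uptime at the last packet, in milliseconds
    tcp_flags: int  # cumulative OR of the TCP flags seen in the flow


def derive_features(flow):
    """Compute simple derived features from one flow record."""
    duration_s = (flow.last_ms - flow.first_ms) / 1000.0
    return {
        "duration_s": duration_s,
        # An exported flow always contains at least one packet.
        "mean_packet_size": flow.octets / flow.packets,
        # A zero-duration flow (e.g. a single packet) has no meaningful rate.
        "bytes_per_second": flow.octets / duration_s if duration_s > 0 else 0.0,
        # The lower port usually identifies the service side of the connection.
        "low_port": min(flow.src_port, flow.dst_port),
        "saw_syn": bool(flow.tcp_flags & 0x02),
    }


# Hypothetical flow: 10 packets, 5000 bytes, lasting 2 seconds, to port 80.
example = FlowRecord(51000, 80, 10, 5000, 1000, 3000, 0x1B)
features = derive_features(example)
```

Each derived value costs only a few arithmetic operations per flow, so the feature set stays lightweight.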
Highly Relevant Features • Refers to specific privileged services and protocols • Differentiate email and FTP from web-browsing • Compact features
Reducing Feature Complexity • Runtime: 600x (s) • Runtime: 1x (s) • Accuracy remains high even after removing irrelevant and redundant features.
Reducing Training-Set Size • More features may lead to noise (insufficiently representative)
Impact of Packet Sampling • NetFlow characteristic: observed flow count will decrease as sampling rate decreases • Packet sampling has little impact on accuracy
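The drop in observed flow count can be illustrated with a small calculation. The sketch assumes independent random packet sampling, under which a flow of n packets is seen with probability 1 - (1 - p)^n at sampling rate p; real exporters may sample differently (for example, deterministic 1-in-N), and the traffic mix below is made up.

```python
def detection_probability(n_packets, sampling_rate):
    """Chance that at least one of a flow's packets is sampled."""
    return 1.0 - (1.0 - sampling_rate) ** n_packets


def expected_observed_flows(flow_sizes, sampling_rate):
    """Expected number of flows that still appear in the sampled export."""
    return sum(detection_probability(n, sampling_rate) for n in flow_sizes)


# Made-up traffic mix: many short flows and a few long ones (sizes in packets).
flow_sizes = [1] * 80 + [10] * 15 + [1000] * 5

for rate in (1.0, 0.1, 0.01, 0.001):
    seen = expected_observed_flows(flow_sizes, rate)
    print(f"sampling rate {rate}: about {seen:.1f} of {len(flow_sizes)} flows observed")
```

Short flows are the first to vanish as the rate falls, while long flows remain visible, which is why the observed flow count shrinks with the sampling rate.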
Conclusion & Future Work • Conclusion • Application classification can be done with flow-level (NetFlow) information • Trivially derived features improve accuracy • Packet sampling has minimal impact • Future work • NetFlow v9?? • Other machine-learning methods?