Profiling Internet Backbone Traffic: Behavior Models and Applications

Kuai Xu, Zhi-Li Zhang (Univ. of Minnesota), Supratik Bhattacharyya (Sprint ATL)
SIGCOMM 2005

Outline
• Introduction
• Background: Entropy & Relative Uncertainty (RU)
• Significant Clusters Extraction
• Cluster Behavioral Classification
• Structural Models (Cluster Behavioral Interpretation)
• Applications
• Conclusion & Comments
Speaker: Li-Ming Chen

Introduction
• Why profile traffic?
  • Changes in Internet traffic dynamics
  • Increase in unwanted traffic
  • Wide diversity of end hosts, applications and services
  • New services on traditional ports
  • Traditional services on non-standard ports
• Existing tools
  • Port-based, volume-based, content-based
• Need better techniques to discover behavior patterns (especially behavior of interest)

Problem Settings
• Problems
  • How to characterize communication patterns?
  • Are these patterns meaningful?
  • How to automatically discover such patterns?
• Challenges
  • Vast amount of traffic data
  • Large number of end hosts
  • Diverse applications
• A more specific problem setting
  • Use one-way traffic data from a single backbone link
  • Use only packet header information
  • No assumption of normal (or anomalous) behavior

Objectives
• Develop a general methodology for profiling Internet backbone traffic
  • Automatically discover significant behaviors of interest from massive traffic data
  • Automatically interpret these behaviors
  • Easy to understand
  • Quick to identify anomalous events of significance
• Help network operators secure and manage networks

Methodology
(Figure: raw packets → flows → clusters → behavior classes BC1, BC2, …)
• Data pre-processing
  • Aggregate packets into 5-tuple flows
  • Group flows into clusters
• Extract significant clusters
  • Data reduction step using entropy
• Classify cluster behavior
  • Based on similarity/dissimilarity of communication patterns
  • Clusters classified into behavior classes (BCs)
• Interpret behavior classes
  • Structural modeling for dominant activities
  • e.g., dstPrt(.) → (srcPrt(*), dstIP(*)) [scanning attacks]
  • e.g., srcPrt(.) → dstIP(…) → dstPrt(*) [server replying to a few hosts]

Datasets (& data pre-processing)
• Collect packet header traces from multiple backbone links in a large ISP network (Sprint)
• Aggregate packet header traces into 5-tuple flows
• Group flows associated with the same end hosts/ports into clusters
  • 4-feature space: srcIP, dstIP, srcPrt, dstPrt
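The pre-processing steps above can be sketched roughly as follows. This is only an illustration: the packet records, the field order `(srcIP, dstIP, srcPrt, dstPrt, proto)`, and the helper names `aggregate_flows` and `group_clusters` are assumptions made for the example, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical packet-header records: (srcIP, dstIP, srcPrt, dstPrt, proto).
packets = [
    ("10.0.0.1", "10.0.0.9", 80, 1025, "tcp"),
    ("10.0.0.1", "10.0.0.9", 80, 1025, "tcp"),
    ("10.0.0.1", "10.0.0.8", 80, 2048, "tcp"),
    ("10.0.0.2", "10.0.0.9", 4000, 53, "udp"),
]

FEATURES = ("srcIP", "dstIP", "srcPrt", "dstPrt")


def aggregate_flows(packets):
    """Collapse packets sharing the same 5-tuple into one flow with a packet count."""
    flows = defaultdict(int)
    for pkt in packets:
        flows[pkt] += 1
    return dict(flows)


def group_clusters(flows, feature):
    """Group flows by the value of one feature; that value is the cluster key."""
    idx = FEATURES.index(feature)
    clusters = defaultdict(list)
    for flow in flows:
        clusters[flow[idx]].append(flow)
    return dict(clusters)


flows = aggregate_flows(packets)
src_ip_clusters = group_clusters(flows, "srcIP")
```

Here the srcIP cluster keyed by `10.0.0.1` holds two flows; the same grouping would be repeated for dstIP, srcPrt and dstPrt.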

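Entropy
• Assume a random variable X can take $N_X$ discrete values
• Suppose we randomly sample X $m$ times
• An empirical probability distribution on X: $p(x_i) = m_i / m$, where $m_i$ is the number of times $x_i$ is observed
• The (empirical) entropy of X: $H(X) = -\sum_{x_i} p(x_i) \log p(x_i)$
• Max. entropy of X: $H_{max}(X) = \log \min\{N_X, m\}$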

Relative Uncertainty (RU)
• RU of X: $RU(X) = H(X) / H_{max}(X) = H(X) / \log \min\{N_X, m\}$
• Provides an index of variety or uniformity regardless of the support or sample size
• RU(X) → 0: X is deterministic
  • Most of the observations of X are of the same kind
• RU(X) → 1: X is randomly distributed
  • All observations of X are different or unique
  • Nearly indistinguishable when m < $N_X$
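A minimal sketch of the RU computation, assuming base-2 logarithms (the base cancels in the ratio) and a hypothetical helper name `relative_uncertainty`. Returning 0 for fewer than two samples is a choice made here to avoid dividing by zero; the slides do not cover that case.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical entropy H(X) of the observed samples, in bits."""
    m = len(samples)
    return -sum((c / m) * math.log2(c / m) for c in Counter(samples).values())


def relative_uncertainty(samples, n_x):
    """RU(X) = H(X) / log(min(N_X, m)); n_x is the support size N_X."""
    m = len(samples)
    if m <= 1:
        return 0.0  # assumption: treat degenerate samples as deterministic
    return entropy(samples) / math.log2(min(n_x, m))


# Ten flows that all use port 80: RU close to 0 (deterministic).
ru_same = relative_uncertainty([80] * 10, 2**16)
# Sixteen flows with sixteen distinct ports: RU close to 1 (all unique).
ru_unique = relative_uncertainty(list(range(16)), 2**16)
```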

Extract Significant Clusters
• Focus on significant clusters
  • Contain a sufficiently large number of flows
  • Represent behavior of significant interest
• (One definition) using a fixed threshold
  • A cluster is significant if it contains at least x% of the flows
  • How to choose “x” for all links?
• (Authors’ definition) adaptive thresholding using RU
  • A cluster is significant if it “stands out” from the rest
  • Use RU to quantify whether the rest looks random

Extract Significant Clusters (an example)
• Let S be a subset of A
• S contains the most significant values of A if S is the smallest subset of A such that:
  • the probability of any value in S is larger than those of the remaining values
  • the (conditional) probability distribution on the set of remaining values, R := A – S, is close to being uniformly distributed (β = 0.9)
• An efficient approximation algorithm is presented

An Approximation Algorithm (for significant clusters extraction)
• Start from an initial threshold (e.g., α0 = 2%)
• Feature values of A are ordered by their probabilities, PA(a1) > PA(a2) > …
• Stop at the largest “cut-off” threshold for which the remaining set R is close to uniformly distributed
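One way to sketch this adaptive thresholding is shown below. The slide gives only α0 = 2% and β = 0.9, so two details are assumptions made for illustration: the threshold is halved after each unsuccessful round, and the RU of the remainder is normalized by the log of the number of remaining values. The function names are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math


def ru_of_distribution(probs):
    """RU of a probability vector, normalized by log of its length (assumption)."""
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(probs))


def extract_significant(prob_by_value, alpha0=0.02, beta=0.9):
    """Return (significant values, final cut-off threshold)."""
    alpha = alpha0
    remaining = sorted(prob_by_value.items(), key=lambda kv: kv[1], reverse=True)
    significant = []
    while remaining:
        # Move every value whose probability reaches the threshold into S.
        significant += [v for v, p in remaining if p >= alpha]
        remaining = [(v, p) for v, p in remaining if p < alpha]
        if len(remaining) <= 1:
            break
        total = sum(p for _, p in remaining)
        conditional = [p / total for _, p in remaining]
        if ru_of_distribution(conditional) > beta:
            break  # the rest looks close to uniform, so stop here
        alpha /= 2  # assumption: lower the threshold and try again
    return significant, alpha


# Toy distribution: one dominant value, four mid-sized ones, a flat tail.
example = {"a": 0.9, "b": 0.019, "c": 0.019, "d": 0.019, "e": 0.019}
example.update({f"t{i}": 0.001 for i in range(24)})
sig_values, cutoff = extract_significant(example)
```

In this toy case the first round (α = 2%) leaves a skewed remainder, so the threshold drops to 1% and the four mid-sized values join the significant set before the flat tail passes the uniformity check.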

Clusters (Significant vs. Total)
(Figure panels: srcIP, dstIP, srcPrt and dstPrt dimensions. Clusters are extracted in every 5-minute time slot.)

Clusters (Significant vs. Total) - Cut-off threshold in the approximation algorithm
(Figure panels: srcIP, dstIP, srcPrt and dstPrt dimensions.)

Summary: Significant Clusters
• (Observation) Behavior changes:
  • While the total number of distinct values (clusters) may not fluctuate very much, the number of significant feature values (clusters) may vary dramatically
  • This also results in different cut-off thresholds being used
• Dramatic changes in the number of significant clusters also signify major changes in the underlying traffic patterns

Behavior characterization
• The flows in each cluster share the same cluster key (e.g., srcIP)
• The other 3 “free” dimensions can take any possible value (and so exhibit some behavior)
• RU vector [RU_X, RU_Y, RU_Z] (3 free dimensions)
  • e.g., the RU vector of a srcIP cluster is [RU_srcPrt, RU_dstPrt, RU_dstIP]
(Figure, one-hour trace: RU values fall into low / medium / high regions and are multi-modal; labels: srcPrt, dstPrt, srcIP.)
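As a rough illustration of the RU vector, the sketch below computes one RU value per free dimension for the flows of a single cluster. The flow records, the support sizes (2^32 for IP addresses, 2^16 for ports), and the ordering of free dimensions for keys other than srcIP are assumptions made for the example.

```python
import math
from collections import Counter

# Assumed support sizes N_X per feature.
SUPPORT = {"srcIP": 2**32, "dstIP": 2**32, "srcPrt": 2**16, "dstPrt": 2**16}
# Assumed ordering; for a srcIP key it yields [srcPrt, dstPrt, dstIP] as on the slide.
ORDER = ("srcPrt", "dstPrt", "srcIP", "dstIP")


def ru(values, n_x):
    """Relative uncertainty of the observed values (0.0 for fewer than 2 samples)."""
    m = len(values)
    if m <= 1:
        return 0.0
    h = -sum((c / m) * math.log2(c / m) for c in Counter(values).values())
    return h / math.log2(min(n_x, m))


def ru_vector(cluster_flows, key):
    """RU of each free dimension for the flows of one cluster."""
    free = [f for f in ORDER if f != key]
    return [ru([flow[f] for flow in cluster_flows], SUPPORT[f]) for f in free]


# Hypothetical srcIP cluster: one source port, many destination ports and hosts.
web_like = [
    {"srcIP": "10.0.0.1", "srcPrt": 80, "dstPrt": 1025 + i, "dstIP": f"10.0.1.{i}"}
    for i in range(8)
]
vec = ru_vector(web_like, "srcIP")
```

For this toy cluster the vector is approximately [0, 1, 1]: a fixed source port with highly varied destination ports and destination addresses.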

Behavior classifications
• Group clusters with similar behavior (similar RU vectors)
• Map each RU value to a level L(·) ∈ {0, 1, 2} (low, medium, high)
• [L(RU_X), L(RU_Y), L(RU_Z)] ∈ {0, 1, 2}^3, giving 27 possible BCs
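The mapping from an RU vector to a behavior class can be sketched as below. The base-3 numbering matches the example given later in the slides (BC15 is [1,2,0]), but the cut-off `eps = 0.2` between low, medium and high is an assumption for illustration, since the slides do not state the thresholds.

```python
def level(ru_value, eps=0.2):
    """Map an RU value to 0 (low), 1 (medium) or 2 (high); eps is assumed."""
    if ru_value <= eps:
        return 0
    if ru_value >= 1 - eps:
        return 2
    return 1


def behavior_class(ru_vec, eps=0.2):
    """Encode the three levels as a base-3 number in 0..26."""
    lx, ly, lz = (level(r, eps) for r in ru_vec)
    return 9 * lx + 3 * ly + lz


bc_a = behavior_class((0.5, 0.95, 0.1))   # levels [1, 2, 0]
bc_b = behavior_class((0.05, 0.9, 0.9))   # levels [0, 2, 2]
bc_c = behavior_class((0.9, 0.1, 0.9))    # levels [2, 0, 2]
```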

27 Behavior Classes
• What is the difference between BCs?
  • Are there common vs. rare BCs?
  • Do BCs have many or only a few clusters?
  • Is membership in BCs stable?
• Temporal properties of BCs (the metrics):
  • Popularity: number of times we observe a particular BC appearing
  • (Avg.) Size: average number of clusters belonging to a given BC
  • (Membership) Volatility: does a BC contain the same clusters over time?
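The three metrics could be computed along the following lines. The input layout (one dictionary per time slot mapping a cluster key to its BC) is hypothetical, and the volatility formula used here (unique members divided by total memberships) is only one plausible reading of the slide, not a definition taken from it.

```python
from collections import defaultdict


def bc_metrics(slots):
    """slots: list of dicts, one per time slot, mapping cluster key -> BC id."""
    members = defaultdict(list)  # BC id -> list of per-slot member sets
    for slot in slots:
        by_bc = defaultdict(set)
        for cluster, bc in slot.items():
            by_bc[bc].add(cluster)
        for bc, clusters in by_bc.items():
            members[bc].append(clusters)

    metrics = {}
    for bc, sets in members.items():
        total = sum(len(s) for s in sets)
        unique = len(set().union(*sets))
        metrics[bc] = {
            "popularity": len(sets),         # slots in which the BC appears
            "avg_size": total / len(sets),   # clusters per slot, when present
            "volatility": unique / total,    # assumed measure; 1.0 = all new
        }
    return metrics


# Cluster "A" stays in BC8; BC2 sees a different cluster in every slot.
slots = [{"A": 8, "B": 2}, {"A": 8, "C": 2}, {"A": 8, "D": 2}]
m = bc_metrics(slots)
```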

(Figure, 24-hour trace: number of unique clusters per BC. BC2 and BC20 show high volatility.)

How about Individual Clusters?
• Behavior characteristics of individual clusters over time (dynamic or stable?)
  • The relation between the frequency of a cluster and the BCs it appears in
  • The behavior stability of a cluster if it appears multiple times
  • Does a cluster tend to re-appear in the same BC or in different BCs?

Behavior of Individual Clusters
• Cluster frequency follows a heavy-tailed distribution (figure, log-log scale; cluster IDs ordered by frequency)
• The most frequent clusters all fall into the five popular but non-volatile BCs (BC6, BC7, BC8, BC18, BC19)
• The majority of the least frequent clusters belong to BC2 and BC20
• There are few behavior transitions, and most of them are between akin BCs (figure annotations: 89.6%, 90.3%)

Summary: Behavior Classifications
• Behavior classes classify similar clusters based on communication patterns
• Behavior classes have distinct temporal properties
  • Popularity, avg. size and membership volatility
• Clusters in general evince consistent behavior over time
• How can we interpret observed behavior?

Dominant State Analysis
• Each cluster has hundreds or thousands of flows
  • An exhaustive approach is not practical
  • Need a compact summary
• Dominant State Analysis
  • Explore dominant activities of the clusters
• Observation:
  • Clusters within the same BC have similar structural models
  • They could have different dominant states (or activities)

How? (Structural Modeling)
• An example: a web server from the srcIP perspective
• Re-order the 3 free dimensions based on their RU values (i.e., A < B < C)
  • RU_srcPrt < RU_dstIP < RU_dstPrt
• Find substantial values in A, B and C hierarchically (conditionally)
  • srcPrt 80 has 95%
  • srcPrt 80, dstIP 1 has 50%
  • srcPrt 80, dstIP 1, dstPrt 1025 has x%...
(Figure: tree rooted at the cluster. srcPrt 80 (95%) vs. srcPrt 443 (5%); under srcPrt 80, dstIP 1 (50%) vs. other dstIPs (<1%); under dstIP 1, dstPrt 1025 (<1%) and other dstPrts.)
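The hierarchical search for substantial values can be sketched as a small recursive routine. The threshold `delta = 0.2`, the use of conditional (within-branch) fractions, the function name, and the toy flows are all assumptions made for illustration; the slides do not specify how "substantial" is decided.

```python
from collections import Counter


def dominant_states(flows, dims, delta=0.2):
    """Walk dims (ordered by increasing RU) and keep values whose share of the
    current branch is at least delta. Returns (state, conditional share) pairs."""
    results = []

    def expand(subset, depth, state):
        if depth == len(dims):
            return
        dim = dims[depth]
        counts = Counter(flow[dim] for flow in subset)
        for value, count in counts.items():
            share = count / len(subset)
            if share >= delta:
                new_state = {**state, dim: value}
                results.append((new_state, share))
                branch = [f for f in subset if f[dim] == value]
                expand(branch, depth + 1, new_state)

    expand(flows, 0, {})
    return results


# Toy srcIP cluster loosely following the web-server example on the slide.
toy_flows = (
    [{"srcPrt": 80, "dstIP": "1", "dstPrt": 1025 + i} for i in range(10)]
    + [{"srcPrt": 80, "dstIP": f"h{i}", "dstPrt": 3000 + i} for i in range(9)]
    + [{"srcPrt": 443, "dstIP": "h99", "dstPrt": 5000}]
)
states = dominant_states(toy_flows, ["srcPrt", "dstIP", "dstPrt"])
```

With these toy flows, srcPrt 80 is kept (19 of 20 flows), then dstIP 1 within that branch (10 of 19), and no single dstPrt is substantial, so the summary stops at two states.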

Dominant State for srcIP
(Legend: “.” a specific value; “…” multiple values; “*” any, i.e., a large number of values for the target dimension)

Canonical behavior profiles
• A large majority of the (significant) clusters fall into three canonical profiles
(Table residue: columns include variability and avg. flow sizes per cluster; BC patterns shown: [0,2,x], [2,0,x], [2,0,x], [0,2,x], [x,0,2].)

Deviant or Rare Behaviors
• Building a comprehensive traffic profile can also lead to the identification of possible deviant behaviors
• Clusters in rare behavior classes
  • e.g., dstPrt BC15 [1,2,0] -> DDoS
• Behavior changes for clusters
  • e.g., srcIP (a Yahoo web server) BC8 -> BC6 -> BC8
• Unusual profiles for popular service ports
  • Clusters associated with common service ports should follow their canonical profiles
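Behavior changes such as BC8 -> BC6 -> BC8 can be picked out of a cluster's timeline with a few lines of code. The sketch below assumes the base-3 BC numbering implied by "BC15 [1,2,0]"; the level-wise distance it reports is just one possible way to gauge how far apart two classes are, and is not taken from the slides.

```python
def levels(bc):
    """Decode a BC id in 0..26 into its three levels, e.g. 15 -> (1, 2, 0)."""
    return (bc // 9, (bc // 3) % 3, bc % 3)


def transitions(timeline):
    """List (previous BC, current BC, level distance) for each change."""
    changes = []
    for prev, cur in zip(timeline, timeline[1:]):
        if prev != cur:
            dist = sum(abs(a - b) for a, b in zip(levels(prev), levels(cur)))
            changes.append((prev, cur, dist))
    return changes


# Hypothetical timeline for one srcIP cluster over four time slots.
changes = transitions([8, 8, 6, 8])
```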

Conclusion
• Develop a systematic methodology to automatically discover and interpret communication patterns
  • Use information-theoretic techniques to build behavior models of end hosts and applications
  • Apply dominant state analysis to explain traffic behavior
• Discover typical behavior profiles as well as rare and deviant behaviors

Comments
• Observe the behavior from different points of view
  • Flow (source - destination)
  • Connection (initiator/requester – replier/responder)
  • Connection-level statistics
• Lack of P2P application analysis
• Hard to choose the observation period for generating and analyzing clusters
  • Tradeoff between timeliness and data size
• Correlating behavior profiles across multiple backbone links
