Mixture Models of End-host Network Traffic
John Mark Agosta, Jaideep Chandrashekar, Mark Crovella, Nina Taft and Daniel Ting
Toyota-ITC, Technicolor, Boston U., Technicolor, Facebook
Outline
• We collected traffic at the end host, a vantage point that is rarely monitored.
• Conventional distributions don’t fit heavy-tailed data.
• The dense part of the distribution doesn’t look Pareto, and fitting only the Pareto tail doesn’t describe the data.
• We fit mixture models (not the typical Gaussian mixtures) that combine a Pareto tail with exponentials as a proxy for the dense part.
• Model selection: the best number of components is constrained by a complexity penalty, and the result is a model of the entire distribution.
• Uses:
  • Better tail parameter estimates than conventional measures.
  • Soft clustering: assign traffic to exponential vs. Pareto components, by protocol.
  • More stable threshold setting.
Data collection effort
End-host flows:
• Collected at the laptop network port
• Collection moved around with the device
• Assembled from packet trace headers
• On an enterprise XP build
• Periodic server uploads
• Logged with user and CPU activity, to eliminate off periods
Data sets:
• 270 personal machine data sets
• 90% laptops
• 5-week duration
• 400G raw data, total
• Flow initiation counts are binned in intervals from 4 to 512 seconds
• Zero-count intervals removed
• Median sample: 9800 points
• Maximum sample size: 264k
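The binning step above can be sketched as follows. This is a minimal illustration, not the authors' collection pipeline: the function name `bin_flow_counts` and the example timestamps are hypothetical, and flow start times are assumed to be given in seconds since the start of the trace.

```python
from collections import Counter


def bin_flow_counts(start_times, bin_seconds):
    """Count flow initiations per fixed-width time bin, dropping empty bins."""
    if bin_seconds <= 0:
        raise ValueError("bin_seconds must be positive")
    # Counter only holds bins that received at least one flow, so
    # zero-count intervals are never materialized.
    counts = Counter(int(t // bin_seconds) for t in start_times)
    # Return the per-bin counts in time order.
    return [counts[b] for b in sorted(counts)]


# Hypothetical flow start times (seconds since the start of the trace).
starts = [0.5, 1.2, 3.9, 4.1, 100.0, 101.5, 102.0, 103.9]
samples = bin_flow_counts(starts, bin_seconds=4)
```

Each resulting count is one sample point for the distribution fits that follow; repeating the call with bin widths from 4 to 512 seconds gives one data set per window size.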
Heavy-tailed data is extremely wide compared to conventional distributions.
• Fitting any exponential family distribution (e.g. Gaussian, Poisson…) fails. Any exponential tail is too steep.
• Fitting a mixture of exponential families requires an impractical number of components.
• But just fitting the power-law tail ignores most of the probability mass.
[Figure: data with best-fit normal distribution]
The distribution looks like an exponential above and a power law below
[Figure: two panels, an exponential fit and a power-law fit, each annotated with the regions where the fit is good and where it is bad]
Exponential – Pareto mixture models.
• A mixture model is a hierarchical model where the mixing weights determine the probability of each of the component models, which in turn generate the sample points.
• Since all components share the same support, any sample point could in principle have been generated by any component, with prior probability given by that component’s mixing weight.
We consider three models:
• Pareto (P): one power-law component
• Exponential – Pareto (EP): one of each
• Two exponentials, one Pareto (EEP)
• Any more exponential components cannot be resolved.
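The hierarchical structure of the EP model can be sketched in a few lines. The slides do not state the exact Pareto parameterization, so this sketch assumes a Lomax (Pareto type II) component with a hypothetical `scale` parameter, chosen because it shares the support x ≥ 0 with the exponential. The rate and scale values are illustrative only; the weight 0.25 and α = 1.6 echo the summary figures reported later in the deck.

```python
import math
import random


def exp_pdf(x, rate):
    """Exponential density on x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def lomax_pdf(x, alpha, scale):
    """Lomax (Pareto type II) density on x >= 0; the tail decays like x**-(alpha+1)."""
    if x < 0:
        return 0.0
    return (alpha / scale) * (1.0 + x / scale) ** (-(alpha + 1.0))


def ep_pdf(x, w_pareto, rate, alpha, scale):
    """Mixture density: weighted sum of the exponential and Pareto components."""
    return (1.0 - w_pareto) * exp_pdf(x, rate) + w_pareto * lomax_pdf(x, alpha, scale)


def ep_sample(n, w_pareto, rate, alpha, scale, rng):
    """Draw n points: first pick a component by its weight, then sample from it."""
    out = []
    for _ in range(n):
        if rng.random() < w_pareto:
            # Inverse-CDF sampling for the Lomax component.
            u = rng.random()
            out.append(scale * ((1.0 - u) ** (-1.0 / alpha) - 1.0))
        else:
            out.append(rng.expovariate(rate))
    return out


# Illustrative parameters (rate and scale are assumptions, not fitted values).
rng = random.Random(0)
points = ep_sample(1000, w_pareto=0.25, rate=0.1, alpha=1.6, scale=20.0, rng=rng)
```

The EEP model would add a second exponential term with its own rate and weight; the Pareto-only model is the special case `w_pareto = 1`.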
By modeling the entire data set, mixture models give more accurate tail α-parameter estimates than methods that consider only the tail data. When tested on synthetic Pareto-tailed data, the EP mixture model estimator performs significantly better than the well-known AEST method. (In each pane, AEST estimates are shown on the left and EP-based estimates on the right.)
Model Selection versus Goodness-of-Fit
• Goodness-of-fit tests, while useful for initial characterization, don’t have an explicit acceptance criterion and, as data set size increases, will eventually reject all models.
• Model selection is a relative, pairwise criterion that derives from a comparison of likelihoods.
• We use the Bayes Information Criterion (BIC) to approximate the Bayes Factor terms. It penalizes the maximum likelihood by the model degrees of freedom, d, so that models with different numbers of parameters can be compared.
• With the BIC approximation, the log Bayes Factor becomes (in the standard form, with $\hat{L}$ the maximized likelihood and $n$ the sample size)
$$\log BF_{EP,P} \approx \left(\log \hat{L}_{EP} - \log \hat{L}_{P}\right) - \frac{d_{EP} - d_{P}}{2}\,\log n$$
• The Bayes Factor is the ratio of the marginal likelihood of one model (EP) to that of another (P). For instance, a log Bayes Factor of 5 indicates that the probability of the data given one model versus the other is over 100:1.
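The BIC-based comparison can be sketched numerically: the log Bayes Factor is approximated by the difference in maximized log-likelihoods minus half the difference in degrees of freedom times log n. The log-likelihood values below are hypothetical placeholders, and the parameter counts (three for EP, one for P) are an assumption about how the models are parameterized; only the sample size 9800 comes from the data description.

```python
import math


def log_bayes_factor_bic(loglik_a, d_a, loglik_b, d_b, n):
    """BIC approximation to the natural-log Bayes Factor of model A over model B.

    loglik_*: maximized log-likelihoods; d_*: degrees of freedom; n: sample size.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return (loglik_a - loglik_b) - 0.5 * (d_a - d_b) * math.log(n)


# Hypothetical fit results for one user at the median sample size.
loglik_ep, d_ep = -30000.0, 3   # assumed: weight, exponential rate, Pareto alpha
loglik_p, d_p = -30100.0, 1     # assumed: Pareto alpha only
log_bf = log_bayes_factor_bic(loglik_ep, d_ep, loglik_p, d_p, n=9800)

# Natural-log scale: a value of 5 corresponds to odds of exp(5), roughly 148:1.
odds_at_five = math.exp(5)
```

A positive value favors the first model; the penalty term grows with log n, so the extra parameters must buy a correspondingly larger likelihood gain on larger samples.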
Pairwise BIC comparisons of the models reveal large log BF values for EP vs. P and smaller values for EEP vs. EP.
[Figure: boxplot of BIC comparison for Pareto vs. EP mixture model]
[Figure: boxplot of BIC comparison for EP vs. EEP mixture model]
Model Selection Results
[Figure: stacked bars showing the share of users assigned to the P, EP and EEP models, one bar per binning time window]
Model selection results based on Bayes Factors, over all users. Each bar represents the same user set with a different binning time window. For the P, EP, and EEP models:
• P: Only a handful of users are given the Pareto-only model.
• EP: Overall, the EP model is selected for 50-85% of the users, depending upon the bin size.
• EEP: Between 15% and 40% of user machines are best modeled by EEP, again depending upon the bin size.
Histograms of Heavy-Tail Parameters’ Variation, EP Model. • The difference across users is significant.
Partitioning traffic into Exponential and Pareto ranges
Mixture fractions as a function of connection count indicate the (soft) membership of the data in each component. In this example, bins with fewer than 82 counts are almost entirely exponential, and those with more than 82 are almost entirely Pareto. In this way, different sources of the traffic can be characterized as heavy-tailed or not.
[Figure: Mixture Fractions, User 256; curves for the Pareto and exponential mixture fractions and P(traffic)]
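The soft membership described above is the posterior probability that a point came from a given component. A minimal sketch follows, again assuming a Lomax form for the Pareto component and illustrative parameter values, so the crossover it finds is not the 82-count figure from the slide.

```python
import math


def pareto_responsibility(x, w_pareto, rate, alpha, scale):
    """Posterior probability that count x was generated by the Pareto component."""
    exp_part = (1.0 - w_pareto) * rate * math.exp(-rate * x)
    # Assumed Lomax (Pareto type II) density, sharing support x >= 0.
    pareto_part = w_pareto * (alpha / scale) * (1.0 + x / scale) ** (-(alpha + 1.0))
    return pareto_part / (exp_part + pareto_part)


def crossover_count(w_pareto, rate, alpha, scale, max_count=100000):
    """Smallest integer count at which the Pareto component becomes more likely."""
    for x in range(max_count + 1):
        if pareto_responsibility(x, w_pareto, rate, alpha, scale) >= 0.5:
            return x
    return None  # no crossover within the scanned range


# Illustrative parameters (assumptions, not fitted values from the study).
params = dict(w_pareto=0.25, rate=0.1, alpha=1.6, scale=20.0)
low = pareto_responsibility(10, **params)    # mostly exponential
high = pareto_responsibility(200, **params)  # almost entirely Pareto
split = crossover_count(**params)
```

Summing these responsibilities over the bins attributed to each protocol gives a per-protocol split of traffic between the exponential and Pareto components.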
Traffic Fractions, in Exponential and Pareto Components, by Protocol
Although Exponential traffic dominates in all cases, the long-tail (i.e. Pareto) traffic arises largely from bursts of ICMP, DNS and web traffic flows.
In summary
1. We have modeled
  • traffic as flow initiations from end hosts in an enterprise,
  • using mixture models,
  • employing model selection.
2. We have discovered
  • strong evidence that the traffic is almost always heavy-tailed,
  • with the Pareto component contributing about 1/4 of the probability mass,
  • and with a power-law scaling parameter with mean = 1.6 that varies widely, between 1.0 and 2.0.
3. Apparently DNS, ICMP and some web traffic make up the tail component.
See the full paper at http://arxiv.org/abs/1212.2744
Anomaly thresholds derived from models are more stable than empirical thresholds.
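One plausible reading of a model-derived threshold is a high quantile of the fitted mixture, compared against the same quantile taken directly from the samples. The sketch below is an assumption about how such thresholds could be computed, not the authors' procedure; it uses a Lomax form for the Pareto component, illustrative parameters, and hypothetical function names.

```python
import math


def ep_cdf(x, w_pareto, rate, alpha, scale):
    """CDF of the exponential plus (assumed Lomax) Pareto mixture on x >= 0."""
    if x <= 0:
        return 0.0
    exp_cdf = 1.0 - math.exp(-rate * x)
    lomax_cdf = 1.0 - (1.0 + x / scale) ** (-alpha)
    return (1.0 - w_pareto) * exp_cdf + w_pareto * lomax_cdf


def model_threshold(q, w_pareto, rate, alpha, scale, tol=1e-9):
    """Quantile q of the mixture, found by bisection on the CDF."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Grow the bracket until it contains the quantile.
    while ep_cdf(hi, w_pareto, rate, alpha, scale) < q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if ep_cdf(mid, w_pareto, rate, alpha, scale) < q:
            lo = mid
        else:
            hi = mid
    return hi


def empirical_threshold(samples, q):
    """Quantile q read directly from the sorted samples."""
    ordered = sorted(samples)
    index = max(0, min(len(ordered) - 1, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


# Illustrative parameters (assumptions, not fitted values from the study).
threshold = model_threshold(0.99, w_pareto=0.25, rate=0.1, alpha=1.6, scale=20.0)
```

The empirical quantile depends on the few largest observations, which vary a lot from sample to sample for heavy-tailed data, whereas the model quantile depends on parameters estimated from the whole data set.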
Component parameters are independent
This implies that the exponential and Pareto components are generated by separate sources.