Characterizing and Modeling Internet Traffic Dynamics of Cellular Devices • M. Zubair Shafiq¹, Lusheng Ji², Alex X. Liu¹, Jia Wang² • ¹Michigan State University, East Lansing, MI • ²AT&T Labs – Research, Florham Park, NJ • 6/11/2011
Motivation and Objective • Explosive increase in the data traffic volume over cellular networks • Subscriber growth • Increased network capacity • Increased device diversity • Improved device capabilities • Understanding the Internet traffic dynamics of cellular devices • Study large scale traffic • Study behavior of cellular devices • Study behavior of network applications • Develop predictive models
Agenda • Data • Network architecture • Data collection • Differentiating device types • Measurement • Temporal dynamics • Application usage distribution • Modeling • Aggregate model • Multi-class model • Conclusions
Architecture Overview • Cellular network: (1) radio access network, (2) core network • Mobile device connects to the network and establishes a Packet Data Protocol (PDP) context • IP Tunnel between mobile and GGSN using GPRS Tunneling Protocol (GTP)
Data Collection • Anonymized and aggregated IP traffic records from the core network (Gn links) • Data covers a state in the United States over the period of one week • Information • Traffic volume, e.g. byte, packet, flow • Application type • Device type • Refer to Erman et al., WWW, 2009 for more details about data collection
Differentiating Device Types • Type Allocation Code (TAC) in the International Mobile Equipment Identity (IMEI) number • GSM Association's TAC database contains the maker, model, version, and registration time of TAC numbers • Example IMEI: 01180800XXXXXXX • Make: iPhone • Model: 3G • Version: MB704LL • Year: 2008 • We study two popular smart phone families (A, B) and one wireless modem card family (W)
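The device lookup described on this slide can be sketched as follows. The TAC occupies the first 8 digits of the IMEI. The `TAC_DB` dictionary and the function names are hypothetical stand-ins for the GSM Association database, which is not reproduced here; the single entry mirrors the example on the slide.

```python
# Hypothetical stand-in for the GSM Association TAC database.
# The only entry is the example given on the slide.
TAC_DB = {
    "01180800": {"make": "iPhone", "model": "3G", "version": "MB704LL", "year": 2008},
}


def tac_of(imei: str) -> str:
    """Return the 8-digit TAC prefix of an IMEI string."""
    tac = imei[:8]
    if len(tac) != 8 or not tac.isdigit():
        raise ValueError(f"cannot read a TAC from {imei!r}")
    return tac


def device_info(imei: str):
    """Look up the device record for an IMEI, or None if the TAC is unknown."""
    return TAC_DB.get(tac_of(imei))
```

Only the TAC prefix is needed, so the lookup works on IMEIs whose remaining digits are masked, as in the slide's example.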
Temporal Dynamics • Interesting trends across weekdays and weekends • Smart phone B devices are favored more by business users and smart phone A devices are popular among general consumers
Application Usage Distribution • Each device family has different traffic behaviors • Still, most of the top peaks in the volume distribution correspond to the same applications • [Figure: application volume distribution for W, A, and B devices; labeled peaks: mail, mime, www]
Diversity vs. Volume • Diversity is characterized by information entropy: higher entropy means more diversity • Wireless modem W devices tend to have the highest entropy and total volume • Entropy and total volume for smart phone A devices are higher than those of smart phone B devices • [Figure: entropy vs. total volume for W, A, and B devices]
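As a minimal sketch of the diversity measure, the Shannon entropy of a device's per-application traffic volumes can be computed as below. The function name and the choice of bits (log base 2) are assumptions; the slides do not state the logarithm base.

```python
import math


def usage_entropy(volumes):
    """Shannon entropy (in bits) of a per-application volume vector.

    Volumes are normalized to a probability distribution first;
    applications with zero volume contribute nothing.
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("need at least one positive volume")
    probs = [v / total for v in volumes if v > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A device whose traffic is spread evenly over four applications scores 2 bits, while a device that uses a single application scores 0 bits.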
Modeling • Aggregate Model • Traffic distribution • Temporal dynamics • Multi-class Model • Incorporate differences across device types • Cluster devices by unsupervised clustering algorithm • Based on traffic distribution • Based on temporal dynamics • Develop separate models for each cluster
Modeling Aggregate
Traffic distribution • Top 10% of the applications constitute about 99% of the flows • Highly skewed distribution • Candidate models: Zipf-like, Zipf with exponential cutoff, stretched exponential
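To illustrate the skew and the simplest of the candidate models, the sketch below measures the share of flows carried by the top applications and fits a pure Zipf law, count(r) ∝ r^(-s), by least squares on the log-log rank plot. This is one common fitting approach, assumed here for illustration; the slides do not say how the models were fitted, and the function names are hypothetical.

```python
import math


def top_share(counts, fraction):
    """Share of the total carried by the top `fraction` of applications."""
    ranked = sorted(counts, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


def fit_zipf_exponent(counts):
    """Estimate s in count(r) ~ r**(-s) from a log-log least-squares line."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # the Zipf exponent is the negated slope
```

The cutoff and stretched-exponential variants add parameters that bend the tail of the rank plot and would need a nonlinear fit, which is outside this sketch.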
Temporal dynamics • Model the temporal dynamics as a random process • Order of the random process? • Autocorrelation analysis:
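The autocorrelation analysis can be sketched as below, assuming the input is an hourly traffic-volume series. The estimator is the standard biased sample autocorrelation; the function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        raise ValueError("series is constant")
    covariance_sum = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return covariance_sum / variance_sum
```

For a series with a daily cycle, the autocorrelation peaks near lags that are multiples of 24 hours and dips near the half-day lag, which is the kind of structure that informs the choice of model order.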
Temporal dynamics • 23rd order discrete time Markov chain model • State merging (many-to-one mapping) to reduce the amount of required training data • Inaccuracies due to: • Changing device behavior • Changing device population composition
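A minimal sketch of the idea follows: traffic volumes are first merged into a small number of levels (a many-to-one mapping), and a k-th order chain is then estimated by counting which level follows each length-k context. Equal-width binning and the function names are assumptions; the slides do not specify how states were merged. Setting `order=23` corresponds to the order named on the slide.

```python
from collections import Counter, defaultdict


def merge_states(values, n_levels):
    """Map raw volumes to integer levels 0..n_levels-1 by equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_levels or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_levels - 1) for v in values]


def fit_markov(states, order):
    """Estimate P(next state | previous `order` states) from one sequence.

    Returns {context tuple: {next state: probability}}.
    """
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        context = tuple(states[i - order:i])
        counts[context][states[i]] += 1
    return {
        context: {s: c / sum(followers.values()) for s, c in followers.items()}
        for context, followers in counts.items()
    }
```

The number of possible contexts grows as n_levels ** order, so fewer merged levels means far fewer contexts to estimate, which is why state merging reduces the amount of training data required.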
Modeling Multiclass
Multiclass: Device Clustering • Select appropriate number of clusters • Use intra-cluster distortion measure • Find the knee of the curve, gap statistic based heuristic • [Figure: intra-cluster distortion vs. number of clusters, panels Temporal and Spatial; k=3 marked in both]
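Finding the knee of the distortion curve can be illustrated with the simple heuristic below, which picks the k where the curve bends most sharply (largest second difference). This is a simplified stand-in, not the gap statistic of Tibshirani et al. that the slide refers to; the function name is hypothetical.

```python
def knee_k(distortions):
    """Return the k at the sharpest bend of a distortion curve.

    distortions[i] is the intra-cluster distortion for k = i + 1 clusters.
    The bend at k is measured by the discrete second difference.
    """
    if len(distortions) < 3:
        raise ValueError("need distortions for at least three values of k")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(distortions) - 1):
        bend = distortions[i - 1] - 2 * distortions[i] + distortions[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

The heuristic only considers interior points of the curve, so it can never return the smallest or largest k that was tried.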
Multiclass: Clustering Results • Spatial feature clustering • 100 element tuple: average traffic volume per application • Temporal feature clustering • 24 element tuple: average traffic volume per hour
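Clustering devices on these feature tuples can be sketched with plain k-means, shown below for tuples of any length (24 hourly averages or 100 per-application averages). The slides only say "unsupervised clustering algorithm", so k-means, the deterministic initialization, and the function names are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Cluster feature tuples; returns (labels, centroids, distortion)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization: k points spread evenly through the input.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members;
        # a centroid with no members is left where it is.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Intra-cluster distortion: summed squared distance to assigned centroid.
    distortion = sum(sq_dist(p, centroids[label])
                     for p, label in zip(points, labels))
    return labels, centroids, distortion
```

The returned distortion is the quantity that would be plotted against k when choosing the number of clusters.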
Multiclass: Models • Separate models for each of the three temporal and spatial clusters • 3 Zipf-like application distribution models • 3 Markov chain based temporal models • More accurate than the aggregate model • [Figure: panels Temporal and Spatial]
Conclusions • Analyzed Internet traffic dynamics of cellular devices in a large cellular network • Findings have implications for cellular network design, troubleshooting, performance evaluation, and optimization • Device families have subtle differences • Devise separate billing schemes • Skewness in application usage • Manufacturers and software developers can focus on the smaller subset of high-volume applications • Diurnality in temporal dynamics • Differentiate between peak and non-peak hour usage
References • Data collection: J. Erman, A. Gerber, M. T. Hajiaghayi, D. Pei, and O. Spatscheck. Network-aware forward caching. In WWW, 2009. • Location information: Q. Xu, A. Gerber, Z. M. Mao, and J. Pang. AccuLoc: Practical localization of performance measurement in 3G networks. In ACM MobiSys, 2011. • Clustering: R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63:411-423, 2001.