Characterizing Geospatial Dynamics of Application Usage in a 3G Cellular Data Network. M. Zubair Shafiq (1), Lusheng Ji (2), Alex X. Liu (1), Jeffrey Pang (2), Jia Wang (2). (1) Michigan State University, East Lansing, MI; (2) AT&T Labs – Research, Florham Park, NJ. 3/28/2012.
Motivation (1/2) • Cellular data has tripled for three years in a row (237 petabytes/month in 2010) [CISCO] • Explosive increase in the data traffic volume over cellular networks • Cellular operators have limited radio frequency spectrum • Need to optimize network planning and management to improve KPIs • How to optimize different dimensions? • Application mix? • Geo-location?
Motivation (2/2) • A typical question: “Do cell sectors see dramatically different application mixes?” • Motivation: RRC transition timers trade off radio resources and energy against user response time • Interactive apps such as web browsing benefit from longer timers, whereas background streaming benefits from shorter timers • If the answer is YES, then network operators should tune cells differently
Agenda • Data • Network architecture • Data collection • Measurement • Aggregate analysis • Cell clustering • Geospatial analysis • Cluster composition analysis • Intensity function analysis • Conclusions
Architecture Overview • RNCs in radio access network control transmission scheduling and handovers • GGSN in core network anchors IP Tunnel to UE using GPRS tunneling protocol (GTP)
Data Collection (1/2) • Jointly study two anonymized data sets • (1) From the core network, containing flow-level IP information • Contains inaccurate location information from GTP • (2) From the radio network, containing fine-grained location and handover information
Data Collection (2/2) • Covers a large metropolitan area in the United States over 32 hours • Applications identified using port, HTTP host, and user-agent information, plus other heuristics • Contains protocol, class, and “app” name if from an “App Store” • Not just HTTP URLs, as in prior work • Example core network record: 123456789|UserID|tcp|moviesite.com|video_streaming|12345|6 • Example radio network record: 123456789|UserID|Location
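The two example records can be joined per user to attach a fine-grained location to each flow. The sketch below is only an illustration: the slide does not name the fields, so every field name here is an assumption (in particular, that the first field is a timestamp), the last two core-network fields are left as unnamed metrics, and the "latest location at or before the flow" join rule is a hypothetical choice rather than the authors' method.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed field names; the slide shows only the raw pipe-delimited layout.
CORE_FIELDS = ("timestamp", "user_id", "protocol", "host", "app_class",
               "metric_1", "metric_2")  # meaning of the last two is not stated
RADIO_FIELDS = ("timestamp", "user_id", "location")


def parse_record(line, fields):
    """Split one pipe-delimited record into a dict keyed by the given field names."""
    values = line.strip().split("|")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


def attach_locations(core_records, radio_records):
    """Tag each core record with the user's latest radio location at or before it."""
    updates_by_user = defaultdict(list)
    for rec in radio_records:
        updates_by_user[rec["user_id"]].append((int(rec["timestamp"]), rec["location"]))
    for updates in updates_by_user.values():
        updates.sort()

    joined = []
    for rec in core_records:
        updates = updates_by_user.get(rec["user_id"], [])
        times = [t for t, _ in updates]
        i = bisect_right(times, int(rec["timestamp"]))
        location = updates[i - 1][1] if i else None  # None if no earlier update
        joined.append({**rec, "location": location})
    return joined


core = [parse_record("123456789|UserID|tcp|moviesite.com|video_streaming|12345|6",
                     CORE_FIELDS)]
radio = [parse_record("123456789|UserID|Location", RADIO_FIELDS)]
joined = attach_locations(core, radio)
```

A flow with no earlier radio update for its user keeps a location of None, so such flows can be filtered out or handled separately.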
Aggregate Analysis (1/2) • Question: Do different applications enjoy the same overall popularity? • Answer: No. • High-volume applications have priority in network optimization • [Figure labels: web, email, streaming]
Aggregate Analysis (2/2) • Question: Do different applications enjoy the same popularity at different locations? • How to define application popularity? (Bytes, flows, users?) • Answer: No. • Web is the most ubiquitous, dating is the scarcest • [Figure panels: Byte volume; Unique user count]
Cell Clustering (1/3) • Question: Every cell has a different distribution; how to conduct the analysis? • Answer: Cluster the application distributions of cells into a handful of clusters • 19-element feature vector for each cell • [V1, V2, V3, ..., V19] • K-means clustering • Use the Gap statistic to determine a suitable number of clusters
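The clustering step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses plain Lloyd's k-means with random restarts and the Gap statistic with a uniform bounding-box reference distribution and the usual "smallest k with Gap(k) >= Gap(k+1) - s(k+1)" rule. The function names, the restart and reference counts, and the toy 2-D points are all assumptions; real input would be one 19-element vector per cell.

```python
import math
import random


def kmeans_wss(points, k, rng, iters=50):
    """Run Lloyd's algorithm once and return the within-cluster sum of squares."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def log_best_wss(points, k, rng, restarts=8):
    """Log of the best (smallest) WSS over several random restarts."""
    best = min(kmeans_wss(points, k, rng) for _ in range(restarts))
    return math.log(max(best, 1e-12))


def choose_k(points, k_max, n_refs=8, seed=0):
    """Return (chosen k, list of Gap(1..k_max)) using a uniform-box reference."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        ref_logs = []
        for _ in range(n_refs):
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(log_best_wss(ref, k, rng))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_best_wss(points, k, rng))
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:  # Gap(k) >= Gap(k+1) - s(k+1)
            return k, gaps
    return k_max, gaps


# Toy data: three well-separated blobs standing in for per-cell feature vectors.
toy_rng = random.Random(1)
toy_points = [[cx + toy_rng.gauss(0, 0.3), toy_rng.gauss(0, 0.3)]
              for cx in (0.0, 3.0, 20.0) for _ in range(30)]
k, gaps = choose_k(toy_points, k_max=5)
```

Because k-means can settle in a poor local optimum, the sketch keeps the best of several restarts before taking logarithms; a single run per k can distort the Gap curve.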
Cell Clustering (2/3) • Cluster centroids for byte, packet, flow, and user distributions • Helps to identify cells with distinct traffic patterns • [Figure panels: Music & audio, 8% of cells; Web browsing, 36% of cells]
Cell Clustering (3/3) • [Figure panels: Email, 15% of cells; Streaming, 11% of cells; MMS, 6% of cells; Multiple, 76% of cells]
Cluster Composition Analysis • Question: Do different geographical regions have different application mixes? • Suburbs have more streaming and music use • Downtown and university areas have more web use • [Figure: Cluster composition analysis for byte distributions]
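Cluster composition can be read as the share of a region's cells that fall into each cluster. The sketch below assumes each cell already carries a region tag and a cluster label; the region names, labels, and function name are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter, defaultdict


def cluster_composition(cells):
    """Map each region to the fraction of its cells in each cluster.

    cells: iterable of (region, cluster_label) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for region, label in cells:
        counts[region][label] += 1
    return {
        region: {label: n / sum(counter.values()) for label, n in counter.items()}
        for region, counter in counts.items()
    }


# Hypothetical labelled cells, not data from the study.
example_cells = [
    ("downtown", "web"), ("downtown", "web"), ("downtown", "email"),
    ("suburb", "streaming"),
]
composition = cluster_composition(example_cells)
```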
Intensity Function Analysis (1/2) • Kernel-estimated intensity function • Equation annotations: unbiased intensity estimator, edge bias correction, kernel function • [Figure: Intensity function for web browsing; panels: Flow, Byte, Packet, User]
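The slide's equation did not survive in the transcript, so the sketch below shows one standard form of a kernel intensity estimate with edge correction, not necessarily the one the authors used. It assumes a Gaussian kernel with bandwidth h, a rectangular study window, and a "uniform" correction that divides the weighted kernel sum at location u by the fraction of kernel mass centred at u that lies inside the window. The weights stand for a per-cell quantity such as bytes, packets, flows, or users; all names are illustrative.

```python
import math


def _normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_kernel(dx, dy, h):
    """Isotropic 2-D Gaussian kernel with bandwidth h."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def edge_mass(u, window, h):
    """Fraction of the kernel centred at u that falls inside the rectangular window."""
    (x0, x1), (y0, y1) = window
    mass_x = _normal_cdf((x1 - u[0]) / h) - _normal_cdf((x0 - u[0]) / h)
    mass_y = _normal_cdf((y1 - u[1]) / h) - _normal_cdf((y0 - u[1]) / h)
    return mass_x * mass_y


def intensity(u, points, weights, window, h):
    """Edge-corrected kernel intensity estimate at location u."""
    total = sum(
        w * gaussian_kernel(u[0] - x, u[1] - y, h)
        for (x, y), w in zip(points, weights)
    )
    return total / edge_mass(u, window, h)


# A single unit-weight point in the corner of the unit square.
window = ((0.0, 1.0), (0.0, 1.0))
corner_value = intensity((0.0, 0.0), [(0.0, 0.0)], [1.0], window, h=0.1)
```

At the corner only about a quarter of the kernel lies inside the window, so the correction scales the raw estimate up by roughly a factor of four; in the interior the correction factor is close to one.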
Intensity Function Analysis (2/2) • Question: Can we identify specific geographical areas with conflicting QoS requirements? • Take differences between intensity functions to identify such areas • Music & audio is streaming; email is best effort • [Figure: Music & audio - (email + web browsing)]
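Once intensity surfaces are evaluated on a common grid, the difference on this slide is a cell-wise subtraction. The sketch below assumes each surface is a list of rows of equal shape; the grid values, the zero threshold, and the function names are illustrative assumptions.

```python
def intensity_difference(target, others):
    """Cell-wise target surface minus the sum of the other surfaces."""
    return [
        [value - sum(grid[i][j] for grid in others) for j, value in enumerate(row)]
        for i, row in enumerate(target)
    ]


def hotspots(diff, threshold=0.0):
    """Grid indices where the difference exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(diff)
        for j, value in enumerate(row)
        if value > threshold
    ]


# Made-up 2x2 surfaces, not values from the study.
music = [[5, 1], [0, 2]]
email = [[1, 1], [0, 0]]
web = [[1, 2], [0, 1]]
diff = intensity_difference(music, [email, web])
areas = hotspots(diff)
```

Grid cells returned by hotspots are those where music and audio intensity exceeds the combined email and web intensity.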
Summary and Implications • Application distributions vary significantly for byte, packet, flow, and user counts • Application mix varies significantly across neighborhoods (downtown, suburb, etc.) • The popularity of different applications varies even within a given neighborhood • Geospatial correlations in application usage can be leveraged to optimize cellular network parameters for KPI improvement
References • Data collection: J. Erman, A. Gerber, M. T. Hajiaghayi, D. Pei, and O. Spatscheck. Network-aware forward caching. In WWW, 2009. • Location information: Q. Xu, A. Gerber, Z. M. Mao, and J. Pang. AccuLoc: Practical localization of performance measurement in 3G networks. In ACM MobiSys, 2011. • Prior work: I. Trestian, S. Ranjan, A. Kuzmanovic, and A. Nucci. Measuring serendipity: Connecting people, locations and interests in a mobile 3G network. In ACM IMC, 2009. • Prior work: F. P. Tso, J. Teng, W. Jia, and D. Xuan. Mobility: A double-edged sword for HSPA networks. In ACM MobiHoc, 2010. • Prior work: M. Z. Shafiq, L. Ji, A. X. Liu, and J. Wang. Characterizing and modeling Internet traffic dynamics of cellular devices. In ACM SIGMETRICS, 2011. • Prior work: U. Paul, A. P. Subramanian, M. M. Buddhikot, and S. R. Das. Understanding traffic dynamics in cellular data networks. In IEEE Infocom, 2011.