Accurate & scalable models for wireless traffic workload

COST-ACTION: TMA meeting @ Samos’08
Maria Papadopouli, Assistant Professor
Department of Computer Science, University of Crete & Institute of Computer Science, Foundation for Research & Technology-Hellas (FORTH)
Joint research with: F. Hernandez-Campos, M. Karaliopoulos, H. Shen, E. Raftopoulos
IBM Faculty Award, EU Marie Curie IRG, GSRT “Cooperation with non-EU countries” grants

Wireless landscape
• Growing demand for wireless access
• Mechanisms for better than best-effort service provision
• Performance analysis of these mechanisms
  → Typically using simplistic traffic models
• Empirical measurements impel modeling efforts to produce more realistic models
  → Enable more meaningful performance analysis studies

Wireless infrastructure
[Figure: Internet, router, wired network, switch, and APs 1 to 3 of the wireless network; users A and B roam between APs and may disconnect. Labeled levels: associations, flows, packets.]

Dimensions in modeling wireless access
• Intended user demand
• User mobility patterns
• Arrival at APs
• Roaming across APs
• Duration the user is connected to an infrastructure
• Link conditions
• Network topology

[Figure: the wireless infrastructure (Internet, router, wired network, switch, APs 1 to 3, users A and B) above a timeline showing a session, its events (labeled 0 to 3), and flow arrivals at times t1 to t7.]

Our parameters and models

Wireless infrastructure & acquisition
• 26,000 students, 3,000 faculty, 9,000 staff on a campus of over 729 acres
• 488 APs (April 2005), 741 APs (April 2006)
• SNMP data collected every 5 minutes
• Several months of SNMP & SYSLOG data from all APs
• Packet-header traces:
  • Two weeks (in April 2005 & April 2006)
  • Captured on the link between UNC & the rest of the Internet via a high-precision monitoring card

Modeling process
• Selection of models (e.g., various distributions)
• Fitting parameters using empirical traces
• Evaluation and comparison of models
  • Visual inspection, e.g., CCDFs & QQ plots of models vs. empirical data
  • Statistical criteria, e.g., QQ/simulation envelopes, statistical tests
  • Systems-based criteria
• Validation of models
• Generalization of models
(Callout on slide: determines spatial & temporal scale)
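The fitting and visual-inspection steps of the modeling process can be illustrated with a minimal sketch. It assumes a lognormal candidate model for flow interarrival times and uses generated numbers as a stand-in for an empirical trace; the function names, the sample size, and the parameter values are all hypothetical and are not taken from the study.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def empirical_ccdf(samples):
    """Return (x, fraction of samples > x) pairs over the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, 1.0 - (i + 1) / n) for i, x in enumerate(xs)]


def fit_lognormal(samples):
    """Maximum-likelihood lognormal fit: mean and std of log(samples)."""
    logs = [math.log(x) for x in samples]
    return fmean(logs), pstdev(logs)


def qq_points(samples, mu, sigma, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Pair empirical quantiles with quantiles of the fitted lognormal."""
    xs = sorted(samples)
    std_normal = NormalDist()
    points = []
    for p in probs:
        empirical_q = xs[min(int(p * len(xs)), len(xs) - 1)]
        model_q = math.exp(mu + sigma * std_normal.inv_cdf(p))
        points.append((p, empirical_q, model_q))
    return points


# Stand-in for empirical flow interarrival times (seconds); NOT real trace data.
rng = random.Random(0)
interarrivals = [rng.lognormvariate(0.5, 1.2) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal(interarrivals)
ccdf = empirical_ccdf(interarrivals)
qq = qq_points(interarrivals, mu_hat, sigma_hat)
```

Plotting `ccdf` on log-log axes and the `(empirical_q, model_q)` pairs against the diagonal gives the kind of visual comparison the slide refers to; points far from the diagonal indicate a poor fit at that quantile.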

Modeling at various spatio-temporal scales
[Figure labels: Objective, Scales]
• Tradeoff with respect to accuracy, scalability, reusability & tractability
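To make the notion of spatial and temporal scales concrete, here is a small sketch that counts flow arrivals per hour at three spatial scales (AP, building, whole network). The record layout, the identifiers, and the sample values are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical flow records: (start_time_s, ap_id, building_id, size_bytes).
flows = [
    (10.0, "ap1", "bldgA", 1200),
    (3590.0, "ap2", "bldgA", 800),
    (3700.0, "ap1", "bldgA", 400),
    (4000.0, "ap3", "bldgB", 5000),
]


def key_for(flow, scale):
    """Map a flow to its group at the requested spatial scale."""
    _, ap, bldg, _ = flow
    return {"ap": ap, "building": bldg, "network": "all"}[scale]


def hourly_flow_counts(flows, scale):
    """Count flow arrivals per (group, hour index) at the given scale."""
    counts = Counter()
    for flow in flows:
        hour = int(flow[0] // 3600)
        counts[(key_for(flow, scale), hour)] += 1
    return counts


per_ap = hourly_flow_counts(flows, "ap")
per_building = hourly_flow_counts(flows, "building")
per_network = hourly_flow_counts(flows, "network")
```

Coarser scales need fewer models (one per building or one for the network instead of one per AP) but hide differences between individual APs, which is the tradeoff the slide names.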

Synthetic trace generation

Simulation/Emulation testbed
• TCP flows
• UDP
• Wired clients: senders
• Wireless clients: receivers

Simulation & emulation testbeds
[Figure: Internet, router, wired network, switch, and APs 1 to 3 of the wireless network with users A to D; traffic demand is assigned to a scenario of wireless access.]
Scenario:
• User A generates a flow of size X @ T1
• User B generates a flow of size Y @ T2
▪ ▪
Various traffic conditions
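A scenario of the form "user generates a flow of size X at time T" could be produced by a generator along the following lines. This is only a sketch under stated assumptions: it draws lognormal flow interarrivals and plain Pareto flow sizes (a simplification, not the BiPareto model named elsewhere in the slides), and every parameter value and function name is hypothetical.

```python
import random


def synthetic_flows(clients, duration_s, mu, sigma, alpha, min_size_bytes, seed=0):
    """Generate (time_s, client, size_bytes) tuples sorted by time.

    Interarrivals: lognormal(mu, sigma); sizes: Pareto(alpha) scaled by
    min_size_bytes. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    trace = []
    for client in clients:
        t = rng.lognormvariate(mu, sigma)
        while t < duration_s:
            size = int(min_size_bytes * rng.paretovariate(alpha))
            trace.append((t, client, size))
            t += rng.lognormvariate(mu, sigma)
    trace.sort()
    return trace


# One hour of synthetic demand for two wireless clients.
trace = synthetic_flows(["A", "B"], duration_s=3600.0, mu=2.0, sigma=1.0,
                        alpha=1.2, min_size_bytes=100)
```

Each tuple can then be replayed in a simulation or emulation testbed as one flow sent from a wired sender to the named wireless client.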

Main results
• Accurate and scalable models of wireless demand
• Same distributions/models persist:
  • over two different periods (2005 and 2006)
  • over two different campus-wide infrastructures
  • over heavy & normal traffic conditions @ AP
  • using statistical- & systems-based metrics
→ Empirical traces used as “ground truth” for the comparison with synthetic traces based on various models

Main results (cont’d)
Accuracy:
• our models perform very close to the empirical traces
• popular models deviate substantially from the empirical traces
Scalability:
• same distributions at various spatial & temporal scales
• grouping APs per building addresses the scalability-accuracy tradeoff
• Application mix of AP traffic
  • mostly web: very accurate models
  • both web & p2p: models are ok
  • mostly p2p: larger deviations from empirical data

In progress …
• Improve modeling of non-web traffic
• Client profiling
• Impact of underlying network conditions on application and usage patterns
• Evaluate the performance of AP or channel selection, load balancing & admission control protocols under real-life traffic conditions
• Mesh testbed
• Heterogeneous wireless networks

UNC/FORTH web archive
→ Online repository of models, tools, and traces
  • Packet header, SNMP, SYSLOG, synthetic traces, …
  http://netserver.ics.forth.gr/datatraces/
→ Free login/password to access it
→ Simulation & emulation testbeds that replay synthetic traces for various traffic conditions
Mobile Computing Group @ University of Crete/FORTH
http://www.ics.forth.gr/mobile/
maria@csd.uoc.gr

Hourly aggregate throughput
[Figure: hourly aggregate throughput; curves: EMPIRICAL, BIPARETO-LOGNORMAL, BIPARETO-LOGNORMAL-AP, FLOWSIZE-FLOWARRIVAL]
Impact of flow size (figure annotations):
• Fixed flow sizes & empirical flow arrivals (aggregate traffic as in EMPIRICAL)
• Pareto flow sizes, empirical flow arrivals

Scalability vs. Accuracy: flow interarrivals
[Figure: curves: EMPIRICAL, BDLG(DAY), BDLGTYPE(DAY), NETWORK(TRACE)]

Scalability vs. Accuracy: number of flow arrivals in an hour
[Figure: curves: EMPIRICAL, BDLG(DAY), BDLGTYPE(TRACE), NETWORK(TRACE)]

Per-flow throughput
[Figure: per-flow throughput; curves: EMPIRICAL, BIPARETO-LOGNORMAL, BIPARETO-LOGNORMAL-AP, FLOWSIZE-FLOWARRIVAL]
Figure annotations:
• Pareto flow sizes & uniform flow arrivals in tracing period
• Pareto flow sizes
• Fixed flow sizes & empirical flow arrivals
• due to large % of small size flows

Histogram of flow sizes

Aggregate hourly downloaded traffic

UDP traffic scenario
• Wireless hotspot AP
• Wireless clients downloading
• Wired traffic transmitted at 25 Kbps
• Total aggregate traffic sent is the same in the CBR and empirical cases
Empirical: 1.4 Kbps
Bipareto-Lognormal-AP: 2.4 Kbps
Bipareto-Lognormal: 2.6 Kbps
Large differences in the distributions

Impact of application mix on per-flow throughput
TCP-based scenario
• AP with 85% web traffic
• AP with 80% p2p traffic
• AP with 50% web & 40% p2p traffic

Goodput

Per-flow delay

Jitter per flow

Impact of application mix of AP traffic
[Figure: curves: 85% web; 80% p2p; 50% web & 40% p2p]

Session-level flow-related variation
In-session flow interarrivals can be modeled with the same distribution for all building types, but with different parameters.
[Figure axis: mean in-session flow interarrival f]

Session-level flow size variation
[Figure axis: mean flow size f (bytes)]

Flow size vs. flow interarrival: impact on hourly throughput
TCP scenario
[Figure: curves: empirical; avg flow size fixed, original flow interarrivals; avg flow interarrivals fixed, original flow sizes]
Flow interarrivals have a slightly higher impact

Flow size vs. flow interarrival: impact on per-flow throughput
[Figure: curves: original trace; avg flow size fixed, original flow interarrivals; avg flow interarrivals fixed, original flow sizes]
Flow size has a higher impact
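The two control traces compared on these slides (average flow size fixed with original interarrivals, and average interarrival fixed with original sizes) can be derived from a trace roughly as follows. This is a sketch, not the authors' tooling: it assumes a trace is a time-sorted list of (arrival_time_s, size_bytes) pairs with at least two flows, and the sample values are made up.

```python
from statistics import fmean


def fix_flow_sizes(trace):
    """Keep original arrival times; replace every size with the mean size."""
    mean_size = fmean(size for _, size in trace)
    return [(t, mean_size) for t, _ in trace]


def fix_interarrivals(trace):
    """Keep original sizes in order; space arrivals by the mean interarrival."""
    times = [t for t, _ in trace]
    mean_gap = (times[-1] - times[0]) / (len(times) - 1)
    return [(times[0] + i * mean_gap, size) for i, (_, size) in enumerate(trace)]


# Made-up example trace: (arrival_time_s, size_bytes), sorted by time.
original = [(0.0, 100), (1.0, 300), (5.0, 200), (6.0, 1000)]

sizes_fixed = fix_flow_sizes(original)
gaps_fixed = fix_interarrivals(original)
```

Both variants carry the same total number of bytes over the same period as the original trace, so any difference in hourly or per-flow throughput can be attributed to the one property that was flattened.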

Per flow statistics for hours that have produced the same aggregate download traffic

Our models persist for traffic generated during busy periods
Empirical trace: one hour of a hotspot AP with heavy workload conditions

Simplicity at the cost of higher loss of information
[Figure: number of flows per session]

Number of Flows Per Session
