Traffic Prediction on the Internet Anne Denton
Outline • Paper by Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana • Solutions • Time-Series prediction • Our work for the KDD-cup 03
Time Series Prediction on the Internet By Y. Baryshnikov, E. Coffman, D. Rubenstein and B. Yimwadsana • Adjustment to “hot spots” • Avoiding degradation, even “denial of service” • Can “hot spots” be predicted? • Can predicted “hot spots” be avoided?
What are “hot spots”? • Exceptionally large numbers of requests • Spontaneous, short lifetime • “Instant” ramp-up in traffic, though only when viewed on long time scales • Claim: the time scale of the increase is larger than the time scale needed to react • Why does the increase take time? • Passing on the word takes time • How good does a predictor have to be? • The cost of missing a “hot spot” is higher than the aggregate cost of false alarms (similar to hurricane warnings)
Examples • Olympics (Nagano 98) • Soccer World Cup (98) • NASA (95)
What to do about “hot spots”? • <Detour> “The Columbia Hotspot Rescue Service: A Research Plan” by E. Coffman, P. Jelenkovic, J. Nieh, and D. Rubenstein • Approaches • Deal with high request loads ad hoc • Build a better network (expensive) • Content delivery services • Caching • Extra bandwidth • Suggested solution: use available and underutilized resources
Hotspot Rescue Service • Server-based approach • Requires additional resources from server when necessary • Resources provided by other members of Hotspot Rescue Service • Peer-to-Peer approach • Requires additional resources from client when necessary • Caching
Four Phases • Prediction (see rest of presentation) • Server-based: daemons • P2P: plug-ins • Replication • Server-based: replication of objects • P2P: identified cached copies • More advanced: redistribution of traffic load • Notification • Modifications to DNS (Domain Name System) • P2P system proactively announces hot objects and indicates alternative locations? • Termination <End of Detour>
Tail of Distribution • Requests per 10-second time slot • X-axis: number of hits per time slot • Y-axis: probability that that number of hits will be exceeded
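The tail plot described above is an empirical complementary CDF of the per-slot request counts. A minimal Python sketch, with assumed function and variable names and synthetic data standing in for real server logs:

```python
import numpy as np

def ccdf(hits_per_slot):
    """Empirical P[hits > x] for an array of per-slot request counts."""
    x = np.sort(np.asarray(hits_per_slot))
    # Fraction of slots ranked above each sorted value
    p_exceed = 1.0 - np.arange(1, len(x) + 1) / len(x)
    return x, p_exceed

# Synthetic heavy-tailed counts stand in for real 10-second log slots here
counts = np.random.pareto(a=1.5, size=10_000).astype(int)
x, p = ccdf(counts)
```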
Time Scales • Prediction relies on correlation between values at different times • Autocorrelation function • Predictability on time scales of 5-30 min
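A minimal Python sketch (names are illustrative, not from the paper) of the sample autocorrelation used to judge these time scales:

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a request-count series for lags 0..max_lag."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    var = np.dot(s, s)
    return np.array([1.0 if lag == 0 else np.dot(s[:-lag], s[lag:]) / var
                     for lag in range(max_lag + 1)])

# Lags where the autocorrelation stays high indicate the time scales
# (e.g. 5-30 min) over which the traffic is predictable.
acf = autocorrelation(np.random.poisson(5, size=5_000), max_lag=180)
```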
Prediction Algorithm • Standard problem • Signal processing • Econometrics • Internet traffic • Particularly bursty • Simplest model • Linear extrapolation
Structure of Prediction Algorithms • Traffic observation • Number of requests in the time unit (t-1, t] • Usually 1 s • Prediction window • Duration Wp > 0 • Advance notice δ ≥ 0 • Prediction at time t: • Mapping of the observations in [t-Wp, t] to a number pt ≥ 0 of requests predicted in the interval [t+δ, t+δ+1], i.e. δ units in the future
Linear Prediction • Linear fit: least-squares linear fit • pt = ft(t+δ) with • ft(s) = at·s + bt • at, bt chosen to minimize the sum of squared errors between ft(s) and the observed counts over the window [t-Wp, t] • Performance: O(W+T) • W: window size • T: uptime duration • Problems • The prediction window size must match the burstiness parameters governing the request flow
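The sketch below (Python; function and variable names are assumptions, not taken from the paper) illustrates this windowed least-squares predictor: fit ft(s) = at·s + bt over the last Wp observations and extrapolate δ units ahead.

```python
import numpy as np

def linear_predict(counts, t, window, delta):
    """Least-squares linear fit over counts[t-window .. t], extrapolated to
    time t + delta; clipped at zero, since request counts cannot be negative."""
    s = np.arange(t - window, t + 1)
    y = np.asarray(counts[t - window:t + 1], dtype=float)
    a_t, b_t = np.polyfit(s, y, deg=1)            # f_t(s) = a_t*s + b_t
    return max(a_t * (t + delta) + b_t, 0.0)      # p_t = f_t(t + delta)

# Example: predict 30 time units ahead from a 60-unit window
traffic = np.random.poisson(20, size=1_000)
p_t = linear_predict(traffic, t=500, window=60, delta=30)
```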
Results • Depend on the properties of the autocorrelation function
Conclusions of Paper • Build a load-based taxonomy of web server traffic • Depends on technological, sociological, and psychological factors • Look for quantification of basic patterns reflecting behavior • Do we agree??? • Why cluster when we can classify!!
Our Approach • Normally, time-series prediction uses only the data in that one time series • We use similarity to other instances • E.g., other web sites • Model-free • Weighted nearest-neighbor approach • Problem: • How do we integrate time?
Typical Nearest Neighbor Classification / Regression • R(A1, …, An, C) • Attributes Ai • C: class label (classification) or continuous variable (regression) • Based on a distance function over the Ai • K nearest neighbors • Neighbors within a range • Use a kernel function to weight closer neighbors more heavily
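A minimal weighted nearest-neighbor regression sketch in Python; the names and the Gaussian kernel are assumptions made for illustration:

```python
import numpy as np

def knn_regress(X, y, query, k=5, bandwidth=1.0):
    """Predict the target for `query` from its k nearest rows of X,
    weighting closer neighbors more heavily with a Gaussian kernel."""
    d = np.linalg.norm(X - query, axis=1)     # distances in attribute space
    idx = np.argsort(d)[:k]                   # indices of the k nearest
    w = np.exp(-(d[idx] / bandwidth) ** 2)    # kernel weights
    return float(np.dot(w, y[idx]) / w.sum())

# Toy example: 200 instances with 4 attributes and a noisy linear target
X = np.random.rand(200, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * np.random.randn(200)
estimate = knn_regress(X, y, query=np.random.rand(4))
```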
Weighting of Attributes • Some attributes are more important than others • Apply a scaling to the attribute space • Optimize weights through • Hill-climbing • Genetic Algorithm • How does this generalize to a time series?
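A sketch of the hill-climbing option, reusing knn_regress (and X, y) from the previous sketch; the leave-one-out L1 scoring and step sizes are assumptions for illustration, not the method used in the deck:

```python
import numpy as np

def loo_l1_error(X, y, weights, k=5):
    """Leave-one-out L1 error of the weighted-kNN regressor under a given
    per-attribute scaling (uses knn_regress from the previous sketch)."""
    err = 0.0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        pred = knn_regress(X[mask] * weights, y[mask], X[i] * weights, k=k)
        err += abs(pred - y[i])
    return err

def hill_climb_weights(X, y, steps=50, step_size=0.2, seed=0):
    """Simple hill climbing: keep a random perturbation of the attribute
    weights only if it lowers the leave-one-out error."""
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[1])
    best = loo_l1_error(X, y, w)
    for _ in range(steps):
        cand = np.clip(w + step_size * rng.standard_normal(len(w)), 0.0, None)
        score = loo_l1_error(X, y, cand)
        if score < best:
            w, best = cand, score
    return w

weights = hill_climb_weights(X, y)   # X, y from the previous sketch
```

A genetic algorithm would replace the single-candidate perturbation with a population of weight vectors, but the scoring function stays the same.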
Our Answer • Identify “relevant” sections in the time series • E.g. times with already high download rates • We’ll call each relevant section a “prediction”
Predictions • Each prediction contains information about • The nature of the time series • The time instance in question, i.e. the history of requests • The actual change in requests • Make a table of predictions • Leads to a relation, just as in the standard classification / regression setting
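A sketch, under a simplified window/feature layout (the actual attributes are richer, see the later slide), of how each relevant section of a series becomes one row of such a relation:

```python
import numpy as np

def prediction_table(series, window=3, min_count=6):
    """One row per 'relevant' window: features are the recent counts, their
    gradient, and the aggregate so far; the target is the next change."""
    rows = []
    for t in range(window, len(series) - 1):
        hist = list(series[t - window:t])
        if sum(hist) < min_count:            # only already-active periods
            continue
        gradient = hist[-1] - hist[0]
        aggregate = sum(series[:t])          # citations up to the window end
        target = series[t] - series[t - 1]   # change to be predicted
        rows.append(hist + [gradient, aggregate, target])
    return np.array(rows)

citations = [0, 0, 2, 3, 5, 8, 6, 9, 12, 10]   # toy per-period counts
table = prediction_table(citations)
```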
Data Set • Paper citations in the e-print arXiv • Background: KDD-cup 03 • Predict the change in citations in successive 3-month periods • Only consider periods with at least 6 citations • Evaluation: L1 distance (Manhattan distance) between predicted and real difference • Very close match between citation history and request history • Predict change in requests • Only consider periods that already show a large number of requests
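For reference, the L1 evaluation reduces to the following minimal sketch (names and the toy numbers are illustrative):

```python
import numpy as np

def l1_score(predicted_change, actual_change):
    """Manhattan (L1) distance between predicted and actual changes."""
    return float(np.abs(np.asarray(predicted_change)
                        - np.asarray(actual_change)).sum())

# The "no change" default predictor serves as a baseline on the Accuracy slide
actual = np.array([2, -1, 0, 3, -2])
baseline_error = l1_score(np.zeros_like(actual), actual)   # == 8
```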
Attributes of a “Prediction” • Quantitative attributes • Number of citations in window • Gradient of citations in window • Aggregate number of citations up to and through window (assume finite time series) • Attribute values given by time series • Keyword occurrences • Author • Number of revisions of papers • Maximum time interval between revisions • Country of origin • Format
Similarity Function • Common kernel function • What worked better
Accuracy • No linear extrapolation data available • Could lead to negative citations • Comparison (L1 error) • Default prediction (no change): 1851 • Very simple model (decrease by 0.3 in 3 months): 1532 • Prediction based on average of time series (synchronized at first non-0): 1593 • Prediction based on quantitative attributes: 1465 • Full prediction (preliminary): 1357 • Weight optimized (very preliminary): reduction 1414 -> 1391
Conclusions • Method works well for citation prediction • Yet to be tested for hot-spot prediction