This workshop discusses popularity models, which describe how users distribute their preferences among a set of objects. It explores frequency-rank and frequency-count plots and their relationship to Pareto's model. The applications include cache algorithm design, address cache table dimensioning, and optimization of Video-on-Demand servers' architecture.
Università Di Roma “Tor Vergata” Dip. Informatica Sistemi Produzione POPULARITY DISTRIBUTIONS AND INTERNET TRAFFIC MODELLING Maurizio Naldi Università di Roma “Tor Vergata” Workshop “Statistica e Telecomunicazioni”, Roma 2-3 Luglio 2001
WHAT’S A POPULARITY MODEL Popularity models describe the way users distribute their preferences among a set of objects. They are represented in the form of either a frequency-rank plot (suitable for the highly preferred objects) or a frequency-count plot (suitable for the less preferred objects).
EXAMPLES OF FREQUENCY-RANK AND FREQUENCY-COUNT PLOTS • [Figure: a frequency-rank plot, no. of preferences vs. rank] • [Figure: a frequency-count plot, no. of preferences vs. no. of objects that have those preferences]
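The two plot types above can be derived from the same list of observed preferences. The following is a minimal sketch, not taken from the slides: the function names and the toy `requests` list are hypothetical, and each request is assumed to be one preference for the named object.

```python
from collections import Counter

def frequency_rank(requests):
    """Return (rank, no. of preferences) pairs; rank 1 is the most preferred object."""
    counts = Counter(requests)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def frequency_count(requests):
    """Return (no. of preferences, no. of objects with exactly that many) pairs."""
    counts = Counter(requests)
    objects_per_freq = Counter(counts.values())
    return sorted(objects_per_freq.items())

# Hypothetical toy log: each entry is one request for an object.
requests = ["a", "b", "a", "c", "a", "b", "d", "e"]
rank_plot = frequency_rank(requests)
count_plot = frequency_count(requests)
print(rank_plot)
print(count_plot)
```

The frequency-rank view resolves the few most requested objects individually, while the frequency-count view groups the many rarely requested objects that share the same small number of preferences.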
SOME POPULARITY MODELS (FREQUENCY-RANK LAWS) • Zipf • Simon • Yule
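RELATIONSHIP TO PARETO’S MODEL If the objects in a set of N are ranked by size according to Zipf’s law, $x_r = C r^{-\alpha}$ (with $x_r$ the size of the object of rank $r$), then the number of objects having a size greater than or equal to $x_r$ is $r = (C / x_r)^{1/\alpha}$. The probability distribution is therefore $P[X \ge x] = \frac{1}{N} (C / x)^{1/\alpha}$, i.e. of the Pareto type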
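The link between the rank law and the Pareto form can be checked numerically. The sketch below assumes Zipf's law in the common form size = c * rank^(-alpha); the symbols `c` and `alpha`, the parameter values, and the helper names are illustrative assumptions, not the author's notation.

```python
def zipf_sizes(n, alpha, c=1.0):
    """Sizes of n objects following an ideal Zipf law (assumed form c * r**-alpha)."""
    return [c * r ** (-alpha) for r in range(1, n + 1)]

def count_at_least(sizes, x):
    """Number of objects whose size is greater than or equal to x."""
    return sum(1 for s in sizes if s >= x)

n, alpha, c = 1000, 0.8, 1.0      # illustrative values only
sizes = zipf_sizes(n, alpha, c)
x = sizes[99]                     # size of the object of rank 100

empirical = count_at_least(sizes, x)    # direct count over the set
predicted = (c / x) ** (1.0 / alpha)    # Pareto-type power law in x
print(empirical, predicted)
print(empirical / n, predicted / n)     # fraction of objects with size >= x
```

Under these assumptions the count of objects at or above a given size is itself a power law of that size with exponent 1/alpha, which is what makes the resulting distribution Pareto-like.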
APPLICATIONS Present • Cache algorithm design • Address cache table dimensioning • Optimization of Video-on-Demand servers’ architecture Possible • Any communications context where the user has a wide choice
TRAFFIC MONITORING POINTS [Diagram: users and sites] • Web proxy observation point: Some-to-All • Web server observation point: All-to-One
THE 20/80 (10/90) RULE • The proportion of requests for the top documents is overestimated • Fixed proportion rules are false
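One way to see why a fixed-proportion rule cannot hold in general is to compute the share of requests taken by the top documents under an ideal Zipf law. This is a sketch under stated assumptions: request frequencies proportional to rank^(-alpha), an illustrative alpha of 0.75, and hypothetical catalogue sizes. It is not a reproduction of the measurements behind the slide.

```python
def top_share(n_objects, alpha, top_fraction):
    """Share of all requests going to the top `top_fraction` of objects,
    assuming frequencies proportional to rank**-alpha (ideal Zipf law)."""
    weights = [r ** (-alpha) for r in range(1, n_objects + 1)]
    n_top = max(1, int(round(top_fraction * n_objects)))
    return sum(weights[:n_top]) / sum(weights)

alpha = 0.75                      # illustrative value
small = top_share(100, alpha, 0.20)
large = top_share(10_000, alpha, 0.20)
print(f"top 20% of 100 objects:    {small:.2f} of requests")
print(f"top 20% of 10000 objects:  {large:.2f} of requests")
```

In this idealized setting the share captured by the top 20% changes with the number of objects and stays below 80% for both sizes, so no single fixed proportion describes it.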
GENERAL COMMENTS • When Zipf’s law is fitted by linear regression, the estimated parameter typically lies in the 0.6-0.85 range • All log-log frequency-rank plots exhibit an initial concavity (top objects’ preferences are overestimated) • All log-log frequency-count plots exhibit a final (count vs. frequency) spreading
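The linear-regression fit mentioned above is commonly done as an ordinary least-squares line through the log-log frequency-rank points. The sketch below assumes Zipf's law in the form frequency = c * rank^(-alpha); the function name and the synthetic data are hypothetical, and the slides do not specify the exact fitting procedure used.

```python
import math

def fit_zipf_exponent(frequencies):
    """Least-squares fit of log(frequency) = log(c) - alpha * log(rank).
    Returns (alpha, c). All frequencies must be positive."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic, noise-free data generated from an ideal Zipf law (illustrative only).
synthetic = [1000.0 * r ** -0.75 for r in range(1, 201)]
alpha_hat, c_hat = fit_zipf_exponent(synthetic)
print(alpha_hat, c_hat)
```

On real traces the top-ranked points sit below the fitted line (the initial concavity noted above), so the estimate is sensitive to which ranks are included in the regression.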
OPEN ISSUES • Search for better models (solving the initial concavity problem) • Search for parameter estimation methods other than linear regression • Definition of proper goodness-of-fit tests