1 / 10

Zipf’s Law

Zipf’s Law. ・ Frequency of occurrence of words is inversely proportional to the rank in this frequency of occurrence. ・ When both are plotted on a log scale, the graph is a straight line. George Kingsley Zipf 1902-1950. Zipf Distribution. The Important Points:

omer
Download Presentation

Zipf’s Law

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Zipf’s Law ・ Frequency of occurrence of words is inversely proportional to the rank in this frequency of occurrence. ・ When both are plotted on a log scale, the graph is a straight line. George Kingsley Zipf 1902-1950

  2. Zipf Distribution • The Important Points: • a few elements occur veryfrequently • a medium number of elements have medium frequency • manyelements occur very infrequently

  3. Zipf Distribution The product of the frequency of words (f) and their rank (r) is approximately constant Rank = order of words’ frequency of occurrence

  4. Zipf Distribution(Same curve on linear and log scale) Illustration by Jacob Nielsen

  5. What Kinds of Data Exhibit a Zipf Distribution? • Words in a text collection • Virtually any language usage • Library book checkout patterns • Incoming Web Page Requests (Nielsen) • Outgoing Web Page Requests (Cunha & Crovella) • Document Size on Web (Cunha & Crovella)

  6. Characteristics of WWW Client-based Traces Zipf's Law Applied To WWW Documents

  7. Distribution of users among web sites

  8. Binned distribution of users to sites Exponentially increasing bins Cumulative distribution of users to sites

  9. Durée des connexions (power law) • Exp C : Nov. 18th, 2001, pendant 24 heures : • - 20954 connexions valides (17735 IN, 3218 OUT) • - session la plus longue : 11 heures ; 5 sessions • ont durées plus que 8 heures La durée moyenne des connexions est courte (entre 30 et 60 sec), mais il existe des connexions très durable

More Related