we.b: The web of short URLs. Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos, Sotiris Ioannidis, Evangelos P. Markatos (FORTH-ICS), Thomas Karagiannis (Microsoft Research). WWW 2011, March 30, 2011
Presented by Somin Kim
Outline • Introduction • URL Shortening Services • Data Collection • The Web of Short URLs • Evolution and Lifetime • Publishers • Short URLs and Web Performance • Conclusion
Introduction • URL shortening services make URLs easier to share by providing a short equivalent • Short URLs have seen a significant increase in usage • A result of their extensive use in Online Social Networks (OSNs) • Understanding the usage of short URLs is important • To provide insight into the interests of OSN and IM users • To assess the performance, scalability, and reliability of URL shortening services • To define the proper architecture for URL shortening services
URL Shortening Services (1/3) • Popularity of URL shortening services • The rapid adoption of OSNs has led to an increased demand for short URLs • Short URLs are also useful in traditional systems such as IM, SMS, and e-mail • Example with bit.ly: publishing the long URL http://www.this.is.a.long.url.com/indeed.html yields the short URL http://bit.ly/dv82ka; accessing the short URL redirects to the original URL
URL Shortening Services (2/3) • Some of these services provide statistics about the accesses to these URLs • The number of hits • The referrer sites the hits came from • The visitors' countries • … • Users can create many short URLs for the same long URL • If a user creates a short URL for a long URL that has already been shortened, the service creates a different hash and gives it to that user • For each unique long URL, bit.ly provides a unique global hash with an information page • Overall statistics are still kept on the global hash's information page
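The per-user hash and global hash behavior described on this slide can be sketched as follows. This is a minimal in-memory illustration, not bit.ly's implementation: the `Shortener` class, the base-62 alphabet, and the counter-based hash allocation are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed alphabet; the slides do not say which characters the services use.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


class Shortener:
    """Hypothetical shortener: one global hash per long URL, plus a fresh
    hash for every shortening request."""

    def __init__(self):
        self._next_id = 0
        self.target = {}               # hash -> long URL
        self.global_hash = {}          # long URL -> global hash
        self.hits = defaultdict(int)   # hash -> number of accesses

    def _new_hash(self, long_url: str) -> str:
        h = encode(self._next_id)
        self._next_id += 1
        self.target[h] = long_url
        return h

    def shorten(self, long_url: str) -> str:
        # The global hash is created once per unique long URL.
        if long_url not in self.global_hash:
            self.global_hash[long_url] = self._new_hash(long_url)
        # Every request gets its own, different hash.
        return self._new_hash(long_url)

    def access(self, h: str) -> str:
        long_url = self.target[h]
        self.hits[h] += 1
        g = self.global_hash[long_url]
        if g != h:
            self.hits[g] += 1          # overall statistics live on the global hash
        return long_url                # a real service would answer with a redirect

    def global_stats(self, long_url: str) -> int:
        return self.hits[self.global_hash[long_url]]
```

Two calls to `shorten` with the same long URL return different hashes, while `global_stats` reports the combined hits of both.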
URL Shortening Services (3/3) • Global information page (presenter note: consider showing a screenshot of the statistics page)
Outline • Introduction • URL Shortening Services • Data Collection • Collection methodology • Collected data • The Web of Short URLs • Evolution and Lifetime • Publishers • Short URLs and Web Performance • Conclusion
Data Collection (1/3): Collection Methodology • Twitter crawling • Crawling Twitter returns links "gossiped" in a social network • We collected tweets that contain HTTP URLs • Only 13% of the HTTP URLs were not shortened by any URL shortening service • 50% of the HTTP URLs from Twitter were bit.ly URLs
Data Collection (2/3): Collection Methodology • Brute force • Yields hashes irrespective of the medium in which they were published and of their recency • We gathered the metadata provided by the shortening service • We monitored the evolution of the keyspace in the ow.ly system • ow.ly serially iterates over the available short URL space • About 70,000 new short URLs are created each day
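Serial iteration over a short URL keyspace can be illustrated with a small sketch. The base-62 alphabet, the fixed hash length, and the function names are assumptions for illustration; the slides do not specify the real alphabet or ordering used by ow.ly.

```python
import string
from itertools import islice, product

# Assumed alphabet (digits, then lowercase, then uppercase).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def enumerate_hashes(length: int):
    """Yield every candidate hash of the given length in serial order."""
    for chars in product(ALPHABET, repeat=length):
        yield "".join(chars)


def keyspace_size(length: int) -> int:
    """Number of distinct hashes of the given length."""
    return len(ALPHABET) ** length


def days_to_exhaust(length: int, per_day: int = 70_000) -> float:
    """Days needed to use up the keyspace at a constant creation rate.
    The default rate is the roughly 70,000 new short URLs per day
    reported for ow.ly on this slide."""
    return keyspace_size(length) / per_day


if __name__ == "__main__":
    print(list(islice(enumerate_hashes(3), 5)))
    print(keyspace_size(5), days_to_exhaust(5))
```

Because the service assigns hashes serially, sampling the most recently assigned hash on consecutive days gives the daily creation rate directly.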
Data Collection (3/3): Collected Data • For the Twitter and bit.ly traces, all the accompanying metadata for each short URL was also collected
Outline • Introduction • URL Shortening Services • Data Collection • The Web of Short URLs • Where do short URLs come from? • Where do short URLs point to? • Location • Popularity • Evolution and Lifetime • Publishers • Short URLs and Web Performance • Conclusion
The Web of Short URLs (1/7): Where do short URLs come from? • Short URLs do not frequently appear in traditional web pages • The vast majority of users arrive at bit.ly from non-web applications • Users who access through web applications mostly come from social networking channels (Twitter, Facebook)
The Web of Short URLs (2/7): Where do short URLs point to? • Most popular types of short URL content • News and informative content come first • 4% of the most accessed URLs in the ow.ly trace were themselves URL shortening services • Spammers pack short URLs inside other short URLs to avoid exposing the long URL
The Web of Short URLs (3/7): Location • The penetration of short URL use is significantly different from that of the Internet/web • Most of the accesses come from the United States, Japan, and Great Britain • No accesses from China or India were observed • Yet China and India rank among the top five countries by number of Internet users
The Web of Short URLs (4/7): Popularity • URL popularity • Large systems that provide content to users typically exhibit power-law behavior • A small fraction of the content is very popular • Most of it is considered uninteresting
The Web of Short URLs (5/7): Popularity • URL popularity (cont.) • We split short URLs into active and inactive • Inactive: no hit was observed during the last 7 days of the trace • 10% of the short URLs are responsible for about 90% of the total hits seen in the trace
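A minimal sketch of the two measurements on this slide: the active/inactive split and the share of hits owned by the most popular URLs. The function names and the convention that `trace_end` is the last day of the trace are assumptions; the sample data in the usage note is made up.

```python
from datetime import date, timedelta


def is_active(last_hit: date, trace_end: date, window_days: int = 7) -> bool:
    """A short URL is active if it received a hit during the last
    `window_days` days of the trace (trace_end counts as the final day)."""
    return (trace_end - last_hit) < timedelta(days=window_days)


def top_share(hits: list[int], fraction: float = 0.10) -> float:
    """Fraction of all hits received by the top `fraction` of URLs."""
    ranked = sorted(hits, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

For example, `top_share([90, 2, 2, 1, 1, 1, 1, 1, 1, 0])` evaluates the share of the single most popular URL among ten, which is the kind of skew the slide reports.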
The Web of Short URLs (6/7): Popularity • Content popularity • Besides familiar websites, less well-known websites were also observed • Pollpigeon.com (short opinion polls), Mashable.com (social media news), Twibbon.com (Twitter campaigns) • Short polls are popular content • They are very common on social networking sites
The Web of Short URLs (7/7): Popularity • Content popularity (cont.) • Do popular web sites change significantly over time? • About 6 sites appear in the top-100 on every single day of April 2010 • 22 sites for March 2010 • About 400 sites enjoy short bursts of popularity
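One way to separate persistently popular sites from sites with short bursts of popularity is a set intersection over the daily top-100 lists. This is a sketch under assumptions: the input layout (a dict from day label to list of sites), the function names, and the `max_days` threshold for a "burst" are illustrative choices, not the authors' definitions.

```python
from collections import Counter


def persistent_sites(daily_top: dict[str, list[str]]) -> set[str]:
    """Sites present in the top list on every day of the period."""
    if not daily_top:
        return set()
    return set.intersection(*(set(sites) for sites in daily_top.values()))


def bursty_sites(daily_top: dict[str, list[str]], max_days: int = 3) -> set[str]:
    """Sites that appear in the top list on at most `max_days` days
    (the threshold is an assumption for illustration)."""
    days_seen = Counter(
        site for sites in daily_top.values() for site in set(sites)
    )
    return {site for site, n in days_seen.items() if n <= max_days}
```

Applied to one month of daily top-100 lists, the size of `persistent_sites` corresponds to the "appears every single day" counts on the slide.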
Outline • Introduction • URL Shortening Services • Data Collection • The Web of Short URLs • Evolution and Lifetime • Life span of short URLs • Temporal evolution • Publishers • Short URLs and Web Performance • Conclusion
Evolution and Lifetime (1/5): Life span of short URLs • The lifetime of a URL is the number of days between its first and last observed hit • Lifetime CDF of the traces (twitter2, bitly) • 50% of the short URLs are not ephemeral • Inactive URLs have a shorter lifespan
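The lifetime definition and the empirical CDF plotted on this slide can be written down directly. The function names and the sample values in the usage note are hypothetical.

```python
from datetime import date


def lifetime_days(first_hit: date, last_hit: date) -> int:
    """Lifetime: number of days between the first and last observed hit."""
    return (last_hit - first_hit).days


def cdf_at(lifetimes: list[int], x: int) -> float:
    """Empirical CDF: fraction of URLs whose lifetime is at most x days."""
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    return sum(1 for v in lifetimes if v <= x) / len(lifetimes)
```

With made-up lifetimes `[0, 0, 3, 10]`, `cdf_at(lifetimes, 0)` gives the fraction of URLs whose first and last hit fall on the same day.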
Evolution and Lifetime (2/5): Temporal evolution • The daily change in the number of hits for each short URL • The number of accesses for a typical short URL varies by as much as 40% from one day to the next • As less popular URLs are included, larger daily changes are observed
Evolution and Lifetime (3/5): Temporal evolution • Inactive URLs • On average, 60% of hits are observed during their first day • After that, the hit rate drops sharply • Active URLs • The first-day effect is also evident • A significant hit rate is also observed for recent days • The evolution of hit rate across the lifetime of the short URLs
Evolution and Lifetime (4/5): Temporal evolution • The daily hit rate as a function of a short URL's lifetime, for inactive short URLs • There is no obvious dependence of the daily hit rate on a short URL's lifetime
Evolution and Lifetime (5/5): Temporal evolution • Total number of hits as a function of the short URL's lifetime • Active short URLs (bottom) appear to exhibit a linear relationship in log-log scale
Publishers (1/4) • Twitter effect • Short URLs referred from Twitter enjoy significantly higher popularity
Publishers (2/4) • CCDF of posted short URLs per Twitter user • Most users published a handful of tweets with short URLs • The majority of tweets with short URLs are original Twitter messages (not retweets)
Publishers (3/4) • Users' daily publish rate of short URLs • The median rate is 1 short URL per day • 98% of the users publish no more than 5 short URLs per day
Publishers (4/4) • Correlation between a user's publish rate and total number of hits • As the number of URLs published by a poster increases, the expected hit rate drops • Spamming-type behavior • Only a few short URLs from each publisher enjoy high hit rates
Outline • Introduction • URL Shortening Services • Data Collection • The Web of Short URLs • Evolution and Lifetime • Publishers • Short URLs and Web Performance • Space reduction • Latency • Conclusion
Short URLs and Web Performance (1/3): Space reduction • Space gain for the short URL • URL shortening services are quite effective at reducing URL size • For roughly 50% of the URLs, a 91% reduction in size is observed • In the Twitter trace, only 31% of the long versions of the short URLs would have stayed under the character limit
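The space gain can be computed per URL as the relative reduction in length. The sketch below is illustrative: the function names are hypothetical, the check substitutes the long URL back into the tweet text, and the 140-character default reflects Twitter's limit at the time of the study.

```python
def space_reduction(long_url: str, short_url: str) -> float:
    """Relative size reduction achieved by the short URL (0.91 means 91%)."""
    return 1 - len(short_url) / len(long_url)


def fits_with_long_url(tweet: str, short_url: str, long_url: str,
                       limit: int = 140) -> bool:
    """Would the tweet still fit if the short URL were replaced by the
    long one? The limit is an assumed parameter."""
    return len(tweet.replace(short_url, long_url)) <= limit


if __name__ == "__main__":
    # Example pair taken from the earlier slide on URL shortening services.
    long_url = "http://www.this.is.a.long.url.com/indeed.html"
    short_url = "http://bit.ly/dv82ka"
    print(f"{space_reduction(long_url, short_url):.0%}")
```

The example pair from the slides is a mild case; the 91% reductions reported above correspond to much longer original URLs.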
Short URLs and Web Performance (2/3): Latency • URL shortening services impose additional overhead on the user's web request • We periodically accessed the 10 most popular short URLs • fb.me and ow.ly exhibit bimodal behavior • bit.ly appears to be the slowest but shows more consistent behavior
Short URLs and Web Performance (3/3): Latency • The redirection overhead of bit.ly • For more than 50% of the accesses, the URL shortening redirection imposes a relative overhead of 54% • This additional delay turns out to be comparable to the final web page access time in a significant fraction of the accesses
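A small sketch of how a relative redirection overhead could be computed from timing samples. It assumes the overhead is the redirection time divided by the final page access time, which is one plausible reading of the slide rather than a confirmed definition; the function names and the sample timings are made up.

```python
from statistics import median


def relative_overhead(redirect_s: float, page_s: float) -> float:
    """Redirection time relative to the final page access time
    (assumed definition; 0.54 means 54%)."""
    if page_s <= 0:
        raise ValueError("page access time must be positive")
    return redirect_s / page_s


def median_overhead(samples: list[tuple[float, float]]) -> float:
    """Median relative overhead over (redirect_s, page_s) samples."""
    return median(relative_overhead(r, p) for r, p in samples)


if __name__ == "__main__":
    # Hypothetical timings in seconds: (redirect, final page access).
    samples = [(0.10, 0.20), (0.27, 0.50), (0.30, 0.30)]
    print(f"{median_overhead(samples):.0%}")
```

Using the median matches the "more than 50% of the accesses" phrasing: half of the samples have an overhead at or above the reported value.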
Conclusion • We have presented a large-scale study of URL shortening services • Exploring traces from the services themselves and from Twitter • Summary • Short URLs appear mostly in ephemeral media, with profound effects on their popularity, lifetime, and access patterns • A small number of URLs receive a very large number of accesses • A large percentage of short URLs are not ephemeral • The most popular websites change slowly over time • These websites differ from the sites that are popular among the broader web community • URL shortening services are extremely effective at saving space but add overhead to accessing the web page