I Tube, You Tube, Everybody Tubes… • Pablo Rodriguez, Telefonica Research, Barcelona
YouTube Video Example
“Content is NOT king”
Content Explosion • How to search content? • [Figure: number of TV channels over time (1950, 1980, 1995, today): broadcast 3, analog cable 40, digital cable 100, Internet infinite]
Aggregation and Recommendation • Infinite choice = overwhelming confusion • Filters are required to connect users with content that appeals to their interests
Video and Social Networks • Trends in video services • Users generate new videos • Users help each other find videos • Need to understand users and contents • Video characteristics in YouTube • User behavior and potential for recommendations
Particularities of “bite-size bits for high-speed munching” [Wired mag. Mar 2007] • Plethora of YouTube clones • UGC is very different • How different?
UGC vs. Non-UGC • Massive production scale: YouTube takes 15 days to produce 120 years' worth of movies in IMDb! • Extreme publishers: 1,000 uploads over a few years vs. 100 movies over 50 years • Short video length: 30 sec–5 min vs. 100-min movies in LoveFilm • The rest: consumption patterns
User Participation/Finding Videos • Despite Web 2.0 features, user participation remains low • Only 0.16%-0.22% of viewers rate or comment on videos • 47% of videos have pointers from external sites • But requests from such sites account for less than 3% of the total views
Goals and Data • Goals: potential for recommendation systems? Popularity evolution; content duplication • Data: crawled YouTube and other UGC systems; metadata: video ID, length, views; 1.6M Entertainment and 250K Science videos
Part 1: Popularity Distribution • Static popularity characteristics • Underlying mechanism
Pareto Principle • The 10% most popular videos account for 80% of total views • Other online VoD systems show smaller skew! • [Figure: fraction of aggregate views vs. normalized video ranking]
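The skew on this slide can be measured directly from a list of per-video view counts. The sketch below is a minimal illustration, not the authors' analysis code: the function name `top_share` and the sample numbers are assumptions made up for the example.

```python
def top_share(views, top_fraction=0.10):
    """Return the fraction of all views captured by the top `top_fraction` of videos."""
    ranked = sorted(views, reverse=True)          # most popular first
    k = max(1, int(len(ranked) * top_fraction))   # number of videos in the top slice
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical view counts, for illustration only (not from the crawl).
sample_views = [1000, 400, 120, 60, 30, 20, 10, 5, 3, 2]
print(f"top 10% share: {top_share(sample_views, 0.10):.2f}")
```

Sweeping `top_fraction` from 0 to 1 gives the curve of aggregate views against normalized rank that the slide's figure plots.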
Dominant Power-Law Behavior • Richer-get-richer principle: if a video has K views, then users will watch the video at rate K • Examples: word frequency, citations of papers, scale of earthquakes, web hits • [Figure: frequency (log) vs. city population (log), y = x^a]
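The richer-get-richer rule can be sketched as a small simulation. This is an illustration of the selection rule only, under assumptions that are not in the talk: the number of videos is fixed, and each video is chosen with probability proportional to its views plus one (so that videos with zero views can still be picked). It does not model the arrival of new videos and is not claimed to reproduce the measured curves.

```python
import random


def simulate_rich_get_richer(num_videos, num_views, seed=0):
    """Assign views one at a time; a video is picked with probability
    proportional to (its current views + 1). The +1 is an assumption."""
    rng = random.Random(seed)
    views = [0] * num_videos
    # One "ticket" per video to start with, plus one extra ticket per view received.
    tickets = list(range(num_videos))
    for _ in range(num_views):
        vid = rng.choice(tickets)   # uniform over tickets = proportional to views + 1
        views[vid] += 1
        tickets.append(vid)         # this video is now more likely to be picked again
    return views


counts = simulate_rich_get_richer(num_videos=50, num_views=5000)
print(sorted(counts, reverse=True)[:5])
```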
UGC Video Distribution • Straight-line waist, truncated at both ends
Focusing on Popular Videos • Why do popular videos deviate from the power law? • Fetch-at-most-once [SOSP 2003] • Users fetch immutable objects once (cf. visiting popular web sites many times)
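A toy simulation shows why fetch-at-most-once flattens the head of the distribution: no video can collect more views than there are users. Everything in this sketch is an assumption for illustration (the function name, the 1/rank popularity weights, and the simplification that a repeated request is simply dropped); it is not the model from the cited SOSP 2003 paper.

```python
import random


def simulate_views(num_users, num_videos, requests_per_user,
                   fetch_at_most_once, seed=0):
    """Count views per video when every user draws requests from a
    Zipf-like popularity (weight 1/rank). Illustrative sketch only."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    views = [0] * num_videos
    for _ in range(num_users):
        picks = rng.choices(range(num_videos), weights=weights,
                            k=requests_per_user)
        if fetch_at_most_once:
            picks = set(picks)      # a user counts at most once per video
        for vid in picks:
            views[vid] += 1
    return views


repeat = simulate_views(200, 100, 30, fetch_at_most_once=False)
once = simulate_views(200, 100, 30, fetch_at_most_once=True)
print("most popular video, repeat fetches allowed:", max(repeat))
print("most popular video, fetch-at-most-once:", max(once))
```

With repeat fetches allowed, the top videos keep accumulating views. With fetch-at-most-once, their counts are capped at the number of users, which truncates the top of a rank plot.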
Why the Unpopular Tail Falls Off • Natural shape is curved • Sampling bias or pre-filters • Publishers tend to upload interesting videos • Information filtering or post-filters • Search results or suggestions favor popular items
Impact of Post-Filters • Videos exposed longer to the filtering effect appear more truncated • [Figure: x-axis is video rank]
Is it Naturally Curved? • Matlab curve fitting for Science videos • Candidate fits: Zipf, Zipf + exp cutoff, exponential, log-normal • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf, and the truncation is due to bottlenecks
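The slide names the four candidate fits but not their formulas. The standard textbook forms are given below as a reading aid; the variable x and the parameters α, λ, μ, σ are assumptions for this note, not notation from the talk.

```latex
\begin{align*}
\text{Zipf (power law):} \quad & f(x) \propto x^{-\alpha} \\
\text{Zipf with exponential cutoff:} \quad & f(x) \propto x^{-\alpha} e^{-\lambda x} \\
\text{Exponential:} \quad & f(x) \propto e^{-\lambda x} \\
\text{Log-normal:} \quad & f(x) \propto \frac{1}{x}
  \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)
\end{align*}
% Scale-free: rescaling x by a constant c only multiplies a power law by a constant.
\[
  f(cx) \propto (cx)^{-\alpha} = c^{-\alpha}\, x^{-\alpha}
\]
% For the exponential, the ratio depends on x, so there is a characteristic scale 1/\lambda.
\[
  \frac{e^{-\lambda c x}}{e^{-\lambda x}} = e^{-\lambda (c-1) x}
\]
```

Read this way, the cutoff term in the second form plays the role of the bottleneck that the slide refers to: the power-law part describes the underlying mechanism and the exponential factor describes the truncation.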
Implication of Our Findings • “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail] • Entertainment: 40% additional views! • How? Personalized recommendation, enriched metadata, abundant videos • [Figure: views vs. rankings]
Part 2: Popularity Evolution • Relationship between popularity and age
Popularity Evolution • So far, we focused on static popularity • Now focus on popularity dynamics • How are requests on any given day distributed across video age? • 6-day daily trace of Science videos • Step 1: Group videos requested at least once by age • Step 2: Count request volume per age group
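The two steps above can be sketched for a single trace day as follows. The input layout (two dictionaries keyed by video ID), the function name, and the sample values are assumptions made for illustration; the talk does not describe the trace format.

```python
from collections import Counter
from datetime import date


def requests_by_age(upload_dates, daily_requests, trace_day, bucket_days=1):
    """Group one day's requests by video age.

    upload_dates:   video_id -> upload date (hypothetical layout)
    daily_requests: video_id -> number of requests on trace_day
    Returns a dict mapping age bucket -> total request volume.
    """
    volume = Counter()
    for vid, count in daily_requests.items():
        if count < 1:
            continue  # Step 1: keep only videos requested at least once
        age_days = (trace_day - upload_dates[vid]).days
        volume[age_days // bucket_days] += count  # Step 2: volume per age group
    return dict(volume)


# Hypothetical sample data, not taken from the trace.
uploads = {"a": date(2007, 1, 1), "b": date(2007, 1, 9), "c": date(2007, 1, 10)}
requests = {"a": 5, "b": 2, "c": 0}
print(requests_by_age(uploads, requests, trace_day=date(2007, 1, 10)))
```

Setting `bucket_days=7` groups ages by week instead of by day, which is one way to arrive at coarser age groups such as those on the next slide.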
Request Volume Across Age • User preference is relatively insensitive to age: 80% of requests are for videos older than a month • The probability of a video being watched is 43%, 18%, 17% and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively
Part 4: Content Duplication • Level of duplication • Birth of duplicates
Content Duplication • Alias: an identical or similar copy of the same content • Aliases dilute the popularity of a single event • Views are distributed across multiple copies • This creates difficulty for recommendation and ranking systems • Test with 51 volunteers • Find aliases using keyword search • Identified 1,224 aliases for 184 original videos
The Level of Popularity Dilution • Popularity is diluted by up to a few orders of magnitude • Aliases often got more requests than the original (e.g., an alias got >1000 times more requests)
How Late Do Aliases Appear? • Significant aliases appear within one week • Sometimes more than 80 aliases appear within the first day after the original video is posted
Conclusions • UGC is a new form of video social interaction • User interaction remains low • Lots of potential for social recommendations
Questions? • Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html