I Tube, You Tube, Everybody Tubes … Pablo Rodriguez Telefonica Research Barcelona


Presentation Transcript


  1. I Tube, You Tube, Everybody Tubes … Pablo Rodriguez, Telefonica Research, Barcelona

  2. YouTube Video Example

  3. “Content is NOT king”

  4. Content Explosion • Number of TV channels over time: broadcast 3 (1950), analog cable 40 (1980), digital cable 100 (1995), Internet infinite (today) • How to search content?

  5. Aggregation and Recommendation • Infinite choice = overwhelming confusion • Filters are required to connect users with content that appeals to their interests

  6. Video and Social Networks • Trends in video services • Users generate new videos • Users help each other find videos • Need to understand users and content • Video characteristics in YouTube • User behavior and potential for recommendations

  7. Particularities of “bite-size bits for high-speed munching” [Wired mag., Mar 2007] • Plethora of YouTube clones • UGC is very different. How different?

  8. UGC vs. Non-UGC • Massive production scale: YouTube takes 15 days to produce the 120 years' worth of movies in IMDb! • Extreme publishers: 1,000 uploads over a few years vs. 100 movies over 50 years • Short video length: 30 sec–5 min vs. 100-min movies in LoveFilm • The rest: consumption patterns

  9. User Participation/Finding Videos • Despite Web 2.0 features, user participation remains low • Only 0.16%-0.22% of viewers rate or comment on videos • 47% of videos have pointers from external sites • But requests from such sites account for less than 3% of total views

  10. Goals and Data • Goals: potential for recommendation systems? Popularity evolution; content duplication • Data: crawled YouTube and other UGC systems; metadata: video ID, length, views; 1.6M Entertainment and 250K Science videos

  11. Part 1: Popularity Distribution • Static popularity characteristics • Underlying mechanism

  12. Pareto Principle • The 10% most popular videos account for 80% of total views • Other online VoD systems show smaller skew! (Plot: fraction of aggregate views vs. normalized video ranking)
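The skew reported on this slide can be checked on any list of per-video view counts. The following is a minimal sketch, not the authors' analysis code: the helper name `top_share` and the sample view counts are made up for illustration.

```python
def top_share(views, top_fraction=0.10):
    """Return the fraction of total views captured by the most-viewed videos."""
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)
    # At least one video counts as "top", even for very small inputs.
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical view counts for ten videos (illustrative only, not crawled data).
sample_views = [800, 50, 40, 30, 25, 20, 15, 10, 6, 4]
share = top_share(sample_views)  # share of views held by the top 10% of videos
```

Sweeping `top_fraction` from 0 to 1 would trace out the curve the slide plots (fraction of aggregate views against normalized video ranking).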

  13. Dominant Power-Law Behavior • Richer-get-richer principle: if a video has K views, then users will watch it at a rate proportional to K • Examples: word frequency, citations of papers, scale of earthquakes, web hits • y = x^a (Plot: frequency (log) vs. city population (log))
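The richer-get-richer rule can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the model used in the talk: it follows a Yule-Simon-style process in which, at each step, a new video arrives with probability `new_video_prob` (an assumed parameter, not from the slides), and otherwise an existing video gains a view with probability proportional to its current view count.

```python
import random
from collections import Counter


def simulate_rich_get_richer(num_steps, new_video_prob=0.1, seed=0):
    """Simulate preferential attachment; return view counts sorted descending."""
    rng = random.Random(seed)
    view_log = [0]  # one entry per view, holding the id of the viewed video
    next_id = 1
    for _ in range(num_steps):
        if rng.random() < new_video_prob:
            # A new video arrives and receives its first view.
            view_log.append(next_id)
            next_id += 1
        else:
            # A uniform pick over past views selects a video with
            # probability proportional to its view count.
            view_log.append(rng.choice(view_log))
    return sorted(Counter(view_log).values(), reverse=True)


counts = simulate_rich_get_richer(2000)
```

Plotting `counts` against rank on log-log axes is one way to look for the straight-line shape the slide refers to.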

  14. UGC Video Distribution • Straight-line waist, truncated at both ends

  15. Focusing on Popular Videos • Why do popular videos deviate from the power law? • Fetch-at-most-once [SOSP 2003] • Behavior of fetching immutable objects once, cf. visiting popular web sites many times
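One way to see why fetch-at-most-once flattens the head of the distribution is to simulate it. The sketch below is an assumption-laden toy, not the SOSP 2003 model itself: every user draws requests from a Zipf popularity with exponent 1 (assumed), and in fetch-at-most-once mode a repeated draw of an already-seen video is simply skipped.

```python
import random


def simulate_views(num_users, num_videos, requests_per_user, fetch_once, seed=0):
    """Return per-video view counts, indexed by popularity rank (0 = most popular)."""
    rng = random.Random(seed)
    # Zipf weights with exponent 1; the exponent is an illustrative assumption.
    weights = [1.0 / rank for rank in range(1, num_videos + 1)]
    views = [0] * num_videos
    for _ in range(num_users):
        seen = set()
        draws = rng.choices(range(num_videos), weights=weights, k=requests_per_user)
        for video in draws:
            if fetch_once and video in seen:
                continue  # an immutable object already fetched is not fetched again
            seen.add(video)
            views[video] += 1
    return views


once = simulate_views(50, 100, 30, fetch_once=True)
repeat = simulate_views(50, 100, 30, fetch_once=False)
```

With fetch-at-most-once, no video can collect more views than there are users, so the most popular items are capped while the rest of the curve is largely unaffected.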

  16. Why the Unpopular Tail Falls Off • Natural shape is curved • Sampling bias or pre-filters • Publishers tend to upload interesting videos • Information filtering or post-filters • Search results or suggestions favor popular items

  17. Impact of Post-Filters • Videos exposed longer to the filtering effect appear more truncated (Plot axis: video rank)

  18. Is it Naturally Curved? • Matlab curve fitting for Science videos (Fitted curves: Zipf, Zipf + exp cutoff, Exponential, Log-normal)

  19. Is it Naturally Curved? • Matlab curve fitting for Science videos • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf and the truncation is due to bottlenecks (Fitted curves: Zipf, Zipf + exp cutoff, Exponential, Log-normal)
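The slides report Matlab curve fitting; the sketch below is a different, simpler stand-in written in plain Python. It assumes the rank-views relation ln(views) = c - a·ln(rank) - b·rank and fits it by ordinary least squares in log space, with b fixed to 0 for a pure Zipf fit. The function names are hypothetical, the Exponential and Log-normal candidates are not covered, and log-space fitting weights points differently from a direct nonlinear fit.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_log_model(views, use_cutoff):
    """Fit ln(views) = c - a*ln(rank) [- b*rank]; return (c, a, b)."""
    ranked = sorted(views, reverse=True)  # all view counts must be positive
    rows, targets = [], []
    for rank, count in enumerate(ranked, start=1):
        features = [1.0, math.log(rank)]
        if use_cutoff:
            features.append(float(rank))  # exponential cutoff term
        rows.append(features)
        targets.append(math.log(count))
    size = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(size)]
    beta = solve_linear_system(xtx, xty)
    c, a = beta[0], -beta[1]
    b = -beta[2] if use_cutoff else 0.0
    return c, a, b
```

On real data, a cutoff coefficient `b` clearly above zero would be consistent with the truncated-Zipf reading on this slide, though comparing candidate models properly also requires a goodness-of-fit measure.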

  20. Implication of Our Findings • “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail] • Entertainment: 40% additional views! • How? Personalized recommendation, enriched metadata, abundant videos (Plot: views vs. rankings)

  21. Part 2: Popularity Evolution • Relationship between popularity and age

  22. Popularity Evolution • So far, we focused on static popularity • Now focus on popularity dynamics • How are requests on any given day distributed across video ages? • 6-day daily trace of Science videos • Step 1: Group videos requested at least once by age • Step 2: Count request volume per age group
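The two steps on this slide can be sketched as a single pass over one day's requests. This is an illustrative sketch only: the trace format (pairs of video ID and request day), the `upload_day` lookup, and the sample values are assumptions, not the format of the published dataset.

```python
from collections import defaultdict


def request_volume_by_age(requests, upload_day):
    """Group requested videos by age (in days) and count requests per age group.

    requests:   iterable of (video_id, request_day) pairs
    upload_day: dict mapping video_id to the day it was uploaded
    """
    videos_by_age = defaultdict(set)
    volume_by_age = defaultdict(int)
    for video_id, request_day in requests:
        age = request_day - upload_day[video_id]
        videos_by_age[age].add(video_id)  # Step 1: group requested videos by age
        volume_by_age[age] += 1           # Step 2: count request volume per age group
    return dict(videos_by_age), dict(volume_by_age)


# Hypothetical one-day trace (day 10) for three videos.
uploads = {"a": 0, "b": 5, "c": 9}
trace = [("a", 10), ("a", 10), ("b", 10), ("c", 10), ("c", 10), ("c", 10)]
groups, volume = request_volume_by_age(trace, uploads)
```

Dividing each entry of `volume` by the total number of requests gives the share of that day's requests going to each age group.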

  23. Request Volume Across Age • User preference is relatively insensitive to age: 80% of requests are for videos older than a month • The probability of a video being watched is 43%, 18%, 17% and 14% for the first 24 hours, 6 days, 3 weeks, and 1 month, respectively

  24. Part 4: Content Duplication • Level of duplication • Birth of duplicates

  25. Content Duplication • Alias: identical or similar copies of the same content • Aliases dilute the popularity of a single event • Views are distributed across multiple copies • Difficulty for recommendation & ranking systems • Test with 51 volunteers • Find aliases using keyword search • Identified 1,224 aliases for 184 original videos

  26. The Level of Popularity Dilution • Popularity diluted by up to a few orders of magnitude • Often aliases got more requests than the original • (e.g., an alias got >1000 times more requests)
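Dilution can be quantified by comparing an original video's views with those of its aliases. The sketch below is a hypothetical helper with made-up numbers, intended only to make the arithmetic concrete; the metric names are assumptions and are not taken from the talk.

```python
def dilution_report(original_views, alias_views):
    """Summarize how an event's views are split between the original and its aliases."""
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    total = original_views + sum(alias_views)
    return {
        # Views the event would have if all copies were merged.
        "total_views": total,
        # Fraction of the event's views that the original actually received.
        "original_share": original_views / total,
        # How many times more views the most-viewed alias got than the original.
        "max_alias_ratio": max(alias_views) / original_views if alias_views else 0.0,
    }


# Hypothetical counts: one original and three aliases.
report = dilution_report(100, [250, 50, 100])
```

A `max_alias_ratio` above 1 corresponds to the case on this slide where an alias received more requests than the original.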

  27. How Late Do Aliases Appear? • Significant aliases appear within one week • Within the first day of posting the original video, there are sometimes more than 80 aliases

  28. Conclusions • UGC is a new form of video social interaction • User interaction remains low • Lots of potential for social recommendations

  29. Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html Questions?
