A Geographical Characterization of YouTube: a Latin American View Fernando Duarte, Fabrício Benevenuto, Virgílio Almeida, Jussara Almeida Federal University of Minas Gerais – Brazil
Outline • Motivation and Goals • YouTube Features • Crawler and Sampling • Geographical Characterization • Conclusions and Future Work
Motivation and Goals • YouTube is a popular online social video-sharing service that generates high volumes of Internet traffic • YouTube popularity in Latin America (site rank from www.alexa.com) • 6th in Argentina and Paraguay, • 5th in Brazil, Mexico, Chile and Peru, • 4th in Ecuador and Venezuela. • Goal: characterize the influence of users' geographical location on traffic and social relationships. • Focus on Latin America (LA)
YouTube Features Users – videos • watch videos • upload videos (unlimited) • add videos as favorites • post a comment on a video • respond to a video with another video • rate a video User-to-user interactions • add users as friends • subscribe to another user Videos • have a list of 20 related videos • are distributed across 14 categories
Sampling Mechanism • Sampling strategy: • collect information on popular videos and analyze the user interactions around these videos. • First crawler: collects video metadata • Starts from the most-viewed video of all time and collects related videos recursively, in snowball fashion • The snowball uses a breadth-first scheme • Second crawler: collects metadata for the users found by the first crawler • Users who uploaded videos, posted comments, or posted video responses.
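The breadth-first snowball described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' crawler: `snowball_sample`, `get_related`, and `TOY_GRAPH` are hypothetical names, and the toy graph stands in for the related-video lists that a real crawler would fetch over the network.

```python
from collections import deque

# Hypothetical toy data: video id -> list of related video ids.
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["e"],
    "e": [],
}


def snowball_sample(seed, get_related, max_tiers):
    """Breadth-first snowball from `seed`.

    Returns a dict mapping each reached video id to its tier
    (distance from the seed). Videos in tier `max_tiers` are
    recorded but not expanded further.
    """
    tier_of = {seed: 0}          # doubles as the "already seen" set
    queue = deque([seed])
    while queue:
        video = queue.popleft()
        tier = tier_of[video]
        if tier == max_tiers:
            continue             # last tier: keep it, do not expand it
        for related in get_related(video):
            if related not in tier_of:   # skip videos already collected
                tier_of[related] = tier + 1
                queue.append(related)
    return tier_of


sample = snowball_sample("a", lambda v: TOY_GRAPH.get(v, []), max_tiers=2)
```

Tracking the tier of each video makes it easy to stop after a fixed number of tiers, and the same dictionary prevents a video from being collected twice.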
Crawler Architecture • Parallel crawler • Server coordinates the snowball sampling and avoids redundant data collection • 7 Linux boxes • [Diagram: Server connected to Client 1, Client 2, …, Client 7] • Collected information on over 2 million videos, exhausting 6 tiers in 11 days (from Apr 3rd to 14th) • 96 of the 100 most popular videos of all time are part of the sample
Statistics of videos and users collected • USA is responsible for 28% of videos and 38% of users • 7% of users are from LA, responsible for 7% of uploads and 6% of views • 13% of users have no country information (empty field) • # views > # comments > # video responses
Latin American Users • Table is sorted by number of users • Users from Brazil, Mexico, and Argentina have contributed the most videos, but in uploads per user Peru ranks first • In terms of traffic (watched videos), Brazil, Mexico, and the Virgin Islands rank highest • LA users have on average 22 favorite videos and 2 friends • Orkut and Myspace users have on average 30 and 137 friends, respectively • We conjecture that most users interact with friends on other online social networks and use YouTube essentially to watch videos
Video Popularity • The curve of the number of views does not descend linearly • The top 10% most popular LA videos account for 76% of the views, which suggests caching • LA videos are viewed and discussed less, generating less traffic than other videos
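The concentration figure on this slide is the share of all views held by the most popular tenth of the videos. A minimal sketch of that calculation follows; `top_share` is a hypothetical helper and the view counts are made-up toy numbers, not data from the study.

```python
def top_share(views, fraction=0.10):
    """Share of total views held by the top `fraction` of videos.

    `views` is a list of per-video view counts. At least one video
    is always counted as "top", even for very short lists.
    """
    if not views:
        raise ValueError("views must be non-empty")
    ranked = sorted(views, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy view counts (illustrative only): ten videos, 100 views in total.
toy_views = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
share = top_share(toy_views)  # the single most viewed video holds half the views
```

The larger this share, the more traffic a cache holding only the most popular videos could serve.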
Video Duration • About 80% of the videos are shorter than 5 minutes • Video duration does not differ across regions
Use of Social Features • LA users interact less on YouTube than other users.
Use of Social Features • Although LA users are less interactive overall, there are LA users with 2400 friends, and users who uploaded 1400 videos and sent more than 1200 comments.
User Interactions (Latin American videos) • For each video, observe the percentage of comments coming from users in LA, the USA, and other regions. • Plot the distribution of this percentage
Textual Interactions (panels: Latin American videos, USA videos) • The probability that an LA video has more than 60% of its comments from LA users is 0.32 (from USA users, only 0.08) • Videos have a higher probability of receiving comments from users in the same region • Potential use of CDNs (assuming that the number of views is also influenced by geographical factors) • Few LA users interact with videos from USA/others, but USA/others users interact with LA users
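The per-video comment percentages and the probabilities quoted on this slide could be computed along the following lines. This is a hedged sketch: `region_share`, `prob_share_exceeds`, and the toy comment lists are hypothetical, and each video is assumed to be represented simply as the list of its commenters' regions.

```python
from collections import Counter


def region_share(commenter_regions, region):
    """Fraction of one video's comments posted by users from `region`."""
    if not commenter_regions:
        return 0.0
    return Counter(commenter_regions)[region] / len(commenter_regions)


def prob_share_exceeds(videos, region, threshold=0.6):
    """Empirical probability that a video receives more than
    `threshold` of its comments from users in `region`.

    Videos without comments are ignored.
    """
    shares = [region_share(regions, region) for regions in videos if regions]
    if not shares:
        return 0.0
    return sum(s > threshold for s in shares) / len(shares)


# Toy data (illustrative only): one list of commenter regions per video.
toy_videos = [
    ["LA", "LA", "LA", "USA"],   # 75% of comments from LA
    ["USA", "USA", "LA"],        # about 33% from LA
    ["LA", "LA", "Other"],       # about 67% from LA
    ["Other"],                   # 0% from LA
]
p_la = prob_share_exceeds(toy_videos, "LA")  # 2 of the 4 videos exceed 60%
```

Running the same function with a different `region` argument gives the comparison between local and foreign commenters for the same set of videos.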
Conclusions and Future Work • We present a geographical characterization of YouTube, highlighting a number of differences between Latin American users and users from other countries • Main Findings • Videos uploaded by LA users have different characteristics from videos uploaded by users from other regions: they are viewed and discussed less. • The most popular videos concentrate most of the views, suggesting the use of caching • Interactions are strongly influenced by geographical location, suggesting the use of CDNs to improve performance • Future Work • Analyze the impact of language on traffic and user behavior • Explore the social network characteristics of interactions between users and videos across different regions