Ch 5 + Anatomy of the Long Tail (Goel et al., WSDM 2010)

Padmini Srinivasan
Computer Science Department, Department of Management Sciences
http://cs.uiowa.edu/~psriniva
padmini-srinivasan@uiowa.edu

Compression (Ch 5)

Heaps' law
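Heaps' law estimates the vocabulary size M (number of distinct terms) of a collection of T tokens as $M = kT^{b}$. The sketch below is illustrative, not from the slides. It assumes the defaults k = 44 and b = 0.49, the Reuters-RCV1 fit reported in Manning et al., Introduction to Information Retrieval, Ch. 5 (assuming that is the "Ch 5" these slides follow). It also adds a hypothetical helper, fit_heaps, that estimates k and b by least squares on the log-log form $\log M = \log k + b \log T$.

```python
import math


def heaps_vocabulary_size(num_tokens, k=44.0, b=0.49):
    """Heaps' law: predicted number of distinct terms M = k * T**b for T tokens.

    Defaults are the RCV1 fit quoted in IIR Ch. 5 (an assumption here);
    IIR notes that k typically lies between 30 and 100 and b is roughly 0.5.
    """
    return k * num_tokens ** b


def fit_heaps(tokens):
    """Hypothetical helper: estimate (k, b) by least squares on log M = log k + b log T.

    Needs at least two tokens; M (distinct terms so far) is recorded after every token.
    """
    seen = set()
    xs, ys = [], []
    for t, token in enumerate(tokens, start=1):
        seen.add(token)
        xs.append(math.log(t))          # log T
        ys.append(math.log(len(seen)))  # log M
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return k, b


print(round(heaps_vocabulary_size(1_000_020)))  # 38323
```

With these defaults the prediction for the first 1,000,020 tokens is about 38,323 terms, the figure IIR works through for RCV1. A log-log regression is the simplest fit; it is not the only reasonable one.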

Zipf’s law
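Zipf's law says the collection frequency cf_i of the i-th most frequent term is proportional to 1/i, so on a log-log plot of frequency against rank the points lie near a line of slope -1. A minimal sketch, assuming a plain list of tokens as input; the function name zipf_slope is hypothetical, and ordinary least squares on log-log values is only a rough check (it gives the noisy low-frequency tail a lot of weight).

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(collection frequency) against log(rank).

    Zipf's law, cf_i proportional to 1/i, predicts a slope near -1.
    Needs at least two distinct terms.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)  # cf_1 >= cf_2 >= ...
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(cf) for cf in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy corpus obeying Zipf exactly: term i occurs 2520 / i times (2520 is divisible by 1..10).
toy = [f"t{i}" for i in range(1, 11) for _ in range(2520 // i)]
print(round(zipf_slope(toy), 6))  # -1.0
```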

Broder et al., Graph Structure of the Web
• Note that the exponent is different.
• Note also the deviation at the low end of the out-degree distribution.
• The probability that a page has in-degree k is proportional to $1/k^{2}$.
• The actual exponent is slightly larger than 2.
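The exponent of a power law such as the in-degree distribution can be estimated without fitting a line to a log-log histogram. A minimal sketch, assuming a continuous power law $p(x) \propto x^{-\alpha}$ for $x \ge x_{\min}$ (in-degree is really discrete, so this is an approximation). It uses the standard maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ and checks it on synthetic data drawn by inverse-transform sampling. The value 2.1 is only an illustrative exponent slightly above 2, and both function names are hypothetical.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from a continuous power law p(x) ~ x**(-alpha), x >= x_min.

    Inverse-transform sampling: the CDF is F(x) = 1 - (x / x_min)**(1 - alpha),
    so x = x_min * (1 - u)**(-1 / (alpha - 1)) for u uniform on [0, 1).
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_exponent(values, x_min=1.0):
    """Continuous maximum-likelihood estimate alpha_hat = 1 + n / sum(ln(x / x_min)),
    using only the values x >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


data = sample_power_law(100_000, alpha=2.1)
print(mle_exponent(data))  # close to 2.1 for a sample this large
```

For real in-degree counts, a discrete power-law fit and a careful choice of $x_{\min}$ would matter more than they do in this synthetic check.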

Infinite-inventory retailers
• Amazon, Netflix, iTunes music store
• Long-tail markets: items not available in brick-and-mortar stores account for
  • 30% of Amazon.com sales
  • 25% at Netflix
• Their success is attributed to these long-tail markets.
• Two different hypotheses:
  • The majority prefer popular items and a minority prefer niche items
  • Everyone likes some popular and some niche items
• The hypotheses have different implications for inventory control. Stocking only mainstream items would:
  • satisfy most people nearly all of the time (if the first hypothesis holds)
  • irritate most people at least some of the time (if the second holds)
• Knowing which model better explains behaviour is important.
• Their work supports the second hypothesis.
• Availability of tail items may also boost sales of ‘head’ items (one-stop-shopping convenience).
• So the impact is not just direct tail revenue but also second-order gains such as customer satisfaction.

Datasets examined
• Web queries (stemmed)
• URLs: restricted to domains (click search data)
• Browsing: Nielsen data (domains)
• The data were trimmed.

Long Tail
• What is it?
• A relatively small number of items accounts for a large share of consumption (the old 80–20 rule).
• Definition: the popularity of an item is the fraction of total consumption it accounts for, e.g., the fraction of all checkouts associated with a particular book.
• Popularity of a movie: (number of times it was rated) / (total number of ratings)
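The popularity definition, and the cumulative share of consumption covered by the most popular items, can be computed directly from a consumption log. A minimal sketch, assuming the log is a list of (user, item) pairs with one pair per consumption (e.g., one per rating, as in the movie definition above); item_popularity and top_k_share are hypothetical names.

```python
from collections import Counter


def item_popularity(consumption_log):
    """Popularity of each item: its share of all consumptions.

    consumption_log is an iterable of (user, item) pairs, one per consumption
    (for movies, one pair per rating).
    """
    counts = Counter(item for _user, item in consumption_log)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def top_k_share(popularity, k):
    """Cumulative popularity: fraction of all consumption covered by the k most popular items."""
    return sum(sorted(popularity.values(), reverse=True)[:k])


log = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b")]
pop = item_popularity(log)
print(pop)                  # {'a': 0.75, 'b': 0.25}
print(top_k_share(pop, 1))  # 0.75
```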

Two Long-Tail Graphs: Netflix & Yahoo! Music
• Typical inventory: 3,000 items (Netflix), 50,000 items (Yahoo! Music)
• Web search: the top 10 web sites account for over 15% of page views.
• Even the top 10,000 web sites leave 20% of page views unaccounted for.

More Long Tail Graphs

Eccentric Tastes?
• Inventory: the k top-ranked (most popular) items
• Definition: a user is p-percent satisfied if at least p percent of his or her consumption falls in the top-k set.
• Analysis: what percentage of users are p-percent satisfied?
• Netflix (k = 3,000): only 11% of users are 100% satisfied; 63% are 90% satisfied.
• Yahoo! Music (k = 50,000): only 5% of users are 100% satisfied; 32% are 90% satisfied.
• So with a brick-and-mortar inventory, almost none of the users are completely satisfied.
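The p-percent satisfaction measure can be sketched in a few lines. Assumptions: a (user, item) consumption log, the inventory taken to be the k items with the most consumptions in that log, and p given as a fraction (0.9 for 90%). Ties in popularity are broken by first appearance in the log; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def top_k_inventory(consumption_log, k):
    """The k most-consumed items (ties broken by first appearance in the log)."""
    counts = Counter(item for _user, item in consumption_log)
    return {item for item, _count in counts.most_common(k)}


def fraction_p_satisfied(consumption_log, k, p):
    """Share of users with at least a fraction p of their consumption in the top-k inventory."""
    log = list(consumption_log)  # iterated twice below
    inventory = top_k_inventory(log, k)
    hits = defaultdict(int)    # user -> consumptions inside the inventory
    totals = defaultdict(int)  # user -> all consumptions
    for user, item in log:
        totals[user] += 1
        if item in inventory:
            hits[user] += 1
    satisfied = sum(1 for user, total in totals.items() if hits[user] / total >= p)
    return satisfied / len(totals)


log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "a"), ("u3", "b")]
print(fraction_p_satisfied(log, k=2, p=1.0))  # 2 of 3 users are 100% satisfied
```

Sweeping k and p over a real log gives curves of the kind plotted on the slides.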

Eccentric Tastes? Netflix & Yahoo! Music
Upper: 90% satisfaction; lower: 100% satisfaction

Ratings versus Popularity
• Hypothesis: the more obscure an item, the less it is appreciated.
• Conversely, are better-known items more appreciated?
• Studied with movies and music: the relationship between popularity (rank) and rating.
• Is the value of the tail overemphasized because users are disproportionately dissatisfied (or satisfied) with tail items?
• Do tail items yield less satisfaction (or more dissatisfaction)?

Ratings versus Popularity
• The pattern is present in the Netflix data but not in the music dataset (more obscure songs get even higher ratings).

Ratings versus Popularity
• Do tail items yield less satisfaction, i.e., are users disproportionately dissatisfied with the tail?
• 32% of Netflix users and 56% of Yahoo! Music users had at least 10% of their highly rated items in the tail.
• 85% of Netflix users and 91% of Yahoo! Music users highly rated an item outside physical-store inventory (vs. 89% and 95%, respectively, in the original data).
• So the long tail cannot be dismissed: even typical users need tail items.

Null Hypothesis Model
• A random model
• Each user consumes the same number of items as in the empirical data (the number of users, the number of items, and the number of items each user selected/viewed/clicked/rated are held fixed).
• Each user picks items at random, with probability proportional to popularity and without replacement.
• What are the limitations of this null model?
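One way to simulate the null model described above, under stated assumptions: each user keeps his or her observed number of items, and items are drawn with probability proportional to their observed consumption counts, without replacement. The weighted sampling without replacement uses the Efraimidis-Spirakis key trick (keep the n largest values of $u^{1/w}$, with u uniform on [0, 1) and w the item's weight). That technique was chosen for this sketch and is not necessarily the procedure Goel et al. used; null_model_log is a hypothetical name. Computing the p-percent satisfaction measure on both the real and the simulated log gives the kind of comparison shown on the next slide.

```python
import heapq
import random
from collections import Counter


def null_model_log(consumption_log, seed=0):
    """Random null model: same users and same per-user item counts as the real log,
    but each user's items are drawn in proportion to popularity, without replacement.

    Assumes each (user, item) pair appears at most once in the real log.
    """
    log = list(consumption_log)
    popularity = Counter(item for _user, item in log)      # item -> consumption count
    items_per_user = Counter(user for user, _item in log)  # user -> number of items
    items = list(popularity)
    rng = random.Random(seed)
    simulated = []
    for user, n_items in items_per_user.items():
        # Efraimidis-Spirakis: give each item the key u ** (1 / weight), u ~ U[0, 1);
        # the n_items largest keys form a weighted sample without replacement.
        chosen = heapq.nlargest(
            n_items, items, key=lambda item: rng.random() ** (1.0 / popularity[item])
        )
        simulated.extend((user, item) for item in chosen)
    return simulated


real = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "a"), ("u3", "b")]
print(null_model_log(real, seed=42))
```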

Null Model: Netflix & Yahoo! Music
Upper: 90% satisfaction; lower: 100% satisfaction
• Under the null model, users are much harder to satisfy: e.g., with k = 3,000 movies, only 14% of null-model users are 90% satisfied, compared with 64% of actual users.

Implications?
• Though most users consume tail content part of the time, a sizeable fraction of users prefer head over tail content to a degree that goes beyond the draw of popularity alone.
• To compensate, other users draw disproportionately from the tail.

Consumption patterns: Users vs Popularity

Some patterns
• Moving from k = 3,000 to 3,500 movies raises cumulative popularity by only 2 percentage points (87% to 89%), while the share of 90%-satisfied users rises by 7 points (63% to 70%).
• Movies that, by popularity alone, account for only 2% of demand could potentially grow the overall customer base by 7 percentage points by attracting newly satisfied users.
• Search: moving from 95% to 96% along the tail raises 90% user satisfaction from 80% to 86%.

Individual eccentricity: the median popularity rank of the items a user consumes.
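Individual eccentricity can be computed by ranking items by popularity (rank 1 = most popular) and taking the median rank over each user's consumed items. A minimal sketch, assuming a (user, item) consumption log; ties in popularity are broken by first appearance in the log, and user_eccentricity is a hypothetical name.

```python
from collections import Counter, defaultdict
from statistics import median


def user_eccentricity(consumption_log):
    """Median popularity rank of each user's consumed items (rank 1 = most popular)."""
    log = list(consumption_log)
    counts = Counter(item for _user, item in log)
    # most_common() orders by count; equal counts keep first-appearance order.
    rank = {item: r for r, (item, _count) in enumerate(counts.most_common(), start=1)}
    ranks_by_user = defaultdict(list)
    for user, item in log:
        ranks_by_user[user].append(rank[item])
    return {user: median(ranks) for user, ranks in ranks_by_user.items()}


log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "a"), ("u3", "b")]
print(user_eccentricity(log))  # {'u1': 1.5, 'u2': 2.0, 'u3': 1.5}
```

A higher median rank means the user reaches further into the tail.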

More on eccentricity
• Are those who are more ‘engaged’ (i.e., consume more) more eccentric?
• No: at the individual level, the correlation between the two is low.
• But some patterns do appear at the group level.
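Whether engagement and eccentricity move together can be checked with a rank correlation across users. A minimal sketch: it assumes engagement is the number of items a user consumed and eccentricity is the median-rank measure defined above. It uses Spearman's rank correlation (the Pearson correlation of average ranks); that choice is an assumption, not necessarily the statistic used in the paper, and the helper names and the per-user numbers are hypothetical. A value near 0 corresponds to the low individual-level correlation on the slide.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-user values: items consumed (engagement) and median rank (eccentricity).
engagement = [5, 40, 12, 300, 25]
eccentricity = [120.0, 95.5, 300.0, 150.0, 80.0]
print(spearman(engagement, eccentricity))
```

A rank correlation is used here because both quantities are heavily skewed across users.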

More on eccentricity: web pages (unique URLs)

Theoretical Analysis
• Independent model
• Sticky model
• Winner-take-all
• Shared inventory approach

Summary
• A nice analysis of the long tail
• Different perspectives combined
• Popularity (cumulative and individual)
• 90% and 100% satisfaction
• Engagement versus ratings
• Use of a null model to make predictions and compare
• Nice graphs
• The long tail helps capture user satisfaction and retention.
