Music recommendations at Spotify Erik Bernhardsson erikbern@spotify.com
Spotify • Launched in 2009 • Available in 17 countries • 20M active users, 5M paying subscribers • Peak at 5k tracks/s, 1M logged in users • 20M tracks
Recommendation stuff at Spotify • Related artists:
Recommendations • Manual classification • Feature extraction • Social media analysis, web scraping, metadata based • Collaborative filtering
Pandora & Music Genome Project • Classifies tracks in terms of 400 attributes • Each track takes 20-30 minutes to classify • A distance function finds similar tracks • “Subtle use of strings” • “Epic buildup” • “Acid Jazz roots” • “Beats made for dancing” • “Trippy soundscapes” • “Great trombone solo” • …
Collaborative filtering • Idea: • If two movies x, y get similar ratings then they are probably similar • If a lot of users all listen to tracks x, y, z, then those tracks are probably similar
Aggregate data • Throw away temporal information and just look at the number of times each user has played each track
… a very big matrix • Throwing out the temporal data leaves one huge users × tracks matrix of play counts:
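A minimal sketch of what "aggregate and forget time" can look like, assuming Python with scipy (neither of which the deck specifies):

```python
from collections import Counter
from scipy.sparse import csr_matrix

# Toy playback log of (user_id, track_id) events; timestamps already dropped.
stream_events = [(0, 0), (0, 1), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2), (2, 2)]

plays = Counter(stream_events)                   # (user, track) -> play count
rows, cols = zip(*plays)                         # user indices, track indices
counts = csr_matrix((list(plays.values()), (rows, cols)))  # users x tracks, very sparse
print(counts.toarray())
```

At the scale from the earlier slide (20M users, 20M tracks) this matrix is enormous but extremely sparse.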
Supervised collaborative filtering is pretty much matrix completion
Unsupervised learning • Trying to estimate the density • i.e. predict probability of future events
We can calculate a correlation coefficient as an item similarity • Use something like Pearson, Jaccard, …
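A small sketch (illustrative data, numpy assumed) of computing item-item similarities from columns of the play-count matrix:

```python
import numpy as np

# Toy users x tracks play-count matrix (rows = users, columns = tracks).
counts = np.array([[3, 1, 0],
                   [0, 2, 5],
                   [1, 0, 2],
                   [2, 1, 1]], dtype=float)

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]       # Pearson correlation of two track columns

def jaccard(a, b):
    a, b = a > 0, b > 0                  # binarize: did the user play it at all?
    return (a & b).sum() / (a | b).sum()

# Similarity between track 0 and track 1 under each measure.
print(pearson(counts[:, 0], counts[:, 1]), jaccard(counts[:, 0], counts[:, 1]))
```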
Amazon did this for “customers who bought this also bought” • US patent 7113917
Can speed this up using various LSH tricks • Twitter: Dimension Independent Similarity Computation (DISCO)
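DISCO itself is a dimension-independent sampling scheme for MapReduce; as a sketch of the generic LSH trick it belongs to (random-hyperplane hashing for cosine similarity, every name here illustrative), similar items tend to collide in the same bucket, so exact similarities only need to be computed within buckets:

```python
import numpy as np

rng = np.random.default_rng(0)
f = 40
hyperplanes = rng.standard_normal((16, f))   # 16 random hyperplanes -> 16-bit signature

def signature(vec):
    # Each hyperplane contributes one sign bit; vectors with a small angle
    # between them land on the same side of most hyperplanes.
    return tuple((hyperplanes @ vec > 0).tolist())

item_vectors = rng.standard_normal((1000, f))
buckets = {}
for idx, v in enumerate(item_vectors):
    buckets.setdefault(signature(v), []).append(idx)

# Candidate pairs = items sharing a bucket; compare only those exactly.
```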
Natural Language Processing has a lot of similar problems • …matrix factorization is one idea
Matrix factorization • Want to get user vectors and item vectors • Assume f latent factors (dimensions) for each user/item
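In symbols (my notation, not necessarily the deck's): each user u gets a vector x_u and each item i a vector y_i, both in R^f, and the big play-count matrix is approximated by their product:

$$ M \approx X Y^{\top}, \qquad m_{ui} \approx x_u^{\top} y_i, \qquad x_u,\, y_i \in \mathbb{R}^{f} $$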
Probabilistic Latent Semantic Analysis (PLSA) • Hofmann, 1999 • Also called PLSI
PLSA, cont. • + a bunch of constraints:
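In the standard PLSA model (Hofmann, 1999), which is what this slide spells out, the probability of a (user, item) pair is decomposed over latent classes z, and the constraints are that the distributions sum to one:

$$ P(u, i) = P(u) \sum_{z} P(z \mid u)\, P(i \mid z), \qquad \sum_{z} P(z \mid u) = 1, \qquad \sum_{i} P(i \mid z) = 1 $$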
PLSA, cont. • Optimization problem: maximize log-likelihood
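With c_{ui} denoting the play count of item i by user u (notation assumed here), the objective is to maximize the log-likelihood of the observed counts:

$$ \max \sum_{u,i} c_{ui} \log P(u, i) \;=\; \max \sum_{u,i} c_{ui} \log \Big( P(u) \sum_{z} P(z \mid u)\, P(i \mid z) \Big) $$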
“Collaborative Filtering for Implicit Feedback Datasets” • Hu, Koren, Volinsky (2008)
“Collaborative Filtering for Implicit Feedback Datasets”, cont.
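The model in that paper (summarized here from the paper itself) turns raw play counts r_{ui} into binary preferences p_{ui} with confidence weights c_{ui}, and minimizes a weighted, regularized squared error over user vectors x_u and item vectors y_i:

$$ p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{if } r_{ui} = 0 \end{cases} \qquad c_{ui} = 1 + \alpha\, r_{ui} $$

$$ \min_{x_*,\, y_*} \; \sum_{u,i} c_{ui} \big( p_{ui} - x_u^{\top} y_i \big)^2 \;+\; \lambda \Big( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Big) $$

The paper optimizes this with alternating least squares, so each user and item vector gets a closed-form update.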
What happens each iteration • Start by assigning all latent vectors small random values • Then, each iteration, calculate the derivative and perform gradient ascent to optimize the log-likelihood
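A minimal numpy sketch of that recipe (random initialization, then repeated gradient steps); for concreteness it takes stochastic gradient steps on a plain squared-error objective rather than the exact log-likelihood above, and every name is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy play-count matrix and its observed (user, item, count) triples.
M = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 5.0],
              [1.0, 0.0, 2.0]])
triples = [(u, i, M[u, i]) for u in range(3) for i in range(3) if M[u, i] > 0]

f, lr, lam = 2, 0.05, 0.01
X = 0.01 * rng.standard_normal((3, f))    # small random user vectors
Y = 0.01 * rng.standard_normal((3, f))    # small random item vectors

for _ in range(200):                       # each iteration: move along the gradient
    for u, i, r in triples:
        x_u = X[u].copy()
        err = r - x_u @ Y[i]               # residual of the current prediction
        X[u] += lr * (err * Y[i] - lam * X[u])
        Y[i] += lr * (err * x_u - lam * Y[i])
```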
Vectors are pretty nice because things are now super fast • User-item score is a dot product: score(u, i) = x_u · y_i • Item-item similarity score is a cosine similarity: sim(i, j) = y_i · y_j / (‖y_i‖ ‖y_j‖) • Both cases have trivial complexity in the number of factors f: O(f)
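A quick numpy sketch of how cheap both operations are once the vectors exist (the vector values below are random placeholders):

```python
import numpy as np

f = 40
x_u = np.random.randn(f)                            # learned user vector
y_i, y_j = np.random.randn(f), np.random.randn(f)   # learned item vectors

score = x_u @ y_i                                   # user-item score: one O(f) dot product
cosine = y_i @ y_j / (np.linalg.norm(y_i) * np.linalg.norm(y_j))  # item-item similarity, also O(f)
```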