
Large-scale Recommender Systems on Just a PC

Explore the power of running large-scale recommender systems on a single PC with GraphChi. Learn why single-computer computing is efficient and practical for big tasks, as demonstrated with vertex-centric programming and disk-based graph computation. Discover the benefits of using GraphChi for collaborative filtering and link prediction in vast networks.



Presentation Transcript


  1. Large-scale Recommender Systems on Just a PC Aapo Kyrölä Ph.D. candidate @ CMU http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov LSRS 2013 keynote (RecSys’13 Hong Kong) Big Data – small machine

  2. My Background • Academic: 5th year Ph.D. @ Carnegie Mellon (started 2009). Advisors: Guy Blelloch, Carlos Guestrin (UW). + Shotgun: parallel L1-regularized regression solver (ICML 2011). + Internships at MSR Asia (2011) and Twitter (2012). • Startup entrepreneur: Habbo, founded 2000.

  3. Outline of this talk • Why single-computer computing? • Introduction to graph computation and GraphChi • Recommender systems with GraphChi • Future directions & Conclusion

  4. Large-Scale Recommender Systems on Just a PC Why on a single machine? Can’t we just use the Cloud?

  5. Why use a cluster? Two reasons: • One computer cannot handle my problem in a reasonable time. • I need to solve the problem very fast.

  6. Why use a cluster? Two reasons: • One computer cannot handle my problem in a reasonable time. • I need to solve the problem very fast. • Our work expands the space of feasible (graph) problems on one machine: • Our experiments use the same graphs as, or bigger than, previous papers on distributed graph computation (+ we can do the Twitter graph on a laptop). • Most data is not that “big”. • Our work raises the bar on the performance required of a “complicated” (distributed) system.

  7. Benefits of single-machine systems Assuming it can handle your big problems… • Programmer productivity • Global state • Can use “real data” for development • Inexpensive to install and administer; uses less power • Scalability.

  8. Efficient Scaling • Distributed graph system: (significantly) less than 2x throughput with 2x machines. • Single-computer system (capable of big tasks): run independent tasks on separate machines, so exactly 2x throughput with 2x machines (e.g., tasks 1–6 on 6 machines vs. tasks 1–12 on 12 machines in the same time T).

  9. Graph computation and GraphChi

  10. Why graphs for recommender systems? • Graph = matrix: edge(u,v) = M[u,v] • Note: always sparse graphs • Intuitive, human-understandable representation • Easy to visualize and explain. • Unifies collaborative filtering (typically matrix based) with recommendation in social networks. • Random walk algorithms. • Local view → vertex-centric computation

  11. Vertex-Centric Computational Model • Graph G = (V, E) • Directed edges: e = (source, destination) • Each edge and vertex is associated with a value (user-defined type) • Vertex and edge values can be modified • (Structure modification also supported) [Figure: two vertices A and B with data values on every vertex and edge]

  12. Vertex-centric Programming • “Think like a vertex” • Popularized by the Pregel and GraphLab projects. MyFunc(vertex) { // modify neighborhood }
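As a concrete illustration of the local view, here is a minimal sketch in plain C++ (the Vertex/Edge structs and the function signature are invented for illustration; this is not the actual GraphChi API): the update function sees only one vertex and the values on its incident edges.

```cpp
#include <vector>

// Invented types for illustration -- not the real GraphChi API.
struct Edge   { int neighbor_id; double value; };                  // value stored on the edge
struct Vertex { double value = 0.0; std::vector<Edge> edges; };

// "MyFunc(vertex) { // modify neighborhood }" fleshed out: the vertex
// averages the values on its incident edges and writes the result back
// to itself and to its edges, where neighbors will later see it.
void myFunc(Vertex& vertex) {
    if (vertex.edges.empty()) return;
    double sum = 0.0;
    for (const Edge& e : vertex.edges) sum += e.value;
    vertex.value = sum / vertex.edges.size();
    for (Edge& e : vertex.edges) e.value = vertex.value;
}
```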

  13. What is GraphChi? • Disk-based, vertex-centric graph computation on a single machine. • The GraphChi and GraphLab 2 papers were both in OSDI’12!

  14. The Main Challenge of Disk-based Graph Computation: Random Access • Would need ~5–10 M random edge accesses / sec to achieve “reasonable performance”. • But a hard disk delivers only ~100s of random reads/writes per sec, and an SSD ~100 K reads / sec (commodity) to ~1 M reads / sec (high-end arrays).

  15. Parallel Sliding Windows: only P large sequential reads for each interval (sub-graph), so P² reads for one full pass over the graph. Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012).
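As a rough worked example (the shard count here is made up, not from the talk): with P = 20 shards, processing one interval costs about 20 large sequential reads, so one full pass over the graph costs about 20² = 400 large block reads, instead of the hundreds of millions of small random reads a naive edge-at-a-time traversal would need.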

  16. GraphChi Program Execution • Execution by intervals: For T iterations: for p = 1 to P: for v in interval(p): updateFunction(v) • Equivalent vertex-wise view: For T iterations: for v = 1 to V: updateFunction(v) • “Asynchronous”: updates are immediately visible (vs. bulk-synchronous).
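A self-contained sketch of this schedule in plain C++ (a toy in-memory graph and a naive interval split, not the real engine, which streams the per-interval edges from disk shards): vertices are processed interval by interval, and because values are written in place, later updates in the same iteration already see them (asynchronous).

```cpp
#include <cstdio>
#include <vector>

// Toy graph: each vertex holds a value and a list of in-neighbor ids.
struct Vertex { double value = 1.0; std::vector<int> in_neighbors; };

void update(int v, std::vector<Vertex>& g) {
    double sum = 0.0;
    for (int nb : g[v].in_neighbors) sum += g[nb].value;   // reads the latest values (asynchronous)
    if (!g[v].in_neighbors.empty()) g[v].value = sum / g[v].in_neighbors.size();
}

int main() {
    std::vector<Vertex> g(6);
    g[1].in_neighbors = {0}; g[2].in_neighbors = {0, 1};
    g[3].in_neighbors = {2}; g[4].in_neighbors = {2, 3}; g[5].in_neighbors = {4};

    const int T = 3, P = 2;                                   // iterations and number of intervals
    const int interval_size = ((int)g.size() + P - 1) / P;
    for (int iter = 0; iter < T; ++iter)                      // For T iterations:
        for (int p = 0; p < P; ++p)                           //   for p = 1 to P
            for (int v = p * interval_size;                   //     for v in interval(p)
                 v < (int)g.size() && v < (p + 1) * interval_size; ++v)
                update(v, g);                                 //       updateFunction(v)

    for (int v = 0; v < (int)g.size(); ++v) std::printf("v%d = %.3f\n", v, g[v].value);
    return 0;
}
```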

  17. Performance • GraphChi can compute on the full Twitter follow-graph with just a standard laptop, roughly as fast as a very large Hadoop cluster! (Graph size as of Fall 2013: > 20 B edges [Gupta et al. 2013].)

  18. GraphChi is Open Source • C++ and Java versions on GitHub: http://github.com/graphchi • The Java version has a Hadoop/Pig wrapper • If you really, really want to use Hadoop.

  19. Recsys model training with GraphChi

  20. Overview of Recommender Systems for GraphChi • Collaborative Filtering toolkit (next slide) • Link prediction in large networks • Random-walk based approaches (Twitter) • Talk on Wednesday.

  21. GraphChi’s Collaborative Filtering Toolkit • Developed by Danny Bickson (CMU / GraphLab Inc.) • Includes: • Alternating Least Squares (ALS) • Sparse-ALS • SVD++ • LibFM (factorization machines) • GenSGD • Item-similarity based methods • PMF • CliMF (contributed by Mark Levy) • … See Danny’s blog for more information: http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html Note: in the C++ version; a Java version is in development by a CMU team.

  22. Two examples: ALS and item-based CF

  23. Example: Alternating Least Squares Matrix Factorization (ALS) • Task: predict ratings for items (movies) by users. • Model: latent factor model (see next slide). Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: “Large-Scale Parallel Collaborative Filtering for the Netflix Prize” (2008)

  24. ALS: User–Item bipartite graph • Users and movies are vertices; ratings are edges. • A user’s rating of a movie is modeled as a dot-product: <factor(user), factor(movie)>. [Figure: a user connected to movies (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita) with ratings such as 4, 3, 2, 5 on the edges]
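The predicted rating is just the inner product of the two latent vectors. A minimal sketch (the factor dimension D is made up for illustration):

```cpp
#include <array>

constexpr int D = 5;                              // latent factor dimension (made up)
using Factor = std::array<double, D>;

// Predicted rating = <factor(user), factor(movie)>
double predict(const Factor& user, const Factor& movie) {
    double dot = 0.0;
    for (int k = 0; k < D; ++k) dot += user[k] * movie[k];
    return dot;
}
```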

  25. ALS: GraphChi implementation • The update function handles one vertex at a time (user or movie). • For each user: estimate latent(user) by minimizing the least-squares error of the dot-product predicted ratings. • GraphChi executes the update function for each vertex (in parallel) and loads the edges (ratings) from disk. • Latent factors in memory: needs O(V) memory. • If the factors don’t fit in memory, they can be replicated to the edges and thus stored on disk. Scales to very large problems!
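To make the per-user least-squares step concrete, here is a minimal standalone sketch (not the toolkit code; the dimension D, the regularization constant, and the plain Gaussian elimination solver are all illustrative choices): holding the movie factors fixed, the user’s factor is the solution of the normal equations (MᵀM + λI) x = Mᵀr, where the rows of M are the factors of the movies the user rated and r holds the ratings on the user’s edges.

```cpp
#include <array>
#include <vector>

constexpr int D = 5;                          // latent dimension (made up)
using Factor = std::array<double, D>;

// Solve A x = b for a small D x D system by Gaussian elimination (no pivoting;
// acceptable for a sketch because the lambda*I term keeps A well conditioned).
Factor solve(std::array<std::array<double, D>, D> A, Factor b) {
    for (int i = 0; i < D; ++i) {
        double piv = A[i][i];
        for (int j = i; j < D; ++j) A[i][j] /= piv;
        b[i] /= piv;
        for (int k = 0; k < D; ++k) {
            if (k == i) continue;
            double f = A[k][i];
            for (int j = i; j < D; ++j) A[k][j] -= f * A[i][j];
            b[k] -= f * b[i];
        }
    }
    return b;
}

// ALS update for one user: least-squares fit of the user's factor to the
// ratings on its edges, with the movie factors held fixed.
//   latent(user) = (M^T M + lambda*I)^{-1} M^T r
Factor update_user(const std::vector<Factor>& rated_movie_factors,
                   const std::vector<double>& ratings,
                   double lambda = 0.065) {               // regularization (made-up value)
    std::array<std::array<double, D>, D> A{};             // accumulates M^T M
    Factor b{};                                           // accumulates M^T r
    for (std::size_t e = 0; e < ratings.size(); ++e) {
        const Factor& m = rated_movie_factors[e];
        for (int i = 0; i < D; ++i) {
            b[i] += m[i] * ratings[e];
            for (int j = 0; j < D; ++j) A[i][j] += m[i] * m[j];
        }
    }
    for (int i = 0; i < D; ++i) A[i][i] += lambda;        // + lambda*I
    return solve(A, b);
}
```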

  26. ALS: Performance [Chart: matrix factorization (Alternating Least Squares) runtime comparison] Remark: Netflix is not a big problem, but GraphChi scales at most linearly with input size (ALS is CPU-bound, so runtime should be sub-linear in the number of ratings).

  27. Example: Item-Based CF • Task: compute a similarity score [e.g. Jaccard] for each movie pair that has at least one viewer in common. • Similarity(X, Y) ~ # common viewers. • Output the top-K similar items for each item to a file … or create an edge between X and Y containing the similarity. • Problem: enumerating all pairs takes too much time.
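For reference, here is a minimal sketch of the pairwise score itself (plain C++ over sorted viewer-id lists; this is just the Jaccard definition, not how the toolkit stores adjacency):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Jaccard similarity between two movies' viewer lists (each sorted by user id):
//   |viewers(X) ∩ viewers(Y)| / |viewers(X) ∪ viewers(Y)|
double jaccard(const std::vector<int>& viewers_x, const std::vector<int>& viewers_y) {
    std::vector<int> common;
    std::set_intersection(viewers_x.begin(), viewers_x.end(),
                          viewers_y.begin(), viewers_y.end(),
                          std::back_inserter(common));
    double inter = static_cast<double>(common.size());
    double uni = viewers_x.size() + viewers_y.size() - inter;
    return uni > 0 ? inter / uni : 0.0;
}
```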

  28. Solution: enumerate all triangles of the graph. New problem: how to enumerate triangles if the graph does not fit in RAM? [Figure: movie vertices (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita)]

  29. Enumerating Triangles (Item-CF) • Triangles with edge (u, v) = intersection(neighbors(u), neighbors(v)) • Iterative memory efficient solution (next slide)

  30. Algorithm: • Let pivots be a subset of the vertices. • Load all neighbor lists (adjacency lists) of the pivots into RAM. • Now use GraphChi to load all vertices from disk, one by one, and compare their adjacency lists to the pivots’ adjacency lists (similar to a merge). • Repeat with a new subset of pivots.
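A minimal in-memory sketch of the pivot idea (everything fits in RAM here for simplicity; in GraphChi only the pivots’ adjacency lists would be held in memory while the other vertices stream from disk). For each streamed vertex and each pivot it is adjacent to, a merge-style intersection of the two sorted lists gives the triangles (or common neighbors) on that edge. Note that in this simple sketch a pivot-pivot edge is reported once from each side.

```cpp
#include <algorithm>
#include <cstdio>
#include <iterator>
#include <vector>

// adj[v] = sorted neighbor list of vertex v (undirected toy graph).
using Graph = std::vector<std::vector<int>>;

// Process one batch of pivots: intersect every streamed vertex's list
// with the list of each pivot it shares an edge with.
void process_pivot_batch(const Graph& adj, const std::vector<int>& pivots) {
    for (int v = 0; v < (int)adj.size(); ++v) {             // streamed from disk in GraphChi
        for (int p : pivots) {
            if (!std::binary_search(adj[v].begin(), adj[v].end(), p)) continue;
            std::vector<int> common;
            std::set_intersection(adj[v].begin(), adj[v].end(),   // merge-style intersection
                                  adj[p].begin(), adj[p].end(),
                                  std::back_inserter(common));
            std::printf("edge (%d,%d): %zu triangles / common neighbors\n",
                        p, v, common.size());
        }
    }
}

int main() {
    Graph adj = {{1, 2}, {0, 2, 3}, {0, 1, 3}, {1, 2}};     // toy graph
    process_pivot_batch(adj, {0, 1});                        // one batch of pivots
    // A real run would repeat with the next batch of pivots until all are covered.
    return 0;
}
```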

  31. Triangle Counting: Performance [Chart: triangle counting runtime]

  32. Future directions & Final remarks

  33. Single-Machine Computing in Production? • GraphChi supports incremental computation with dynamic graphs: • Can keep on running indefinitely, adding new edges to the graph → constantly fresh model. • However, requires engineering – not included in the toolkit. • Compare to a cluster-based system (such as Hadoop) that needs to compute from scratch.

  34. Unified Recsys Platform for GraphChi? • Working with masters students at CMU. • Goal: ability to easily compare different algorithms, parameters • Unified input, output. • General programmable API (not just file-based) • Evaluation process: Several evaluation metrics; Cross-validation, held-out data… • Run many algorithm instances in parallel, on same graph. • Java. • Scalable from the get-go.

  35. Recent developments: Disk-based Graph Computation • Two disk-based graph computation systems were published recently: • TurboGraph (KDD’13) • X-Stream (SOSP’13, October) • Significantly better performance than GraphChi on many problems. • They avoid preprocessing (“sharding”). • But GraphChi can do some computations that X-Stream cannot (triangle counting and related), and TurboGraph requires an SSD. • Hot research area!

  36. Do you need GraphChi – or any system? • Heck, for many algorithms you can just mmap() over your (binary) adjacency list / sparse matrix and write a for-loop. • See Lin, Chau, Kang: “Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC” (Big Data ’13). • Obviously it is good to have a common API. • And some algorithms need more advanced solutions (like GraphChi, X-Stream, TurboGraph). Beware of the hype!
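A minimal sketch of the mmap() approach (POSIX only; the file layout, consecutive int32 (src, dst) pairs, is an assumption made for this example, not a standard format): the OS pages the edge list in and out on demand, and the algorithm really is just a for-loop over the mapped array.

```cpp
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Assumed file format (illustrative): consecutive int32 pairs (src, dst), one per edge.
struct Edge { int32_t src, dst; };

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s edges.bin\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);
    std::size_t n_edges = st.st_size / sizeof(Edge);

    // Map the whole edge list; the OS handles paging, so it looks like an in-memory array.
    void* mem = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }
    const Edge* edges = static_cast<const Edge*>(mem);

    // "Write a for-loop": e.g. count the out-degree of vertex 0.
    long deg0 = 0;
    for (std::size_t i = 0; i < n_edges; ++i)
        if (edges[i].src == 0) ++deg0;
    std::printf("%zu edges, out-degree of vertex 0 = %ld\n", n_edges, deg0);

    munmap(mem, st.st_size);
    close(fd);
    return 0;
}
```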

  37. Conclusion • Very large recommender algorithms can now be run on just your PC or laptop. • Additional performance from multi-core parallelism. • Great for productivity: scale by replicating. • In general, good single-machine scalability requires care with data structures and memory management → natural with C/C++; with Java (etc.) you need low-level byte massaging. • Frameworks like GraphChi hide the low-level details. • More work is needed to “productize” the current work.

  38. Thank you! Aapo Kyrölä Ph.D. candidate @ CMU – soon to graduate! (Currently visiting U.W) http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov
