Presentation Transcript


  1. The Next Generation of the GraphLab Abstraction. Joseph Gonzalez, joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola.

  2. How will we design and implement parallel learning systems?

  3. ... a popular answer: Map-Reduce / Hadoop. Build learning algorithms on top of high-level parallel abstractions.

  4. Map-Reduce for Data-Parallel ML • Excellent for large data-parallel tasks! Is there more to Machine Learning? • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. • Graph-Parallel (?): Label Propagation, Lasso, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Neural Networks, Deep Belief Networks.

  5. Concrete Example: Label Propagation

  6. Label Propagation Algorithm • Social Arithmetic: my interests are a weighted average of my neighbors' interests. Example: 50% × what I list on my profile (50% Cameras, 50% Biking) + 40% × what Sue Ann likes (80% Cameras, 20% Biking) + 10% × what Carlos likes (30% Cameras, 70% Biking) = I like 60% Cameras, 40% Biking. • Recurrence Algorithm: Likes[i] = Σ_j W_ij × Likes[j]; iterate until convergence. • Parallelism: compute all Likes[i] in parallel.
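To make the social arithmetic concrete, here is a minimal, self-contained C++ sketch of the recurrence Likes[i] = Σ_j W_ij × Likes[j] on the slide's three-neighbor example; the variable names and layout are illustrative and are not part of GraphLab's API.

    #include <cstdio>
    #include <vector>

    // One label-propagation step for a single vertex:
    // Likes[i] = sum_j W_ij * Likes[j], using the slide's numbers.
    int main() {
        std::vector<double> w = {0.5, 0.4, 0.1};        // my profile, Sue Ann, Carlos
        std::vector<std::vector<double>> likes = {      // {Cameras, Biking}
            {0.5, 0.5},                                 // what I list on my profile
            {0.8, 0.2},                                 // what Sue Ann likes
            {0.3, 0.7}};                                // what Carlos likes
        std::vector<double> me = {0.0, 0.0};
        for (size_t j = 0; j < w.size(); ++j)
            for (size_t k = 0; k < 2; ++k)
                me[k] += w[j] * likes[j][k];
        // Prints: I like 60% Cameras, 40% Biking
        std::printf("I like %.0f%% Cameras, %.0f%% Biking\n", 100 * me[0], 100 * me[1]);
        return 0;
    }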

  7. Properties of Graph-Parallel Algorithms • Dependency graph: what I like depends on what my friends like. • Factored computation. • Iterative computation.

  8. Map-Reduce for Data-Parallel ML • Excellent for large data-parallel tasks! Can Map-Reduce also handle the graph-parallel tasks? • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. • Graph-Parallel (Map-Reduce?): Label Propagation, Lasso, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Neural Networks, Deep Belief Networks.

  9. Why not use Map-Reduce for Graph-Parallel Algorithms?

  10. MapAbuse: Iterative MapReduce • Only a subset of the data needs computation, yet every iteration re-processes all of the data on every CPU, with a global barrier between iterations. [Figure: CPUs 1–3 sweeping over all data blocks in each iteration, separated by barriers.]

  11. MapAbuse: Iterative MapReduce • The system is not optimized for iteration: every iteration pays a job startup penalty and a disk penalty, since state is written out and read back between iterations. [Figure: per-iteration startup and disk penalties across CPUs 1–3.]

  12. Map-Reduce for Data-Parallel ML • Excellent for large data-parallel tasks! Could Pregel (Giraph) handle the graph-parallel tasks? • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. • Graph-Parallel (Pregel / Giraph?): SVM, Lasso, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Neural Networks, Deep Belief Networks.

  13. Pregel (Giraph) • Bulk Synchronous Parallel Model: Compute Communicate Barrier
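As a point of reference, here is a minimal skeleton of the bulk synchronous pattern; it only illustrates the compute / communicate / barrier structure and is not the Pregel or Giraph API.

    #include <vector>

    // Skeleton of one BSP superstep (illustrative, not the Pregel/Giraph API):
    // every vertex computes from the messages it received, sends messages to
    // its neighbors, and then all workers wait at a global barrier.
    struct Vertex { double value = 0.0; };

    void superstep(std::vector<Vertex>& v,
                   const std::vector<std::vector<int>>& out_neighbors,
                   std::vector<std::vector<double>>& inbox) {
        std::vector<std::vector<double>> outbox(v.size());

        // Compute + communicate phase, run over all vertices on all workers.
        for (size_t i = 0; i < v.size(); ++i) {
            for (double m : inbox[i]) v[i].value += m;    // consume last superstep's messages
            for (int j : out_neighbors[i])
                outbox[j].push_back(v[i].value);          // delivered next superstep
        }

        // Barrier: in a real system all workers synchronize here; only then do
        // queued messages become visible as the next superstep's inboxes.
        inbox = std::move(outbox);
    }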

  14. Problem with Bulk Synchronous • Example algorithm: if a neighbor is Red, then turn Red. • Bulk synchronous computation: evaluate the condition on all vertices in every phase; 4 phases, each with 9 computations → 36 computations. • Asynchronous (wave-front) computation: evaluate the condition only when a neighbor changes; 4 phases, each with 2 computations → 8 computations. (A sketch contrasting the two schedules follows.)
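The sketch below contrasts the two schedules for the "turn Red if a neighbor is Red" rule. It uses a 5-vertex path with only vertex 0 Red rather than the slide's graph, so the absolute counts differ, but it shows the same gap between evaluating every vertex in every phase and evaluating only the vertices whose neighbors changed.

    #include <cstdio>
    #include <deque>
    #include <vector>

    // Count rule evaluations for "if a neighbor is Red, turn Red" under a
    // synchronous sweep vs. an asynchronous wave-front (illustrative example).
    int main() {
        const int n = 5;
        auto neighbors = [&](int v) {
            std::vector<int> nb;
            if (v > 0) nb.push_back(v - 1);
            if (v < n - 1) nb.push_back(v + 1);
            return nb;
        };

        // Synchronous: every phase evaluates every vertex until nothing changes.
        std::vector<bool> red(n, false); red[0] = true;
        int sync_evals = 0;
        bool changed = true;
        while (changed) {
            changed = false;
            std::vector<bool> next = red;
            for (int v = 0; v < n; ++v) {
                ++sync_evals;
                for (int u : neighbors(v))
                    if (red[u] && !next[v]) { next[v] = true; changed = true; }
            }
            red = next;
        }

        // Asynchronous wave-front: evaluate a vertex only when a neighbor changed.
        std::vector<bool> ared(n, false); ared[0] = true;
        int async_evals = 0;
        std::deque<int> work = {1};              // only vertex 0's neighbor starts scheduled
        while (!work.empty()) {
            int v = work.front(); work.pop_front();
            ++async_evals;
            if (ared[v]) continue;
            for (int u : neighbors(v)) {
                if (!ared[u]) continue;
                ared[v] = true;
                for (int w : neighbors(v)) if (!ared[w]) work.push_back(w);
                break;
            }
        }
        std::printf("synchronous: %d evaluations, asynchronous: %d evaluations\n",
                    sync_evals, async_evals);
        return 0;
    }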

  15. The Need for a New Abstraction • Map-Reduce is not well suited for graph-parallelism. • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics. • Graph-Parallel (Pregel / Giraph): Belief Propagation, Kernel Methods, SVM, Tensor Factorization, PageRank, Lasso, Neural Networks, Deep Belief Networks.

  16. What is GraphLab?

  17. The GraphLab Framework: a graph-based data representation, update functions (user computation), a scheduler, and a consistency model.

  18. Data Graph A graph with arbitrary data (C++ objects) associated with each vertex and edge. • Graph: a social network. • Vertex data: user profile text, current interest estimates. • Edge data: similarity weights.
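As a rough illustration of the data-graph idea, here are plain C++ types standing in for GraphLab's actual graph class; the type and member names are assumptions made for this sketch.

    #include <string>
    #include <utility>
    #include <vector>

    // Illustrative data graph (not GraphLab's graph type): arbitrary C++
    // objects attached to every vertex and edge.
    struct VertexData {
        std::string profile_text;            // user profile text
        std::vector<double> interests;       // current interest estimates
    };

    struct EdgeData {
        double similarity;                   // similarity weight W_ij
    };

    struct Edge { int source, target; EdgeData data; };

    struct DataGraph {
        std::vector<VertexData> vertices;
        std::vector<Edge> edges;

        int add_vertex(VertexData v) {
            vertices.push_back(std::move(v));
            return static_cast<int>(vertices.size()) - 1;
        }
        void add_edge(int s, int t, EdgeData e) { edges.push_back({s, t, e}); }
    };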

  19. Update Functions An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex:

  label_prop(i, scope) {
    // Get neighborhood data
    (Likes[i], W_ij, Likes[j]) ← scope;
    // Update the vertex data
    Likes[i] ← Σ_j W_ij × Likes[j];
    // Reschedule neighbors if needed
    if Likes[i] changes then reschedule_neighbors_of(i);
  }

  20. The Scheduler The scheduler determines the order in which vertices are updated: CPUs repeatedly pull the next vertex from the scheduler, apply its update function, and push any newly scheduled vertices back. The process repeats until the scheduler is empty. (A sketch of this loop, driving the update function above, follows.)
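Below is a single-threaded sketch of this loop driving the label_prop update function; the containers and the convergence threshold are illustrative assumptions, and the real runtime runs many CPUs against a shared scheduler under a consistency model.

    #include <cmath>
    #include <deque>
    #include <set>
    #include <vector>

    struct Nbr { int j; double w; };                        // neighbor j with weight W_ij

    // Run update functions until the scheduler is empty (illustrative sketch).
    void run(std::vector<std::vector<double>>& likes,       // per-vertex interest vectors
             const std::vector<std::vector<Nbr>>& adj) {
        std::deque<int> scheduler;
        std::set<int> queued;
        for (int i = 0; i < (int)likes.size(); ++i) { scheduler.push_back(i); queued.insert(i); }

        while (!scheduler.empty()) {
            int i = scheduler.front(); scheduler.pop_front(); queued.erase(i);

            // Update function: Likes[i] <- sum_j W_ij * Likes[j]
            std::vector<double> updated(likes[i].size(), 0.0);
            for (const Nbr& n : adj[i])
                for (size_t k = 0; k < updated.size(); ++k)
                    updated[k] += n.w * likes[n.j][k];

            // Reschedule neighbors only if this vertex changed noticeably.
            double change = 0.0;
            for (size_t k = 0; k < updated.size(); ++k)
                change += std::fabs(updated[k] - likes[i][k]);
            likes[i] = updated;
            if (change > 1e-6)
                for (const Nbr& n : adj[i])
                    if (queued.insert(n.j).second) scheduler.push_back(n.j);
        }
    }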

  21. The GraphLab Framework: a graph-based data representation, update functions (user computation), a scheduler, and a consistency model.

  22. Ensuring Race-Free Code • How much can computation overlap?

  23. GraphLab Ensures Sequential Consistency For each parallel execution, there exists a sequential execution of update functions which produces the same result. [Figure: a parallel schedule over CPU 1 and CPU 2 alongside an equivalent single-CPU sequential schedule.]

  24. Consistency Rules • Full consistency over the data guarantees sequential consistency for all update functions.

  25. Full Consistency [Figure: under full consistency, the entire scope of the updating vertex is protected from concurrent access.]

  26. Obtaining More Parallelism: relax from full consistency to the weaker edge consistency.

  27. Edge Consistency [Figure: under edge consistency, the updating vertex and its adjacent edges are protected, while adjacent vertices remain safe for other CPUs to read.]

  28. Consistency Through R/W Locks • Read/write locks, acquired in a canonical lock ordering: • Full consistency: write locks on the center vertex and its adjacent vertices. • Edge consistency: a write lock on the center vertex and read locks on its adjacent vertices.
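Here is a sketch of how edge consistency could be enforced with reader/writer locks acquired in canonical (ascending vertex-id) order to avoid deadlock; this illustrates the locking discipline only and is not GraphLab's actual lock manager.

    #include <algorithm>
    #include <shared_mutex>
    #include <vector>

    // Edge-consistency locking sketch: write-lock the center vertex, read-lock
    // its neighbors, acquiring everything in canonical (sorted id) order.
    struct LockManager {
        std::vector<std::shared_mutex> locks;     // one lock per vertex
        explicit LockManager(size_t n) : locks(n) {}

        void lock_edge_scope(int center, std::vector<int> scope) {
            scope.push_back(center);
            std::sort(scope.begin(), scope.end()); // canonical lock ordering
            for (int v : scope) {
                if (v == center) locks[v].lock();          // write lock on the center
                else             locks[v].lock_shared();   // read lock on neighbors
            }
        }

        void unlock_edge_scope(int center, std::vector<int> scope) {
            scope.push_back(center);
            for (int v : scope) {
                if (v == center) locks[v].unlock();
                else             locks[v].unlock_shared();
            }
        }
    };

For full consistency the same discipline applies, except that every lock in the scope is taken in write mode.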

  29. GraphLab for Natural Graphs: the Achilles' heel.

  30. Problem: High-Degree Vertices • Graphs with high-degree vertices are common: power-law graphs (social networks), which affect algorithms like label propagation; and probabilistic graphical models, with hyper-parameters that couple large sets of data and connectivity structure induced by natural phenomena. • High-degree vertices kill parallelism: they pull a large amount of state, require heavy locking, and are processed sequentially.

  31. Proposed Solutions • Decomposable update functors: expose greater parallelism by further factoring update functions. • Abelian group caching (concurrent revisions): allows for controllable races through diff operations. • Stochastic scopes: reduce degree through sampling.

  32. Decomposable Update Functors Breaking computation over the edges of the graph.

  33. Decomposable Update Functors • Decompose update functions into three user-defined phases: • Gather: Gather(edge) → Δ is applied to each edge in the scope, and the partial results are merged with a parallel sum, Δ1 + Δ2 + … → Δ. • Apply: Apply(vertex, Δ) applies the accumulated value to the center vertex. • Scatter: Scatter(edge) updates adjacent edges and vertices. • Locks are acquired only for the region within each phase's scope → relaxed consistency.

  34. Decomposable Update Functors • Implementing label propagation with factorized update functions, using the update functor's state as the accumulator; the same (+) operator merges the partial gathers (Δ1 + Δ2 + Δ3):

  Gather(i, j, scope) {
    // Get neighborhood data (Likes[i], W_ij, Likes[j]) from scope
    // Emit accumulator
    emit W_ij × Likes[j] as Δ;
  }

  Apply(i, scope, Δ) {
    // Get the vertex data (Likes[i]) from scope
    // Update the vertex
    Likes[i] ← Δ;
  }

  Scatter(i, scope) {
    // Get neighborhood data (Likes[i], W_ij, Likes[j]) from scope
    // Reschedule if changed
    if Likes[i] changed then reschedule(j);
  }
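For a more concrete picture, here is a compact, single-threaded C++ sketch of the gather / (+) / apply / scatter pieces for label propagation; all names are illustrative, not GraphLab's decomposable-functor API, and the real system runs the gathers in parallel under fine-grained locking.

    #include <cmath>
    #include <vector>

    struct Nbr { int j; double w; };                 // neighbor j, weight W_ij
    using Interests = std::vector<double>;

    // Gather: one edge's contribution, Delta = W_ij * Likes[j].
    Interests gather(const Nbr& n, const std::vector<Interests>& likes) {
        Interests d(likes[n.j].size());
        for (size_t k = 0; k < d.size(); ++k) d[k] = n.w * likes[n.j][k];
        return d;
    }

    // (+): merge two partial gathers into one accumulator.
    Interests combine(Interests a, const Interests& b) {
        for (size_t k = 0; k < a.size(); ++k) a[k] += b[k];
        return a;
    }

    // Apply: write the accumulated value to the center vertex, report change.
    bool apply(int i, std::vector<Interests>& likes, const Interests& delta) {
        double change = 0.0;
        for (size_t k = 0; k < delta.size(); ++k) change += std::fabs(delta[k] - likes[i][k]);
        likes[i] = delta;
        return change > 1e-6;
    }

    // Scatter: reschedule the neighbors if the center vertex changed.
    void scatter(bool changed, const std::vector<Nbr>& adj_i, std::vector<int>& scheduler) {
        if (!changed) return;
        for (const Nbr& n : adj_i) scheduler.push_back(n.j);
    }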

  35. Decomposable Functors • Fits many algorithms: loopy belief propagation, label propagation, PageRank, … • Addresses the earlier concerns: large state → distributed gather and scatter; heavy locking → fine-grained locking; sequential processing → parallel gather and scatter. • Problem: high-degree vertices will still get locked frequently and will need to be up-to-date.

  36. Abelian Group Caching Enabling eventually consistent data races.

  37. Abelian Group Caching • Issue: all earlier methods maintain a sequentially consistent view of data across all processors. • Proposal: try to split data instead of computation. How can we split the graph without changing the update function? [Figure: a vertex's data replicated across CPUs 1–4.]

  38. Insight from WSDM paper • Answer: allow eventually consistent data races. • High-degree vertices admit slightly “stale” values: changes in a few elements → negligible effect. • High-degree vertex updates are typically a form of “sum” operation which has an “inverse”. Examples: counts, averages, sufficient statistics. Counter-example: max. • Goal: lazily synchronize duplicate data, similar to a version control system; intermediate values are only partially consistent, but the final value at termination must be consistent. (A code sketch of this protocol follows the example below.)

  39. Example • Every processor initially has a copy of the same central value: master = 10; Processors 1–3 each hold current = 10.

  40. Example • Each processor makes a small change to its local value, keeping its old copy: Processor 1: 10 → 11, Processor 2: 10 → 7, Processor 3: 10 → 13; master still 10. True value: 10 + 1 − 3 + 3 = 11.

  41. Example • Send delta values (diffs) to the master: Processors 1 and 2 send +1 and −3; the master still reads 10, and Processor 3 keeps its local change (current 13, old 10). True value: 11.

  42. Example • Send delta values (diffs) to the master: having sent theirs, Processors 1 and 2 drop their old copies (their current values 11 and 7 are now their local baselines); the diffs are still in flight to the master (still 10). True value: 11.

  43. Example • Send delta values (diffs) to the master: the master applies the received diffs, 10 + 1 − 3 = 8. True value: 11.

  44. Example • The master (8) is now consistent with the first two processors' changes. True value: 11.

  45. Example • Master decides to refresh the other processors, sending out its value (8). True value: 11.

  46. Example • Master decides to refresh the other processors: Processors 1 and 2 adopt the refreshed value (8); Processor 3 still holds its modified copy (current 13, old 10). True value: 11.

  47. Example • Master decides to refresh the other processors: Processor 3 reconciles the refresh with its unsent local change by computing its diff against its old copy, 13 − 10 = +3. True value: 11.

  48. Example • Master decides to refresh the other processors: Processor 3 applies the refresh by adding its local diff to the new master value, current = 8 + 3 = 11, old = 8. True value: 11.

  49. Example • Master decides to refresh the other processors: after the refresh, Processors 1 and 2 hold 8 and Processor 3 holds current = 11 with old = 8. True value: 11.

  50. Example • Processor 3 decides to update the master, sending its diff (+3); applying it brings the master to 8 + 3 = 11, the true value.
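Putting the walkthrough together, here is a self-contained sketch that replays the slide's numbers with diff-based synchronization; the Master/Replica types and method names are illustrative assumptions, not GraphLab's implementation.

    #include <cassert>
    #include <cstdio>

    // Each replica tracks its current value and the last value it knew the
    // master to have ("old"); only differences ever travel to or from the master.
    struct Master { double value; };

    struct Replica {
        double current, old;

        void local_add(double delta) { current += delta; }    // local, possibly racy change

        void push_diff(Master& m) {                            // send diff to the master
            m.value += current - old;
            old = current;
        }

        void refresh(const Master& m) {                        // master refreshes this replica,
            current = m.value + (current - old);               // preserving any unsent local diff
            old = m.value;
        }
    };

    int main() {
        Master master{10};
        Replica p1{10, 10}, p2{10, 10}, p3{10, 10};

        p1.local_add(+1);       // 10 -> 11
        p2.local_add(-3);       // 10 -> 7
        p3.local_add(+3);       // 10 -> 13

        p1.push_diff(master);   // master: 10 + 1 = 11
        p2.push_diff(master);   // master: 11 - 3 = 8
        p3.refresh(master);     // p3: current = 8 + (13 - 10) = 11, old = 8
        p3.push_diff(master);   // master: 8 + 3 = 11

        assert(master.value == 11);                  // the walkthrough's true value
        std::printf("master = %g\n", master.value);
        return 0;
    }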
