GraphLab 2: A System for Distributed Graph-Parallel Machine Learning
Joseph Gonzalez, Postdoc, UC Berkeley AMPLab (jegonzal@eecs.berkeley.edu)
The Team: Yucheng Low, Aapo Kyrola, Danny Bickson, Haijie Gu, Carlos Guestrin, Alex Smola, Guy Blelloch
Big Graph Data: More Signal, More Noise
Social Media • Web • Advertising • Science
• Graphs encode relationships between people, products, ideas, facts, and interests
• Big: billions of vertices and edges, and rich metadata
Graphs are Essential to Data-Mining and Machine Learning • Identify influential people and information • Find communities • Target ads and products • Model complex data dependencies
Estimate Political Bias
Given a few users labeled Liberal or Conservative, propagate labels across the social graph of users and their posts to the unknown users (Conditional Random Field + Belief Propagation).
Triangle Counting
• For each vertex in the graph, count the number of triangles containing it
• Measures both the "popularity" of the vertex and the "cohesiveness" of the vertex's community: fewer triangles indicate a weaker community, more triangles a stronger one
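To make the per-vertex count concrete, here is a minimal single-machine sketch in Python (not the distributed PowerGraph implementation discussed later); the example graph at the bottom is illustrative:

    from collections import defaultdict

    def triangle_counts(edges):
        # Build undirected adjacency sets.
        neighbors = defaultdict(set)
        for u, v in edges:
            if u != v:
                neighbors[u].add(v)
                neighbors[v].add(u)
        counts = {v: 0 for v in neighbors}
        # Visit each undirected edge once (u < v); every common neighbor w
        # closes a triangle {u, v, w} and is credited for it. Over the three
        # edges of a triangle, each of its vertices is credited exactly once.
        for u in neighbors:
            for v in neighbors[u]:
                if u < v:
                    for w in neighbors[u] & neighbors[v]:
                        counts[w] += 1
        return counts

    print(triangle_counts([(0, 1), (1, 2), (0, 2), (2, 3)]))
    # -> {0: 1, 1: 1, 2: 1, 3: 0}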
Collaborative Filtering: Exploiting Dependencies
If a user liked Women on the Verge of a Nervous Breakdown, The Celebration, and City of God, what do we recommend? Exploiting dependencies among users and movies suggests Wild Strawberries and La Dolce Vita.
PageRank
• Everyone starts with equal ranks
• Update ranks in parallel: R[i] = 0.15 + Σj∈neighbors(i) wji R[j]
  (the rank of user i is a weighted sum of the neighbors' ranks)
• Iterate until convergence
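A minimal sketch of this synchronous update in Python; the weight convention (e.g. wji = 0.85 / out-degree of j) and the tolerance are illustrative assumptions:

    def pagerank(in_neighbors, weight, num_iters=100, tol=1e-6):
        """in_neighbors: dict vertex -> list of in-neighbors.
        weight: dict (j, i) -> w_ji."""
        R = {i: 1.0 for i in in_neighbors}  # everyone starts with equal rank
        for _ in range(num_iters):
            new_R = {i: 0.15 + sum(weight[(j, i)] * R[j]
                                   for j in in_neighbors[i])
                     for i in in_neighbors}
            if max(abs(new_R[i] - R[i]) for i in R) < tol:  # converged
                return new_R
            R = new_R
        return R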
How should we program graph-parallel algorithms?
"Low-level tools like MPI and Pthreads?" - Me, during my first years of grad school
Threads, Locks, and MPI
• ML experts (graduate students) repeatedly solve the same parallel design challenges:
  • Implement and debug a complex parallel system
  • Tune for a single parallel platform
  • Six months later the conference paper contains: "We implemented ______ in parallel."
• The resulting code:
  • is difficult to maintain and extend
  • couples the learning model and its implementation
How should we program graph-parallel algorithms?
"High-level Abstractions!" - Me, now
The Graph-Parallel Pattern
• A user-defined Vertex-Program runs on each vertex
• The graph constrains interaction along edges:
  • using messages (e.g., Pregel [PODC'09, SIGMOD'10])
  • through shared state (e.g., GraphLab [UAI'10, VLDB'12, OSDI'12])
• Parallelism: run multiple vertex-programs simultaneously
"Think like a Vertex." - Malewicz et al. [SIGMOD'10]
Graph-Parallel Abstractions
• Pregel: messaging, synchronous execution
• GraphLab: shared state, dynamic, asynchronous execution - better for machine learning
The GraphLab Vertex Program
Vertex-programs directly access adjacent vertices and edges (in the figure, vertex 1 sums R[2]*w21 + R[3]*w31 + R[4]*w41 from its neighbors 2, 3, and 4):

    GraphLab_PageRank(i)
      // Compute sum over neighbors
      total = 0
      foreach (j in neighbors(i)):
        total = total + R[j] * wji
      // Update the PageRank
      R[i] = 0.15 + total
      // Trigger neighbors to run again
      if R[i] not converged then
        signal nbrsOf(i) to be recomputed

Signaled vertices are recomputed eventually.
GraphLab Asynchronous Execution
The scheduler determines the order in which vertices are executed: CPUs repeatedly pull the next signaled vertex from a shared scheduler, run its vertex-program, and push any newly signaled neighbors back into the scheduler. The scheduler can prioritize vertices.
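A rough single-threaded sketch of such a prioritized scheduler; the vertex_program callback is an assumption (it runs on one vertex and returns (priority, neighbor) pairs to signal), and the real engine runs this loop concurrently on many CPUs:

    import heapq

    def run(initial_signals, vertex_program):
        # heapq is a min-heap, so callers should negate priorities to run
        # high-priority work first.
        queue = list(initial_signals)      # (priority_key, vertex) pairs
        heapq.heapify(queue)
        pending = {v for _, v in queue}    # vertices currently scheduled
        while queue:
            _, v = heapq.heappop(queue)
            pending.discard(v)
            for key, u in vertex_program(v):   # may signal neighbors
                if u not in pending:           # avoid duplicate scheduling
                    pending.add(u)
                    heapq.heappush(queue, (key, u))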
Benefit of Dynamic PageRank
With dynamic scheduling, 51% of vertices were updated only once.
Asynchronous Belief Propagation
Challenge = boundaries: on a synthetic noisy image, the cumulative vertex-update counts show many updates along the boundaries of the graphical model and few updates elsewhere. The algorithm identifies and focuses on the hidden sequential structure.
GraphLab Ensures a Serializable Execution
• Needed for Gauss-Seidel, Gibbs Sampling, Graph Coloring, ...
• Adjacent vertex-programs share a conflict edge and must not execute simultaneously
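One standard way to obtain this guarantee is to lock a vertex's whole neighborhood in a global order before running its program; the sketch below only illustrates the idea, while the actual GraphLab engine uses finer-grained consistency protocols:

    import threading
    from collections import defaultdict

    locks = defaultdict(threading.Lock)  # one lock per vertex

    def run_serializably(v, neighbors, vertex_program):
        scope = sorted({v} | set(neighbors[v]))  # global order avoids deadlock
        for u in scope:
            locks[u].acquire()
        try:
            vertex_program(v)  # no adjacent program can run concurrently
        finally:
            for u in scope:
                locks[u].release()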
Never Ending Learner Project (CoEM)
• Language modeling: named entity recognition
• GraphLab used 6x fewer CPUs and ran 15x faster, finishing in 0.3% of the Hadoop running time
Thus far, GraphLab provided a powerful new abstraction.
But... we couldn't scale up to the Altavista Webgraph from 2002 (1.4B vertices, 6.6B edges).
Properties of Natural Graphs
Unlike a regular mesh, natural graphs have a power-law degree distribution.
Power-Law Degree Distribution
• AltaVista WebGraph: 1.4B vertices, 6.6B edges
• Plotting the number of vertices against degree gives a line of slope -α with α ≈ 2
• More than 10^8 vertices have one neighbor, while the top 1% of vertices (the high-degree vertices) are adjacent to 50% of the edges!
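An illustrative check of why heavy tails concentrate edges (assumed power law with α ≈ 2 and illustrative sample size, not AltaVista data): sample degrees and measure what share of edge endpoints belongs to the top 1% of vertices. With α this small the mean diverges, so the share is large but varies run to run.

    import random

    def top_share(n=100_000, alpha=2.0, top_frac=0.01):
        # Inverse-transform sampling: P(degree >= d) ~ d^(1 - alpha).
        degrees = sorted(
            (int((1.0 - random.random()) ** (-1.0 / (alpha - 1.0)))
             for _ in range(n)),
            reverse=True)
        k = int(top_frac * n)
        return sum(degrees[:k]) / sum(degrees)

    print(top_share())  # often above 0.5 for alpha ~= 2, varying run to run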
Power-Law Degree Distribution: the "Star-Like" Motif
Example: President Obama and his millions of followers form a star.
Challenges of High-Degree Vertices
• Edges are processed sequentially by a single vertex-program
• One vertex-program touches a large fraction of the graph
• Power-law graphs are provably difficult to partition
Random Partitioning
• GraphLab resorts to random (hashed) partitioning on natural graphs
• 10 machines → 90% of edges cut; 100 machines → 99% of edges cut!
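Those percentages follow directly from the hashing argument: an edge survives only if both endpoints hash to the same machine, which happens with probability 1/M.

    # Expected fraction of edges cut when vertices are hashed uniformly
    # to M machines: the expected cut fraction is 1 - 1/M.
    for machines in (10, 100):
        print(machines, "machines ->", 1 - 1 / machines)  # 0.9, 0.99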
GraphLab2: Program on this graph, run on this distributed graph
• Split high-degree vertices across machines (e.g., Machine 1 and Machine 2)
• A new abstraction ensures equivalence on the split vertices
A Common Pattern for Vertex-Programs

    GraphLab_PageRank(i)
      // Gather information about the neighborhood:
      //   compute sum over neighbors
      total = 0
      foreach (j in neighbors(i)):
        total = total + R[j] * wji
      // Update the vertex: update the PageRank
      R[i] = 0.15 + total
      // Signal neighbors and modify edge data:
      //   trigger neighbors to run again
      priority = |R[i] - oldR[i]|
      if R[i] not converged then
        signal neighbors(i) with priority
Pattern → Abstraction
• Gather(SrcV, Edge, DstV) → A : collect information from a neighbor
• Sum(A, A) → A : commutative, associative sum
• Apply(V, A) → V : update the vertex
• Scatter(SrcV, Edge, DstV) → (Edge, signal) : update edges and signal neighbors
PageRank in GraphLab2

    GraphLab2_PageRank(i)
      Gather(j → i) : return wji * R[j]
      Sum(a, b) : return a + b
      Apply(i, Σ) : R[i] = 0.15 + Σ
      Scatter(i → j) : if R[i] changed then trigger j to be recomputed
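Putting the two slides together, here is a single-machine Python sketch of a GAS engine with PageRank plugged in; the class and parameter names and the 1e-3 threshold are illustrative assumptions ("total" stands in for Sum, since sum is a Python builtin):

    from functools import reduce

    class PageRankProgram:
        def __init__(self, R, weight, tol=1e-3):
            self.R, self.weight, self.tol = R, weight, tol

        def gather(self, j, i):            # Gather(SrcV, Edge, DstV) -> A
            return self.weight[(j, i)] * self.R[j]

        def total(self, a, b):             # Sum(A, A) -> A
            return a + b

        def apply(self, i, acc):           # Apply(V, A) -> V
            old = self.R[i]
            self.R[i] = 0.15 + acc
            return abs(self.R[i] - old) > self.tol   # "changed"?

        def scatter(self, i, out_nbrs):    # Scatter -> vertices to signal
            return out_nbrs[i]             # trigger all out-neighbors

    def gas_engine(program, in_nbrs, out_nbrs, active):
        while active:                      # active: set of signaled vertices
            i = active.pop()
            acc = reduce(program.total,
                         (program.gather(j, i) for j in in_nbrs[i]), 0.0)
            if program.apply(i, acc):
                active.update(program.scatter(i, out_nbrs))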
GAS Decomposition
A high-degree vertex is split into a master and mirrors spread across machines (Machines 1-4). Gather runs in parallel on each machine, producing partial sums Σ1...Σ4; the partials are combined (Σ = Σ1 + Σ2 + Σ3 + Σ4) at the master, which runs Apply and sends the updated value Y' back to the mirrors; Scatter then runs in parallel on each machine.
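A toy simulation of this decomposition for one split vertex, with the machine layout passed in explicitly as an assumption: mirrors gather partial sums, the master combines and applies, and the new value is broadcast back.

    def distributed_gas_step(i, local_in_edges, gather, combine, apply_fn):
        # local_in_edges: one entry per machine, listing the in-neighbors
        # of vertex i whose edges live on that machine (assumed layout).
        partials = []
        for machine_edges in local_in_edges:
            acc = None                      # gather over local edges only
            for j in machine_edges:
                g = gather(j, i)
                acc = g if acc is None else combine(acc, g)
            if acc is not None:
                partials.append(acc)        # mirror ships partial to master
        total = None                        # master combines the partials
        for p in partials:
            total = p if total is None else combine(total, p)
        new_value = apply_fn(i, total)      # master runs Apply
        # Scatter would now run in parallel on each machine using new_value,
        # which the master broadcasts back to the mirrors.
        return new_value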
Minimizing Communication in PowerGraph
• Communication is linear in the number of machines each vertex spans, so a vertex-cut minimizes the machines each vertex spans
• New theorem: for any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage
• Percolation theory suggests that power-law graphs have good vertex-cuts [Albert et al. 2000]
Constructing Vertex-Cuts
• Evenly assign edges to machines, minimizing the machines spanned by each vertex
• Assign each edge as it is loaded, touching each edge only once
• We propose two distributed approaches: Random Vertex-Cut and Greedy Vertex-Cut
Random Vertex-Cut
• Randomly assign edges to machines, yielding a balanced vertex-cut
• In the example: vertex Y spans 3 machines, vertex Z spans 2 machines, and edges whose endpoints land on one machine are not cut
Analysis of Random Edge-Placement
• Expected number of machines spanned by a vertex of degree d on M machines: E[#spanned] = M(1 - (1 - 1/M)^d)
• On the Twitter follower graph (41 million vertices, 1.4 billion edges), this accurately estimates memory and communication overhead
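The formula is easy to evaluate: each of the M machines receives none of the vertex's d edges with probability (1 - 1/M)^d. The example degrees below are illustrative.

    def expected_span(d, M):
        # Expected machines spanned by a degree-d vertex under random
        # edge placement on M machines.
        return M * (1.0 - (1.0 - 1.0 / M) ** d)

    # A degree-1000 vertex on 64 machines spans ~64 machines on average,
    # while a degree-5 vertex spans only ~4.85.
    print(expected_span(1000, 64), expected_span(5, 64))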
Random Vertex-Cuts vs. Edge-Cuts
Expected improvement from vertex-cuts over edge-cuts: an order-of-magnitude reduction in communication.
Streaming Greedy Vertex-Cuts
• Place edges on machines which already have the vertices in that edge (e.g., an edge (A, B) goes to a machine already holding A and B)
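A sketch of the streaming heuristic, with least-loaded tie-breaking as an assumption (the PowerGraph paper gives the full placement rule): prefer machines holding both endpoints, then one endpoint, then the least-loaded machine.

    from collections import defaultdict

    def greedy_place(edges, num_machines):
        placed_on = defaultdict(set)   # vertex -> machines holding a replica
        load = [0] * num_machines      # edges per machine
        assignment = {}
        for u, v in edges:             # edges arrive one at a time
            both = placed_on[u] & placed_on[v]
            either = placed_on[u] | placed_on[v]
            candidates = both or either or set(range(num_machines))
            m = min(candidates, key=lambda k: load[k])  # least-loaded
            assignment[(u, v)] = m
            placed_on[u].add(m)
            placed_on[v].add(m)
            load[m] += 1
        return assignment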
Greedy Vertex-Cuts Improve Performance Greedy partitioning improves computation performance.
System Design
• PowerGraph (GraphLab2) is implemented as a C++ API on top of MPI/TCP-IP and PThreads, running on EC2 and HPC nodes
• Uses HDFS for graph input and output
• Fault tolerance is achieved by checkpointing; snapshot time < 5 seconds for the Twitter network
Implemented Many Algorithms • Collaborative Filtering • Alternating Least Squares • Stochastic Gradient Descent • Statistical Inference • Loopy Belief Propagation • Max-Product Linear Programs • Gibbs Sampling • Graph Analytics • PageRank • Triangle Counting • Shortest Path • Graph Coloring • K-core Decomposition • Computer Vision • Image stitching • Language Modeling • LDA
PageRank on the Twitter Follower Graph
• Natural graph with 40M users, 1.4 billion links
• GraphLab2 gains an order of magnitude by exploiting properties of natural graphs
• Hadoop results from [Kang et al. '11]; Twister (in-memory MapReduce) from [Ekanayake et al. '10]
GraphLab2 is Scalable
• Yahoo Altavista Web Graph (2002): one of the largest publicly available web graphs, with 1.4 billion webpages and 6.6 billion links
• 7 seconds per iteration: 1B links processed per second
• 30 lines of user code on 64 HPC nodes (1024 cores, 2048 HT)
Topic Modeling
• English-language Wikipedia: 2.6M documents, 8.3M words, 500M tokens
• A computationally intensive algorithm
• Previous system: 100 Yahoo! machines, specifically engineered for this task
• GraphLab2: 64 cc2.8xlarge EC2 nodes, 200 lines of code, and 4 human-hours
Triangle Counting on the Twitter Graph
• Identify individuals with strong communities
• Counted 34.8 billion triangles
• Hadoop [WWW'11]: 1536 machines, 423 minutes
• GraphLab2: 64 machines, 1.5 minutes - 282x faster
• Why? Hadoop uses the wrong abstraction: it broadcasts O(degree^2) messages per vertex
• S. Suri and S. Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11
Machine Learning and Data-Mining Toolkits
Graph Analytics • Graphical Models • Computer Vision • Clustering • Topic Modeling • Collaborative Filtering
All built on the GraphLab2 system (MPI/TCP-IP, PThreads, HDFS), running on EC2 and HPC nodes.
http://graphlab.org - Apache 2 License
GraphChi: Going small with GraphLab Solve huge problems on small or embedded devices? Key: Exploit non-volatile memory (starting with SSDs and HDs)
Triangle Counting on the Twitter Graph (40M users, 1.2B edges)
• Total: 34.8 billion triangles
• Hadoop: 1536 machines, 423 minutes (results from [Suri & Vassilvitskii '11])
• GraphLab2: 64 machines (1024 cores), 1.5 minutes
• GraphChi: 59 minutes on one Mac Mini!
• New ways to represent real-world graphs (splitting vertices across machines)
• New ways to execute graph algorithms
• Orders-of-magnitude improvements over existing systems