Parallel Machine Learning for Large-Scale Graphs
Danny Bickson
The GraphLab Team: Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Carlos Guestrin, Joe Hellerstein, Jay Gu, Alex Smola
Parallelism is Difficult
• Wide array of different parallel architectures: GPUs, multicore, clusters, clouds, supercomputers
• Different challenges for each architecture
High-level abstractions make things easier.
...a popular answer: Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
Map-Reduce for Data-Parallel ML
• Excellent for large data-parallel tasks!
• Data-Parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics
• Graph-Parallel: label propagation, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks
PageRank Example
• Iterate: R[i] = α + (1 − α) Σ_{j ∈ Nbrs(i)} R[j] / L[j]
• Where:
• α is the random reset probability
• L[j] is the number of links on page j
[Figure: example graph of six linked pages.]
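To make the iteration concrete, here is a minimal, self-contained C++ sketch (not GraphLab code) that runs the update above on a hypothetical three-page graph; the adjacency lists and the choice α = 0.15 are illustrative assumptions.

#include <cstdio>
#include <vector>

int main() {
    const double ALPHA = 0.15;                    // random reset probability
    // in_nbrs[i] lists the pages linking to page i; out_deg[j] is L[j]
    std::vector<std::vector<int>> in_nbrs = {{1, 2}, {2}, {0, 1}};
    std::vector<int> out_deg = {1, 2, 2};
    std::vector<double> R(3, 1.0), R_next(3);

    for (int iter = 0; iter < 30; ++iter) {       // iterate toward the fixed point
        for (int i = 0; i < 3; ++i) {
            double sum = 0.0;
            for (int j : in_nbrs[i])
                sum += R[j] / out_deg[j];         // R[j] / L[j]
            R_next[i] = ALPHA + (1.0 - ALPHA) * sum;
        }
        R.swap(R_next);
    }
    for (int i = 0; i < 3; ++i)
        std::printf("R[%d] = %f\n", i, R[i]);
    return 0;
}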
Properties of Graph Parallel Algorithms
• Dependency graph: my rank depends on my friends' ranks
• Local updates
• Iterative computation
Addressing Graph-Parallel ML
• We need alternatives to Map-Reduce
• Data-Parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics
• Graph-Parallel (Pregel/Giraph? Map-Reduce?): belief propagation, kernel methods, SVM, tensor factorization, PageRank, Lasso, neural networks, deep belief networks
Pregel (Giraph)
• Bulk Synchronous Parallel model: compute, communicate, barrier
Problem: Bulk synchronous computation can be highly inefficient
BSP Systems Problem: Curse of the Slow Job
[Figure: iterations separated by barriers; within each iteration CPUs 1-3 process their data partitions, and all must reach the barrier before the next iteration begins.]
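The pattern is easy to reproduce. The following C++20 sketch (illustrative only, not Pregel or Giraph code) uses std::barrier the way a BSP system uses its superstep barrier: worker 0 is artificially slow, and every superstep therefore lasts as long as that slowest job.

#include <barrier>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int workers = 3, supersteps = 2;
    std::barrier sync(workers);                   // everyone must arrive
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (int s = 0; s < supersteps; ++s) {
                // Simulate skew: worker 0 is the "slow job".
                std::this_thread::sleep_for(
                    std::chrono::milliseconds(w == 0 ? 300 : 10));
                std::printf("worker %d finished superstep %d\n", w, s);
                sync.arrive_and_wait();           // fast workers idle here
            }
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}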
The Need for a New Abstraction
• If not Pregel, then what?
• Data-Parallel (Map-Reduce): feature extraction, cross validation, computing sufficient statistics
• Graph-Parallel (Pregel/Giraph): belief propagation, kernel methods, SVM, tensor factorization, PageRank, Lasso, neural networks, deep belief networks
The GraphLab Solution
• Designed specifically for ML needs
• Expresses data dependencies
• Iterative
• Simplifies the design of parallel programs:
• Abstracts away hardware issues
• Automatic data synchronization
• Addresses multiple hardware architectures: multicore, distributed, cloud computing; GPU implementation in progress
The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
• Graph: social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights
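A minimal sketch of what such a data graph could look like in C++ follows; the field names (profile_text, interests, similarity) mirror the social-network example above and are illustrative, not GraphLab's actual types.

#include <string>
#include <vector>

struct VertexData {                      // e.g. one social-network user
    std::string profile_text;            // user profile text
    std::vector<double> interests;       // current interest estimates
};

struct EdgeData {                        // e.g. one friendship link
    double similarity;                   // similarity weight
};

struct Edge { int src, dst; EdgeData data; };

struct DataGraph {
    std::vector<VertexData> vertices;    // vertex id -> vertex data
    std::vector<Edge> edges;             // edge list with edge data
};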
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

pagerank(i, scope) {
  // Get neighborhood data
  (R[i], W_ij, R[j]) ← scope;
  // Update the vertex data
  R[i] ← α + (1 − α) * Σ_j W_ji * R[j];
  // Reschedule neighbors if needed
  if R[i] changes then
    reschedule_neighbors_of(i);
}

Dynamic computation
The Scheduler
The scheduler determines the order in which vertices are updated.
[Figure: a scheduler holding a queue of vertices (a, b, c, ...); CPU 1 and CPU 2 pull vertices from the queue, apply the update function, and newly scheduled vertices are pushed back.]
The process repeats until the scheduler is empty.
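The following single-threaded C++ sketch shows this execute-and-reschedule loop, with a plain FIFO queue standing in for the scheduler (GraphLab runs many such loops in parallel and offers several scheduler types; this is only an illustration). The tolerance test decides whether neighbors get rescheduled.

#include <cmath>
#include <queue>
#include <vector>

void run_dynamic_pagerank(const std::vector<std::vector<int>>& in_nbrs,
                          const std::vector<std::vector<int>>& out_nbrs,
                          std::vector<double>& R, double alpha, double tol) {
    std::queue<int> scheduler;                     // FIFO scheduler
    for (int v = 0; v < (int)R.size(); ++v) scheduler.push(v);

    while (!scheduler.empty()) {                   // until the scheduler is empty
        int i = scheduler.front(); scheduler.pop();
        double sum = 0.0;
        for (int j : in_nbrs[i])
            sum += R[j] / out_nbrs[j].size();      // R[j] / L[j]
        double next = alpha + (1.0 - alpha) * sum;
        if (std::fabs(next - R[i]) > tol)          // reschedule neighbors if needed
            for (int k : out_nbrs[i]) scheduler.push(k);
        R[i] = next;
    }
}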
The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
Ensuring Race-Free Code
• How much can computation overlap?
[Figure: inconsistent vs. consistent ALS; Netflix data, 8 cores.]
Even Simple PageRank can be Dangerous

GraphLab_pagerank(scope) {
  ref sum = scope.center_value
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  ...
Even Simple PageRank can be Dangerous

GraphLab_pagerank(scope) {
  ref sum = scope.center_value   // CPU 2 writes here incrementally
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges   // CPU 1 reads here
  sum = ALPHA + (1 - ALPHA) * sum
  ...

Read-write race: CPU 1 reads a bad PageRank estimate while CPU 2 computes its value.
Race Condition Can Be Very Subtle

Unstable:
GraphLab_pagerank(scope) {
  ref sum = scope.center_value
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  ...

Stable:
GraphLab_pagerank(scope) {
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  scope.center_value = sum
  ...

This was actually encountered in user code.
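The stable pattern can be restated in compilable C++: accumulate into a local variable and publish the result with a single atomic store, so a concurrent reader never observes a half-built sum. This is only a sketch of the principle; GraphLab's consistency model, described next, makes such hand-rolled care unnecessary.

#include <atomic>
#include <vector>

// The center value lives in an atomic; all accumulation happens in a
// local, and the finished PageRank is published with one store.
void pagerank_update(int i,
                     const std::vector<std::vector<int>>& in_nbrs,
                     const std::vector<int>& out_deg,
                     std::vector<std::atomic<double>>& R, double alpha) {
    double sum = 0.0;                              // local: invisible to other CPUs
    for (int j : in_nbrs[i])
        sum += R[j].load() / out_deg[j];
    R[i].store(alpha + (1.0 - alpha) * sum);       // single publication
}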
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
[Figure: a parallel timeline of updates on CPU 1 and CPU 2, and an equivalent sequential ordering on a single CPU.]
Consistency Rules
Guaranteed sequential consistency for all update functions.
[Figure: overlapping update scopes over the data graph, under full consistency.]
Full Consistency
[Figure: under full consistency, updates whose scopes overlap anywhere cannot run simultaneously.]
Obtaining More Parallelism
[Figure: relaxing from full consistency to edge consistency shrinks each update's exclusive region, allowing more updates to run in parallel.]
Edge Consistency
[Figure: under edge consistency, CPU 1 and CPU 2 can update nearby vertices concurrently; each holds its own vertex and adjacent edges, with safe read access to neighboring vertices.]
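One standard way to realize edge consistency (a sketch of the general locking technique, not GraphLab's actual implementation) is to acquire per-vertex locks on the center vertex and all of its neighbors in one global order, e.g. by vertex id, so two overlapping updates can never deadlock:

#include <algorithm>
#include <functional>
#include <mutex>
#include <vector>

void with_edge_consistency(int center,
                           std::vector<int> neighbors,   // copied: we sort and dedupe
                           std::vector<std::mutex>& vertex_locks,
                           const std::function<void()>& update) {
    neighbors.push_back(center);
    std::sort(neighbors.begin(), neighbors.end());       // one global lock order
    neighbors.erase(std::unique(neighbors.begin(), neighbors.end()),
                    neighbors.end());                     // never lock a vertex twice
    for (int v : neighbors) vertex_locks[v].lock();       // acquire the scope
    update();                                             // safe to read/write scope
    for (int v : neighbors) vertex_locks[v].unlock();     // release
}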
The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
Algorithms implemented on GraphLab: Alternating Least Squares, SVD, Splash Sampler, CoEM, Bayesian Tensor Factorization, Lasso, Belief Propagation, PageRank, LDA, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Matrix Factorization, Linear Solvers, ...many others...
GraphLab Libraries
• Matrix factorization: SVD, PMF, BPTF, ALS, NMF, Sparse ALS, Weighted ALS, SVD++, time-SVD++, SGD
• Linear solvers: Jacobi, GaBP, Shotgun Lasso, sparse logistic regression, CG
• Clustering: K-means, fuzzy K-means, LDA, K-core decomposition
• Inference: discrete BP, NBP, kernel BP
Efficient Multicore Collaborative Filtering
LeBuSiShu team – 5th place in track 1
Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences)
Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University)
ACM KDD CUP Workshop 2011
ACM KDD CUP 2011
• Task: predict users' music ratings
• Two main challenges:
• Data magnitude – 260M ratings
• Taxonomy of the data
Our Approach
• Ensemble method
• Custom SGD algorithm for handling the taxonomy
Ensemble Method
• Solutions are merged using linear regression (see the sketch below)
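As a toy illustration of the blending step, the sketch below fits two blend weights by least squares on a hypothetical validation set, using the 2x2 normal equations; the real blend combined more predictors, but the structure is the same.

#include <cstdio>
#include <vector>

int main() {
    // Predictions of two hypothetical models, and true validation ratings.
    std::vector<double> p1 = {3.1, 4.0, 2.2}, p2 = {2.9, 4.4, 2.0},
                        y  = {3.0, 4.2, 2.1};
    double a = 0, b = 0, c = 0, d = 0, e = 0;      // A^T A and A^T y entries
    for (int i = 0; i < (int)y.size(); ++i) {
        a += p1[i] * p1[i]; b += p1[i] * p2[i]; c += p2[i] * p2[i];
        d += p1[i] * y[i];  e += p2[i] * y[i];
    }
    // Solve [[a, b], [b, c]] [w1, w2]^T = [d, e]^T in closed form.
    double det = a * c - b * b;
    double w1 = (c * d - b * e) / det;
    double w2 = (a * e - b * d) / det;
    std::printf("blend weights: %f %f\n", w1, w2);
    return 0;
}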
Performance Results
Blended validation RMSE: 19.90
Classical Matrix Factorization
[Figure: the sparse users-by-items rating matrix is approximated by the product of two rank-d factor matrices, one for users and one for items.]
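For reference, a minimal C++ sketch of the classical model: user u and item i each get a d-dimensional factor vector, a rating is predicted by their dot product, and one SGD step on a single observed rating adjusts both vectors. The learning-rate and regularization parameters are assumptions.

#include <vector>

// Predicted rating: dot product of user and item factor vectors.
double predict(const std::vector<double>& pu, const std::vector<double>& qi) {
    double r = 0.0;
    for (int k = 0; k < (int)pu.size(); ++k) r += pu[k] * qi[k];
    return r;
}

// One stochastic gradient step on a single observed rating.
void sgd_step(std::vector<double>& pu, std::vector<double>& qi,
              double rating, double lr, double reg) {
    double err = rating - predict(pu, qi);
    for (int k = 0; k < (int)pu.size(); ++k) {
        double pk = pu[k];
        pu[k] += lr * (err * qi[k] - reg * pu[k]);
        qi[k] += lr * (err * pk   - reg * qi[k]);
    }
}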
MFITR
[Figure: as in classical matrix factorization, users get d-dimensional factors, but the "effective feature of an item" is assembled from features of the artist, features of the album, and item-specific features.]
Intuitively, the features of an artist and the features of his/her albums should be "similar". How do we express this?
[Figure: Artist, Album, Track hierarchy.]
• Penalty terms which ensure artist/album/track features are "close"
• Strength of the penalty depends on the "normalized rating similarity" (see neighborhood model)
One way to realize such a penalty is sketched below.
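Here is a minimal sketch of one such penalty: adding lambda * s * ||f_track - f_artist||^2 to the loss yields an SGD step that pulls the two feature vectors toward each other with strength proportional to the similarity s. The names and the exact form are illustrative; the actual penalty used may differ in detail.

#include <vector>

// Pull a track's features toward its artist's features (and vice versa)
// with strength proportional to the normalized rating similarity s.
void similarity_pull(std::vector<double>& f_track,
                     std::vector<double>& f_artist,
                     double lr, double lambda, double s) {
    for (int k = 0; k < (int)f_track.size(); ++k) {
        // diff is the gradient of ||f_track - f_artist||^2 (up to a factor of 2).
        double diff = f_track[k] - f_artist[k];
        f_track[k]  -= lr * lambda * s * diff;
        f_artist[k] += lr * lambda * s * diff;
    }
}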
Fine Tuning Challenge
• Dataset has around 260M observed ratings
• 12 different algorithms, with 53 tunable parameters in total
• How do we train and cross-validate all these parameters?
• USE GRAPHLAB!
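For a sense of what tuning at that scale involves, here is a hypothetical random-search loop over a few made-up hyperparameters, scored by validation RMSE; this sketches only the search structure, not the procedure actually used.

#include <cstdio>
#include <random>

// Stand-in for "train the model with these settings, return held-out RMSE".
double validation_rmse(double lr, double reg, int d) {
    return reg + lr + 1.0 / d;   // dummy score so the sketch runs end to end
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> lr_d(1e-4, 1e-1), reg_d(1e-4, 1.0);
    std::uniform_int_distribution<int> dim_d(10, 100);
    double best = 1e30;
    for (int trial = 0; trial < 100; ++trial) {
        double lr = lr_d(rng), reg = reg_d(rng);
        int d = dim_d(rng);
        double rmse = validation_rmse(lr, reg, d);
        if (rmse < best) {
            best = rmse;
            std::printf("trial %d: new best RMSE %f (lr=%f reg=%f d=%d)\n",
                        trial, best, lr, reg, d);
        }
    }
    return 0;
}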