GraphLab A New Parallel Framework for Machine Learning

Carnegie Mellon. Based on slides by Joseph Gonzalez. Mosharaf Chowdhury.


  1. GraphLab: A New Parallel Framework for Machine Learning. Carnegie Mellon. Based on slides by Joseph Gonzalez. Mosharaf Chowdhury.

  2. The Need for a New Abstraction
     • Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
     • Graph-Parallel (Pregel, Giraph): Belief Propagation, Kernel Methods, SVM, Tensor Factorization, PageRank, Lasso, Neural Networks, Deep Belief Networks

  3. GraphLab aims to support:
     • Sparse Computational Dependencies
     • Asynchronous Iterative Computation
     • Sequential Consistency
     • Prioritized Ordering
     • Rapid Development

  4. The GraphLab Framework
     • Graph-Based Data Representation
     • Update Functions (User Computation)
     • Scheduler
     • Consistency Model

  5. Data Graph
     A graph with arbitrary data (C++ objects) associated with each vertex and edge.
     • Graph: social network
     • Vertex data: user profile text, current interest estimates
     • Edge data: similarity weights
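
The data graph described above can be sketched as a small container that attaches arbitrary objects to vertices and edges. This is only an illustrative Python sketch, not GraphLab's C++ API: the `DataGraph` class, its method names, and the sample users and values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataGraph:
    """Hypothetical undirected graph with arbitrary data on vertices and edges."""
    vertex_data: dict = field(default_factory=dict)  # vertex id -> any object
    edge_data: dict = field(default_factory=dict)    # frozenset({u, v}) -> any object
    neighbors: dict = field(default_factory=dict)    # vertex id -> set of vertex ids

    def add_vertex(self, v, data):
        self.vertex_data[v] = data
        self.neighbors.setdefault(v, set())

    def add_edge(self, u, v, data):
        # Both endpoints must already exist as vertices.
        self.edge_data[frozenset((u, v))] = data
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def edge(self, u, v):
        # Edges are undirected, so (u, v) and (v, u) name the same data.
        return self.edge_data[frozenset((u, v))]


# Social-network example from the slide: profiles and interest estimates on
# vertices, similarity weights on edges (all values are made up).
graph = DataGraph()
graph.add_vertex("alice", {"profile": "likes hiking", "interests": {"outdoors": 0.9}})
graph.add_vertex("bob", {"profile": "plays chess", "interests": {"outdoors": 0.2}})
graph.add_edge("alice", "bob", {"similarity": 0.4})
```

Keying edges by an unordered pair keeps one copy of the edge data regardless of the direction in which it is looked up.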

  6. Update Functions
     An update function is a user-defined program that, when applied to a vertex, transforms the data in the scope of the vertex.

         label_prop(i, scope) {
           // Get neighborhood data
           (Likes[i], Wij, Likes[j]) ← scope;
           // Update the vertex data
           // Reschedule neighbors if needed
           if Likes[i] changes then
             reschedule_neighbors_of(i);
         }
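
A runnable sketch of the `label_prop` pattern may help make the scope idea concrete. The slide does not show the actual update rule, so the weighted average of neighbor estimates used here is an assumption, as are the `tol` threshold, the dictionary-based scope, and the sample values. Instead of calling a scheduler directly, the function returns the neighbors it would reschedule.

```python
def label_prop(i, likes, weights, neighbors, tol=1e-3):
    """Update vertex i in place and return the neighbors to reschedule.

    The scope of i is its own value likes[i], the adjacent edge weights,
    and the neighbor values likes[j].
    """
    nbrs = neighbors[i]
    total = sum(weights[frozenset((i, j))] for j in nbrs)
    if total == 0:
        return []  # isolated vertex or all-zero weights: nothing to do
    # ASSUMED update rule: similarity-weighted average of neighbor estimates.
    new_value = sum(weights[frozenset((i, j))] * likes[j] for j in nbrs) / total
    changed = abs(new_value - likes[i]) > tol
    likes[i] = new_value
    # Reschedule neighbors only if this vertex actually changed.
    return sorted(nbrs) if changed else []


# Tiny path graph a - b - c with made-up weights and estimates.
likes = {"a": 1.0, "b": 0.0, "c": 0.0}
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
weights = {frozenset(("a", "b")): 3.0, frozenset(("b", "c")): 1.0}

to_reschedule = label_prop("b", likes, weights, neighbors)
```

Here vertex `b` moves to (3.0 × 1.0 + 1.0 × 0.0) / 4.0 = 0.75, which exceeds the tolerance, so both of its neighbors are returned for rescheduling.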

  7. The Scheduler
     The scheduler determines the order in which vertices are updated. The process repeats until the scheduler is empty.
     [Figure: a scheduler queue of vertices (a through k) dispatching updates to CPU 1 and CPU 2.]
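
The loop the scheduler drives can be sketched sequentially as follows. This is a single-threaded FIFO stand-in for illustration only: the slide shows two CPUs pulling from the scheduler, and the `run_scheduler` helper plus the toy minimum-label update function are assumptions, not part of GraphLab.

```python
from collections import deque


def run_scheduler(initial, update):
    """Apply update(v) to scheduled vertices until the scheduler is empty.

    update(v) returns the vertices it wants rescheduled. A vertex already
    waiting in the queue is not added twice.
    """
    queue = deque(dict.fromkeys(initial))  # FIFO order, duplicates dropped
    pending = set(queue)
    while queue:
        v = queue.popleft()
        pending.discard(v)
        for u in update(v):
            if u not in pending:
                pending.add(u)
                queue.append(u)


# Toy update function: each vertex adopts the smallest label among its
# neighbors and reschedules them if its own label changed.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
label = {0: 0, 1: 1, 2: 2, 3: 3}


def min_label_update(v):
    best = min(label[u] for u in neighbors[v])
    if best < label[v]:
        label[v] = best
        return neighbors[v]
    return []


run_scheduler([0, 1, 2, 3], min_label_update)
```

Because vertices are rescheduled only when something changes, the queue drains once every vertex on this connected path has adopted label 0.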

  8. Sequential Consistency Models
     • Full Consistency
     • Edge Consistency
     [Figure: read and write locks acquired in a canonical lock ordering.]
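
One way to read the canonical lock ordering on this slide is sketched below. It assumes that edge consistency takes a write lock on the updated vertex and read locks on its neighbors, that full consistency takes write locks on the whole scope, and that locks are always acquired in sorted vertex-id order so two overlapping scopes cannot deadlock. The helper names are hypothetical, and because Python's standard library has no reader-writer lock, plain `threading.Lock` objects stand in for both modes (stricter than true read locks).

```python
import threading
from contextlib import contextmanager


def lock_plan(v, neighbors, model="edge"):
    """Return [(vertex, mode), ...] in canonical (sorted) acquisition order."""
    neighbor_mode = "write" if model == "full" else "read"
    plan = {u: neighbor_mode for u in neighbors[v]}
    plan[v] = "write"  # the updated vertex is always write-locked
    return sorted(plan.items())


@contextmanager
def locked_scope(v, neighbors, locks, model="edge"):
    """Hold the scope's locks for the duration of an update."""
    plan = lock_plan(v, neighbors, model)
    acquired = []
    try:
        for u, _mode in plan:
            # Stand-in: a plain mutex is used for both read and write modes.
            locks[u].acquire()
            acquired.append(u)
        yield plan
    finally:
        for u in reversed(acquired):
            locks[u].release()


neighbors = {1: [2], 2: [3, 1], 3: [2]}
locks = {v: threading.Lock() for v in neighbors}

edge_plan = lock_plan(2, neighbors, model="edge")
full_plan = lock_plan(2, neighbors, model="full")

with locked_scope(2, neighbors, locks) as plan:
    held_inside = [locks[u].locked() for u, _ in plan]
held_after = [locks[u].locked() for u in sorted(locks)]
```

Sorting by vertex id gives every thread the same global acquisition order, which is the standard way to rule out lock cycles.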

  9. Consistency Through Scheduling
     • Edge consistency model: two vertices can be updated simultaneously if they do not share an edge.
     • Graph coloring: two vertices can be assigned the same color if they do not share an edge.
     [Figure: Phase 1, Phase 2, Phase 3, each followed by a barrier.]
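
The link between coloring and phases can be sketched as follows: color the graph so that no edge joins two vertices of the same color, then treat each color class as one phase whose vertices could be updated in parallel, with a barrier before the next phase. The greedy coloring heuristic and the helper names are assumptions for illustration; the slide does not say which coloring algorithm is used.

```python
def greedy_coloring(neighbors):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in sorted(neighbors):
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def phases(color):
    """Group vertices by color; each group is one phase between barriers."""
    groups = {}
    for v, c in color.items():
        groups.setdefault(c, []).append(v)
    return [sorted(groups[c]) for c in sorted(groups)]


# Triangle a-b-c with a pendant vertex d attached to c.
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
color = greedy_coloring(neighbors)
schedule = phases(color)

for phase in schedule:
    # Vertices in one phase share no edge, so under the edge consistency
    # model their updates could run simultaneously.
    for v in phase:
        pass  # apply the update function to v here
    # barrier: wait for the whole phase to finish before starting the next
```

On this graph the triangle forces three colors, while `d` can reuse the color of `a`, so the first phase contains two vertices that can be updated together.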

  10. Algorithms Implemented
     • PageRank
     • Loopy Belief Propagation
     • Gibbs Sampling
     • CoEM
     • Graphical Model Parameter Learning
     • Probabilistic Matrix/Tensor Factorization
     • Alternating Least Squares
     • Lasso with Sparse Features
     • Support Vector Machines with Sparse Features
     • Label Propagation
     • …

  11. The Table