GraphLab: how I understood it, with sample code

Aapo Kyrola, Carnegie Mellon Univ., Oct 1, 2009

To test if I got your idea…
• … I created two imaginary GraphLab sample applications using an imaginary GraphLab Java API
• Is this how you imagined GraphLab applications would look?

Technology layers
OpenGL (for comparison):
• OpenGL API: maintained by the Khronos Group, e.g. glVertex3f(a,b,c), glTranslatef(…)
• OpenGL graphics card drivers: by Nvidia, ATI, …; interface with their hardware
GraphLab:
• GraphLab API: defined and maintained by us
• GraphLab Engine: reference implementation done by us; others encouraged to implement their own

Contents
• GraphLab sample code for belief propagation-based inference
• ML practitioner’s (end-user’s) point of view
• What happens in the Engine?
• Sample code for stochastic matrix eigenvector calculation by iteration
• Issue with syncs and aggregation functions

Note about BP
• Bishop’s text uses BP on a bipartite graph (variable + factor nodes), while Koller’s book uses cluster factor graphs
• I will use Koller’s representation because it is simpler

Sample program
• The user has a huge Bayes network that models the weather in the USA
• The user knows that it is 37 °F in Philadelphia, that it rained yesterday in Pittsburgh, and that it is October (the evidence)
• What is the probability of rain in Pittsburgh today?
• See main() below (no GraphLab stuff here yet)

Initialization of BP
• Create a cluster factor graph with special nodes for belief propagation (the nodes have storage for messages; the edges contain the variables shared between factors)
• BayesNetwork and ClusterFactorGraph are classes defined by the GraphLab API and/or extend some more abstract Graph class
• Creating the graph implicitly marks each node ‘dirty’, so the engine will add it to the task queue

Node function (kernel)
• To run the distributed BP algorithm, we need to define a function (kernel) that runs on each factor node whenever the factor is “dirty” (is the task queue hidden from the user?)
• We send a message only if it has changed significantly.
• Sending a message flags the recipient as dirty -> it will be added to the task queue.
• Note: an edge might be remote or local, depending on the graph partitioning. The kernel may or may not care about this. (For example, the threshold could be higher for remote edges?)
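The kernel slide's send rule can be sketched in plain Java as below. This is only an illustration under assumptions: `Node`, `Edge`, `kernel` and `THRESHOLD` are hypothetical names, not part of any real GraphLab API, and a single `double` stands in for a real BP message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these classes are NOT a real GraphLab API.
public class DirtyKernelSketch {
    static final double THRESHOLD = 1e-3; // assumed "significant change" limit

    static class Node {
        double value;          // stand-in for a factor's belief
        boolean dirty;         // the engine would queue dirty nodes
        final List<Edge> out = new ArrayList<>();
        Node(double value) { this.value = value; }
    }

    static class Edge {
        final Node target;
        double lastSent = Double.NaN; // NaN means nothing has been sent yet
        Edge(Node target) { this.target = target; }
    }

    /** Runs on one dirty node and returns the number of messages sent. */
    static int kernel(Node node) {
        node.dirty = false;
        int sent = 0;
        for (Edge edge : node.out) {
            double message = node.value; // placeholder for a real BP message
            boolean changed = Double.isNaN(edge.lastSent)
                    || Math.abs(message - edge.lastSent) > THRESHOLD;
            if (changed) {
                edge.lastSent = message;
                edge.target.dirty = true; // recipient goes to the task queue
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        Node a = new Node(0.5);
        Node b = new Node(0.1);
        a.out.add(new Edge(b));
        System.out.println(kernel(a)); // first message is always sent: 1
        b.dirty = false;
        a.value += 1e-6;               // change far below THRESHOLD
        System.out.println(kernel(a)); // suppressed: 0
    }
}
```

A per-edge threshold (for example a larger one on remote edges) would fit the same shape by moving `THRESHOLD` into `Edge`.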

Executing the engine
• The user can execute the algorithm on different GraphLab implementations
• Data cluster version, multicore version, GPU version, DNA computer version, quantum computer version, etc.
• Advanced users can customize the graph partitioning algorithm, scheduling priority, timeout, etc.
• For example, loopy BP may not converge everywhere but may still be usable?? We need a timeout or relaxed convergence criteria.
• After a lightning-fast computation, we have a calibrated belief network. We can use it to query marginal distributions efficiently.

What does the Engine do?
• The client sends the graph data and the functions to be run to the Computation Parent
• How is the code delivered? Good question. In Java, it is easy to send class files.
• The graph is split into logical partitions (minimizing the number of links between partitions)
• Edges that cross partitions are made into remote edges
• The Computation Parent assigns one or more logical partitions to each CPU
• Within each logical partition, computation is done sequentially
• At the beginning of each iteration, the partition collects its dirty nodes (-> taskqueue(T))
• … and calls the node function on each dirty node in turn
• This results in a new set of dirty nodes (-> taskqueue(T+1))
• Via remote edges, nodes in other partitions are flagged dirty
• The Computation Parent monitors the number of dirty nodes in each logical partition
• When the dirty count is zero or below a defined limit, the computation is finished
• The graph state at the end of the computation is sent back to the client
The next example (eigenvector calculation) shows how we can calculate partition-level accumulative functions efficiently and deliver them to the central unit.
Note: in this model, nodes cannot read from other nodes. Instead, they can send data to other nodes, which can then cache this information.
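The per-partition loop described above (taskqueue(T) -> taskqueue(T+1)) might look roughly like the following single-partition sketch. All names are hypothetical, remote edges and the Computation Parent are left out, and the toy node function (average the cached inbox, forward on significant change) is only an assumption chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one logical partition; not a real GraphLab engine.
public class PartitionLoopSketch {
    static class Node {
        double value;
        double lastSent = Double.NaN;                     // NaN: nothing sent yet
        final List<Node> out = new ArrayList<>();
        final Map<Node, Double> inbox = new HashMap<>();  // cached messages
        Node(double value) { this.value = value; }
    }

    /** Toy node function: average cached messages, send on significant change. */
    static void nodeFunction(Node node, double tolerance, Set<Node> nextQueue) {
        if (!node.inbox.isEmpty()) {
            double sum = 0.0;
            for (double message : node.inbox.values()) sum += message;
            node.value = sum / node.inbox.size();
        }
        boolean changed = Double.isNaN(node.lastSent)
                || Math.abs(node.value - node.lastSent) > tolerance;
        if (!changed) return;
        node.lastSent = node.value;
        for (Node target : node.out) {
            target.inbox.put(node, node.value); // nodes send, recipients cache
            nextQueue.add(target);              // recipient is now dirty
        }
    }

    /** Runs sweeps until no node is dirty; returns the number of sweeps. */
    static int run(List<Node> nodes, double tolerance, int maxIterations) {
        Set<Node> queue = new LinkedHashSet<>(nodes); // every node starts dirty
        int iterations = 0;
        while (!queue.isEmpty() && iterations < maxIterations) {
            Set<Node> next = new LinkedHashSet<>();   // taskqueue(T+1)
            for (Node node : queue) nodeFunction(node, tolerance, next);
            queue = next;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        Node a = new Node(1.0), b = new Node(0.0), c = new Node(0.0);
        a.out.add(b);
        b.out.add(c);
        int sweeps = run(List.of(a, b, c), 1e-3, 100);
        System.out.println(sweeps + " sweeps, c = " + c.value);
    }
}
```

In the chain a -> b -> c the value of a propagates down the chain and the dirty count drops to zero after two sweeps. The `maxIterations` argument plays the role of the timeout mentioned on the "Executing the engine" slide.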

A posteriori

Stochastic Matrix Eigenvector
• Task: iterate x = Mx, where x is a probability distribution over (X1..Xn) and M is a stochastic matrix (Markov transition matrix), until we reach convergence (a “fixed point”)
• Existence of the eigenvector (limit distribution) is guaranteed in all but pathological cases? (= periodic chains?)
• Running the iteration in parallel is not stable because of “feedback loops”
• In serial computation, |Mx| = 1 (the norm is the L1 norm, right?)
• A normalization factor is needed to keep the computation under control
• But calculating |Mx| needs input from all Xi synchronously
• Sync is costly, so we want to do it infrequently
• How well has the effect been studied? Are there runaway problems?
Two players in Markov’s chain talking
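For reference, the serial version of the iteration x = Mx with L1 normalization can be sketched as follows. This is a plain dense-matrix illustration, not GraphLab code; the class and method names are made up, and M is assumed to be column-stochastic (each column sums to 1) so that x can be treated as a column vector.

```java
// Serial power iteration on a small dense matrix (illustrative names only).
public class PowerIterationSketch {
    /** Returns y = M x. */
    static double[] multiply(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    static double l1Norm(double[] v) {
        double norm = 0.0;
        for (double e : v) norm += Math.abs(e);
        return norm;
    }

    /** Iterates x = Mx / |Mx| until the L1 change drops below tol. */
    static double[] fixedPoint(double[][] m, double[] x0, double tol, int maxIter) {
        double[] x = x0.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = multiply(m, x);
            double norm = l1Norm(next);       // stays near 1 in serial computation
            double diff = 0.0;
            for (int i = 0; i < next.length; i++) {
                next[i] /= norm;              // renormalize to a distribution
                diff += Math.abs(next[i] - x[i]);
            }
            x = next;
            if (diff < tol) break;            // fixed point reached
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] m = { {0.9, 0.5}, {0.1, 0.5} }; // columns sum to 1
        double[] x = fixedPoint(m, new double[] {0.5, 0.5}, 1e-12, 1000);
        System.out.println(x[0] + " " + x[1]);     // approaches (5/6, 1/6)
    }
}
```

For this 2x2 example the fixed point satisfies 0.1·x1 = 0.5·x2, which gives x = (5/6, 1/6). In the serial case the division by `norm` is almost a no-op; it matters once updates run asynchronously and the sum can drift.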

Normalization
• Each logical partition has its own SumAccumulator
• The accumulator is passed to each node when the node function is computed. The node subtracts its previous value and adds its new one (=> we need not enumerate all nodes to get an updated sum)
• After each iteration, the partition sends its accumulator value to the Computation Parent, which has its own SumAccumulator
• Amount of remote accumulator communication = N (the number of partitions)
• Before each iteration, the partition queries the parent for the current value of the normalization factor. This is passed to all nodes when the node function is computed.
• If the normalization factor changes significantly, all nodes are renormalized.
But does it work? Good question!
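The "subtract the previous value, add the new one" trick can be sketched as below. Only the name SumAccumulator comes from the slide; its `update` and `value` methods, and the way the parent reuses the same class to merge partition sums, are assumptions made for illustration.

```java
// Hypothetical sketch of the SumAccumulator idea; the method names are assumed.
public class SumAccumulatorSketch {
    static class SumAccumulator {
        private double sum = 0.0;

        /** Replaces one contribution without re-enumerating the others. */
        void update(double previous, double current) {
            sum += current - previous;
        }

        double value() {
            return sum;
        }
    }

    public static void main(String[] args) {
        SumAccumulator partitionA = new SumAccumulator();
        SumAccumulator partitionB = new SumAccumulator();
        SumAccumulator parent = new SumAccumulator();

        // Initial node values: partition A holds 0.25 and 0.5, partition B holds 0.25.
        partitionA.update(0.0, 0.25);
        partitionA.update(0.0, 0.5);
        partitionB.update(0.0, 0.25);

        // After the iteration, each partition reports once (N messages in total).
        double reportedA = partitionA.value();
        double reportedB = partitionB.value();
        parent.update(0.0, reportedA);
        parent.update(0.0, reportedB);

        // One node in A changes from 0.5 to 0.75; only the delta is applied.
        partitionA.update(0.5, 0.75);
        parent.update(reportedA, partitionA.value());

        System.out.println(parent.value()); // prints 1.25
    }
}
```

The parent treats each partition's last reported sum the same way a partition treats a node's previous value, so both levels stay O(1) per update.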

Initialization

Node Function
The node function invokes an update on its outbound nodes only if its own value changed significantly. As the computation converges, there are fewer and fewer dirty nodes.

Partition Interceptor
The interceptor idea is copied from certain web application frameworks.

Computation Parent code

Putting it together…
