GraphLab: how I understood it with sample code

Aapo Kyrola, Carnegie Mellon Univ. Oct 1, 2009.


Presentation Transcript


  1. GraphLab: how I understood it with sample code. Aapo Kyrola, Carnegie Mellon Univ. Oct 1, 2009

  2. To test if I got your idea…
     • … I created two imaginary GraphLab sample applications using an imaginary GraphLab Java API
     • Is this how you imagined GraphLab applications would look?

  3. Technology layers (GraphLab compared with OpenGL)
     • OpenGL API: maintained by the Khronos Group, e.g. glVertex3f(a, b, c), glTransform(…)
     • OpenGL graphics card drivers: by Nvidia, ATI, …; they interface with their hardware
     • GraphLab API: defined and maintained by us
     • GraphLab Engine: reference implementation done by us; others are encouraged to implement their own

  4. Contents
     • GraphLab sample code for belief propagation-based inference
     • ML practitioner's (end-user's) point of view
     • What happens in the Engine?
     • Sample code for stochastic matrix eigenvector calculation by iteration
     • Issue with syncs and aggregation functions

  5. Note about BP
     • Bishop's text uses BP on a bipartite graph (variable + factor nodes), while Koller's book uses cluster factor graphs
     • I will use Koller's representation because it is simpler

  6. Sample program
     • The user has a huge Bayes network that models the weather in the USA
     • As evidence, the user knows that it is 37 °F in Philadelphia, that it rained yesterday in Pittsburgh, and that it is October
     • What is the probability of rain in Pittsburgh today?
     • See main() below (no GraphLab stuff here yet)

  7. Initialization of BP
     • Create a cluster factor graph with special nodes for belief propagation (the nodes have storage for messages; the edges contain the variables shared between factors)
     • BayesNetwork and ClusterFactorGraph are classes defined by the GraphLab API and/or extend some more abstract Graph class
     • This step implicitly marks each node 'dirty' (the engine will then add it to the task queue)

  8. Node function (kernel)
     • To run the distributed BP algorithm, we need to define a function (kernel) that runs on each factor node whenever the factor is “dirty” (the task queue is not visible?)
     • We send a message only if it has changed significantly.
     • Sending a message flags the recipient as dirty, so it will be added to the task queue.
     • Note: an edge might be remote or local, depending on the graph partitioning. The kernel may or may not care about this. (For example, the threshold could be higher for remote edges?)
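The send-only-on-significant-change rule from this slide could look roughly like the following. This is a minimal sketch, not the code from the original slide: the names Edge, sendIfChanged, and the two thresholds are hypothetical, messages are plain double arrays, and "significant" is assumed to mean an L1 distance above a threshold (with a separate, possibly higher threshold for remote edges).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: none of these names come from a real GraphLab API.
class ThresholdMessagingSketch {

    /** An edge remembers the last message that was actually sent along it. */
    static final class Edge {
        final int target;      // id of the receiving node
        final boolean remote;  // true if the target lives in another partition
        double[] lastSent;     // null until the first send

        Edge(int target, boolean remote) {
            this.target = target;
            this.remote = remote;
        }
    }

    /** L1 distance between two messages of equal length. */
    static double l1Distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /**
     * Sends the message only if it differs enough from the last one sent.
     * Returns true if it was sent; in that case the target is flagged dirty
     * by adding it to the next task queue (at most once).
     */
    static boolean sendIfChanged(Edge edge, double[] message,
                                 double localThreshold, double remoteThreshold,
                                 Queue<Integer> nextTaskQueue) {
        double threshold = edge.remote ? remoteThreshold : localThreshold;
        if (edge.lastSent != null && l1Distance(edge.lastSent, message) <= threshold) {
            return false; // change too small: recipient stays clean
        }
        edge.lastSent = message.clone();
        if (!nextTaskQueue.contains(edge.target)) {
            nextTaskQueue.add(edge.target);
        }
        return true;
    }

    public static void main(String[] args) {
        Queue<Integer> next = new ArrayDeque<>();
        Edge local = new Edge(7, false);
        // The first message always goes out.
        System.out.println(sendIfChanged(local, new double[] {0.5, 0.5}, 0.01, 0.05, next));
        // L1 change of 0.004 is below the local threshold: suppressed.
        System.out.println(sendIfChanged(local, new double[] {0.502, 0.498}, 0.01, 0.05, next));
        // L1 change of 0.2 is above the threshold: sent.
        System.out.println(sendIfChanged(local, new double[] {0.6, 0.4}, 0.01, 0.05, next));
        System.out.println(next); // node 7 is queued once
    }
}
```

Comparing against the last message sent, rather than the last message computed, keeps many small changes from drifting past the threshold unnoticed.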

  9. Executing the engine
     • The user can execute the algorithm on different GraphLab implementations
     • Data cluster version, multicore version, GPU version, DNA computer version, quantum computer version, etc.
     • Advanced users can customize the graph partitioning algorithm, scheduling priority, timeout, etc.
     • For example, loopy BP may not converge everywhere but may still be usable?? We need a timeout or relaxed convergence criteria.
     • After a lightning-fast computation, we have a calibrated belief network. We can use it to query marginal distributions efficiently.

  10. What does the Engine do?
     • The client sends the graph data and the functions to be run to the Computation Parent
     • How is the code delivered? Good question. In Java, it is easy to send class files.
     • The graph is divided into logical partitions (minimizing the number of links between partitions)
     • Edges that cross partitions are made into remote edges
     • The Computation Parent assigns each CPU one or more logical partitions
     • Within each logical partition, computation is done sequentially
     • At the beginning of each iteration, the partition collects the dirty nodes (-> taskqueue(T))
     • … and calls the node function on each dirty node sequentially
     • This results in a new set of dirty nodes (-> taskqueue(T+1))
     • Via remote edges, nodes in other partitions are flagged dirty
     • The Computation Parent monitors the number of dirty nodes in each logical partition
     • When the dirty count is zero or under a defined limit, the computation is finished
     • The graph state at the end of the computation is sent back to the client
     • The next example, the eigenvector calculation, shows how we can calculate partition-level accumulative functions efficiently and deliver them to the central unit
     • Note: in this model, nodes cannot read from other nodes. Instead, they can send data to other nodes, which can then cache this information.
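The per-partition loop described above (taskqueue(T) is drained sequentially and produces taskqueue(T+1), until the dirty count falls to a limit) can be sketched as follows. This is only an illustration under assumptions: NodeFunction, run, dirtyLimit, and maxIterations are hypothetical names, there is a single partition, and remote edges and the Computation Parent are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of one logical partition; not a real GraphLab engine.
class PartitionLoopSketch {

    /** Kernel: processes one node and flags the nodes that must run next. */
    interface NodeFunction {
        void apply(int node, Set<Integer> nextDirty);
    }

    /**
     * Runs iterations until at most dirtyLimit nodes are dirty or the
     * iteration budget (a stand-in for a timeout) is used up.
     * Returns the number of iterations executed.
     */
    static int run(Set<Integer> initiallyDirty, NodeFunction kernel,
                   int dirtyLimit, int maxIterations) {
        Set<Integer> taskQueue = new LinkedHashSet<>(initiallyDirty); // taskqueue(T)
        int iterations = 0;
        while (taskQueue.size() > dirtyLimit && iterations < maxIterations) {
            Set<Integer> next = new LinkedHashSet<>();                // taskqueue(T+1)
            for (int node : taskQueue) {
                kernel.apply(node, next); // sequential within the partition
            }
            taskQueue = next;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Toy kernel on the chain 0-1-2-3: a node pushes its value to any
        // neighbour holding a smaller one and flags that neighbour dirty.
        int[] value = {5, 0, 0, 0};
        NodeFunction kernel = (node, nextDirty) -> {
            for (int nb : new int[] {node - 1, node + 1}) {
                if (nb >= 0 && nb < value.length && value[nb] < value[node]) {
                    value[nb] = value[node];
                    nextDirty.add(nb);
                }
            }
        };
        // Graph creation marks every node dirty.
        Set<Integer> allNodes = new LinkedHashSet<>(List.of(0, 1, 2, 3));
        int iterations = run(allNodes, kernel, 0, 100);
        System.out.println(iterations + " " + Arrays.toString(value)); // 2 [5, 5, 5, 5]
    }
}
```

Note that the toy kernel only writes to its neighbours and never reads a value it was not sent, in the spirit of the "send, don't read" model in the slide.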

  11. A posteriori

  12. Stochastic Matrix Eigenvector
     • Task: iterate x = Mx, where x is a probability distribution over (X1..Xn) and M is a stochastic matrix (Markov transition matrix), until we reach convergence (a “fixed point”)
     • Existence of the eigenvector (limit distribution) is guaranteed in all but pathological cases? (= periodic chains?)
     • Running the iteration in parallel is not stable because of “feedback loops”
     • In serial computation, |Mx| = 1 (the norm is the L1 norm, right?)
     • A normalization factor is needed to keep the computation under control
     • But calculating |Mx| needs input from all Xi synchronously
     • Sync is costly, so we want to do this infrequently
     • How well has the effect been studied? Are there runaway problems?
     • Two players in Markov’s chain talking
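For reference, the serial version of this iteration is short. The sketch below assumes M is column-stochastic (each column sums to 1), so that x = Mx maps a probability distribution to a probability distribution; the class and method names are hypothetical. The explicit L1 renormalization is redundant in exact serial arithmetic, but it marks the step that needs every component of Mx at once, which is the costly sync in a distributed setting.

```java
import java.util.Locale;

// Hypothetical serial baseline; not part of any GraphLab API.
class PowerIterationSketch {

    /** Dense matrix-vector product y = Mx. */
    static double[] multiply(double[][] m, double[] x) {
        double[] y = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += m[i][j] * x[j];
            }
        }
        return y;
    }

    /** L1 norm: sum of absolute values. */
    static double l1Norm(double[] v) {
        double sum = 0.0;
        for (double c : v) {
            sum += Math.abs(c);
        }
        return sum;
    }

    /** Iterates x = Mx with L1 renormalization until the L1 change is below tolerance. */
    static double[] fixedPoint(double[][] m, double[] start, double tolerance, int maxIterations) {
        double[] x = start.clone();
        for (int it = 0; it < maxIterations; it++) {
            double[] next = multiply(m, x);
            double norm = l1Norm(next); // needs all components: the global sync
            double change = 0.0;
            for (int i = 0; i < next.length; i++) {
                next[i] /= norm;
                change += Math.abs(next[i] - x[i]);
            }
            x = next;
            if (change < tolerance) {
                break;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Column-stochastic 2-state chain; its stationary distribution is (5/6, 1/6).
        double[][] m = {
            {0.9, 0.5},
            {0.1, 0.5},
        };
        double[] x = fixedPoint(m, new double[] {1.0, 0.0}, 1e-12, 1000);
        System.out.println(String.format(Locale.ROOT, "%.4f %.4f", x[0], x[1])); // 0.8333 0.1667
    }
}
```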

  13. Normalization
     • Each logical partition has its own SumAccumulator
     • It is passed to each node when the node function is computed. The node discounts its previous value and adds the new one (=> we need not enumerate all nodes to get an updated sum)
     • After an iteration, the partition sends its accumulator value to the Computation Parent, which has its own SumAccumulator
     • Amount of remote accumulator communication = N (number of partitions)
     • Before each iteration, the partition queries the parent for the current normalization value. This is passed to all nodes when the node function is computed.
     • If the normalization factor changes significantly, all nodes are renormalized.
     • But does it work? Good question!
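The "discount the previous value, add the new one" idea can be sketched as an accumulator with a single constant-time replace operation, used at both the partition level and the parent level. Only the name SumAccumulator appears in the slides; its replace and value methods, and the way the parent is updated here, are assumptions made for illustration.

```java
// Hypothetical sketch; only the name SumAccumulator comes from the slides.
class SumAccumulatorSketch {

    /** Keeps a running sum that is updated without enumerating all contributors. */
    static final class SumAccumulator {
        private double sum;

        /** Discounts a contributor's previous value and adds its new one, in O(1). */
        void replace(double previousValue, double newValue) {
            sum += newValue - previousValue;
        }

        double value() {
            return sum;
        }
    }

    public static void main(String[] args) {
        SumAccumulator partitionA = new SumAccumulator();
        SumAccumulator partitionB = new SumAccumulator();
        SumAccumulator parent = new SumAccumulator();

        // Nodes register their initial values (previous value 0).
        partitionA.replace(0.0, 0.25);
        partitionA.replace(0.0, 0.25);
        partitionB.replace(0.0, 0.5);

        // After the iteration each partition reports once: N messages for N partitions.
        double reportedA = partitionA.value();
        parent.replace(0.0, reportedA);
        parent.replace(0.0, partitionB.value());
        System.out.println(parent.value()); // 1.0

        // One node in partition A changes from 0.25 to 0.5; only that node touches the sum.
        partitionA.replace(0.25, 0.5);
        parent.replace(reportedA, partitionA.value());
        System.out.println(parent.value()); // 1.25, the current normalization factor
    }
}
```

A long-running accumulator of this kind can collect floating-point rounding error, which may be one more reason to recompute the sum from scratch occasionally.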

  14. Initialization

  15. Node Function
     • Invokes update on outbound nodes only if its value changed significantly.
     • When converging, there are fewer and fewer dirty nodes.

  16. Partition Interceptor
     • The interceptor idea is copied from certain web application frameworks

  17. Computation Parent code

  18. Putting it together…
