
Concurrency Control for Scalable Bayesian Inference


Presentation Transcript


  1. Concurrency Control for Scalable Bayesian Inference. Joseph E. Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu. ISBA 2014

  2. A Systems Approach to Scalable Bayesian Inference. Joseph E. Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu. ISBA 2014

  3-6. [Image slides: Domo "Data Never Sleeps 2.0" infographic on data generated per minute, http://www.domo.com/learn/data-never-sleeps-2]

  7. Data Velocity is an opportunity for Bayesian Nonparametrics. How do we scale Bayesian inference?

  8. Opposing Forces: Accuracy vs. Scalability. Accuracy: the ability to estimate the posterior distribution (serial inference). Scalability: the ability to effectively use parallel resources (parameter server, coordination-free samplers).

  9. Serial Inference [diagram: a single processor updating the model state from the data]

  10. Coordination-Free Parallel Inference [diagram: Processors 1 and 2 update the shared model state from the data without coordinating]

  11. Coordination-Free Parallel Inference [diagram: both processors write to the same part of the model state]. Keep Calm and Carry On.

  12. Parameter Servers: Systems for Coordination-Free Inference. D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. NIPS 2007. A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB 2010. A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. WSDM 2012. Q. Ho et al. More effective distributed ML via a stale synchronous parallel parameter server. NIPS 2013.

  13. Hierarchical Clustering [diagram: global variables and local variables in the graphical model]

  14. Example: Topic Modeling with LDA. Global variables: word distributions by topic, maintained by the parameter server. Local variables: documents and their tokens, maintained by the worker nodes.

  15. Ex: Collapsed Gibbs Sampler for LDA: partitioning the model and data. The topic-word counts are sharded across parameter servers by word range (W1:10K, W10K:20K, W20K:30K), and the documents/tokens (w1, w2, ...) are partitioned across workers.
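The slides contain no code, so the following is a minimal Python sketch, under assumptions of my own, of how one worker in such a setup could run a collapsed Gibbs sweep against a cached copy of the sharded topic-word counts (assuming the counts were initialized consistently). `ParameterServerStub`, `worker_sweep`, and all parameter names are hypothetical stand-ins, not the API of the systems cited on slide 12.

```python
import numpy as np

# Hypothetical sketch (not the cited systems' code): one worker's collapsed
# Gibbs sweep for LDA against a locally cached replica of the global
# topic-word counts. A plain in-memory stub stands in for the sharded
# parameter servers shown on this slide.

class ParameterServerStub:
    """Stand-in for a sharded parameter server holding topic-word counts."""
    def __init__(self, num_topics, vocab_size):
        self.topic_word = np.zeros((num_topics, vocab_size))
        self.topic_totals = np.zeros(num_topics)

    def pull(self):                       # fetch a (possibly stale) snapshot
        return self.topic_word.copy(), self.topic_totals.copy()

    def push(self, delta_tw, delta_tot):  # merge this worker's count deltas
        self.topic_word += delta_tw
        self.topic_totals += delta_tot


def worker_sweep(docs, assignments, doc_topic, ps, alpha=0.1, beta=0.01):
    """One collapsed Gibbs sweep over this worker's documents.

    docs[d]        : list of word ids in document d
    assignments[d] : list of current topic assignments for document d
    doc_topic[d]   : length-K array of topic counts for document d
    """
    cache_tw, cache_tot = ps.pull()           # stale local replica
    K, V = cache_tw.shape
    delta_tw = np.zeros_like(cache_tw)
    delta_tot = np.zeros_like(cache_tot)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = assignments[d][i]
            # remove the token's current assignment from all counts
            doc_topic[d][k_old] -= 1
            cache_tw[k_old, w] -= 1; cache_tot[k_old] -= 1
            delta_tw[k_old, w] -= 1; delta_tot[k_old] -= 1
            # sample a new topic from the collapsed conditional
            p = (doc_topic[d] + alpha) * (cache_tw[:, w] + beta) / (cache_tot + V * beta)
            k_new = np.random.choice(K, p=p / p.sum())
            # add the token back under the new topic
            assignments[d][i] = k_new
            doc_topic[d][k_new] += 1
            cache_tw[k_new, w] += 1; cache_tot[k_new] += 1
            delta_tw[k_new, w] += 1; delta_tot[k_new] += 1
    ps.push(delta_tw, delta_tot)              # merge deltas back asynchronously
```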

  16. Ex: Collapsed Gibbs Sampler for LDA. Each worker keeps a local parameter cache holding the counts for the words it is currently sampling (e.g., Car, Cat, Mac, iOS, VW, Zoo, iPod, ...).

  17. Ex: Collapsed Gibbs Sampler for LDA: inconsistent model replicas. Because workers sample against cached counts, different caches can hold inconsistent values for the same word (e.g., "Cat").

  18. Parallel Gibbs Sampling yields an incorrect posterior: dependent variables cannot in general be sampled simultaneously. [Figure: for two strongly positively correlated variables, sequential execution preserves the strong positive correlation from t=0 through t=3, while parallel execution produces strong negative correlation.]
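As a toy illustration of this slide (my own example, not from the talk): two binary variables whose conditionals make them agree with probability 0.95 are strongly positively correlated. Sequential Gibbs preserves this, while a synchronous "parallel" update that samples both from the previous state no longer targets the correct posterior; in this variant the agreement collapses toward 1/2 rather than the oscillation pictured on the slide, but the posterior is wrong either way.

```python
import numpy as np

# Toy illustration (mine, not from the talk): two binary variables whose
# conditionals make them agree with probability 0.95. Sequential Gibbs targets
# the correct joint (P(x1 == x2) = 0.95); a synchronous "parallel" update that
# samples both from the previous state does not.

rng = np.random.default_rng(0)
P_AGREE = 0.95   # P(a variable matches its neighbor) under the conditional

def flip(neighbor):
    return neighbor if rng.random() < P_AGREE else 1 - neighbor

def run(parallel, iters=100_000):
    x1, x2 = 1, 1
    agree = 0
    for _ in range(iters):
        if parallel:
            # both variables sampled simultaneously from the OLD state
            x1, x2 = flip(x2), flip(x1)
        else:
            # sequential sweep: x2 sees the freshly updated x1
            x1 = flip(x2)
            x2 = flip(x1)
        agree += (x1 == x2)
    return agree / iters

print("sequential Gibbs  P(x1 == x2) ~", run(parallel=False))  # ~0.95 (correct)
print("parallel updates  P(x1 == x2) ~", run(parallel=True))   # ~0.50 (wrong)
```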

  19. Issues with Nonparametrics. It is difficult to introduce new clusters asynchronously: concurrent workers can each "create cluster 7", which leads to too many clusters. [Figure: two workers simultaneously create cluster 7 through different parameter servers.]

  20. Opposing Forces [diagram: an accuracy-scalability spectrum, with serial inference at the accuracy end and the parameter server / asynchronous samplers at the scalability end]

  21. Opposing Forces [diagram build: the accuracy and scalability axes]

  22. Opposing Forces [diagram build: serial inference (accurate but not scalable) vs. parameter server / asynchronous samplers (scalable but not accurate)]

  23. Opposing Forces: can something in between give us both? [diagram: a "?" labeled Concurrency Control placed between serial inference (accuracy) and the parameter server / asynchronous samplers (scalability)]

  24. Concurrency Control. Coordination free (parameter server): provably fast, and correct under key assumptions. Concurrency control: provably correct, and fast under key assumptions. Systems ideas to improve efficiency.

  25. Opposing Forces [diagram: the accuracy-scalability spectrum, with the concurrency control approaches (mutual exclusion and optimistic concurrency control, marked safe) placed between serial inference and the parameter server / asynchronous samplers (marked unsafe)]

  26. Mutual Exclusion and Conditional Independence: exploit the Markov random field structure for parallel Gibbs sampling, via graph coloring and read/write (mutual exclusion) locks. J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. AISTATS 2011.

  27. Mutual Exclusion through Scheduling: the Chromatic Gibbs Sampler. Compute a k-coloring of the graphical model, then sample all variables with the same color in parallel; sweeping the colors in sequence gives serial equivalence. [Figure: color phases over time.]
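A minimal sketch of the chromatic scheme described on this slide, assuming a pairwise MRF given as an adjacency map and a user-supplied `sample_conditional(v, state)`; the greedy coloring and thread pool are stand-ins for the real GraphLab engine.

```python
from concurrent.futures import ThreadPoolExecutor

def greedy_coloring(neighbors):
    """Assign each variable the smallest color not used by its neighbors."""
    color = {}
    for v in sorted(neighbors, key=lambda u: -len(neighbors[u])):
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_gibbs_sweep(state, color, sample_conditional, pool):
    """One joint update: variables sharing a color have no edges between them,
    so their full conditionals depend only on the other colors and can be
    sampled in parallel; sweeping the colors in order matches a serial schedule."""
    for c in sorted(set(color.values())):
        block = [v for v in color if color[v] == c]
        new_vals = list(pool.map(lambda v: sample_conditional(v, state), block))
        for v, x in zip(block, new_vals):
            state[v] = x
    return state

# usage sketch:
# color = greedy_coloring(neighbors)
# with ThreadPoolExecutor() as pool:
#     for _ in range(num_sweeps):
#         chromatic_gibbs_sweep(state, color, sample_conditional, pool)
```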

  28. Theorem (Chromatic Sampler): ergodic, i.e. converges to the correct distribution; based on a graph coloring of the Markov random field; quantifiable acceleration in mixing. With the quantities on the slide (# variables, # colors, # processors), the time to update all variables once is roughly O(#Variables / #Processors + #Colors).

  29. Mutual Exclusion Through Locking: introduce locking (scheduling) protocols to identify potential conflicts. [diagram: two processors, model state, data]

  30. Mutual Exclusion Through Locking: enforce serialization of computation that could conflict. [diagram: a conflicting update is blocked]

  31. Markov Blanket Locks. Read/write locks: take a write lock on the variable being sampled and read locks on its Markov blanket. [Figure: R/W lock pattern on the graph.]

  32. Markov Blanket Locks: eliminate the fixed schedule and global coordination; support more advanced block sampling. Expected parallelism grows roughly as #Variables / Max Degree, up to the number of processors.
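A sketch of the Markov blanket locking protocol, under simplifications of my own: plain mutexes stand in for the read/write locks on the slide (stricter but simpler), and acquiring locks in a fixed global order avoids deadlock. This illustrates the protocol, not the GraphLab engine.

```python
import threading

# One lock per variable, created once up front, e.g.
#   locks = {v: threading.Lock() for v in variables}
locks = {}

def sample_with_blanket_lock(v, neighbors, state, sample_conditional):
    """Lock v and its Markov blanket, then draw v from its full conditional."""
    scope = sorted([v] + list(neighbors[v]))   # fixed global order: no deadlock
    for u in scope:
        locks[u].acquire()
    try:
        # with the blanket held fixed, v's conditional is well defined
        state[v] = sample_conditional(v, state)
    finally:
        for u in reversed(scope):
            locks[u].release()
```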

  33. A System for Mutual Exclusion on Markov Random Fields. GraphLab/PowerGraph [UAI 2010, OSDI 2012]: chromatic sampling; Markov blanket locks + block sampling.

  34. Limitation: Densely Connected MRFs. V-structures: observations couple many variables. Collapsed models: clique-like MRFs. Mutual exclusion pessimistically serializes computation that could interfere. Can we be optimistic and only serialize computation that does interfere?

  35. Opposing Forces [diagram: the same spectrum as slide 25, now turning to optimistic concurrency control]

  36. Optimistic Concurrency Control: assume the best and correct. X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, and M. Jordan. Optimistic Concurrency Control for Distributed Unsupervised Learning. NIPS 2013. (Joint work with Xinghao Pan, Tamara Broderick, Stefanie Jegelka, and Michael Jordan.)

  37. Optimistic Concurrency Control: a classic idea from database systems (Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981). Assume most operations won't conflict: execute operations without blocking, so the frequent case is fast; identify and resolve conflicts after they occur, an infrequent case with potentially costly resolution.

  38-43. Optimistic Concurrency Control [diagram walkthrough]: allow computation to proceed without blocking, then validate potential conflicts. If the outcome is valid, keep it; if invalid, take a compensating action, either amending the value or rolling back and redoing the computation. Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

  44. Optimistic Concurrency Control requirements: non-blocking computation (gives concurrency); validation to identify errors (must be fast); resolution to correct errors (must be infrequent); together these preserve accuracy.
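Pulling the last few slides together, here is a generic sketch of the optimistic epoch loop they describe. `propose`, `validate`, and `resolve` are placeholder callables of my own, not an API from the talk or from Pan et al.

```python
def occ_epoch(data_partition, shared_state, propose, validate, resolve, lock):
    """One optimistic epoch on a single worker.

    propose(x, snapshot)     -> tentative update computed without locking
    validate(upd, state)     -> True if the update is still correct
    resolve(upd, state)      -> serial correction applied under the lock
    lock                     -> e.g. a threading.Lock shared by all workers
    """
    snapshot = dict(shared_state)            # read a local snapshot
    proposals = [propose(x, snapshot) for x in data_partition]   # non-blocking

    accepted, conflicts = [], []
    with lock:                               # short serial validation phase
        for upd in proposals:
            if validate(upd, shared_state):
                shared_state.update(upd)     # frequent, fast path
                accepted.append(upd)
            else:
                conflicts.append(upd)
        for upd in conflicts:                # infrequent, possibly costly path
            resolve(upd, shared_state)
    return accepted, conflicts
```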

  45. Optimistic Concurrency Control for Bayesian Inference. Non-parametric models [Pan et al., NIPS 2013]: OCC DP-means (Dirichlet process clustering); OCC BP-means (beta process feature learning). Conditional sampling (in progress): collapsed Gibbs LDA; retrospective sampling for the HDP.

  46. DP-Means Algorithm [Kulis and Jordan, ICML 2012]: start with the DP Gaussian mixture model and take the small-variance limit.

  47. DP-Means Algorithm [Kulis and Jordan, ICML 2012]: in the small-variance limit, redefine the concentration parameter so that it decreases rapidly as the variance shrinks.

  48. DP-Means Algorithm [Kulis and Jordan, ICML 2012]: write the corresponding Gibbs sampler conditionals and take the small-variance limit.

  49. DP-Means Algorithm [Kulis and Jordan, ICML 2012]: taking the small-variance limit, the Gibbs updates become deterministic.

  50. DP-Means Algorithm [Kulis and Jordan, ICML 2012]: the resulting deterministic updates define the DP-means algorithm.
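For concreteness, a small serial DP-means sketch following Kulis and Jordan's update rules: assign each point to its nearest center unless every center is farther than a penalty lambda (in squared distance), in which case open a new cluster at that point, then recompute the means. The code and variable names are mine, and this is the serial algorithm, not the OCC DP-means of Pan et al.

```python
import numpy as np

def dp_means(X, lam, max_iters=100):
    """Serial DP-means: the small-variance limit of the DP mixture Gibbs sampler.

    X   : (n, d) array of data points
    lam : penalty that every existing center must exceed (in squared distance)
          before a new cluster is created
    """
    centers = [X.mean(axis=0)]                    # start with one global cluster
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # deterministic "Gibbs" assignment step
        for i, x in enumerate(X):
            d2 = [float(np.sum((x - c) ** 2)) for c in centers]
            if min(d2) > lam:
                centers.append(x.copy())          # open a new cluster at x
                assign[i] = len(centers) - 1
            else:
                assign[i] = int(np.argmin(d2))
        # deterministic mean-update step
        new_centers = [X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
                       for k in range(len(centers))]
        if all(np.allclose(a, b) for a, b in zip(new_centers, centers)):
            break
        centers = new_centers
    return np.array(centers), assign
```

The cluster-creation branch is exactly the step that is hard to run asynchronously (slide 19) and that the OCC variant on slide 45 validates and resolves after the fact.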
