630 likes | 761 Views
Concurrency Control for Scalable Bayesian Inference. Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab Co-founder, GraphLab Inc. jegonzal @eecs.berkeley.edu. ISBA’ 2014. A Systems Approach to Scalable Bayesian Inference. Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab
E N D
Concurrency Controlfor Scalable Bayesian Inference Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu ISBA’ 2014
A Systems Approachto Scalable Bayesian Inference Joseph E. Gonzalez Postdoc, UC Berkeley AMPLab Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu ISBA’ 2014
Data Velocityis an opportunity for Bayesian Nonparametrics. How do we scale Bayesian inference?
Opposing Forces Accuracy Scalability Serial Inference Parameter Server Ability to estimate the posterior distribution Ability to effective useparallel resources Coordination Free Samplers
Serial Inference Model State Data
Coordination Free Parallel Inference Model State Processor 1 Data Processor 2
Coordination Free Parallel Inference Model State Processor 1 Data Processor 2 Keep Calm and Carry On.
Parameter Servers System for Coordination Free Inference D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichletallocation. In NIPS, 2007. A. Smola and S. Narayanamurthy. An architecture for parallel topic models.VLDB’10 Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. WSDM '12 Ho et al. “More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.” NIPS’13
Hierarchical Clustering Global Variables Local Variables
Example: Topic Modeling with LDA Word Dist. by Topic Local Variables Documents Tokens Maintained by the Parameter Server Maintained by the Workers Nodes
Ex: Collapsed Gibbs Sampler for LDA Parameter Server Parameter Server Parameter Server • Partitioning the model and data • W10k:20K • W20k:30K • W1:10K w4 w5 w6 w1 w2 w3 w7 w9 w8 w4 w6 w8 w3 w5 w9 w1 w7 w2
Ex: Collapsed Gibbs Sampler for LDA Parameter Server Parameter Server Parameter Server • Partitioning the model and data • W10k:20K • W20k:30K • W1:10K Parameter Cache Parameter Cache Parameter Cache Parameter Cache • Car • Cat • Mac • Car • iOS • Gas • Dog • Rim • $$ • Cat • VW • Bat Zoo • iPod bmw • Pig
Ex: Collapsed Gibbs Sampler for LDA Parameter Server Parameter Server Parameter Server • Inconsistent model replicas • W10k:20K • W20k:30K • W1:10K Inconsistent Values Parameter Cache Parameter Cache Parameter Cache Parameter Cache • Cat • Cat • Cat • Car • Car • Mac • Rim • Dog • iOS • Gas • Bat • VW • $$ • Cat • bmw Zoo Pig • iPod
Parallel Gibbs SamplingIncorrect Posterior dependent variables cannot in general be sampled simultaneously. Strong Positive Correlation t=1 t=2 t=3 Strong Negative Correlation t=0 Sequential Execution Strong Positive Correlation Parallel Execution
Issues with Nonparametrics • Difficult to introduce new clusters asynchronously: • Leads to too many clusters! Parameter Server Parameter Server Parameter Server Create Cluster 7 Create Cluster 7 7 7
Opposing Forces Serial Inference Accuracy Scalability Parameter Server Asynchronous Samplers
Opposing Forces Accuracy Scalability Accuracy Scalability
Opposing Forces Serial Inference Accuracy Parameter Server Asynchronous Samplers Scalability
Opposing Forces • ? Concurrency Control Serial Inference Accuracy Parameter Server Asynchronous Samplers Scalability
Concurrency Control Coordination Free (Parameter Server): Provably fast and correct under key assumptions. Concurrency Control: Provably correct and fast under key assumptions. Systems Ideas to Improve Efficiency
Opposing Forces Concurrency Control Serial Inference Mutual Exclusion Optimistic Concurrency Control Accuracy Parameter Server Unsafe Safe Asynchronous Samplers Scalability
Mutual Exclusion Conditional Independence J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. AISTATS’11 Exploit the Markov random field for Parallel Gibbs Sampling Graph Coloring R/W Lock Mutual Exclusion
Mutual Exclusion through SchedulingChromatic Gibbs Sampler • Compute a k-coloring of the graphical model • Sample all variables with same color in parallel • Serial Equivalence: Time
Theorem:Chromatic Sampler • Ergodic: converges to the correct distribution • Based on graph coloring of the Markov Random Field • Quantifiable acceleration in mixing # Variables Time to updateall variables once # Colors # Processors
Mutual Exclusion Through Locking Model State Processor 1 Data Processor 2 Introducing locking (scheduling) protocols to identify potential conflicts.
Mutual Exclusion Through Locking Model State ✗ Processor 1 Data Processor 2 Enforce serialization of computation that could conflict.
Markov Blanket Locks • Read/Write Locks: R W R R R W R R R R
Markov Blanket Locks • Eliminate fixed schedule and global coordination • Supports more advanced block sampling • Expected Parallelism: Max Degree # Processors # Variables
A System for Mutual Exclusion onMarkov Random Fields • GraphLab/PowerGraph [UAI’10, OSDI’12]: • Chromatic Sampling • Markov Blanket Locks + Block Sampling
LimitationDensely Connected MRF • V-Structures: observations couple many variables • Collapsed models: clique-like MRFs • Mutual exclusion pessimistically serializes computation that couldinterfere. • Can we be optimisticand only serialize computation that doesinterfere?
Opposing Forces Serial Inference Mutual Exclusion Optimistic Concurrency Control Accuracy Parameter Server Unsafe Safe Asynchronous Samplers Scalability
Optimistic Concurrency Control assume the best and correct X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, M. Jordan. Optimistic Concurrency Control for Distributed Unsupervised Learning. NIPS’13 Xinghao Pan Tamara Broderick Stefanie Jegelka Michael Jordan
Optimistic Concurrency Control • Classic idea from Database Systems: • Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems. 1981 • Assume most operations won’t conflict: • Execute operations without blocking Frequent case is fast • Identify and resolve conflicts after they occur Infrequent case with potentially costly resolution
Optimistic Concurrency Control Model State Processor 1 Data Processor 2 Allow computation to proceed without blocking. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Model State ✔ ? Valid outcome Processor 1 Data Processor 2 Validate potential conflicts. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Invalid Outcome ✗ ? ? ✗ Model State Processor 1 Data Processor 2 Validate potential conflicts. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Amend the Value ✗ ✗ Model State Processor 1 Data Processor 2 Take a compensating action. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Invalid Outcome ✗ ✗ Model State Processor 1 Data Processor 2 Validate potential conflicts. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Rollback and Redo ✗ ✗ Model State Processor 1 Data Processor 2 Take a compensating action. Kung & Robinson. On optimistic methods for concurrency control.ACM Transactions on Database Systems 1981
Optimistic Concurrency Control Rollback and Redo Model State Processor 1 Data Processor 2 Non-Blocking Computation Concurrency Requirements: Validation:Identify Errors Resolution: Correct Errors Fast Accuracy Infrequent
Optimistic Concurrency Controlfor Bayesian Inference • Non-parametric Models [Pan et al., NIPS’13]: • OCC DP-Means: Dirichlet Process Clustering • OCC BP-Means: Beta Process Feature Learning • Conditional Sampling: (In Progress) • Collapsed Gibbs LDA • Retrospective Sampling for HDP
DP-Means Algorithm [Kulis and Jordan, ICML’12] • Start with DP Gaussian mixture model: • small variance limit
DP-Means Algorithm [Kulis and Jordan, ICML’12] • Start with DP Gaussian mixture model: • small variance limit redefine : Decreases Rapidly
DP-Means Algorithm [Kulis and Jordan, ICML’12] • Corresponding Gibbs sampler conditionals: • Taking the small variance limit
DP-Means Algorithm [Kulis and Jordan, ICML’12] • Gibbs updates become deterministic: • Taking the small variance limit
DP-Means Algorithm [Kulis and Jordan, ICML’12] • Gibbs updates become deterministic: