
Presentation Transcript


  1. Map-Reduce for Machine Learning on Multicore. C. Chu, S.K. Kim, Y. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun (NIPS 2006). Shimin Chen, Big Data Reading Group

  2. Motivations • Industry-wide shift to multicore • No good framework for parallelizing ML algorithms • Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms for multicore processors

  3. Idea • Algorithms that fit the Statistical Query Model can be written in Summation Form, which maps naturally onto Map-Reduce

  4. Outline • Introduction • Statistical Query Model and Summation Form • Architecture (inspired by Map-Reduce) • Adopted ML Algorithms • Experiments • Conclusion

  5. Valiant Model [Valiant’84] • x is the input • y is a function of x that we want to learn • In the Valiant model, the learning algorithm uses randomly drawn examples <x, y> to learn the target function

  6. Statistical Query Model [Kearns’98] • A restriction of the Valiant model • The learning algorithm uses aggregates over the examples, not the individual examples • More precisely, the learning algorithm interacts with a statistical query oracle • The learning algorithm submits a query function f(x, y) • The oracle returns an estimate of the expectation of f(x, y) over the data (for a 0/1-valued f, the probability that it is true)

  7. Summation Form • Aggregate over the data as a sum of per-example terms: Σi f(xi, yi) • Divide the data set into pieces • Compute partial sums on each core • Combine all the partial results at the end
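To make the summation form concrete, here is a minimal Python sketch; the chunking, function names, and example data are illustrative, not from the paper:

    # Summation form: divide the data into pieces, compute a partial
    # aggregate on each piece (one per core), then combine the results.
    def split(data, num_pieces):
        """Divide the examples into roughly equal contiguous pieces."""
        n = len(data)
        return [data[i * n // num_pieces:(i + 1) * n // num_pieces]
                for i in range(num_pieces)]

    def partial_sum(piece, f):
        """Aggregate f(x, y) over one piece (would run on one core)."""
        return sum(f(x, y) for x, y in piece)

    def aggregate(data, f, num_pieces=4):
        """Combine the per-piece partial sums into the global aggregate."""
        return sum(partial_sum(piece, f) for piece in split(data, num_pieces))

    # Example statistical query: E[f(x, y)] with f(x, y) = 1 if y == 1 else 0
    data = [(0.3, 1), (1.2, 0), (0.7, 1), (2.5, 1)]
    print(aggregate(data, lambda x, y: 1 if y == 1 else 0) / len(data))  # 0.75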

  8. Example: Linear Regression using Least Squares • Given m examples (x1, y1), (x2, y2), …, (xm, ym) • Model: y ≈ θᵀx • Goal: find θ minimizing Σi (θᵀxi - yi)² • Write the matrix X with x1ᵀ, …, xmᵀ as rows and the vector Y = (y1, y2, …, ym)ᵀ; then the solution is θ* = (XᵀX)⁻¹XᵀY • Summation form: XᵀX = Σi xi xiᵀ and XᵀY = Σi xi yi • Parallel computation: split the m examples into m/num_processors pieces, compute partial sums on each core, and add them at the end
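A hedged sketch of this decomposition with NumPy; the number of pieces and the function names are my own:

    import numpy as np

    def map_normal_equations(X_chunk, y_chunk):
        """Mapper: partial sums of X^T X and X^T Y for one chunk of examples."""
        return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk

    def reduce_normal_equations(partials):
        """Reducer: add the partial sums and solve the normal equations."""
        A = sum(p[0] for p in partials)
        b = sum(p[1] for p in partials)
        return np.linalg.solve(A, b)          # theta* = (X^T X)^-1 X^T Y

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=1000)

    num_pieces = 4
    partials = [map_normal_equations(Xc, yc)
                for Xc, yc in zip(np.array_split(X, num_pieces),
                                  np.array_split(y, num_pieces))]
    print(reduce_normal_equations(partials))  # close to [1.0, -2.0, 0.5]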

  9. Outline • Introduction • Statistical Query Model and Summation Form • Architecture (inspired by Map-Reduce) • Adopted ML Algorithms • Experiments • Conclusion

  10. Lighter Weight Map-Reduce for Multicore
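The slide showed the architecture diagram for this lighter-weight engine; below is a rough approximation using Python's multiprocessing, with class and method names that are mine rather than the paper's:

    from multiprocessing import Pool
    import numpy as np

    class SummationEngine:
        """Toy map-reduce engine for one multicore machine: split the data,
        run one map task per piece on a worker pool, then a single reduce
        step combines the partial results."""

        def __init__(self, num_workers=4):
            self.num_workers = num_workers

        def run(self, map_fn, reduce_fn, X, y):
            # Split the examples into one piece per worker.
            pieces = list(zip(np.array_split(X, self.num_workers),
                              np.array_split(y, self.num_workers)))
            # Map phase: each worker computes partial aggregates for its piece.
            with Pool(self.num_workers) as pool:
                partials = pool.starmap(map_fn, pieces)
            # Reduce phase: combine the partial aggregates into the final result.
            return reduce_fn(partials)

For instance, engine.run(map_normal_equations, reduce_normal_equations, X, y) would reproduce the least-squares computation above; the map functions must be defined at module level so they can be pickled, and on some platforms the call needs to sit under an if __name__ == "__main__" guard.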

  11. Outline • Introduction • Statistical Query Model and Summation Form • Architecture (inspired by Map-Reduce) • Adopted ML Algorithms • Experiments • Conclusion

  12. Locally Weighted Linear Regression (LWLR) • Solve Aθ = b, where A = Σi wi xi xiᵀ and b = Σi wi xi yi • Mappers: one set computes partial sums of A, the other set computes partial sums of b • Two reducers, one aggregating A and one aggregating b • Finally compute the solution θ = A⁻¹b • When all wi = 1, this reduces to ordinary least squares
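A brief sketch of the weighted partial sums; it mirrors the least-squares example above, and the per-example weight vector w is assumed to be given:

    import numpy as np

    def map_lwlr(X_chunk, y_chunk, w_chunk):
        """Mapper: partial A = sum_i w_i x_i x_i^T and b = sum_i w_i x_i y_i."""
        Xw = X_chunk * w_chunk[:, None]        # scale each row x_i by w_i
        return Xw.T @ X_chunk, Xw.T @ y_chunk

    def reduce_lwlr(partials):
        """Reducers: one sums the partial A's, the other the partial b's;
        then solve A theta = b."""
        A = sum(p[0] for p in partials)
        b = sum(p[1] for p in partials)
        return np.linalg.solve(A, b)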

  13. Naïve Bayes (NB) • Goal: estimate P(xj = k | y = 1) and P(xj = k | y = 0) • Computation: count the occurrences of (xj = k, y = 1) and (xj = k, y = 0) and the occurrences of y = 1 and y = 0, then divide to get the conditional probabilities • Mappers: each counts over a subgroup of training examples • Reducer: aggregates the intermediate counts and calculates the final probabilities
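A minimal sketch for discrete features using Python counters; the data layout and the absence of smoothing are my simplifications:

    from collections import Counter

    def map_counts(examples):
        """Mapper: count (feature index, value, label) triples and labels
        over one subgroup of training examples."""
        feature_counts, label_counts = Counter(), Counter()
        for x, y in examples:             # x is a tuple of discrete feature values
            label_counts[y] += 1
            for j, v in enumerate(x):
                feature_counts[(j, v, y)] += 1
        return feature_counts, label_counts

    def reduce_counts(partials):
        """Reducer: add the partial counts and compute P(x_j = k | y)."""
        feature_counts, label_counts = Counter(), Counter()
        for fc, lc in partials:
            feature_counts.update(fc)
            label_counts.update(lc)
        return {key: n / label_counts[key[2]] for key, n in feature_counts.items()}

    examples = [((1, 0), 1), ((1, 1), 1), ((0, 0), 0), ((0, 1), 0)]
    chunks = [examples[:2], examples[2:]]
    probs = reduce_counts([map_counts(c) for c in chunks])
    print(probs[(0, 1, 1)])   # P(x_0 = 1 | y = 1) = 1.0

In practice Laplace smoothing would be added before dividing.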

  14. Gaussian Discriminative Analysis (GDA) • Goal: classify x into classes of y, assuming each class-conditional distribution p(x | y) is Gaussian with its own mean but a shared covariance matrix • Computation: the class priors, class means, and shared covariance are all sums over the training examples • Mappers: compute the partial sums for a subset of training examples • Reducer: aggregates the intermediate results into the final parameters
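A rough sketch for two classes with a shared covariance; the function names are mine, and the covariance is assembled from the expanded form Σi xi xiᵀ - Σc nc μc μcᵀ:

    import numpy as np

    def map_gda(X_chunk, y_chunk):
        """Mapper: per-class counts and sums, plus the sum of x x^T, for one chunk."""
        stats = {}
        for c in (0, 1):
            Xc = X_chunk[y_chunk == c]
            stats[c] = (len(Xc), Xc.sum(axis=0))
        return stats, X_chunk.T @ X_chunk

    def reduce_gda(partials):
        """Reducer: combine sums into the prior, class means, and shared covariance."""
        d = partials[0][1].shape[0]
        counts = {0: 0, 1: 0}
        sums = {0: np.zeros(d), 1: np.zeros(d)}
        xxT = np.zeros((d, d))
        for stats, partial_xxT in partials:
            xxT += partial_xxT
            for c in (0, 1):
                counts[c] += stats[c][0]
                sums[c] += stats[c][1]
        m = counts[0] + counts[1]
        mu = {c: sums[c] / counts[c] for c in (0, 1)}
        phi = counts[1] / m
        # Shared covariance (1/m) * sum_i (x_i - mu_{y_i})(x_i - mu_{y_i})^T,
        # expanded so it only needs the sums computed above.
        sigma = (xxT - sum(counts[c] * np.outer(mu[c], mu[c]) for c in (0, 1))) / m
        return phi, mu, sigma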

  15. K-means • Compute the Euclidean distance between sample vectors and centroids • Recalculate the centroids • Divide the computation into subgroups handled by map-reduce: mappers assign their samples to the nearest centroid and accumulate partial sums, the reducer recomputes the centroids
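One iteration in this style might look like the following sketch (Euclidean assignment in the mapper, centroid update in the reducer; names and shapes are illustrative):

    import numpy as np

    def map_assign(X_chunk, centroids):
        """Mapper: assign each point to its nearest centroid and return
        per-cluster partial sums and counts."""
        dists = np.linalg.norm(X_chunk[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        k, d = centroids.shape
        sums, counts = np.zeros((k, d)), np.zeros(k)
        for j in range(k):
            members = X_chunk[nearest == j]
            sums[j] = members.sum(axis=0)
            counts[j] = len(members)
        return sums, counts

    def reduce_centroids(partials, old_centroids):
        """Reducer: add the partial sums/counts and recalculate the centroids."""
        sums = sum(p[0] for p in partials)
        counts = sum(p[1] for p in partials)
        new = old_centroids.copy()
        nonempty = counts > 0
        new[nonempty] = sums[nonempty] / counts[nonempty, None]
        return new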

  16. Expectation Maximization (EM) • The E-step computes posterior probabilities (or expected counts) for each training example • The M-step combines these values to update the model parameters • Both steps can be parallelized using map-reduce
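As one concrete case, a sketch for a one-dimensional Gaussian mixture, where the E-step's per-example responsibilities are collapsed into partial sufficient statistics (the 1-D restriction and function names are mine):

    import numpy as np

    def e_step_map(x_chunk, pi, mu, var):
        """Mapper (E-step): responsibilities for one chunk, reduced to the
        sufficient statistics sum_i r_ik, sum_i r_ik x_i, sum_i r_ik x_i^2."""
        # x_chunk: shape (n,); pi, mu, var: shape (k,)
        dens = np.exp(-(x_chunk[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        return (r.sum(axis=0),
                (r * x_chunk[:, None]).sum(axis=0),
                (r * x_chunk[:, None] ** 2).sum(axis=0))

    def m_step_reduce(partials, total_n):
        """Reducer (M-step): combine the partial statistics and update parameters."""
        Nk = sum(p[0] for p in partials)
        Sx = sum(p[1] for p in partials)
        Sxx = sum(p[2] for p in partials)
        pi, mu = Nk / total_n, Sx / Nk
        var = Sxx / Nk - mu ** 2
        return pi, mu, var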

  17. Neural Network (NN) • Back-propagation on a 3-layer network: input layer, one hidden (middle) layer, and 2 output nodes • Goal: learn the network weights by back-propagation • Mapper: propagates its set of training data forward through the network, then back-propagates the errors to calculate a partial gradient for the weights • Reducer: sums the partial gradients and performs a batch gradient descent update of the weights
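A hedged sketch of the mapper/reducer split for a small sigmoid network with squared-error loss (layer sizes, the absence of bias terms, and the learning rate are illustrative choices, not the paper's):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def map_gradient(X_chunk, Y_chunk, W1, W2):
        """Mapper: forward-propagate the chunk, back-propagate squared error,
        and return the partial gradients for both weight matrices."""
        H = sigmoid(X_chunk @ W1)             # hidden activations, shape (n, h)
        O = sigmoid(H @ W2)                   # outputs, shape (n, 2): two output nodes
        delta_o = (O - Y_chunk) * O * (1 - O)
        delta_h = (delta_o @ W2.T) * H * (1 - H)
        return H.T @ delta_o, X_chunk.T @ delta_h   # partial dW2, dW1

    def reduce_update(partials, W1, W2, lr=0.1):
        """Reducer: sum the partial gradients and take one batch gradient step."""
        dW2 = sum(p[0] for p in partials)
        dW1 = sum(p[1] for p in partials)
        return W1 - lr * dW1, W2 - lr * dW2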

  18. Principal Components Analysis (PCA) • Compute the principal eigenvectors of the covariance matrix Σ = (1/m) Σi xi xiᵀ - μμᵀ, where μ = (1/m) Σi xi • Both sums are in summation form, so they can be computed with map-reduce
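A sketch of that decomposition (names are mine; the covariance is assembled from the partial sums of xi and xi xiᵀ):

    import numpy as np

    def map_pca(X_chunk):
        """Mapper: partial sums of x_i, x_i x_i^T, and the example count."""
        return X_chunk.sum(axis=0), X_chunk.T @ X_chunk, len(X_chunk)

    def reduce_pca(partials, num_components=2):
        """Reducer: assemble the covariance matrix from the partial sums and
        take its top eigenvectors as the principal components."""
        s = sum(p[0] for p in partials)
        xxT = sum(p[1] for p in partials)
        m = sum(p[2] for p in partials)
        mu = s / m
        cov = xxT / m - np.outer(mu, mu)       # E[x x^T] - mu mu^T
        eigvals, eigvecs = np.linalg.eigh(cov)
        return eigvecs[:, -num_components:]    # eigh sorts eigenvalues ascending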

  19. Other Algorithms • Logistic Regression • Independent Component Analysis • Support Vector Machine

  20. Time Complexity

  21. Outline • Introduction • Statistical Query Model and Summation Form • Architecture (inspired by Map-Reduce) • Adopted ML Algorithms • Experiments • Conclusion

  22. Setup • Compare the map-reduce version against the sequential version • 10 data sets • Machines: • Dual-processor Pentium-III 700MHz, 1GB RAM • 16-way Sun Enterprise 6000 • (these are SMP machines, not multicore)

  23. Dual-Processor SpeedUps

  24. 2-16 processor speedups • More data in the paper

  25. Multicore Simulator Results • The paper has only a paragraph on this • Basically, it says the results are better than on the multiprocessor machines • Possibly because of lower communication cost

  26. Conclusion • Parallelize summation forms • Use map-reduce on a single machine
