
When Machine Learning Meets the Web

Chao Liu, Internet Services Research Center, Microsoft Research-Redmond

Presentation Transcript


  1. When Machine Learning Meets the Web • Chao Liu, Internet Services Research Center, Microsoft Research-Redmond

  2. Outline • Motivation & Challenges • Background on Distributed Computing • Standard ML on MapReduce • Classification: Naïve Bayes • Clustering: Nonnegative Matrix Factorization • Modeling: EM Algorithm • Customized ML on MapReduce • Click Modeling • Behavior Targeting • Conclusions

  3. Motivation & Challenges • Data on the Web • Scale: terabyte-to-petabyte data • Around 20TB log data per day from Bing • Dynamics: evolving data streams • Click data streams with evolving/emerging topics • Applications: Non-traditional ML tasks • Predicting clicks & ads

  4. Outline • Motivation & Challenges • Background on Distributed Computing • Standard ML on MapReduce • Classification: Naïve Bayes • Clustering: Nonnegative Matrix Factorization • Modeling: EM Algorithm • Customized ML on MapReduce • Click Modeling • Behavior Targeting • Conclusions

  5. Parallel vs. Distributed Computing • Parallel computing • All processors have access to a shared memory, which can be used to exchange information between processors • Distributed computing • Each processor has its own private memory (distributed memory), communicating over the network • Message passing • MapReduce

  6. MPI vs. MapReduce • MPI is for task parallelism • Suitable for CPU-intensive jobs • Fine-grained communication control, powerful computation model • MapReduce is for data parallelism • Suitable for data-intensive jobs • A restricted computation model

  7. Word Counting on MapReduce • Input: Web corpus stored on multiple machines as (docId, doc) pairs • Mapper: for each word w in a doc, emit (w, 1) • Intermediate (key, value) pairs are aggregated by word: (w1, <1,1,1>), (w2, <1,1>), (w3, <1,1,1>) • Reducer is copied to each machine to run over the intermediate data locally to produce the result: (w1, 3), (w2, 2), (w3, 3)
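The word-counting flow on this slide can be sketched in a single process. This is a minimal illustration, not a distributed implementation: the `shuffle` function stands in for the framework's grouping step, and the three-document corpus is made up so that the counts match the slide's result.

```python
from collections import defaultdict

def mapper(doc_id, doc):
    """Emit (w, 1) for each word w in the document."""
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, ones):
    """Sum the list of 1s collected for one word."""
    return word, sum(ones)

def word_count(corpus):
    intermediate = [kv for doc_id, doc in corpus.items()
                    for kv in mapper(doc_id, doc)]
    return dict(reducer(w, ones) for w, ones in shuffle(intermediate).items())

# Hypothetical corpus of (docId, doc) pairs.
corpus = {"d1": "w1 w3 w2", "d2": "w1 w3", "d3": "w1 w3 w2"}
counts = word_count(corpus)
```

Because each reducer only needs the values for its own keys, the reduce step can run independently on every machine that holds a slice of the intermediate data.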

  8. Machine Learning on MapReduce • A big picture: Not Omnipotent but good enough

  9. Outline • Motivation & Challenges • Background on Distributed Computing • Standard ML on MapReduce • Classification: Naïve Bayes • Clustering: Nonnegative Matrix Factorization • Modeling: EM Algorithm • Customized ML on MapReduce • Click Modeling • Behavior Targeting • Conclusions

  10. Classification: Naïve Bayes • P(C|X) ∝ P(C) P(X|C) = P(C) ∏j P(Xj|C) • Mapper: for each training example (x(i), y(i)), emit (j, xj(i), y(i)) for every feature j • Reduce on y(i) to estimate P(C) • Reduce on j to estimate P(Xj|C)
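Training Naïve Bayes this way reduces to counting, which is why it fits MapReduce. The sketch below is a single-process illustration under stated assumptions: features are categorical, no smoothing is applied, and the key layout (`"class"` and `"feature"` tags) and the toy data are hypothetical choices rather than anything given on the slide.

```python
from collections import Counter

def mapper(x, y):
    """Emit one count for the class and one per (j, x_j, y) triple."""
    yield ("class", y), 1
    for j, xj in enumerate(x):
        yield ("feature", j, xj, y), 1

def reduce_counts(pairs):
    """Sum the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def train(data):
    totals = reduce_counts(kv for x, y in data for kv in mapper(x, y))
    n = len(data)
    # P(C): class count divided by the number of examples.
    prior = {k[1]: v / n for k, v in totals.items() if k[0] == "class"}
    # P(X_j = v | C): triple count divided by the class count.
    likelihood = {k[1:]: v / totals[("class", k[3])]
                  for k, v in totals.items() if k[0] == "feature"}
    return prior, likelihood

def predict(x, prior, likelihood):
    def score(c):
        s = prior[c]
        for j, xj in enumerate(x):
            s *= likelihood.get((j, xj, c), 0.0)  # unseen triple: probability 0
        return s
    return max(prior, key=score)

# Toy data: two binary features, two classes.
data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 0), "ham"), ((0, 1), "ham")]
prior, likelihood = train(data)
```

In a real job the two kinds of keys would go to different reducers, matching the slide's "reduce on y(i)" and "reduce on j" split.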

  11. Clustering: Nonnegative Matrix Factorization [Liu et al., WWW2010] • Effective tool to uncover latent relationships in nonnegative matrices with many applications [Berry et al., 2007, Sra & Dhillon, 2006] • Interpretable dimensionality reduction [Lee & Seung, 1999] • Document clustering [Shahnaz et al., 2006, Xu et al., 2006] • Challenge: Can we scale NMF to million-by-million matrices?

  12. NMF Algorithm [Lee & Seung, 2000]

  13. Distributed NMF • Data Partition: A, W and H across machines

  14. Computing DNMF: The Big Picture

  15. [Diagram: DNMF job pipeline with stages Map-I, Map-II, Map-III, Map-IV, Map-V and Reduce-I, Reduce-II, Reduce-III, Reduce-V]

  16. X = WᵀA • Stages: Map-I, Reduce-I, Map-II, Reduce-II

  17. Y = WᵀWH • Stages: Map-III, Reduce-III, Map-IV

  18. H = H.*X/Y • Stages: Map-V, Reduce-V
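The three steps on slides 16 to 18 together form one multiplicative update of H. The sketch below runs that update on small dense matrices in plain Python to make the arithmetic concrete; it is not the distributed algorithm. The helper names, the small `eps` guard against division by zero, and the example matrices are assumptions added for illustration.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def update_H(A, W, H, eps=1e-12):
    """One update H <- H .* X ./ Y with X = W^T A and Y = W^T W H."""
    Wt = transpose(W)
    X = matmul(Wt, A)              # k x n
    Y = matmul(matmul(Wt, W), H)   # k x n
    return [[h * x / (y + eps) for h, x, y in zip(hr, xr, yr)]
            for hr, xr, yr in zip(H, X, Y)]

def frobenius_error(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))

# Hypothetical 3x2 data matrix A with rank-2 factors W (3x2) and H (2x2).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 1.0], [1.0, 1.0]]

error_before = frobenius_error(A, W, H)
H_new = update_H(A, W, H)
error_after = frobenius_error(A, W, H_new)
```

Every entry of H stays nonnegative because the update only multiplies and divides nonnegative quantities, and the reconstruction error does not increase after the step.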

  19. [Diagram: DNMF job pipeline with stages Map-I, Map-II, Map-III, Map-IV, Map-V and Reduce-I, Reduce-II, Reduce-III, Reduce-V]

  20. Scalability w.r.t. Matrix Size • 3 hours per iteration; 20 iterations take around 20*3*0.72 ≈ 43 hours • Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values

  21. General EM on MapReduce • Map • Evaluate • Compute • Reduce
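The formulas on this slide did not survive in the transcript, so the following is only a generic sketch of how an EM iteration can be split into map and reduce steps. It assumes a two-component, one-dimensional Gaussian mixture with unit variance; the model, the function names, and the data are all illustrative choices, not the talk's example.

```python
import math
from collections import defaultdict

def mapper(x, weights, means):
    """E-step for one point: emit per-component sufficient statistics."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
    total = sum(dens)
    for k, d in enumerate(dens):
        r = d / total                 # responsibility of component k for x
        yield k, (r, r * x)

def reducer(values):
    """Sum the statistics emitted for one component."""
    return sum(v[0] for v in values), sum(v[1] for v in values)

def em_iteration(data, weights, means):
    groups = defaultdict(list)
    for x in data:
        for k, stats in mapper(x, weights, means):
            groups[k].append(stats)
    new_weights, new_means = [], []
    for k in range(len(means)):
        r_sum, rx_sum = reducer(groups[k])
        new_weights.append(r_sum / len(data))   # M-step: mixing weight
        new_means.append(rx_sum / r_sum)        # M-step: component mean
    return new_weights, new_means

# Hypothetical data drawn around -2 and +2.
data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
weights, means = [0.5, 0.5], [-1.0, 1.0]
for _ in range(20):
    weights, means = em_iteration(data, weights, means)
```

Each iteration is one MapReduce job: the current parameters are broadcast to the mappers, and the reducers return the aggregated statistics from which the next parameters are computed.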

  22. Outline • Motivation & Challenges • Background on Distributed Computing • Standard ML on MapReduce • Classification: Naïve Bayes • Clustering: Nonnegative Matrix Factorization • Modeling: EM Algorithm • Customized ML on MapReduce • Click Modeling • Behavior Targeting • Conclusions

  23. Click Modeling: Motivation • Clicks are good… • Are these two clicks equally “good”? • Non-clicks may have excuses: • Not relevant • Not examined

  24. Eye-tracking User Study

  25. Bayesian Browsing Model [Liu et al., KDD2009] • [Figure: graphical model for a query with results URL1 to URL4; each position i has a snippet relevance variable Si, an examination variable Ei, and a click-through variable Ci]

  26. Dependencies in BBM • [Figure: dependencies among S1, S2, …, Si; E1, E2, …, Ei; and C1, C2, …, Ci, annotated with "the preceding click position before i"]

  27. Model Inference • Ultimate goal • Observation: conditional independence

  28. P(C|S) by Chain Rule • Likelihood of search instance • From S to R:

  29. Putting Things Together • Posterior with • Re-organize by Rj's • Counts used: how many times dj was clicked • How many times dj was not clicked when it is at position (r + d) and the preceding click is on position r

  30. What p(R|C1:n) Tells Us • Exact inference with joint posterior in closed form • The joint posterior factorizes, and hence the Rj's are mutually independent • At most M(M+1)/2 + 1 numbers to fully characterize each posterior • Count vector:
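The bound of M(M+1)/2 + 1 numbers can be checked by enumeration. The sketch below assumes one plausible indexing that is consistent with the counts described on the previous slide: one entry for clicks, plus one entry per pair (r, d) where r is the preceding click position (0 meaning no preceding click) and r + d is a position no greater than M. The function name and key layout are hypothetical.

```python
def count_vector_keys(M):
    """Enumerate the entries of a count vector for M result positions."""
    keys = ["clicked"]                   # how many times the document was clicked
    for r in range(M):                   # preceding click position; 0 = none
        for d in range(1, M - r + 1):    # distance, so that r + d <= M
            keys.append((r, d))          # not clicked at position r + d
    return keys

keys = count_vector_keys(10)
```

For M = 10 positions this gives 56 entries, so each posterior can be stored in a small fixed-size vector regardless of how many impressions were observed.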

  31. An Example • Compute • Count vector for R4 • [Figure: worked example of the count vector for R4, with entries N4 and N4,r,d]

  32. LearnBBM on MapReduce • Map: emit((q,u), idx) • Reduce: construct the count vector

  33. Example on MapReduce • Map outputs: (U1, 0) (U2, 4) (U3, 0); (U1, 1) (U3, 0) (U4, 7); (U1, 1) (U3, 0) (U4, 0) • Reduce outputs: (U1, 0, 1, 1) (U2, 4) (U3, 0, 0, 0) (U4, 0, 7)
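The example above can be reproduced with a few lines of grouping code. This is a single-process sketch: the split of the nine pairs into three mappers is an assumption, the idx values are taken as given because the transcript does not show how they are computed, and turning each grouped list into a `Counter` is one hypothetical way to represent the count vector.

```python
from collections import Counter, defaultdict

# Pairs emitted by three mappers (grouping into mappers is assumed).
map_outputs = [
    [("U1", 0), ("U2", 4), ("U3", 0)],
    [("U1", 1), ("U3", 0), ("U4", 7)],
    [("U1", 1), ("U3", 0), ("U4", 0)],
]

def reduce_by_key(outputs):
    """Collect the idx values emitted for each key, sorted for readability."""
    groups = defaultdict(list)
    for output in outputs:
        for key, idx in output:
            groups[key].append(idx)
    return {key: sorted(values) for key, values in groups.items()}

grouped = reduce_by_key(map_outputs)

# Count how often each idx occurs per key.
count_vectors = {key: Counter(values) for key, values in grouped.items()}
```

Learning therefore needs a single pass over the click log: the mappers only emit indices and the reducers only increment counters.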

  34. Petabyte-Scale Experiment • Setup: • 8 weeks of data, 8 jobs • Job k takes the first k weeks of data • Experiment platform • SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB'08]

  35. Scalability of BBM • Increasing computation load • More queries, more URLs, more impressions • Near-constant elapsed time • 3 hours • Scan 265 terabytes of data • Full posteriors for 1.15 billion (query, url) pairs • [Plots: "Elapse Time on SCOPE" and "Computation Overload"]

  36. Large-scale Behavior Targeting [Ye et al., KDD2009] • Behavior targeting • Ad serving based on users’ historical behaviors • Complementary to sponsored Ads and content Ads

  37. Problem Setting • Goal • Given ads in a certain category, locate qualified users based on users’ past behaviors • Data • User is identified by cookie • Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc • Challenges: • Scale: e.g., 9TB ad data with 500B entries in Aug'08 • Sparse: e.g., the CTR of automotive display ads is 0.05% • Dynamic: i.e., user behavior changes over time.

  38. Learning: Linear Poisson Model • CTR = ClickCnt/ViewCnt • A model to predict expected click count • A model to predict expected view count • Linear Poisson model • MLE on w
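The formulas on this slide are missing from the transcript. As a hedged reconstruction, a standard Poisson model with a linear mean would look as follows; the symbols y_i (observed count), x_i (behavior vector), and the separate weight vectors w_click and w_view are assumptions based on the bullet text, not the talk's exact notation.

```latex
% Poisson likelihood with a linear mean (assumed form)
p(y_i \mid x_i; w) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!},
\qquad \lambda_i = w^\top x_i

% Log-likelihood maximized by the MLE on w
\ell(w) = \sum_i \left( y_i \log(w^\top x_i) - w^\top x_i - \log y_i! \right)

% Gradient with respect to w
\nabla_w \ell(w) = \sum_i \left( \frac{y_i}{w^\top x_i} - 1 \right) x_i

% Predicted CTR from the two count models
\widehat{\mathrm{CTR}} = \frac{w_{\mathrm{click}}^\top x}{w_{\mathrm{view}}^\top x}
```

The log-likelihood and its gradient are sums over examples, so each term can be computed in a mapper and added up in a reducer.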

  39. Implementation on MapReduce • Learning • Map: Compute and • Reduce: Update • Prediction

  40. Outline • Motivation & Challenges • Background on Distributed Computing • Standard ML on MapReduce • Classification: Naïve Bayes • Clustering: Nonnegative Matrix Factorization • Modeling: EM Algorithm • Customized ML on MapReduce • Click Modeling • Behavior Targeting • Conclusions

  41. Conclusions • Challenges imposed by Web data • Scalability of standard algorithms • Application-driven customized algorithms • Capability to consume huge amounts of data outweighs algorithm sophistication • Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite • MapReduce: a restricted computation model • Not omnipotent but powerful enough • Things we want to do turn out to be things we can do

  42. Q&A • Thank You! • SEWM'10 Keynote, Chengdu, China
