Scaling up LDA - 2 William Cohen
Speedup for Parallel LDA: Using AllReduce for Synchronization
What if you try to parallelize? • Split the document/term matrix randomly and distribute it to p processors… • …then run “Approximate Distributed LDA” on each shard, periodically merging the counts • This aggregate-and-redistribute step is a common subtask in parallel versions of: LDA, SGD, …
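A minimal sketch of the count merge this requires, assuming an AD-LDA-style update in which each processor Gibbs-samples against a stale copy of the word-topic table; `n_wt_stale` and `n_wt_locals` are illustrative names, not anything from the original code:

```python
import numpy as np

def ad_lda_merge(n_wt_stale, n_wt_locals):
    """Reconcile the p processors' copies of the word-topic count table
    after one round of independent local Gibbs sampling: add every
    processor's local change back into the stale shared table.
    (Sketch of the usual AD-LDA-style update; names are illustrative.)"""
    delta = sum(n_wt_p - n_wt_stale for n_wt_p in n_wt_locals)
    return n_wt_stale + delta   # equivalently: sum(locals) - (p - 1) * stale
```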
Introduction • Common pattern: • do some learning in parallel • aggregate local changes from each processor into shared parameters • distribute the new shared parameters back to each processor • and repeat… • AllReduce: implemented in MPI and, more recently, in VW code (John Langford) in a Hadoop-compatible scheme
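The loop itself, sketched with mpi4py's Allreduce (the VW scheme uses a different transport, but the iteration structure is the same); `learn_on_local_shard` and the sizes are hypothetical stand-ins:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
p = comm.Get_size()

D, num_iterations = 10, 5                  # hypothetical sizes

def learn_on_local_shard(w):
    """Stand-in for local learning on this processor's data shard."""
    return np.random.randn(len(w)) * 0.01  # pretend local gradient/delta

w = np.zeros(D)                            # shared params, replicated everywhere
for _ in range(num_iterations):
    local_delta = learn_on_local_shard(w)  # 1. learn in parallel
    global_delta = np.empty_like(local_delta)
    comm.Allreduce(local_delta, global_delta, op=MPI.SUM)  # 2. aggregate
    w += global_delta / p                  # 3. every node holds the same new w
```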
Gory details of VW Hadoop-AllReduce • Spanning-tree server: • a separate process constructs a spanning tree of the compute nodes in the cluster and then acts as a server • Worker nodes (“fake” mappers): • input for each worker is locally cached • workers all connect to the spanning-tree server • workers all execute the same code, which may contain AllReduce calls: • workers synchronize whenever they reach an all-reduce
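A toy, in-process illustration of why the spanning tree helps: partial sums flow up to the root, and the total flows back down, so every worker ends with the global result. The tree shape and names below are made up; the real system does this over sockets between workers:

```python
def tree_allreduce(values, children, root=0):
    """Toy illustration of tree-structured AllReduce: sum partial values
    up the tree to the root, then broadcast the total back down."""
    def reduce_up(node):
        acc = values[node]
        for c in children.get(node, ()):   # children send partial sums up
            acc += reduce_up(c)
        return acc
    total = reduce_up(root)                # root now holds the global sum
    return {node: total for node in values}  # ...and broadcasts it down

# 7 workers arranged in a binary spanning tree (shape is made up)
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
values = {n: float(n + 1) for n in range(7)}
print(tree_allreduce(values, children))    # every worker ends with 28.0
```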
Hadoop AllReduce: don't wait for slow nodes - rely on duplicate (speculatively executed) jobs
2^24 features, ~100 nonzeros/example, 2.3B examples. Each example is a user/page/ad (and conjunctions of these), positive if there was a click-through on the ad.
50M examples, explicitly constructed kernel: 11.7M features, ~3,300 nonzeros/example. Old method: SVM, 3 days (time reported is to reach a fixed test error).
[Figure: a line segment of unit height divided into regions for z=1, z=2, z=3, …; a uniform random draw on the segment selects the topic.]
Discussion… • Where do you spend your time? • sampling the z's • each sampling step involves a loop over all topics • this seems wasteful: • even with many topics, words are usually assigned to only a few distinct topics • low-frequency words appear < K times… and there are lots and lots of them! • even frequent words are not in every topic
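The wasteful step, sketched: one naive collapsed-Gibbs update for a single token, using the standard full conditional P(z = t) ∝ (α_t + n_{t|d})(β + n_{w|t}) / (βV + n_{·|t}). Array names are made up:

```python
import numpy as np

def sample_z_naive(w, n_wt, n_td, n_t, alpha, beta, V):
    """Naive collapsed-Gibbs step for one token of word w in the current
    document: O(K) work, looping over every topic even though most topics
    get negligible probability mass. (Sketch; array names are made up.)"""
    K = len(n_t)
    p = np.empty(K)
    for t in range(K):                     # the loop over all topics
        p[t] = (alpha[t] + n_td[t]) * (beta + n_wt[w, t]) / (beta * V + n_t[t])
    p /= p.sum()                           # normalize by Z = sum of all K terms
    return np.random.choice(K, p=p)
```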
Discussion… • What's the solution? Want bounds Z_i >= Z on the normalizer • Idea: come up with such approximations to Z at each stage - then you might be able to stop early…
Tricks • How do you compute and maintain the bound? • see the paper • What order do you go in? • want to pick large P(k)'s first • … so we want large P(k|d) and P(k|w) • … so we maintain the k's in sorted order • the counts change only a little after each flip, so a bubble-sort pass fixes up the almost-sorted array (sketched below)
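A sketch of that fix-up pass, assuming the values are kept in descending order; the helper name is hypothetical:

```python
def fixup_descending(a, i):
    """One value a[i] just changed by a small amount; restore descending
    order with a local bubble pass. The array is almost sorted, so this
    is nearly O(1) per flip."""
    while i > 0 and a[i] > a[i - 1]:           # value grew: bubble left
        a[i - 1], a[i] = a[i], a[i - 1]
        i -= 1
    while i + 1 < len(a) and a[i] < a[i + 1]:  # value shrank: bubble right
        a[i], a[i + 1] = a[i + 1], a[i]
        i += 1
    return i                                   # new position of the element
```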
z = s + r + q • If U < s: • look up U on a line segment with tic-marks at α_1 β/(βV + n_{·|1}), α_2 β/(βV + n_{·|2}), … • If s < U < s + r: • look up U on the line segment for r • only need to check t such that n_{t|d} > 0 • If s + r < U: • look up U on the line segment for q • only need to check t such that n_{w|t} > 0
z = s + r + q • The s bucket only needs checking occasionally (U < s less than ~10% of the time) • For r: only need to check t such that n_{t|d} > 0 • For q: only need to check t such that n_{w|t} > 0
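Putting the three buckets together, a sketch of the sparse sampling step. Array names are made up, and it assumes the document and word each already have at least one assignment; a real implementation also caches s and updates r and q incrementally instead of recomputing them per token:

```python
import numpy as np

def sample_z_sparse(w, n_wt, n_td, n_t, alpha, beta, V):
    """SparseLDA-style step: split the normalizer z = s + r + q and walk
    only the sparse buckets for the common cases. (Sketch.)"""
    denom = beta * V + n_t                          # vector over all K topics
    cs = np.cumsum(alpha * beta / denom)            # s: smoothing-only bucket
    dt = np.nonzero(n_td)[0]                        # topics with n_{t|d} > 0
    cr = np.cumsum(n_td[dt] * beta / denom[dt])     # r: document-topic bucket
    wt = np.nonzero(n_wt[w])[0]                     # topics with n_{w|t} > 0
    cq = np.cumsum((alpha[wt] + n_td[wt]) * n_wt[w, wt] / denom[wt])  # q
    s = cs[-1]
    r = cr[-1] if cr.size else 0.0
    q = cq[-1] if cq.size else 0.0

    U = np.random.uniform(0.0, s + r + q)
    if U < s:                                       # rare: walk all K tic-marks
        return int(np.searchsorted(cs, U, side='right'))
    if U < s + r:                                   # walk only this doc's topics
        return int(dt[np.searchsorted(cr, U - s, side='right')])
    i = np.searchsorted(cq, U - s - r, side='right')
    return int(wt[min(i, len(wt) - 1)])             # min() guards float round-off
```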
z = s + r + q • Trick: count up n_{t|d} for d when you start working on d, then update it incrementally • Only need to store (and maintain) total words per topic (n_{·|t}), plus the α's, β, and V • Only need to store n_{t|d} for the current d • Need to store n_{w|t} for each word/topic pair…???
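What that bookkeeping looks like around the sampler, reusing the `sample_z_sparse` sketch above; names remain illustrative:

```python
import numpy as np

def resample_document(doc_words, z_d, n_wt, n_t, alpha, beta, V, K):
    """Gibbs-resample one document. n_{t|d} is counted up fresh at the start
    (the 'trick' on this slide) and then maintained incrementally; only the
    global n_{w|t} and n_{.|t} tables persist across documents."""
    n_td = np.bincount(z_d, minlength=K)           # dense counts for this d only
    for i, w in enumerate(doc_words):
        old_t = z_d[i]
        n_td[old_t] -= 1                           # remove the old assignment
        n_wt[w, old_t] -= 1
        n_t[old_t] -= 1
        new_t = sample_z_sparse(w, n_wt, n_td, n_t, alpha, beta, V)
        z_d[i] = new_t
        n_td[new_t] += 1                           # add it back under new topic
        n_wt[w, new_t] += 1
        n_t[new_t] += 1
```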
z = s + r + q • 1. Precompute, for each t, (α_t + n_{t|d}) / (βV + n_{·|t}) • 2. Quickly find the t's such that n_{w|t} is large for w • Most (>90%) of the time and space is here… • Need to store n_{w|t} for each word/topic pair…???
1. Precompute, for each t, (α_t + n_{t|d}) / (βV + n_{·|t}) 2. Quickly find the t's such that n_{w|t} is large for w: • map w to an int array • no larger than the frequency of w • no larger than #topics • encode (t, n) as a bit vector • n in the high-order bits • t in the low-order bits • keep the ints sorted in descending order (packing sketched below) • Most (>90%) of the time and space is here… • Need to store n_{w|t} for each word/topic pair…???
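The packing trick in miniature: with the count in the high-order bits, sorting the raw ints descending puts the largest n_{w|t} first. `TOPIC_BITS` is an assumption (just enough bits for the topic id):

```python
TOPIC_BITS = 10                     # assumption: supports up to 2**10 topics
TOPIC_MASK = (1 << TOPIC_BITS) - 1

def encode(t, n):
    """Pack (topic, count): count n in the high bits, topic t in the low bits."""
    return (n << TOPIC_BITS) | t

def decode(x):
    """Unpack to (topic, count)."""
    return x & TOPIC_MASK, x >> TOPIC_BITS

# per-word int array, kept sorted descending => biggest counts come first
pairs = [encode(t, n) for t, n in [(3, 17), (41, 2), (7, 9)]]
pairs.sort(reverse=True)
assert [decode(x) for x in pairs] == [(3, 17), (7, 9), (41, 2)]
```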
Pilfered from… Online Learning for Latent Dirichlet Allocation, Matthew Hoffman, David Blei & Francis Bach, NIPS 2010