Lp-Sampling: Efficient Streaming Algorithm for Lp-Norm Estimations

This joint work by David Woodruff and Morteza Monemizadeh presents Lp-sampling, a 1-pass streaming algorithm that outputs a coordinate i of a high-dimensional vector x with probability proportional to |x_i|^p, up to a (1 ± ε) relative error. The sampler solves and unifies many well-studied streaming problems, including sampling with deletions, heavy hitters, cascaded moment estimation, and Fk-estimation.

Presentation Transcript


  1. Lp-Sampling. David Woodruff (IBM Almaden). Joint work with Morteza Monemizadeh (TU Dortmund).

  2. Given a stream of updates (i, a) to coordinates i of an n-dimensional vector x
     • |a| < poly(n)
     • a is an integer
     • stream length < poly(n)
     • Output i with probability |x_i|^p/F_p, where F_p = |x|_p^p = Σ_{i=1}^n |x_i|^p
     • Easy cases:
       • p = 1 and updates all of the form (i, 1) for some i. Solution: choose a random update in the stream, output the coordinate it updates [Alon, Matias, Szegedy]. Generalizes to all positive updates.
       • p = 0 and there are no deletions. Solution: min-wise hashing. Hash all distinct coordinates as you see them, and maintain the minimum hash and its item [Broder, Charikar, Frieze, Mitzenmacher] [Indyk] [Cormode, Muthukrishnan].
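
The two easy cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the stream is a list of (i, a) pairs, and `hash_fn` stands in for a min-wise independent hash function, which this sketch does not construct.

```python
import random

def l1_sample_insert_only(stream, rng):
    """Return the coordinate of a uniformly random update (reservoir of size 1).

    With unit insertions (i, 1), coordinate i is returned with probability x_i / F_1.
    """
    chosen = None
    for count, (i, a) in enumerate(stream, start=1):
        assert a == 1, "this sketch only handles unit insertions"
        if rng.randrange(count) == 0:  # keep the new update with probability 1/count
            chosen = i
    return chosen

def l0_sample_no_deletions(stream, hash_fn):
    """Return the distinct coordinate with the smallest hash value (p = 0 case)."""
    best = None
    for i, _ in stream:
        key = hash_fn(i)
        if best is None or key < best[0]:
            best = (key, i)
    return None if best is None else best[1]
```

Neither routine survives deletions: a deleted update may be the one held in the reservoir, and the minimum-hash item may be the one removed. That gap is what the rest of the talk addresses.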

  3. Our main result
     • For every 0 ≤ p ≤ 2, there is an algorithm that fails with probability ≤ n^{-100}, and otherwise outputs an I in [n] for which, for all j in [n], Pr[I = j] = (1 ± ε)|x_j|^p/F_p
     • Condition on every invocation succeeding in any poly(n)-time algorithm
     • Algorithm is 1-pass, poly(ε^{-1} log n)-space and update time, and also returns w_I = (1 ± ε)|x_I|^p/F_p for the sampled I
     • Generalizes to 1-pass n^{1-2/p} poly(ε^{-1} log n)-space for p > 2
     • "Additive-error" samplers, with Pr[I = j] = |x_j|^p/F_p ± εF_p, were given
       • explicitly in [Jayram, W]
       • implicitly in [Andoni, DoBa, Indyk, W]

  4. Lp-sampling solves and unifies many well-studied streaming problems:

  5. Solves Sampling with Deletions:
     • [Cormode, Muthukrishnan, Rozenbaum] want importance sampling with deletions: maintain a sample i with probability |x_i|/|x|_1
       • Set p = 1 in our theorem
     • [Chaudhuri, Motwani, Narasayya] ask to sample from the result of a SQL operation, e.g., self-join
       • Set p = 2 in our theorem
     • [Frahling, Indyk, Sohler] study maintaining approximate range spaces and costs of Euclidean spanning trees
       • They need and obtain a routine to sample a point from a set undergoing insertions and deletions
       • Alternatively, set p = 0 in our theorem

  6. Alternative solution to Heavy Hitters Problem for any F_p:
     • Output all i for which |x_i|^p > φ F_p
     • Do not output any i for which |x_i|^p < (φ/2) F_p
     • Studied by Charikar, Chen, Cormode, Farach-Colton, Ganguly, Muthukrishnan, and many others
     • Invoke our algorithm Õ(1/φ) times, use approximations to values
     • Optimal up to poly(ε^{-1} log n) factors
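
A minimal sketch of this reduction follows, assuming an offline stand-in for the streaming sampler. `exact_lp_sampler` is hypothetical: it reads the whole vector and returns the exact probability w_I, whereas the real sampler works in one pass and returns a (1 ± ε)-approximation. The 0.75·φ threshold and the repetition count are illustrative choices, not values from the talk.

```python
import math
import random

def exact_lp_sampler(x, p, rng):
    """Stand-in sampler: returns (I, w_I) with w_I = |x_I|^p / F_p."""
    weights = [abs(v) ** p for v in x]
    total = sum(weights)
    i = rng.choices(range(len(x)), weights=weights)[0]
    return i, weights[i] / total

def heavy_hitters(x, p, phi, rng, failure=1e-6):
    """Collect sampled coordinates whose reported mass clears a threshold
    placed between phi/2 and phi."""
    # A coordinate with mass > phi is missed by all draws with
    # probability at most (1 - phi)^reps <= failure.
    reps = math.ceil(math.log(1 / failure) / phi)
    found = set()
    for _ in range(reps):
        i, w = exact_lp_sampler(x, p, rng)
        if w >= 0.75 * phi:
            found.add(i)
    return found
```

For example, `heavy_hitters([10, 1, 1, 1, 1, -8, 0, 2], 2, 0.3, random.Random(1))` draws 47 samples and keeps only the coordinates holding at least 22.5% of F_2.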

  7. Solves Block Heavy Hitters: given an n x d matrix, return indices i of rows R_i with |R_i|_p^p > φ · Σ_j |R_j|_p^p
     • [Andoni, DoBa, Indyk] study the case p = 1
     • Used by [Andoni, Indyk, Krauthgamer] for constructing a small-size sketch for the Ulam metric under the edit distance
     • Treat R as a big (nd)-dimensional vector
     • Sample an entry (i, j) using our theorem for general p
     • The probability a row i is sampled is |R_i|_p^p / Σ_j |R_j|_p^p, so we can recover IDs of all the heavy rows
     • We do not use Cauchy random variables or Nisan's pseudorandom generator, so this could be more practical than [ADI]

  8. Alternative Solution to F_k-Estimation for any k ≥ 2:
     • Optimal up to poly(ε^{-1} log n) factors
     • Reduction given by [Coppersmith, Kumar]:
       • Take r = O(n^{1-2/k}) L_2-samples w_{i_1}, …, w_{i_r}
       • In parallel estimate F_2, call it F_2'
       • Output (F_2'/r) · Σ_j w_{i_j}^{k-2}
     • Proof: second moment method
     • First algorithm not to use Nisan's pseudorandom generator
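
The estimator is unbiased because an L_2-sample returns coordinate i with probability x_i^2/F_2, so the expectation of F_2 · |x_I|^{k-2} is Σ_i |x_i|^k = F_k. The sketch below checks this offline, under the assumption that exact sample values and an exact F_2 replace the approximations the streaming algorithm would supply. Function names are hypothetical.

```python
import random

def fk_from_l2_samples(x, k, r, rng):
    """Estimate F_k as (F_2 / r) * sum_j |x_{i_j}|^(k-2) over r exact L2-samples."""
    weights = [v * v for v in x]
    f2 = sum(weights)
    samples = rng.choices(range(len(x)), weights=weights, k=r)
    return (f2 / r) * sum(abs(x[i]) ** (k - 2) for i in samples)

def expected_single_sample(x, k):
    """Exact expectation of F_2 * |x_I|^(k-2) when Pr[I = i] = x_i^2 / F_2."""
    f2 = sum(v * v for v in x)
    return sum((v * v / f2) * f2 * abs(v) ** (k - 2) for v in x if v != 0)
```

For x = [3, -1, 2, 0, 5] and k = 3, `expected_single_sample` evaluates to F_3 = 27 + 1 + 8 + 125 = 161, up to floating-point rounding.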

  9. Solves Cascaded Moment Estimation:
     • Given an n x d matrix A, F_k(F_p)(A) = Σ_j |A_j|_p^{pk}
     • Problem initiated by [Cormode, Muthukrishnan]
       • Show F_2(F_0)(A) uses O(n^{1/2}) space if no deletions
       • Ask about complexity for other k and p
     • For any p in [0,2], gives O(n^{1-1/k}) space for F_k(F_p)(A)
       • We get entry (i, j) with probability |A_{i,j}|^p / Σ_{i',j'} |A_{i',j'}|^p
       • Probability row A_i is returned is F_p(A_i) / Σ_j F_p(A_j)
       • If 2 passes allowed, take O(n^{1-1/k}) samples A_i in 1st pass, compute F_p(A_i) in 2nd pass, and feed into F_k AMS estimator
       • To get 1 pass, feed row IDs into an O(n^{1-1/k})-space algorithm of [Jayram, W] for estimating F_k based only on item IDs
       • Algorithm is space-optimal [Jayram, W]
     • Our theorem with p = 0 gives O(n^{1/2}) space for F_2(F_0)(A) with deletions
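
To make the definitions concrete, the following offline sketch computes F_k(F_p)(A) and the row-sampling distribution induced by sampling a single entry. It assumes the matrix is available as a list of rows and uses hypothetical function names. It illustrates the quantities only, not the streaming algorithm.

```python
def f_p(row, p):
    """F_p of a row; for p = 0 this counts the nonzero entries."""
    return sum(1 if p == 0 else abs(v) ** p for v in row if v != 0)

def cascaded_moment(matrix, k, p):
    """F_k(F_p)(A) = sum over rows of F_p(row)^k."""
    return sum(f_p(row, p) ** k for row in matrix)

def row_sampling_probabilities(matrix, p):
    """Probability that a sampled entry (i, j) lands in each row i."""
    masses = [f_p(row, p) for row in matrix]
    total = sum(masses)
    return [m / total for m in masses]
```

For A = [[1, -2, 0], [0, 0, 3], [2, 2, 2]], the rows have 2, 1 and 3 nonzero entries, so F_2(F_0)(A) = 4 + 1 + 9 = 14.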

  10. Ok, so how does it work?

  11. General Framework [Indyk, W]
      1. Form streams by subsampling
      • S_t = {i | |x_i| in [η^{t-1}, η^t)} for η = 1 + Θ(ε)
      • S_t contributes if |S_t| η^{pt} ≥ ζ F_p(x), where ζ = poly(ε/log n)
        • assume p > 0 in talk
      • Let h: [n] -> [n] be a hash function
      • Create log n substreams Stream_1, Stream_2, …, Stream_{log n}
        • Stream_j is the stream restricted to updates (i, c) with h(i) ≤ n/2^j
      • Suppose 2^j ≈ |S_t|. Then
        • Stream_j contains about 1 item of S_t
        • F_p(Stream_j) ≈ F_p(x)/2^j
        • |S_t| η^{pt} ≥ ζ F_p(x) means η^{pt} ≥ ζ F_p(Stream_j)
        • Can find the item in S_t in Stream_j with F_p-heavy hitters algorithm
      • Repeat the sampling poly(ε^{-1} log n) times, count number of times there was an item in Stream_j from S_t
      • Use this to estimate sizes of contributing S_t, and F_p(x) ≈ Σ_t |S_t| η^{pt}
      2. Run heavy hitters algorithm on streams
      3. Use heavy hitters to estimate contributing S_t
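
The definitions in this framework can be illustrated offline. The sketch below assumes the full vector x and an explicit table h of hash values are available, which a streaming algorithm would not have. All function names are hypothetical.

```python
import math

def level_of(value, eta):
    """Index t with eta^(t-1) <= |value| < eta^t."""
    t = math.floor(math.log(abs(value), eta)) + 1
    # Correct for floating-point error at class boundaries.
    while abs(value) >= eta ** t:
        t += 1
    while abs(value) < eta ** (t - 1):
        t -= 1
    return t

def level_sets(x, eta):
    """Map t -> list of coordinates i whose |x_i| falls in class S_t."""
    classes = {}
    for i, v in enumerate(x):
        if v != 0:
            classes.setdefault(level_of(v, eta), []).append(i)
    return classes

def contributing(classes, eta, p, zeta, f_p):
    """Classes with |S_t| * eta^(pt) >= zeta * F_p."""
    return {t for t, items in classes.items()
            if len(items) * eta ** (p * t) >= zeta * f_p}

def substream(n, j, h):
    """Coordinates kept at level j: those with h(i) <= n / 2^j."""
    return [i for i in range(n) if h[i] <= n / 2 ** j]
```

Because every |x_i| in S_t lies within a factor η of η^t, the sum Σ_t |S_t| η^{pt} lies between F_p and η^p · F_p. This is why the class sizes alone suffice to approximate F_p.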

  12. Additive Error Sampler [Jayram, W]
      • For contributing S_t, we also get poly(ε^{-1} log n) items from the heavy hitters routine
      • If the sub-sampling is sufficiently random (Nisan's generator, min-wise independent), these items are random in S_t
      • Since we have (1 ± ε)-approximations s'_t to all contributing |S_t|, can:
        • Choose a contributing t with probability s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'}
        • Output a random heavy hitter found in S_t
      • For item i in contributing S_t,
        • Pr[i output] = [s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'}] · 1/|S_t| = (1 ± ε)|x_i|^p/F_p
      • For item i in non-contributing S_t,
        • Pr[i output] = 0
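
The output distribution of this two-step sampler can be computed directly. The sketch below is a simplification resting on two assumptions: the estimates s'_t equal the exact class sizes, and every class passed in is contributing. The function name is hypothetical.

```python
def output_probabilities(classes, eta, p):
    """classes maps t -> list of coordinate ids in S_t (all assumed contributing).

    Returns Pr[i output] = (|S_t| eta^(pt) / sum_t' |S_t'| eta^(pt')) * (1 / |S_t|).
    """
    mass = {t: len(items) * eta ** (p * t) for t, items in classes.items()}
    total = sum(mass.values())
    probs = {}
    for t, items in classes.items():
        for i in items:
            # Pick class t, then a uniformly random item inside it.
            probs[i] = (mass[t] / total) * (1 / len(items))
    return probs
```

With exact class sizes, each output probability is within a factor η^p of |x_i|^p/F_p in either direction. With η = 1 + Θ(ε), that is the (1 ± ε) guarantee stated on the slide for constant p.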

  13. Relative Error in Words
      • Force all classes to contribute
      • Inject additional coordinates in each class whose purpose is to make every class contribute
      • Inject just enough so that overall, F_p does not change by more than a (1+ε)-factor
      • Run [Jayram, W]-sampling on resulting vector
      • If the item sampled is an injected coordinate, forget about it
      • Repeat many times in parallel and take the first repetition that is not an injected coordinate
      • Since injected coordinates only contribute O(ε) to F_p mass, a small number of repetitions suffices

  14. Some Minor Points
      • Before seeing the stream, we don't know which classes contribute, so we inject coordinates into every class
      • For S_t = {i | |x_i| in [η^{t-1}, η^t)}, inject Θ(εF_p/(η^{pt} · #classes)) coordinates, where #classes = O(ε^{-1} log n)
      • Need to know F_p: just guess it, verify at end of stream
      • For some classes, Θ(εF_p/(η^{pt} · #classes)) < 1, e.g. if t is very large, so we can't inject any new coordinates
      • Find all elements in these classes and (1 ± ε)-approximations to their frequencies separately using a heavy hitters algorithm
      • When sampling, either choose a heavy hitter with the appropriate probability, or select from contributing sets using [Jayram, W]
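
The injection budget can be checked with a little arithmetic. The sketch below assumes the constant hidden in the Θ(·) is 1 and that F_p is known, whereas the talk guesses it. The function name is hypothetical.

```python
import math

def injection_counts(f_p, eps, eta, p, class_ids):
    """Number of coordinates to inject into each class S_t.

    Each class gets floor(eps * F_p / (eta^(pt) * #classes)) coordinates,
    so its injected mass is at most eps * F_p / #classes.
    """
    num_classes = len(class_ids)
    return {t: math.floor(eps * f_p / (eta ** (p * t) * num_classes))
            for t in class_ids}
```

For example, with F_p = 1000, ε = 0.125, η = 2, p = 1 and classes 1 to 5, the counts are 12, 6, 3, 1 and 0. The injected mass is at most 24 + 24 + 24 + 16 = 88, below the budget εF_p = 125. Class 5 receives nothing, which is the case the slide hands to a separate heavy hitters routine.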

  15. There is a Problem
      • The [Jayram, W]-sampler fails with probability ≥ poly(ε/log n), in which case it can output any item
      • This is due to some of the subroutines of [Indyk, W] that it relies on, which only succeed with this probability
      • So the large poly(ε/log n) additive error is still there
      • Cannot repeat [Jayram, W] multiple times for amplification, since we get a collection of samples, and no obvious way of detecting failure
      • On the other hand, could just repeat [Indyk, W] and take the median for the simpler F_k-estimation problem
      • Our solution:
        • Dig into the guts of the [Indyk, W] algorithm
        • Amplify success probability of subroutines to ≥ 1 - n^{-100}

  16. A Technical Point About [Indyk, W]
      • In [Indyk, W],
        • Create log n substreams Stream_j, where Stream_j includes each coordinate independently with probability 2^{-j}
        • Can find the items in contributing S_t in Stream_j with F_p-heavy hitters
        • Repeat the sampling poly(ε^{-1} log n) times, observe the fraction of times there is an item in Stream_j from S_t
      • Can use [Indyk, W] to estimate every |S_t| since every class contributes
      • Issue of misclassification
        • S_t = {i | |x_i| in [η^{t-1}, η^t)}, and F_p-heavy hitters algorithm only reports approximate frequencies of items i it finds
        • If |x_i| = η^t, it may be classified into S_t or S_{t+1}; it doesn't matter
      • Simpler solution than in [Indyk, W]
        • If item misclassified, just classify it consistently if we see it again
        • Equivalent to sampling from x' with |x'|_p = (1 ± ε)|x|_p
      • Can ensure with probability ≥ 1 - n^{-100}, we obtain s'_t = (1 ± ε)|S_t| for all t

  17. A Technical Point About [Jayram, W]
      • Since we have s'_t = (1 ± ε)|S_t| for all t
        • Choose a class t with probability s'_t η^{pt} / Σ_{t'} s'_{t'} η^{pt'}
        • Output a random heavy hitter found in S_t
      • How do we output a random item in S_t?
        • Min-wise independent hash function h
          • For each i in S_t, h(i) = min_{j in S_t} h(j) with probability (1 ± ε)/|S_t|
          • h can be an O(log 1/ε)-wise independent hash function
        • We recover i* in S_t for which h(i*) is minimum
          • Compatible with sub-sampling, where Stream_j is items i for which h(i) ≤ n/2^j
      • Our goal is to recover i* with probability ≥ 1 - n^{-100}
        • We have s'_t, and look at the level j* where |S_t|/2^{j*} = Θ(log n)
        • If h is O(log n)-wise independent, then with probability ≥ 1 - n^{-100}, i* is in Stream_{j*}
        • A worry: maybe F_p(Stream_{j*}) >> F_p(x)/2^{j*}, so the heavy hitters algorithm doesn't work
        • Can be resolved with enough independent repetitions
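
The interplay between the minimum-hash item and the sub-sampling levels can be sketched as follows. This is a simplification: `random_hash` draws fully random values as a stand-in for the limited-independence hash function on the slide, and the constant in Θ(log n) is taken to be 1 with base-2 logarithms. All names are hypothetical.

```python
import math
import random

def random_hash(n, seed):
    """Table of hash values in [1, n] (fully random stand-in)."""
    rng = random.Random(seed)
    return [rng.randint(1, n) for _ in range(n)]

def min_hash_item(class_items, h):
    """Return i* in S_t minimizing h(i)."""
    return min(class_items, key=lambda i: h[i])

def recovery_level(class_size_estimate, n):
    """Deepest level j with class_size / 2^j >= log2(n) (0 if the class is small)."""
    j = 0
    while class_size_estimate / 2 ** (j + 1) >= math.log2(n):
        j += 1
    return j

def in_substream(i, j, n, h):
    """Stream_j keeps the items with h(i) <= n / 2^j."""
    return h[i] <= n / 2 ** j
```

With n = 1024 and a class of 160 items, `recovery_level` returns 4, so about 10 class members are expected to survive in Stream_4. The one with the smallest hash value is among the survivors unless no member survives at all, which happens with probability roughly e^{-10} for a fully random hash.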

  18. Beyond the Moraines: Sampling Records
      • Given an n x d matrix M of rows M_1, …, M_n, sample i with probability |M_i|_X / Σ_j |M_j|_X, where X is a norm
      • If i sampled, return a vector v for which |v|_X = (1 ± ε)|M_i|_X
      • Applications
        • Estimating planar EMD [Andoni, DoBa, Indyk, W]
        • Sampling records in a relational database
      • Define classes S_t = {i | |M_i|_X in [η^{t-1}, η^t)} for η = 1 + Θ(ε)
      • If we have a heavy hitters algorithm for rows of a matrix, then we can apply a similar approach as before
      • Space should be d · poly(ε^{-1} log n)

  19. Heavy Hitters for Rows
      • Algorithm in [Andoni, DoBa, Indyk, W]
        • Partition rows into B buckets
        • In each bucket maintain the vector sum of rows hashed to it
      • If |M_i|_X > γ Σ_j |M_j|_X, and if v is the vector in the bucket containing M_i, then by the triangle inequality
        • |v|_X < |M_i|_X + |Noise|_X ≈ |M_i|_X + Σ_j |M_j|_X/B
        • |v|_X > |M_i|_X - |Noise|_X ≈ |M_i|_X - Σ_j |M_j|_X/B
      • For B large enough, noise translates to a relative error
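
The bucketing step and the triangle-inequality bounds can be sketched offline. The assumptions here are that the norm X is the L1 norm, chosen only as an example, and that `bucket_of` is a hypothetical stand-in for the hash function assigning rows to buckets.

```python
def l1_norm(v):
    return sum(abs(c) for c in v)

def bucket_sums(rows, bucket_of, num_buckets):
    """Maintain, per bucket, the vector sum of the rows hashed to it."""
    d = len(rows[0])
    sums = [[0.0] * d for _ in range(num_buckets)]
    for i, row in enumerate(rows):
        b = bucket_of(i)
        for c, v in enumerate(row):
            sums[b][c] += v
    return sums

def norm_bounds(rows, bucket_of, i):
    """Triangle-inequality bounds on the norm of the bucket containing row i."""
    noise = sum(l1_norm(r) for j, r in enumerate(rows)
                if j != i and bucket_of(j) == bucket_of(i))
    own = l1_norm(rows[i])
    return own - noise, own + noise
```

When a heavy row shares its bucket only with light rows, the bucket's norm is close to the heavy row's norm. Increasing B shrinks the expected noise in each bucket to about a 1/B fraction of the total mass.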

  20. Lower Bounds
      • For every 0 ≤ p ≤ 2, there is a randomized algorithm that with probability ≤ n^{-100} outputs FAIL, and otherwise outputs an I in [n] for which, for all j in [n], Pr[I = j] = (1 ± ε)|x_j|^p/F_p
      • Algorithm is 1-pass, poly(ε^{-1} log n)-space and time, returns w_I = (1 ± ε)|x_I|^p/F_p
      • For p > 2, gives n^{1-2/p} poly(ε^{-1} log n)-space
      • Can we use less space for p > 2?
        • Requires Ω(n^{1-2/p}) space for any ε. Reduction from L_∞-estimation
        • Can improve to Ω(n^{1-2/p} log n) using augmented L_∞-estimation [Jayram, W]
      • Can we output FAIL with probability 0?
        • Requires Ω(n) space for any ε. Reduction from 2-party equality testing with no error
      • Given that we don't output FAIL, can we get a sampler with ε = 0?
        • Yes for 2-pass algorithms, using rejection sampling
        • 1-pass requires Ω(n) space if algorithm outputs the corresponding probability w_I (needed in many applications). Reduction from the 2-party INDEX problem

  21. Some Open Questions
      • 1-pass algorithms for Lp-sampling
        • If we output FAIL with probability ≤ n^{-100}, and don't require outputting the sampled item's probability, can we get ε = 0 with low space?
        • ε and log n factors are large. What is the optimal dependence on them?
        • Useful for F_k-estimation for k > 2, and other applications
      • Sampling from other distributions
        • Given a vector (x_1, …, x_n) in a data stream, for which functions g can we sample from the distribution μ(i) = |g(x_i)| / Σ_j |g(x_j)|?
        • E.g., random walks
      Thank you
