
Estimating L2 Norm



Presentation Transcript


  1. Estimating L2 Norm (MIT, Piotr Indyk)

  2. Basic Data Stream Model
  • Single pass over the data i_1, i_2, …, i_n, each item from 0…m
  • Typically, we assume n, m are known
  • Bounded storage (often log^O(1) n)
  • Units of storage: bits, words, or "elements" (e.g., points, nodes/edges)
  • Randomness and approximation OK (in fact, almost always necessary)
  • Last lecture: estimating the number of distinct elements, up to 1±ε, with probability 2/3, using space O(log n + 1/ε^2)
  • Example stream (about 8 distinct elements): 8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 ...
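
To make the single-pass model concrete, here is a minimal Python sketch (not from the slides) that consumes the example stream in one pass. Counting distinct elements exactly, as done here, needs memory proportional to the number of distinct elements; the bounded-storage algorithms in this lecture avoid exactly that.

    # Single pass over a stream of items i_1, ..., i_n (illustrative only).
    # Exact distinct counting like this needs memory proportional to the
    # number of distinct elements; streaming algorithms use far less.
    stream = [8, 2, 1, 9, 1, 9, 2, 4, 6, 3, 9, 4, 2, 3, 4, 2, 3, 8, 5, 2, 5, 6]

    seen = set()
    for i in stream:          # one pass, items arrive one at a time
        seen.add(i)
    print(len(seen))          # 8 distinct elements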

  3. Generalization
  • Vector x, indexed by 0, 1, …, m
  • A stream can be viewed as a sequence of updates (i, a): x_i = x_i + a (initially x = 0); a small example is sketched below
  • The basic streaming model corresponds to updates (i, 1)
  • In general, a can be negative
  • Number of distinct elements = number of non-zero coordinates of x, denoted ||x||_0
  • Algorithms similar to those from the previous lecture work for ||x||_0 as well
  • Today: two methods for estimating ||x||_2
    - Alon-Matias-Szegedy (AMS): really cute and simple
    - Johnson-Lindenstrauss (JL): really powerful, needed in future lectures
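
A tiny illustration (assumed Python, not from the slides) of the update view: the stream is a sequence of pairs (i, a), the vector x is accumulated, and ||x||_0 and ||x||_2 are the quantities we want to estimate. Note that this toy code stores x exactly, which the sketches below deliberately avoid.

    import math
    from collections import defaultdict

    # Stream of updates (i, a): x_i <- x_i + a, with x = 0 initially.
    updates = [(3, 1), (7, 2), (3, 1), (5, -1), (7, -2)]

    x = defaultdict(int)
    for i, a in updates:
        x[i] += a

    l0 = sum(1 for v in x.values() if v != 0)          # ||x||_0
    l2 = math.sqrt(sum(v * v for v in x.values()))     # ||x||_2
    print(l0, l2)   # x has x_3 = 2, x_5 = -1, x_7 = 0  ->  2, sqrt(5)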

  4. Why estimate the L2 norm?
  • Database join (on attribute A) of relations Rel1, Rel2:
    all triples (Rel1.A, Rel1.B, Rel2.B) such that Rel1.A = Rel2.A
  • Self-join: the case Rel1 = Rel2
  • Size of the self-join: ∑_val Rows(val)^2, where Rows(val) is the number of rows with A = val (worked example below)
  • Updates to the relation increment/decrement Rows(val)
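
A small worked example (assumed Python, hypothetical relation) showing that the self-join size equals ||x||_2^2 of the vector x of value frequencies, which is why an L2-norm estimate gives a self-join size estimate under updates.

    from collections import Counter

    # Hypothetical relation Rel1, restricted to its join attribute A.
    rel1_A = ["red", "red", "blue", "green", "blue", "red"]

    rows = Counter(rel1_A)                                # Rows(val) for each value of A
    self_join_size = sum(c * c for c in rows.values())    # sum_val Rows(val)^2

    # Direct check: count ordered pairs of rows that agree on A.
    direct = sum(1 for a in rel1_A for b in rel1_A if a == b)
    print(self_join_size, direct)                         # 3^2 + 2^2 + 1^2 = 14, 14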

  5. Algorithm I: AMS

  6. Alon-Matias-Szegedy '96
  • Choose r_1 … r_m to be i.i.d. random variables with Pr[r_i = 1] = Pr[r_i = -1] = 1/2
  • Maintain Z = ∑_i r_i x_i under increments/decrements to x_i
  • Return Y = Z^2 as an estimate for ||x||_2^2 (sketched in code below)
  • Analysis:
    - compute the expectation of Y
    - bound the variance of Y (or the second moment)
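
A minimal sketch of a single AMS estimator in Python, assuming the signs r_i are stored explicitly for clarity (the slides note that 4-wise independent signs generated from O(log m) random bits suffice, which is what one would use in practice).

    import random

    class AMSEstimator:
        """One AMS counter: Z = sum_i r_i * x_i, estimate Y = Z^2."""

        def __init__(self, m, seed=0):
            rng = random.Random(seed)
            self.r = [rng.choice([-1, 1]) for _ in range(m + 1)]  # r_0 ... r_m
            self.Z = 0

        def update(self, i, a=1):
            # Stream update (i, a), meaning x_i <- x_i + a.
            self.Z += self.r[i] * a

        def estimate(self):
            # Y = Z^2 is an unbiased estimator of ||x||_2^2.
            return self.Z ** 2

For example, feeding the updates (3, 1), (7, 2), (3, 1) and calling estimate() returns one (noisy) estimate of ||x||_2^2; the variance is reduced later by averaging several independent copies.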

  7. Expectation
  • The expectation of Z^2 = (∑_i r_i x_i)^2 is
    E[Z^2] = E[∑_{i,j} r_i x_i r_j x_j] = ∑_{i,j} x_i x_j E[r_i r_j]
  • We have:
    - for i ≠ j, E[r_i r_j] = E[r_i] E[r_j] = 0, so the term disappears
    - for i = j, E[r_i r_j] = E[r_i^2] = E[1] = 1
  • Therefore E[Z^2] = ∑_i x_i^2 = ||x||_2^2 (unbiased estimator)
  • Now we just need to bound the variance
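
An illustrative numerical check of unbiasedness (not from the slides): averaging Y = Z^2 over many independent draws of the sign vector should approach ||x||_2^2.

    import random

    def ams_Y(x, rng):
        # One draw of the signs and the resulting estimate Y = Z^2.
        r = [rng.choice([-1, 1]) for _ in x]
        Z = sum(ri * xi for ri, xi in zip(r, x))
        return Z ** 2

    x = [3, -1, 0, 2, 5]
    true_l2_sq = sum(xi ** 2 for xi in x)           # 39
    rng = random.Random(0)
    trials = 100_000
    avg_Y = sum(ams_Y(x, rng) for _ in range(trials)) / trials
    print(true_l2_sq, round(avg_Y, 2))              # the two should be close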

  8. Bounding the second moment
  • The second moment of Z^2 = (∑_i r_i x_i)^2 equals the expectation of
    Z^4 = (∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)
  • This can be decomposed into a sum of:
    - ∑_i (r_i x_i)^4 → expectation = ∑_i x_i^4
    - 6 ∑_{i<j} (r_i r_j x_i x_j)^2 → expectation = 6 ∑_{i<j} x_i^2 x_j^2
    - terms involving a single multiplier r_i x_i (e.g., r_1 x_1 r_2 x_2 r_3 x_3 r_4 x_4) → expectation = 0
  • Total: ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 ≤ 3 (∑_i x_i^2)^2
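
A worked expansion of the final inequality, added here for completeness (it is implicit on the slide), written in LaTeX:

    \mathrm{E}[Z^4]
      = \sum_i x_i^4 + 6\sum_{i<j} x_i^2 x_j^2
      \le 3\sum_i x_i^4 + 6\sum_{i<j} x_i^2 x_j^2
      = 3\Big(\sum_i x_i^4 + 2\sum_{i<j} x_i^2 x_j^2\Big)
      = 3\Big(\sum_i x_i^2\Big)^2

so Var[Y] ≤ E[Z^4] ≤ 3 (∑_i x_i^2)^2, which is the bound used on the next slide.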

  9. Analysis, ctd.
  • We have an estimator Y = Z^2
    - E[Y] = ∑_i x_i^2
    - σ^2 = Var[Y] ≤ 3 (∑_i x_i^2)^2
  • Chebyshev inequality: Pr[ |E[Y] - Y| ≥ cσ ] ≤ 1/c^2
  • Algorithm AMS+ (sketched below): maintain Z_1 … Z_k (and thus Y_1 … Y_k), define Y' = ∑_i Y_i / k
    - E[Y'] = k ∑_i x_i^2 / k = ∑_i x_i^2
    - σ'^2 = Var[Y'] ≤ 3k (∑_i x_i^2)^2 / k^2 = 3 (∑_i x_i^2)^2 / k
  • Guarantee: Pr[ |Y' - ∑_i x_i^2| ≥ c (3/k)^(1/2) ∑_i x_i^2 ] ≤ 1/c^2
  • Setting c to a constant and k = O(1/ε^2) gives a (1±ε)-approximation with constant probability
  • Total space: O(log(nm) / ε^2) bits (not counting the r_i's)
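
A minimal AMS+ sketch in Python, again assuming the k sign vectors are stored explicitly for readability (the real algorithm would generate 4-wise independent signs from O(log m) bits).

    import random

    class AMSPlus:
        """k independent AMS counters; the estimate is their average."""

        def __init__(self, m, k, seed=0):
            rng = random.Random(seed)
            self.r = [[rng.choice([-1, 1]) for _ in range(m + 1)] for _ in range(k)]
            self.Z = [0] * k

        def update(self, i, a=1):
            # Apply the stream update (i, a) to every counter.
            for j in range(len(self.Z)):
                self.Z[j] += self.r[j][i] * a

        def estimate(self):
            # Y' = average of the k independent estimators Y_j = Z_j^2.
            return sum(z * z for z in self.Z) / len(self.Z)

For ε = 0.1, a k of a few hundred (on the order of 3c^2/ε^2 for a small constant c) is in the spirit of the guarantee above; the exact constant here is an assumption, not taken from the slides.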

  10. Comments
  • Only needed 4-wise independence of r_0 … r_m
    - can generate such variables from O(log m) random bits (previous lecture)
  • What we did: maintain a "linear sketch" vector Z = [Z_1 ... Z_k] = R x (illustrated below)
    - estimator for ||x||_2^2: (Z_1^2 + ... + Z_k^2)/k = ||Rx||_2^2 / k
  • "Dimensionality reduction": x → Rx … but the tail is somewhat "heavy"
    - reason: we only used the second moment of the estimator
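
The same estimator in matrix form, as an illustrative NumPy sketch (the library choice is an assumption): R is a k × (m+1) matrix of independent ±1 entries, the sketch is Z = Rx, and ||Rx||_2^2 / k estimates ||x||_2^2. Because the sketch is linear, the sketches of two streams can simply be added to get the sketch of their combined stream.

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 1000, 400
    R = rng.choice([-1.0, 1.0], size=(k, m + 1))   # the sketch matrix

    x = np.zeros(m + 1)
    x[[3, 7, 42]] = [5.0, -2.0, 1.0]

    Z = R @ x                                      # linear sketch Z = R x
    print(np.dot(Z, Z) / k, np.dot(x, x))          # estimate vs. exact ||x||_2^2 (30.0)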

  11. Algorithm II: Dim. Reduction (JL)

  12. Interlude: Normal Distribution
  • Normal distribution N(0,1):
    - range: (-∞, ∞)
    - density: f(x) = e^(-x^2/2) / (2π)^(1/2)
    - mean = 0, variance = 1
  • Basic facts:
    - if X and Y are independent r.v.'s with normal distributions, then X+Y has a normal distribution
    - Var(cX) = c^2 Var(X)
    - if X, Y are independent, then Var(X+Y) = Var(X) + Var(Y)
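
A quick empirical illustration of these facts (not from the slides): for independent X, Y ~ N(0,1), the sum X+Y should have variance 1+1 = 2, and 3X should have variance 9.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal(1_000_000)
    Y = rng.standard_normal(1_000_000)
    print(np.var(X + Y))    # close to 2
    print(np.var(3 * X))    # close to 9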

  13. A different linear sketch
  • Instead of ±1, let r_i be i.i.d. random variables from N(0,1) (a code sketch follows below)
  • Consider Z = ∑_i r_i x_i
  • We still have E[Z^2] = ∑_i x_i^2 = ||x||_2^2, since:
    - E[r_i] E[r_j] = 0 for i ≠ j
    - E[r_i^2] = variance of r_i, i.e., 1
  • As before we maintain Z = [Z_1 … Z_k] and define Y = ||Z||_2^2 = ∑_j Z_j^2 (so that E[Y] = k ||x||_2^2)
  • We show that there exists C > 0 such that for small enough ε > 0,
    Pr[ |Y - k ||x||_2^2| > ε k ||x||_2^2 ] ≤ exp(-C ε^2 k)
  • Set k = O(1/ε^2 · log(1/δ)) to get a 1±ε approximation with probability 1-δ
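
A minimal JL-style sketch in Python/NumPy, assuming a dense Gaussian matrix is stored explicitly (the slides discuss generating the r_i's more compactly); the constant inside k = O(1/ε^2 · log(1/δ)) used below is an assumption for illustration only.

    import math
    import numpy as np

    def jl_sketch_matrix(m, eps, delta, seed=0):
        # k = O(1/eps^2 * log(1/delta)); the constant 4 is an assumption.
        k = math.ceil(4 / eps**2 * math.log(1 / delta))
        rng = np.random.default_rng(seed)
        return rng.standard_normal((k, m + 1))     # i.i.d. N(0,1) entries

    def estimate_l2_sq(R, x):
        Z = R @ x                                  # the linear sketch Z = R x
        return np.dot(Z, Z) / R.shape[0]           # E[ ||Z||^2 / k ] = ||x||_2^2

    # usage
    m = 500
    x = np.zeros(m + 1)
    x[[0, 10, 99]] = [2.0, -3.0, 6.0]
    R = jl_sketch_matrix(m, eps=0.1, delta=0.01, seed=2)
    print(estimate_l2_sq(R, x), np.dot(x, x))      # estimate vs. exact (49.0)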

  14. Proof
  • See the attached notes, by Ben Rossman and Michel Goemans

  15. JL - comments
  • Can use k-wise independence to generate the r_i's, but this is much messier than for AMS
  • Time to compute the sketch vector Z from x is O(dk)
    - good if k is small, not so great if k is large
  • Fast JL and Sparse JL to the rescue (a few lectures from now)
