
Sketching in Adversarial Environments Or Sublinearity and Cryptography


Presentation Transcript


  1. Sketching in Adversarial Environments, or Sublinearity and Cryptography. Moni Naor. Joint work with Ilya Mironov and Gil Segev

  2. Comparing Streams • How to compare two data streams SA and SB without storing them? • Step 1: Compress the data on-line into sketches • Step 2: Interact using only the sketches • Goal: Minimize sketch size, update time, and communication

  3. Comparing Streams • How to compare data streams that cannot be stored? • Real-life applications: massive data sets, on-line data, ... • Highly efficient solutions exist assuming shared randomness

  4. Comparing Streams • How to compare data streams that cannot be stored? Example application: plagiarism detection • Is shared randomness a reasonable assumption? • No guarantees when the randomness is set adversarially • Inputs may be adversarially chosen depending on the randomness

  5. The Adversarial Sketch Model • “Adversarial” factors: no secrets, adversarially-chosen inputs • The model sits at the intersection of communication complexity and massive data sets (sketching, streaming)

  6. The Adversarial Sketch Model • Goal: Compute f(A,B) • Sketch phase (want small sketches, fast updates): • An adversary chooses the inputs of the parties • Inputs are provided as on-line sequences of insert and delete operations • No shared secrets: the parties are not allowed to communicate, and any public information is known to the adversary in advance • The adversary is computationally all-powerful • Interaction phase (want low communication & computation)

  7. Our Results • Equality testing • A, B ⊆ [N] of size at most K • Error probability ε • If we had public randomness: sketches of size O(log(1/ε)), with similar update time, communication and computation • Lower Bound: Equality testing in the adversarial sketch model requires sketches of size Ω((K·log(N/K))^(1/2))

  8. Our Results • Equality testing • A, B ⊆ [N] of size at most K • Error probability ε • Lower Bound: Equality testing in the adversarial sketch model requires sketches of size Ω((K·log(N/K))^(1/2)) • Upper Bound: Explicit and efficient protocol with sketches of size O((K·polylog(N)·log(1/ε))^(1/2)), and update time, communication and computation polylog(N)

  9. Our Results • Symmetric difference approximation • A, B ⊆ [N] of size at most K • Goal: approximate |A Δ B| with error probability ε • Upper Bound: (1 + ρ)-approximation for any constant ρ, with sketches of size O((K·polylog(N)·log(1/ε))^(1/2)) and update time, communication and computation polylog(N) • Explicit construction: polylog(N)-approximation

  10. Outline • Lower bound • Equality testing • Main tool: Incremental encoding • Explicit construction using dispersers • Symmetric difference approximation • Summary & open problems

  11. Simultaneous Messages Model • Two parties hold inputs x and y; each sends a single message to a referee, who must output f(x,y)

  12. Simultaneous Messages Model • Lower Bound: Equality testing in the private-coin SM model requires communication Ω((K·log(N/K))^(1/2)) [NS96, BK97] • Sketches in the adversarial sketch model play the role of messages in the SM model, so the bound carries over

  13. Outline • Lower bound • Equality testing • Main tool: Incremental encoding • Explicit construction using dispersers • Symmetric difference approximation • Summary & open problems

  14. Simultaneous Equality Testing • Encode the inputs x and y (of size K) as codewords C(x) and C(y) • Arrange each codeword as a K^(1/2) × K^(1/2) matrix • Communication: K^(1/2)

  15. First Attempt • One party sends a random row of its codeword (say row = 3 of C(A)), the other a random column (say col = 2 of C(B)); the referee compares the two values at the intersecting position, e.g. entry C(B)_{3,2} • Sketches of size K^(1/2) • Problem: update time K^(1/2)
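The row/column check above can be sketched as follows. This is a toy illustration, not the talk's construction: the `cell` hash is a heuristic stand-in for a real error-correcting encoding (the actual protocol needs a code with a guaranteed distance), and all function names are mine.

```python
import hashlib
import random

def cell(x: bytes, i: int, j: int) -> int:
    # Heuristic stand-in for entry (i, j) of an encoding C(x): for distinct
    # inputs the entries disagree on about half of the positions.
    h = hashlib.sha256(x + i.to_bytes(4, "big") + j.to_bytes(4, "big")).digest()
    return h[0] & 1

def sm_equality(x: bytes, y: bytes, m: int = 32, reps: int = 64) -> bool:
    # One party sends a random row of C(x), the other a random column of C(y);
    # the referee compares the two values at the intersecting position.
    # Repeating drives the error probability down to about 2^-reps.
    for _ in range(reps):
        i = random.randrange(m)                    # row index chosen for C(x)
        row = [cell(x, i, jj) for jj in range(m)]  # row i of C(x)
        j = random.randrange(m)                    # column index chosen for C(y)
        col = [cell(y, ii, j) for ii in range(m)]  # column j of C(y)
        if row[j] != col[i]:                       # entry (i, j) differs
            return False
    return True
```

Note the slide's complaint is visible here: updating a single element of the input forces recomputing many entries, so the update time matches the sketch size K^(1/2).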

  16. Incrementality vs. Distance • Incrementality: Given C(S) and x ∈ [N], the encodings of S ∪ {x} and S \ {x} are obtained by modifying very few (logarithmically many) entries • High distance: For every distinct A, B ⊆ [N] of size at most K, d(C(A), C(B)) > 1 - ε for a constant ε • Impossible to achieve both properties simultaneously with the Hamming distance

  17. Incremental Encoding • S → C(S)_1, ..., C(S)_r • d(C(A), C(B)) = 1 - ∏_{i=1}^{r} (1 - dH(C(A)_i, C(B)_i)), where dH is the normalized Hamming distance • r = 1: plain Hamming distance • Hope: larger r will enable fast updates • r corresponds to the communication complexity of our protocol, so we want to keep r as small as possible • Explicit construction with r = log K: codeword size K·polylog(N), update time polylog(N)
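The distance measure on the slide can be written out directly. The function names are illustrative, and codewords are modeled simply as lists of symbols:

```python
def hamming(u, v):
    # Normalized Hamming distance between two equal-length codewords.
    return sum(a != b for a, b in zip(u, v)) / len(u)

def d(CA, CB):
    # d(C(A), C(B)) = 1 - prod_{i=1}^{r} (1 - dH(C(A)_i, C(B)_i)):
    # zero iff every codeword pair agrees everywhere, and close to 1 as
    # soon as a single pair of codewords is far apart.
    prod = 1.0
    for ui, vi in zip(CA, CB):
        prod *= 1.0 - hamming(ui, vi)
    return 1.0 - prod
```

For r = 1 this collapses to the plain normalized Hamming distance, matching the slide's remark.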

  18. Equality Protocol • For each pair of codewords C(A)_i, C(B)_i the parties compare randomly chosen rows against columns together with their values (e.g. rows (3,1,1) against cols (2,3,1)) • Error probability: ∏_{i=1}^{r} (1 - dH(C(A)_i, C(B)_i)) = 1 - d(C(A), C(B)) < ε

  19. The Encoding • Global encoding • Map each element to several entries of each codeword • Exploit “random-looking” graphs • Local encoding • Resolve collisions separately in each entry • A simple solution when |A Δ B| is guaranteed to be small

  20. The Local Encoding • Suppose that |A Δ B| ≤ ℓ

  21. Missing Number Puzzle • Let S = {1, ..., N} \ {i} • π – a random permutation over S • π(1), ..., π(N-1) arrive as a one-way stream • One number i is missing • Goal: Determine the missing number i using O(log N) bits • What if there are ℓ missing numbers? Can it be done using O(ℓ·log N) bits?
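The single-missing-number case has a classic O(log N)-bit solution: maintain a running sum over the stream and subtract it from the known total. A minimal sketch (the function name is mine):

```python
def missing_number(stream, n):
    # The stream is a permutation of {1, ..., n} with one element removed.
    # Keeping only a running sum (O(log n) bits of state) recovers it.
    total = n * (n + 1) // 2
    for x in stream:
        total -= x
    return total
```

For ℓ missing numbers, tracking ℓ such linear aggregates (as in the local encoding on the next slide) uses O(ℓ·log N) bits.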

  22. The Local Encoding • Suppose that |A Δ B| ≤ ℓ • A simple & well-known solution: associate each x ∈ [N] with a vector v(x) such that for any distinct x_1, ..., x_ℓ the vectors v(x_1), ..., v(x_ℓ) are linearly independent • C(S) = Σ_{x ∈ S} v(x) • If 1 ≤ |A Δ B| ≤ ℓ then C(A) ≠ C(B) • For example v(x) = (1, x, ..., x^(ℓ-1)) • Size & update time O(ℓ·log N), independent of the size of the sets
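The Vandermonde-based local encoding can be sketched over a prime field. The prime `P` and the function name are assumptions for illustration; arithmetic is mod P so that the ±1 coefficients of elements in A Δ B cannot vanish:

```python
P = 2_147_483_647  # a prime larger than the universe size N (assumed)

def local_encode(S, ell, p=P):
    # C(S) = sum_{x in S} v(x) mod p, with v(x) = (1, x, ..., x^(ell-1)).
    # Any ell distinct Vandermonde vectors are linearly independent, so
    # 1 <= |A delta B| <= ell forces C(A) != C(B). Inserting or deleting
    # a single x just adds or subtracts v(x): the encoding is incremental.
    code = [0] * ell
    for x in S:
        vx = 1
        for k in range(ell):
            code[k] = (code[k] + vx) % p
            vx = (vx * x) % p
    return code
```

The state is ℓ field elements regardless of |S|, matching the slide's O(ℓ·log N) bound.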

  23. The Global Encoding • Each element of the universe [N] is mapped into several entries of each codeword C_1, C_2, C_3, ... • The content of each entry is locally encoded

  24. The Global Encoding • Each element is mapped into several entries of each codeword • The content of each entry is locally encoded • The local guarantee: if 1 ≤ |C_i[y] ∩ (A Δ B)| ≤ ℓ then C(A) and C(B) differ on the entry C_i[y] • For ℓ = 1: C(A) and C(B) differ at least on the entries hit by exactly one element of A Δ B

  25. The Global Encoding • Identify each codeword with a bipartite graph G = ([N], R, E) • For S ⊆ [N] define Γ(S, ℓ) ⊆ R as the set of all y ∈ R for which 1 ≤ |Γ(y) ∩ S| ≤ ℓ • (K, ε, ℓ)-Bounded-Neighbor Disperser: for any S ⊆ [N] such that K ≤ |S| ≤ 2K it holds that |Γ(S, ℓ)| > (1 - ε)|R|

  26. The Global Encoding • Bounded-Neighbor Dispersers: r = log K codewords, each C_i identified with a (2^i, ε, ℓ)-BND • For i = log_2 |A Δ B| we have dH(C(A)_i, C(B)_i) > 1 - ε • In particular d(C(A), C(B)) = 1 - ∏_{i=1}^{r} (1 - dH(C(A)_i, C(B)_i)) > 1 - ε
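A toy version of one codeword of the global encoding, combining the mapping step with the Vandermonde local encoding. The hash-based `entry_indices` is only a stand-in for a bounded-neighbor disperser (the real construction uses explicit BNDs), and `D`, `ELL`, `P` are illustrative parameters:

```python
import random

D, ELL, P = 8, 4, 2_147_483_647  # left-degree, collision bound, field prime (assumed)

def entry_indices(x, m):
    # Stand-in for the disperser's neighbor map: send element x to up to D
    # pseudo-random entries of an m-entry codeword.
    rnd = random.Random(x)
    return {rnd.randrange(m) for _ in range(D)}

def encode(S, m):
    # One codeword: each entry locally encodes the elements mapped to it via
    # the Vandermonde sum sum_x (1, x, ..., x^(ELL-1)) mod P, so the codewords
    # of A and B differ at entry y whenever 1 <= |Gamma(y) ∩ (A Δ B)| <= ELL.
    code = [[0] * ELL for _ in range(m)]
    for x in S:
        for y in entry_indices(x, m):
            vx = 1
            for k in range(ELL):
                code[y][k] = (code[y][k] + vx) % P
                vx = (vx * x) % P
    return code
```

With |A Δ B| ≤ ELL, every entry touched by an element of the symmetric difference disagrees between the two codewords; the BND property is what guarantees that most entries are touched by a small, nonzero number of such elements.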

  27. Constructing BNDs • (K, ε, ℓ)-Bounded-Neighbor Disperser: for any S ⊆ [N] such that K ≤ |S| ≤ 2K it holds that |Γ(S, ℓ)| > (1 - ε)|R| • Given N and K, want to optimize M, ℓ, ε and the left-degree D • Trade-offs (optimal / extractor-based / disperser-based constructions): ℓ: 1 / O(1) / polylog(N); D: polylog(N) / log(N/K) / 2^((log log N)^2); codeword length M: K / K·log(N/K) / K·2^((log log N)^2)

  28. Outline • Lower bound • Equality testing • Main tool: Incremental encoding • Explicit construction using dispersers • Symmetric difference approximation • Summary & open problems

  29. Symmetric Difference Approximation • Sketch the input streams into codewords: A → C(A)_1, ..., C(A)_k and B → C(B)_1, ..., C(B)_k • Compare s entries from each pair of codewords; d_i = # of differing entries sampled from the i-th pair • Output APX = (1 + ρ)^i for the maximal i s.t. d_i ≥ (1 - ε)s • Guarantee: |A Δ B| ≤ APX ≤ (1 + ρ) · (KD / ((1 - ε)M)) · |A Δ B| • The factor KD / ((1 - ε)M) is ≈ 1 for the non-explicit construction and polylog(N) for the explicit one
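The estimator above can be sketched as follows, taking the codewords as given; the parameter names follow the slide's notation, and the function name is mine:

```python
import random

def approx_sym_diff(CA, CB, s, rho, eps):
    # For each pair of codewords (C(A)_i, C(B)_i), sample s entries and count
    # the differing ones (d_i); output APX = (1 + rho)^i for the maximal i
    # such that d_i >= (1 - eps) * s.
    best = 0
    for i, (ui, vi) in enumerate(zip(CA, CB), start=1):
        sampled = random.sample(range(len(ui)), s)
        d_i = sum(ui[j] != vi[j] for j in sampled)
        if d_i >= (1 - eps) * s:
            best = i
    return (1 + rho) ** best
```

Since the i-th codeword is built for symmetric differences of size about 2^i, the largest index whose sampled disagreement rate exceeds 1 - ε locates |A Δ B| up to the (1 + ρ) factor of the slide's guarantee.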

  30. Outline • Lower bound • Equality testing • Main tool: Incremental encoding • Explicit construction using dispersers • Symmetric difference approximation • Summary & open problems

  31. Summary • Formalized a realistic model for computation over massive data sets • “Adversarial” factors: no secrets, adversarially-chosen inputs • The model sits at the intersection of communication complexity and massive data sets (sketching, streaming)

  32. Summary • Formalized a realistic model for computation over massive data sets • Incremental encoding S → C(S)_1, ..., C(S)_r with d(C(A), C(B)) = 1 - ∏_{i=1}^{r} (1 - dH(C(A)_i, C(B)_i)) • Main technical contribution; additional applications? • Determined the complexity of two fundamental tasks: equality testing and symmetric difference approximation

  33. Open Problems • Better explicit approximation for symmetric difference: our (1 + ρ)-approximation is non-explicit, and the explicit approximation achieves only polylog(N) • Approximating various similarity measures: Lp norms, resemblance, ... • The Power of Adversarial Sketching: characterize the class of functions that can be “efficiently” computed in the adversarial sketch model (sublinear sketches, polylog updates) • Possible approach: a public-coins to private-coins transformation that “preserves” the update time

  34. Computational Assumptions • Better schemes using computational assumptions? • Equality testing: incremental collision-resistant hashing [BGG ’94] gives significantly smaller sketches • Existing constructions either have very long public descriptions or rely on random oracles • Practical constructions without random oracles? • Symmetric difference approximation: not known, even with random oracles! Thank you!

  35. Pan-Privacy Model • Data is a stream of items; each item belongs to a user • Data of different users is interleaved arbitrarily • The curator sees the items, updates its internal state, and produces an output at the end of the stream • Can also consider multiple intrusions • Pan-Privacy: for every possible behavior of a user in the stream, the joint distribution of the internal state at any single point in time and the final output is differentially private

  36. Adjacency: User Level • Universe U of users whose data is in the stream; x ∈ U • Streams are x-adjacent if they have the same projections of users onto U \ {x} • Example: axbxcxdxxxex and abcdxe are x-adjacent; both project to abcde • Notion of “corresponding locations” in x-adjacent streams • U-adjacent: ∃ x ∈ U for which they are x-adjacent • Simply “adjacent” if U is understood • Note: streams of different lengths can be adjacent

  37. Example: Stream Density or # Distinct Elements • Universe U of users; estimate how many distinct users in U appear in the data stream • Application: # of distinct users who searched for “flu” • Ideas that don't work: • Naïve: keep a list of users that appeared (bad privacy and space) • Streaming: track a random sub-sample of users (bad privacy) • Streaming: hash each user and track the minimal hash (bad privacy)

  38. Pan-Private Density Estimator • Inspired by randomized response • Store for each user x ∈ U a single bit b_x • Initially b_x ← 0 w.p. ½, 1 w.p. ½ (distribution D0) • When encountering x, redraw b_x ← 0 w.p. ½ - ε, 1 w.p. ½ + ε (distribution D1) • Final output: [(fraction of 1's in the table - ½) / ε] + noise • Pan-Privacy: if a user never appeared, their entry is drawn from D0; if a user appeared any # of times, their entry is drawn from D1 • D0 and D1 are 4ε-differentially private
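The estimator can be sketched directly from the slide. The output noise is omitted in this sketch, and all function names are mine; the final value estimates the fraction of users that appeared (multiply by |U| for a count):

```python
import random

def init_table(users):
    # Initial state: one bit per user, drawn from D0 = {0 w.p. 1/2, 1 w.p. 1/2}.
    return {x: random.randrange(2) for x in users}

def process_stream(table, stream, eps):
    # On every arrival of user x, redraw b_x from D1 = {0 w.p. 1/2 - eps,
    # 1 w.p. 1/2 + eps}; the state is distributed identically whether x
    # appeared once or many times.
    for x in stream:
        table[x] = 1 if random.random() < 0.5 + eps else 0

def density_estimate(table, eps):
    # (fraction of 1's - 1/2) / eps estimates the fraction of distinct users
    # that appeared; the slide adds noise to this output, omitted here.
    frac = sum(table.values()) / len(table)
    return (frac - 0.5) / eps

# Example: 20,000 users, the first 10,000 each appear twice; estimate ~ 0.5.
table = init_table(range(20_000))
process_stream(table, list(range(10_000)) * 2, eps=0.25)
est = density_estimate(table, eps=0.25)
```

Because an appeared user's bit is a fresh draw from D1 no matter how often they appeared, an intruder inspecting the table learns little about any single user, which is the pan-privacy guarantee.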
