
Batch Codes and Their Applications

This presentation covers batch codes for load-balancing scenarios, with examples and applications in private information retrieval. Explore computational and information-theoretic PIR approaches, time complexity, and amortized PIR using hashing.


Presentation Transcript


  1. Batch Codes and Their Applications • Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai • Preliminary version in STOC 2004

  2. Talk Outline • Batch codes • Amortized PIR • via hashing • via batch codes • Constructing batch codes • Concluding remarks

  3. A Load-Balancing Scenario [Figure: the data string x]

  4. What’s wrong with a random partition? • Good on average for “oblivious” queries. • However: • Can’t balance adversarial queries • Can’t balance few random queries • Can’t relieve “hot spots” in multi-user setting

  5. L R LR L R Example • 3 devices, 50% storage overhead. • By how much can the maximal load be reduced? • Replicating bits is no good:device s.t.1/6 of the bits can only be found at this device. • Factor 2 load reduction is possible:

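  6. Batch Codes • (n,N,m,k) batch code: [Figure: x of length n is encoded into m buckets y_1, y_2, …, y_m of total length N; any set {i_1,…,i_k} of k bits is recovered by reading one bit from each bucket] • Notes • Rate = n/N • By default, insist on minimal load per bucket, hence m ≥ k. • Load measured by # of probes. • Generalizations • Allow t probes per bucket • Larger alphabet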

  7. Multiset Batch Codes • (n,N,m,k) multiset batch code: [Figure: as for batch codes, but the request is a multiset < i_1,…,i_k > of indices] • Motivation • Models multiple users (with off-line coordination) • Useful as a building block for standard batch codes • Nontrivial even for multisets of the form < i,i,…,i >

  8. Examples • Trivial codes • Replication: N=kn, m=k (multiset) • Optimal m, bad rate. • One bit per bucket: N=m=n • Optimal rate, bad m. • (L,R,L⊕R) code: rate=2/3, m=3, k=2 (multiset). • Goal: simultaneously obtain • High rate (close to 1) • Small m (close to k)

  9. Private Information Retrieval (PIR) • Goal: allow user to query database while hiding the identity of the data-items she is after. • Motivation: patent databases, web searches, ... • Paradox(?): imagine buying in a store without the seller knowing what you buy. Note: Encrypting requests is useful against third parties; not against server holding the data.

  10. Modeling • Database: n-bit string x • User: wishes to • retrieve x_i and • keep i private

  11. [Figure: User retrieves x_i from Server; what the server learns about i is marked "???"]

  12. Some “Solutions” 1. User downloads entire database. Drawback: n communication bits (vs. log n + 1 w/o privacy). Main research goal: minimize communication complexity. 2. User masks i with additional random indices. Drawback: gives a lot of information about i. 3. Enable anonymous access to database. Note: addresses the different security concern of hiding user’s identity, not the fact that x_i is retrieved. Fact: PIR as described so far requires Ω(n) communication bits.

  13. Two Approaches • Computational PIR [KO97, CMS99, ...] • Computational privacy • Based on cryptographic assumptions • Information-Theoretic PIR [CGKS95, Amb97, ...] • Replicate database among s servers • Unconditional privacy against t servers • Default: t=1
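To make the information-theoretic setting concrete, here is a minimal sketch of the simplest 2-server scheme with t=1: the user sends a uniformly random selection mask to one server and the same mask with position i flipped to the other, and XORs the two one-bit answers. Each server on its own sees a uniformly random mask. This is the textbook linear-communication scheme, not one of the low-communication protocols cited on the slides, and all function names are illustrative.

```python
import secrets

def make_queries(n, i):
    """Return two masks that differ only at position i; each is uniform on its own."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2

def server_answer(x, query):
    """A server XORs together the database bits selected by the mask."""
    answer = 0
    for bit, selected in zip(x, query):
        answer ^= bit & selected
    return answer

def retrieve(x, i):
    """Simulate both servers locally; they hold identical copies of x."""
    q1, q2 = make_queries(len(x), i)
    # All positions except i cancel in the XOR of the two answers.
    return server_answer(x, q1) ^ server_answer(x, q2)

database = [0, 1, 1, 0, 1, 0, 0, 1]
bit = retrieve(database, 4)
```

Each server here does work linear in n per query, which is the cost the later slides on amortization aim to spread over k queries.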

  14. Communication Upper Bounds • Computational PIR • O(n), polylog(n), O(logn),O(+logn) [KO97, CMS99, …] • Information-theoretic PIR • 2 servers, O(n^{1/3}) [CGKS95] • s servers, O(n^{1/c(s)}) where c(s)=Ω(s log s / log log s) [CGKS95, Amb97, BIKR02] • O(log n / log log n) servers, polylog(n)

  15. Time Complexity of PIR • Given low-communication protocols, efficiency bottleneck shifts to servers’ time complexity. • Protocols require (at least) linear time per query. • This is an inherent limitation! • Possible workarounds: • Preprocessing • Amortize cost over multiple queries

  16. Previous Results [BIM00] • PIR with preprocessing • s-server protocols with O(n) communication and O(n^{1/s+ε}) work per query, requiring poly(n) storage. • Disadvantages: • Only work for multi-server PIR • Storage typically huge • Amortized PIR • Slight savings possible using fast matrix multiplication • Require a large batch of queries and high communication • Apply also to queries originating from different users. • This work: • Assume a batch of k queries originates from a single user. • Allow preprocessing (not always needed). • Nearly optimal amortization

  17. Model [Figure: User retrieves x_{i_1}, x_{i_2}, …, x_{i_k} from Server/s; what the servers learn about the indices is marked "???"]

  18. Amortized PIR via Hashing • Let P be a PIR protocol. • Hashing-based amortized PIR: • User picks h ∈_R H, defining a random partition of x into k buckets of size n/k, and sends h to Server/s. • Except with small failure probability, at most t=O(log k) queries fall in each bucket. • P is applied t times for each bucket. • Complexity: • Time kt·T(n/k) ≤ t·T(n) • Communication ≤ kt·C(n/k) • Asymptotically optimal up to “polylog factors”
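The bucket-load step can be explored empirically. The sketch below models h as a uniformly random split of the n positions into k equal-size buckets (an assumption; the talk draws h from a hash family H) and counts how many of the k queries land in each bucket. The helper names are hypothetical.

```python
import random
from collections import Counter

def random_partition(n, k, rng):
    """Assign each index in range(n) to one of k buckets of size n // k."""
    order = list(range(n))
    rng.shuffle(order)
    size = n // k  # assumes k divides n
    bucket_of = [0] * n
    for position, index in enumerate(order):
        bucket_of[index] = position // size
    return bucket_of

def bucket_loads(bucket_of, queries, k):
    """Number of queried indices that fall in each bucket."""
    counts = Counter(bucket_of[i] for i in queries)
    return [counts.get(b, 0) for b in range(k)]

rng = random.Random(2004)
n, k = 1024, 16
bucket_of = random_partition(n, k, rng)
queries = rng.sample(range(n), k)
loads = bucket_loads(bucket_of, queries, k)
t = max(loads)  # P would have to run t times per bucket for this batch
```

Re-running with different seeds gives a feel for how the maximal load t compares with the ideal value of 1, which is the overhead that the following slides set out to remove.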

  19. So what’s wrong? • Not much… • Still: • Not perfect • introduces either error or privacy loss • Useless for small k • t=O(log k) overhead dominates • Cannot hash “once and for all” • for every h there exists a bad k-tuple of queries • Sounds familiar?

  20. Amortized PIR via Batch Codes • Idea: use batch-encoding instead of hashing. • Protocol: • Preprocessing: Server/s encode x as y=(y_1,y_2,…,y_m). • Based on i_1,…,i_k, User computes the index of the bit it needs from each bucket. • P is applied once for each bucket. • Complexity • Time Σ_{1≤j≤m} T(N_j) ≤ T(N) • Communication Σ_{1≤j≤m} C(N_j) ≤ m·C(n) • Trivial batch codes imply trivial protocols. • (L,R,L⊕R) code: 2 queries, 1.5× time, 3× communication
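The cost figures for the three-bucket example follow from summing the per-bucket costs. The sketch below assumes a linear server time T(N)=N and a purely hypothetical polylogarithmic communication cost C(N)=⌈log2 N⌉^2; neither function comes from the talk, they only stand in for "some PIR protocol P".

```python
import math

def batch_pir_cost(bucket_sizes, T, C):
    """Total server time and communication when P runs once per bucket."""
    time = sum(T(size) for size in bucket_sizes)
    comm = sum(C(size) for size in bucket_sizes)
    return time, comm

def linear_time(size):
    # Assumption: the servers touch every bit of the bucket once.
    return size

def polylog_comm(size):
    # Hypothetical communication cost of P on a database of this size.
    return math.ceil(math.log2(size)) ** 2

n = 1 << 20
# Three buckets of n/2 bits each (the two halves of x and their XOR).
time, comm = batch_pir_cost([n // 2] * 3, linear_time, polylog_comm)
time_ratio = time / linear_time(n)    # 1.5: total bucket size is 1.5 n
comm_ratio = comm / polylog_comm(n)   # at most 3: three runs on smaller inputs
```

Two separate runs of P would cost 2·T(n) in server time, so answering both queries for 1.5·T(n) is a genuine saving, paid for with up to three times the communication of a single query.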

  21. Constructing Batch Codes

  22. Overview • Recall notion [Figure: x of length n encoded into buckets y_1, y_2, …, y_m of total length N; request i_1,…,i_k] • Main qualitative questions: 1. Can we get arbitrarily high constant rate (n/N=1-ε) while keeping m feasible in terms of k (say m=poly(k))? 2. Can we insist on nearly optimal m (say m=Õ(k)) and still get close to a constant rate? • Several incomparable constructions • Answer both questions affirmatively.

  23. Batch Codes from Unbalanced Expanders [Figure: bipartite graph with n vertices on the left (data bits) and m vertices on the right (buckets)] • By Hall’s theorem, the graph represents an (n,N=|E|,m,k) batch code iff every set S containing at most k vertices on the left has at least |S| neighbors on the right. • Fully captures replication-based batch codes.
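For small graphs the Hall-type condition can be checked by brute force. In the sketch below, `neighbors[i]` is the set of buckets holding a copy of bit i (a representation chosen here for illustration), and the function enumerates every left set of size at most k, so it is only practical for tiny parameters.

```python
from itertools import combinations

def is_replication_batch_code(neighbors, k):
    """True iff every set S of at most k data bits has at least |S| neighboring buckets."""
    n = len(neighbors)
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            covered = set().union(*(neighbors[i] for i in subset))
            if len(covered) < size:
                return False  # these bits cannot be served by distinct buckets
    return True

# Full replication of 4 bits in 2 buckets: any 2 bits can use distinct buckets.
replicated = [{0, 1}] * 4
# A single copy per bit, with bits 0 and 1 sharing a bucket.
single_copy = [{0}, {0}, {1}, {1}]

ok_k2 = is_replication_batch_code(replicated, 2)
ok_k3 = is_replication_batch_code(replicated, 3)
bad_k2 = is_replication_batch_code(single_copy, 2)
```

When the condition holds, Hall's theorem guarantees a matching from the requested bits to distinct buckets, which is exactly a one-probe-per-bucket decoding schedule.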

  24. Parameters • Non-explicit: N=dn, m=O(k·(nk)^{1/(d-1)}) • d=3: rate=1/3, m=O(k^{3/2} n^{1/2}). • d=log n: rate=1/log n, m=O(k), which settles Q2 • Explicit (using [TUZ01], [CRVW02]) • Nontrivial, but quite far from optimal • Limitations: • Rate < 1/2 (unless m=Ω(n)) • For const. rate, m must also depend on n. • Cannot handle multisets.

  25. The Subcube Code • Generalize (L,R,L⊕R) example in two ways • Trade better rate for larger m • (Y_1, Y_2, …, Y_s, Y_1 ⊕ … ⊕ Y_s) • still k=2 • Handle larger k via composition
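The first generalization can be sketched directly: split x into s parts and add one bucket holding their XOR, giving rate s/(s+1) with m=s+1 buckets and k=2. The code below is an illustrative reading of that construction (function names are not from the talk); `read_plan` lists which bucket positions each of the two requests probes, so the one-probe-per-bucket property can be inspected.

```python
def encode(x, s):
    """Split x into s equal parts and append their bitwise XOR as bucket s."""
    size = len(x) // s  # assumes s divides len(x)
    parts = [x[p * size:(p + 1) * size] for p in range(s)]
    parity = [0] * size
    for part in parts:
        parity = [a ^ b for a, b in zip(parity, part)]
    return parts + [parity]

def read_plan(i, j, size, s):
    """Probes (bucket, position) used for request i and for request j."""
    (part_i, pos_i), (part_j, pos_j) = divmod(i, size), divmod(j, size)
    plan_i = [(part_i, pos_i)]
    if part_i != part_j:
        plan_j = [(part_j, pos_j)]
    else:
        # Same part: rebuild x[j] from every other bucket, parity included.
        plan_j = [(b, pos_j) for b in range(s + 1) if b != part_j]
    return plan_i, plan_j

def decode_pair(buckets, i, j):
    s, size = len(buckets) - 1, len(buckets[0])
    values = []
    for plan in read_plan(i, j, size, s):
        bit = 0
        for bucket, position in plan:
            bit ^= buckets[bucket][position]
        values.append(bit)
    return tuple(values)

x = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
buckets = encode(x, 3)
pair = decode_pair(buckets, 4, 6)  # both indices fall in the second part
```

With s=2 this reduces to the three-bucket example; larger s improves the rate at the price of more buckets, which is the trade-off named on the slide.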

  26. Geometric Interpretation [Figure: data blocks A, B, C, D together with the derived buckets A⊕B, C⊕D, A⊕C, B⊕D, A⊕B⊕C⊕D]

  27. Parameters • N ≈ k^{log(1+1/s)}·n, m ≈ k^{log(s+1)} • s=O(log k) gives an arbitrary constant rate with m=k^{O(log log k)}, which “almost” resolves Q1 • Advantages: • Arbitrary constant rate • Handles multisets • Very easy decoding • Asymptotically dominated by subsequent construction.

  28. The Gadget Lemma • From now on, we can choose a “convenient” n and get same rate and m(k) for arbitrarily larger n. [Figure: primitive multiset batch code]

  29. Batch Codes vs. Smooth Codes • Def. A code C: Σ^n → Σ^m is q-smooth if there exists a (randomized) decoder D such that • D(i) decodes x_i by probing q symbols of C(x). • Each symbol of C(x) is probed with probability ≤ q/m. • Smooth codes are closely related to locally decodable codes [KT00]. • Two-way relation with batch codes: • q-smooth code ⇒ primitive multiset batch code with k=m/q^2 (ideally would like k=m/q). • Primitive multiset batch code ⇒ (expected) q-smooth for q=m/k • Batch codes and smooth codes are very different objects: • Relation breaks when relaxing “multiset” or “primitive” • Gap between m/q and m/q^2 is very significant for high rate case • Best known smooth codes with rate>1/2 require q>n^{1/2} • These codes are provably useless as batch codes.

  30. Batch Codes from RM Codes • (s,d) Reed-Muller code over F • Message viewed as s-variate polynomial p over F of total degree (at most) d. • Encoded by the sequence of its evaluations on all points in F^s • Case |F|>d is useful due to a “smooth decoding” feature: p(z) can be extrapolated from the values of p on any d+1 points on a line passing through z.
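The "smooth decoding" feature can be demonstrated over a small prime field. Restricting p to a line z + t·v gives a univariate polynomial of degree at most d in t, so its value at t=0 follows by Lagrange interpolation from d+1 other points on the line. The field size 13, the sample polynomial, and all names below are assumptions chosen for illustration.

```python
Q = 13  # a prime, so the integers mod Q form a field with |F| > d

def poly(point):
    """A sample bivariate polynomial of total degree 2 over F_Q."""
    a, b = point
    return (3 * a * a + 2 * a * b + 5 * b + 7) % Q

def extrapolate_at_zero(ts, values, q):
    """Lagrange interpolation: value at t=0 of the polynomial through (ts, values)."""
    total = 0
    for j, (tj, vj) in enumerate(zip(ts, values)):
        num, den = 1, 1
        for m, tm in enumerate(ts):
            if m != j:
                num = num * (-tm) % q
                den = den * (tj - tm) % q
        total = (total + vj * num * pow(den, -1, q)) % q  # modular inverse, Python 3.8+
    return total

def decode_via_line(z, direction, d, q):
    """Recover poly(z) without evaluating at z, using d+1 other points on a line."""
    ts = list(range(1, d + 2))  # nonzero parameters, so z itself is never read
    values = [
        poly(tuple((zc + t * vc) % q for zc, vc in zip(z, direction)))
        for t in ts
    ]
    return extrapolate_at_zero(ts, values, q)

value = decode_via_line((4, 9), (1, 2), d=2, q=Q)
```

Because many different lines pass through z, the decoder has freedom in which code positions it reads, which is what the conflict-handling approaches on the next slide exploit.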

  31. Two approaches for handling conflicts: • Replicate each point t times • Use redundancy to “delete” intersections • Slightly increases field size, but still allows constant rate. [Figure: data bits x_1, x_2, …, x_n placed at points of the plane; s=2, d ≈ (2n)^{1/2}]

  32. Parameters • Rate = (1/s! - ε), m=k^{1+1/(s-1)+o(1)} • Multiset codes with constant rate (< 1/2) • Rate = Ω(1/k^ε), m=Õ(k), which resolves Q2 for multiset codes as well • Main remaining challenge: resolve Q1

  33. The Subset Code • Choose s, d such that n = C(s,d), the number of d-element subsets of [s] • Each data bit i∈[n] is associated with a set T ⊆ [s] of size d • Each bucket j∈[m] is associated with a set S ⊆ [s] of size at most d • Primitive code: y_S = ⊕_{T⊇S} x_T

  34. ( ) [s] d Batch Decoding the Subset Code xT • Lemma: For each T’T, xTcan be decoded from all ySsuch that ST=T’. • Let LT,T’ denote the set of such S. • Note: {LT,T’ : T’T } defines a partition of yT’ 0011110000 **0110****

  35. Batch Decoding the Subset Code (contd.) [Figure: x_1, x_2, x_3] • Goal: Given T_1,…,T_k, find subsets T'_1,…,T'_k such that the sets L_{T_i,T'_i} are pairwise disjoint. • Easy if all T_i are distinct or if all T_i are the same. • Attempt 1: T'_i is a random subset of T_i • Problem: if T_i, T_j are disjoint, L_{T_i,T'_i} and L_{T_j,T'_j} intersect w.h.p. • Attempt 2: greedily assign to T_i the largest T'_i such that L_{T_i,T'_i} does not intersect any previous L_{T_j,T'_j} • Problem: adjacent sets may “block” each other. • Solution: pick random T'_i with bias towards large sets.

  36. Parameters • Allows arbitrary constant rate with m=poly(k), which settles Q1 • Both the subcube code and the subset code can be viewed as sub-codes of the binary RM code. • The full binary RM code cannot be batch decoded when the rate>1/2.

  37. Concluding Remarks: Batch Codes • A common relaxation of very different combinatorial objects • Expanders • Locally-decodable codes • Problem makes sense even for small values of m,k. • For multiset codes with m=3,k=2, rate 2/3 is optimal. • Open for m≥k+2. • Useful building block for “distributed data structures”.

  38. Concluding Remarks: PIR [Table: rows Single user / Multiple users, columns Non-adaptive / Adaptive; every cell is marked "?" except single-user non-adaptive] • Single-user amortization is useful in practice only if PIR is significantly more efficient than download. • Certainly true for multi-server PIR • Most likely true also for single-server PIR • Killer app for lattice-based cryptosystems?
