When Are LDCs a False Promise?

Moni Naor, Weizmann Institute of Science

Talk Based on: • The Complexity of Online Memory Checking [Naor and Rothblum] • Fault Tolerant Storage And Quorum Systems [Nadav and Naor] • On the Compressibility of NP Instances and Cryptographic Applications [Harnik and Naor] • Theme: cases where LDC should be helpful but • Either provably not helpful • Or open problem

Authentication Verifying a string has not been modified • Central problem in cryptography • Many variants Our Setting: • User works on large file residing on a remote server • User stores a small secret “fingerprint” (hash) of file • Used to detect corruption • What is the size of the fingerprint? • A well understood problem

Online Memory Checking Problem with the model: What if we don’t want to read the entire file? What if we only want a small part? Read entire file?! Idea: Don’t verify the entire file, verify what you need! • How much of the file do you read per authenticated bit? • How large a fingerprint do you need?

Online Memory Checkers User makes store and retrieve requests to memory: a vector in {0,1}^n under the adversary’s control. Checker checks: answer to retrieve = last stored value. Checker: • Has secret reliable memory: space complexity s(n) • Makes its own reads/writes: query complexity q(n) Want small s(n) and small q(n)! [Diagram: the user sends store(i,b) and retrieve(i) requests to the memory checker C, which holds s(n) bits of secret memory and reads/writes q(n) bits of public memory, returning bit b.]

Memory Checker Requirements: For ANY sequence of user requests and ANY responses from public memory: Completeness: If every read from public memory = last write, Guarantee: user retrieve = last store (w.h.p.) Soundness: If some read from public memory ≠ last write, Guarantee: user retrieve = last store or BUG (w.h.p.) [Diagram: on retrieve(i), the memory checker C, with s(n) bits of secret memory and access to public memory, returns b or BUG to the user.]

Past Results: [Blum, Evans, Gemmell, Kannan and Naor 1991] Offline Memory Checkers: Detect errors only at end of long request sequence. q(n) = O(1) (amortized), s(n) = O(log n). No crypto assumptions! Online Memory Checkers: Are they necessary?! Very simple (in chunks) Other Results: Optimal [Gemmell Naor 92] Must be invasive [Ajtai 2003] s(n) × q(n) = O(n)
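The "in chunks" idea can be illustrated with a toy checker. This is a minimal sketch, not necessarily the construction from the cited papers: it assumes public memory is split into chunks of q bits and that the secret memory holds one polynomial fingerprint per chunk. The class name `ChunkedChecker`, the prime `P`, and the fingerprint choice are all illustrative assumptions. Each request reads one chunk (q public bits), and the secret memory holds n/q fingerprints, so s(n) × q(n) is O(n) up to the fingerprint length.

```python
import secrets

P = (1 << 61) - 1   # prime modulus for the fingerprints (illustrative choice)
BUG = "BUG"

class ChunkedChecker:
    """Toy online checker: one secret fingerprint per q-bit chunk."""

    def __init__(self, n, q):
        assert n % q == 0
        self.q = q
        self.public = [0] * n                    # adversary may overwrite this
        self.r = secrets.randbelow(P - 1) + 1    # secret evaluation point
        # Secret memory: n/q fingerprints (plus the point r).
        self.tags = [self._fp([0] * q) for _ in range(n // q)]

    def _fp(self, bits):
        # Fingerprint sum(bits[j] * r^j) mod P, computed by Horner's rule.
        acc = 0
        for b in reversed(bits):
            acc = (acc * self.r + b) % P
        return acc

    def _read_chunk(self, c):
        # q reads from public memory, checked against the secret fingerprint.
        chunk = self.public[c * self.q:(c + 1) * self.q]
        return chunk if self._fp(chunk) == self.tags[c] else None

    def retrieve(self, i):
        chunk = self._read_chunk(i // self.q)
        return BUG if chunk is None else chunk[i % self.q]

    def store(self, i, b):
        c = i // self.q
        chunk = self._read_chunk(c)
        if chunk is None:
            return BUG
        chunk[i % self.q] = b
        self.public[i] = b                       # one public write
        self.tags[c] = self._fp(chunk)           # refresh the secret fingerprint
        return None
```

Choosing q trades secret space against reads per request: q = n gives a single fingerprint but reads the whole file, while q = 1 reads one bit but keeps a fingerprint per bit.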

Authenticators Memory checkers allow reliable local decodability. What about reliable local testability? Authenticators: • Encode the file x ∈ {0,1}^n into: • a large public encoding p_x • a small secret encoding s_x. Space complexity: s(n) • Decoding Algorithm D: • Receives a public encoding p and decodes it into a vector x ∈ {0,1}^n • Consistency verifier: checks (repeatedly) whether the public encoding was (significantly) corrupted, reading only a few bits: t(n) • If not corrupted: verifier should output “Ok” • If verifier outputs “Ok”, decoder can (w.h.p.) retrieve the file

Pretty Good Authenticator with computational assumptions • Idea: encode file X using a good error correcting code C • Actually erasures are more relevant • As long as a certain fraction of the symbols of C(X) is available, can decode X • Add to each symbol a tag F_k(a,i), a function of • secret information k ∈ {0,1}^s, seed of a PRF • symbol a ∈ Σ • location i • Verifier picks a random location i, reads symbol a and tag t • Checks whether t = F_k(a,i) and rejects if not • Decoding process removes all symbols with inappropriate tags and uses the decoding procedure of C • Good example: Reed-Solomon
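The tagging scheme above can be sketched in a few lines. This is only an illustration under loud assumptions: HMAC-SHA256 stands in for the PRF F_k, a simple repetition code stands in for the erasure code C (the slide suggests Reed-Solomon), and the function names `tag`, `encode`, `verify` and `decode` are hypothetical.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 8  # bytes kept from each HMAC output (illustrative choice)

def tag(key, symbol, i):
    # F_k(a, i): HMAC-SHA256 is used here as a stand-in for the PRF.
    msg = i.to_bytes(8, "big") + bytes([symbol])
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]

def encode(key, data, copies=3):
    # Stand-in erasure code: repeat the file `copies` times (not Reed-Solomon).
    codeword = list(data) * copies
    return [(a, tag(key, a, i)) for i, a in enumerate(codeword)]

def verify(key, public, samples=20):
    # Spot-check random locations; reject on the first bad tag.
    for _ in range(samples):
        i = secrets.randbelow(len(public))
        a, t = public[i]
        if not hmac.compare_digest(t, tag(key, a, i)):
            return False
    return True

def decode(key, public, n):
    # Symbols with bad tags are treated as erasures; any surviving copy decodes.
    out = []
    for j in range(n):
        symbol = None
        for i in range(j, len(public), n):   # every copy of position j
            a, t = public[i]
            if hmac.compare_digest(t, tag(key, a, i)):
                symbol = a
                break
        if symbol is None:
            return None                       # too many erasures at position j
        out.append(symbol)
    return bytes(out)
```

Because the tag binds the symbol to its location, an adversary who cannot predict F_k can only turn symbols into detectable erasures, which is why erasure decoding suffices. The verifier reads a few symbols, while the secret state is just the key.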

Memory Checker ⇒ Authenticator If there exists an online memory checker with • space complexity s(n) • query complexity t(n) then there exists an authenticator with • space complexity O(s(n)) • query complexity O(t(n)) Idea: Use a high-distance code

Improve the Information Theoretic Upper Bound(s)? Maybe we can use: Locally Decodable Codes? Locally Testable Codes? PCPs of proximity?

The Lower Bound Theorem 1 [Tight lower bound]: For any online memory checker secure against a computationally unbounded adversary, s(n) × q(n) = Ω(n). True also for authenticators

Memory Checkers and One-Way Functions Breaking the lower bound implies one-way functions. Theorem 2: If there exists an online memory checker: • Working in polynomial time • Secure against polynomial time adversaries • With query and space complexity: s(n) × q(n) < c · n (for a constant c > 0) Then there exist functions that are hard to invert for infinitely many input lengths (“almost one-way” functions)

This Talk: • Will not say much about the proof • It is involved • Initial insight: connection to the simultaneous message model

Simultaneous Messages Protocols [Yao 1979] • For the equality function: • |m_A| + |m_B| = Ω(√n) [Newman Szegedy 1996] • |m_A| × |m_B| = Ω(n) [Babai Kimmel 1997] [Diagram: Alice holds x ∈ {0,1}^n and Bob holds y ∈ {0,1}^n; each sends a single message (m_A, m_B) to Carol, who outputs f(x,y), here whether x = y.]
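To make the model concrete, here is a minimal sketch of the well-known equality protocol whose message lengths match the bounds above: both parties encode their input with a fixed public code, view the codeword as a square matrix, Alice sends a random row, Bob sends a random column, and Carol compares the crossing entry. The code used below (systematic, with pseudo-random parity bits from a fixed seed) is an assumption chosen for brevity; a real instantiation needs a code with constant relative distance so that Carol rejects unequal inputs with constant probability.

```python
import random

def encode(bits, m, seed=0):
    # Fixed public code (assumption): the input followed by parity bits of
    # pseudo-random subsets. The seed is a public constant, not shared coins.
    rng = random.Random(seed)
    n = len(bits)
    word = list(bits)
    for _ in range(m - n):
        mask = [rng.randrange(2) for _ in range(n)]
        word.append(sum(a & b for a, b in zip(mask, bits)) % 2)
    return word

def alice(x, side, rng):
    # Message m_A: a random row index and that row of the codeword matrix.
    r = rng.randrange(side)
    w = encode(x, side * side)
    return r, w[r * side:(r + 1) * side]

def bob(y, side, rng):
    # Message m_B: a random column index and that column.
    c = rng.randrange(side)
    w = encode(y, side * side)
    return c, [w[r * side + c] for r in range(side)]

def carol(msg_a, msg_b):
    # Accept iff the two codewords agree on the entry where row and column cross.
    (r, row), (c, col) = msg_a, msg_b
    return row[c] == col[r]
```

Each message carries one row or column plus an index, so with a constant-rate code both |m_A| and |m_B| are O(√n). Equal inputs are always accepted; unequal inputs are rejected with probability equal to the relative distance between the two codewords.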

Ingredients for Full Proof: • Consecutive Messages Model: Generalized communication complexity lower bound. • Adversary “learns” public memory access distribution: Learning Adaptively Changing Distributions [NR06]. • “Bait and Switch” technique: Handle adaptive checkers. • One-Way functions: Breaking the generalized communication complexity lower bound in a computational setting requires one-way functions.

LDC Conclusions for OMC • Settled the complexity of online memory checking • Characterized the computational assumptions required for good online memory checkers Open Questions: • Do we need logarithmic query complexity for online memory checking with computational assumptions? • Understanding relationships of crypto/complexity objects • Quantum Memory Checkers?

Goal • Distributed file storage system • Peer-to-peer environment • Processors join and leave the system continuously Want to be able to store and retrieve files distributively • Partial Solutions • Distributed File sharing applications [Gnutella, Kazaa] • Distributed Hash Tables [DH, Chord, Viceroy] • Store (key, value) pairs and perform lookup on key

Fault-Tolerant Storage System • Censor • Aims to eliminate access to some files • Can take down some servers • Design Goal: • A reader should be able to reconstruct each file with high probability even after faults have occurred Probability taken over coins of the writer and reader

Adversarial Behavior • How are the faulty processors chosen? What is the influence of the adversary? • Type of faults • Complete/Partial control

Adversarial Model • Adversary chooses the set of processors to crash • Different degrees of adaptiveness • Non adaptive adversary • Choice of faulty processors is not based on their content • Adversary with a limited number of queries • May query some processors • fail-stop failures • We do not consider Byzantine failures

Other Fault Models • Random faults model: • Examples: Distance Halving DHT, Chord • Standard technique: • Replication to log(n) processors • Assures survival with high probability • Adversarial faults [Fiat, Saia] • Large fraction accessible after adversary crashes a linear fraction of the processors • Still, a censor can target a specific file

Measures of Quality • Read/Write complexity: • Average number of processors accessed during a read/write operation • Number of rounds: • Number of rounds required from an adaptive reader • Blowup Ratio: • Ratio between the total number of bits used for the storage of a file and its size

Connection to LDC • If you are willing to have high write complexity: • Can encode ALL the data with an LDC • Parameters of the LDC determine how good the data storage is

Probabilistic Storage System based on ε-intersecting quorum system • Storage System: • To store a file: pick a set of size Θ(√n) uniformly at random • replicate the file to all members of the quorum set • Retrieval: Choose a random set of size Θ(√n) and probe its members • Intersection follows from the birthday paradox
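The birthday-paradox argument can be checked numerically. The sketch below assumes quorums are uniform random k-subsets of the n processors; the function names and the parameters in the usage note are illustrative, not taken from the talk.

```python
import math
import random

def miss_probability(n, k):
    # Exact chance that two independent uniform k-subsets of n processors
    # are disjoint, i.e. a read quorum misses the write quorum (no faults).
    return math.comb(n - k, k) / math.comb(n, k)

def store(n, k, rng):
    # Replicate the file to a uniformly random set of k processors.
    return set(rng.sample(range(n), k))

def retrieve(n, k, replicas, crashed, rng):
    # Probe a fresh random set of k processors; succeed if any probed
    # processor holds a replica and has not been crashed.
    probes = rng.sample(range(n), k)
    return any(p in replicas and p not in crashed for p in probes)
```

For example, `miss_probability(10000, 300)` is at most e^(-9), since the miss probability is bounded by (1 - k/n)^k ≤ e^(-k²/n). Quorums of a few multiples of √n therefore intersect with high probability, at the cost of √n-scale read and write complexity.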

Properties of the Probabilistic Storage System • Pros: • Simplicity • Resilient against linear number of faults • Even if the processors are chosen by the adversary adaptively • Adapted to a dynamic environment [Abraham, Malkhi] • Cons: • High read/write complexity • High blowup-ratio Want a storage system with better parameters

Theorem: A fault tolerant storage system, in the non-adaptive reader model, resilient against Ω(n) faults, cannot do better than the ε-intersecting storage system example. • Read Complexity · Write Complexity is Ω(n) • Blowup Ratio is Ω(√n) Non-adaptive readers are wasteful! • Non-adaptive reader: • Processors are chosen without accessing any processor
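As a sanity check on tightness, assume (as the birthday-paradox argument suggests, not as a parameter stated on the slide) that quorums have size k = ℓ√n for a constant ℓ, and ignore faults. The probabilistic system then meets both bounds up to constants:

```latex
\begin{align*}
\text{read complexity} \cdot \text{write complexity} &= k \cdot k = \ell^{2} n = \Theta(n), \\
\text{blowup ratio} &= k = \ell\sqrt{n} = \Theta(\sqrt{n}), \\
\Pr[\text{read set misses write set}] &= \binom{n-k}{k} \Big/ \binom{n}{k}
  \le \Bigl(1 - \tfrac{k}{n}\Bigr)^{k} \le e^{-k^{2}/n} = e^{-\ell^{2}}.
\end{align*}
```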

E For Effort Open Question • Do the lower bounds for the case when both the reader and the adversary are non-adaptive hold when both are fully adaptive?

The Problem Is it possible to have an efficient procedure: • Given CNF formulae φ₁ and φ₂ on the same variables and of the same length, come up with a CNF formula ψ that is: • Satisfiable if and only if φ₁ ∨ φ₂ is satisfiable • Shorter than |φ₁|+|φ₂| Sufficiently short to apply recursively: (1-ε)(|φ₁|+|φ₂|) • If no: there is hope for: • Efficient everlasting encryption in the hybrid bounded storage model • Forward-Secure-Storage [Dziembowski] • Derandomization of Sampling [Dubrov-Ishai] • If yes: There is a construction of Collision Resistant Hash functions from any one-way function • No “black box” construction of CRH from OWF [Simon98] • Construction uses the code of the one-way function

No Witness Retrievable Compression • Given CNF formulae φ₁ and φ₂ on the same variables, come up with a formula ψ that is: • Satisfiable if and only if φ₁ ∨ φ₂ is satisfiable • Shorter than |φ₁|+|φ₂| Claim: if one-way functions exist, then a witness for either φ₁ or φ₂ cannot yield a witness for ψ efficiently. Most natural ideas are witness retrievable (witness = satisfying assignment). Proof intuition based on broadcast encryption lower bounds

Solve it in time 2n Maybe I can approximate it Could we just postpone it ? I can’t find an algorithm for the problem Solve it for some fixed parameters Find an algorithm that usually works? Approaches for dealing with NP-complete problems: • Approximation algorithms • Sub-exponential time algorithms • Parameterized complexity • Average case complexity • Save it for the future Garey and Johnson, 1979

Verdict on LDCs? Uncompressed paper on compressibility: www.wisdom.weizmann.ac.il/~naor/PAPERS/compressibility.html Compressed version: FOCS 2006

THE END Thank You

Slides for the Proof of OMC

Consecutive Messages Protocols Theorem (lower bound for CM protocols): For any equality protocol, as long as |m_P| ≤ n/100, |m_A| × |m_B| = Ω(n) [Diagram: Alice holds x ∈ {0,1}^n and Bob holds y ∈ {0,1}^n; the messages are m_A, m_P and m_B, and Carol decides whether x = y.]

Program for This Talk: • Define online memory checkers • Review some past results • Describe new results • Proof sketch: • Define communication complexity model • Sketch lower bound for a simple case • Ideas for extending to the general case

The Reduction Use online memory checker to construct a consecutive messages equality protocol. Online Checker (Space: s(n), Query: q(n)) ⇒ Equality Protocol (Alice msg: s(n), Bob msg: O(q(n))) Conclusion: s(n) × q(n) = Ω(n) (from the communication complexity lower bound)

Simplifying Assumption (With loss of generality) Assumption: checker chooses indices to read from public memory independently of secret memory Checker Operation: • Get an index i in the original file • Choose which indices to read from the public memory, and read them. • Get the secret memory • Retrieve i-th bit or say BUG

The Reduction: Outline Use online memory checker: Construct “random index” protocol. Bob chooses random index i: If x = y, then Carol accepts. If x_i ≠ y_i, then Carol rejects. Use online checker to build this protocol. Use error correcting code: Go from “random index” to equality testing: Alice and Bob encode their inputs and run the “random index” protocol. If Alice’s and Bob’s inputs differ at even one index, the encodings differ at many indices.
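The last step can be quantified with a short calculation. The symbols here are assumptions introduced for illustration: E is the error correcting code, δ its relative distance, and γ the error probability of the “random index” protocol.

```latex
\begin{align*}
x = y &\;\Longrightarrow\; \Pr[\text{Carol accepts}] \ge 1 - \gamma, \\
x \ne y &\;\Longrightarrow\; \Pr_{i}\bigl[E(x)_i \ne E(y)_i\bigr] \ge \delta
  \;\Longrightarrow\; \Pr[\text{Carol rejects}] \ge \delta\,(1 - \gamma).
\end{align*}
```

With constant δ and small γ, a constant number of independent repetitions separates the two cases, so the message lengths grow only by a constant factor.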

[Diagram: Alice, holding x ∈ {0,1}^n, runs store(x) on the checker to get public memory P(x) and secret memory S(x), and sends S(x) (s(n) bits) to Carol. Bob, holding y ∈ {0,1}^n, runs store(y) to get P(y) and S(y), gets a random index i, runs retrieve(i), and sends i, y_i and the bits for Carol (q(n)+1 bits). Carol computes C_i = x_i or BUG using S(x) and accepts if y_i = C_i: x = y ⇒ accept, x_i ≠ y_i ⇒ reject.] WANT: An adversary that can find bad x, y for the protocol can be used to find bad x, P(y), i for the memory checker. PROBLEM: Protocol adversary sees the randomness! SOLUTION: Re-randomize! Alice re-computes S(x) with different randomness; the new S(x) is independent of the public randomness (given P(x)). Requires exponential time. Conclusion [Weak Theorem]: For “restricted” online memory checkers, s(n) × q(n) = Ω(n)

Recall Simplifying Assumption Assumption: checker chooses indices to read from public memory independently of secret memory. Do we really need the assumption? Idea: If checker uses secret memory to choose indices, adversary learns something about the secret memory from the indices the checker reads.

Access Pattern Distribution For a retrieve request: Access Pattern: Bits of public memory accessed by checker. Access Pattern Distribution: Distribution of the checker’s access pattern (given its secret memory). Randomness: over checker’s coin tosses

Where Do We Go From Here? Observation: If adversary doesn’t know the access pattern distribution, then the checker is “home free”. Lesson for adversary: Activate checker many times, “learn” its access pattern distribution! [NR05]: Learning to Impersonate.

Learning The Access Pattern Distribution Theorem (Corollary from [NR05]) Learning algorithm for adversary: • Adversary stores x, secret memory s • Adversary makes O(s(n)) retrieves • p: final public memory (after the stores and retrieves) • Adversary learns L, can generate distribution D_L(p) • “Real” distribution is D_S(p) Guarantee: With high probability, the distributions D_L(p) and D_S(p) are ε-close. L is of size O(q(n) × s(n)) bits. Guarantee is only for the public memory p reached by checker!

[Diagram: Alice, holding x ∈ {0,1}^n, runs store(x) and retrieves on the checker (public memory P(x), secret memory S(x)), runs the Learner with public coins, and passes S(x) (s(n) bits) and the learned L to Carol. Bob, holding y ∈ {0,1}^n, runs store(y) and retrieves (public memory P(y), secret memory S(y)), runs the Learner with the same coins to obtain the learned L (O(s(n) × q(n)) bits), gets a random index i, runs retrieve(i), and sends i, y_i and the bits for Carol (q(n)+1 bits). Carol computes C_i = x_i or BUG and accepts if y_i = C_i: x = y ⇒ accept, x_i ≠ y_i ⇒ reject.] Completeness: An adversary that finds x s.t. Carol rejects when Alice’s AND Bob’s inputs are x also fools the memory checker. Access pattern distributions by “real” S and “learned” L are close on P(x). Protocol adversary sees L, checker adversary learns it! Soundness: An adversary that finds x ≠ y s.t. Carol doesn’t reject also fools the memory checker. Does this work??? PROBLEM: distributions by “real” S and “learned” L are close on the original P(x)! They may be very far on P(y)!

Does it Work? • Will the protocol work when y ≠ x? • No! Big problem for the adversary: Can learn access pattern distribution on correct and unmodified public memory… really wants the distribution on different, modified memory! • Learned information L may be: • Good on unmodified memory (D_L(P(x)), D_S(P(x)) close) • Bad on modified memory (D_L(P(y)), D_S(P(y)) far) • Can’t hope to learn distribution on modified public memory

Bait and Switch Carol knows S and L; if only she could check whether D_L(P(y)) and D_S(P(y)) are ε-close… If far: P(y) ≠ P(x) (not “real” public memory)! Reject! If close: OK for Bob to use L for access pattern! Bob always uses L to determine the access pattern. This is a “weakening” of the checker.
