Making proof-based verified computation almost practical
Michael Walfish
The University of Texas at Austin
By verified computation, we mean the following exchange:

  client  --  "f", x  -->  server
  client  <--  y, aux  --  server

The client checks whether y = f(x), without computing f(x) itself. The motivation is third-party computing: cloud, volunteers, etc. We desire the following properties in the above exchange:
1. Unconditional, meaning no assumptions about the server
2. General-purpose, meaning not specialized to a particular f
3. Practical, or at least conceivably practical soon
Theory can supposedly help. Consider the theory of Probabilistically Checkable Proofs (PCPs) [ALMSS92, AS92]:

  client  --  "f", x  -->  server
  client  <--  y, plus a probabilistically checkable proof that y = f(x)  --  server

The client probes a few locations of the proof and accepts or rejects. Unfortunately, the constants and proof length are outrageous: using a naive PCP implementation, verifying multiplication of 500×500 matrices would take 500+ trillion CPU years (seriously).
We have reduced the costs of a PCP-based argument system [Ishai et al., CCC07] by 20+ orders of magnitude, with proof. We have implemented the refinements in a system that is not ready for prime time but is practical in some cases. We believe that PCP-based machinery will be a key tool for building secure systems in the near future.

• S. Setty, B. Braun, V. Vu, A. J. Blumberg, B. Parno, and M. Walfish. Resolving the conflict between generality and plausibility in verified computation. Cryptology ePrint 622, November 2012.
• S. Setty, V. Vu, N. Panpalia, B. Braun, A. J. Blumberg, and M. Walfish. Taking proof-based verified computation a few steps closer to practicality. Usenix Security, August 2012.
• S. Setty, R. McPherson, A. J. Blumberg, and M. Walfish. Making argument systems for outsourced computation practical (sometimes). Network and Distributed Systems Security Symposium (NDSS), February 2012.
• S. Setty, A. J. Blumberg, and M. Walfish. Toward practical and unconditional verification of remote computations. Workshop on Hot Topics in Operating Systems (HotOS), May 2011.
(1) The design of Pepper
(2) Experimental results, limitations, and outlook
Pepper incorporates PCPs, but not like this:

  client  --  "f", x  -->  server
  client  <--  y, plus the entire PCP  --  server

(The proof above is not drawn to scale: it is far too long to be transferred. Even the asymptotically short PCPs [BGHSV CCC05, BGHSV SIJC06, Dinur JACM07, Ben-Sasson & Sudan SIJC08] seem to have high constants.) We move out of the PCP setting: we make computational assumptions. (And we allow the number of query bits to be superconstant.)
Instead of transferring the PCP, Pepper uses an efficient argument [Kilian CRYPTO 92, 95]:

  client  --  commit request  -->  server
  client  <--  commit response  --  server
  client  --  queries q1, q2, q3, …  -->  server      [IKO CCC07]
  client  <--  q1·w, q2·w, q3·w  --  server
  client runs efficient checks [ALMSS JACM98], then accepts or rejects

The server never sends a proof; it answers each query q from its proof vector w: PCPQuery(q) { return <q, w>; }
The server's vector w encodes an execution trace of f(x) [ALMSS JACM98].

[Figure: a Boolean circuit computing f, with AND/NOT/OR gates, inputs x0, x1, …, xn, and outputs y0, y1; the values on the wires form the trace.]

What is in w?
• An entry for each wire; and
• An entry for the product of each pair of wires.
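To make this concrete, here is a minimal Python sketch (ours, not the system's code) of the two pieces: the proof vector built from a wire assignment z, and the server's one-line oracle from the previous slide. The modulus P is a stand-in chosen only for illustration.

P = 2**31 - 1  # a toy prime modulus, for illustration only

def build_proof_vector(z):
    """w = (z, z (x) z): an entry per wire, plus one per pair of wires."""
    products = [(zi * zj) % P for zi in z for zj in z]
    return z + products

def pcp_query(w, q):
    """The server's entire oracle: PCPQuery(q) { return <q, w>; }"""
    assert len(q) == len(w)
    return sum(qi * wi for qi, wi in zip(q, w)) % P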
To recap: Pepper uses an efficient argument [Kilian CRYPTO 92, 95] in which, after the commit request and response, the client sends queries q1, q2, q3, … [IKO CCC07], the server answers with PCPQuery(q) { return <q, w>; }, and the client runs efficient checks [ALMSS JACM98] on q1·w, q2·w, q3·w before accepting or rejecting. This is still too costly (by a factor of 10^23), but it is promising.
Pepper incorporates refinements to [IKO CCC07], with proof. The protocol template:

  client  --  "f", x  -->  server
  client  <--  y  --  server
  client  --  commit request  -->  server
  client  <--  commit response (binding the server to w)  --  server
  client  --  query vectors q1, q2, q3, …  -->  server
  client  <--  response scalars q1·w, q2·w, q3·w, …  --  server
  client checks the responses, then accepts or rejects
The client amortizes its overhead by reusing queries over multiple runs. Each run has the same f but a different input x, so the server forms a different proof vector each time (w1, w2, w3, …), while the client's query vectors q1, q2, q3, … are generated once and reused.
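A toy illustration of the reuse pattern, reusing the oracle sketch from above; the proof vectors and query count here are made-up stand-ins.

import random

P = 2**31 - 1  # illustrative modulus, as before

def pcp_query(w, q):
    return sum(qi * wi for qi, wi in zip(q, w)) % P

n = 8  # toy proof-vector length
# The expensive step, done once: generate the query vectors...
queries = [[random.randrange(P) for _ in range(n)] for _ in range(3)]

# ...then reuse them verbatim across runs on different inputs, each of
# which yields a different proof vector (the w's below are dummies).
for w in ([1] * n, [2] * n, [3] * n):
    responses = [pcp_query(w, q) for q in queries]
    # (the client would now run its checks on `responses`)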
[Protocol recap: ✔ the query vectors, now amortized over a batch; the other steps are unchanged.]
[Figure: the same program fragment compiled three ways: a Boolean circuit ("something gross"), an arithmetic circuit of × and + gates, and an arithmetic circuit with concise gates.]

Unfortunately, this computational model does not really handle fractions, comparisons, logical operations, etc.
We now work with (degree-2) constraints over a finite field. For example, Increment(X), that is, Y = X + 1, compiles to:

  0 = X - <input>
  0 = Y - (X + 1)
  0 = Y - <output>

An incorrect input/output pair results in unsatisfiable constraints. As an example, suppose the input above is 6. If the output is 7, the constraints are { 0 = X - 6, 0 = Y - (X + 1), 0 = Y - 7 }, and there is a solution. If the output is 8, the constraints are { 0 = X - 6, 0 = Y - (X + 1), 0 = Y - 8 }, and there is no solution.
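A minimal sketch of that satisfiability check, assuming a toy prime modulus (the real compiler fixes its own field):

P = 2**31 - 1  # illustrative modulus only

def increment_constraints_satisfiable(inp, out):
    # The first two constraints force X = inp and Y = X + 1, so
    # satisfiability reduces to the third: does out == inp + 1 (mod P)?
    X = inp % P
    Y = (X + 1) % P
    return (Y - out) % P == 0

print(increment_constraints_satisfiable(6, 7))  # True: a solution exists
print(increment_constraints_satisfiable(6, 8))  # False: unsatisfiable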
This formalism represents general-purpose program constructs concisely: if/else, !=, <, >, ||, &&, floating-point, … For example, "Z1 != Z2" becomes the single constraint 1 = (Z1 - Z2) M, where the prover supplies M as the field inverse of (Z1 - Z2); see the sketch below. Likewise, "if (bool X) Y=3 else Y=4" becomes 0 = X - M and 0 = Y - (M·3 + (1 - M)·4). We have a compiler that does this transformation; it is derived from Fairplay [Malkhi et al. Usenix Security04]. This work should apply to other protocols.
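Here is a sketch of the != gadget over a toy prime field: a witness M with 1 = (Z1 - Z2)·M exists exactly when Z1 ≠ Z2, since every nonzero field element is invertible.

P = 2**31 - 1  # a prime, so every nonzero element has an inverse

def witness_for_neq(z1, z2):
    d = (z1 - z2) % P
    if d == 0:
        return None          # Z1 == Z2: the constraint is unsatisfiable
    return pow(d, P - 2, P)  # M = (Z1 - Z2)^(-1), by Fermat's little theorem

M = witness_for_neq(23, 187)
assert ((23 - 187) * M) % P == 1    # the constraint 1 = (Z1 - Z2)M holds
assert witness_for_neq(5, 5) is None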
The proof vector w now encodes the assignment that satisfies the constraints. For example, the constraints

  1 = (Z1 - Z2) M
  0 = Z3 - Z4
  0 = Z3·Z5 + Z6 - 5

are satisfied by an assignment Z1 = 23, Z2 = 187, …, and w encodes that assignment (along with the required products of pairs). The savings from the change are enormous (though the new model is not perfect: loops are still unrolled). Take-away: constraints are general-purpose, (relatively) concise, and accommodate the verification machinery.
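As a sanity check on the model, here is a small sketch (with a hypothetical encoding of degree-2 constraints as tuples) that evaluates an assignment against two of the constraints above:

P = 2**31 - 1  # toy modulus

def satisfied(constraints, assignment):
    # Each constraint is (constant, linear terms, quadratic terms) and
    # asserts: constant + sum of terms == 0 (mod P).
    for const, linear, quadratic in constraints:
        total = const
        for var, coeff in linear:
            total += coeff * assignment[var]
        for (v1, v2), coeff in quadratic:
            total += coeff * assignment[v1] * assignment[v2]
        if total % P != 0:
            return False
    return True

# 0 = Z3 - Z4 and 0 = Z3*Z5 + Z6 - 5, in this encoding:
constraints = [
    (0, [("Z3", 1), ("Z4", -1)], []),
    (-5, [("Z6", 1)], [(("Z3", "Z5"), 1)]),
]
print(satisfied(constraints, {"Z3": 2, "Z4": 2, "Z5": 1, "Z6": 3}))  # True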
[Protocol recap: ✔ the computational model ("f" expressed as constraints), ✔ the query vectors.]
We (mostly) eliminate the server's PCP-based overhead. Before, the number of entries in w was quadratic in the computation size; after, it is linear. The client and server reap tremendous benefit from this change. Now, the server's overhead is mainly in the cryptographic machinery and the constraint model itself.
For any computation, there exists a linear PCP whose proof vector is (quasi)linear in the computation size; this resolves a conjecture of Ishai et al. [IKO CCC07]. Before, the proof function was π(·) = <·, w> with w = (z, z ⊗ z), so |w| = |Z|², and the verifier ran a linearity test, a quadratic-correction test, and a circuit test over queries q1, q2, …, qu and responses π(q1), …, π(qu). After, using a new quadratic test [Gennaro et al. 2012], the proof vector is (z, h), so |w| = |Z| + |C|.
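Back-of-the-envelope arithmetic on the two proof-vector sizes, with illustrative numbers:

def old_proof_len(num_vars):
    # (z, z (x) z): the pairwise products dominate, so |w| ~ |Z|^2
    return num_vars + num_vars ** 2

def new_proof_len(num_vars, num_constraints):
    # (z, h): linear in the computation, |w| = |Z| + |C|
    return num_vars + num_constraints

# e.g., a computation with a million variables and a million constraints:
print(old_proof_len(10**6))          # about 10^12 entries
print(new_proof_len(10**6, 10**6))   # 2 * 10^6 entries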
[Protocol recap: ✔ computational model, ✔ query vectors, and now ✔ the checks and the server's responses, thanks to the linear-size proof vector.]
We prove: the commitment primitive is (far) stronger than a linearity test. The commit phase sends Enc(r) and receives Enc(π(r)); then, alongside the PCP queries q1, q2, …, qu, the client sends a consistency query t = r + α1·q1 + … + αu·qu and checks

  consistency:  π(t) = π(r) + α1·π(q1) + … + αu·π(qu)

(compare the linearity test it subsumes: π(q1) + π(q2) = π(q1 + q2)). As a result, we eliminate linearity tests, and we eliminate soundness amplification (repetition to reduce error).
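A self-contained sketch of the identity the consistency check relies on: if every answer really is <q, w> for one fixed w, then the answer to t must equal the same linear combination of the earlier answers. (Toy field and sizes; the encryption Enc(·) is elided.)

import random

P = 2**31 - 1
n, u = 8, 3
w = [random.randrange(P) for _ in range(n)]     # the server's (honest) proof

def pi(q):                                      # the server's linear oracle
    return sum(qi * wi for qi, wi in zip(q, w)) % P

r = [random.randrange(P) for _ in range(n)]
qs = [[random.randrange(P) for _ in range(n)] for _ in range(u)]
alphas = [random.randrange(P) for _ in range(u)]

t = [(ri + sum(a * q[i] for a, q in zip(alphas, qs))) % P
     for i, ri in enumerate(r)]

lhs = pi(t)
rhs = (pi(r) + sum(a * pi(q) for a, q in zip(alphas, qs))) % P
assert lhs == rhs   # holds for any honest linear oracle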
[Protocol recap: ✔ computational model, ✔ commit phase, ✔ query vectors, ✔ checks. All of the refinements above are now in place.]
(1) The design of Pepper ✔
(2) Experimental results, limitations, and outlook
Our implementation of the server is massively parallel; it is threaded, distributed, and GPUed. Some details of our evaluation platform:
• It uses a cluster at the Texas Advanced Computing Center (TACC).
• Each machine runs Linux on an Intel Xeon 2.53 GHz with 48 GB of RAM.
• For GPU experiments, we use NVIDIA Tesla M2070 GPUs (448 CUDA cores and 6 GB of memory).
To get a sense of the cost reduction, consider amortized costs for multiplication of 500×500 matrices. (Note, however, that this assumes a fairly large batch.)
[Figure: CPU time for computation costs and verification costs versus the number of instances; the break-even point is where amortized verification drops below native computation.]

What are the break-even points? And at those points, is the server's latency acceptable?
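A back-of-the-envelope way to locate such a break-even point; the unit costs below are entirely made up and merely stand in for the measured ones:

# Hypothetical per-instance costs (arbitrary units); the real numbers
# come from the measurements behind the figure above.
query_gen_cost   = 1_000_000  # one-time cost of generating queries
per_run_checking = 100        # client's per-instance verification cost
local_compute    = 10_000     # cost of simply computing f(x) locally

def amortized_cost(batch_size):
    return query_gen_cost / batch_size + per_run_checking

# Break-even: the smallest batch for which verifying beats recomputing.
batch = 1
while amortized_cost(batch) >= local_compute:
    batch += 1
print("break-even batch size:", batch)  # ~102 with these toy numbers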
Consider outsourcing in two different regimes: (1) verification work performed on the CPU, and (2) verification with hypothetically free crypto hardware.

[Figure: break-even points in both regimes for matrix multiplication (m=500), insertion sort (m=256), and PAM clustering (m=15, d=100).]
Parallelizing the server results in near-linear speedup.

[Figure: speedup at 1 core, 4 cores, and 60 cores, against the 60-core ideal, for matrix multiplication, all-pairs shortest paths, insertion sort, and PAM clustering.]
The near-linear speedup holds across further benchmarks.

[Figure: the same comparison (1, 4, and 60 cores, with the 60-core ideal) for Hamming distance, root finding, polynomial evaluation, and the Fannkuch benchmark.]
Pepper is not ready for prime time:
• The client requires batching to break even.
• The server's burden is still too high.
• The computational model is still limited (looping is clumsy, etc.).
We describe the landscape in terms of our three goals.

Gives up being unconditional or general-purpose:
• Replication [Castro & Liskov TOCS02], trusted hardware [Chiesa & Tromer ICS10, Sadeghi et al. TRUST10], auditing [Haeberlen et al. SOSP07, Monrose et al. NDSS99]
• Special-purpose [Freivalds MFCS79, Golle & Mironov RSA01, Sion VLDB05, Benabbas et al. CRYPTO11, Boneh & Freeman EUROCRYPT11]

Unconditional and general-purpose, but not geared toward practice:
• Fully homomorphic encryption [Gennaro et al., Chung et al. CRYPTO10]
• Proof-based verified computation [GMR85, Ben-Or et al. STOC88, BFLS91, Kilian STOC92, ALMSS92, AS92, Goldwasser et al. STOC08, Ben-Sasson et al. ePrint12]

Proof-based verified computation geared to practice:
• Pepper [HotOS11, NDSS12, Usenix Security 12, ePrint12]
• [Cormode et al. ITCS12, Thaler et al. HotCloud12, Gennaro et al. ePrint12]
We have reduced the costs of a PCP-based argument system [Ishai et al., CCC07] by 20+ orders of magnitude, with proof; the refinements are in the PCPs and the cryptography. We have made the computational model general-purpose, which brings cost savings and programmability and yields a practical compiler. Take-away: proof-based verified computation is now close to practical and will be a key tool in real systems.