On The (Im)possibility of Software Obfuscation

Boaz Barak
Joint work with Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang.

What Is an Obfuscator?
• An obfuscator is an algorithm O such that for any program P, O(P) is a program such that:
  • O(P) has the same functionality as P.
  • O(P) is infeasible to analyze / “reverse-engineer”.
Intuition: an obfuscator should provide a “virtual black box”, in the sense that giving someone O(P) should be equivalent to giving her a black box that computes P.

Why Might Obfuscators Exist?
• Practical reasons:
  • Understanding code is very difficult.
  • Obfuscation is used (successfully?) in practice for security purposes.
• Theoretical reasons:
  • All canonical hard problems are problems of reverse engineering: SAT, HALTING.
  • Rice’s Theorem: no algorithm can look at the code (Turing machine description) of a program and decide a non-trivial property of the function it computes.

Applications for Obfuscators
• Distributing music online
• Removing random oracles from specific natural protocols
• Converting a private key encryption scheme into a public key encryption scheme
• Giving someone the ability to sign/decrypt only a restricted subset of the message space

Private (Shared) Key Encryption → Public Key Encryption
[Figure: Private key encryption scheme: $m \to E_k \to c \to D_k \to m$. CPA (Chosen Plaintext Attack) security: an adversary A with oracle access to $E_k$ tries to recover m from c.]

[Figure: Public key encryption scheme: $m \to E_e \to c \to D_d \to m$. Security: an adversary A given the public key e tries to recover m from c.]

The Conversion
[Figure: $m \to E'_e \to c \to D'_d \to m$.]
Instead of publishing the key k, publish $e = O(E_k)$.
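A minimal Python sketch of the conversion's interfaces, assuming HMAC-SHA256 as a stand-in pseudorandom function, a toy private-key scheme, and a decryption key d that is simply k. The function toy_obfuscate is hypothetical and deliberately not an obfuscator: it only wraps $E_k$ in a closure, and the key can be read straight back out of it, which is exactly the reverse engineering a real O would have to prevent.

```python
import hashlib
import hmac
import os

def prf(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in pseudorandom function (assumption).
    return hmac.new(key, data, hashlib.sha256).digest()

def enc(k: bytes, m: bytes) -> bytes:
    # Toy private-key encryption E_k of up to 32 bytes: r || (m XOR PRF_k(r)).
    if len(m) > 32:
        raise ValueError("toy scheme handles messages of at most 32 bytes")
    r = os.urandom(16)
    return r + bytes(x ^ y for x, y in zip(m, prf(k, r)))

def dec(k: bytes, c: bytes) -> bytes:
    # D_k: split off the coins r and remove the pad.
    r, body = c[:16], c[16:]
    return bytes(x ^ y for x, y in zip(body, prf(k, r)))

def toy_obfuscate(k: bytes):
    # Hypothetical placeholder for O(E_k). It is NOT an obfuscator: it just wraps
    # E_k in a closure, and k can be read back via e.__closure__.
    def e(m: bytes) -> bytes:
        return enc(k, m)
    return e

k = os.urandom(32)          # private key, assumed to double as the decryption key d
e = toy_obfuscate(k)        # published "public key" e = O(E_k)
c = e(b"attack at dawn")    # E'_e(m): anyone holding e can encrypt
assert dec(k, c) == b"attack at dawn"        # D'_d: the holder of k decrypts
assert e.__closure__[0].cell_contents == k   # but this particular e leaks k outright
```

The last line is the point of the warning: publishing an unobfuscated encryption circuit is the same as publishing k.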

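Security of The Converted Scheme
[Figure: an adversary A′ that receives $e = O(E_k)$ and c and tries to recover m, compared with an adversary A that has only oracle access to $E_k$ and receives c. If O is a virtual black box, whatever A′ learns from e could also be learned by A, so the CPA security of the private key scheme carries over to the public key scheme.]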

Defining Obfuscators
• Definition 1: An algorithm O is an obfuscator if for any circuit C:
  • (functionality) $O(C) \sim C$ (i.e., O(C) computes the same function as C)
  • (polynomial slowdown) $|O(C)| \le p(|C|)$ for some polynomial $p(\cdot)$.
• We say that O is efficient if it runs in polynomial time.
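To make the two syntactic requirements of Definition 1 concrete, here is a Python sketch over a hypothetical gate-list circuit format (AND/OR/NOT gates, size measured as the number of gates). The transformation pad_with_double_negations is made up for illustration: it meets functionality and polynomial slowdown (its output has exactly $3|C|$ gates) but provides no security at all, and the exhaustive functionality check is only feasible for small input lengths.

```python
from itertools import product

# Hypothetical circuit format: (n_inputs, gates, outputs). Wires 0..n-1 are the
# inputs; the i-th gate writes wire n + i. Gates: ("AND", u, v), ("OR", u, v), ("NOT", u).

def evaluate(circ, x):
    n, gates, outputs = circ
    wires = list(x)
    for g in gates:
        if g[0] == "AND":
            wires.append(wires[g[1]] & wires[g[2]])
        elif g[0] == "OR":
            wires.append(wires[g[1]] | wires[g[2]])
        elif g[0] == "NOT":
            wires.append(1 - wires[g[1]])
        else:
            raise ValueError(f"unknown gate {g[0]!r}")
    return tuple(wires[o] for o in outputs)

def size(circ):
    # |C| measured as the number of gates.
    return len(circ[1])

def pad_with_double_negations(circ):
    # Toy transformation, NOT a secure obfuscator: follow every gate with
    # NOT(NOT(.)) and route later references through the second NOT.
    n, gates, outputs = circ
    remap = list(range(n))  # old wire index -> new wire index
    new_gates = []
    for g in gates:
        new_gates.append((g[0], *(remap[u] for u in g[1:])))
        w = n + len(new_gates) - 1  # wire written by the copied gate
        new_gates.append(("NOT", w))
        new_gates.append(("NOT", w + 1))
        remap.append(w + 2)
    return (n, new_gates, [remap[o] for o in outputs])

def same_function(c1, c2):
    # Functionality check O(C) ~ C by trying every input (small n only).
    n = c1[0]
    return c2[0] == n and all(
        evaluate(c1, x) == evaluate(c2, x) for x in product((0, 1), repeat=n)
    )

# XOR(x0, x1) = OR(x0, x1) AND NOT(AND(x0, x1))
xor = (2, [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3), ("AND", 2, 4)], [5])
obf = pad_with_double_negations(xor)
assert same_function(xor, obf)
assert size(obf) == 3 * size(xor)  # polynomial slowdown with p(s) = 3s
```

Passing both checks says nothing about security; that is what the following definitions are for.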

Defining Security
“Anything that can be learned from the obfuscated form could have been learned by merely observing the circuit’s input-output behavior (i.e., by treating the circuit as a black box).”
A natural formal interpretation: for any adversary A there is a simulator S such that for any circuit C, $A(O(C)) \approx_c S^{C}(1^{|C|})$ (computationally indistinguishable).
This definition is impossible to meet!

Defining Security (2)
Relaxation: the simulator should only compute a specific function (even a predicate) of C, rather than generate an indistinguishable output.
Weak obfuscators: $\forall A$, $\forall$ (poly time) predicate $p:\{0,1\}^* \to \{0,1\}$, $\exists S$ such that for all circuits C:
$\Pr[A(O(C)) = p(C)] \le \Pr[S^{C}(1^{|C|}) = p(C)] + \mathrm{negl}(|C|)$
Note: this may be too weak for the desired applications, but we will still prove that it is impossible to meet.

Inherently Unobfuscatable Functions
Definition 2: An (efficiently computable) function ensemble $\{F_t\}$ (with $F_t:\{0,1\}^{|t|} \to \{0,1\}^{|t|}$) is an inherently unobfuscatable function ensemble (IUF) if there is a poly time predicate $p:\{0,1\}^* \to \{0,1\}$ such that:
• (a) (p is easy to compute with a circuit) There is a p.p.t. A such that for any circuit C with $C \sim F_t$: $A(C) = p(t)$.
• (b) (p is hard to compute with black-box access) For any p.p.t. S, if t is chosen uniformly from $\{0,1\}^n$ then $\Pr[S^{F_t}(1^n) = p(t)] \le \frac{1}{2} + \mathrm{negl}(n)$.

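The Main Result
Theorem 1: If unobfuscatable functions exist, then “very weak” obfuscators do not exist.
Theorem 2: If one-way functions exist, then unobfuscatable functions exist.
Theorem 3: If efficient weak obfuscators exist, then one-way functions exist.
Corollary 4: Efficient weak obfuscators do not exist. (Combining Theorems 3, 2 and 1: an efficient weak obfuscator would yield one-way functions, hence unobfuscatable functions, which rule out even “very weak” obfuscators.)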

The Combination Operator
For $f_0, f_1: X \to Y$, define $f_0 \# f_1: \{0,1\} \times X \to Y$ by $(f_0 \# f_1)(b, x) := f_b(x)$.
• Properties:
  • From a circuit C that computes $f_0 \# f_1$ one can compute circuits $C_0, C_1$ that compute $f_0$ and $f_1$ ($C_b(x) := C(b, x)$).
  • Oracle access to $f_0 \# f_1$ is equivalent to oracle access to both $f_0$ and $f_1$.
Using the combination operator, we can attempt to prove Theorem 2.
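The operator and both properties fit in a few lines of Python, with callables standing in for circuits (an assumption for illustration; combine and split are hypothetical names). Note that split only ever evaluates C, so the same code also shows that oracle access to $f_0 \# f_1$ yields oracle access to each $f_b$, while combine gives the converse.

```python
from typing import Callable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")

def combine(f0: Callable[[X], Y], f1: Callable[[X], Y]) -> Callable[[int, X], Y]:
    # (f0 # f1)(b, x) := f_b(x)
    def f(b: int, x: X) -> Y:
        if b not in (0, 1):
            raise ValueError("selector bit b must be 0 or 1")
        return f1(x) if b else f0(x)
    return f

def split(c: Callable[[int, X], Y]) -> tuple[Callable[[X], Y], Callable[[X], Y]]:
    # From anything computing f0 # f1, recover C_b(x) := C(b, x).
    def c0(x: X) -> Y:
        return c(0, x)
    def c1(x: X) -> Y:
        return c(1, x)
    return c0, c1

h = combine(lambda x: 2 * x, lambda x: x + 1)
c0, c1 = split(h)
print(h(0, 5), h(1, 5), c0(7), c1(7))  # prints 10 6 14 8
```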

Solving The Input Size Problem
Lemma 5: If one-way functions exist then there exists an (efficiently constructible) ensemble $\{D_{a,b,z}\}$ such that ($a_1$ denotes the first bit of a):
1. There is a p.p.t. A′ such that for any circuit C that satisfies $C(a) = b$ and for any z: $A'^{D_{a,b,z}}(C) = a_1$. (In particular, there is a p.p.t. A″ such that $A''(C \# D_{a,b,z}) = a_1$.)
2. Oracle access to $D_{a,b,z}$ does not help in learning anything about a.
Formal interpretation (semantic security): for any p.p.t. S there is a p.p.t. S′ such that for any (poly time) function $p:\{0,1\}^* \to \{0,1\}^*$:
$\Pr_{a,b,z}[S^{D_{a,b,z}}(1^n) = p(a)] \le \Pr_{a,b,z}[S'(1^n) = p(a)] + \mathrm{negl}(n)$

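Lemma 5 Proves Theorem 2
• Define $F_{a,b,z} := C_{a,b} \# D_{a,b,z}$ and $p(a,b,z) := a_1$, where $C_{a,b}$ is the point function that outputs b on input a and $0^n$ on every other input.
• We claim that $\{F_{a,b,z}\}$ is an IUF with respect to the predicate p.
• Algorithm A: when given a circuit F:
  • Decompose F into circuits C, D such that $F \sim C \# D$.
  • Return $A'^{D}(C)$.
• Claim 1: For any circuit F such that $F \sim F_{a,b,z}$, $A(F) = a_1$.
• Claim 2: For any p.p.t. S, $\Pr_{a,b,z}[S^{F_{a,b,z}}(1^n) = a_1] \le \frac{1}{2} + \mathrm{negl}(n)$.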
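A sketch of Algorithm A in Python, with callables for circuits and a simplified stand-in for D: naive_d takes the circuit C itself as its input and returns $a_1$ if $C(a) = b$. This stand-in ignores the input-size issue that Lemma 5 is there to solve (the real $D_{a,b,z}$ cannot receive C as an input), so here A′ reduces to a single oracle query. The length N, the convention that $a_1$ is the most significant bit of a, and all function names are assumptions for illustration.

```python
N = 16  # hypothetical input length n

def point_function(a: int, b: int):
    # C_{a,b}(x) = b if x == a, else 0 (the all-zero string).
    return lambda x: b if x == a else 0

def naive_d(a: int, b: int):
    # Simplified stand-in for D_{a,b,z}: accepts a circuit (callable) directly
    # and returns a_1, taken here as the most significant bit of a, iff C(a) == b.
    a1 = (a >> (N - 1)) & 1
    return lambda circuit: a1 if circuit(a) == b else 0

def combine(f0, f1):
    # (f0 # f1)(bit, x) := f_bit(x)
    return lambda bit, x: f1(x) if bit else f0(x)

def algorithm_a(f):
    # Decompose F ~ C # D via C(x) := F(0, x) and D(x) := F(1, x),
    # then return A'^D(C); with the naive D, A' just queries D on C.
    def c(x):
        return f(0, x)
    def d(x):
        return f(1, x)
    return d(c)

a, b = 0b1000_0000_0000_0001, 12345
F = combine(point_function(a, b), naive_d(a, b))
print(algorithm_a(F))  # prints 1, the top bit of a
```

With only black-box access, a simulator would have to guess a to build a useful query, which is the intuition behind Claim 2.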

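Proof of Lemma 5
Let $(ENC_k, DEC_k)$ be a private key encryption scheme. Define:
• $I_{a,k}$: the constant function that outputs $ENC_k(a_1), \ldots, ENC_k(a_n)$
• $H_k(c, d, \odot) := ENC_k(DEC_k(c) \odot DEC_k(d))$ for a Boolean operation $\odot$
• $B_{a,b,k}(c_1, \ldots, c_n) := a_1$ if $DEC_k(c_1) = b_1, \ldots, DEC_k(c_n) = b_n$, and $B_{a,b,k}(c_1, \ldots, c_n) := 0$ otherwise
Let $\{h_{k'}\}$ be a pseudorandom function ensemble (it supplies the coins for $ENC_k$ in I and H, so that D is a deterministic function). We define:
$D_{a,b,k,k'} := I_{a,k,k'} \# H_{k,k'} \# B_{a,b,k}$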
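The key idea is that A′ never hands C to the oracle: it fetches encryptions of the bits of a from I, evaluates C gate by gate under encryption using H, and submits the encrypted outputs to B, so every query has a fixed length. Below is a toy Python sketch under several assumptions: HMAC-SHA256 stands in both for the PRF inside ENC and for $h_{k'}$; a bit is encrypted as r || (bit XOR lsb(PRF_k(r))); the operation $\odot$ is fixed to NAND, so circuits are NAND-only; the three parts of D are returned as a dictionary rather than nested with #; and the demo picks b to be the bitwise complement of a, so that a NOT circuit satisfies $C(a) = b$.

```python
import hashlib
import hmac
import os
import secrets

R = 16  # bytes of coins per ciphertext (toy parameter)

def prf(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in pseudorandom function (assumption).
    return hmac.new(key, data, hashlib.sha256).digest()

def enc_bit(k: bytes, bit: int, r: bytes) -> bytes:
    # Toy encryption of one bit with coins r: r || (bit XOR lsb(PRF_k(r))).
    return r + bytes([bit ^ (prf(k, r)[0] & 1)])

def dec_bit(k: bytes, c: bytes) -> int:
    return c[R] ^ (prf(k, c[:R])[0] & 1)

def make_d(a_bits, b_bits, k: bytes, k2: bytes):
    # Hypothetical realization of D_{a,b,k,k'} = I # H # B as a dict of oracles.
    def coins(tag: bytes, data: bytes) -> bytes:
        # Coins for ENC_k come from h_{k'}(query), so each oracle is deterministic.
        return prf(k2, tag + data)[:R]

    def I():
        return [enc_bit(k, a_i, coins(b"I", bytes([i]))) for i, a_i in enumerate(a_bits)]

    def H(c1: bytes, c2: bytes) -> bytes:
        # H_k(c1, c2, NAND) = ENC_k(DEC_k(c1) NAND DEC_k(c2))
        bit = 1 - (dec_bit(k, c1) & dec_bit(k, c2))
        return enc_bit(k, bit, coins(b"H", c1 + c2))

    def B(cs) -> int:
        ok = len(cs) == len(b_bits) and all(dec_bit(k, c) == b_i for c, b_i in zip(cs, b_bits))
        return a_bits[0] if ok else 0

    return {"I": I, "H": H, "B": B}

def a_prime(d, circuit):
    # A'^D(C) for a NAND-only circuit (n, [(u, v), ...], outputs):
    # evaluate C on ENC(a) under encryption, then ask B.
    n, gates, outputs = circuit
    wires = d["I"]()
    assert len(wires) == n
    for u, v in gates:
        wires.append(d["H"](wires[u], wires[v]))
    return d["B"]([wires[o] for o in outputs])

n = 8
a_bits = [1] + [secrets.randbelow(2) for _ in range(n - 1)]
b_bits = [1 - x for x in a_bits]  # demo choice: b = NOT a
k, k2 = os.urandom(32), os.urandom(32)
D = make_d(a_bits, b_bits, k, k2)

not_circuit = (n, [(i, i) for i in range(n)], [n + i for i in range(n)])
identity = (n, [], list(range(n)))
print(a_prime(D, not_circuit), a_prime(D, identity))  # prints 1 0
```

The identity circuit fails because it outputs a rather than b, so B answers 0. Only a circuit with $C(a) = b$ unlocks $a_1$, and it does so without ever appearing in a query.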

Unobfuscatable Encryption Scheme
Definition 3: A (CPA secure) private key encryption scheme (GEN, ENC, DEC) is unobfuscatable if there is an algorithm A such that $A(C) = k$ for any circuit C such that $C \sim ENC_k$.
That is, A can totally break the encryption scheme given any circuit that computes the encryption function.
Theorem 6: If secure private key encryption schemes exist, then so do inherently unobfuscatable encryption schemes.

Proof of Theorem 6
Suppose that (GEN, ENC, DEC) is a (CPA) secure private key encryption scheme. It follows that one-way functions exist. Let $\{F_{a,b,z}\}$ be the ensemble from the proof of Theorem 2, and modify it to $F'_{a,b,z,k}$ such that there is an algorithm A with $A(F') = (a, b, z, k)$ for any circuit F′ such that $F' \sim F'_{a,b,z,k}$.
Define (GEN′, ENC′, DEC′) as follows:
• $GEN'(1^n) := (k, a, b, z)$, where $k \leftarrow GEN(1^n)$ and a, b, z are chosen at random
• $ENC'_{k,a,b,z}(m) := ENC_k(m); F'_{a,b,z,k}(m)$
• $DEC'_{k,a,b,z}(c; y) := DEC_k(c)$
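A short Python sketch of the syntax of (GEN′, ENC′, DEC′): the ciphertext is the ordinary ciphertext followed by $F'(m)$, and decryption ignores the appended part. The toy encryption uses HMAC-SHA256 as a stand-in PRF, `extra` is a hypothetical stand-in for (a, b, z), and f_prime is a hypothetical placeholder that does not implement the unobfuscatable ensemble F′ from the proof, so the sketch illustrates correctness of decryption only, not the attack.

```python
import hashlib
import hmac
import os

def prf(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 as a stand-in pseudorandom function (assumption).
    return hmac.new(key, data, hashlib.sha256).digest()

def enc(k: bytes, m: bytes) -> bytes:
    # Toy private-key encryption of up to 32 bytes: r || (m XOR PRF_k(r)).
    if len(m) > 32:
        raise ValueError("toy scheme handles messages of at most 32 bytes")
    r = os.urandom(16)
    return r + bytes(x ^ y for x, y in zip(m, prf(k, r)))

def dec(k: bytes, c: bytes) -> bytes:
    r, body = c[:16], c[16:]
    return bytes(x ^ y for x, y in zip(body, prf(k, r)))

def gen_prime():
    # GEN'(1^n) := (k, a, b, z); `extra` is a hypothetical stand-in for (a, b, z).
    return os.urandom(32), os.urandom(32)

def f_prime(k: bytes, extra: bytes, m: bytes) -> bytes:
    # Hypothetical placeholder for F'_{a,b,z,k}(m); NOT the unobfuscatable F'.
    return prf(extra, m)

def enc_prime(key, m: bytes):
    k, extra = key
    return enc(k, m), f_prime(k, extra, m)  # ENC_k(m) ; F'(m)

def dec_prime(key, ct) -> bytes:
    k, _ = key
    c, _y = ct  # the appended F'(m) is ignored
    return dec(k, c)

key = gen_prime()
assert dec_prime(key, enc_prime(key, b"secret")) == b"secret"
```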

Similar Results
• If signature schemes exist, then so do unobfuscatable signature schemes.
• If pseudorandom functions exist, then so do unobfuscatable pseudorandom functions.
These results mean that any algorithm that satisfies Definition 1 cannot be used to obtain the applications described earlier in the way we intended. They do not mean that these applications cannot be obtained in other ways. (In particular, we believe that public key encryption schemes do exist.)

Other Results
• Generalization: obfuscators that only (strongly) approximate the input circuit: for any circuit C and any input x,
  $\Pr[C(x) \ne O(C)(x)] = \mathrm{negl}(|C|)$
  (probability only over O’s coin tosses).
  • Note: our proof does not directly apply here.
• A promise-problem version of a complexity-theoretic analog of Rice’s Theorem is false.
• Weaker “obfuscation-like” notions (e.g., sampling obfuscators).

Conclusions
• Is there any hope for obfuscation?
  • Weaker / different definitions.
  • Restricted classes of algorithms.
