Obfuscation and Tamperproofing
Clark Thomborson
19 March 2010
What Secrets are in Software?
• Source Code, Algorithms: competitors might provide similar functionality at less R&D cost.
• Constants: pirates might exploit knowledge of a decryption key or other compact secret.
• Internal function points: pirates might tamper with critical code, e.g. if ( not licensed ) exit( ).
• External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.
Security Boundary for Obfuscated Code
• Obfuscated code O(X) has the same behaviour as X.
• O(X) is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …
[Diagram: source code P → compiler → executable X → obfuscator → O(X) → CPU and GUI; the secrets inside the boundary are the algorithm, function points, secret interface, and secret keys.]
Security Boundary for Encrypted Code
[Diagram: source code P → compiler → executable X → encrypter → encrypted code E(X) → decrypter → decrypted X → CPU and GUI; the algorithm, function points, secret interface, and secret keys stay inside the black box.]
• Encryption requires a black-box CPU.
• Note: I/O must be limited. No debuggers allowed!
Design Issues for Encrypted Code
• Key distribution.
• Tradeoff: security for expense & functionality.
• Branches into an undecrypted block will stall the CPU until the target is decrypted.
• This runtime penalty is proportional to block size.
• Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff. (See the sketch after this list.)
• The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks.
• A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
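A minimal sketch of the decrypt-on-branch stall described in this list. The block size, the XOR stand-in cipher, and all class and method names are illustrative assumptions, not part of the original design: a branch into a block that is absent from the cleartext buffer stalls until the whole block is decrypted, so the stall grows with the block size.

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical decrypt-on-demand loader for encrypted code E(X).
  class EncryptedCodeLoader {
      static final int BLOCK_SIZE = 4096;      // larger blocks: longer stalls
      private final byte[][] encryptedBlocks;  // E(X), split into fixed-size blocks
      private final Map<Integer, byte[]> ramBuffer = new HashMap<>(); // decrypted blocks

      EncryptedCodeLoader(byte[][] encryptedBlocks) {
          this.encryptedBlocks = encryptedBlocks;
      }

      // Returns the cleartext block containing 'address'; on a miss the
      // "CPU" stalls inside decrypt() for time proportional to BLOCK_SIZE.
      byte[] fetch(int address) {
          int blockId = address / BLOCK_SIZE;
          return ramBuffer.computeIfAbsent(blockId, id -> decrypt(encryptedBlocks[id]));
      }

      private byte[] decrypt(byte[] block) {
          byte[] clear = new byte[block.length];
          for (int i = 0; i < block.length; i++) {
              clear[i] = (byte) (block[i] ^ 0x5A);  // stand-in for a real cipher
          }
          return clear;
      }
  }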
Debugging Encrypted Code
[Diagram: as above, but a secret interface gives debugging access to the decrypted X inside the black box.]
• Usually, a secret interface is an Easter Egg: easy to find if you know where to look! A confidentiality risk.
• Mitigation: special hardware required to access the secret interface.
Tampering Attack on Encrypted Code
[Diagram: as above, but the attacker substitutes a tampered E’(X) for E(X) before it reaches the decrypter.]
• Random x86 code is likely to crash or loop (Barrantes, 2003).
• Mitigation: cryptographically signed code. The system should test the signature on an executable before running it, as sketched below.
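A minimal sketch of the signed-code mitigation, using the standard java.security.Signature API. The algorithm name and key handling are illustrative, and distributing the producer's public key is outside the sketch.

  import java.security.PublicKey;
  import java.security.Signature;

  // Refuse to run an executable image unless its signature verifies
  // under the producer's public key, so a tampered E'(X) is rejected.
  class SignedCodeCheck {
      static boolean verify(byte[] executable, byte[] sig, PublicKey producerKey)
              throws Exception {
          Signature verifier = Signature.getInstance("SHA256withRSA");
          verifier.initVerify(producerKey);
          verifier.update(executable);   // hash the code image as distributed
          return verifier.verify(sig);   // true only for an untampered E(X)
      }
  }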
Intrusion Attack on Encrypted Code
[Diagram: as above, with E(X) intact; the attack enters through the GUI.]
• The attacker might find a way to inject code through the GUI.
• Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …
Tampering Attack on Obfuscated Code
[Diagram: source code P → compiler → executable X → obfuscator → O(X); the attacker substitutes a tampered O’(X) before it reaches the CPU and GUI.]
• Mitigation 1: O(X) might check its own signature (a sketch follows). Note: O’(X) might not include this check!
• Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.
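A minimal sketch of Mitigation 1 as a self-checksum. The resource path and the embedded digest are placeholders; as the slide notes, this is exactly the check a tampered O’(X) can strip out, which is what Mitigation 2 tries to prevent.

  import java.io.InputStream;
  import java.security.MessageDigest;
  import java.util.Arrays;

  // The program hashes its own class file and compares the result with a
  // digest embedded at build time (placeholder value here).
  class SelfCheck {
      static final byte[] EXPECTED = new byte[32];  // placeholder build-time digest

      static boolean intact() throws Exception {
          try (InputStream in = SelfCheck.class.getResourceAsStream("/SelfCheck.class")) {
              if (in == null) return false;          // image missing: fail closed
              MessageDigest md = MessageDigest.getInstance("SHA-256");
              byte[] buf = new byte[8192];
              for (int n; (n = in.read(buf)) > 0; ) {
                  md.update(buf, 0, n);
              }
              return Arrays.equals(md.digest(), EXPECTED);
          }
      }
  }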
Typical Obfuscation Techniques
• Lexical obfuscations:
  • Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.)
• Data obfuscations:
  • Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats;
  • Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa;
• Control obfuscations:
  • Inlining and outlining, to obscure procedural abstractions;
  • Opaque predicates, to obscure control flow (see the sketch after this list);
  • (Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
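A minimal sketch of two techniques from this list, in Java; the class and method names are illustrative. The first packs several booleans into one int (a data obfuscation); the second branches on an always-true predicate whose truth is hard for a static analyser to prove (an opaque predicate).

  // Illustrative sketch; names are hypothetical.
  class ObfuscationSketch {
      // Data obfuscation: several booleans encoded in one int.
      static int pack(boolean licensed, boolean trial, boolean expired) {
          return (licensed ? 1 : 0) | (trial ? 2 : 0) | (expired ? 4 : 0);
      }
      static boolean isLicensed(int flags) { return (flags & 1) != 0; }

      // Opaque predicate: the low bit of x*x always equals the low bit of x,
      // even under overflow, so the branch below is really unconditional;
      // an analyser that cannot prove this must keep both paths alive.
      static void guardedCall(Runnable realWork, int x) {
          if (((x * x) & 1) == (x & 1)) {
              realWork.run();                       // always executed
          } else {
              throw new IllegalStateException();    // dead decoy path
          }
      }
  }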
Obfuscated Interpretation
[Diagram: CPU pipeline with Start/Stop control and Fetch, Decode, FSM, and Execute Units; the FSM sits between decode and execute.]
• Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode.
• Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible).
• Diversified software must be custom-translated for each FSM.
  • This implies that the software producer must know the serial number of its customer’s FSM.
  • We cannot allow the attacker to learn this information.
• This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function.
• Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.
Obfuscated 2-op Ass’y Code

Cleartext:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
  add p, x
  sub x, 1
  goto Loop;

Obfuscated text:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
  sub p, x
  add x, 1
  add p, 0   (“dummy instruction” to force an FSM transition)
  goto Loop;

FSM translator (in CPU pipeline): two states, add and sub, with starting state add; the four transitions are labelled input/output: add/sub, sub/add, add/add, sub/sub. (A sketch of one reading of this translator follows.)
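One plausible reading of this state diagram (an assumption, not stated explicitly on the slide): the FSM's state is the last opcode it emitted; state add swaps add and sub, state sub passes them through, and all other opcodes are untranslated. The Java sketch below reproduces the cleartext loop body from the obfuscated one under that reading; the class and method names are illustrative.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical reconstruction of the slide's 2-op FSM translator.
  // State = the last opcode emitted (start: add). In state "add" the
  // translator swaps add and sub; in state "sub" it passes them through;
  // every other opcode is left alone.
  class TwoOpFsm {
      private String state = "add";   // starting state, from the slide

      String translate(String opcode) {
          if (!opcode.equals("add") && !opcode.equals("sub")) return opcode;
          String out = state.equals("add")
                  ? (opcode.equals("add") ? "sub" : "add")   // swap state
                  : opcode;                                  // pass-through state
          state = out;    // next state = the opcode just emitted
          return out;
      }

      public static void main(String[] args) {
          String[] loopBody = { "if", "sub p,x", "add x,1", "add p,0", "goto" };
          TwoOpFsm fsm = new TwoOpFsm();
          List<String> clear = new ArrayList<>();
          for (String insn : loopBody) {
              String[] parts = insn.split(" ", 2);
              String op = fsm.translate(parts[0]);
              clear.add(parts.length > 1 ? op + " " + parts[1] : op);
          }
          // Prints [if, add p,x, sub x,1, add p,0, goto]: the original loop,
          // with the dummy now a harmless "add p, 0".
          System.out.println(clear);
      }
  }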
Obfuscated Java Bytecode

  1 iconst_0
  2 istore_2
  3 iload_1
  4 istore_1
  5 if_icmpne Label3
  6 Label1:
  7 irem
  8 iload_2
  9 iload_1
  10 iload_1
  11 Label4:
  12 goto Label2
  13 iadd
  14 istore_2
  15 bipush 1
  16 bipush 1
  17 iload_1
  18 pop
  19 Label2:
  20 iinc 1 1
  21 bipush 1
  22 goto Label4
  23 Label3:
  24 iconst_1
  25 iload_0
  26 if_icmple Label1
  27 iadd
  28 ireturn

• The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}
• Could you de-obfuscate this?
• Could you develop a “class attack”? Note: each CPU has a different FSM.
Security Analysis
• Tampering: an attacker should not be able to modify the obfuscated code.
• Level 1 Attack: the attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “if (licensed) goto L” into “goto L”.
• Level 2 Attack: the attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.
Prohibited Actions (cont.)
• Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code.
• Level 3 Attack: the attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program.
• Automated De-obfuscation: a “class attack”.
• Level 4 Attack: the attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.
3-D Threat Model
• An adversary might have relevant knowledge & tools;
• An adversary might have relevant powers of observation;
• An adversary might have relevant control powers (e.g. causing the CPU to fetch and execute arbitrary codes).
Goal of security analysis: determine which adversarial powers enable a level-k attack.
A. Knowledge and Tools
• Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’.
• Level A1: adversary attended this seminar.
• Level A2: adversary knows how to use a debugger with a breakpoint facility.
• Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and the adversary can do elementary statistical computations on these traces.
• Level A4: adversary has an implementation of every FSM F_k(x) and every obfuscator F_k^-1(x), and an efficient way to derive the obfuscation key k from X’.
• Our framework seems secure against level-A1 adversaries.
• Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to level A3 and then level A4 (which enables a level-4 “class attack”).
B. Observations
• Level-B0 observation: run X’ on a computer; observe output.
• Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour.
• Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use a level-A2 tool.)
• Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’.
• Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’.
• We estimate that O(n²) level-B2 observations are enough to promote a level-A2 adversary to level A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values. See the sketch below.)
• Level B4 requires great cryptographic skill or level-C2 control.
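A minimal sketch of the statistical step that promotes a level-A2 adversary to level A3; the opcode spellings and the branch test are illustrative assumptions. It counts the opcodes that appear immediately before recognised branches in a trace; the most frequently repeated pre-branch patterns are candidate dummy sequences.

  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Count pre-branch opcodes in a de-obfuscated trace; frequently repeated
  // patterns immediately before branches are likely FSM-forcing dummies.
  class PreBranchStats {
      static Map<String, Integer> dummyCandidates(List<String> trace) {
          Map<String, Integer> counts = new HashMap<>();
          for (int i = 1; i < trace.size(); i++) {
              String insn = trace.get(i);
              if (insn.startsWith("goto") || insn.startsWith("if")) {
                  counts.merge(trace.get(i - 1), 1, Integer::sum);
              }
          }
          return counts;  // high counts: candidate dummy opcodes
      }
  }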
C. Control Steps
• Level-C0 control: keyboard and mouse inputs for a program run.
• Level-C1 control: the adversary makes arbitrary changes to the executable P’, then runs the resulting P’’.
• Level-C2 control: the adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use a level-A2 tool: a debugger.)
• Level-C3 control: the adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth.
• Level-C4 control: the adversary can inject arbitrary inputs into software implementations of the FSM F(x) and the obfuscator F^-1(x) for all x.
• Level-C2 adversaries will eventually reach levels C3 and then C4.
Summary and Discussion
• New framework for obfuscated interpretation.
• Faster and cheaper than encryption schemes.
• Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary).
• We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also an expert in compiler technology (= a level-4 adversary).
Future Work
• Prototype implementation for Java bytecode.
• Dummy insertions need not occur immediately before branches.
• When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state.
  • This is the usual process of code optimization, plus randomization and a side-constraint.
• Operand obfuscation!
  • Operand values leak information about opcodes.