Obfuscation and Tamperproofing
Clark Thomborson
19 March 2010
What Secrets are in Software?
• Source Code, Algorithms: competitors might provide similar functionality at less R&D cost.
• Constants: pirates might exploit knowledge of a decryption key or other compact secret.
• Internal function points: pirates might tamper with critical code, e.g. if ( not licensed ) exit( ).
• External interfaces: competitors might exploit a “service entrance”; attackers might create a “backdoor”.
Security Boundary for Obfuscated Code
• Obfuscated code O(X) has the same behaviour as X.
• O(X) is released to attackers who want to know secrets: source code P, algorithm, unobfuscated X, function points, …
[Diagram: source code P → compiler → executable X → obfuscator → O(X) → CPU and GUI; the secrets inside the boundary are the algorithm, function points, secret interface, and secret keys.]
Security Boundary for Encrypted Code
[Diagram: source code P → compiler → executable X → encrypter → encrypted code E(X) → decrypter → decrypted X → CPU and GUI; the algorithm, function points, secret interface, and secret keys stay inside the black box.]
• Encryption requires a black-box CPU.
• Note: I/O must be limited. No debuggers allowed!
Design Issues for Encrypted Code
• Key distribution.
• Tradeoff: security for expense & functionality.
• Branches into an undecrypted block will stall the CPU until the target is decrypted.
• This runtime penalty is proportional to block size.
• Stronger encryption → larger blocks → larger runtime penalty. Another tradeoff. (See the sketch after this list.)
• The RAM buffer and the decrypter must be large and fast, to minimize the number of undecrypted blocks.
• A black-box system with a large and fast RAM (more than will fit in the caches of a single-chip CPU) will be either expensive or insecure. A third tradeoff.
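A minimal sketch of the decrypt-on-branch stall described in this list. The block size, the XOR stand-in cipher, and all class and method names are illustrative assumptions, not part of the original design: a branch into a block that is absent from the cleartext buffer stalls until the whole block is decrypted, so the stall grows with the block size.

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical decrypt-on-demand loader for encrypted code E(X).
  class EncryptedCodeLoader {
      static final int BLOCK_SIZE = 4096;      // larger blocks: longer stalls
      private final byte[][] encryptedBlocks;  // E(X), split into fixed-size blocks
      private final Map<Integer, byte[]> ramBuffer = new HashMap<>(); // decrypted blocks

      EncryptedCodeLoader(byte[][] encryptedBlocks) {
          this.encryptedBlocks = encryptedBlocks;
      }

      // Returns the cleartext block containing 'address'; on a miss the
      // "CPU" stalls inside decrypt() for time proportional to BLOCK_SIZE.
      byte[] fetch(int address) {
          int blockId = address / BLOCK_SIZE;
          return ramBuffer.computeIfAbsent(blockId, id -> decrypt(encryptedBlocks[id]));
      }

      private byte[] decrypt(byte[] block) {
          byte[] clear = new byte[block.length];
          for (int i = 0; i < block.length; i++) {
              clear[i] = (byte) (block[i] ^ 0x5A);  // stand-in for a real cipher
          }
          return clear;
      }
  }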
Debugging Encrypted Code
[Diagram: as above, but a secret interface gives debugging access to the decrypted X inside the black box.]
• Usually, a secret interface is an Easter Egg: easy to find if you know where to look! A confidentiality risk.
• Mitigation: special hardware required to access the secret interface.
Tampering Attack on Encrypted Code
[Diagram: as above, but the attacker substitutes a tampered E’(X) for E(X) before it reaches the decrypter.]
• Random x86 code is likely to crash or loop (Barrantes, 2003).
• Mitigation: cryptographically signed code. The system should test the signature on an executable before running it, as sketched below.
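A minimal sketch of the signed-code mitigation, using the standard java.security.Signature API. The algorithm name and key handling are illustrative, and distributing the producer's public key is outside the sketch.

  import java.security.PublicKey;
  import java.security.Signature;

  // Refuse to run an executable image unless its signature verifies
  // under the producer's public key, so a tampered E'(X) is rejected.
  class SignedCodeCheck {
      static boolean verify(byte[] executable, byte[] sig, PublicKey producerKey)
              throws Exception {
          Signature verifier = Signature.getInstance("SHA256withRSA");
          verifier.initVerify(producerKey);
          verifier.update(executable);   // hash the code image as distributed
          return verifier.verify(sig);   // true only for an untampered E(X)
      }
  }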
Intrusion Attack on Encrypted Code
[Diagram: as above, with E(X) intact; the attack enters through the GUI.]
• The attacker might find a way to inject code through the GUI.
• Mitigations: secure programming techniques, type-safe programming languages, safety analysis on X, runtime intrusion detection, sandboxing, …
Tampering Attack on Obfuscated Code
[Diagram: source code P → compiler → executable X → obfuscator → O(X); the attacker substitutes a tampered O’(X) before it reaches the CPU and GUI.]
• Mitigation 1: O(X) might check its own signature (a sketch follows). Note: O’(X) might not include this check!
• Mitigation 2: obfuscate X so heavily that the attacker is only able to inject random code.
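A minimal sketch of Mitigation 1 as a self-checksum. The resource path and the embedded digest are placeholders; as the slide notes, this is exactly the check a tampered O’(X) can strip out, which is what Mitigation 2 tries to prevent.

  import java.io.InputStream;
  import java.security.MessageDigest;
  import java.util.Arrays;

  // The program hashes its own class file and compares the result with a
  // digest embedded at build time (placeholder value here).
  class SelfCheck {
      static final byte[] EXPECTED = new byte[32];  // placeholder build-time digest

      static boolean intact() throws Exception {
          try (InputStream in = SelfCheck.class.getResourceAsStream("/SelfCheck.class")) {
              if (in == null) return false;          // image missing: fail closed
              MessageDigest md = MessageDigest.getInstance("SHA-256");
              byte[] buf = new byte[8192];
              for (int n; (n = in.read(buf)) > 0; ) {
                  md.update(buf, 0, n);
              }
              return Arrays.equals(md.digest(), EXPECTED);
          }
      }
  }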
Typical Obfuscation Techniques
• Lexical obfuscations:
  • Obscure names of variables, methods, classes, interfaces, etc. (We obscure opcodes in our new framework.)
• Data obfuscations:
  • Obscure values of variables, e.g. encoding several booleans in one int, or encoding one int in several floats;
  • Obscure data structures, e.g. transforming 2-d arrays into vectors, and vice versa;
• Control obfuscations:
  • Inlining and outlining, to obscure procedural abstractions;
  • Opaque predicates, to obscure control flow (see the sketch after this list);
  • (Control flow is obscured in our new obfuscation, because branching opcodes look like non-branching opcodes.)
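A minimal sketch of two techniques from this list, in Java; the class and method names are illustrative. The first packs several booleans into one int (a data obfuscation); the second branches on an always-true predicate whose truth is hard for a static analyser to prove (an opaque predicate).

  // Illustrative sketch; names are hypothetical.
  class ObfuscationSketch {
      // Data obfuscation: several booleans encoded in one int.
      static int pack(boolean licensed, boolean trial, boolean expired) {
          return (licensed ? 1 : 0) | (trial ? 2 : 0) | (expired ? 4 : 0);
      }
      static boolean isLicensed(int flags) { return (flags & 1) != 0; }

      // Opaque predicate: the low bit of x*x always equals the low bit of x,
      // even under overflow, so the branch below is really unconditional;
      // an analyser that cannot prove this must keep both paths alive.
      static void guardedCall(Runnable realWork, int x) {
          if (((x * x) & 1) == (x & 1)) {
              realWork.run();                       // always executed
          } else {
              throw new IllegalStateException();    // dead decoy path
          }
      }
  }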
Obfuscated Interpretation
[Diagram: CPU pipeline with Start/Stop control and Fetch, Decode, FSM, and Execute Units; the FSM sits between decode and execute.]
• Put a secret FSM in the CPU fetch-execute hardware, or in the interpreter. The FSM translates opcodes immediately after the decode.
• Software is “diversified” before it is obfuscated: basic blocks are subdivided, scrambled into random order, and instructions within blocks are reordered randomly (where possible).
• Diversified software must be custom-translated for each FSM.
  • This implies that the software producer must know the serial number of its customer’s FSM.
  • We cannot allow the attacker to learn this information.
• This is a classic key-distribution problem. Unfortunately, the keying is symmetric, because our opcode translation is not a one-way function.
• Individualised FSMs could be distributed as obfuscated software or firmware, or might be hard-wired into CPU chips.
Obfuscated 2-op Ass’y Code

Cleartext:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
  add p, x
  sub x, 1
  goto Loop;

Obfuscated text:
  Let x = n
  Let p = 1
  Loop: if x == 0 exit
  sub p, x
  add x, 1
  add p, 0   (“dummy instruction” to force an FSM transition)
  goto Loop;

FSM translator (in CPU pipeline): two states, add and sub, with starting state add; the four transitions are labelled input/output: add/sub, sub/add, add/add, sub/sub. (A sketch of one reading of this translator follows.)
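One plausible reading of this state diagram (an assumption, not stated explicitly on the slide): the FSM's state is the last opcode it emitted; state add swaps add and sub, state sub passes them through, and all other opcodes are untranslated. The Java sketch below reproduces the cleartext loop body from the obfuscated one under that reading; the class and method names are illustrative.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical reconstruction of the slide's 2-op FSM translator.
  // State = the last opcode emitted (start: add). In state "add" the
  // translator swaps add and sub; in state "sub" it passes them through;
  // every other opcode is left alone.
  class TwoOpFsm {
      private String state = "add";   // starting state, from the slide

      String translate(String opcode) {
          if (!opcode.equals("add") && !opcode.equals("sub")) return opcode;
          String out = state.equals("add")
                  ? (opcode.equals("add") ? "sub" : "add")   // swap state
                  : opcode;                                  // pass-through state
          state = out;    // next state = the opcode just emitted
          return out;
      }

      public static void main(String[] args) {
          String[] loopBody = { "if", "sub p,x", "add x,1", "add p,0", "goto" };
          TwoOpFsm fsm = new TwoOpFsm();
          List<String> clear = new ArrayList<>();
          for (String insn : loopBody) {
              String[] parts = insn.split(" ", 2);
              String op = fsm.translate(parts[0]);
              clear.add(parts.length > 1 ? op + " " + parts[1] : op);
          }
          // Prints [if, add p,x, sub x,1, add p,0, goto]: the original loop,
          // with the dummy now a harmless "add p, 0".
          System.out.println(clear);
      }
  }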
Obfuscated Java Bytecode

  1 iconst_0
  2 istore_2
  3 iload_1
  4 istore_1
  5 if_icmpne Label3
  6 Label1:
  7 irem
  8 iload_2
  9 iload_1
  10 iload_1
  11 Label4:
  12 goto Label2
  13 iadd
  14 istore_2
  15 bipush 1
  16 bipush 1
  17 iload_1
  18 pop
  19 Label2:
  20 iinc 1 1
  21 bipush 1
  22 goto Label4
  23 Label3:
  24 iconst_1
  25 iload_0
  26 if_icmple Label1
  27 iadd
  28 ireturn

• The translating FSM has 8 states, one for each opcode it translates: {goto, if_icmpne, iload_1, iconst_1, iconst_2, iadd, iload_2, irem}
• Could you de-obfuscate this?
• Could you develop a “class attack”? Note: each CPU has a different FSM.
Security Analysis
• Tampering: an attacker should not be able to modify the obfuscated code.
• Level 1 Attack: the attacker makes a desired change in program behaviour with a small number of localized changes to representation and semantics, e.g. changing “if (licensed) goto L” into “goto L”.
• Level 2 Attack: the attacker makes a large change in program representation, e.g. by decompiling and recompiling. This may obliterate a watermark, and it will facilitate other attacks.
Prohibited Actions (cont.)
• Reverse Engineering: an attacker should not be able to modify or re-use substantial portions (constants, objects, loops, functions) of an obfuscated code.
• Level 3 Attack: the attacker makes large-scale changes in program behaviour, for example by de-obfuscating a decryption key to produce a “cracked” program.
• Automated De-obfuscation: a “class attack”.
• Level 4 Attack: the attacker makes large-scale changes to the behaviour of a large number of obfuscated programs, for example by publishing a cracking tool suitable for use by script-kiddies.
3-D Threat Model
• An adversary might have relevant knowledge & tools;
• An adversary might have relevant powers of observation;
• An adversary might have relevant control powers (e.g. causing the CPU to fetch and execute arbitrary codes).
Goal of security analysis: determine which adversarial powers enable a level-k attack.
A. Knowledge and Tools
• Level A0: adversary has an obfuscated code X’ and a computer system with a FSM that correctly translates and executes X’.
• Level A1: adversary attended this seminar.
• Level A2: adversary knows how to use a debugger with a breakpoint facility.
• Level A3: adversary has tracing software that collects sequences of de-obfuscated instruction executions, correlated with sequences of obfuscated instructions; and the adversary can do elementary statistical computations on these traces.
• Level A4: adversary has an implementation of every FSM F_k(x) and every obfuscator F_k^-1(x), and an efficient way to derive the obfuscation key k from X’.
• Our framework seems secure against level-A1 adversaries.
• Level-A2 adversaries with sufficient motivation (and a debugger) will eventually progress to level A3 and then level A4 (which enables a level-4 “class attack”).
B. Observations
• Level-B0 observation: run X’ on a computer; observe output.
• Level-B1 observation: given X’’ and an input I, determine whether X’’(I) differs from X’(I) in its I/O behaviour.
• Level-B2 observation: record a few opcodes and operands before and after FSM translation. (Use a level-A2 tool.)
• Level-B3 observation: record a complete trace of de-obfuscated instructions from a run of P’.
• Level-B4 observation: determine the index x of a FSM which could produce a given trace from a run of P’.
• We estimate that O(n²) level-B2 observations are enough to promote a level-A2 adversary to level A3, for FSMs with n states. (The adversary could look for commonly-repeated patterns immediately before branches; these are likely to be “dummy sequences”. Branches may be recognized by their characteristic operand values. See the sketch below.)
• Level B4 requires great cryptographic skill or level-C2 control.
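A minimal sketch of the statistical step that promotes a level-A2 adversary to level A3; the opcode spellings and the branch test are illustrative assumptions. It counts the opcodes that appear immediately before recognised branches in a trace; the most frequently repeated pre-branch patterns are candidate dummy sequences.

  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Count pre-branch opcodes in a de-obfuscated trace; frequently repeated
  // patterns immediately before branches are likely FSM-forcing dummies.
  class PreBranchStats {
      static Map<String, Integer> dummyCandidates(List<String> trace) {
          Map<String, Integer> counts = new HashMap<>();
          for (int i = 1; i < trace.size(); i++) {
              String insn = trace.get(i);
              if (insn.startsWith("goto") || insn.startsWith("if")) {
                  counts.merge(trace.get(i - 1), 1, Integer::sum);
              }
          }
          return counts;  // high counts: candidate dummy opcodes
      }
  }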
C. Control Steps
• Level-C0 control: keyboard and mouse inputs for a program run.
• Level-C1 control: the adversary makes arbitrary changes to the executable P’, then runs the resulting P’’.
• Level-C2 control: the adversary injects a few (arbitrarily chosen) opcodes into the fetch unit of the CPU after it reaches an execution breakpoint that is chosen by the adversary. (Use a level-A2 tool: a debugger.)
• Level-C3 control: the adversary restarts the FSM, then injects arbitrary inputs into the fetch unit at full execution bandwidth.
• Level-C4 control: the adversary can inject arbitrary inputs into software implementations of the FSM F(x) and the obfuscator F^-1(x) for all x.
• Level-C2 adversaries will eventually reach levels C3 and then C4.
Summary and Discussion
• New framework for obfuscated interpretation.
• Faster and cheaper than encryption schemes.
• Secure, unless an attacker is able to observe and control the FSM using a debugger (= a level-2 adversary).
• We are still trying to develop an obfuscation-by-translation scheme that can be cracked only by a cryptographer who is also an expert in compiler technology (= a level-4 adversary).
Future Work
• Prototype implementation for Java bytecode.
• Dummy insertions need not occur immediately before branches.
• When translating a basic block, we will randomly choose among the efficiently-executable synonyms that end in the desired state.
  • This is the usual process of code optimization, plus randomization and a side-constraint.
• Operand obfuscation!
  • Operand values leak information about opcodes.