Linear Obfuscation to Combat Symbolic Execution

Zhi Wang (1), Jiang Ming (2), Chunfu Jia (1), and Debin Gao (3) • (1) Nankai University • (2) Pennsylvania State University • (3) Singapore Management University • European Symposium on Research in Computer Security 2011

Outline • Introduction • Linear Obfuscation • Evaluation • Conclusion

Trigger-based Code and Symbolic Execution • Trigger-based code executes only when specific inputs are received. • Symbolic execution, combined with dynamic taint analysis and theorem proving, can: • Discover trigger-based code • Find out the trigger condition

Conditional Code Obfuscation • Sharif et al. proposed a conditional code obfuscation scheme: • Obfuscates equality conditions • Uses a one-way hash function • Makes trigger conditions hard to reason about • Limitations: • The presence of cryptographic functions might make malware easier to detect • Inequality conditions are not supported

Our goals • Less suspicious without using cryptographic functions • Support both equality and inequality conditions.

Linear Obfuscation • Use linear operations to combat symbolic execution without any cryptographic functions. • The obfuscated code becomes less suspicious to malware detection. • Introduce unsolved conjectures into trigger conditions, so that inequality conditions can also be obfuscated easily.

Unsolved Conjectures • Many unsolved conjectures involve simple linear operations. • Such operations are usually fast and commonly used in basic algorithms. • They are perfect candidates to be used in linear obfuscation. • Another advantage is that they can be used to obfuscate inequality conditions.

Collatz Conjecture (3x+1 Conjecture) • Take any natural number n. If n is even, divide it by 2; if n is odd, multiply it by 3 and add 1. • Repeat the process: the sequence a_i will eventually reach 1 regardless of the value of n.
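The iteration described above can be sketched in C as follows. This is a minimal illustration, not code from the talk: the function names collatz_step and collatz_stopping_time are hypothetical, and the sketch assumes n >= 1 and that intermediate values fit in 64 bits.

```c
#include <stdint.h>

/* One step of the Collatz map: n/2 if n is even, 3n+1 if n is odd. */
uint64_t collatz_step(uint64_t n) {
    return (n % 2 == 0) ? n / 2 : 3 * n + 1;
}

/* Number of steps until the sequence starting at n first reaches 1.
   Assumes n >= 1 and no 64-bit overflow along the way. */
unsigned collatz_stopping_time(uint64_t n) {
    unsigned steps = 0;
    while (n != 1) {
        n = collatz_step(n);
        steps++;
    }
    return steps;
}
```

For example, starting from 6 the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1, so collatz_stopping_time(6) returns 8.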

Unsolved conjectures • These conjectures are similar to the Collatz conjecture in that they all converge to a fixed value regardless of the starting value.

Overview • Linear obfuscation does not hide the malicious behavior itself; it hides the trigger conditions. • Linear obfuscation complicates symbolic execution in 3 steps: • Inserting a spurious input variable • Choosing an unsolved conjecture • Rebuilding the trigger condition

A linear obfuscation example

Semantics • Symbolic execution has a hard time figuring out the trigger condition; can we figure it out ourselves? • In general, the new trigger conditions introduced by unsolved conjectures are undecidable for symbolic execution. • Within the integer ranges used by common programs (2^32 or 2^64), however, the new trigger conditions are decidable. • The 3x+1 conjecture has been tested and found to always reach 1 for all integers <= 20 * 2^58.
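As a rough sanity check on how the tested bound of 20 * 2^58 compares with the two integer ranges (plain arithmetic added here for illustration, not taken from the talk):

```latex
\[
20 \cdot 2^{58} = 5 \cdot 2^{60} \approx 5.76 \times 10^{18},
\qquad
2^{32} \approx 4.29 \times 10^{9} \;<\; 5 \cdot 2^{60} \;<\; 2^{64} = 16 \cdot 2^{60} \approx 1.84 \times 10^{19}.
\]
```

So every 32-bit starting value lies inside the tested range, while the tested range covers 5/16 of the possible 64-bit starting values.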

How to insert a spurious variable • Only variables derived from program input are treated as symbols in symbolic execution. • Spurious variables must therefore depend on real program inputs. • A more complicated relationship between y and x does not necessarily make symbolic execution take longer, e.g.: • Floating point operations • Complex pointer operations

How to insert a spurious variable(2) • Symbolic execution will use concrete values to simplify the constraints. • So the relationship between x and y should be simple enough.

How to choose an unsolved conjecture • Convergent: the loop converges. • Partially decidable: although no proof exists, the conjecture has been tested and is known to reach the terminating condition within a certain range. • Machine implementable: it can be easily implemented in common programming languages. • Simple/Linear: the implementation is simple and involves only linear operations.

Variation • Intuitively, the trigger condition is related to the convergence value. • The convergence value is not the only option: for the Collatz conjecture we can use 1, 2, or 4 as terminating conditions. • The stopping time can also serve as the terminating condition, e.g., replacing while (y > 1) with for (i = 0; i < 1000; i++).
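A minimal sketch of the stopping-time variation, assuming the bounded loop simply applies the Collatz map a fixed number of times. The names collatz_run and in_terminal_cycle are hypothetical, and overflow of intermediate values is assumed not to occur. Once the sequence reaches 1 it keeps cycling through 4, 2, 1, so after enough iterations y holds one of those three values rather than exactly 1.

```c
#include <stdint.h>

/* Apply the Collatz map a fixed number of times instead of
   looping until y == 1 (hypothetical helper for illustration). */
uint64_t collatz_run(uint64_t y, unsigned iterations) {
    for (unsigned i = 0; i < iterations; i++) {
        y = (y % 2 == 0) ? y / 2 : 3 * y + 1;
    }
    return y;
}

/* Relaxed terminating condition: y has entered the 4 -> 2 -> 1 cycle. */
int in_terminal_cycle(uint64_t y) {
    return y == 1 || y == 2 || y == 4;
}
```

With a fixed bound, the check on y has to accept the whole cycle, for example in_terminal_cycle(collatz_run(x + 1000, 1000)), because which of 1, 2, or 4 is reached depends on the starting value.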

Rebuild Trigger Condition • What do we have now? • A new spurious variable y = x + 1000 • An unsolved conjecture with a terminating condition y == 1 • Depending on the original trigger condition, we modify it in one of three ways.

Rebuild Trigger Condition • > or >= (e.g., x > 30): since the spurious variable is always greater than or equal to 1, we use x - y > 29 // 29 = 30 - 1. • < or <= (e.g., x < 30): similarly, we use x + y < 31 // 31 = 30 + 1. • == (e.g., x == 30): this is equivalent to the conjunction of two inequalities, (x >= 30) && (x <= 30), and therefore we use (x + y >= 31) && (x - y <= 29).
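The three rewrites can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the rebuilt condition is evaluated after the loop has finished (so y == 1), and x is assumed to be non-negative and small enough that the Collatz sequence starting at x + 1000 stays within 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drive the spurious variable y = x + 1000 through the Collatz loop.
   Assumes x >= 0, so the loop starts from a positive value. */
int64_t spurious(int64_t x) {
    int64_t y = x + 1000;
    while (y > 1) {
        y = (y % 2 == 0) ? y / 2 : 3 * y + 1;
    }
    return y; /* 1 for every starting value in the tested range */
}

/* Original condition: x > 30 */
bool trigger_gt_30(int64_t x) {
    int64_t y = spurious(x);
    return x - y > 29;
}

/* Original condition: x < 30 */
bool trigger_lt_30(int64_t x) {
    int64_t y = spurious(x);
    return x + y < 31;
}

/* Original condition: x == 30 */
bool trigger_eq_30(int64_t x) {
    int64_t y = spurious(x);
    return (x + y >= 31) && (x - y <= 29);
}
```

Each function agrees with the original comparison for inputs in the assumed range, but the condition a symbolic executor sees now involves y, whose final value depends on the loop.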

Overhead in Size • Small: the size of the obfuscated code is less than one hundred bytes longer than the original program

Dynamic trigger condition • The obfuscated trigger condition is a sequence of dynamic conditions in the execution trace.

Pattern Match • Linear obfuscation might be susceptible to pattern recognition, assuming that the unsolved conjecture we use is known to attackers. • Solutions: • randomly choosing various unsolved conjectures • combining with other existing obfuscation techniques (e.g., opaque constants)

Control Flow Comparison • The control flow of the obfuscated code is similar to that of common program algorithms. [Figure: control flow of a quick sort algorithm compared with our obfuscated code]

Limitation • In our analysis, we assume that there is a single trigger condition, and show that symbolic execution has a hard time figuring it out. • However, the results may change when there is a larger set of trigger inputs that satisfy the trigger condition. • For example, x > 5.

Conclusion • In this paper, we introduce a novel linear obfuscation scheme that makes it difficult for symbolic execution to find trigger conditions. • Our obfuscator applies the concept of unsolved conjectures and adds only a loop to the obfuscated code, without cryptographic functions. • Security analysis shows that no other analysis strategy makes the analysis simpler.

Thank you!
