16th ACM CCS • English Shellcode • Joshua Mason, Sam Small (Johns Hopkins University) • Fabian Monrose (University of North Carolina) • Greg MacManus (iSIGHT Partners)
Outline • Introduction • On the arms race • Related work • Our approach • Automatic generation • Implementation • Evaluation Advanced Defense Lab
Introduction • Code-injection attack • Source code for a scripting language • Byte-code • Machine code • The common component • The injected code or … • shellcode
Misconception • Shellcode is delivered in tandem with the exploitation. • Store shellcode in memory, then exploit • Shellcode takes the form of directly executable machine code. • Polymorphism
Misconception…? • Even polymorphic shellcode is constrained by an essential component: the decoder. • Shellcode is fundamentally different in structure than non-executable payload data. • This paper!!! • [Figure: Decoder | Encoded data]
About This Paper • Automatically producing English Shellcode • Although it is not indistinguishable from authentic English prose. • Do you want to analyze?
On The Arms Race • Shellcode developers are often faced with constraints that limit the range of byte values accepted. • e.g. printable, alphanumeric, MIME • Encoding • Self-modification
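The byte-value constraints on this slide can be made concrete with a small check. The following is a minimal sketch, not code from the paper: the names `PRINTABLE`, `ALPHANUMERIC`, `satisfies`, and `first_violation` are hypothetical, and the printable range is assumed to be ASCII 0x20 through 0x7E.

```python
import string

# Assumed definitions of two of the input filters named on the slide.
ALPHANUMERIC = set((string.ascii_letters + string.digits).encode("ascii"))
PRINTABLE = set(range(0x20, 0x7F))  # space through tilde


def satisfies(payload: bytes, allowed: set) -> bool:
    """Return True if every byte of payload is in the allowed set."""
    return all(b in allowed for b in payload)


def first_violation(payload: bytes, allowed: set):
    """Return (offset, byte) of the first disallowed byte, or None."""
    for offset, b in enumerate(payload):
        if b not in allowed:
            return offset, b
    return None


if __name__ == "__main__":
    text = b"Shake Shake Shake!"
    print(satisfies(text, PRINTABLE))
    print(first_violation(text, ALPHANUMERIC))
```

A filter like this is what forces an encoder: any payload byte outside the allowed set has to be produced at run time by instructions whose own bytes are inside it.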
On The Arms Race • Much literature describing code injection attacks assumes a standard attack template. • A NOP sled, shellcode, and one or more pointers • Emulation and static analysis have been successful in identifying some failings of advanced shellcode. • But … overhead
On The Arms Race • It has been suggested that malicious polymorphic behavior cannot be modeled effectively. • On the Infeasibility of Modeling Polymorphic Shellcode • By Y. Song et al.
Related Work • Limiting the spoils of exploitation and preventing developers from writing vulnerable code • Preventing the execution of injected code • Content-based input validation • Polymorphic • To identify self-decrypting shellcode • But … non-self-contained polymorphic shellcode
Our Approach • Shellcode is simply an ordered list of machine instructions. • “Shake Shake Shake!” • push %ebx; push “ake ”; push %ebx; push “ake ”; push %ebx; push “ake!”; • But what about add, mov, call? • To develop an automated approach • Arbitrary shellcode → English representation
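The "Shake Shake Shake!" example can be checked by reading the string as 32-bit x86 bytes. The sketch below handles only the two opcodes the example uses: 0x53 ("S", push %ebx) and 0x68 ("h", push imm32, which consumes the next four bytes). The function name `decode_push_sequence` is hypothetical, and this is not a general disassembler.

```python
def decode_push_sequence(text: str) -> list:
    """Interpret an ASCII string as a run of push instructions.

    Only two opcodes are recognized:
      0x53 ('S')  push %ebx
      0x68 ('h')  push imm32 (the next four bytes are the immediate)
    """
    data = text.encode("ascii")
    instructions = []
    i = 0
    while i < len(data):
        opcode = data[i]
        if opcode == 0x53:
            instructions.append("push %ebx")
            i += 1
        elif opcode == 0x68:
            immediate = data[i + 1:i + 5]
            if len(immediate) < 4:
                raise ValueError("truncated push imm32 at offset %d" % i)
            instructions.append('push "%s"' % immediate.decode("ascii"))
            i += 5
        else:
            raise ValueError("unsupported opcode 0x%02x at offset %d" % (opcode, i))
    return instructions


if __name__ == "__main__":
    for line in decode_push_sequence("Shake Shake Shake!"):
        print(line)
```

Each "Shake " is six bytes: one for push %ebx and five for the push of the four-character immediate, which is why the sentence splits cleanly into the six instructions listed on the slide.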
High-level Overview • English shellcode is completely self-contained.
The Decoder • The decoder must be English-compatible • Cannot use many instructions • E.g. loop instructions • Our decoder has the form: • Initialization • Decoder • Encoded payload
The Decoder Principle • Only English-compatible instructions • English-compatible instructions that can produce useful instructions • Favor instructions that have less-constrained ASCII equivalents • push %eax (“P”) > push %ecx (“Q”)
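The pairing of instructions with letters follows from the one-byte register encodings in 32-bit x86 (inc is 0x40+r, dec is 0x48+r, push is 0x50+r, pop is 0x58+r, popa is 0x61). The sketch below tabulates them; the helper name `one_byte_forms` is hypothetical, and the table says nothing about which letters the authors actually preferred beyond the slide's P-over-Q example.

```python
# Register order used by the x86 one-byte encodings (register number 0..7).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def one_byte_forms() -> dict:
    """Map one-byte 32-bit x86 instructions to their ASCII characters."""
    table = {}
    for number, reg in enumerate(REGS):
        table["inc %" + reg] = chr(0x40 + number)
        table["dec %" + reg] = chr(0x48 + number)
        table["push %" + reg] = chr(0x50 + number)
        table["pop %" + reg] = chr(0x58 + number)
    table["popa"] = chr(0x61)
    return table


if __name__ == "__main__":
    for instruction, char in sorted(one_byte_forms().items(), key=lambda kv: kv[1]):
        print("%-10s %r" % (instruction, char))
```

Where two instructions have the same effect for the decoder's purposes, the one whose character is easier to place in English text is the better choice, which is the preference the slide expresses with "P" over "Q".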
Decoder - Initialization • Overwriting registers and patching some instructions • Using the inc instruction and manipulating the alignment of the stack
Decoder - Unpacking • “and r/m8, r8” (0x20, ASCII space character) • add • lods (load string from esi)
Decoder - Decoding • Two pointers: %esi, %edi • [Figure labels: "," and " "; "u" and "decode"; "G"]
Decoder - Initializing Registers • Using the popa instruction (ASCII character “a”)
Automatic Generation • Taken as-is, the custom decoder will contain common English characters, but will not have the appearance of English text. • Add some instructions between decoder instructions • Augmenting a statistical language generation algorithm
Automatic Generation • The n-gram model length is 5 • The i-th instruction in the decoder has level i • A sentence has score i when it completes level i
Using the beam search algorithm • Keep the best m (= 20,000) candidates during the process • For the encoded payload, observe how many target bytes are encoded
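The search strategy can be illustrated with a toy version. The sketch below is a simplification under stated assumptions: it uses a bigram model instead of the 5-gram model, a three-sentence corpus instead of the real training data, and it ranks candidates only by language-model log-probability, leaving out the level-based scoring of decoder instructions. The names `train_bigrams` and `beam_search` are hypothetical.

```python
import math
from collections import defaultdict


def train_bigrams(corpus):
    """Build log-probabilities log P(next | previous) from whitespace-split sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {w: math.log(c / total) for w, c in followers.items()}
    return model


def beam_search(model, steps, beam_width):
    """Extend candidates word by word, keeping only the best beam_width each step."""
    beam = [(0.0, ["<s>"])]
    for _ in range(steps):
        expanded = []
        for score, words in beam:
            for nxt, logp in model.get(words[-1], {}).items():
                expanded.append((score + logp, words + [nxt]))
        if not expanded:
            break  # no candidate can be extended further
        expanded.sort(key=lambda candidate: candidate[0], reverse=True)
        beam = expanded[:beam_width]  # prune to the m best candidates
    return [(score, " ".join(words[1:])) for score, words in beam]


if __name__ == "__main__":
    model = train_bigrams(["the cat sat", "the cat ran", "the dog sat"])
    for score, text in beam_search(model, steps=2, beam_width=2):
        print("%.3f  %s" % (score, text))
```

The pruning step is the point of the technique: with m candidates kept per step, the work grows linearly with sentence length rather than exponentially, at the cost of possibly discarding a candidate that would have scored well later.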
Implementation • The training data • Over 15,000 Wikipedia articles • 27,000 books from Project Gutenberg • Language engine was constructed in the Java language using the LingPipe API • Scoring engine using the ptrace API • Executor • Watcher • Taking 12 hours
An Optimized Design • Emulation • Expands 1 instruction into tens of instructions • Monitored direct execution • Maintain 2 machine states • Use 3 separate stacks • Pause on 2 conditions • Encounter a jump • Change memory • Takes roughly less than 1 hour
Evaluation • Exit(0) • 2054 bytes
Compare with Spectrum Analysis • Windows Bind DLL Inject