Detection of ASCII Malware Parbati Kumar Manna Dr. Sanjay Ranka Dr. Shigang Chen
Internet Worm and Malware • Huge damage potential • Infects hundreds of thousands of computers • Costs millions of dollars in damage • Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom • Mostly uses Buffer Overflow • Propagation is automatic (mostly)
Recent Trends • Shift in hacker’s mindset • Malware becoming increasingly evasive and obfuscative • Emergence of Zero-day worms • Arrival of Script Kiddies
Motivation for ASCII Attacks • Prevalence of servers expecting text-only input • Text-based protocols • Presumption of text being benign • Deployment of ASCII filter for bypassing text
IDS Detecting ASCII Attack? • Disassembly-based IDS • All jump instructions are ASCII • Higher proportion of branches • Exponential disassembly cost • High processing overhead for IDS • Frequency-based IDS • PAYL evaded by ASCII worm
Constraints of ASCII Malware • Opcode Unavailability • Shellcode requires binary opcodes • Only xor, and, sub, cmp, etc. are available here • Must generate opcodes dynamically • Difficulty in Encryption • No backward jump • Can’t use the same decrypter routine for each encrypted block • No one-to-one correspondence between ASCII and binary
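The constraints above can be illustrated with a minimal sketch, assuming "ASCII" means the printable range 0x20 to 0x7E and looking only at a handful of one-byte x86 opcodes. The helper names (`is_ascii_only`, `rel8`, `OPCODES`) are hypothetical and chosen for illustration only.

```python
# Printable ASCII bytes: 0x20..0x7E inclusive (assumed filter range).
PRINTABLE = range(0x20, 0x7F)

def is_ascii_only(payload: bytes) -> bool:
    """True if every byte would pass a printable-ASCII filter."""
    return all(b in PRINTABLE for b in payload)

def rel8(byte: int) -> int:
    """Interpret a byte as a signed 8-bit jump displacement."""
    return byte - 0x100 if byte >= 0x80 else byte

# A few one-byte x86 opcodes (first opcode of each family shown).
OPCODES = {
    "and": 0x20, "sub": 0x28, "xor": 0x30, "cmp": 0x38,
    "jo (jcc rel8)": 0x70,
    "mov r/m32, r32": 0x89, "int": 0xCD, "call rel32": 0xE8, "jmp rel8": 0xEB,
}

# Opcodes whose encoding survives the filter.
available = {name for name, op in OPCODES.items() if op in PRINTABLE}
print(sorted(available))

# Every printable byte is a positive rel8 displacement, so a short
# jump whose offset must be ASCII can only go forward.
print(all(rel8(b) > 0 for b in PRINTABLE))
```

Because mov, int, and call fall outside the printable range, an ASCII-only payload has to synthesize them at run time from the arithmetic opcodes that remain.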
Buffer Overflow using ASCII Overflowing a buffer using an ASCII string:
Detection of ASCII Malware • Opcode Unavailability • Dynamic generation of opcodes needs more ASCII instructions for each binary instruction • Difficulty in Encryption • No backward jump means the decrypter block for each encrypted block must be hardcoded • A long sequence of contiguous valid instructions is likely, hence a high MEL • What is this MEL?
Maximum Executable Length • Indicates the maximum length of an execution path • Need to disassemble (and execute) from all possible entry points • All branching must be considered • Abstract payload execution • Used for binary worms with a sled • Its effectiveness has since dwindled
Benign Text has Low MEL • Contains characters that correspond to invalid instructions • Privileged Instruction (I/O) • Arbitrary Segment Selector • More memory-accessing instructions, which may use uninitialized registers • A long sequence of contiguous valid instructions is unlikely, hence a low MEL
Proposed Solution • Find out the maximum length of valid instruction sequence • If it is long enough, the stream contains a malware • Question: • How long is “long”?
Probabilistic Analysis • Toss a coin n times • What is the probability that the max distance between two consecutive heads is τ? • Head (H) = Invalid Instruction (I) • Tail (T) = Valid Instruction (V) • Example: THTTHTTTTTHTTT corresponds to VIVVIVVVVVIVVV
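The coin-toss view can be sketched in a few lines, assuming the validity of each instruction is already known as a string of V (valid) and I (invalid) marks. This is a linear simplification: a real MEL computation would disassemble from every entry point and follow branches. The function names and the choice to flag a stream when the longest run strictly exceeds τ are assumptions for illustration.

```python
from itertools import groupby

def max_valid_run(validity):
    """Length of the longest run of consecutive True (valid) flags."""
    runs = (sum(1 for _ in group) for ok, group in groupby(validity) if ok)
    return max(runs, default=0)

def is_suspicious(validity, tau):
    """Flag the stream when its longest valid run exceeds the threshold tau."""
    return max_valid_run(validity) > tau

# The example sequence from the slide: V = valid (tail), I = invalid (head).
flags = [c == "V" for c in "VIVVIVVVVVIVVV"]
print(max_valid_run(flags))
print(is_suspicious(flags, tau=4))
```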
Probabilistic Analysis • n = number of coin tosses • p = probability of a head • $X_i$ = random variables for the inter-head distances • $X_{\max}$ = maximum inter-head distance • C.D.F. of $X_{\max}$: $\Pr[X_{\max} \le x] = [1 - p(1-p)^{x}]^{n}$ • False positive rate $= 1 - \Pr[X_{\max} \le \tau] = 1 - [1 - p(1-p)^{\tau}]^{n}$
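As a rough numerical check, the C.D.F. and false positive rate above can be evaluated directly. The values n = 1000 and p = 0.2 below are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
def cdf_xmax(x, n, p):
    """Pr[Xmax <= x] = [1 - p(1-p)^x]^n, as stated on the slide."""
    return (1.0 - p * (1.0 - p) ** x) ** n

def false_positive_rate(tau, n, p):
    """Probability that benign data has a valid run longer than tau."""
    return 1.0 - cdf_xmax(tau, n, p)

# Assumed parameters, for illustration only.
for tau in (20, 30, 40, 50):
    print(tau, round(false_positive_rate(tau, n=1000, p=0.2), 4))
```

The rate falls quickly as τ grows, which is why a modest threshold can separate benign text from long runs of valid instructions.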
Probabilistic Analysis For a fixed N = k (exactly k invalid instructions)
Probabilistic Analysis For all possible values of N:
Threshold Calculation • Known: n, p, and the false positive rate • Unknown: τ (max inter-head distance), which serves as the threshold
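One way to obtain the threshold is to invert the false positive expression 1 - [1 - p(1-p)^τ]^n for τ. The sketch below does this in closed form and rounds up to an integer; the symbol `alpha` for the target false positive rate, the function name, and the sample parameters are assumptions for illustration.

```python
import math

def threshold(n: int, p: float, alpha: float) -> int:
    """Smallest integer tau with 1 - [1 - p(1-p)^tau]^n <= alpha.

    Assumes n >= 1, 0 < p < 1 and 0 < alpha < 1.
    """
    # gap = 1 - (1 - alpha)^(1/n), computed stably for large n.
    gap = -math.expm1(math.log1p(-alpha) / n)
    if gap >= p:
        return 0  # even tau = 0 already meets the target rate
    # Need (1-p)^tau <= gap / p; both logs are negative.
    return math.ceil(math.log(gap / p) / math.log1p(-p))

# Assumed parameters: the threshold grows with n and with smaller p.
for n in (100, 1000, 10000):
    print(n, threshold(n, p=0.2, alpha=0.01))
print(threshold(1000, p=0.1, alpha=0.01))
```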
Independence Assumption • Validity of an instruction is an independent event • All the $X_i$’s are independent (while $\sum_i X_i = n$)
Threshold Calculation • With increasing n, we must choose a larger τ to keep the same false positive rate
Threshold Calculation • With decreasing p, we must choose a larger τ to keep the same false positive rate
Determine n • E[I] = E[prefix chain length] + E[core instruction length] • Obtained from the character frequency of the input data
Determine p • Invalid instructions: 1. Privileged instructions 2. Wrong segment prefix selector 3. Uninitialized memory access • Only 1 and 2 can be determined on a standalone basis
Experimental Setup • Benign data setup • ASCII stream captured from the live CISE network using Ethereal • Malicious data setup • Existing framework used to generate ASCII worms by converting binary worms • Promising experimental results for max valid instruction length • Benign: all max values below the threshold τ • Malicious: values significantly higher than τ
Contrasting with APE • Full content examination • Threshold calculation • Sled Vs. malware • Exploiting text-specific properties
Multilevel Encryption [Figure: encryption path binary → ASCII → ASCII, with only the visible decrypter marked; decryption path ASCII → ASCII → binary]
Multilevel Encryption [Figure: byte-value map with binary regions and three text ranges: 0x20 – 0x3F, 0x40 – 0x5F, 0x60 – 0x7E]