Detection of ASCII Malware

Parbati Kumar Manna, Dr. Sanjay Ranka, Dr. Shigang Chen

Internet Worms and Malware • Huge damage potential • Infect hundreds of thousands of computers • Cost millions of dollars in damage • Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom • Mostly exploit buffer overflows • Propagation is (mostly) automatic

Recent Trends • Shift in hackers' mindset • Malware becoming increasingly evasive and obfuscated • Emergence of zero-day worms • Arrival of script kiddies

Motivation for ASCII Attacks • Prevalence of servers expecting text-only input • Text-based protocols • Presumption that text is benign • Deployment of ASCII filters that let text pass through

Can an IDS Detect ASCII Attacks? • Disassembly-based IDS • Almost all short conditional jumps (opcodes 0x70–0x7E) are printable ASCII • Higher proportion of branches • Exponential disassembly cost • High processing overhead for the IDS • Frequency-based IDS • PAYL is evaded by ASCII worms

Buffer Overflow

Constraints of ASCII Malware • Opcode Unavailability • Shellcode requires binary opcodes • Only a limited set of instructions (xor, and, sub, cmp, etc.) is available in ASCII • Opcodes must be generated dynamically • Difficulty in Encryption • No backward jump • The same decrypter routine cannot be reused for each encrypted block • No one-to-one correspondence between ASCII and binary
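Both encryption constraints follow from simple byte arithmetic. A minimal sketch, assuming 32-bit x86 short jumps whose displacement is a signed 8-bit value: every printable byte read as such a displacement is positive, and printable ASCII covers only 95 of the 256 possible byte values.

```python
# Printable ASCII bytes are 0x20 (space) through 0x7E (~).
PRINTABLE = range(0x20, 0x7F)

def as_rel8(byte: int) -> int:
    """Interpret a byte as the signed 8-bit displacement of a short jump."""
    return byte - 256 if byte >= 0x80 else byte

# Every printable displacement is positive (+32 to +126 bytes), so a jump
# encoded entirely in printable ASCII can only move forward: no loops.
forward_only = all(as_rel8(b) > 0 for b in PRINTABLE)

# Printable ASCII is only a subset of all byte values, so an arbitrary
# binary payload cannot be mapped one-to-one onto same-length ASCII.
coverage = len(PRINTABLE) / 256

print(forward_only, len(PRINTABLE), round(coverage, 3))  # prints: True 95 0.371
```

Because a loop needs a backward jump, an ASCII decrypter cannot iterate over blocks and has to be unrolled, one decrypter block per encrypted block.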

Creation of ASCII Malware

Buffer Overflow using ASCII Overflowing a buffer using an ASCII string:

Detection of ASCII Malware • Opcode Unavailability • Dynamic generation of opcodes needs several ASCII instructions for each binary instruction • Difficulty in Encryption • No backward jump, so the decrypter block for each encrypted block must be hardcoded • Long sequence of contiguous valid instructions likely → high MEL • What is this MEL?
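To see why dynamic opcode generation inflates the instruction count, consider a well-known ASCII shellcode technique: zero EAX with two `and eax, imm32` instructions (opcode 0x25, '%') whose printable masks share no bits, then apply three `sub eax, imm32` instructions (opcode 0x2D, '-') with printable immediates. The sketch below computes those immediates; the helper names and the particular masks are illustrative choices, not taken from the slides. One four-byte binary value costs five ASCII instructions (25 bytes) before it is even written to memory.

```python
PRINTABLE_MIN, PRINTABLE_MAX = 0x20, 0x7E

# Two printable masks with no bits in common: AND-ing EAX with both zeroes it.
ZERO_MASK_1 = 0x554E4D4A  # bytes "JMNU" in little-endian order
ZERO_MASK_2 = 0x2A313235  # bytes "521*" in little-endian order

def split_printable(total: int) -> tuple[int, int, int]:
    """Split total (96..378) into three bytes, each in 0x20..0x7E."""
    a = min(PRINTABLE_MAX, total - 2 * PRINTABLE_MIN)
    total -= a
    b = min(PRINTABLE_MAX, total - PRINTABLE_MIN)
    return a, b, total - b

def sub_encode(target: int) -> list[int]:
    """Return printable dwords c1, c2, c3 with 0 - c1 - c2 - c3 == target (mod 2**32)."""
    need = (-target) % 2**32          # c1 + c2 + c3 must equal this mod 2**32
    consts = [0, 0, 0]
    carry = 0
    for i in range(4):                # one byte column at a time, low byte first
        want = (need >> (8 * i)) & 0xFF
        # Smallest column sum (three printable bytes plus carry) ending in `want`;
        # the achievable range spans more than 256 values, so one always exists.
        s = next(v for v in range(3 * PRINTABLE_MIN + carry, 3 * PRINTABLE_MAX + carry + 1)
                 if v % 256 == want)
        for k, byte in enumerate(split_printable(s - carry)):
            consts[k] |= byte << (8 * i)
        carry = s // 256              # carry into the next byte column
    return consts

consts = sub_encode(0xDEADBEEF)
print([c.to_bytes(4, "little").decode("ascii") for c in consts])
```

Two subtractions are not always enough, since two printable bytes sum to at most 252, so this sketch always uses three.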

Maximum Executable Length (MEL) • Indicates the maximum length of an execution path • Requires disassembling (and executing) from all possible entry points • All branches must be considered • Used by Abstract Payload Execution (APE) • Applied to binary worms with a sled • Its effectiveness has since dwindled
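The MEL search over all entry points and all branches can be sketched as a longest-path computation. This assumes a hypothetical decoder has already produced, for each byte offset, the instruction length and an optional branch target (invalid offsets are absent); it is an illustration, not APE's or DAWN's actual implementation. Every branch is treated as conditional, so both the fall-through and the taken path are explored.

```python
from typing import Dict, Optional, Tuple

# Hypothetical decoder output (not a real disassembler API):
# offset -> (instruction length in bytes, branch target offset or None).
# Offsets that do not decode to a valid instruction are simply absent.
Decoded = Dict[int, Tuple[int, Optional[int]]]

def max_executable_length(decoded: Decoded, stream_len: int) -> int:
    """Longest execution path, counted in instructions, over all entry points."""
    mel = [0] * (stream_len + 1)      # mel[i]: longest path that starts at offset i
    for i in range(stream_len - 1, -1, -1):
        if i not in decoded:
            continue                  # invalid instruction: the path stops here
        length, target = decoded[i]
        successors = [i + length]     # fall-through
        if target is not None:
            if target <= i:
                raise ValueError("sketch assumes forward-only branches")
            successors.append(target) # taken branch; both outcomes are explored
        mel[i] = 1 + max((mel[s] for s in successors if s < stream_len), default=0)
    return max(mel)

# Offsets 2 and 7 are invalid; offset 1 branches to offset 5.
example = {0: (1, None), 1: (2, 5), 3: (1, None), 4: (1, None), 5: (1, None), 6: (1, None)}
print(max_executable_length(example, 8))
```

Since printable ASCII cannot encode a backward short jump, the instruction graph is acyclic and a single right-to-left pass suffices; a binary stream with backward jumps would need cycle handling.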

Benign Text has Low MEL • Contains characters that correspond to invalid instructions • Privileged instructions (I/O) • Arbitrary segment selectors • More memory-accessing instructions, which may use uninitialized registers • Long sequence of contiguous valid instructions unlikely → low MEL
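Common English letters already hit these invalid cases: 'l', 'm', 'n' and 'o' are the opcodes of the string I/O instructions insb, insd, outsb and outsd, and '&', '.', '6', '>', 'd' and 'e' are segment-override prefixes. The byte-level sketch below assumes every one of those bytes stops execution and ignores operand bytes, so it only illustrates why runs in benign text stay short; it is not the MEL computation itself.

```python
# Printable bytes assumed to stop execution, for illustration only:
# string I/O opcodes l, m, n, o (insb, insd, outsb, outsd) and the
# segment-override prefixes & . 6 > d e.
SUSPECT = {0x6C, 0x6D, 0x6E, 0x6F, 0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}

def longest_clean_run(data: bytes) -> int:
    """Longest run of bytes that contains none of the SUSPECT bytes."""
    best = cur = 0
    for b in data:
        cur = 0 if b in SUSPECT else cur + 1
        best = max(best, cur)
    return best

print(longest_clean_run(b"hello world"))  # prints 2
```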

Proposed Solution • Find the maximum length of a valid instruction sequence • If it is long enough, the stream likely contains malware • Question: how long is “long”?

Probabilistic Analysis • Toss a coin n times • What is the probability that the maximum distance between two consecutive heads exceeds τ? • Head (H) ↔ Invalid instruction (I); Tail (T) ↔ Valid instruction (V) • THTTHTTTTTHTTT ↔ VIVVIVVVVVIVVV

Probabilistic Analysis • n = number of coin tosses • p = probability of a head • $X_i$ = random variables for the inter-head distances • $X_{\max}$ = maximum inter-head distance • CDF of $X_{\max}$: $\Pr[X_{\max} \le x] = \left[1 - p(1-p)^{x}\right]^{n}$ • False positive rate $= 1 - \Pr[X_{\max} \le \tau] = 1 - \left[1 - p(1-p)^{\tau}\right]^{n}$
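The closed-form false positive rate can be checked against a direct simulation of the coin model. The slides do not say whether an inter-head distance counts the head that closes it; this sketch assumes it does, so $X_{\max} > \tau$ corresponds to a run of at least τ consecutive valid instructions. Function names, the seed and the trial count are illustrative choices.

```python
import random

def fp_rate(n: int, p: float, tau: int) -> float:
    """Slide formula: 1 - [1 - p(1-p)^tau]^n."""
    return 1.0 - (1.0 - p * (1.0 - p) ** tau) ** n

def longest_valid_run(invalid: list[bool]) -> int:
    """Longest run of consecutive valid instructions (tails)."""
    best = cur = 0
    for is_invalid in invalid:
        cur = 0 if is_invalid else cur + 1
        best = max(best, cur)
    return best

def simulated_fp_rate(n: int, p: float, tau: int,
                      trials: int = 2000, seed: int = 1) -> float:
    """Share of simulated streams containing a run of at least tau valid
    instructions (assumed equivalent to an inter-head distance above tau)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        stream = [rng.random() < p for _ in range(n)]   # True = invalid (head)
        if longest_valid_run(stream) >= tau:
            hits += 1
    return hits / trials

print(fp_rate(1000, 0.2, 30), simulated_fp_rate(1000, 0.2, 30))
```

The two numbers should be close but not identical: the formula treats the n positions as independent, which is the independence approximation stated on the slides, while the simulation counts runs exactly.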

Probabilistic Analysis For a fixed N = k (exactly k invalid instructions)

Probabilistic Analysis For all possible values of N:

Threshold Calculation • Known: n, p, and the acceptable false positive rate • Unknown: τ, the threshold on the maximum inter-head distance
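Given n, p and a target false positive rate, the slide formula can be inverted for τ in closed form: with $q = 1 - p$, setting $1 - [1 - p q^{\tau}]^{n}$ equal to the target gives $q^{\tau} = \left(1 - (1 - \text{target})^{1/n}\right)/p$. A minimal sketch that returns the smallest integer τ meeting the target (names are illustrative):

```python
import math

def fp_rate(n: int, p: float, tau: int) -> float:
    """False positive rate from the slides: 1 - [1 - p(1-p)^tau]^n."""
    return 1.0 - (1.0 - p * (1.0 - p) ** tau) ** n

def threshold(n: int, p: float, target_fp: float) -> int:
    """Smallest integer tau with fp_rate(n, p, tau) <= target_fp."""
    # q^tau = (1 - (1 - target_fp)^(1/n)) / p, computed with expm1/log1p
    # to stay accurate when target_fp is small and n is large.
    rhs = -math.expm1(math.log1p(-target_fp) / n) / p
    if rhs >= 1.0:
        return 0                      # even tau = 0 meets the target
    return math.ceil(math.log(rhs) / math.log1p(-p))

for n, p in [(1000, 0.2), (10000, 0.2), (1000, 0.1)]:
    print(n, p, threshold(n, p, 0.01))
```

For these parameter values the threshold grows when n increases and when p decreases, matching the two trends on the following slides.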

Independence Assumption • The validity of each instruction is an independent event • All the $X_i$'s are treated as independent (even though $\sum_i X_i = n$)

Threshold Calculation • With increasing n, we must choose a larger τ to keep the same false positive rate

Threshold Calculation • With decreasing p, we must choose a larger τ to keep the same false positive rate

Determine n • E[I] = E[prefix chain length] + E[core instruction length] • Obtained from the character frequency of the input data
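Since n counts instructions rather than bytes, one way to obtain it is to divide the stream length by E[I], the expected instruction length. The sketch below estimates both terms of E[I] from byte frequencies, assuming bytes are independent so that a prefix chain is a geometric run with mean q/(1 - q). The core-length table is a hypothetical, partial input: unlisted opcodes default to one byte, and real lengths depend on ModRM/SIB decoding and on the prefixes present.

```python
from collections import Counter

# Printable x86 prefix bytes: segment overrides (& . 6 > d e) plus the
# operand-size (f) and address-size (g) prefixes.
PREFIX_BYTES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67}

def expected_instruction_length(data: bytes, core_length: dict[int, int]) -> float:
    """E[I] = E[prefix chain length] + E[core instruction length], estimated
    from byte frequencies and assuming bytes are independent."""
    freq = Counter(data)
    q = sum(freq[b] for b in PREFIX_BYTES) / len(data)   # P(a byte is a prefix)
    if q >= 1.0:
        raise ValueError("data contains only prefix bytes")
    prefix_chain = q / (1.0 - q)       # mean length of a geometric run of prefixes
    cores = {b: c for b, c in freq.items() if b not in PREFIX_BYTES}
    core = sum(c * core_length.get(b, 1) for b, c in cores.items()) / sum(cores.values())
    return prefix_chain + core

def estimate_n(data: bytes, core_length: dict[int, int]) -> float:
    """Approximate instruction count n as stream length / E[I]."""
    return len(data) / expected_instruction_length(data, core_length)

# Hypothetical partial table: xor eax,imm32 ('5') and push imm32 ('h') are
# 5 bytes, push imm8 ('j') and the short conditional jumps are 2 bytes.
CORE_LENGTH = {0x35: 5, 0x68: 5, 0x6A: 2, **{op: 2 for op in range(0x70, 0x7F)}}
print(estimate_n(b"GET /index.html HTTP/1.1", CORE_LENGTH))
```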

Determine p • Invalid instructions: 1. Privileged instructions 2. Wrong segment prefix selector 3. Uninitialized memory access • Only 1 and 2 can be determined on a standalone basis
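Categories 1 and 2 depend only on the bytes themselves, so p can be approximated from character frequencies; category 3 depends on runtime register state and is left out. In this byte-level sketch, which treats every byte as a potential instruction start, the privileged set is the printable string I/O opcodes, while the set of "wrong" segment prefixes is passed in by the caller because the slides do not list it; the value used below is a hypothetical choice.

```python
from collections import Counter

# Category 1: printable opcodes of the string I/O instructions insb (l),
# insd (m), outsb (n) and outsd (o), which normally fault in user mode.
PRIVILEGED_IO = {0x6C, 0x6D, 0x6E, 0x6F}

def estimate_p(data: bytes, wrong_segment_prefixes: set[int]) -> float:
    """Byte-level estimate of p: the share of bytes that would decode as a
    category-1 or category-2 invalid instruction if execution started there."""
    invalid = PRIVILEGED_IO | wrong_segment_prefixes
    freq = Counter(data)
    return sum(freq[b] for b in invalid) / len(data)

# Hypothetical choice: treat only the GS override ('e', 0x65) as a wrong prefix.
print(estimate_p(b"hello world", {0x65}))
```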

Experimental Setup

Implementation

Experimental Setup • Benign data setup • ASCII streams captured from the live CISE network using Ethereal • Malicious data setup • An existing framework used to generate ASCII worms by converting binary worms • Promising experimental results for the maximum valid instruction length • Benign: all maximum values below the threshold τ • Malicious: values significantly higher than τ

Experimental Results (DAWN)

Experimental Results (APE-L)

Contrasting with APE • Full content examination • Threshold calculation • Sled vs. malware • Exploiting text-specific properties

Multilevel Encryption [Figure: encryption maps binary → ASCII → ASCII, leaving only the decrypter visible; decryption maps ASCII → ASCII → binary]

Multilevel Encryption [Figure: conversions between binary and text in the ranges 0x20–0x3F, 0x40–0x5F and 0x60–0x7E]

Questions

Thank you
