PHMMs for Metamorphic Detection Mark Stamp PHMMs for Metamorphic Detection
Viruses
• Viruses and worms are types of malware
  • Various definitions are used
  • For our purposes, “virus” is used generically
• How to detect malware?
  • Signature detection is used most often
  • In its simplest form, search for a string of bits found in the malware
  • Could also include wildcards, heuristics, etc.
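The sketch below shows signature detection in its simplest form, extended with single-byte wildcards. The function name `find_signature`, the use of `None` as a wildcard, and the example byte pattern are all hypothetical. They are not a real AV signature format.

```python
from typing import Optional, Sequence

def find_signature(data: bytes, signature: Sequence[Optional[int]]) -> int:
    """Return the first offset in data where signature matches, or -1.

    Each signature element is a byte value (0-255) or None, which acts as
    a single-byte wildcard. Hypothetical format, for illustration only.
    """
    for start in range(len(data) - len(signature) + 1):
        if all(s is None or data[start + i] == s
               for i, s in enumerate(signature)):
            return start
    return -1

# Hypothetical signature: 0xE8, any one byte, then 0x00 0x00
sig = [0xE8, None, 0x00, 0x00]
sample = bytes([0x90, 0x90, 0xE8, 0x12, 0x00, 0x00, 0xC3])
print(find_signature(sample, sig))  # 2
```

The linear scan only shows the idea. Production scanners typically use much faster multi-pattern matching.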
Metamorphic Viruses
• Metamorphic viruses change “shape”
  • Internal structure changes from one instance to the next
  • But the function stays the same
• If the change is sufficient, signature detection fails
• In principle, metamorphic malware is among the most difficult to detect
• But not many metamorphic viruses have been seen in the wild
  • Why not?
Metamorphic Detection
• How to detect metamorphic malware?
• Previous research: HMMs are effective
  • Train a model on opcodes extracted from viruses of one metamorphic “family”
  • Determine a threshold score
  • Then, to classify an unknown executable, extract its opcodes and score them against the model
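The scoring step can be sketched with the scaled forward algorithm. It computes the log-likelihood of an opcode sequence under an HMM that has already been trained, for example with Baum-Welch. Several pieces here are hypothetical:

* the opcode-to-index vocabulary
* the toy parameter values
* `THRESHOLD`

Dividing by the sequence length, so that files of different sizes can be compared, is an assumption of this sketch. It is not necessarily the exact normalization used in the research.

```python
import math
from typing import List, Sequence

def hmm_log_score(A: List[List[float]], B: List[List[float]],
                  pi: List[float], obs: Sequence[int]) -> float:
    """Scaled forward algorithm: log P(obs | model) divided by len(obs).

    A[i][j]: probability of a transition from state i to state j
    B[i][k]: probability that state i emits opcode symbol k
    pi[i]:   initial state distribution
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        c = sum(alpha)                  # P(o_t | o_0 .. o_{t-1})
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(c)         # log P(obs) = sum of log scale factors
    return log_prob / len(obs)          # per-opcode score (assumed normalization)

# Toy 2-state model over a 3-opcode vocabulary (hypothetical numbers)
vocab = {"mov": 0, "push": 1, "call": 2}
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
THRESHOLD = -1.5  # hypothetical; in practice chosen from observed scores

ops = ["mov", "push", "call", "mov"]
score = hmm_log_score(A, B, pi, [vocab[op] for op in ops])
print(f"score={score:.3f}, flagged={score > THRESHOLD}")
```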
Profile HMM
• A standard HMM does not take positional information into account
• A profile HMM is analogous to defining a separate HMM at each position in a sequence
  • Position information is taken into account
• So, a PHMM uses more information
  • This might yield stronger models
PHMMs
• Will a PHMM outperform an HMM?
• Possible advantage of a PHMM
  • Uses more information…
  • …since position within the sequence is taken into account
• Possible disadvantages of a PHMM
  • More complex, more costly to compute
  • Might overfit the data
  • “More” is not always “better”
The Plan
• Extract opcodes from the viruses of a metamorphic family
• Pairwise align the opcode sequences
• Generate a multiple sequence alignment (MSA) from the pairwise alignments
• Generate a PHMM from the MSA
• Determine a threshold and error rates
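For the pairwise alignment step, a standard choice is global alignment by dynamic programming (Needleman-Wunsch). The sketch below uses a simple linear gap penalty and hypothetical match and mismatch scores. The research may well have used a different scoring scheme, such as an opcode substitution matrix or affine gaps, so treat these parameters as placeholders.

```python
from typing import List, Tuple

GAP = "-"

def global_align(a: List[str], b: List[str], match: int = 2,
                 mismatch: int = -1, gap: int = -2
                 ) -> Tuple[List[str], List[str], int]:
    """Needleman-Wunsch global alignment of two opcode sequences."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from F[n][m] to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], F[n][m]

x = ["push", "mov", "add", "call"]
y = ["push", "mov", "nop", "add", "call"]
ax, ay, score = global_align(x, y)
print(" ".join(ax))  # push mov - add call
print(" ".join(ay))  # push mov nop add call
print(score)         # 6
```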
Metamorphic Techniques
• Morphing is usually applied at the assembly (asm) level
• Many techniques can be used, such as…
  • Equivalent code substitution
    • Register swap
    • Different code, same function
  • Garbage code/dead code insertion
  • Code reordering
    • Subroutine reordering
    • Arbitrary reordering using jumps
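Here is a toy sketch of two of these techniques, register swap and garbage (dead) code insertion, applied to 32-bit x86 assembly text. The functions and junk instructions are hypothetical.

The sketch has two limitations:

* The register swap only preserves behavior if nothing outside the fragment depends on which register holds which value.
* It ignores sub-registers such as `ax` or `al`.

```python
import random
import re
from typing import Dict, List, Sequence

def swap_registers(code: List[str], mapping: Dict[str, str]) -> List[str]:
    """Rename registers consistently; one regex pass makes the swap simultaneous."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in code]

def insert_garbage(code: List[str], rng: random.Random, rate: float = 0.5,
                   junk: Sequence[str] = ("nop", "xchg eax, eax")) -> List[str]:
    """After each instruction, insert a 32-bit no-op with probability `rate`."""
    out: List[str] = []
    for line in code:
        out.append(line)
        if rng.random() < rate:
            out.append(rng.choice(junk))
    return out

code = ["mov eax, 1", "mov ebx, 2", "add eax, ebx", "push eax"]
swapped = swap_registers(code, {"eax": "ebx", "ebx": "eax"})
print(swapped)  # ['mov ebx, 1', 'mov eax, 2', 'add ebx, eax', 'push ebx']
morphed = insert_garbage(swapped, random.Random(7))
print("\n".join(morphed))
```

Every run with a different seed gives a differently shaped fragment that still pushes the same value, which is exactly what defeats a fixed byte signature.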
Metamorphic Techniques
• Opaque predicates
  • A “conditional” that isn’t: it always evaluates the same way
• By combining several techniques, a virus writer can achieve the desired effect
  • Metamorphism sufficient to break signature detection
  • Function of the code remains unchanged
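Here is a concrete, hypothetical opaque predicate. The square of any integer is congruent to 0 or 1 mod 4, so a test on `(x * x) % 4` always goes the same way. A tool that does not know this fact, however, must treat both branches as reachable. The same holds for 32-bit machine arithmetic, since 4 divides 2^32.

```python
def opaque_branch(x: int) -> str:
    # For every integer x, x*x % 4 is 0 or 1, so this "conditional" always
    # takes the first branch; the second branch is dead code that can hold junk.
    if (x * x) % 4 in (0, 1):
        return "real code"
    return "never executed"

# Spot check over a range (the proof is casework on x mod 4)
print(all(opaque_branch(x) == "real code" for x in range(-10_000, 10_000)))  # True
```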
Metamorphic Example
[Figure: original code alongside morphed versions 1 and 2]
Metamorphic Viruses
• Real-world metamorphic viruses
Virus Construction Kits
• With construction kits, anyone can easily build (metamorphic) malware
• The first 2 kits are not very metamorphic
• But NGVCK is highly metamorphic
• So, we consider NGVCK here
AV Techniques
• Signature detection is the most popular technique
• So, of course, virus writers want to evade signature detection
• Metamorphism can provide a strong defense against signature detection
HMMs
• See previous presentation
PHMMs
• See previous presentation
PHMMs
• PHMMs are designed to deal with biological sequences
• The goal is to find evidence that sequences are related by mutation and selection
• The basic processes usually considered are
  • Substitution: a subsequence is replaced
  • Insertion: a subsequence is inserted
  • Deletion: a subsequence is removed
PHMMs and Computer Viruses
• The same basic processes can occur in metamorphic viruses
  • That is, substitution, insertion, deletion
• But we also have to deal with
  • Permutation: re-ordering of the sequence
  • Metamorphics may do lots of permuting
• Permutation can be viewed as a series of insertions/deletions
  • But then “close” sequences might be “far” apart
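A small worked example shows why permutation hurts alignment. Swapping two blocks of opcodes leaves the instructions themselves unchanged. Under plain edit (Levenshtein) distance, however, the result is as far from the original as a completely different sequence of the same length. Here, substitutions, insertions, and deletions each cost 1, and the opcode blocks are made up for illustration.

```python
from typing import Sequence

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit-cost substitution, insertion, deletion."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[-1]

block1 = ["push", "mov", "sub"]
block2 = ["call", "add", "pop"]
original = block1 + block2
permuted = block2 + block1                                  # same opcodes, blocks swapped
substituted = ["push", "mov", "add", "call", "add", "pop"]  # one opcode changed

print(edit_distance(original, substituted))  # 1
print(edit_distance(original, permuted))     # 6, same as changing every opcode
```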
Permutation and Alignment
• Permutations are problematic…
• How to deal with this?
  • Maybe we can pre-process the sequences
  • But this adds complexity and cost
• More about this later
Test Data
• Virus construction kits from VX Heavens
• We generated the following viruses
  • 10 VCL32 viruses
  • 30 PS-MPC viruses
  • 200 NGVCK viruses
• Also, 40 cygwin utilities
  • These serve as “normal” files
NGVCK Pairwise Alignment
• Align two NGVCK opcode sequences
• This looks reasonable
Gap Percentages
• Recall that, with a PHMM, the more gaps, the weaker the model
• Gap percentages in the MSAs for the metamorphic viruses
• But note the MSA sizes: VCL32 is based on 5 files, PS-MPC on 10, and NGVCK on 20
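This sketch assumes that “gap percentage” means the fraction of all cells in the MSA that are gaps. The toy MSA is hypothetical: it has one opcode list per virus, with `-` marking gaps.

```python
from typing import List

def gap_percentage(msa: List[List[str]], gap: str = "-") -> float:
    """Percentage of all MSA cells that are gaps; rows must have equal length."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("MSA rows must all have the same length")
    total = sum(len(row) for row in msa)
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * gaps / total

msa = [
    ["push", "mov", "-",   "add"],
    ["push", "-",   "nop", "add"],
    ["push", "mov", "nop", "-"],
]
print(gap_percentage(msa))  # 25.0
```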
VCL32
• Using five VCL32 viruses…
  • Generate pairwise alignments
  • Generate the MSA
  • Then generate the PHMM
• The PHMM has 1820 states
  • Can’t show the whole model here
  • So, the next slides give 3 states: 126, 127, 128
VCL32 Transition Probabilities
• State transition probabilities
  • The A matrix for states 126, 127, 128
VCL32 Emission Probabilities
• Emission probabilities
  • The E matrix
  • States 126, 127, 128
  • Emissions only for match and insert states
  • The “add-one” rule was used here
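This is a minimal sketch of the “add-one” rule for match-state emission probabilities. Count each opcode in an MSA column, add one pseudocount for every symbol in the alphabet, and normalize. As a result, unseen opcodes get a small nonzero probability instead of zero.

Which columns become match states is an assumption here. The sketch uses the rule of thumb from Durbin et al.: a column with more than half gaps is treated as an insert column. Insert-state emissions and transitions are not shown, and the MSA and alphabet are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def match_emissions(msa: List[List[str]], alphabet: List[str],
                    gap: str = "-") -> List[Dict[str, float]]:
    """Emission distribution for each match state, using add-one pseudocounts.

    Assumes every non-gap symbol in the MSA appears in `alphabet`.
    """
    n_rows = len(msa)
    emissions = []
    for column in zip(*msa):                   # iterate over MSA columns
        if column.count(gap) / n_rows > 0.5:   # mostly gaps: insert column
            continue
        counts = Counter(sym for sym in column if sym != gap)
        total = sum(counts.values()) + len(alphabet)  # +1 for every symbol
        emissions.append({sym: (counts[sym] + 1) / total for sym in alphabet})
    return emissions

alphabet = ["push", "mov", "add", "nop"]
msa = [
    ["push", "mov", "-",   "add"],
    ["push", "mov", "nop", "add"],
    ["push", "-",   "-",   "add"],
]
for k, dist in enumerate(match_emissions(msa, alphabet), start=1):
    print(f"M{k}:", {s: round(p, 3) for s, p in dist.items()})
```

The third column is two-thirds gaps, so it is skipped, and the model gets three match states.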
Results
• Typical PHMM results for VCL32
• Can set a threshold for 100% detection
  • It doesn’t get any better than that!
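Choosing the threshold and measuring error rates can be sketched as follows. The sketch assumes that higher scores mean “more like the virus family” and that a file is flagged when its score exceeds the threshold. The function names and sample scores are hypothetical, not data from these experiments.

```python
from typing import Optional, Sequence, Tuple

def rates(family: Sequence[float], normal: Sequence[float],
          threshold: float) -> Tuple[float, float]:
    """(detection rate, false positive rate) when score > threshold is flagged."""
    detected = sum(s > threshold for s in family)
    false_pos = sum(s > threshold for s in normal)
    return detected / len(family), false_pos / len(normal)

def separating_threshold(family: Sequence[float],
                         normal: Sequence[float]) -> Optional[float]:
    """Midpoint threshold giving 100% detection and no false positives,
    or None if the two score ranges overlap."""
    lowest_family, highest_normal = min(family), max(normal)
    if lowest_family <= highest_normal:
        return None
    return (lowest_family + highest_normal) / 2

family_scores = [-2.1, -1.8, -2.5]  # hypothetical per-opcode scores
normal_scores = [-6.0, -5.2, -7.3]
t = separating_threshold(family_scores, normal_scores)
print(round(t, 2), rates(family_scores, normal_scores, t))  # -3.85 (1.0, 0.0)
```

When the score ranges overlap, no single threshold avoids errors. Moving the threshold then trades detection rate against false positives.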
Results
• Typical PS-MPC results using PHMM
• Again, perfect detection
Results
• But VCL32 and PS-MPC are easy cases
  • Not very metamorphic
  • Probably detectable using signatures
• In contrast, NGVCK is highly metamorphic
  • So, NGVCK is the important test
  • See next slides
Results
• Typical results for NGVCK
• Note that normal files score higher than NGVCK viruses!
  • This is bad!
Pre-Processing
• For NGVCK, is there any hope?
• Can try pre-processing
  • Goal is to undo some of the effect of permutation
• Able to reduce the gap percentage in the MSA
  • Before, the gap percentage was 88.3%
  • After, the gap percentage is 44.9%
• Big improvement, but is it big enough?
Results
• NGVCK with pre-processing
• Much better, but not good enough
  • Error rate is still significant
Conclusions
• HMMs were developed in the 1960s
  • Standard machine learning technique
  • Many applications
• PHMMs are relatively recent
  • Developed for biological applications
• Here, a novel application of PHMMs
  • 100% detection for some examples…
  • …poor detection for others
Possible Improvements
• Improved pre-processing
  • To better account for permutation
• Local alignment
  • For example, align subroutines
• Baum-Welch re-estimation of the PHMM obtained from the MSA
• Other?
Last Word
• It is very trendy to apply biological analogies to information security
• On the one hand…
  • Results here provide evidence supporting the trend of looking to biological analogies
• On the other hand…
  • Results here are a “cautionary tale against applying biological analogies too literally”
References
• S. Attaluri, S. McGhee, and M. Stamp, “Profile hidden Markov models for metamorphic virus detection,” Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169
• Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids