
PHMMs for Metamorphic Detection

PHMMs for Metamorphic Detection. Mark Stamp. Viruses • Viruses and worms --- types of malware • Various definitions are used • For our purposes, “virus” used generically • How to detect malware? • Signature detection used most often


Presentation Transcript


  1. PHMMs for Metamorphic Detection • Mark Stamp

  2. Viruses • Viruses and worms --- types of malware • Various definitions are used • For our purposes, “virus” used generically • How to detect malware? • Signature detection used most often • In simplest form, search for a string of bits found in the malware • Could also include wildcards, heuristics, etc.
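The simplest form of signature detection described above can be sketched as a byte-pattern scan with wildcards. This is only an illustration: the function name `find_signature`, the signature bytes, and the sample data are all hypothetical and do not come from any real scanner or virus.

```python
from typing import List, Optional


def find_signature(data: bytes, signature: List[Optional[int]]) -> int:
    """Return the first offset where the signature matches, or -1.

    A signature entry of None is a wildcard that matches any byte.
    """
    n, m = len(data), len(signature)
    for start in range(n - m + 1):
        if all(s is None or data[start + i] == s
               for i, s in enumerate(signature)):
            return start
    return -1


# Hypothetical signature: B8 ?? ?? CD 21 (two wildcard bytes in the middle).
sig = [0xB8, None, None, 0xCD, 0x21]
sample = bytes([0x90, 0xB8, 0x00, 0x4C, 0xCD, 0x21])
# Same bytes with one extra 0x90 inserted inside the signature region.
morphed = bytes([0x90, 0xB8, 0x00, 0x4C, 0x90, 0xCD, 0x21])

print(find_signature(sample, sig))   # 1
print(find_signature(morphed, sig))  # -1
```

A single inserted byte is enough to defeat this fixed-length pattern, which is the weakness that metamorphic code exploits.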

  3. Metamorphic Viruses • Metamorphic viruses change “shape” • For each instance, internal structure changes • But function stays the same • If the change is sufficient, signature detection fails • In principle, metamorphic malware among most difficult to detect • But, not too many have been seen in the wild • Why not???

  4. Metamorphic Detection • How to detect metamorphic malware? • Previous research: HMMs are effective • Train model on opcodes extracted from metamorphic “family” viruses • Determine a threshold score • Then, to score an unknown exe, extract opcodes and score against the model
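The scoring step can be sketched with the scaled forward algorithm, which gives the log-likelihood of an opcode sequence under an already trained HMM. Everything concrete here is an assumption for illustration: the three-opcode alphabet, the two-state model parameters, and the threshold are invented, and training (for example with Baum-Welch) is not shown.

```python
import math
from typing import List, Sequence


def score_per_opcode(obs: Sequence[int], pi: List[float],
                     A: List[List[float]], B: List[List[float]]) -> float:
    """Scaled forward algorithm: log P(obs | model) divided by len(obs)."""
    if not obs:
        raise ValueError("empty opcode sequence")
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")          # sequence impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)


# Hypothetical toy model; real models are trained on family opcode sequences.
OPCODES = {"mov": 0, "add": 1, "jmp": 2}
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
THRESHOLD = -1.5  # hypothetical; in practice chosen from scored test files


def looks_like_family(opcodes: List[str]) -> bool:
    obs = [OPCODES[o] for o in opcodes]
    return score_per_opcode(obs, PI, A, B) >= THRESHOLD


print(looks_like_family(["mov", "mov", "jmp", "mov"]))
```

Dividing by the sequence length lets files of different sizes be compared against one threshold.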

  5. Profile HMM • Standard HMM does not take positional information into account • Profile HMM analogous to defining HMM at each position in a sequence • Position info is taken into account • So, PHMM uses more information • This might yield stronger models

  6. PHMMs • Will PHMM outperform HMM? • Possible advantage of PHMM • Uses more information… • …since position within sequence is taken into account • Possible disadvantages of PHMM • More complex, more costly to compute • Might overfit the data • “More” is not always “better”

  7. The Plan • Extract opcodes from metamorphic family viruses • Pairwise align opcode sequences • Generate multiple sequence alignment (MSA) from pairwise alignments • Generate PHMM from MSA • Determine threshold, error rates
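The pairwise alignment step of the plan can be sketched with a standard global alignment (Needleman-Wunsch). This is a minimal sketch, not the authors' procedure: the flat match, mismatch, and gap scores below are assumptions, as are the opcode sequences.

```python
from typing import List, Tuple

GAP = "-"


def global_align(a: List[str], b: List[str], match: int = 2,
                 mismatch: int = -1, gap: int = -2
                 ) -> Tuple[List[str], List[str], int]:
    """Needleman-Wunsch global alignment of two opcode sequences."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


seq1 = ["push", "mov", "add", "pop"]
seq2 = ["push", "mov", "nop", "add", "pop"]   # one inserted garbage opcode
al1, al2, total = global_align(seq1, seq2)
print(al1)    # ['push', 'mov', '-', 'add', 'pop']
print(al2)    # ['push', 'mov', 'nop', 'add', 'pop']
print(total)  # 6
```

The gap symbol in the first aligned sequence marks where the second sequence has an extra opcode; an MSA built from such alignments accumulates these gaps across the whole family.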

  8. Metamorphic Techniques • Morphing usually applied at asm level • Many techniques can be used, such as… • Equivalent code substitution • Register swap • Different code, same function • Garbage code/dead code insertion • Code reordering • Subroutine reordering • Arbitrary reordering using jumps

  9. Metamorphic Techniques • Opaque predicates • “Conditional” that isn’t • By combining several techniques, can achieve desired effect • Metamorphism sufficient to break signature detection • Function of code remains unchanged
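An opaque predicate can be illustrated with a condition that looks data-dependent but always evaluates the same way. The example below is a hypothetical sketch in Python rather than assembly; it relies on the fact that the product of two consecutive integers is always even.

```python
def opaquely_guarded(x: int) -> str:
    # x * (x + 1) is a product of two consecutive integers, so it is always
    # even. The test looks like a real branch, but only one side ever runs.
    if (x * (x + 1)) % 2 == 0:
        return "real code path"
    return "dead code path"  # never reached; can hold arbitrary garbage


print(all(opaquely_guarded(x) == "real code path" for x in range(-1000, 1000)))
# True
```

The unreachable branch changes the byte pattern of the program without changing what it does.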

  10. Metamorphic Example • [Figure: original code, morphed version 1, morphed version 2]

  11. Metamorphic Viruses • Real-world metamorphic viruses

  12. Virus Construction Kits • Construction kits --- anyone can easily build (metamorphic) malware • First 2 are not very metamorphic • But, NGVCK is highly metamorphic • So, we consider NGVCK here

  13. AV Techniques • Signature detection is most popular • So, of course, virus writers want to evade signature detection • Metamorphism can provide strong defense against signature detection

  14. HMMs • See previous presentation

  15. PHMMs • See previous presentation

  16. PHMMs • PHMMs are designed to deal with biological sequences • Goal is to find evidence that sequences are related by mutation and selection • Basic processes usually considered are • Substitution --- subsequence replaced • Insertion --- subsequence inserted • Deletion --- subsequence removed

  17. PHMMs and Computer Viruses • The same basic processes can occur in metamorphic viruses • That is, substitution, insertion, deletion • But also have to deal with • Permutation --- re-ordering of sequence • Metamorphics may do lots of permuting • Permutation can be viewed as series of insertions/deletions • But “close” sequences might be “far” apart
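The last two points can be made concrete by counting how many insertions and deletions are needed to turn one sequence into a permuted copy of itself. The sketch below uses the longest common subsequence (LCS) to get that count; the function name and the opcode blocks are hypothetical.

```python
from typing import List


def indel_distance(a: List[str], b: List[str]) -> int:
    """Minimum number of single-opcode insertions and deletions turning a into b."""
    prev = [0] * (len(b) + 1)          # LCS lengths for the previous row
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    return len(a) + len(b) - 2 * lcs


block1 = ["push", "mov", "add"]
block2 = ["xor", "cmp", "jmp"]
original = block1 + block2
permuted = block2 + block1             # same code, blocks swapped

print(indel_distance(original, original))  # 0
print(indel_distance(original, permuted))  # 6
```

Swapping two blocks leaves every opcode in place logically, yet in this example the indel count equals the full length of the sequence, so an alignment of the two copies is forced to contain many gaps.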

  18. Permutation and Alignment • Permutations are problematic… • How to deal with this? • Maybe we can pre-process sequences • But, adds complexity and cost • More about this later

  19. Test Data • Virus construction kits from VX Heavens • We generated the following viruses • 10 VCL32 viruses • 30 PS-MPC viruses • 200 NGVCK viruses • Also, 40 cygwin utilities • These serve as “normal” files

  20. NGVCK Pairwise Alignment • Align two NGVCK opcode sequences • This looks reasonable

  21. Gap Percentages • Recall, with PHMM, the more gaps, the weaker the model • MSAs for metamorphic viruses • But, VCL32 based on 5 files, PS-MPC based on 10, NGVCK based on 20 files
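The gap percentage of an MSA is simply the share of alignment cells that hold a gap symbol. A minimal sketch follows, assuming the MSA is stored as equal-length rows of opcode strings with "-" as the gap symbol; the toy alignment is invented and is not one of the MSAs from the presentation.

```python
from typing import List


def gap_percentage(msa: List[List[str]], gap: str = "-") -> float:
    """Percentage of cells in the alignment that are gaps."""
    total = sum(len(row) for row in msa)
    if total == 0:
        raise ValueError("empty alignment")
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * gaps / total


toy_msa = [
    ["push", "mov", "-",   "add", "pop"],
    ["push", "mov", "nop", "add", "pop"],
    ["push", "-",   "-",   "add", "pop"],
]
print(gap_percentage(toy_msa))  # 20.0
```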

  22. VCL32 • Using five VCL32 viruses… • Generate pairwise alignments • Generate MSA • Then generate PHMM • PHMM has 1820 states • Can’t show the whole model here • So, next slides give 3 states, 126,127,128

  23. VCL32 Transition Probabilities • State transition probabilities • The A matrix for states 126,127,128

  24. VCL32 Emission Probabilities • Emission probabilities • The E matrix • States 126,127,128 • Emissions only for match, insert states • “Add-one” rule was used here
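The “add-one” rule can be sketched for a single match state: count the opcodes in the corresponding MSA column, add one to every count so that no opcode gets probability zero, then normalize. The column and the four-opcode alphabet below are hypothetical and are not taken from the VCL32 model.

```python
from collections import Counter
from typing import Dict, List


def match_emissions(column: List[str], alphabet: List[str],
                    gap: str = "-") -> Dict[str, float]:
    """Add-one (Laplace) smoothed emission probabilities for one MSA column."""
    counts = Counter(s for s in column if s != gap)   # gaps emit nothing
    total = sum(counts[s] for s in alphabet) + len(alphabet)
    return {s: (counts[s] + 1) / total for s in alphabet}


alphabet = ["mov", "add", "push", "pop"]
column = ["mov", "mov", "add", "-", "mov"]
probs = match_emissions(column, alphabet)
print(probs)
# {'mov': 0.5, 'add': 0.25, 'push': 0.125, 'pop': 0.125}
```

Without the added counts, "push" and "pop" would have probability zero, and any scored sequence emitting them in this state would get a probability of zero overall.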

  25. Results • Typical PHMM results for VCL32 • Can set threshold for 100% detection • It doesn’t get any better than that!
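Setting a threshold for 100% detection is possible exactly when every family file scores above every normal file. The sketch below checks for that and reports error rates at a given threshold. All scores are made-up placeholders and are not results from the presentation.

```python
from typing import List, Optional, Tuple


def error_rates(family: List[float], normal: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate) at a threshold."""
    fn = sum(s < threshold for s in family)    # family files missed
    fp = sum(s >= threshold for s in normal)   # normal files flagged
    return fn / len(family), fp / len(normal)


def separating_threshold(family: List[float],
                         normal: List[float]) -> Optional[float]:
    """Midpoint threshold with zero errors, or None if the scores overlap."""
    lo, hi = max(normal), min(family)
    return (lo + hi) / 2 if lo < hi else None


# Hypothetical scores: family files score well above normal files.
family_scores = [-2.1, -1.8, -2.4]
normal_scores = [-9.5, -7.2, -11.0]
t = separating_threshold(family_scores, normal_scores)
print(error_rates(family_scores, normal_scores, t))  # (0.0, 0.0)

# Hypothetical overlapping case: some normal files outscore family files.
print(separating_threshold([-6.0, -8.0], [-5.0, -9.0]))  # None
```

When the score ranges overlap, no threshold gives perfect detection, and the choice becomes a trade-off between the two error rates.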

  26. Results • Typical PS-MPC results using PHMM • Again, perfect detection

  27. Results • But, VCL32 and PS-MPC are easy cases • Not very metamorphic • Probably detectable using signatures • In contrast, NGVCK highly metamorphic • So, NGVCK is the important test • See next slides

  28. Results • Typical results for NGVCK • Note that normal files score higher than NGVCK! • This is bad!

  29. Pre-Processing • For NGVCK, is there any hope? • Can try pre-processing • Goal is to undo some of the effect of permutation • Able to reduce gap percentage in MSA • Before, gap percentage was 88.3% • After, gap percentage is 44.9% • Big improvement, but is it big enough?

  30. Results • NGVCK with pre-processing • Much better, but not good enough • Error rate is still significant

  31. Conclusions • HMMs developed in 1960s • Standard machine learning technique • Many applications • PHMMs relatively recent • Developed for biological applications • Here, a novel application of PHMMs • 100% detection for some examples… • …poor detection for others

  32. Possible Improvements • Improved pre-processing • To better account for permutation • Local alignment • For example, align subroutines • Baum-Welch re-estimation of PHMM obtained from MSA • Other???

  33. Last Word • Very trendy to apply biological analogies to information security • On the one hand… • Results here provide evidence supporting trend of looking to biological analogies • On the other hand… • Results here are “cautionary tale against applying biological analogies too literally”

  34. References • S. Attaluri, S. McGhee and M. Stamp, Profile hidden Markov models for metamorphic virus detection, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169 • Durbin, et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
