
Background


julie

Presentation Transcript


  1. Background • Current Approach to Malware Identification • Signature Generation • One or more strings of machine code unique to the malicious program. • Produced by human examination or possibly automation. • Scanning for a Signature • Known Offset, Known Order • Any Offset, Known Order • Any Offset, Any Order
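The three scanning modes listed on this slide can be sketched as follows. This is a minimal illustration, not a real scanner: the function names and the representation of a signature as a list of byte strings (with offsets in the first mode) are assumptions made for the example.

```python
from typing import Sequence, Tuple


def match_known_offset(data: bytes, parts: Sequence[Tuple[int, bytes]]) -> bool:
    """Known offset, known order: each substring must sit at its stated offset."""
    return all(data[off:off + len(sub)] == sub for off, sub in parts)


def match_any_offset_known_order(data: bytes, parts: Sequence[bytes]) -> bool:
    """Any offset, known order: substrings must appear left to right, without overlap."""
    pos = 0
    for sub in parts:
        idx = data.find(sub, pos)
        if idx < 0:
            return False
        pos = idx + len(sub)  # the next substring must start after this one ends
    return True


def match_any_offset_any_order(data: bytes, parts: Sequence[bytes]) -> bool:
    """Any offset, any order: every substring must appear somewhere in the data."""
    return all(sub in data for sub in parts)
```

Each mode is more permissive than the one before it, so it tolerates more rearrangement of the scanned program at the cost of more false positives.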

  2. Buffer Overflow • Idea introduced in “Smashing the Stack for Fun and Profit” • Most common method of attack • IDSs apply pattern matching to defend against buffer overflow attacks

  3. Polymorphic Viruses • Dark Avenger introduced polymorphism in 1992 • A method to defeat pattern matching • Cipher the code and generate a decipher routine that is different each time.

  4. push byte 0x68
     push dword 0x7361622f
     push dword 0x6e69622f
     mov ebx,esp
     xor edx,edx
     push edx
     push ebx
     mov ecx,esp
     push byte 11
     pop eax
     int 80h
     "\x6A\x68\x68\x2F\x62\x61\x73\x68\x2F\x62\x69\x6E\x89\xE3\x31\xD2\x52\x53\x89\xE1\x6A\x0B\x58\xCD\x80"

  5. [PPPPPPPPPPPPPPPP] – Plain Text • [KKKKKKKKKKKKKK] – Cipher Key • [DDDKKKKKKKKKKKKKK] – Encrypted Code • If the decipher routine [DDD] does not change much, signatures can be created. • Generate a different decipher routine each time with a different key • Do not use simple XOR alone. ADD/SUB/ROL/ROR can be used • Use “Dead Code” between decipher code. Use fake registers
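To see why a fixed signature over the ciphered bytes fails, here is a small sketch that applies a reversible XOR, ADD, ROL chain to a harmless ASCII string. The function names, the order of the operations, and the key values are all assumptions chosen for illustration; the slide only names the operations.

```python
def rol8(b: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    n %= 8
    return ((b << n) | (b >> (8 - n))) & 0xFF


def ror8(b: int, n: int) -> int:
    """Rotate an 8-bit value right by n bits."""
    n %= 8
    return ((b >> n) | (b << (8 - n))) & 0xFF


def encode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Per byte: XOR with xor_key, ADD add_key (mod 256), then ROL by rot."""
    return bytes(rol8(((b ^ xor_key) + add_key) & 0xFF, rot) for b in data)


def decode(data: bytes, xor_key: int, add_key: int, rot: int) -> bytes:
    """Undo encode(): ROR by rot, SUB add_key (mod 256), then XOR with xor_key."""
    return bytes(((ror8(b, rot) - add_key) & 0xFF) ^ xor_key for b in data)


plain = b"harmless demo payload"        # stand-in data, not shellcode
first = encode(plain, 0x5A, 7, 3)       # one hypothetical key set
second = encode(plain, 0xC3, 200, 5)    # another hypothetical key set
print(first != second)                  # the two ciphered forms share no fixed pattern
print(decode(first, 0x5A, 7, 3) == plain and decode(second, 0xC3, 200, 5) == plain)
```

Because the ciphered body changes with every key, the only stable bytes left to match are those of the decipher routine, which is why the slide insists that the routine itself must also vary.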

  6. CLET – A Polymorphic Shell Code • Classic buffer layout: NOP | ShellCode | Cram Bytes | Return Address • NIDS tries to find consecutive NOPs and applies pattern matching to the ShellCode. Usually a combination of NOPs and return addresses can be identified • Goals of CLET • Generate fake NOPs • Cipher the shellcode (using random methods, not only XOR), and use a randomly generated decipher routine • Avoid a big return address zone, to defend against data mining methods • CLET buffer layout: NOP | Decipher Routine | ShellCode | Cram Bytes | Return Address
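The "find consecutive NOPs" check that a NIDS applies can be sketched in a few lines. The function names and the run-length threshold of 16 are assumptions for illustration; real detectors use their own heuristics.

```python
def longest_nop_run(data: bytes, nop: int = 0x90) -> int:
    """Length of the longest run of consecutive NOP bytes (0x90 on x86)."""
    best = run = 0
    for b in data:
        run = run + 1 if b == nop else 0
        best = max(best, run)
    return best


def looks_like_nop_sled(data: bytes, min_run: int = 16) -> bool:
    """Flag a buffer whose longest NOP run reaches min_run (assumed threshold)."""
    return longest_nop_run(data) >= min_run


classic = b"\x90" * 32 + b"payload"
# Byte string quoted later in this presentation as an example of fake NOPs.
fake_nops = bytes.fromhex("1511F8FA81F9272F909E")
print(looks_like_nop_sled(classic))    # long run of 0x90 is flagged
print(looks_like_nop_sled(fake_nops))  # only one 0x90, so this check misses it
```

The second call shows why CLET's first goal is to generate fake NOPs: a detector keyed on runs of a single opcode has nothing to count.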

  7. Fake NOPs with 2- and 3-byte instructions • NOPs are necessary before the shellcode, since we don’t know where our JMP will end up. • For a “NOP sled” of 1-byte instructions, landing anywhere within the sled is fine. • We can replace NOPs with other “non-dangerous” instructions. • Problem: There are not many one-byte non-dangerous instructions. (Advantage to NIDS) • Many one-byte instructions are also alphanumeric • How do we generate random fake NOPs using other instructions that are several bytes long?

  8. We could generate two-byte instructions, the second byte of which is a one-byte instruction or the first byte of a two-byte instruction. • Consider “\x15\x11\xF8\xFA\x81\xF9\x27\x2F\x90\x9E” • Decoded from the first byte: ADC $0x11F8FA81; STC; DAA; DAS; NOP; SAHF • Decoded from a different entry point: ADC %eax,%edx; CMP %ecx,$0x272F909E

  9. ./clet
       -n nnop : generate nnop NOP.
       -a : use american english dictonnary to generate NOP.
       -c : print C form of the buffer.
       -i nint : decryption routine has nint instructions (default is 5)
       -f file : spectrum file used to polymorph.
       -b ncra : generate ncra cramming bytes using spectrum or not
       -B : cramming bytes zone is adapted to beginning
       -t : number of bytes generated is a multiple of 4
       -x XXXX : XXXX is the address for the address zone FE011EC9 for instance
       -z nadd : generate address zone of nadd*4 bytes
       -e : execute shellcode.
       -d : dump shellcode to stdout.
       -s : spectrum analysis.
       -S file : load shellcode from file.
       -E [1-3]: load an embeded shellcode.
       -h : display this message.

  10. Metamorphic viruses • Do not encrypt themselves • Contain a Mutation Engine • They completely rewrite themselves at random, keeping functionality the same but appearing different. • Creating this kind of virus requires considerable skill. E.g., W32.Simile

  11. Example • Code flow
      Original code:
        A: A_Instr1
           A_Instr2
        B: B_Instr1
           B_Instr2
        C: C_Instr1
           C_Instr2
      Mutated code:
        A: A_Instr1
           JMP Z
        Y: B_Instr1
           B_Instr2
           JMP X
        Z: A_Instr2
           JMP Y
        C: C_Instr1
           C_Instr2
        X: NOP
           NOP
           JMP C
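A short sketch can confirm that the mutated layout executes the same instructions in the same order as the original. The `trace` helper and the `RET` terminator are assumptions added for this illustration (the slide does not show how either program ends); the labels and instructions are taken from the slide.

```python
from typing import List, Optional, Tuple

Line = Tuple[Optional[str], str]  # (label or None, instruction)

ORIGINAL: List[Line] = [
    ("A", "A_Instr1"), (None, "A_Instr2"),
    ("B", "B_Instr1"), (None, "B_Instr2"),
    ("C", "C_Instr1"), (None, "C_Instr2"),
    (None, "RET"),  # assumed terminator, not on the slide
]

MUTATED: List[Line] = [
    ("A", "A_Instr1"), (None, "JMP Z"),
    ("Y", "B_Instr1"), (None, "B_Instr2"), (None, "JMP X"),
    ("Z", "A_Instr2"), (None, "JMP Y"),
    ("C", "C_Instr1"), (None, "C_Instr2"),
    (None, "RET"),  # assumed terminator, not on the slide
    ("X", "NOP"), (None, "NOP"), (None, "JMP C"),
]


def trace(program: List[Line], max_steps: int = 1000) -> List[str]:
    """Follow jumps and fall-through; return the non-NOP instructions executed."""
    labels = {label: i for i, (label, _) in enumerate(program) if label}
    executed: List[str] = []
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc][1]
        if instr == "RET":
            break
        if instr.startswith("JMP "):
            pc = labels[instr.split()[1]]  # jump to the target label
            continue
        if instr != "NOP":
            executed.append(instr)
        pc += 1
    return executed


print(trace(ORIGINAL) == trace(MUTATED))
```

The byte layout of the two programs differs completely, yet the executed sequence is identical, which is what defeats a signature that expects the blocks in a fixed order.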

  12. Example: • Insertion of Dead Code • Code Transposition
      Original code:
        add r1, r2, r3
        mov r4, r1
        add r5, r6, r7

  13. Example: • Insertion of Dead Code • Code Transposition
      After insertion of dead code:
        add r1, r2, r3
        nop
        push r1
        pop r1
        mov r4, r1
        add r5, r6, r7
      Original code:
        add r1, r2, r3
        mov r4, r1
        add r5, r6, r7

  14. Example: • Insertion of Dead Code • Code Transposition
      After code transposition:
        add r1, r2, r3
        nop
        push r1
        add r5, r6, r7
        pop r1
        mov r4, r1
      Original code:
        add r1, r2, r3
        mov r4, r1
        add r5, r6, r7
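The claim that these rewrites keep functionality the same can be checked with a toy interpreter. The sketch below assumes a simple register machine in which `add d, a, b` stores `a + b` in `d`; the `run` helper and the starting register values are hypothetical, and the three instruction sequences are the ones from the example slides.

```python
from typing import Dict, List

ORIGINAL = ["add r1, r2, r3", "mov r4, r1", "add r5, r6, r7"]
DEAD_CODE = ["add r1, r2, r3", "nop", "push r1", "pop r1",
             "mov r4, r1", "add r5, r6, r7"]
TRANSPOSED = ["add r1, r2, r3", "nop", "push r1", "add r5, r6, r7",
              "pop r1", "mov r4, r1"]


def run(program: List[str], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute the toy instructions and return the final register values."""
    regs = dict(regs)          # do not modify the caller's registers
    stack: List[int] = []
    for line in program:
        op, *args = line.replace(",", " ").split()
        if op == "add":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "mov":
            dst, src = args
            regs[dst] = regs[src]
        elif op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unknown instruction: {line}")
    return regs


START = {f"r{i}": i for i in range(1, 8)}  # arbitrary starting values r1..r7
print(run(ORIGINAL, START) == run(DEAD_CODE, START) == run(TRANSPOSED, START))
```

All three variants leave the registers in the same state, while their byte sequences would no longer share one contiguous signature.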

  15. Biological Immune Systems • The human body is under constant siege by pathogens that replicate • Bacteria • Parasites • Viruses • Fungi • Homeostasis: a stable state of equilibrium • The immune system (IS) helps maintain homeostasis • Different pathogens are eliminated in different ways, hence the IS must pick the correct “effectors”

  16. IS  Immune System Mapping • The IS detects abuses of security policy • Responds by counterattacking the source of abuse • Policy is specified by “Natural Selection” and emphasizes the following • Availability - enables the body to continue functioning • Correctness - prevents the IS from attacking itself (autoimmune disorder) • Integrity - ensures that the genes that encode the correctness of functions are not corrupted • Accountability - finding and destroying the pathogens responsible for illness

  17. Principles of an Artificial Immune System • The immune system serves as an analogy for computer systems • Some principles • Distributed protection – millions of local interactions • Diversity – every individual has a different immune system • Robustness – lack of hierarchy • Adaptability – IS adapts (learns) pathogenic structure • Memory – of previously encountered pathogens • Implicit policy specification – Self / nonSelf. It knows its behavior • Flexibility – more resources used when infection detected • Scalability – being distributed, communication is localized

  18. Architecture of an Immune System • Multilayered protection • Skin – anatomic barrier • Physiological – pH and temperature provide inhospitable living conditions for pathogens • Innate immune system – consists of endocytic and phagocytic systems, which involve motile scavenger cells such as macrophages that ingest extracellular molecules • Adaptive immune system • Learns specific kinds of pathogens and retains memory for faster responses the next time • Previously unknown pathogens trigger “learning” • Primary response and secondary response

  19. Adaptive Immune System • Consists of white blood cells called lymphocytes • Lymphocytes are mobile, independent detectors that cooperate and circulate via the lymph system • Millions of cells (detectors) making simple localized interactions

  20. Recognition • Receptors – on the surface of immune cells • Epitope – location on the surface of pathogens (protein fragments – peptides) • Detection – chemical bond established between receptor and epitope

  21. Monospecificity – all receptors on a lymphocyte are identical • Pathogens often have multiple different epitopes • A lymphocyte is activated when high-affinity binding pushes it past a certain threshold

  22. Adaptation • B-Cells (generated and trained in the bone marrow) • T-Cells (generated and trained in the thymus) • Negative Selection (Clonal Deletion) • Cells are trained in the bone marrow and the thymus, where they are exposed to “Self” • Receptor surfaces are formed at random and over generations are taught NOT to identify self • If immature cells are activated by binding to self, they are eliminated. • Mature cells tolerate most self epitopes and are said to have undergone Central Tolerization • On maturation these cells are released into the lymph system, where they attempt to form affinity bonds with Non-Self by competing with each other • If a certain cell wins this affinity race, copies of it are made, which are subject to a very high mutation rate called Somatic Hypermutation
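Negative selection translates naturally into code: generate detectors at random and delete any that bind to self. The sketch below is one possible rendering, not the presenters' method. It assumes fixed-length bit strings and an r-contiguous-bits matching rule; the function names, string length, `r`, and the sample self set are all illustrative assumptions.

```python
import random
from typing import List


def matches(detector: str, sample: str, r: int) -> bool:
    """Assumed matching rule: the strings agree on at least r adjacent positions."""
    run = 0
    for d, s in zip(detector, sample):
        run = run + 1 if d == s else 0
        if run >= r:
            return True
    return False


def train_detectors(self_set: List[str], n_detectors: int, length: int, r: int,
                    rng: random.Random, max_tries: int = 100_000) -> List[str]:
    """Keep random candidates only if they match nothing in the self set."""
    detectors: List[str] = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)  # survived negative selection
    return detectors


def is_nonself(sample: str, detectors: List[str], r: int) -> bool:
    """A sample is flagged when any mature detector binds to it."""
    return any(matches(d, sample, r) for d in detectors)


self_set = ["0000000011111111", "0101010101010101", "1111000011110000"]
detectors = train_detectors(self_set, n_detectors=20, length=16, r=8,
                            rng=random.Random(0))
print(any(is_nonself(s, detectors, 8) for s in self_set))  # self is never flagged
```

By construction no surviving detector matches a self string, which mirrors the Correctness property mentioned earlier: the system does not attack itself. Whether a given non-self string is caught depends on how many detectors are trained and on the value of `r`.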

  23. Goal • Evaluate a method of improving the identification of polymorphic malware using Genetic Algorithms (GAs). • Sig1(P) = {substring1, substring2, ..., substringM}

  24. Our Solution: The Classifier • [Figure: File (shown as a bit string) → Classifier → Is it of the target class?]

  25. Our Solution: The Classifier • File: 11001011010010110101101010100101011101001100101101001011010010010010100101110101101011010100101001001010100011110101011111010110101101011011000100111… • Classifier signature of file fragments: 1011010010110100…, 1001011010010110…, 0010110100101101…, 1101011010110101… • return number_found > threshold
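The decision rule on this slide, `return number_found > threshold`, can be sketched as below. Treating fragments as byte strings and counting each fragment at most once are assumptions; the slide shows bit strings and does not say how repeated occurrences are counted. The sample fragments are made up.

```python
from typing import Sequence


def classify(data: bytes, signature: Sequence[bytes], threshold: int) -> bool:
    """True if more than `threshold` signature fragments occur in the file."""
    number_found = sum(1 for fragment in signature if fragment in data)
    return number_found > threshold


signature = [b"\x31\xd2", b"\x89\xe3", b"\xcd\x80"]   # hypothetical fragments
print(classify(b"\x00\x31\xd2\x11\xcd\x80", signature, threshold=1))  # two fragments found
print(classify(b"\x00\x31\xd2\x11\x22\x33", signature, threshold=1))  # only one found
```

Requiring several fragments rather than one exact string is what gives the classifier some tolerance for code that is reordered or partly rewritten.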

  26. Our Solution: The GA • Representation

  27. Our Solution: The GA • Operators • Common bit-wise mutation and uniform crossover. • Fitness Function • Accuracy on a Training Dataset • Training Dataset Should • Consist of examples of both classes. (Equal number?) • Be diverse. • Accurately represent reality.
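The operators and fitness function named on this slide can be combined into a small GA. Everything beyond those three ingredients is an assumption made so the sketch runs: the genome is taken to be a flat bit list encoding a fixed number of fixed-size fragments, selection is a 3-way tournament with one elite survivor, and all parameter defaults are arbitrary.

```python
import random
from typing import List, Sequence, Tuple

Genome = List[int]                      # flat list of bits (assumed representation)
Dataset = Sequence[Tuple[bytes, bool]]  # (file contents, is target class)


def decode(genome: Genome, n_frags: int, frag_bytes: int) -> List[bytes]:
    """Split the bit list into n_frags fragments of frag_bytes bytes each."""
    size = frag_bytes * 8
    frags = []
    for f in range(n_frags):
        bits = genome[f * size:(f + 1) * size]
        frags.append(bytes(int("".join(map(str, bits[i:i + 8])), 2)
                           for i in range(0, size, 8)))
    return frags


def fitness(genome: Genome, dataset: Dataset, n_frags: int, frag_bytes: int,
            threshold: int) -> float:
    """Accuracy on the training dataset."""
    frags = decode(genome, n_frags, frag_bytes)
    correct = sum((sum(fr in data for fr in frags) > threshold) == label
                  for data, label in dataset)
    return correct / len(dataset)


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Bit-wise mutation: flip each bit independently with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def uniform_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: take each bit from either parent with equal chance."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(dataset: Dataset, n_frags: int = 2, frag_bytes: int = 2,
           threshold: int = 0, pop_size: int = 30, generations: int = 40,
           rate: float = 0.02, seed: int = 0) -> Tuple[Genome, float]:
    rng = random.Random(seed)
    n_bits = n_frags * frag_bytes * 8

    def score(g: Genome) -> float:
        return fitness(g, dataset, n_frags, frag_bytes, threshold)

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        def pick() -> Genome:           # assumed 3-way tournament selection
            return max(rng.sample(pop, 3), key=score)
        children = [mutate(uniform_crossover(pick(), pick(), rng), rate, rng)
                    for _ in range(pop_size - 1)]
        pop = children + [best]         # keep the best genome (elitism)
        best = max(pop, key=score)
    return best, score(best)
```

A call such as `evolve([(b"\x90\x90abc", True), (b"hello", False)])` returns the best genome found and its training accuracy. As the slide notes, that accuracy is only meaningful if the training dataset is diverse and representative.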

  28. Experiments: Solution has Merit? • Target Class • Programs produced by CLET. • CLET is a polymorphic engine that creates shellcode buffer overflow malware. • Training Dataset • 20 CLET shellcodes, 20 non-CLET shellcodes • Validation of our Solution • Testing Dataset • 180 CLET shellcodes, 13 non-CLET shellcodes

  29. Experiments: Solution has Merit? • Procedure • Search for GA Parameters • All the normal parameters. • Plus: • Number of file fragments in a signature, • Number of bytes in a file fragment. • Results • Relative insensitivity to normal GA parameters. • Number of file fragments and number of bytes per file fragment seem to be the most important • Led to three final experiments…

  30. Results • Testing • One best classifier of each type.

  31. Conclusions • Extensions • Verify results on larger datasets. • Evolve signature length. • Number of file fragments. • Number of bytes per file fragment. • Extend to other areas • Other types of polymorphic / metamorphic malware. • Parallelize the search process. • Non-polymorphic / metamorphic malware
