
TaintScope

TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection. Tielei Wang (1), Tao Wei (1), Guofei Gu (2), Wei Zou (1). (1) Peking University, China; (2) Texas A&M University, US.


Presentation Transcript


  1. TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
     Tielei Wang (1), Tao Wei (1), Guofei Gu (2), Wei Zou (1)
     (1) Peking University, China; (2) Texas A&M University, US

  2. Terms
     • Checksum: a way to check the integrity of data. Used in network protocols and files.
     • Fuzzing: generating malformed inputs and feeding them to the application.
     • Dynamic taint analysis: runs a program and observes which computations are affected by predefined taint sources (e.g. the input).
     [Figure labels: data, checksum function, data, checksum field]

  3. The problem
     • The input mutation space is enormous.
     • If the program employs a checksum mechanism, most malformed inputs are dropped at an early stage.

  4. The problem
      1 void decode_image(FILE* fd){
      2   ...
      3   int length = get_length(fd);
      4   int recomputed_chksum = checksum(fd, length);
      5   int chksum_in_file = get_checksum(fd);
          // line 6 checks the integrity of the input
      6   if(chksum_in_file != recomputed_chksum)
      7     error();
      8   int Width = get_width(fd);
      9   int Height = get_height(fd);
     10   int size = Width*Height*sizeof(int);
     11   int* p = malloc(size);
     12   ...
     13   for(i=0; i<Height; i++){ // read ith row to p
     14     read_row(p+Width*i, i, fd);

  5. The idea
     • Infer whether and where a program checks the integrity of its input.
     • Identify which input bytes can flow into sensitive points: byte-level taint analysis monitors how the application uses the input data.
     • Create malformed inputs that focus on the “hot bytes”.
     • Repair the checksum fields in the input to expose the vulnerability.
     • Fully automatic.
     • Found 27 new vulnerabilities, in Acrobat Reader, Google Picasa and more.

  6. How does it work?
     • Dynamic taint tracing
     • Detecting checksum
     • Directed fuzzing
     • Repairing crashed samples

  7. How does it work?
     [Architecture diagram. Labels shown: Execution Monitor, Instruction Profile, Hot Bytes Info, Checksum Locator, Directed Fuzzer, Modified Program, Crashed Samples, Checksum Repairer, Reports]

  8. How does it work?
     • Dynamic taint tracing
       • Runs the program with well-formed input.
       • The execution monitor records:
         • which input bytes are related to arguments of API functions (e.g. malloc, strcpy): the “hot bytes” report;
         • which input bytes each conditional jump instruction (e.g. JZ, JE, JB) depends on: the checksum report.
       • Considers only data flow (no control flow).

  9. How does it work?
     • Dynamic taint tracing
       • Instruments data movement (e.g. MOV, PUSH), arithmetic (e.g. SUB, ADD) and logic (e.g. AND, XOR) instructions.
       • Taints every value written by an instruction with the union of the taint labels of all values used by that instruction.
       • Also tracks the eflags register.
       Example:
         before:      eax {0x6, 0x7}, ebx {0x8, 0x9}
         instruction: add eax, ebx
         after:       eax {0x6, 0x7, 0x8, 0x9}, eflags {0x6, 0x7, 0x8, 0x9}
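The propagation rule on slide 9 can be sketched in a few lines of C. This is only an illustration under stated assumptions, not TaintScope's implementation: taint labels are input offsets below 64 so that a set fits in one 64-bit mask, and the names `taint_set`, `shadow_regs`, `taint_of_offset` and `propagate_add_eax_ebx` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a taint set is a bitmask; bit i set means
   "this value depends on input byte i". */
typedef uint64_t taint_set;

/* Hypothetical shadow state for two registers and eflags. */
struct shadow_regs {
    taint_set eax;
    taint_set ebx;
    taint_set eflags;
};

/* Taint label for one input offset (this sketch handles offsets 0..63 only). */
taint_set taint_of_offset(unsigned offset)
{
    assert(offset < 64);
    return (taint_set)1 << offset;
}

/* Models "add eax, ebx": the destination and eflags both receive the
   union of the taint sets of the two source operands. */
void propagate_add_eax_ebx(struct shadow_regs *s)
{
    taint_set result = s->eax | s->ebx;
    s->eax = result;
    s->eflags = result;
}
```

With eax tainted by offsets {0x6, 0x7} and ebx by {0x8, 0x9}, the call leaves eax and eflags tainted by all four offsets, which matches the slide's example.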

  10. How does it work?
      • Dynamic taint tracing: example (input size is 1024 bytes)
         8 int Width = get_width(fd);
         9 int Height = get_height(fd);
        10 int size = Width*Height*sizeof(int);
        11 int* p = malloc(size);
      “hot bytes” report:
        …
        0x8048d5b: invoking malloc: [0x8,0xf]
        …

  11. How does it work?
      • Dynamic taint tracing: example (input size is 1024 bytes)
        6 if(chksum_in_file != recomputed_chksum)
        7   error();
      Checksum report:
        …
        0x8048d4f: JZ: 1024: [0x0,0x3ff]
        …

  12. How does it work?
      2. Detecting checksum
      Checksum detector:
      • Identifies potential checksum check points.
      • The recomputed checksum value depends on many input bytes.
      • Instruments conditional jumps: before a jump executes, checks whether the number of marks associated with the eflags register exceeds a threshold.
      • Problem with decompressed bytes.
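The threshold test on slide 12 might look like the following sketch. It assumes the taint set of eflags is stored as a bitmap with one bit per input offset; `count_marks` and `is_candidate_check_point` are hypothetical names, and the threshold is passed in by the caller (the evaluation slides use 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many input offsets are marked in a bitmap of nbytes bytes. */
size_t count_marks(const uint8_t *bitmap, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        /* Each iteration clears the lowest set bit of b. */
        for (uint8_t b = bitmap[i]; b != 0; b &= (uint8_t)(b - 1)) {
            count++;
        }
    }
    return count;
}

/* A conditional jump is a candidate checksum check point when the taint
   set of eflags holds more marks than the threshold. */
int is_candidate_check_point(const uint8_t *eflags_taint, size_t nbytes,
                             size_t threshold)
{
    assert(eflags_taint != NULL);
    return count_marks(eflags_taint, nbytes) > threshold;
}
```

A jump whose flags depend on all 1024 input bytes passes the test, while a jump that depends only on the eight bytes [0x8,0xf] does not.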

  16. How does it work?
      2. Detecting checksum
      Refinement:
      • Well-formed inputs can pass the checksum test, but most malformed inputs cannot.
      • Run well-formed inputs and identify the always-taken and always-not-taken instructions.
      • Run malformed inputs and again identify the always-taken and always-not-taken instructions.
      • Identify the conditional jump instructions that behave completely differently when processing well-formed and malformed inputs.
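The refinement step can be illustrated with a small C sketch. It assumes each candidate jump has a recorded outcome per run (1 = taken, 0 = not taken); the enum and the functions `summarize` and `behaves_oppositely` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum branch_behavior { NEVER_SEEN, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN, MIXED };

/* Summarizes the outcomes (1 = taken, 0 = not taken) of one conditional
   jump over n runs of the same input group. */
enum branch_behavior summarize(const int *outcomes, size_t n)
{
    assert(outcomes != NULL || n == 0);
    size_t taken = 0;
    for (size_t i = 0; i < n; i++) {
        taken += (outcomes[i] != 0);
    }
    if (n == 0) return NEVER_SEEN;
    if (taken == n) return ALWAYS_TAKEN;
    if (taken == 0) return ALWAYS_NOT_TAKEN;
    return MIXED;
}

/* A jump looks like a checksum check when it is consistent within each
   group but takes opposite directions across the two groups. */
int behaves_oppositely(enum branch_behavior well_formed,
                       enum branch_behavior malformed)
{
    return (well_formed == ALWAYS_TAKEN && malformed == ALWAYS_NOT_TAKEN) ||
           (well_formed == ALWAYS_NOT_TAKEN && malformed == ALWAYS_TAKEN);
}
```

Jumps that come out `MIXED` in either group are discarded, which is why the quality of the sample inputs matters for this step.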

  17. How does it work?
      2. Detecting checksum
      Checksum detector:
      • Creates bypass rules: always-taken, always-not-taken.
        6 if(chksum_in_file != recomputed_chksum)
        7   error();
        …
        0x8048d4f: JZ: 1024: [0x0,0x3ff]
        …
        0x8048d4f: JZ: always-taken

  18. How does it work?
      2. Detecting checksum
      Checksum detector:
      • Checksum field identification: the input bytes that affect chksum_in_file are the checksum field.
        6 if(chksum_in_file != recomputed_chksum)
        7   error();

  19. How does it work?
      3. Directed fuzzing
      • Generates malformed test cases and feeds them to the original or the instrumented program.
      • According to the bypass rules, alters the execution trace at check points by setting the eflags register.

  20. How does it work?
      3. Directed fuzzing
      • All malformed test cases are constructed from the “hot bytes” information.
      • Uses attack heuristics:
        • bytes that influence memory allocation are set to small, large or negative values;
        • bytes that flow into string functions are replaced by format specifiers such as %n, %p.
      • Output: test cases that could cause the program to crash or to consume 100% CPU.
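The memory-allocation heuristic could be sketched as below. The concrete boundary values, the 4-byte little-endian field layout and the names `attack_values` and `mutate_hot_field` are assumptions made for illustration; the slides only say that such bytes are set to small, large or negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boundary values: small, large and negative when read as
   32-bit signed integers. */
static const uint32_t attack_values[] = {
    0x00000000u, 0x00000001u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
};
enum { ATTACK_VALUE_COUNT = sizeof attack_values / sizeof attack_values[0] };

/* Overwrites the 4 hot bytes starting at offset with the chosen value,
   stored little-endian (assumed field layout). */
void mutate_hot_field(uint8_t *input, size_t input_len, size_t offset,
                      size_t which)
{
    assert(which < ATTACK_VALUE_COUNT);
    assert(offset + 4 <= input_len);
    uint32_t v = attack_values[which];
    for (size_t i = 0; i < 4; i++) {
        input[offset + i] = (uint8_t)(v >> (8 * i));
    }
}
```

For the running example, the hot bytes reported for malloc are [0x8,0xf], so a fuzzer following this sketch would call `mutate_hot_field` at offsets 8 and 12 and leave the rest of the input untouched.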

  22. How does it work?
      3. Directed fuzzing
         6 if(chksum_in_file != recomputed_chksum)
         7   error();
         8 int Width = get_width(fd);
         9 int Height = get_height(fd);
        10 int size = Width*Height*sizeof(int);
        11 int* p = malloc(size);
      Before executing 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value.
      Checksum report:
        … 0x8048d4f: JZ: 1024: [0x0,0x3ff] …
      “hot bytes” report:
        … 0x8048d5b: invoking malloc: [0x8,0xf] …
      Bypass info:
        0x8048d4f: JZ: always-taken
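Forcing a JZ according to a bypass rule amounts to overriding one bit of eflags. The sketch below models that on a plain integer rather than on a live register; the rule enum and the functions `apply_jz_rule` and `jz_taken` are hypothetical, while the position of ZF (bit 6 of EFLAGS) is standard x86.

```c
#include <assert.h>
#include <stdint.h>

#define ZF_MASK 0x40u /* the zero flag is bit 6 of the x86 EFLAGS register */

enum bypass_rule { NO_RULE, FORCE_TAKEN, FORCE_NOT_TAKEN };

/* JZ jumps when ZF is set. */
int jz_taken(uint32_t eflags)
{
    return (eflags & ZF_MASK) != 0;
}

/* Returns the eflags value to install just before a JZ executes so that
   the jump follows the bypass rule regardless of the real comparison. */
uint32_t apply_jz_rule(uint32_t eflags, enum bypass_rule rule)
{
    assert(rule == NO_RULE || rule == FORCE_TAKEN || rule == FORCE_NOT_TAKEN);
    switch (rule) {
    case FORCE_TAKEN:     return eflags | ZF_MASK;
    case FORCE_NOT_TAKEN: return eflags & ~ZF_MASK;
    default:              return eflags;
    }
}
```

In the running example the rule for 0x8048d4f is always-taken: a malformed input leaves ZF clear after the comparison, and setting it makes execution skip `error()` and reach the `malloc` call.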

  23. How does it work?
      4. Repairing crashed samples
      • Fixing is expensive, so checksum fields are repaired only in test cases that caused a crash.
      • How?
        Cr: raw data in the checksum field
        D: input data protected by the checksum field
        Checksum(): the complete checksum algorithm
        T: transformation
      We want to satisfy the constraint: Checksum(D) == T(Cr)

  24. How does it work?
      4. Repairing crashed samples
      Using symbolic execution to solve Checksum(D) == T(Cr):
      • Checksum(D) is a runtime-determinable constant c; only Cr is a symbolic value.
      • The constraint therefore becomes c == T(Cr).
      • Common transformations (e.g. converting from hex/oct to decimal) can be solved by existing solvers (STP).
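To make the constraint c == T(Cr) concrete, here is a toy C sketch. It does not use symbolic execution or STP as TaintScope does: it assumes a known transformation T (a field of 8 hexadecimal ASCII digits parsed into an integer) and simply inverts it by hand. The checksum algorithm and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy Checksum() (assumption): sum of all protected bytes modulo 2^32. */
uint32_t toy_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Assumed T: the checksum field holds hex digits; T parses them. */
uint32_t transform_hex(const char *field)
{
    return (uint32_t)strtoul(field, NULL, 16);
}

/* Writes a raw field Cr (8 hex digits plus terminator) with T(Cr) == c. */
void repair_hex_field(char field[9], uint32_t c)
{
    int written = snprintf(field, 9, "%08lX", (unsigned long)c);
    assert(written == 8);
}
```

Here c is obtained by running the checksum over the mutated data D, which mirrors the slide's point that Checksum(D) is a constant at run time and only Cr has to be solved for.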

  25. How does it work?
      4. Repairing crashed samples
      If the new test case causes the original program to crash, a potential vulnerability is detected!

  26. Evaluation
      An incomplete list of applications: [table]

  27. Evaluation
      “hot bytes” identification results for memory allocation: [table]

  28. Evaluation
      Checksum identification results (threshold = 16): [table]

  29. Evaluation
      Correct checksum fields: [table]

  30. Evaluation
      27 previously unknown vulnerabilities: MS Paint, Google Picasa, Adobe Acrobat, ImageMagick, IrfanView, GStreamer, Winamp, XEmacs, wxWidgets, PDFlib, Amaya, dillo

  31. Evaluation
      Vulnerabilities detected by TaintScope: [table]

  32. Discussion
      • TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures): it is impossible to generate valid test cases.
      • Limited effectiveness when all input data are encrypted (tracking decrypted data).
      • Identification of checksum check points can be affected by the quality of the inputs.
      • Does not track control-flow propagation.
      • Not all x86 instructions are instrumented by the execution monitor.

  33. Conclusion
      TaintScope can perform:
      • Directed fuzzing
        • Identifies which bytes flow into system/library calls.
        • Dramatically reduces the mutation space.
      • Checksum-aware fuzzing
        • Disables checksum checks by control-flow alteration.
        • Generates correct checksum fields in invalid inputs.

  34. Questions
