The 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE’08)
Deriving Input Syntactic Structure From Execution
Zhiqiang Lin, Xiangyu Zhang
Purdue University
November 11th, 2008
Motivation -- Most software takes structural input.
Applications -- Software Testing/Debugging
• Using input grammars to generate test cases
  • K. Hanford. Automatic Generation of Test Cases. IBM Systems Journal, 9(4), 1970.
  • P. Purdom. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3), 1972.
  • Grammar-based whitebox fuzzing [PLDI’08]
• Delta debugging
  • Reducing large failure-inducing inputs [TSE’02]
  • Hierarchical Delta Debugging (HDD) [ICSE’06]
• Execution fast forwarding
  • Reducing event logs for failure replay [FSE’06]
Applications -- Computer Security
• Malware and attack-instance signature generation
  • Exploit (input) signature
  • Payload length, keywords, field structure, …
• Penetration testing for software vulnerabilities
  • Play with input (fuzzing)
  • Packet Vaccine [CCS’06]
  • ShieldGen [IEEE S&P’07]
• Malware protocol replayer
  • Malware feature; replay the protocol; input format
Challenges
• Input structure exists in a machine-unfriendly form
  • Plain text (ASCII stream, e.g., a C file)
  • Binary code (protocol message stream)
• Known specification (RFC)
  • Implementation deviation
• Unknown specification
  • Malware
    • Bot: botnet protocol
  • Legitimate software
    • Samba protocol (12 years of work by the open-source community)
Challenges
• Source code may not be available
  • Penetration testing
  • Malware analysis
  • Legitimate software
• Working on binaries
Our Contributions
• Two different approaches for handling two types of parsers
  • Using dynamic control dependency to handle top-down parsers
  • A new dynamic analysis to handle bottom-up parsers by identifying and analyzing the parsing stack
• Experimental results show that the proposed analyses are highly effective in producing very precise input syntax trees
Outline
• Motivation
• Technical Description
  • Handling Inputs with a Top-down Parser
  • Handling Inputs with a Bottom-up Parser
• Evaluation
• Discussion
• Related Work
• Conclusion
I. Top-down Parser
• Parses input in a top-down manner
• Example grammar:
    S → H B
    H → h N
    N → 1 | 2
    B → b B | ε
• Example input: h1bbε
• Parse tree: S(H(h, N(1)), B(b, B(b, B(ε))))
Implementation (grammar: S → H B, H → h N, N → 1 | 2, B → b B | ε)
    void Parser() {
        char c = getchar();
        if (c == 'h') {                   /* slide line 2 */
            c = getchar();
            if (c == '1' || c == '2') {   /* slide line 4 */
                c = getchar();
            } else error();
        } else error();
        while (c == 'b') {                /* slide line 9 */
            c = getchar();
            if (c == 'ε') {               /* slide line 11; ε stands for the end-of-input marker */
                break;
            }
        }
        error();
    }
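The parser on this slide reads from standard input and writes ε for the end of input, so it is not directly compilable. The following is a minimal runnable sketch of a recognizer for the same grammar, under two assumptions that are not from the talk: the input is passed as a C string, and the terminating '\0' plays the role of ε. The names Input, next_char and parse_s are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input cursor that replaces getchar() so the parser runs on a string. */
typedef struct {
    const char *buf;
    size_t pos;
} Input;

/* Return the next character; '\0' stands in for the slide's ε (end of input). */
static char next_char(Input *in) {
    char c = in->buf[in->pos];
    if (c != '\0') in->pos++;   /* never advance past the terminator */
    return c;
}

/* Recognizer for S -> H B, H -> h N, N -> 1 | 2, B -> b B | ε. */
bool parse_s(const char *text) {
    assert(text != NULL);
    Input in = { text, 0 };

    char c = next_char(&in);
    if (c != 'h') return false;              /* H must start with 'h' */

    c = next_char(&in);
    if (c != '1' && c != '2') return false;  /* N -> 1 | 2 */

    c = next_char(&in);
    while (c == 'b') {                       /* B -> b B */
        c = next_char(&in);
    }
    return c == '\0';                        /* B -> ε: accept only at end of input */
}
```

For example, parse_s("h1bb") accepts the slide's input h1bbε, while parse_s("h3") and parse_s("h1bx") are rejected.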
Execution Trace (input: h1bbε; b1 and b2 denote the first and second b)
• c = getchar() reads h
• if (c == 'h')
• c = getchar() reads 1
• if (c == '1' || c == '2')
• c = getchar() reads b1
• while (c == 'b')
• c = getchar() reads b2
• if (c == 'ε') tests b2
• while (c == 'b') tests b2
• c = getchar() reads ε
• if (c == 'ε')
• break
Control Dependency: a statement Y is control-dependent on X iff X directly determines whether Y executes.
Control dependency graph for the execution trace
START
  • c = getchar() [h]
  • if (c == 'h')
    • c = getchar() [1]
    • if (c == '1' || c == '2')
      • c = getchar() [b1]
  • while (c == 'b')
    • c = getchar() [b2]
    • if (c == 'ε')
    • while (c == 'b')
      • c = getchar() [ε]
      • if (c == 'ε')
        • break
A Control Dependency Graph: a graph in which any given node directly controls the execution of its child nodes.
Grammar parse tree shown alongside for comparison: S(H(h, N(1)), B(b, B(b, B(ε))))
Eliminate non data use node
• Nodes that do not use input data (the c = getchar() statements and break) are removed from the graph; only the predicates remain.
Add Data Use Leaf Node
• Each remaining predicate gets the input byte it tests as a leaf:
START
  • if (c == 'h'): h
    • if (c == '1' || c == '2'): 1
  • while (c == 'b'): b1
    • if (c == 'ε'): b2
    • while (c == 'b'): b2
      • if (c == 'ε'): ε
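Eliminate Redundant Node
• Node labels are source line numbers; the subscript is the execution instance (9₁ and 9₂ are the first and second evaluations of the while predicate).
START
  • 2 if (c == 'h'): h
    • 4 if (c == '1' || c == '2'): 1
  • 9₁ while (c == 'b'): b1
    • 11₁ if (c == 'ε'): b2
    • 9₂ while (c == 'b'): b2
      • 11₂ if (c == 'ε'): ε
• Nodes marked as identical are eliminated.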
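The tree derivation walked through on the preceding slides can be sketched in C. This is only an illustration under stated assumptions: the parser is instrumented by hand (the talk's analysis works on binaries), control dependence is approximated with a stack of open predicate instances, and the redundancy rule used here (drop a leaf predicate that tests the same input byte as a later sibling) is an assumption chosen for the example, not necessarily the rule from the talk. Tracer, enter, leave, emit and derive_tree are hypothetical names; '\0' again stands in for ε and is printed as '$'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 64

/* One executed predicate instance and the input offset it tested. */
typedef struct { int parent; int pos; } Node;

typedef struct {
    Node nodes[MAX_NODES];
    int count;
    int open[MAX_NODES];   /* predicate instances whose region is still active */
    int depth;
} Tracer;

/* Record a predicate instance as a child of the innermost open predicate. */
static void enter(Tracer *t, int pos) {
    assert(t->count < MAX_NODES);
    int id = t->count++;
    t->nodes[id].parent = t->depth > 0 ? t->open[t->depth - 1] : -1;
    t->nodes[id].pos = pos;
    t->open[t->depth++] = id;
}

static void leave(Tracer *t) { assert(t->depth > 0); t->depth--; }

static bool has_child(const Tracer *t, int id) {
    for (int i = 0; i < t->count; i++)
        if (t->nodes[i].parent == id) return true;
    return false;
}

/* Assumed rule: a leaf is redundant if a later sibling tests the same byte. */
static bool redundant(const Tracer *t, int id) {
    if (has_child(t, id)) return false;
    for (int j = id + 1; j < t->count; j++)
        if (t->nodes[j].parent == t->nodes[id].parent &&
            t->nodes[j].pos == t->nodes[id].pos) return true;
    return false;
}

/* Append the subtree below `parent` to `out` in bracket notation. */
static void emit(const Tracer *t, int parent, const char *text, char *out) {
    for (int i = 0; i < t->count; i++) {
        if (t->nodes[i].parent != parent || redundant(t, i)) continue;
        char c = text[t->nodes[i].pos];
        char label[2] = { c ? c : '$', '\0' };
        strcat(out, label);
        if (has_child(t, i)) {
            strcat(out, "(");
            emit(t, i, text, out);
            strcat(out, ")");
        }
    }
}

/* Hand-instrumented copy of the slide's parser; `out` needs 3 bytes per node plus 1. */
void derive_tree(const char *text, char *out) {
    Tracer t = { .count = 0, .depth = 0 };
    int pos = 0;
    out[0] = '\0';
    enter(&t, pos);                           /* if (c == 'h') */
    if (text[pos] == 'h') {
        pos++;
        enter(&t, pos);                       /* if (c == '1' || c == '2') */
        if (text[pos] == '1' || text[pos] == '2') pos++;
        leave(&t);
    }
    leave(&t);
    for (;;) {
        enter(&t, pos);                       /* while (c == 'b'): nests in the previous iteration */
        if (text[pos] != 'b') break;
        pos++;
        enter(&t, pos);                       /* if (c == ε) */
        leave(&t);
        if (text[pos] == '\0') break;         /* the slide's break */
    }
    emit(&t, -1, text, out);
}
```

With a 256-byte buffer, derive_tree("h1bb", out) yields "h(1)b(b($))", which has the same shape as the grammar parse tree S(H(h, N(1)), B(b, B(b, B(ε)))). Statements that only read input (the getchar() calls) are never recorded, which corresponds to the "eliminate non data use node" step.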
II. Bottom-up Parser
• Parses input in a bottom-up manner
  • Programming languages
  • lex/yacc
• Example grammar: S → A B, A → a a, B → b
• Example input: aab
• Parse tree: S(A(a, a), B(b))
A General Bottom Up Parsing Algorithm
    while (…) {
        if (stack should not be reduced) {
            stack.push(c);
            …
        } else {
            // reduce by A → β
            stack.pop(|β|);
            stack.push(A);
        }
    }
• Grammar: S → A B, A → a a, B → b; input: aab
• Trace:
  • while (…); if (stack should not be reduced); stack.push(a)
  • while (…); if (stack should not be reduced); stack.push(a)
  • while (…); if (stack should not be reduced); stack.pop(aa); stack.push(A)
  • …
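The general algorithm above can be made concrete for the example grammar S → A B, A → a a, B → b. The sketch below is a naive shift-reduce loop that reduces whenever a rule's right-hand side is on top of the stack; real parsers generated by yacc decide with parse tables instead, so this is an assumption made for brevity. The names Rule, find_handle and shift_reduce are illustrative, and the trace format mimics the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define STACK_MAX 64

typedef struct {
    const char *rhs;  /* right-hand side, one char per symbol */
    char lhs;         /* nonterminal produced by the reduction */
} Rule;

static const Rule RULES[] = { { "aa", 'A' }, { "b", 'B' }, { "AB", 'S' } };
#define RULE_COUNT (sizeof RULES / sizeof RULES[0])

/* Return the index of a rule whose right-hand side is on top of the stack, or -1. */
static int find_handle(const char *stack, size_t depth) {
    for (size_t i = 0; i < RULE_COUNT; i++) {
        size_t n = strlen(RULES[i].rhs);
        if (depth >= n && memcmp(stack + depth - n, RULES[i].rhs, n) == 0)
            return (int)i;
    }
    return -1;
}

/* Parse `input`, writing the stack operations to `trace`; true if input reduces to S. */
bool shift_reduce(const char *input, char *trace, size_t trace_size) {
    char stack[STACK_MAX];
    size_t depth = 0, used = 0;
    assert(input != NULL && trace != NULL && trace_size > 0);
    trace[0] = '\0';
    for (;;) {
        const char *sep = used ? ", " : "";
        int r = find_handle(stack, depth);
        int n;
        if (r < 0) {                              /* stack should not be reduced: shift */
            if (*input == '\0' || depth == STACK_MAX) break;
            stack[depth++] = *input;
            n = snprintf(trace + used, trace_size - used, "%sPush(%c)", sep, *input);
            input++;
        } else {                                  /* reduce by A -> beta */
            depth -= strlen(RULES[r].rhs);        /* stack.pop(|beta|) */
            stack[depth++] = RULES[r].lhs;        /* stack.push(A) */
            n = snprintf(trace + used, trace_size - used, "%sPop(%s), Push(%c)",
                         sep, RULES[r].rhs, RULES[r].lhs);
        }
        if (n < 0 || (size_t)n >= trace_size - used) return false;  /* trace buffer too small */
        used += (size_t)n;
    }
    return depth == 1 && stack[0] == 'S' && *input == '\0';
}
```

Calling shift_reduce("aab", trace, sizeof trace) with a 256-byte buffer returns true and leaves "Push(a), Push(a), Pop(aa), Push(A), Push(b), Pop(b), Push(B), Pop(AB), Push(S)" in trace.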
Tree Construction
• Identify the parsing stack
• Stack operation trace (grammar S → A B, A → a a, B → b; input aab):
  • Push(a), Push(a), Pop(aa), Push(A)
  • Push(b), Pop(b), Push(B), Pop(AB), Push(S)
• Resulting tree:
  • Push(S)
    • Push(A)
      • Push(a)
      • Push(a)
    • Push(B)
      • Push(b) (identical node)
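The step from a stack operation trace to a tree can be sketched as follows. The sketch assumes the parsing stack has already been identified and its operations logged as a list of pushes and pops; a push that directly follows a pop is treated as the result of a reduction and adopts the popped entries as children, and a reduction that pops a single entry is collapsed into its child (one reading of the "identical node" note on the slide). StackOp and build_tree are hypothetical names, and the tree is printed as a bracketing of the input characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_TEXT 128

typedef enum { OP_PUSH, OP_POP } OpKind;

typedef struct {
    OpKind kind;
    char symbol;  /* OP_PUSH: symbol pushed (only used for shifted input) */
    int count;    /* OP_POP: number of stack entries popped */
} StackOp;

/* Build a bracketed tree from the trace; returns 0 on success, -1 on a malformed trace. */
int build_tree(const StackOp *ops, size_t n_ops, char *out, size_t out_size) {
    char stack[MAX_DEPTH][MAX_TEXT];  /* bracketed subtree for each stack entry */
    int depth = 0;
    int pending = 0;                  /* entries popped by the previous op */
    assert(ops != NULL && out != NULL);

    for (size_t i = 0; i < n_ops; i++) {
        if (ops[i].kind == OP_POP) {
            if (pending != 0 || ops[i].count < 1 || ops[i].count > depth) return -1;
            pending = ops[i].count;
        } else if (pending == 0) {    /* shift: the pushed input symbol becomes a leaf */
            if (depth == MAX_DEPTH) return -1;
            stack[depth][0] = ops[i].symbol;
            stack[depth][1] = '\0';
            depth++;
        } else {                      /* reduction: new node adopts the popped entries */
            char merged[MAX_TEXT] = "";
            size_t len = 0;
            int first = depth - pending;
            if (pending > 1) merged[len++] = '(';
            for (int j = first; j < depth; j++) {
                size_t part = strlen(stack[j]);
                if (len + part + 2 > MAX_TEXT) return -1;
                memcpy(merged + len, stack[j], part);
                len += part;
            }
            if (pending > 1) merged[len++] = ')';  /* single-child nodes add no brackets */
            merged[len] = '\0';
            memcpy(stack[first], merged, len + 1);
            depth = first + 1;
            pending = 0;
        }
    }
    if (depth != 1 || pending != 0 || strlen(stack[0]) >= out_size) return -1;
    strcpy(out, stack[0]);
    return 0;
}
```

For the trace on this slide, encoded as nine StackOp values, the result is "((aa)b)": the two a's are grouped under A, B collapses into b, and both sit under S. The nonterminal names never appear in the output, which mirrors the fact that a binary-level analysis sees stack operations rather than grammar symbols.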
Evaluation – Bottom up grammar [figure: derived syntax tree with identical nodes marked]
Performance Overhead 5X-45X 6X-8X
Discussion
• Grammar categories
  • Top-down, bottom-up; any others?
  • It is possible to evade the control dependency structure in a top-down parser implementation.
• Individual input
  • Multiple inputs → final grammar
• Syntactic structure
  • Semantics
Related Work
• Network protocol format reverse engineering
  • Instruction semantics (comparison, loop keyword, delimiter)
    • Polyglot [CCS’07]
    • Automatic Network Protocol Analysis [NDSS’08]
    • Tupni [CCS’08]
  • Execution context (call stack, PC)
    • AutoFormat [NDSS’08]
• Limitations
  • Part of the problem space
    • Only top-down parsers
  • Part of the problem’s essence
    • Comparison (predicate), call stack → control dependency
Conclusion
• Two dynamic analyses that construct input structure from program execution
• No source code access or symbolic information is required
• The analyses are highly effective and produce high-quality input syntax trees
Q & A
Thank you
Contact: {zlin,xyzhang}@cs.purdue.edu