230 likes | 361 Views
Modeling Regular Replacement for String Constraints Solving. Xiang Fu Hofstra University Chung- Chih Li Illinois State University. malicious scripts . Hacker. Cool page! . Server. Background. Problem? Lack of Sufficient Sanitation of Text Inputs. One Typical Error. 1 <? php
E N D
Modeling Regular Replacement for String Constraints Solving Xiang Fu Hofstra University Chung-Chih Li Illinois State University NFM 2010
malicious scripts Hacker Cool page! Server Background Problem? Lack of Sufficient Sanitation of Text Inputs NFM 2010
One Typical Error 1 <?php 2 $msg = $_POST[”msg”]; 3 $sanitized = pregreplace( 4 ”/\< s c r i p t .*?\>.*?\<\/ s c r i p t .*?\ >/ i ”, 5 ” ” , 6 $msg ) ; 7 savetodb($sanitized ) 8 ?> Reluctant Kleene Star Attacker’s Input <<script></script>script>alert(’a’)</script> <script>alert(’a’)</script> NFM 2010
Bigger Picture • Objective: Automatic Discovery of Vulnerabilities SUSHI Bytecode Symbolic Execution String Constraint Solver Test Replayer Attack Pattern NFM 2010
Our Contribution • Atomic Replacement Constraints • Consider Two Semantics • Greedy • Reluctant • Modeling Using Finite State Transducer (FST) • Compact Representation of FST • Security Analysis NFM 2010
Finite State Transducer • Accepts Regular Relation • Union, Concat, Composition • Intersection, Complement • Used for Modeling Rewriting Rules [Kaplan94, Karttunen96] ε:1 a:2 1 2 3 4 b:3 A (ab,123) ∈ L(A) NFM 2010
Hierarchical FST &Modeling Declarative Semantics Goal: Replacement Regular Search Pattern Any String not Containing patter r Identical Relation Id(∑* - ∑* r ∑*) r : ω Id(∑* - ∑* r ∑*) 2 1 3 4 ε:ε NFM 2010
Modeling Reluctant Semantics • 2 Steps • Mark the beginning of pattern • Do the replacement Goal: Key: Left-Most Matching NFM 2010
Input Word Search Pattern a a b b c d a b c a b d a+b+c x Begin Marker # a # a b b c d # a b c a b d s1 reluc(r)#’ : ω #: ε x d x a b d ε: ε s2 f1 Id(∑) NFM 2010
The Challenge: Begin Marker Input Word Search Pattern a a b b c d a b c a b d a+b+c x # # # # Look-ahead Capability? 3 Steps: End marker Generic end marker Begin marker Non-determinism NFM 2010
1 2 3 4 Preliminary End Marker Search Pattern a+b+c x c:c b:b a: a b :b ε:$ Idea: Start with End Marker for Reverse of Search Pattern a: a 5 Reversed Pattern A1 cb+a+ Problem: Input tape accepts cb+a+ only! NFM 2010
Generic End Marker Pattern cb+a+ c:c b:b 1 c:c b:b a:a ε:$ 1 Deterministic! a:a a:a a:a b:b c:c c:c 2 3 4 5 5,1 2,1 4,1 3,1 b:b A2 c c b a a c c b a $ a $ Input Word Output Word NFM 2010
Finally, the Begin Marker Search Pattern a+b+c x 0 ε:ε c:c b:b ε:ε ε:ε c:c 1 b:b a:a ε:# 1 3 2 4 5 a:a a:a b:b 2,1 5,1 4,1 3,1 c:c c:c b:b A3 NFM 2010
Input Word Search Pattern a a b b c d a b c a b d a+b+c x Begin Marker # a # a b b c d # a b c a b d s1 reluc(r)#’ : ω #: ε x d x a b d ε: ε s2 f1 Id(∑) NFM 2010
Greedy Semantics Goal: greedy Challenge: Look-ahead longestmatch NFM 2010
Search Pattern a+ x aabab Step 1: Begin Marker #a#ab#ab Step 2: ND End Marker #a#ab#a$b #a$#a$b#a$b #a#a$b#a$b #a#a$b#ab Step 3: Pairing Markers #aa$b#a$b #aaba$b Step 4: Checking Match #a$#a$b#a$b Step 5: Check Longest Step 6: Replacement xbxb NFM 2010
Login Servlet Applications • Solve String Constraints Input: user name After filtering single quote and length restriction NFM 2010
Solving Atomic Constraint Goal: A1 Id(P) Project to Input Tape Solution NFM 2010
SUSHI Constraint Solver Type I Type II Type III • Solves Simple Linear String Constraints (SISE) • Relies on • dk.brics.automaton for FSA operations • Self-made Java package for FST operations • Supports 16-bit Unicode • Compact Transition Representation (I,I) (II,I) (III,II) NFM 2010
Login Servlet Efficiency of Solver 1.4 Seconds on 2Ghz PC Benchmark Equations 1 Flex SDKXSS Attack 2 Equation Size: 565 74 Seconds Shorter than Security Track #1022748 3 4 NFM 2010
Related Work Our Contribution: Precise Modeling of Various Regular Substitution Semantics • Forward String Analysis • Christensen & Møller [SAS’03] • Wasserman & Su [PLDI’07, ICSE’08] • Bjørner & Tillmann [TACAS’09] • Backward String Analysis • Kiezun & Ganesh [ISSTA’09] • Yu & Bultan [SPIN’08, ASE’09] • Fu [COMPSAC’07, TAVWEB’08] • Natural Language Processing • * Kaplan and Kay [CL’1994] NFM 2010
Limitations • SISE String Constraints • All Variables Appear on LHS (Once) • No Easy Solution for Equation System Yet • No string length • Future Directions • Encoding string length in automata • Finite model on bit-vector NFM 2010
Questions? NFM 2010