Fast and Precise Sanitizer Analysis with Bek

Fast and Precise Sanitizer Analysis with Bek. Pieter Hooimeijer, Ben Livshits, David Molnar, Prateek Saxena, Margus Veanes. 2011-08-10, USENIX Security. <img src='some untrusted input'/> Question: What could possibly go wrong?


Presentation Transcript


  1. Fast and Precise Sanitizer Analysis with Bek. Pieter Hooimeijer, Ben Livshits, David Molnar, Prateek Saxena, Margus Veanes. 2011-08-10, USENIX Security

  2. <img src='some untrusted input'/>

  3. <img src='some untrusted input'/> Question: What could possibly go wrong?

  4. <img src='some untrusted input'/> Attacker: im.png' onload='javascript:...

  6. <img src='some untrusted input'/> Attacker: im.png' onload='javascript:... Result: <img src='im.png' onload='javascri

  7. <img src='some untrusted input'/> Attacker: im.png' onload='javascript:... Result: <img src='im.png' onload='javascri FAIL
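The failure on slides 4 to 7 can be reproduced in a few lines. This is a minimal C# sketch, assuming the page builds the tag by plain string concatenation; `NaiveImgTag` and `Build` are hypothetical names, and `alert(1)` below is only an illustrative stand-in for the script the slide elides.

```csharp
public static class NaiveImgTag
{
    // Splices untrusted input directly into a single-quoted src attribute,
    // with no sanitization: the pattern attacked on slides 4 to 7.
    public static string Build(string untrusted) => "<img src='" + untrusted + "'/>";
}
```

With the attacker's input, `NaiveImgTag.Build("im.png' onload='javascript:alert(1)")` returns `<img src='im.png' onload='javascript:alert(1)'/>`: the quote in the input closes the `src` value early, so the rest is parsed as a new `onload` attribute.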

  8. A tale of two sanitizers…

  9. ' → &#39; (the single-quote HTML entity)

  10. some untrusted input

  11. some untrusted input → Library A. Name: HtmlEncode; Around for: years; Availability: readily available to C# developers.

  12. some untrusted input → Library A. Name: HtmlEncode; Around for: years; Availability: readily available to C# developers. Library B. Name: HtmlEncode; Around for: years; Availability: readily available to C# developers.

  13. Library A vs. Library B (both: Name: HtmlEncode; Around for: years; Availability: readily available to C# developers). On the input ', one library outputs ' unchanged (✘) and the other outputs &#39; (✔).
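The difference between the two libraries comes down to one character. The sketch below is hypothetical and is not the code of either library: two otherwise identical encoders (`QuoteEncoders`, an illustrative name), one of which also encodes the single quote as &#39;.

```csharp
using System.Text;

public static class QuoteEncoders
{
    // Hypothetical encoder that also encodes the single quote.
    public static string EncodeQuoteAware(string s) => Encode(s, encodeSingleQuote: true);

    // Hypothetical encoder that leaves the single quote untouched.
    public static string EncodeQuoteUnaware(string s) => Encode(s, encodeSingleQuote: false);

    private static string Encode(string s, bool encodeSingleQuote)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            switch (c)
            {
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '&': sb.Append("&amp;"); break;
                case '"': sb.Append("&quot;"); break;
                case '\'' when encodeSingleQuote: sb.Append("&#39;"); break;
                default: sb.Append(c); break; // includes ' when encodeSingleQuote is false
            }
        }
        return sb.ToString();
    }
}
```

Wrapped in `<img src='...'/>`, the quote-unaware result reproduces the breakout from slide 7, while the quote-aware result is `<img src='im.png&#39; onload=&#39;javascript:alert(1)'/>`, where the whole payload stays inside the attribute value.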

  14. MS AntiXSS vs. .NET WebUtility

      MS AntiXSS:

        private static string HtmlEncode(string input, bool useNamedEntities, MethodSpecificEncoder encoderTweak)
        {
            if (string.IsNullOrEmpty(input)) { return input; }
            if (characterValues == null) { InitialiseSafeList(); }
            if (useNamedEntities && namedEntities == null) { InitialiseNamedEntityList(); }

            // Setup a new character array for output.
            char[] inputAsArray = input.ToCharArray();
            int outputLength = 0;
            int inputLength = inputAsArray.Length;
            char[] encodedInput = new char[inputLength * 10];

            SyncLock.EnterReadLock();
            try
            {
                for (int i = 0; i < inputLength; i++)
                {
                    char currentCharacter = inputAsArray[i];
                    int currentCodePoint = inputAsArray[i];
                    char[] tweekedValue;

                    // Check for invalid values
                    if (currentCodePoint == 0xFFFE || currentCodePoint == 0xFFFF)
                    {
                        throw new InvalidUnicodeValueException(currentCodePoint);
                    }
                    else if (char.IsHighSurrogate(currentCharacter))
                    {
                        if (i + 1 == inputLength)
                        {
                            throw new InvalidSurrogatePairException(currentCharacter, '\0');
                        }

                        // Now peak ahead and check if the following character is a low surrogate.
                        char nextCharacter = inputAsArray[i + 1];
                        char nextCodePoint = inputAsArray[i + 1];
                        if (!char.IsLowSurrogate(nextCharacter))
                        {
                            throw new InvalidSurrogatePairException(currentCharacter, nextCharacter);
                        }

                        // Look-ahead was good, so skip.
                        i++;

                        // Calculate the combined code point
                        long combinedCodePoint = 0x10000 + ((currentCodePoint - 0xD800) * 0x400) + (nextCodePoint - 0xDC00);
                        char[] encodedCharacter = SafeList.HashThenValueGenerator(combinedCodePoint);
                        encodedInput[outputLength++] = '&';
                        for (int j = 0; j < encodedCharacter.Length; j++) { encodedInput[outputLength++] = encodedCharacter[j]; }
                        encodedInput[outputLength++] = ';';
                    }
                    else if (char.IsLowSurrogate(currentCharacter))
                    {
                        throw new InvalidSurrogatePairException('\0', currentCharacter);
                    }
                    else if (encoderTweak != null && encoderTweak(currentCharacter, out tweekedValue))
                    {
                        for (int j = 0; j < tweekedValue.Length; j++) { encodedInput[outputLength++] = tweekedValue[j]; }
                    }
                    else if (useNamedEntities && namedEntities[currentCodePoint] != null)
                    {
                        char[] encodedCharacter = namedEntities[currentCodePoint];
                        encodedInput[outputLength++] = '&';
                        for (int j = 0; j < encodedCharacter.Length; j++) { encodedInput[outputLength++] = encodedCharacter[j]; }
                        encodedInput[outputLength++] = ';';
                    }
                    else if (characterValues[currentCodePoint] != null)
                    {
                        // character needs to be encoded
                        char[] encodedCharacter = characterValues[currentCodePoint];
                        encodedInput[outputLength++] = '&';
                        for (int j = 0; j < encodedCharacter.Length; j++) { encodedInput[outputLength++] = encodedCharacter[j]; }
                        encodedInput[outputLength++] = ';';
                    }
                    else
                    {
                        // character does not need encoding
                        encodedInput[outputLength++] = currentCharacter;
                    }
                }
            }
            finally
            {
                SyncLock.ExitReadLock();
            }

            return new string(encodedInput, 0, outputLength);
        }

      .NET WebUtility:

        public static string HtmlEncode(string s)
        {
            if (s == null) return null;
            int num = IndexOfHtmlEncodingChars(s, 0);
            if (num == -1) return s;
            StringBuilder builder = new StringBuilder(s.Length + 5);
            int length = s.Length;
            int startIndex = 0;
        Label_002A:
            if (num > startIndex) { builder.Append(s, startIndex, num - startIndex); }
            char ch = s[num];
            if (ch > '>')
            {
                builder.Append("&#");
                builder.Append(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
                builder.Append(';');
            }
            else
            {
                char ch2 = ch;
                if (ch2 != '"')
                {
                    switch (ch2)
                    {
                        case '<': builder.Append("&lt;"); goto Label_00D5;
                        case '=': goto Label_00D5;
                        case '>': builder.Append("&gt;"); goto Label_00D5;
                        case '&': builder.Append("&amp;"); goto Label_00D5;
                    }
                }
                else
                {
                    builder.Append("&quot;");
                }
            }
        Label_00D5:
            startIndex = num + 1;
            if (startIndex < length)
            {
                num = IndexOfHtmlEncodingChars(s, startIndex);
                if (num != -1) { goto Label_002A; }
                builder.Append(s, startIndex, length - startIndex);
            }
            return builder.ToString();
        }

  15. MS AntiXSS vs. .NET WebUtility (the code from slide 14) • Same behavior on all inputs? • If not, what is a differentiating input? • Can it generate any known ‘bad’ outputs?
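Slide 15's first two questions can be approximated by brute force. The sketch below is a bounded differential test, a swapped-in technique and not Bek's SFT equivalence check: it tries every string over a small alphabet up to a fixed length and returns the first input on which two functions disagree. Unlike the symbolic check, it can only find differences within the bound and never proves equivalence. `DifferentialCheck` and `FindDifference` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DifferentialCheck
{
    // Enumerates all strings over `alphabet` up to length `maxLen`, shortest first,
    // and returns the first input on which f and g disagree, or null if none is found.
    public static string FindDifference(
        Func<string, string> f, Func<string, string> g, char[] alphabet, int maxLen)
    {
        var frontier = new List<string> { "" };
        for (int len = 0; len <= maxLen; len++)
        {
            foreach (string w in frontier)
                if (f(w) != g(w)) return w;
            if (len == maxLen) break;

            // Extend every string of the current length by one character.
            var next = new List<string>();
            foreach (string w in frontier)
                foreach (char c in alphabet)
                    next.Add(w + c);
            frontier = next;
        }
        return null;
    }
}
```

For example, comparing an encoder that rewrites ' to &#39; against the identity function, over the alphabet {a, '} up to length 2, returns the one-character input '.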

  16. A tale of 151 sanitizers…

  17. PHP Trunk Changes to html.c, 1999–2011

  18. PHP Trunk Changes to html.c, 1999–2011. R7,841 (April 1999): 135 loc. R309,482 (March 2011): 1693 loc.

  19. PHP Trunk Changes to html.c, 1999–2011. R7,841 (April 1999): 135 loc. R32,564 (September 2000): ENT_QUOTES introduced. R309,482 (March 2011): 1693 loc.

  20. PHP Trunk Changes to html.c, 1999–2011. R7,841 (April 1999): 135 loc. R32,564 (September 2000): ENT_QUOTES introduced. R242,949 (September 2007): $double_encode=true. R309,482 (March 2011): 1693 loc.

  21. PHP Trunk Changes to html.c, 1999–2011 • Safe to apply twice? • Safe to combine with other sanitizers?
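"Safe to apply twice?" is a question about idempotence: does encode(encode(w)) equal encode(w)? A minimal sketch with a hypothetical five-character HTML encoder (not PHP's htmlspecialchars or any .NET API) shows why the answer is usually no: the & introduced by the first pass is encoded again by the second.

```csharp
using System.Text;

public static class IdempotenceCheck
{
    // Hypothetical minimal HTML encoder for & < > " '.
    public static string HtmlEncode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            sb.Append(c switch
            {
                '&' => "&amp;",
                '<' => "&lt;",
                '>' => "&gt;",
                '"' => "&quot;",
                '\'' => "&#39;",
                _ => c.ToString(),
            });
        return sb.ToString();
    }

    // Idempotence on a single input: encoding twice equals encoding once.
    public static bool IsIdempotentOn(string w) => HtmlEncode(HtmlEncode(w)) == HtmlEncode(w);
}
```

`HtmlEncode(HtmlEncode("<"))` yields `&amp;lt;` rather than `&lt;`. Not re-encoding existing entities is what PHP's `double_encode` option controls.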

  22. Motivation • Writing string sanitizers correctly is difficult • There is no cheap way to identify problems with sanitizers • ‘Correctness’ is a moving target • What if we could say more about sanitizer behavior?

  23. Contributions • Bek • Frontend: a small language for string manipulation, similar to how sanitizers are written today • Backend: a model based on symbolic finite transducers, with algorithms for analysis and code generation

  24. Contributions • Bek • Frontend: a small language for string manipulation, similar to how sanitizers are written today • Backend: a model based on symbolic finite transducers, with algorithms for analysis and code generation • Evaluation • Converted sanitizers from a variety of sources • Checked properties like reversibility, idempotence, equivalence, and commutativity
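Reversibility, the first property listed, asks whether a decoder undoes the sanitizer: decode(encode(w)) = w for every w. A minimal sketch, assuming a hypothetical encoder/decoder pair for the five characters & < > " ' (neither is a real library API). The order of the `Replace` calls matters, and the round trip only holds in one direction.

```csharp
using System.Text;

public static class ReversibilityCheck
{
    // Hypothetical minimal HTML encoder for & < > " '.
    public static string HtmlEncode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            sb.Append(c switch
            {
                '&' => "&amp;",
                '<' => "&lt;",
                '>' => "&gt;",
                '"' => "&quot;",
                '\'' => "&#39;",
                _ => c.ToString(),
            });
        return sb.ToString();
    }

    // Hypothetical decoder for exactly those five entities. "&amp;" is replaced last,
    // so an encoded "&amp;lt;" decodes to "&lt;" instead of "<".
    public static string HtmlDecode(string s) =>
        s.Replace("&lt;", "<").Replace("&gt;", ">").Replace("&quot;", "\"")
         .Replace("&#39;", "'").Replace("&amp;", "&");

    // Reversibility on a single input.
    public static bool RoundTrips(string w) => HtmlDecode(HtmlEncode(w)) == w;
}
```

`RoundTrips` holds for inputs such as `<a href='x'>&amp;</a>`, but the opposite composition is not the identity: `HtmlEncode(HtmlDecode("<"))` is `&lt;`.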

  26. Bek: Architecture. Bek Program: s := iter(c in t)[b := false;] { case (!b && c in "[\"\\]"): b := false; yield('\\', c); case (c == '\\'): b := !b; yield(c); case (true): b := false; yield(c); };

  27. Bek: Architecture. Bek Program (as on slide 26) → Transformation → Symbolic Finite Transducers (Microsoft.Automata, Z3).

  28. Bek: Architecture. Bek Program (as on slide 26) → Transformation → Symbolic Finite Transducers (Microsoft.Automata, Z3) → Analysis: Does it do the right thing? Counterexample: “\' vs. \\'”.

  29. Bek: Architecture. Bek Program (as on slide 26) → Transformation → Symbolic Finite Transducers (Microsoft.Automata, Z3) → Analysis: Does it do the right thing? Counterexample: “\' vs. \\'”. Code Gen → C#, JavaScript, C.

  31. A Bek Program: Escape Quotes

        t := iter(c in s)[b := false;] {
          case (!b && c in "['\"]"): b := false; yield('\\', c);
          case (c == '\\'): b := !b; yield(c);
          case (true): b := false; yield(c);
        };

  32. A Bek Program: Escape Quotes (same program), annotated: iter(c in s) means "iterate over the characters in string s".

  33. A Bek Program: Escape Quotes (same program), annotated: iter(c in s) means "iterate over the characters in string s", and [b := false;] means "while updating one boolean variable b".
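For comparison with how sanitizers are usually written, here is a hand translation of the escape-quotes program into C#. It is a sketch that assumes Bek's cases are tried in order, with the first matching case winning; it is not Bek's generated code, and `EscapeQuotesProgram` is a hypothetical name. The flag `b` is true exactly when the characters read so far end in an odd-length run of backslashes, so a quote that is already escaped is left alone.

```csharp
using System.Text;

public static class EscapeQuotesProgram
{
    // Hand translation of:
    //   t := iter(c in s)[b := false;] {
    //     case (!b && c in "['\"]"): b := false; yield('\\', c);
    //     case (c == '\\'): b := !b; yield(c);
    //     case (true): b := false; yield(c);
    //   };
    public static string EscapeQuotes(string s)
    {
        var t = new StringBuilder(s.Length);
        bool b = false; // true when the next character is escaped by a preceding backslash
        foreach (char c in s)
        {
            if (!b && (c == '\'' || c == '"'))
            {
                b = false;
                t.Append('\\').Append(c); // yield('\\', c)
            }
            else if (c == '\\')
            {
                b = !b;                    // each backslash toggles the escape state
                t.Append(c);
            }
            else
            {
                b = false;
                t.Append(c);
            }
        }
        return t.ToString();
    }
}
```

`EscapeQuotes("a'b")` returns `a\'b`, while the already escaped `\'` passes through unchanged and `\\'` becomes `\\\'`.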

  34. Bek: Architecture (same diagram as slide 29).

  35. A Symbolic Finite Transducer

  36. A Symbolic Finite Transducer (callout: symbolic predicates)

  37. A Symbolic Finite Transducer (callouts: symbolic predicates, output lists)
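The escape-quotes program can also be written as a symbolic finite transducer with two states, one per value of b. The sketch below models the guards (symbolic predicates) and output lists as C# delegates and runs the SFT on concrete strings. That representation is an assumption made for illustration: per the architecture slides, Bek builds on Microsoft.Automata and Z3, so its predicates and outputs can be analyzed, not just executed. `SftRule`, `Sft`, and `EscapeQuotes` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// One SFT move: in state From, if Guard(c) holds, emit Outputs (each a function of c) and go to To.
public sealed class SftRule
{
    public int From { get; }
    public Func<char, bool> Guard { get; }      // symbolic predicate over the input character
    public Func<char, char>[] Outputs { get; }  // output list: terms over the input character
    public int To { get; }

    public SftRule(int from, Func<char, bool> guard, Func<char, char>[] outputs, int to)
    {
        From = from; Guard = guard; Outputs = outputs; To = to;
    }
}

public sealed class Sft
{
    private readonly int initial;
    private readonly List<SftRule> rules;

    public Sft(int initial, IEnumerable<SftRule> rules)
    {
        this.initial = initial;
        this.rules = new List<SftRule>(rules);
    }

    // Runs the SFT on a concrete string; assumes the guards leaving each state are disjoint.
    public string Run(string input)
    {
        int state = initial;
        var output = new StringBuilder();
        foreach (char c in input)
        {
            SftRule rule = rules.Find(r => r.From == state && r.Guard(c));
            if (rule == null)
                throw new InvalidOperationException($"no rule from state {state} on '{c}'");
            foreach (Func<char, char> term in rule.Outputs) output.Append(term(c));
            state = rule.To;
        }
        return output.ToString();
    }

    // Escape quotes as a two-state SFT: state 0 is b == false, state 1 is b == true.
    public static Sft EscapeQuotes()
    {
        Func<char, bool> isQuote = c => c == '\'' || c == '"';
        Func<char, bool> isBackslash = c => c == '\\';
        Func<char, char> id = c => c;
        Func<char, char> backslash = c => '\\';
        return new Sft(0, new[]
        {
            new SftRule(0, isQuote, new[] { backslash, id }, 0),
            new SftRule(0, isBackslash, new[] { id }, 1),
            new SftRule(0, c => !isQuote(c) && !isBackslash(c), new[] { id }, 0),
            new SftRule(1, isBackslash, new[] { id }, 0),
            new SftRule(1, c => !isBackslash(c), new[] { id }, 0),
        });
    }
}
```

`Sft.EscapeQuotes().Run("a'b")` returns `a\'b`, the same result as the program on slide 31. Keeping the guards disjoint makes the transducer deterministic, so the order in which rules are searched does not matter.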

  38. Bek: Architecture (same diagram as slide 29).

  39. Bek: Architecture (same diagram as slide 29). Now what?

  40. SFT Algorithms: Equivalence Checking

  41. SFT Algorithms: Equivalence Checking, e.g. AntiXSS.HtmlEncode vs. WebUtility.HtmlEncode

  42. SFT Algorithms: Join Composition (figure: SFT A and SFT B, each with an input and an output, combined into a single composed SFT)

  43. SFT Algorithms: Join Composition, e.g. JavaScriptEncode(HtmlEncode(w)) vs. HtmlEncode(JavaScriptEncode(w))
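Composition order matters. A minimal sketch with hypothetical `HtmlEncode` and `JavaScriptEncode` functions (simplified stand-ins, not the AntiXSS implementations) shows that the two orders on the slide can disagree on a single character; Bek would compare the composed SFTs symbolically instead of testing individual inputs.

```csharp
using System.Text;

public static class CommutativityCheck
{
    // Hypothetical HTML encoder for & < > " '.
    public static string HtmlEncode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            sb.Append(c switch
            {
                '&' => "&amp;",
                '<' => "&lt;",
                '>' => "&gt;",
                '"' => "&quot;",
                '\'' => "&#39;",
                _ => c.ToString(),
            });
        return sb.ToString();
    }

    // Hypothetical JavaScript string-literal encoder: prefixes \ ' and " with a backslash.
    public static string JavaScriptEncode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c == '\\' || c == '\'' || c == '"') sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Do the two composition orders agree on w?
    public static bool CommuteOn(string w) =>
        JavaScriptEncode(HtmlEncode(w)) == HtmlEncode(JavaScriptEncode(w));
}
```

On the input `'`, HTML-then-JavaScript encoding gives `&#39;`, while JavaScript-then-HTML encoding gives `\&#39;`.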

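  44. SFT Algorithms: Pre-Image Computation (figure: SFT A maps an input regular language to an output regular language S)

  45. SFT Algorithms: Pre-Image Computation (figure: given SFT A and the output language S, which regular language of inputs maps into S?)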
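The pre-image of a regular language S under an SFT A is the set of inputs that A maps into S; the slide frames the result as another regular language, computed symbolically. The sketch below is only a bounded under-approximation by enumeration, with S given as a .NET `Regex` over the output and the sanitizer as a plain function. `PreImageSketch` and `BoundedPreImage` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PreImageSketch
{
    // Returns every input over `alphabet`, up to length `maxLen`, whose image under
    // `sanitizer` matches `badOutput`: a bounded subset of the true pre-image.
    public static List<string> BoundedPreImage(
        Func<string, string> sanitizer, Regex badOutput, char[] alphabet, int maxLen)
    {
        var hits = new List<string>();
        var frontier = new List<string> { "" };
        for (int len = 0; len <= maxLen; len++)
        {
            foreach (string w in frontier)
                if (badOutput.IsMatch(sanitizer(w))) hits.Add(w);
            if (len == maxLen) break;

            // Extend every string of the current length by one character.
            var next = new List<string>();
            foreach (string w in frontier)
                foreach (char c in alphabet)
                    next.Add(w + c);
            frontier = next;
        }
        return hits;
    }
}
```

With a sanitizer that only encodes `<`, S = outputs containing `'` or `<`, the alphabet {a, <, '} and length 1, the only input found is `'`.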

  46. Contributions (same as slide 24)

  47. Some Questions • What features are needed to port existing sanitizers? • Can we check interesting properties on real sanitizers? • Will HtmlEnc implementations protect against XSS Cheat Sheet samples?
