Automatically Hardening Web Applications Using Precise Tainting

Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, David Evans • University of Virginia

phpBB Worm • December 21, 2004 • Over 40,000 sites defaced • PHP injection • Loads Perl scripts to spread itself • Uses Google to search for other phpBB sites

phpBB Vulnerability • Vulnerable code: $words = explode (' ', trim (htmlspecialchars (urldecode ($HTTP_GET_VARS ['highlight'])))); ... $highlight_match[] = ... $words[$i] ...; ... preg_replace (... $highlight_match ...) • Original user input: '_%2527_attack • User input after HTTP_GET_VARS call: \'_%27_attack • User input after explicit urldecode call: \'_'_attack
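The double decoding on this slide can be reproduced in a few lines. This is a Python sketch rather than PHP: `add_slashes` is a hypothetical stand-in for PHP's automatic quote escaping, and `unquote` plays the role of both the request-time decoding and the explicit `urldecode` call.

```python
from urllib.parse import unquote

def add_slashes(s: str) -> str:
    """Hypothetical stand-in for PHP magic quotes: backslash-escape quotes."""
    return s.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')

raw = "'_%2527_attack"

# First decode happens when the request is parsed; quotes are then escaped.
after_get = add_slashes(unquote(raw))

# The script's explicit urldecode() runs AFTER escaping, so the quote that
# was hidden as %27 reappears unescaped.
after_urldecode = unquote(after_get)

print(after_get)        # \'_%27_attack
print(after_urldecode)  # \'_'_attack
```

The first quote is escaped because it was visible when escaping ran. The second was still encoded as `%27` at that point, so it slips through.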

Classes of Attacks • Code injection • Causes user-provided data to be executed while it is being processed • PHP injection (phpBB worm) • SQL injection • Output generation • Causes user-provided data to be displayed to visitors of the website • Cross Site Scripting

SQL Injection • Attacker constructs data that injects database commands • Example: $res = executeQuery ("SELECT real_name FROM users WHERE user = '" . $user . "'AND pwd = '" . $pwd . "' ");
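To make the example concrete, here is a Python sketch of the same string concatenation. `build_query` is a hypothetical helper that mirrors the PHP expression on the slide, and the user name `alice` and password `secret` are made up for illustration.

```python
def build_query(user: str, pwd: str) -> str:
    """Mirror the slide's concatenation; no escaping is applied."""
    return ("SELECT real_name FROM users WHERE user = '" + user
            + "'AND pwd = '" + pwd + "' ")

# Ordinary input stays inside the two string literals.
benign = build_query("alice", "secret")

# Crafted input closes the first literal and appends its own condition;
# the text after "--" is a SQL comment, so the password check is dropped.
attack = build_query("' OR 1 = 1; -- ';", "")

print(benign)
print(attack)
```

The attack string never needs a valid password. The injected `OR 1 = 1` makes the WHERE clause true for every row.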

Cross Site Scripting • Inserts user-provided data, which may include JavaScript, into a web page • Injected script executes with the permissions of the hosting website • Simple example: Hello

Importance • Over 12% of Secunia Advisories • 4 of last 10 advisories from FrSIRT • Cross Site Scripting and Code Injection are responsible for many attacks on the internet • It is very hard to write bug-free code

Previous Approaches • Static techniques • Dynamic techniques before deployment • Dynamic techniques during deployment

Static • Static analyzers [Shankar+ 01] • Code inspections [Fagan76] • SQL prepared statements [Fisk04, Php05] • Pros • No runtime overhead • Can be done before website is released to the public • Cons • Coding practices may need to change • Inspections are only as good as the inspector • Many false positives

Dynamic Before Deployment • Automated Test Suites: [Huang+ 04], [Tenable05], [Kavado05], [Offutt+ 04], [Watchfire05], [SPI05] • Human testing • Pros • Coding practices do not need to change • Attempts to simulate real world attacking conditions • Cons • Only tests known attacks, cannot show absence of vulnerability • Requires developer effort to fix security holes

Automated Dynamic: Firewalls • Incoming [Scott, Sharp 02] • Incoming and Outgoing [Watchfire04], [Kavado05], [Teros04] • Pros • No need to modify web service • Cons • Only prevent recognized attacks • Coarse policies without knowing application semantics

Automated: Magic Quotes • Escape all quotes supplied by a user • Implemented in PHP and other scripting languages • Extremely successful • Do not require the programmer to do anything • Prevent many SQL injection attacks • But, prevent only a specific class of attacks
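The last bullet can be illustrated with a small Python model. `magic_quotes` is a hypothetical stand-in for the PHP feature, and the numeric `id` column is an assumption added only for this example.

```python
def magic_quotes(value: str) -> str:
    """Hypothetical model of magic quotes: backslash-escape quote characters."""
    return value.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')

# Inside a quoted literal, the attacker's quote is escaped and stays data
# (for databases that honor backslash escapes).
quoted_ctx = ("SELECT real_name FROM users WHERE user = '"
              + magic_quotes("' OR 1 = 1; -- ") + "'")

# In a numeric context the attack contains no quotes, so escaping
# changes nothing and the injected condition survives.
numeric_ctx = ("SELECT real_name FROM users WHERE id = "
               + magic_quotes("0 OR 1 = 1"))

print(quoted_ctx)
print(numeric_ctx)
```

Escaping quotes helps only where the attack relies on quotes, which is why a check that knows how the input is used can cover more cases.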

Previous Work Limitations • Being precise about what constitutes an attack is a lot of work • Automated techniques suffer from not exploiting the application semantics • We want a system that works as effortlessly as magic quotes, but prevents a wider class of attacks

Our Approach • Fully automated • Aware of application semantics • Replace PHP interpreter with a modified interpreter that: • Keeps track of which information comes from untrusted sources (precise tainting) • Checks how untrusted input is used

[Figure: PHPrevent architecture. Components: Client, Web Server (HTTP Server, PHPrevent PHP Interpreter), file.php, File System, Database, System APIs. Numbered steps 1 to 8 trace a request through the system.]

Coarse Grain Tainting • Provided by many scripting languages (Perl, Ruby) • Untrusted input is tainted • Everything touched by tainted data becomes tainted • Example: $query = "SELECT real_name FROM users WHERE user = '" . $user . "'AND pwd = '" . $pwd . "' "; • The entire $query string is tainted

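Precise Tainting • Untrusted input is tainted • Taint markings are maintained at character level • Depends on semantics of program • Only characters that actually derive from untrusted input are tainted • Example: $query = "SELECT real_name FROM users WHERE user = '" . $user . "'AND pwd = '" . $pwd . "' "; • With $user set to ' OR 1 = 1; -- '; and an empty $pwd, the query becomes: $query = "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';'AND pwd = '' "; • Only the characters contributed by $user are marked tainted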
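Character-level taint tracking can be sketched as a string that carries one flag per character. This is a Python model under stated assumptions, not the modified interpreter: `TaintedStr`, `untrusted`, and `marks` are hypothetical names, and only concatenation is modeled.

```python
class TaintedStr:
    """A string with one taint flag per character (hypothetical helper)."""

    def __init__(self, text, taint=None):
        self.text = text
        # Literals written by the programmer default to untainted.
        self.taint = list(taint) if taint is not None else [False] * len(text)
        assert len(self.taint) == len(self.text)

    @classmethod
    def untrusted(cls, text):
        """Wrap external input: every character starts out tainted."""
        return cls(text, [True] * len(text))

    def __add__(self, other):
        if isinstance(other, str):
            other = TaintedStr(other)
        # Concatenation keeps each character's own marking.
        return TaintedStr(self.text + other.text, self.taint + other.taint)

    def marks(self):
        """Render a line with '^' under each tainted character."""
        return "".join("^" if t else " " for t in self.taint)


user = TaintedStr.untrusted("' OR 1 = 1; -- ';")
pwd = TaintedStr.untrusted("")
query = (TaintedStr("SELECT real_name FROM users WHERE user = '") + user
         + "'AND pwd = '" + pwd + "' ")

print(query.text)
print(query.marks())
```

A coarse-grained scheme would reduce this to `any(query.taint)`, which is true for the whole string. The per-character flags record exactly which span the attacker controls.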

Precise Checking • Wrappers around PHP functions that handle updating and checking precise taint information • Conservative: no false negatives while minimizing false positives • Behavior only changes when an attack is likely

Preventing SQL Injection • Parse the query using the Postgres SQL parser to identify interpreted text • Disallow SQL keywords or delimiters in interpreted text that is tainted • If the check fails, the query is not sent to the database and an error response is returned • Example of a rejected query: "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';'AND pwd = '' ";
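The check described above can be approximated with a hand-written scanner. This Python sketch does not use the Postgres parser: the scanner, the `KEYWORDS` set, and the helper names `concat` and `is_injection` are simplifying assumptions made for illustration.

```python
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UNION",
            "INSERT", "UPDATE", "DELETE", "DROP"}  # illustrative subset

def concat(*parts):
    """Build (text, taint) from (string, is_untrusted) pairs."""
    text = "".join(s for s, _ in parts)
    taint = [t for s, t in parts for _ in s]
    return text, taint

def is_injection(text, taint):
    """True if a tainted character acts as a SQL keyword or delimiter."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "'":                        # opening quote of a literal
            if taint[i]:
                return True
            i += 1
            while i < n:                     # skip the literal's contents
                if text[i] == "'":
                    if i + 1 < n and text[i + 1] == "'":
                        i += 2               # doubled quote is an escaped quote
                        continue
                    break
                i += 1
            if i < n and taint[i]:           # closing quote came from input
                return True
            i += 1
        elif text.startswith("--", i):       # comment delimiter
            if taint[i] or taint[i + 1]:
                return True
            nl = text.find("\n", i)
            i = n if nl == -1 else nl + 1
        elif ch in ";()":
            if taint[i]:
                return True
            i += 1
        elif ch.isalpha() or ch == "_":      # bare word outside a literal
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            if text[i:j].upper() in KEYWORDS and any(taint[i:j]):
                return True
            i = j
        else:
            i += 1
    return False

PREFIX = "SELECT real_name FROM users WHERE user = '"
attack = concat((PREFIX, False), ("' OR 1 = 1; -- ';", True),
                ("'AND pwd = '", False), ("", True), ("' ", False))
benign = concat((PREFIX, False), ("alice", True),
                ("'AND pwd = '", False), ("secret", True), ("' ", False))

print(is_injection(*attack))
print(is_injection(*benign))
```

Tainted text that stays inside a string literal is accepted, even if it spells a keyword. The attack is rejected at the first tainted quote that closes a literal.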

Preventing PHP Injection • Disallow the use of tainted data in functions that treat input strings as PHP code or manipulate system state • We place wrappers around these functions to enforce this rule • phpBB attack prevented by wrappers around preg_replace
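The wrapper idea can be sketched as a decorator that refuses tainted arguments. This is a Python model, not PHPrevent code: `reject_tainted`, `TaintError`, and `evaluate` are hypothetical names, and `evaluate` stands in for any function that interprets a string as code.

```python
import functools

class TaintError(Exception):
    """Raised when tainted data reaches a code-interpreting function."""

def reject_tainted(func):
    """Wrap func so it runs only when no character of its input is tainted."""
    @functools.wraps(func)
    def wrapper(code, taint):
        if any(taint):
            raise TaintError(f"tainted input passed to {func.__name__}")
        return func(code)
    return wrapper

@reject_tainted
def evaluate(code):
    """Stand-in for a function that treats its argument as code."""
    return eval(code, {"__builtins__": {}})

# Code written by the programmer runs normally.
print(evaluate("1 + 2", [False] * 5))

# The same call with attacker-influenced characters is refused.
try:
    evaluate("1 + 2", [False, False, False, False, True])
except TaintError as err:
    print("blocked:", err)
```

The wrapped function itself is unchanged, so behavior differs only when tainted data would have been interpreted as code.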

Preventing Cross Site Scripting • Wrappers around output functions • Buffer output and then parse the tainted output with HTML Tidy • Check the parsed HTML against a whitelist to ensure there is no dangerous output • Dangerous content was determined by examining the HTML grammar • Sanitize it by removing tags • Examples: Hello (safe), Hello (unsafe)
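Whitelist-based sanitizing can be sketched with Python's standard `html.parser` in place of HTML Tidy. The `ALLOWED_TAGS` set, the `TagFilter` class, and the choice to drop all attributes are assumptions made for this illustration, not the whitelist used by the authors.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}  # hypothetical whitelist

class TagFilter(HTMLParser):
    """Re-emit parsed HTML, keeping only whitelisted tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")      # attributes are dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))        # text is kept, but escaped

def sanitize(tainted_html: str) -> str:
    """Remove every tag that is not on the whitelist."""
    parser = TagFilter()
    parser.feed(tainted_html)
    parser.close()
    return "".join(parser.out)

print(sanitize("<b>Hello</b>"))
print(sanitize("<script>alert(1)</script>Hello"))
```

Only the tainted portion of the page would be passed to `sanitize`. Markup written by the application itself is trusted and left alone.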

Current Status • Modified PHP interpreter: PHPrevent • Prevents PHP injection, SQL injection and cross site scripting attacks • Overly conservative: we have not specified precise semantics for most PHP functions • Performance • Initial measurements indicate performance overhead is acceptable

Future Work: Theory and Analysis • End-to-end information flow security • Replace ad-hoc taint marking with principled mechanism • Analyze data flow at interpreter level • Infer taint specifications for PHP functions using dynamic analysis • Verify that taint marking in PHP specification is consistent with interpreter implementation

Future Work: Implementation • Full implementation of precise tainting for PHP APIs • Handle persistent state • Track tainting through database store • Multiple tainting types with different checking rules • Incorporate modifications into main PHP distribution

Summary • Many websites are prone to attacks even after using current methods • Our method: • Fully automated • Prevents large classes of attacks • Easy to deploy

Thank You www.cs.virginia.edu/sammyg
