Taint Tracking Through UTF Extension

by Bože Zekan
supervised by Dr. Mark Shtern and Dr. Vassilios Tzerpos
Computer Science and Engineering Faculty, York University
funded by an NSERC USRA grant

Topics To Be Covered
• Some threats from user input
• Taint tracking
• Previous work
• Our work
  - Unicode
  - Implementations
  - Results

The Problem We Are Addressing
• It is estimated that more than 80% of web services contain security vulnerabilities [1]
• Many of these (50 to 82%) are user command injection vulnerabilities [1]
[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java. In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA. ACM 978-1-60558-789-9/09/11

Our Goal
Reduce the security vulnerabilities that may occur when dealing with user input
User input:
- input from an actual physical person
- input from another program, file, database, etc.
OR
- any data that is not a literal constant in our program and has not been generated by manipulating literal constants in our program

Some User Command Injection Threats:
• SQL injection
• Cross-site scripting (XSS)
• Path traversal
• Shell injection attacks, HTTP response splitting, ...

SQL Injection
query = "SELECT * FROM students WHERE name = '" + studentName + "'";
With studentName set to bobby, the query becomes:
SELECT * FROM students WHERE name = 'bobby'

SQL Injection
[Image: "Exploits of a Mom" webcomic, from http://xkcd.com/327/]

SQL Injection
query = "SELECT * FROM students WHERE name = '" + studentName + "'";
With studentName set to bobby'; DROP TABLE students; -- the query becomes:
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
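
The concatenation on this slide can be reproduced with a minimal Java sketch. The class and method names are hypothetical and only illustrate how the input ends up as part of the SQL text; they are not taken from the authors' application.

```java
// Hypothetical demo: builds the query exactly as on the slide, by concatenation.
final class SqlConcatDemo {
    static String buildQuery(String studentName) {
        // The user input is spliced straight into the SQL text, with no escaping.
        return "SELECT * FROM students WHERE name = '" + studentName + "'";
    }
}
```

Calling buildQuery with the second input yields two statements and a trailing comment, because the quote supplied by the user closes the string literal early.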

Cross-Site Scripting (XSS)
html = "" + name + " " + when + " " + comment + "";
Rendered comment:
Anonymous, 0 Hours Ago
Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!

Cross-Site Scripting (XSS)
html = "" + name + " " + when + " " + comment + "";
Rendered comment:
Anonymous, 0 Hours Ago
<script>window.location = "http://www.mybadsite.com/"</script>

Path Traversal
filename = "/srv/www/users/bobby/" + filename;
filename: /srv/www/users/bobby/myhomework1.doc

Path Traversal
filename = "/srv/www/users/bobby/" + filename;
filename: /srv/www/users/bobby/../cse3000/tentativetestquestions.doc
which resolves to: /srv/www/users/cse3000/tentativetestquestions.doc
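
How the ".." segment escapes the user's directory can be shown with the standard java.nio.file API. This is only an illustrative sketch: the class name, the USER_DIR constant, and the containment check are assumptions, not part of the presented framework.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo of the path arithmetic on the slide.
final class PathTraversalDemo {
    static final Path USER_DIR = Paths.get("/srv/www/users/bobby");

    // Resolves a user-supplied name against the user's directory and collapses ".." segments.
    static Path resolve(String filename) {
        return USER_DIR.resolve(filename).normalize();
    }

    // True only if the resolved path is still inside the user's directory.
    static boolean staysInsideUserDir(String filename) {
        return resolve(filename).startsWith(USER_DIR);
    }
}
```

Here resolve("../cse3000/tentativetestquestions.doc") lands in a sibling directory, so staysInsideUserDir returns false for it.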

To Prevent the Propagation of Malicious Data
Possible solution #1: Carefully parse/sanitize/analyze all data being sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
Anonymous, 0 Hours Ago: Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!
Anonymous, 0 Hours Ago: <script>window.location = "http://www.mybadsite.com/"</script>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests

To Prevent the Propagation of Malicious Data
Possible solution #2: Carefully parse/sanitize/analyze all user supplied data being sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
Anonymous, 0 Hours Ago: Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!
Anonymous, 0 Hours Ago: <script>window.location = "http://www.mybadsite.com/"</script>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests

Taint Tracking Makes Solution #2 Possible
• Taint tracking consists of three main steps:
1. Identifying untrusted input at the point where it enters the program and marking it as untrusted (i.e., tainted).
2. Propagating the taint information: at each subsequent computation, mark as tainted all data that is derived from an untrusted source.
3. Checking all data going into sensitive data sinks (e.g., a database, an output response, or a file), using the taint information to identify potential attacks.

Taint Tracking
• Taint tracking comes in two possible flavours:
• String level
  - mark the entire string as tainted
• Character level
  - mark individual characters as tainted
  - allows for finer granularity

How Can Character Level Tainting Be Achieved?
One method, by Chin and Wagner of UC Berkeley [1]: expand the structure of the Java String class to include a boolean array that stores the taint status of each character in the string.
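
The boolean-array idea can be sketched as a standalone wrapper class. This is not Chin and Wagner's implementation (they modify java.lang.String itself); the class ShadowTaintString and its methods are hypothetical and only show how a per-character taint array travels with the text through concatenation.

```java
import java.util.Arrays;

// Hypothetical sketch: a string paired with one taint flag per character.
final class ShadowTaintString {
    private final String text;
    private final boolean[] tainted; // tainted[i] is true if text.charAt(i) came from untrusted input

    private ShadowTaintString(String text, boolean[] tainted) {
        this.text = text;
        this.tainted = tainted;
    }

    // A literal written by the application: no character is tainted.
    static ShadowTaintString trusted(String s) {
        return new ShadowTaintString(s, new boolean[s.length()]);
    }

    // Untrusted input: every character is tainted.
    static ShadowTaintString untrusted(String s) {
        boolean[] flags = new boolean[s.length()];
        Arrays.fill(flags, true);
        return new ShadowTaintString(s, flags);
    }

    // Concatenation propagates taint by concatenating the flag arrays as well.
    ShadowTaintString concat(ShadowTaintString other) {
        boolean[] merged = Arrays.copyOf(tainted, tainted.length + other.tainted.length);
        System.arraycopy(other.tainted, 0, merged, tainted.length, other.tainted.length);
        return new ShadowTaintString(text + other.text, merged);
    }

    boolean isTainted(int index) {
        return tainted[index];
    }

    String text() {
        return text;
    }
}
```

The extra array is the memory cost of this style of tracking, and the flags are lost as soon as a single char is pulled out of the string.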

The Chin and Wagner Method
Their achievement: implementing a solution that minimizes the need to rewrite existing application code while transparently decreasing the vulnerability of applications to threats
Their shortcomings:
• Specific to Java
• Increases the memory required to store a string in Java
• The taint status of the Java char primitive cannot be determined
• Not readily adapted to other programming languages
• Their taint information cannot propagate onwards to a database, or to an application, script, or procedure running in another programming language

How Can Character Level Tainting Be Achieved?
Our method: expand Unicode to include tainted characters
Our achievements:
• Implements a solution that minimizes the need to rewrite existing application source code while transparently decreasing the vulnerability of applications to threats
• Is not specific to Java
• Does not increase the memory required to store a string in Java
• The taint status of the Java char primitive can be determined
• Is readily adapted to other programming languages
• The taint information can propagate onwards to a database, or to an application, script, or procedure running in another programming language

What is Unicode?
• A scheme that assigns a codepoint to each character in current use throughout the world
• Has been implemented in XML, Java, Microsoft .NET, web browsers, databases, and modern operating systems

Unicode
• Can accommodate 1,114,112 codepoints in 17 “planes” of 65,536 codepoints each
• Most of the codespace is still unassigned
• Mechanisms (e.g., UTF-8, UTF-16, ...) already exist that allow software to manipulate and store all these codepoints, even if no characters have been assigned to them

Our Design, Part 1: Tainting & Propagating Taint
• We create a “tainted” character for every character and assign it an unused codepoint
Ex.
  Untainted                            →  Tainted
  A (ASCII: 41 hex, Unicode: U+0041)   →  A (Unicode: U+E041)
  z (ASCII: 7A hex, Unicode: U+007A)   →  z (Unicode: U+E07A)
• Now wherever a character’s codepoint goes, its tainted or untainted status goes with it

Tainting Algorithms
• To taint a user input character x:
  codepoint(tainted x) = codepoint(x) + OFFSET
• To check whether character x is tainted:
  if (codepoint(x) is in tainted codepoint range)
    character x is tainted // is user supplied
  else
    character x is untainted
• To remove taint from tainted character x:
  codepoint(x) = codepoint(tainted x) - OFFSET
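
The three operations above can be sketched in Java as follows. The class TaintCodec is hypothetical. OFFSET = 0xE000 is an assumption inferred from the A example (U+0041 to U+E041), and restricting tainted twins to ASCII is a simplification made only for this sketch.

```java
// Hypothetical helper, not the authors' code.
// Assumptions: OFFSET is 0xE000 and only ASCII characters have tainted twins.
final class TaintCodec {
    static final int OFFSET = 0xE000;
    static final int MAX_PLAIN = 0x7F; // highest untainted codepoint that gets a tainted twin here

    // True if c lies in the tainted range U+E000..U+E07F.
    static boolean isTainted(char c) {
        return c >= OFFSET && c <= OFFSET + MAX_PLAIN;
    }

    // codepoint(tainted x) = codepoint(x) + OFFSET; all other characters pass through unchanged.
    static char taint(char c) {
        return c <= MAX_PLAIN ? (char) (c + OFFSET) : c;
    }

    // codepoint(x) = codepoint(tainted x) - OFFSET; untainted characters pass through unchanged.
    static char untaint(char c) {
        return isTainted(c) ? (char) (c - OFFSET) : c;
    }

    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(taint(s.charAt(i)));
        }
        return out.toString();
    }

    static String untaint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(untaint(s.charAt(i)));
        }
        return out.toString();
    }
}
```

Because the tainted range sits inside the Basic Multilingual Plane, a tainted character still occupies a single Java char, so the string length is unchanged by tainting.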

Our Design, Part 2: The Transparent Protection Framework
Consider a typical vulnerable web application:
[Figure: diagram of a typical vulnerable web application]

Designing The Added Transparent Protection Framework
Consider a less vulnerable web application:
• User’s OS has fonts which incorporate tainted characters
• Request Intercept Wrapper uses custom taint aware classes/functions and is generic for a given technology
• Application is on a server with taint awareness built into its library functions
• Database Driver Intercept Wrapper uses custom taint aware classes/functions specific to the database to check for SQL injection and drop malicious queries

Implementation Details: The Font
For a final, universally adopted application:
• System fonts would be expanded to include tainted characters, which would look identical to their untainted counterparts
  Ex. untainted ABCDE ... vs tainted ABCDE ...
For our proof of concept:
• Tainted and untainted characters appear different, so that they are easy to distinguish on computer screens and in documents
  Ex. untainted ABCDE ... vs tainted ... [tainted glyphs not reproduced]

Implementation Details: The Font
• We used Type-Light freeware to modify the Windows Courier New font
  - installed it by dragging the original ttf file out of the Fonts directory and dragging in our new ttf file

Implementation Details: The Application
• Has no knowledge of taint
• Counts the number of visits by this user
• 1st query to the db checks whether the user’s name is in the db
  - If no, insert the name into the db and set the visit count to 1
  - If yes, increment the visit count by 1 in the db
• 2nd query to the db outputs the number of visits for the user’s name from the db’s record

Implementation Details: The Transparent Protection Framework
We implemented our framework on our typical web application in four different technologies:
1. PHP/MySQL on Apache (under Windows XP)
2. PHP/DB2 on Apache (under Linux)
3. Java Servlet/DB2 on Tomcat7 (under Linux)
4. PHP on Apache (under Linux) calling Java Servlet/DB2 on Tomcat7 (under Linux)
To do this we set the UTF-8 or Unicode encoding option everywhere it was available, and selected Courier New as the font wherever possible.

Implementation Details: The Form Page

Implementation Details: The Request Intercept Wrapper
• Two versions were used:
  1. PHP version, which uses cURL to interact with the application
  2. Java Servlet version, which uses a connection to interact with the application
• Both versions handle both POST and GET requests
• Browser only sees the wrapper's URL, never the application page's URL
• Both will work with any form, no matter the combination of controls
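
The core job of a request intercept wrapper, tainting every submitted value before the application sees it, might look like the sketch below. It works on a plain parameter map (the shape a servlet's getParameterMap() returns) rather than on a real request object. The class RequestTainter, the 0xE000 offset, and the choice to taint values but not parameter names are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the tainting step only; forwarding the request is not shown.
final class RequestTainter {
    static final int OFFSET = 0xE000; // assumed offset, tainted ASCII at U+E000..U+E07F

    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c <= 0x7F ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // Returns a copy of the parameter map in which every value is tainted.
    static Map<String, String[]> taintParameters(Map<String, String[]> params) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            String[] taintedValues = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                taintedValues[i] = taint(values[i]);
            }
            result.put(entry.getKey(), taintedValues);
        }
        return result;
    }
}
```

Since the wrapper never inspects which controls a form contains, the same loop applies to any form, which matches the last bullet above.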

Implementation Details: PHP Application & Db Driver Intercept
• Four applications exist
  - essentially the same code with minor variations
• Two Database Driver Intercept Wrappers exist
  - essentially the same code with minor variations
  - they are PHP include files
  - each file has taint aware functions that wrap the query and fetch array functions of their respective databases

Implementation Results: PHP Application & Db Driver Intercept
• Was not totally transparent
  - the application needed modification to specify the include files and rename two functions
• But we did successfully:
  - propagate taint from user input all the way back to the user output
  - transparently detect and stop SQL injection
  - show that our method works on different databases and different operating systems
  - produce an easy to implement solution to increase the security of legacy programs

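Implementation Details: Java Application
• One application, reachable in two ways
• Has modified String & Character classes, so the application does not break when an untainted value is compared with its tainted counterpart, e.g. ("A").equals("\uE041") or ('A').equals('\uE041'), where U+E041 is the tainted A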

Implementation Details: Java DB2 Database Intercept Wrapper
• Is a collection of custom taint aware classes
• The original ibm.db2.jdbc.app.DB2Driver class is wrapped with our taint aware Db2DriverIntercept class
• We then drill down and also wrap the Connection, PreparedStatement, and ResultSet interfaces, and augment their existing methods to provide transparent SQL injection protection
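
The slides do not show how the wrapper decides that a query is malicious, so the following is only a sketch under stated assumptions. It wraps a java.sql.Statement with a dynamic proxy instead of the hand-written wrapper classes described above, assumes tainted ASCII lives at U+E000..U+E07F, and uses an assumed policy: user-supplied characters may appear only inside a string literal opened by the application's own quote, and may not themselves be quotes. StatementIntercept and its methods are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not the authors' Db2DriverIntercept.
final class StatementIntercept {
    static final int OFFSET = 0xE000; // assumed offset
    static final char TAINTED_QUOTE = (char) (OFFSET + '\'');

    static boolean isTainted(char c) {
        return c >= OFFSET && c <= OFFSET + 0x7F;
    }

    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c <= 0x7F ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // Assumed policy: tainted characters must sit inside an application-written literal
    // and must not be quote characters.
    static boolean looksLikeInjection(String sql) {
        boolean insideLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (isTainted(c)) {
                if (!insideLiteral || c == TAINTED_QUOTE) {
                    return true;
                }
            } else if (c == '\'') {
                insideLiteral = !insideLiteral; // only untainted quotes delimit literals
            }
        }
        return false;
    }

    // Wraps a Statement so that every execute*(String sql, ...) call is checked first.
    // Safe queries are forwarded unchanged, so tainted data reaches the database as data.
    static Statement wrap(Statement real) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean takesSql = method.getName().startsWith("execute")
                    && args != null && args.length > 0 && args[0] instanceof String;
            if (takesSql && looksLikeInjection((String) args[0])) {
                throw new SQLException("Query dropped: user input outside a string literal");
            }
            try {
                return method.invoke(real, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception
            }
        };
        return (Statement) Proxy.newProxyInstance(
                StatementIntercept.class.getClassLoader(),
                new Class<?>[] {Statement.class},
                handler);
    }
}
```

This simple policy also rejects legitimate input such as unquoted numbers or names containing an apostrophe; a fuller wrapper would parse the SQL for the specific database, and would cover addBatch and prepared statements as well.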

Implementation Results: Java Application & Db Driver Intercept
• Was not totally transparent
  - the application needs to call our driver instead of IBM’s database driver
• But we additionally showed that our character level taint method could:
  - work on different programming languages (PHP and Java) and paradigms (procedural and OOP)
  - propagate between different languages and different servers
  - be handled transparently by modifying Java’s String and Character class operations

Application Breaks & Work Arounds
• Java: the char is a primitive
  if ('A' == '\uE041') … is as far as we can keep taint information accurate (U+E041 is the tainted A)
  → Thereafter, taint information is lost → no further propagation
  - if allowed to alter source code, replace ('A' == '\uE041') with a taint aware custom method ('A'.equals('\uE041')) to allow taint to propagate even further within an application
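
A taint aware comparison of the kind this work-around calls for could be sketched as below. The class and method names are hypothetical, the 0xE000 offset is an assumption, and static methods are used because a Java char primitive cannot carry an equals method of its own.

```java
// Hypothetical sketch of a taint aware comparison; not the authors' modified classes.
final class TaintAwareEquals {
    static final int OFFSET = 0xE000; // assumed offset, tainted ASCII at U+E000..U+E07F

    static char untaint(char c) {
        return (c >= OFFSET && c <= OFFSET + 0x7F) ? (char) (c - OFFSET) : c;
    }

    // Compares two chars by their untainted value; neither operand is modified.
    static boolean equalsIgnoringTaint(char a, char b) {
        return untaint(a) == untaint(b);
    }

    // Same comparison, character by character, for strings.
    static boolean equalsIgnoringTaint(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (!equalsIgnoringTaint(a.charAt(i), b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With plain ==, 'A' and its tainted twin compare as different, so the application would take the wrong branch; with the helper they compare as equal while both values keep their taint status.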

Application Breaks & Work Arounds
• PHP: strings are considered primitive
  if ("AB" == "\uE041\uE042") … is as far as we can keep taint information accurate (the second operand is the tainted AB)
  → Thereafter, taint information is lost → no further propagation
  - if allowed to alter source code, replace ("AB" == "\uE041\uE042") with a taint aware custom method ("AB".equals("\uE041\uE042")) to allow taint to propagate even further within an application
NB! If our method were to be adopted universally, the above could be overcome by modifying the JVM or PHP engine

Other Possible Uses of Our Character Level Tainting Method
• Tainting and tracking of multiple input sources
  - there are a lot of unassigned codepoints
  - many tainted character sets could be created to indicate different data sources (e.g., keyboard, file, database, remote login, ...)
• Storing tainted characters in log files to make user input immediately recognizable
• Tainted characters can be stored in a database and retrieved by using taint in queries
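
Tracking several input sources could be sketched by giving each source its own offset. The enum values and the offsets below are assumptions chosen only for illustration (three adjacent 128-codepoint blocks for ASCII), not ranges used by the authors.

```java
import java.util.Optional;

// Hypothetical sketch: one tainted character set per data source.
final class MultiSourceTaint {
    enum Source {
        KEYBOARD(0xE000), FILE(0xE080), DATABASE(0xE100); // assumed offsets

        final int offset;

        Source(int offset) {
            this.offset = offset;
        }
    }

    // Moves an ASCII character into the block reserved for the given source.
    static char taint(char c, Source source) {
        return c <= 0x7F ? (char) (c + source.offset) : c;
    }

    // Reports which source a character came from, or empty if it is untainted.
    static Optional<Source> sourceOf(char c) {
        for (Source s : Source.values()) {
            if (c >= s.offset && c <= s.offset + 0x7F) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    static char untaint(char c) {
        Optional<Source> s = sourceOf(c);
        return s.isPresent() ? (char) (c - s.get().offset) : c;
    }
}
```

A log viewer or sink check could then call sourceOf on each character to tell keyboard input from file or database content.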
