Anomaly Detection of Web-based Attacks Christopher Kruegel & Giovanni Vigna, CCS '03 Presented by: Payas Gupta
Web-based attacks • XSS attacks • Buffer overflow • Directory traversal • Input validation • Code Red • Anomaly detection vs. misuse detection
Data Model • Only GET requests without headers are considered • Example log entry: 169.229.60.105 - johndoe [6/Nov/2002:23:59:59 -0800] "GET /scripts/access.pl?user=johndoe&cred=admin" 200 2122 • Only the query string is analyzed, not the path • For a query q with attributes a1=v1 and a2=v2, the attribute set is Sq = {a1, a2}
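As a sketch of this data model, a request can be split into its path and attribute set Sq with Python's standard library (the function name is illustrative):

```python
from urllib.parse import urlparse, parse_qsl

def attribute_set(request):
    """Split a request into its path and its query attributes.
    Only the query attributes are used by the detection models."""
    parsed = urlparse(request)
    # S_q is the ordered list of (attribute, value) pairs
    return parsed.path, parse_qsl(parsed.query)

path, s_q = attribute_set("/scripts/access.pl?user=johndoe&cred=admin")
# path -> "/scripts/access.pl"
# s_q  -> [("user", "johndoe"), ("cred", "admin")]
```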
Detection model • Each model m is associated with a weight wm • Each model returns a probability pm that the analyzed feature is normal • A value of pm close to 0 indicates an anomalous event; a value close to 1 indicates a normal one
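Combining the models can be sketched as the weighted sum used in the paper, score = Σm wm·(1 − pm), so larger scores mean more anomalous requests:

```python
def anomaly_score(model_outputs):
    """Combine per-model probabilities p_m (1 = normal, 0 = anomalous)
    into a single anomaly score using weights w_m:
        score = sum_m  w_m * (1 - p_m)
    model_outputs: list of (weight, probability) pairs."""
    return sum(w * (1.0 - p) for w, p in model_outputs)

# All models report normal -> score 0; one anomalous model raises it.
anomaly_score([(1.0, 1.0), (1.0, 1.0)])   # 0.0
anomaly_score([(1.0, 1.0), (2.0, 0.1)])   # 1.8
```

A request is reported when its score exceeds a per-program threshold.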
Attribute Length • Normal parameters • Fixed-size tokens (session identifiers) • Short strings (input from HTML forms) • So lengths do not vary much for the parameters of a given program • Malicious activity • e.g., a buffer overflow requires an unusually long input • Goal: approximate the actual but unknown distribution of parameter lengths and detect deviations from normal
Learning & Detection • Learning • Calculate the mean and variance of the lengths l1, l2, ..., ln of the parameter values in the n training queries with this attribute • Detection • Chebyshev inequality bounds the probability of a length deviating from the mean at least as much as the observed one • This bound is deliberately weak, giving a high degree of tolerance (very weak) • Only obvious outliers are flagged as suspicious
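A minimal sketch of the length model, using the weak Chebyshev bound σ²/(l − μ)² so that only gross outliers receive low probabilities (function names are illustrative):

```python
def learn_length(lengths):
    """Learning phase: mean and variance of the observed lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((l - mean) ** 2 for l in lengths) / n
    return mean, var

def length_probability(l, mean, var):
    """Chebyshev bound on seeing a length at least as far from the
    mean as l; deliberately weak so only clear outliers score low."""
    dev = (l - mean) ** 2
    if dev == 0:
        return 1.0
    return min(1.0, var / dev)

mean, var = learn_length([10, 12, 11, 9, 13])
length_probability(12, mean, var)    # close to 1: normal
length_probability(500, mean, var)   # close to 0: suspicious
```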
Attribute character distribution • Attributes have a regular structure and mostly printable characters • Query parameters of the same attribute have similar character frequencies • The relative character frequencies of the attribute are sorted in descending order (the idealized character distribution, ICD) • Normal • Frequencies decrease slowly in value • Malicious • Drop extremely fast (peak caused by a single dominating character) • Or hardly at all (random values) • Example: "passwd" = bytes 112 97 115 115 119 100; the sorted relative frequencies are 0.33, 0.17, 0.17, 0.17, 0.17, and 0 for all remaining characters, so ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, and ICD(5) to ICD(255) = 0
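The sorted frequency vector for the "passwd" example can be computed as follows (a sketch; here the function returns one attribute's sorted distribution, before any averaging across queries):

```python
from collections import Counter

def sorted_char_distribution(attribute):
    """The 256 relative character frequencies of a string,
    sorted in descending order."""
    counts = Counter(attribute.encode())
    total = len(attribute)
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))

dist = sorted_char_distribution("passwd")
# dist[0] = 2/6 ≈ 0.33 ('s' occurs twice), dist[1..4] ≈ 0.17, rest 0
```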
Why is it useful? • Cannot be evaded by some well-known attempts to hide malicious code in a string • e.g., NOP instructions substituted by operations with similar behavior (add rA, rA, 0) still skew the distribution • But not useful when the attack causes only a small change in the payload distribution
Learning and detection • Learning • For each query attribute, its character distribution is stored • The ICD is obtained by averaging all stored character distributions (e.g., of queries q1, q2, q3)
Learning and detection (cont...) • Detection: Pearson chi-square test • Not necessary to operate on all 256 values of the ICD; consider a small number of intervals, i.e., bins • Calculate observed and expected frequencies • Oi = observed frequency for each bin • Ei = relative frequency of the bin in the ICD * length of the attribute • Compute the chi-square statistic • Read the probability from a predefined chi-square table
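The binned test can be sketched as below. The six bin boundaries over the sorted indices are an illustrative choice (the paper likewise uses a small number of intervals); the resulting statistic would then be looked up in a chi-square table with (bins − 1) = 5 degrees of freedom:

```python
# Illustrative six bins over the sorted distribution indices.
BINS = [(0, 1), (1, 4), (4, 7), (7, 12), (12, 16), (16, 256)]

def chi_square(observed_dist, icd, attr_length):
    """Pearson chi-square statistic comparing one attribute's sorted
    character distribution against the learned ICD, per bin."""
    stat = 0.0
    for lo, hi in BINS:
        o = sum(observed_dist[lo:hi]) * attr_length  # observed count
        e = sum(icd[lo:hi]) * attr_length            # expected count
        if e > 0:
            stat += (o - e) ** 2 / e
    return stat

# Identical distributions yield 0 (perfectly normal); larger values
# map to lower probabilities via the chi-square table.
```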
Structural inference • The structural model of an attribute is the regular grammar that describes all of its normal, legitimate values • Why? • An attacker can craft an attack so that its manifestation appears more regular • For example, non-printable characters can be replaced by groups of printable characters
Learning and detection • Basic approach: generalize the grammar as long as it seems reasonable, and stop before too much structural information is lost • Uses a Markov model and Bayesian probability • The model is a probabilistic (nondeterministic) finite automaton • Each state S has a set of nS possible output symbols o, emitted with probability pS(o) • Each transition t is marked with a probability p(t), the likelihood that the transition is taken
Learning and detection (cont...) • The probability of a word is the sum, over all paths through the automaton that emit the word, of the products of the transition and emission probabilities along each path • Example automaton (figure): start and terminal states, emission probabilities a|p(a)=0.5, b|p(b)=0.5, a|p(a)=1, b|p(b)=1, c|p(c)=1, and transition probabilities 0.3, 0.7, 0.2, 0.4, 1.0 • So the probability of 'ab' is p(w) = (1.0·0.3·0.5·0.2·0.5·0.4) + (1.0·0.7·1.0·1.0·1.0·1.0) = 0.706
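The path-summing computation can be reproduced with a small recursive sketch; the state names and the dictionary encoding of the two-branch example automaton are illustrative:

```python
def word_probability(word, trans, emit, state="start"):
    """Probability that the stochastic automaton emits `word`:
    sum over all accepting paths of transition * emission products.
    trans: state -> [(p(t), next_state)]; emit: state -> {symbol: p}."""
    if not word:
        # Probability of moving from here to the terminal state.
        return sum(p for p, nxt in trans.get(state, []) if nxt == "terminal")
    total = 0.0
    sym, rest = word[0], word[1:]
    for p_t, nxt in trans.get(state, []):
        p_e = emit.get(nxt, {}).get(sym, 0.0)
        if p_e:
            total += p_t * p_e * word_probability(rest, trans, emit, nxt)
    return total

# Two-branch automaton: one looping state emitting a/b with 0.5 each,
# and a deterministic branch emitting 'a' then 'b'.
trans = {
    "start": [(0.3, "s1"), (0.7, "s2")],
    "s1": [(0.2, "s1"), (0.4, "terminal"), (0.4, "s3")],
    "s2": [(1.0, "s4")],
    "s3": [(1.0, "terminal")],
    "s4": [(1.0, "terminal")],
}
emit = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 1.0},
        "s3": {"c": 1.0}, "s4": {"b": 1.0}}

word_probability("ab", trans, emit)   # 0.006 + 0.7 = 0.706
```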
Learning and detection (cont...) • The probability of the training data is obtained by adding the probabilities calculated for each input training element
Learning and detection (cont...) • Aim: maximize the product of the model's prior probability and the likelihood of the training data • There is a conflict between simple models that tend to over-generalize and models that fit the data perfectly but are too complex • Simple model: high prior probability, but the likelihood of producing the training data is extremely low, so the product is low • Complex model: high likelihood of producing the training data, but low prior probability; the product is still low • The model starts from states that exactly generate the input data; states are then merged, with path probabilities computed via the Viterbi algorithm
Learning and detection (cont...) • Detection • Problem: even a legitimate input seen regularly during training may receive a very small probability value, because the probability values of all possible input words sum to 1 • Therefore the model returns 1 if the input word can be derived from the grammar, and 0 otherwise
Token finder • Determines whether the values of an attribute are drawn from a limited set of possible alternatives (an enumeration) • When a malicious user passes illegal values to such an attribute, the attack can be detected
Learning and detection • Learning • Enumeration: the number of different occurrences of parameter values is bounded by some threshold t • Random: the number of different argument instances grows proportionally with the total number of arguments • Decide between the two by calculating the statistical correlation
Learning and detection (cont...) • Correlation < 0: enumeration; correlation > 0: random • Detection • For an enumeration, an unexpected value returns 0, otherwise 1; for random attributes the model always returns 1
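A sketch of the token finder's learning phase: f grows with each occurrence, g grows on new values and shrinks on repeats, so their correlation is negative for enumerations (the function name and the plain Pearson computation are illustrative):

```python
def token_finder(values):
    """Classify an attribute as 'enumeration' or 'random' from its
    training values, via the correlation of two derived sequences."""
    seen = set()
    f, g, gv = [], [], 0.0
    for i, v in enumerate(values, start=1):
        gv += 1 if v not in seen else -1   # g: up on new, down on repeat
        seen.add(v)
        f.append(float(i))                 # f: always increasing
        g.append(gv)
    # Pearson correlation of f and g
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    cov = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    sf = sum((a - mf) ** 2 for a in f) ** 0.5
    sg = sum((b - mg) ** 2 for b in g) ** 0.5
    rho = cov / (sf * sg) if sf and sg else 0.0
    return "enumeration" if rho < 0 else "random"

token_finder(["on", "off"] * 20)               # -> "enumeration"
token_finder([f"sess{i}" for i in range(40)])  # -> "random"
```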
Attribute presence or absence • Client-side programs, scripts, or HTML forms pre-process the data and transform it into a suitable request, so the set of attributes is regular • Hand-crafted attacks focus on exploiting a vulnerability in the code that processes a certain parameter value, and little attention is paid to the remaining attributes
Learning and detection • Learning • Build a model of acceptable subsets by recording each distinct subset Sq = {ai, ..., ak} of attributes seen during the training phase • Detection • For each query, the algorithm performs a lookup of the current attribute set • If it was encountered during training, return 1; otherwise 0
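This lookup model is a straightforward set of sets; a minimal sketch (class name is illustrative):

```python
class PresenceModel:
    """Records every distinct attribute subset seen in training;
    at detection time, unknown combinations score 0."""
    def __init__(self):
        self.known = set()

    def train(self, query_attrs):
        # frozenset: subsets are compared regardless of order
        self.known.add(frozenset(query_attrs))

    def detect(self, query_attrs):
        return 1 if frozenset(query_attrs) in self.known else 0

m = PresenceModel()
m.train(["user", "cred"])
m.detect(["cred", "user"])   # 1: same subset, order irrelevant here
m.detect(["user"])           # 0: 'cred' is missing
```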
Attribute order • Legitimate invocations of server-side programs often list the same parameters in the same order • Hand-crafted attacks often do not • Test whether the order of a query's attributes is consistent with the model deduced during the learning phase
Learning and detection • Learning: build a set of attribute pairs O as follows • Each vertex vi in a directed graph G is associated with the corresponding attribute ai • For every query, the ordered list of its attributes a1, ..., ai is processed • For each attribute pair (as, at) in this list, with s ≠ t and 1 <= s, t <= i and as preceding at, a directed edge is inserted into the graph from vs to vt
Learning and detection (cont...) • The graph G contains all order constraints imposed by the queries in the training data • An order constraint is given by • a directed edge, or • a path in the graph • Detection • Given a query with attributes a1, a2, ..., ai and the set of order constraints O, check all parameter pairs (aj, ak) with j ≠ k and 1 <= j, k <= i • If any pair violates a constraint, return 0; otherwise 1
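A simplified sketch of the order model: it stores the directly observed ordered pairs rather than building the graph and following paths as the paper does, so constraints are not chained transitively across different training queries (function names are illustrative):

```python
from itertools import combinations

def learn_order(training_queries):
    """Collect ordered attribute pairs (a_s, a_t) with a_s before a_t
    in some training query; these form the order constraints O."""
    O = set()
    for attrs in training_queries:
        for s, t in combinations(range(len(attrs)), 2):
            O.add((attrs[s], attrs[t]))
    return O

def check_order(attrs, O):
    """Return 0 if any attribute pair appears in the reverse of a
    learned constraint, else 1."""
    for j, k in combinations(range(len(attrs)), 2):
        if (attrs[k], attrs[j]) in O:   # reversed pair -> violation
            return 0
    return 1

O = learn_order([["user", "cred", "id"]])
check_order(["user", "cred", "id"], O)   # 1: consistent order
check_order(["cred", "user", "id"], O)   # 0: 'user' must precede 'cred'
```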
Evaluation • Data sets: Apache web server access logs from • Google • University of California, Santa Barbara • Technical University, Vienna • 1000 queries used for training • All the rest used for testing
• Significant numbers of entries from the Nimda and Code Red worms were present but were removed • Only queries that result from the invocation of existing programs are included in the training and detection process • For Google, the thresholds were changed to account for the higher variability in its traffic
Conclusions • An anomaly-based intrusion detection system for web servers • Takes advantage of application-specific correlations between server-side programs and the parameters used in their invocation • Parameter characteristics are learned from the input data • Tested on data from Google and two universities in the US and Europe