Analysis of Anomalous Payload-based Worm Detection and Signature Generation
by Ke Wang, Gabriela Cretu, Salvatore J. Stolfo, Columbia University
Topics
• Main Goals
• Payload-based anomaly detection
• PAYL Overview
• PAYL sensor system: phases
• Experiments and Results
• Related Work
• References
• Summary
Sireesha Dasaraju 2 PAYL 10/29/06
Main Goals
• Accurately detect ZERO-DAY worms.
• Automatically generate signatures that can be shared with other vulnerable systems.
Payload-based Anomaly Detection: PAYL Overview
• Detect worms by analyzing the packet payload.
• A model of "normal" payload data is maintained.
• A new zero-day attack will carry content never before seen by the victim host.
• A newly infected host will begin sending outbound traffic that is very similar to the content it received.
• Correlate ingress/egress anomalous payload alerts to detect worm propagation.
PAYL – Overview – continued
• Automatic signature generation.
• Signatures are generated from correlated ingress/egress content anomalies.
• The content shared by similar incoming and outgoing anomalous payloads determines the candidate worm signature.
PAYL – Overview – continued
• Signature sharing
  • A central security system is used by the collaborating sites.
  • Any signature generated at one site is sent to the central system, which distributes it to all the other sites.
  • Each site can then update its onsite filtering rules.
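The sharing scheme above can be pictured as a publish/subscribe hub. The following is a minimal Python sketch under stated assumptions: the class and method names (`CentralSignatureRepository`, `Site`, `add_filter`, `blocks`) are hypothetical, and a site's "filtering rule" is assumed to be a plain substring match on the payload. It is not the architecture of any specific deployed system.

```python
class Site:
    """Hypothetical collaborating site with an onsite filter list."""

    def __init__(self, name: str):
        self.name = name
        self.filters: set[bytes] = set()

    def add_filter(self, signature: bytes) -> None:
        self.filters.add(signature)

    def blocks(self, payload: bytes) -> bool:
        # Assumed rule: drop the payload if it contains any known signature.
        return any(sig in payload for sig in self.filters)


class CentralSignatureRepository:
    """Hypothetical central system: collects signatures and pushes them to every site."""

    def __init__(self):
        self.signatures: set[bytes] = set()
        self.sites: list[Site] = []

    def register(self, site: Site) -> None:
        self.sites.append(site)
        for sig in self.signatures:  # bring a late-joining site up to date
            site.add_filter(sig)

    def publish(self, signature: bytes) -> None:
        if signature in self.signatures:
            return  # already distributed
        self.signatures.add(signature)
        for site in self.sites:
            site.add_filter(signature)
```

A site that generates a signature calls `publish`, and every registered site, including ones that join later, ends up with the same filter set.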
PAYL sensor - Phases
• The PAYL sensor operates in the following phases:
  • Modeling normal data
  • Calibration
  • Detection
  • Signature generation
Modeling the normal content
• Assumption: packet content is available for modeling.
• The technique used:
  • n-gram: a sequence of n adjacent byte values in the packet payload (n = 1 in the first implementation).
  • The frequency of each n-gram in the payload is computed.
  • The normalized average frequency and the variance of each gram over the training data are computed; together they form the statistical centroid, or model, of the content flow.
  • The byte value distribution is graphed (byte value, i.e. the ASCII code, on the x-axis and relative frequency on the y-axis).
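The 1-gram model described above can be sketched in a few lines of Python. This is an illustrative sketch, not PAYL's implementation: it assumes each training payload is available as a `bytes` object, builds a single centroid for the whole training set, and uses hypothetical function names. It keeps the standard deviation (the square root of the variance) because the detection step divides by it.

```python
from statistics import mean, pstdev


def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values (1-gram, n = 1)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1  # avoid division by zero on an empty payload
    return [c / total for c in counts]


def build_centroid(training_payloads: list[bytes]) -> tuple[list[float], list[float]]:
    """Per-byte average frequency and standard deviation across the training set."""
    vectors = [byte_frequency(p) for p in training_payloads]
    columns = list(zip(*vectors))  # one column of frequencies per byte value
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds
```

Normalizing by payload length makes packets of different sizes comparable; grouping training data into several centroids (one per port, for example) would simply mean calling `build_centroid` once per group.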
Graphs
[Figures not reproduced]
Modeling the normal content - continued
• A rank ordered distribution is then graphed (byte values sorted by descending average frequency: rank on the x-axis, average frequency on the y-axis).
• A Z-string is derived from the rank ordered distribution.
  • A Z-string is the string of distinct byte values ordered from most to least frequent, omitting byte values that do not appear in the data.
• The Z-string provides a privacy-preserving summary of the payload that can be exchanged between domains without revealing the actual content.
• Z-strings are mainly used for message exchange and cross-domain correlation of alerts.
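A Z-string follows directly from the rank ordering. The minimal Python sketch below computes it from a single payload's byte counts; applying the same ordering to a centroid's average frequencies works identically. The function name is hypothetical, and breaking frequency ties by byte value is an assumption made so the output is deterministic.

```python
from collections import Counter


def z_string(payload: bytes) -> bytes:
    """Distinct byte values ordered from most to least frequent.

    Byte values that never occur are omitted. Ties are broken by byte value
    (an assumption; the slides do not specify a tie-breaking rule)."""
    counts = Counter(payload)
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return bytes(ranked)
```

For example, `z_string(b"abracadabra")` ranks `a` (5 occurrences) first, then `b` and `r` (2 each), then `c` and `d` (1 each), giving `b"abrcd"`. The original payload cannot be reconstructed from this ordering, which is what makes it suitable for cross-domain exchange.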
Calibrating the sensor
• Calibration
  • A sample of test data is measured against the centroids and an initial threshold value is chosen.
  • Subsequent rounds of testing on new data update the threshold settings, calibrating the sensor to its operating environment.
  • Each centroid therefore ends up with its own threshold value.
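The slides do not say how the threshold is picked. One plausible approach, sketched below in Python as an assumption rather than PAYL's documented procedure, is to score a sample of clean calibration traffic against a centroid and choose a threshold so that only a target fraction of those scores would raise an alert. `calibrate_threshold` and `target_alert_rate` are hypothetical names.

```python
import math


def calibrate_threshold(scores: list[float], target_alert_rate: float = 0.001) -> float:
    """Return a threshold such that at most `target_alert_rate` of the clean
    calibration scores lie strictly above it (ties can only lower that fraction)."""
    if not scores:
        raise ValueError("need at least one calibration score")
    ordered = sorted(scores)
    # Number of calibration scores allowed above the threshold, capped to keep the index valid.
    allowed = min(math.floor(target_alert_rate * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed]
```

Running this once per centroid yields the distinct per-centroid thresholds described above, and rerunning it on scores accumulated over later rounds of new data is one way to model the "subsequent rounds" of recalibration.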
Detection
• Detection
  • The Mahalanobis distance is used to measure how similar the observed data is to the trained models.
  • The n-gram frequency distribution of the incoming packet payload is compared against the centroid (the average frequency and variance of each gram), yielding a distance.
  • The distance is then compared to the threshold.
  • If the distance is greater than the threshold, an alert is issued.
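The slides name the Mahalanobis distance without giving a formula. The Python sketch below uses a simplified, diagonal form that treats each byte frequency independently and weights each deviation by that byte's training standard deviation. This variant, the smoothing constant `alpha`, and its default value are assumptions here, and the function names are hypothetical.

```python
def simplified_mahalanobis(freq: list[float], means: list[float], stds: list[float],
                           alpha: float = 0.001) -> float:
    """d(x, y) = sum_i |x_i - mean_i| / (std_i + alpha).

    Covariances between bytes are ignored; `alpha` keeps each term finite when a
    byte's frequency never varied during training (std_i == 0)."""
    return sum(abs(x - m) / (s + alpha) for x, m, s in zip(freq, means, stds))


def is_anomalous(freq: list[float], means: list[float], stds: list[float],
                 threshold: float, alpha: float = 0.001) -> bool:
    """Raise an alert when the distance to the centroid exceeds the calibrated threshold."""
    return simplified_mahalanobis(freq, means, stds, alpha) > threshold
```

Dropping the covariance terms keeps the cost linear in the number of grams (256 for n = 1), which matters when every packet must be scored.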
Signature Generation
• Technique for generating signatures:
  • When anomalous incoming traffic to port i is detected, an ingress alert is generated and the packet content is placed on a buffer list of "suspects".
  • Any outbound traffic from port i is then compared against the buffer.
  • The comparison is done on the packet contents, and a similarity score is computed.
  • If the score is higher than the threshold, the traffic is treated as possible worm propagation and blocked.
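The ingress/egress correlation described above can be sketched as a per-port suspect buffer. In this minimal Python sketch, the class name, the bounded buffer size `max_per_port`, and the pluggable `similarity` function are hypothetical choices; any of the comparison techniques on the following slides could be passed in as `similarity`.

```python
from collections import defaultdict, deque
from typing import Callable


class SuspectBuffer:
    """Hypothetical ingress/egress correlator keyed by port number."""

    def __init__(self, similarity: Callable[[bytes, bytes], float],
                 threshold: float, max_per_port: int = 100):
        self.similarity = similarity
        self.threshold = threshold
        # Oldest suspects are evicted once a port's buffer is full.
        self.suspects = defaultdict(lambda: deque(maxlen=max_per_port))

    def ingress_alert(self, port: int, payload: bytes) -> None:
        """Record the content of an anomalous inbound packet on `port`."""
        self.suspects[port].append(payload)

    def check_egress(self, port: int, payload: bytes) -> bool:
        """True if an outbound payload on `port` resembles a buffered suspect,
        i.e. possible worm propagation that the caller should block."""
        return any(self.similarity(s, payload) > self.threshold
                   for s in self.suspects[port])
```

With exact string equality as the similarity (`lambda a, b: 1.0 if a == b else 0.0`) and a threshold of 0.5, only byte-identical outbound copies on the same port are flagged.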
Signature Generation - contd
• Packet comparison techniques:
  • String equality
    • The egress payload must be exactly the same as the ingress suspect packet content.
    • Very strict, so few false positives.
    • But if the worm changes even a single bit, or its packet fragmentation differs between the input and output ports, it cannot be detected.
    • The similarity score is either 0 or 1 (1 means equal).
  • Longest common substring (LCS)
    • The longer the common substring, the greater the confidence.
    • Avoids the fragmentation problem above.
Signature Generation - contd
• Longest common substring (continued)
  • Higher computation overhead.
  • For string lengths L1 and L2 and common substring length C, the similarity score is 2C/(L1 + L2).
• Longest common subsequence
  • The common subsequence need not be contiguous.
  • Can detect polymorphic worms, but produces too many false positives.
  • For string lengths L1 and L2 and common subsequence length C, the similarity score is 2C/(L1 + L2).
• Each of the above techniques produces a similarity score that is compared against the threshold.
• The common substring found serves as the worm signature.
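The three comparison techniques and the 2C/(L1 + L2) score can be sketched with textbook dynamic programming. The Python functions below are illustrative and their names are hypothetical. Both DP routines run in O(L1 × L2) time, which is the computation overhead mentioned for the LCS-based methods.

```python
def equality_score(a: bytes, b: bytes) -> float:
    """String equality: 1 if the payloads are identical, else 0."""
    return 1.0 if a == b else 0.0


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Longest contiguous run shared by a and b (a candidate worm signature)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_subsequence_len(a: bytes, b: bytes) -> int:
    """Length of the longest (not necessarily contiguous) common subsequence."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def similarity(common_len: int, len1: int, len2: int) -> float:
    """Similarity score 2C / (L1 + L2)."""
    return 2 * common_len / (len1 + len2) if len1 + len2 else 0.0
```

For an ingress suspect `s` and an egress payload `e`, the substring-based score would be `similarity(len(longest_common_substring(s, e)), len(s), len(e))`, and the substring itself is the signature candidate.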
Experiments and Results
• Data used
  • Three distinct real-world datasets.
  • Worm set: CodeRed, CodeRedII, WebDAV, and a worm that exploits the IIS Windows Media Service.
• Data preparation
  • Each dataset is split into two distinct portions, one for training and the other for testing.
  • For each test dataset, a clean set of packets, free of any known worms, is created.
  • Worm packets are then inserted into this clean test data at random positions.
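The data preparation steps can be sketched as follows. This Python sketch assumes a chronological split at a fixed fraction and represents packets as a list; the function names, the `train_fraction` parameter, and the optional `seed` are hypothetical, and the slides do not say how the authors actually split or mixed their data.

```python
import random


def split_dataset(packets: list, train_fraction: float = 0.5) -> tuple[list, list]:
    """Split packets into two distinct portions (assumed: in capture order)."""
    cut = int(len(packets) * train_fraction)
    return packets[:cut], packets[cut:]


def insert_worms(clean_test: list, worm_packets: list, seed=None) -> list:
    """Insert each worm packet at a random position in the clean test data.

    The relative order of the clean packets is preserved."""
    rng = random.Random(seed)
    mixed = list(clean_test)
    for worm in worm_packets:
        mixed.insert(rng.randint(0, len(mixed)), worm)  # randint is inclusive
    return mixed
```

Keeping the training and test portions disjoint ensures the model never sees the test traffic, and the known positions of the inserted worm packets make it possible to count detections and false alarms exactly.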
Results
[Figure not reproduced]
Results
• PAYL detected all the worms at a very low false positive rate.
• At a 0.1% false positive rate:
  • The first dataset produced 5.8 alerts per hour.
  • The second dataset produced 6 alerts per hour.
  • The third dataset produced 8 alerts per hour.
• Detection of the W32.Blaster worm on TCP port 135 was tested using real RPC traffic inside Columbia's CS department.
  • The worm packets were detected with zero false positives.
Related Work
• Rule-based network intrusion detection (e.g., Snort)
  • Depends on signatures.
  • Signatures can be generated only after the worm has been launched successfully.
  • The time between a worm's launch and its widespread infestation is very short, leaving too little time to generate filtering signatures and patch vulnerable systems.
  • Will miss brand-new attacks.
• Sensors based on scan and probe activity
  • Detect worms by analyzing network packet headers or by monitoring connection attempts and traffic volume.
  • Will miss slow-propagating worms.
  • Will miss attacks carrying malicious content in an otherwise normal connection.
Shield
• Detection based on vulnerability signatures instead of string-oriented content signatures.
• Vulnerability signatures specify what an exploit would look like in the packet datagrams.
• A host-based Shield agent drops any connection that matches this specification.
• There is a time lag to specify, test, and deploy shields.
Related Work - continued
• Honeycomb
  • Host-based intrusion detection system.
  • Automatically generates signatures.
  • Uses a honeypot to capture malicious traffic targeting dark (unused) address space.
  • Applies the longest common substring algorithm to the packet content of a number of connections going to the same service.
  • The computed substring is a candidate worm signature.
Related work - continued
• Autograph
  • Classifies traffic into two categories: a flow pool with suspicious scanning activity and a non-suspicious flow pool.
  • TCP flow reassembly is applied to the suspicious flow pool, and Rabin fingerprints are used to partition the payload into small blocks.
  • The most frequently occurring blocks form the worm signature.
  • Blacklisting is used to decrease the number of false positives.
  • Suspicious IPs and destination ports are exchanged among the sensors at the collaborating sites.
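Partitioning a payload "into small blocks" with fingerprints is usually called content-defined chunking: a boundary is placed wherever the hash of a small sliding window satisfies a condition, so identical content produces identical blocks even when it appears at different offsets. The Python sketch below substitutes a Rabin-Karp style polynomial rolling hash for a true Rabin fingerprint (which is defined over polynomials in GF(2)). The window size, divisor, modulus, and function names are all hypothetical, not Autograph's actual parameters.

```python
from collections import Counter

BASE, MOD = 257, (1 << 61) - 1  # polynomial rolling hash (stand-in for a Rabin fingerprint)


def chunk_payload(payload: bytes, window: int = 4, divisor: int = 8, target: int = 0) -> list[bytes]:
    """Cut after byte i when the hash of the `window` bytes ending at i
    satisfies hash % divisor == target; the chunks concatenate back to the payload."""
    chunks, start, h = [], 0, 0
    power = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(payload):
        if i >= window:
            h = (h - payload[i - window] * power) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD  # shift in the new byte
        if i + 1 >= window and h % divisor == target:
            chunks.append(payload[start:i + 1])
            start = i + 1
    if start < len(payload):
        chunks.append(payload[start:])
    return chunks


def frequent_chunks(flows: list[bytes], top: int = 3) -> list[tuple[bytes, int]]:
    """Count in how many suspicious flows each chunk appears; return the most prevalent."""
    counts = Counter()
    for flow in flows:
        counts.update(set(chunk_payload(flow)))  # count each chunk once per flow
    return counts.most_common(top)
```

Counting a chunk at most once per flow approximates "how many flows carry this block", which is what distinguishes a spreading worm from one very chatty connection.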
Related Work - continued
• Earlybird
  • Similar to the Autograph system.
  • The Rabin fingerprints of payload substrings are maintained in a frequency count table, with a count field incremented each time a substring is encountered.
  • The source and destination IPs are also recorded.
  • The table is sorted by frequency count.
  • To keep false positives down, IP address dispersion is applied: the distinct source and destination IPs are counted for each suspicious content.
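The prevalence table with address dispersion can be sketched as below. To stay short, this Python sketch keys the table by the fixed-length substring itself rather than by its Rabin fingerprint. The substring length and the three thresholds in `suspicious` are hypothetical parameters, not Earlybird's published settings.

```python
from collections import defaultdict


class PrevalenceTable:
    """Hypothetical content-prevalence table with address dispersion counting."""

    def __init__(self, substring_len: int = 40):
        self.k = substring_len
        self.counts = defaultdict(int)   # substring -> number of packets containing it
        self.sources = defaultdict(set)  # substring -> distinct source IPs
        self.dests = defaultdict(set)    # substring -> distinct destination IPs

    def observe(self, payload: bytes, src: str, dst: str) -> None:
        # Each fixed-length substring is counted at most once per packet.
        subs = {payload[i:i + self.k] for i in range(len(payload) - self.k + 1)}
        for s in subs:
            self.counts[s] += 1
            self.sources[s].add(src)
            self.dests[s].add(dst)

    def suspicious(self, min_count: int = 3, min_sources: int = 2,
                   min_dests: int = 2) -> list[bytes]:
        """Prevalent content that is also widely dispersed, most frequent first."""
        hits = [s for s, c in self.counts.items()
                if c >= min_count
                and len(self.sources[s]) >= min_sources
                and len(self.dests[s]) >= min_dests]
        return sorted(hits, key=lambda s: -self.counts[s])
```

Content that is frequent but travels between a single pair of hosts (a repeated download, for example) fails the dispersion test, which is how this filter keeps false positives down.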
Summary
• PAYL detects worms without signatures, so it can detect zero-day worms.
• Correlating the content of ingress and egress alerts reduces false positives.
• PAYL can generate detailed content signatures.
• Combined with a centralized security system, PAYL can help all collaborating sites stay up to date on the latest worm signatures.
• PAYL handles zero-day worms better than the other detection systems discussed in the related work.
References
• K. Wang, G. Cretu, and S. J. Stolfo. Anomalous Payload-based Network Intrusion Detection. In Proceedings of Recent Advances in Intrusion Detection (RAID), Sept. 2004.
• C. Kreibich and J. Crowcroft. Honeycomb: Creating Intrusion Detection Signatures Using Honeypots. In Proceedings of the 2nd Workshop on Hot Topics in Networks (HotNets-II), Nov. 2003.
• M. Locasto, J. Parekh, S. Stolfo, A. Keromytis, T. Malkin, and V. Misra. Collaborative Distributed Intrusion Detection. Columbia University Tech Report CUCS-012-04, 2004.
• H. J. Wang, C. Guo, D. R. Simon, and A. Zugenmaier. Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits. In Proceedings of the ACM SIGCOMM Conference, Aug. 2004.
• H.-A. Kim and B. Karp. Autograph: Toward Automated, Distributed Worm Signature Detection. In Proceedings of the USENIX Security Symposium, Aug. 2004.
• S. Singh, C. Estan, G. Varghese, and S. Savage. Automated Worm Fingerprinting. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.