
Analysis of Anomalous Payload-based Worm Detection and Signature Generation

Analysis of Anomalous Payload-based Worm Detection and Signature Generation by Ke Wang, Gabriela Cretu, Salvatore J. Stolfo, Columbia University. Topics: Main Goals, Payload-based anomaly detection, PAYL Overview, PAYL sensor system – phases, Experiments and Results, Related Work

Presentation Transcript


  1. Analysis of Anomalous Payload-based Worm Detection and Signature Generation by Ke Wang, Gabriela Cretu, Salvatore J. Stolfo, Columbia University

  2. Topics • Main Goals • Payload-based anomaly detection • PAYL Overview • PAYL sensor system – phases • Experiments and Results • Related Work • References • Summary Sireesha Dasaraju 2 PAYL 10/29/06

  3. Main Goals • Accurately detect ZERO-DAY worms. • Automatically generate signatures that can be shared with other vulnerable systems.

  4. Payload-based anomaly detection: PAYL Overview • Detect worms by analyzing the packet payload. • A model of "normal" data is maintained. • A new zero-day attack will carry content never before seen by the victim host. • A newly infected host will begin sending outbound traffic that is very similar to the content it received. • Correlating ingress and egress anomalous-payload alerts detects the worm's propagation.

  5. PAYL – Overview – continued • Automatic signature generation. • Signatures are generated from correlated ingress/egress content anomalies. • The overlapping content of similar incoming and outgoing anomalous payloads determines the candidate worm signature.

  6. PAYL – Overview – continued • Signature sharing • A central security system is used by the collaborating sites. • Any signature generated at any site is sent to the central system and exchanged with all the other sites. • Each site can then update its on-site filtering rules.

  7. PAYL sensor - Phases • The PAYL sensor operates in the following phases • Modeling Normal Data • Calibration • Detection • Signature generation

  8. Modeling the normal content • Assumption: packet content is available for modeling. • Technique used: • n-gram: a sequence of n adjacent byte values in the packet payload (n = 1 in the first implementation). • The frequency of each n-gram in the payload is computed. • The normalized average frequency and the variance of each gram over the training data are computed; together they form the statistical centroid, or model, of the content flow. • The byte value distribution is graphed (byte value, shown as an ASCII character, on the x-axis and its frequency on the y-axis).
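
The 1-gram model on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not PAYL's code. The function names are hypothetical. The sketch stores the standard deviation (the square root of the variance the slide mentions) because that is what a distance computation typically uses. The slides do not say whether separate models are kept per port or per payload length, so the sketch trains a single centroid.

```python
import math
from typing import Iterable, List, Tuple


def byte_frequencies(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values (a 1-gram profile)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload)
    if total == 0:
        return [0.0] * 256
    # Normalizing by length makes payloads of different sizes comparable.
    return [c / total for c in counts]


def train_centroid(payloads: Iterable[bytes]) -> Tuple[List[float], List[float]]:
    """Mean and (population) standard deviation of each byte frequency."""
    vectors = [byte_frequencies(p) for p in payloads]
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one training payload")
    mean = [sum(v[i] for v in vectors) / n for i in range(256)]
    std = [math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n)
           for i in range(256)]
    return mean, std
```

Example usage: `mean, std = train_centroid([b"GET /index.html HTTP/1.0\r\n", b"GET /about.html HTTP/1.0\r\n"])`.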

  9. Graphs

  10. Modeling the normal content - continued • A rank-ordered distribution is then graphed (rank of each byte value on the x-axis and its average frequency on the y-axis). • A Z-string is derived from the rank-ordered distribution. • A Z-string is the string of distinct byte values ordered from most to least frequent in the data, omitting byte values that do not appear. • The Z-string is a privacy-preserving summary of the payload that can be exchanged between domains without revealing the true content. • It is used mainly for message exchange and cross-domain correlation of alerts.
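
A Z-string as defined on slide 10 can be computed directly from byte counts. The sketch below works on a single payload. The same ordering could be applied to a centroid's mean frequencies instead. Breaking ties by byte value is an assumption made only so the output is deterministic, and the function name is hypothetical.

```python
from collections import Counter


def z_string(payload: bytes) -> bytes:
    """Distinct byte values ordered from most to least frequent; absent bytes are omitted."""
    counts = Counter(payload)  # iterating over bytes yields ints 0..255
    # Sort by descending count; ties are broken by byte value (an assumption).
    ordered = sorted(counts, key=lambda b: (-counts[b], b))
    return bytes(ordered)
```

For example, in `b"banana"` the byte `a` occurs three times, `n` twice and `b` once, so the Z-string is `b"anb"`. Only the ordering of byte values survives, which is why the slide calls it privacy-preserving.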

  11. Calibrating the sensor • Calibration • A sample of test data is measured against the centroids, and an initial threshold value is chosen. • Subsequent rounds of testing on new data update the threshold settings to calibrate the sensor to its operating environment. • In this way, each centroid has its own threshold value.
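
The slides do not say how the threshold is picked. The following is a minimal sketch under explicit assumptions:

- The calibration sample is assumed to be clean.
- The threshold is set to an order statistic of the sample's distances, chosen so that at most a target fraction of the sample would raise an alert.
- Later rounds only raise the threshold.

The function names and the `max_alert_rate` parameter are hypothetical. Its 0.001 default merely echoes the 0.1% false positive rate reported in the results.

```python
from typing import Sequence


def calibrate_threshold(distances: Sequence[float], max_alert_rate: float = 0.001) -> float:
    """Pick a sample distance so that at most max_alert_rate of the sample exceeds it."""
    if not distances:
        raise ValueError("need at least one calibration distance")
    ordered = sorted(distances)
    # Number of calibration samples allowed to lie strictly above the threshold.
    allowed = min(int(max_alert_rate * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed]


def recalibrate(current: float, new_distances: Sequence[float],
                max_alert_rate: float = 0.001) -> float:
    """Hypothetical update rule: raise the threshold if new data would alert too often."""
    return max(current, calibrate_threshold(new_distances, max_alert_rate))
```

Because each centroid has its own threshold, a sensor would keep one such value per centroid, for example in a dictionary keyed by a centroid identifier.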

  12. Detection • Detection • To measure the similarity between observed data and the trained models, the Mahalanobis distance is used. • The n-gram frequencies of the observed packet payload are weighed against the centroid's mean frequencies, and the difference is expressed as a distance. • The distance is then compared with the threshold value. • If the distance is greater than the threshold, an alert is issued.
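
The slide names the Mahalanobis distance but gives no formula. As we read the RAID 2004 PAYL paper cited in the references, PAYL uses a simplified form that ignores covariances:

d(x, ȳ) = Σ |x_i − ȳ_i| / (σ_i + α)

Here x is the payload's byte-frequency vector, ȳ and σ are the centroid's mean and standard deviation, and α is a small smoothing factor that avoids division by zero. The sketch below implements that form. The function names and the α default are assumptions.

```python
from typing import Sequence


def simplified_mahalanobis(x: Sequence[float], mean: Sequence[float],
                           std: Sequence[float], alpha: float = 0.001) -> float:
    """Sum over byte values of |x_i - mean_i| / (std_i + alpha)."""
    if not (len(x) == len(mean) == len(std)):
        raise ValueError("vectors must have the same length")
    # Bytes with little variance in training are penalized more when they deviate.
    return sum(abs(xi - mi) / (si + alpha) for xi, mi, si in zip(x, mean, std))


def is_anomalous(x: Sequence[float], mean: Sequence[float], std: Sequence[float],
                 threshold: float, alpha: float = 0.001) -> bool:
    """Alert when the distance is greater than the centroid's threshold."""
    return simplified_mahalanobis(x, mean, std, alpha) > threshold
```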

  13. Signature Generation • Technique for generating signatures: • When anomalous incoming traffic to port i is detected, an ingress alert is generated and the packet content is placed on a buffer list of "suspects". • Any outbound traffic to port i is then compared with the buffer. • The comparison is done on the packet contents, and a similarity score is computed. • If the score is higher than the threshold, the traffic is treated as possible worm propagation and blocked.
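
The ingress/egress correlation on slide 13 amounts to a per-port buffer of suspect payloads. This minimal sketch assumes a few things the slides do not state:

- A fixed buffer size per port.
- A pluggable similarity function, such as one of the comparison techniques on the next slides.

The `SuspectBuffer` class and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SuspectBuffer:
    """Hypothetical correlator: buffer anomalous ingress payloads per port and
    compare later outbound payloads on the same port against them."""

    def __init__(self, similarity: Callable[[bytes, bytes], float],
                 threshold: float, max_per_port: int = 100) -> None:
        self.similarity = similarity
        self.threshold = threshold
        # Oldest suspects are dropped once a port's buffer is full.
        self.suspects: Dict[int, Deque[bytes]] = defaultdict(
            lambda: deque(maxlen=max_per_port))

    def ingress_alert(self, port: int, payload: bytes) -> None:
        """Record the content of an anomalous incoming packet."""
        self.suspects[port].append(payload)

    def check_egress(self, port: int, payload: bytes) -> bool:
        """Return True (block) if the payload is similar enough to any buffered suspect."""
        for suspect in self.suspects.get(port, ()):
            if self.similarity(suspect, payload) > self.threshold:
                return True
        return False
```

Example usage with exact matching: `buf = SuspectBuffer(lambda a, b: 1.0 if a == b else 0.0, threshold=0.5)`.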

  14. Signature Generation - contd • Packet comparison techniques: • String equality • The egress payload must be exactly the same as the ingress suspect packet content. • Very strict, so there are few false positives. • But if the worm changes even a single bit, or is fragmented differently between the input and output ports, it cannot be detected. • The similarity score is either 0 or 1 (1 means equal). • Longest common substring (LCS) • The longer the common substring, the greater the confidence. • Avoids the fragmentation problem above.
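
The first two comparison techniques can be sketched as follows. The substring score uses the 2C/(L1 + L2) formula from slide 15. The longest common substring is found here with the classic quadratic dynamic program, which illustrates the computation overhead mentioned there. Suffix-tree methods can find it in linear time. Function names are hypothetical.

```python
def equality_score(a: bytes, b: bytes) -> float:
    """String equality: 1 if the payloads are identical, else 0."""
    return 1.0 if a == b else 0.0


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """O(len(a) * len(b)) dynamic programming over contiguous matches."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def substring_score(a: bytes, b: bytes) -> float:
    """Similarity 2C / (L1 + L2), with C the longest common substring length."""
    if not a and not b:
        return 0.0
    c = len(longest_common_substring(a, b))
    return 2 * c / (len(a) + len(b))
```

Unlike `equality_score`, `substring_score` still reports high similarity when the egress copy of a worm carries extra leading or trailing bytes. The common substring itself is the candidate signature.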

  15. Signature Generation - contd • Drawback: higher computation overhead. • For string lengths L1 and L2 and common substring length C, the similarity score is 2C/(L1 + L2). • Longest common subsequence • The longest common subsequence need not be contiguous. • Can detect polymorphic worms, but produces too many false positives. • For string lengths L1 and L2 and common subsequence length C, the similarity score is 2C/(L1 + L2). • Each of these techniques yields a similarity score that is compared against the threshold. • The common substring found serves as the worm signature.
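
The longest-common-subsequence score differs from the substring score only in allowing gaps, as the short sketch below shows. It uses the standard dynamic program and the same 2C/(L1 + L2) normalization. Function names are hypothetical.

```python
def lcs_subsequence_length(a: bytes, b: bytes) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                # Skip a byte from either string; gaps are allowed.
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def subsequence_score(a: bytes, b: bytes) -> float:
    """Similarity 2C / (L1 + L2), with C the common subsequence length."""
    if not a and not b:
        return 0.0
    return 2 * lcs_subsequence_length(a, b) / (len(a) + len(b))
```

For `b"aXbXc"` and `b"abc"` the common subsequence is `abc` (C = 3), giving 2 * 3 / 8 = 0.75. The longest common substring is a single byte. This tolerance of inserted bytes is what lets the technique catch some polymorphic variants, and it also explains the higher false positive rate.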

  16. Experiments and Results • Data used • Three distinct real-world datasets. • Worm set: CodeRed, CodeRed II, WebDAV, and a worm that exploits the IIS Windows Media Service. • Data preparation • Each dataset is split into two distinct portions, one for training and one for testing. • For each test dataset, a clean set of packets free of any known worms is created. • A set of worm packets is inserted into this clean test data at random places.
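
The data preparation steps can be sketched as below. The slides do not say how the split was made. This sketch assumes a simple in-order split at a chosen fraction. Worm packets are inserted at random positions while the clean packets keep their relative order, and ground-truth labels are returned for scoring detections. The function names, `train_fraction` and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(packets: Sequence[T], train_fraction: float = 0.5) -> Tuple[List[T], List[T]]:
    """Split a dataset into two distinct portions (assumed: in capture order)."""
    cut = int(len(packets) * train_fraction)
    return list(packets[:cut]), list(packets[cut:])


def insert_worms(clean: Sequence[bytes], worms: Sequence[bytes],
                 seed: int = 0) -> Tuple[List[bytes], List[bool]]:
    """Insert worm packets at random places; return packets and labels (True = worm)."""
    rng = random.Random(seed)  # seeded so the test set is reproducible
    mixed: List[Tuple[bytes, bool]] = [(p, False) for p in clean]
    for w in worms:
        pos = rng.randint(0, len(mixed))  # inclusive, so the end is possible too
        mixed.insert(pos, (w, True))
    return [p for p, _ in mixed], [label for _, label in mixed]
```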

  17. Results

  18. Results • PAYL detected all the worms at a very low false positive rate. • At a 0.1% false positive rate: • the first dataset produced 5.8 alerts per hour; • the second dataset produced 6 alerts per hour; • the third dataset produced 8 alerts per hour. • Detection of the W32.Blaster worm on TCP port 135 was tested using real RPC traffic inside Columbia's CS department. • The worm packets were detected with zero false positives.

  19. Related Work • Rule-based network intrusion detection (e.g., Snort) • Depends on signatures. • Signatures can be generated only after the worm has been launched successfully. • The time between a worm's launch and its widespread infestation is very short, leaving too little time to generate filtering signatures and patch vulnerable systems. • Misses brand-new attacks. • Sensors based on scan and probe activity • Detect worms by analyzing network packet headers or by monitoring connection attempts and traffic volume. • Miss slow-propagating worms. • Miss attacks that carry malicious content in an otherwise normal connection.

  20. Shield • Detection based on vulnerability signatures instead of string-oriented content signatures. • Vulnerability signatures specify what an exploit would look like in the packet data. • A host-based Shield agent drops any connection that matches this specification. • There is a time lag to specify, test, and deploy shields.

  21. Related Work - continued • Honeycomb • Host-based intrusion detection system. • Automatically generates signatures. • Uses a honeypot to capture malicious traffic targeting dark (unused) address space. • Applies the longest common substring algorithm to the packet content of a number of connections going to the same service. • The computed substring is a candidate worm signature.

  22. Related Work - continued • Autograph • Classifies traffic into two pools: a suspicious flow pool (flows showing scanning activity) and a non-suspicious flow pool. • TCP flow reassembly is applied to the suspicious flow pool, and Rabin fingerprints are used to partition the payload into small blocks. • The most frequent of these blocks form a worm signature. • Blacklisting is used to reduce the number of false positives. • Suspicious IPs and destination ports are exchanged among the sensors at collaborating sites.
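
Autograph's block partitioning is content-defined. A block boundary is placed wherever the fingerprint of a small sliding window matches a pattern, so identical content produces identical blocks even at different offsets.

The sketch below illustrates the idea with a substitution. In place of a true Rabin fingerprint, which uses polynomial arithmetic over GF(2), it uses a Rabin-Karp style polynomial rolling hash modulo a prime. The window size, the boundary mask and all names are hypothetical. Blocks are then counted once per flow to find the most frequent ones.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW = 4            # bytes in the sliding window (assumed)
BASE = 257
MOD = (1 << 61) - 1   # a Mersenne prime modulus
BREAK_MASK = 0x3F     # boundary when the low 6 bits are zero: ~64-byte blocks (assumed)


def chunk(payload: bytes) -> List[bytes]:
    """Split a payload at content-defined boundaries using a rolling hash."""
    if len(payload) <= WINDOW:
        return [payload] if payload else []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in payload[:WINDOW]:
        h = (h * BASE + b) % MOD
    blocks, start = [], 0
    for end in range(WINDOW, len(payload) + 1):
        # Here h is the hash of payload[end - WINDOW:end].
        if h & BREAK_MASK == 0:
            blocks.append(payload[start:end])
            start = end
        if end < len(payload):
            h = ((h - payload[end - WINDOW] * top) * BASE + payload[end]) % MOD
    if start < len(payload):
        blocks.append(payload[start:])
    return blocks


def most_frequent_blocks(flows: Iterable[bytes], k: int = 3) -> List[Tuple[bytes, int]]:
    """Count each block once per flow and return the k most common."""
    counts: Counter = Counter()
    for payload in flows:
        counts.update(set(chunk(payload)))
    return counts.most_common(k)
```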

  23. Related Work - continued • Earlybird • Similar to the Autograph system. • The substrings computed with Rabin fingerprints are maintained in a frequency count table, whose count field is incremented each time the substring is encountered. • The source and destination IPs are recorded. • The table is sorted by frequency count. • To keep false positives down, address dispersion is measured by counting the distinct source and destination IPs for each suspicious content string.
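
The Earlybird bookkeeping on slide 23 can be sketched as a table keyed by substring. Each entry holds a count and the sets of distinct source and destination addresses. Content is reported only when both its frequency and its address dispersion pass thresholds. The table is keyed by raw substrings for simplicity rather than by their fingerprints. The class, field and threshold names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    count: int = 0
    sources: Set[str] = field(default_factory=set)
    destinations: Set[str] = field(default_factory=set)


class ContentTable:
    """Hypothetical frequency table with address-dispersion filtering."""

    def __init__(self) -> None:
        self.table: Dict[bytes, Entry] = defaultdict(Entry)

    def observe(self, substring: bytes, src: str, dst: str) -> None:
        """Increment the count and record the endpoints for this substring."""
        entry = self.table[substring]
        entry.count += 1
        entry.sources.add(src)
        entry.destinations.add(dst)

    def suspicious(self, min_count: int, min_sources: int, min_dests: int) -> List[bytes]:
        """Substrings sorted by frequency, kept only if widely dispersed."""
        ranked = sorted(self.table.items(), key=lambda kv: kv[1].count, reverse=True)
        return [s for s, e in ranked
                if e.count >= min_count
                and len(e.sources) >= min_sources
                and len(e.destinations) >= min_dests]
```

A string repeated often between one pair of hosts, such as a popular file fetched repeatedly by one client, is filtered out. Worm content spreading across many hosts is kept.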

  24. Summary • PAYL detects worms without signatures, so it can detect zero-day worms. • Correlating the content of ingress and egress alerts reduces false positives. • PAYL can generate detailed content signatures. • Combined with a centralized security system, PAYL can help all collaborating sites stay up to date on the latest worm signatures. • PAYL handles zero-day worms better than the other detection systems described in the related work.

  25. References • K. Wang, G. Cretu, S. J. Stolfo. Anomalous Payload-based Network Intrusion Detection. In Proceedings of Recent Advances in Intrusion Detection (RAID), Sept. 2004. • C. Kreibich and J. Crowcroft. Honeycomb: Creating Intrusion Detection Signatures Using Honeypots. In Proceedings of the 2nd Workshop on Hot Topics in Networks (HotNets-II), Nov. 2003. • M. Locasto, J. Parekh, S. Stolfo, A. Keromytis, T. Malkin and V. Misra. Collaborative Distributed Intrusion Detection. Columbia University Tech Report CUCS-012-04, 2004. • H. J. Wang, C. Guo, D. R. Simon and A. Zugenmaier. Shield: Vulnerability-Driven Network Filters for Preventing Known Vulnerability Exploits. In Proceedings of the ACM SIGCOMM Conference, Aug. 2004. • H.-A. Kim and B. Karp. Autograph: Toward Automated, Distributed Worm Signature Detection. In Proceedings of the USENIX Security Symposium, Aug. 2004. • S. Singh, C. Estan, G. Varghese and S. Savage. Automated Worm Fingerprinting. Sixth Symposium on Operating Systems Design and Implementation (OSDI), 2004.