ANOMALY DETECTION AND CHARACTERIZATION: LEARNING AND EXPERIENCE

Presentation Transcript


  1. ANOMALY DETECTION AND CHARACTERIZATION: LEARNING AND EXPERIENCE
      YAN CHEN – MATT MODAFF – AARON BEACH

  2. NETWORK TRAFFIC: WHAT DOES IT LOOK LIKE? Where are the anomalies?

  3. Overview
      • Anomaly detection using the Holt-Winters prediction algorithm
      • Basic: one-dimensional detection (value prediction)
      • Intermediate: multi-dimensional detection (vector prediction)
      • Advanced: characterization by correlating many multi-dimensional detections in parallel (2nd power vector prediction)
      • Automatic characterization updates using a maliciousness rating system

  4. Holt-Winters
      • Prediction algorithm
      • Exponential smoothing
      • Sum of three components:
        • Baseline (intercept)
        • Linear trend (slope)
        • Seasonal trend
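The slides name the three components but not the update rules. As a sketch, assuming the standard additive form of Holt-Winters with a season length of m time steps (the slides do not say whether the additive or multiplicative variant was used), the forecast and the three component updates are:

```latex
\begin{aligned}
\hat{y}_{t+1} &= a_t + b_t + c_{t+1-m} && \text{forecast} \\
a_t &= \alpha\,(y_t - c_{t-m}) + (1-\alpha)\,(a_{t-1} + b_{t-1}) && \text{baseline (intercept)} \\
b_t &= \beta\,(a_t - a_{t-1}) + (1-\beta)\,b_{t-1} && \text{linear trend (slope)} \\
c_t &= \gamma\,(y_t - a_t) + (1-\gamma)\,c_{t-m} && \text{seasonal trend}
\end{aligned}
```

Each update is a weighted average of the newest evidence and the previous estimate, which is where the "exponential smoothing" comes from: older observations decay geometrically in influence.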

  5. Holt-Winters continued
      • The constants alpha, beta, and gamma are predetermined (each between 0 and 1)
        • We used 0.1 for all three, based on how heavily new values should be weighted against old values
      • Choose a seasonal size
        • We chose 1 minute, since we only had 1 day of data
        • Or two hours for ICMP detection
      • Measure whether values stay within a threshold of deviation (delta)
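A minimal Python sketch of this procedure, assuming the additive update equations and a deviation band of the kind RRDtool's HWPREDICT/DEVPREDICT aberrant-behavior detection uses: each step predicts a value, flags it as aberrant if it lies more than delta smoothed deviations from the prediction, then updates the model. alpha = beta = gamma = 0.1 follows the slide; the class name, seeding the baseline with the first observation, and the default delta = 2.0 are assumptions.

```python
from typing import List, Tuple


class HoltWinters:
    """Additive Holt-Winters forecaster with a deviation band (illustrative sketch)."""

    def __init__(self, season_len: int, alpha: float = 0.1, beta: float = 0.1,
                 gamma: float = 0.1, delta: float = 2.0) -> None:
        self.m = season_len                 # time steps per season
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.delta = delta                  # band half-width in smoothed deviations (assumed)
        self.level = 0.0                    # baseline (intercept)
        self.trend = 0.0                    # linear trend (slope)
        self.seasonal: List[float] = [0.0] * season_len
        self.deviation: List[float] = [0.0] * season_len
        self.t = 0

    def update(self, y: float) -> Tuple[float, bool]:
        """Forecast step t, test y against the band, then fold y into the model.

        Returns (prediction, aberrant)."""
        i = self.t % self.m                 # seasonal slot for this step
        if self.t == 0:                     # seed the baseline with the first value
            self.level = y
            self.t += 1
            return y, False
        pred = self.level + self.trend + self.seasonal[i]
        err = y - pred
        aberrant = abs(err) > self.delta * self.deviation[i]
        prev_level = self.level
        self.level = (self.alpha * (y - self.seasonal[i])
                      + (1 - self.alpha) * (self.level + self.trend))
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend
        self.seasonal[i] = self.gamma * (y - self.level) + (1 - self.gamma) * self.seasonal[i]
        self.deviation[i] = self.gamma * abs(err) + (1 - self.gamma) * self.deviation[i]
        self.t += 1
        return pred, aberrant


# One-minute steps with a two-hour season: 120 steps per season.
hw = HoltWinters(season_len=120)
flags = [hw.update(v)[1] for v in [10.0] * 100 + [50.0]]
print(flags[-1])   # True: the jump to 50 falls far outside the band
```

Because the smoothed deviations start at zero, the first season or two behave as a warm-up in which even small errors can be flagged; a real deployment would likely seed the deviations from historical data.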

  6. Detecting Aberrations / Alarms
      • Set a window size and the number of aberrations within that window that is considered alarming
      • If the number of aberrations within the time window reaches the limit, raise an alarm
      • We used 10-15 aberrations per 30-step window, or 1 aberration per 1-step window, depending on the time step and the characteristic nature of the variable combination being detected
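The window rule can be sketched as follows, reading "10-15/30" as 10 to 15 aberrations within the last 30 time steps and "1/1" as alarming on any single aberration. The class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List


class AberrationAlarm:
    """Alarm when at least `threshold` of the last `window` steps were aberrant."""

    def __init__(self, threshold: int, window: int) -> None:
        if not 1 <= threshold <= window:
            raise ValueError("need 1 <= threshold <= window")
        self.threshold = threshold
        self.recent: Deque[bool] = deque(maxlen=window)  # oldest flag falls off automatically

    def step(self, aberrant: bool) -> bool:
        self.recent.append(aberrant)
        return sum(self.recent) >= self.threshold


def run_alarm(flags: Iterable[bool], threshold: int, window: int) -> List[bool]:
    """Feed a sequence of aberration flags through one alarm and collect its output."""
    alarm = AberrationAlarm(threshold, window)
    return [alarm.step(f) for f in flags]


print(run_alarm([True] * 10, threshold=10, window=30)[-1])     # True on the 10th aberration
print(run_alarm([False, True, False], threshold=1, window=1))  # [False, True, False]
```

A bounded `deque` keeps memory fixed at one window per detector, which fits the later point that only the current time step of state needs to be kept.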

  7. Network Traffic Data
      • Network traffic data has many variables. We look at:
        • Source and destination IP addresses
        • Source and destination port numbers
        • Protocol type
        • Bytes and packets in a traffic flow
        • Unique flows, defined by source and destination IP/port tuples
        • Protocol flags (TCP flags)
      • Over time, these many variables form a dynamic vector of data
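One way to turn a time step's flow records into the vector described above, sketched with a hypothetical record type and hypothetical feature names (the slides do not specify a record format):

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

TCP_SYN = 0x02   # SYN bit of the TCP flags byte


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record holding the fields listed on the slide."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str        # e.g. "TCP", "UDP", "ICMP"
    n_bytes: int
    n_packets: int
    tcp_flags: int = 0   # OR of all TCP flags seen in the flow


def step_vector(flows: Iterable[FlowRecord]) -> Dict[str, float]:
    """Collapse one time step of flow records into a feature vector."""
    vec: Dict[str, float] = {"bytes": 0.0, "packets": 0.0, "syn_flows": 0.0}
    unique: Set[Tuple[str, int, str, int]] = set()
    for f in flows:
        vec["bytes"] += f.n_bytes
        vec["packets"] += f.n_packets
        if f.protocol == "TCP" and f.tcp_flags & TCP_SYN:
            vec["syn_flows"] += 1
        key = "proto_" + f.protocol                  # per-protocol flow counts
        vec[key] = vec.get(key, 0.0) + 1
        unique.add((f.src_ip, f.src_port, f.dst_ip, f.dst_port))
    vec["unique_flows"] = float(len(unique))
    return vec
```

Each key of the resulting vector can then be tracked over successive time steps by its own forecaster.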

  8. What is Anomaly Detection?
      • We predict the “normal” vector space using the Holt-Winters forecasting method
      • We define the vector space beyond normal as “aberrant”
      • If the network traffic vector travels into aberrant space, it is considered an “anomaly”
      • Now let's look at a few examples of basic, direct anomaly detection and alarm triggering
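Assuming the normal space is the box formed by per-variable Holt-Winters bands (prediction ± delta × smoothed deviation, with delta = 2.0 as a placeholder), testing whether the traffic vector has entered aberrant space reduces to a per-dimension check. The function names and the dict-based vectors are illustrative.

```python
from typing import Dict, List

Vector = Dict[str, float]


def aberrant_dimensions(observed: Vector, predicted: Vector, deviation: Vector,
                        delta: float = 2.0) -> List[str]:
    """Dimensions where the observed value leaves predicted +/- delta * deviation."""
    return [k for k, y in observed.items()
            if abs(y - predicted[k]) > delta * deviation[k]]


def is_anomaly(observed: Vector, predicted: Vector, deviation: Vector,
               delta: float = 2.0) -> bool:
    """The traffic vector is in aberrant space once any single dimension leaves its band."""
    return bool(aberrant_dimensions(observed, predicted, deviation, delta))


obs = {"bytes": 1000.0, "packets": 40.0}
pred = {"bytes": 900.0, "packets": 10.0}
dev = {"bytes": 100.0, "packets": 5.0}
print(aberrant_dimensions(obs, pred, dev))   # ['packets']
```

Returning the list of offending dimensions, rather than a single flag, keeps the information needed later for characterization.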

  9. Detection Using the Port Dimension
      • A clear port scan on port 21 (FTP) at 12:46-12:47 AM from one address outside the network

  10. Detection Using Protocol: ICMP
      • ICMP spikes every 2 hours
      • Without seasonal values, all of these spikes may show up as malicious anomalies
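To illustrate why the seasonal term matters, the sketch below uses hypothetical ICMP counts with a spike every two hours at one-minute steps, and swaps in a seasonal-naive forecast (the value one season earlier) as a simplified stand-in for the Holt-Winters seasonal component. The data, threshold, and function names are all assumptions.

```python
from typing import List

STEP_MIN = 1                  # one-minute time steps (assumed)
SEASON = 120 // STEP_MIN      # a two-hour season expressed in steps


def synthetic_icmp(n_steps: int, base: float = 5.0, spike: float = 100.0) -> List[float]:
    """Hypothetical ICMP packet counts: flat baseline plus one spike per season."""
    return [spike if t % SEASON == 0 else base for t in range(n_steps)]


def count_flagged(series: List[float], lag: int, threshold: float) -> int:
    """Forecast each step as the value `lag` steps earlier and count large errors.

    lag=1 is a non-seasonal (last-value) forecast; lag=SEASON is a seasonal-naive one.
    Counting starts after one full season so both forecasts have history."""
    return sum(1 for t in range(SEASON, len(series))
               if abs(series[t] - series[t - lag]) > threshold)


day = synthetic_icmp(24 * 60 // STEP_MIN)   # one day of data
print(count_flagged(day, lag=1, threshold=20.0))       # 22
print(count_flagged(day, lag=SEASON, threshold=20.0))  # 0
```

Without seasonality, each of the 11 scheduled spikes is flagged twice (on the way up and on the way down); with the seasonal forecast, none are.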

  11. Port Activity: Malicious or Normal?
      • Port 17300 is used by nothing except the Kuang2 Trojan/virus, while port 10000 is used for the NDMP server backup service, but also by Dumaru.Y?

  12. Detection Using Three Variables: Flow Bytes/Packets and TCP Flags
      • A SYN attack early in the morning?
      • What about the little spikes? Are they SYN attacks?

  13. Explaining Detected Anomalies
      • Three variables are enough for detection but do not tell us what the anomaly is; we need other variables for characterization
      • A huge scan to port 4128: why only 4128? Is it really just a DoS?
      • All computers that respond to the SYNs on port 4128 receive requests on port 137 (NetBIOS, a protocol used to support file and printer sharing)
      • This matches a method many viruses use to find exploitable systems, called an NBTSTAT -A type scan: it locates systems with open shares (port 4128) and then tries to execute the infection via a connection to the file share (port 137)
      • The result is an attack on port 137 with no large scan on port 137, only a scan on the relatively harmless port 4128; this indirect scanning could have avoided detection
      • Possible suspects: Nimda, Bugbear, Msinit, Opaserv, Qaz
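The correlation on this slide (hosts that answer the port 4128 SYNs and then receive port 137 traffic) can be sketched as a single pass over time-ordered flow events. The `Event` record, its fields, and treating a SYN-ACK from port 4128 as "responding" are assumptions about how the data might be represented.

```python
from typing import Iterable, NamedTuple, Set, Tuple

SYN, ACK = 0x02, 0x10   # TCP flag bits


class Event(NamedTuple):
    """Hypothetical time-stamped flow event."""
    t: float
    src: str
    dst: str
    sport: int
    dport: int
    flags: int = 0


def indirect_scan_pairs(events: Iterable[Event], scan_port: int = 4128,
                        follow_port: int = 137) -> Set[Tuple[str, str]]:
    """(scanner, victim) pairs where the victim answered a SYN on scan_port with a
    SYN-ACK and the same scanner later sent traffic to the victim's follow_port."""
    responded: Set[Tuple[str, str]] = set()
    pairs: Set[Tuple[str, str]] = set()
    for ev in sorted(events, key=lambda e: e.t):          # process in time order
        if ev.sport == scan_port and (ev.flags & (SYN | ACK)) == (SYN | ACK):
            responded.add((ev.dst, ev.src))               # reply goes victim -> scanner
        elif ev.dport == follow_port and (ev.src, ev.dst) in responded:
            pairs.add((ev.src, ev.dst))
    return pairs
```

Hosts that ignored the SYNs never enter `responded`, so port 137 traffic to them is not attributed to the indirect scan.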

  14. More Advanced Detection
      • For the previous detection example, we could define a vector of malicious conditions
      • The vector space would have 10 variables: 2 sets of (dst IP, dst port, bytes, packets, protocol)
      • Each variable can have a condition or range that is malicious
      • This combination of 2 sets of 5 ranges or conditions on different variables forms a unique malicious vector space
      • Now let's look at an example of using three detection vectors in parallel to distinguish normal space from malicious space
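A malicious vector of this kind can be represented as one condition per variable: a categorical variable must match exactly and a numeric one must fall in an inclusive range. All thresholds below are hypothetical placeholders, and "dst_ips" is interpreted as a count of distinct destination addresses.

```python
import math
from typing import Dict, Tuple, Union

Value = Union[int, float, str]
# A condition is an exact categorical value or an inclusive (low, high) numeric range.
Condition = Union[int, str, Tuple[float, float]]

# Hypothetical 10-variable malicious vector for the port 4128 -> port 137 pattern.
# Every threshold here is a placeholder, not a value from the slides.
NBTSTAT_SCAN: Dict[str, Condition] = {
    "scan.dst_ips": (50, math.inf),        # distinct destination IPs
    "scan.dst_port": 4128,
    "scan.bytes": (2_000, math.inf),
    "scan.packets": (50, math.inf),
    "scan.protocol": "TCP",
    "follow.dst_ips": (1, math.inf),
    "follow.dst_port": 137,
    "follow.bytes": (100, math.inf),
    "follow.packets": (1, math.inf),
    "follow.protocol": "UDP",
}


def in_malicious_space(observed: Dict[str, Value], conditions: Dict[str, Condition]) -> bool:
    """True only if every variable satisfies its condition."""
    for var, cond in conditions.items():
        value = observed.get(var)
        if value is None:
            return False
        if isinstance(cond, tuple):
            low, high = cond
            if isinstance(value, str) or not low <= value <= high:
                return False
        elif value != cond:
            return False
    return True
```

Requiring every condition to hold is what makes the space "unique": the same port 4128 scan without the follow-up port 137 traffic does not match.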

  15. Comparing 3 Detections in Parallel
      • The network seems to update its SMTP servers every few hours; this should be taken into account
      • Spikes in DNS traffic may be attributed to these seasonal updates
      • Because of an authentication protocol used by some older SMTP servers (ident, port 113), port 113 traffic mirrors SMTP traffic on a smaller scale. Taken together, both spike at the same relative ratio, which helps distinguish normal vector space from malicious space and helps define the conditions of malicious characterization vectors
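The SMTP / port 113 relationship suggests a simple ratio test: when both spike together at roughly the usual ratio, the spike is consistent with normal space. The function, the baseline ratio, and the 50% tolerance are illustrative assumptions.

```python
def ratio_consistent(smtp: float, ident: float, baseline_ratio: float,
                     tolerance: float = 0.5) -> bool:
    """True when port-113 volume tracks SMTP volume at roughly the usual ratio.

    baseline_ratio is the normal port-113 : SMTP ratio (e.g. learned from history);
    tolerance is the allowed relative departure from it (an assumed value)."""
    if smtp <= 0:
        return ident <= 0             # no SMTP traffic: expect no port-113 traffic either
    ratio = ident / smtp
    return abs(ratio - baseline_ratio) <= tolerance * baseline_ratio


print(ratio_consistent(smtp=10_000, ident=1_000, baseline_ratio=0.1))  # True: both spiked together
print(ratio_consistent(smtp=10_000, ident=20, baseline_ratio=0.1))     # False: SMTP spike alone
```

A check like this could serve as one of the conditions in a characterization vector, so that a scheduled SMTP update is not mistaken for an attack.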

  16. Detecting a Malicious Vector
      • A degree of maliciousness at any moment can be calculated as the percentage by which the current traffic is closer to the malicious conditions than the normal (predicted) values are
      • So any current network traffic vector (point) has a degree of maliciousness for each unique vector of malicious conditions
        • 0% = completely normal/predicted
        • >100% = completely within malicious space
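One way to read the percentage definition, assuming each malicious condition has a numeric boundary T: with prediction p and current value x, the degree is (x - p) / (T - p), the share of the gap from prediction to boundary that the traffic has already covered. Taking the minimum across variables (since all conditions of a malicious vector must hold at once) is an additional assumption.

```python
from typing import Dict


def degree(current: float, predicted: float, boundary: float) -> float:
    """Share of the prediction-to-boundary gap already covered by the current value.

    0.0 at the prediction, 1.0 on the malicious boundary, above 1.0 beyond it."""
    gap = boundary - predicted
    if gap == 0:
        raise ValueError("boundary must differ from the prediction")
    return (current - predicted) / gap


def vector_degree(current: Dict[str, float], predicted: Dict[str, float],
                  boundary: Dict[str, float]) -> float:
    """Degree for a whole malicious vector: all conditions must hold, so use the minimum."""
    return min(degree(current[k], predicted[k], boundary[k]) for k in boundary)


print(f"{degree(300.0, 100.0, 500.0):.0%}")   # 50%
```

The signed gap also handles boundaries below the prediction, for example a malicious condition defined by traffic dropping toward zero.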

  17. Anomalous but not Malicious
      • What if the data falls outside the threshold of deviation (out of normal space) but does not fall into malicious space? This is undefined space
      • Any action taken in these cases would not be based on previous knowledge, so no automatic action should be taken. Instead, a warning alarm should go off, and a careful analysis and report of this data should be stored so that it can be studied later
      • If this anomaly leads into malicious space, the malicious space may need to be expanded to include the newly detected anomaly
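The three outcomes (normal, undefined, malicious) can be sketched as a small classifier that only warns and records in the undefined case. The inputs are assumptions: an aberrance flag from the Holt-Winters band and the highest degree of maliciousness across all malicious vectors, where 1.0 marks the malicious boundary. The class names are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Space(Enum):
    NORMAL = "normal"
    UNDEFINED = "undefined"
    MALICIOUS = "malicious"


@dataclass
class SpaceClassifier:
    """Hypothetical classifier; undefined-space events are stored for later study."""
    stored: List[Dict[str, float]] = field(default_factory=list)

    def classify(self, vector: Dict[str, float], aberrant: bool, max_degree: float) -> Space:
        if not aberrant:
            return Space.NORMAL                 # inside the predicted band
        if max_degree >= 1.0:
            return Space.MALICIOUS              # on or past a malicious boundary
        # Anomalous but not malicious: warn and record, but take no automatic action.
        logging.warning("anomaly in undefined space: %s", vector)
        self.stored.append(dict(vector))
        return Space.UNDEFINED
```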

  18. Anomalous but not Malicious (continued)
      • Each non-malicious anomalous event should be stored and given a manual maliciousness rating later
      • This rating can then be incorporated into all related malicious variable conditions
      • The detection conditions would then be continually updated with new anomalous data, simply by the administrator rating how malicious a specific event was to their network and in which way it was malicious (DoS, virus, etc.). This makes updating very easy and does not rely on outside sources
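A sketch of how an administrator's rating might be folded back into the conditions, assuming ratings on a 0-1 scale, an arbitrary 0.5 cut-off, and malicious conditions stored as inclusive numeric ranges per kind of maliciousness. Widening the ranges to cover the rated event is one possible update rule, not one the slides specify; the function name is hypothetical.

```python
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]   # variable -> inclusive malicious range


def incorporate_rating(conditions: Dict[str, Ranges], event: Dict[str, float],
                       kind: str, rating: float, min_rating: float = 0.5) -> None:
    """Fold an administrator's rating of a stored event into the malicious conditions.

    kind names the way the event was malicious (e.g. "DoS", "virus"). Events rated at
    least min_rating widen that kind's ranges so the event falls inside them."""
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be between 0 and 1")
    if rating < min_rating:
        return                                     # judged not malicious enough: no change
    ranges = conditions.setdefault(kind, {})
    for var, value in event.items():
        low, high = ranges.get(var, (value, value))
        ranges[var] = (min(low, value), max(high, value))
```

Because the update needs only the stored event and the administrator's judgment, the conditions stay local to the network being protected.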

  19. Future Work / Implementation
      • 3+ levels of detection:
        • Basic: checking the maliciousness rating of one variable
        • Intermediate: checking the maliciousness of vectors of variables
        • Advanced: checking vectors of maliciousness ratings of multiple detection vectors in parallel
      • This can continue to be scaled to whatever level of complexity is necessary
      • Each detection vector need only be checked once per time step (seconds, minutes, etc.), depending on how well the server can perform
      • Detection precision increases with smaller time steps
      • Only one time step of data and vectors needs to be stored in memory

  20. Future Work / Implementation
      • Computation per time step equals the average computation for one vector multiplied by the number of detection vectors
      • The memory requirement equals the traffic data for one time step plus the average vector size multiplied by the number of vectors
      • Based on processor speed, memory space, and the number of characterizations being detected, an optimal time step could be computed
      • Future work could involve testing the plausibility of this system in high-speed, high-traffic-volume situations
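Written out, the two cost estimates on this slide are as follows, with N_v detection vectors, average computation c̄ per vector, average vector size s̄, and D(Δt) the traffic data held for one time step. The feasibility bounds add assumptions not on the slide: a processor throughput P, available memory M_avail, and traffic data that grows roughly linearly with the step length at rate r.

```latex
\begin{aligned}
C_{\text{step}} &= N_v\,\bar{c} \\
M_{\text{step}} &= D(\Delta t) + N_v\,\bar{s} \;\approx\; r\,\Delta t + N_v\,\bar{s} \\
\frac{N_v\,\bar{c}}{P} &\le \Delta t \le \frac{M_{\text{avail}} - N_v\,\bar{s}}{r} \\
\Delta t^{*} &= \frac{N_v\,\bar{c}}{P}
\end{aligned}
```

The lower bound says one step's computation must finish within the step; the upper bound says one step's data plus all vectors must fit in memory. Since smaller steps give better precision, the optimal step under these assumptions is the lower bound, provided it does not exceed the memory bound.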
