One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses

Presentation Transcript


  1. One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses Krysta Svore Joint Work with Katherine Heller, Angelos Keromytis, and Salvatore Stolfo

  2. Anomaly Detection • We would like to be able to detect anomalous behavior by monitoring Windows registry queries • We studied a host-based intrusion detection system that applies a one class support vector machine algorithm within the Registry Anomaly Detection (RAD) system

  3. Key Results • We show that the Probabilistic Anomaly Detection (PAD) algorithm dramatically outperforms the OCSVM algorithm because of its hierarchical prior • To improve the performance of the OCSVM algorithm, we need to know how to construct well-defined kernels

  4. Host-based Intrusion Detection Systems (IDS) • The most frequently attacked operating system is Microsoft Windows • Current countermeasures • Virus scanners and security patches • Cannot combat unknown attacks • Require frequent updates • Host-based IDS • Detects intrusions by monitoring system accesses • Can be combined with data mining techniques for improved detection ability

  5. Our Approach • Use the Registry Anomaly Detection (RAD) system to monitor Windows registry queries • Apply a One Class Support Vector Machine (OCSVM) algorithm within the RAD system • Compare our system with previous work using the Probabilistic Anomaly Detection (PAD) algorithm within the RAD system

  6. The Windows Registry • Stores configuration settings for programs, security information, user profiles, and other system information • Consists of registry keys and their associated values • Programs query it for information by accessing a specific registry key • Example access: • Process: EXPLORER.EXE • Query: OpenKey • Key: HKCR\CLSID\B41DB860-8EE4-11D2-9906-E49FADC173CA\shellex\MayChangeDefaultMenu • Response: SUCCESS • ResultValue: NOTFOUND

  7. The Windows Registry • Activity varies from several accesses per minute to several thousand accesses per minute • Almost all programs use the registry

  8. Registry Anomaly Detection (RAD) System • Three components: • Audit Sensor • Analyzes registry accesses • Model Generator • Develops a model of normal behavior • Anomaly Detector • Classifies each new registry access as normal or as an attack according to the generated model

  9. Probabilistic Anomaly Detection (PAD) Algorithm • Developed by E. Eskin et al., 2002 • Relies on consistency checks over normal data and labels a record anomalous if it fails any one of these checks • First-order consistency check • Verifies that a value is consistent with the observed values of that feature in the normal data set • 5 checks in total • Second-order consistency check • Determines the conditional probability of a feature value given another feature value • 20 checks in total
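
The check counts follow from the five record fields shown on slide 6: one first-order check per field, and one second-order check per ordered pair of distinct fields (5 × 4 = 20). The sketch below is a minimal illustration, assuming each record is a tuple of those five fields. It enumerates the checks and applies them in a simplified "was this value (or value pair) ever seen in normal data?" form. PAD itself scores these checks with the probability estimates on the next slide rather than a seen/unseen test, and the function names and toy records here are hypothetical.

```python
from itertools import permutations

# Field names taken from the example registry record on slide 6.
FIELDS = ("Process", "Query", "Key", "Response", "ResultValue")

# First-order checks: one per field (5 total).
FIRST_ORDER = list(range(len(FIELDS)))
# Second-order checks: one per ordered pair (i, j), i != j, i.e. the value of
# field i conditioned on the value of field j (5 * 4 = 20 total).
SECOND_ORDER = list(permutations(range(len(FIELDS)), 2))


def train(records):
    """Collect the values and value pairs observed in attack-free records."""
    seen_single = {i: set() for i in FIRST_ORDER}
    seen_pair = {p: set() for p in SECOND_ORDER}
    for rec in records:
        for i in FIRST_ORDER:
            seen_single[i].add(rec[i])
        for i, j in SECOND_ORDER:
            seen_pair[(i, j)].add((rec[i], rec[j]))
    return seen_single, seen_pair


def failed_checks(model, rec):
    """Names of the checks a record fails (simplified seen/unseen test)."""
    seen_single, seen_pair = model
    failures = [FIELDS[i] for i in FIRST_ORDER if rec[i] not in seen_single[i]]
    failures += [
        f"{FIELDS[i]}|{FIELDS[j]}"
        for i, j in SECOND_ORDER
        if (rec[i], rec[j]) not in seen_pair[(i, j)]
    ]
    return failures


def is_anomalous(model, rec):
    # A record is flagged if it fails any single check.
    return bool(failed_checks(model, rec))


# Hypothetical attack-free training records.
normal = [
    ("EXPLORER.EXE", "OpenKey", r"HKCR\CLSID\Example\shellex", "SUCCESS", "NOTFOUND"),
    ("EXPLORER.EXE", "QueryValue", r"HKCU\Software\Example", "SUCCESS", "1"),
]
model = train(normal)
print(len(FIRST_ORDER), len(SECOND_ORDER))  # 5 20
print(is_anomalous(model, normal[0]))       # False
print(is_anomalous(model, ("EVIL.EXE",) + normal[0][1:]))  # True
```

A record whose individual values were all seen can still fail a second-order check if that combination of values never occurred together in the normal data.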

  10. PAD • Probability of an observed element • $P(X = i) = C\,\frac{N_i + \alpha}{k\alpha + N}$ • where $N$ is the total number of observations, $N_i$ is the number of observations of symbol $i$, $\alpha$ is the "pseudo count" for each observed symbol, and $k$ is the number of observed symbols • Probability of an unobserved element • $P(X = i) = \frac{1 - C}{L - k}$, where $L$ is the number of possible symbols • Scaling factor • $C = \frac{N}{N + L - k}$
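
The estimator on this slide can be written directly as code. This is a minimal sketch for a single feature, assuming the observations are given as a list of symbols and that $\alpha$ and $L$ (with $L > k$) are supplied by the caller; the function and parameter names are illustrative. Because the $k$ observed symbols share a total mass of $C$ and the $L - k$ unobserved symbols share $1 - C$, the probabilities sum to one.

```python
from collections import Counter


def pad_estimator(observations, alpha, L):
    """Return a function giving P(X = i) under the slide's estimator.

    observations: iterable of observed symbols for one feature
    alpha: pseudo count added to each observed symbol
    L: number of possible symbols (assumed larger than the number observed)
    """
    counts = Counter(observations)
    N = sum(counts.values())  # total number of observations
    k = len(counts)           # number of distinct observed symbols
    if L <= k:
        raise ValueError("L must be larger than the number of observed symbols")
    C = N / (N + L - k)       # scaling factor

    def prob(symbol):
        if symbol in counts:
            # Observed symbol: C * (N_i + alpha) / (k * alpha + N)
            return C * (counts[symbol] + alpha) / (k * alpha + N)
        # Unobserved symbol: remaining mass (1 - C) spread over L - k symbols
        return (1 - C) / (L - k)

    return prob


# Toy example: 4 observations of 2 distinct symbols out of 10 possible.
p = pad_estimator(["a", "a", "a", "b"], alpha=1.0, L=10)
print(round(p("a"), 4), round(p("b"), 4), round(p("z"), 4))  # 0.2222 0.1111 0.0833
```

Here $C = 4/12$, so the two observed symbols share one third of the probability mass and each of the eight unseen symbols receives $1/12$.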

  11. An Example • LAN • We expect the probability of seeing a previously unobserved IP address to be small • Peering Center • We expect the probability of seeing a previously unobserved IP address to be almost 1

  12. One Class Support Vector Machine (OCSVM) Algorithm • Maps input data into a high-dimensional feature space • Iteratively finds the hyperplane that separates the training data from the origin with maximal margin • Solves an optimization problem to find the rule $f$ with maximal margin • $f(x) = \langle w, x \rangle + b$ • If $f(x) < 0$, label $x$ as anomalous

  13. OCSVM

  14. OCSVM: Kernels • Finding $f$ is equivalent to solving the dual quadratic programming problem • $\min_{\alpha} \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$ subject to $0 \le \alpha_i \le \frac{1}{\nu l}$ and $\sum_i \alpha_i = 1$ • where $\alpha_i$ is a Lagrange multiplier, $\nu$ is a parameter that controls the trade-off between the distance of the hyperplane from the origin and the number of training points allowed to fall on the origin side of it, and $l$ is the number of points in the training dataset • The kernel function projects input vectors into a feature space, allowing nonlinear decision boundaries • Feature map: $\Phi: X \to \mathbb{R}^N$ • Kernel function: $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
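
To make the dual problem concrete, the sketch below solves it for a small data set with a simple pairwise update in the spirit of sequential minimal optimization (SMO). It starts from the feasible point $\alpha_i = 1/l$, repeatedly shifts weight between two multipliers so that $\sum_i \alpha_i$ stays 1 and each $\alpha_i$ stays in $[0, 1/(\nu l)]$, and then reads the offset $\rho$ off the margin support vectors. It assumes the standard $\nu$-formulation with decision function $f(x) = \sum_i \alpha_i K(x_i, x) - \rho$, i.e. $b = -\rho$ in the slide's notation. It is a teaching sketch with hypothetical function names, not a claim about the solver used in these experiments; a real system would rely on an established library such as LIBSVM.

```python
import math


def rbf(x, y, sigma2=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma2))


def train_ocsvm(X, nu=0.1, kernel=rbf, sweeps=50, tol=1e-10):
    """Approximately solve min_a 1/2 sum_ij a_i a_j K(x_i, x_j)
    s.t. 0 <= a_i <= 1/(nu*l), sum_i a_i = 1, by pairwise updates."""
    l = len(X)
    ub = 1.0 / (nu * l)                  # box upper bound 1/(nu*l)
    K = [[kernel(xi, xj) for xj in X] for xi in X]
    a = [1.0 / l] * l                    # feasible start: sums to 1, inside box
    g = [sum(K[i][j] * a[j] for j in range(l)) for i in range(l)]  # gradient K a
    for _ in range(sweeps):
        changed = False
        for i in range(l):
            for j in range(i + 1, l):
                eta = K[i][i] + K[j][j] - 2 * K[i][j]
                if eta <= tol:
                    continue
                # Move t from a_j to a_i: unconstrained optimum of the 1-D quadratic.
                t = (g[j] - g[i]) / eta
                # Clip so both multipliers stay inside [0, ub].
                t = max(t, -a[i], a[j] - ub)
                t = min(t, ub - a[i], a[j])
                if abs(t) <= tol:
                    continue
                a[i] += t
                a[j] -= t
                for m in range(l):
                    g[m] += t * (K[m][i] - K[m][j])
                changed = True
        if not changed:
            break
    # rho: gradient value at free support vectors (0 < a_i < ub); fall back
    # to all support vectors if none lie strictly inside the box.
    free = [g[i] for i in range(l) if tol < a[i] < ub - tol]
    sv = free or [g[i] for i in range(l) if a[i] > tol]
    rho = sum(sv) / len(sv)

    def f(x):
        # f(x) = sum_i a_i K(x_i, x) - rho; f(x) < 0 means x is labeled anomalous.
        return sum(ai * kernel(xi, x) for ai, xi in zip(a, X) if ai > tol) - rho

    return a, f


# Toy training data: a tight grid around the origin (illustrative only).
points = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
alphas, f = train_ocsvm(points, nu=0.1)
print(f((5.0, 5.0)) < 0)  # True: a point far from the training data is anomalous
```

The constraint $\alpha_i \le 1/(\nu l)$ is what lets $\nu$ bound the fraction of training points that end up on the origin side of the hyperplane.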

  15. Efficiency Comparison • Complexity of PAD • Time: $O(v^2 R^2)$ • Space: $O(vR^2)$ • where $v$ is the number of unique record values and $R$ is the number of record components • Complexity of OCSVM • Time: $O(dL^3)$ • Space: $O(d(L+T))$ • where $d$ is the number of dimensions, $L$ is the number of records in the training dataset, and $T$ is the number of records in the test dataset

  16. Experimental Data • Collected training data on Windows NT 4.0 • Used approximately 500,000 attack-free records to train the system • Tested the system on approximately 300,000 records, 2,000 of which were labeled as attacks

  17. Experiments • Used three common kernels • Linear: $K(x, y) = x \cdot y$ • Polynomial: $K(x, y) = (x \cdot y + 1)^d$, where $d$ is the degree of the polynomial • Gaussian: $K(x, y) = e^{-\lVert x - y \rVert^2 / (2\sigma^2)}$, where $\sigma^2$ is the variance • Used two types of feature vectors • Binary • One dimension for every unique entry of each of the five record values • Frequency-based • Each feature is the number of occurrences of the corresponding record component in the training set
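
The three kernels and the frequency-based encoding can be sketched as follows. The kernels follow the formulas on this slide. The encoder assumes each record is a tuple of the five fields from slide 6 and maps it to the training-set count of each of its field values (0 for a value never seen in training). Function names and the toy records are hypothetical, and the binary encoding is not shown.

```python
import math
from collections import Counter


def linear(x, y):
    # x . y
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, d=2):
    # (x . y + 1)^d, where d is the degree of the polynomial
    return (linear(x, y) + 1) ** d


def gaussian(x, y, sigma2=1.0):
    # exp(-||x - y||^2 / (2 sigma^2)), where sigma2 is the variance
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma2))


def frequency_encoder(training_records):
    """Build per-field value counts from the training set."""
    n_fields = len(training_records[0])
    counts = [Counter(rec[i] for rec in training_records) for i in range(n_fields)]

    def encode(rec):
        # One dimension per record field: how often its value occurred in training.
        return [counts[i][v] for i, v in enumerate(rec)]

    return encode


records = [
    ("EXPLORER.EXE", "OpenKey", "KeyA", "SUCCESS", "NOTFOUND"),
    ("EXPLORER.EXE", "OpenKey", "KeyB", "SUCCESS", "1"),
    ("WINWORD.EXE", "QueryValue", "KeyA", "SUCCESS", "NOTFOUND"),
]
encode = frequency_encoder(records)
x = encode(records[0])
y = encode(("EVIL.EXE", "OpenKey", "KeyC", "FAILURE", "NOTFOUND"))
print(x, y)  # [2, 2, 2, 3, 2] [0, 2, 0, 0, 2]
print(linear(x, y), polynomial(x, y, d=2), gaussian(x, y))
```

Unseen field values encode as 0, so a record with many novel values lies close to the origin of the frequency feature space.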

  18. Dimensionality of Binary and Frequency-based Feature Vectors • Binary feature vector • Dimension = (number of possible entries for feature value 1) × (number of possible entries for feature value 2) × … × (number of possible entries for feature value 5) • Frequency-based feature vector • Dimension = 5
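
To see how differently the two encodings scale, the sketch below computes both dimensions for a set of records. It takes the binary dimension as the product of the number of distinct entries per field, as defined on this slide, but counts only entries that appear in the given records (the slide's "possible entries" could be more). The function names and toy records are hypothetical.

```python
import math


def binary_dimension(records):
    # Product over the fields of the number of distinct entries observed.
    n_fields = len(records[0])
    return math.prod(len({rec[i] for rec in records}) for i in range(n_fields))


def frequency_dimension(records):
    # One count feature per field, independent of how many entries exist.
    return len(records[0])


records = [
    ("EXPLORER.EXE", "OpenKey", "KeyA", "SUCCESS", "NOTFOUND"),
    ("EXPLORER.EXE", "OpenKey", "KeyB", "SUCCESS", "1"),
    ("WINWORD.EXE", "QueryValue", "KeyA", "SUCCESS", "NOTFOUND"),
]
print(binary_dimension(records), frequency_dimension(records))  # 16 5
```

Even three records give 2 × 2 × 2 × 1 × 2 = 16 binary dimensions; with many distinct processes and registry keys the product grows rapidly, while the frequency-based vector stays at 5.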

  19. Evaluation • Used two statistics to compare the system's accuracy • Detection rate • Percentage of attack records that are correctly identified • False positive rate • Percentage of normal records that are mislabeled as anomalous • Plotted results as a Receiver Operating Characteristic (ROC) curve • Plots the percentage of true positives (detection rate) against the percentage of false positives
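
The two statistics and the ROC curve can be computed from anomaly scores as follows. This is a minimal sketch assuming each test record has a score, where larger means more anomalous, and a boolean attack label; sweeping the threshold over the observed scores yields one (false positive rate, detection rate) point per threshold. The function names, scores, and labels are made up for illustration.

```python
def rates(scores, is_attack, threshold):
    """Detection rate and false positive rate when score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    attacks = sum(is_attack)
    normals = len(is_attack) - attacks
    tp = sum(f and a for f, a in zip(flagged, is_attack))      # attacks caught
    fp = sum(f and not a for f, a in zip(flagged, is_attack))  # normals flagged
    detection_rate = tp / attacks if attacks else 0.0
    false_positive_rate = fp / normals if normals else 0.0
    return detection_rate, false_positive_rate


def roc_points(scores, is_attack):
    """(false positive rate, detection rate) pairs, one per distinct threshold."""
    points = []
    for t in sorted(set(scores), reverse=True):
        dr, fpr = rates(scores, is_attack, t)
        points.append((fpr, dr))
    return points


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, False, True, False, False, False]
print(rates(scores, labels, 0.5))  # (1.0, 0.25)
print(roc_points(scores, labels))
```

Lowering the threshold can only raise both rates, so the points trace a monotone curve from the lower left toward (1.0, 1.0).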

  20. Results

  21. Results

  22. Results • The PAD algorithm outperforms the OCSVM algorithm because it uses a hierarchical prior to estimate the probabilities • It estimates the likelihood of encountering a previously unseen feature value • The ability of OCSVM to detect anomalies depends on the choice of kernel • A novel, well-defined kernel that accounts for highly discriminative information is needed

  23. Conclusions • Registry activity provides a good platform for anomaly detection • PAD is more reliable than OCSVM • There is currently no known way to learn an optimal kernel • By analyzing different algorithms, we can identify what the kernel definition needs to capture

  24. Future Work • Improve kernel definition to incorporate more discriminative information • Update the model as new data is labeled • Requires an efficient way of remodeling the data over time • Test our system on the UNIX platform