
Data Mining Based Intrusion Detection System

Data Mining Based Intrusion Detection System. Krishna C Surendra Babu. Papers: A Data Mining Framework for Building Intrusion Detection Models (Wenke Lee, Salvatore J. Stolfo) - Research supported in part by grants from DARPA


Presentation Transcript


  1. Data Mining Based Intrusion Detection System Krishna C Surendra Babu

  2. Papers: • A Data Mining Framework for Building Intrusion Detection Models (Wenke Lee, Salvatore J. Stolfo) - Research supported in part by grants from DARPA • Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g

  3. Intrusion Detection System: • Intrusion Detection Techniques: Anomaly Detection, Misuse Detection • Attack categories: DOS, Probing, Unauthorized access to local super user (U2R), Unauthorized access from a remote machine (R2L)

  4. Requirements: • Reliable • Extensible • Easy to manage • Low maintenance cost

  5. A Data Mining Framework for Building Intrusion Detection Models • Data Mining: extracting ("mining") knowledge from large amounts of data. • Data Warehouse: a repository of information collected from multiple sources.

  6. A Data Mining Framework for Building Intrusion Detection Models • Why Data Mining? • The dataset is large. • Constructing an IDS manually is expensive and slow. • Updates are frequent because new intrusions appear often.

  7. Challenges for Data Mining in building IDS • Develop techniques to automate the knowledge-intensive process of feature selection. • Customize the general algorithms to incorporate domain knowledge so that only relevant patterns are reported. • Compute detection models that are accurate and efficient at run time.

  8. Mining the data • Dataset Types: • Network based dataset • Host based dataset • Build the IDS by mining the audit records. • When an attack is detected, raise an alarm to the administration system.

  9. Framework of Building IDS • Preprocessing: summarize the raw data. • Association rule mining. • Find sequence patterns (frequent episodes) based on the association rules. • Construct new features based on the sequence patterns. • Construct classifiers on different sets of features.

  10. Preprocessing • Summarize raw data into high-level events, e.g., network connections described by time, duration, service, host, destination • Packet filtering tools such as Bro and NFR can be used.
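
As a rough illustration of the summarization step, the sketch below groups packet records into per-connection summaries. The packet tuple layout, the grouping key, and the field names are assumptions made for this example; they are not the output format of Bro or NFR.

```python
from collections import defaultdict


def summarize_connections(packets):
    """Group packets by (src, dst, service) and summarize each group.

    Each packet is a hypothetical tuple: (timestamp, src, dst, service, n_bytes).
    """
    groups = defaultdict(list)
    for ts, src, dst, service, n_bytes in packets:
        groups[(src, dst, service)].append((ts, n_bytes))

    records = []
    for (src, dst, service), items in groups.items():
        times = [ts for ts, _ in items]
        records.append({
            "time": min(times),                     # start of the connection
            "duration": max(times) - min(times),    # first to last packet
            "service": service,
            "host": src,
            "destination": dst,
            "bytes": sum(b for _, b in items),
        })
    return sorted(records, key=lambda r: r["time"])


# Made-up sample packets.
packets = [
    (0.0, "10.0.0.1", "10.0.0.9", "http", 100),
    (0.5, "10.0.0.1", "10.0.0.9", "http", 300),
    (1.0, "10.0.0.2", "10.0.0.9", "telnet", 50),
]
connection_records = summarize_connections(packets)
```

Each resulting record is one high-level event that later mining steps can treat as a row in a table.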

  11. Classification • Classify each audit record into one of a discrete set of possible categories, normal or a particular kind of intrusion.

  12. Association rule mining • Searches for interesting relationships among attributes in a given data set, i.e., derives multi-feature (attribute) correlations from a database table.
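
To make the idea of attribute correlations concrete, here is a minimal sketch that computes the support and confidence of one candidate rule over a small table of records. The attribute names and values are made-up sample data, and the brute-force counting stands in for a real mining algorithm.

```python
def support(records, itemset):
    """Fraction of records containing every (attribute, value) pair in itemset."""
    hits = sum(
        1 for r in records
        if all(r.get(k) == v for k, v in itemset.items())
    )
    return hits / len(records)


def confidence(records, lhs, rhs):
    """support(lhs and rhs) / support(lhs); 0.0 when lhs never occurs."""
    s_lhs = support(records, lhs)
    if s_lhs == 0:
        return 0.0
    return support(records, {**lhs, **rhs}) / s_lhs


# Made-up connection records.
records = [
    {"service": "http", "flag": "SF"},
    {"service": "http", "flag": "SF"},
    {"service": "http", "flag": "REJ"},
    {"service": "telnet", "flag": "SF"},
]

# Rule: service = http -> flag = SF
rule_support = support(records, {"service": "http", "flag": "SF"})
rule_confidence = confidence(records, {"service": "http"}, {"flag": "SF"})
```

A rule is usually kept only when both values exceed user-chosen minimum thresholds.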

  13. Sequence Pattern Mining • Frequent Episodes. • X, Y -> Z, [c, s, w] • When itemsets X and Y occur, Z follows within time window w, with confidence c and support s.
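
The sketch below estimates c and s for one episode rule over a time-ordered event list. It deliberately simplifies: X, Y and Z are single items rather than itemsets, a window of width w opens at every occurrence of X, and support is measured against the total number of events. These definitions are assumptions for illustration and may differ from the ones used in the paper.

```python
def episode_rule_stats(events, x, y, z, w):
    """Estimate (support, confidence) of the serial episode rule x, y -> z.

    events: list of (timestamp, item), sorted by timestamp.
    body window = x followed by y within w time units of x
    full window = x followed by y, then z, all within w time units of x
    support     = full windows / number of events      (simplifying assumption)
    confidence  = full windows / body windows
    """
    body = full = 0
    for i, (t0, item) in enumerate(events):
        if item != x:
            continue
        # Items that occur after this x and inside its window.
        window = [it for t, it in events[i + 1:] if t - t0 <= w]
        if y not in window:
            continue
        body += 1
        after_y = window[window.index(y) + 1:]
        if z in after_y:
            full += 1
    support = full / len(events) if events else 0.0
    confidence = full / body if body else 0.0
    return support, confidence


# Made-up event stream.
events = [
    (0, "A"), (1, "B"), (2, "C"),
    (10, "A"), (11, "B"),
    (20, "A"), (21, "B"), (22, "C"),
]
s, c = episode_rule_stats(events, "A", "B", "C", w=3)
```

In this sample, "A then B" occurs three times and is followed by "C" inside the window twice.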

  14. Feature Construction • Feature extraction is the process of determining which evidence from raw audit data is most useful for analysis. • Construct new features from the frequent episodes. • Some features are closely related to each other; combine them. • Some frequent episodes may suggest interesting new features.

  15. Build Model (classifier) • Build different classifiers for different attacks.

  16. Experiments • The DARPA data • 4 GB of compressed tcpdump data covering 7 weeks of network traffic. • Contains 4 main categories of attacks • DOS: denial of service, e.g., ping-of-death, syn flood • R2L: unauthorized access from a remote machine, e.g., password guessing • U2R: unauthorized access to local super user privileges by a local unprivileged user, e.g., buffer overflow • PROBING: e.g., port-scan, ping-sweep

  17. Results • Trained on the 7 weeks of labeled data and tested on 2 weeks of unlabeled data. • The test data contains 14 attack types that do not appear in the training data. • Comparing 4 methods: • Columbia: the IDS developed according to the framework introduced above • Group 1-3: three systems developed by knowledge engineering approaches.

  18. Results • Detection rate on new and old attacks. • Old attacks: attack types that occur in both training and test data. • New attacks: attack types that occur in test data only.
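
The old/new split can be computed as in the following sketch. The attack names and outcomes are invented sample data used only to show the bookkeeping; they are not the results reported in the paper.

```python
def detection_rates(test_results, training_attack_types):
    """Return (old_rate, new_rate), each computed as detected / total.

    test_results: list of (attack_type, detected) pairs for attack instances.
    An attack counts as "old" if its type appears in training_attack_types,
    otherwise as "new". A rate is None when no instance falls in that group.
    """
    counts = {"old": [0, 0], "new": [0, 0]}  # [detected, total]
    for attack_type, detected in test_results:
        key = "old" if attack_type in training_attack_types else "new"
        counts[key][1] += 1
        if detected:
            counts[key][0] += 1

    def rate(pair):
        detected, total = pair
        return detected / total if total else None

    return rate(counts["old"]), rate(counts["new"])


# Hypothetical attack types and detection outcomes.
training_types = {"syn_flood", "port_scan"}
results = [
    ("syn_flood", True),
    ("port_scan", True),
    ("port_scan", False),
    ("unseen_attack", False),
    ("unseen_attack", True),
]
old_rate, new_rate = detection_rates(results, training_types)
```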

  19. Creation and Deployment of Data Mining Based Intrusion Detection Systems in Oracle Database 10g • DAID: a database-centric architecture that leverages data mining within the Oracle RDBMS to address the challenges. • Scheduling capabilities • Alert infrastructure • Data analysis tools • Security • Scalability • Reliability

  20. Requirements for a production quality IDS • Centralized view of the data • Data transformation capabilities • Analytic and data mining methods • Flexible detector deployment, including scheduling that enables periodic model creation and distribution • Real-time detection and alert infrastructure • Reporting capabilities • Distributed processing • High system availability • Scalability with system load

  21. • Sensors • Extraction, transformation and load (ETL) • Centralized data warehousing • Automated model generation • Automated model distribution • Real-time and offline detection • Report and analysis • Automated alerts

  22. Sensors • Collect audit information: • Network traffic data • System logs on individual hosts • System calls made by processes

  23. ETL • Used for preprocessing audit streams and feature extraction • Uses SQL and user-defined functions to extract key pieces of information. Example: a windowing analytic function computes the number of HTTP connections to a given host
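
The windowed count mentioned in the example can be pictured with the sketch below, written in Python rather than SQL. It is an analogue of the idea, not the query from the paper; the record layout and the 2-second window are assumptions.

```python
from collections import defaultdict, deque


def http_count_in_window(connections, window_seconds):
    """For each connection, count HTTP connections to the same destination
    host within the preceding window_seconds (including the current one).

    connections: list of (timestamp, dst_host, service), sorted by timestamp.
    """
    recent = defaultdict(deque)  # dst_host -> timestamps of recent HTTP connections
    counts = []
    for ts, dst, service in connections:
        q = recent[dst]
        if service == "http":
            q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_seconds:
            q.popleft()
        counts.append(len(q))
    return counts


# Made-up connection records.
connections = [
    (0.0, "web1", "http"),
    (1.0, "web1", "http"),
    (1.5, "web1", "ftp"),
    (2.5, "web1", "http"),
    (2.6, "web2", "http"),
]
counts = http_count_in_window(connections, window_seconds=2.0)
```

The resulting column can be attached to each connection record as an extra feature for the detection models.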

  24. Model Generation Popular Techniques for misuse and anomaly detection: • Association Rules • Clustering • Support Vector Machines • Supervised learning methods for Classification • Decision Trees

  25. Model build functionality: • The DBMS_DATA_MINING PL/SQL package is used to train linear SVM anomaly and misuse detection models. • Test dataset • Probing • Denial of service • Unauthorized access to a local superuser (U2R) • Unauthorized access from a remote machine (R2L) (37 subclasses of attacks under the 4 generic categories)

  26. Misuse Detection Problem • Anomaly Detection Problem • Accuracy of the system 92.1%

  27. Periodic model updates as new data accumulates • The model is rebuilt when performance falls below a predefined level
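
A rebuild trigger of this kind might look like the following sketch. The use of accuracy as the performance measure, the function name, and the 0.9 threshold are assumptions; the slides do not say which metric or level is used.

```python
def needs_rebuild(recent_outcomes, min_accuracy):
    """Return True when accuracy over recent labeled cases is below min_accuracy.

    recent_outcomes: list of booleans, True where the model's prediction
    matched the known label.
    """
    if not recent_outcomes:
        return False  # nothing to evaluate yet
    accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return accuracy < min_accuracy


# Eight correct and two wrong predictions: accuracy 0.8, below the 0.9 level.
rebuild = needs_rebuild([True] * 8 + [False] * 2, min_accuracy=0.9)
```

A scheduler would call such a check after each batch of newly labeled audit data and start a model build when it returns True.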

  28. Model Distribution • Real Application Clusters (RAC)

  29. Detection • Real time / offline • Audit data are classified as attack or not by the misuse detection SVM model.

  30. Functional index on the probability of a case being an attack or not • The query returns all cases in audit_data with probability greater than 0.5 of being an attack

  31. Combination of multiple models • The query returns all cases where either model1 or model2 indicates an attack with probability higher than 0.4. • In this case, when the anomaly_model classifies a case as an attack with probability greater than 0.5, the misuse_model will attempt to identify the type of attack.
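
The two combination schemes can be sketched in Python as follows. The original slides express them as SQL queries, which are not reproduced in this transcript; the toy scoring functions and case fields below are hypothetical stand-ins for the trained models.

```python
def flag_attacks(cases, model1, model2, threshold=0.4):
    """Return cases where either model's attack probability exceeds threshold."""
    return [c for c in cases if model1(c) > threshold or model2(c) > threshold]


def two_stage(case, anomaly_model, misuse_model, threshold=0.5):
    """Ask misuse_model for the attack type only when anomaly_model flags the case."""
    if anomaly_model(case) > threshold:
        return misuse_model(case)
    return "normal"


# Hypothetical stand-ins for trained models.
def toy_anomaly_model(case):
    return 0.9 if case["failed_logins"] > 3 else 0.1


def toy_volume_model(case):
    return 0.45 if case["bytes"] > 1000 else 0.05


def toy_misuse_model(case):
    return "r2l" if case["failed_logins"] > 3 else "unknown"


cases = [
    {"id": 1, "failed_logins": 5, "bytes": 10},
    {"id": 2, "failed_logins": 0, "bytes": 5000},
    {"id": 3, "failed_logins": 0, "bytes": 10},
]
flagged_ids = [c["id"] for c in flag_attacks(cases, toy_anomaly_model, toy_volume_model)]
labels = [two_stage(c, toy_anomaly_model, toy_misuse_model) for c in cases]
```

The first scheme widens coverage by accepting either model's vote, while the second uses the anomaly model as a filter so the misuse model only runs on suspicious cases.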

  32. Reports and Analysis

  33. Conclusion • Data mining techniques are very useful in intrusion detection • Manual interpretation and advice are still needed in some processing steps • More efficient on known attacks than on unknown attacks only if the training data contains all normal behavior
