Unsupervised Intrusion Detection Using Clustering Approach
Muhammet Kabukçu, Sefa Kılıç, Ferhat Kutlu, Teoman Toraman
Outline
• Introduction
• Using Clustering for Intrusion Detection
• Methodology
• Overall Summary
• Conclusion
• References
Introduction
• Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents.
• Incidents are violations or imminent threats of violation of:
  * computer security policies,
  * acceptable use policies,
  * standard security practices.
Introduction
• An intrusion detection system (IDS) is software that automates the intrusion detection process.
• IDSs primarily focus on identifying possible incidents and on detecting when an attacker has successfully compromised a system by exploiting a vulnerability in it.
Signature-Based Detection
• A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy).
• Signature-based detection is the process of comparing signatures against observed events to identify possible incidents.
• Advantage: Very effective at detecting known threats.
• Disadvantage: Ineffective at detecting previously unknown threats.
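To make the comparison of signatures against observed events concrete, here is a minimal sketch. The event fields (`service`, `username`) and the signature name are hypothetical, chosen only to mirror the telnet example above; a real IDS uses a far richer signature language.

```python
# Hypothetical signature table: every field except "name" must match the event.
SIGNATURES = [
    {"name": "telnet-root-login", "service": "telnet", "username": "root"},
]

def match_signatures(event, signatures=SIGNATURES):
    """Return the names of all signatures whose fields all match the event."""
    hits = []
    for sig in signatures:
        fields = {k: v for k, v in sig.items() if k != "name"}
        if all(event.get(k) == v for k, v in fields.items()):
            hits.append(sig["name"])
    return hits

# Hypothetical observed events.
events = [
    {"service": "telnet", "username": "root"},
    {"service": "telnet", "username": "alice"},
    {"service": "http", "username": "root"},
]
flagged = [match_signatures(e) for e in events]
```

Only the first event is flagged. An attack with no entry in `SIGNATURES` passes unnoticed, which is the disadvantage listed on the slide.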
Anomaly-Based Detection
• The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations.
• Capable of detecting previously unknown threats.
• Uses host- or network-specific profiles.
Detection by Stateful Protocol Analysis
• The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations.
• Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.
Using Clustering for Intrusion Detection
• Methods other than signature-based detection use data mining and machine learning algorithms to train on labeled network data.
• For training data, there are two major paradigms:
  * Misuse Detection
  * Anomaly Detection
• Which one to use?
Using Clustering for Intrusion Detection - Misuse Detection -
• In misuse detection, machine learning algorithms are used with labeled data.
• Network data is classified using features extracted from labeled network traffic.
• Detection models are retrained using new data that includes new types of attacks.
Using Clustering for Intrusion Detection - Anomaly Detection -
• In anomaly detection, models are built by training on normal data, and deviations from the normal model are then searched for.
• Generating purely normal data is very difficult and costly in practice.
• It is very hard to guarantee that there are no attacks during the time the traffic is collected from the network.
Using Clustering for Intrusion Detection
Misuse Detection or Anomaly Detection?
• Use a mechanism that detects intrusions by training on unlabeled data.
• Find intrusions buried within that data.
Using Clustering for Intrusion Detection
• Assumptions for the unsupervised anomaly detection algorithm:
  * The intrusions are rare with respect to normal network traffic.
  * The intrusions are different from normal network traffic.
• As a result:
  * The intrusions will appear as outliers in the data.
[Diagram: A Set of Unlabeled Data → Unsupervised Anomaly Detection Algorithm → Detected Intrusion Clusters; Connection → Comparison with Detected Clusters → Detected malicious attacks]
Using Clustering for Intrusion Detection
• The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.
Using Clustering for Intrusion Detection
Once the data is clustered, all of the instances that appear in small clusters are labeled as anomalies, because:
• The normal instances should form large clusters compared to the intrusions.
• Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster.
[Figure labels: Intrusion cluster, Normal cluster]
Methodology
• Description of the dataset
• Metric & Normalization
• Clustering Algorithm
  * Portnoy et al.
  * Y-means Algorithm
• Labeling Clusters
• Intrusion Detection
Description of the dataset
• KDD Cup 1999 Data
• Main attack categories:
  * DOS: Denial of Service (e.g. SYN flood)
  * R2L: Unauthorized access from a remote machine (e.g. guessing password)
  * U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks)
  * Probing: Surveillance and other probing (e.g. port scanning)
• In total, 24 attack types in the training data; 14 additional ones in the test data.
Metric & Normalization
• Euclidean metric (for distance computation)
• Feature normalization (to eliminate the difference in the scale of features)
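The two items above can be sketched as follows. The slide does not say which normalization is used; this sketch assumes z-score standardization (subtract the feature mean, divide by the feature standard deviation), and the function names are illustrative.

```python
import math

def standardize(rows):
    """Rescale each feature (column) to zero mean and unit standard deviation.

    Assumption: z-score normalization; the slide only says "feature normalization".
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two features on very different scales (values are made up).
raw = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
scaled = standardize(raw)
```

Without the rescaling, the second feature would dominate every distance simply because its values are about a thousand times larger.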
Clustering Algorithm (Portnoy et al.)
• Start with an empty set of clusters.
• For each instance Xi in the training set, compute its distances (d1, d2, d3, ...) to the existing clusters and select the smallest one, d1.
• If d1 < W (a predefined threshold value), Xi is assigned to that cluster.
• Otherwise, a new cluster is created and Xi is assigned to it.
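The steps above can be sketched as a single pass over the training set. This is a simplified illustration, not the authors' implementation: it assumes that the first instance placed in a cluster stays its center, and the name `fixed_width_clustering` is made up for this example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fixed_width_clustering(instances, width):
    """Single-pass clustering with a predefined width threshold W."""
    clusters = []  # each cluster: {"center": [...], "members": [...]}
    for x in instances:
        nearest, nearest_dist = None, math.inf
        for c in clusters:
            d = euclidean(x, c["center"])
            if d < nearest_dist:
                nearest, nearest_dist = c, d
        if nearest is not None and nearest_dist < width:
            nearest["members"].append(x)  # close enough: join the nearest cluster
        else:
            # Assumption: the first instance of a cluster defines its center.
            clusters.append({"center": list(x), "members": [x]})
    return clusters

data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [5.1, 5.0]]
clusters = fixed_width_clustering(data, width=1.0)
sizes = [len(c["members"]) for c in clusters]
```

The number of clusters is never specified; it falls out of the choice of `width`, so a different W can yield a very different partition of the same data.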
Clustering Algorithm (Portnoy et al.)
• Advantage: No need to know the initial no. of clusters.
• Disadvantage: W must be known in advance, and a poorly chosen W may label instances wrongly in some cases.
• However…
Clustering Algorithm (Y-means Algorithm)
• 3 main parts:
  * assigning instances to k clusters
  * splitting clusters
  * merging clusters
Clustering Algorithm (Y-means Algorithm)
1. Assigning instances to k clusters
• The instances in the dataset are assigned to k clusters, and the cluster centroids are then redefined.
• k: no. of clusters, n: no. of instances, 1 < k < n
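A minimal sketch of this first part, assuming it works like a standard k-means loop: each instance goes to its nearest centroid, and each centroid is then redefined as the mean of its members. The initial centroids are passed in by the caller; how Y-means picks them is not covered on the slide.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_and_update(instances, centroids, max_iter=100):
    """Alternate nearest-centroid assignment and centroid recomputation."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in instances:
            idx = min(range(len(centroids)),
                      key=lambda i: euclidean(x, centroids[i]))
            clusters[idx].append(x)
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                dims = len(members[0])
                new_centroids.append(
                    [sum(m[j] for m in members) / len(members) for j in range(dims)])
            else:
                new_centroids.append(list(c))  # empty cluster: keep the old centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, clusters = assign_and_update(points, [[0.0, 0.0], [10.0, 10.0]])
```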
Clustering Algorithm (Y-means Algorithm)
2. Splitting clusters
• t (normal threshold) = 2.32σ, where σ is the standard deviation.
• di is the distance of instance Xi from the cluster centroid; the region within distance t of the centroid is the confident area.
• If di > t, Xi is an outlier.
• New clusters are created first from the farthest outliers.
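The outlier test can be sketched as below. The slide does not spell out how σ is computed; this sketch assumes it is the root-mean-square distance of a cluster's members from its centroid, and `find_outliers` is a hypothetical helper name.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_outliers(members, centroid, factor=2.32):
    """Return (t, outliers), with outliers sorted farthest-first.

    Assumption: sigma is the RMS distance of the members from the centroid.
    """
    dists = [euclidean(x, centroid) for x in members]
    sigma = math.sqrt(sum(d * d for d in dists) / len(dists))
    t = factor * sigma
    far = sorted(((d, x) for d, x in zip(dists, members) if d > t),
                 key=lambda pair: pair[0], reverse=True)
    return t, [x for _, x in far]

# 19 instances at distance 1 from the centroid plus one distant instance.
members = [[1.0, 0.0] if i % 2 == 0 else [-1.0, 0.0] for i in range(19)]
members.append([10.0, 0.0])
t, outliers = find_outliers(members, centroid=[0.0, 0.0])

# The farthest outlier would seed the next new cluster.
new_centroid = outliers[0] if outliers else None
```

After a new centroid is added, the assignment step would be run again so that instances can move to the new cluster.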
Clustering Algorithm (Y-means Algorithm)
3. Merging clusters
• If an instance Xi is in the confident area of two clusters, merge these clusters back.
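One possible reading of the merge rule is sketched below: two clusters become merge candidates when some instance lies within the threshold distance of both centroids. The cluster layout (`centroid`, `threshold`, `members` keys) is an assumption made for this illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clusters_to_merge(clusters):
    """Return index pairs (i, j), i < j, of clusters sharing an instance
    that falls inside both confident areas."""
    pairs = set()
    for i, ci in enumerate(clusters):
        for x in ci["members"]:
            if euclidean(x, ci["centroid"]) > ci["threshold"]:
                continue  # x is outside its own cluster's confident area
            for j, cj in enumerate(clusters):
                if j != i and euclidean(x, cj["centroid"]) <= cj["threshold"]:
                    pairs.add((min(i, j), max(i, j)))
    return pairs

# Hypothetical clusters: the first two overlap, the third is far away.
example = [
    {"centroid": [0.0, 0.0], "threshold": 2.0, "members": [[0.0, 0.0], [1.5, 0.0]]},
    {"centroid": [3.0, 0.0], "threshold": 2.0, "members": [[3.0, 0.0], [2.5, 0.0]]},
    {"centroid": [20.0, 20.0], "threshold": 1.0, "members": [[20.0, 20.0]]},
]
to_merge = clusters_to_merge(example)
```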
Labeling Clusters
• Our first assumption: # of normal instances >> # of intrusions
• Label instances in large clusters: normal
• Label instances in small clusters: intrusion
• Starting from the largest cluster, label clusters as normal until 99% of the data is labeled as normal; label the rest as intrusion.
[Figure labels: Normal cluster, Intrusion cluster]
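The labeling rule above can be sketched as follows. One detail is an assumption: the cluster that pushes the covered share past 99% is itself still labeled normal.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Label clusters largest-first as "normal" until the requested share of
    the data is covered; label every remaining cluster "intrusion"."""
    total = sum(cluster_sizes)
    order = sorted(range(len(cluster_sizes)),
                   key=lambda i: cluster_sizes[i], reverse=True)
    labels = [None] * len(cluster_sizes)
    covered = 0
    for i in order:
        if covered < normal_fraction * total:
            labels[i] = "normal"
            covered += cluster_sizes[i]
        else:
            labels[i] = "intrusion"
    return labels

# Made-up cluster sizes: two large clusters and three very small ones.
labels = label_clusters([600, 5, 395, 3, 2])
```

Here the two large clusters cover 995 of 1005 instances, which is just over 99%, so the three small clusters are labeled as intrusions.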
Intrusion Detection
For a test instance x:
• Measure the distance to each cluster.
• Select the nearest cluster C.
• If C is a normal cluster, label x as normal.
• Otherwise, label x as intrusion.
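The detection steps translate almost directly into code. This sketch assumes each cluster is represented by a centroid and a label, and that the test instance is already on the same scale as the training data.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(x, clusters):
    """Label x with the label of its nearest cluster."""
    nearest = min(clusters, key=lambda c: euclidean(x, c["centroid"]))
    return nearest["label"]

# Hypothetical labeled clusters from the training phase.
labeled = [
    {"centroid": [0.0, 0.0], "label": "normal"},
    {"centroid": [10.0, 10.0], "label": "intrusion"},
]
first = classify([1.0, 1.0], labeled)
second = classify([9.0, 8.0], labeled)
```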
Overall Summary
• IDS & IDS Technologies
• Using Clustering for Intrusion Detection
• Methodology
  * Description of the dataset
  * Metric & Normalization
  * Clustering Algorithm
  * Labeling Clusters
  * Intrusion Detection
• Conclusion
  * Unsupervised clustering is chosen.
  * KDD Cup 1999 Data
  * The Y-means algorithm is used for creating the ID system.
References
[1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003.
[3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001.
[4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.