Intrusion Detection

Outline • Intrusion detection and computer security • Current intrusion detection approaches • Data Mining Approaches for Intrusion Detection • Summary

Intrusion Detection and Computer Security • Computer security goals: • Confidentiality, integrity, and availability • An intrusion is a set of actions aimed at compromising these security goals • Intrusion prevention (authentication, encryption, etc.) alone is not sufficient • Intrusion detection is needed

Intrusion Examples • Intrusions: Any set of actions that threaten the integrity, availability, or confidentiality of a network resource • Examples • Denial of service (DoS): attempts to starve a host of the resources it needs to function correctly • Scan: reconnaissance on the network or a particular host • Worms and viruses: replicating on other hosts • Compromises: obtaining privileged access to a host through known vulnerabilities

Intrusion Detection • Intrusion detection: The process of monitoring and analyzing the events occurring in a computer and/or network system in order to detect signs of security problems • Primary assumption: User and program activities can be monitored and modeled • Steps • Monitoring and analyzing traffic • Identifying abnormal activities • Assessing severity and raising alarm

Monitoring and Analyzing Traffic • TCPdump and Windump • Provide insight into the traffic activity on a network • ftp://ftp.ee.lbl.gov/tcpdump.tar.Z • http://netgroupserv.polito.it/windump • Ethereal • GUI to interpret all layers of the packet

Goals of Intrusion Detection System (IDS) • Detect wide variety of intrusions • Previously known and unknown attacks • Suggests need to learn/adapt to new attacks or changes in behavior • Detect intrusions in timely fashion • May need to be real-time, especially when system responds to intrusion • Problem: analyzing commands may impact response time of system • May suffice to report intrusion occurred a few minutes or hours ago

Goals of Intrusion Detection System (IDS) (2) • Present analysis in simple, easy-to-understand format • Be accurate • Minimize false positives, false negatives • False positive: An event incorrectly identified by the IDS as an intrusion when none has occurred • False negative: An event that the IDS fails to identify as an intrusion when one has in fact occurred • Minimize time spent verifying attacks, looking for them
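
The two error types defined above can be counted directly once each event has both a ground-truth label and an IDS verdict. The following is a minimal sketch under that assumption; the function name `alert_metrics` and the input layout (two parallel lists of booleans) are illustrative choices, not part of any particular IDS.

```python
def alert_metrics(truth, alerts):
    """Count IDS outcomes from two parallel lists of booleans.

    truth[i]  -- True if event i really was an intrusion
    alerts[i] -- True if the IDS flagged event i as an intrusion
    (Both names and the layout are assumptions made for this sketch.)
    """
    pairs = list(zip(truth, alerts))
    tp = sum(1 for t, a in pairs if t and a)          # intrusion, flagged
    fp = sum(1 for t, a in pairs if not t and a)      # no intrusion, flagged
    fn = sum(1 for t, a in pairs if t and not a)      # intrusion, missed
    tn = sum(1 for t, a in pairs if not t and not a)  # no intrusion, not flagged
    return {
        "false_positives": fp,
        "false_negatives": fn,
        # Fraction of benign events that raised an alert.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Fraction of real intrusions that were missed.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Both rates matter: a detector that flags everything has no false negatives but an unusable false positive rate, and the reverse holds for a detector that flags nothing.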

IDS Architecture • Sensors (agent) • to collect data and forward info to the analyzer • network packets • log files • system call traces • Analyzers (detector) • To receive input from one or more sensors or from other analyzers • To determine if an intrusion has occurred • User interface • To enable a user to view output from the system or control the behavior of the system

Signature-Based Intrusion Detection • Human analysts investigate suspicious traffic • Extract signatures • Features of known intrusions • Use pre-defined signatures to discover malicious packets • Examples • LaBrea Tarpit by Tom Liston • Snort and Snort rules by Marty Roesch

Snort by Marty Roesch • A free, open-source network intrusion detection system • Signature-based; uses a combination of rules and preprocessors • Runs on many platforms, including UNIX and Windows • www.snort.org • Preprocessors • IP defragmentation, port-scan detection, web traffic normalization, TCP stream reassembly, … • Can analyze streams, not only a single packet at a time

Problems in Signature-Based Intrusion Detection Systems • Many false positives: prone to generating alerts when there is in fact no problem • Signatures are not specific enough • A packet is not examined in context with those that precede it or those that follow • Cannot detect unknown intrusions • Rely on signatures extracted by human experts

Misuse vs. Anomaly Detection • Misuse detection: use patterns of well-known attacks to identify intrusions • Classification based on known intrusions • E.g., three consecutive login failures: password guessing. • Anomaly detection: use deviation from normal usage patterns to identify intrusions • Any significant deviations from the expected behavior are reported as possible attacks
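
The password-guessing example above is a misuse rule: a fixed pattern of a known attack. A minimal sketch of such a rule follows, assuming login events arrive in time order as `(user, succeeded)` pairs; the function name and the event layout are hypothetical.

```python
def password_guessing_alerts(events, threshold=3):
    """Return users who reach `threshold` consecutive failed logins.

    events: iterable of (user, succeeded) pairs in time order.
    This event layout is an assumption made for the sketch.
    """
    streak = {}    # user -> current run of consecutive failures
    alerted = []
    for user, succeeded in events:
        if succeeded:
            streak[user] = 0              # a success resets the run
        else:
            streak[user] = streak.get(user, 0) + 1
            if streak[user] == threshold:
                alerted.append(user)      # alert once, when the run hits the threshold
    return alerted
```

The rule fires only on the exact pattern it encodes, which is why misuse detection misses attacks that have no recorded pattern.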

Host-based vs. Network-based • According to data sources • Host-based detection: the data is collected from an individual host • Directly monitor the host data files and OS processes • Can determine exactly which host resources are the targets of a particular attack • Network-based detection: the data is traffic across the network • A set of traffic sensors within the network • Sensors can easily be hardened against attacks and hidden from attackers

OUTLINE • Intrusion detection and computer security • Current intrusion detection approaches • Data Mining Approaches for Intrusion Detection • Summary

Current Intrusion Detection Approaches: Misuse Detection • Misuse detection: • Record the specific patterns of intrusions • Monitor current audit trails (event sequences) and match them against the recorded patterns • Report the matched events as intrusions • Representation models: expert rules, colored Petri nets, state-transition diagrams, etc.

Misuse Detection Example • Expert systems: use a set of rules to describe attacks • IDES, ComputerWatch, NIDX, P-BEST, ISOA • Signature analysis: capture features of attacks in audit trail • Haystack, NetRanger, RealSecure, MuSig • State-transition analysis: use state-transition diagrams • STAT, USTAT, and NetSTAT • Other approaches • Colored Petri nets, e.g., IDIOT • Case-based reasoning, e.g., AUTOGUARD

Current Intrusion Detection Approaches: Anomaly Detection • Anomaly detection: • Establishing the normal behavior profiles • Observing and comparing current activities with the (normal) profiles • Reporting significant deviations as intrusions • Statistical measures as behavior profiles: ordinal and categorical (binary and linear)
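
The three anomaly-detection steps above (establish a profile, compare, report deviations) can be sketched for a single numeric measure. This is an illustrative sketch only: it assumes the profile is just a mean and standard deviation and that "significant" means more than `k` standard deviations away, which is one simple choice among many.

```python
from statistics import mean, pstdev


def build_profile(normal_values):
    """Establish the normal profile as (mean, standard deviation)."""
    return mean(normal_values), pstdev(normal_values)


def is_anomalous(value, profile, k=3.0):
    """Report a significant deviation: more than k standard deviations from the mean.

    k = 3.0 is an assumed default, not a value taken from the slides.
    """
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: every normal value was identical.
        return value != mu
    return abs(value - mu) / sigma > k
```

For example, a profile built from per-minute login counts would flag a minute whose count is far outside the range seen during normal operation.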

Anomaly Detection Example • Statistical methods: multivariate, temporal analysis • IDES, NIDES, EMERALD • Expert systems • ComputerWatch, Wisdom & Sense

Problems of Current Intrusion Detection Approaches • Main problems: manual and ad-hoc • Misuse detection: • Known intrusion patterns have to be hand-coded • Unable to detect any new intrusions (that have no matched patterns recorded in the system) • Anomaly detection: • Selecting the right set of system features to be measured is ad hoc and based on experience • Unable to capture sequential interrelation between events

OUTLINE • Intrusion detection and computer security • Current intrusion detection approaches • Data Mining Approaches for Intrusion Detection • Summary

Why Can Data Mining Help? • Data mining: applying specific algorithms to extract patterns from data • Normal and intrusive activities leave evidence in audit data • From the data-centric point of view, intrusion detection is a data analysis process

Why Can Data Mining Help? • Successful applications in related domains, e.g., fraud detection, fault/alarm management • Learn from traffic data • Supervised learning: learn precise models from past intrusions • Unsupervised learning: identify suspicious activities • Maintain or update models on dynamic data

Frequent Patterns • Patterns that occur frequently in a database • Mining frequent patterns – finding regularities • Process of mining frequent patterns for intrusion detection • Phase I: mine a repository of normal frequent itemsets from attack-free data • Phase II: find frequent itemsets in the last n connections and compare the patterns to the normal profile
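
The two phases above can be sketched with a brute-force itemset counter. This is a minimal sketch, not an efficient miner such as Apriori: it assumes each connection is a set of `feature=value` strings and only enumerates itemsets up to `max_size` items. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (as sorted tuples) whose relative support >= min_support."""
    if not transactions:
        return set()
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {itemset for itemset, c in counts.items() if c / n >= min_support}


def novel_patterns(normal_profile, recent, min_support, max_size=2):
    """Phase II: itemsets frequent in the recent window but absent from the profile."""
    return frequent_itemsets(recent, min_support, max_size) - normal_profile
```

Phase I is a call to `frequent_itemsets` on attack-free connections; the resulting set is the normal profile. Phase II passes the last n connections to `novel_patterns`, and anything it returns is a candidate for inspection.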

Frequent Pattern Mining in MINDS • MINDS: an IDS using data mining techniques • University of Minnesota • Summarizing attacks using association rules • {Src IP=206.163.27.95, Dest Port=139, Bytes[150, 200)} => {ATTACK}

Patterns About Alerts • Ning et al. CCS’02 • Find correlated alerts – the frequent patterns of alerts • Attack scenarios – the logical connections between alerts • A hyper-alert correlation graph approach • Use the correlation of intrusion alerts to identify high-level attacks

Association rules • Used for link analysis • E.g.: • If the number of failed login attempts (num_failed_login_attempts) and the network service on the destination (service) are features, an example rule is: • num_failed_login_attempts = 6, service = FTP => attack = DoS [1, 0.28]
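
An association rule is usually scored by two numbers, support and confidence. The sketch below computes both for a rule over labeled connection records. It assumes records are dictionaries keyed by feature name; which of the two bracketed numbers in the slide's rule is support and which is confidence is not stated in the slide, so the sketch simply returns both.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    records:    list of dicts mapping feature name -> value (assumed layout)
    antecedent: dict of feature -> required value (left-hand side)
    consequent: dict of feature -> required value (right-hand side)
    """
    def matches(record, condition):
        return all(record.get(k) == v for k, v in condition.items())

    lhs = [r for r in records if matches(r, antecedent)]
    both = [r for r in lhs if matches(r, consequent)]
    # Support: fraction of all records satisfying both sides.
    support = len(both) / len(records) if records else 0.0
    # Confidence: fraction of left-hand-side matches that also satisfy the right side.
    confidence = len(both) / len(lhs) if lhs else 0.0
    return support, confidence
```

A rule with high confidence but very low support describes a rare situation, so both values are normally thresholded when rules are mined.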

Sequential Pattern Analysis • Models sequence patterns • (Temporal) order is important in many situations • Time-series databases and sequence databases • Frequent patterns => (frequent) sequential patterns • Sequential patterns for intrusion detection • Capture the signatures for attacks in a series of packets
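
What distinguishes a sequential pattern from an itemset is that order matters. The sketch below measures how often an ordered pattern occurs across sessions, allowing gaps between the matched events. It is only the support-counting step, not a full sequential-pattern miner, and the event labels are placeholders.

```python
def contains_in_order(sequence, pattern):
    """True if the pattern's events appear in the sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # Each `any` consumes the iterator up to the match, so later pattern
    # elements can only match events that come after earlier ones.
    return all(any(event == wanted for event in remaining) for wanted in pattern)


def sequence_support(sessions, pattern):
    """Fraction of sessions that contain the pattern as an ordered subsequence."""
    if not sessions:
        return 0.0
    return sum(contains_in_order(s, pattern) for s in sessions) / len(sessions)
```

A pattern whose support is high in attack traffic and low in normal traffic is a candidate signature for that attack.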

Classification: A Two-Step Process • Model construction: describe a set of predetermined classes • Training dataset: tuples for model construction • Each tuple/sample belongs to a predefined class • Classification rules, decision trees, or math formulae • Model application: classify unseen objects • Estimate accuracy of the model using an independent test set • Acceptable accuracy => apply the model to classify data tuples with unknown class labels
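
The two steps above can be illustrated with the smallest possible decision tree, a one-level "stump" on a single numeric feature. This is a toy sketch chosen for brevity, not one of the algorithms named in the slides; the labels `"normal"` and `"attack"` and all function names are assumptions.

```python
def train_stump(samples):
    """Step 1, model construction: choose the threshold with the fewest training errors.

    samples: non-empty list of (value, label), label in {"normal", "attack"}.
    The learned rule is: predict "attack" when value > threshold.
    """
    values = sorted({v for v, _ in samples})
    candidates = [values[0] - 1] + values   # one below the minimum, then each observed value

    def errors(threshold):
        return sum((v > threshold) != (label == "attack") for v, label in samples)

    return min(candidates, key=errors)


def predict(threshold, value):
    """Step 2, model application: classify an unseen value."""
    return "attack" if value > threshold else "normal"


def accuracy(threshold, samples):
    """Estimate accuracy on an independent, labeled test set."""
    return sum(predict(threshold, v) == label for v, label in samples) / len(samples)
```

The model is trained on one labeled set, its accuracy is estimated on a separate test set, and only if that estimate is acceptable is `predict` applied to unlabeled data.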

Classification Methods • Basic Algorithm ID3 • Neural networks • Bayesian classification • Naïve Bayesian classification • Bayesian belief network • Support vector machines

Classification for Intrusion Detection • Misuse detection • Classification based on known intrusions • Example: Sinclair et al. “An application of machine learning to network intrusion detection” • Use decision trees and ID3 on host session data • Use genetic algorithms to generate rules • If <pattern> then <alert>

HIDE • “A hierarchical network intrusion detection system using statistical processing and neural network classification” by Zheng et al. • Five major components • Probes collect traffic data • Event preprocessor preprocesses traffic data and feeds the statistical model • Statistical processor maintains a model for normal activities and generates vectors for new events • Neural network classifies the vectors of new events • Post processor generates reports

Intrusion Detection by NN and SVM • S. Mukkamala et al., IEEE IJCNN May 2002 • Discover useful patterns or features that describe user behavior on a system • Use the set of relevant features to build classifiers • SVMs have great potential to be used in place of NNs due to their scalability and faster training and running time • NNs are especially suited for multi-category classification

Clustering • Group data into clusters • What is a good clustering? • High intra-class similarity and low inter-class similarity • Depending on the similarity measure • The ability to discover some or all of the hidden patterns • Clustering Approaches • K-means • Hierarchical Clustering • Density-based methods • Grid-based methods • Model-based

Clustering for Intrusion Detection • Anomaly detection • Any significant deviations from the expected behavior are reported as possible attacks • Build clusters as models for normal activities • “A scalable clustering for intrusion signature recognition” by Ye and Li • Use description of clusters as signatures of intrusions
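
The idea of building clusters as a model of normal activity can be sketched with a plain k-means and a distance threshold. This is an illustrative sketch, not the method of Ye and Li: it assumes numeric feature vectors, uses a naive initialization (the first k points), and treats `radius` as a hand-picked parameter.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; returns the list of k centroids."""
    centroids = [points[i] for i in range(k)]   # naive init: first k points (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            # Mean of each coordinate; keep the old centroid if a cluster is empty.
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def is_outlier(point, centroids, radius):
    """Flag a point that lies farther than `radius` from every normal cluster."""
    return min(math.dist(point, c) for c in centroids) > radius
```

The centroids are learned from normal traffic only; at detection time, a new observation that is far from all of them is reported as a possible attack.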

Alert Correlation • F. Cuppens and A. Miege, in IEEE S&P’02 • Use clustering and merging functions to recognize alerts that correspond to the same occurrence of an attack • Create a new alert that merges the data contained in these various alerts • Generate global and synthetic alerts to reduce the number of alerts further
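
The clustering-and-merging step can be sketched with a much simpler similarity test than the one used in the cited work. The sketch below assumes two alerts describe the same occurrence when they share source, destination, and attack type and arrive within `window` seconds of each other; the alert fields and the window value are hypothetical.

```python
def merge_alerts(alerts, window=60):
    """Merge alerts with the same (src, dst, type) that arrive within `window` seconds.

    alerts: list of dicts with keys "time", "src", "dst", "type" (assumed layout).
    Returns one merged alert per occurrence, with first/last time and a count.
    """
    merged = []
    latest = {}   # (src, dst, type) -> most recent merged alert for that key
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["src"], alert["dst"], alert["type"])
        group = latest.get(key)
        if group is not None and alert["time"] - group["last"] <= window:
            group["last"] = alert["time"]   # same occurrence: extend it
            group["count"] += 1
        else:
            group = {"src": alert["src"], "dst": alert["dst"], "type": alert["type"],
                     "first": alert["time"], "last": alert["time"], "count": 1}
            latest[key] = group
            merged.append(group)            # new occurrence
    return merged
```

Even this crude grouping shows the payoff: many raw alerts from one ongoing attack collapse into a single merged alert for the analyst.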

Mining Data Streams • Data arriving continuously in multiple, rapid, time-varying, possibly unpredictable and unbounded streams • Many applications • Financial applications, network monitoring, security, telecommunications data management, web applications, manufacturing, sensor networks, etc.

Mining Data Streams for Intrusion Detection • Maintaining profiles of normal activities • The profiles of normal activities may drift • Identifying novel attacks • Identifying clusters and outliers in traffic data streams
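
A profile that must follow drifting normal behavior cannot be computed once and frozen. One common way to handle this, sketched below, is an exponentially weighted mean and variance updated on every observation. This is an assumed technique chosen for illustration; the class name, `alpha`, `k`, and `warmup` are all hypothetical parameters.

```python
class StreamProfile:
    """One-pass profile of a numeric stream that adapts to slow drift."""

    def __init__(self, alpha=0.1, k=3.0, warmup=10):
        self.alpha = alpha      # weight given to each new observation
        self.k = k              # outlier threshold in standard deviations
        self.warmup = warmup    # observations to absorb before flagging anything
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def observe(self, x):
        """Return True if x is an outlier; otherwise fold x into the profile."""
        self.n += 1
        if self.n == 1:
            self.mean = float(x)
            return False
        deviation = x - self.mean
        std = self.var ** 0.5
        outlier = self.n > self.warmup and abs(deviation) > self.k * max(std, 1e-9)
        if not outlier:
            # Exponentially weighted updates: old behavior fades, so the
            # profile tracks gradual drift in what counts as normal.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return outlier
```

Outliers are deliberately not folded into the profile here, so a burst of attack traffic does not immediately redefine normal. The trade-off is that an abrupt but legitimate change will keep being flagged until the profile is reset.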

A Systematic Framework: J. Stolfo et al. • Build good models: • select appropriate features of audit data to build intrusion detection models • Build better models: • architect a hierarchical detector system that combines multiple detection models • Build updated models: • dynamically update and deploy new detection systems as needed

A Systematic Framework • Support for the feature selection and model construction: • Apply data mining algorithms to find consistent inter- and intra- audit record (event) patterns • Use the features and time windows in the discovered patterns to build detection models • A support environment to semi-automate this process

A Systematic Framework • Combining multiple detection models: • Each (base) detector model monitors one aspect of the system • They can employ different techniques and be independent of each other • The learned (meta) detector combines evidence from a number of base detectors • An intelligent agent-based architecture: • learning agents: continuously compute (learn) the detection models • detection agents: use the (updated) models to detect intrusions
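
Combining evidence from several base detectors can be sketched as a weighted vote whose weights are learned from labeled data. Accuracy-weighted voting is used here as a simple stand-in for a learned meta detector; it is not the method of the cited framework, and the function names and 0/1 vote encoding are assumptions.

```python
def learn_weights(base_outputs, labels):
    """Weight each base detector by its accuracy on labeled validation events.

    base_outputs: list of rows; row[j] is detector j's 0/1 vote for that event.
    labels:       list of 0/1 ground-truth values, one per row.
    """
    n_detectors = len(base_outputs[0])
    weights = []
    for j in range(n_detectors):
        correct = sum(row[j] == y for row, y in zip(base_outputs, labels))
        weights.append(correct / len(labels))
    return weights


def meta_detect(votes, weights, threshold=0.5):
    """Combine base votes: alarm if the weighted share of 'intrusion' votes exceeds threshold."""
    total = sum(weights)
    if total == 0:
        return False
    score = sum(w * v for w, v in zip(weights, votes)) / total
    return score > threshold
```

Because each base detector only contributes a vote, the base models can use different techniques and monitor different aspects of the system, as the slide describes.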

Building Classifiers for Intrusion Detection: J. Stolfo et al. • Experiments in constructing classification models for anomaly detection • Two experiments: • sendmail system call data • network tcpdump data • Use a meta-classifier to combine multiple classification models

Classification Models on sendmail • The data: sequence of system calls made by sendmail. • Classification models (rules): describe the “normal” patterns of the system call sequences. • The rule set is the normal profile of sendmail • Detection: calculate the deviation from the profile • A large number of “violations” of the rules in a new trace, or high violation scores, suggests an exploit
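
The profile-then-deviation idea can be sketched without a rule learner by storing every short system-call sequence seen in normal traces and measuring how many windows of a new trace are unseen. Note that this swaps in a plain lookup table for the learned classification rules described above; the window length `n` and the function names are assumptions.

```python
def windows(trace, n):
    """All length-n sliding windows over a list of system call numbers."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_normal_profile(normal_traces, n=3):
    """The normal profile: the set of every window observed in normal traces."""
    profile = set()
    for trace in normal_traces:
        profile.update(windows(trace, n))
    return profile


def violation_rate(trace, profile, n=3):
    """Fraction of a new trace's windows that never occurred in normal data."""
    seqs = windows(trace, n)
    if not seqs:
        return 0.0   # trace shorter than the window: nothing to score
    return sum(s not in profile for s in seqs) / len(seqs)
```

A trace whose violation rate is well above what normal traces produce would be reported as a possible exploit. The sketch also makes the data-collection problem concrete: any legitimate sequence missing from the normal traces shows up as a violation.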

Classification Models on sendmail • The sendmail data: • Each trace has two columns: the process ids and the system call numbers • Normal traces: sendmail and sendmail daemon • Abnormal traces: sunsendmailcap, syslog-remote, syslog-local, decode, sm5x and sm56a attacks

Classification Models on sendmail • Lessons learned: • Normal behavior can be established and used to detect anomalous usage • Need to collect near “complete” normal data in order to build the “normal” model • But how do we know when to stop collecting? • Need tools to guide the audit data gathering process
