COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection

COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection • Tomas Singliar (U. Pitt.), Denver H. Dash (Intel Research, U. Pitt.) • AAAI’07 (National Conference of the American Association for Artificial Intelligence)

References • When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions • Denver H. Dash et al. • AAAI’06 • COD: Online Temporal Clustering for Outbreak Detection • Tomas Singliar, Denver H. Dash • AAAI’07 • Speaker: Li-Ming Chen

Challenge: Slowly Propagating Attacks • Worm attacks go to two opposite extremes: • 1. Much faster, to allow rapid spread • 2. Much slower, to evade detection • Most existing detection techniques rely on worms reproducing quickly • Slowly propagating attacks are: • Difficult to detect: hidden under the veil of normal network traffic • Still dangerous: they can still propagate exponentially

Other Challenges • Global infection: • IDSes (individual entities) can see only a partial picture of the worm’s larger, network-wide behavior • → requires collaborative detection (AAAI’06) • Homogeneity assumption: • Detection techniques treat the population as a monolithic entity • → but hosts and detectors (collaborators) are not always homogeneous (AAAI’07)

Architecture Model [Figure: “weak” host-based Local Detectors (LDs) reporting to a Global Detector (GD)] • Global Detector: • Aggregates messages from the LDs • Performs probabilistic inference to determine whether an infection is present • Concept of collaborative detection: • LDs (designed to be weak but general classifiers) may raise false alarms at a relatively high frequency • The GD combines the LDs’ weak information to infer whether an attack exists • Where to place the GDs in the network? • Centralized or distributed placement

Paper 1 • When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions • Denver H. Dash et al. • AAAI’06

Architecture

About the “Weak” LDs • A binary classifier: normal or abnormal • Detects by a heuristic: counts the number of new outgoing connections to unique destination addresses and ports • The space of LDs: inward-looking vs. outward-looking • Observation (see figure): observing 37 hosts for 5 weeks yields 37 × 5 × 7 × 24 × 60 × 60 / 50 = 2,237,760 observations (one per 50-second interval), from which the distribution is computed… • For slow worm detection, the LD threshold is pre-defined as 4 (CPI) [Figure: observed distribution within the 37 hosts, marking the LD threshold and the propagation rates of previous worms (Blaster, Slapper, CR2, Slammer, Witty)]
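A minimal Python sketch of the LD rule and of the observation count above. It assumes each connection record is a hypothetical `(timestamp_s, dst_addr, dst_port)` tuple and that “new” means a distinct destination/port pair within the same 50-second interval; the paper may track novelty differently. The firing rule (count exceeds 4) follows the COD detection-architecture slide.

```python
from collections import defaultdict

INTERVAL_S = 50   # LD observation interval (seconds)
THRESHOLD = 4     # LD fires when the count exceeds this value

def local_detector(connections, interval_s=INTERVAL_S, threshold=THRESHOLD):
    """Return {interval_index: fired?} for one host.

    connections: iterable of (timestamp_s, dst_addr, dst_port) tuples
    (hypothetical record format). Assumption: a "new" connection is a
    distinct (dst_addr, dst_port) pair within the same interval.
    Intervals with no outgoing traffic are simply absent from the result.
    """
    seen = defaultdict(set)  # interval index -> distinct (addr, port) pairs
    for ts, addr, port in connections:
        seen[int(ts // interval_s)].add((addr, port))
    return {j: len(pairs) > threshold for j, pairs in seen.items()}

# 37 hosts observed for 5 weeks, one observation per 50-second interval.
N_OBS = 37 * 5 * 7 * 24 * 60 * 60 // INTERVAL_S
```

For example, five distinct destinations contacted within one interval make the LD fire, while ten connections to the same destination do not.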

4 possible GD models • Traditional collaborative counting schemes: • PosCount • Tests whether Σ(positive counts) > threshold • CuSum • Detects changes in the trend of a statistic • DBN-based schemes: • CP-DBN • A simplified causal model • Models an attack as occurring uniformly across the population or not at all • E-DBN • Models the dynamics of a system that is being swept by an epidemic outbreak
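The two counting schemes can be sketched in a few lines of Python. PosCount follows the slide directly. For CuSum we assume the standard one-sided (Page) CUSUM applied to the per-step number of positive LD reports; the reference value `k` and decision threshold `h` are illustrative parameters, not values from the paper.

```python
def pos_count(reports, threshold):
    """PosCount: alarm if the number of positive LD reports exceeds threshold.

    reports: list of booleans, one per LD, for a single time step.
    """
    return sum(reports) > threshold

def cusum(counts, k, h):
    """One-sided CUSUM over a series of positive-report counts.

    S_t = max(0, S_{t-1} + x_t - k); alarm at the first t with S_t > h.
    k is typically set between the expected count under normal traffic
    and the count expected under attack.
    Returns the index of the first alarm, or None if none is raised.
    """
    s = 0.0
    for t, x in enumerate(counts):
        s = max(0.0, s + x - k)
        if s > h:
            return t
    return None
```

The difference in behavior: PosCount reacts only to a single step with many simultaneous positives, whereas CuSum accumulates a small but persistent excess over several steps.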

How do GDs work? • Input of a GD: L_t, a subset of the (binary) LD observations at time t • GD output: S_t, a measure of how likely it is that a global anomaly is occurring at time t • The system of GDs makes up an ensemble! • Many ensemble techniques could be used • This paper uses only the max function to decide whether a global alarm should be raised
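A sketch of the max-based ensemble decision, assuming each GD reports a score S_t on a common scale and that a single global threshold (a hypothetical parameter) is applied to the maximum:

```python
def global_alarm(gd_scores, threshold):
    """Raise a global alarm if the most suspicious GD exceeds the threshold.

    gd_scores: list of S_t values, one per GD, at the same time step t.
    """
    return max(gd_scores) > threshold
```

With the max rule, one confident GD is enough to raise an alarm. Agreement across GDs is ignored, which averaging or voting would instead reward.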

How GDs work? (cont’d) • Traditional collaborative counting schemes: • PosCount • Tests whether Σ(positive counts) > threshold or not • CuSum • Detect changes in the trend of a statistic • DBN-based schemes: • CP-DBN • A simplified causal model • Models an attack as occurring uniformly across the population or not at all • E-DBN • Models the dynamics of a system that is being swept by and epidemic outbreak Speaker: Li-Ming Chen

CP-DBN • A_i ∈ {T, F}: whether an attack has taken place at time i • O_l^i ∈ {on, off}: whether LD l is on or off at time i • [Figure: the DBN unrolled over observation time T; in each time slice the hidden state A_i links to M observable LD states (LD_0, …), parameterized by the LD TP rate and FP rate]
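To make the CP-DBN concrete, here is a minimal forward-filtering sketch. It assumes that, given A_i, the M LDs fire independently: each with the TP rate when an attack is present and the FP rate otherwise (the “uniformly across the population” assumption). It further assumes an attack persists once it has begun. The per-step onset probability `p_onset` is a hypothetical parameter.

```python
import math

def cp_dbn_filter(observations, tp, fp, p_onset, prior_attack=0.0):
    """Return P(A_i = T | O^1..O^i) for each time step i.

    observations[i][l] is True if LD l was on at time i.
    tp, fp: per-LD firing rates under attack / no attack, both in (0, 1).
    p_onset: assumed probability that an attack starts in a given step.
    """
    belief = prior_attack
    posteriors = []
    for obs in observations:
        # Predict: no attack -> attack with p_onset; an attack persists.
        belief = belief + (1.0 - belief) * p_onset
        on = sum(obs)
        off = len(obs) - on
        # Log-likelihood of the LD states under each hypothesis.
        ll_attack = on * math.log(tp) + off * math.log(1.0 - tp)
        ll_normal = on * math.log(fp) + off * math.log(1.0 - fp)
        # Bayes update, computed in log space for numerical stability.
        la = math.log(belief) + ll_attack if belief > 0 else -math.inf
        ln = math.log(1.0 - belief) + ll_normal if belief < 1 else -math.inf
        top = max(la, ln)
        wa, wn = math.exp(la - top), math.exp(ln - top)
        belief = wa / (wa + wn)
        posteriors.append(belief)
    return posteriors
```

Under this model a few quiet steps keep the posterior near zero. A handful of steps in which half the LDs fire at once (plausible under attack, unlikely as FPs) drive it close to one.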

E-DBN • Models the exponentially growing trend of an outbreak: • T denotes the observation time • A_t ∈ {0, 1}: the anomaly state at time t (hidden) • N_t ∈ {0, …, N}: the number of infected hosts (hidden) • S: the spreading rate • O_t ∈ {0, …, N}: the number of LDs observed to fire (observable) • [Figure: state transitions between the unobserved state variables]

E-DBN (cont’d) • Assuming a worm attack, the growth ΔN_{t+1} in the number of infected hosts is modeled by a binomial distribution [equation not recovered] • The likelihood of o_t detectors firing when n_t hosts are infected is also modeled by a binomial [equation not recovered] • [Equation annotations: “susceptible”, “chance of a hit”]
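The following is one plausible generative reading of the E-DBN during an attack (A_t = 1). It is not necessarily the paper’s exact equations. Assumptions: each infected host makes `scans_per_step` scans per step, each scan infects a new host with probability equal to the address density times the fraction of hosts still uninfected, infected hosts’ LDs fire with the TP rate, and uninfected hosts’ LDs fire with the FP rate. All parameter names are hypothetical.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate_outbreak(n_hosts, scans_per_step, density, tp, fp, steps, seed=0):
    """Return a list of (n_t, o_t): infected hosts and LDs fired per step.

    Assumed model (illustrative only, not the paper's parameterization):
      dN_{t+1} ~ Binomial(scans_per_step * n_t, density * (N - n_t) / N)
      o_t      = Binomial(n_t, tp) + Binomial(N - n_t, fp)
    """
    rng = random.Random(seed)
    n = 1  # the outbreak starts from a single infected host
    history = []
    for _ in range(steps):
        o = binomial(rng, n, tp) + binomial(rng, n_hosts - n, fp)
        history.append((n, o))
        # Chance that one scan hits a susceptible, not-yet-infected host.
        p_hit = density * (n_hosts - n) / n_hosts
        n = min(n_hosts, n + binomial(rng, scans_per_step * n, p_hit))
    return history
```

Because the number of scan trials is proportional to n_t, the infected count grows roughly exponentially until the susceptible pool is exhausted, which is the trend the E-DBN is designed to capture.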

How do DBN-based GDs work? • Given the DBN model and the observations from t − T to t, compute the probability of anomaly A_m at the most likely time m • Then make the ensemble decision (using the max function)
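One way to read “the most likely time m” is as a scan over candidate onset times in the window [t − T, t]. The sketch below assumes the CP-DBN emission model (each of the M LDs fires independently with the TP rate during an attack and the FP rate otherwise). It scores each candidate m by the log-likelihood ratio of “attack since m” versus “no attack”. The GD score is the best ratio, and m is its argmax. The function name and scoring rule are assumptions for illustration.

```python
import math

def onset_scan(counts, m_lds, tp, fp):
    """Scan a window of positive-LD counts for the most likely onset.

    counts: number of LDs (out of m_lds) that fired at each step of the
    window t-T..t. Returns (best_m, best_llr): best_m indexes into the
    window, best_llr is the log-likelihood ratio of "attack since best_m"
    versus "no attack in the window" (None, 0.0 if no onset is favored).
    """
    def step_llr(c):
        # Binomial coefficients cancel in the ratio, leaving these terms.
        return (c * math.log(tp / fp)
                + (m_lds - c) * math.log((1 - tp) / (1 - fp)))

    llrs = [step_llr(c) for c in counts]
    best_m, best_llr = None, 0.0
    suffix = 0.0
    # An onset at m affects steps m..end, so accumulate LLRs from the end.
    for m in range(len(llrs) - 1, -1, -1):
        suffix += llrs[m]
        if suffix > best_llr:
            best_m, best_llr = m, suffix
    return best_m, best_llr
```

For example, in the window `[0, 1, 0, 0, 5, 6, 5]` with 10 LDs, the scan places the onset at index 4, where the burst of firings begins.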

Performance Evaluation • Parameters: • Spread rate S = 1 conn. per 20 sec. • Address density = 1/1000 (ratio of vulnerable hosts) • LD threshold = 4 conn. per 50 sec. • LDs communicate with the GD every 10 sec. • PosCount raises a detection only after the entire network is infected [Figure: results vs. desired FP rate; arrow marks “better”]
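A quick back-of-the-envelope check on these parameters shows why the worm counts as “slow”. At 1 connection per 20 seconds, an infected host makes only 2.5 worm connections per 50-second LD interval, which is below the LD threshold of 4. The per-day figure assumes every scan targets a uniformly random address, so that about 1 in 1000 scans hits a vulnerable host.

```python
SPREAD_PERIOD_S = 20   # one worm connection every 20 seconds
LD_INTERVAL_S = 50     # LD observation interval (seconds)
LD_THRESHOLD = 4       # LD fires above this many connections per interval
DENSITY = 1 / 1000     # fraction of addresses that are vulnerable hosts

worm_conns_per_interval = LD_INTERVAL_S / SPREAD_PERIOD_S      # 2.5
exceeds_threshold = worm_conns_per_interval > LD_THRESHOLD     # False
scans_per_day = 24 * 60 * 60 / SPREAD_PERIOD_S                 # 4320.0
# Assumes uniformly random scanning targets.
expected_hits_per_day = scans_per_day * DENSITY                # about 4.3
```

Under these settings, the worm’s own connections alone would not be expected to trip an LD on average.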

Paper 2 • COD: Online Temporal Clustering for Outbreak Detection • Tomas Singliar, Denver H. Dash • AAAI’07

New Approach: COD (Cluster Onset Detection) • What to cluster? • Partition the population (e.g., hosts) into subgroups, then COD tries to detect susceptible subgroups • Why clustering? • Traditional outbreak detection methods treat the population as a monolithic entity • Real populations are heterogeneous • Different subpopulations are susceptible to different degrees • Clustering can boost the signal-to-noise ratio for detection

COD Model – detection architecture • “Weak” host-based LDs • Periodically send their status to a GD • All use the same feature and rule: • Fire whenever the number of outgoing connections exceeds 4 in a 50-second interval • Centralized GD • Collects messages and determines whether the positive local detections corroborate each other • Periodically outputs a signal representing its belief that an infection is present

COD Model – data • Dataset X [Figure: matrix with rows LD i and columns time j; each entry is the sum of alarms, which may include FPs] • Row: X_i corresponds to a single LD i • Column: X_*j holds the value of a feature function in discrete time interval j • Uses temporal stratified sampling: • Each time interval has a fixed position in the day • e.g., 12am-1am, 1am-2am, etc. • To account for the obvious diurnal behavior of the system
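A sketch of how the dataset X could be assembled with temporal stratification. It assumes alarms arrive as hypothetical `(ld_id, timestamp_s)` pairs, with timestamps measured in seconds from midnight of the first day. Each column is one hour-of-day slot (12am-1am, 1am-2am, …). Adding alarms from different days into the same slot is also an assumption here.

```python
SLOT_S = 3600  # one-hour slots: 12am-1am, 1am-2am, ...

def build_dataset(alarms, n_lds, slot_s=SLOT_S):
    """Build X with X[i][j] = number of alarms from LD i in time-of-day slot j.

    alarms: iterable of (ld_id, timestamp_s), ld_id in range(n_lds),
    timestamps in seconds since midnight of the first day (assumption).
    Alarms from different days land in the same slot, so each column
    always refers to the same fixed position in the day.
    """
    slots = 24 * 3600 // slot_s
    X = [[0] * slots for _ in range(n_lds)]
    for ld, ts in alarms:
        X[ld][int(ts % 86400) // slot_s] += 1
    return X
```

For example, an alarm at 00:00:30 on day one and another at 00:00:10 on day two both fall into column 0 (12am-1am).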

COD Model – clustering • Naïve Bayes (NB) clustering model • NB features are the positive local detection counts X_ij arriving from machine i during time interval j • Feature function F() = sum of alarms for each machine (an LD may fire several times within one interval) • Different classes are assumed to generate their detections randomly at different rates, and the counts can take a fairly large range of values, so X_ij is modeled as Poisson distributed
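Written out, the naive Bayes assumption with Poisson features gives the following class-conditional likelihood and cluster posterior for host i. Here λ_kj (our notation) is the Poisson rate of cluster k in interval j, and π_k is the prior of cluster k:

```latex
P(X_i \mid M = k) = \prod_{j} \frac{e^{-\lambda_{kj}} \, \lambda_{kj}^{X_{ij}}}{X_{ij}!},
\qquad
P(M = k \mid X_i) = \frac{\pi_k \, P(X_i \mid M = k)}{\sum_{k'} \pi_{k'} \, P(X_i \mid M = k')}
```

In practice the product would be computed as a sum of logarithms over intervals to avoid numerical underflow.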

COD Model – clustering (cont’d) • Some details: • How to determine the number m of clusters? • A greedy heuristic searches for the best value • Details of λ_kjx are not covered • At the end of each interval, the feature values are updated and the model is re-learned • How to cluster? • The posterior on the cluster variable M defines the assignment of local detectors to clusters: P(M = k | X_i) ∝ P(M = k) ∏_j P(X_ij | M = k)
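The re-learning step can be sketched as expectation-maximization (EM) for a Poisson naive Bayes mixture, followed by assigning each LD to the cluster with the highest posterior. This is a generic EM sketch, not the paper’s code. The number of clusters m is taken as given (the greedy search for m is not shown), and the deterministic initialization and the rate floor are our own choices.

```python
import math

def log_poisson(x, lam):
    """log P(x | lam) for a Poisson distribution."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def fit_poisson_nb(X, m, iters=50, floor=1e-3):
    """EM for a mixture of m Poisson naive Bayes clusters.

    X[i][j] = alarm count of LD i in interval j.
    Returns (rates, priors, assignment): rates[k][j] is the Poisson rate of
    cluster k in interval j; assignment[i] = argmax_k P(M = k | X_i).
    """
    n, d = len(X), len(X[0])
    # Deterministic init (our choice): seed clusters with rows spread over
    # the range of total alarm counts.
    order = sorted(range(n), key=lambda i: sum(X[i]))
    seeds = [order[round(k * (n - 1) / max(m - 1, 1))] for k in range(m)]
    rates = [[max(X[s][j], floor) for j in range(d)] for s in seeds]
    priors = [1.0 / m] * m
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibilities, normalized in log space.
        resp = []
        for row in X:
            logs = [math.log(priors[k])
                    + sum(log_poisson(x, rates[k][j]) for j, x in enumerate(row))
                    for k in range(m)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate priors and per-interval rates.
        for k in range(m):
            nk = sum(r[k] for r in resp)
            priors[k] = max(nk / n, 1e-12)
            for j in range(d):
                weighted = sum(r[k] * row[j] for r, row in zip(resp, X))
                rates[k][j] = max(weighted / max(nk, 1e-12), floor)
    assignment = [max(range(m), key=lambda k: r[k]) for r in resp]
    return rates, priors, assignment
```

The rate floor keeps every λ strictly positive, so that the logarithm stays defined when a cluster happens to have no alarms in some interval.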

COD Model – example [Figure: cluster membership of each host ID over time (hr), after a burn-in period] • A typical example of how the hosts in the dataset are assigned to clusters • 5 clusters (colors) and a 1-day burn-in period • Clusters are rather stable; cluster membership changes rarely • By the end, most hosts have been infected

COD Model – daily pattern [Figure: local detection count per time interval, by host ID vs. time (hr)] • Clustering → groups hosts according to the daily pattern of their local detection activity • 5 groups (two of which consist of a single host) • The grouping reflects the applications and habits of the hosts and can provide better estimates for detection

4-step Cluster Interpretation • Goal: detect the “highly active” (presumably infected) cluster • 1. Compute the “average detection rate” for each host from the number of positive detections at host i • 2. Compute the “average (local) detection rate” for each cluster and identify the most active cluster • 3. Perform a one-sided, unbalanced-design t-test with the null hypothesis that host detection rates in the most active cluster and in the remainder of the population are the same • 4. Compare the outcome of the t-test to a historical histogram of values to determine whether the system is in an anomalous state
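The four steps can be sketched in Python under several assumptions. A host’s average detection rate is taken as its mean alarm count per interval. The unbalanced-design test is implemented as Welch’s t statistic, which handles unequal group sizes and variances but may differ from the paper’s exact test. The historical histogram is represented by a plain list of past t values, from which an empirical tail fraction is computed.

```python
import math
import statistics

def host_rates(X):
    """Step 1: average detection rate of each host (alarms per interval)."""
    return [sum(row) / len(row) for row in X]

def most_active_cluster(rates, assign):
    """Step 2: the cluster with the highest mean host detection rate."""
    def mean_rate(k):
        return statistics.mean(r for r, a in zip(rates, assign) if a == k)
    return max(set(assign), key=mean_rate)

def welch_t(active, rest):
    """Step 3: one-sided Welch t statistic for mean(active) > mean(rest).

    Each group needs at least two hosts (statistics.variance requires it).
    """
    if len(active) < 2 or len(rest) < 2:
        raise ValueError("each group needs at least two hosts")
    diff = statistics.mean(active) - statistics.mean(rest)
    se = math.sqrt(statistics.variance(active) / len(active)
                   + statistics.variance(rest) / len(rest))
    if se == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

def empirical_tail(t, history):
    """Step 4: share of historical t values at least as large as t.

    Uses (count + 1) / (len + 1) so the result is never exactly zero.
    """
    return (sum(h >= t for h in history) + 1) / (len(history) + 1)

def interpret(X, assign, history):
    rates = host_rates(X)
    k = most_active_cluster(rates, assign)
    active = [r for r, a in zip(rates, assign) if a == k]
    rest = [r for r, a in zip(rates, assign) if a != k]
    t = welch_t(active, rest)
    return k, t, empirical_tail(t, history)
```

A small tail fraction means the current gap between the most active cluster and the rest is rarely seen in historical data. The single-host clusters seen in the daily-pattern example would need special handling, since a t-test needs at least two members per group.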

Experimental Evaluation • Configuration details: • Normal traffic: 5 weeks of traces from 37 hosts • Worm traffic is injected for testing • LDs send a message every 10 seconds • Metrics: FAR, TTD, FI (false alarm rate, time to detect, fraction of hosts infected) • Target FAR: 1 per week • Results are compared with E-DBN (the baseline) • Traffic traces are recycled to simulate more hosts • Examines the effects of the number of clusters, network size, and interval length
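Controlling the FAR to one alarm per week amounts to choosing the GD’s decision threshold on clean (worm-free) traffic. The sketch below assumes the GD emits one score per invocation and that the hypothetical `clean_scores` list covers `weeks` weeks of normal traffic. Time to detect is measured from the outbreak onset to the first alarm.

```python
def calibrate_threshold(clean_scores, weeks, target_per_week=1.0):
    """Pick a threshold so that at most target_per_week * weeks clean
    scores lie strictly above it (alarm rule: score > threshold)."""
    allowed = int(target_per_week * weeks)
    ranked = sorted(clean_scores, reverse=True)
    if allowed >= len(ranked):
        return float("-inf")  # the budget allows alarming on every score
    return ranked[allowed]

def time_to_detect(scores, times, threshold, onset):
    """Delay from onset to the first alarm at or after onset, or None."""
    for s, t in zip(scores, times):
        if t >= onset and s > threshold:
            return t - onset
    return None
```

For example, with 100 clean scores spanning 5 weeks, the threshold is set so that exactly 5 scores, one per week, exceed it.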

COD vs. E-DBN • AMOC (Activity Monitoring Operating Characteristic): plots the expected time to detection (since the outbreak began) as a function of the false alarm rate • COD outperforms E-DBN (lower FI at detection) • COD/adaptive performs even better but is more costly to run!
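An AMOC curve can be traced by sweeping the decision threshold. For each threshold, the false alarm rate comes from clean traffic, and the expected time to detection is averaged over simulated outbreaks. The sketch assumes scores come as lists with matching timestamps. Missed outbreaks are counted as detected at the end of the run, which is one common convention and our assumption here.

```python
def amoc_points(clean_scores, weeks, outbreak_runs, thresholds):
    """Return a list of (false_alarms_per_week, mean_time_to_detect).

    clean_scores: GD scores over `weeks` weeks of worm-free traffic.
    outbreak_runs: list of (scores, times, onset) for simulated outbreaks.
    Missed outbreaks count as detected at the last timestamp of the run.
    """
    points = []
    for thr in thresholds:
        far = sum(s > thr for s in clean_scores) / weeks
        delays = []
        for scores, times, onset in outbreak_runs:
            delay = times[-1] - onset  # penalty if never detected
            for s, t in zip(scores, times):
                if t >= onset and s > thr:
                    delay = t - onset
                    break
            delays.append(delay)
        points.append((far, sum(delays) / len(delays)))
    return points
```

Lowering the threshold moves along the curve toward more false alarms and shorter delays. One curve beats another if it achieves a shorter delay at the same FAR.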

Scaling with Network Size • Performance actually improves as the system scales • A larger number of data points gives the model more information and refines the clustering

Effect of Interval Length • Interval length affects performance in two opposite ways: • Shorter intervals mean more frequent re-clustering, which eliminates part of the “mid-interval” blind spot • Longer intervals yield features with less variance • The results show that: • Better performance is achieved with longer intervals (better smoothing over random fluctuations) • Less frequent invocation of the detection algorithm gives fewer false alarms • And for a slow worm, delayed detection is acceptable! [Figure: standard deviation (in a day)]
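The variance side of this trade-off can be made concrete under the clustering model’s own Poisson assumption. If a host raises alarms at a constant rate r per hour, its count over an interval of L hours has mean rL and standard deviation sqrt(rL), so its relative noise is 1/sqrt(rL). The rate value below is hypothetical.

```python
import math

def relative_noise(rate_per_hour, interval_hours):
    """Coefficient of variation (std / mean) of a Poisson count over one interval."""
    return 1.0 / math.sqrt(rate_per_hour * interval_hours)

# Quadrupling the interval halves the relative noise of the feature.
for hours in (1, 4, 16):
    print(hours, relative_noise(1.0, hours))
```

The other side of the trade-off is that re-clustering happens only at interval boundaries, so the worst-case wait before an onset can be reflected in the model grows with L.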

Conclusion • Uses a distributed scheme and collaborative inference to support slow worm detection • Dividing the population into subgroups according to susceptibility increases the SNR and can boost detection performance • Subgroups are more homogeneous in their usage and application patterns • Does not require prior knowledge of the population

My Comments • Can other features on a host reveal diurnal patterns? • Host-based LDs can acquire rich information about the attack, but building a host-based distributed detection system is much harder • Clustering is one way to deal with stealthy attacks

COD ( Cluster Onset Detection ) : Online Temporal Clustering for Outbreak Detection