On the Utility of Anonymized Flow Traces for Anomaly Detection. Authors: Martin BURKHART, Daniela BRAUCKHOFF, Martin MAY Journal: ITC SS 2008 Advisor: Yuh-Jye Lee Reporter: Yi-Hsiang Yang Email: M9915016@mail.ntust.edu.tw
Contributions • Introduce a generic methodology for evaluating the impact of anonymization • Quantify the utility of anonymized data for a three-week-long data set • Present an overall estimate for the impact of anonymization
Outline • Introduction • Methodology • Measurement Results • Conclusion
Introduction • Sharing of traffic data is hindered • Releasing data introduces a threat to users’ privacy • Anomaly detection • Has been evaluated with anonymized data • Focus on the anonymization of IP addresses • Blackmarking • Truncation • Random permutation • (Partial) prefix-preserving permutation
Utility of Anonymized Data for Anomaly Detection • The granularity design space has two dimensions • Subset size • The size of the network (subnet) that is to be analyzed • Resolution • The address granularity at which the traffic is analyzed • Assume the whole design space is available
• Cell 1 [00,00]: Select all traffic and set the resolution to the minimum. • Cell 5 [00,16]: Select all traffic and set the resolution to /16 networks.
IP address anonymization techniques • Blackmarking (BM) • Blindly replaces all IP addresses in a trace with the same value • Truncation (TR{t}) • Replaces the t least significant bits of an IP address with 0 • Random permutation (RP) • Translates IP addresses using a random permutation • Partial prefix-preserving permutation (PPP{p}) • Permutes the host and network part of IP addresses independently
IP address anonymization techniques • Prefix-preserving permutation (PP) • Permutes IP addresses so that two addresses sharing a common real prefix also share an anonymized prefix of the same length
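The techniques listed on the two slides above can be illustrated with a minimal sketch. It is not the authors' tooling: the function names, the constant used for blackmarking, and the HMAC-SHA256 construction for the prefix-preserving variant are assumptions chosen for illustration. Random permutation and the partial prefix-preserving variant are omitted.

```python
import hashlib
import hmac
import ipaddress


def blackmark(addr: str) -> str:
    """BM: replace every address with the same value (0.0.0.0 is an arbitrary choice)."""
    return "0.0.0.0"


def truncate(addr: str, t: int) -> str:
    """TR{t}: set the t least significant bits of the address to 0."""
    value = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF >> t) << t
    return str(ipaddress.IPv4Address(value & mask))


def prefix_preserving(addr: str, key: bytes) -> str:
    """PP: flip each bit depending only on the bits that precede it.

    Assumption: a keyed hash (HMAC-SHA256) decides each flip. Two addresses
    that share a k-bit prefix therefore receive identical flips for their
    first k+1 bits, so the anonymized addresses share exactly k bits too.
    """
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the first i bits of the real address
        digest = hmac.new(key, f"{i}:{prefix}".encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

For example, truncate("192.168.1.77", 8) yields "192.168.1.0", and two addresses that share a 16-bit prefix still share exactly 16 bits after prefix_preserving is applied with the same key.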
Methodology • Data captured from the four border routers of the Swiss Academic and Research Network • IP address range contains about 2.4 million IP addresses • Traffic volume varies between 60 and 140 million NetFlow records per hour • Analyzed a three-week period (from August 19th to September 10th, 2007), 713 terabytes • Unsampled and non-anonymized flow data
Methodology-Ground Truth • Visual inspection of metric timeseries • Computed the timeseries for five well-known metrics • Byte, packet, and flow counts, unique IP address counts, and the Shannon entropy of flows per IP address • At 15-minute intervals • 2016 data points per metric
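As a rough illustration of the two feature-based metrics named above, the sketch below computes the unique IP address count and the Shannon entropy of flows per IP address for one interval. The input format (one address string per flow) and the base-2 logarithm are assumptions; the slides do not specify either.

```python
import math
from collections import Counter


def shannon_entropy(flow_counts) -> float:
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = [c for c in flow_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts)


def interval_metrics(flow_addresses):
    """Return (unique IP count, entropy of flows per IP) for one interval.

    flow_addresses holds one address per flow record, e.g. the destination
    address of every flow seen in a 15-minute interval (assumed format).
    """
    flows_per_ip = Counter(flow_addresses)
    return len(flows_per_ip), shannon_entropy(flows_per_ip.values())
```

If every flow goes to the same address, the entropy is 0; if the flows are spread evenly over four addresses, it is 2 bits. This matches the intuition on the later slides that a (D)DoS lowers the destination entropy while a scan raises it.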
Methodology-Ground Truth • Assigning ground truth to each interval • If the analyzed metric timeseries exposed an unusual event, classified that interval as anomalous • Identifying the anomaly type • Assigned the anomalous events to different types • Volume • A sharp increase or decrease in the volume based metrics • (D)DoS • Drop in the destination IP address entropy
Methodology-Ground Truth • Scan • Increase in the destination IP address count and entropy • Network Fluctuation • Causes an increase or decrease in the IP address counts at the highest resolution • Unknown
Methodology-Anomaly Detection • Use Kalman filter • Efficient recursive filter
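The slide does not give the filter model, so the following is only a sketch under stated assumptions: a scalar random-walk state, hand-picked noise variances (process_var, measurement_var), and an alarm whenever the normalized prediction error exceeds a threshold. None of these parameter names or values come from the paper.

```python
def kalman_residuals(series, process_var=1.0, measurement_var=4.0):
    """Normalized one-step prediction errors of a scalar random-walk Kalman filter."""
    estimate = float(series[0])
    error_var = measurement_var
    residuals = []
    for z in series[1:]:
        # Predict: a random walk keeps the estimate and inflates its variance.
        predicted = estimate
        predicted_var = error_var + process_var
        # Innovation: how far the new measurement is from the prediction.
        innovation = z - predicted
        innovation_var = predicted_var + measurement_var
        residuals.append(innovation / innovation_var ** 0.5)
        # Update: blend prediction and measurement using the Kalman gain.
        gain = predicted_var / innovation_var
        estimate = predicted + gain * innovation
        error_var = (1.0 - gain) * predicted_var
    return residuals


def detect_anomalies(series, threshold=3.0, **filter_args):
    """Indices of intervals whose normalized residual exceeds the threshold."""
    residuals = kalman_residuals(series, **filter_args)
    # residuals[i] belongs to series[i + 1], hence the index shift.
    return [i + 1 for i, r in enumerate(residuals) if abs(r) > threshold]
```

The filter is recursive: each step needs only the previous estimate and its variance, not the full history. Note that in this simple form a large spike also shifts the estimate, so a few intervals after the spike may be flagged as well.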
Methodology • The 60 studied metrics are variants of • Three volume-based metrics (vbm) • Byte, packet, and flow counts • Two feature-based metrics (fbm) • Unique IP address count • Shannon entropy of flows per IP address • Total: (3[vbm] + (2[fbm] × 2[src/dst] × 3[res])) × 2[in/out] × 2[udp/tcp] = 60 detection metrics
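The count of 60 can be checked by enumerating the combinations. The labels below are illustrative, and the three resolution values in particular are an assumption; the slide only states that there are three.

```python
from itertools import product

volume_metrics = ["bytes", "packets", "flows"]
feature_metrics = ["unique_ip_count", "flows_per_ip_entropy"]
endpoints = ["src", "dst"]
resolutions = ["/32", "/24", "/16"]  # assumed values; the slide only says three
directions = ["in", "out"]
protocols = ["udp", "tcp"]

metrics = []
for direction, protocol in product(directions, protocols):
    # Volume-based metrics do not depend on endpoint or resolution.
    for name in volume_metrics:
        metrics.append((direction, protocol, name, None, None))
    # Feature-based metrics exist per endpoint and per resolution.
    for name, endpoint, resolution in product(feature_metrics, endpoints, resolutions):
        metrics.append((direction, protocol, name, endpoint, resolution))

print(len(metrics))
```

Each direction and protocol pair contributes 3 + 2 × 2 × 3 = 15 metrics, and there are 4 such pairs.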
Measurement Results • Volume Anomalies • Exposed by volume-based metrics • For TCP, blackmarking and random permutation perform slightly better
Measurement Results • Scanning and denial of service anomalies • Feature-based metrics
Measurement Results • Network fluctuations • Feature-based metrics at lower resolutions
Measurement Results • Blackmarking • Decreases the utility for detecting anomalies in UDP and TCP traffic, except for volume anomalies • Random permutation • Performs very poorly at detecting anomalies in UDP traffic • Preserves the utility for TCP traffic
Measurement Results • Truncation of 8 or 16 bits • Decreases the utility for detecting anomalies in TCP traffic by roughly 10 percent • Performs well for UDP traffic • (Partial) prefix-preserving permutation • No significant negative impact on detecting anomalies in UDP and TCP traffic
Implicit Traffic Aggregation • Analyzed the count of additional flows for 170 webservers • Truncating a single bit • Around 10% of the webservers see a traffic increase of 100% or more, and 50% see no additional traffic • Unaffected servers: 20% for 2 bits, 5% for 4 bits, and 0% for 8 bits • Servers with at least a doubling of traffic: 25% for 2 bits, 55% for 4 bits, and 89% for 8 bits
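The aggregation effect can be sketched as follows: once t bits are truncated, a server becomes indistinguishable from every other address in the same block, so their flows are attributed to it. The function name, the input format (a mapping from address to flow count), and the sample numbers are hypothetical.

```python
import ipaddress


def additional_flow_ratio(flows_per_ip, server, t):
    """Extra flows merged onto `server` by truncating t bits, relative to its own flows.

    A return value of 1.0 means the observed traffic doubles (a 100% increase).
    """
    mask = (0xFFFFFFFF >> t) << t
    block = int(ipaddress.IPv4Address(server)) & mask
    own = flows_per_ip[server]
    merged = sum(
        count
        for addr, count in flows_per_ip.items()
        if int(ipaddress.IPv4Address(addr)) & mask == block
    )
    return (merged - own) / own


# Made-up flow counts, for illustration only.
sample = {"10.0.0.10": 100, "10.0.0.11": 100, "10.0.0.12": 50, "10.0.1.10": 500}
for bits in (0, 1, 3, 9):
    print(bits, additional_flow_ratio(sample, "10.0.0.10", bits))
```

With the sample data, truncating one bit merges 10.0.0.10 with 10.0.0.11 and doubles the server's apparent traffic, while a server whose neighbouring addresses are unused would see no additional traffic.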
Conclusion • Anonymization techniques impact statistical anomaly detection • Introduced the detection granularity design space • Analyzed the utility of anonymized traces