A Signal Analysis of Network Traffic Anomalies
Paul Barford with Jeffery Kline, David Plonka, Amos Ron
University of Wisconsin – Madison
Summer, 2002
Motivation
• Traffic anomalies are a fact of life in computer networks
  • Outages, attacks, etc.
• Anomaly detection and identification are challenging
  • Operators typically monitor by eye using SNMP or IP flow data
    • Obviously, this does not scale!
  • Simple thresholding is ineffective
  • Some anomalies are obvious; others are not
• The characteristics of anomalous behavior in IP traffic are not well understood
  • Do anomalies of the same type have the same characteristics?
  • Can these characteristics be used effectively in detection systems?
Introduction
• Objective: Improve our understanding of network traffic anomalies
• Approach: Wavelet analysis of a data set that includes IP flow data, SNMP data, and a catalog of observed anomalies
• Method: Integrated Measurement Analysis Platform for Internet Traffic (IMAPIT)
• Results: We demonstrate how anomalies can be exposed using wavelets and develop a new method for exposing short-lived events
Related Work
• Network traffic characterization
  • E.g., Caceres89, Leland93, Paxson97, Zhang01
  • Focus on typical behavior
  • Abry98 uses wavelets to analyze long-range-dependent (LRD) traffic
• Fault and anomaly detection techniques
  • E.g., Feather93, Brutlag00
    • Focus on thresholds and time-series models
  • E.g., Paxson99
    • Rule-based tool for intrusion detection
  • E.g., Moore01
    • Backscatter technique can be used to identify DoS attacks
  • E.g., Huang01
    • Wavelet-based approach to detecting network performance problems
Simple Network Management Protocol
• SNMP is the standard protocol for monitoring and managing networked systems
• SNMP uses MIBs (Management Information Bases) to define the data exported from routers
  • RFC 2863
• We sample the High Capacity interface counters using MRTG (Multi-Router Traffic Grapher) at 5-minute intervals
  • Archive byte and packet traffic in each direction
  • 64-bit counters on each of 15 WAN links
• SNMP count precision is yet to be determined…
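The rate behind each 5-minute SNMP sample can be sketched as follows: a poller reads a cumulative 64-bit counter (e.g., RFC 2863's ifHCInOctets) and divides the difference between successive readings by the polling interval. This is a minimal illustration, not MRTG's code; `counter_rates` is a hypothetical helper, and it assumes the counter wrapped at most once between readings (a router reboot that resets the counter would look like a wrap and produce a bogus rate).

```python
# Sketch: turn cumulative SNMP counter readings into per-second rates.
# counter_rates is a hypothetical helper, not part of MRTG.
COUNTER64_MODULUS = 2 ** 64  # 64-bit high-capacity counters wrap at 2**64


def counter_rates(readings, interval_s=300, modulus=COUNTER64_MODULUS):
    """Return one rate (units per second) per pair of successive readings.

    Assumes readings are taken every interval_s seconds and that the
    counter wrapped at most once between two readings.
    """
    rates = []
    for prev, curr in zip(readings, readings[1:]):
        # Python's % yields a non-negative result, so a wrapped counter
        # (curr < prev) still gives the correct delta.
        delta = (curr - prev) % modulus
        rates.append(delta / interval_s)
    return rates


# Octet counter read every 5 minutes; it wraps between the first two readings.
print(counter_rates([2**64 - 3_000, 87_000, 177_000]))  # -> [300.0, 300.0]
```

Multiplying an octet rate by 8 gives bits per second.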
IP Flows
• An IP flow is a unidirectional series of packets between a source/destination IP address and port pair over a period of time
• Exported by flow-export-enabled routers, e.g., Cisco NetFlow, Juniper cflowd, or the Lightweight Flow Accounting Protocol (LFAP)
• We use FlowScan [Plonka00] to collect and post-process IP flow data at 5-minute intervals
  • Combines a flow collection engine, a database, and a visualization tool
  • Provides a near real-time visualization of network traffic
  • Breaks down traffic by well-known service or application
• Flow record: {SRC_IP/Port, DST_IP/Port, Pkts, Bytes, Start/End Time, TCP Flags, IP Prot, …}
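The per-service breakdown can be sketched as below: each flow record is labeled by a well-known port and its bytes are summed into 5-minute bins. This is not FlowScan's implementation; the `FlowRecord` fields mirror the slide's record, while the port-to-service map, the helper names, and the choice to bin each flow wholly by its end time are all hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical subset of the flow fields listed on the slide."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    ip_proto: int      # IP protocol number, e.g. 6 = TCP, 17 = UDP
    packets: int
    byte_count: int
    start: float       # seconds since the epoch
    end: float


# Hypothetical port-to-service map; a real deployment would use a richer one.
SERVICES = {80: "web", 443: "web", 25: "smtp", 53: "dns"}


def classify(flow):
    """Label a flow by the first well-known port found (destination first)."""
    for port in (flow.dst_port, flow.src_port):
        if port in SERVICES:
            return SERVICES[port]
    return "other"


def bytes_by_service(flows, bin_s=300):
    """Sum bytes per (bin start time, service), binning each flow by its end time."""
    totals = defaultdict(int)
    for flow in flows:
        bin_start = int(flow.end // bin_s) * bin_s
        totals[(bin_start, classify(flow))] += flow.byte_count
    return dict(totals)


flows = [
    FlowRecord("10.0.0.1", 51000, "10.0.0.2", 80, 6, 10, 1000, 290.0, 310.0),
    FlowRecord("10.0.0.3", 53, "10.0.0.4", 40000, 17, 1, 200, 99.0, 100.0),
    FlowRecord("10.0.0.5", 443, "10.0.0.6", 52000, 6, 5, 500, 400.0, 599.0),
]
print(bytes_by_service(flows))  # -> {(300, 'web'): 1500, (0, 'dns'): 200}
```

Attributing a long flow to a single bin is the simplest choice; splitting its bytes across the bins it spans would give smoother time series.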
Our Approach to Data Gathering
• Consider anomalies in IP flow and SNMP data
  • Collected at the UW border router (Juniper M10)
  • Archive of ~6 months' worth of data (packets, bytes, flows)
  • Includes a catalog of anomalies (after-the-fact analysis)
• Group observed anomalies into four categories
  • Network anomalies (41)
    • Steep drop-offs in service followed by a quick return to normal behavior
  • Flash crowd anomalies (4)
    • Steep increase in service followed by a slow return to normal behavior
  • Attack anomalies (46)
    • Steep increase in flows in one direction followed by a quick return to normal behavior
  • Measurement anomalies (18)
    • Short-lived anomalies that are neither network anomalies nor attacks
Our Approach to Analysis
• Wavelets describe time-series data in terms of both frequency and time
  • Particularly useful for characterizing data with sharp spikes and discontinuities
  • Better suited to such data than Fourier analysis, which shows only which frequencies exist in a signal, not when they occur
  • Determining which wavelets provide the best resolution of the signals in the data is tricky
• We use tools developed at UW that together make up IMAPIT
  • FlowScan software
  • The IDR Framenet software
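The time-versus-frequency point can be illustrated with a single spike: a one-level Haar transform reports a nonzero coefficient only at the spike's position, while every DFT magnitude is the same, so the Fourier view says nothing about when the event happened. The Haar wavelet is a stand-in chosen for brevity, not the framelet used in IMAPIT, and the helper names are hypothetical.

```python
import cmath
import math


def haar_details(x):
    """Finest-level orthonormal Haar detail coefficients (len(x) must be even)."""
    s = 1 / math.sqrt(2)
    return [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]


def dft_magnitudes(x):
    """Magnitudes of the discrete Fourier transform (O(n^2), fine for a demo)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


signal = [0.0] * 64
signal[37] = 1.0  # a single-sample spike, e.g. a brief outage-like blip

# Wavelet view: only coefficient 18 (covering samples 36 and 37) is nonzero.
print([i for i, d in enumerate(haar_details(signal)) if abs(d) > 1e-12])  # -> [18]

# Fourier view: the spike's energy is spread evenly over all frequencies.
mags = dft_magnitudes(signal)
print(round(min(mags), 9), round(max(mags), 9))  # -> 1.0 1.0
```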
Our Wavelet System
• After evaluating different candidates, we selected a wavelet system called Pseudo Splines(4,1) Type 2
  • A framelet system developed by Daubechies et al. '00
  • Very good frequency localization properties
• Three output signals are extracted from the input
  • Low frequency (L): synthesis of all wavelet coefficients from level 9 and up
  • Mid frequency (M): synthesis of the wavelet coefficients at levels 6, 7, and 8
  • High frequency (H): synthesis of the wavelet coefficients at levels 1 to 5
• Thresholding (setting to zero every coefficient whose absolute value is below a threshold) is applied to these coefficients
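The L/M/H split can be sketched with an ordinary multilevel Haar transform. This is a swapped-in technique: the slides use the Pseudo Splines(4,1) Type 2 framelet, which is redundant and localizes frequency far better than Haar, so only the bookkeeping carries over (which levels feed which band, plus hard thresholding). Assumptions: thresholding is applied to detail coefficients only, the decomposition stops at level 9 so "level 9 and up" becomes level 9 plus the remaining approximation, the input length is a multiple of 2**levels (real data would need padding), and `split_lmh` is a hypothetical name. With 5-minute samples, level j roughly corresponds to timescales of 2**j × 5 minutes.

```python
import math


def _haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return approx, detail


def _haar_inverse_step(approx, detail):
    """Invert one Haar level, doubling the signal length."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * s, (a - d) * s))
    return out


def split_lmh(x, levels=9, high_max=5, mid_max=8, threshold=0.0):
    """Split x into (low, mid, high) parts that sum to the thresholded signal.

    H = detail levels 1..high_max, M = levels high_max+1..mid_max,
    L = detail levels above mid_max plus the coarsest approximation.
    Detail coefficients with |c| < threshold are zeroed (hard thresholding).
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a multiple of 2**levels")
    if not 1 <= high_max < mid_max <= levels:
        raise ValueError("need 1 <= high_max < mid_max <= levels")

    approx, details = list(x), []  # details[0] is level 1 (finest)
    for _ in range(levels):
        approx, d = _haar_step(approx)
        details.append([c if abs(c) >= threshold else 0.0 for c in d])

    def synth(lo, hi, keep_approx):
        # Keep only detail levels lo..hi (and optionally the approximation), then invert.
        a = approx if keep_approx else [0.0] * len(approx)
        for level in range(levels, 0, -1):
            d = details[level - 1]
            a = _haar_inverse_step(a, d if lo <= level <= hi else [0.0] * len(d))
        return a

    low = synth(mid_max + 1, levels, keep_approx=True)
    mid = synth(high_max + 1, mid_max, keep_approx=False)
    high = synth(1, high_max, keep_approx=False)
    return low, mid, high
```

Because the transform is linear, the three bands always add back up to the (thresholded) input, which makes it easy to check a band split on toy data with fewer levels, e.g. `split_lmh(x, levels=4, high_max=1, mid_max=3)`.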
Anomaly Detection via Deviation Score
• We develop an automated means of identifying short-lived anomalies based on the variability of the H and M signals
  • Compute the local variability (over a specified window) of the H and M parts of the signal
  • Combine the local variabilities of H and M (using a weighted sum) and normalize by total variability to obtain the deviation score V
  • Apply a threshold to V, then measure the peaks
• Our analysis shows that peaks of V above 2.0 indicate short-lived anomalies with high confidence
  • We threshold at V = 1.25 and set the window size to ~3 hours
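The deviation-score steps above can be sketched as follows. The slides do not define "local variability" precisely, so this version assumes it is the variance over a centered window, divides each signal's local variance by that signal's overall variance, and weights H and M equally (w_h = w_m = 0.5, so a quiet signal scores near 1). The function names and weights are hypothetical; only the 1.25 threshold and the ~3-hour window (36 samples, assuming 5-minute data) come from the slide.

```python
from statistics import pvariance


def local_variance(x, window):
    """Variance of x over a centered window of window//2 samples on each side."""
    half = window // 2
    return [pvariance(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def deviation_score(h, m, window=36, w_h=0.5, w_m=0.5):
    """Weighted sum of the H and M local variances, each divided by its total variance."""
    def normalized(sig):
        total = pvariance(sig)
        if total == 0:
            return [0.0] * len(sig)  # a flat signal contributes nothing
        return [lv / total for lv in local_variance(sig, window)]

    return [w_h * a + w_m * b for a, b in zip(normalized(h), normalized(m))]


def find_peaks(v, threshold=1.25):
    """Return (start, end, peak value) for each run of consecutive scores >= threshold."""
    peaks, start = [], None
    for i, score in enumerate(v):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            peaks.append((start, i - 1, max(v[start:i])))
            start = None
    if start is not None:
        peaks.append((start, len(v) - 1, max(v[start:])))
    return peaks
```

Runs whose peak value exceeds 2.0 would then correspond to the slide's high-confidence detections.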
Deviation Score Evaluation
• How effective is the deviation score at detecting anomalies?
• Compare against a set of 39 anomalies
  • The set is unlikely to be complete, so we do not evaluate false positives
• Compare against Holt-Winters forecasting
  • Sophisticated time-series technique
  • Requires some configuration
• Holt-Winters reported many more positives and sometimes oscillated between values
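For comparison, a Holt-Winters forecaster in its additive form, with the confidence bands used for aberrant-behavior detection in Brutlag00, can be sketched as below. Every parameter (alpha, beta, gamma, the band width delta, and the season length) is a hypothetical choice rather than the configuration used in this evaluation; having to pick all of them is the kind of configuration the slide mentions. `holt_winters_flags` is a hypothetical name.

```python
def holt_winters_flags(x, season, alpha=0.1, beta=0.01, gamma=0.1, delta=3.0):
    """One-step-ahead additive Holt-Winters forecasts with Brutlag-style bands.

    Returns (forecasts, flags) for samples x[season:], where a flag is True
    when the observation falls outside forecast +/- delta * deviation.
    The first season is used only for initialization.
    """
    if len(x) < 2 * season:
        raise ValueError("need at least two seasons of data")
    level = sum(x[:season]) / season
    trend = 0.0
    seasonal = [v - level for v in x[:season]]
    # Initial band half-width: mean absolute seasonal offset (floored above zero).
    dev0 = max(sum(abs(s) for s in seasonal) / season, 1e-9)
    deviation = [dev0] * season
    forecasts, flags = [], []
    for t in range(season, len(x)):
        i = t % season  # position within the season
        forecast = level + trend + seasonal[i]
        err = x[t] - forecast
        forecasts.append(forecast)
        flags.append(abs(err) > delta * deviation[i])
        # Standard additive Holt-Winters updates.
        prev_level = level
        level = alpha * (x[t] - seasonal[i]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[i] = gamma * (x[t] - level) + (1 - gamma) * seasonal[i]
        # Brutlag: exponentially smoothed absolute forecast error per season slot.
        deviation[i] = gamma * abs(err) + (1 - gamma) * deviation[i]
    return forecasts, flags


# Toy periodic series (period 4) with one injected spike at t = 50.
series = [10.0, 20.0, 30.0, 20.0] * 20
series[50] += 100.0
_, flagged = holt_winters_flags(series, season=4)
print([t + 4 for t, f in enumerate(flagged) if f][:3])  # -> [50, 51, 52]
```

In this toy run the single spike also pulls the level estimate upward, so the samples right after it are flagged as well; this is one way a forecaster can report extra positives around a short-lived event.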
Conclusion and Next Steps
• We present an evaluation of the signal characteristics of network traffic anomalies
  • Using IP flow and SNMP data collected at the UW border router
  • 106 anomalies have been grouped into four categories
  • IMAPIT developed to apply wavelet analysis to the data
  • Deviation score developed to automate anomaly detection
• Results
  • Characteristics of anomalies are exposed using different filters and data
  • The deviation score is an effective detection method
• Future work
  • Development of anomaly classification methods
  • Application of results in (distributed) detection systems