Modeling Network Traffic as Images Seong Soo Kim and A. L. Narasimha Reddy Computer Engineering Department of Electrical Engineering Texas A&M University {skim, reddy}@ee.tamu.edu
Contents • Introduction and Motivation • Network Traffic as Images • Visual Representation • Requirements for Representing Network Traffic as Images • Sampling Rates • Visual modeling Network Traffic as Images • normal traffic, semi-random attacks, random attacks • Image Processing for Network Traffic • Validity of intra-frame DCT • Inter-frame differential coding • Conclusion Texas A & M University ICC 2005
Attack/Anomaly • Bandwidth attacks/anomalies, flash crowds • DoS (Denial of Service): UDP flooding, TCP SYN flooding, ICMP flooding • Typical types: • Single attacker (DoS) • Multiple attackers (DDoS) • Multiple victims (worm) • Aggregate packet header data as signals • Signal/image-based anomaly/attack detectors
Motivation (1) • Previous studies looked at the behavior of individual flows • Partial state • RED-PD • These become ineffective against aggregated DDoS traffic • Link speeds are increasing • currently at Gb/s, soon to be at 10~100 Gb/s • Need simple, effective mechanisms that can be implemented at line speed • Look at aggregate information of traffic • Use sampling to reduce the cost of processing • Process aggregate data to detect anomalies
Motivation (2) • Signature (rule)-based approaches are tailored to known attacks • Look for packets with port number 1434 (SQL Slammer) • Become ineffective when traffic patterns or attacks change • New threats are constantly emerging • Do not want to rely on attack-specific information • Most current monitoring/policing tools work off-line • Flowscan, FlowAnalyzer, AutoFocus • Quick identification of network anomalies is necessary to contain threats • Can we design generic (and generalized) mechanisms for attack detection and containment? • Measurement (network)-based real-time detection
Packet Header • Packet headers carry a rich set of information • Data: packet counts, byte counts, number of flows • Domain: source/destination addresses, source/destination port numbers, protocol numbers • An image or video can represent each kind of data in each domain • Image processing and video analysis decipher the patterns of traffic • single → multiple (worm): horizontal lines • multiple → single (DDoS): vertical lines
Domain size Reduction (1) • Header fields may have large domain spaces • IPv4 addresses: 2^32; IPv6 addresses: 2^128 • Need to minimize storage and processing complexity for real-time processing • Employ “domain folding” • For example: a two-dimensional array count[i][j] records the packet count for value j in the i-th field (byte) of the IP address • Effects • A 32-bit address is split into four 8-bit fields • Smaller memory: 2^32 (4G) → 4*256 (1K) • Running time: O(n) → O(lg n) • A form of hashing • Advantages • The hashing can be reversed, within limits, to identify the target IP address
Data structure for reducing domain size (2) • Simple example • IP 1 = 165.91.212.255, No. of flows = 3 • IP 2 = 64.58.179.230, No. of flows = 2 • IP 3 = 216.239.51.100, No. of flows = 1 • IP 4 = 211.40.179.102, No. of flows = 10 • IP 5 = 203.255.98.2, No. of flows = 2 • [Figure: count[i][j] drawn as four rows indexed 0 to 255; after IP 1 is recorded, each row holds a single entry of 3]
Data structure for reducing domain size (2), all five addresses recorded • [Figure: count[i][j] drawn as four rows indexed 0 to 255] • count[0]: [64]=2, [165]=3, [203]=2, [211]=10, [216]=1 • count[1]: [40]=10, [58]=2, [91]=3, [239]=1, [255]=2 • count[2]: [51]=1, [98]=2, [179]=12 (IP 2 and IP 4 share this byte: 2 + 10), [212]=3 • count[3]: [2]=2, [100]=1, [102]=10, [230]=2, [255]=3
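The folding step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `new_counts` and `record` are hypothetical, and only the `count[i][j]` layout and the five example addresses come from the slides.

```python
from typing import Dict, List


def new_counts() -> List[List[int]]:
    """Create count[i][j]: 4 byte positions by 256 possible byte values."""
    return [[0] * 256 for _ in range(4)]


def record(count: List[List[int]], ip: str, amount: int = 1) -> None:
    """Fold one dotted-quad IPv4 address into the four per-byte rows."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    for i, j in enumerate(octets):
        count[i][j] += amount  # four counter updates per address


def nonzero(count: List[List[int]]) -> List[Dict[int, int]]:
    """Return only the populated entries of each row, for inspection."""
    return [{j: c for j, c in enumerate(row) if c} for row in count]


# The five addresses and flow counts from the slide.
flows = {
    "165.91.212.255": 3,
    "64.58.179.230": 2,
    "216.239.51.100": 1,
    "211.40.179.102": 10,
    "203.255.98.2": 2,
}

count = new_counts()
for ip, n in flows.items():
    record(count, ip, n)

for i, row in enumerate(nonzero(count)):
    print(f"count[{i}]:", row)
```

Because IP 2 and IP 4 share the third byte 179, their flow counts add up in `count[2][179]`. That collision is why the folding behaves like a hash and can only be reversed within limits.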
Visual Representation
Image-based analysis • Generate useful signals from traffic images • Treat the traffic data as images • Apply image-processing-based analysis • This makes image/video processing techniques available for analyzing network traffic • Some attacks become clearly visible to the human eye • Video compression techniques lead to data reduction • Scene change analysis leads to anomaly detection • Motion prediction leads to attack prediction • Pattern recognition leads to anomaly identification
Impacts of Design Factors for Representing Network Traffic as Images (1) • Sampling rates • To classify the current traffic state from its stationarity, we should select the sampling frequency that yields the most stable images • The periodicity of traffic
Impacts of Design Factors for Representing Network Traffic as Images (2) • Sampling rates • In normal times the traffic is stationary, so the choice of sampling period is not crucial • During attacks the traffic changes dynamically with time, so the sampling period becomes a crucial factor • Sampling periods of 30 to 120 seconds
Flow-based Network Traffic Images • Visual representation based on the number of flows • The number of flows in the (source/destination) address domain • Black dots/lines indicate more concentrated traffic • This analysis is effective at revealing flood-type attacks • The image reveals the characteristics of the traffic • Normal behavior mode • A single target (DoS) • Semi-random target: the subnet is fixed and the rest of the address varies (prefix-based attacks) • Random target: horizontal scan (worm) and vertical scan (DDoS)
Network traffic as images: normal network traffic • Signal: standard deviation of the most significant DCT coefficients of the images • It reflects the energy distribution of the number of flows over the address domain • In the normal traffic state, this signal sits at a middle level between the two anomalous cases that follow • Legitimate flows do not form any regular shape, because they are randomly distributed over the address domain
Network traffic as images: semi-random targeted attacks • The difference between attackers (or victims) and legitimate users is pronounced • Higher variance than normal traffic • A specific area of the data structure appears in a darker shade • Traffic is concentrated on a single (aggregated) destination or a subnet
Network traffic as images: random targeted attacks • All of the addresses are exploited in host-scan attacks • Uniform intensity → low variance • The whole region of the image has uniform intensity • Horizontal/vertical lines indicate anomalies in the 2D image • Random (sequential, dictionary scan) attacks • Horizontal scan: from the same source aimed at multiple targets (worm propagation) • Vertical scan: from several machines (in a subnet) to a single destination (DDoS) • Worm propagation type attack • DDoS propagation type attack
Summary of visual representation of traffic data • Worm attacks: horizontal line in 2D image • DDoS attacks: vertical line in 2D image • Line detection algorithm • Visual images look different in different traffic modes • Motion prediction can lead to attack prediction
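The slides name a line detection algorithm without specifying one. The sketch below is a deliberately simple stand-in, not the authors' method: it flags a row or column as a line when most of its pixels reach a given intensity. It assumes rows index sources and columns index destinations, and the names `find_lines`, `level` and `min_fill` are hypothetical.

```python
from typing import List, Tuple


def find_lines(image: List[List[int]], level: int,
               min_fill: float = 0.8) -> Tuple[List[int], List[int]]:
    """Return (horizontal_rows, vertical_columns) that look like lines.

    A row (or column) counts as a line when at least `min_fill` of its
    pixels have intensity >= `level`.
    """
    rows, cols = len(image), len(image[0])
    horizontal = [
        r for r in range(rows)
        if sum(1 for v in image[r] if v >= level) >= min_fill * cols
    ]
    vertical = [
        c for c in range(cols)
        if sum(1 for r in range(rows) if image[r][c] >= level) >= min_fill * rows
    ]
    return horizontal, vertical


# One source (row 2) contacting every destination: worm-like horizontal line.
worm_image = [[0] * 8 for _ in range(8)]
worm_image[2] = [5] * 8

# Every source contacting one destination (column 5): DDoS-like vertical line.
ddos_image = [[0] * 8 for _ in range(8)]
for row in ddos_image:
    row[5] = 5

print(find_lines(worm_image, level=3))
print(find_lines(ddos_image, level=3))
```

A fill-ratio test like this tolerates a few missing pixels in a line, which matters when the frames are built from sampled traffic.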
Generation of useful signals: scene change analysis with DCT • We can apply various image processing techniques • From the generated images, we can derive useful signals through the DCT (Discrete Cosine Transform) • The DCT is effective for storage reduction and for approximating the energy distribution in an image • Signal: variance of the leading DCT coefficients of 8-by-8 blocks • Instead of all DCT coefficients, we can keep only the dominant coefficient
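As a rough illustration of this signal, the sketch below computes an orthonormal 2D DCT-II on each 8-by-8 block in pure Python, keeps the leading (DC) coefficient of every block, and reports the standard deviation of those coefficients for one frame. The function names are hypothetical, and the image size is assumed to be a multiple of 8. The slides do not say exactly how the coefficients are normalized or combined, so treat this as one plausible reading.

```python
import math
from statistics import pstdev
from typing import List

N = 8  # block size used on the slide


def dct2_block(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2D DCT-II of an N-by-N block (direct, unoptimized form)."""
    def alpha(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def leading_coefficients(image: List[List[float]]) -> List[float]:
    """Leading (DC) coefficient of every 8-by-8 block, in raster order."""
    coeffs = []
    for r in range(0, len(image), N):
        for c in range(0, len(image[0]), N):
            block = [row[c:c + N] for row in image[r:r + N]]
            coeffs.append(dct2_block(block)[0][0])
    return coeffs


def frame_signal(image: List[List[float]]) -> float:
    """One scalar per frame: spread of the leading block coefficients."""
    return pstdev(leading_coefficients(image))


# Uniform intensity everywhere versus traffic concentrated in one block.
uniform = [[1.0] * 16 for _ in range(16)]
concentrated = [row[:] for row in uniform]
for r in range(8):
    for c in range(8):
        concentrated[r][c] = 9.0

print(frame_signal(uniform), frame_signal(concentrated))
```

The uniform frame yields a signal near zero, while the frame with one concentrated block yields a clearly larger value, matching the low-variance and high-variance cases described for random and semi-random attacks.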
Impact of Selecting DCT Coefficients (1) • TCG (G_T): Transform Coding Gain • TCG measures the amount of energy packed into the low-frequency (leading) coefficients • A higher TCG leads to a smaller intra-frame MSE and higher compression
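The slides do not give a formula for G_T. A common textbook definition of transform coding gain is the ratio of the arithmetic mean to the geometric mean of the coefficient variances, and the sketch below uses that definition as an assumption; the authors' exact formula may differ. The function name `transform_coding_gain` is hypothetical.

```python
import math
from typing import Sequence


def transform_coding_gain(variances: Sequence[float]) -> float:
    """Arithmetic mean over geometric mean of transform-coefficient variances.

    Equals 1.0 when energy is spread evenly over the coefficients and grows
    as energy concentrates in a few of them.
    """
    if not variances or any(v <= 0 for v in variances):
        raise ValueError("variances must be a non-empty sequence of positive numbers")
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric


print(transform_coding_gain([1.0, 1.0, 1.0, 1.0]))   # evenly spread energy
print(transform_coding_gain([100.0, 1.0, 1.0, 1.0]))  # energy in one coefficient
```

Under this definition, a gain close to 1 means the transform compacts little energy, which is consistent with the slide's statement that a higher TCG goes with higher compression.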
Impact of Selecting DCT Coefficients (2) • Intra-frame DCT • Random traffic can be packed into fewer coefficients than semi-random traffic • Using inter-frame differential coding, we can improve G_T • For an MSE of 0.3349, the number of required coefficients drops from 42 to 3 • TCG increases 2.6 times
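The idea behind inter-frame differential coding can be shown on a toy pair of frames: when consecutive frames are mostly alike, the residual carries far less energy than either frame, so fewer coefficients are needed to represent it. The frames and the function names below are made up for illustration and are not taken from the authors' data.

```python
from typing import List

Frame = List[List[int]]


def frame_difference(previous: Frame, current: Frame) -> Frame:
    """Pixel-wise residual: current frame minus previous frame."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def energy(frame: Frame) -> int:
    """Sum of squared pixel values."""
    return sum(v * v for row in frame for v in row)


previous = [
    [3, 1, 0, 2],
    [1, 4, 1, 0],
    [0, 1, 5, 1],
    [2, 0, 1, 3],
]
current = [row[:] for row in previous]
current[1][2] = 6  # a single address pair becomes much busier

residual = frame_difference(previous, current)
print(energy(current), energy(residual))
```

Only the changed pixel survives in the residual, so a transform applied to the residual has much less to encode than one applied to the full frame.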
Impacts of Design Factors for Representing Network Traffic as Images • Effect of sampling rate on DCT coefficients • A sampling period of 60 seconds gives the minimum intra-frame MSE over the entire range of retained DCT coefficients • We can choose 30 to 120 seconds as an appropriate sampling period
Attack Estimation (1): Motion prediction • Step 1: reduce complexity • Pixels below the mean packet count • Normalized absolute difference similarity • Step 2: find a block of addresses
Attack Estimation (2): Motion prediction • Step 3: calculate the quantitative components • Starting position • Motion vector • Step 4: compensate for errors
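The slides list the steps without details, so the following is only a loose sketch of steps 1 to 3 under stated assumptions. It zeroes pixels that do not exceed the frame mean, takes the bounding box of what remains as the block of addresses, and uses the displacement of that box between two frames as the motion vector. It substitutes bounding-box tracking for the normalized absolute difference similarity named on the slide, and it omits step 4 (error compensation). All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def suppress_below_mean(frame: Frame) -> Frame:
    """Step 1 (assumed reading): zero every pixel not above the frame mean."""
    values = [v for row in frame for v in row]
    mean = sum(values) / len(values)
    return [[v if v > mean else 0 for v in row] for row in frame]


def bounding_block(frame: Frame) -> Optional[Box]:
    """Step 2 (assumed reading): smallest box covering all remaining pixels."""
    cells = [(r, c) for r, row in enumerate(frame) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def motion_vector(previous: Frame, current: Frame) -> Optional[Tuple[int, int]]:
    """Step 3: displacement of the block's starting position between frames."""
    a = bounding_block(suppress_below_mean(previous))
    b = bounding_block(suppress_below_mean(current))
    if a is None or b is None:
        return None
    return b[0] - a[0], b[1] - a[1]


def predict_start(current: Frame, vector: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Extrapolate where the block should start in the next frame."""
    block = bounding_block(suppress_below_mean(current))
    if block is None:
        return None
    return block[0] + vector[0], block[1] + vector[1]


def frame_with_hot_block(top: int, left: int) -> Frame:
    """8x8 background of 1s with a 2x2 block of 9s (toy data)."""
    frame = [[1] * 8 for _ in range(8)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 9
    return frame


previous = frame_with_hot_block(1, 1)
current = frame_with_hot_block(1, 3)
vector = motion_vector(previous, current)
print(vector, predict_start(current, vector))
```

In the toy frames the hot block moves two columns to the right, so the predicted starting position for the next frame is two further columns along the same row.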
Advantages • Not looking for specific known attacks • Generic mechanism • Works in real-time • Latencies of a few samples • Simple enough to be implemented inline
Conclusion • We studied the feasibility of analyzing packet header data through image and DCT analysis to detect traffic anomalies • We evaluated the effectiveness of our approach by applying it to network traffic • Can rely on many tools from the signal/image processing area • More robust offline analysis is possible • Concise for logging and playback • Real-time resource accounting is feasible • Real-time traffic monitoring is feasible • Simple enough to be implemented inline
Thank you!
Processing and memory complexity • Two samples of packet header data: 2*P, where P is the size of one sample • Summary information (DCT coefficients, etc.) over samples: S • Total space requirement: O(P+S) • P: 2^32 → 4*256 = 1024 (1D); 2^64 → 256K (2D) • S: 32*32 → 16 • Memory requirement: 258K • Processing: O(P+S) • Update 4 counters per domain • Per-packet data-plane cost is low
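The figures on this slide can be checked with a little arithmetic. The sketch below assumes that the 2D structure is four 256-by-256 planes (one per byte position) and that S counts one leading coefficient per 8-by-8 block of a 32-by-32 image. The slide does not spell out either reading, nor how 258K is composed, so the last line is only one combination that happens to match.

```python
FIELDS = 4      # byte positions in an IPv4 address
VALUES = 256    # possible values per byte
K = 1024

p_1d = FIELDS * VALUES             # folded 1D domain: 4*256
p_2d = FIELDS * VALUES * VALUES    # assumed 2D layout: 4 planes of 256x256
s = (32 // 8) ** 2                 # assumed: one coefficient per 8x8 block

print(p_1d)                  # 1024 counters
print(p_2d // K)             # 256 (K counters)
print(s)                     # 16 summary values
print((2 * p_1d + p_2d) // K)  # 258, if two 1D samples are kept beside the 2D array
```

Under these assumptions the per-packet work is four counter increments per domain, which is consistent with the slide's claim that the data-plane cost is low.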