Worm Origin Identification Using Random Moonwalks. Yinglian Xie, V. Sekar, D. A. Maltz, M. K. Reiter, Hui Zhang. 2005 IEEE Symposium on Security and Privacy. Presented by: Anup Goyal, Edward Merchant.
Outline • Motivation/Introduction • Problem Formulation • The Random Moonwalk Algorithm • Evaluation Methodology • Analytical Model • Real Trace Study • Simulation Study • Deployment and Future Work
Motivation • There is little automated support for identifying the location from which an attack is launched. • Knowledge of the origin supports law enforcement. • Knowledge of the causal flows that advance an attack supports diagnosis of how network defenses were breached.
Introduction • We craft an algorithm that determines the origin of epidemic spreading attacks. • identify the “patient zero” of the epidemic • reconstruct the sequence of spreading
Introduction (cont’d) • Random moonwalk algorithm - Find the origin and propagation paths of a worm attack. • performs post-mortem analysis on the traffic records logged by the network. • It depends on the assumption that worm propagation occurs in a tree-like structure.
Problem Formulation (cont’d) • A directed host contact graph G = (V, E) • V = H × T • H is the set of all hosts in the network • T is time • Each directed edge represents a network flow between two end hosts at a certain time. • A flow has a finite duration and involves the transfer of one or more packets. • e = (u, v, ts, te): a flow from host u to host v that starts at time ts and ends at time te
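As a minimal sketch of the host contact graph described above, each flow record can be stored as one edge tuple. The names `Flow` and `hosts_of` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import List, NamedTuple, Set


class Flow(NamedTuple):
    """One directed edge e = (u, v, ts, te) of the host contact graph."""
    u: str     # source host
    v: str     # destination host
    ts: float  # time at which the flow starts
    te: float  # time at which the flow ends


def hosts_of(flows: List[Flow]) -> Set[str]:
    """Return H, the set of hosts that appear in at least one flow."""
    hosts = set()
    for f in flows:
        hosts.add(f.u)
        hosts.add(f.v)
    return hosts


# Two example flows: a contacts b, and later b contacts c.
flows = [Flow("a", "b", 0.0, 1.0), Flow("b", "c", 2.0, 3.0)]
hosts = hosts_of(flows)
```

Storing edges as plain tuples keeps them hashable, so they can later be counted or placed in sets.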
Problem Formulation (cont’d) • normal edge • The flow does not carry an infectious payload. • attack edge • The flow carries attack traffic, whether or not the attack succeeds. • causal edge • The flow that actually infects its destination. • Goal - Identify a set of edges that come from the top level of the causal tree.
Random Moonwalk Algo. • Infers causal relationships between flows by exploiting the global structure of worm attacks • No use of attack content, attack packet size, or port numbers • For an attack to progress, there must be communication between the source of the attack and the compromised nodes • These infection-causing flows form a causal tree rooted at the source of the attack • Find the tree; its root is the source of the attack • Find causal flows and attack flows
Random Moonwalk Algo. • Basic Algorithm • Walk backward in time from a randomly chosen flow, for at most a certain distance • At each host, choose only among the incoming flows that fall within a certain time limit • Repeat for W walks • Find the Z edges with the highest frequency • Build a tree from these flows • Most probably this is the causal tree, and its root is the source of the attack
Random Moonwalk Algo. (cont’d) • Sampling process controlled by three parameters • W – the number of walks (samples) performed • d – the maximum length of the path traversed by a single walk • Δt – the sampling window size: the maximum time allowed between two consecutive edges
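The sampling process can be sketched roughly as follows. This is an illustrative reading of the slides, not the authors' implementation: the function name `random_moonwalk`, the tuple layout `(u, v, ts, te)`, the `seed` argument, and the exact predecessor test (a flow that ended at the current source host at most `dt` before the current flow started) are all assumptions.

```python
import random
from collections import Counter, defaultdict


def random_moonwalk(flows, W, d, dt, seed=0):
    """Count how often each flow is visited by W backward random walks.

    flows: list of (u, v, ts, te) tuples (assumed layout).
    W: number of walks, d: max edges per walk, dt: sampling window.
    """
    rng = random.Random(seed)
    # Index flows by destination host so that predecessors are easy to find.
    incoming = defaultdict(list)
    for f in flows:
        incoming[f[1]].append(f)

    counts = Counter()
    for _walk in range(W):
        edge = rng.choice(flows)  # each walk starts at a random flow
        counts[edge] += 1
        for _step in range(d - 1):  # a walk visits at most d edges
            src, ts = edge[0], edge[2]
            # Assumed predecessor rule: flows that arrived at the current
            # source host no more than dt before the current flow started.
            preds = [p for p in incoming[src] if 0 < ts - p[3] <= dt]
            if not preds:
                break  # nothing to step back to, so the walk ends here
            edge = rng.choice(preds)
            counts[edge] += 1
    return counts


# Toy trace: a causal chain a -> b -> c -> d plus one unrelated flow x -> y.
flows = [("a", "b", 0, 1), ("b", "c", 2, 3), ("c", "d", 4, 5), ("x", "y", 1, 2)]
counts = random_moonwalk(flows, W=1000, d=10, dt=2)
top_edge, top_count = counts.most_common(1)[0]
print(top_edge, top_count)
```

In this toy trace every walk that starts anywhere on the chain ends up at the first flow of the chain, so that flow accumulates the highest count, which is the effect the algorithm relies on.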
Random Moonwalk Algo. (cont’d) • Why does this algorithm work? • To propagate, a worm creates new flows to other hosts some time after infection. • This forms a chain of flows from the source to the last victim • Traverse this chain backward to find the source • An infected host generally originates more flows than it receives. • The originators of normal flows in the host contact graph are mostly clients, so most normal edges have no predecessor within Δt.
Outline • Evaluation Methodology • Analytical Model • Assumptions • Edge Probability Distribution • False Positives and False Negatives • Parameter Selection • Real Trace Study • Simulation Study
Analytical Model (Assumptions) • The host contact graph is known. • |E| edges and |H| hosts • Discretize time into units. Every flow has a length of one unit and fits into one unit.
Analytical Model (FP & FN) • [Figure: false positives and false negatives] (42 malicious edges at k = 1; 10^5 hosts in total.)
Outline • Evaluation Methodology • Analytical Model • Real Trace Study • Detect the Existence of an Attack • Identify Causal Edges & Initial Infected Host • Reconstruct the Top Level Causal Tree • Parameter Selection • Performance • Simulation Study
Real Trace Study • Background Traffic • The traffic trace was collected over a 4-hour period at the backbone of a class-B university network. • Only intra-campus flows were collected (1.4 million), involving 8040 hosts • Addition • Add flow records to represent worm-like traffic with varying scanning rates • Randomly select the vulnerable hosts.
Real Trace Study (Identify) (800 causal edges from 1.5 × 10^6 flows) (The scanning rate of Trace-50 is lower than that of Trace-10.)
Real Trace Study (Identify) • Top-frequency sampled edges vs. actual initial edges (800 causal edges in total; the initial 10% are the first 80 edges) (The scanning rate of Trace-50 is lower than that of Trace-10.)
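One way to score the comparison above is the fraction of the Z highest-frequency edges that are truly causal. The sketch below assumes a mapping from edge to visit count and a ground-truth set of causal edges; the function name `detection_accuracy` and the toy values are hypothetical and are not results from the study.

```python
def detection_accuracy(counts, causal_edges, Z):
    """Fraction of the Z most frequently sampled edges that are causal.

    counts: dict mapping edge -> number of times the edge was sampled.
    causal_edges: set of edges known (from ground truth) to be causal.
    """
    # Rank edges by descending sample count and keep the first Z.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = [edge for edge, _count in ranked[:Z]]
    if not top:
        return 0.0
    hits = sum(1 for edge in top if edge in causal_edges)
    return hits / len(top)


# Toy values only: three edges, of which e1 and e3 are causal.
toy_counts = {"e1": 5, "e2": 3, "e3": 1}
toy_causal = {"e1", "e3"}
accuracy = detection_accuracy(toy_counts, toy_causal, Z=2)
```

With Z = 2 the returned edges are e1 and e2, so one of the two is causal.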
[Figure: graph of the top 60 highest-frequency flows, Trace-50, 10^4 walks. Labels: original attacker; Blaster worm scan.]
Real Trace Study (Parameter) • d and Δt [Figure label: d = infinite]
Real Trace Study (Performance) • Random moonwalk • Z = 100, 10^4 walks • Heavy-hitter • Find the 800 hosts with the largest number of flows in the trace, then randomly pick 100 flows • Super-spreader • Find the 800 hosts that contacted the largest number of destinations, then randomly pick 100 flows • Oracle • With zero false positive rate, randomly select 100 flows between infected hosts
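The two baseline detectors can be sketched along these lines. Several details are assumptions, since the slide does not specify them: a host's flow count is taken to be the number of flows it originates, the random flows are drawn from those originated by the selected hosts, and the function names and `seed` argument are hypothetical.

```python
import random
from collections import Counter, defaultdict


def heavy_hitter(flows, n_hosts, n_flows, seed=0):
    """Pick random flows originated by the hosts that send the most flows."""
    rng = random.Random(seed)
    sent = Counter(f[0] for f in flows)  # flows originated per source host
    top = {host for host, _n in sent.most_common(n_hosts)}
    pool = [f for f in flows if f[0] in top]
    return rng.sample(pool, min(n_flows, len(pool)))


def super_spreader(flows, n_hosts, n_flows, seed=0):
    """Pick random flows from hosts that contact the most distinct destinations."""
    rng = random.Random(seed)
    dests = defaultdict(set)
    for f in flows:
        dests[f[0]].add(f[1])  # distinct destinations per source host
    ranked = sorted(dests, key=lambda h: len(dests[h]), reverse=True)
    top = set(ranked[:n_hosts])
    pool = [f for f in flows if f[0] in top]
    return rng.sample(pool, min(n_flows, len(pool)))


# Toy trace: host a sends many flows to one peer, host c contacts two peers.
flows = [("a", "b", 0, 1), ("a", "b", 1, 2), ("a", "b", 2, 3),
         ("c", "d", 0, 1), ("c", "e", 1, 2)]
hh = heavy_hitter(flows, n_hosts=1, n_flows=2)
ss = super_spreader(flows, n_hosts=1, n_flows=2)
```

In the toy trace the two heuristics select different hosts: a is the heavy hitter by volume, while c is the super-spreader by number of distinct destinations.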
Real Trace Study (Performance) • Scanning Method • Smart worm (always scans valid hosts), R↑ • Scan with random addresses • Legend: C = causal edge, A = attack edge, 100 = Z=100, 500 = Z=500
Outline • Evaluation Methodology • Analytical Model • Real Trace Study • Simulation Study
Simulation Study • Simulate different background traffic • Realistic host contact graphs tend to be much sparser, meaning the chance of communication between two arbitrary hosts is very low. • Note: in the campus network, the accuracy is about 0.7
Deployment and Future Work • This approach assumes the availability of complete data. • The impact of missing data on performance • The deployment of the algorithm
Questions? Thank You