450 likes | 469 Views
Very Fast Containment of Scanning Worms. Nicholas Weaver, Stuart Staniford, Vern Paxson Presented by: Yi Xian, Chuan Qin. Outline. Worm containment Scan suppression Hardware implementations Approximate TRW Cooperation Attacking worm containment. Worm Classification.
E N D
Very Fast Containment of Scanning Worms Nicholas Weaver, Stuart Staniford, Vern Paxson Presented by: Yi Xian, Chuan Qin
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Worm Classification According to the ways by which the worms discover new targets to exploit: • Scanning worms • random scanning / subnet scanning • Routing worms • BGP information can tell which IP address blocks are allocated • Flash worms • Pre-generated Hit list of vulnerable machines, which is determined before worm launch by scanning, is sent with payload • Gives the worm a boost in the slow start phase • Topological worms • Use the information stored on compromised hosts to find new targets • E.g. Email worms use email addresses, IM worms use buddy list • Meta-server worms • First queries the meta-server in order to determine new targets. • A meta-server keeps a list of servers which are currently active. • Passive worms • No active scanning • Either waits for a vulnerable system to contact it or rely on user behavior to discover new targets
Scanning Worms • What is scanning worm? Picking “random” addresses and attempting to infect them. • Code Red & Slammer – fully random; • Code Red II & Nimda – subnet scanning; “differentially” picking addresses closer to itself; bias toward local addresses; • Blaster – linear scanning; random-start, sequential search • Welchia – an attempted good worm to prevent and remove Blaster; random-start, sequential scanning; checking with ICMP whether the IP was live before attempting to infect the address;
Scanning Worms • Key properties of scanning worms: • Most scanning attempts fails • Infected machines will institute many connection attempts • The spread of worm relies on: • The number of initially infected hosts • A worm’s scan rate • A worm’s hitting probability
Scanning Worms • How to mitigate the spread of worms? • Prevention • Reduce size of vulnerable population • Insufficient to counter worm threat • Treatment • Once a host is infected, clean it up immediately • long time to develop cleanup code, and too slow to have a significant impact • People don’t install patches • Containment
Containment • Protect individual networks and isolate infected hosts • Most Promising Solution • Can be completely automated, otherwise, too slow to be useful • Containment does not require participation of each host on the internet • Deployment scenarios • Ideally, a global deployment is preferable. Otherwise, any uncontained but infected machines will be able to infect other systems. • Practically, a global deployment is impossible. • May be deploying at the border of ISP networks
Worm Containment • Defense against scanning worms • Leverage the anomaly of a local host attempting to connect to multiple other hosts • Works by detecting that a worm is operating in the network • Containment is based on worm behavior, not signatures (content) – be able to stop new scanning worms • Then blocking the infected machines from contacting further hosts • Containment by address blocking (blacklisting) • Does not apply to: • Hit lists (flash worms) • Meta-servers (online list) • Topology detectors • Contagion worms
Worm Containment • Break the network into many cells • Within each cell a worm can spread unimpeded. • Between cells, containment limits further infections by blocking outgoing connections from infected cells. • Must have very low false positive rate. • Blocking suspicious machines can cause a DoS attack if false positive rate is high. • Need for complete deployment within an enterprise • Integrated into the network’s outer switches or similar hardware elements, since containment works best when the cells are small.
Epidemic threshold • Worm-suppression device must necessarily allow some scanning before it triggers a response • Worm may find a victim during that time • The epidemic threshold depends on • The sensitivity of the containment response devices • Low scan threshold T • The density of vulnerable machines on the network • E.g. Using NAT and DHCP to distribute potential targets in a larger address space • The degree to which the worm is able to target its efforts into the correct network, and even into the current cell
Sustained Scanning Threshold • If worm scans slower than sustained scanning threshold, the detector will not trigger • So, it is vital to achieve as low a sustained scanning threshold as possible such that humans can notice the problem developing and take additional action. • For this implementation, threshold set to 1 scan per minute • For an enterprise with 256(2^8) vulnerable machines. • If a worm biases its scanning such that ½ the effort is used to scan the local /16, then on average it will locate another target within the enterprise after 2^9 scans. • threshold = one scan per second, the initial population’s doubling time: 2^9 seconds, that is the population will double once every 8.5 minutes • threshold = one scan per minute, doubling time: 8.5 hours, which is slow enough for humans to notice the problem and take actions.
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Scan Suppression • Responding to detected portscans by blocking future scanning attempts • Portscans – Probe attempts to determine if a service is operating at a target IP address • Two basic types • Horizontal – search for identical service on large number of machines • Vertical – examine an individual machine to discover running services • Or, combines these two types
Scan Suppression • Goals –Preventing scans coming from “outside” inbound to the “inside”; keep worm below epidemic threshold, or slow it down so humans notice • Preventing scans from Internet is too hard • Protect the enterprise, forget the Internet • The enterprise network is “inside” • The cell (local area network) is “outside” • Divide the enterprise network into cells • Each cell is guarded by a filter employing the scan detection algorithm
Threshold Random Walk • Assumption: benign traffic has a higher probability of success than attack traffic • Threshold Random Walk (TRW) detection algorithm • Detect failed/succeeded connections • Y_i : outcome of the first connection attempt by a remote source r to the ith distinct local host • if success, Y_i = 0, otherwise, Y_i = 1 • Sequential Hypothesis Testing • Two hypothesis: benign (H_0) and scanner (H_1) • Probabilities determined by the equations below • Estimate q0 and q1 (The assumption that benign has higher chance of succeeding connection implies: q0 > q1 )
Threshold Random Walk • Set upper and lower thresholds h0,h1 • Where, a> false positive rate; b< detection accuracy (the algorithm picks H_1 when H_1 is in fact true) • For each test, compute the likelihood ratio: • Compare to the thresholds: If L < h0 then this is benign traffic L > h1 then this is scan traffic
Threshold Random Walk • Problems: • Requires a very large amount of state to keep track of which pairs of addresses have already tried to connect, too costly for a line-rate hardware implementation • SYN flooding attack with spoofed remote address will exhaust the state • Only detect horizontal TCP scans • Once benign address being blocked, no chance to communicate again
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Hardware Implementations • Constraints in hardware implementation • Memory access speed • Surprisingly significant constraint • For duplex gigabit Ethernet, only have time to access DRAM 4 times • Worse for 10-g networks: DRAM no longer optional, must use SRAM • Independent memory banks • Different banks can be access simultaneously • Mitigate the access constraint, but adds the cost • Memory size • SRAM only be able to hold a few tens of M • DRAM can deal with G
Hardware Implementations • Design Goals • Adopt for both DRAM at 1g Ethernet and SRAM at 10g Ethernet • Access times • No more than 4 memory accesses per packet to 2 separate tables • 2 accesses for each, a read and a write to same location • Memory size • Less than 16MB needed, so SRAM can be optional
Hardware Implementations • Mechanism 1 – Approximate caches • A cache for which allow collisions cause imperfections • Fixed memory available • When collision occurs, combine or evict data • Two results: false positives or false negatives • In this scene, false negative is acceptable, false positive should avoid • Very simple lookups, vital for high-performance hardware implementation • Attacker behaviors • Predicting the hashing algorithm by creating collisions • Defense – a keyed hash function • Simply overwhelming the cache by generating a massive amount of “normal” activity to cloak malicious behavior • require substantial resources.
Hardware Implementations • Mechanism 2 – Efficient small 32 bit block ciphers • Equivalent to an 32-bit keyed permutation • Works by permuting the N-bit value with a key • Separate the resulting N-bit value into an k-bit index and a tag • Superior to using a hash function, since only need (N-k) bits for the tag • Prevent attackers from controlling collisions
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Approximate TRW • Basic Strategies • Using approximate caches to track connections and addresses • Connection cache tracks the direction and the age of each connection • Address cache tracks the “count” of every detected addresses • Replace the old addresses and old ports if the corresponding entry has timed out; • Track addresses indefinitely as long as we do not have to evict their state from our caches; • Detect vertical as well as horizontal TCP scans, and horizontal UDP scans; • Implement a “hygiene filter” to thwart some stealthy scanning techniques without causing undue restrictions on normal machines.
Connection Cache • Using table indexed by hashing insideIP, outsideIP and inside port (TCP) • Recording if we’ve seen a packet in each direction and a age counter • Combines entries in the case of aliasing, may turn unidirectional connection into bidirectional or turn failed connection attempt into success (biases to false negative) • Age is reset on each forwarded packet • Every minute, back ground process purges entries older than Dconn
Address Cache • Using a 4-way associative cache to track outside IP • Dividing encrypted outsideIP into index /tag, then use index/tag to find an entry • The count keeps the difference between successes and failures, the address will be blocked if its count is larger than T • Counts are decremented every Dmiss seconds • Counts can not larger than Cmax or less than Cmin • Evict entries in case of aliasing, among the entries with the same index, the entry with the least count value will be evicted
Blocking and Special cases • If an address’s count exceeds T, block it For packets from blocked address • Do not match any existing connection, drop • Match an existing good connection • If it is a UDP or TCP SYN, drop • Else, pass • Treat TCP RST, RST+ACK, SYN+ACK, FIN, FIN+ACK specially (hygiene filter) • Do not correspond to a connection established in the other direction, drop (not block cause they may be benign activity) • Else, pass
Internet Out In Connection cache Address cache A,X: OutIn - A,*: OutIn - A,Y: OutIn - A,Z: OutIn - A B A,B: OutIn InOut A,B: OutIn - • UDP Probe: A → X [fwd] A → Y [fwd] • Normal Traffic: A → B [fwd] B → A [fwd, bidir] A: 1 A: Cmax A: 2 A: T A: 1 A: 3 • Scanning again: A → … [fwd until T] A → Z [blocked] A → B ? [block SYN/UDP, fwd TCP]
Parameters and Tuning • Parameters • T: miss-hit difference that causes block, vary from site to site. • Cmin: minimum count to prevent good address turn into bad • Cmax: maximum count to allow bad address be connected again • Dmiss: decay rate for misses • Dconn: decay rate for idle connections • Cache size and associativity, fixed
Evaluation All outbound connections over a threshold of 5 were flagged by the algorithm • Connection cache = 1MB, Address cache = 4MB • T = 5, Cmin = -20, Cmax = infinity, Dmiss = 1 minute, Dconn = 10 minutes • During test, 20% of connection cache is full, mean 20% false negative rate
Evaluation Additional alerts on the outbound traffic generated when sensitivity was increased • Set the parameters for maximum sensitivity • Connection cache = 4MB, Cmin = -5, Dmiss = infinity
Williamson implementation • Williamson’s algorithm uses a small cache of previously-allowed destinations • For all SYNs and UDP packets, if the destination found in cache, forward. If not, record and forward. • But if the source has sent to a new destination, add the packet to a delay queue. • Packet in delay queue forward by one destination per second, block will be trigger if the delay queue is too long
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Cooperation • Divide enterprise into small cells • Connect all cells via low-latency channel • A cell’s detector notifies others when it blocks an address (“kill message”) • Blocking threshold dynamically adapts to number of blocks in enterprise: • T’ = T(1 – θ)X, for very small θ, where θ controls how aggressively to reduce T and X is the number of other blocks in place • Changing θ does not change epidemic threshold, but reduces infection density
Cooperation • Questions: • Should a complete shutdown be possible? • How to connect cells (practically)?
Outline • Worm containment • Scan suppression • Hardware implementations • Approximate TRW • Cooperation • Attacking worm containment
Attacking worm containment • False positives • Unidirectional control flow • Forge packets (though this does not prevent inside systems from initiating connections) • False negatives • Use a non-scanning technique (topological, meta-server, passive and hit-list) • Scan under detection threshold • Use a white-listed port to test for liveness before scanning
Attacking Cooperation • Attempt to outrace containment if initial threshold is highly permissive • Flood cooperation channels • Cooperative collapse • False positives high enough to cause lowered thresholds • Lowered thresholds cause more false positives which further reduce the threshold • Feedback causes collapse of network
Attacking Worm Containment • Detecting the presence of containment • Try to contact already infected hosts • Go stealthy if containment is detected • Circumventing containment • Embed scan in storm of spoofed packets (cause trash in address cache, and pollute the connection cache with many half-open connections) • Two-sided evasion: • Inside and outside host initiate normal connections to counter penalty of scanning • Can modify algorithm to prevent by excluding port informantion, but lose vertical scan detection
Conclusion • Develop containment algorithms suitable for deployment in high-speed, low-cost network hardware • Able to detect scanning for fewer than 10 attempts for a highly sensitive machine and for a normal machine in 30 attempts • Devise the mechanisms for cooperation that enable multiple containment devices to more effectively detect and respond to an emerging infection.
Additional References [1] Mark Shaneck. Worms: Taxonomy and Detection. [2] Weaver, Paxson, Staniford, Cunningham. A Taxonomy of Computer Worms, ACM Workshop on Rapid Malcode, 2003. [3] Jaeyeon Jung, Vern Paxson, Arthur W. Berger,Hari Balakrishnan, Fast Portscan Detection Using Sequential Hypothesis Testing.