170 likes | 280 Views
Modeling the Spread of Worms. Wade Trappe. Overview. Quick discussion of how the Internet is organized. Random Constant Spread (RCS) Model and Code-Red I The Differential Equation Solving it! Observations Improvements in worm design Scanning Strategies. Internet Overview, pg.1.
E N D
Modeling the Spread of Worms Wade Trappe
Overview • Quick discussion of how the Internet is organized. • Random Constant Spread (RCS) Model and Code-Red I • The Differential Equation • Solving it! • Observations • Improvements in worm design • Scanning Strategies
Internet Overview, pg.1 • The Internet started as a research project connecting 4 computers in 1969, and has grown to connect over 100 million machines. • The Internet is: • A loose collection of networks organized into a hierarchy through interconnection technologies. • At the local level machines are connected to each other (local area network), and to a router. • A router is a special-purpose device that transfers data to and from the next layer of the hierarchy. • Loose collection of networks organized into a multilevel hierarchy • 10-100 machines connected to a hub or a router • service providers also provide direct dialup access • or over a wireless link • 10s of routers on a department backbone • 10s of department backbones connected to campus backbone • 10s of campus backbones connected to regional service providers • 100s of regional service providers connected by national backbone • 10s of national backbones connected by international trunks
Internet Overview, pg. 2 • Question: So, I want to send an email, how does it happen? • Answer: We use Addresses, and Route between Addresses using the Internet Protocol (IP). • Your data is sent via packets, and the Internet employs a store-and-forward strategy when delivering them between nodes. • Packets consist of: Meta-data (header) and the data (payload) • Metadata allows us to forward packets when we want • E.g. letters at a post office headed for main post office • address labels allow us to forward them in batches
135.105.53 100 Internet Overview, pg. 3 • Internet addresses are called IP addresses • Refer to a host interface (device connecting the computer to the network): need one IP address per interface • Addresses are structured as a two-part hierarchy • network number • host number • Question: How many bits to assign to host number and how many to network number? • If many networks, each with a few hosts, then more bits to network number • And vice versa • In the end, IP addresses consist of three sets of partitions of bits • class A: 8 bits network, 24 bits host • class B: 16 bits each • class C: 24 bits network, 8 bits host • Routing uses these addresses to deliver from a source to a destination.
Internet Overview, pg. 4 • An example of a message route; # traceroute henna.iitd.ernet.in traceroute to henna.iitd.ernet.in (202.141.64.30), 30 hops max, 40 byte packets 1 UPSON2-NP.CIT.CORNELL.EDU (128.84.154.1) 1 ms 1 ms 1 ms 2 HOL1-MSS.CIT.CORNELL.EDU (132.236.230.189) 2 ms 3 ms 2 ms 3 CORE1-MSS.CIT.CORNELL.EDU (128.253.222.1) 2 ms 2 ms 2 ms 4 CORNELLNET1.CIT.CORNELL.EDU (132.236.100.10) 4 ms 3 ms 4 ms 5 ny-ith-1-H1/0-T3.nysernet.net (169.130.61.9) 5 ms 5 ms 4 ms 6 ny-ith-2-F0/0.nysernet.net (169.130.60.2) 4 ms 4 ms 3 ms 7 ny-pen-1-H3/0-T3.nysernet.net (169.130.1.121) 21 ms 19 ms 16 ms 8 sl-pen-21-F6/0/0.sprintlink.net (144.228.60.21) 16 ms 40 ms 36 ms 9 core4-hssi5-0.WestOrange.mci.net (206.157.77.105) 20 ms 20 ms 24 ms 10 core2.WestOrange.mci.net (204.70.4.185) 21 ms 34 ms 26 ms 11 border7-fddi-0.WestOrange.mci.net (204.70.64.51) 21 ms 21 ms 21 ms 12 vsnl-poone-512k.WestOrange.mci.net (204.70.71.90) 623 ms 639 ms 621 ms 13 202.54.13.170 (202.54.13.170) 628 ms 629 ms 628 ms 14 144.16.60.2 (144.16.60.2) 1375 ms 1349 ms 1343 ms 15 henna.iitd.ernet.in (202.141.64.30) 1380 ms 1405 ms 1368 ms
Now Back to Worms… • Someone who controls many nodes on the Internet can cause serious damage to the Internet. • It is reasonable to gain control of millions of Internet hosts through worms. • Worms differ from viruses in that worms do not require human intervention to propagate. Viruses require user action (aka. Clicking that email attachment). • Pandurang gave the overview of Worms, along with its history in the previous lecture. • We will start with Code Red
Code Red • The Code Red Worm was initially released in July 2001. • The worm spread by compromising Microsoft web servers using a vulnerability that had been discovered just a few weeks earlier. • Once a host was infected, Code Red would spread itself by launching 99 threads, that each generated a random IP address and tried to infect that address using the same vulnerability. • Initial version of Code Red, CRv1, had a bug in the random number generator. • Second version of Code Red, CRv2, the bug was fixed. CRv2 contained a piece of code to perform a distributed denial of service attack on www.whitehouse.gov.
Random Constant Spread, pg. 1 • Code Red spread very rapidly at first, until almost all vulnerable machines were compromised, then it seemed to slow down its spread. • The Random Constant Spread (RCS) is one model to describe this phenomenon. • Let N= total # of vulnerable servers which can be corrupted/infected (assume its constant with time) • Let K= initial compromise rate • i.e. the number of vulnerable machines that an infected host can find and compromise at the start (when few other hosts have been compromised). • K is some universal constant for a particular worm. • Assume that a compromised machine picks other machines at random, and that once a machine is infected it cannot be compromised again. • Let T be point when half the machines are infected. • Variables: • a: the proportion of vulnerable machines that have been infected (e.g. a=1 means all N have been infected). The variable a will change with time t. • t: time in hours
Random Constant Spread, pg. 2 • RCS is based upon the idea of logistic growth: • The actual growth rate at a time t depends on the population • Suppose a(t) is the proportion of the N machines infected at time t, then there are a total of Na(t) machines that have been infected. • If we go from time t to time (t+dt), then a(t) will become a(t+dt)=a(t) + da. • da represents the change in the proportion a, and is an infinitesimal quantity (i.e. everything is in the limit). • So Nda represents the total number of additional machines that will be infected in dt more time. • That’s one way to calculate the number of additional machines that can be infected in dt time, we need one more way.
Random Constant Spread, pg. 3 • Key Idea: Suppose I have 100 machines and I can infect K of those machines in one hour. Now, instead, suppose I have 80 machines, then how many can I infect in one hour? • Answer: 0.8 K • Now, suppose Na machines have been infected, then that leaves (1-a)N machines left. • Question: When I had N infectible-machines I could infect K machines. So, now I have (1-a)N infectible machines, how many can one machine infect? • Answer: (1-a)K • Next Issue: I can infect (1-a)K machines in 1 hour, but what about in dt time? Answer: (1-a)Kdt. • Final Issue: At time t I have a(t)N machines that can do the infecting, so how many will be infected in time dt? • Simple, but not completely accurate answer: (Na)K(1-a)dt
Random Constant Spread, pg. 4 • Lets put the two sides together: Nda= (Na)K(1-a) dt • So, how do we solve this? Answer: Its an easy first order diffeq. • One way:
Random Constant Spread, pg. 5 • Observations: • For small t (before the first infection) there is no growth, but once the infection happens, growth happens exponentially. • However, once significantly past T, growth slows again because we are running out of machines to infect. • See plot for an example. • These observations were confirmed in the real worm data. • Several hours before Code Red was due to terminate itself, it had slowed down due to the fact it had found the majority of infectible machines.
Random Constant Spread, pg. 6 • What was wrong with the RCS model? • Basically, problem lies to the simplification of the probability involved. • The assumption that if aN machines are infected then (aN)(1-a)K machines will be infected in next hour is wrong. • Randomly choosing an address might mean that you actually try to reinfect an already infected machine. • Or, by randomly choosing an address, two infected machines might try to infect the same machine. • Overall, the value of RCS is not its rigor, but the fact it reveals underlying principles and dynamics.
Better Worm Strategies • Localized Scanning: • It takes more time to infect a node further away than one nearby. • Localized scanning seeks to balance the amount of attempts a worm takes in infecting a nearby machine versus choosing a random machine on the Internet. • Strategy employed in Code Red II. • Hit List Scanning: • We saw that worms take a while to get started, but once started they grow exponentially. How do we speed up the start? • Idea: • Give the initial worm a list of “high-potential targets”. • Once it infects a machine on the hit-list, it splits the hit-list in half and gives half to child worm to use. • Child worms continue replicating and splitting hit list. • Advantages: hit-list shrinks quickly, initial spread is very quick.
Better Worm Strategies, pg. 2 • Permutation Scanning: • One limitation of random scanning is that different nodes may try to infect the same machine, or infect an already infected machine. • Idea: • Each worm gets a starting point of permutation space to work with. • Permutation Space is mapped to IP Address Space via a 32 bit cipher (with fixed key). • The worm goes along attempting to infect each machine in its region of permutation space. If it ever encounters a machine that has been infected, it knows that its permutation space will start overlapping another worm’s permutation space, so it chooses a new, random place to start from in permutation space. • Result: Worms end up trying to work on separate sections of permutation space. • Improvements: Enforced partitions of permutation space.