SAMPLING STRATEGIES FOR EPIDEMIC-STYLE INFORMATION DISSEMINATION

Milan Vojnovic, Varun Gupta, Thomas Karagiannis and Christos Gkantsidis

INTRODUCTION • Goal: reach a target fraction of the hosts • Nodes are discovered by random probing • Optimal static and dynamic random-probing strategies • Hosts are distributed non-uniformly over subnets • Hosts are assumed to be partitioned into groups, or subnets


INTRODUCTION • Summary of results: • Identify the optimal static and dynamic strategies • The optimal static strategy is unique • There are multiple optimal dynamic strategies • Simple sampling strategies that use only local knowledge, K-FAIL and K-CANDSET, outperform global random scanning and local subnet preference strategies

INTRODUCTION • Related Work • Speed of propagation of the information to the hosts • Time required to reach the target fraction of the hosts

STATIC SUBNET PREFERENTIAL SAMPLING • We consider the class of sampling strategies whose subnet sampling probabilities are fixed in time • Uniform global random sampling strategy (UNI(Ω)) • Under UNI(Ω), the total fraction of susceptible hosts s(u) and the total number of samplings per host u are related by • $s(u) = s(0)\,e^{-\beta u}, \quad u \ge 0$
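The following Python sketch simulates UNI(Ω) directly as a numerical check of this relation. It relies on assumptions the slide does not state. Hosts sit on distinct addresses drawn uniformly from an address space of size |Ω| (`address_space`). The quantity u is the total number of samplings divided by the number of hosts N. β is taken to be the host density N/|Ω|. Under these choices a susceptible host survives one uniform sampling with probability 1 − 1/|Ω|, so after Nu samplings it survives with probability close to e^{−βu}. The function names are hypothetical.

```python
import math
import random


def simulate_uniform_sampling(address_space, num_hosts, initially_infected,
                              num_samples, seed=0):
    """Return the fraction of hosts still susceptible after num_samples
    uniform random samplings of the whole address space (UNI(Omega))."""
    rng = random.Random(seed)
    # Place hosts on distinct addresses chosen uniformly at random.
    hosts = rng.sample(range(address_space), num_hosts)
    # The first `initially_infected` hosts start infected; the rest are susceptible.
    susceptible = set(hosts[initially_infected:])
    for _ in range(num_samples):
        # Under uniform global sampling it does not matter which infected host
        # issues the sampling, so only the target address is drawn.
        susceptible.discard(rng.randrange(address_space))
    return len(susceptible) / num_hosts


def predicted_susceptible_fraction(s0, beta, u):
    """Fluid approximation s(u) = s(0) * exp(-beta * u)."""
    return s0 * math.exp(-beta * u)


address_space, num_hosts, infected0, samples = 100_000, 10_000, 100, 200_000
sim = simulate_uniform_sampling(address_space, num_hosts, infected0, samples, seed=1)
pred = predicted_susceptible_fraction(
    s0=1 - infected0 / num_hosts,
    beta=num_hosts / address_space,  # assumed: beta = host density N/|Omega|
    u=samples / num_hosts,           # assumed: u = total samplings per host
)
print(f"simulated {sim:.3f}, predicted {pred:.3f}")
```

Every sampling is uniform over Ω, so the simulation does not need to track which infected host issues it. That identity affects timing but not the number of samplings.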

STATIC SUBNET PREFERENTIAL SAMPLING • Optimal static strategy (OPT-STATIC) • OPT-STATIC samples over a set A of subnets • A need not consist of the smallest number of initially densest subnets • Targeting larger subnets may slow dissemination at the start but speeds it up toward the end
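A small fluid-model calculation can illustrate the set A. This is a sketch under our own assumptions, not the paper's formulation. Subnet j has n_j addresses and S_j initially susceptible hosts. Directing x_j samplings uniformly into subnet j leaves about S_j e^{−x_j/n_j} of them susceptible. A static strategy with subnet probabilities p_j and X samplings in total gives x_j = p_j X. Minimizing X subject to at most T susceptible hosts remaining is therefore a convex problem. Its KKT conditions give a water-filling rule: x_j = n_j ln(λ S_j/n_j) on the densest subnets, and x_j = 0 elsewhere. The function `opt_static_allocation` is hypothetical.

```python
import math


def opt_static_allocation(sizes, susceptible, target_remaining):
    """Water-filling allocation of samplings over subnets (fluid model).

    sizes[j]: number of addresses in subnet j (n_j > 0)
    susceptible[j]: initially susceptible hosts in subnet j (S_j)
    target_remaining: allowed number of susceptible hosts at the end (T)
    Returns (sampling probabilities p_j, total number of samplings X).
    """
    m = len(sizes)
    if target_remaining >= sum(susceptible):
        return [0.0] * m, 0.0  # target already met, no sampling needed
    if target_remaining <= 0:
        raise ValueError("target_remaining must be positive")
    # Visit subnets from densest to sparsest (density = S_j / n_j).
    order = sorted(range(m), key=lambda j: susceptible[j] / sizes[j], reverse=True)
    active_size, inactive_susc = 0, sum(susceptible)
    for k, j in enumerate(order):
        active_size += sizes[j]
        inactive_susc -= susceptible[j]
        slack = target_remaining - inactive_susc
        if slack <= 0:
            continue  # unsampled subnets alone still exceed the target
        lam = active_size / slack  # Lagrange multiplier for active set order[:k+1]
        nxt = order[k + 1] if k + 1 < m else None
        # KKT: a subnet stays unsampled iff lam * density <= 1.
        if nxt is None or lam * susceptible[nxt] / sizes[nxt] <= 1:
            samples = [0.0] * m
            for i in order[:k + 1]:
                samples[i] = sizes[i] * math.log(lam * susceptible[i] / sizes[i])
            total = sum(samples)
            return [x / total for x in samples], total
    raise AssertionError("unreachable: the full active set always satisfies KKT")


# Two subnets: a small dense one and a large sparse one.
for T in (60, 10):
    p, X = opt_static_allocation([100, 900], [50, 50], T)
    print(T, [round(q, 3) for q in p], round(X, 1))
```

With T = 60 only the small, dense subnet is sampled. Tightening the target to T = 10 brings the large, sparse subnet into A as well. This matches the point that A need not be just the initially densest subnets.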

STATIC SUBNET PREFERENTIAL SAMPLING • Theorem: • For any target fraction of infected hosts, OPT-STATIC minimizes the total number of samplings over all static sampling strategies.

STATIC SUBNET PREFERENTIAL SAMPLING • The number of samplings required to reach the target fraction of hosts depends on: • the density of hosts over the address space • the initial fraction of infected hosts • the distribution of initially susceptible hosts over subnets • the distribution of subnet address sizes
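A closed form makes all four factors visible, under a fluid model that is our assumption and is not stated on these slides. Let subnet j have n_j addresses and S_j initially susceptible hosts. Let S = Σ_j S_j, let N be the total number of hosts, and let T = N s_T be the target number of remaining susceptible hosts. Assume the optimal set A contains every subnet, that is, λ S_j/n_j ≥ 1 for all j. Minimizing the total number of samplings X then gives:

```latex
\begin{aligned}
&\min_{x_j \ge 0} \sum_j x_j
  \quad\text{s.t.}\quad \sum_j S_j e^{-x_j/n_j} = T, \\
&x_j = n_j \ln\!\Big(\lambda \frac{S_j}{n_j}\Big),
  \qquad \lambda = \frac{|\Omega|}{T}, \\
&X = \sum_j x_j = |\Omega|\Big[\ln\frac{S}{T} - D(r\,\|\,q)\Big],
  \qquad r_j = \frac{n_j}{|\Omega|},\quad q_j = \frac{S_j}{S}, \\
&u = \frac{X}{N} = \frac{1}{\beta}\Big[\ln\frac{s(0)}{s_T} - D(r\,\|\,q)\Big],
  \qquad \beta = \frac{N}{|\Omega|}.
\end{aligned}
```

The four factors map onto the formula as follows:
- β = N/|Ω| is the host density.
- s(0) reflects the initial fraction of infected hosts.
- q is the distribution of initially susceptible hosts over subnets.
- r is the distribution of subnet sizes.

Uniform sampling over Ω needs |Ω| ln(S/T) samplings, and setting D(r‖q) = 0 recovers s(u) = s(0)e^{−βu}. Because D(r‖q) ≥ 0, exploiting the subnet structure never needs more samplings than uniform sampling in this model.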

DYNAMIC SAMPLING STRATEGIES • Optimal dynamic sampling strategy (OPT-DYNAMIC) • Extending the space of sampling strategies from static to dynamic does not reduce the minimum number of samplings • The objective is assumed to be minimizing the number of samplings • Adding infected hosts to the least dense subnets
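A fluid-model sketch (our assumption) suggests why a dynamic strategy need not reduce the number of samplings. In this model the final state depends only on how many samplings each subnet receives, not on their order. Any dynamic schedule that ends at the static optimum's per-subnet totals therefore uses the same number of samplings, which is also how several dynamic strategies can be optimal at once. The hypothetical greedy rule below sends each batch of `step` samplings to the subnet with the highest current density of susceptible hosts. It is compared with the static water-filling value |Ω|[ln(S/T) − D(r‖q)] on a two-subnet example.

```python
import math


def greedy_dynamic_samplings(sizes, susceptible, target_remaining, step=0.5):
    """Fluid simulation of a greedy dynamic rule: each batch of `step`
    samplings goes to the subnet whose current density S_j / n_j is highest.
    Returns the total number of samplings used to reach the target."""
    remaining = [float(s) for s in susceptible]
    total = 0.0
    while sum(remaining) > target_remaining:
        j = max(range(len(sizes)), key=lambda i: remaining[i] / sizes[i])
        # `step` uniform samplings into subnet j leave a fraction exp(-step / n_j).
        remaining[j] *= math.exp(-step / sizes[j])
        total += step
    return total


sizes, susceptible, T = [100, 900], [50, 50], 10
greedy = greedy_dynamic_samplings(sizes, susceptible, T)
# Static water-filling value for this example: |Omega| * (ln(S/T) - D(r || q)),
# with |Omega| = 1000, S = 100, r = (0.1, 0.9) and q = (0.5, 0.5).
kl = 0.1 * math.log(0.1 / 0.5) + 0.9 * math.log(0.9 / 0.5)
static = 1000 * (math.log(100 / T) - kl)
print(f"greedy dynamic: {greedy:.1f}, static optimum: {static:.1f}")
```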

SAMPLING STRATEGIES THAT USE ONLY LOCAL KNOWLEDGE • In a local sampling strategy, each host biases its sampling over subnets based only on the success or failure of its samplings • We consider sampling strategies that, at any time, keep state for only a constant number of subnets

SAMPLING STRATEGIES THAT USE ONLY LOCAL KNOWLEDGE • Local Subnet Preference • Each infected host in a subnet samples an address uniformly at random.
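The slide gives only the uniform sampling step. A common form of local subnet preference, as in some scanning worms, samples inside the host's own subnet with some probability and over the whole address space otherwise. The sketch below assumes that form. The parameter `p_local` and the function name are hypothetical, as is modeling a subnet as a contiguous (start, size) address range.

```python
import random


def local_preference_sample(own_subnet, address_space, p_local, rng):
    """Draw one target address under an assumed local-subnet-preference rule.

    own_subnet: (start, size) address range of the sampling host's subnet
    address_space: size of the whole address space Omega (addresses 0..|Omega|-1)
    p_local: hypothetical probability of sampling inside the own subnet
    """
    start, size = own_subnet
    if rng.random() < p_local:
        return start + rng.randrange(size)  # uniform within own subnet
    return rng.randrange(address_space)     # uniform over Omega


rng = random.Random(7)
targets = [local_preference_sample((4096, 256), 2**20, 0.75, rng) for _ in range(10_000)]
inside = sum(4096 <= a < 4096 + 256 for a in targets) / len(targets)
print(f"fraction of samples inside own subnet: {inside:.3f}")
```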

SAMPLING STRATEGIES THAT USE ONLY LOCAL KNOWLEDGE • K-FAIL Strategy • Each infected host starts with uniform random sampling • When the strategy fails on a candidate subnet • When a host becomes infected
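The K-FAIL bullets above are incomplete. The following is therefore only one reading of the strategy's name, and every transition rule in it is an assumption. A host samples uniformly over Ω until it is given a candidate subnet. It then samples that subnet uniformly, and returns to uniform sampling after k consecutive failed samplings. When a candidate is assigned (for example, on infection) is left to the caller. The class `KFailSampler` is hypothetical.

```python
import random


class KFailSampler:
    """Sketch of a k-consecutive-failure rule suggested by the name K-FAIL.

    Assumptions (not spelled out on the slide): a host samples uniformly over
    Omega until it is given a candidate subnet; it then samples that subnet
    uniformly and falls back to uniform global sampling after k consecutive
    failed samplings. Subnets are (start, size) address ranges.
    """

    def __init__(self, k, address_space, seed=0):
        self.k = k
        self.address_space = address_space
        self.rng = random.Random(seed)
        self.candidate = None  # None means uniform sampling over Omega
        self.failures = 0      # consecutive failures on the current candidate

    def set_candidate(self, subnet):
        # When a candidate is assigned is left to the caller (elided on the slide).
        self.candidate, self.failures = subnet, 0

    def next_target(self):
        if self.candidate is None:
            return self.rng.randrange(self.address_space)
        start, size = self.candidate
        return start + self.rng.randrange(size)

    def report(self, success):
        """Record the outcome of the last sampling (success = host infected)."""
        if self.candidate is None:
            return
        self.failures = 0 if success else self.failures + 1
        if self.failures >= self.k:
            self.candidate, self.failures = None, 0


s = KFailSampler(k=3, address_space=2**20)
s.set_candidate((8192, 256))
targets = [s.next_target() for _ in range(5)]  # all fall inside 8192..8447
for outcome in (False, True, False, False, False):
    s.report(outcome)
print(s.candidate)  # None: three consecutive failures dropped the candidate
```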

SAMPLING STRATEGIES THAT USE ONLY LOCAL KNOWLEDGE • K-CANDSET Strategy • The candidate sets of the initially infected hosts are set arbitrarily • Each infected host samples an address uniformly at random • A host that becomes infected inherits the candidate set of its instigator host
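Below is a minimal sketch of the K-CANDSET host state, under stated assumptions. Subnets are disjoint (start, size) address ranges. "Samples an address uniformly" is read as uniform over the union of the host's candidate subnets. The rule for updating the candidate set is not modeled, because the slide does not give it. The class `CandSetHost` is hypothetical, and its inheritance step copies the instigator's candidate set.

```python
import random


class CandSetHost:
    """Sketch of per-host K-CANDSET state: a list of candidate subnets.

    Assumptions: subnets are disjoint (start, size) address ranges, and
    sampling is uniform over the union of the candidate subnets. How the
    candidate set changes after failures is not modeled here.
    """

    def __init__(self, candidates, rng):
        self.candidates = list(candidates)  # arbitrary for initially infected hosts
        self.rng = rng

    def next_target(self):
        # Pick a subnet with probability proportional to its size, then an
        # address uniformly inside it: overall uniform over the union.
        sizes = [size for _, size in self.candidates]
        start, size = self.rng.choices(self.candidates, weights=sizes, k=1)[0]
        return start + self.rng.randrange(size)

    def infect(self, rng):
        """A newly infected host inherits a copy of the instigator's candidate set."""
        return CandSetHost(self.candidates, rng)


rng = random.Random(11)
seed_host = CandSetHost([(0, 256), (65536, 1024)], rng)  # arbitrary initial set
child = seed_host.infect(rng)
print(child.candidates == seed_host.candidates, child.candidates is seed_host.candidates)  # True False
```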

EXPERIMENTAL RESULTS • Data sets • WU: the data set refers to IIS logs collected at the Windows Update system • Hotmail: the data set consists of approximately 103 million IP addresses • DShield: the data set consists of roughly 7.6 million IP addresses • Witty: a list of roughly 55,000 IP addresses corresponding to hosts spreading the Witty worm

EXPERIMENTAL RESULTS • Evaluation of the optimal sampling strategy • The number of samplings required by the optimal sampling strategy depends on: • a logarithmic term • a KL divergence term • If the KL divergence term is negligible relative to the logarithmic term, uniform random sampling over the address space is near optimal.
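The two terms can be computed for a given subnet structure. This sketch uses our own fluid-model expression and assumes the optimal sampling set contains every subnet. Under that expression the optimal static strategy needs about |Ω|[ln(S/T) − D(r‖q)] samplings. Here ln(S/T) is the logarithmic term, which is all that uniform sampling would need. D(r‖q) is the KL divergence between the subnet-size distribution r and the distribution q of initially susceptible hosts. The function name is hypothetical.

```python
import math


def sampling_terms(sizes, susceptible, target_remaining):
    """Return (log_term, kl_term) for the fluid-model expression
    X = |Omega| * (log_term - kl_term), valid when every subnet is sampled.

    log_term = ln(S / T): what uniform random sampling over Omega needs
    kl_term = D(r || q): savings from exploiting non-uniformity,
    with r_j = n_j / |Omega| and q_j = S_j / S.
    """
    omega, S, T = sum(sizes), sum(susceptible), target_remaining
    lam = omega / T
    # Validity check for the closed form: lam * S_j / n_j >= 1 for every subnet.
    if any(lam * s / n < 1 for n, s in zip(sizes, susceptible)):
        raise ValueError("closed form needs every subnet in the optimal set A")
    log_term = math.log(S / T)
    kl_term = sum((n / omega) * math.log((n / omega) / (s / S))
                  for n, s in zip(sizes, susceptible))
    return log_term, kl_term


log_term, kl_term = sampling_terms([100, 900], [50, 50], 10)
print(f"log term {log_term:.3f}, KL term {kl_term:.3f}, ratio {kl_term / log_term:.2f}")
```

In this two-subnet toy example the KL term is roughly 16% of the logarithmic term. Exploiting the subnet structure therefore saves about that fraction of samplings relative to uniform sampling in this model.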

Prior Distribution of DShield

Prior Distribution of WU

CONCLUSION • Leveraging the distribution of hosts over subnets • Static and dynamic sampling strategies • The analysis quantifies the number of samplings required to reach the target fraction of hosts • Future work: analyzing the time required to reach the target fraction of hosts

QUESTIONS
