Maximum Likelihood Network Topology Identification

Mark Coates (McGill University), Robert Nowak and Rui Castro (Rice University)
DYNAMICS, May 5th, 2003

Network Tomography
• Inferring network topology from “external” end-to-end measurements.
• Traceroute requires cooperation from routers, which may not be available in practice.
• This paper assumes no cooperation from the internal network.
• It relies solely on host-based unicast measurements.

How does it work? The Problem Statement
[Figure: tree topology with a unique sender and the receiver set R]

How does it work? Information we have
• End-to-end measurements that capture the degree of correlation between receivers
• A metric γi,j is associated with each pair of receivers i, j ∈ R
Monotonicity property: let pi, pj, pk be the paths from the sender to receivers i, j, k. If pi shares more links with pj than with pk, then γi,j > γi,k.

An example
Here γ18,19 > γi,19 for all other i.
Examples?
Simple bottom-up merging algorithms can be used to identify the full logical topology.
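A bottom-up merging procedure of the kind mentioned above could be sketched as follows. This is a generic greedy agglomerative sketch, not necessarily the algorithm the authors have in mind: it always produces a binary tree, and the rule for the metric of a merged node (the average of its children's metrics) is an assumption made here for illustration. The function name and the example values are hypothetical.

```python
from itertools import combinations

def merge_bottom_up(receivers, gamma):
    """Greedy agglomerative sketch; returns a nested-tuple binary tree.

    gamma maps frozenset({a, b}) to a similarity metric
    (larger value = more shared path from the sender).
    """
    nodes = list(receivers)
    sim = dict(gamma)
    while len(nodes) > 1:
        # Pick the pair of current nodes with the largest metric.
        a, b = max(combinations(nodes, 2), key=lambda p: sim[frozenset(p)])
        parent = (a, b)
        nodes = [n for n in nodes if n not in (a, b)]
        # Metric between the new parent and each remaining node:
        # average of the two children's metrics (assumption of this sketch).
        for k in nodes:
            sim[frozenset((parent, k))] = 0.5 * (
                sim[frozenset((a, k))] + sim[frozenset((b, k))]
            )
        nodes.append(parent)
    return nodes[0]

# Hypothetical metrics: receivers 1 and 2 share the most, then 3 and 4.
example_gamma = {
    frozenset((1, 2)): 3.0, frozenset((3, 4)): 2.0,
    frozenset((1, 3)): 1.0, frozenset((1, 4)): 1.0,
    frozenset((2, 3)): 1.0, frozenset((2, 4)): 1.0,
}
example_tree = merge_bottom_up([1, 2, 3, 4], example_gamma)
```

With these values, receivers 1 and 2 are merged first, then 3 and 4, and the two groups are joined at the root.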

Two-fold Contribution
• Novel measurement scheme: Sandwich Probing
  • Each probe consists of three packets
  • Main idea: a small packet queues behind the large one, inducing extra separation between the small packets on shared links
• A stochastic search method for topology identification

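Sandwich Probing
[Figure: sandwich probe with packets p1 and p2 labeled]
With no cross-traffic, γ35 equals the queuing delay of p2 on link 01. In general, γij is the sum of the queuing delays on the links shared by the paths to receivers i and j.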

Sandwich Probing
γ34 = (queuing delay on link 01) + (queuing delay on link 12)
γ35 = (queuing delay on link 01)
More shared queues ⇒ larger γ
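The relation between shared links and the metric can be illustrated with a small sketch. The topology below is an assumption inferred from the two equations on this slide (link 01, link 12, receivers 3 and 4 below node 2, receiver 5 below node 1), and the per-link delay values are made up for illustration; the function names are hypothetical.

```python
def path_links(node, parent):
    """Return the set of (parent, child) links on the path from the root to node."""
    links = set()
    while node in parent:  # the root has no entry in `parent`
        links.add((parent[node], node))
        node = parent[node]
    return links

def shared_delay_metric(i, j, parent, delay):
    """Sum the per-link delays over the links shared by the paths to i and j."""
    shared = path_links(i, parent) & path_links(j, parent)
    return sum(delay[link] for link in shared)

# Hypothetical topology: 0 -> 1, 1 -> {2, 5}, 2 -> {3, 4}.
parent = {1: 0, 2: 1, 5: 1, 3: 2, 4: 2}
# Hypothetical queuing delays per link, in arbitrary units.
delay = {(0, 1): 2.0, (1, 2): 1.5, (1, 5): 0.5, (2, 3): 0.25, (2, 4): 0.75}

gamma_34 = shared_delay_metric(3, 4, parent, delay)  # links 01 and 12 are shared
gamma_35 = shared_delay_metric(3, 5, parent, delay)  # only link 01 is shared
```

Receivers 3 and 4 share two links and receivers 3 and 5 share one, so the first metric comes out larger, in line with the monotonicity property.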

Advantages over loss- and delay-based metrics
• Probe loss is rare on the Internet, so loss-based metrics require a large number of measurements.
• Measuring delay requires clock synchronization.
• Here, each measurement contributes.

Measurement framework
• Each measurement of γij is contaminated by cross traffic.
• Cross traffic has a zero-mean effect on the measurements.
• Multiple measurements are combined, and the Central Limit Theorem (CLT) applies.
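The role of the CLT on this slide can be made concrete with a short derivation. This is a sketch under stated assumptions: the symbols x_ij^(k) for the k-th noisy measurement, ε_ij^(k) for the cross-traffic perturbation, n for the number of measurements, and σ_ij² for the noise variance are introduced here and are not notation from the slides. The measurements are assumed independent with finite variance.

```latex
x_{ij}^{(k)} = \gamma_{ij} + \varepsilon_{ij}^{(k)}, \qquad
\mathbb{E}\big[\varepsilon_{ij}^{(k)}\big] = 0,
\qquad
\bar{x}_{ij} = \frac{1}{n}\sum_{k=1}^{n} x_{ij}^{(k)}
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(\gamma_{ij},\, \frac{\sigma_{ij}^{2}}{n}\right)
```

Under these assumptions the averaged measurement is centered on the true metric, and its spread shrinks as more probes are sent.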

Likelihood Formulation
• The estimated metrics are randomly distributed according to a density p.
• p is parameterized by the underlying topology T and the set of true metric values γ.
• When p is viewed as a function of T and γ, it is called the likelihood of T and γ.

Likelihood Formulation
• The Maximum Likelihood Tree is the tree T in F that, together with metrics γ in G, maximizes the likelihood.
• F denotes the forest of all possible trees.
• G denotes the set of all metrics satisfying the monotonicity property.
• The maximization involved is formidable. Brute-force method: for N = 10, there are more than 1.8 × 10^6 trees.
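In symbols, the definition on this slide can be written roughly as follows. The symbol x for the vector of estimated metrics and the hats on the maximizers are assumptions introduced here; F and G are the forest and the metric set named on the slide.

```latex
(\hat{T}, \hat{\gamma}) \;=\; \arg\max_{T \in \mathcal{F},\; \gamma \in \mathcal{G}} \; p(x \mid T, \gamma)
```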

Simplifying the problem
• For a given tree T, the parameters γ are chosen to maximize the likelihood.
• This gives the very best fit that T can provide to the data.
• The resulting value is the log-likelihood of T.
The Maximum Likelihood Tree is the tree in the forest with the largest likelihood value.
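The simplification described here amounts to maximizing over the metrics first and then comparing trees. A sketch in symbols, where x (the estimated metrics), the hats, and the name L(T) are assumptions introduced for illustration:

```latex
\hat{\gamma}_T = \arg\max_{\gamma \in \mathcal{G}} \log p(x \mid T, \gamma), \qquad
L(T) = \log p(x \mid T, \hat{\gamma}_T), \qquad
\hat{T} = \arg\max_{T \in \mathcal{F}} L(T)
```

This leaves a search over trees only, which is the part that the stochastic search addresses.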

Stochastic Search
• Reversible Markov Chain Monte Carlo (MCMC) method
• Using the above techniques, the authors devise a rapid search method to find optimal trees.
• “Learning using Bayesian statistics”
• Prior and posterior distributions
Main idea: the posterior distribution identifies the region of high-likelihood trees in F.
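The general idea of an MCMC search that spends most of its time in high-likelihood regions can be sketched with a plain Metropolis sampler. This is not the authors' reversible method over trees: the state space below is a toy ring of 21 integers, the score function is made up, and all names are hypothetical. It only illustrates the accept/reject step and the tracking of the best state visited.

```python
import math
import random

def metropolis_search(initial, propose, log_score, steps, rng):
    """Generic Metropolis sampler with a symmetric proposal; tracks the best state seen."""
    current, current_ll = initial, log_score(initial)
    best, best_ll = current, current_ll
    for _ in range(steps):
        candidate = propose(current, rng)
        cand_ll = log_score(candidate)
        # Accept with probability min(1, exp(cand_ll - current_ll)).
        if cand_ll >= current_ll or rng.random() < math.exp(cand_ll - current_ll):
            current, current_ll = candidate, cand_ll
            if current_ll > best_ll:
                best, best_ll = current, current_ll
    return best, best_ll

def propose(state, rng):
    """Symmetric proposal: step to a neighbor on a ring of 21 states (toy assumption)."""
    return (state + rng.choice((-1, 1))) % 21

def log_score(state):
    """Toy log-score peaked at state 13 (stands in for a tree's log-likelihood)."""
    return -((state - 13) ** 2) / 2.0

best, best_ll = metropolis_search(0, propose, log_score, steps=2000, rng=random.Random(0))
```

In a topology search, the state would be a tree, the proposal would be a move that alters the tree, and the score would be the (penalized) log-likelihood.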

Birth Move (insert node)
[Figure: trees T1 and T2]

Death Move (delete node)
[Figure: trees T2 and T1]
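The two moves can be illustrated on a tree stored as a mapping from each internal node to its list of children. This is a minimal sketch of what "insert node" and "delete node" could mean; the representation, function names, and node labels are assumptions, and the rules the authors use for choosing where to apply a move are not shown.

```python
def birth_move(children, parent, grouped, new_node):
    """Insert `new_node` below `parent`, making it the parent of the nodes in `grouped`."""
    tree = {k: list(v) for k, v in children.items()}  # copy, leave the input untouched
    tree[parent] = [c for c in tree[parent] if c not in grouped] + [new_node]
    tree[new_node] = list(grouped)
    return tree

def death_move(children, node):
    """Delete the internal, non-root `node`, re-attaching its children to its parent."""
    tree = {k: list(v) for k, v in children.items()}
    parent = next(p for p, cs in tree.items() if node in cs)
    orphans = tree.pop(node)
    tree[parent] = [c for c in tree[parent] if c != node] + orphans
    return tree

# Hypothetical example: root 0 with three receivers as children.
t1 = {0: [1, 2, 3]}
t2 = birth_move(t1, parent=0, grouped=[1, 2], new_node=4)  # node 4 now groups 1 and 2
t1_again = death_move(t2, node=4)                          # removing node 4 undoes the birth
```

Applying the death move to the node created by the birth move returns the original tree, which is the pairing that lets a search step be reversed.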

ns-2 Simulations
[Figure: simulated topology with a source and nodes labeled 1 to 9]

Simulation results
[Figure: % correct versus number of probes (4000 to 8000) for MPLT and DBT]

MCMC Algorithm
[Figure: true topology versus MCMC topology]
• Can Layer 2 branching points
• High-speed connections can fool tomography

Summary
• Delay-based measurement, with no need for clock synchronization
• MCMC algorithm to explore the forest and identify the maximum (penalized) likelihood tree
