Modeling Time Correlation in Passive Network Loss Tomography Jin Cao (Alcatel-Lucent, Bell Labs), Aiyou Chen (Google Inc), Patrick P. C. Lee (CUHK) June 2011
Outline • Motivation • Loss model • Include correlation • Profile likelihood inference • Basic approach • Extensions • Simulation results
Motivation • Monitoring a network’s health is critical for reliability guarantees • to identify bottlenecks/failures of network elements • to plan resource provisioning • It’s challenging to monitor a large-scale network • Collecting statistics can incur huge overhead • Network loss tomography • compute statistical estimates of internal losses through end-to-end external measurements
Loss Tomography Overview • Active probing • Consider a tree setting • Send unicast probes to different receivers (leaves) • Collect statistics at the receivers • Assume probes may be lost at links • Our goal: infer the loss rate of the common link (root-to-middle-node link) • Key idea: time correlation of packet losses • neighboring packets likely experience similar loss behavior on the common link
Passive Loss Tomography • Drawbacks of active probing: • introduces probing overhead • requires collaboration of both senders and receivers • Passive loss tomography: • Monitor underlying traffic • E.g., use TCP data and ACKs to infer losses • Challenges: • Limited control; time correlation varies widely • Can we model time correlation?
Prior Work on Loss Tomography • Multicast loss inference [Cáceres et al. ’99, Ziotopolous et al. ’01, Arya et al. ’03] • Send multicast probes • Drawback: requires multicast to be enabled • Unicast loss inference [Coates & Nowak ’00, Harfoush et al. ’00, Duffield et al. ’06] • Send unicast probes to different receivers • Drawback: introduces probing overhead • Passive loss tomography [Tsang et al. ’01, Brosh et al. ’05, Padmanabhan et al. ’03] • Use existing traffic for inference • Drawback: no explicit model of time correlation
Our Objective • Propose passive loss tomography with explicit modeling of time correlation of packet losses
Our Contributions • Formulate a loss model as a function of time correlation • Show our loss model is identifiable • Develop a profile-likelihood method for simple and accurate inference • Extend our method for complex topologies • Validate via model and network simulations with R and ns-2
Where to Apply Our Work? • An extension for a TCP loss inference platform • use packet retransmissions to infer losses • Identify packet pairs: neighboring packets sent to different leaf branches • (figure: TCP packets/ACKs over a K-leaf tree → determine loss samples & packet pairs on the common link → our inference approach → infer loss rate of common link) • Note: our work is not on how to sample, but uses existing samples to accurately compute loss rates
Loss Modeling • Main idea: use packet pairs to capture loss correlation • Issues to address: • How to integrate correlation into loss model? • Is the model identifiable? • What is the inference error if we wrongly assume perfect correlation?
Loss Model • Define: • A packet pair (U, V) sent to different leaves • p, p1, p2 = link success rates • Zu, Zv = success events on the common link • ρ(Δ) = correlation(Zu, Zv) with time difference Δ • 0 ≤ ρ(Δ) ≤ 1 (by definition) • ρ(0) = 1 • ρ(Δ) is monotonically decreasing in Δ • Probability that both U and V are successfully delivered from root to their respective leaf nodes: • r11 = p p1 p2 (p + (1 – p) ρ(Δ)) • if ρ(Δ) = 1, r11 = p p1 p2 • if ρ(Δ) = 0, r11 = p² p1 p2
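To make the pair-success formula concrete, here is a minimal R sketch (with hypothetical values for p, p1, p2) that evaluates r11 at the two extremes of ρ(Δ):

```r
# Pair-success probability from the loss model:
#   r11 = p * p1 * p2 * (p + (1 - p) * rho(Delta))
r11 <- function(p, p1, p2, rho) p * p1 * p2 * (p + (1 - p) * rho)

p <- 0.98; p1 <- 0.99; p2 <- 0.99   # hypothetical link success rates
r11(p, p1, p2, rho = 1)             # perfect correlation: p * p1 * p2
r11(p, p1, p2, rho = 0)             # no correlation:      p^2 * p1 * p2
```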
Modeling Time Correlation • Perfect correlation: ρ(Δ) = 1 • In practice, ρ(Δ) < 1 for Δ > 0 (i.e., correlation decays) • Assuming perfect correlation over-estimates r11 = p p1 p2 (p + (1 – p) ρ(Δ)) • Consider two specific approximations: • Linear form: ρ(Δ) = exp(-a Δ) (a is a decay constant) • Quadratic form: ρ(Δ) = exp(-a Δ²) • If Δ is small, these are good enough approximations to capture the time decay of correlation • Claim: better than simply assuming perfect correlation
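The two decay approximations are easy to compare; a small R sketch (the decay constant a = 3 is chosen arbitrarily for illustration):

```r
rho_linear    <- function(delta, a) exp(-a * delta)    # "linear form":    exponent linear in Delta
rho_quadratic <- function(delta, a) exp(-a * delta^2)  # "quadratic form": exponent quadratic in Delta

delta <- seq(0, 1, by = 0.01)
plot(delta,  rho_linear(delta, a = 3), type = "l", ylab = "rho(Delta)")
lines(delta, rho_quadratic(delta, a = 3), lty = 2)     # decays more slowly for small Delta
```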
Theorems • Theorem 1: Under the loss correlation model, the link success rates p, p1, p2 and the constant a are identifiable, given that ρ(0) = 1 • Theorem 2: If perfect correlation is wrongly assumed in a setting with imperfect correlation, then the estimate has an absolute asymptotic bias • See the proofs in the paper
Profile Likelihood Inference • Given the loss model, how do we estimate loss rates? • Inputs: • single-packet end-to-end measurements • packet-pair end-to-end measurements • Topology: • Two-level, K-leaf tree (common link with success rate p; leaf links with success rates p1, …, pK) • Profile likelihood (PL) inference: • Focus on parameters of interest (i.e., link loss rates to be inferred) • Replace nuisance unknowns with appropriate estimates
Profile Likelihood Inference • Step 1: apply end-to-end success rates • Let Pi = end-to-end success rate to leaf i, so Pi = p pi • Re-parameterize r11 (for every pair of leaves) as a function of p and the Pi’s: r11 = PU PV p⁻¹ (p + (1 – p) ρ(Δ)) • Solve for {p, P1, P2, …, PK, a} • But this is challenging with many variables to solve
Profile Likelihood Inference • Step 2: remove nuisance parameters • Based on profile likelihood [Murphy ’00], replace nuisance unknowns with appropriate estimates • Replace Pi with its maximum likelihood estimate P̂i = Mi / Ni • Ni = number of packets going to leaf i • Mi = total number of successes to leaf i • Only two variables remain to solve: p and a
Profile Likelihood Inference • Step 3: estimate p when ρ(.) is unknown • Approximate ρ(.) with either the linear or the quadratic form • To solve for p and a, optimize the log-likelihood function using the BFGS quasi-Newton method • See the paper for details
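A hedged R sketch of how Steps 1–3 can fit together: the per-leaf rates are plugged in as P̂i and only (p, a) are optimized with BFGS via optim(). The data layout (a pairs data frame with leaf indices, success indicators, and time gaps) and all names are assumptions for illustration, not the paper's code.

```r
# Sketch of the profile-likelihood step (hypothetical data layout):
#   pairs : data frame with columns i, j (leaf indices), xu, xv (0/1 success
#           indicators observed at the leaves), delta (time gap of the pair)
#   Phat  : plug-in estimates, Phat[i] = Mi / Ni (or M / N for est.equal)
neg_loglik <- function(theta, pairs, Phat,
                       rho = function(d, a) exp(-a * d)) {
  p <- plogis(theta[1])                 # common-link success rate, kept in (0, 1)
  a <- exp(theta[2])                    # decay constant, kept positive
  Pu <- Phat[pairs$i]; Pv <- Phat[pairs$j]
  r11 <- Pu * Pv / p * (p + (1 - p) * rho(pairs$delta, a))
  r10 <- Pu - r11; r01 <- Pv - r11; r00 <- 1 - Pu - Pv + r11
  pr  <- ifelse(pairs$xu == 1 & pairs$xv == 1, r11,
         ifelse(pairs$xu == 1, r10,
         ifelse(pairs$xv == 1, r01, r00)))
  -sum(log(pmax(pr, 1e-12)))            # guard against log(0)
}

# fit   <- optim(c(qlogis(0.98), log(1)), neg_loglik,
#                pairs = pairs, Phat = Phat, method = "BFGS")
# p_hat <- plogis(fit$par[1])           # estimated common-link success rate
```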
Extension: Remove Skewness • If some leaf receives only a few packets (i.e., Mi, Ni are small), the estimate of Pi will be inaccurate • Especially when there are many leaf branches • Heuristic: let Pi be the same for all i • Intuition: remove skewness of traffic loads among leaves by taking an aggregate average • Let: • N = total number of packets to all leaves • M = total number of successes to all leaves • Take the approximation: P̂i = M / N for all i
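A tiny R sketch of the difference between the per-leaf and pooled plug-ins, with made-up counts (note the two small-sample leaves):

```r
Ni <- c(500, 480, 20, 510, 15)     # hypothetical packets sent to each leaf
Mi <- c(470, 455, 19, 480, 13)     # hypothetical successes at each leaf
Phat_self  <- Mi / Ni              # est.self: per-leaf estimates (noisy when Ni is small)
Phat_equal <- sum(Mi) / sum(Ni)    # est.equal: one pooled estimate shared by all leaves
```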
Extension: Large-Scale Topology • If the tree has many levels, decompose it into many two-level problems • Estimate loss rates f0 and f1 • Combine them as f = max(0, (f1 – f0) / (1 – f0))
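Reading f0 and f1 as the estimated loss rates of the shorter and longer nested paths from the two-level sub-problems (my interpretation of the slide), the combination step is just:

```r
# Loss rate of the extra link between two nested paths with loss rates f0 <= f1
# (assumed interpretation; clipped at 0 to avoid negative estimates).
link_loss <- function(f0, f1) max(0, (f1 - f0) / (1 - f0))
link_loss(f0 = 0.02, f1 = 0.05)    # hypothetical values -> about 0.031
```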
Network Simulations • We use model simulations to verify the correctness of our models under ideal settings (see details in the paper) • Network simulations with ns-2 (TCP/UDP flows over a two-level, K-leaf tree): • Traffic models: • Short-lived TCP sessions • Background UDP on-off flows • Loss models: • Links follow an exponential ON-OFF loss model • Queue overflow due to UDP bursts • Both loss models are justified in practice and exhibit loss correlation
Network Simulations • Three estimation methods: • est.equal: take the aggregate average of end-to-end success rates (P̂i = M / N for all i) • est.self: take individual end-to-end success rates (P̂i = Mi / Ni) • est.perfect: use est.self but assume perfect correlation
Experiment 1: ON-OFF Loss • Consider a two-level, K-leaf tree with exponential on-off loss • est.perfect is the worst among all methods • (plots: p = 2%, pi = 0 and p = 2%, pi = 2%)
Experiment 2: Skewed Traffic • Uneven traffic across K = 10 leaves • β: fraction of traffic going to leaves 1–5 • 1 – β: fraction of traffic going to leaves 6–10 • est.equal is robust to skewed traffic • (plots: p = 2%, pi = 0 and p = 2%, pi = 2%)
Experiment 3: Large Topology • Goal: verify whether two-level inference can be extended to a multi-level topology
Experiment 3: Large Topology • (figure: three-level topology with Levels 1–3; losses occur only in the links of interest)
Experiment 3: Large Topology • est.equal is the best among all methods • around 5%, 10%, 20% errors in levels 1, 2, 3 respectively
Conclusions • Provide first attempt to explicitly model time correlation in loss tomography • Propose profile likelihood inference • Remove nuisance parameters • Simplify loss inference without compromising accuracy • Conduct extensive model/network simulations • Assuming perfect correlation is not a good idea • est.equal is robust in general, even for skewed traffic loads and large topology