Modeling Time Correlation in Passive Network Loss Tomography Jin Cao (Alcatel-Lucent, Bell Labs), Aiyou Chen (Google Inc), Patrick P. C. Lee (CUHK) June 2011
Outline • Motivation • Loss model • Include correlation • Profile likelihood inference • Basic approach • Extensions • Simulation results
Motivation • Monitoring a network’s health is critical for reliability guarantees • to identify bottlenecks/failures of network elements • to plan resource provisioning • It’s challenging to monitor a large-scale network • Collecting statistics can incur huge overhead • Network loss tomography • compute statistical estimates of internal losses through end-to-end external measurements
Loss Tomography Overview • Active probing • Consider a tree setting • Send unicast probes to different receivers (leaves) • Collect statistics at the receivers • Assume probes may be lost at links • Our goal: infer the loss rate of the common link (root-to-middle-node link) • Key idea: time correlation of packet losses • neighboring packets likely experience similar loss behavior on the common link
Passive Loss Tomography • Drawbacks of active probing: • introduces probing overhead • requires collaboration of both senders and receivers • Passive loss tomography: • Monitor underlying traffic • E.g., use TCP data and ACKs to infer losses • Challenges: • Limited control; time correlation varies widely • Can we model time correlation?
Prior Work on Loss Tomography • Multicast loss inference [Cáceres et al. ’99, Ziotopolous et al. ’01, Arya et al. ’03] • Send multicast probes • Drawback: requires multicast to be enabled • Unicast loss inference [Coates & Nowak ’00, Harfoush et al. ’00, Duffield et al. ’06] • Send unicast probes to different receivers • Drawback: introduces probing overhead • Passive loss tomography [Tsang et al. ’01, Brosh et al. ’05, Padmanabhan et al. ’03] • Use existing traffic for inference • Drawback: no explicit model of time correlation
Our Objective • Propose passive loss tomography with explicit modeling of time correlation of packet losses
Our Contributions • Formulate a loss model as a function of time correlation • Show our loss model is identifiable • Develop a profile-likelihood method for simple and accurate inference • Extend our method for complex topologies • Validate via model and network simulations with R and ns-2
Where to Apply Our Work? • An extension for a TCP loss inference platform • use packet retransmissions to infer losses • Identify packet pairs: neighboring packets sent to different leaf branches • (figure: TCP packets/ACKs over a K-leaf tree → determine loss samples & packet pairs on the common link → our inference approach → infer loss rate of common link) • Note: our work is not on how to sample, but uses existing samples to accurately compute loss rates
Loss Modeling • Main idea: use packet pairs to capture loss correlation • Issues to address: • How to integrate correlation into loss model? • Is the model identifiable? • What is the inference error if we wrongly assume perfect correlation?
Loss Model • Define: • A packet pair (U, V) sent to different leaves • p, p1, p2 = link success rates • Zu, Zv = success events on the common link • ρ(Δ) = correlation(Zu, Zv) with time difference Δ • 0 ≤ ρ(Δ) ≤ 1 (by definition) • ρ(0) = 1 • ρ(Δ) is monotonically decreasing in Δ • Probability that both U and V are successfully delivered from root to their respective leaf nodes: • r11 = p p1 p2 (p + (1 – p) ρ(Δ)) • if ρ(Δ) = 1, r11 = p p1 p2 • if ρ(Δ) = 0, r11 = p² p1 p2
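To make the pair-success formula concrete, here is a minimal R sketch (with hypothetical values for p, p1, p2) that evaluates r11 at the two extremes of ρ(Δ):

```r
# Pair-success probability from the loss model:
#   r11 = p * p1 * p2 * (p + (1 - p) * rho(Delta))
r11 <- function(p, p1, p2, rho) p * p1 * p2 * (p + (1 - p) * rho)

p <- 0.98; p1 <- 0.99; p2 <- 0.99   # hypothetical link success rates
r11(p, p1, p2, rho = 1)             # perfect correlation: p * p1 * p2
r11(p, p1, p2, rho = 0)             # no correlation:      p^2 * p1 * p2
```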
Modeling Time Correlation • Perfect correlation: ρ(Δ) = 1 • In practice, ρ(Δ) < 1 for Δ > 0 (i.e., correlation decays) • Assuming perfect correlation over-estimates r11 = p p1 p2 (p + (1 – p) ρ(Δ)) • Consider two specific approximations: • Linear form: ρ(Δ) = exp(-a Δ) (a is a decay constant) • Quadratic form: ρ(Δ) = exp(-a Δ²) • If Δ is small, these are good enough approximations to capture the time decay of correlation • Claim: better than simply assuming perfect correlation
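The two decay approximations are easy to compare; a small R sketch (the decay constant a = 3 is chosen arbitrarily for illustration):

```r
rho_linear    <- function(delta, a) exp(-a * delta)    # "linear form":    exponent linear in Delta
rho_quadratic <- function(delta, a) exp(-a * delta^2)  # "quadratic form": exponent quadratic in Delta

delta <- seq(0, 1, by = 0.01)
plot(delta,  rho_linear(delta, a = 3), type = "l", ylab = "rho(Delta)")
lines(delta, rho_quadratic(delta, a = 3), lty = 2)     # decays more slowly for small Delta
```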
Theorems • Theorem 1: Under the loss correlation model, the link success rates p, p1, p2 and the constant a are identifiable, given that ρ(0) = 1 • Theorem 2: If perfect correlation is wrongly assumed in a setting with imperfect correlation, then the estimate has an absolute asymptotic bias • See the proofs in the paper
Profile Likelihood Inference • Given the loss model, how do we estimate loss rates? • Inputs: • single-packet end-to-end measurements • packet-pair end-to-end measurements • Topology: • Two-level, K-leaf tree (common link with success rate p; leaf links with success rates p1, …, pK) • Profile likelihood (PL) inference: • Focus on parameters of interest (i.e., link loss rates to be inferred) • Replace nuisance unknowns with appropriate estimates
Profile Likelihood Inference • Step 1: apply end-to-end success rates • Let Pi = end-to-end success rate to leaf i, so Pi = p pi • Re-parameterize r11 (for every pair of leaves) as a function of p and the Pi’s: r11 = PU PV p⁻¹ (p + (1 – p) ρ(Δ)) • Solve for {p, P1, P2, …, PK, a} • But this is challenging with many variables to solve
Profile Likelihood Inference • Step 2: remove nuisance parameters • Based on profile likelihood [Murphy ’00], replace nuisance unknowns with appropriate estimates • Replace Pi with its maximum likelihood estimate P̂i = Mi / Ni • Ni = number of packets going to leaf i • Mi = total number of successes to leaf i • Only two variables remain to solve: p and a
Profile Likelihood Inference • Step 3: estimate p when ρ(.) is unknown • Approximate ρ(.) with either the linear or the quadratic form • To solve for p and a, optimize the log-likelihood function using the BFGS quasi-Newton method • See the paper for details
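A hedged R sketch of how Steps 1–3 can fit together: the per-leaf rates are plugged in as P̂i and only (p, a) are optimized with BFGS via optim(). The data layout (a pairs data frame with leaf indices, success indicators, and time gaps) and all names are assumptions for illustration, not the paper's code.

```r
# Sketch of the profile-likelihood step (hypothetical data layout):
#   pairs : data frame with columns i, j (leaf indices), xu, xv (0/1 success
#           indicators observed at the leaves), delta (time gap of the pair)
#   Phat  : plug-in estimates, Phat[i] = Mi / Ni (or M / N for est.equal)
neg_loglik <- function(theta, pairs, Phat,
                       rho = function(d, a) exp(-a * d)) {
  p <- plogis(theta[1])                 # common-link success rate, kept in (0, 1)
  a <- exp(theta[2])                    # decay constant, kept positive
  Pu <- Phat[pairs$i]; Pv <- Phat[pairs$j]
  r11 <- Pu * Pv / p * (p + (1 - p) * rho(pairs$delta, a))
  r10 <- Pu - r11; r01 <- Pv - r11; r00 <- 1 - Pu - Pv + r11
  pr  <- ifelse(pairs$xu == 1 & pairs$xv == 1, r11,
         ifelse(pairs$xu == 1, r10,
         ifelse(pairs$xv == 1, r01, r00)))
  -sum(log(pmax(pr, 1e-12)))            # guard against log(0)
}

# fit   <- optim(c(qlogis(0.98), log(1)), neg_loglik,
#                pairs = pairs, Phat = Phat, method = "BFGS")
# p_hat <- plogis(fit$par[1])           # estimated common-link success rate
```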
Extension: Remove Skewness • If some leaf receives only a few packets (i.e., Mi, Ni are small), the estimate of Pi will be inaccurate • Especially when there are many leaf branches • Heuristic: let Pi be the same for all i • Intuition: remove skewness of traffic loads among leaves by taking an aggregate average • Let: • N = total number of packets to all leaves • M = total number of successes to all leaves • Take the approximation: P̂i = M / N for all i
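A tiny R sketch of the difference between the per-leaf and pooled plug-ins, with made-up counts (note the two small-sample leaves):

```r
Ni <- c(500, 480, 20, 510, 15)     # hypothetical packets sent to each leaf
Mi <- c(470, 455, 19, 480, 13)     # hypothetical successes at each leaf
Phat_self  <- Mi / Ni              # est.self: per-leaf estimates (noisy when Ni is small)
Phat_equal <- sum(Mi) / sum(Ni)    # est.equal: one pooled estimate shared by all leaves
```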
Extension: Large-Scale Topology • If the tree has many levels, decompose it into many two-level problems • Estimate loss rates f0 and f1 • Combine them as f = max(0, (f1 – f0) / (1 – f0))
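Reading f0 and f1 as the estimated loss rates of the shorter and longer nested paths from the two-level sub-problems (my interpretation of the slide), the combination step is just:

```r
# Loss rate of the extra link between two nested paths with loss rates f0 <= f1
# (assumed interpretation; clipped at 0 to avoid negative estimates).
link_loss <- function(f0, f1) max(0, (f1 - f0) / (1 - f0))
link_loss(f0 = 0.02, f1 = 0.05)    # hypothetical values -> about 0.031
```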
Network Simulations • We use model simulations to verify the correctness of our models under ideal settings (see details in the paper) • Network simulations with ns-2 (TCP/UDP flows over a two-level, K-leaf tree): • Traffic models: • Short-lived TCP sessions • Background UDP on-off flows • Loss models: • Links follow an exponential ON-OFF loss model • Queue overflow due to UDP bursts • Both loss models are justified in practice and exhibit loss correlation
Network Simulations • Three estimation methods: • est.equal: take the aggregate average of end-to-end success rates (P̂i = M / N for all i) • est.self: take individual end-to-end success rates (P̂i = Mi / Ni) • est.perfect: use est.self but assume perfect correlation
Experiment 1: ON-OFF Loss • Consider a two-level, K-leaf tree with exponential on-off loss • est.perfect is the worst among all methods • (plots: p = 2%, pi = 0 and p = 2%, pi = 2%)
Experiment 2: Skewed Traffic • Uneven traffic across K = 10 leaves • β: fraction of traffic going to leaves 1–5 • 1 – β: fraction of traffic going to leaves 6–10 • est.equal is robust to skewed traffic • (plots: p = 2%, pi = 0 and p = 2%, pi = 2%)
Experiment 3: Large Topology • Goal: verify whether two-level inference can be extended to a multi-level topology
Experiment 3: Large Topology • (figure: three-level topology with Levels 1–3; losses occur only in the links of interest)
Experiment 3: Large Topology • est.equal is the best among all methods • around 5%, 10%, 20% errors in levels 1, 2, 3 respectively
Conclusions • Provide first attempt to explicitly model time correlation in loss tomography • Propose profile likelihood inference • Remove nuisance parameters • Simplify loss inference without compromising accuracy • Conduct extensive model/network simulations • Assuming perfect correlation is not a good idea • est.equal is robust in general, even for skewed traffic loads and large topology