Amogh Dhamdhere (CAIDA/UCSD), amogh@caida.org
with Lee Breslau, Nick Duffield, Cheng Ee, Alexandre Gerber, Carsten Lund and Shubho Sen (AT&T Labs-Research)
FlowRoute: Inferring Forwarding Table Updates Using Passive Flow-level Measurements
Motivation
• Routing protocol performance during routing events can affect end-to-end performance
  • Transient loops and packet losses may occur during routing reconvergence
• Network operators need to monitor routing protocol performance
  • Do routers respond as expected?
  • Update their forwarding tables in a timely manner?
  • Update their forwarding tables to the expected state?
IMC 2010, Melbourne, Australia
Monitoring Routing Events
• Control plane monitors (e.g., OSPFmon, BGPmon)
  • Monitor the control plane
  • Cannot measure when a router implemented a change in its forwarding table
• Active probing
  • Can only monitor paths that are probed
  • Spatial and temporal resolution limited by placement of probes and probing frequency
FlowRoute
• A data-plane monitoring tool that works in conjunction with control plane monitors
• Infers forwarding table updates using flow-level measurements
• Works offline, for after-the-fact forensics and analysis
• No additional overhead on routers
  • Uses flow-level measurements (e.g., Netflow) that are already collected
Basic Method
• Single-packet flows f1 and f2 towards destination D
• f1 is seen at N1 at time T1, with R as the previous hop
  • N1 is R's next hop towards D at T1
• f2 is seen at N2 at time T2, with R as the previous hop
  • N2 is R's next hop towards D at T2
• R's next hop towards D changed in [T1, T2]
[Figure: R forwards f1 to N1 at T1 and f2 to N2 at T2]
Routing Flow Records
[Figure: previous hop Rp, router R with incoming interface i and outgoing interface o, next hop Rn; δ is the propagation delay of the link from Rp to R]
• R sees a flow towards destination D from tf to tl
• Netflow record: (R, i, o, tf, tl, D)
• Map outgoing interface o to the next hop router Rn
• Map incoming interface i to the previous hop router Rp
• Subtract link propagation delays: (Rp, tf-δ, tl-δ, D, R)
• Duplicate the first packet timestamp: (R, tf, tf, D, Rn)
• One flow record at R produces two routing flow records, giving the routing state of R and Rp
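The conversion on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `FlowRecord`, `RFR`, `to_rfrs`, `neighbor_of` and `delay_of` are hypothetical, and the interface-to-neighbor map and per-link delays are assumed to be available from topology data.

```python
from typing import Dict, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Netflow record (R, i, o, tf, tl, D) as listed on the slide."""
    router: str
    in_if: str
    out_if: str
    t_first: float
    t_last: float
    dest: str


class RFR(NamedTuple):
    """Routing flow record: `router` used `next_hop` towards `dest`."""
    router: str
    t_first: float
    t_last: float
    dest: str
    next_hop: str


def to_rfrs(
    rec: FlowRecord,
    neighbor_of: Dict[Tuple[str, str], str],   # (router, interface) -> neighbor (assumed input)
    delay_of: Dict[Tuple[str, str], float],    # (from, to) -> propagation delay (assumed input)
) -> List[RFR]:
    prev_hop = neighbor_of[(rec.router, rec.in_if)]
    next_hop = neighbor_of[(rec.router, rec.out_if)]
    delta = delay_of[(prev_hop, rec.router)]
    # Routing state of Rp: shift both timestamps back by the link delay.
    upstream = RFR(prev_hop, rec.t_first - delta, rec.t_last - delta, rec.dest, rec.router)
    # Routing state of R: the first packet timestamp is used twice.
    local = RFR(rec.router, rec.t_first, rec.t_first, rec.dest, next_hop)
    return [upstream, local]


neighbor_of = {("R", "i"): "Rp", ("R", "o"): "Rn"}
delay_of = {("Rp", "R"): 0.5}
rfrs = to_rfrs(FlowRecord("R", "i", "o", 10.0, 12.5, "D"), neighbor_of, delay_of)
```

With these toy inputs the two records are (Rp, 9.5, 12.0, D, R) and (R, 10.0, 10.0, D, Rn).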
Inferring Forwarding Table Updates
• Collect Netflow records from all routers
• Convert to Routing Flow Records (RFRs) for offline processing
• RFRs (R, T1, T2, N1, D) and (R, T3, T4, N2, D) with T2 < T3
  • R changed its next hop towards D in the time window [T2, T3], the "range" of the forwarding table update
[Figure: timeline with N1 over [T1, T2] followed by N2 over [T3, T4]]
Inferring Forwarding Table Updates
• Collect Netflow records from all routers
• Convert to Routing Flow Records (RFRs) for offline processing
• RFRs (R, T1, T2, N1, D) and (R, T3, T4, N2, D) with T2 > T3
  • The routing flow records overlap, which could be due to Equal Cost Multi-Path (ECMP)
[Figure: timeline with N1 over [T1, T2] overlapping N2 over [T3, T4]]
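The two cases (T2 < T3 and T2 > T3) can be expressed as one comparison of a pair of RFRs. The following is a minimal sketch, not the authors' implementation; the `RFR` type and the `compare` function are hypothetical names, and treating T2 = T3 as an overlap is an assumption, since the slides do not cover that case.

```python
from typing import NamedTuple, Optional, Tuple


class RFR(NamedTuple):
    """Routing flow record: `router` used `next_hop` towards `dest`."""
    router: str
    t_first: float
    t_last: float
    dest: str
    next_hop: str


def compare(a: RFR, b: RFR) -> Optional[Tuple[str, float, float]]:
    """Compare two RFRs of the same router and destination."""
    if (a.router, a.dest) != (b.router, b.dest):
        raise ValueError("RFRs must share router and destination")
    if a.next_hop == b.next_hop:
        return None  # same next hop: nothing to infer
    first, second = sorted((a, b), key=lambda r: r.t_first)
    if first.t_last < second.t_first:
        # T2 < T3: the next hop changed somewhere in [T2, T3].
        return ("change", first.t_last, second.t_first)
    # T2 >= T3 (equality treated as overlap by assumption): possibly ECMP.
    return ("overlap", second.t_first, min(first.t_last, second.t_last))


change = compare(RFR("R", 1.0, 2.0, "D", "N1"), RFR("R", 3.0, 4.0, "D", "N2"))
overlap = compare(RFR("R", 1.0, 3.0, "D", "N1"), RFR("R", 2.0, 4.0, "D", "N2"))
```

Here `change` is the range [2.0, 3.0] and `overlap` reports the shared interval [2.0, 3.0] of the two records.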
ECMP
• Router R can forward flows destined to D to either N1 or N2
• RFRs generated at N1 and N2 can overlap, causing an inconsistency
• Non-overlapping RFRs can appear as a routing change for every flow
[Figure: R forwards f1 to N1 during [T1, T2] and f2 to N2 during [T3, T4], both towards D]
Filtering ECMP
• Observation: in 99% of next hop changes due to ECMP, a router routes fewer than 20 flows towards one next hop before routing a flow towards an equal-cost next hop
• Filtering heuristic: declare a routing change only if more than 20 flows were routed to the old next hop before a flow is routed to the new next hop
• Conservative: may miss routing changes that occur before 20 flows are forwarded to the old next hop
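The filtering heuristic can be sketched as a run-length check over the time-ordered next hops seen for one router and one destination. This is an illustrative sketch only: `filtered_changes` and `min_run` are hypothetical names, and each observation is simplified to a single timestamp plus next hop.

```python
from typing import List, Optional, Tuple

Observation = Tuple[float, str]        # (timestamp, next hop) for one router and destination
Change = Tuple[str, str, float, float]  # (old hop, new hop, range start, range end)


def filtered_changes(observations: List[Observation], min_run: int = 20) -> List[Change]:
    """Report a next hop change only after more than `min_run` flows to the old hop."""
    changes: List[Change] = []
    run_hop: Optional[str] = None
    run_len = 0
    run_end = 0.0
    for t, hop in sorted(observations):
        if hop == run_hop:
            run_len += 1
        else:
            # The run to the old next hop ends here; keep it only if long enough.
            if run_hop is not None and run_len > min_run:
                changes.append((run_hop, hop, run_end, t))
            run_hop, run_len = hop, 1
        run_end = t
    return changes


# 25 flows to N1, then a flow to N2: reported as a change in [24.0, 30.0].
stable = [(float(t), "N1") for t in range(25)] + [(30.0, "N2")]
# Flows alternating between N1 and N2 (ECMP-like): nothing is reported.
ecmp = [(float(t), "N1" if t % 2 == 0 else "N2") for t in range(40)]

stable_changes = filtered_changes(stable)
ecmp_changes = filtered_changes(ecmp)
```

As the slide notes, the check is conservative: a real change that happens after 20 or fewer flows to the old next hop is dropped along with the ECMP noise.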
Sampling
• Both packet and flow sampling occur in high-speed networks
• Sampling does not affect the correctness of inferred ranges
• Sampling affects the width of ranges: more aggressive sampling means lower temporal resolution
• More discussion in the paper
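A toy example may help show why sampling widens a range without making it wrong. The sketch below is hypothetical: it uses synthetic one-per-second flows, a made-up change time, and deterministic 1-in-10 thinning in place of the random packet and flow sampling used in real networks.

```python
from typing import List, Optional, Tuple

Observation = Tuple[float, str]  # (timestamp, next hop) for one router and destination


def change_range(observations: List[Observation]) -> Optional[Tuple[float, float]]:
    """Return (last time the old hop was seen, first time the new hop was seen)."""
    for (t_prev, hop_prev), (t_next, hop_next) in zip(observations, observations[1:]):
        if hop_prev != hop_next:
            return (t_prev, t_next)
    return None


TRUE_CHANGE = 100.5  # hypothetical time at which R switches from N1 to N2

# One flow per second; the next hop depends on which side of the change it falls.
flows = [(float(t), "N1" if t < TRUE_CHANGE else "N2") for t in range(200)]

full_range = change_range(flows)            # every flow observed
sampled_range = change_range(flows[3::10])  # keep 1 flow in 10 (deterministic stand-in)
```

With all flows the range is [100.0, 101.0]; with 1-in-10 thinning it grows to [93.0, 103.0]. Both contain the true change time, but the sampled range is ten times wider.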
Timely Forwarding Table Updates
[Figure: forwarding table update ranges plotted against an OSPF event "cluster"]
• All ranges overlap with the OSPF event cluster
Delayed Forwarding Table Updates
[Figure: some forwarding table updates consistent with OSPF events, others delayed w.r.t. OSPF events]
• Such behavior is not detectable using a control plane monitor alone!
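The comparison between update ranges and an OSPF event cluster can be sketched as an interval test. The slides do not say how clusters are formed or what tolerance is applied, so everything here is an assumption: a cluster is taken to be a simple (start, end) interval, "consistent" means the range overlaps it, and "delayed" means the range begins after the cluster ends. The `classify` function and its labels are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end)


def classify(update_range: Interval, cluster: Interval) -> str:
    """Label an inferred update range relative to one OSPF event cluster."""
    lo, hi = update_range
    start, end = cluster
    if lo <= end and start <= hi:
        return "consistent"   # the range overlaps the cluster
    if lo > end:
        return "delayed"      # assumed meaning: the update happened after the events
    return "unexplained"      # the range ends before the cluster starts


cluster = (11.0, 15.0)
labels = [classify(r, cluster) for r in [(10.0, 12.0), (20.0, 22.0), (1.0, 2.0)]]
```

For the three sample ranges the labels are "consistent", "delayed" and "unexplained".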
Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset
  • 2666 OSPF event clusters
  • 97010 time ranges consistent with OSPF event clusters
  • 117 ranges that showed delayed forwarding table updates
• Two routers showed delayed updates 14 times in the 2-month dataset
  • Subsequently retired from the network
Loops
• Delayed forwarding table updates can cause transient loops
  • Example in the paper of how this can happen
• 392 instances of 1-hop loops in the 2-month dataset
  • Mostly short-lived (sub-second)
  • A few loops lasted tens of seconds
  • Long-lived loops were due to delayed updates by one or more routers
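One way to look for 1-hop loops in RFRs is to find two routers that name each other as next hop towards the same destination at the same time. The slides do not describe the detection rule actually used, so this is a hypothetical sketch; `RFR` and `one_hop_loops` are illustrative names.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class RFR(NamedTuple):
    """Routing flow record: `router` used `next_hop` towards `dest`."""
    router: str
    t_first: float
    t_last: float
    dest: str
    next_hop: str


def one_hop_loops(rfrs: List[RFR]) -> List[Tuple[str, str, str, float, float]]:
    """Find pairs A -> B and B -> A for the same destination with overlapping times."""
    loops = []
    for a, b in combinations(rfrs, 2):
        if a.dest == b.dest and a.next_hop == b.router and b.next_hop == a.router:
            lo = max(a.t_first, b.t_first)
            hi = min(a.t_last, b.t_last)
            if lo <= hi:  # the two records are valid at the same time
                loops.append((a.router, b.router, a.dest, lo, hi))
    return loops


records = [
    RFR("R", 1.0, 5.0, "D", "N"),
    RFR("N", 3.0, 4.0, "D", "R"),   # N sends traffic for D back to R
    RFR("N", 6.0, 7.0, "D", "X"),   # later, N forwards elsewhere
]
loops = one_hop_loops(records)
```

The sample records yield a single loop between R and N for destination D over [3.0, 4.0]. The pairwise scan is quadratic in the number of records, which is acceptable for a sketch but not for a full dataset.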
Summary
• FlowRoute: a data plane monitor that works in conjunction with control plane monitors for forensics and analysis of forwarding table updates
• Used to study forwarding table updates in a tier-1 ISP network
• Found cases of delayed forwarding table updates due to buggy routers
• Also found transient loops during routing convergence and spikes in link utilization
Thanks!
amogh@caida.org
www.caida.org/~amogh
Practical Issues
• What should be the destination? It can be a destination IP address, a prefix, or an MPLS tunnel endpoint
  • Need to observe sufficient flow volume
  • We choose the MPLS tunnel endpoint
• Sampling
  • Both packet and flow sampling occur in high-speed networks
  • Sampling does not affect the correctness of inferred ranges
  • Sampling affects the width of the ranges: more aggressive sampling means lower temporal resolution
Existing Approaches
• Control plane monitors (e.g., OSPFmon, BGPmon)
  • Monitor the control plane, cannot measure when a router implemented a change in its forwarding table
• Collect and process router logs
  • Large volume of data, transporting and processing is hard
  • Limited by polling frequency, e.g., 5 minutes with SNMP
• Active probing
  • Spatial and temporal resolution limited by placement of probes and probing frequency
Delayed Forwarding Table Updates
• Used FlowRoute on a 2-month dataset with 2666 OSPF event clusters
• 97010 time ranges consistent with OSPF event clusters
• 58 clusters, 117 ranges that showed delayed forwarding table updates
• Two routers showed delayed updates 14 times in the 2-month dataset
  • Subsequently retired from the network