The Sprint IP Monitoring Project and Traffic Dynamics at a Backbone POP Supratik Bhattacharyya Sprint ATL http://www.sprintlabs.com
The IP Group at Sprintlabs Charter : • Investigate IP technologies for robust, efficient, QOS-enabled networks • Anticipate and evaluate new services and applications Major Projects : • Monitoring Sprint’s IP Backbone • Service Platform
Talk Overview • The IPMon Project • Routing and Traffic Dynamics
IP Backbone : POP-to-POP view [Figure: backbone map; legend: POP, OC-48, OC-12, OC-3 links] POP : Point of Presence, typically a metropolitan area
Motivation: Need for Monitoring Current network is over-provisioned, over-engineered, best-effort… • Diagnosis: • detect and report problems at IP level • Management • configuration problems, traffic engineering • resource provisioning, network dimensioning • Value-added service • feedback to customers (performance, traffic characteristics) • Detect attacks and anomalies
Existing Measurement Efforts • Passive measurements • SNMP-based tools • Netflow (Cisco proprietary) • OC3MON, OC12MON • Active Measurements • ping, traceroute, NIMI, MINC, Surveyor • Skitter, Keynote, Matrix • Integrated Approach • AT&T Netscope • Network topology and routes • Traffic at flow level granularity • Delay and loss statistics
Our approach • Passive monitoring • Capture header (44 bytes) from every packet • full TCP/IP headers, no HTTP information • Use GPS time stamping, which allows accurate correlation of packets on different links • Day-long traces • Simultaneously monitor multiple links and sites • Collect routing information along with packet traces • Traces archived for future use
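A minimal sketch of what can be recovered from a 44-byte header capture. It assumes the captured bytes start at the IPv4 header (any link-layer framing already stripped) and that the packet is TCP over IPv4; the slides do not give the on-disk record layout, so `parse_header` and its field names are hypothetical.

```python
import ipaddress
import struct

CAPTURE_LEN = 44  # bytes kept per packet, as stated on the slide


def parse_header(buf: bytes) -> dict:
    """Parse IPv4 (and, if present, TCP) fields from a captured header prefix.

    Assumption: buf begins at the IPv4 header; this layout is illustrative only.
    """
    if len(buf) < 20:
        raise ValueError("need at least a 20-byte IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", buf[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    rec = {
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "proto": proto,
        "ttl": ttl,
        "length": total_len,  # full packet length, even though only 44 bytes are kept
    }
    # TCP is protocol 6; read ports, sequence number and flags if they fit in the capture.
    if proto == 6 and len(buf) >= ihl + 14:
        sport, dport, seq, _ack, off_flags = struct.unpack("!HHIIH", buf[ihl:ihl + 14])
        rec.update(sport=sport, dport=dport, seq=seq, tcp_flags=off_flags & 0x3F)
    return rec
```

Because the IP total-length field survives in the header, byte counts per packet can still be computed even though the payload is discarded.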
Applications • Data from a commercial Tier-1 IP backbone • Applications of data: • traffic modeling • traffic engineering • provisioning • pricing, SLAs • hardware design in collaboration with vendors • denial-of-service
Measurement Facilities • IPMON System • Collects packet traces by passively tapping the fiber using optical splitters • Supports OC-3 to OC-48 data rates • Data Repository • Large tape library to archive data • Analysis Platform • Initially a 17-node computing cluster • SAN under deployment
IPMON Architecture Linux PC with multiple PCI buses
Current Status of IPMONs • Currently operational in one major west coast POP on OC3 links • Under way in two major east coast POPs for OC3 and OC12 -- (we hope by July 2001) • OC48 in preparation for 1 east coast POP and 1 west coast POP -- summer 2001 • Future: Sprint Dial-Up Network, more POPs, European network
Practical Constraints • Difficult to monitor operational network : • Complex procedure for deploying equipment • POPs evolve too fast • Too costly to be ubiquitous • Technology limitations (PCs, disks, etc.) • Only off-line analysis is possible • Are 44 bytes enough?
Ongoing Projects • Routing and Traffic Dynamics • Delay measurement across a router • TCP flow analysis • Denial of service • Bandwidth provisioning and pricing
Routing and Traffic Dynamics Project • Part 1: what are the traffic demands between pairs of POPs? • How stable is this demand? • Part 2: what are the paths taken by those demands? • Are link utilization levels similar throughout the backbone? • Part 3: is there a better way to spread the traffic across paths? • At what level of traffic granularity should traffic be split up?
Motivation Understand traffic demands between POP pairs
POP-to-POP Traffic Matrix [Figure: matrix with rows and columns labeled City A, City B, City C] For every ingress POP : • Identify total traffic to each egress POP • Further analyze this traffic • Measure traffic over different timescales • Divide traffic per destination prefix, protocol, etc.
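One way to picture the traffic matrix is as byte counts keyed by (time slot, ingress POP, egress POP). The sketch below assumes each packet has already been labeled with its ingress and egress POP; the tuple layout and the `traffic_matrix` name are illustrative, not taken from the talk.

```python
from collections import defaultdict


def traffic_matrix(packets, slot_seconds=1800):
    """Sum bytes per (time slot, ingress POP, egress POP).

    packets: iterable of (timestamp_seconds, ingress_pop, egress_pop, nbytes).
    slot_seconds: width of a time slot; changing it changes the timescale.
    """
    matrix = defaultdict(int)
    for ts, ingress, egress, nbytes in packets:
        slot = int(ts // slot_seconds)
        matrix[(slot, ingress, egress)] += nbytes
    return dict(matrix)


def row(matrix, slot, ingress):
    """Extract one row: traffic from a single ingress POP to every egress POP."""
    return {e: b for (s, i, e), b in matrix.items() if s == slot and i == ingress}
```

Calling `traffic_matrix` with a different `slot_seconds` gives the same demands at a coarser or finer timescale.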
Applications • Intra-domain routing • Analyzing routing anomalies • Verify BGP Peering • Capacity planning and dimensioning • POP architecture
The Mapping Problem What is the egress POP for a packet entering a given ingress POP?
Mapping BGP destinations to POPs • BGP table : find best Next-Hop for each destination, giving (Dst, Next-Hop) • Get unique Next-Hops • Recursive BGP lookup to find last Sprint hop, giving (Next-Hop, Last Sprint Hop) • Map unique Next-Hops to POP, giving (Next-Hop, POP map) • Map Dst to POP, giving (BGP Dst, POP)
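The mapping pipeline can be sketched as follows, under several assumptions: the BGP table is reduced to a dict from destination prefix to best next hop, `routes` is a hypothetical table used to resolve a next hop that is not itself a Sprint router, and `router_pop` is a hypothetical map from Sprint router address to POP name. None of these structures are specified in the slides.

```python
import ipaddress


def longest_match(table, addr):
    """Return the value stored for the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


def last_sprint_hop(next_hop, routes, router_pop, max_depth=8):
    """Resolve next_hop recursively until it is a known Sprint router."""
    hop = next_hop
    for _ in range(max_depth):  # bound the recursion to avoid routing loops
        if hop in router_pop:
            return hop
        hop = longest_match(routes, hop)
        if hop is None:
            return None
    return None


def map_destinations(bgp_table, routes, router_pop):
    """Build the (BGP Dst, POP) map from (Dst, Next-Hop) pairs."""
    hop_to_pop = {}
    for hop in set(bgp_table.values()):  # resolve each unique next hop once
        last = last_sprint_hop(hop, routes, router_pop)
        if last is not None:
            hop_to_pop[hop] = router_pop[last]
    return {dst: hop_to_pop[nh] for dst, nh in bgp_table.items() if nh in hop_to_pop}
```

Resolving only the unique next hops keeps the recursive step small, since a full BGP table has far more destinations than distinct next hops.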
Data Processing • Step 1: Use BGP tables to generate [prefix, egress POP] map • Step 2: Run IP lookup software on packet trace using above map • Output : single trace file for each egress-POP, e.g. all packets headed to POP k from monitored POP • Step 3: Use our traffic analysis tool for statistics evaluation.
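Step 2 amounts to a longest-prefix match per packet followed by a split of the trace by egress POP. The sketch below is one simple way to do this for IPv4, not the lookup software used in the project; the function names and the (destination, bytes) packet layout are assumptions.

```python
import ipaddress
from collections import defaultdict


def build_lookup(prefix_to_pop):
    """Index the [prefix, egress POP] map by prefix length for fast matching."""
    by_len = defaultdict(dict)
    for prefix, pop in prefix_to_pop.items():
        net = ipaddress.ip_network(prefix)
        by_len[net.prefixlen][int(net.network_address)] = pop
    return by_len


def egress_pop(by_len, dst):
    """Longest-prefix match for an IPv4 destination; None if no prefix covers it."""
    addr = int(ipaddress.IPv4Address(dst))
    for plen in sorted(by_len, reverse=True):  # try most specific prefixes first
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        pop = by_len[plen].get(addr & mask)
        if pop is not None:
            return pop
    return None


def split_trace(packets, by_len):
    """Group (dst, nbytes) packets into one list per egress POP.

    Packets with no matching prefix are kept under the key None.
    """
    per_pop = defaultdict(list)
    for dst, nbytes in packets:
        per_pop[egress_pop(by_len, dst)].append((dst, nbytes))
    return dict(per_pop)
```

Each value of the returned dict plays the role of the per-egress-POP trace file that Step 3 then analyzes.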
Monitored links at a single POP [Figure: POP with access and core routers; monitored access links: Peer 1, Peer 2, ISP, web hosting]
Data • 5 traces collected on Aug 9, 2000 • Access link types : Webhost 1, Webhost 2, Peer 1, Peer 2, ISP • Trace lengths (hours), as listed on the slide : 19, 13, 24, 15, 8
Day-Night Variation : Webhost #1 • Traffic reduction at night is between 20% and 50%, depending on the access link
Summary • Wide disparity in “traffic demands” among egress POPs • POPs can be roughly categorized as : small, medium, large; and they maintain their rank during the day. • Traffic is heterogeneous in space yet stable in time. • Traffic varies by (access link, egress POP pair) • Hard to characterize time-of-day behaviour • 20-50% reduction at night
What we’ve seen so far • Wide disparity in traffic demands between (ingress, egress) POP pairs • Wide disparity in link utilization levels, plus many underutilized routes • Routing policies concentrate traffic on a few paths • Question : Can we divert some traffic to the lightly loaded paths?
Creating traffic aggregates • To address issues of splitting traffic over multiple paths, need to define “streams” within traffic • How should packets be aggregated into streams? • Coarse granularity: POP-to-POP • Very fine granularity: use 5-tuple • Initial criterion : destination address prefix
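The three granularities differ only in the key used to group packets. A minimal sketch, assuming packets are dicts with hypothetical field names (`ingress`, `egress`, `src`, `dst`, `sport`, `dport`, `proto`, `bytes`):

```python
import ipaddress
from collections import Counter


def prefix_of(dst, plen):
    """Return the enclosing prefix of length plen for an address, e.g. 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{dst}/{plen}", strict=False))


def stream_key(pkt, granularity):
    """Map a packet to its stream identifier at the chosen granularity."""
    if granularity == "pop":       # coarse: POP-to-POP
        return (pkt["ingress"], pkt["egress"])
    if granularity == "5tuple":    # very fine: one stream per transport flow
        return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
    if granularity.startswith("/"):  # destination prefix, e.g. "/8" or "/16"
        return prefix_of(pkt["dst"], int(granularity[1:]))
    raise ValueError(f"unknown granularity: {granularity}")


def bytes_per_stream(packets, granularity):
    """Total bytes carried by each stream."""
    totals = Counter()
    for pkt in packets:
        totals[stream_key(pkt, granularity)] += pkt["bytes"]
    return totals
```

Switching from `"/8"` to `"/16"` or `"/24"` splits each stream into finer ones without changing anything else in the analysis.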
Elephants and Mice among /8 streams • Traffic grouped by egress POPs • Stream : all packets in a group with same /8 destination address prefix • Ingress : Webhost Link
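The slides do not state the rule used to separate elephants from mice. As an illustration only, the sketch below labels a stream an elephant when its share of the group's bytes reaches a threshold; both the rule and the default `share_threshold` are assumptions.

```python
def split_elephants_mice(stream_bytes, share_threshold=0.05):
    """Split streams into (elephants, mice) by share of total bytes.

    stream_bytes: dict mapping stream id -> bytes.
    share_threshold: hypothetical cutoff; not a value taken from the talk.
    Both lists are ordered from largest to smallest stream.
    """
    total = sum(stream_bytes.values())
    ranked = sorted(stream_bytes.items(), key=lambda kv: kv[1], reverse=True)
    elephants, mice = [], []
    for stream, nbytes in ranked:
        share = nbytes / total if total else 0.0
        (elephants if share >= share_threshold else mice).append(stream)
    return elephants, mice
```

Applying the same function to the /16 streams inside one /8 elephant would be one way to examine the next level of the prefix hierarchy.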
Observations about prefix-based streams • Recursive : a /8 elephant has a few /16 elephants and many mice, likewise at /24 level • Phenomenon is less pronounced at /24 level • Question : Are elephants stable? • Definition: • R_i(n) = the rank of flow i at time slot n • D_{i,n,k} = | R_i(n) - R_i(n+k) | • each time slot corresponds to 30 minutes
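The rank-change metric can be computed directly from per-slot byte counts. In this sketch rank 1 is the largest flow in a slot and ties are broken by flow name; both conventions are assumptions, as is skipping flows that are absent from slot n+k.

```python
def ranks(slot_bytes):
    """Rank flows within one time slot: rank 1 carries the most bytes."""
    ordered = sorted(slot_bytes, key=lambda f: (-slot_bytes[f], f))
    return {flow: r for r, flow in enumerate(ordered, start=1)}


def rank_changes(slots, k=1):
    """Compute D[i, n, k] = |R_i(n) - R_i(n+k)| for every flow i and slot n.

    slots: list of dicts (flow -> bytes), one dict per 30-minute time slot.
    Returns a dict keyed by (flow, n).
    """
    per_slot = [ranks(s) for s in slots]
    changes = {}
    for n in range(len(slots) - k):
        later = per_slot[n + k]
        for flow, r in per_slot[n].items():
            if flow in later:
                changes[(flow, n)] = abs(r - later[flow])
    return changes
```

A flow whose values of D stay near zero across slots keeps its rank, which is the sense in which an elephant would be called stable.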
Frequency of Rank Changes Conclusion : For load balancing, route elephants along different paths
Conclusions • Monitoring and measurement is key to better network design • IPMon : a passive monitoring system for packet-level information • We have used our data to build components of traffic matrices for traffic engineering • Backbone traffic can be better load-balanced : destination-prefix is a possible (simple) criterion
Ongoing Work • Intra-domain Routing : • Choosing ISIS link weights • Load balancing in the backbone • Flow Characterization • Building Traffic Matrices • POP modeling