Implementing and Maintaining an ISP Backbone Kevin Butler
Tier 1 ISP Backbones • Comprise some of the world’s largest IP networks • Tier 1 companies include Sprint, AT&T, PSINet • UUNET has the world’s largest IP data network (by number of POPs), presence on five continents (North and South America, Europe, Asia, Australia)
Service Level Agreements • SLAs are an important and prestigious tool in attracting and maintaining customers • Comprised of uptime guarantees and bounds on latency through various geographic regions • most ISPs currently have latency < 65ms monthly average between regional hubs in the US
Current SLA latency times • Looking at the North American Backbone over past 24 hours (ICMP tests) • UUNET: 64.9 ms • SprintLink: 69.3 ms • AT&T: 68.7 ms • Cable & Wireless: 60.8 ms • PSINet: 80 ms source: http://ratings.miq.net
Supporting the Customer • Quality and expertise of first-line customer support varies wildly between companies • depending on size, geographic location and company focus, some front-line support teams outsourced to third parties • some in-house high level support teams have skills equivalent or superior to NOCs
Network Operations Centres • Generally the teams concerned with backbone maintenance and support • trend towards consolidation into “Super-NOCs” (eg. one for Americas, one for Europe) • specialisation within NOC for product support (eg. dial, VPN, backbone NOCs)
NOC Tools • NOCOL - Network Operations Centre On Line (freeware UNIX) • Mediahouse monitoring (mainly web) • Micromuse Netcool - used by WorldCom, PSINet, BT
Some Circuit Terminology • DS-1 = 1.544 Mbps; “DS” stands for “digital signal”, the actual physical layer component • Often used interchangeably with “T1”, which refers to the carrier on the line • DS-3 (T3) = 44.736 Mbps, or 28 DS-1s • PRI: “primary rate interface”, equivalent to a DS-1: 23 B (bearer) channels plus one 64 kbps D channel • BRI: “basic rate interface”, made up of 2 B channels and 1 D channel • Each B channel is a DS-0 circuit at 56 or 64 kbps (depending on switching limitations) • Note: 24 DS-0s = 1.536 Mbps; the remaining 8 kbps is a synchronizing framing bit sent after one byte from each of the 24 channels (so it is bit 193 of each frame)
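The arithmetic on this slide can be checked with a short sketch. The 8,000 frames-per-second figure is an assumption here (it is the standard voice sampling rate and is not stated on the slide), and the constant names are illustrative only.

```python
# Sketch of the DS-1 framing arithmetic. FRAMES_PER_SECOND is an
# assumption (standard 8 kHz sampling); it does not appear on the slide.
FRAMES_PER_SECOND = 8000
BITS_PER_SAMPLE = 8      # one byte per channel per frame
DS0_PER_DS1 = 24
DS1_PER_DS3 = 28
DS3_BPS = 44_736_000     # DS-3 rate quoted on the slide

ds0_bps = BITS_PER_SAMPLE * FRAMES_PER_SECOND        # one 64 kbps channel
payload_bps = DS0_PER_DS1 * ds0_bps                  # 24 channels of payload
frame_bits = DS0_PER_DS1 * BITS_PER_SAMPLE + 1       # 192 data bits + 1 framing bit
ds1_bps = frame_bits * FRAMES_PER_SECOND             # full DS-1 line rate
ds3_overhead_bps = DS3_BPS - DS1_PER_DS3 * ds1_bps   # left over after 28 DS-1s

print(ds0_bps, payload_bps, frame_bits, ds1_bps, ds3_overhead_bps)
```

Under these assumptions the frame comes to 193 bits and the DS-1 rate to 1,544,000 bps. The last value shows that a DS-3 carries somewhat more than 28 × 1.544 Mbps; the difference is multiplexing overhead.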
Optical Carrier • OC-x rates based on multiplexing SONET streams • SONET (synchronous optical network): defines a standard optical TDM system with common standards and compatibility across continents (devised at Bellcore); Europe uses SDH, very similar to SONET • OC-3 = 155.52 Mbps, commonly goes up in multiples of four in North America and Europe (OC-12 = 622 Mbps, OC-48 ~ 2.5 Gbps, OC-192 ~ 10 Gbps)
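The OC-x figures follow from a single base rate, which a small sketch makes explicit. The 51.84 Mbps OC-1 rate is the standard SONET base rate and is an addition here, not a number from the slide; the function name is illustrative.

```python
# OC-n line rates as multiples of the SONET base rate. The 51.84 Mbps
# OC-1 figure is the standard base rate, not a number from the slide.
OC1_MBPS = 51.84


def oc_rate_mbps(n: int) -> float:
    """Return the line rate of OC-n in Mbps, rounded to two decimals."""
    return round(n * OC1_MBPS, 2)


for n in (3, 12, 48, 192):
    print(f"OC-{n}: {oc_rate_mbps(n)} Mbps")
```

Each step in the list (3, 12, 48, 192) is four times the previous one, which is the "multiples of four" progression mentioned above.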
Dial Access • Dial is a major selling point, especially with customers who travel a lot or are their own ISPs • connections made through a dial concentrating unit eg. Ascend (Lucent) MAX TNT, which can support up to 720 concurrent callers • back-end is a DS-3 into a backbone router, routes advertised by an IGP (eg. RIP)
Dial-Related Technologies • COBRA (Central Office Based Remote Access) allows building of virtual POPs by backhauling PRIs • RADIUS (Remote Authentication Dial In User Service): authenticates the customer logging in and can provide some routing and netblock information about them
Integrated Services Digital Network • ISDN customers authenticate by RADIUS, as dial users do • Most customers use BRI (2 B channels for 128 kbps data rate) • underlying architecture similar but dial equipment often administered differently • ISDN maintained within same AS as backbone whereas dial often in its own AS
DS-1 and high-speed access • Customer connections usually multiplexed, come into DSU (data service unit) as a channelised DS-3 • gateway routers on ISP side usually Cisco 7500 series, increasingly using Cisco 12000 • customers connect using Cisco 1604, 2621, some 3600 series, very large customers use 7500 series routers
Gateway Routers • obtain routes from customers usually statically, but sometimes by BGP • usually run link-state IGP within AS (eg. OSPF, IS-IS) • Cisco 7513 backplanes 1.8 Gbps while 12008 does 40 Gbps
Where does traffic go from here? • Most ISPs have two levels of networks above the access router • Metropolitan networks aggregate gateway traffic, generally city-wide if multiple points of presence (POPs) in city • transit networks aggregate metro network’s traffic, responsible for inter-city transport
The Big Picture • [Diagram: network hierarchy in three tiers labelled TRANSIT, METRO and EDGE; device labels shown are TR, TA, XR, HA, GW and DR]
POPs and NAPs as real estate • Often located in the centre of cities (eg. the Ameritech NAP in Chicago) • 60 Hudson St, NYC is a “telco hotel”: a large number of telecoms companies have equipment there • Usually industrial buildings (because of high HVAC use), often nondescript (both for cost and security reasons)
ATM Switches • Terminate long-haul OC-12, OC-48 circuits and metro rings • Choice of vendor contingent on ISP, commonly Newbridge, Fore Systems (ASX-1000 and ASX-4000)
Example of an ATM interface
TR1.EG1:
interface ATM2/0
 description To HA13.BLAH1 3C1
 atm vc-per-vp 512
 atm pvc 16 0 16 ilmi
!
interface ATM2/0.195 point-to-point
 description To XR1.BLAH1 ATM6/0
 ip address 146.188.200.98 255.255.255.252
 ip router isis Net-Backbone
 atm pvc 195 0 195 aal5snap
 clns router isis Net-Backbone
Tying it all Together • ATM devices perform switching functions at layer two level • Within regional areas, routers use intra-domain routing protocols • To communicate with other regions and across peering points, an inter-domain routing protocol is used
Slash Notation • Subnet masks can be unwieldy to deal with, eg. 255.255.255.240 • Slash notation simplifies this: the number after the slash is the number of leading 1 bits in the mask, ie. the bits ANDed with the address to create the network identifier • 192.168.1.0 255.255.255.0 = 192.168.1.0/24 • Nifty trick: number of hosts in a netblock is easy to determine with slash notation: # usable hosts in /x = 2^(32-x) – 2 • Therefore, there are 256 addresses in a /24, 254 usable
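The mask-to-slash conversion and the host-count formula can be sketched with Python's standard `ipaddress` module. The helper name `usable_hosts` and the 10.0.0.0 example address are illustrative choices, not part of the original slide.

```python
import ipaddress


def usable_hosts(prefix_len: int) -> int:
    """Usable host count for an IPv4 /prefix_len, per 2^(32-x) - 2."""
    return 2 ** (32 - prefix_len) - 2


# Address-and-mask form converts directly to slash notation.
net = ipaddress.ip_network("192.168.1.0/255.255.255.0")
print(net)                           # network in slash notation
print(net.num_addresses)             # total addresses in the block
print(usable_hosts(net.prefixlen))   # minus network and broadcast addresses

# The unwieldy mask from the slide, applied to a hypothetical address.
small = ipaddress.ip_network("10.0.0.0/255.255.255.240")
print(small.prefixlen, usable_hosts(small.prefixlen))
```

The formula subtracts the network and broadcast addresses, so it is only meaningful for prefixes of /30 or shorter.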
Routing Protocols • Intra-domain (IGPs) • Distance-vector (RIP, IGRP; EIGRP is an advanced distance-vector protocol, sometimes called a hybrid) • Link-state (OSPF, IS-IS) • Inter-domain (EGPs) • Path-vector: BGP • Routes by number of hops between autonomous systems, hence uses a vector made up of a sequence of AS numbers instead of the next IP address
Autonomous Systems • An autonomous system (AS) is a group of routers with a single routing policy, running under a single administration • Different ISPs, and large companies, can have their own AS number • Where to get a number? In North America, ARIN (American Registry for Internet Numbers), in Europe, RIPE (Réseaux IP Européens), in Asia APNIC (Asia-Pacific Network Information Centre) – also the places for getting IP addresses
Implementation of BGP • BGP runs between autonomous systems and peers, as well as multi-homed customers • monolithic AS broken up into BGP confederations for ease of work • Why BGP? Policies can be defined and routes controlled to a highly customisable degree using access lists and route maps – one can choose what routes to distribute to which neighbours • BGP can run inside an AS – internal (IBGP) carries transit traffic through the AS (like an Interstate through a county)
BGP • Communities are destinations that share common attributes (eg. through access-list filters)
BGP table version is 23718690, local router ID is 205.150.242.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network            Next Hop         Metric LocPrf Weight Path
*>i24.64.0.0/19       198.133.49.7               100      0 6327 6172 i
*>i24.64.0.0/14       198.133.49.7               100      0 6327 i
*>i24.64.32.0/19      198.133.49.7               100      0 6327 6172 i
*>i24.64.64.0/19      198.133.49.7               100      0 6327 6172 i
*>i24.64.96.0/19      198.133.49.7               100      0 6327 6172 i
*>i24.64.192.0/19     198.133.49.7               100      0 6327 6172 i
*>i24.64.224.0/19     198.133.49.7               100      0 6327 6172 i
*>i24.65.0.0/19       198.133.49.7               100      0 6327 6172 i
*>i24.65.96.0/19      198.133.49.7               100      0 6327 6172 i
*>i24.65.128.0/19     198.133.49.7               100      0 6327 6172 i
Advantages of BGP for User • Allows for load-sharing and redundancy • routes can be biased through AS path prepending (adding the same AS number to a route multiple times to make it a less favourable route to take) • requirement is high-quality router with close to 100% uptime to avoid connection flaps and subsequent route dampening (BGP gets annoyed if connections go up and down frequently and will penalise the offending network)
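The effect of AS path prepending can be illustrated with a toy model. This is a sketch under strong assumptions: it models only the AS-path-length comparison (real BGP evaluates several other attributes first), and every AS number and route name below is hypothetical.

```python
# Toy model of one BGP tie-breaker: with other attributes equal, the
# route with the shortest AS path wins. Real BGP checks several
# attributes before path length. All AS numbers are hypothetical
# private-range values.
CUSTOMER_AS = 64512


def best_path(routes: dict) -> str:
    """Return the name of the route with the fewest AS hops."""
    return min(routes, key=lambda name: len(routes[name]))


# A multi-homed customer seen through two providers; equal path lengths.
routes = {
    "via_provider_a": [64601, CUSTOMER_AS],
    "via_provider_b": [64602, CUSTOMER_AS],
}

# The customer prepends its own AS twice on announcements to provider A,
# so that path looks longer to everyone else.
routes["via_provider_a"] = [64601] + [CUSTOMER_AS] * 3

print(best_path(routes))
```

With the prepended path now four hops long against two, inbound traffic in this model prefers provider B, while provider A remains available as a backup.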
Common Customer Issues • Static routes on backbone - often difficult to spot, can cause very strange routing results (very conducive to routing loops) • pull-up routes for netblocks smaller than /24, required to avoid BGP dampening (smaller customers tend to reset their equipment more often) • BGP recalculations - if done on a transit router, entire backbone segments can experience outages (tables are huge, currently over 103,000 prefixes in table)
Customer Requirements of the Backbone • Redundancy - networks are redundant but card failures can take down whole routers • physical connection to POP from customer is a single point of failure • low latency - massive increases in demand on backbone make this difficult • over $2 million a day spent on global backbone upgrades
DSL: low cost, high speed • DSL might phase out ISDN connections • difficult to troubleshoot from network standpoint • connections pass through telco’s frame or ATM cloud between DSLAM (DSL access multiplexer, which separates voice and data traffic by frequency) and VR • RedBack SMS (Subscriber Management System) 1000 commonly used as VR, though the SMS 10000 is currently the largest “carrier-class” routing switch (can take in 24 OC-12s)
RedBack SMS 1000 • Supports up to 4000 sessions • OC-3 out to metro network • traffic-shaping accomplished with profiles
atm profile samplecust
 counters
 shaping vbr-nrt pcr 1000 cdvt 100 scr 100 bt 10
Increasing Capacity • Backbone capacity increasing at a huge rate • Traffic engineering combined with high backplane becoming increasingly important • many ISPs turning to Juniper routers • UUNET rolled out production OC-192c with Juniper M160 running MPLS
Juniper Routers • Specialises in huge routers (M160 backplanes 160 Gbps) • JUNOS supports MPLS and RSVP
isis {
    interface all;
}
ospf {
    area 0.0.0.0 {
        interface so-0/0/0 {
            metric 15;
            retransmit-interval 10;
            hello-interval 5;
        }
    }
}
[edit]
Network Abuse • Spam-killing: looking at SMTP header for IP address, null-routing it • Open relay detection: ORBS et al. • DDoS attacks can be very detrimental to backbone (even causing switch crashes) • Combated by rate-limiting ICMP on routers • Most effective defense is community-wide egress filtering; requires co-operation throughout the Internet
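The idea behind egress filtering can be sketched as a source-address check: a network only lets out packets whose source address belongs to its own netblocks, so spoofed sources are dropped at the edge. The netblock, addresses and function name below are hypothetical (documentation ranges), and an actual deployment would use router access lists rather than Python.

```python
import ipaddress

# Hypothetical netblock assigned to one customer (documentation range).
CUSTOMER_BLOCKS = [ipaddress.ip_network("192.0.2.0/24")]


def permit_source(src: str) -> bool:
    """Permit a packet only if its source address is in an assigned block."""
    addr = ipaddress.ip_address(src)
    return any(addr in block for block in CUSTOMER_BLOCKS)


print(permit_source("192.0.2.17"))    # source inside the assigned block
print(permit_source("198.51.100.9"))  # source outside it, would be dropped
```

The filter only helps if most networks apply it, which is why the slide stresses co-operation throughout the Internet.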
Network Challenges eg. Canada • Geographically, population resides in virtually a straight line across the south • major focus is on southbound capacity to the US • CRTC regulations on telcos create different arrangements • heterogeneous network to the US, integration a big issue
Costs • Network equipment not cheap: a Cisco GSR can cost upwards of a quarter million dollars • Fibre and transceivers can be expensive to lay ($100K/mile near rail, over $300K/mile in the city) • Interesting note: Sprint grew its all-fibre network quickly because it was laid on railway right-of-way (the SPR in Sprint initially stood for Southern Pacific Railroad) • Costs for backbone access? Currently ~ $1300 CDN + local loop cost for burstable 128k T1, up to ~ $50 K CDN for a full T3, much more for OC3+ (USD costs similar)
Questions? • Anything I can clarify or expand on... • Thank you!