470 likes | 541 Views
15-441 Computer Networking. Inter-Domain Routing BGP (Border Gateway Protocol). scale: with 50 million destinations: can’t store all dest’s in routing tables! routing table exchange would swamp links!. administrative autonomy internet = network of networks
E N D
15-441 Computer Networking Inter-Domain Routing BGP (Border Gateway Protocol)
scale: with 50 million destinations: can’t store all dest’s in routing tables! routing table exchange would swamp links! administrative autonomy internet = network of networks each network admin may want to control routing in its own network Hierarchical Routing Our routing study thus far - idealization • all routers identical • network “flat” … not true in practice Lecture #11: 10-02-01
aggregate routers into regions, “autonomous systems” (AS) routers in same AS run same routing protocol “intra-AS” routing protocol routers in different AS can run different intra-AS routing protocols special routers in AS run intra-AS routing protocol with all other routers in AS also responsible for routing to destinations outside AS run inter-AS routing protocol with other gateway routers gateway routers Hierarchical Routing Lecture #11: 10-02-01
c b b c a A.c A.a C.b B.a Intra-AS and Inter-AS routing • Gateways: • perform inter-AS routing amongst themselves • perform intra-AS routers with other routers in their AS b a a C B d A network layer inter-AS, intra-AS routing in gateway A.c link layer physical layer Lecture #11: 10-02-01
Inter-AS routing between A and B b c a a C b B b c a d Host h1 A A.a A.c C.b B.a Intra-AS and Inter-AS routing Host h2 Intra-AS routing within AS B Intra-AS routing within AS A Lecture #11: 10-02-01
Why different Intra- and Inter-AS routing ? Policy: • Inter-AS: admin wants control over how its traffic routed, who routes through its net. • Intra-AS: single admin, so no policy decisions needed Scale: • hierarchical routing saves table size, reduced update traffic Performance: • Intra-AS: can focus on performance • Inter-AS: policy may dominate over performance Lecture #11: 10-02-01
Outline • External BGP (E-BGP) • Internal BGP (I-BGP) • Multi-Homing Lecture #11: 10-02-01
History • Mid-80s: EGP • Reachability protocol (no shortest path) • Did not accommodate cycles (tree topology) • Evolved when all networks connected to NSF backbone • Result: BGP introduced as routing protocol • Latest version = BGP 4 • BGP-4 supports CIDR • Primary objective: connectivity not performance Lecture #11: 10-02-01
Choices • Link state or distance vector? • No universal metric – policy decisions • Problems with distance-vector: • Bellman-Ford algorithm may not converge • Problems with link state: • Metric used by routers not the same – loops • LS database too large – entire Internet • May expose policies to other AS’s Lecture #11: 10-02-01
Solution: Distance Vector with Path • Each routing update carries the entire path • Loops are detected as follows: • When AS gets route, check if AS already in path • If yes, reject route • If no, add self and (possibly) advertise route further • Advantage: • Metrics are local - AS chooses path, protocol ensures no loops Lecture #11: 10-02-01
Snapshot of Routing Table CIDR block next hop MED PREF AS PATH *>i12.16.212.0/23 206.157.77.73 10 100 0 3561 6347 6411 i * i 137.39.166.122 10 100 0 1239 6347 6411 i *>i12.16.244.0/22 165.87.33.4 10 100 0 2685 5673 6201 i *>i12.17.10.0/23 157.130.9.110 20 100 0 (65535 65518 65525 65488) 6507 i *>i12.18.74.0/24 157.130.192.14 100 0 7018 11154 i *>i12.18.236.0/23 137.39.166.122 10 100 0 1239 11107 i *>i12.18.240.0/22 137.39.166.122 10 100 0 1239 5650 6188 6188 11741 i * i12.20.66.0/23 206.157.77.73 10 100 0 3561 11589 11589 11589 11589 11589 i *>i 206.157.77.77 10 100 0 3561 11589 11589 11589 11589 11589 i *>i12.20.92.0/24 206.157.77.77 10 100 0 3561 11857 i * i 206.157.77.73 10 100 0 3561 11857 i *>i12.20.166.0/24 165.117.52.233 10 100 0 2548 11235 i * i 157.130.192.14 100 0 7018 11235 i Taken from a UUNet router in Palo Alto Lecture #11: 10-02-01
Interconnecting BGP Peers • BGP uses TCP to connect peers • Advantages: • Simplifies BGP • No need for periodic refresh - routes are valid until withdrawn, or the connection is lost • Incremental updates • Disadvantages • Congestion control on a routing protocol? • Poor interaction during high load Lecture #11: 10-02-01
Hop-by-hop Model • BGP advertises to neighbors only those routes that it uses • Consistent with the hop-by-hop Internet paradigm • e.g., AS1 cannot tell AS2 to route to other AS’s in a manner different than what AS2 has chosen (need source routing for that) Lecture #11: 10-02-01
AS Categories • Stub: an AS that has only a single connection to one other AS - carries only local traffic. • Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic • Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions) Lecture #11: 10-02-01
AS Categories AS1 AS3 AS1 AS2 AS1 AS3 AS2 Transit Stub AS2 Multi-homed Lecture #11: 10-02-01
Policy with BGP • BGP provides capability for enforcing various policies • Policies are not part of BGP: they are provided to BGP as configuration information • BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s Lecture #11: 10-02-01
Examples of BGP Policies • A multi-homed AS refuses to act as transit • Limit path advertisement • A multi-homed AS can become transit for some AS’s • Only advertise paths to some AS’s • An AS can favor or disfavor certain AS’s for traffic transit from itself Lecture #11: 10-02-01
BGP Common Header 1 2 3 0 Marker (security and message delineation) 16 bytes Length (2 bytes) Type (1 byte) Types: OPEN, UPDATE, NOTIFICATION, KEEPALIVE Lecture #11: 10-02-01
BGP Messages • Open • Announces AS ID • Determines hold timer – interval between keep_alive or update messages, zero interval implies no keep_alive • Keep_alive • Sent periodically (but before hold timer expires) to peers to ensure connectivity. • Sent in place of an UPDATE message • Notification • Used for error notification • TCP connection is closed immediately after notification Lecture #11: 10-02-01
BGP UPDATE Message • List of withdrawn routes • Network layer reachability information • List of reachable prefixes • Path attributes • Origin • Path • Metrics • All prefixes advertised in message have same path attributes Lecture #11: 10-02-01
Path Selection Criteria • Information based on path attributes • Attributes + external (policy) information • Examples: • Hop count • Policy considerations • Preference for AS • Presence or absence of certain AS • Path origin • Link dynamics Lecture #11: 10-02-01
LOCAL PREF • Local (within an AS) mechanism to provide relative priority among BGP routers R5 R1 AS 200 R2 AS 100 AS 300 R3 Local Pref = 500 Local Pref =800 R4 I-BGP AS 256 Lecture #11: 10-02-01
AS_PATH • List of traversed AS’s AS 200 AS 100 170.10.0.0/16 180.10.0.0/16 AS 300 180.10.0.0/16 300 200 100 170.10.0.0/16 300 200 AS 500 Lecture #11: 10-02-01
CIDR and BGP AS X 197.8.2.0/24 AS T (provider) 197.8.0.0/23 AS Z AS Y 197.8.3.0/24 What should T announce to Z? Lecture #11: 10-02-01
Options • Advertise all paths: • Path 1: through T can reach 197.8.0.0/23 • Path 2: through T can reach 197.8.2.0/24 • Path 3: through T can reach 197.8.3.0/24 • But this does not reduce routing tables! We would like to advertise: • Path 1: through T can reach 197.8.0.0/22 Lecture #11: 10-02-01
Sets and Sequences • Problem: what do we list in the route? • List T: omitting information not acceptable, may lead to loops • List T, X, Y: misleading, appears as 3-hop path • Solution: restructure AS Path attribute as: • Path: (Sequence (T), Set (X, Y)) • If Z wants to advertise path: • Path: (Sequence (Z, T), Set (X, Y)) • In practice used only if paths in set have same attributes Lecture #11: 10-02-01
Multi-Exit Discriminator (MED) • Hint to external neighbors about the preferred path into an AS • Non-transitive attribute (we will see later why) • Different AS choose different scales • Used when two AS’s connect to each other in more than one place Lecture #11: 10-02-01
MED • Hint to R1 to use R3 over R4 link • Cannot compare AS40’s values to AS30’s 180.10.0.0 MED = 50 R1 R2 AS 10 AS 40 180.10.0.0 MED = 120 180.10.0.0 MED = 200 R3 R4 AS 30 Lecture #11: 10-02-01
MED • MED is typically used in provider/subscriber scenarios • It can lead to unfairness if used between ISP because it may force one ISP to carry more traffic: ISP1 SF ISP2 NY • ISP1 ignores MED from ISP2 • ISP2 obeys MED from ISP1 • ISP2 ends up carrying traffic most of the way Lecture #11: 10-02-01
Other Attributes • ORIGIN • Source of route (IGP, EGP, other) • NEXT_HOP • Address of next hop router to use • Used to direct traffic to non-BGP router • Check out http://www.cisco.com for full explanation Lecture #11: 10-02-01
Typical Decision Process • Processing order of attributes: • Select route with highest LOCAL-PREF • Select route with shortest AS-PATH • Apply MED (if routes learned from same neighbor) Lecture #11: 10-02-01
Outline • External BGP (E-BGP) • Internal BGP (I-BGP) • Multi-Homing Lecture #11: 10-02-01
Internal vs. External BGP • BGP can be used by R3 and R4 to learn routes • How do R1 and R2 learn routes? • Option 1: Inject routes in IGP • Only works for small routing tables • Option 2: Use I-BGP R1 E-BGP AS1 R3 R4 AS2 R2 Lecture #11: 10-02-01
Internal BGP (I-BGP) • Same messages as E-BGP • Different rules about re-advertising prefixes: • Prefix learned from E-BGP can be advertised to I-BGP neighbor and vice-versa, but • Prefix learned from one I-BGP neighbor cannot be advertised to another I-BGP neighbor • Reason: no AS PATH within the same AS and thus danger of looping. Lecture #11: 10-02-01
Internal BGP (I-BGP) • R3 can tell R1 and R2 prefixes from R4 • R3 can tell R4 prefixes from R1 and R2 • R3 cannot tell R2 prefixes from R1 • R2 can only find these prefixes through a direct connection to R1 • Result: I-BGP routers must be fully connected (via TCP)! • contrast with E-BGP sessions that map to physical links R1 E-BGP AS1 R3 R4 AS2 R2 I-BGP Lecture #11: 10-02-01
Link Failures • Two types of link failures: • Failure on an E-BGP link • Failure on an I-BGP Link • These failures are treated completely different in BGP • Why? Lecture #11: 10-02-01
Failure on an E-BGP Link • If the link R1-R2 goes down • The TCP connection breaks • BGP routes are removed • This is the desired behavior AS1 AS2 E-BGP session R1 R2 Physical link 138.39.1.1/30 138.39.1.2/30 Lecture #11: 10-02-01
Failure on an I-BGP Link • If link R1-R2 goes down, R1 and R2 should still be able to exchange traffic • The indirect path through R3 must be used • Thus, E-BGP and I-BGP must use different conventions with respect to TCP endpoints R2 138.39.1.2/30 Physical link 138.39.1.1/30 R1 R3 I-BGP connection Lecture #11: 10-02-01
Outline • External BGP (E-BGP) • Internal BGP (I-BGP) • Multi-Homing Lecture #11: 10-02-01
Multi-homing • With multi-homing, a single network has more than one connection to the Internet. • Improves reliability and performance: • Can accommodate link failure • Bandwidth is sum of links to Internet • Challenges • Getting policy right (MED, etc..) • Addressing Lecture #11: 10-02-01
Multi-homing to Multiple Providers • Major issues: • Addressing • Aggregation • Customer address space: • Delegated by ISP1 • Delegated by ISP2 • Delegated by ISP1 and ISP2 • Obtained independently ISP3 ISP1 ISP2 Customer Lecture #11: 10-02-01
Address Space from one ISP • Customer uses address space from ISP1 • ISP1 advertises /16 aggregate • Customer advertises /24 route to ISP2 • ISP2 relays route to ISP1 and ISP3 • ISP2-3 use /24 route • ISP1 routes directly • Problems with traffic load? ISP3 138.39/16 ISP1 ISP2 Customer 138.39.1/24 Lecture #11: 10-02-01
Pitfalls • ISP1 aggregates to a /19 at border router to reduce internal tables. • ISP1 still announces /16. • ISP1 hears /24 from ISP2. • ISP1 routes packets for customer to ISP2! • Workaround: ISP1 must inject /24 into I-BGP. ISP3 138.39/16 ISP1 ISP2 138.39.0/19 Customer 138.39.1/24 Lecture #11: 10-02-01
Address Space from Both ISPs • ISP1 and ISP2 continue to announce aggregates • Load sharing depends on traffic to two prefixes • Lack of reliability: if ISP1 link goes down, part of customer becomes inaccessible. • Customer may announce prefixes to both ISPs, but still problems with longest match as in case 1. ISP3 ISP1 ISP2 204.70.1/24 Customer 138.39.1/24 Lecture #11: 10-02-01
Independent Address Space • Offers the most control, but at the cost of aggregation. • Still need to control paths ISP3 ISP1 ISP2 Customer Lecture #11: 10-02-01
Problems • Routing table size • Need an entry for all paths to all networks • Required memory= O((N + M*A) * K) • N: number of networks • M: mean AS distance (in terms of hops) • A: number of AS’s • K: number of BGP peers Lecture #11: 10-02-01
Routing Table Size • Problem reduced with CIDR Networks Mean AS Distance Number of AS’s BGP Peers/Net Memory 2,100 5 59 3 27,000 4,000 10 100 6 108,000 10,000 15 300 10 490,000 100,000 20 3,000 20 1,040,000 Lecture #11: 10-02-01