Inter-domain Routing Don Fussell CS 395T Measuring Internet Performance
Internet Routing • Two-level architecture, two protocol classes • IGP: Internal Gateway Protocol • Within an organization’s network • Optimized protocol • Intra-domain routing protocol • EGP: External Gateway Protocol • Between organizations’ networks • Policy routing • Inter-domain routing protocol
Internal Gateway Protocol • Runs within an Autonomous System (AS) • An AS is a collection of routers (not a collection of IP addresses or prefixes) • Can provide optimal paths between nodes (according to some cost metric) • Examples • RIP (Routing Information Protocol) • OSPF (Open Shortest Path First) • IS-IS (Intermediate System to Intermediate System) • IGRP, EIGRP (Cisco proprietary)
External Gateway Protocol • Allows different ASs to exchange routing information • Policy routing – Control can be exerted over the information that crosses the border between ASs • Based on cost metrics, but does not necessarily optimize like IGPs do • Examples • BGP4 (Border Gateway Protocol, de facto standard) • EGP (External Gateway Protocol, specific not generic) • GGP (Gateway to Gateway Protocol) • Hello
Distance Vector Protocols • Simple to understand and implement • Poor scalability, based on transmitting routing tables between routers • Require periodic retransmission of routing information as routing tables expire • Limited to small networks with simple topologies • Can exhibit “counting to infinity” behavior in the presence of link failures • Example – RIP (Routing Information Protocol)
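The table exchange described above can be sketched as a single merge step. This is a minimal illustration rather than RIP itself: the function name `dv_update`, the table layout (destination mapped to a cost and next hop), and the use of 16 as "infinity" (the value RIP uses) are assumptions made for the example.

```python
INF = 16  # assumed "unreachable" metric, as in RIP


def dv_update(table, neighbor, advertised, link_cost):
    """Merge a neighbor's advertised distances into our routing table.

    table:      {destination: (cost, next_hop)}
    advertised: {destination: cost as seen by the neighbor}
    Returns True if any entry changed.
    """
    changed = False
    for dest, dist in advertised.items():
        candidate = min(INF, dist + link_cost)
        cost, next_hop = table.get(dest, (INF, None))
        # Take a cheaper route, or follow the current next hop even when its
        # cost gets worse. The second case is what allows counting to infinity.
        if candidate < cost or (next_hop == neighbor and candidate != cost):
            table[dest] = (candidate, neighbor)
            changed = True
    return changed
```

Because a router must believe whatever its current next hop reports, two routers that point at each other after a link failure raise their costs step by step until the metric reaches `INF`.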
Link State Protocols • Routers exchange Link State Packets (LSPs), not routing tables • LSP information from a router is flooded to the rest of the network • A router regenerates this information only when the topology changes • Good scalability - amount of information sent is proportional to topology change, not number of IP prefixes • Each router maintains a local map of the entire network (AS), called the Link State Database (LSDB), and constructs shortest path information using Dijkstra’s algorithm • Examples – OSPF, IS-IS
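The shortest path computation over the LSDB can be sketched as follows. The LSDB is modeled here as a plain dictionary of adjacency costs, which is an assumption for illustration; real link state databases hold richer records.

```python
import heapq


def shortest_paths(lsdb, source):
    """Dijkstra's algorithm over an LSDB of the form {router: {neighbor: cost}}.

    Returns {router: total cost from source} for every reachable router.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, cost in lsdb.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

Every router runs the same computation on the same database, so all routers in the AS arrive at consistent paths.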
Classless Inter-Domain Routing (CIDR) • The Internet is a collection of networks – hence an IP address contains two parts, a network identifier and a host identifier • Networks within the Internet have different numbers of hosts, hence originally networks were divided into classes • Network classes • Class A – 0 in high order bit, network id is in first octet, host address is in the last three octets • 128 class A networks each with 16.7 million host addresses • Class B – 10 in high order two bits, network id is in first two octets, host address is in the last two octets • 16,384 class B networks each with 65,536 host addresses (65,534 usable) • Class C – 110 in high order three bits, network id is in the first three octets, host address is in the last octet • 2.1 million class C networks each with 256 host addresses (254 usable) • Class D – for multicast • Class E – reserved and unused • This architecture is now obsolete
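The high order bit patterns translate directly into ranges of the first octet. The sketch below is only an illustration of the obsolete classful scheme; the function name `address_class` is hypothetical, and the class D and E boundaries (1110 and 1111 prefixes) come from the classful definition rather than from the slide.

```python
def address_class(addr):
    """Return the historical class (A-E) of a dotted-quad IPv4 address."""
    first = int(addr.split(".")[0])
    if first < 128:   # 0xxxxxxx
        return "A"
    if first < 192:   # 10xxxxxx
        return "B"
    if first < 224:   # 110xxxxx
        return "C"
    if first < 240:   # 1110xxxx, multicast
        return "D"
    return "E"        # 1111xxxx, reserved


# Network counts follow from the bits left for the network id.
CLASS_A_NETWORKS = 2 ** 7    # 128
CLASS_B_NETWORKS = 2 ** 14   # 16,384
CLASS_C_NETWORKS = 2 ** 21   # 2,097,152
```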
Classless Addressing • Rapid growth of Internet outpaced class based addressing • Routing tables growing too large • Running out of IP address space • CIDR primarily addresses routing table problem • Basic idea – get rid of implicit netmasks, pass explicit netmasks in inter-domain routing protocols • CIDR allows service providers to aggregate classful networks and provide single summarized routing advertisements to other domains, thus controlling the growth of routing tables • Addresses can overlap, forwarding must use longest matching prefix
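Aggregation and longest prefix matching can both be tried out with Python's standard `ipaddress` module. The helper names `aggregate` and `longest_match` and the dictionary-based table are assumptions for this sketch; real routers use trie structures rather than a linear scan.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse adjacent or overlapping prefixes into the fewest advertisements."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


def longest_match(table, addr):
    """Return the next hop of the most specific prefix containing addr.

    table: {prefix string: next hop}. Returns None if nothing matches.
    """
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None
```

For example, two former class C networks `192.168.0.0/24` and `192.168.1.0/24` collapse into the single advertisement `192.168.0.0/23`, and a table holding both `10.0.0.0/8` and `10.1.0.0/16` forwards `10.1.2.3` along the `/16` entry.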
CIDR Advantages • Reduced the size of the Internet routing table • Reduced the growth rate of the Internet routing table • Allows current generation routers to handle Internet addressing and forwarding • Extended the lifetime of IPv4 addressing
CIDR Issues • Address allocation must be done in such a way as to allow aggregation • BGP4, which was created to support CIDR, must also be configured to support aggregation • Multihoming – having more than one link to the Internet – how to aggregate • Proxy aggregation – One AS performs aggregation of addresses contained within another
BGP Outline • Based on Distance Vector algorithms (more precisely a path vector protocol, since each advertisement carries its full AS path) • Uses TCP as transport protocol • A BGP session involves two nodes • Routers can be involved in several concurrent BGP sessions • BGP message types • Open session • Activate new routes to prefixes • Deactivate old routes to prefixes • Report unusual conditions • Close session • Advertised routes are actively being used by advertiser • Prefix advertisement attributes • Next hops • Route preference metrics • AS path of routing announcement • How the prefix entered the routing table of the source AS • BGP is extensible – new attributes can be added as needed
BGP State Machine (figure) • States: Idle, Connect, Active, Open Sent, Open Confirm, Established • Transition labels: TCP Connection Attempted, TCP Connection Established, TCP Connection Failed, Connection Accepted, Open Received, Connection Rejected or Error, Error
BGP Message Types • Open • Update • Notification • Keepalive
Open Message • Version (1 octet) • My Autonomous System (2 octets) • Hold time (2 octets) • BGP identifier (4 octets) • Optional parameters length (1 octet) • Optional parameters (variable length) • Type (1 octet) • Length (1 octet) • Value (variable)
OPEN Optional Parameters • 1 – Authentication information (1 octet authentication code and variable length information field. Not really used.) • 2 – Capability negotiation
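The OPEN field layout maps onto a fixed 10-octet structure followed by type-length-value parameters. The sketch below covers only the message body (not the common header); the function names and the returned dictionary keys are assumptions for illustration, and `version=4` reflects BGP4.

```python
import socket
import struct

# version (1), my AS (2), hold time (2), BGP identifier (4), opt params length (1)
OPEN_FIXED = struct.Struct("!BHH4sB")


def pack_open_body(my_as, hold_time, bgp_id, opt_params=b"", version=4):
    """Build the body of an OPEN message in network byte order."""
    return OPEN_FIXED.pack(version, my_as, hold_time,
                           socket.inet_aton(bgp_id), len(opt_params)) + opt_params


def unpack_open_body(data):
    """Parse an OPEN body, including its (type, value) optional parameters."""
    version, my_as, hold_time, raw_id, opt_len = OPEN_FIXED.unpack_from(data)
    rest = data[OPEN_FIXED.size:OPEN_FIXED.size + opt_len]
    params = []
    while rest:
        ptype, plen = rest[0], rest[1]
        params.append((ptype, rest[2:2 + plen]))
        rest = rest[2 + plen:]
    return {"version": version, "my_as": my_as, "hold_time": hold_time,
            "bgp_id": socket.inet_ntoa(raw_id), "params": params}
```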
Update Message • Withdrawn (unfeasible) routes length (2 octets) • Withdrawn (unfeasible) routes (variable) • IP prefix length in bits (1 octet) • IP prefix (variable) • Total path attributes length (2 octets) • Path attributes (variable) • Network layer reachability information (variable)
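Both the withdrawn routes field and the network layer reachability information use the same encoding: a length in bits followed by just enough octets to hold the prefix. A minimal decoder for IPv4 prefixes, with the hypothetical name `parse_prefixes`, might look like this:

```python
import ipaddress


def parse_prefixes(data):
    """Decode a run of (length in bits, prefix) pairs into CIDR strings."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8              # octets actually transmitted
        raw = data[i + 1:i + 1 + nbytes]
        padded = raw + b"\x00" * (4 - nbytes)  # pad to a full IPv4 address
        # strict=False masks any trailing bits beyond the prefix length
        net = ipaddress.IPv4Network((padded, bits), strict=False)
        prefixes.append(str(net))
        i += 1 + nbytes
    return prefixes
```

A `/24` therefore costs four octets on the wire (one for the length, three for the prefix), and a default route costs one.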
Attribute Encoding • Attribute Type (2 octets) • Attribute Flags (1 octet) • Attribute Type Code (1 octet) • Attribute Length (1 or 2 octets) • Attribute Value (variable)
Attribute Flags • Bit 1 – Optional • 0 = well-known, required in all BGP implementations • 1 = optional • Bit 2 – Transitive • 0 = non-transitive, not passed to other peers • 1 = transitive, must be passed on to others • Bit 3 – Partial • 1 = some router didn’t understand optional transitive attribute • 0 = otherwise, must be 0 for well-known and optional nontransitive attributes • Bit 4 – Extended Length • 0 = attribute length represented in 1 octet • 1 = attribute length represented in 2 octets
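The four flags occupy the high order bits of the flags octet, so bits 1 to 4 as numbered above correspond to the masks 0x80, 0x40, 0x20 and 0x10. The sketch below decodes one attribute from a byte string; the function name and the returned dictionary layout are assumptions for illustration.

```python
import struct

OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10


def parse_attribute(data):
    """Decode one path attribute; return (attribute dict, remaining bytes)."""
    flags, type_code = data[0], data[1]
    if flags & EXTENDED_LENGTH:
        (length,) = struct.unpack("!H", data[2:4])  # 2-octet length
        offset = 4
    else:
        length = data[2]                            # 1-octet length
        offset = 3
    attribute = {
        "optional": bool(flags & OPTIONAL),
        "transitive": bool(flags & TRANSITIVE),
        "partial": bool(flags & PARTIAL),
        "type_code": type_code,
        "value": data[offset:offset + length],
    }
    return attribute, data[offset + length:]
```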
Notification Message • Error code (1 octet) • Error subcode (1 octet) • Data (variable)
Error Codes • 1 – Message Header Error • 2 – OPEN Message Error • 3 – UPDATE Message Error • 4 – Hold Timer Expired • 5 – Finite State Machine Error • 6 – Cease
Message Header Error Subcodes • 1 – Connection Not Synchronized • 2 – Bad Message Length • 3 – Bad Message Type
OPEN Message Error Subcodes • 1 – Unsupported Version Number • 2 – Bad Peer AS • 3 – Bad BGP Identifier • 4 – Unsupported Optional Parameter • 5 – Authentication Failure • 6 – Unacceptable Hold Time
UPDATE Message Error Subcodes • 1 – Malformed Attribute List • 2 – Unrecognized Well-known Attribute • 3 – Missing Well-known Attribute • 4 – Attribute Flags Error • 5 – Attribute Length Error • 6 – Invalid ORIGIN Attribute • 7 – AS Routing Loop • 8 – Invalid NEXT-HOP Attribute • 9 – Optional Attribute Error • 10 – Invalid Network Field • 11 – Malformed AS-PATH
Keepalive • Common header, no data
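The slides do not spell out the common header. In the BGP4 specification it is a 16-octet marker, a 2-octet total length, and a 1-octet type (1 OPEN, 2 UPDATE, 3 NOTIFICATION, 4 KEEPALIVE). Under that assumption, and assuming an all-ones marker with no authentication, KEEPALIVE and NOTIFICATION messages can be sketched as:

```python
import struct

MARKER = b"\xff" * 16  # assumed all-ones marker (no authentication in use)
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
HEADER_LEN = 19        # marker (16) + length (2) + type (1)


def bgp_message(msg_type, body=b""):
    """Prefix a message body with the common header."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


def keepalive():
    """A KEEPALIVE is the common header alone."""
    return bgp_message(KEEPALIVE)


def notification(error_code, error_subcode, data=b""):
    """Error code (1 octet), error subcode (1 octet), then variable data."""
    return bgp_message(NOTIFICATION, struct.pack("!BB", error_code, error_subcode) + data)
```

A hold timer expiry, for instance, would be reported as `notification(4, 0)` using the error codes listed earlier.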
Model of Operation • Each peer contains three locations • Adj-RIB-In (Adjacent Routing Information Base In) • 1 per peer (BGP session) • Contains prefixes learned from that peer • Loc-RIB (Local Routing Information Base) • 1 per system • Contains prefixes selected for use • Adj-RIB-Out (Adjacent Routing Information Base Out) • 1 per peer (BGP session) • Contains prefixes advertised to that peer
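The three locations can be modeled as dictionaries keyed by peer and prefix. This is a deliberately simplified sketch: the class name `Ribs`, its method names, and the pluggable `better` and `export` callbacks are assumptions for illustration, and real implementations update the Loc-RIB incrementally rather than rebuilding it.

```python
from collections import defaultdict


class Ribs:
    def __init__(self):
        self.adj_rib_in = defaultdict(dict)   # peer -> {prefix: route}
        self.loc_rib = {}                     # prefix -> selected route
        self.adj_rib_out = defaultdict(dict)  # peer -> {prefix: route}

    def learn(self, peer, prefix, route):
        """Store a route announced by a peer."""
        self.adj_rib_in[peer][prefix] = route

    def withdraw(self, peer, prefix):
        """Remove a route the peer has withdrawn."""
        self.adj_rib_in[peer].pop(prefix, None)

    def select(self, better):
        """Rebuild Loc-RIB; better(a, b) returns the preferred of two routes."""
        self.loc_rib = {}
        for routes in self.adj_rib_in.values():
            for prefix, route in routes.items():
                current = self.loc_rib.get(prefix)
                self.loc_rib[prefix] = route if current is None else better(current, route)

    def advertise(self, peer, export=lambda prefix, route: True):
        """Fill the peer's Adj-RIB-Out with the routes policy allows."""
        self.adj_rib_out[peer] = {p: r for p, r in self.loc_rib.items() if export(p, r)}
```

The `export` callback is where policy routing would decide which selected routes cross the border to each peer.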
Standard Attributes • 1 – Origin (well-known) • Indicates how a given prefix came into BGP at the AS originating the prefix announcement • 0 – IGP: The prefix was learned from an IGP • 1 – EGP: The prefix was learned through the EGP protocol • 2 – INCOMPLETE: The prefix was learned through some mechanism other than IGP or EGP; in practice these are the static routes
Standard Attributes • 2 – AS-PATH (well-known) • Contains sequence of ASNs through which the announcement has passed • Primarily used for loop detection/prevention • If the receiving router’s own ASN appears in the AS-PATH, the announcement is generally rejected, although some implementations can be configured to accept such a route for partition healing • Encoded as sequence of AS-PATH segments • Each has a TYPE (1 octet), LENGTH (1 octet), VALUE (list of length LENGTH of 2 octet ASNs) • TYPE is either AS-SET or AS-SEQUENCE, which allows for aggregation of routes received via different AS-PATHs
Standard Attributes • 3 – NEXT-HOP (well-known) • Address of the node to which packets should be sent to reach the advertised prefix • Often the same as the speaker’s IP address • Can be different (third-party next hop), otherwise would be redundant • Requires special configuration, need not be accepted by listener • Can be useful when several routers are on a LAN but only some of them speak BGP
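The segment encoding and the loop check can be sketched together. The segment type values (1 for AS-SET, 2 for AS-SEQUENCE) are taken from the BGP4 specification rather than from the slide, and the function names are assumptions for illustration.

```python
import struct

AS_SET, AS_SEQUENCE = 1, 2  # segment TYPE values per the BGP4 specification


def parse_as_path(data):
    """Decode an AS-PATH value into a list of (segment type, [ASNs])."""
    segments = []
    i = 0
    while i < len(data):
        seg_type, count = data[i], data[i + 1]   # LENGTH counts ASNs, not octets
        asns = struct.unpack("!%dH" % count, data[i + 2:i + 2 + 2 * count])
        segments.append((seg_type, list(asns)))
        i += 2 + 2 * count
    return segments


def has_loop(segments, my_asn):
    """True if our own ASN already appears anywhere in the path."""
    return any(my_asn in asns for _, asns in segments)
```

A router in AS 65002 that receives a path containing 65002 would normally drop the announcement, since accepting it would route traffic back through itself.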
Standard Attributes • 4 – MULTI-EXIT-DISCRIMINATOR (MED) (optional, nontransitive, 4-octet unsigned integer) • Used when two ASs connect to each other at multiple places • Carries a metric expressing a degree of preference for the link in the advertisement for routing to a prefix • Sent by one AS, used by another, thus typically used in provider-subscriber relationships
Standard Attributes • 5 – LOCAL-PREF (well-known, discretionary, 4 octet unsigned integer) • Generally used locally by an AS to express preferences for routes to a prefix when multiple routes to different ASs are known • Different from MED in that it isn’t passed by one AS to another, and doesn’t only apply to multiple connections between a pair of ASs
Standard Attributes • 6 – ATOMIC-AGGREGATE (well-known, discretionary, 0 length used as a flag) • Indicates that the advertised prefix has been aggregated • Some parts of paths to parts of the aggregate address space advertised may not appear in the AS-PATH • The receiver of the advertisement should not deaggregate the prefix into more specific BGP entries
Standard Attributes • 7 – AGGREGATOR (optional, transitive, 2 octet ASN, 4 octet IP address) • Indicates the AS and router that performed the aggregation of the announced prefix
Internal and External BGP • How do multiple routers speaking BGP within a single AS exchange routing information? • Could use IGP such as OSPF, but the volume of routing table information and frequency of updates typically transmitted by BGP would overwhelm the IGP’s LSP flooding • A preferred way is to use Internal BGP (I-BGP) • Strictly speaking, we should call the typical EGP use of BGP E-BGP • Basically, the two are the same, with the key difference that prefixes learned from an E-BGP neighbor can be advertised to an I-BGP neighbor and vice versa, but a prefix learned from an I-BGP neighbor cannot be advertised to another I-BGP neighbor • This prevents looping routing announcements within an AS, since the AS-PATH attribute is useless for this within one AS • It also leads to the requirement of a full mesh of logical connections between I-BGP peers within an AS
BGP Route Selection • How does a system choose among multiple routes for the same (identical, not overlapping) prefix? • The routes with the highest LOCAL-PREF are selected first • If no unique route is found, then the route with the shortest AS-PATH is selected from among those previously selected • If this does not produce a unique route, then if the system accepts MED and the multiple routes were learned from a single neighboring AS, the route with the lowest MED value is selected • If multiple routes are still available, then choose the route with the minimum cost to the NEXT-HOP according to the IGP in use • If no unique route has been chosen, and exactly one of the routes was learned by E-BGP, choose that one • If no unique route has been chosen, and all routes were learned via I-BGP, then choose the route learned from the I-BGP neighbor with the lowest BGP ID
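The re-advertisement rule and the cost of the resulting full mesh are small enough to state directly. Both function names are hypothetical, and the string labels `"ebgp"` and `"ibgp"` are an assumed way of tagging where a prefix was learned.

```python
def may_advertise(learned_from, send_to):
    """Apply the I-BGP rule: never pass an I-BGP-learned prefix to another I-BGP peer.

    Both arguments are either "ebgp" or "ibgp".
    """
    return not (learned_from == "ibgp" and send_to == "ibgp")


def full_mesh_sessions(n):
    """Number of I-BGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2
```

The quadratic growth is the practical cost of the rule: 10 I-BGP speakers need 45 sessions, and 100 need 4,950.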
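The selection steps can be sketched as successive narrowing of a candidate list. This follows the order given on the slide only; the route dictionary keys (`local_pref`, `as_path`, `neighbor_as`, `med`, `igp_cost`, `ebgp`, `bgp_id`) are assumptions for illustration, the final BGP ID comparison is applied as a general tie-breaker, and vendor implementations add further steps.

```python
import ipaddress


def _narrow(routes, key, pick=min):
    """Keep only the routes whose key equals the best (min or max) value."""
    best = pick(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def select_route(routes, accept_med=True):
    """Choose one route from a non-empty list of routes to the same prefix."""
    c = _narrow(list(routes), lambda r: r["local_pref"], pick=max)
    c = _narrow(c, lambda r: len(r["as_path"]))
    # MED is only comparable among routes from a single neighboring AS
    if accept_med and len({r["neighbor_as"] for r in c}) == 1:
        c = _narrow(c, lambda r: r["med"])
    c = _narrow(c, lambda r: r["igp_cost"])
    ebgp = [r for r in c if r["ebgp"]]
    if len(ebgp) == 1:
        return ebgp[0]
    # Compare BGP IDs numerically, not as strings
    return min(c, key=lambda r: ipaddress.ip_address(r["bgp_id"]))
```

Because LOCAL-PREF is checked first, an operator can override path length entirely: a route with a longer AS-PATH still wins if its LOCAL-PREF is higher.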