Design of a Diversified Router: Lookup Block with All Associated Data in SRAM John DeHart jdd@arl.wustl.edu http://www.arl.wustl.edu/arl
Revision History • 5/23/06 (JDD): • Changes for all Associated Data in SRAM • 5/25/06 (JDD): • Put Port # back in MR Results • 5/26/06 (JDD): • Added data format from Lookup block to downstream neighbor. • 5/30-5/31/06 (JDD): • Clean up definition of data going from Lookup block to Hdr Format blocks. • 6/2/06 (JDD): • Update with new Internal packet format
Issues to investigate • Questions/Issues that came up 5/16/06: • Negation bit • Match everything but this key • Exclusive/Non-exclusive Filters • GM filters for monitoring (makes a copy of packet) • Protocol field “trick” to shorten GM filter Key • 2 bits to define: UDP, TCP, Other • Maybe even expand it to 4 bits. • For Other, full 8 bit protocol field overlaps a TCP/UDP Port field • Even better, use this trick with the TCP_Flags field. • 76 Bytes as minimum size frame for judging performance: • 64 Byte minimum Ethernet Frame • 96 bit (12 byte) Ethernet inter-frame spacing. • To increase the lookup rate we might need to move one of the LC Associated Data storage to SRAM • Probably keep them both in TCAM AD for November and then look at modifying it in the next phase of Lookup block development. • Multicast • Separate Multicast DB • MHL on Multicast DB yielding 8 32-bit AD Results • Actually 29 useful bits per Result • Maximum of 232 bits • QID(20b) and MI(16) specified for each copy • 232/36 = 6 copies • We’d need to get result down to 29 bits to support 8 copies • Is there any way to make use of the Loopback block to make more copies?
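A minimal sketch of the multicast copy arithmetic from this slide. The function and constant names are hypothetical; the numbers (8 results, 29 useful bits each, 20-bit QID, 16-bit MI) are the ones quoted above.

```python
# Hypothetical helper names; the constants are taken from the slide.
MHL_RESULTS = 8              # a Multi-Hit Lookup returns up to 8 results
USEFUL_BITS_PER_RESULT = 29  # useful bits in each 32-bit AD result
QID_BITS = 20
MI_BITS = 16

def max_copies(bits_per_copy: int) -> int:
    """Number of multicast copies that fit in one Multi-Hit Lookup."""
    total_bits = MHL_RESULTS * USEFUL_BITS_PER_RESULT  # 232 bits
    return total_bits // bits_per_copy

print(max_copies(QID_BITS + MI_BITS))      # QID + MI specified per copy
print(max_copies(USEFUL_BITS_PER_RESULT))  # one 29-bit result per copy
```

With 36 bits per copy only 6 copies fit; reaching 8 copies requires shrinking the per-copy data to 29 bits, as the slide notes.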
Overview • These slides are as much a definition of what is NOT in the Lookup Block as they are what is. • In defining what is not in the Lookup Block I am putting some requirements on other blocks. • These requirements have to do with where fields are added to frame headers. • Not everything can or needs to be kept in the TCAM. • There are also: • Constants • Fields that have to be calculated for each frame • Fields that are configurable per Blade or per physical interface. • Etc. • Also, there is a lot of information about the TCAM here. • And, finally, a design for the Lookup Block(s).
Architecture Review • First, let's review the architecture of the Promentum™ ATCA-7010 card, which will be used to implement our LC and NP Blades: • Two Intel IXP2850 NPs • 1.4 GHz Core • 700 MHz XScale • Each NPU has: • 3x256MB RDRAM, 533 MHz • 4 QDR II SRAM Channels • Channels 1, 2 and 3 populated with 8MB each running at 200 MHz • Channel 0 • TCAM with an associated ZBT SRAM • 2 MB of QDR-II SRAM for EACH NPU • 16KB of Scratch Memory • 16 Microengines • Instruction Store: 8K 40-bit wide instructions • Local Memory: 640 32-bit words • TCAM: Network Search Engine (NSE) on SRAM channel 0 • Each NPU has a separate LA-1 Interface • Part Number: IDT75K72234 • 18Mb TCAM
TCAM HW Details • CAM Size: • Data: 256K 72-bit entries • Organized into Segments. • Mask: 256K 72-bit entries • Segments: • Each Segment is 8k 72-bit entries • 32 Segments • Segments are not shared between Databases. • Minimum database size is therefore 8K 72-bit entries. • Databases wider than 72-bits use sequential entries in a segment to make up longer entries • 36b DB has 16K entries per segment • 72b DB has 8K entries per segment • 144b DB has 4K entries per segment • 288b DB has 2K entries per segment • 576b DB has 1K entries per segment • Segments can be dynamically added to a Database as it grows • More on this feature in a future issue of the IDT User Manual…
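The entries-per-segment figures on this slide follow from dividing a segment's bit capacity by the database width. A small sketch, with hypothetical names, that reproduces them:

```python
# Constants from the slide; function name is illustrative only.
K = 1024
SEGMENT_CORE_ENTRIES = 8 * K  # each segment holds 8K 72-bit core entries
CORE_ENTRY_BITS = 72
NUM_SEGMENTS = 32

def entries_per_segment(db_width_bits: int) -> int:
    """Database entries that fit in one segment for a given core size."""
    segment_bits = SEGMENT_CORE_ENTRIES * CORE_ENTRY_BITS
    return segment_bits // db_width_bits

for width in (36, 72, 144, 288, 576):
    print(f"{width}b DB: {entries_per_segment(width) // K}K entries per segment")
```

Because segments are not shared between databases, the value returned for a given width is also the minimum allocation granularity for a database of that width.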
TCAM HW Details • Number of Databases available: 16 • Database Core Sizes: 36b, 72b, 144b, 288b, 576b • Core size implies how many CAM core entries are used per DB entry • Key/Entry size • Can be different for each Database. • Key/Entry size <= Database Core Size • Key/Entry size tells us how many memory access cycles it will take to get the Key into the TCAM across the 16-bit wide QDR II SRAM interface. • Result Type • Absolute Index: relative to beginning of CAM • Database Relative Index: relative to beginning of Database • Memory Pointer: Translation based on database configuration registers • Base address • Result size • TCAM Associated Data of width 32, 64 or 128 bits
TCAM HW Details • Memory Usage: • Results can be stored in TCAM Associated Data SRAM or IXP SRAM. • TCAM Associated Data • 512K x 36 bit ZBT SRAM (4 bits of parity) • Supports 256K 64-bit Results • If used for Ingress and Egress then 128K in each direction • Supports 128K 128-bit Results • If used for Ingress and Egress then 64K in each direction • Results deposited directly in Results Mailbox • IXP QDR II SRAM Channel • 2 x 2Mx18 (effective 4M x 18b) • 4 times as much as the TCAM ZBT SRAM. • Supports 1024K 64-bit Results • If used for Ingress and Egress then 512K in each direction • Supports 512K 128-bit Results • If used for Ingress and Egress then 256K in each direction • Read Results Mailbox to check Hit bit and to get Index or Memory Pointer • Then read SRAM for actual Result.
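The result capacities quoted on this slide can be reproduced by counting data bits only. This is a sketch with hypothetical names; the 4 parity bits per 36-bit ZBT word come from the slide, while 2 parity bits per 18-bit QDR word is an assumption that happens to match the stated totals.

```python
K = 1024

def results_capacity(words: int, data_bits_per_word: int, result_bits: int) -> int:
    """How many results of result_bits fit in a memory of the given shape."""
    return (words * data_bits_per_word) // result_bits

# TCAM Associated Data: 512K x 36b ZBT SRAM, 4 bits of parity per word.
ZBT_WORDS, ZBT_DATA_BITS = 512 * K, 32
# IXP QDR II channel: effective 4M x 18b; 2 parity bits per word is ASSUMED.
QDR_WORDS, QDR_DATA_BITS = 4 * K * K, 16

for name, words, bits in (("ZBT", ZBT_WORDS, ZBT_DATA_BITS),
                          ("QDR", QDR_WORDS, QDR_DATA_BITS)):
    for result_bits in (64, 128):
        count = results_capacity(words, bits, result_bits)
        print(f"{name}: {count // K}K {result_bits}-bit results")
```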
TCAM HW Details • Lookup commands supported: • Direct: Command is encoded in 2b Instruction field on Address bus • Indirect: Instruction field = 11b, Command encoded on Data bus. • Lookup (Direct) • 1 DB, 1 Result • Multi-Hit Lookup (Direct) • 1 DB, <= 8 Results • Simultaneous Multi-Database Lookup (Direct) • 2 DB, 1 Result Each • DBs must be consecutive! • Multi-Database Lookup (Indirect) • <= 8 DB, 1 Result Each • Simultaneous Multi-Database Lookup (Indirect) • 2 DB, 1 Result Each • Functionally same as Direct version but key presentation and DB selection are different. • DBs need not be consecutive. • Re-Issue Multi-Database Lookup (Indirect) • <= 8 DB, 1 Result Each • Search Key can be modified for each DB being searched. • First 32 bits of search key can be specified for each • Rest of key is same for each.
TCAM HW Details • Mask Registers Notes (mostly for reference) • When are these used? • I think we will need one of these for each database that is to be used in a Multi Database Lookup (MDL), where the database entries do not actually use all the bits in the corresponding core size. • For example: a 32-bit lookup would have a core size of 36 bits and so would need a GMR configured as 0xFFFFFFFF0 to mask off the low order 4 bits when it is used in a MDL where there are larger databases also being searched. • 64 72-bit Global Mask Registers (GMR) • Can be combined for different database sizes • 36-bit databases have access to 31 out of a total of 64 GMRs • A bit in the configuration for a database selects which half of the GMRs can be used • A field in each lookup command selects which specific GMR is to be used with the lookup key. • Value of 0x1F (31) is used in command to indicate no GMR is to be used. Hence, 36-bit lookups cannot use all 32 GMRs in their half. • 72-bit databases have access to 31 out of a total of 64 GMRs • A bit in the configuration for a database selects which half of the GMRs can be used • A field in each lookup command selects which specific GMR is to be used with the lookup key. • Value of 0x1F (31) is used in command to indicate no GMR is to be used. Hence, 72-bit lookups cannot use all 32 GMRs in their half. • 144-bit lookups have 32 GMRs available to them. • 288-bit lookups have 16 GMRs available to them. • 576-bit lookups have 8 GMRs available to them. • Each lookup command can have one GMR associated with it.
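A sketch of how a mask for the 32-bit-key-in-36-bit-core example could be computed. It assumes that a 1 bit means "compare this bit" and that the key occupies the high-order bits of the core entry; both are assumptions about the GMR convention, and the function name is hypothetical.

```python
def gmr_mask(key_bits: int, core_bits: int) -> int:
    """Ones over the key bits (high order), zeros over the unused low-order bits.

    ASSUMPTION: 1 = compare, 0 = ignore, key left-aligned in the core entry.
    """
    if not 0 < key_bits <= core_bits:
        raise ValueError("key must fit in the core size")
    unused_low_bits = core_bits - key_bits
    return ((1 << key_bits) - 1) << unused_low_bits

print(hex(gmr_mask(32, 36)))  # 0xffffffff0
```

For a 32-bit key in a 36-bit core entry this yields a 36-bit value whose low 4 bits are zero.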
TCAM Usage Notes • Database Types are defined and managed by the IMS Software. • The Type of the Database is defined in the software only. • It tells the software how to define and use masks and priorities (weights). • Allows the software to provide to the user a more flexible way to specify entries. • Types of Databases: • Longest Prefix Match (LPM): • Mask matches length of prefix • Exact Match (EM) • Mask matches full Entry size • Best/Range Match: What we typically call General Match. • Mask is completely general. • Priority: • Priority within a database is done by order of the entries. • Exact Match should not need priority within the database since only one Entry should match a supplied Key. • LPM and Best/Range Match do use priority within the databases. • So, the order in which the entries are stored in these databases is important. • For LPM DBs we would want to group prefixes by length in the TCAM. • And this is almost certainly what the IDT software does. • Changing priorities on existing entries may cause us some problems. • It appears that the only way to change the priority of a Best/Range Match entry might be to write a new entry in a different location (different priority) and then delete the old entry. • Changing the priority of an LPM entry really would mean changing its prefix. • The IDT software uses a weight assigned to Entries as they are added for LPM and Best/Range Match • I believe this weight is just used to group entries of the same weight together and to ensure that entries are ordered based on their weights as they are added.
TCAM Performance • Three Factors that affect performance: • Lookup Size (Entry/Key) • Associated Data Width (Result) • CAM Core Lookup Rate • IXP/TCAM LA-1 Interface • 16 bits wide • 200 MHz QDR II SRAM Interface • Effectively 32bits per clock tick • So getting Key in is 32bits/tick • Example: 128b Key would take 4 ticks to get clocked into TCAM. • Max of 50 M Lookups/sec • Table on next slide shows some of the performance numbers for some Sizes that are of interest to us. • What we’ll see a little later is that in the worst case, we need a TOTAL Lookup rate of 12.5 M/sec (6.25 M/sec on each LA-1 interface)
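A rough sketch of the interface-side bound described on this slide: the key is clocked in at 32 bits per tick of the 200 MHz interface. It considers only key transfer; Associated Data width and the CAM core lookup rate, the other two factors listed, can lower the achievable rate further. Names are hypothetical.

```python
INTERFACE_MHZ = 200  # QDR II SRAM interface clock
BITS_PER_TICK = 32   # 16-bit bus, effectively 32 bits per clock tick

def key_ticks(key_bits: int) -> int:
    """Clock ticks needed to present a key of key_bits to the TCAM."""
    return -(-key_bits // BITS_PER_TICK)  # ceiling division

def key_limited_rate_mlps(key_bits: int) -> float:
    """Upper bound on lookups/sec (in millions) from key transfer alone."""
    return INTERFACE_MHZ / key_ticks(key_bits)

for bits in (32, 72, 128):
    print(bits, key_ticks(bits), key_limited_rate_mlps(bits))
```

For the 128b key in the example this gives 4 ticks per lookup.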
TCAM Performance (Rates in M/sec) • [Table not reproduced; the rows of interest are marked LC_Ingress and LC_Egress.]
TCAM Software • Several software components exist, enough to be really confusing. • IDT Libraries: • MicroEngine Libraries: • NSE-QDR Data Plane Macro (DPM) API • Iipc.uc and Iipc.h • IIPC: Integrated IP Co-processor • Microengine Lookup Library (MLL) • IipcMll.uc • 5 slightly higher level macros than Iipc.uc • XScale: • Lookup Management Library (LML) • Control Plane: • Initialization Management and Search (IMS) Library • Simulation: • NSE with Dual QDR Interfaces IDT75K234SLAM • Intel Libraries: • TCAM Classifier Library • Microengine and XScale support for using TCAM. • Requires installation of MLL and LML. • Is geared toward a very specific application of the NSE: an IPv4 Forwarding App. • May be useful as an example of code to look at but probably not useful for us to use directly. • IXA SDK 4.0 Location: • src/library/microblocks_library/microcode/idt_tcam_classifier
Lookup Block • Three Lookup Blocks Needed: • All the Lookup Blocks will use the TCAM • LC-Ingress • All Databases for Ingress will be Exact Match • LC-Egress • All Databases for Egress will be Exact Match • MR • There will probably be multiple versions of this: • Shared • Dedicated • IPv4 • MPLS • But let's think of it as one for now and focus on IPv4. • Discussion later on what combination of the three types of DB we might use. • Base functionality and code should be the same for all three • Sizes of Keys and Results will differ. • LC-Ingress and LC-Egress will share a TCAM • ARP on the LC might need/want to use the TCAM. • The aging properties of the TCAM might be very useful for ARP. • So, we should leave some room for ARP on the LC TCAM. • We will need to think more about ARP when we get into the details of the control plane. • There will be two MR Lookup Blocks sharing a TCAM
MR Lookup Block • [Diagram: two MR pipelines, MR (NPUA) and MR (NPUB), each with Rx, DeMux, Parse, Lookup, HeaderFormat, QM and Tx blocks and an XScale for Control; the two Lookup blocks share one TCAM.]
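LC Lookup Block • [Diagram: Ingress (NPUB) with Phy Int Rx, Key Extract, Lookup, Hdr Format, QM/Schd and Switch Tx blocks; Egress (NPUA) with Switch Rx, Key Extract, Lookup, Hdr Format, Rate Monitor, QM/Schd and Phy Int Tx blocks. Each side has an XScale; both Lookup blocks use the LC TCAM, which is also labeled for ARP. The RTM and the SWITCH sit at the two ends of the pipelines.]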
Lookup Block Requirements • Average: Number of Packets per second required to handle? • Line Rate: 10Gb/s • Assume an average IP Packet Size of 200 Bytes (1600 bits) • (10Gb/s)/(1600 bits/pkt) = 6.25 Mpkts/s • Ethernet Header of 14 Bytes • Average Frame Size of 214 Bytes (1712 bits) • (10Gb/s)/(1712 bits/pkt) = 5.841 Mpkts/s • Ethernet Inter-Frame Spacing: 96 bits • Average Frame Size with Inter-Frame Spacing: 1808 bits • (10Gb/s)/(1808 bits/pkt) = 5.53 Mpkts/s • I’ll use 6.25 Mpkts/s as a target. • Minimum Pkt size: • Line Rate: 10Gb/s • Minimum Ethernet Frame Size: 64 Bytes (512 bits) • Ethernet Inter-Frame Spacing: 96 bits (512 + 96 = 608 bits) • (10Gb/s)/(608 bits/pkt) = 16.45 Mpkts/s • Max Core rate for LC Ingress: 50 M Lookups/s • 16.45/50 = 32.9 % • Max Core rate for LC Egress: 25 M Lookups/s • 16.45/25 = 65.8 %
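The packet-rate figures on this slide are line rate divided by bits per frame. A short sketch, with hypothetical names, that reproduces them and the core-rate utilization percentages:

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gb/s
IFG_BITS = 96                   # Ethernet inter-frame spacing
ETH_HDR_BYTES = 14

def mpkts_per_sec(bits_per_pkt: int) -> float:
    """Packets per second, in millions, at line rate."""
    return LINE_RATE_BPS / bits_per_pkt / 1e6

avg_ip_bits = 200 * 8                              # 1600 bits
avg_frame_bits = (200 + ETH_HDR_BYTES) * 8         # 1712 bits
avg_frame_ifg_bits = avg_frame_bits + IFG_BITS     # 1808 bits
min_frame_ifg_bits = 64 * 8 + IFG_BITS             # 608 bits

print(round(mpkts_per_sec(avg_ip_bits), 2))
print(round(mpkts_per_sec(avg_frame_bits), 3))
print(round(mpkts_per_sec(avg_frame_ifg_bits), 2))
worst_case = mpkts_per_sec(min_frame_ifg_bits)
print(round(worst_case, 2))

# Share of the quoted maximum core rates consumed at minimum packet size.
for name, core_mlps in (("LC Ingress", 50), ("LC Egress", 25)):
    print(name, round(100 * worst_case / core_mlps, 1), "%")
```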
Lookup Block Requirements • LC: Number of Lookups per second required: • 1 Ingress and 1 Egress lookup required per packet • If we assume 6.25 MPkts/sec then we need 12.5 M Lookups/sec. • MR/NPU: Number of Lookups per second required: • 5 Gb/s per MR/NPU: 3.125 M Lookups/sec • Total of 6.25 M Lookups/sec • Total Number of Lookup Entries to be supported? • Dependent on Size of Entries • Size of Entries and Keys? • Dependent on type of Lookup: MR, LC-Ingress, LC-Egress • Size of Results? • Dependent on type of Lookup: MR, LC-Ingress, LC-Egress
Keys and Results for Ingress LC and Egress LC • Keys: • Ingress (Link → Router): • What fields in the External Frame Formats uniquely identify the MetaLink? • First we have to identify the Substrate Link Type • Then we can Identify the Substrate Link and MetaLink • Egress (Router → Link): • What fields in the Internal Frame Format uniquely identify the MetaLink? • Results: • We need to identify what fields are needed to build the appropriate frame headers. • The fields needed may consist of several parts: • Constant fields: • Ethertype in most cases • Calculated fields: • Things like Checksums • Statically configured Fields that can be stored in Local Memory • Things like per physical interface or Blade Ethernet Src Addresses • ARP results for Ethernet DAddr on a Multi-Access link • Lookup Result from TCAM • Everything else… • Ingress (Link → Router): Details later • Egress (Router → Link): Details later
Field Sizes in Keys and Results • Field and Identifier sizes: • MR id: 16 bits (64K Meta Routers per Substrate Router) • MR ID == VLAN (Defined locally on a Substrate Router) • Note: We can probably shorten this to 12 bits since our switch only supports 4K VLANs, which is 12 bits. • MI id: 16 bits (64K Meta Interfaces per Meta Router) • This seems like a lot. What level of flexibility do we need to support? • MLI: 16 bits (64K Meta Links per Substrate Link) • This seems safe and should not be changed. • Port: 4 bits (16 Physical Interfaces per Line Card) • Note: I originally had this defined as 8 bits but since the RTM only supports 10 physical interfaces, 4 bits is enough. There were some places where the extra 4 bits pushed us to a larger size. • QID: 20 bits (QM_ID:Queue_ID) • Queue_ID: 17 bits (128K Queues per Queue Manager) • QM_ID: 3 bits (8 Queue Managers per LC or PE.) • We probably can only support 4 QMs, which could be encoded in 2 bits. • (64 Q-Array Entries) / (16 CAM entries) = 4 QMs per SRAM Controller.
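A sketch of packing and unpacking the 20-bit QID. It assumes the "QM_ID:Queue_ID" notation means QM_ID occupies the high 3 bits and Queue_ID the low 17 bits; the function names are hypothetical.

```python
QM_ID_BITS = 3      # 8 Queue Managers
QUEUE_ID_BITS = 17  # 128K queues per Queue Manager

def pack_qid(qm_id: int, queue_id: int) -> int:
    """Build a 20-bit QID. ASSUMPTION: QM_ID in the high bits, Queue_ID low."""
    if not 0 <= qm_id < (1 << QM_ID_BITS):
        raise ValueError("qm_id out of range")
    if not 0 <= queue_id < (1 << QUEUE_ID_BITS):
        raise ValueError("queue_id out of range")
    return (qm_id << QUEUE_ID_BITS) | queue_id

def unpack_qid(qid: int) -> tuple[int, int]:
    """Split a 20-bit QID back into (qm_id, queue_id)."""
    return qid >> QUEUE_ID_BITS, qid & ((1 << QUEUE_ID_BITS) - 1)

qid = pack_qid(5, 0x1ABCD)
print(hex(qid), unpack_qid(qid))
```

If only 4 QMs are supported, as the slide suggests, QM_ID_BITS could drop to 2 and the QID would shrink to 19 bits.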
LC: Internal Frame Formats • Internal Frame Leaving Ingress LC (packet arriving on Port N): DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), Dst MPE (2B), RxMI (2B), LEN (2B), Shim Flags (1B), Shim Data (nB), Meta Frame, PAD (nB), CRC (4B) • Internal Frame Arriving at Egress LC (packet leaving on Port M): DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), TxMI (2B), Src MPE (2B), LEN (2B), Shim Flags (1B), Shim Data (nB), Meta Frame, PAD (nB), CRC (4B) • [Diagram: LCs, Switch, and MR (IXP PE) blocks showing the path between the two LCs.]
LC: External Frame Formats • [Diagram: five external frame layouts, one per Substrate Link type: P2P-DC (Configured), P2P-Tunnel, P2P-VLAN0, Multi-Access and Legacy. All begin with DstAddr (6B), SrcAddr (6B) and end with PAD (nB), CRC (4B). The substrate formats carry Type=Substrate (2B), MLI (2B), LEN (2B) and the Meta Frame; P2P-VLAN0 uses Type=802.1Q with TCI=VLAN0; P2P-Tunnel carries MLI, LEN and the Meta Frame inside an IPv4 header with Protocol=Substrate; Legacy carries an ordinary IP header and IP Payload.]
LC: TCAM Lookup Keys • [Diagram: the two internal frame formats (Internal Frame Leaving Ingress LC, Internal Frame Arriving at Egress LC) and the five external frame formats (P2P-DC (Configured), P2P-Tunnel, P2P-VLAN0, Multi-Access, Legacy), annotated for the Ingress LC and Egress LC lookups.] • Blue Shading: Determine SL Type • Black Outline: Key Fields from pkt
LC: TCAM Lookup Keys on Ingress • P2P-DC (24 bits): SL (4b) = 0000, Port (4b), MLI (16b) • IPv4 Tunnel (72 bits): SL (4b) = 0001, Port (4b), EtherType (16b) = 0x0800, IP SAddr (32b), MLI (16b) • Legacy (24 bits): SL (4b) = 0010, Port (4b), EtherType (16b) = 0x0800 • P2P-VLAN0 (24 bits): SL (4b) = 0011, Port (4b), MLI (16b) • MA (72 bits): SL (4b) = 0100, Port (4b), Ethernet SAddr (48b), MLI (16b) • [Diagram: the five external frame formats, P2P-DC (Configured), P2P-Tunnel, P2P-VLAN0, Multi-Access and Legacy, with the fields used for each key marked.] • Blue Shading: Determine SL Type • Black Outline: Key Fields from pkt
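A sketch of assembling the five ingress key layouts on this slide (SL type, Port, then the type-specific fields). It assumes fields are concatenated most-significant first in the order they are listed; that ordering and all function names are assumptions for illustration, not the actual Key Extract code.

```python
# SL type codes as given in the key layouts.
SL_DC, SL_TUNNEL, SL_LEGACY, SL_VLAN0, SL_MA = 0b0000, 0b0001, 0b0010, 0b0011, 0b0100
ETHERTYPE_IPV4 = 0x0800

def concat(*fields: tuple[int, int]) -> tuple[int, int]:
    """Concatenate (value, width) fields MSB-first; return (key, total_width)."""
    key, total = 0, 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        key = (key << width) | value
        total += width
    return key, total

def key_p2p_dc(port: int, mli: int):
    return concat((SL_DC, 4), (port, 4), (mli, 16))

def key_ipv4_tunnel(port: int, ip_saddr: int, mli: int):
    return concat((SL_TUNNEL, 4), (port, 4), (ETHERTYPE_IPV4, 16), (ip_saddr, 32), (mli, 16))

def key_legacy(port: int):
    return concat((SL_LEGACY, 4), (port, 4), (ETHERTYPE_IPV4, 16))

def key_p2p_vlan0(port: int, mli: int):
    return concat((SL_VLAN0, 4), (port, 4), (mli, 16))

def key_ma(port: int, eth_saddr: int, mli: int):
    return concat((SL_MA, 4), (port, 4), (eth_saddr, 48), (mli, 16))

key, width = key_p2p_dc(port=3, mli=0x1234)
print(hex(key), width)
```

The widths returned (24 or 72 bits) match the key sizes listed per SL type, so every key fits in the 72b core size used for the Ingress database.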
LC: TCAM Lookup Results on Ingress • We need the Ethernet Header fields to get the frame to the blade that is to process it next. • We also need a QID and RxMI • Ethernet header fields that are constants can be configured and do not need to be in the TCAM Lookup Result. • Ethernet Header fields: • DAddr: Depends on MetaLink • SAddr: Can be constant and configured per LC • EtherType1: Can be a constant: 802.1Q • VLAN(TCI): Different for each MR • EtherType2: Can be a constant: Substrate • TCAM Lookup Result (76b) • VLAN (16b) • RxMI (16b) • DAddr (8b) • We can control the MAC Addresses of the Blades, so let's say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result. • Will 8 bits be enough to support multiple chassis? • We could go up to 12 bits and still use 64bit Associated Data • QID (20b) • Stats Index(16b) • What about Ingress → Egress Pass Thru MetaLinks? • We will define a special Substrate VLAN for this use • We will also define a special set of MIs
LC: TCAM Lookup Results on Ingress • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: VLAN (16b), RxMI (16b) • Word 2: Rsv (12b), QID (20b) • Word 3: Rsv (8b), DA (8b), Stats (16b) • TCAM Lookup Result (76b) • VLAN (16b) • RxMI (16b) • DAddr (8b) • We can control the MAC Addresses of the Blades, so let's say that 40 of the 48 bits of DAddr are constant across all blades and 8 bits are assigned and stored in the Lookup Result. • Will 8 bits be enough to support multiple chassis? • We could go up to 12 bits and still use 64bit Associated Data • QID (20b) • Stats Index(16b)
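A sketch of packing the ingress data format for the downstream neighbor into four 32-bit words: Buf Handle, then VLAN/RxMI, then Rsv/QID, then Rsv/DA/Stats. It assumes the fields in each word are laid out left to right from bit 31 down to bit 0, following the bit ruler on the slide, and that reserved bits are zero. The function name is hypothetical.

```python
def _check(value: int, width: int, name: str) -> int:
    """Reject values that do not fit in the field width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")
    return value

def pack_ingress_result(buf_handle: int, vlan: int, rxmi: int,
                        qid: int, da: int, stats: int) -> list[int]:
    """Return the four 32-bit words passed from Lookup to the next block.

    ASSUMPTION: fields are placed high-to-low within each word; Rsv bits are 0.
    """
    word0 = _check(buf_handle, 32, "buf_handle")
    word1 = (_check(vlan, 16, "vlan") << 16) | _check(rxmi, 16, "rxmi")
    word2 = _check(qid, 20, "qid")                       # Rsv(12b) | QID(20b)
    word3 = (_check(da, 8, "da") << 16) | _check(stats, 16, "stats")  # Rsv(8b) | DA | Stats
    return [word0, word1, word2, word3]

words = pack_ingress_result(0xDEADBEEF, vlan=0x0064, rxmi=0x0002,
                            qid=0x12345, da=0xAB, stats=0x0010)
print([f"{w:08x}" for w in words])
```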
Pass Thru MetaLinks and Multi-Access SLs • When going MR → LC-Egress the MR may provide a Next Hop MN Address for the LC to use to map to a MAC address. • This is particularly used when the destination Substrate Link is Multi-Access and there may be multiple MAC addresses used on the same Multi-Access MetaLink. • When going LC-Ingress → LC-Egress for a pass through MetaLink, do we need to do something similar? • This could arise when a MetaNet has hosts on a multi-access network but the first Substrate Router that these hosts have access to does not have a MR for that MN. • However, I contend that if there is no MR on that access SR, then there is nothing there to discriminate between the multiple MN addresses on the single MA MetaLink and hence it cannot be supported.
Pass Thru MetaLinks and Multi-Access SLs • [Diagram: Host1 … HostN on an MA Network; Substrate Router1 and Substrate Router2 with LCs, ARP, an MR and an ML, joined by an MA SL and a P2P SL. Annotation: No way to communicate Next Hop addresses from MR to distant LC.] • Implications: • We will not extend MA links across Substrate Routers and other Substrate Links. • MetaNets must place a MR in the substrate router that terminates a MA Substrate Link on which they want to support hosts.
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=Substrate (2B), TxMI (2B), Src MPE (2B), LEN (2B), Shim Flags (1B), Shim Data (nB), Meta Frame, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • The Lookup Result for Egress will consist of several parts: • Lookup Result • Constant fields • Calculated fields • Fields that can be stored in Local Memory • Some of these are common across all SL Types • Other fields are specific to each SL Type • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) (Physical Interface 1-10 on LC RTM) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers are on following slides
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • P2P-DC Hdr (64b) • Constant (16b): In Egress Hdr Format • EtherType (16b) = Substrate • Calculated (0b) • From Result (48b) • Eth DA (48b) • Lookup Result Total (Common Result + Specific Result): 108 bits • Total (Common + Specific) : 156 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), Eth DA[15:0] (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b) • Word 3: Eth DA[47:16] (32b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • MA Hdr (64b) : • Constant (16b): In Egress Hdr Format • EtherType (16b) = Substrate • Calculated (0b) • ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b) • Eth DA (48b) • From Result (0b) • Lookup Result Total (Common From Result + Specific From Result): 60 bits • Total (Common + Specific) : 156 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), Rsv (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI≠VLAN0 (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • MA with VLAN Hdr (96b) : • Constant (32b): In Egress Hdr Format • EtherType1 (16b) = 802.1Q • EtherType2 (16b) = Substrate • Calculated (0b) • ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b) • Eth DA (48b) • From Result (16b) • VLAN/TCI (16b) • Lookup Result Total (Common From Result + Specific From Result): 76 bits • Total (Common + Specific) : 188 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), VLAN (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI=VLAN0 (2B), Type=Substrate (2B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • P2P-VLAN0 Hdr (96b): • Constant (32b): In Egress Hdr Format • EtherType1 (16b) = 802.1Q • EtherType2 (16b) = Substrate • Calculated (0b) • From Result (64b) • Eth DA (48b) • VLAN/TCI (16b) • Lookup Result Total (Common From Result + Specific From Result): 124 bits • Total (Common + Specific) : 188 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), Rsv (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b) • Word 3: Eth DA[47:16] (32b) • Word 4: VLAN (16b), Eth DA[15:0] (16b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Result (continued) • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b): tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • P2P-Tunnel Hdr for IPv4 Tunnel without VLANs (224b): • Constant (48b): In Egress Hdr Format • Eth Hdr EtherType (16b) = 0x0800 • IP Hdr Version(4b)/HLen(4b)/Tos(8b) (16b): All can be constant? • IP Hdr TTL (8b): Initialized to a constant when sending. • IP Hdr Proto (8b) = Substrate • Calculated (64b): By Egress Hdr Format • IP Pkt Len(16b) : Calculated for each packet. • IP Hdr ID(16b): should be unique for each packet sent, so shouldn’t be in Result. • IP Hdr Checksum (16b): Needs to be calculated, so shouldn’t be in Result. • IP Hdr Flags(3b)/FragOff(13b) (16b) : If fragments are never used, these are constants; if we may have to use them, then this has to be calculated. Either way, it shouldn’t be in Result. • Local Memory (32b) • IP Hdr Src Addr (32b) : tied to physical interface (10 entry table in Egress Hdr Format) • From Result (80b) • Eth Hdr DA (48b) • IP Hdr Dst Addr (32b) • Lookup Result Total (Common From Result + Specific From Result): 140 bits • Total (Common + Specific) : 316 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), Rsv (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b) • Word 3: Eth DA[47:16] (32b) • Word 4: Rsv (16b), Eth DA[15:0] (16b) • Word 5: IP DA (32b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=Substrate (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), MLI (2B), LEN (2B), Meta Frame, PAD (nB), CRC (4B) • Result (continued) • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • P2P-Tunnel Hdr for IPv4 Tunnel with VLANs (256b): • Constant (64b): In Egress Hdr Format • First Eth Hdr EtherType (16b) = 802.1Q • Second Eth Hdr EtherType (16b) = 0x0800 • IP Hdr Version(4b)/HLen(4b)/Tos(8b) (16b): All can be constant? • IP Hdr TTL (8b): Initialized to a constant when sending. • IP Hdr Proto (8b) = Substrate • Calculated (64b): By Egress Hdr Format • IP Pkt Len(16b) : Calculated for each packet. • IP Hdr ID(16b): should be unique for each packet sent, so shouldn’t be in Result. • IP Hdr Checksum (16b): Needs to be calculated, so shouldn’t be in Result. • IP Hdr Flags(3b)/FragOff(13b) (16b) : Frags needed? • Local Memory (32b) • IP Hdr Src Addr (32b) : tied to physical interface (10 entry table in Egress Hdr Format) • From Result (96b) • Eth Hdr DA (48b) • IP Hdr Dst Addr (32b) • VLAN/TCI (16b) • Lookup Result Total (Common From Result + Specific From Result): 156 bits (PROBLEM!) • Total (Common + Specific) : 348 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), Rsv (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b) • Word 3: Eth DA[47:16] (32b) • Word 4: VLAN (16b), Eth DA[15:0] (16b) • Word 5: IP DA (32b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), TCI ≠ VLAN0 (2B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • Ignored for Legacy Traffic • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • Legacy (IPv4) with VLAN Hdr (96b): • IP Header provided by MR! • Constant (16b) In Egress Hdr Format • EtherType1 (16b) = 802.1Q • Calculated (0b) • ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b) • Eth DA (48b) • From Result (32b) • EtherType2 (16b) = IPv4 • TCI (16b) • Lookup Result Total (Common From Result + Specific From Result): 92 bits • Total (Common + Specific) : 188 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), EType (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b) • Word 3: VLAN (16b), Rsv (16b)
LC: TCAM Lookups on Egress • Frame format: DstAddr (6B), SrcAddr (6B), Type=IP (2B), Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Payload, PAD (nB), CRC (4B) • Key: • VLAN(16b) • TxMI(16b) • Result • Common across all SL Types (108b): • From Result (60b) • SL Type(4b) • Port(4b) • MLI(16b) • Ignored for Legacy Traffic • QID (20b) • Stats Index (16b) • Local Memory (48b) • Eth Hdr SA (48b) : tied to physical interface (10 entry table in Egress Hdr Format) • SL Type Specific Headers • Legacy (IPv4) without VLAN Hdr (64b): • IP Header provided by MR! • Constant (0b) • Calculated (0b) • ARP Lookup on NhAddr (Is ARP cache another database in TCAM?) (48b) • Eth DA (48b) • From Result (16b) • EtherType (16b) = IPv4 • Lookup Result Total (Common From Result + Specific From Result): 76 bits • Total (Common + Specific) : 156 bits • Data format to downstream neighbor (32-bit words, bit 31 on the left): • Word 0: Buf Handle (32b) • Word 1: MLI (16b), EType (16b) • Word 2: Rsv (4b), Port (4b), SL (4b), QID (20b)
LC: Lookup Block Parameters • All lookups will be Exact Match. • Ingress: • # Databases: 1 • 4 bits in Key identify the SL Type • 0000: DC • 0001: IPv4 Tunnel • 0010: Legacy (non-substrate) with or without VLAN • 0011: VLAN0 • 0100: MA (with or without VLAN) • Core Size: 72b • Key Size: 24b - 72b • AD Result Size: 64b of which we’ll use 60 bits • Egress: • # Databases: 1 • Core Size: 36b • Key Size: 32b • AD Result Size: 128b of which we’ll use different amounts per SL Type • With one problem to still work out.
SUMMARY: LC: TCAM Lookups • Ingress Key Size: 24 bits or 72 bits • Ingress Result Size: 76 bits • Egress Key Size: 32 bits • Egress Result Size: 60-156 bits • The IP Tunnel with VLANs Substrate Link option is a problem. • Ways to handle it are discussed on the next slide. • We also need to watch out for the Egress Result for Tunnels w/o VLANs. If we introduce anything else we want in there, we go beyond the 128 bits supportable through the TCAM’s Associated memory.
Handling IP Tunnel SL with VLANs • Result Fields (156 bits): • SL Type(4b) • Port(4b) • MLI(16b) • QID (20b) • Stats Index (16b) • Eth Hdr DA (48b) • IP Hdr Dst Addr (32b) • VLAN (16b) • 128 bits is max size of a Result stored in TCAM Associated Data SRAM • Options for handling this Result • Not allow this type of SL • Might be ok for short term but almost certainly not ok for long term. • Find 28 bits we don’t really need in Result • Do a second lookup when we find a SL like this. • Do a Multi-Hit lookup and put two entries in for these SLs and only one entry for all others. • Then concatenate the two results when we get them. • Only allow a small fixed number of this type of SL: • Store an index in the 4 bits we have left • store the extra bits we need in a table in Local memory. • However, this is a little tricky since we would then need to get the extra bits from the control plane into Local Memory and we will want Substrate Links to be able to be added dynamically.
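The bit budget on this slide can be checked with a few lines of Python. This is only a sketch: the field widths and the 128-bit limit are copied from the slide, while the names `RESULT_FIELDS`, `TCAM_AD_MAX_BITS` and `result_overflow` are made up for illustration.

```python
# Field widths in bits, copied from the Result Fields list on the slide.
RESULT_FIELDS = {
    "SL Type": 4,
    "Port": 4,
    "MLI": 16,
    "QID": 20,
    "Stats Index": 16,
    "Eth Hdr DA": 48,
    "IP Hdr Dst Addr": 32,
    "VLAN": 16,
}

# Max size of a Result stored in TCAM Associated Data SRAM (from the slide).
TCAM_AD_MAX_BITS = 128


def result_overflow(fields, limit=TCAM_AD_MAX_BITS):
    """Return (total bits, bits over the limit) for a result layout."""
    total = sum(fields.values())
    return total, max(0, total - limit)


total_bits, excess_bits = result_overflow(RESULT_FIELDS)
print(f"Result needs {total_bits} bits, {excess_bits} more than the limit")
```

The excess is the number of bits that the "find bits we don't really need" option would have to recover, or that a second lookup, a second MHL entry, or a Local Memory table would have to carry.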
MR Lookup Block [Figure: block diagram of MR (NPUA) and MR (NPUB), each with Rx, DeMux, Parse, Lookup, HeaderFormat, QM, and Tx blocks and an XScale, together with Control and the TCAM]
Common Router Framework (CRF) Functional Blocks
[Figure: CRF blocks Rx, DeMux, Parse, Lookup, HeaderFormat, QM, and Tx, with MR-1 . . . MR-n. Data into Lookup: Buf Handle (32b), MR Id (16b), MR Mem Ptr (32b), MR Lookup Key (N B). Data out of Lookup: Buffer Handle (32b), MR Id (16b), MR Mem Ptr (32b), Lookup Result (16B). MR_ID and MR Mem Ptr are not needed for Dedicated IPv4 MR.]
• Lookup • Function • Perform lookup in TCAM based on MR Id and lookup key • Result: • Output MI • QID • Stats index • MR-specific Lookup Result (flags, etc. ?) • How wide can/should this be?
MR Lookup Block Requirements • Shared NP Lookup Engine specific: • Number of Lookups per second required: • 1 lookup required per packet • 5Gb/s per NP on a blade • Average sized packet: 200 Bytes, 1600 bits • If we assume 6.25 MPkts/sec for 10Gb/s then 5Gb/s would be 3.125 MPkt/s • We would want 3.125 M Lookups/sec per LA-1 Interface, total of 6.25 M Lookups/sec for the TCAM Core. • Minimum Sized Packet: 76 Bytes, 608 bits • If we assume 16.45 MPkts/sec for 10Gb/s then 5Gb/s would be 8.225 MPkt/s • We would want 8.225 M Lookups/sec per LA-1 Interface, total of 16.45 M Lookups/sec for the TCAM Core. • Number of MRs to be supported? • Will each get its own database? No. This would limit it to 16 which is not enough. • How many keys will each MR be limited to? • How much of Result can be MR-specific? • How much of Key can be MR-specific? • How are masks to be supported? • Mask core is same size as Data core. One mask per Entry • Global Mask Registers also available for masking key to match size of Entry during Multi Database Lookups where the multiple databases have different sizes. • How will multiple hits across databases be supported? • How will priorities be supported? • Priorities within a database are purely by the order of the keys. • For example, in a GM filter table if Keys 4 and 7 both match, Key 4 is selected. • Priorities across databases will have to be included in the Entries • Do we need support for non-exclusive (make a copy) filters? • Later? • How are GM filters with range fields supported? • The IDT libraries support this by adding multiple entries, each with its own mask, to the DB to cover the range of the field.
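The lookup-rate figures above follow from one lookup per packet at 5Gb/s per NP. A minimal sketch of that arithmetic, with the function name `lookups_per_second` chosen only for illustration:

```python
def lookups_per_second(line_rate_bps, packet_bytes):
    """One lookup per packet: rate = line rate / packet size in bits."""
    return line_rate_bps / (packet_bytes * 8)


NP_RATE_BPS = 5e9  # 5Gb/s per NP on a blade (from the slide)

avg_rate = lookups_per_second(NP_RATE_BPS, 200)  # average sized packet
min_rate = lookups_per_second(NP_RATE_BPS, 76)   # minimum sized packet

# Two LA-1 interfaces share the TCAM core, so the core sees twice the rate.
for label, rate in (("average", avg_rate), ("minimum", min_rate)):
    print(f"{label}: {rate / 1e6:.3f} M lookups/s per LA-1 interface, "
          f"{2 * rate / 1e6:.3f} M lookups/s for the TCAM core")
```

For 76-byte packets the exact per-interface value is about 8.22 M lookups/s; the 8.225 on the slide is half of the rounded 16.45 figure.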
IPv4 MR Lookup Entry Examples • Route Lookup: Longest Prefix Match • Entry (64b): • MR ID (16b) • MI (16b) • DAddr (32b) • Mask: (32 + Prefix length) high order bits set to 1 • GM Match Lookup • Entry (142b): • MR ID (16b) • MI (16b) • SAddr (32b) • DAddr (32b) • Sport(16b) • Dport(16b) • Protocol_Selector (2b) : • 00: Protocol is NOT TCP and so following field should be interpreted as Protocol • 01: Protocol is TCP and so following field should be interpreted as TCP_Flags • 10: reserved • 11: reserved • Protocol_TCP_Flags (12b) • Mask: Completely general, user defined. • EM Match Lookup • Entry (136b): • MR ID (16b) • MI (16b) • SAddr (32b) • DAddr (32b) • Sport(16b) • Dport(16b) • Protocol_Selector (2b) : • 00: Protocol is NOT TCP and so following field should be interpreted as Protocol • 01: Protocol is TCP and so following field should be interpreted as TCP_Flags • 10: reserved • 11: reserved • Protocol_TCP_Flags (12b) • Mask: 136 high order bits set to 1
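The Route Lookup mask rule ("(32 + Prefix length) high order bits set to 1") can be illustrated with a small software model of a ternary match. This is a sketch under assumptions: the 64-bit entry is taken to be MR ID, MI and DAddr packed MSB first, and the helper names are hypothetical; nothing here is the TCAM vendor's API.

```python
ENTRY_BITS = 64  # MR ID (16b) + MI (16b) + DAddr (32b)


def route_entry(mr_id, mi, daddr):
    """Pack the three fields MSB first (assumed layout)."""
    return (mr_id << 48) | (mi << 32) | daddr


def route_mask(prefix_len):
    """Set the (32 + prefix_len) high-order bits of the 64-bit mask to 1."""
    ones = 32 + prefix_len
    return ((1 << ones) - 1) << (ENTRY_BITS - ones)


def matches(key, entry, mask):
    """Ternary match: only bits whose mask bit is 1 are compared."""
    return (key & mask) == (entry & mask)


# Route 10.1.2.0/24 for a hypothetical MR ID 7 and MI 3.
entry = route_entry(7, 3, 0x0A010200)
mask = route_mask(24)
print(hex(mask))
print(matches(route_entry(7, 3, 0x0A01024D), entry, mask))  # 10.1.2.77
print(matches(route_entry(7, 3, 0x0A01034D), entry, mask))  # 10.1.3.77
```

MR ID and MI are always fully masked in, so a route can only match packets of its own MR and meta-interface; the prefix length controls only how much of DAddr is compared.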
IPv4 MR Lookup Databases • How many databases to use? • Three Options: • 3: a separate DB for each • 2: one DB for GM and one for RL and EM • 1: RL, GM and EM all in one DB • Assumptions: • We want to be able to easily change priorities of Filters • For Routes, we want strict Longest Prefix Match: the longest prefix is the best match. • A Filter, either Exact Match or Range/Best Match, always takes precedence over a Route • EM is generally higher priority than Range/Best Match, but not always. • We still want the highest priority match of each type, and then compare them. • We may not want to pay the overhead penalty of shuffling filter entries when we change priorities. • Currently unknown what the penalty will be.
IPv4 MR Lookup Databases • 3 Databases: • Means we would use Multi Database Lookup (MDL) command • More efficient use of CAM core entries as each DB could be sized closer to its Entry size • Guaranteed at least one Result from each Database if a match exists in each database. • 2 Databases: • We could use MDL command • Guaranteed one Result from GM and one from either EM or RL but not both! • Order is important: • EM filters would all go first in EM/RL DB, with full masks. • At most one entry would match • EM filters would always be higher priority than Routes. • If no EM filter match, we would get the best RL match. • RL entries would be sorted by prefix length so first match was the longest. • We could use two separate commands: Lookup or MHL for GM and MHL for EM/RL • Guaranteed at least one Result from each {GM,EM,RL} if a match exists in each. • Price: Two lookups per packet. • 1 Database: • Use Multi Hit Lookup (MHL) command • Efficient use of the CAM core entries is a potential problem. • Would not be as bad if we could get the GM filters down to 144 bits by making the MR/MI fields a combined 4 bits shorter. • Order is important • EM Filters first • GM Filters second • With Result of 64 bit AD, we can get back at most 4 Results (1 EM and 3 GM or 4 GM or something less…) • RL Entries last • EM and GM always take priority over RL. • Priority field in Results could be used to arbitrate between matched EM and GM filters
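One way to picture the arbitration described above (filters always beat routes, and a Priority field decides between a matched EM and a matched GM filter) is the sketch below. The `Result` layout, the convention that a lower priority value wins, and the tie-break in favor of EM are all assumptions made for illustration; the slides do not define them.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    kind: str      # "EM", "GM" or "RL"
    priority: int  # hypothetical Priority field: lower value wins
    qid: int


def select_result(em: Optional[Result],
                  gm: Optional[Result],
                  rl: Optional[Result]) -> Optional[Result]:
    """Pick one result from the best EM, GM and RL matches (None = no match)."""
    filters = [r for r in (em, gm) if r is not None]
    if filters:
        # A filter always takes precedence over a route. min() keeps the
        # first of equal priorities, so EM wins a tie in this sketch.
        return min(filters, key=lambda r: r.priority)
    return rl  # fall back to the longest-prefix route, if any


em = Result("EM", priority=5, qid=100)
gm = Result("GM", priority=2, qid=200)
rl = Result("RL", priority=0, qid=300)
print(select_result(em, gm, rl).kind)      # GM outranks EM here
print(select_result(None, None, rl).kind)  # only the route matched
```

This only covers the final comparison in software. Which results are actually available depends on the database option chosen, for example the 2-database MDL variant never returns both an EM and an RL result.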
IPv4 MR Lookup Example: 3 DBs
Key fields: MR ID (16b) | MI (16b) | DAddr (32b) | SAddr (32b) | Protocol (8b) | DPort (16b) | Sport (16b) | TCP_Flags (12b)
• Order matters: • Same Key will be applied to all Databases in a Multi-Database Lookup (MDL) • Each Database will use the number of bits it was configured for, starting at the MSB. • DAddr field needs to be first • TCP_Flags field needs to be last • Route Lookup: Longest Prefix Match • Key (64b): • MR ID (16b) • MI (16b) • DAddr (32b) • GM Match Lookup: Best/Range Match • Key (148b): • MR ID (16b) • MI (16b) • DAddr (32b) • SAddr (32b) • Protocol (8b) • Sport(16b) • Dport(16b) • TCP_Flags (12b) • MASK/Ranges • How will we handle • Masks for Addr fields • Ranges for Port fields • Wildcard for Protocol field • EM Match Lookup: Exact Match • Key (136b): • MR ID (16b) • MI (16b) • DAddr (32b) • SAddr (32b) • Protocol (8b) • Sport(16b) • Dport(16b)
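The reason field order matters can be shown by packing one 148-bit key and taking the high-order bits each database is configured for. This is a sketch only: the field order follows the GM key list above, and `pack_key` and `db_view` are hypothetical helpers, not part of any TCAM library.

```python
# (name, width in bits), MSB first, following the GM key list on the slide.
KEY_FIELDS = [
    ("mr_id", 16), ("mi", 16), ("daddr", 32), ("saddr", 32),
    ("protocol", 8), ("sport", 16), ("dport", 16), ("tcp_flags", 12),
]
KEY_BITS = sum(width for _, width in KEY_FIELDS)  # 148


def pack_key(**values):
    """Concatenate the fields MSB first into one integer key."""
    key = 0
    for name, width in KEY_FIELDS:
        value = values[name]
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        key = (key << width) | value
    return key


def db_view(key, db_bits):
    """The part of the key a database sees: its db_bits high-order bits."""
    return key >> (KEY_BITS - db_bits)


key = pack_key(mr_id=7, mi=3, daddr=0x0A010203, saddr=0xC0A80001,
               protocol=6, sport=1234, dport=80, tcp_flags=0x012)
print(hex(db_view(key, 64)))   # RL: MR ID, MI, DAddr
print(hex(db_view(key, 136)))  # EM: everything except TCP_Flags
print(hex(db_view(key, 148)))  # GM: the whole key
```

Because every database reads from the MSB, the route fields have to come first and TCP_Flags, which only the GM database uses, has to come last.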
IPv4 MR Lookup Example: 3 DBs
Lookup Key: 148 bits out of 5 32-bit words (W1 to W5) transmitted with Lookup command (MDL): MR ID (16b) | MI (16b) | DAddr (32b) | SAddr (32b) | DPort (16b) | SPort (16b) | Proto (8b) | TCP_Flags (12b) | Pad (12b)
• Core Entries for RL DB (Mask and Data): 64 bits • Core Size: 72 bits • GMR = 0xFFFFFFFFF 0xFFFFFFF00
• Core Entries for EM DB (Mask and Data): 136 bits • Core Size: 144 bits • GMR = 0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFF00
• Core Entries for GM DB (Mask and Data): 148 bits • Core Size: 288 bits • GMR = 0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFFFF 0xF00000000 0x000000000 0x000000000 0x000000000
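The GMR values on this slide follow a simple pattern: the first N bits of the core are masked in, where N is the database's key width, and the rest are masked out. A minimal sketch that reproduces them, assuming the GMR is written as 36-bit words (as the 9-hex-digit values suggest); `gmr_words` is a hypothetical helper, not a vendor call.

```python
WORD_BITS = 36  # each GMR word on the slide is 9 hex digits


def gmr_words(key_bits, core_bits):
    """Global mask for a DB: key_bits leading ones across core_bits."""
    words = []
    remaining = key_bits
    for _ in range(core_bits // WORD_BITS):
        ones = min(remaining, WORD_BITS)
        # Put the ones in the high-order end of the 36-bit word.
        words.append(((1 << ones) - 1) << (WORD_BITS - ones))
        remaining -= ones
    return words


for name, key_bits, core_bits in (("RL", 64, 72),
                                  ("EM", 136, 144),
                                  ("GM", 148, 288)):
    text = " ".join(f"0x{w:09X}" for w in gmr_words(key_bits, core_bits))
    print(f"{name}: GMR = {text}")
```

The RL and EM databases each leave the low 8 bits of their last word unused, and the GM database uses only 4 bits of its fifth word, which is where the 288-bit core size costs entries.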