[Figure: ONL NP Router block diagram. Blocks: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), FreeList Mgr (1 ME), Stats (1 ME), Plugin0-Plugin4, and the XScale; TCAM with Assoc. Data ZBT-SRAM; 64KW SRAM rings, a scratch ring, and next-neighbor (NN) rings connect the blocks.]
MUX -> PLC (ring_in data)
• Word 0: Rsv (4b) | Out Port (4b) | Buffer Handle (24b)
• Word 1: L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
• Word 2: Plugin Tag (5b) | In Port (3b) | Flags (8b) | Stats Index (16b)
• Flags (8b): Reserved [7:3] (5b) | Src [2:1] (2b) | PT [0] (1b)
  • Src: Source (2b): 00: Rx, 01: XScale, 10: Plugin, 11: Undefined
  • PT (1b): PassThrough (1) / Classify (0)
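A minimal C sketch of packing and unpacking the three ring_in words, assuming each word is laid out MSB-first in the order the fields are listed above, with the Rsv bits left as 0. The type and function names (`ring_in_t`, `ring_in_pack`, `ring_in_unpack`, ...) are hypothetical, and the real microengine code may use different bit positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the MUX -> PLC ring entry (ring_in). */
typedef struct {
    uint8_t  out_port;     /* 4 bits */
    uint32_t buf_handle;   /* 24 bits */
    uint16_t pkt_len;      /* L3 pkt length */
    uint16_t qid;
    uint8_t  plugin_tag;   /* 5 bits */
    uint8_t  in_port;      /* 3 bits */
    uint8_t  flags;        /* Reserved[7:3] Src[2:1] PT[0] (assumed) */
    uint16_t stats_index;
} ring_in_t;

enum { SRC_RX = 0, SRC_XSCALE = 1, SRC_PLUGIN = 2 };   /* Src encodings from the slide */

/* Assumed layout: w0 = Rsv[31:28] OutPort[27:24] BufHandle[23:0],
   w1 = PktLen[31:16] QID[15:0],
   w2 = PluginTag[31:27] InPort[26:24] Flags[23:16] StatsIndex[15:0]. */
void ring_in_pack(const ring_in_t *r, uint32_t w[3])
{
    w[0] = (uint32_t)(r->out_port & 0xFu) << 24 | (r->buf_handle & 0xFFFFFFu);
    w[1] = (uint32_t)r->pkt_len << 16 | r->qid;
    w[2] = (uint32_t)(r->plugin_tag & 0x1Fu) << 27 | (uint32_t)(r->in_port & 0x7u) << 24 |
           (uint32_t)r->flags << 16 | r->stats_index;
}

void ring_in_unpack(const uint32_t w[3], ring_in_t *r)
{
    r->out_port    = (w[0] >> 24) & 0xFu;
    r->buf_handle  = w[0] & 0xFFFFFFu;
    r->pkt_len     = (uint16_t)(w[1] >> 16);
    r->qid         = (uint16_t)(w[1] & 0xFFFFu);
    r->plugin_tag  = (uint8_t)(w[2] >> 27);
    r->in_port     = (w[2] >> 24) & 0x7u;
    r->flags       = (w[2] >> 16) & 0xFFu;
    r->stats_index = (uint16_t)(w[2] & 0xFFFFu);
}

unsigned ring_in_src(uint8_t flags) { return (flags >> 1) & 0x3u; }   /* SRC_* value */
bool ring_in_pt(uint8_t flags) { return flags & 0x1u; }               /* 1: PassThrough */
```

Keeping pack and unpack side by side makes it easy to check that every field round-trips through the ring unchanged.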
PLC -> XScale
• Word 0: Rsv (8b) | Buffer Handle (24b)
• Word 1: L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
• Word 2: Plugin Tag (5b) | In Port (3b) | Flags (8b) | Stats Index (16b)
• Word 3: NH MAC DA[47:16] (32b)
• Word 4: NH MAC DA[15:0] (16b) | EtherType (16b)
• Word 5: Reserved (16b) | Unicast/MCast bits (16b)
• Fields are either the same as in ring_in or filled in based on Parse & Lookup results.
• Flags (8b): why the pkt is being sent to XScale
  • Reserved [7:6] (2b): currently unused
  • NH_Invalid (NH INV) [5] (1b): NH_IP AND NH_MAC both invalid
  • ARP_Needed (ARP) [4] (1b): NH_IP valid, but no MAC
  • NonIP (NI) [3] (1b): non-IP pkt received
  • NoRoute (NR) [2] (1b): no matching route or filter
  • Options (Opt) [1] (1b): IP options present
  • TTL [0] (1b): TTL expired
PLC -> Plugins (Plugins 0-4)
• Same ring data layout and Flags bits as PLC -> XScale; here the Flags record why the pkt is being sent to the plugin.
PLC -> QM
• Word 0: Rsv (8b) | Buffer Handle (24b)
• Word 1: L3 (IP, ARP, …) Pkt Length (16b) | QID (16b)
• Word 2: Rsv (4b) | Out Port (4b) | Rsv (8b) | Reserved (16b)
• Fields are either the same as in ring_in or based on Parse & Lookup results.
• The Buffer Handle may be that of the added hdr buffer.
Types of Pkts Arriving at PLC
• From Rx:
  • Has only a payload buf, ref_cnt == 1.
  • Subject to classification, except malformed IP pkts detected by Parse.
• From XScale/Plugins:
  • Passthrough (PT) pkt:
    • May or may not have a hdr buf, ref_cnt >= 1.
    • Not processed by Parse or Lookup; Copy sends it to QM.
  • Non-PT pkt:
    • Has only a payload buf on arrival, ref_cnt >= 1.
    • Is an IP pkt.
    • Classified if it passes the IP hdr validation done by Parse.
PLC()
PLC() {
    if (!ring_in.PT) {
        Parse();
        if (dlNextBlock != BID_FREELISTMGR)
            Lookup();
    }
    Copy();
}
• Inside Copy(), dl_sink() is called to enqueue pkts to downstream blocks.
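The same control flow as a self-contained C sketch, with stub stages that only record which blocks ran. `pkt_t`, `plc_trace_t`, `BID_NONE`, and the lower-case stage functions are hypothetical stand-ins; the real Parse, Lookup and Copy do the work described in the following slides.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block IDs; only BID_FREELISTMGR and BID_QM appear on the slides. */
typedef enum { BID_NONE, BID_FREELISTMGR, BID_QM } block_id_t;

typedef struct {
    bool pt;          /* PT flag from ring_in */
    bool ip_hdr_ok;   /* stand-in for the outcome of Parse's IP header checks */
} pkt_t;

typedef struct {
    block_id_t dl_next_block;
    bool parsed, looked_up, copied;   /* which stages ran */
} plc_trace_t;

static void parse(const pkt_t *p, plc_trace_t *t)
{
    t->parsed = true;
    if (!p->ip_hdr_ok)
        t->dl_next_block = BID_FREELISTMGR;   /* error pkt: free the buffer */
}

static void lookup(plc_trace_t *t) { t->looked_up = true; }

static void copy(const pkt_t *p, plc_trace_t *t)
{
    t->copied = true;
    if (p->pt)
        t->dl_next_block = BID_QM;            /* PT pkts go straight to QM */
    /* Classified pkts: process lookup results, then call dl_sink() per copy. */
}

void plc(const pkt_t *p, plc_trace_t *t)
{
    *t = (plc_trace_t){ BID_NONE, false, false, false };
    if (!p->pt) {
        parse(p, t);
        if (t->dl_next_block != BID_FREELISTMGR)
            lookup(t);
    }
    copy(p, t);
}
```

Note that Copy() always runs: it is the single exit point that hands every pkt, including error pkts, to a downstream block.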
dl_sink
• Special design of dl_sink(first, last, try_action):
  • “first” == TRUE means the current pkt is the first of a sequence of pkts to be sunk.
  • “last” == TRUE means the current pkt is the last of a sequence of pkts to be sunk.
  • “try_action” == 1: drop the pkt if the ring is full; otherwise, retry until the enqueue succeeds.
  • dl_sink() returns a value indicating whether the enqueue succeeded.
  • If DL_ORDERED is defined and “first” == TRUE, the thread waits for a signal from the previous context before enqueuing the first pkt.
  • If DL_ORDERED is defined and “last” == TRUE, the thread passes the signal to the next context after enqueuing the last pkt.
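A sequential C sketch of these semantics, assuming a small array ring and modelling the DL_ORDERED inter-context signal as a shared `turn` counter. `ring_t`, `order_t`, `consumer_drain_one` and `dl_sink_sketch` are hypothetical; in the real code the first-pkt wait is on a signal from the previous context, and the retry loop simply spins until the downstream block frees a slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 4

/* Hypothetical fixed-size ring standing in for a scratch/SRAM ring. */
typedef struct { uint32_t slot[RING_CAP]; unsigned head, count; } ring_t;

/* Models the DL_ORDERED signal chain: only context `turn` may start a sequence. */
typedef struct { unsigned turn, n_ctx; } order_t;

static bool ring_put(ring_t *r, uint32_t v)
{
    if (r->count == RING_CAP)
        return false;                                  /* ring full */
    r->slot[(r->head + r->count) % RING_CAP] = v;
    r->count++;
    return true;
}

/* Stand-in for the downstream block dequeuing one entry. */
static void consumer_drain_one(ring_t *r)
{
    if (r->count) {
        r->head = (r->head + 1) % RING_CAP;
        r->count--;
    }
}

bool dl_sink_sketch(ring_t *r, uint32_t data, bool first, bool last,
                    int try_action, order_t *ord, unsigned ctx)
{
    if (first)
        assert(ord->turn == ctx);   /* real code: wait for the previous context's signal */

    bool ok = ring_put(r, data);
    while (!ok && try_action == 0) {  /* try_action == 0: retry until it succeeds */
        consumer_drain_one(r);
        ok = ring_put(r, data);
    }
    if (last)
        ord->turn = (ctx + 1) % ord->n_ctx;   /* pass the signal to the next context */
    return ok;                                /* false only if try_action == 1 and ring full */
}
```

The signal is passed on even when the last pkt is dropped; otherwise the next context would wait forever.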
Parse
• Functions:
  • Do IP router checks (wrong ver, Hlen, Plen, cksum) and count error pkts.
  • Decrement TTL on pkts from Rx and recompute the IP cksum.
  • Detect exceptions (expired TTL, IP options, non-IP).
  • Extract the lookup key.
• Input: ring_in data
• Output: lookup key, dlNextBlock, eth_type, DG QID
• 140-bit key (used for RL, PF and AF): IP DAddr (32b) | IP SAddr (32b) | P Tag (5b, Plugin Tag) | P (3b, In Port) | Proto (8b) | DPort (16b) | SPort (16b) | Exceptions (16b) | TCP Flags (12b)
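A C sketch that packs the nine key fields MSB-first into 18 bytes and confirms they add up to 140 bits. The field order follows the slide; the exact bit alignment the TCAM expects (and the 4 padding bits at the end) is an assumption, as are the names `lookup_key_t` and `key_pack`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEY_BITS  140
#define KEY_BYTES ((KEY_BITS + 7) / 8)   /* 18 bytes; last 4 bits are padding */

typedef struct {
    uint32_t daddr, saddr;
    uint8_t  plugin_tag;   /* 5 bits */
    uint8_t  in_port;      /* 3 bits */
    uint8_t  proto;
    uint16_t dport, sport;
    uint16_t exceptions;
    uint16_t tcp_flags;    /* 12 bits */
} lookup_key_t;

/* Append the low `width` bits of `val`, MSB first, at bit offset *pos. */
static void put_bits(uint8_t *buf, unsigned *pos, uint32_t val, unsigned width)
{
    for (unsigned i = width; i-- > 0; ) {
        if ((val >> i) & 1u)
            buf[*pos / 8] |= (uint8_t)(0x80u >> (*pos % 8));
        (*pos)++;
    }
}

/* Returns the number of key bits written (140 for the slide's layout). */
unsigned key_pack(const lookup_key_t *k, uint8_t out[KEY_BYTES])
{
    unsigned pos = 0;
    memset(out, 0, KEY_BYTES);
    put_bits(out, &pos, k->daddr, 32);
    put_bits(out, &pos, k->saddr, 32);
    put_bits(out, &pos, k->plugin_tag, 5);
    put_bits(out, &pos, k->in_port, 3);
    put_bits(out, &pos, k->proto, 8);
    put_bits(out, &pos, k->dport, 16);
    put_bits(out, &pos, k->sport, 16);
    put_bits(out, &pos, k->exceptions, 16);
    put_bits(out, &pos, k->tcp_flags, 12);
    return pos;
}
```

Because the same key is used for RL, PF and AF, a single packed key can be written to the TCAM once for all three databases.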
Parse
• Operations:
  • If pkt is from Rx:
    • Read eth_type in Ethernet hdr.
    • If eth_type != 0x0800, set NIP in exception bits.
      • If eth_type == 0x0806, also set ARP in exception bits.
      • Return.
  • Otherwise (pkt from XScale/plugins, which is always IP), eth_type <= 0x0800.
  • For pkt from Rx, offset <= 0x18E; for pkt from XScale/plugins, get offset from payload buf desc.
  • Read 20B IP hdr.
  • Check ver, Hlen, Plen. If the check fails, count the error and set dlNextBlock <= BID_FREELISTMGR. Return.
  • If Hlen > 5, set OPT in exception bits and read the IP options.
  • Verify cksum. If it fails, count the error and set dlNextBlock <= BID_FREELISTMGR. Return.
  • If pkt is from Rx and TTL <= 1, or pkt is from plugins/XScale and TTL < 1, set TTL in exception bits.
  • If pkt is from Rx and TTL > 1, decrement TTL, recompute cksum and write both back to DRAM.
  • Form the Datagram QID as the bit concatenation DG QID = SA[9:8] SA[6:5] DA[6:5] (used by Copy in case of a zero QID).
  • Extract key fields from IP hdr.
  • If IP protocol is TCP/UDP, read 14B and extract key fields from the TCP/UDP hdr.
  • Copy Plugin Tag and In Port to lookup key.
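The header checks, the Rx TTL decrement and the DG QID can be sketched in C as below. The checksum is the standard RFC 1071 one's-complement sum with a full recompute (an incremental update per RFC 1624 would also work); the Plen check shown (Plen at least the header length) is one plausible choice, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over `len` bytes (len even). */
uint16_t ip_cksum(const uint8_t *h, unsigned len)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(h[i] << 8 | h[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* ver/Hlen/Plen/cksum checks; false means Parse sends the pkt to the freelist. */
bool ip_hdr_valid(const uint8_t *h)
{
    unsigned ver  = h[0] >> 4;
    unsigned hlen = h[0] & 0x0Fu;                  /* header length in 32-bit words */
    unsigned plen = (unsigned)h[2] << 8 | h[3];    /* IP total length in bytes */
    if (ver != 4 || hlen < 5 || plen < hlen * 4)
        return false;
    return ip_cksum(h, hlen * 4) == 0;             /* a valid header sums to 0xFFFF */
}

/* Rx path: decrement TTL and recompute cksum; false if TTL <= 1 (TTL exception). */
bool ip_dec_ttl(uint8_t *h)
{
    unsigned hlen_bytes = (h[0] & 0x0Fu) * 4;
    if (h[8] <= 1)
        return false;
    h[8]--;
    h[10] = h[11] = 0;                             /* zero cksum field before summing */
    uint16_t c = ip_cksum(h, hlen_bytes);
    h[10] = (uint8_t)(c >> 8);
    h[11] = (uint8_t)(c & 0xFFu);
    return true;
}

/* DG QID = SA[9:8] SA[6:5] DA[6:5], a 6-bit value. */
uint16_t dg_qid(uint32_t saddr, uint32_t daddr)
{
    return (uint16_t)((((saddr >> 8) & 3u) << 4) | (((saddr >> 5) & 3u) << 2) |
                      ((daddr >> 5) & 3u));
}
```

For example, the well-known header 45 00 00 73 ... 40 11 b8 61 ... verifies as valid; after the TTL goes from 0x40 to 0x3F the recomputed checksum is 0xB961.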
Lookup Key and Results Formats
• 140-bit Key (RL, PF and AF): IP DAddr (32b) | IP SAddr (32b) | P Tag (5b) | P (3b) | Proto (8b) | DPort (16b) | SPort (16b) | Exceptions (16b) | TCP Flags (12b)
• 32-bit Result in TCAM Assoc. Data SRAM, one per DB: D (1b) | H (1b) | MH (1b) | Prio or Res (8b) | Address (21b)
  • TCAM Ctrl Bits: D: Done, H: Hit, MH: Multi-Hit
  • The 8-bit field is Prio for PF and Res for RL and AF.
• 96-bit Result in QDR SRAM Bank0, one per DB:
  • PF and RL: V (4b) | UCast MCast (12b) | QID (16b) | Stats Index (16b), plus NH_MAC (48b), Res (16b), NH_IP (32b)
  • AF: V (4b) | SB (2b) | Res (2b) | UniCast (8b) | QID (16b) | Stats Index (16b), plus NH_MAC (48b), Res (16b), NH_IP (32b)
  • V (4b): Entry Valid (1b), NH IP Valid (1b), NH MAC Valid (1b), IP MC Valid (1b)
  • UCast MCast (12b):
    • If IP MC Valid = 1: PPS (1b) | Multicast Copy Vector (11b)
    • If IP MC Valid = 0: Reserved (4b) | D (1b) | PPS (1b) | UCast Out Port (3b) | UCast Out Plugin (3b)
  • AF UniCast (8b): D (1b) | PPS (1b) | UCast Out Port (3b) | UCast Out Plugin (3b)
Lookup Overview • Initialization • Control Plane initializes TCAM and Route and Filter DBs • Runtime Updates • Control Plane updates to Route and Filter DBs • Design – in upcoming slides • Processing – in upcoming slides • Lookup will be written in C • There are many things about writing IXP code in “C” that I need to learn. Here are some of them: • Performing multiple memory operations in parallel and waiting on a set of signals (If needed for performance reasons) • Performing timestamp waits • Calling IDT microcode macros
Lookup: Design -- Databases • Three Databases: • Route Lookup: • Unicast • Sorted by DAddr Prefix Length • Multicast • Exact match on DAddr and prefix of SAddr • Primary Filter • Filters should be sorted in the DB with higher priority filters first • Auxiliary Filter • Filters should be sorted in the DB with higher priority filters first • Priority between Primary Filter and Route Lookup • A priority will be stored with each Primary Filter • A priority will be assigned to RLs (all routes have same priority) • PF priority and RL priority compared after result is retrieved. • One of them will be selected based on this priority comparison. • Auxiliary Filters: • If matched, cause a copy of packet to be sent out according to the Aux Filter’s result.
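Because a TCAM reports the first (highest-priority) matching entry, sorting unicast routes by prefix length, longest first, turns a first-match search into longest-prefix match; the same ordering idea is why filters are stored with higher-priority filters first. A software C sketch of that idea, with a linear scan standing in for the TCAM and hypothetical names throughout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t prefix; unsigned len; int result; } route_t;

static uint32_t mask_of(unsigned len) { return len ? 0xFFFFFFFFu << (32 - len) : 0; }

static int by_len_desc(const void *a, const void *b)
{
    const route_t *x = a, *y = b;
    return (int)y->len - (int)x->len;   /* longer prefixes sort first */
}

/* Order the DB so first-match semantics yield the longest prefix. */
void route_db_sort(route_t *db, size_t n) { qsort(db, n, sizeof *db, by_len_desc); }

/* Emulated TCAM search: result of the first matching entry, or -1 on a miss. */
int tcam_first_match(const route_t *db, size_t n, uint32_t daddr)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask_of(db[i].len);
        if ((daddr & m) == (db[i].prefix & m))
            return db[i].result;
    }
    return -1;
}
```

A default route (prefix length 0) naturally ends up last, so it only matches when nothing more specific does.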
Lookup: Design -- Results
• Use SRAM Bank 0 (2 MB per NPU) for Results
  • B0 Byte Address Range: 0x000000 – 0x1FFFFF (21 bits)
  • B0 Word Address Range: 0x000000 – 0x1FFFFC (19 significant bits, 2 trailing 0's)
• Store result in two parts:
  • 32-bit Associated Data SRAM result holding the address of the actual Result:
    • TCAM Control Bits (3b): Done (1b), Hit (1b), MHit (1b)
    • Priority (8b): present for Primary Filters; should be 0 for RL and Aux Filters
    • SRAM B0 Word Address (21b)
    • 2 spare bits if needed for anything else
  • 3 words (<= 96 bits) of Result in SRAM Bank0
• Use Multi-Database Lookup (MDL) Indirect for searching all 3 DBs
  • Order of fields in Key is important.
  • Each thread will need one TCAM context.
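A C sketch of decoding the 32-bit Associated Data word, assuming the fields are packed MSB-first as Done, Hit, MHit, Priority (8b), Address (21b), which fills exactly 32 bits. The bit positions and the names `ad_result_t`, `ad_decode` and `ad_encode` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool done, hit, multi_hit;   /* TCAM control bits */
    uint8_t prio;                /* PF only; 0 for RL and AF */
    uint32_t b0_addr;            /* 21-bit SRAM Bank0 address of the full result */
} ad_result_t;

/* Assumed layout: D[31] H[30] MH[29] Prio[28:21] Address[20:0]. */
ad_result_t ad_decode(uint32_t w)
{
    ad_result_t r;
    r.done      = (w >> 31) & 1u;
    r.hit       = (w >> 30) & 1u;
    r.multi_hit = (w >> 29) & 1u;
    r.prio      = (uint8_t)((w >> 21) & 0xFFu);
    r.b0_addr   = w & 0x1FFFFFu;
    return r;
}

uint32_t ad_encode(bool done, bool hit, bool mh, uint8_t prio, uint32_t addr)
{
    assert((addr & 0x3u) == 0 && addr <= 0x1FFFFCu);   /* word aligned, within 2 MB */
    return (uint32_t)done << 31 | (uint32_t)hit << 30 | (uint32_t)mh << 29 |
           (uint32_t)prio << 21 | addr;
}
```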
Lookup Processing
write KEY to TCAM
use timestamp delay to wait the appropriate time
    (make the delay long enough that we are as sure as possible we will have to read the 1st word of the Results Mailbox only once)
while (!DoneBit)    // DONE bit bug fix requires reading just the first word
    read 1 word from Results Mailbox and check DoneBit
done
read words 2 and 3 from Results Mailbox
if (PrimaryFilter AND RouteLookup results HIT) {
    PrimaryResult.Valid <= TRUE
    compare priorities
    store higher priority result as PrimaryResult (read result from SRAM Bank0)
} else if (PrimaryFilter results HIT) {
    PrimaryResult.Valid <= TRUE
    PrimaryResult.* <= PrimaryFilter.* (read result from SRAM Bank0)
} else if (RouteLookup results HIT) {
    PrimaryResult.Valid <= TRUE
    PrimaryResult.* <= RouteLookup.* (read result from SRAM Bank0)
} else
    PrimaryResult.Valid <= FALSE

if (AuxiliaryFilter results HIT) {
    AuxiliaryResult.Valid <= TRUE
    AuxiliaryResult.* <= AuxiliaryFilter.* (read result from SRAM Bank0)
} else
    AuxiliaryResult.Valid <= FALSE
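The result-selection part of this pseudocode, sketched in C. It assumes a larger number means higher priority and that a tie goes to the Primary Filter; neither detail is stated on the slides, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { PRIMARY_NONE, PRIMARY_PF, PRIMARY_RL } primary_src_t;

typedef struct {
    bool pf_hit, rl_hit, af_hit;   /* H bits from the three AD results */
    uint8_t pf_prio;               /* Priority field of the PF result */
} lookup_hits_t;

typedef struct {
    primary_src_t primary;   /* which DB supplies PrimaryResult */
    bool aux_valid;          /* AuxiliaryResult.Valid */
} lookup_sel_t;

/* rl_prio: the single priority assigned to all routes. */
lookup_sel_t select_results(lookup_hits_t h, uint8_t rl_prio)
{
    lookup_sel_t s = { PRIMARY_NONE, h.af_hit };
    if (h.pf_hit && h.rl_hit)
        s.primary = (h.pf_prio >= rl_prio) ? PRIMARY_PF : PRIMARY_RL;  /* assumed: larger wins, tie -> PF */
    else if (h.pf_hit)
        s.primary = PRIMARY_PF;
    else if (h.rl_hit)
        s.primary = PRIMARY_RL;
    return s;
}
```

Only the selected result (and the AF result, if it hit) then needs its 3-word record read from SRAM Bank0.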
Exception Bits in Lookup Key
• 140-bit Key (RL, PF and AF): IP DAddr (32b) | IP SAddr (32b) | P Tag (5b) | P (3b) | Proto (8b) | DPort (16b) | SPort (16b) | Exceptions (16b) | TCP Flags (12b)
• Exceptions (16b): Reserved (12b) | Non-IP (1b) | ARP (1b) | IP Opt (1b) | TTL (1b)
• Exception Bits:
  • TTL: TTL has expired; it was 0 or 1 on the arriving packet
  • IP Opt: IP packet contained options
  • ARP: Ethertype field in Ethernet header was ARP
  • Non-IP: Ethertype field in Ethernet header was NOT IP
• NOTE: An ARP packet will have both the ARP bit and the Non-IP bit set
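A C sketch of how Parse might build the Exceptions field, assuming the four bits occupy bits 3..0 in the order listed (Non-IP, ARP, IP Opt, TTL). The TTL test follows the Parse slide: TTL <= 1 for pkts from Rx, TTL < 1 for pkts from XScale/plugins. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800
#define ETHERTYPE_ARP  0x0806

/* Assumed positions in Exceptions (16b): Reserved[15:4] Non-IP[3] ARP[2] IPOpt[1] TTL[0]. */
enum {
    EXC_TTL    = 1 << 0,
    EXC_IP_OPT = 1 << 1,
    EXC_ARP    = 1 << 2,
    EXC_NON_IP = 1 << 3
};

uint16_t exception_bits(uint16_t eth_type, unsigned hlen, unsigned ttl, bool from_rx)
{
    uint16_t exc = 0;
    if (eth_type != ETHERTYPE_IPV4) {
        exc |= EXC_NON_IP;
        if (eth_type == ETHERTYPE_ARP)
            exc |= EXC_ARP;              /* ARP pkts carry both ARP and Non-IP */
        return exc;                      /* no IP header to inspect */
    }
    if (hlen > 5)
        exc |= EXC_IP_OPT;               /* header longer than 20 bytes: options present */
    if (from_rx ? ttl <= 1 : ttl < 1)
        exc |= EXC_TTL;
    return exc;
}
```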
Lookup Block Diagram (step: memory access, latency)
• Setup Lookup Key
• Write Lookup Key to TCAM: SRAM write, 5W
• TimeStamp Delay: 315 cycles (ctx_swap)
• Read 1W Result from AD SRAM: SRAM read, 1W, 150 cycles (ctx_swap)
• Check Done Bit
• Read 2W Result from AD SRAM: SRAM read, 2W, 150 cycles (ctx_swap)
• Read 2 Full Results from QDR SRAM: two SRAM reads of 3W, 150 cycles each (ctx_swap)
• Setup Results for Copy
• TOTAL (no optimization): 315 + 150 + 150 + 2 × 150 = 915 cycles
Lookup File locations • Code • src/applications/ONL_Router/src/plc/ONL/lookup.c • Include Paths • src/applications/ONL_Router/src/dispatch_loop/ONL/ • dl_source.h and dl_source.c • dl_source() and dl_sink() functions • src/IDT_NSE/data_place_IXP2XXX/include • IDT IIPC defines and macros • others?
Copy
• Functions:
  • Drop error pkts detected by Parse.
  • Count and send PT pkts to QM.
  • Process lookup results:
    • When a “control error” (NH_IP and NH_MAC both invalid or both valid), “ARP request needed” (unicast pkt with valid NH_IP but invalid NH_MAC), or “no route” (invalid PR and AR) is detected, the pkt is sent to XScale or one of the five plugins, or dropped, based on user preference (dynamically configurable).
    • Compute MAC DAddr for IP multicast pkts.
    • Update total ref_cnt in payload buf desc for each classified pkt.
    • If total ref_cnt == 1 and the pkt goes to QM, fill in payload buf desc with NH_MAC, Stats index and EtherType.
    • If total ref_cnt > 1, add a hdr buf to each pkt going to QM, and fill in the hdr buf desc with buffer_next, packet_size, ref_cnt (= 1), NH_MAC, Stats index and EtherType.
    • For each copy, form ring_out data and enqueue it to the outgoing ring.
• Input: ring_in data, exception bits, dlNextBlock, eth_type, DG QID, lookup results
• Output: ring_out data
Copy
• Operations:
  • Error pkt detected by Parse (to freelist_mgr):
    • Construct ring_out data to Freelist_mgr.
    • Call dl_sink(TRUE, TRUE, 0). Return.
  • PT pkt (to QM):
    • dlNextBlock <= BID_QM.
    • Construct ring_out data to QM from ring_in data.
    • Call dl_sink(TRUE, TRUE, 0). Return.
  • Classified pkt:
    • Pre-check:
      • Copy exception bits to flags.
      • If PR and AR are both invalid, this pkt has no route:
        • Set NR bit in flags.
        • Set dlNextBlock to user preference.
        • Construct ring_out data.
        • Call dl_sink(TRUE, TRUE, 1). Return.
    • Pre-process lookup results, because:
      • Both PR and AR can generate pkt copies to QM and XScale/plugins.
      • The total number of copies is needed to:
        • Decide whether hdr buffer descs should be added to copies going to QM.
        • Set the “last” bit in dl_sink().
        • Update total ref_cnt in payload buf desc using one atomic addition.
    • Post-process lookup results: copy and send pkts to the proper rings.
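The early-exit branches of Copy, sketched in C. The `bid_t` values, `sink_req_t` and `copy_precheck` are hypothetical; the NR bit position (bit 2) follows the PLC -> XScale flags layout, and `exc_flags` stands for the exception bits already mapped into those flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BID_NONE, BID_FREELISTMGR, BID_QM, BID_XSCALE,
               BID_PLUGIN0, BID_PLUGIN1, BID_PLUGIN2, BID_PLUGIN3, BID_PLUGIN4 } bid_t;

#define FLAG_NR (1u << 2)   /* NoRoute bit in the PLC -> XScale/plugin flags */

/* One dl_sink(first, last, try_action) call plus its destination and flags. */
typedef struct { bid_t next; bool first, last; int try_action; uint8_t flags; } sink_req_t;

/* Returns true if Copy finishes with the single dl_sink() call described in *req. */
bool copy_precheck(bool parse_error, bool pt, bool pr_valid, bool ar_valid,
                   uint8_t exc_flags, bid_t no_route_pref, sink_req_t *req)
{
    *req = (sink_req_t){ BID_NONE, true, true, 0, exc_flags };
    if (parse_error) {                 /* error pkt: free it */
        req->next = BID_FREELISTMGR;
        return true;
    }
    if (pt) {                          /* PT pkt: straight to QM */
        req->next = BID_QM;
        return true;
    }
    if (!pr_valid && !ar_valid) {      /* no route */
        req->flags |= FLAG_NR;
        req->next = no_route_pref;     /* user preference: XScale, a plugin, or freelist (drop) */
        req->try_action = 1;           /* drop rather than block if the ring is full */
        return true;
    }
    return false;                      /* classified pkt with a route: pre-process next */
}
```

All three early exits sink exactly one pkt, which is why each uses first == last == TRUE.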
Copy
• Pre-processing data structure (one instance for QM copies, one for XScale/plugin copies):
    struct copy_qm_prep {
        unsigned int pkt_cnt;
        bool pr_ucast;
        bool pr_mcast;
        bool ar_ucast;
    } qm;

    struct copy_plugins_prep {
        unsigned int pkt_cnt;
        bool pr_arp;
        bool pr_nh_inv;
        bool pr_ucast;
        bool pr_mcast;
        bool ar_arp;
        bool ar_nh_inv;
        bool ar_ucast;
    } plugins;
• pr_* fields are set from the Primary Result (PR); ar_* fields from the Auxiliary Result (AR).
• The fields are filled in during pre-processing and consumed during post-processing.
• Only one of the pr_* flags in each struct can be TRUE.
Copy (pre-process PR)
• If PR is valid:
  • If NH_IP is valid, NH_MAC is invalid and IP_MC is invalid, this pkt needs ARP.
    • Set ARP bit in flags.
    • If the pkt should be sent to XScale/plugins (rather than dropped): plugins.pkt_cnt++; plugins.pr_arp <= TRUE.
  • Else if NH_IP and NH_MAC are both invalid or both valid, the lookup entry is mis-configured.
    • Set NH_INV bit in flags.
    • If the pkt should be sent to XScale/plugins (rather than dropped): plugins.pkt_cnt++; plugins.pr_nh_inv <= TRUE.
Copy (pre-process PR)
• PR UCast MCast (12b) field:
  • Unicast: Reserved (4b) | D (1b) | PPS (1b) | UCast Out Port (3b) | UCast Out Plugin (3b)
  • Multicast: PPS (1b) | Multicast Copy Vector (11b)
• Else:
  • If MC_valid == 0 (unicast):
    • If Drop == 0:
      • If PPS == 0: qm.pkt_cnt++, qm.pr_ucast <= TRUE.
      • Else (PPS == 1): plugins.pkt_cnt++, plugins.pr_ucast <= TRUE.
  • Else (MC_valid == 1, multicast):
    • If PPS == 0:
      • qm.pkt_cnt <= number of 1's in the high 5 bits of MCast_copyVector;
      • plugins.pkt_cnt <= number of 1's in the low 6 bits of MCast_copyVector.
      • If qm.pkt_cnt > 0, qm.pr_mcast <= TRUE.
      • If plugins.pkt_cnt > 0, plugins.pr_mcast <= TRUE.
    • Else (PPS == 1):
      • plugins.pkt_cnt <= number of 1's in the low 6 bits of MCast_copyVector;
      • If plugins.pkt_cnt > 0, plugins.pr_mcast <= TRUE.
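PR pre-processing from the two slides above, as a C sketch. The 12-bit UCast MCast field is assumed to be packed MSB-first as listed (unicast: Reserved, D, PPS, Port, Plugin; multicast: PPS, then the 11-bit copy vector, whose high 5 bits count QM copies and low 6 bits count XScale/plugin copies). `send_exceptions` stands for the user preference to forward exception pkts instead of dropping them, and the flag bit positions follow the PLC -> XScale flags layout. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_ARP    (1u << 4)   /* ARP_Needed bit */
#define FLAG_NH_INV (1u << 5)   /* NH_Invalid bit */

typedef struct { unsigned pkt_cnt; bool pr_ucast, pr_mcast, ar_ucast; } copy_qm_prep_t;
typedef struct { unsigned pkt_cnt; bool pr_arp, pr_nh_inv, pr_ucast, pr_mcast,
                 ar_arp, ar_nh_inv, ar_ucast; } copy_plugins_prep_t;

/* Hypothetical decoded PR fields. */
typedef struct {
    bool nh_ip_valid, nh_mac_valid, ip_mc_valid;
    uint16_t ucast_mcast;   /* 12-bit UCast MCast field */
} pr_fields_t;

static unsigned popcount(uint32_t v) { unsigned n = 0; for (; v; v &= v - 1) n++; return n; }

void pr_preprocess(const pr_fields_t *pr, bool send_exceptions, uint8_t *flags,
                   copy_qm_prep_t *qm, copy_plugins_prep_t *pl)
{
    if (pr->nh_ip_valid && !pr->nh_mac_valid && !pr->ip_mc_valid) {   /* needs ARP */
        *flags |= FLAG_ARP;
        if (send_exceptions) { pl->pkt_cnt++; pl->pr_arp = true; }
    } else if (pr->nh_ip_valid == pr->nh_mac_valid) {                   /* mis-configured */
        *flags |= FLAG_NH_INV;
        if (send_exceptions) { pl->pkt_cnt++; pl->pr_nh_inv = true; }
    } else if (!pr->ip_mc_valid) {
        /* Unicast, assumed: Reserved[11:8] D[7] PPS[6] Port[5:3] Plugin[2:0] */
        bool drop = (pr->ucast_mcast >> 7) & 1u;
        bool pps  = (pr->ucast_mcast >> 6) & 1u;
        if (drop)
            return;
        if (!pps) { qm->pkt_cnt++; qm->pr_ucast = true; }
        else      { pl->pkt_cnt++; pl->pr_ucast = true; }
    } else {
        /* Multicast, assumed: PPS[11] CopyVector[10:0] */
        bool pps = (pr->ucast_mcast >> 11) & 1u;
        uint16_t vec = pr->ucast_mcast & 0x7FFu;
        if (!pps) {
            qm->pkt_cnt = popcount(vec >> 6u);    /* high 5 bits */
            qm->pr_mcast = qm->pkt_cnt > 0;
        }
        pl->pkt_cnt = popcount(vec & 0x3Fu);      /* low 6 bits */
        pl->pr_mcast = pl->pkt_cnt > 0;
    }
}
```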
Copy (pre-process AR)
• If AR is valid:
  • If NH_IP is valid and NH_MAC is invalid, this pkt needs ARP.
    • Set ARP bit in flags.
    • If the pkt should be sent to XScale/plugins (rather than dropped): plugins.pkt_cnt++; plugins.ar_arp <= TRUE.
  • Else if NH_IP and NH_MAC are both invalid or both valid, the lookup entry is mis-configured.
    • Set NH_INV bit in flags.
    • If the pkt should be sent to XScale/plugins (rather than dropped): plugins.pkt_cnt++; plugins.ar_nh_inv <= TRUE.
Copy (pre-process AR)
• AR field: SB (2b) | Rsv (2b) | D (1b) | PPS (1b) | UCast Out Port (3b) | UCast Out Plugin (3b)
• Else:
  • Each SB value is associated with a sampling rate rt(SB) that is dynamically configurable by users.
  • Generate a random number rd.
  • If rd <= rt(SB):
    • If PPS == 0: qm.ar_ucast <= TRUE, qm.pkt_cnt++.
    • Else (PPS == 1): plugins.ar_ucast <= TRUE, plugins.pkt_cnt++.
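AR pre-processing from the two slides above, as a C sketch. The sampling rates rt(SB) are modelled as 16-bit thresholds compared against a caller-supplied 16-bit random number `rd` (for example `(uint16_t)rand()`), so a threshold of 0xFFFF samples every pkt; that encoding, the struct names and `send_exceptions` are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_ARP    (1u << 4)   /* ARP_Needed bit */
#define FLAG_NH_INV (1u << 5)   /* NH_Invalid bit */

typedef struct { unsigned pkt_cnt; bool ar_ucast; } ar_qm_prep_t;   /* subset of copy_qm_prep */
typedef struct { unsigned pkt_cnt; bool ar_arp, ar_nh_inv, ar_ucast; } ar_plugins_prep_t;

/* Hypothetical per-SB sampling thresholds rt(SB), user configurable. */
typedef struct { uint16_t rt[4]; } sample_cfg_t;

void ar_preprocess(bool nh_ip_valid, bool nh_mac_valid, unsigned sb, bool pps,
                   bool send_exceptions, const sample_cfg_t *cfg, uint16_t rd,
                   uint8_t *flags, ar_qm_prep_t *qm, ar_plugins_prep_t *pl)
{
    if (nh_ip_valid && !nh_mac_valid) {                 /* needs ARP */
        *flags |= FLAG_ARP;
        if (send_exceptions) { pl->pkt_cnt++; pl->ar_arp = true; }
    } else if (nh_ip_valid == nh_mac_valid) {           /* mis-configured entry */
        *flags |= FLAG_NH_INV;
        if (send_exceptions) { pl->pkt_cnt++; pl->ar_nh_inv = true; }
    } else if (rd <= cfg->rt[sb & 3u]) {                /* sampled: make an AR copy */
        if (!pps) { qm->ar_ucast = true; qm->pkt_cnt++; }
        else      { pl->ar_ucast = true; pl->pkt_cnt++; }
    }
}
```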
Copy (post-process)
• If qm.pkt_cnt + plugins.pkt_cnt == 0: dlNextBlock <= BID_FREELISTMGR, dl_sink(TRUE, TRUE, 0). Return.
• Read ref_cnt from payload buf desc.
• If qm.pkt_cnt + plugins.pkt_cnt + (ref_cnt - 1) == 1:
  • If qm.pkt_cnt == 1, DO NOT add hdr buf desc.
    • Based on qm.pr_ucast, qm.pr_mcast, qm.ar_ucast:
      • Fill in NH_MAC, stats index, and eth_type in payload buf desc.
      • Construct QM ring_out data.
    • dlNextBlock <= BID_QM.
    • dl_sink(TRUE, TRUE, 0). Return.
  • Else (plugins.pkt_cnt == 1):
    • Based on plugins.*, construct plugins/XScale ring_out data.
    • Set dlNextBlock to XScale or one of the five plugins.
    • dl_sink(TRUE, TRUE, 1). Return.
Copy (post-process)
• Else (qm.pkt_cnt + plugins.pkt_cnt + (ref_cnt - 1) > 1):
  • Add (qm.pkt_cnt + plugins.pkt_cnt - 1) to ref_cnt in payload buf desc.
  • If qm.pkt_cnt > 0:
    • For each copy to QM, add a hdr buf desc.
    • Based on qm.pr_ucast, qm.pr_mcast, qm.ar_ucast:
      • Fill in buffer_next, packet_size, ref_cnt (= 1), NH_MAC, stats index, and eth_type in the hdr buf desc.
      • Construct QM ring_out data.
    • dlNextBlock <= BID_QM.
    • Call dl_sink() qm.pkt_cnt times.
    • If plugins.pkt_cnt == 0, set the “last” bit in dl_sink() for the last copy and return.
  • If plugins.pkt_cnt > 0:
    • Based on plugins.*, construct plugins/XScale ring_out data.
    • Set dlNextBlock to XScale or one of the five plugins.
    • Call dl_sink() plugins.pkt_cnt times.
    • Set the “last” bit in dl_sink() for the last copy and return.
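The decision logic of the two post-processing slides, reduced to a C sketch that only plans the work: whether to free the pkt, whether QM copies need hdr bufs, how much to add atomically to the payload ref_cnt, and how many dl_sink() calls to make (the last of which sets “last”). `copy_plan_t` and `copy_plan` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool to_freelist;       /* no copies at all: free the pkt */
    bool add_hdr_bufs;      /* one hdr buf desc per QM copy */
    unsigned ref_cnt_add;   /* amount atomically added to payload ref_cnt */
    unsigned n_sinks;       /* number of dl_sink() calls */
} copy_plan_t;

/* ref_cnt: value read from the payload buf desc (always >= 1). */
copy_plan_t copy_plan(unsigned qm_cnt, unsigned plugins_cnt, unsigned ref_cnt)
{
    copy_plan_t p = { false, false, 0, 0 };
    unsigned copies = qm_cnt + plugins_cnt;
    if (copies == 0) {
        p.to_freelist = true;
        p.n_sinks = 1;
        return p;
    }
    if (copies + (ref_cnt - 1) == 1) {   /* one copy, sole reference */
        p.n_sinks = 1;                   /* a QM copy reuses the payload buf desc */
        return p;
    }
    p.ref_cnt_add  = copies - 1;
    p.add_hdr_bufs = qm_cnt > 0;         /* shared payload: QM copies need hdr bufs */
    p.n_sinks      = copies;
    return p;
}
```

Computing all counts before any enqueue is what allows the single atomic ref_cnt update the pre-processing slide asks for.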
PLC Typical Case
• Scenario: valid IP pkt from Rx, no options, goes to one port as a result of PR; AR entry is invalid.
• Operation: memory access, cycle count
  • dl_source: Dequeue pkt from MUX: 3LW Scratch Ring read, 60
  • Parse: Read eth_type, IP, TCP/UDP hdr: aligned 10LW DRAM read, 300
  • Parse: Write TTL, IP cksum: aligned 4LW DRAM write, 300
  • Lookup: Lookup, 915
  • Copy: Write NH MAC, stats index, eth_type: aligned 3LW SRAM write, 150
  • dl_sink: Enqueue pkt to QM: 3LW Scratch Ring write, 60
• Total: 60 + 300 + 300 + 915 + 150 + 60 = 1785 cycles