Code Review for IPv4 Metarouter Header Format
Jing Lu, jl1@arl.wustl.edu

Header Format
[Figure: block diagram with the blocks Rx, Substr Decap, Parse, Lookup, Header Format, QM, Tx]
• Main functions:
  • Prepend the MN internal header (slow path), the tunnel frame header (IP/UDP header) and the Ethernet VLAN header, based on:
    • Exception flags raised by the Parse block
      • TTL expired: bit 0 of exception flags
      • IP option: bit 1 of exception flags
    • Lookup result
      • Hit, Drop, Local Delivery bits
      • If Rx UDP DPort = Tx UDP SPort, the packet should be redirected
  • Increment the pre-queue packet counter and byte counter for each incoming packet, based on the counter index
  • Update the buffer descriptor with the new buffer/packet size, buffer offset and counter index
  • Pass relevant fields to QM
• NN communication
• Single thread

Where is the code
• Dispatch loop:
  • IPv4_MR\src\dispatch_loop\PL\hdr_format_dl.[c,h]
  • IPv4_MR\src\dispatch_loop\PL\dl_source.[c,h]
  • IPv4_MR\src\dispatch_loop\PL\nn_rings.[c,h]
• Header format:
  • IPv4_MR\src\hdr_format\PL\hdr_format.[c,h]
• IPv4 header format:
  • IPv4_MR\src\ipv4\PL\ipv4_hdr_format.[c,h]
• External dependencies:
  • Ring data format:
    • IPv4_MR/src/dispatch_loop/PL/ring_formats.h
  • System definitions and memory locations:
    • IPv4_MR/build/PL/dispatch_loop/dl_system.h

Required Includes
• Files
  • IXA_SDK_4.0\microengineC\src\intrinsic.c
  • IXA_SDK_4.0\microengineC\src\rtl.c
• Directories
  • IXA_SDK_4.0\src\library\microblocks_library\microc\
  • IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
  • IXA_SDK_4.0\src\library\dataplane_library\microc\
• These are required to gain access to the buffer libraries and intrinsic functions.
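
Input and Output
[Figure: Lookup block feeding Hdr Format, Hdr Format feeding QM, with the ring data formats below]

Input ring data, Lookup to Hdr Format:
  Word 1: Buf Handle (32b)
  Word 2: IP Pkt Length (16b) | IP Pkt Offset (16b)
  Word 3: Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
  Word 4: Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
  Word 5: Tx IP DAddr (32b)
  Word 6: Tx UDP DPort (16b) | Tx UDP SPort (16b)
  Word 7: DA (8b) | Port (4b) | QID (20b)
  Word 8: Slice data pointer (32b)
  Word 9: Rsvd_2 (28b) | Code opt (4b)

  H: Hit, D: Drop, LD: Local Delivery
  Exception[0]: TTL, Exception[1]: IP Option

Output ring data, Hdr Format to QM:
  Word 1: Buf Handle (32b)
  Word 2: Rsv_1 (4b) | QID (20b) | Port (4b) | Rsv_2 (4b)
  Word 3: MN Frame Length (16b) | Cntr Index (16b)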
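
The flags word of the Lookup to Hdr Format ring data and the QID/Port word of the Hdr Format to QM ring data can be handled with plain shifts and masks. The following is a minimal sketch in portable C, not the Microengine C of the reviewed code. It assumes the fields are packed most-significant-bit first in the order the slide lists them, which the slides do not state; the names `hf_flags`, `decode_flags_word` and `pack_qm_word` are hypothetical.

```c
#include <stdint.h>

/* Decoded view of the flags word of the Lookup -> Hdr Format ring data.
 * ASSUMPTION: fields are packed MSB-first in the listed order:
 * Rsvd(1) H(1) LD(1) D(1) Exception(12) Cntr Index(16). */
struct hf_flags {
    unsigned hit;
    unsigned local_delivery;
    unsigned drop;
    unsigned ttl_expired;   /* Exception[0] */
    unsigned ip_option;     /* Exception[1] */
    uint16_t cntr_index;
};

static struct hf_flags decode_flags_word(uint32_t w)
{
    struct hf_flags f;
    f.hit            = (w >> 30) & 1u;
    f.local_delivery = (w >> 29) & 1u;
    f.drop           = (w >> 28) & 1u;
    f.ttl_expired    = (w >> 16) & 1u;  /* bit 0 of the 12 exception bits */
    f.ip_option      = (w >> 17) & 1u;  /* bit 1 of the 12 exception bits */
    f.cntr_index     = (uint16_t)(w & 0xFFFFu);
    return f;
}

/* Second word of the Hdr Format -> QM ring data.
 * ASSUMPTION: Rsv_1(4) QID(20) Port(4) Rsv_2(4), MSB-first; reserved bits zero. */
static uint32_t pack_qm_word(uint32_t qid, uint32_t port)
{
    return ((qid & 0xFFFFFu) << 8) | ((port & 0xFu) << 4);
}
```

Under these assumptions, a flags word of `0x40010042` decodes to Hit set, TTL exception set and counter index `0x0042`.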
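
Initialization
• Static configuration by XScale (control block)
  • This metarouter
    • Ethernet address
    • IP address
    • Number of GPEs
    • Pointer to the GPE info table
  • GPE table
    • Low 8 bits of Ethernet address
    • Low 8 bits of IP address
    • Local delivery:
      • UDP DPort
      • Queue ID
    • Other exceptions:
      • UDP DPort
      • Queue ID
• Hard-coded configuration (#define)
  • Local delivery:
    • UDP SPort
    • QM port
  • Other exceptions:
    • UDP SPort
    • QM port

    /* One entry per GPE; declared first because the control block points to it. */
    typedef struct _gpe_info_table {
        unsigned int gpe_eth_addr_lo8;    /* low 8 bits of the GPE Ethernet address */
        unsigned int gpe_ip_addr_lo8;     /* low 8 bits of the GPE IP address */
        unsigned int ld_udp_dst_port;     /* local delivery: UDP DPort */
        unsigned int ld_qid;              /* local delivery: queue ID */
        unsigned int excpt_udp_dst_port;  /* other exceptions: UDP DPort */
        unsigned int excpt_qid;           /* other exceptions: queue ID */
    } gpe_info_table;

    /* Control block written by the XScale. */
    typedef struct _hdr_format_control_block {
        unsigned int eth_addr_hi32;       /* Ethernet address of this metarouter, high 32 bits */
        unsigned int eth_addr_lo16;       /* Ethernet address of this metarouter, low 16 bits */
        unsigned int this_ip_addr;        /* IP address of this metarouter */
        unsigned int gpe_num;             /* number of GPEs */
        gpe_info_table *gpe_table_ptr;    /* pointer to the GPE info table */
    } hdr_format_control_block;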

Global Variables
• Externally defined global variables:
  • hdr_format_dl.c
    • ring_in
    • ring_out
    • dlNextBlock
• Initialization variables shared by all threads
  • this_ip_addr
  • gpe_ip_addr
  • eth_addr_hi32
  • this_eth_addr_lo16
  • gpe_eth_addr_lo16
  • ld_udp_dst_port
  • ld_port_qid
  • excpt_udp_dst_port
  • excpt_port_qid
  • partial_ip_cksum
• ipv4_header_format_init() reads the control block in SRAM and initializes these variables with the corresponding values
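
The list above includes `partial_ip_cksum` among the values set up once at initialization. One plausible reading, which is an assumption and not confirmed by the slides, is that it holds the one's-complement sum of the tunnel IP header fields that never change, so that the per-packet checksum step only has to add the packet length and destination address. The sketch below illustrates that idea in portable C; the function names and the choice of which fields count as static are hypothetical.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Computed once at init.
 * ASSUMPTION: Ver/HLen/Tos, ID, Flags/FragOffset, TTL, Protocol and the
 * source address are the same for every packet the block emits. */
static uint32_t partial_ip_cksum_init(uint16_t ver_hlen_tos, uint16_t id,
                                      uint16_t flags_fragoff, uint8_t ttl,
                                      uint8_t protocol, uint32_t src_addr)
{
    uint32_t sum = 0;
    sum += ver_hlen_tos;
    sum += id;
    sum += flags_fragoff;
    sum += ((uint32_t)ttl << 8) | protocol;
    sum += src_addr >> 16;
    sum += src_addr & 0xFFFFu;
    return sum;
}

/* Per packet: add the variable fields, fold, and complement. */
static uint16_t ip_cksum_finish(uint32_t partial, uint16_t total_len,
                                uint32_t dst_addr)
{
    uint32_t sum = partial + total_len + (dst_addr >> 16) + (dst_addr & 0xFFFFu);
    return (uint16_t)~fold16(sum);
}
```

For a header with Ver/HLen/Tos `0x4500`, length `0x0073`, flags `0x4000`, TTL `0x40`, protocol 17 (UDP), source 192.168.0.1 and destination 192.168.0.199, the two steps together yield the checksum `0xB861`.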

Header Data Structure
Ethernet VLAN header:
  DstAddr (6B)
  SrcAddr (6B)
  Type = 802.1Q (2B)
  VLAN (2B)
  Type = IP (2B)
IP header:
  Ver/HLen/Tos (2B)
  Len (2B)
  ID/Flags/FragOffset (4B)
  TTL (1B)
  Protocol = UDP (1B)
  Hdr Cksum (2B)
  Src Addr (4B)
  Dst Addr (4B)
UDP header:
  Src Port (2B)
  Dst Port (2B)
  UDP length (2B)
  UDP checksum (2B)
MN internal header:
  Type, hdr_length (2B)
  Rx UDP DPort (2B)
  Type dependent data (8B)
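
The field sizes above add up to 18B (Ethernet VLAN), 20B (IP) and 8B (UDP), or 46B in total, plus 4B or 12B of MN internal header when one is present. This is consistent with the "46-58B" DRAM write size quoted in the performance figures. The sketch below checks that arithmetic with byte-array structs in portable C; the struct and function names are hypothetical and the layout is illustrative only, not the layout used by the reviewed code.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-array members keep every field at its on-the-wire size. */
struct eth_vlan_hdr {
    uint8_t dst_addr[6];
    uint8_t src_addr[6];
    uint8_t type_8021q[2];
    uint8_t vlan[2];
    uint8_t type_ip[2];
};

struct ip_hdr {
    uint8_t ver_hlen_tos[2];
    uint8_t len[2];
    uint8_t id_flags_fragoff[4];
    uint8_t ttl;
    uint8_t protocol;
    uint8_t hdr_cksum[2];
    uint8_t src_addr[4];
    uint8_t dst_addr[4];
};

struct udp_hdr {
    uint8_t src_port[2];
    uint8_t dst_port[2];
    uint8_t length[2];
    uint8_t checksum[2];
};

/* Compile-time checks of the per-header sizes listed on the slide. */
_Static_assert(sizeof(struct eth_vlan_hdr) == 18, "Ethernet VLAN header is 18B");
_Static_assert(sizeof(struct ip_hdr) == 20, "IP header is 20B");
_Static_assert(sizeof(struct udp_hdr) == 8, "UDP header is 8B");

/* Total bytes prepended, given the MN internal header length (0, 4 or 12). */
static size_t prepended_hdr_bytes(size_t mn_int_hdr_len)
{
    return sizeof(struct eth_vlan_hdr) + sizeof(struct ip_hdr)
         + sizeof(struct udp_hdr) + mn_int_hdr_len;
}
```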

Function and Performance

Functions                                Processing cycles      Memory access
                                         (common/worst case)
Dequeue ring_in data                     42/42                  NN: 9W reads
Construct MN int hdr                     44/86
Construct IP, UDP, Ethernet, VLAN hdr    64/73
Set IP checksum                          12/12
Set UDP checksum                         11/11
Write hdr to DRAM                        37/40                  DRAM: 46-58B writes
Inc Pre_queue Cnt                        15/15                  SRAM: 8B writes
Update buffer descriptor                 66/66                  SRAM: 10B writes
Enqueue ring_out data                    27/27                  NN: 3W writes
Total                                    318/372

Performance
• 372 cycles of CPU processing per packet (worst case)
• ~1300 cycles latency
• Expected performance for a 90B minimum IPv4 packet (78B minimum IPv4 MN frame + 12B IFS):
  • (201/372) * 5Gbps = 2.7Gbps
• To achieve 5Gbps, two MEs need to run in parallel
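
The throughput estimate can be reproduced with a few lines of arithmetic. The slides do not say where the 201-cycle figure comes from; it is consistent with the per-packet cycle budget of a 90B packet at 5Gbps on a 1.4GHz microengine, and that clock rate is an assumption here. The function names are hypothetical.

```c
/* Cycles one ME has per packet if it must keep up with the line rate.
 * ASSUMPTION: the ME clock rate is not given in the slides. */
static double cycle_budget(unsigned pkt_bytes, double line_rate_gbps,
                           double me_clock_ghz)
{
    double pkt_time_ns = (pkt_bytes * 8.0) / line_rate_gbps; /* bits / (Gbit/s) = ns */
    return pkt_time_ns * me_clock_ghz;                       /* ns * cycles per ns */
}

/* Throughput of a single ME that spends cycles_per_pkt on each packet. */
static double throughput_gbps(unsigned budget_cycles, unsigned cycles_per_pkt,
                              double line_rate_gbps)
{
    return (double)budget_cycles / cycles_per_pkt * line_rate_gbps;
}

/* Number of MEs needed in parallel to reach the line rate (rounded up). */
static unsigned mes_needed(unsigned budget_cycles, unsigned cycles_per_pkt)
{
    return (cycles_per_pkt + budget_cycles - 1) / budget_cycles;
}
```

With these inputs, `cycle_budget(90, 5.0, 1.4)` is about 201.6 cycles, `throughput_gbps(201, 372, 5.0)` is about 2.7, and `mes_needed(201, 372)` is 2, matching the figures on the slide.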

IPv4 Internal Header Format
  Type, Length (2B)
  Rx UDP DPort (2B)
  Type Dependent Data (8B):
    Tx UDP DPort (2B)
    Tx UDP SPort (2B)
    Tx IP DAddr (4B)
• Type dependent data is the FwdKey = [Tx UDP DPort + Tx UDP SPort + Tx IP DAddr]

Construct IPv4 MN Internal Header
[Flowchart: decisions and actions]
• Drop bit set?
  • Yes: return
  • No: Hit bit set?
    • No: Set NR bit in type; TTL expired?
      • Yes: Set TTL bit in type; Length = 4
    • Yes:
      • TTL expired? Yes: Set TTL bit in type; Set Rx UDP DPort; Length = 4
      • Local DL? Yes: Set LD bit in type; Set Rx UDP DPort; Length = 4
      • IP option? Yes: Set OPT bit in type; Set Rx UDP DPort; Set TypeDependData; Length = 12
      • Redirect? Yes: Set RD bit in type; Set Rx UDP DPort; Set TypeDependData; Length = 12
      • No to all: return
• 44 cycles for the common case
• 86 cycles for the worst case
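
The decision flow for building the MN internal header can be sketched as a single function. This is a reading of the flowchart, not the reviewed code: the bit values of the type flags are hypothetical, the checks on the hit path are assumed to be mutually exclusive in the order TTL, local delivery, IP option, redirect, and the no-route case is assumed to always produce a 4-byte header. The redirect test uses the rule Rx UDP DPort = Tx UDP SPort stated earlier in the slides.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: bit positions of the type flags are made up for illustration. */
enum {
    MN_TYPE_NR  = 1 << 0,  /* no route (lookup miss) */
    MN_TYPE_TTL = 1 << 1,  /* TTL expired */
    MN_TYPE_LD  = 1 << 2,  /* local delivery */
    MN_TYPE_OPT = 1 << 3,  /* IP option present */
    MN_TYPE_RD  = 1 << 4   /* redirect */
};

struct hf_input {
    unsigned hit, drop, local_delivery, ttl_expired, ip_option;
    uint16_t rx_udp_dport, tx_udp_dport, tx_udp_sport;
    uint32_t tx_ip_daddr;
};

struct mn_int_hdr {
    uint8_t  type;
    uint8_t  length;        /* 0 means no internal header (fast path or drop) */
    uint16_t rx_udp_dport;
    uint16_t tx_udp_dport;  /* type dependent data (FwdKey) starts here */
    uint16_t tx_udp_sport;
    uint32_t tx_ip_daddr;
};

static void set_long_hdr(struct mn_int_hdr *h, const struct hf_input *in, uint8_t type)
{
    h->type = type;
    h->rx_udp_dport = in->rx_udp_dport;
    h->tx_udp_dport = in->tx_udp_dport;
    h->tx_udp_sport = in->tx_udp_sport;
    h->tx_ip_daddr  = in->tx_ip_daddr;
    h->length = 12;
}

/* Returns the internal header length in bytes: 0, 4 or 12. */
static unsigned build_mn_int_hdr(const struct hf_input *in, struct mn_int_hdr *h)
{
    memset(h, 0, sizeof *h);
    if (in->drop)
        return 0;
    if (!in->hit) {                      /* lookup miss */
        h->type = MN_TYPE_NR;
        if (in->ttl_expired)
            h->type |= MN_TYPE_TTL;
        h->length = 4;
    } else if (in->ttl_expired) {
        h->type = MN_TYPE_TTL;
        h->rx_udp_dport = in->rx_udp_dport;
        h->length = 4;
    } else if (in->local_delivery) {
        h->type = MN_TYPE_LD;
        h->rx_udp_dport = in->rx_udp_dport;
        h->length = 4;
    } else if (in->ip_option) {
        set_long_hdr(h, in, MN_TYPE_OPT);
    } else if (in->rx_udp_dport == in->tx_udp_sport) {
        set_long_hdr(h, in, MN_TYPE_RD);
    }
    return h->length;                    /* stays 0 on the fast path */
}
```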

Testing MR Header Format
[Figure: Stub Parse feeding Dummy Lookup, Dummy Lookup feeding Hdr Format, with the ring data formats below]

Ring data, Stub Parse to Dummy Lookup:
  Word 1: Buf Handle (32b)
  Word 2: IP Pkt Length (16b) | IP Pkt Offset (16b)
  Word 3: Lookup Key[143-112]: Slice ID/Rx UDP DPort (32b)
  Word 4: Lookup Key[111-80]: DA (32b)
  Word 5: Lookup Key[79-48]: SA (32b)
  Word 6: Lookup Key[47-16]: Ports (32b)
  Word 7: L Flags (4b) | Exception Bits (12b) | Lookup Key[15-0]: Proto/TCP_Flags (16b)
  Word 8: Slice Data Ptr (32b)
  Word 9: Rsvd (28b) | Code opt (4b)

Ring data, Dummy Lookup to Hdr Format:
  Word 1: Buf Handle (32b)
  Word 2: IP Pkt Length (16b) | IP Pkt Offset (16b)
  Word 3: Slice ID (VLAN) (16b) | Rx UDP DPort (16b)
  Word 4: Rsvd (1b) | H (1b) | LD (1b) | D (1b) | Exception Bits (12b) | Cntr Index (16b)
  Word 5: Tx IP DAddr (32b)
  Word 6: Tx UDP DPort (16b) | Tx UDP SPort (16b)
  Word 7: DA (8b) | Port (4b) | QID (20b)
  Word 8: Slice data pointer (32b)
  Word 9: Rsvd_2 (28b) | Code opt (4b)

  H: Hit, D: Drop, LD: Local Delivery
  Exception[0]: TTL, Exception[1]: IP Option

• The Dummy Lookup block enumerates all combinations of the five bits (H, LD, D, Exception[0], Exception[1]) and generates the corresponding NN ring data for Hdr Format.
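
Enumerating the five bits gives 32 test cases. The sketch below generates the flags word for each case in portable C. It assumes the same most-significant-bit-first packing of the flags word as listed in the ring data format (Rsvd, H, LD, D, 12 exception bits, 16-bit counter index), which the slides do not state explicitly; the macro and function names are hypothetical and this is not the Dummy Lookup code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: MSB-first packing of the flags word. */
#define HF_HIT        (1u << 30)
#define HF_LOCAL_DL   (1u << 29)
#define HF_DROP       (1u << 28)
#define HF_EXC_IP_OPT (1u << 17)  /* Exception[1] */
#define HF_EXC_TTL    (1u << 16)  /* Exception[0] */

#define HF_NUM_COMBOS 32u         /* 2^5 combinations of the five bits */

/* Fill out[] with one flags word per combination; returns the count. */
static size_t gen_flag_words(uint32_t out[HF_NUM_COMBOS], uint16_t cntr_index)
{
    unsigned c;
    for (c = 0; c < HF_NUM_COMBOS; c++) {
        uint32_t w = cntr_index;
        if (c & 0x01u) w |= HF_EXC_TTL;
        if (c & 0x02u) w |= HF_EXC_IP_OPT;
        if (c & 0x04u) w |= HF_DROP;
        if (c & 0x08u) w |= HF_LOCAL_DL;
        if (c & 0x10u) w |= HF_HIT;
        out[c] = w;
    }
    return HF_NUM_COMBOS;
}
```

A test stub could write each generated word, together with fixed values for the other eight ring words, and compare the header that Hdr Format produces against the expected case.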

Possible Optimizations

Functions                                Cycles     Saving   Memory access          Optimization
Dequeue ring_in data                     42/42      -10      NN: 9W reads           More efficient dequeue
Construct MN int hdr                     44/86      -15                             Reduce redundant assignments for worst case
Construct IP, UDP, Ethernet, VLAN hdr    64/73      -20                             Static fields only initialized by the first packet in each thread
Set IP checksum                          12/12
Set UDP checksum                         11/11
Write hdr to DRAM                        37/40               DRAM: 46-58B writes
Inc Pre_queue Cnt                        15/15      -6       SRAM: 8B writes        Aligned SRAM writes, use assembler
Update buffer descriptor                 66/66      -30      SRAM: 10B writes       Similar to DRAM writes
Enqueue ring_out data                    27/27               NN: 3W writes
Total                                    318/372    -81

Implementation Status
• Add dynamic statistics
  • Packet counter for fast path packets
  • Packet counter for exception path packets
  • Packet counter per exception case
• Decide which field in the buffer descriptor should store the counter index
• Run 8-thread simulation