SPP V2 Router Design

SPP V2 Router Design. John DeHart and Mike Wilson. Revision History. 3 June 2008 Initial release, presentation 25 June 2008 Updates on feedback from presentation 27 July 2009 Current status, changes, Control documentation 24 August 2009 Updates from debugging, simulation.

Presentation Transcript


  1. SPP V2 Router Design. John DeHart and Mike Wilson

  2. Revision History • 3 June 2008: Initial release, presentation • 25 June 2008: Updates on feedback from presentation • 27 July 2009: Current status, changes, Control documentation • 24 August 2009: Updates from debugging, simulation

  3. Current Status: Summary • Memory Layout • Done, may need revisiting • Scripts (.ind files) done, missing TCAM initialization • NPUA blocks written, simulates, some GPE-to-NPE problems • NPUB broken, needs some changes • Needs RxB SRAM Ring fix • HdrFmt needs internal header fix • Recent changes to LookupB/Copy not yet added • Need some changes to TxB for chained buffers • Recent changes: • Exception, Local Delivery packets omitted in original design • Necessitates changes to Parse • Changed ResultTable indexing • Impacts LookupB/Copy

  4. SPP Versions • SPP Version 0: • What we used for SIGCOMM Paper • SPP Version 1: • Bare minimum we would need to release something to PlanetLab Users • SPP Version 2: • What we would REALLY like to release to PlanetLab users.

  5. Objectives for SPP-NPE version 2 • Deal with constraints imposed by switch • can send to only one NPU; can receive from only one NPU • split processing across NPUs • parsing, lookup on one; queuing on other • Provide more resources for slice-specific processing • Decouple QM schedulers from links • collection of largely independent schedulers • may use several to send to the same link • e.g. separate rate classes (1-10M, 10-100M, 100-1000M) • optionally adjust scheduler rates dynamically • Provide support for multicast • requires addition of next-hop IP address after queuing • Enable single slice to operate at 10 Gb/s • Support “slow” code options • Use separate rate classes to limit rate to slow code options • LCI QMs for Parse, NPUB QMs for HdrFmt

  6. SPP Version 2 System Architecture: Default Data Path [Figure: GPE blades; LC 7010 blade with LC Ingress and LC Egress; NPE 7010 blade with NPUA (Decap, Parse, Lookup, AddShim) and NPUB (Copy, QM, HdrFormat); switch blade; RTM with 1×10 Gb/s or 10×1 Gb/s; FICs and SPI switches.]

  7. SPP Version 2 System Architecture: Fast-Path Data [Figure: same components as slide 6, showing the fast-path data path.]

  8. SPP Version 2 System Architecture: Exception Data Path / Local Delivery [Figure: same components as slide 6, showing the exception and local delivery path.]

  9. NPE Version 2 Block Diagram [Figure] • NPUA: RxA (2 ME), Decap (1 ME), Parse (8 ME), LookupA (1 ME), AddShim (1 ME), TxA (2 ME), StatsA (1 ME); attached SRAM and TCAM • NPUB: RxB (2 ME), LookupB&Copy (2 ME), QueueManager (4 MEs), HdrFmt/SubEncap (4 MEs), TxB (2 ME), StatsB (1 ME), Scr2NN/Freelist (1 ME); attached SRAM, with labels SRAM/0 and SRAM/3 • External: GPE, SPI Switch, Switch Blade

  10. NPE Version 2 Block Diagram [Figure: blocks as in slide 9, with the connections between blocks labeled by ring type and size: scratch rings (Scr/256, Scr/512, Scr/1024), next-neighbor (NN) rings, and SRAM rings (SRAM/0, SRAM/3).]

  11. NPE Version 2 Block Diagram [Figure: same blocks as slide 9.]

  12. PlanetLab NPE Input Frame from LC or GPE • Ethernet Header: • DstAddr: MAC address of NPE • SrcAddr: MAC address of LC or GPE • VLAN: One VLAN per MR (MR == Slice) • Only use lower 11 bits of VLAN tag • IP Header: • Dst Addr: IP address of this node • How many IP addresses can a node have? • Src Addr: IP address of previous hop • Protocol: UDP • UDP Header: • Dst Port: Identifies input tunnel • Src Port: with IP Src Addr identifies sending entity
      Frame layout (figure marks 8-byte boundaries, assuming no IP options):
      Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
      IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
      UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
      UDP Payload (MN Packet)
      Ethernet Trailer: PAD (nB), CRC (4B)
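The field sizes on this slide fix the byte offsets of the VLAN tag and of the MN packet. The following is a minimal sketch in C of how the slice ID and the payload offset could be derived from a received frame. The helper names (`slice_id_from_frame`, `mn_packet_offset`) are hypothetical, and the code assumes network byte order and a frame that always carries the 802.1Q tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets derived from the slide's field sizes (VLAN-tagged Ethernet). */
enum {
    ETH_VLAN_HDR_LEN = 18,   /* DstAddr 6 + SrcAddr 6 + Type 2 + VLAN 2 + Type 2 */
    VLAN_TCI_OFFSET  = 14,   /* the 2-byte VLAN field follows Type=802.1Q */
    UDP_HDR_LEN      = 8,    /* Src Port, Dst Port, length, checksum */
    SLICE_ID_MASK    = 0x7FF /* only the lower 11 bits of the VLAN tag are used */
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: slice ID = lower 11 bits of the VLAN tag. */
uint16_t slice_id_from_frame(const uint8_t *frame, size_t len)
{
    assert(len >= ETH_VLAN_HDR_LEN);
    return (uint16_t)(read_be16(frame + VLAN_TCI_OFFSET) & SLICE_ID_MASK);
}

/* Hypothetical helper: byte offset of the UDP payload (the MN packet).
 * The IP header length comes from the HLen nibble, so IP options
 * (0-40B) are accounted for even though the slide assumes none. */
size_t mn_packet_offset(const uint8_t *frame, size_t len)
{
    assert(len > ETH_VLAN_HDR_LEN);
    size_t ip_hdr_len = (size_t)(frame[ETH_VLAN_HDR_LEN] & 0x0F) * 4;
    return ETH_VLAN_HDR_LEN + ip_hdr_len + UDP_HDR_LEN;
}
```

With no IP options the MN packet starts at byte 18 + 20 + 8 = 46 of the frame.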

  13. Local Delivery / Exceptions • GPE has separate tunnels for LD and EX • Standard filters handle these packets • No internal packet headers required, although we can still use internal headers for exceptions • Return path from GPE uses same tunnels • Standard filters handle re-classify cases • Internal packet headers from GPE to NPE are MNet-specific • Provides filter key for GPE-routed packets • Substrate headers unchanged • MN frames carry code-option-specific details, filter key • For IPv4, MN frame has IP version 0, payload has 112b lookup key to use. If GPE wants to reclassify, it sends a normal packet.

  14. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown: • Word 0: V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • Word 1: Eth. Frame Len (16b), Reserved (12b), Port (4b)

  15. RxA • No change from V1

  16. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • Eth. Frame Len (16b), Reserved (12b), Port (4b)
      Format 2: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Slice ID (VLAN) (16b), Rx UDP DPort (16b) • Rx IP SAddr (32b) • Rx UDP SPort (16b), Reserved (12b), Code (4b) • Slice Data Ptr (32b)

  17. Decap • Inputs: • Packet from RxA • Outputs: • Meta-frame (handle, offset and length) • Slice ID (VLAN tag) • Actually, lower 11b of VLAN tag and lower 4b of Rx DA (for RxID) • Metainterface (Rx Saddr, Rx Sport, Rx Dport) • Code Option (4b, only 16 available) • Slice data pointer • Initialization: • VLAN table, NPE MAC Address • Functionality: • Read VLAN tag from DRAM, determine correct code option. • Validate packet. Drop invalid, unmatched packets. • IP Options for NPE dropped in LC, should never arrive here! • Enqueue valid packets to Scratch ring. • Update stats • Status: • Works for valid packets, invalid packet handling untested

  18. VLAN table • code_option = 0 implies invalid slice • “on switch” for a slice in the data plane • SD data is currently only counters • 56B slice data • Only use lower 11b of VLAN tag (2048 VLANs) • Only changes from V1: • No longer need all data on NPUA, drop HF data, per-slice buffer limits
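The lookup that Decap performs against this table can be sketched in C as follows. This is an illustration under stated assumptions, not the NPE microcode: the entry layout mirrors the three-word struct shown later in the Control slides, while the names `vlan_entry` and `vlan_lookup` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_TABLE_ENTRIES 2048u  /* 2^11 entries, one per usable VLAN */
#define VLAN_INDEX_MASK    0x7FFu /* lower 11 bits of the VLAN tag */
#define CODE_OPT_MASK      0xFu   /* only the 4 least significant bits are used */

/* Assumed entry layout, following the struct on the Control slides. */
struct vlan_entry {
    uint32_t code_opt;
    uint32_t slice_data_ptr;
    uint32_t slice_data_size;
};

/* Hypothetical lookup: returns the entry for a VLAN tag, or NULL when
 * code_opt == 0, which marks an invalid (switched-off) slice so the
 * caller can drop the packet. */
const struct vlan_entry *vlan_lookup(const struct vlan_entry *table,
                                     uint16_t vlan_tag)
{
    assert(table != NULL);
    const struct vlan_entry *e = &table[vlan_tag & VLAN_INDEX_MASK];
    if ((e->code_opt & CODE_OPT_MASK) == 0)
        return NULL;
    return e;
}
```

Because only 11 bits index the table, two VLAN tags that differ only in their upper bits resolve to the same entry, which is why Control has to keep the low 11 bits distinct.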

  19. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Slice ID (VLAN) (16b), Rx UDP DPort (16b) • Rx IP SAddr (32b) • Rx UDP SPort (16b), Reserved (12b), Code (4b) • Slice Data Ptr (32b)
      Format 2: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Lookup Key[143-112]: Type (1b) / RxID (4b) / Slice ID (11b) / Rx UDP DPort (16b) • Lookup Key[111-80]: DA (32b) • Lookup Key[79-48]: SA (32b) • Lookup Key[47-16]: Ports (32b) • Code (4b), Exception Bits (12b), Lookup Key[15-0]: Proto/TCP_Flags (16b)

  20. Parse • Inputs: • Meta-frame (handle, offset and length) • Slice ID (VLAN tag, RxID) • Tunnel ID (Rx Saddr, Rx Sport, Rx Dport) • Code Option (4b, only 16 available) • Slice data pointer • Outputs: • Meta-frame (handle, offset and length) • Lookup key (Includes slice ID, Rx UDP dport) • Code Option (4b, only 16 available) • Exception bits (MN-specific) – do we still need these? (Probably) • Initialization: • Slice Data • Functionality: • Slice-specific processing: • Parse meta-frame. • Extract lookup key. • Raise any relevant exceptions. • Can pass slice data to HdrFmt in bytes 16..30 of packet. (0..15 are reserved for AddShim) • Substrate processing: • Add substrate-specific information to lookup key (32b: Lookup type, RxID, Slice ID, Rx UDP dport) • Status: • Needs internal packet handling from GPE for GPE-specified filter keys • Needs to use "special" filter key for exception path, 0x0. Substrate processing should still pre-pend substrate-specific key information (slice, MiID) • Works for normal (LCI-to-NPE) packets
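The 32-bit substrate portion of the lookup key (lookup type, RxID, slice ID, Rx UDP dport) can be sketched as a simple bit-packing routine. The function name `substrate_key_word` is hypothetical, and the bit positions assume the fields are packed most-significant-first in the order the slides list them (Type 1b, RxID 4b, Slice ID 11b, Rx UDP DPort 16b).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of Lookup Key[143-112].
 * Assumed bit layout (MSB first): Type[31] RxID[30:27] SliceID[26:16] DPort[15:0]. */
uint32_t substrate_key_word(unsigned type, unsigned rx_id,
                            unsigned slice_id, uint16_t rx_udp_dport)
{
    assert(type <= 0x1);       /* 1-bit lookup type */
    assert(rx_id <= 0xF);      /* 4-bit RxID */
    assert(slice_id <= 0x7FF); /* 11-bit slice ID */
    return ((uint32_t)type << 31) |
           ((uint32_t)rx_id << 27) |
           ((uint32_t)slice_id << 16) |
           (uint32_t)rx_udp_dport;
}
```

The four field widths sum to 1 + 4 + 11 + 16 = 32 bits, so the substrate information fills exactly the top word of the 144-bit key and the slice-specific code never needs to touch it.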

  21. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Lookup Key[143-112]: Type (1b) / RxID (4b) / Slice ID (11b) / Rx UDP DPort (16b) • Lookup Key[111-80]: DA (32b) • Lookup Key[79-48]: SA (32b) • Lookup Key[47-16]: Ports (32b) • Code (4b), Exception Bits (12b), Lookup Key[15-0]: Proto/TCP_Flags (16b)
      Format 2: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Rsvd (16b), Stats Index (16b) • Slice ID (VLAN) (16b), Exception Bits (12b), Code (4b) • Result Index (32b)

  22. LookupA • Inputs: • Meta-frame (handle, offset and length) • Lookup key (Includes slice ID, RxID, Rx UDP dport) • Code Option (4b, only 16 available) • Exception bits • Outputs: • Meta-frame (handle, offset and length) • Lookup Result (Index into SRAM table on NPUB) • Actual max index is 0x3FFFF (Unicast), with single-bit type flag = 19 bits • Slice ID (VLAN tag) • Code Option (4b, only 16 available) • Exception bits (from Parse) • Stats Index (from TCAM) • Can this fit in the 13 bits leftover from the result index? No, result is bigger now. • Initialization: • Filters set in TCAM by control • Functionality: • Look up key in TCAM • On miss, drop the packet • Local Delivery is now a normal lookup • Lookup result is now just a 32b index (and stats index) • Status: • Written; untested. • Result size currently 48b; would like to reduce to 32b.

  23. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b) • MN Frm Length (16b), MN Frm Offset (16b) • Rsvd (16b), Stats Index (16b) • Slice ID (VLAN) (16b), Exception Bits (12b), Code (4b) • Result Index (32b)
      Format 2: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  24. AddShim • Inputs: • Meta-frame (handle, offset and length) • Lookup Result (Index into SRAM table on NPUB) • Slice ID (VLAN tag) • Code Option (4b, only 16 available) • Exception bits (from Parse) • Stats Index (from TCAM) • Outputs: • Shim Packet (buffer handle) • Buffer descriptor contains updated offset and length, if needed • Initialization: • None. • Functionality: • Prepend shim header to preserve packet annotations across NPUs • Overwrite the existing Ethernet header (up to 18B) with: • Slice ID (16b) • Code Option (4b) • Exception Bits (12b) • MN Frame Offset (16b) • MN Frame Length (16b) • Result Index (32b) • Stats Index (16b) [This is the same on NPUA, NPUB] • 30B for opaque slice data. • Proper memory alignment required • This is written by Parse, not AddShim! • Status: • Written. Works for properly aligned packets. Needs optimization.

  25. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  26. TxA • Sends shim packet to NPUB. • Unmodified 10 Gbps Tx 2×ME.

  27. SPP Version 2 NPUA to NPUB Frame • SHIM (16B) • Slice ID (16b) • Code Option (4b) • Exception Bits (12b) • Result Index (32b) • Stats Index (16b) • Offset of MN Packet (16b) • Length of MN Packet (16b) • Memory Alignment Padding (2B) • IP Header, UDP Header may be overwritten by: • opaque slice data, written in Parse
      Frame layout (figure marks 8-byte boundaries, assuming no IP options):
      SHIM (16B), Type=IP (2B)
      IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
      UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
      UDP Payload (MN Packet)
      Ethernet Trailer: PAD (nB), CRC (4B)
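The shim fields listed on this slide add up to 14 bytes (16 + 4 + 12 + 32 + 16 + 16 + 16 = 112 bits), and the 2 bytes of alignment padding bring the shim to 16B. A minimal sketch of serializing it in C follows. The struct and function names are hypothetical, and both the field order (taken from this slide) and the big-endian byte order are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHIM_LEN 16 /* 14 bytes of fields + 2 bytes of alignment padding */

/* Hypothetical in-memory form of the shim annotations. */
struct shim {
    uint16_t slice_id;       /* 16b */
    uint8_t  code_opt;       /* 4b  */
    uint16_t exception_bits; /* 12b */
    uint32_t result_index;   /* 32b */
    uint16_t stats_index;    /* 16b */
    uint16_t mn_offset;      /* 16b, offset of MN packet */
    uint16_t mn_length;      /* 16b, length of MN packet */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

/* Serialize the shim into 16 bytes; the result index lands on a
 * 4-byte boundary (offset 4) in this assumed layout. */
void shim_pack(const struct shim *s, uint8_t out[SHIM_LEN])
{
    assert(s->code_opt <= 0xF && s->exception_bits <= 0xFFF);
    memset(out, 0, SHIM_LEN); /* also clears the 2 padding bytes */
    put_be16(out + 0, s->slice_id);
    put_be16(out + 2, (uint16_t)(((unsigned)s->code_opt << 12) | s->exception_bits));
    put_be32(out + 4, s->result_index);
    put_be16(out + 8, s->stats_index);
    put_be16(out + 10, s->mn_offset);
    put_be16(out + 12, s->mn_length);
}
```

Since the shim overwrites up to 18B of VLAN-tagged Ethernet header, a 16B shim leaves the 2-byte Type=IP field in place, which matches the frame layout on this slide.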

  28. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown: • Reserved (8b), Buffer Handle (24b) • Eth. Frame Len (16b), Reserved (12b), Port (4b)

  29. RxB • Needs to switch from NN output to Scratch or SRAM • Comments in code indicate SRAM should work • Supporting code seems to be only for scratch rings • Needs further examination • DZar notes there are some obscure #define's needed for SRAM rings.

  30. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • Reserved (8b), Buffer Handle (24b) • Eth. Frame Len (16b), Reserved (12b), Port (4b)
      Format 2: • Reserved (8b), Buffer Handle (24b) • Reserved (12b), QM (2b), Sch (3b), PerSchedQID (15b) • Frame Length (16b), Stats Index (16b)

  31. LookupB/Copy • Inputs: • Shim packet (buffer handle, frame length) • Outputs: • Packet (buffer handle, frame length) • QueueID (QM, Scheduler, Queue ID) • Stats Index • Initialization: • ResultTable (unicast+multicast) • local endpoint table • Ethernet SAddr • Per-slice Packet Limits • Functionality (Overview) • Copy shim header into buffer descriptor • Look up routing information from result index • If multicast, make the copies • Enqueue to correct QM (from ResultTable) • Status • Written, broken. • Needs changes to handling of ResultTable; result indices are now absolute, not per-slice.

  32. LookupB/Copy – Code Sketch
      if not currently processing mcast packet
          read packet from SRAM ring
          extract shim
          load ResultTable value
          fill buffer descriptor
          if unicast
              if per-slice packet limit permits
                  update per-slice packet count
                  write to SRAM ring for correct QM (by qmschedID in result table value)
              else
                  drop buffer
          else
              start mcast processing
              if per-slice packet limit permits
                  update per-slice packet count
                  fetch first header buffer descriptor
                  if payload length ≠ 0
                      write ref count into payload descriptor
                  else
                      drop payload buffer
              else
                  drop buffer
                  finish mcast processing
      else (currently processing buffer, have empty header buffer handle)
          fill header buffer descriptor
          only chain if payload buffer is not empty
          if still making copies
              fetch next header buffer descriptor
          else
              finish mcast processing
          write current header buffer handle to SRAM ring for correct QM (by qmschedID)
      signal next ME

  33. ResultTable – Unicast • Data needed to enqueue, rewrite packet: • Fanout: Ignored (Memory padding) • QID • QMID, SchedID, QID (20b) (Lookup Result) • Src MI: • IP Saddr (32b) (Per SchedID Table) • UDP Sport (16b) (Lookup Result) • Tunnel Next Hop • IP DAddr (32b) (Lookup Result) • UDP DPort (16b) (Lookup Result) • Chassis Addressing • Ethernet Dst MAC (48b) (Per SchedID Table) • Slice Specific Lookup Result Data (?) (Lookup Result) • Ethernet Src MAC • Should be constant across all pkts. • Per Sched Entry: IP SAddr (32b), Eth DA (48b) • Results Entry: Fanout (4b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)

  34. ResultTable – Multicast • Fanout gives the number of copies (0..15) • Data needed per copy on NPUB: • QID • QMID, SchedID, QID (20b) (Lookup Result) • Src MI: • IP Saddr (32b) (Per SchedID Table) • UDP Sport (16b) (Lookup Result) • Tunnel Next Hop • IP DAddr (32b) (Lookup Result) • UDP DPort (16b) (Lookup Result) • Chassis Addressing • Ethernet Dst MAC (48b) (Per SchedID Table) • Slice Specific Lookup Result Data (?) (Lookup Result) • Ethernet Src MAC • Should be constant across all pkts. • Support Multicast but optimize for Unicast • Per Sched Entry: IP SAddr (32b), Eth DA (48b) • Results Entry (×16): Fanout (4b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
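The 20-bit QID in each result entry combines three fields, which the ring format on slide 30 gives as QM (2b), Sch (3b), and PerSchedQID (15b). A small C sketch of packing and unpacking it is shown below. The type and function names are hypothetical, and placing QM in the most significant bits is an assumption based on the order the fields are listed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of the 20-bit QID. */
struct qid {
    unsigned qm;    /* 2b: which Queue Manager */
    unsigned sched; /* 3b: scheduler within that QM */
    unsigned queue; /* 15b: per-scheduler queue ID */
};

/* Assumed layout (MSB first): QM[19:18] Sch[17:15] PerSchedQID[14:0]. */
uint32_t qid_pack(struct qid q)
{
    assert(q.qm <= 0x3 && q.sched <= 0x7 && q.queue <= 0x7FFF);
    return ((uint32_t)q.qm << 18) | ((uint32_t)q.sched << 15) | (uint32_t)q.queue;
}

struct qid qid_unpack(uint32_t v)
{
    struct qid q;
    q.qm    = (v >> 18) & 0x3;
    q.sched = (v >> 15) & 0x7;
    q.queue = v & 0x7FFF;
    return q;
}
```

With 2 + 3 bits there are up to 4 QMs with 8 schedulers each, so LookupB/Copy can pick the destination QM ring from the top two bits alone.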

  35. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry formats shown:
      Format 1: • Reserved (8b), Buffer Handle (24b) • Reserved (12b), QM (2b), Sch (3b), PerSchedQID (15b) • Frame Length (16b), Stats Index (16b)
      Format 2: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  36. QM • No change from V1 • Incorporates change to limit queues by #pkts • Some changes in how control allocates bandwidth • Need to ensure that slow HdrFmt blocks can’t tie up the system • Currently looking at worst-case engineering • (everyone runs at slowest block speed)

  37. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown (twice): • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  38. HdrFmt / SubEncap • Inputs: • Buffer Handle • Remaining inputs come from Buffer Descriptor: • Multicast or Unicast (from buffer_next) • Frame length, offset • HFIndex (index into HFTable, a slice-specific table) • ResultIndex (for tunnel headers) • Outputs: • Packet (buffer handle) • Buffer descriptor contains updated offset and length • Initialization: • HFTable, containing slice-specific data. For IPv4, this is unused. • ResultTable, tunnel header information • Functionality: • Substrate level: • read buffer descriptor and pass frame offset, length, HFIndex, mcast/ucast to slice-specific HdrFmt • Slice level: arbitrary processing. • For IPv4, this writes any next-hop information. • Except for redirects such as exception packets, effectively does nothing. • Substrate level: • Encapsulate for output tunnel (from ResultTable) • Update stats • Status: • Revisit multicast model • Needs Internal Header code (Missing!)

  39. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown (twice): • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  40. Scr2NN/FreelistMgr • Inputs: • Buffer Handle (possibly chained) • Outputs: • Buffer Handle (possibly chained) • Initialization: • None • Functionality: • Combines Freelist Manager with Scr2NN glue • FM: Read from scratch ring. Free buffers, correctly handling chained buffers and reference counts. • Scr2NN: Read from Scratch, write to NN. • Status: • Needs to be reworked from scratch; my method of combining was wrong and could (probably would) deadlock. • Both blocks exist, but combining them is not straightforward. • Open question: how should we prioritize among these tasks? The author should ensure that no deadlock is possible. (TxB writes to FM; if the FM ring is full, TxB stalls. If Scr2NN is writing to TxB, it stalls. Gridlock.) • As of August 2009, we'll use a temporary 4×4 thread split and revisit later.
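The freeing rule for chained, reference-counted buffers can be illustrated with a small C sketch. It models descriptors as ordinary structs rather than SRAM buffer descriptors, and the names (`struct buf`, `free_chain`, `on_freelist`) are hypothetical. The assumed policy is that every buffer in a chain has its reference count decremented, and only buffers that reach zero return to the freelist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a buffer descriptor. */
struct buf {
    struct buf *next;        /* chained buffer (Buffer_Next), or NULL */
    uint8_t     ref_cnt;     /* number of outstanding references */
    int         on_freelist; /* set once the buffer has been freed */
};

/* Walk a chain, dropping one reference per buffer.
 * Returns how many buffers were returned to the freelist. */
int free_chain(struct buf *head)
{
    int freed = 0;
    while (head != NULL) {
        struct buf *next = head->next; /* save before unlinking */
        assert(head->ref_cnt > 0);
        if (--head->ref_cnt == 0) {
            head->next = NULL;
            head->on_freelist = 1;
            freed++;
        }
        head = next;
    }
    return freed;
}
```

For a multicast packet with two copies, each header buffer has a count of 1 and the shared payload buffer a count of 2: freeing the first copy releases only its header, and freeing the second releases both its header and the payload.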

  41. NPE Version 2 Block Diagram [Figure: blocks as in slide 9.] Ring entry format shown: • V (1b), Rsv (3b), Intf (4b), Buffer Handle (24b)

  42. TxB • Must support chained buffers • Multicast uses header buffers and payload buffers • Headers are slice-specific; we can’t rely on known, static lengths as we did in ONL. • Sends header from one buffer, payload from chained buffer. • Can TX do this? Comments in the code seem to imply that chained (non-SOP) buffers must start at offset 0. Our payloads usually won’t. • This will probably take some TX modification, but there’s no reason why it won’t work. Might have a performance penalty, of course…. [DZar]

  43. SPP V2 SideB SRAM Buffer Descriptor
      LW0: Buffer_Next (32b)
      LW1: Buffer_Size (16b), Offset (16b)
      LW2: Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b)
      LW3: Stats Index (16b), Reserved (4b), Slice ID (xsid) (12b)
      LW4: HFIndex (16b), MR Exception Bits (16b)
      LW5: ResultIndex (32b)
      LW6: MR Bits (optional) (32b)
      LW7: Packet_Next (32b)
      Writer annotations in the figure: • Ref_Cnt: written by Rx, added to by Copy, decremented by Freelist Mgr • Written by Freelist Mgr • Written by Rx • Written by LookupB/Copy • Written by Rx or LookupB/Copy • Written by QM

  44. SPP V2 SideB SRAM Buffer Descriptor [Figure: same LW0 to LW7 descriptor layout as slide 43.] • HFIndex is an index into the HFTable. Unused in IPv4. • May not be needed in Buffer Descriptor, since SubstrateEncap can fetch it using ResultIndex • ResultIndex is used to get tunnel header info from the ResultTable
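The eight-longword descriptor can be modeled in C as an array of 32-bit words with accessor functions, which avoids compiler-dependent bitfield layouts. This is only a sketch: the accessor names are hypothetical, and the bit positions assume each longword's fields are packed most-significant-first in the order the slide lists them (for example, LW2 = Packet_Size, Free_list, Reserved, Ref_Cnt).

```c
#include <assert.h>
#include <stdint.h>

enum { BD_LW_COUNT = 8 }; /* LW0..LW7, 32 bytes in total */

struct buf_desc {
    uint32_t lw[BD_LW_COUNT];
};

/* Extract `width` bits starting at bit `shift` (bit 0 = LSB). */
static uint32_t field(uint32_t word, unsigned shift, unsigned width)
{
    assert(width < 32);
    return (word >> shift) & ((1u << width) - 1u);
}

/* LW2: Packet_Size(16) | Free_list(4) | Reserved(4) | Ref_Cnt(8) */
uint16_t bd_packet_size(const struct buf_desc *d) { return (uint16_t)field(d->lw[2], 16, 16); }
uint8_t  bd_ref_cnt(const struct buf_desc *d)     { return (uint8_t)field(d->lw[2], 0, 8); }

/* LW3: Stats Index(16) | Reserved(4) | Slice ID(12) */
uint16_t bd_stats_index(const struct buf_desc *d) { return (uint16_t)field(d->lw[3], 16, 16); }
uint16_t bd_slice_id(const struct buf_desc *d)    { return (uint16_t)field(d->lw[3], 0, 12); }

/* LW5: ResultIndex(32) */
uint32_t bd_result_index(const struct buf_desc *d) { return d->lw[5]; }

/* Overwrite only the Ref_Cnt byte of LW2, leaving the other fields intact. */
void bd_set_ref_cnt(struct buf_desc *d, uint8_t n)
{
    d->lw[2] = (d->lw[2] & ~0xFFu) | n;
}
```

Keeping Ref_Cnt in the low byte of LW2 means the multicast copy path can update it without disturbing Packet_Size, which Rx has already written.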

  45. SPP v2 Control • New data path adds new Control requirements • Heterogeneous MNet execution times • Control must select parameters for LCI QMs, NPUB QMs to avoid Parse, HdrFmt execution lag • Slice is now partial VLAN tag • Must ensure all VLAN tags have distinct low 11b • Filter/Results now split across NPUA, NPUB • Must coordinate updates to multiple data locations • Synchronization issues require some care in Control
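The requirement that all VLAN tags have distinct low 11 bits is easy to check when Control assigns tags. Below is a minimal C sketch of such a check using a 2048-bit bitmap. The function name `vlan_low11_distinct` is hypothetical and nothing here reflects the actual Control software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLICE_ID_BITS 11u
#define SLICE_ID_MASK ((1u << SLICE_ID_BITS) - 1u) /* 0x7FF */

/* Returns 1 if no two tags share the same low 11 bits, 0 otherwise. */
int vlan_low11_distinct(const uint16_t *tags, size_t n)
{
    uint8_t seen[(1u << SLICE_ID_BITS) / 8] = {0}; /* one bit per slice ID */
    assert(tags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned id  = tags[i] & SLICE_ID_MASK;
        uint8_t  bit = (uint8_t)(1u << (id & 7u));
        if (seen[id >> 3] & bit)
            return 0; /* collision: two VLANs would map to one slice */
        seen[id >> 3] |= bit;
    }
    return 1;
}
```

For example, tags 5 and 0x805 differ as 12-bit VLAN IDs but collide here, because both reduce to slice ID 5.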

  46. SPP v2 Control • NPUA Data areas requiring Control setup • NPE MAC address at • IPV4_SD_MAC_ADDR_HI32 • IPV4_SD_MAC_ADDR_lo16 • VLAN Table • Used by Decap, Parse • Maps VLANs to code options, data areas • 2048-entry table at PL_SD_VLAN_CODE_OPT_TABLE_BASE
      struct {
          unsigned int code_opt;        // only 4 lsb used
          unsigned int slice_data_ptr;
          unsigned int slice_data_size;
      }

  47. SPP v2 Control • Data areas requiring Control setup • VLAN Table -cont'd- • Pointer to slice-specific SRAM areas • Slice owners request amount needed • (IPv4 code option needs 72B for counters) • Control must pass along Slice owner initialization data • Control can allocate in any 4B aligned location within Bank 3 addresses 0x300000..0x7FFFFF (upper 5MB of BANK3) • Each slice-specific region must be at least SLICE_DATA_ENTRY_SIZE_MINIMUM (56B) in size • Each code option has different additional size needs • E.g., for IPv4, 56+72 = 128B total • E.g., for i3, 56+3200 = 3256B total
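The allocation constraints on this slide (4B alignment, the Bank 3 address window, and the 56B minimum) can be expressed as a validity check. The sketch below is in C. `SLICE_DATA_ENTRY_SIZE_MINIMUM` comes from the slide, while the other names are hypothetical, and the check that the whole region must end inside the window is an added assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_DATA_ENTRY_SIZE_MINIMUM 56u       /* from the slide */
#define SLICE_AREA_FIRST              0x300000u /* assumed name; first usable address */
#define SLICE_AREA_LAST               0x7FFFFFu /* assumed name; last usable address */

/* Returns 1 if [addr, addr + size) is an acceptable slice data region. */
int slice_region_valid(uint32_t addr, uint32_t size)
{
    if (addr % 4u != 0)
        return 0; /* must be 4B aligned */
    if (size < SLICE_DATA_ENTRY_SIZE_MINIMUM)
        return 0; /* substrate needs at least 56B */
    if (addr < SLICE_AREA_FIRST || addr > SLICE_AREA_LAST)
        return 0; /* must start inside the window */
    if (size - 1u > SLICE_AREA_LAST - addr)
        return 0; /* assumption: region must also end inside the window */
    return 1;
}
```

The window 0x300000..0x7FFFFF spans 0x500000 bytes, which is the 5MB the slide mentions, and an IPv4 slice would request 56 + 72 = 128B within it.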

  48. SPP v2 Control • Data areas requiring Control setup • TCAM filters • Used by LookupA • Tightly interlinked with tables on NPUB

  49. SPP v2 Control • NPUB Data areas requiring Control setup • NPE source MAC address (HdrFmt/SubstrateEncap) • LC_MAC_ADDR_HI32 • LC_MAC_ADDR_LO32 • Per-Slice (2048) packet limits table (LookupB/Copy) at LC_PER_SLICE_PACKET_LIMIT_BASE
      struct {
          unsigned int current;
          unsigned int maximum;
          unsigned int overLimits;
      }
      • Queue Manager parameters • Must properly rate limit both bandwidth and slow HdrFmt code options • No heterogeneous HdrFmt code options yet
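The per-slice packet limit check that LookupB/Copy applies before enqueueing can be sketched in C using the three fields of the table entry above. The function names are hypothetical, and two behaviors are assumptions: that `overLimits` counts packets refused because the slice is at its maximum, and that `current` is decremented when a packet leaves the system.

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow the packet limits table entry on the slide. */
struct slice_limit {
    uint32_t current;    /* packets currently held for this slice */
    uint32_t maximum;    /* per-slice cap set by Control */
    uint32_t overLimits; /* assumed: count of packets refused at the cap */
};

/* Returns 1 and counts the packet if the slice is under its limit;
 * otherwise records the overflow and returns 0 so the caller drops it. */
int slice_admit(struct slice_limit *l)
{
    if (l->current < l->maximum) {
        l->current++;
        return 1;
    }
    l->overLimits++;
    return 0;
}

/* Assumed counterpart: called when a packet of this slice is freed. */
void slice_release(struct slice_limit *l)
{
    assert(l->current > 0);
    l->current--;
}
```

On the microengines this read-modify-write would presumably need an atomic SRAM operation, since several threads can process packets of the same slice concurrently.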

  50. SPP v2 Control • NPUB Data areas requiring Control setup • Result Table • Used by LookupB/Copy, HdrFmt/SubstrateEncap • Results corresponding to TCAM lookups • Links to per-QM scheduler tunnel endpoint values • Also links to per-slice HdrFmt data areas
