
Block Design Review: ONL Header Format

This document reviews the Header Format block: its implementation status, latency analysis, key code locations, and test procedures. It also covers the performance target of a 5 Gb/s line rate and the dl_sink semantics and optimizations that affect Header Format processing.


Presentation Transcript


  1. Block Design Review: ONL Header Format • Michael Wilson • mlw2@arl.wustl.edu • http://www.arl.wustl.edu/projects/techX

  2. Revision History • 4/10/07 (MLW): • Released • 4/11/07 (MLW): • Updates from feedback at 10 April meeting

  3. Header Format Inputs/Outputs • [Block diagram of the router. Blocks shown: Rx (2 ME), Mux (1 ME), Parse, Lookup, Copy (3 MEs), QM (1 ME), HdrFmt (1 ME), Tx (1 ME), FreeList Mgr (1 ME), Stats (1 ME), Plugin1 through Plugin5, XScale, TCAM, Assoc. Data ZBT-SRAM, SRAM, NN rings, SRAM rings, and a scratch ring; memory sizes labeled 32KW and 64KW] • slide taken from ONL_NProuter.ppt

  4. Header Format Inputs/Outputs • [Same block diagram as the previous slide, annotated with two word layouts] • Single-word layout: V (1b), Rsv (3b), Port (4b), Buffer Handle (24b) • Multi-word layout: • V (1b), Rsv (3b), Port (4b), Buffer Handle (24b) • Ethernet DA[47-16] (32b) • Ethernet DA[15-0] (16b), Ethernet SA[47-32] (16b) • Ethernet SA[31-0] (32b) • Ethernet Type (16b), Reserved (16b) • slide taken from ONL_NProuter.ppt
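The single-word layout above (V, Rsv, Port, Buffer Handle) can be sketched as a packed 32-bit word. This is only an illustration: the slide gives field widths, not bit positions, so the sketch assumes the fields are packed MSB-first in the order listed. The names `hf_pack`, `hf_is_valid`, `hf_port`, and `hf_handle` are hypothetical and do not come from the ONL sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions: V in bit 31, Rsv in bits 30..28, Port in bits 27..24,
 * Buffer Handle in bits 23..0. The slide states only the widths. */
#define HF_VALID_SHIFT  31u
#define HF_PORT_SHIFT   24u
#define HF_PORT_MASK    0xFu
#define HF_HANDLE_MASK  0x00FFFFFFu

/* Build one ring word; the reserved bits are left as zero. */
static uint32_t hf_pack(int valid, uint32_t port, uint32_t handle)
{
    assert(port <= HF_PORT_MASK);
    assert(handle <= HF_HANDLE_MASK);
    return ((uint32_t)(valid != 0) << HF_VALID_SHIFT)
         | (port << HF_PORT_SHIFT)
         | handle;
}

static int hf_is_valid(uint32_t w)
{
    return (int)(w >> HF_VALID_SHIFT);
}

static uint32_t hf_port(uint32_t w)
{
    return (w >> HF_PORT_SHIFT) & HF_PORT_MASK;
}

static uint32_t hf_handle(uint32_t w)
{
    return w & HF_HANDLE_MASK;
}
```

If the real layout places the fields differently, only the shift and mask constants would need to change.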

  5. Contents • Overview • Latency Analysis • Code Locations (Planned) • Test Procedures (Planned) • Implementation Status

  6. Overview • Initialization • Initialize local table of Source MAC addresses for output ports • Processing (Main Loop) • Receive handle from QM • Copy to output registers: • Buffer Handle (from NN ring if not chained, from buffer descriptor Buffer_Next otherwise) • Destination MAC, EtherType (from buffer descriptor) • Source MAC address (from local memory, indexed by port) • If chained, free the header buffer • Update Stats (index from buffer descriptor) • Forward packet to TX • Update TX Counters • Header Format will be written in C, not microcode
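The main-loop steps above can be sketched in host-side C. This is a simplified model, not the planned hdrfmt.c: the ring reads, SRAM accesses, and the write to TX are replaced by plain function arguments and structs. All type and function names (`buf_desc_t`, `tx_msg_t`, `hdrfmt_step`, and so on) are hypothetical, and a nonzero Buffer_Next is assumed to mean "chained".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PORTS 16u   /* assumption: one table entry per 4-bit port value */

/* Hypothetical, simplified view of the descriptor fields this block reads. */
typedef struct {
    uint32_t buffer_next;   /* assumed nonzero when the packet is chained */
    uint16_t stats_index;
    uint8_t  dst_mac[6];
    uint16_t ether_type;
} buf_desc_t;

/* Hypothetical record handed on to TX. */
typedef struct {
    uint32_t handle;
    uint8_t  dst_mac[6];
    uint8_t  src_mac[6];
    uint16_t ether_type;
    int      freed_header;  /* 1 if the header buffer goes back to the freelist */
} tx_msg_t;

static uint8_t  src_mac_table[NUM_PORTS][6];  /* filled during initialization */
static uint32_t stats_count[1u << 16];        /* indexed by Stats Index */
static uint32_t tx_count;

/* Initialization: record the source MAC address for one output port. */
static void hdrfmt_set_src_mac(unsigned port, const uint8_t mac[6])
{
    assert(port < NUM_PORTS);
    memcpy(src_mac_table[port], mac, 6);
}

/* One pass of the main loop. Returns 0 for an invalid handle, 1 otherwise. */
static int hdrfmt_step(int valid, unsigned port, uint32_t handle,
                       const buf_desc_t *bd, tx_msg_t *out)
{
    if (!valid)
        return 0;
    assert(port < NUM_PORTS);

    int chained = (bd->buffer_next != 0);
    out->handle = chained ? bd->buffer_next : handle;
    memcpy(out->dst_mac, bd->dst_mac, 6);
    memcpy(out->src_mac, src_mac_table[port], 6);
    out->ether_type = bd->ether_type;
    out->freed_header = chained;      /* header buffer is freed when chained */

    stats_count[bd->stats_index]++;   /* update stats */
    tx_count++;                       /* update TX counter */
    return 1;
}
```

The three cases exercised by calling `hdrfmt_step` with an invalid handle, an unchained descriptor, and a chained descriptor correspond to the code paths listed on the Test and Validation slide.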

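  7. Latency Analysis • Critical Path Latency: 360 cycles • [Flowchart; nodes and latencies as far as they can be recovered from the slide] • dl_source: negligible cycles • Is Valid? (No / Yes) • Read Buffer Descriptor: 150 cycles • Read Source MAC • Is Chained? (No / Yes) • Write Buffer_Next := NULL • Write Handle to Freelist Mgr: 60 cycles • Write Stats: 60 cycles • dl_sink: 60 cycles • The slide also shows latencies of 3 cycles and 150 cycles whose nodes cannot be identified from the transcript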

  8. Performance • What is our performance target? • To hit a 5 Gb/s rate: • Minimum Ethernet frame: 76B • 64B frame + 12B InterFrame Spacing • 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mpkt/sec • IXP ME processing: • 1.4 GHz clock rate • 1.4 Gcycle/sec * 1 sec/8.22 Mpkt = 170.3 cycles per packet • compute budget: (MEs*170) • 1 ME: 170 cycles • 2 ME: 340 cycles • 3 ME: 510 cycles • 4 ME: 680 cycles • latency budget: (threads*170) • 1 ME: 8 threads: 1360 cycles • 2 ME: 16 threads: 2720 cycles • 3 ME: 24 threads: 4080 cycles • 4 ME: 32 threads: 5440 cycles • slide taken from ONL_NProuter.ppt
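The budget arithmetic on this slide can be reproduced with a few lines of C. The sketch below is only a calculator for the slide's formula; the function names are hypothetical. Computed without intermediate rounding, the per-packet budget comes out near 170.2 cycles; the slide's 170.3 results from rounding the packet rate to 8.22 Mpkt/sec first. The table uses the slide's rounded figure of 170 cycles and 8 threads per ME.

```c
#include <assert.h>
#include <stdio.h>

/* Packets per second at a given line rate and on-wire bytes per packet. */
static double pkts_per_sec(double bits_per_sec, double wire_bytes)
{
    assert(wire_bytes > 0.0);
    return bits_per_sec / 8.0 / wire_bytes;
}

/* Cycles available per packet on one microengine. */
static double cycles_per_pkt(double clock_hz, double pps)
{
    assert(pps > 0.0);
    return clock_hz / pps;
}

/* Print the compute and latency budgets for 1 to 4 MEs. */
static void print_budgets(void)
{
    const int budget = 170;        /* rounded cycles per packet, from the slide */
    const int threads_per_me = 8;  /* from the slide */
    for (int me = 1; me <= 4; me++) {
        int threads = me * threads_per_me;
        printf("%d ME: compute %d cycles, latency %d cycles (%d threads)\n",
               me, me * budget, threads * budget, threads);
    }
}
```

For the slide's inputs, `pkts_per_sec(5e9, 76.0)` gives the packet rate and `cycles_per_pkt(1.4e9, ...)` gives the per-packet budget.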

  9. dl_sink Semantics • One of my optimizations requires a change to dl_sink semantics. In pseudo-code:

         signal_t sig1, sig2, sig3;
         send_stats(stats, sig1);     // 60 cycles
         free_block(hdr_buf, sig2);   // 60 cycles
         dl_sink(data_buf, sig3);     // 60 cycles
         wait(sig1, sig2, sig3);      // the three operations overlap: about 60 cycles total, not 60+60+60

     • As of the 10 April meeting, this optimization is no longer necessary for Header Format. • Header Format has enough slack to skip exotic optimizations. • Header Format can start all of the scratch ring writes, then call dl_sink, and wait on the outstanding signals after dl_sink returns. PLC does not have this option, but this doesn’t impact Header Format.
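The benefit of issuing the three operations before a single wait can be illustrated with a toy latency model: operations completed one at a time cost the sum of their latencies, while operations issued together and then awaited cost roughly the longest one. The model assumes ideal overlap with no contention, which the slides do not guarantee, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Total latency when each operation completes before the next is issued. */
static unsigned serial_latency(const unsigned *lat, size_t n)
{
    assert(lat != NULL || n == 0);
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += lat[i];
    return total;
}

/* Total latency when all operations are issued first and awaited together
 * (idealized: perfect overlap, no contention). */
static unsigned overlapped_latency(const unsigned *lat, size_t n)
{
    assert(lat != NULL || n == 0);
    unsigned longest = 0;
    for (size_t i = 0; i < n; i++)
        if (lat[i] > longest)
            longest = lat[i];
    return longest;
}
```

With the three 60-cycle operations from the pseudo-code, the serial model gives 180 cycles and the overlapped model gives 60.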

  10. File locations (in …/ONL_Router/) • Code • src/hdrFormat/ONL/hdrfmt.c • Includes • src/dispatch_loop/ONL/dl_source.[h,c] • dl_source() and dl_sink() functions

  11. Test and Validation • All validation tests will be done with 8 threads • Header Format has no loops and only two conditionals. All code paths will be tested once: • Invalid handle (Valid bit not set) • Unchained packet • Chained packet • Need to decide on the correct behavior for erroneous input (port out of range) • Test back-pressure from TX through HdrFmt to QM • HdrFmt will be tested at high speeds to ensure I/O contention is not an issue

  12. Implementation Status • Still in pseudo-code • Working on a C-equivalent of the HdrFmt Stub as a framework for my implementation • Bugs • Doesn’t compile, as there is no source yet. • Untested • Everything • Optimizations not taken (but available if needed later) • The Buffer_Next field of the buffer descriptor can be read and written back-to-back because the memory controller guarantees in-order execution. Thus, we don’t need to read, check to see if we need to write, and then write. We can issue both at once and worry afterward. This won’t work with multi-buffer payload, but neither will the rest of Header Format.
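The Buffer_Next optimization described above relies on in-order execution: if the read is queued ahead of the write, the read still observes the old value even though both were issued together. The sketch below models that with a toy command queue executed strictly in issue order. It is a host-side illustration only; the queue, the command type, and `fetch_and_clear_next` are hypothetical and are not IXP memory-controller APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_READ, CMD_WRITE } cmd_kind_t;

typedef struct {
    cmd_kind_t kind;
    uint32_t  *addr;    /* target word */
    uint32_t   value;   /* data for CMD_WRITE */
    uint32_t  *result;  /* destination for CMD_READ */
} mem_cmd_t;

/* Toy controller: executes queued commands strictly in issue order. */
static void run_in_order(const mem_cmd_t *q, size_t n)
{
    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].kind == CMD_READ)
            *q[i].result = *q[i].addr;
        else
            *q[i].addr = q[i].value;
    }
}

/* Issue the read and an unconditional write of NULL (0) together, then
 * inspect the result afterward. Returns the old Buffer_Next value. */
static uint32_t fetch_and_clear_next(uint32_t *buffer_next)
{
    uint32_t old = 0;
    mem_cmd_t q[2] = {
        { CMD_READ,  buffer_next, 0, &old },
        { CMD_WRITE, buffer_next, 0, NULL },
    };
    run_in_order(q, 2);
    return old;
}
```

The caller checks the returned value after both commands have completed: a nonzero result means the packet was chained, and in the unchained case the extra write of zero is harmless.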

  13. Extra Slides

  14. ONL Buffer Descriptor • LW0: Buffer_Next (32b) • LW1: Buffer_Size (16b), Offset (16b) • LW2: Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b) • LW3: Stats Index (16b), MAC DAddr_47_32 (16b) • LW4: MAC DAddr_31_00 (32b) • LW5: Reserved (16b), EtherType (16b) • LW6: Reserved (32b) • LW7: Packet_Next (32b) • Legend (the field-to-writer mapping is not recoverable from the transcript): • Ref_Cnt: written by Rx, added to by Copy, decremented by Freelist Mgr • Written by Freelist Mgr • Written by Rx • Written by Copy • Written by Rx and Plugins • Written by QM
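The descriptor layout above can be sketched as eight 32-bit longwords with accessor functions. The sketch assumes that, within each longword, fields are packed MSB-first in the order listed on the slide; the slide gives widths but the transcript does not preserve bit positions. The type and accessor names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Eight longwords, LW0..LW7, as listed on the slide. */
typedef struct {
    uint32_t lw[8];
} onl_buf_desc_t;

static_assert(sizeof(onl_buf_desc_t) == 32, "descriptor is 8 longwords");

/* Assumed packing: first-listed field occupies the most significant bits. */
static uint32_t bd_buffer_next(const onl_buf_desc_t *d) { return d->lw[0]; }
static uint16_t bd_buffer_size(const onl_buf_desc_t *d) { return (uint16_t)(d->lw[1] >> 16); }
static uint16_t bd_offset(const onl_buf_desc_t *d)      { return (uint16_t)(d->lw[1] & 0xFFFFu); }
static uint16_t bd_packet_size(const onl_buf_desc_t *d) { return (uint16_t)(d->lw[2] >> 16); }
static uint8_t  bd_ref_cnt(const onl_buf_desc_t *d)     { return (uint8_t)(d->lw[2] & 0xFFu); }
static uint16_t bd_stats_index(const onl_buf_desc_t *d) { return (uint16_t)(d->lw[3] >> 16); }
static uint16_t bd_ether_type(const onl_buf_desc_t *d)  { return (uint16_t)(d->lw[5] & 0xFFFFu); }
static uint32_t bd_packet_next(const onl_buf_desc_t *d) { return d->lw[7]; }

/* Destination MAC: bits 47..32 from LW3 (low half), bits 31..0 from LW4. */
static void bd_dst_mac(const onl_buf_desc_t *d, uint8_t mac[6])
{
    mac[0] = (uint8_t)(d->lw[3] >> 8);
    mac[1] = (uint8_t)(d->lw[3]);
    mac[2] = (uint8_t)(d->lw[4] >> 24);
    mac[3] = (uint8_t)(d->lw[4] >> 16);
    mac[4] = (uint8_t)(d->lw[4] >> 8);
    mac[5] = (uint8_t)(d->lw[4]);
}
```

Explicit shifts and masks are used instead of C bit-fields because bit-field ordering is implementation-defined, which matters when a layout must match hardware.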
