Developing TCP Chimney Drivers for Windows 7



Presentation Transcript


  1. Developing TCP Chimney Drivers for Windows 7
     • Joe Nievelt
     • Vivek Bhanu
     • Software Design Engineer
     • TCP/IP - Networking
     • joeniev@microsoft.com
     • vbhanu@microsoft.com

  2. Agenda
     Overview
     Architecture
     Chimney Offload Overview
     Requirements for Chimney Targets
     Windows Implementation Specifics
     High Performance Considerations
     Contacts / References

  3. Overview
     Reduce the server’s CPU utilization due to TCP for applications with long-lived connections
       • Fewer interrupts, because TCP ACK packets are processed in the offload target
       • Zero-copy receives for apps that pre-post receive buffers
       • Host stack implements all network management protocols
     Administrator can control which connections are eligible or ineligible for offload
       • netsh [add | delete] [chimneyports | chimneyapplications]
     Administrator can view chimney statistics, such as the number of offloaded connections
       • netsh interface tcp show [chimneystats | chimneyports | chimneyapplications]
     Windows Server 2008 R2 focus
       • 10 Gb Ethernet
       • Stability
       • Logo clarification
       • Characterizing file and web server workload performance

  4. Architecture
     • Application: Existing binaries run over either software stack or hardware
     • TCP Switch: Controls whether data transfer is through the host stack or the offload target stack
     [Diagram labels: Application, Offload Target, TCP Switch, TCP, Network Layer, Framing Layer]

  5. Implementing Chimney Offload
     Register optional miniport handlers with NDIS
     Generic Chimney miniport handlers manage 3 types of state variables
       • Const: provided by stack and never change
       • Cached: provided by stack and updated through MiniportUpdateOffload
       • Delegated: initialized by stack and queried through MiniportQueryOffload
     TCP Chimney miniport handlers manage send and receive
       • Initiation of send and receive is serialized per connection
       • Sending/receiving are not serialized across separate connections

  6. Offload Block List – Depth First Traversal
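
The diagram for this slide is not part of the transcript. As a rough illustration only, a depth-first walk over a block list could look like the sketch below. It assumes each block has a link to its next sibling and a link to its first dependent block, loosely modeled on the NextBlock and DependentBlockList members of NDIS_MINIPORT_OFFLOAD_BLOCK_LIST. The Python class, field names, and block labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class OffloadBlock:
    """Hypothetical stand-in for one entry of an offload block list."""
    name: str
    next_block: Optional["OffloadBlock"] = None            # next sibling at this level
    dependent_block_list: Optional["OffloadBlock"] = None  # first block one level down


def walk_depth_first(block: Optional[OffloadBlock]) -> Iterator[OffloadBlock]:
    """Visit a block, then everything that depends on it, then its siblings."""
    while block is not None:
        yield block
        yield from walk_depth_first(block.dependent_block_list)
        block = block.next_block


# Illustrative tree: one neighbor, two paths, three TCP connections.
tcp2 = OffloadBlock("tcp-2")
tcp1 = OffloadBlock("tcp-1", next_block=tcp2)
tcp3 = OffloadBlock("tcp-3")
path_b = OffloadBlock("path-b", dependent_block_list=tcp3)
path_a = OffloadBlock("path-a", next_block=path_b, dependent_block_list=tcp1)
neighbor = OffloadBlock("neighbor", dependent_block_list=path_a)

visit_order = [b.name for b in walk_depth_first(neighbor)]
```

In this sketch every dependent of a block is visited before the block's next sibling, which is what makes the walk depth-first.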

  7. Application Interaction
     Most major network applications fall into three categories
     Pre-post
       • Application keeps receive requests outstanding
       • NIC may DMA directly to the posted buffer, avoiding a copy operation
       • Examples: Backup applications
       • Benefits the most from Chimney
     Indicate and post
       • Application waits for receive indication, may partially consume, then posts a receive for the remainder
       • Requires copy from indication buffer to posted receive buffer
       • Examples: SMB, iSCSI
     Indicate and consume
       • Application waits for receive indication, consumes entirely
       • Examples: legacy TDI applications
       • Benefits the least from Chimney

  8. Requirements for Chimney Targets
     List of RFCs to implement
       • TCP: 793, 813, 1122, 1323, 2018, 2581, 2582, 2923, 2988, 3042, 3465, 3517
       • IP: 791, 894, 1042, 1191, 1122, 2461
       • Consult the Chimney WDK and Logo Requirements for the rest
     RFCs often provide multiple approaches to the same problem
       • For example, should a TCP zero window probe contain data? RFCs 793 and 1122 are ambiguous
     Use the Windows stack behavior as a guideline
       • Provide the same level of security and performance as Windows
       • Avoid interoperability problems with applications

  9. Chimney & Receive-Side Scaling (RSS)
     RSS distributes receive indications across processors
       • Uses 4-tuple hash and indirection table to determine processor for an indication
       • Prevents a single processor from becoming a bottleneck
       • Processor load is not necessarily even across the system
     With Chimney, processing may still be bottlenecked on a single CPU
       • Applications may process traffic in the receive context
       • Applications that do their own load distribution incur context switching costs
     Chimney with RSS allows applications to scale
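
To make the "4-tuple hash and indirection table" step concrete, here is a minimal sketch of the lookup. It assumes the Toeplitz hash that RSS uses, computed over the source address, destination address, source port, and destination port of an IPv4/TCP packet. The key, table contents, and function names are illustrative assumptions, not values from the presentation.

```python
import socket
import struct

# Hypothetical 40-byte secret key, for illustration only.
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window for every set input bit."""
    key_bits = len(key) * 8
    if key_bits < 32 + len(data) * 8:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window starts at key bit (i*8 + b), counted from the most significant bit.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_cpu(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
            table: list, key: bytes = EXAMPLE_KEY) -> int:
    """Pick a processor: hash the 4-tuple, then index the indirection table
    with the low-order bits of the hash (table length must be a power of two)."""
    if len(table) & (len(table) - 1):
        raise ValueError("indirection table length must be a power of two")
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    return table[toeplitz_hash(key, data) & (len(table) - 1)]


# Illustrative 8-entry table spreading connections over CPUs 0-3.
indirection_table = [0, 1, 2, 3, 0, 1, 2, 3]
cpu = rss_cpu("10.0.0.1", "10.0.0.2", 49152, 445, indirection_table)
```

Because the processor is chosen through the table rather than directly from the hash, the host can rebalance load by rewriting table entries without changing the hash function.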

  10. Handling Indirection Table Updates with Chimney
      • Each connection receives all indications on one processor at a time
      Indications may be in progress on one CPU as the connection is redirected to another CPU
      Indicating to multiple processors at once creates timing conditions where reordering may occur
      Non-offloaded connections can tolerate out-of-order packet indications at a performance cost
      Offloaded connections cannot tolerate out-of-order receive indications or completions
      Offload indications on the original processor must complete before beginning indications on the new processor
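
The ordering rule on this slide can be pictured with a small model. This is a sketch under simplifying assumptions: a single counter of in-flight indications per connection, and a retarget request that is deferred until that counter drains to zero. The class and method names are hypothetical and do not correspond to NDIS entry points.

```python
class OffloadedConnection:
    """Toy model: indications for a connection run on exactly one CPU at a time."""

    def __init__(self, cpu: int) -> None:
        self.cpu = cpu            # processor currently receiving indications
        self.pending_cpu = None   # requested new processor, if a move is waiting
        self.in_flight = 0        # indications started but not yet completed

    def retarget(self, new_cpu: int) -> None:
        """Called when the indirection table maps this connection to another CPU."""
        if self.in_flight == 0:
            self.cpu = new_cpu
        else:
            # Defer the move so the old and new CPUs never indicate concurrently.
            self.pending_cpu = new_cpu

    def begin_indication(self) -> int:
        """Start an indication and return the CPU it must run on."""
        self.in_flight += 1
        return self.cpu

    def complete_indication(self) -> None:
        """Finish an indication; apply a deferred move once nothing is in flight."""
        self.in_flight -= 1
        if self.in_flight == 0 and self.pending_cpu is not None:
            self.cpu = self.pending_cpu
            self.pending_cpu = None
```

A real target would also need to stop starting new indications on the old processor at some point so the move cannot be postponed indefinitely; the model leaves that policy out.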

  11. Chimney & Virtualization
      Chimney capabilities are exposed to child VMs in Windows 7
      Existing drivers work without modification
        • Source MAC address will vary
      Live migration is supported
      Collecting TCP and IP statistics per source MAC address
        • Improves manageability and diagnostics
        • Not a Logo requirement for 6.20 drivers
      Coexistence with virtual machine queue (VMQ)
        • Windows 7 will use only VMQ if both chimney and VMQ are available

  12. Receive Window Auto-Tuning
      • Default receive window size may limit connection throughput in situations with a high bandwidth-delay product
      • Windows tries to make its RcvWnd at least as large as the peer’s CWnd so that the receive window isn’t a bottleneck
      • Vendors may implement any algorithm of their choice

  13. Receive Window Auto-Tuning (contd.)
      Indicate the RcvWnd reported to the peer as part of upload
        • Exclude the bytes buffered from the maximum window advertised
      Make sure a fine-grained RTT estimate is reported in SRTT
      Avoid feedback loops in which RcvWnd restricts the sender unnecessarily
      Don’t shrink the RcvWnd right edge
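
Two of the rules above (exclude buffered bytes, never move the right edge backward) can be combined in a few lines. This is only a sketch: it ignores 32-bit sequence wraparound and window scaling, and the function and parameter names are assumptions made for the example.

```python
def next_advertisement(rcv_nxt: int, buffer_size: int, bytes_buffered: int,
                       prev_right_edge: int) -> tuple:
    """Return (window, right_edge) for the next ACK.

    rcv_nxt         -- next sequence number expected from the peer
    buffer_size     -- current maximum receive window chosen by auto-tuning
    bytes_buffered  -- received bytes the application has not consumed yet
    prev_right_edge -- highest rcv_nxt + window advertised so far
    """
    # Buffered data occupies part of the window, so it is not advertised again.
    free_space = max(buffer_size - bytes_buffered, 0)
    # Never pull back a right edge the peer has already been told about.
    right_edge = max(rcv_nxt + free_space, prev_right_edge)
    return right_edge - rcv_nxt, right_edge
```

If auto-tuning later reduces buffer_size, the advertised window in this sketch closes only as new data arrives, rather than retracting space already offered to the sender.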

  14. Windows Implementation Specifics
      Zero Window Probing
        • RFCs 793 / 1122 allow zero window probes to contain no data or one (fake) byte of new data, which must be ignored by receivers
        • RFCs don’t mention a FIN being sent as part of a zero window probe
        • Windows generates zero window probes with one byte of new data and may generate one with the FIN flag set
        • Logo requires window probes with one byte of new data
      Retransmission Timeout
        • Windows offloads connections with scaled SRTT and RTTVAR values: SRTT is represented as 8xSRTT
        • RTTVAR is sent as 4xRTTVAR
        • Logo requires a minimum value of 300ms for RTO
        • Logo requires a maximum of 30s for an RTT sample
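
The 8xSRTT and 4xRTTVAR scaling lets the RFC 2988 estimator (gains of 1/8 and 1/4) run in integer arithmetic. The sketch below shows one common way to do that, in milliseconds. It is not a description of the Windows stack: treating the 30s limit as a clamp on each sample, using zero as the "no estimate yet" marker, and omitting the clock-granularity term of RFC 2988 are all assumptions of the example.

```python
MIN_RTO_MS = 300             # Logo minimum RTO quoted on the slide
MAX_RTT_SAMPLE_MS = 30_000   # Logo maximum RTT sample quoted on the slide


def update_rtt(srtt8: int, rttvar4: int, sample_ms: int) -> tuple:
    """Fold one RTT sample into scaled state; return (srtt8, rttvar4, rto_ms).

    srtt8 holds 8 * SRTT and rttvar4 holds 4 * RTTVAR.
    srtt8 == 0 is used here to mean "no estimate yet" (an assumption).
    """
    sample_ms = min(sample_ms, MAX_RTT_SAMPLE_MS)
    if srtt8 == 0:
        # First measurement: SRTT = R, RTTVAR = R / 2.
        srtt8 = sample_ms * 8
        rttvar4 = sample_ms * 2
    else:
        err = sample_ms - (srtt8 >> 3)          # R - SRTT
        rttvar4 += abs(err) - (rttvar4 >> 2)    # RTTVAR += (|err| - RTTVAR) / 4
        srtt8 += err                            # SRTT += err / 8
    # RTO = SRTT + 4 * RTTVAR, and rttvar4 already is 4 * RTTVAR.
    rto_ms = max((srtt8 >> 3) + rttvar4, MIN_RTO_MS)
    return srtt8, rttvar4, rto_ms
```

Keeping the state pre-multiplied means each update needs only shifts and additions, and the RTO falls out of the stored values without a multiply.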

  15. Windows Implementation Specifics (contd.)
      Silly Window Syndrome (SWS)
        • Chimney NICs must store the value of the largest window received
        • For performance reasons, Windows ignores SWS suppression if a sub-MSS segment would reach a push boundary
        • Logo requires that SWS suppression be ignored if the segment can reach a push boundary
      Black Hole Detection
        • Many black hole routers are still out there
        • Logo requirement to support RFC 2923
      TCP ACK Frequency
        • RFC 1122 suggests sending an ACK for every 2 segments, with a sub-500ms timeout
        • Windows allows the frequency to be configured, sending an ACK for every N packets, default N=2
        • Windows allows the timeout to be configured with 10ms granularity, default 200ms
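
The ACK frequency rule (an ACK every N packets, or when the delayed-ACK timer fires) can be sketched as a small policy object. The defaults follow the slide (N=2, 200ms, 10ms granularity); the class, its methods, and the millisecond clock passed in by the caller are hypothetical.

```python
class AckPolicy:
    """Toy delayed-ACK policy: ACK every N segments or after a timeout."""

    def __init__(self, ack_every_n: int = 2, timeout_ms: int = 200) -> None:
        if ack_every_n < 1:
            raise ValueError("ack_every_n must be at least 1")
        if timeout_ms <= 0 or timeout_ms % 10 != 0:
            raise ValueError("timeout must be a positive multiple of 10 ms")
        self.ack_every_n = ack_every_n
        self.timeout_ms = timeout_ms
        self.unacked = 0          # segments received since the last ACK
        self.deadline_ms = None   # when the delayed ACK must go out

    def _reset(self) -> None:
        self.unacked = 0
        self.deadline_ms = None

    def on_segment(self, now_ms: int) -> bool:
        """Record an in-order data segment; return True if an ACK is due now."""
        self.unacked += 1
        if self.unacked >= self.ack_every_n:
            self._reset()
            return True
        if self.deadline_ms is None:
            self.deadline_ms = now_ms + self.timeout_ms
        return False

    def on_tick(self, now_ms: int) -> bool:
        """Timer callback; return True if the delayed ACK has expired."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self._reset()
            return True
        return False
```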

  16. Windows Implementation Specifics (contd.)
      Keep Alive (KA) Timer
        • Logo requires that duplicate data segments reset the KA timer
      Receive Window Updates
        • If delivery and ACK frequency overlap to generate ACK segments, consolidate them to reduce network traffic
      Appropriate Byte Counting
        • RFC 3465 specifies: CWnd += (BytesAcked >= CWnd) ? MSS : 0
        • Windows uses: CWnd += max((MSS * min(MSS * L, BytesAcked)) / CWnd, 1)
        • Logo accepts Windows CWnd calculation or the simpler RFC 3465 calculation
      Loss Recovery
        • RFC 2581 specifies: SsThresh = max(FlightSize / 2, 2*SMSS)
        • Windows uses: SsThresh = max(2*SMSS, min(CWnd, RcvWnd) / 2)
        • Logo requires the latter calculation
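
The four formulas on this slide are transcribed below so they can be compared on concrete numbers. The sketch assumes integer (truncating) division and treats L as a caller-supplied limit, since the slide does not give its value; the function names are made up for the example.

```python
def cwnd_increase_rfc3465(cwnd: int, bytes_acked: int, mss: int) -> int:
    """Slide's RFC 3465 form: grow by one MSS once a full CWnd of bytes is ACKed."""
    return mss if bytes_acked >= cwnd else 0


def cwnd_increase_windows(cwnd: int, bytes_acked: int, mss: int, limit_l: int) -> int:
    """Slide's Windows form: proportional growth, capped at L segments per ACK,
    and always at least one byte."""
    return max((mss * min(mss * limit_l, bytes_acked)) // cwnd, 1)


def ssthresh_rfc2581(flight_size: int, smss: int) -> int:
    """RFC 2581: half the data in flight, but no less than two segments."""
    return max(flight_size // 2, 2 * smss)


def ssthresh_windows(cwnd: int, rcv_wnd: int, smss: int) -> int:
    """Slide's Windows form: half of the smaller of CWnd and RcvWnd, floor of two segments."""
    return max(2 * smss, min(cwnd, rcv_wnd) // 2)
```

For example, with MSS = 1460, CWnd = 14600, L = 2 (an assumed value), and an ACK covering 2920 bytes, the Windows form grows CWnd by 292 bytes, while the RFC 3465 form as written on the slide adds nothing until the ACKed byte count reaches CWnd.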

  17. Windows Implementation Specifics (contd.)
      TCP Reassembly
        • Many caveats around reassembly
        • Conflicts with out of order data
        • Conflicts with FIN
        • Possible resource constraints around reassembly holes
        • Must support at least 2 reassembly holes
        • Must be prepared to extend holes in either direction and coalesce
      Generation of RESET segment
        • RST is generated by Windows for:
          • In window SYN
          • Expiration of FIN_WAIT_2 timer
          • RexmitCount expiry
          • Others
        • Chimneys must generate RST only if the application aborted the connection
        • Upload the connection in the other cases
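
One way to picture the reassembly requirements is to track the out-of-order data that has been buffered as sorted byte ranges, with the gaps between them being the holes still to be filled. The sketch below takes that interpretation, which is an assumption about the slide's wording. It ignores sequence-number wraparound, FIN handling, and resource limits, and the function names are hypothetical.

```python
def add_segment(ranges: list, start: int, end: int) -> list:
    """Insert the half-open range [start, end) into a sorted list of buffered ranges.

    A new segment may extend an existing range to the left or right, or bridge
    two ranges so that they coalesce into one.
    """
    merged = []
    for s, e in ranges:
        if e < start or s > end:
            merged.append((s, e))      # disjoint and not touching: keep as is
        else:
            start, end = min(s, start), max(e, end)   # overlap or adjacency: absorb
    merged.append((start, end))
    merged.sort()
    return merged


def count_holes(rcv_nxt: int, ranges: list) -> int:
    """Count the gaps between rcv_nxt and the buffered ranges."""
    holes = 0
    edge = rcv_nxt
    for s, e in ranges:
        if s > edge:
            holes += 1
        edge = max(edge, e)
    return holes
```

With rcv_nxt at 1000, buffering [2000, 3000) and then [5000, 6000) leaves two holes, which is the minimum the slide says a target must be able to support.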

  18. Timer Implementation
      Accurate timers are needed to minimize the difference between uploaded and offloaded connections
      Accurate timers also improve round trip time and bandwidth calculations
      Logo requirement
        • Resolution must be 10ms or better
        • Must not drift significantly compared to typical CPU timers

  19. Call to Action
      Implement per-source MAC address statistics for virtualization
      Support receive-side scaling with TCP Chimney
      Implement receive window auto-tuning
      Design and develop with Logo requirements in mind

  20. Resources
      Windows Server 2008 Chimney WDK and Logo Kit: http://connect.microsoft.com
        • Windows 7 WDK will be available as of WinHEC
      Windows Logo Program Web Site: http://www.microsoft.com/whdc/winlogo/default.mspx
      NDIS 6 Feedback alias: ndis6fb@microsoft.com
      Test and Logo questions: offloadt@microsoft.com

  21. Related Sessions
