USB Performance Analysis of Bulk Traffic

Presentation Transcript


  1. USB Performance Analysis of Bulk Traffic Brian Leete brian.a.leete@intel.com

  2. Introduction • Bulk Traffic • Designed for reliable, highly variable data transfer • No guarantees are made in the specification for throughput • Is scheduled last, after ISOC, Interrupt, and Control • Throughput is dependent on many factors

  3. Introduction • We will look at Bulk Throughput from the following aspects • Distribution of Throughput for Various Packet Sizes and Endpoints • Low Bandwidth Performance • Small Endpoint Performance • NAK Performance • CPU Utilization • PCI Bus Utilization

  4. Test Environment -- Hardware • PII 233 (8522px) with 512 KB Cache • Atlanta Motherboard with 440LX (PIIX4) Chipset • 32 MB Memory • Symbios OHCI Controller (for OHCI Measurements) • Intel Lava Card as Test Device

  5. Test Environment -- Software • Custom Driver and Application • Test Started by IOCTL • IOCTL allocates static memory structures, submits IRP to USBD • Completion routine resubmits the next buffer • All processing done at ring 0 (DISPATCH_LEVEL)
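
The submit loop on this slide can be pictured as the following WDM-style sketch. It is illustrative only, not the author's actual test driver: TEST_CONTEXT, SubmitNextBuffer, and the single recycled IRP/URB are assumed names, and a production driver would also fully re-initialize the reused IRP (e.g., with IoReuseIrp) and handle errors.

/*
 * Illustrative sketch of "completion routine resubmits next buffer",
 * written against the WDM/USBD interfaces the slide mentions.
 */
#include <wdm.h>
#include <usb.h>
#include <usbdlib.h>

typedef struct _TEST_CONTEXT {
    PDEVICE_OBJECT   LowerDevice;   /* next device object in the USB stack    */
    USBD_PIPE_HANDLE BulkPipe;      /* bulk pipe selected at configuration    */
    PUCHAR           Buffer;        /* statically allocated transfer buffer   */
    ULONG            BufferLength;  /* buffer size under test (8 B .. 64 KB)  */
    URB              Urb;           /* URB rebuilt for every submission       */
    PIRP             Irp;           /* IRP reused for every submission        */
} TEST_CONTEXT, *PTEST_CONTEXT;

static NTSTATUS BulkCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context);

/* Build a bulk-transfer URB for the next buffer and send it down to USBD. */
static NTSTATUS SubmitNextBuffer(PTEST_CONTEXT Ctx)
{
    PIO_STACK_LOCATION next;

    UsbBuildInterruptOrBulkTransferRequest(&Ctx->Urb,
        sizeof(struct _URB_BULK_OR_INTERRUPT_TRANSFER),
        Ctx->BulkPipe, Ctx->Buffer, NULL, Ctx->BufferLength,
        USBD_TRANSFER_DIRECTION_OUT, NULL);

    next = IoGetNextIrpStackLocation(Ctx->Irp);
    next->MajorFunction = IRP_MJ_INTERNAL_DEVICE_CONTROL;
    next->Parameters.DeviceIoControl.IoControlCode = IOCTL_INTERNAL_USB_SUBMIT_URB;
    next->Parameters.Others.Argument1 = &Ctx->Urb;

    IoSetCompletionRoutine(Ctx->Irp, BulkCompletion, Ctx, TRUE, TRUE, TRUE);
    return IoCallDriver(Ctx->LowerDevice, Ctx->Irp);
}

/* Completion routine: runs at DISPATCH_LEVEL and immediately queues the next
 * buffer, so the endpoint has work available in every frame. */
static NTSTATUS BulkCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    PTEST_CONTEXT ctx = (PTEST_CONTEXT)Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    if (NT_SUCCESS(Irp->IoStatus.Status)) {
        SubmitNextBuffer(ctx);              /* reuse the same IRP and URB */
    }
    /* Keep the recycled IRP out of the I/O manager's hands. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}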

  6. Terminology • A “Packet” is a single packet of data on the bus; its size is determined by the Max Packet Size of the device • Valid Max Packet Sizes for bulk are 8, 16, 32, and 64 bytes • A “Buffer” is the amount of data sent to USBD in a single IRP • In this presentation, buffers range from 8 bytes to 64 KB • Unless otherwise specified, most data was taken at 64-byte Max Packet Size with 15 endpoints configured in the system
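
As an illustration (not part of the deck), this is how a buffer handed to USBD decomposes into bus packets, assuming the 64-byte max packet size used for most of the measurements:

#include <stdio.h>

int main(void)
{
    const unsigned max_packet = 64;                   /* endpoint max packet size   */
    const unsigned buffers[]  = { 8, 512, 1024, 65536 }; /* buffer sizes under test */

    for (unsigned i = 0; i < sizeof buffers / sizeof buffers[0]; i++) {
        /* A buffer is split into ceil(buffer / max_packet) packets on the bus. */
        unsigned packets = (buffers[i] + max_packet - 1) / max_packet;
        printf("%6u-byte buffer -> %4u packets\n", buffers[i], packets);
    }
    return 0;
}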

  7. Host Controller Operation (UHCI)

  8. Single Endpoint Throughput (graph; annotations: oscillations at 256- and 512-byte buffers, flat throughput at 512- and 1024-byte buffers, small-buffer throughput)

  9. Small Buffer Throughput • For Buffer Sizes < Max Packet Size, the Host Controller sends 1 buffer per frame (e.g., an 8-byte buffer caps throughput at 8 bytes × 1,000 frames/s = 8,000 B/s) • No ability to look ahead and schedule another IRP even though time remains in the frame • Why is this?

  10. Interrupt Delay

  11. Single Endpoint Graph • Flat Throughput for the 512- and 1024-Byte Buffer curves • Single-Endpoint Throughput for 64 KB Buffers is below the theoretical max of 1,216,000 Bytes per Second (19 packets/frame × 64 bytes/packet × 1,000 frames/s) • Both are explained by looking at the Number of Packets per Frame

  12. Maximum Packets per Frame

  13. Throughput for Multiple Endpoints, 512-Byte Buffers

  14. 512-Byte Buffers, 1 Endpoint • 8 Packets/Frame × 64 Bytes per Packet × 1,000 Frames/s = 512,000 B/s • 511,986 B/s Measured

  15. 512-Byte Buffers, 2 Endpoints • 16 Packets/Frame × 64 Bytes per Packet × 1,000 Frames/s = 1,024,000 B/s • 1,022,067 B/s Measured • Notice that Interrupt Delay is not a factor here!

  16. 512-Byte Buffers, 3 Endpoints • 24 Packets × 64 Bytes per Packet ÷ 2 Frames × 1,000 Frames/s = 768,000 B/s • 776,211 B/s Measured
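
A small sketch of the arithmetic behind the three slides above: expected throughput is packets × bytes per packet × the full-speed frame rate of 1,000 frames/s, divided by the number of frames a round of buffers occupies. The function name and frames_per_round parameter are assumed for illustration.

#include <stdio.h>

/* packets        : bulk packets completed per round of buffers
 * bytes_per_packet: max packet size of the endpoints (64 here)
 * frames_per_round: 1 ms frames needed before every buffer completes */
static unsigned long expected_bps(unsigned packets, unsigned bytes_per_packet,
                                  unsigned frames_per_round)
{
    return (unsigned long)packets * bytes_per_packet * 1000UL / frames_per_round;
}

int main(void)
{
    printf("1 endpoint : %lu B/s\n", expected_bps(8, 64, 1));   /* 512,000   */
    printf("2 endpoints: %lu B/s\n", expected_bps(16, 64, 1));  /* 1,024,000 */
    printf("3 endpoints: %lu B/s\n", expected_bps(24, 64, 2));  /* 768,000   */
    return 0;
}

With three endpoints, 24 packets no longer fit in a single frame, which is why that case is spread over two frames and the per-second figure drops.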

  17. High End Throughput (graph; annotations: 18 vs. 17 packets per frame, single-endpoint throughput 900,000 vs. 950,000 B/s, flat throughput at 512- and 1024-byte buffers, oscillations at 256- and 512-byte buffers, small-buffer throughput)

  18. Minimal Endpoint Configuration

  19. Higher Single Endpoint Throughput (graph: 17 vs. 15 packets per frame)

  20. Host Controller Operation (UHCI)

  21. Results • We are working with Microsoft to remove unused endpoints from the Host Controller Data Structures

  22. (graph) More endpoints get 18 packets per frame; higher single-endpoint throughput

  23. Distribution of Throughput across Endpoints

  24. Results • We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.

  25. Limited Bandwidth Operation

  26. Small Endpoint Performance

  27. If you care about throughput… • Use 64-Byte Max Packet Size Endpoints • Use Large Buffers

  28. NAK Performance

  29. 45% Drop in Total Throughput

  30. CPU Utilization

  31. CPU Utilization • Idle process incrementing a counter in main memory • Designed to simulate a heavily CPU-bound load • Numbers indicate how much “work” the CPU could accomplish after servicing USB traffic • Higher numbers are better • Small buffers and large numbers of endpoints take more overhead (software stack navigation) • The 0-endpoint case is the control: no USB traffic running
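
A minimal user-mode sketch of the idle-counter technique described on this slide; the 10-second window and the names are assumptions, not the author's code:

#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile unsigned long counter = 0;   /* volatile: force real memory writes */
    const time_t seconds = 10;            /* fixed measurement window           */
    time_t end = time(NULL) + seconds;

    /* Spin at the lowest priority, incrementing the counter in main memory. */
    while (time(NULL) < end)
        counter++;

    /* Higher counts mean more CPU headroom was left after servicing USB. */
    printf("idle work units in %ld s: %lu\n", (long)seconds, counter);
    return 0;
}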

  32. PCI Utilization

  33. PCI Utilization (UHCI) • 15-endpoint configuration • For low numbers of active endpoints, the Host Controller must poll memory for each unused endpoint, causing relatively high utilization • Removing unused endpoints will lower single-endpoint PCI utilization for this configuration

  34. Conclusions • UHCI Host Controller Driver needs a few tweaks • Need to get the Host Controller to start sending packets where it last left off rather than at Endpoint 1 • Need to remove unused endpoints from the list • Performance Recommendations • Use 64-Byte Max Packet Size Endpoints • Large Buffers are better than Small Buffers • Reduce NAK’d traffic • Use fast devices if possible

  35. Future Research Topics • Multiple IRPs per Pipe • How USB should control throughput to slow devices • Small Endpoints aren’t good • Small Buffers aren’t good • NAKing isn’t good
