USB Performance Analysis of Bulk Traffic Brian Leete brian.a.leete@intel.com
Introduction • Bulk Traffic • Designed for reliable, highly variable data transfer • No guarantees are made in the specification for throughput • Is scheduled last after ISOC, Interrupt, and Control • Throughput is dependent on many factors
Introduction • We will look at Bulk Throughput from the following aspects • Distribution of Throughput for Various Packet Sizes and Endpoints • Low Bandwidth Performance • Small Endpoint Performance • NAK Performance • CPU Utilization • PCI Bus Utilization
Test Environment -- Hardware • PII 233 (8522px) with 512 KB Cache • Atlanta Motherboard with 440LX (PIIX4A) Chipset • 32 MB Memory • Symbios OHCI Controller (for OHCI Measurements) • Intel Lava Card as Test Device
Test Environment -- Software • Custom Driver and Application • Test Started by IOCTL • IOCTL allocates static memory structures, submits IRP to USBD • Completion routine resubmits the next buffer • All processing done at ring 0 (DISPATCH_LEVEL IRQL)
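A plain-C schematic of that flow is sketched below. It is not the original driver (which uses WDM IRPs, URBs, and USBD), and the names in it are hypothetical, but it shows the key idea: each completion immediately resubmits the next buffer, so the pipe stays busy with no user-mode round trips.

#include <stdio.h>

/* Plain-C schematic of the test driver's flow -- not real WDM/USBD code.
 * Names here (submit_to_usbd, on_completion) are hypothetical. The real
 * driver submits an IRP to USBD from an IOCTL handler and resubmits the
 * next buffer from the IRP completion routine at DISPATCH_LEVEL. */

#define BUFFERS_TO_SEND 4u                 /* stop condition for the sketch */

static unsigned char buffer[65536];        /* static buffer, allocated once */
static unsigned buffers_sent;

static void submit_to_usbd(void (*completion)(void));

/* Completion routine: queue the next buffer immediately, so the bulk
 * pipe never sits idle between IRPs. */
static void on_completion(void)
{
    if (++buffers_sent < BUFFERS_TO_SEND)
        submit_to_usbd(on_completion);
}

/* Stand-in for "build URB, attach to IRP, call the USB stack"; here the
 * completion callback is simply invoked synchronously. */
static void submit_to_usbd(void (*completion)(void))
{
    printf("submitting %zu-byte buffer #%u\n", sizeof buffer, buffers_sent + 1);
    completion();
}

/* Stand-in for the IOCTL handler that starts the test. */
int main(void)
{
    submit_to_usbd(on_completion);
    printf("%u buffers sent\n", buffers_sent);
    return 0;
}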
Terminology • A “Packet” is a single packet of data on the bus; its size is determined by the Max Packet Size of the device • Valid sizes are 8, 16, 32, and 64 bytes • A “Buffer” is the amount of data sent to USBD in a single IRP • In this presentation, buffers range from 8 bytes to 64K bytes • Unless otherwise specified, most data were taken at 64 byte Max Packet Size with 15 endpoints configured in the system
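To make the packet/buffer distinction concrete, here is a minimal C calculation (not from the original test code) of how one IRP buffer breaks into bus packets at a given Max Packet Size.

#include <stdio.h>

/* A "buffer" is what one IRP carries down to USBD; a "packet" is one bus
 * transaction, capped at the endpoint's Max Packet Size (8, 16, 32, 64). */
int main(void)
{
    unsigned buffer_bytes = 512;   /* bytes handed to USBD in one IRP */
    unsigned max_packet   = 64;    /* endpoint Max Packet Size        */

    /* Round up: a short final packet still costs a bus transaction. */
    unsigned packets = (buffer_bytes + max_packet - 1) / max_packet;

    printf("%u byte buffer at %u byte Max Packet Size = %u bus packets\n",
           buffer_bytes, max_packet, packets);   /* prints: 8 bus packets */
    return 0;
}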
[Graphs: Single Endpoint Throughput and Small Buffer Throughput. Annotations: oscillations at 256 and 512 byte buffers; flat throughput at 512 and 1024 byte buffers]
Small Buffer Throughput • For Buffer Sizes < Max Packet Size • Host Controller sends 1 Buffer per Frame • No Ability to Look Ahead and Schedule Another IRP Even Though Time Remains in the Frame • Why is this?
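This one-buffer-per-frame behavior is what caps small-buffer throughput: with full-speed USB running 1000 frames per second, throughput is simply the buffer size times 1000. A small stand-alone calculation (not from the original test code):

#include <stdio.h>

int main(void)
{
    unsigned buffer_bytes   = 8;     /* buffer smaller than Max Packet Size  */
    unsigned frames_per_sec = 1000;  /* full-speed USB: one frame every 1 ms */

    /* Only one buffer completes per frame, so throughput is buffer size
     * x 1000, no matter how much time is left in the frame. */
    unsigned bytes_per_sec = buffer_bytes * frames_per_sec;

    printf("8 byte buffers: %u B/S\n", bytes_per_sec);  /* 8,000 B/S */
    return 0;
}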
Single Endpoint Graph • Flat Throughput @ 512 and 1024 Byte Buffers • Single Endpoint Throughput for 64K Byte Buffers Below Theoretical Max of 1,216,000 Bytes per Second (19 Packets * 64 Bytes * 1000 Frames/s) • Both are explained by Looking at the Number of Packets per Frame
512 Byte Buffers 1 Endpoint • 8 Packets * 64 Bytes per Packet * 1000 Frames/s = 512,000 B/S • 511,986 B/S Measured
512 Byte Buffers 2 Endpoints • 16 Packets * 64 Bytes per Packet * 1000 Frames/s = 1,024,000 B/S • 1,022,067 B/S Measured • Notice that Interrupt Delay is not a factor here!
512 Byte Buffer -- 3 Endpoints • 24 Packets * 64 Bytes / 2 Frames * 1000 Frames/s = 768,000 B/S • 776,211 B/S Measured
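The common thread in the three cases above is packets per frame times packet size times 1000 frames per second. The short calculation below (not from the original test code) reproduces the predicted figures, including the 1,216,000 B/S theoretical ceiling of 19 packets per frame.

#include <stdio.h>

/* Predicted bulk throughput: packets scheduled per frame x packet size
 * x 1000 full-speed frames per second. */
static unsigned long predict(double packets_per_frame, unsigned packet_size)
{
    return (unsigned long)(packets_per_frame * packet_size * 1000.0);
}

int main(void)
{
    /* 512 byte buffers, 64 byte Max Packet Size (8 packets per buffer). */
    printf("1 endpoint : %lu B/S (511,986 measured)\n",   predict(8.0,  64));
    printf("2 endpoints: %lu B/S (1,022,067 measured)\n", predict(16.0, 64));
    /* 3 endpoints: 24 packets spread over 2 frames = 12 packets per frame. */
    printf("3 endpoints: %lu B/S (776,211 measured)\n",   predict(12.0, 64));
    /* Theoretical ceiling: 19 bulk packets per frame. */
    printf("ceiling    : %lu B/S\n",                      predict(19.0, 64));
    return 0;
}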
High End Throughput [Graphs: Single Endpoint Throughput and Small Buffer Throughput. Annotations: 18 vs 17 packets per frame; 900,000 vs 950,000 B/S single endpoint throughput; flat throughput at 512 and 1024 byte buffers; oscillations at 256 and 512 byte buffers]
Results • We are working with Microsoft to remove unused endpoints from the Host Controller Data Structures
[Graph annotations: more endpoints get 18 packets per frame; higher single endpoint throughput]
Results • We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.
If you care about throughput… • Use 64 byte Max Packet Size Endpoints • Use Large Buffers
CPU Utilization • Idle process incrementing a counter in main memory • Designed to simulate a heavily CPU bound load • Numbers indicate how much “work” the CPU could accomplish after servicing USB traffic • Higher numbers are better • Small buffers and large numbers of endpoints take more overhead (software stack navigation) • The 0 endpoint case is the control -- no USB traffic running
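The workload itself is just a spin loop bumping a counter, and the final count is the “work” figure reported. A minimal user-mode sketch of the idea (hypothetical; the original ran as an idle-priority process alongside the kernel-mode test driver):

#include <stdio.h>
#include <time.h>

/* Spin incrementing a counter in memory for a fixed wall-clock window.
 * The final count is a proxy for how much CPU was left over for other
 * work; higher is better. Sketch only -- the original workload ran as an
 * idle-priority process while the USB test driver generated traffic. */
int main(void)
{
    volatile unsigned long long counter = 0;  /* keep the increments in memory */
    const time_t end = time(NULL) + 10;       /* 10 second measurement window  */

    while (time(NULL) < end) {
        /* Check the clock only occasionally so the loop is mostly increments. */
        for (int i = 0; i < 1000000; i++)
            counter++;
    }

    printf("work units completed: %llu\n", counter);
    return 0;
}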
PCI Utilization (UHCI) • 15 Endpoint Configuration • For low numbers of active endpoints, Host Controller must poll memory for each unused endpoint, causing relatively high utilization. • Removing unused endpoints will lower single endpoint PCI utilization for this configuration.
Conclusions • UHCI Host Controller Driver needs a few tweaks • Get the Host Controller to start sending packets where it last left off rather than at Endpoint 1 • Remove unused endpoints from the list • Performance Recommendations • Use 64 Byte Max Packet Size Endpoints • Large Buffers are better than small buffers • Reduce NAK’d traffic • Use fast devices if possible
Future Research Topics • Multiple IRPs per Pipe • USB needs to control throughput to slow devices • Small Endpoints aren’t good • Small Buffers aren’t good • NAKing isn’t good