USB Performance Analysis of Bulk Traffic Brian Leete brian.a.leete@intel.com
Introduction • Bulk Traffic • Designed for reliable, highly variable data transfer • No guarantees are made in the specification for throughput • Is scheduled last after ISOC, Interrupt, and Control • Throughput is dependent on many factors
Introduction • We will look at Bulk Throughput from the following aspects • Distribution of Throughput for Various Packet Sizes and Endpoints • Low Bandwidth Performance • Small Endpoint Performance • NAK Performance • CPU Utilization • PCI Bus Utilization
Test Environment -- Hardware • PII 233 (8522px) with 512 KB Cache • Atlanta Motherboard with 440LX (PIIX4A) Chipset • 32 MB Memory • Symbios OHCI Controller (for OHCI Measurements) • Intel Lava Card as Test Device
Test Environment -- Software • Custom Driver and Application • Test Started by IOCTL • IOCTL allocates static memory structures, submits IRP to USBD • Completion routine resubmits the next buffer • All processing done at ring 0 (DISPATCH_LEVEL IRQL)
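A plain-C schematic of that flow is sketched below. It is not the original driver (which uses WDM IRPs, URBs, and USBD), and the names in it are hypothetical, but it shows the key idea: each completion immediately resubmits the next buffer, so the pipe stays busy with no user-mode round trips.

#include <stdio.h>

/* Plain-C schematic of the test driver's flow -- not real WDM/USBD code.
 * Names here (submit_to_usbd, on_completion) are hypothetical. The real
 * driver submits an IRP to USBD from an IOCTL handler and resubmits the
 * next buffer from the IRP completion routine at DISPATCH_LEVEL. */

#define BUFFERS_TO_SEND 4u                 /* stop condition for the sketch */

static unsigned char buffer[65536];        /* static buffer, allocated once */
static unsigned buffers_sent;

static void submit_to_usbd(void (*completion)(void));

/* Completion routine: queue the next buffer immediately, so the bulk
 * pipe never sits idle between IRPs. */
static void on_completion(void)
{
    if (++buffers_sent < BUFFERS_TO_SEND)
        submit_to_usbd(on_completion);
}

/* Stand-in for "build URB, attach to IRP, call the USB stack"; here the
 * completion callback is simply invoked synchronously. */
static void submit_to_usbd(void (*completion)(void))
{
    printf("submitting %zu-byte buffer #%u\n", sizeof buffer, buffers_sent + 1);
    completion();
}

/* Stand-in for the IOCTL handler that starts the test. */
int main(void)
{
    submit_to_usbd(on_completion);
    printf("%u buffers sent\n", buffers_sent);
    return 0;
}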
Terminology • A “Packet” is a single packet of data on the bus; its size is determined by the Max Packet Size of the device • Valid sizes are 8, 16, 32, and 64 bytes • A “Buffer” is the amount of data sent to USBD in a single IRP • In this presentation, buffers range from 8 bytes to 64K bytes • Unless otherwise specified, most data were taken at 64 byte Max Packet Size with 15 endpoints configured in the system
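To make the packet/buffer distinction concrete, here is a minimal C calculation (not from the original test code) of how one IRP buffer breaks into bus packets at a given Max Packet Size.

#include <stdio.h>

/* A "buffer" is what one IRP carries down to USBD; a "packet" is one bus
 * transaction, capped at the endpoint's Max Packet Size (8, 16, 32, 64). */
int main(void)
{
    unsigned buffer_bytes = 512;   /* bytes handed to USBD in one IRP */
    unsigned max_packet   = 64;    /* endpoint Max Packet Size        */

    /* Round up: a short final packet still costs a bus transaction. */
    unsigned packets = (buffer_bytes + max_packet - 1) / max_packet;

    printf("%u byte buffer at %u byte Max Packet Size = %u bus packets\n",
           buffer_bytes, max_packet, packets);   /* prints: 8 bus packets */
    return 0;
}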
[Graphs: Single Endpoint Throughput and Small Buffer Throughput. Annotations: oscillations at 256 and 512 byte buffers; flat throughput at 512 and 1024 byte buffers]
Small Buffer Throughput • For Buffer Sizes < Max Packet Size • Host Controller sends 1 Buffer per Frame • No Ability to Look Ahead and Schedule Another IRP Even Though Time Remains in the Frame • Why is this?
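This one-buffer-per-frame behavior is what caps small-buffer throughput: with full-speed USB running 1000 frames per second, throughput is simply the buffer size times 1000. A small stand-alone calculation (not from the original test code):

#include <stdio.h>

int main(void)
{
    unsigned buffer_bytes   = 8;     /* buffer smaller than Max Packet Size  */
    unsigned frames_per_sec = 1000;  /* full-speed USB: one frame every 1 ms */

    /* Only one buffer completes per frame, so throughput is buffer size
     * x 1000, no matter how much time is left in the frame. */
    unsigned bytes_per_sec = buffer_bytes * frames_per_sec;

    printf("8 byte buffers: %u B/S\n", bytes_per_sec);  /* 8,000 B/S */
    return 0;
}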
Single Endpoint Graph • Flat Throughput @ 512 and 1024 Byte Buffers • Single Endpoint Throughput for 64K Byte Buffers Below Theoretical Max of 1,216,000 Bytes per Second (19 Packets * 64 Bytes * 1000 Frames/s) • Both are explained by Looking at the Number of Packets per Frame
512 Byte Buffers 1 Endpoint • 8 Packets * 64 Bytes per Packet * 1000 Frames/s = 512,000 B/S • 511,986 B/S Measured
512 Byte Buffers 2 Endpoints • 16 Packets * 64 Bytes per Packet * 1000 Frames/s = 1,024,000 B/S • 1,022,067 B/S Measured • Notice that Interrupt Delay is not a factor here!
512 Byte Buffer -- 3 Endpoints • 24 Packets * 64 Bytes / 2 Frames * 1000 Frames/s = 768,000 B/S • 776,211 B/S Measured
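The common thread in the three cases above is packets per frame times packet size times 1000 frames per second. The short calculation below (not from the original test code) reproduces the predicted figures, including the 1,216,000 B/S theoretical ceiling of 19 packets per frame.

#include <stdio.h>

/* Predicted bulk throughput: packets scheduled per frame x packet size
 * x 1000 full-speed frames per second. */
static unsigned long predict(double packets_per_frame, unsigned packet_size)
{
    return (unsigned long)(packets_per_frame * packet_size * 1000.0);
}

int main(void)
{
    /* 512 byte buffers, 64 byte Max Packet Size (8 packets per buffer). */
    printf("1 endpoint : %lu B/S (511,986 measured)\n",   predict(8.0,  64));
    printf("2 endpoints: %lu B/S (1,022,067 measured)\n", predict(16.0, 64));
    /* 3 endpoints: 24 packets spread over 2 frames = 12 packets per frame. */
    printf("3 endpoints: %lu B/S (776,211 measured)\n",   predict(12.0, 64));
    /* Theoretical ceiling: 19 bulk packets per frame. */
    printf("ceiling    : %lu B/S\n",                      predict(19.0, 64));
    return 0;
}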
High End Throughput [Graphs: Single Endpoint Throughput and Small Buffer Throughput. Annotations: 18 vs 17 packets per frame; 900,000 vs 950,000 B/S single endpoint throughput; flat throughput at 512 and 1024 byte buffers; oscillations at 256 and 512 byte buffers]
Results • We are working with Microsoft to remove unused endpoints from the Host Controller Data Structures
[Graph annotations: more endpoints get 18 packets per frame; higher single endpoint throughput]
Results • We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.
If you care about throughput… • Use 64 byte Max Packet Size Endpoints • Use Large Buffers
CPU Utilization • Idle process incrementing a counter in main memory • Designed to simulate a heavily CPU bound load • Numbers indicate how much “work” the CPU could accomplish after servicing USB traffic • Higher numbers are better • Small buffers and large numbers of endpoints take more overhead (software stack navigation) • The 0 endpoint case is the control -- no USB traffic running
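The workload itself is just a spin loop bumping a counter, and the final count is the “work” figure reported. A minimal user-mode sketch of the idea (hypothetical; the original ran as an idle-priority process alongside the kernel-mode test driver):

#include <stdio.h>
#include <time.h>

/* Spin incrementing a counter in memory for a fixed wall-clock window.
 * The final count is a proxy for how much CPU was left over for other
 * work; higher is better. Sketch only -- the original workload ran as an
 * idle-priority process while the USB test driver generated traffic. */
int main(void)
{
    volatile unsigned long long counter = 0;  /* keep the increments in memory */
    const time_t end = time(NULL) + 10;       /* 10 second measurement window  */

    while (time(NULL) < end) {
        /* Check the clock only occasionally so the loop is mostly increments. */
        for (int i = 0; i < 1000000; i++)
            counter++;
    }

    printf("work units completed: %llu\n", counter);
    return 0;
}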
PCI Utilization (UHCI) • 15 Endpoint Configuration • For low numbers of active endpoints, Host Controller must poll memory for each unused endpoint, causing relatively high utilization. • Removing unused endpoints will lower single endpoint PCI utilization for this configuration.
Conclusions • UHCI Host Controller Driver needs a few tweaks • Get the Host Controller to start sending packets where it last left off rather than at Endpoint 1 • Remove unused endpoints from the list • Performance Recommendations • Use 64 Byte Max Packet Size Endpoints • Large Buffers are better than small buffers • Reduce NAK’d traffic • Use fast devices if possible
Future Research Topics • Multiple IRPs per Pipe • USB needs to control throughput to slow devices • Small Endpoints aren’t good • Small Buffers aren’t good • NAKing isn’t good