
Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers

Tsung-i (Mark) Huang, Jaspal Subhlok, University of Houston. GAN ’05 / May 10, 2005.

Presentation Transcript


  1. Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers Tsung-i (Mark) Huang Jaspal Subhlok University of Houston GAN’05 / May 10, 2005

  2. Outline • Background • Problem Description • Methodology • Experiments and Results • Conclusion and Future Work

  3. “Are we there yet?” • When do you need throughput prediction? • File download: “xx minutes left” (MS IE vs. Mozilla) • Mirror site selection: download Knoppix from Florida State Univ. (fsu.edu) or TU Ilmenau, Germany (tu-ilmenau.de)? • Resource selection in a grid environment • Cache selection for web content delivery services

  4. Which site will give the best throughput? • Current approaches and tools: • Geographical distance • Ping (ICMP) • Download 512 KBytes (fixed size): NWS / iperf • Download for 10 seconds (fixed duration): iperf • The last two approaches are the most accurate, but: • How much data should be downloaded, or for how long? • Is “Bandwidth * Delay” the answer? Does one size fit all? • “All or nothing”: no result is available until the transfer ends
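The “Bandwidth * Delay” question refers to the bandwidth-delay product (BDP), which is the number of bytes needed to keep a path full for one round trip. The sketch below shows the arithmetic. It assumes a hypothetical 10 Mbit/s bottleneck and uses the 176 ms SYN-ACK RTT that appears on slide 8. The function name is illustrative only.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes in flight on a full path: bandwidth (bits/s) x RTT (s), converted to bytes."""
    return bandwidth_bps * rtt_s / 8.0

# Hypothetical path: 10 Mbit/s bottleneck (an assumption), 176 ms RTT.
bdp = bandwidth_delay_product_bytes(10e6, 0.176)
print(f"{bdp:.0f} bytes")  # prints 220000 bytes
```

Under these assumptions, a fixed 512 KByte probe covers only about 2.4 BDPs. On a faster or longer path, the same probe may finish before slow start has ended. This is the “one size fits all?” concern.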

  5. Problem Description • Predicted future throughput can be used in mirror/replica site selection • Predict the throughput of a TCP bulk transfer • Single TCP stream • Input: time series of (arrival time, bytes received) • Output: predicted future throughput • Make a prediction of future throughput after observing 10 to 100 RTTs • Utilize knowledge of TCP flow patterns • Assume TCP flow patterns will repeat later in the same TCP stream

  6. TCP Flow Patterns • Textbook Examples: (a) Rate Control (b) Congestion Control • In Reality: (c) Rate Control with delay (d) Mixed Congestion Control

  7. Approach to Throughput Prediction • Analyze the time series (TS1) of (arrival time, bytes received) to obtain a meaningful throughput time series • Possible solutions: • Instant throughput: throughput since the previous TCP segment • Fixed-interval throughput: average throughput over a fixed time period • Per-RTT throughput: partition using the fixed SYN-ACK RTT • Idea: TCP sends a window full of data segments every RTT • Partition the time series (TS1) with the fixed SYN-ACK RTT to get the per-RTT throughput time series (TS2) • Analyze the per-RTT throughput time series (TS2) to predict future throughput • Compare different prediction methods across all traces
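The first two alternatives can be sketched as follows. The sketch assumes TS1 is a time-ordered Python list of (arrival time in seconds, bytes) tuples, and the function names are hypothetical. The tiny gaps between back-to-back segments are what make instant throughput swing over several orders of magnitude.

```python
from typing import List, Tuple

# One TS1 sample: (arrival time in seconds, bytes received in that segment).
# Assumption: samples are sorted by arrival time.
Sample = Tuple[float, int]

def instant_throughput(ts1: List[Sample]) -> List[float]:
    """Bytes/sec of each segment, measured since the previous segment arrived."""
    rates = []
    for (t_prev, _), (t, nbytes) in zip(ts1, ts1[1:]):
        if t > t_prev:  # skip segments with identical capture timestamps
            rates.append(nbytes / (t - t_prev))
    return rates

def fixed_interval_throughput(ts1: List[Sample], interval: float = 0.1) -> List[float]:
    """Average bytes/sec over consecutive `interval`-second bins that start at
    the first arrival (default 100 ms). Using the SYN-ACK RTT as the bin width
    would give the per-RTT series instead."""
    if not ts1:
        return []
    t0 = ts1[0][0]
    totals = [0] * (int((ts1[-1][0] - t0) // interval) + 1)
    for t, nbytes in ts1:
        totals[int((t - t0) // interval)] += nbytes
    return [b / interval for b in totals]
```

For example, with `ts1 = [(0.0, 1000), (0.25, 1000), (0.5, 2000), (1.25, 500)]`, `fixed_interval_throughput(ts1, 0.5)` returns `[4000.0, 4000.0, 1000.0]`.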

  8. TCP Segment Partitioning (1) [Figure: instant, fixed-interval (100 ms), and per-RTT (SYN-ACK RTT = 176 ms) throughput, log scale; annotations: over 1 GBytes/sec, about 220 Bytes/sec, 121 KB/sec, 40 KB/sec] • Instant throughput fluctuates over a wide range. • Fixed-interval throughput fluctuates less.

  9. TCP Segment Partitioning (2) [Figure: SYN / ACK exchange defining the RTT] • RTT estimation • Use the fixed SYN-ACK RTT • Simple and effective • Partition TCP segments into a per-RTT throughput time series
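Below is a minimal sketch of the partitioning step. It assumes the receiver-side capture gives the time the SYN was sent and the time the SYN-ACK arrived. The RTT is measured once and then held fixed, and slot 0 starts at the first data arrival. Both of these are assumptions about details the slide leaves open, and the function names are hypothetical.

```python
from typing import List, Tuple

def syn_ack_rtt(t_syn_sent: float, t_synack_received: float) -> float:
    """Fixed RTT estimate, taken once from the handshake seen at the receiver."""
    rtt = t_synack_received - t_syn_sent
    if rtt <= 0:
        raise ValueError("SYN-ACK must be received after the SYN is sent")
    return rtt

def per_rtt_throughput(ts1: List[Tuple[float, int]], rtt: float) -> List[float]:
    """TS1 -> TS2: sum the bytes arriving in each consecutive RTT-long slot
    (slot 0 starts at the first data arrival) and convert to bytes/sec.
    Assumes ts1 is sorted by arrival time."""
    if not ts1:
        return []
    t0 = ts1[0][0]
    slots = [0] * (int((ts1[-1][0] - t0) // rtt) + 1)
    for t, nbytes in ts1:
        slots[int((t - t0) // rtt)] += nbytes
    return [b / rtt for b in slots]
```

TCP sends roughly one window of data per RTT. As long as the fixed RTT estimate stays close to the actual RTT, each slot's byte count therefore roughly tracks the window in use during that round trip.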

  10. Throughput Prediction (1) [Figure annotations: flat, exponential climb, linear climb, drop points] • TCP patterns • Rate Control limited (RC) • Congestion Control limited (CC) • Identify basic elements • Flat regions • Exponential climb regions • Linear climb regions • Drop points
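One way to picture the four basic elements is to label each step of the per-RTT series by how throughput changes from one RTT to the next. The sketch below is a hypothetical heuristic, not the method from the slides. Its thresholds are assumed values chosen for illustration:

* within 10% of the previous value counts as flat;
* a jump of 1.5x or more counts as an exponential climb;
* a fall to 70% or less counts as a drop.

```python
from typing import List

def label_transitions(ts2: List[float],
                      flat_tol: float = 0.10,   # assumed: within 10% counts as flat
                      exp_ratio: float = 1.5,   # assumed: >= 1.5x per RTT counts as exponential
                      drop_ratio: float = 0.7   # assumed: <= 70% of previous counts as a drop
                      ) -> List[str]:
    """Label each step ts2[i-1] -> ts2[i] as 'flat', 'exp', 'linear', 'drop', or 'other'."""
    labels = []
    for prev, cur in zip(ts2, ts2[1:]):
        if prev <= 0:
            labels.append("other")  # ratio undefined for an empty RTT slot
            continue
        ratio = cur / prev
        if ratio <= drop_ratio:
            labels.append("drop")
        elif abs(ratio - 1.0) <= flat_tol:
            labels.append("flat")
        elif ratio >= exp_ratio:
            labels.append("exp")
        elif ratio > 1.0:
            labels.append("linear")
        else:
            labels.append("other")  # mild decrease: none of the four elements
    return labels
```

A single-step ratio is crude. In congestion avoidance the window grows by about one segment per RTT, so at large windows a linear climb can fall inside the flat tolerance. A more robust detector would examine runs of steps, for example by checking for a roughly constant positive increment.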

  11. Throughput Prediction (2) [Figure annotation: peak of slow start] • Peak of slow start • Data points up to the end of the 1st slow start are ignored for prediction, since the initial slow start does not repeat • RC-based prediction • Use flat regions • CC-based prediction • Use complete CC cycles • Window-based prediction • If no clear pattern is observed
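The prediction flow on this slide can be sketched as follows. Everything concrete here is an assumption, and the names are hypothetical:

* the end of slow start is the first point where per-RTT throughput stops rising;
* an RC flat region is the longest run of at least 5 RTTs that stays within 10% of its first value;
* the window-based fallback averages the last 10 RTTs;
* the CC-based branch is omitted for brevity.

```python
from statistics import mean
from typing import List, Optional

def slow_start_peak(ts2: List[float]) -> int:
    """Index of the first local maximum (assumed end of the initial slow start)."""
    for i in range(len(ts2) - 1):
        if ts2[i + 1] <= ts2[i]:
            return i
    return len(ts2) - 1

def longest_flat_run(xs: List[float], tol: float = 0.10,
                     min_len: int = 5) -> Optional[List[float]]:
    """Longest run of consecutive values within +/- tol of the run's first value."""
    best, start = None, 0
    for i in range(1, len(xs) + 1):
        if i == len(xs) or abs(xs[i] - xs[start]) > tol * xs[start]:
            run = xs[start:i]
            if len(run) >= min_len and (best is None or len(run) > len(best)):
                best = run
            start = i
    return best

def predict_throughput(ts2: List[float], window: int = 10) -> float:
    if not ts2:
        raise ValueError("need at least one RTT of data")
    history = ts2[slow_start_peak(ts2) + 1:]  # initial slow start does not repeat
    if not history:
        history = ts2                         # still in slow start: use what we have
    flat = longest_flat_run(history)
    if flat is not None:
        return mean(flat)                     # RC-based prediction
    return mean(history[-window:])            # window-based fallback
```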

  12. Experiments (1) - Setup • Download data files from 290 web sites (Debian/Gentoo mirrors) • Use tcpdump to capture the receiver’s traffic • Record SYN-ACK RTTs • Retransmitted packets (0.09%) are included • Average file size is 30 MBytes • 461 traces collected at the Univ. of Houston • Traces are analyzed using Perl scripts

  13. Experiments (2) - Prediction Methods • Prediction methods compared • Moving Average (MA): average throughput of the previous 10 RTTs • Exponentially Weighted Moving Average (EWMA) • Aggregate throughput: average past throughput (same as the cumulative average), used as the predicted throughput • TCP Pattern prediction • Average error in predicted future throughput:
$$\text{error} = \frac{\text{predicted throughput} - \text{measured throughput}}{\text{measured throughput}} \times 100\%$$
• Errors over 100% are capped at 100%, because the error can become very large when the measured future throughput is very small
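The three baseline predictors and the capped error metric can be sketched as below. They operate on a non-empty per-RTT throughput list (TS2), and the function names are hypothetical. The slides do not give the EWMA weight, so alpha = 0.125 is an assumed value. The error is kept signed, as in the formula, because the slides do not say whether absolute values are averaged.

```python
from typing import List

def moving_average(ts2: List[float], n: int = 10) -> float:
    """MA: mean per-RTT throughput of the previous n RTTs."""
    recent = ts2[-n:]
    return sum(recent) / len(recent)

def ewma(ts2: List[float], alpha: float = 0.125) -> float:
    """EWMA: s <- alpha * x + (1 - alpha) * s, seeded with the first RTT.
    alpha = 0.125 is an assumption; the slides do not specify it."""
    s = ts2[0]
    for x in ts2[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def aggregate(ts2: List[float]) -> float:
    """Aggregate: cumulative average of all past per-RTT throughput."""
    return sum(ts2) / len(ts2)

def prediction_error(predicted: float, measured: float, cap: float = 100.0) -> float:
    """Signed relative error in percent, capped at +cap."""
    if measured <= 0:
        return cap  # error is unbounded when nothing was measured
    return min((predicted - measured) / measured * 100.0, cap)
```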

  14. Illustration of Prediction (1): make a prediction for the next 200 RTTs [Figure: per-RTT throughput with Aggregate and TCP Pattern predictions vs. window size (in RTTs); annotations: peak of slow start, drop at 27th RTT, 25th RTT, 40th RTT] • Prediction at the 25th RTT • Aggregate throughput prediction: average throughput of RTTs 0 to 25 • TCP throughput prediction: average throughput of RTTs 9 to 25 (RC-based prediction) • Prediction at the 40th RTT • TCP throughput prediction: window-based prediction using the data after the 27th RTT (a significant drop)

  15. Illustration of Prediction (2): make a prediction for the next 200 RTTs [Figure panels: per-RTT throughput with Aggregate and TCP Pattern predictions; prediction error of Aggregate, Moving Average, EWMA, and TCP Pattern vs. window size (in RTTs); the closer to 0, the better the prediction] • Average error against the measured future throughput of the next 200 RTTs • (For example, for a prediction at the 20th RTT, the average throughput of RTTs 21 to 220 is used.)
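The evaluation on this slide can be sketched as a loop over window sizes. The indexing is an assumption: ts2[0] holds RTT 1. A prediction at the k-th RTT therefore sees `ts2[:k]` and is scored against the mean of `ts2[k:k + 200]`, i.e. RTTs k+1 to k+200, which matches the 21-to-220 example. The predictor is any function from past per-RTT throughput to a single number, and the names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Predictor = Callable[[List[float]], float]

def evaluate(ts2: List[float], predictor: Predictor, windows: Iterable[int],
             horizon: int = 200, cap: float = 100.0) -> Dict[int, float]:
    """Capped signed relative error (%) of `predictor` for each window size k."""
    errors: Dict[int, float] = {}
    for k in windows:
        if k < 1 or k + horizon > len(ts2):
            continue  # no history, or not enough future data to score this k
        predicted = predictor(ts2[:k])           # RTTs 1..k
        future = ts2[k:k + horizon]              # RTTs k+1..k+horizon
        measured = sum(future) / len(future)
        if measured <= 0:
            errors[k] = cap
        else:
            errors[k] = min((predicted - measured) / measured * 100.0, cap)
    return errors
```

For example, `evaluate(ts2, lambda past: sum(past) / len(past), range(10, 101))` scores an Aggregate-style predictor at window sizes 10 to 100. Averaging such per-trace errors across traces, for each window size, is one way to produce curves of the kind plotted here.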

  16. Illustration of Prediction (3): make a prediction for the next 200 RTTs [Figure panels: per-RTT throughput with Aggregate and TCP Pattern predictions, annotated with one complete CC cycle; prediction error of Aggregate, Moving Average, EWMA, and TCP Pattern; the closer to 0, the better the prediction] • Prediction made at the 65th RTT using 3 complete CC cycles • Throughput prediction using Congestion-Control-based patterns
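Below is a sketch of CC-based prediction. It assumes that a cycle runs from one drop point to the next, and that a drop is a fall to 70% or less of the previous RTT's throughput (an assumed threshold). Averaging over whole cycles matters because the mean of a sawtooth depends on where in the cycle the average starts and stops. Using complete cycles removes that phase bias. The default of 3 cycles mirrors the example on this slide, and the names are hypothetical.

```python
from typing import List, Optional

def drop_points(ts2: List[float], drop_ratio: float = 0.7) -> List[int]:
    """Indices i where throughput falls to <= drop_ratio * the previous RTT's value."""
    return [i for i in range(1, len(ts2))
            if ts2[i - 1] > 0 and ts2[i] <= drop_ratio * ts2[i - 1]]

def cc_prediction(ts2: List[float], max_cycles: int = 3) -> Optional[float]:
    """Mean throughput over the last `max_cycles` complete CC cycles, where a
    cycle runs from one drop point up to (not including) the next one.
    Returns None if no complete cycle has been observed yet."""
    drops = drop_points(ts2)
    if len(drops) < 2:
        return None
    bounds = drops[-(max_cycles + 1):]  # n complete cycles need n + 1 drop points
    span = ts2[bounds[0]:bounds[-1]]
    return sum(span) / len(span)
```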

  17. Results (1) - predict the next 200 RTTs at different times [Figure panels: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions; annotation: 30th RTT] • Aggregate is not accurate for small window sizes (< 30 RTTs) • MA / EWMA are generally not as accurate

  18. Results (2) - predict at the 15th RTT for different times in the future [Figure panels: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions] • When only limited data is available: • Aggregate is not accurate • MA performs best; TCP Pattern is close

  19. Results (3) - predict at the 25th RTT for different times in the future [Figure panels: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions] • With more data available: • Aggregate performs better • TCP Pattern performs best; MA is close

  20. Results (4) - predict at the 50th RTT for different times in the future [Figure panels: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions] • With even more data available: • TCP Pattern performs best and Aggregate is close • MA now performs worse, due to the dynamics of TCP flows

  21. Summary of Results • Aggregate is accurate with sufficient data, but not with only a few RTTs of data • MA performs very well with a few RTTs of data • EWMA is not a good predictor • TCP Pattern generally performs as well as or better than the other methods

  22. Summary of Results (table view)

  23. Conclusion and Future Work • TCP-pattern-based throughput prediction is as good as or better than other methods. • Good predictions within 25 RTTs (~5 sec). • Patterns observed: 65% Rate Control, few Congestion Control • Methods using Aggregate (e.g. NWS) cannot be expected to work well for small test files • What’s next? • Identify more patterns • Add a degree of confidence to each prediction • Multiple TCP streams

  24. That’s all, folks! Thank You!

  25. Supplementary Slides

  26. Characteristics of collected traces (1)

  27. Characteristics of collected traces (2) • Classification: a trace is classified as a given pattern type when over 50% of it exhibits that type of pattern.

  28. Some Trace Patterns (300 RTTs) [Figure annotations: under-estimated RTT; 100 RTTs]

  29. Results (0.5) - predict the next 100 RTTs at different times [Figure: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions]

  30. Results (1.5) - predict the next 400 RTTs at different times [Figure: per-RTT throughput with Aggregate, Moving Average, EWMA, and TCP Pattern predictions]

  31. Bandwidth • Bandwidth: • The amount of data that can be pushed through a link in unit time. Usually measured in bits or bytes per second. • Bottleneck Bandwidth (BB) • Available Bandwidth (AB) • Throughput (T) • T ≤ AB ≤ BB
