TCP Round Trip Time Analysis in a University Network. Justifying the pursuit of Active Queue Management (AQM) research. Author: Jonathan Thyer September 2004 - March 2005
Disclaimers • This work used to be a thesis… • And then reality sank in… • Just one individual in a talented research community hoping to make a small contribution. • Wonderful family and busy work life. • This really is a mixture of a thesis and a project presentation. • Now – have I lowered your expectations enough? 8-)
How do you communicate across the Internet? • Access your local network • Use a well-defined communications protocol • E.g.: HTTP (the web) • Email • Type in some Internet destination address and away you go! • lunatic@thyer.org • http://www.thyer.org/ • But communications are not as efficient as they could be. • Assertion: The Internet is under-performing!
What is happening under the computer covers? • Your Internet destination name gets translated into a 32-bit number (an IP address) by a network service called the Domain Name System (DNS) • Your computer initiates communication with the destination Internet address. • Numerous Internet protocol routers, switches, hubs and physical media carry your communications from source to destination and back again.
Communications Protocols • Open Systems Interconnection (OSI) model. • Communication protocols are defined as a stack of layers, each with its own programming interface. • Why? Because it is easy to understand and to programmatically implement! • Layer 7: Applications (Web browser, HTTP etc) • Layer 6: Presentation layer (data conversions) • Layer 5: Session establishment (not communication orientated) • Layer 4: Transport protocol (often UDP/TCP) • Layer 3: Network layer – Internet Protocol (logical addresses) • Layer 2: Data link layer – framing characteristics (often Ethernet) • Layer 1: Physical (electrical, optical, or radio) characteristics
Data link layer (layer 2) • Data can be sent between local area network devices at layer 2. • Data is broken down into smaller chunks called frames (often loosely called packets). • Different data link transmission protocols can be used. • Ethernet has become the common standard and uses 48-bit (6 byte) source and destination addresses. • Data link layer communications are confined to local area networks through either point-to-point or shared media links. • Typically fewer than 1000 devices in a local area network (often fewer than 255).
Internet Protocol (layer 3) • Known as the logical layer (32-bit source/destination addresses) • A system called the “Domain Name System” (DNS) converts names to numeric addresses (and back). • E.g.: www.uncg.edu = 152.13.2.96 • Data is also transported in packet form but can be routed between multiple local area networks. • A protocol called “Address Resolution Protocol” (ARP) translates IP (layer 3) addresses into layer 2 Ethernet addresses. • ARP is the glue between layer 3 and layer 2.
Computer 1 to Computer 2 • C#1 sends ARP request – who has 192.168.1.2? • C#2 replies – that's me, and supplies its 48-bit address. • C#1 addresses data to C#2 using the supplied 48-bit address and sends it.
Computer 1 to Computer 4 • C#1 knows that C#4 is not in the local network. • How? C#1 performs a logical AND of the destination IP address with its subnet mask and compares the result with its own network address. • C#1 sends ARP – who has 192.168.1.254? – router replies with its 48-bit address • C#1 sends data to the router; the router then looks in its route tables for the destination logical address. • Router sends ARP into the destination network – who has 192.168.99.1? C#4 replies – that's me!!!
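The masking step can be sketched in a few lines of Python. This is only an illustration: the slide does not state C#1's own address or its subnet mask, so `192.168.1.1` and `255.255.255.0` below are assumptions, and `same_network` is a hypothetical helper name.

```python
import ipaddress

def same_network(src: str, dst: str, netmask: str) -> bool:
    """Return True when dst falls in the same network as src under netmask."""
    mask = int(ipaddress.IPv4Address(netmask))
    src_net = int(ipaddress.IPv4Address(src)) & mask   # logical AND with the mask
    dst_net = int(ipaddress.IPv4Address(dst)) & mask
    return src_net == dst_net

# Assumed addressing: C#1 is 192.168.1.1 with mask 255.255.255.0.
local = same_network("192.168.1.1", "192.168.1.2", "255.255.255.0")
remote = same_network("192.168.1.1", "192.168.99.1", "255.255.255.0")
print(local, remote)
```

When the result is true, the host can ARP for the destination directly; when it is false, it ARPs for the router instead.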
What is a router? • A router is a device that operates at the OSI logical layer 3. • It knows what to do with data arriving that has logical IP addresses for source and destination. • A router builds routing tables to represent “networks” that are either directly connected or available through a neighboring router. • A router is designed to find the shortest network path between a source network and destination network. • A router often has multiple different physical links connected to it. There are often multiple possible routes to any specific network.
Transport (layer 4) • User Datagram Protocol (UDP) – a stateless and connectionless protocol. • UDP packets are sent directly from source to destination, and UDP itself gives the source no way to know whether the data arrived intact. • Transmission Control Protocol (TCP) – a stateful and connection-oriented protocol. • TCP data is sent in segments. • Every segment sent must be positively acknowledged. • TCP is the majority carrier of traffic on the Internet. • Why? • It is reliable – guaranteed delivery of all data content. • Validated over time and widely implemented. • Specified in 1981 in RFC 793, edited by Jon Postel.
Round Trip Time (RTT) • RTT is the time elapsed between when a TCP data segment is sent and when that segment's corresponding acknowledgement (ACK) is received. • RTT is an important measure of Internet performance. • RTT directly impacts TCP performance characteristics on end systems. • RTT is impacted by routers along the communication path.
My Goals • Develop a tool to measure TCP RTT data between the UNCG campus and the Internet • Produce frequency plots of the RTT data collected • Why? Because I had to do something to prove what Shan Suthaharan was telling me! • Try to explain the results. • Build a small network to perform further research within. • The tool developed is called tcpflowstat
Data Collection Setup • NCREN: North Carolina Research and Education Network • UNCG to NCREN link averages about 60 – 80 Mbit/s over time. • Common “port spanning” method used to “copy” all Internet data to the collection host • Collection host uses a program called “tcpdpriv” to collect the data. • Collected 100,000,000 packet samples over several days.
Ethical Concerns • “tcpdpriv” does a number of things to change the data while preserving traffic characteristics • Source and Destination addresses are replaced with incrementing 32-bit numbers starting from 0. • TCP port information is replaced with random numbers. • Data content section of packet is discarded. • Packet header is stored to a file in “PCAP” format. • PCAP is a public domain packet header capture format for UNIX systems.
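The address replacement described above can be illustrated with a small sketch. This is not tcpdpriv's actual code; `Anonymizer` is a hypothetical class that only shows the idea of handing out incrementing numbers, starting from 0, in the order addresses are first seen.

```python
class Anonymizer:
    """Map each real address to the next unused integer, starting from 0."""

    def __init__(self):
        self._addr_map = {}

    def address(self, real_ip: str) -> int:
        # The same real address always maps to the same number,
        # so per-flow traffic characteristics are preserved.
        if real_ip not in self._addr_map:
            self._addr_map[real_ip] = len(self._addr_map)
        return self._addr_map[real_ip]

anon = Anonymizer()
print(anon.address("152.13.2.96"), anon.address("192.168.1.2"), anon.address("152.13.2.96"))
```

Because the mapping is consistent, packets of one flow still share the same (anonymized) endpoints, which is what the later flow analysis relies on.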
Definition of a TCP Flow • A unique, reliable communication between a source and destination computer using the TCP protocol. • Think of dialing an office phone number, then using an extension number after that. • The phone number would be the destination IP address, then the extension becomes the TCP socket or port number. • A TCP flow is defined as the five-tuple of TCP protocol, source IP address, destination IP address, source TCP port, and destination TCP port. • There can be multiple TCP flows between a source and destination computer.
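As a rough sketch, the five-tuple can be written as an immutable record that doubles as a dictionary key. `FlowKey` and `reply_key` are illustrative names, not taken from tcpflowstat; the value 6 is the IP protocol number assigned to TCP.

```python
from typing import NamedTuple

TCP = 6  # IP protocol number for TCP

class FlowKey(NamedTuple):
    protocol: int
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int

    def reply_key(self) -> "FlowKey":
        """Key seen on packets travelling in the opposite direction."""
        return FlowKey(self.protocol, self.dst_ip, self.src_ip,
                       self.dst_port, self.src_port)

# Two flows between the same pair of hosts differ only in the source port.
first = FlowKey(TCP, 0, 1, 40000, 80)
second = FlowKey(TCP, 0, 1, 40001, 80)
print(first != second, first.reply_key())
```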
RTT – How to calculate it! • From research literature, there are three basic calculation methods • Take the time difference between a TCP SYN packet and the SYN/ACK that answers it. • Use the change in window size during slow start – calculate the difference between data segment inter-arrival times… Uses a time threshold to determine a flight (burst) of packets. • Use a fluid dynamic view treating traffic at a bits level per unit time. Basis is that when TCP is in congestion avoidance mode, the window size increases by one MSS every RTT.
RTT – with limited resources… • Related research shows that the SYN – SYN/ACK method is a reasonably good estimator of RTT. • Other methods depend on averaging several hundred values per TCP flow of communication. • I had only limited computing power available!
Basic operation of the tcpflowstat program • Opens a packet capture file • For each packet header in the file • Find a TCP packet • If the packet is a SYN packet, allocate a tcpflow data structure node and use the IP and TCP port addressing as the key item. • If not, search the tcpflow data structure to see if this packet matches an existing flow • If the packet is a SYN-ACK, then calculate the time difference and update the RTT data value in the flow data structure. • If the packet is a FIN or RST packet, then the flow is removed from the data structure and placed in a “completed flow” linked list.
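The loop above might look roughly like the following Python sketch. It is not the tcpflowstat source: packets are assumed to be already decoded into hypothetical `(ts_ms, src, dst, sport, dport, flags)` tuples, and a plain dictionary and list stand in for the flow data structure and the completed-flow list.

```python
def process(packets):
    """Return completed flows with their SYN to SYN/ACK RTT and duration (ms)."""
    active = {}       # flow key -> flow record
    completed = []    # stands in for the "completed flow" list
    for ts_ms, src, dst, sport, dport, flags in packets:
        key = (src, dst, sport, dport)
        reply = (dst, src, dport, sport)   # same flow, opposite direction
        if flags == {"SYN"}:
            active[key] = {"syn_ms": ts_ms, "rtt_ms": None, "duration_ms": None}
        elif flags == {"SYN", "ACK"}:
            flow = active.get(reply)
            if flow is not None and flow["rtt_ms"] is None:
                flow["rtt_ms"] = ts_ms - flow["syn_ms"]
        elif "FIN" in flags or "RST" in flags:
            flow = active.pop(key if key in active else reply, None)
            if flow is not None:
                flow["duration_ms"] = ts_ms - flow["syn_ms"]
                completed.append(flow)
    return completed

sample = [
    (0,   1, 2, 40000, 80, {"SYN"}),
    (11,  2, 1, 80, 40000, {"SYN", "ACK"}),
    (12,  1, 2, 40000, 80, {"ACK"}),
    (700, 1, 2, 40000, 80, {"FIN", "ACK"}),
]
flows = process(sample)
print(flows)
```

Flows that never see a FIN or RST stay in `active`; how to account for them is a design choice the sketch leaves open.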
Challenges to overcome • Each TCP flow detected (by seeing a SYN) forces the code to allocate memory for a tcpflow node. • Each TCP packet potentially results in a search of the tcpflow data structure • Data structures must be efficient.
The tcpflow data structure • I chose to use a hash table to implement the data structure. • Hash table size was set to a large prime number not close to a power of 2. • In this case, the number was set to 47,189 • about halfway between 2^15 and 2^16 • This helps give a fairly even distribution of hashing keys in the table.
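A minimal sketch of such a table is shown below. Only the table size comes from the slide; the key-mixing function in `_index` is an assumption made for illustration, and Python lists stand in for the per-bucket chains.

```python
TABLE_SIZE = 47189  # table size quoted on the slide

class FlowTable:
    """Hash table keyed on (src, dst, sport, dport) with chained buckets."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        src, dst, sport, dport = key
        # Assumed mixing step; the real program's hash function is not given.
        mixed = (src * 31 + dst) * 31 + sport * 65536 + dport
        return mixed % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def find(self, key):
        for existing, value in self.buckets[self._index(key)]:
            if existing == key:
                return value
        return None

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (existing, value) in enumerate(bucket):
            if existing == key:
                del bucket[i]
                return value
        return None

table = FlowTable()
table.insert((0, 1, 40000, 80), "flow A")
print(table.find((0, 1, 40000, 80)))
```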
RTT – A word about time! • The UNIX PCAP library code stores each packet timestamp as seconds plus microseconds. • Time delay may be introduced due to processing the packet header on the data collection host. • The seconds portion of the time stamp was multiplied by 1000 and added to the microseconds portion divided by 1000 to bring the measurements into a millisecond time unit.
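A small worked example of the unit conversion, assuming the classic pcap record header, which carries a seconds field and a microseconds field (`tv_sec` and `tv_usec`). The function name `timestamp_ms` is illustrative.

```python
def timestamp_ms(tv_sec: int, tv_usec: int) -> float:
    """Combine a seconds/microseconds timestamp into milliseconds."""
    return tv_sec * 1000 + tv_usec / 1000

# A SYN seen at 10 s + 4000 us and its SYN/ACK at 10 s + 15000 us:
syn = timestamp_ms(10, 4000)
syn_ack = timestamp_ms(10, 15000)
print(syn_ack - syn)  # RTT in milliseconds
```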
Final processing • When all packets have been read and processed, the final steps are as follows • Sort flow duration data, then calculate the min, max, mean, and median • Sort flow RTT data, then calculate the min, max, mean, and median • Calculate the flow RTT frequency and output an RTT frequency file for gnuplot to process. • Output RTT, duration, and overall statistics to the screen.
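The final statistics step could be sketched as follows, using Python's standard library rather than the original implementation. `summarize`, `rtt_frequency`, and `gnuplot_lines` are hypothetical names; the output is two whitespace-separated columns (RTT, count), a layout gnuplot can plot directly.

```python
from collections import Counter
from statistics import mean, median

def summarize(values):
    """Sort the data, then report min, max, mean, and median."""
    ordered = sorted(values)
    return {"min": ordered[0], "max": ordered[-1],
            "mean": mean(ordered), "median": median(ordered)}

def rtt_frequency(rtts_ms, bin_ms=1):
    """Count how many flows fall into each RTT bin (bin width in ms)."""
    counts = Counter(int(r // bin_ms) * bin_ms for r in rtts_ms)
    return sorted(counts.items())

def gnuplot_lines(freq):
    """Format (rtt, count) pairs as two-column text for gnuplot."""
    return "\n".join(f"{rtt} {count}" for rtt, count in freq)

rtts = [1, 1, 4, 20, 20, 20]
print(summarize(rtts))
print(gnuplot_lines(rtt_frequency(rtts)))
```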
Tcpflowstat - code performance • Run on a Pentium III with a 1 GHz CPU and 512 MB of RAM. • Processed 100 million packets of data in 30 minutes of elapsed time. • Approximately 7 – 10 GB of data on disk to process. • Most time spent waiting for disk activity and in memory management routines. • UNIX malloc code is notoriously inefficient (linear), especially when using the “free” routines.
Performance cont… • Hash table exhibited collisions linearly from the upper bound. • Collision resolution was implemented through simple linked lists. • “tcpdpriv” sequential numbering of addresses created sequential hash keys (not too bad actually) • The modulus operation could be optimized • Large amount of RAM usage due to thousands of parallel TCP flows being processed within any time span. • A multiple indirect hashing approach would be better, i.e., break the src/dest IP address down by octet. This is commonly implemented in routers.
Initial Results
TCP Flow Duration
-------------------------
MIN = 2 ms
MAX = 4794030 ms
MEAN = 3593 ms
MEDIAN = 758 ms
-------------------------
TCP Round Trip Time (RTT)
-------------------------
MIN = 0 ms
MAX = 63108 ms
MEAN = 221 ms
MEDIAN = 11 ms
-------------------------
802373 TCP flows counted.
-------------------------
Overall Stats
-------------------------
TCP = 93455966 packets
UDP = 6290127 packets
ICMP = 192462 packets
-------------------------
TOTAL = 100000000 packets
-------------------------
Observations on statistical breakdown • Over 90% of traffic is TCP. • More than half of flow durations are short (median 758 ms, less than 1 second) • Likely due to web transactions which tend to be many and short. • Mean flow duration is much higher than the median. • A fair number of measured flows have longer durations. • Related research confirms that longer duration flows dominate the Internet traffic.
Observations • High frequency of TCP flows exhibiting RTT of less than 5 ms. • Significant percentage of TCP flows with RTT of approx. 20 ms. • Peaks and valleys across the distribution • Skeptical of these results, I took four more samples over a period of about 1 week.
Possible explanations • Assuming that a router becomes a congestion point, a burst of traffic will cause queue overflow (droptail) • Global TCP congestion control synchronization will occur during queue overflow. • All affected TCP flows will synchronously halve their window size (multiplicative decrease). • Flows deeper in the queue will not experience packet drop but will experience delay. • Flow treatment is not equal.
How to achieve a desired result • Only a router along the path knows its own congestion conditions at a point in time. • At high congestion times, we must ensure that there is no congestion control synchronization. • Random packet drop or marking (ECN) is appropriate to force a selection of flows to reduce their window sizes. • ECN is defined in RFC-3168 (borrows two bits from a reserved part of the header) • Queue size must be optimized so that flow delay is minimized.
Random Early Detection (RED) [Floyd/Jacobson] • Two queue thresholds used, min_th and max_th. • When ave size < min_th, no packets marked • When ave size >= min_th, <= max_th, mark packets with probability p where p is a function of ave queue size. • When ave size > max_th, mark all packets
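The threshold rules above can be sketched as below. The slide only says that p is a function of the average queue size; the linear ramp up to `max_p` and the weighted moving average with weight `weight` follow the usual description of RED by Floyd and Jacobson, and the default values here are assumptions for illustration. The count-based adjustment of p in the full algorithm is left out.

```python
import random

def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue size."""
    return (1 - weight) * avg + weight * queue_len

def mark_probability(avg, min_th, max_th, max_p=0.1):
    """Marking probability for the current average queue size."""
    if avg < min_th:
        return 0.0                      # below min_th: mark nothing
    if avg > max_th:
        return 1.0                      # above max_th: mark everything
    # Between the thresholds: ramp linearly from 0 up to max_p.
    return max_p * (avg - min_th) / (max_th - min_th)

def should_mark(avg, min_th, max_th, max_p=0.1, rng=random.random):
    """Randomly decide whether to mark (or drop) the arriving packet."""
    return rng() < mark_probability(avg, min_th, max_th, max_p)

for avg in (4, 10, 20):
    print(avg, mark_probability(avg, min_th=5, max_th=15))
```

The small averaging weight is what makes the average lag behind a sudden burst, since each new queue sample moves it only slightly.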
RED is not always sufficient • Sudden congestion can keep queue depth above the maximum threshold. • RED can degenerate into the same behavior as a drop-tail configuration. • Weighted moving average algorithm reacts too slowly to sudden changes.
Queue depth and router buffer sizes • A 1994 paper (Villamizar and Song) set the standard router buffer size at $B = \overline{RTT} \times C$ • This is a commonly used formula today! • A subsequent paper at SIGCOMM 2004 from researchers at Stanford suggests the more appropriate formula is $B = (\overline{RTT} \times C) / \sqrt{n}$ • Where $\overline{RTT}$ is the average round trip time, C is the link capacity and n = no. of flows. • The denominator of this equation represents a variable that must be dynamic. It is the “predictor of congestion” variable. Shan Suthaharan is currently seeking a patent for predictive algorithms that determine this variable.
Why not just reduce buffer size? • Reducing buffer size would likely reduce the incidence of delayed traffic flows. • Buffer overflow would still result in a TCP congestion control synchronization problem. • Still also have the problem of unfair treatment of flows – first come, first served is not necessarily best. • Poor performance would still result.
Conclusions • Maximum buffer size should be reduced as shown in the SIGCOMM’04 paper; this saves money and eases hardware design concerns. • Active queue management (AQM) should be used. • A combination of reduced, preferably dynamic, maximum buffer size and AQM should reduce congestion control synchronization and increase fair treatment of different TCP flows. • Implementations should be simple to use; perhaps even be the default configuration. • New methods of active queue management must continue to be researched and developed.
Furthering research efforts • Collect more data, 8 – 12 hour samples would be nice. • Build a test network rather than using a simulator. • Use sources of real traffic as testing environment. • Write a program to completely replay all traffic captured on a specific Internet link. (not easy)
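To get a feel for the difference between the two sizing rules, here is a small worked calculation. It assumes the rule of thumb B = RTT × C for the 1994 paper and B = RTT × C / sqrt(n) for the SIGCOMM 2004 proposal; the RTT, capacity, and flow count below are made-up illustrative values, not measurements from this study.

```python
import math

def buffer_rule_of_thumb(rtt_s: float, capacity_bps: float) -> float:
    """Buffer size in bits: round trip time times link capacity."""
    return rtt_s * capacity_bps

def buffer_small(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Buffer size in bits, scaled down by the square root of the flow count."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)

# Hypothetical link: 250 ms average RTT, 80 Mbit/s capacity, 10,000 flows.
rtt_s, capacity_bps, n_flows = 0.25, 80e6, 10_000
print(buffer_rule_of_thumb(rtt_s, capacity_bps))   # bits
print(buffer_small(rtt_s, capacity_bps, n_flows))  # bits
```

With these assumed numbers the second rule asks for a buffer one hundred times smaller, because sqrt(10,000) is 100.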
FreeBSD is useful! • O/S has an in-built firewall for matching specific packet flows. • A kernel module called “DummyNet” exists for research use. • DummyNet can be configured to buffer traffic for extra time • Uses mbufs – BSD ring buffer to delay traffic • Danger of overflowing delay buffer • Full source code is freely available and well documented.
Using the end hosts, generate multiple thousands of TCP data streams. • Simple server listener code that generates character data should be sufficient • Lower the link speed at the center of the network to force the routers to buffer traffic • Modify the ALTQ code to implement different active queue management algorithms. • Connect analyzer host and use tcpflowstat to analyze traffic characteristics.