Impact & Control of Persistent Clients

Impact & Control ofPersistent Clients Hani Jamjoom EECS University of Michigan

What is Persistence? “Persistence is the repeated attempts of accessing a service even after network congestion or server overload is detected.” • Examples: • Real users pressing reload button when servers are too slow • TCP retransmission upon packet loss • Basic mechanism to recover from errors

Focus of Talk • Persistence of new connection requests • Flash crowds • Web servers are dominated by short-lived connections • Leaky-bucket-based flow control is inaccurate • Traditional implicit controls do not consider client-perceived delay • Our approach • Considers persistence in underlying traffic • Reduces client-perceived delay by predictive control • Improve rate-based control techniques

Increased Burstiness • Increases aggregate traffic Persistence Changes Arrival Behavior • Reduces effectiveness of applied controls

Flow Example router drop prob. = 0.5 network pipe

Factors Affecting Clients’ Persistence CLIENT ROUTER SERVER Queue Management Browser Parallelism HTML Content Congestion Control (loss recovery) Connection Processing

Connection Establishment • Three-way handshake: • Upon loss packets, retransmit after long timeout • Typical values: 3, 6, 12 sec • Must “correctly” account for retransmissions, not just new connection requests client data SYN SYN-ACK ACK server

Browsers’ Persistence • Experiment Design: • Tested 5 browsers on Windows and Linux • Access 10 random links from the 25 most popular websites (Nielsen Netratings) • Intermediate Linux router collects client statistics

Browser Persistence Results • IE6 (on average) issues 8 parallel connections per page, and maximum of 16. • Each connection is used for 1 or 2 HTTP requests. • Initial burst of connections, followed by average inter-arrival time is approx. 0.25 sec • Slowest connection dominates client-perceived delay

Impact on Network traffic • Re-synchronization of dropped requests • Changes the characteristics of inbound traffic • Increase aggregate traffic

Re-synchronization of Dropped Requests • A dropped burst, with high probability, will be retransmitted and arrive at the same time burst gets retransmitted after 3 sec, gets dropped gain this process repeats until packets timeout burst arrives and gets dropped 6 sec 12 sec

Cause of Re-synchronization • Depends mainly on RTO Retransmission Timeout (RTO) = 3 sec client 1 server client 2 One-way Transit Time 3 sec

Measuring RTO Accuracy • Collect RTO from different machines • Use laptop, a crossover cable, and tcpdump • Try to use machines with different hardware configurations • Total of 48 different machines • Collected 10 sample measurements from each machine • Organized data based on the running OS

RTO Distribution Time (sec)

Inbound Traffic Responds to Shaping Drops cause more traffic which causes more drops Tighter control causes longer delays & increased traffic

Increase of Aggregate Traffic • Assume independence • n transmission attempts    + p+ p2  + … + pn  = (1-pn+1) / (1-p)  only pn+1 requests time out Define as the true impact of control ( p* )

will incur no delay will incur T1 delay will incur T2 delay p0…pn  will time out & incur Tabort delay Estimation of Delay • Split incoming streams into n transmission classes • Assign each transmission class k a drop probability pk • Let Tk be the retransmission time of class k • Rewrite the aggregate arrival rate:    + p0 + p0 p1  + … + p0…pn-1 

Optimization • Connection-establishment latency ELh = (1- p0…pn) RTT + p0 (1-p1) T1 + p0 p1 (1-p2) T2 + … (p0…pn) Tabort • Set p0 = p* and p1 = p2 = … = 1 • Delay is minimized ELh = (1- p*) RTT + (p*) Tabort • Same timeout probability

Persistent Dropping (PD) • Randomly choose (p*) requests and keep dropping them on every retransmission attempt • Advantages: • Minimizes delay • Minimizes aggregate traffic • Reduces correlation between current and future arrivals • Decouples control from connection-establishment delay

Flow Example of PD router drop prob. is 0.5 but time out prob. is 0.125 network pipe

Success Likelihood of Rate-Based Dropping • For simplicity, each drop prob. in all routers is the same • p = 0.05 SERVER Internet CLIENT ROUTER (1-p)m (1-p)2 (1-p) m = 3 p = 0.05 success = 0.857

Success Likelihood of PD • Match effective timeout probability of PD to rate-based dropping • p* = 0.001 (i.e., p = 0.05) SERVER Internet CLIENT ROUTER (1-pn+1)m (1-pn+1)2 (1-pn+1) m = 3 n = 4 p = 0.05 success = 0.999

PD Benefits Applications • Does not require application modification • Does not interfere with high-level application mechanisms • Does not require OS modification • Reduced latency for clients: • HTTP, IIOP, SOAP, RPC, … <buzzword>

Implementing PD • How to choose connections to drop • IP source & destination address (easy) • TCP source & destination ports • (better requires more state) • Using hashing to improve performance h(x1,x2, …, xk) = x1 x2 …  xk K(t) mod R • Use prime multiplier, K(t), to minimize hash collisions

Stateless Implementation 1 allow Incoming packets Hash p* 0 drop

State-based Implementation 1 drop table allow U Incoming packets Hash used p* used 0 drop update

State-based Implementation 1 drop table allow U Incoming packets Hash used p* used 0 drop

Pros and Cons • Disadvantages: • Higher computational and memory requirement • Hash table size  max× timeout • Better fairness: • Drop is based on random selection, not hash mapping • Better adaptation to changing load conditions

Evaluation Setup CLIENT SERVER • Focus on how control affects connection-establishment latency • Abundant bandwidth • All clients access the same page ROUTER Implement different drop policies Eve mimicking IE6 Apache 1.3

user think time New clients arrive independently Emulating Persistent Clients sync point sync point Observed Behavior ON period OFF period

Rate-Based Dropping 56% of requests will timeout 10% and will have 100% longer delays than PD Mean Request Delay

Successful Request Delay

Delay of Successful Visits

What is missing? • Denied requests must time out (implicit notification) • Persistent dropping is a form of implicit admission control • It minimizes delay of admitted requests • Eliminating excess traffic caused by shaping

Solution • Immediately notify clients by sending explicit NACK message • ICMP destination unreachable • RST packet • Need new ICMP reject packet • Predict if packet will be accepted in future • Rejection of out of profile is too aggressive • Reject those packets that will not be accepted not intended for flow control

Abacus Filters

Advantages of Abacus Filter • Eliminate retransmission due to traffic shaping • Incur delay only for clients with high likelihood of being accepted • Can be tuned to balance between increase throughput vs. client perceived latency • Independent of arrival process

Performance Advantage

Cost of Controlling Delay

Conclusions • Client persistence matters • Traditional control mechanisms are ill-suited to deal with client’s persistence • Controls the server more accurately • Reduce delay for successful requests • Explicit notification solves key problem of traffic shaping • Retransmissions are eliminated if likelihood of success is low • Can be implemented efficiently using today’s technology • Hashing based flow selection • Exploit TCP logic to predict future traffic • PD + Abacus Filters effectively control client-perceived delay • 600% improvement in response time for successful connections at the expense of 15% loss in throughput

Thank You… • Hani Jamjoom and Kang G. Shin, “Persistent Dropping: An Efficient Control of Traffic Aggregates,” To appear in SIGCOMM’03 • Hani Jamjoom, Padmanabhan Pillai, and Kang G. Shin, “Re-synchronization and Controllability of Bursty Service Requests,” Submitted to ToN • Hani Jamjoom and Kang G. Shin, “Eve: A Scalable Network Client Emulator,” University of Michigan CSE-TR-478-03

Impact & Control of Persistent Clients