Locality-Aware Request Distribution in Cluster-based Network Servers. Presented by: Kevin Boos. Authors: Vivek S. Pai, Mohit Aron, et al. Rice University, ASPLOS 1998. *** Figures adapted from original presentation ***
Time Warp to 1998 • Rapid Internet growth • Bandwidth limitations • “Cheap” PCs and “fast” LANs • Need for increased throughput
Clustered Servers • [Figure: clients connect to a front-end node, which reaches the back-end nodes over a LAN (switch)]
Motivation for Change • Weighted Round Robin • Disregards content on back-end nodes • Many cache misses • Limited by disk performance • Pure Locality-Based Distribution • Disregards current load on back-end nodes • Uneven load distribution • Inefficient use of resources
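The two prior policies can be contrasted in a few lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, the round-robin weights are assumed to be static, and hashing the target is only one possible way to realize a purely locality-based assignment.

```python
import hashlib
from itertools import cycle


def weighted_round_robin(weights):
    """Cycle through back-ends in proportion to their (assumed static) weights.

    The requested target is never consulted, so the same content ends up
    cached on many nodes.
    """
    schedule = [node for node, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


def locality_based(target, nodes):
    """Pick a back-end from a hash of the target alone.

    The current load of the nodes is never consulted, so a popular target
    can overload one node while others sit idle.
    """
    digest = hashlib.md5(target.encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


wrr = weighted_round_robin({"b1": 2, "b2": 1})
wrr_order = [next(wrr) for _ in range(6)]
same_node = locality_based("/index.html", ["b1", "b2", "b3"])
```

Here `wrr_order` repeats the pattern b1, b1, b2 regardless of what is requested, while `locality_based` always returns the same node for the same target regardless of how busy that node is.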
LARD Concepts • Locality-Aware Request Distribution • Goal: improve performance • Higher throughput • Higher cache hit rates • Reduced disk access • Even load distribution + content-based distribution • The best of both algorithms
Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing
Basic LARD Algorithm • Front-end maps target content to back-end nodes • 1-to-1 mapping • First request for each target is assigned to the least-loaded back-end node • Subsequent requests are distributed to the same back-end node based on target content mapping • Unless overloaded… • Re-assigns target content to a new back-end node
Flow of Basic LARD • [Figure: client requests for target A flowing through the front-end]
Determining Load in Basic LARD • Ask the server? • Introduces unnecessary communication • Current load = number of open connections • Tracked in the front-end node • Use thresholds to determine when to re-balance • T_low, T_high, and T_limit • Re-balance when (load > T_limit) or (load > T_high and there is a “free” node with load < T_low)
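The mapping and threshold rules for basic LARD can be sketched as a small front-end dispatcher. This is an illustrative sketch under stated assumptions, not the paper's implementation: the class and method names are hypothetical, load is tracked as a count of open connections as described above, and the threshold values are left to the caller.

```python
class BasicLard:
    """Hypothetical front-end dispatcher following the basic LARD rules."""

    def __init__(self, nodes, t_low, t_high, t_limit):
        self.load = {n: 0 for n in nodes}  # open connections per back-end
        self.server = {}                   # target -> assigned back-end
        self.t_low, self.t_high, self.t_limit = t_low, t_high, t_limit

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def _should_rebalance(self, node):
        load = self.load[node]
        free_node = any(l < self.t_low for l in self.load.values())
        return load > self.t_limit or (load > self.t_high and free_node)

    def dispatch(self, target):
        """Return the back-end that should serve this target."""
        node = self.server.get(target)
        if node is None or self._should_rebalance(node):
            # First request for the target, or its node is overloaded:
            # (re-)assign the target to the least-loaded back-end.
            node = self._least_loaded()
            self.server[target] = node
        self.load[node] += 1
        return node

    def close(self, node):
        """Call when a connection handled by `node` finishes."""
        self.load[node] -= 1


lard = BasicLard(["b1", "b2"], t_low=1, t_high=2, t_limit=4)
basic_order = [lard.dispatch("/a") for _ in range(4)]
```

With these toy thresholds, the first three requests for `/a` stay on `b1`; the fourth finds `b1` above T_high while `b2` is below T_low, so the target is re-assigned to `b2`.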
Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing
LARD Needs Improvement • Only one back-end node per target content • Working set is a single node • Front-end must limit total connections • Still need to increase throughput • One node per content type is unrealistic • …add more back-end nodes?
LARD/R • LARD with Replication • Maps target content to a set of back-end nodes • Working set is several nodes with similar cache content • Sends new requests to least-loaded node in set • Moves nodes to/from sets based on load imbalance • Idle nodes in a low-load set are moved to higher-load set
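The replicated variant can be sketched by replacing the single assigned node with a set of nodes per target. This is a sketch under assumptions, not the authors' code: the names are hypothetical, the set grows under the same threshold test as basic LARD, and the shrink rule used here (drop the most-loaded replica once the set has been unchanged for more than `k` time units) is an assumption chosen for illustration. The clock is passed in as `now` so the behavior is deterministic.

```python
class LardR:
    """Hypothetical LARD/R dispatcher: each target maps to a set of back-ends."""

    def __init__(self, nodes, t_low, t_high, t_limit, k):
        self.load = {n: 0 for n in nodes}  # open connections per back-end
        self.server_set = {}               # target -> list of back-ends
        self.last_mod = {}                 # target -> time the set last changed
        self.t_low, self.t_high, self.t_limit, self.k = t_low, t_high, t_limit, k

    def dispatch(self, target, now):
        replicas = self.server_set.get(target)
        if not replicas:
            node = min(self.load, key=self.load.get)
            self.server_set[target] = [node]
            self.last_mod[target] = now
        else:
            node = min(replicas, key=self.load.get)  # least-loaded replica
            most = max(replicas, key=self.load.get)  # most-loaded replica
            load = self.load[node]
            free_node = any(l < self.t_low for l in self.load.values())
            changed = False
            if load > self.t_limit or (load > self.t_high and free_node):
                # Even the best replica is overloaded: grow the set.
                node = min(self.load, key=self.load.get)
                if node not in replicas:
                    replicas.append(node)
                    changed = True
            stable = now - self.last_mod[target] > self.k
            if len(replicas) > 1 and stable and most != node:
                # Assumed shrink rule: retire the most-loaded replica.
                replicas.remove(most)
                changed = True
            if changed:
                self.last_mod[target] = now
        self.load[node] += 1
        return node


lr = LardR(["b1", "b2", "b3"], t_low=1, t_high=2, t_limit=4, k=10)
replicated_order = [lr.dispatch("/a", now=t) for t in (0, 1, 2, 3, 4, 20)]
```

In this run the set for `/a` grows from `b1` to `b1, b2` at time 3, and shrinks back to `b2` alone at time 20 once it has been stable for longer than `k`.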
Flow of LARD/R • [Figure: client requests for target A flowing through the front-end to a set of back-end nodes]
LARD Outline • Basic LARD Algorithm • Improvements to LARD • Request Handoff Protocol • Simulation and Results • Prototype Implementation and Testing
Determining Content Type • How do we determine content in the front-end? • Front-end must see network traffic • Standard TCP Assumptions • Requests are small and light • Responses are big and heavy • How do we forward requests?
Potential TCP Solutions • Simple TCP Proxy • Everything must flow through front-end node • Can inspect all incoming content • Cannot respond directly from back-end to client • But front-end can also inspect all outgoing content • Better for persistent connections
TCP Connection Handoff • Front-end connects to client • Inspects content • Forwards request to back-end node • Returned directly back to client from back-end node
LARD Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing
Evaluation Goals • Throughput • Requests/second served by entire cluster • Hit rate • (Requests that hit memory cache) / (total requests) • Underutilization time • Time that a node’s load is ≤ 40% of T_low
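The three metrics are simple ratios, sketched below. The function names, the idea of sampling a node's load at regular intervals, and the sample values are assumptions for illustration; only the definitions themselves come from the slide.

```python
def throughput(requests_served, elapsed_seconds):
    """Requests per second served by the entire cluster."""
    return requests_served / elapsed_seconds


def hit_rate(cache_hits, total_requests):
    """Fraction of requests served from a back-end's memory cache."""
    return cache_hits / total_requests


def underutilization_fraction(load_samples, t_low, fraction=0.4):
    """Share of (assumed evenly spaced) samples with load <= 40% of T_low."""
    threshold = fraction * t_low
    idle = sum(1 for load in load_samples if load <= threshold)
    return idle / len(load_samples)


idle_share = underutilization_fraction([0, 5, 10, 30], t_low=25)
```

With `t_low=25` the threshold is a load of 10, so three of the four hypothetical samples count as underutilized.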
Simulation Model • 300MHz Pentium II • 32MB Memory (cache) • 100Mbps Ethernet • Traces from web servers at Rice and IBM
Simulation Results – Prior Work • Weighted Round Robin • Lowest throughput • Highest cache miss ratio • But lowest idle time • Pure Locality-Based • Adding nodes decreases the cache miss ratio • But idle time increases (unbalanced load) • Only minor improvement over WRR
Simulation Results – LARD & LARD/R • Throughput ~4x better (8 nodes) • WRR would need nodes with a 10x larger cache size • CPU bound after 8 nodes • Cache miss rate decreases • Only 1% idle time on average
What Affects Performance? • WRR is disk-bound, LARD/R is CPU bound • Increasing CPU speed improves LARD/R, not WRR • Adding more disks improves WRR, not LARD/R • LARD/R shows no improvement if a node has > 2 disks • WRR is not scalable
LARD Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing
Prototype Implementation • One front-end PC • 300MHz Pentium II, 128MB RAM • 6 back-end PCs • 7 client PCs • 166MHz Pentium Pro, 64MB RAM • 100Mbps Ethernet, 24-port switch
Evaluation Shortcomings • What influences the results more? • LARD/R protocol? • TCP handoff protocol?
Conclusion • LARD and LARD/R perform significantly better than WRR • Higher throughput • Better CPU utilization • More frequent cache hits • Reduced disk access • Combines the benefits of locality-based and load-balanced distribution • Scalable at low cost