Load Balancing & Update Propagation in Tashkent+ Sameh Elnikety, Steven Dropsho, Willy Zwaenepoel EPFL
Load Balancing • Traditional: • Optimize for equal load on replicas • MALB (Memory-Aware Load Balancing): • Optimize for in-memory execution
MALB Doubles Throughput [chart: TPC-W Ordering, 16 replicas; throughput (TPS) relative to a single replica: 1X, 12X, 25X, 37X]
Update Propagation • Traditional: • Propagate updates everywhere • Update Filtering: • Propagate updates to where they are needed
MALB + UF Triples Throughput [chart: TPC-W Ordering, 16 replicas; throughput (TPS) relative to a single replica: 1X, 12X, 25X, 37X]
Contributions • MALB (Memory-Aware Load Balancing): • Optimize for in-memory execution • Update filtering: • Propagate updates to where they are needed • Implementation: • Tashkent+ middleware • Experimental evaluation: • Big performance gain
Background: DB Replication [diagram: a load balancer in front of Replica 1, Replica 2, and Replica 3, each running a single DBMS]
Read Tx [diagram: the load balancer sends a read-only transaction T to a single replica]
Update Tx [diagram: transaction T executes at one replica; its writeset (ws) is then propagated to the other replicas]
Traditional Load Balancing [diagram: the load balancer places equal load on Replicas 1-3]. MALB (Memory-Aware Load Balancing): optimize for in-memory execution
How Does MALB Work? [diagram: a database with tables 1, 2, 3; a workload in which tx type A accesses tables 1 and 2 and tx type B accesses tables 2 and 3; a replica's memory holds only part of the database]
Read Data From Disk [diagram: Least Loaded sends the mixed stream A, B, A, B to both replicas; the combined working set (tables 1, 2, 3) does not fit in either replica's memory, so transactions read from the slow disk]
Data Fits in Memory [diagram: MALB sends all A transactions to Replica 1 and all B transactions to Replica 2; each working set (tables 1-2 and tables 2-3, respectively) fits in memory, so execution is fast]. Open questions: where does the memory information come from, and how to handle many tx types and replicas?
Estimate Tx Memory Needs • Exploit tx execution plan • Which tables & indices are accessed • Their access pattern • Linear scan, direct access • Metadata from database • Sizes of tables and indices
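To make this step concrete, here is a minimal sketch of how a transaction type's memory needs could be estimated from its execution plan plus catalog metadata, assuming a plan summary of (relation, access pattern) pairs and relation sizes like those in PostgreSQL's pg_class. The table names, sizes, and the "charge a small fraction for index access" rule are illustrative assumptions, not Tashkent+'s exact code.

```python
# Hypothetical sketch: estimate a tx type's memory footprint from its plan.
# Catalog metadata: relation/index name -> size in MB (illustrative numbers).
catalog_sizes = {"item": 800, "item_pk": 40, "orders": 1200, "orders_pk": 60}

# Per-transaction-type plan summary: (relation, access pattern) pairs.
plans = {
    "A": [("item", "scan"), ("item_pk", "direct")],
    "B": [("orders", "direct"), ("orders_pk", "direct")],
}

def estimate_memory_mb(tx_type):
    """Sum the footprint of every relation and index the plan touches.
    A linear scan is charged the full relation size; direct (index) access
    is charged only a small fraction -- a conservative, illustrative rule."""
    total = 0.0
    for relation, pattern in plans[tx_type]:
        size = catalog_sizes[relation]
        total += size if pattern == "scan" else 0.1 * size
    return total

for t in plans:
    print(t, estimate_memory_mb(t), "MB")
```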
Grouping Transactions • Objective: • Construct tx groups that fit together in memory • Bin packing: • Item: tx memory needs • Bin: memory of replica • Heuristic: Best Fit Decreasing • Allocate replicas to tx groups • Adjust for group loads
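A minimal sketch of the bin-packing step with the Best Fit Decreasing heuristic named above: items are tx types sized by their estimated memory needs, bins are replicas sized by their memory. The replica memory size and per-type needs are made-up numbers, and the real system additionally adjusts for group loads.

```python
# Best Fit Decreasing: pack tx types into groups that fit a replica's memory.
replica_mem_mb = 2000
tx_mem_mb = {"A": 900, "B": 700, "C": 650, "D": 400, "E": 300, "F": 150}

def best_fit_decreasing(items, capacity):
    groups = []  # each group: {"txs": [...], "free": remaining MB}
    for name, size in sorted(items.items(), key=lambda kv: -kv[1]):
        # Best fit: the group whose remaining space is smallest but sufficient.
        candidates = [g for g in groups if g["free"] >= size]
        if candidates:
            g = min(candidates, key=lambda g: g["free"])
        else:
            g = {"txs": [], "free": capacity}  # open a new group (new bin)
            groups.append(g)
        g["txs"].append(name)
        g["free"] -= size
    return groups

for g in best_fit_decreasing(tx_mem_mb, replica_mem_mb):
    print(g["txs"], "free:", g["free"], "MB")
```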
MALB in Action [diagram: given the memory needs of tx types A-F, MALB forms groups A, BC, and DEF and allocates replicas to each group, so that each group's working set fits in its replicas' memory]
MALB Summary • Objective: • Optimize for in-memory execution • Method: • Estimate tx memory needs • Construct tx groups • Allocate replicas to tx groups
Update Propagation • Traditional: • Propagate updates everywhere • Update Filtering: • Propagate updates to where they are needed
Update Filtering Example [diagram: with MALB+UF, A transactions go to Replica 1 (Group A, tables 1 and 2 in memory) and B transactions go to Replica 2 (Group B, tables 2 and 3 in memory); an update to table 1 is propagated only to Replica 1, and an update to table 3 only to Replica 2]
Update Filtering in Action [diagram: with UF, an update to the red table and an update to the green table are each propagated only to the replica groups that access those tables]
Tashkent+ Implementation • Middleware: • Based on Tashkent [EuroSys 2006] • Consistency: • No change in consistency • Our workloads run serializably • GSI (Generalized Snapshot Isolation)
Experimental Evaluation • Compare: • LeastLoaded: optimize for equal load • MALB: optimize for in-memory execution • MALB+UF: propagate updates to where needed • Environment: • Linux cluster running PostgreSQL • Workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput [chart: TPC-W Ordering, 16 replicas; MALB improves throughput by 105% over LeastLoaded, from 12X to 25X relative to a single replica (MALB+UF reaches 37X); read I/O shown normalized]
Big Gains with MALB
Throughput gain with MALB, by database size and replica memory size:
              Mem Size:  Small   Mid    Big
DB Size  Big             12%     75%    182%
         Mid             105%    45%    48%
         Small           29%     4%     0%
(top left: data runs from disk; bottom right: data runs from memory)
MALB + UF Triples Throughput [chart: TPC-W Ordering, 16 replicas; MALB+UF improves throughput by 49% over MALB, from 25X to 37X relative to a single replica; propagated updates drop from 15 to 7]
Filtering Opportunities [chart: ratio of updates propagated under MALB+UF vs. MALB, for the Browsing mix (5% updates) and the Ordering mix (50% updates)]
Check Out the Paper • MALB is better than LARD (we explain why) • Methods for grouping transactions • Better to be conservative when estimating memory • Dynamic reallocation for MALB • Adjust to workload changes • More workloads • TPC-W browsing & shopping • RUBiS bidding
Conclusions • MALB (Memory-Aware Load Balancing): • Optimize for in-memory execution • Update filtering: • Propagate updates to where they are needed • Implementation: • Tashkent+ middleware • Experimental evaluation: • Big performance gain, achievable in many cases
Thank You • Questions? • MALB (Memory-Aware Load Balancing): • Optimize for in-memory execution • Update filtering: • Propagate updates to where they are needed
Example • Transaction type: SELECT item-name FROM shopping-cart WHERE user = ? • Transaction type: A • Instances: A1, A2, A3
Transaction Types A: SELECT * FROM address WHERE id=$1 B: INSERT INTO course VALUES ($1, $2) C: UPDATE emp SET sal=$1 WHERE id=$2 D: DELETE FROM bid WHERE id=$1
Transaction Types A: SELECT * FROM address WHERE id=$1 B: INSERT INTO course VALUES ($1, $2) C: UPDATE emp SET sal=$1 WHERE id=$2 D: DELETE FROM bid WHERE id=$1 • A1: SELECT * FROM address WHERE id=10 • A2: SELECT * FROM address WHERE id=20 • A3: SELECT * FROM address WHERE id=76 • A4: SELECT * FROM address WHERE id=78 • A5: SELECT * FROM address WHERE id=97
MALB Grouping • MALB-S • Size • MALB-SC • Size and contents • No double counting of shared tables & indices • MALB-SCAP • Size, contents, and access pattern • Overflow tx types • Each tx type in its own group
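A tiny sketch of the "no double counting" idea behind MALB-SC: a group's footprint is the size of the union of the tables its tx types touch, so a table shared by two tx types is counted once. The table names and sizes here are illustrative assumptions.

```python
# Illustrative MALB-SC footprint: shared tables are counted once per group.
sizes = {"t1": 500, "t2": 400, "t3": 300}            # table -> size in MB
contents = {"A": {"t1", "t2"}, "B": {"t2", "t3"}}    # tx type -> tables touched

def group_footprint(tx_types):
    tables = set().union(*(contents[t] for t in tx_types))  # union: no double count
    return sum(sizes[t] for t in tables)

print(group_footprint({"A"}), group_footprint({"B"}))  # 900, 700
print(group_footprint({"A", "B"}))                     # 1200, not 1600
```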
MALB Allocation – Load • Collect CPU and disk utilizations • Maintain smoothed averages • Group load: average across replicas • Example • 3 replicas • Data ( CPU, disk ) = (45, 10), (53, 9), (40, 8) • Group load = (46, 9) • Comparing loads: max( CPU, disk ) • Example • Max(46, 9) = 46%
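A small sketch of the load bookkeeping described above, reusing the slide's example replicas; the exponential smoothing weight is an illustrative assumption.

```python
ALPHA = 0.3  # smoothing weight (assumed value)

def smooth(old_avg, sample):
    """Maintain a smoothed average of a utilization measurement."""
    return (1 - ALPHA) * old_avg + ALPHA * sample

def group_load(replica_loads):
    """replica_loads: list of (cpu%, disk%) pairs; group load is the average."""
    n = len(replica_loads)
    cpu = sum(c for c, _ in replica_loads) / n
    disk = sum(d for _, d in replica_loads) / n
    return cpu, disk

cpu, disk = group_load([(45, 10), (53, 9), (40, 8)])  # the slide's example: (46, 9)
print(max(cpu, disk))                                 # groups compared by max(CPU, disk) -> 46
print(smooth(46, 60))                                 # a new 60% CPU sample nudges the average to 50.2
```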
MALB Allocation – Basic • Move one replica at a time • From least loaded group • To most loaded group • Future load estimates • 2 replicas, 20% load -> 1 replica, 40% • 6 replicas, 25% load -> 5 replicas, 30% • Hysteresis • Prevent oscillations • Trigger re-allocation if load difference > 1.25x
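A sketch of the basic reallocation rule, reusing the slide's numbers: estimate each group's per-replica load if it lost or gained a replica, and only move a replica when the imbalance exceeds the hysteresis threshold. The function structure is illustrative.

```python
HYSTERESIS = 1.25  # re-allocate only if load difference > 1.25x

def future_load(replicas, load, delta):
    """Per-replica load if the group had `replicas + delta` replicas."""
    return load * replicas / (replicas + delta)

def should_move(most_loaded, least_loaded):
    """Each group is a (replicas, per-replica load %) pair."""
    return most_loaded[1] > HYSTERESIS * least_loaded[1]

print(future_load(2, 20, -1))         # 2 replicas at 20% -> 1 replica at 40%
print(future_load(6, 25, -1))         # 6 replicas at 25% -> 5 replicas at 30%
print(should_move((3, 70), (7, 10)))  # imbalance > 1.25x -> move a replica
```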
MALB Allocation – Fast • Quickly snap to a good configuration • Solve set of simultaneous equations • Example • Current configuration • Group M, 3 replicas, 70% • Group N, 7 replicas, 10% • Equations • Mrep + Nrep = 10 • (70% * 3) / Mrep = (10% * 7) / Nrep • Fine tune with additional adjustments
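Working the slide's equations through directly: each group's total work is its per-replica load times its replica count, and the new replica counts split the 10 replicas in proportion to that work. The rounding and fine-tuning step is an assumption.

```python
total_replicas = 10
work_m = 70 * 3   # group M: 3 replicas at 70% load
work_n = 10 * 7   # group N: 7 replicas at 10% load

m_rep = total_replicas * work_m / (work_m + work_n)  # 7.5
n_rep = total_replicas - m_rep                       # 2.5
print(round(m_rep), round(n_rep))  # e.g. 8 and 2 after rounding, then fine-tune
```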
Disk I/O [chart: average disk I/O per transaction]
MALB & Update Filtering [chart: TPC-W Ordering throughput for LeastCon, MALB, and MALB+UF on MidDB (1.8 GB) with different memory sizes]
MALB & Update Filtering [chart: the same comparison for SmallDB (0.7 GB) and BigDB (2.9 GB)]
Related Work • Load balancing • LARD, HACC, FLEX • Workload of small static files • Do not estimate working set • Replicated databases • Front-end load balancer • Conflict-aware scheduling [Pedone06] • Reduce conflict and therefore abort rate • Conflict-aware scheduling [Amza03] • Correctness • Partial replication
Related Work • Replicated database systems • J. Gray et al., The Dangers of Replication and a Solution, SIGMOD'96. • C. Amza et al., Consistent Replication for Scaling Back-end Databases for Dynamic Content Servers, Middleware'03. • C. Plattner et al., Ganymed: Scalable Replication for Transactional Web Applications, Middleware'04. • B. Kemme et al., Postgres-R(SI): Middleware-based Data Replication Providing Snapshot Isolation, SIGMOD'05. • K. Daudjee and K. Salem, Lazy Database Replication with Snapshot Isolation, VLDB'06. • Networked server systems • V. Pai et al., Locality-Aware Request Distribution in Cluster-based Network Servers, ASPLOS'98.