Fine-Grained Replication and Scheduling with Freshness and Correctness Guarantees
F. Akal¹, C. Türker¹, H.-J. Schek¹, Y. Breitbart², T. Grabs³, L. Veen⁴
¹ ETH Zurich, Institute of Information Systems, 8092 Zurich, Switzerland, {akal,tuerker,schek}@inf.ethz.ch
² Kent State University, Department of Computer Science, Kent OH 44240, USA, yuri@cs.kent.edu
³ One Microsoft Way, Redmond, WA 98052, USA, grabs@acm.org
⁴ University of Twente, 7500 AE Enschede, The Netherlands, lourens@rainbowdesert.net
This work was supported partially by Microsoft.
31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August – 2 September 2005
Overview
• Introduction and Motivation
• Replication in a Database Cluster
• Need for a New Replication Scheme
• PowerDB Replication (PDBREP)
• Overview of the PDBREP Protocol
• Freshness Locking
• Experimental Evaluation
• Conclusions
Introduction
• Replication is an essential technique for improving the performance of reads when writes are rare
• Different approaches have been studied so far
• Eager replication
  – Synchronization within the same transaction
  – Conventional protocols have drawbacks regarding performance and scalability
  – Newer protocols reduce these drawbacks by using group communication
• Lazy replication
  – Decoupled replica maintenance
  – Additional effort is necessary to guarantee serializable executions
• Earlier work focused on performance and correctness; the freshness of data was not considered enough
• Recently, coordinated replication management proposed within the PowerDB project at ETH Zürich addresses these freshness issues
The PowerDB Approach
• Cluster of databases
  – Cluster of off-the-shelf PCs
  – Each PC runs a commercially available RDBMS
  – Fast Ethernet connection (100 Mbit/s)
• Middleware architecture
  – Clients access the cluster over the middleware only (see the sketch below)
  – The cluster is divided into two parts: an OLTP part and an OLAP part
• Lazy replication management
  – Eager from the user's perspective
• The "scale-out" vision
  – Adding new nodes yields higher performance
  – More nodes allow increased parallelism
[Figure: clients issue update and read-only transactions to the coordination middleware, which sits in front of the cluster of DBs]
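Below is a minimal sketch of the routing idea behind this middleware architecture; it is not PowerDB code, and the names (Site, route_transaction) are hypothetical:

    # Hypothetical sketch: the coordination middleware dispatches transactions
    # to the two parts of the cluster (not PowerDB's actual implementation).
    from dataclasses import dataclass

    @dataclass
    class Site:
        name: str
        is_update_site: bool  # OLTP part vs. read-only (OLAP) part

    def route_transaction(sites, is_update):
        """Updates go to update sites; read-only transactions to read-only sites."""
        eligible = [s for s in sites if s.is_update_site == is_update]
        # Simplest policy: first eligible site; a real middleware balances load.
        return eligible[0]

    cluster = [Site("s1", True), Site("s2", False), Site("s3", False)]
    print(route_transaction(cluster, is_update=False).name)  # -> s2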
The Early PowerDB Approach to Replication: FAS (Freshness-Aware Scheduling)
• Relies on full replication
• Read-only transactions execute where they are initiated
• Users may specify their freshness needs, e.g., "I want fresh data" vs. "I am fine with 2-minute-old data"
  – How much the accessed data may deviate from the up-to-date data
• Freshness at the database level
  – Locks the entire database
• Read-only sites are maintained by means of decoupled refresh transactions, e.g., on-demand refreshment (sketched below)
[Figure: update transactions run on the update sites, all holding the full replica a,b,c,d; the PowerDB middleware applies decoupled refresh transactions to the read-only sites]
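A minimal sketch of on-demand refreshment with database-level freshness, in the spirit of FAS; the classes and the staleness bookkeeping are assumptions for illustration, not the actual FAS implementation:

    # Hypothetical sketch of FAS-style on-demand refreshment (illustrative only).
    import time

    class ReplicaSite:
        def __init__(self, name):
            self.name = name
            self.last_refresh = time.time()  # when this full copy was last refreshed
            self.pending_updates = []        # decoupled updates not yet applied

        def staleness(self):
            return time.time() - self.last_refresh

        def refresh(self):
            # One decoupled refresh transaction applies all pending updates;
            # with database-level freshness this locks the entire database.
            self.pending_updates.clear()
            self.last_refresh = time.time()

    def run_read_only(site, max_staleness_seconds):
        # Refresh only if the site is too stale for this user's freshness need.
        if site.staleness() > max_staleness_seconds:
            site.refresh()
        # ... execute the read-only transaction locally on `site` ...

    run_read_only(ReplicaSite("r1"), max_staleness_seconds=120)  # "2 minutes old is fine"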
Systems Are Evolving… So Is PowerDB…
• Customized node groups for certain query types
  – Requires support for arbitrary physical data design
• Read-only transactions may span many nodes, providing query parallelism
• Users may still specify their freshness needs
• Freshness at the database object level
  – Fine-grained locking
• Read-only transactions should be served fast even with higher update rates and freshness requirements
  – Read-only sites must be kept as up-to-date as possible: continuous update propagation to read-only sites
[Figure: update sites remain fully replicated (a,b,c,d), while read-only sites now hold arbitrary partitions such as a,c / b,d / a,b,d / d]
Why Is There a Need for a New Replication Protocol?
• Distributed executions of read-only transactions might cause non-serializable global schedules
• Continuous update propagation must be coordinated with the execution of read-only transactions
• Arbitrary physical layouts and locking at finer granules require more effort to maintain replicas and to execute read-only transactions
• A more sophisticated replication mechanism is required…
Overview of the PDBREP Protocol
• All changes are serialized in a global log; each read-only site keeps an update counter vector (e.g., SN: 031011, v[a]: 6, v[b]: 8, v[c]: 3, v[d]: 1) recording the versions it holds
• Global log records (e.g., TG1: SN 031011, w(a) w(b); TG2: SN 031012, w(c); TG3: SN 031013, w(d)) are continuously broadcast to the read-only sites
• They are enqueued in the local propagation queues in their serialization order
• Localized log records are applied to a site by propagation transactions whenever that site is idle (sketched below)
• Continuous broadcasting and update propagation keep each site as up-to-date as possible
• Freshness locks: the system does not allow propagation transactions to overwrite versions needed by read-only transactions
[Figure: update transactions run on update sites s1, s2; read-only sites s3–s5 hold the partitions a,c / c,d / a,b,d, each with its local propagation queue; a global counter vector (SN: 031013, v[a]: 6, v[b]: 8, v[c]: 4, v[d]: 2) reflects the freshest state]
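A minimal sketch of the broadcast-and-propagate idea, assuming per-object version counters; the record layout and all names are hypothetical, not PDBREP's actual data structures:

    # Hypothetical sketch of PDBREP-style update propagation (illustrative only).
    from collections import deque

    class ReadOnlySite:
        def __init__(self, objects):
            self.versions = {o: 0 for o in objects}  # update counter vector v[...]
            self.queue = deque()                     # local propagation queue

        def receive(self, record):
            # Broadcast log records arrive in serialization order; only records
            # touching objects stored at this site are kept ("localized").
            if any(o in self.versions for o in record["writes"]):
                self.queue.append(record)

        def propagate_when_idle(self, is_idle):
            # Propagation transactions apply queued records while the site is idle.
            while is_idle() and self.queue:
                record = self.queue.popleft()
                for obj in record["writes"]:
                    if obj in self.versions:
                        self.versions[obj] += 1  # apply the write, advance counter

    site = ReadOnlySite(objects=["a", "c"])
    site.receive({"sn": 31011, "writes": ["a", "b"]})  # a TG1-like log record
    site.propagate_when_idle(lambda: True)
    print(site.versions)  # {'a': 1, 'c': 0}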
Overview of the PDBREP Protocol: Refresh Transactions
• To ensure correct executions, each read-only transaction determines, at its start, the versions of the objects it reads (sketched below)
• If freshness is not explicitly specified, fresh data is required
• If the local copies lag behind the required versions, refresh transactions apply the missing log records before the reads proceed
[Figure: read-only transactions T7–T10 issue r(a), r(c), r(d) against sites whose counter vectors (SN: 031011) lag the global counter vector (SN: 031013, v[c]: 4, v[d]: 2); refresh transactions bring the accessed objects up to the required versions]
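A minimal sketch of this version determination, reusing the counter-vector model from the previous sketch; the refresh decision is simplified (here only full freshness triggers a refresh), and all names are hypothetical:

    # Hypothetical sketch: pinning versions at transaction start (illustrative).
    def begin_read_only(read_set, global_versions, site_versions, freshness=1.0):
        # Pin the versions this transaction must see; freshness defaults to 1.0,
        # i.e., fully fresh data is required when nothing is specified.
        required = {o: global_versions[o] for o in read_set}
        needs_refresh = {o: v for o, v in required.items()
                         if freshness >= 1.0 and site_versions.get(o, 0) < v}
        return required, needs_refresh

    global_v = {"a": 6, "b": 8, "c": 4, "d": 2}   # global counter vector
    site_v = {"a": 6, "c": 3}                     # this site lags on c
    required, missing = begin_read_only(["a", "c"], global_v, site_v)
    print(required)  # {'a': 6, 'c': 4}
    print(missing)   # {'c': 4} -> a refresh transaction must bring c to version 4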
Freshness Locks
• Freshness locks are placed on objects to ensure that ongoing replica-maintenance transactions do not overwrite versions needed by ongoing read-only transactions
• Freshness locks keep the objects accessed by a read-only transaction at a certain freshness level during the execution of that transaction (sketched below)
• Example timeline: while propagation transactions run, read-only transaction T1 arrives and propagation stops; a refresh transaction is invoked and T1 continues; when T1 commits, it releases its freshness locks; T2 causes another refresh to be invoked, and likewise for T3
[Figure: timeline of data propagation interleaved with refresh transactions and read-only transactions T1–T3, each pinned to its required timestamp until commit]
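A minimal sketch of how freshness locks could gate propagation, in the same spirit as the slide; the lock-table structure is an assumption, not PDBREP's actual implementation:

    # Hypothetical sketch: freshness locks stop propagation from overwriting
    # versions that running read-only transactions still need (illustrative).
    class FreshnessLockTable:
        def __init__(self):
            self.locks = {}  # object -> set of transactions holding a freshness lock

        def acquire(self, txn, objects):
            for o in objects:
                self.locks.setdefault(o, set()).add(txn)

        def release(self, txn):
            for holders in self.locks.values():
                holders.discard(txn)

        def may_propagate(self, record_writes):
            # A propagation transaction may apply a record only if none of the
            # objects it overwrites is pinned by a read-only transaction.
            return all(not self.locks.get(o) for o in record_writes)

    table = FreshnessLockTable()
    table.acquire("T1", ["a", "b"])
    print(table.may_propagate(["a"]))  # False: propagation stops, T1 may continue
    table.release("T1")                # T1 commits, releasing its freshness locks
    print(table.may_propagate(["a"]))  # True: propagation resumes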
Scheduling a Read-only Transaction
• T1 (r1(a), r1(b), TS=1): both sites are older than the required timestamp
• T2 (r2(a), r2(b), TS=3): both sites are younger; b's freshness lock is upgraded and the TS becomes 5
• T3 (r3(a), r3(b), TS=7): there are younger sites; b's lock is upgraded and the TS becomes 8
• That is, when a copy is already fresher than requested, the transaction's required timestamp is upgraded so that all its reads see one consistent point in time (sketched below)
[Figure: objects a and b on a timestamp axis 1–8, showing each object's current freshness and the freshness lock requests of T1–T3]
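A minimal sketch of this timestamp-upgrade rule, under the assumption (inferred from the T2/T3 cases above) that the required timestamp is raised to the youngest copy among the objects read; the actual PDBREP rule is more involved:

    # Hypothetical sketch: upgrading a read-only transaction's timestamp to the
    # freshest copy among the objects it reads (illustrative only).
    def schedule_read_only(read_set, required_ts, current_ts):
        # current_ts[o] is how fresh object o already is on its site; if some
        # copy is younger than required_ts, upgrade the transaction's timestamp
        # so that all reads see one consistent point in time.
        ts = max([required_ts] + [current_ts[o] for o in read_set])
        stale = [o for o in read_set if current_ts[o] < ts]  # refresh these up to ts
        return ts, stale

    # T2 from the slide: requests TS=3, but b is already at 5 -> TS becomes 5,
    # and a must be refreshed up to timestamp 5.
    ts, stale = schedule_read_only(["a", "b"], required_ts=3,
                                   current_ts={"a": 2, "b": 5})
    print(ts, stale)  # 5 ['a']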
Experimental Evaluation
• Investigating the influence of continuous update broadcasting and propagation on cluster performance; we considered three different…
  – Settings: there are two basic options that we can switch on or off
    – No broadcasting and no propagation
    – Broadcasting and no propagation
    – Broadcasting and propagation
  – Workloads: 50%, 75%, and 100% loaded clusters
    – A 50% loaded cluster means the cluster is busy evaluating queries for half of the experiment duration
  – Freshness values
    – Five freshness levels: 0.6, 0.7, 0.8, 0.9, 1.0, where 1.0 means the freshest data
    – Freshness window of 30 seconds, i.e., 0.0 means 30-second-old data (see the mapping sketched below)
• Looking at the scalability of PDBREP
• Comparing PDBREP to its predecessor (FAS)
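The freshness scale maps onto the 30-second window; a minimal illustration of that mapping, assuming linearity between the stated 0.0 and 1.0 endpoints:

    # Assumed linear mapping from freshness level to tolerated staleness:
    # freshness 1.0 -> 0 s old (freshest), freshness 0.0 -> window (30 s) old.
    def tolerated_staleness(freshness, window_seconds=30.0):
        return (1.0 - freshness) * window_seconds

    for f in (0.6, 0.7, 0.8, 0.9, 1.0):
        print(f, tolerated_staleness(f))  # e.g., freshness 0.6 -> 12.0 seconds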
Experimental Setup
• Cluster of 64 PCs
  – 1 GHz Pentium III, 256 MB RAM, 2 SCSI disks, 100 Mbit Ethernet
  – SQL Server 2000 running under Windows 2000 Advanced Server
• TPC-R database with scale factor 1
  – ~4.3 GB including indexes; 200 updates per second
• Node groups (NGs) of 4 nodes
  – Small tables are fully replicated; the huge ones are partitioned (over order_key) within the NGs
Average Query Evaluation Times for Different Workloads and Freshness Values
• Turning on propagation and/or broadcasting always improves performance
• The lower the workload, the higher the performance gain, e.g., the improvement is 82% for the 50% loaded cluster
Average Refresh Transaction Size for Different Workloads and Freshness Values
• Propagation eliminates the need for refresh transactions, except under the maximum freshness requirement and workload
• This results in query execution times that are practically independent of the overall workload for the given update rate
• For the fully loaded cluster there is simply no idle time for propagation except at the beginning and end of transactions, which results in only a small performance improvement
Scalability of PDBREP: Query Throughput for Varying Cluster Sizes
• PDBREP scales up with increasing cluster size (the chart shows the scalability for the 50% loaded cluster)
• The results for freshness 0.9 and below are virtually identical due to local refresh transactions
PDBREP vs. FAS (Freshness-Aware Scheduling): Relative Query Throughput
• For all three workloads, PDBREP performs significantly better than FAS (by 30%, 72%, and 125%)
• PDBREP partitions the data while FAS relies on full replication, which results in smaller refresh transactions
• PDBREP allows distributed executions and gains from parallelization
Conclusions
• PDBREP respects user-demanded freshness requirements and extends the notion of freshness to finer granules of data
• PDBREP requires less refresh effort to serve queries thanks to continuous propagation of updates
• PDBREP allows distributed executions of read-only transactions and produces globally correct schedules
• PDBREP supports different physical data organization schemes
• PDBREP scales even with higher update rates