410 likes | 637 Views
Asynchronous View Maintenance for VLSD Databases Under the guidance of : Prof. S. Sudarshan. Purva P. Joshi 08305907. Introduction. Large scale shared nothing databases (PNUTS, BigTable, Dynamo, Cassandra) trade query expressiveness for scalability and performance.
E N D
Asynchronous View Maintenance for VLSD DatabasesUnder the guidance of : Prof. S. Sudarshan Purva P. Joshi 08305907
Introduction • Large scale shared nothing databases (PNUTS, BigTable, Dynamo, Cassandra) trade query expressiveness for scalability and performance. • Support primary key operations like: • Range scans • Single record look-ups • Indexes and Materialised Views are the only feasible way for such D/b s to handle : • Equijoins • Aggregate queries • Look-ups on secondary attributes
Motivation for Asynchronous Updates Asynchronous updates: Updates to the view are made “after” updates to the base table. Synchronous updates: Updates to the view are made together with updates to the base table. The data is highly distributed across different geographical areas and hence synchronous updates are extremely expensive and lead to higher query response times.
Asynchronous v/s Synchronous View Maintenance • Advantages of Asynchronous View Maintenance : • Decrease in latency of writing to the database. • Less cross-server communication. • Approach followed : Deferred View Maintenance • Challenges: • Scalability • View must be updated even in presence of failures (aborting an update is not possible) • Efficient view replication across distant data centres
PNUTS : Architectural Overview Query Processor Update Processor LVT Maintainer RVT Maintainer Storage Clients API Read Query Request Storage server Query Processor Query Routers Log Manager Request forwarded to server Partition Controller Storage servers
PNUTS : Architectural Overview Query Processor Update Processor LVT Maintainer RVT Maintainer Storage Clients To remote Data centers asynchronous- -ly API Write Query Request Storage server Query Processor Query Routers Log Manager Request forwarded to server Write-ahead log Partition Controller Storage servers
Publish-Subscribe Model and Distribution Updates are committed when they are published to the YMB YMB propagates the update to different regions and applies to the replicas Steps to ensure no loss occurs before updates are applied to database : Logging to multiple disks on different servers Wide area replication : messages are relayed across YMB clusters in separate geographical areas for delivery to local subscribers.
Data and Query Model • Tables : • Items (ItemId, Name, Category, Description, Price) • Reviews (ReviewId, ItemId, Rating, Text, ReviewerId) • Queries not supported • Join and Group-by aggregate queries against base tables • Consistency • Per-record consistency • No ACID-style guarantee • Range and table scans do not guarantee a consistent snapshot of the data
Record mastery and Record timelines • Record mastery • Per-record master-ship • All updates first applied to master and then propagated to replicas • Master record holds locks and prevents conflicting writes • Log manager delivers updates made to a record to all replicas in appropriate order w.r.t the master record • Record timeline • Each point on timeline identified by a record version • Matser record : latest version
Mechanisms for View Maintenance : RVTs • Remote View Tables (RVTs) • Each view is stored in an independent PNUTS table – separate from the base table (the table on which the view is defined) • Partitioned based on its own key, which is different from the base table key . • So, view records are likely to be stored on different servers than the corresponding base table records. • Maintained “asynchronously” : can be stale • E.g. : CREATE VIEW ByPrice SELECT Price, ItemId, Name, Category FROM Items; • Partitioned on View key : (Price, ItemId)
Maintaining RVTs Query Processor Update Processor LVT Maintainer RVT Maintainer Storage Clients To remote Data centers asynchronous- -ly API Update base table Request Storage server S1 Query Processor Query Routers Log Manager Request forwarded to server Write-ahead log plus “Information necessary to update views (old field values)” RVT Maintainer Partition Controller Storage servers RVT Mainainer of S1 subscribes to the log manager to get updates for records in S1
Mechanisms for View Maintenance : LVTs • Local View Tables (LVTs) • Construct a view over each base table partition • View records are on same partition as corresponding base records. • Maintained “synchronously” : High query cost • E.g. : CREATE VIEW ByPrice AS SELECT Price, count(*) FROM Items GROUP BY Price; • Items of a price range (say, “7000”) will be scattered across partitions. • Need to retrieve LVT records from all such partitions and add them to get the total sum.
Combining RVTs and LVTs • Main application of LVTs : materialize aggregates over RVTs. • E.g. : LVT on RVT RVT on Base table Base Table
View Types • Indexes • Equi-joins • Selections • Group-by-aggregates
View Types : Indexes • Index : Projection and re-ordering of a base table • E.g. RVT : ByPrice(Price, ItemId, Name, Category) • Name, Category : extra attributes • Addition of new record to base table:
View Types : Indexes • Updation of View key attribute: • Updation of non view-key attribute: • Index as LVT : search expensive
View Types : Equi-joins • E.g. : Joining Items and Reviews tables • RVT on two ItemId indexes (one each for tables Items and Reviews), sorted on ItemId • Actual join performed at query time
View Types : Equi-joins • Co-locating records to be joined in the same partitions but deferring the actual join until query time. • Outer-join • Can join three or more tables on the same attribute via view • LVTs are not appropriate : different base table partitions
View Types : Selections Subset of base table records E.g. : CREATE VIEW ELECTRONICITEMS SELECT * FROM Items WHERE Category=‘Electronics’ ; Implemented as RVTs LVTs can be used , but cost is high Not widely used
View Types : Group-by aggregates RVTs : replica of a single record; cannot be used LVTs are useful : synchronously updated Design issue: LVTs on base table LVTs on RVT of base table Maintaining aggregates : synchronous update when base record changes For min (max) queries, scan on partition after update to get new min (max)
Aggregate LVT and Query example RVT on Base table LVT on RVT LVT on Base Base Table
Unsupported View Definitions • Joins of three or more tables on different attributes • Non-co-located records • Joins other than equi-joins • Expensive • Full SQL-99 aggregate functions • Only SQL-92 aggregates : count, min, max, sum, avg • No support for percentile, standard deviation
Design Issues • View Maintenance : by Client or by System • Maintenance by clients possible • Hiding logs : maintenance by system preferred • Updates’ frequency • Synchronous • View update as part of base table transaction - approach followed by LVTs • Lazy • approach followed by RVTs • Batched lazy • Group commit • Periodic view refresh : • high throughput • high staleness • wasted effort
Consistency Model v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1 • Review of record-level consistency model • Figure modifed from the original at “PNUTS: Yahoo!’s Hosted Data Serving Platform” slides by : Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon,et al Update($20) Update($15) Insert (“toaster”,$10) ReadAny=$10 ReadLatest=$15 ReadCritical(5)=$20
Maintaining View Consistency • Base-consistency model : timelines of all records are independent • Views : Multiple records are connected to base records • A view record “vr” is depedent on base record “r” on which it is defined, while the base record “r” is incident on “vr” • Indexes, selections, equi-joins : one-to-one • Group-by-aggregates : many-to-one
Maintaining View Consistency v. 2 v. 5 v. 1 v. 3 v. 4 v. 6 v. 7 v. 8 Time Generation 1 • E.g. : Update Review(1,’BAD’,1) Insert Review(1,’GOOD’,5) ReadCritical(v.6) = ‘BAD’ BAD GOOD 5 1
Cost of View Maintenance • For an update to a base record “br” • Indexes : at most two updates (updates to view key) • Equi-joins : similar to indexes • Selections : single view update if selection condition is satisfied/not fulfilled by “br” • Group-by-aggregates : for sum/count, updation needs only the change in the value of aggregate field of “br” and value off grouped attribute • Log record for “br” holds required information
Read Consistency for Views : RVTs • Single-record Reads : one-to-one views • consistency guarantees are same as base table views : ReadAny(vr), ReadCritical(vr,v’), ReadLatest(vr) • No need for separate version numbers for view records • Routing to ensure readlatest returns correct value. • Single-record Reads : many-to-one views • Multiple base records are incident on a single view record • ReadAny : any version of base records • ReadCritical : Specific versions of a certain subset of base records (mentioned via a vector) and ReadAny for all other base records • Easy since base record versions are available in the RVT/base table on which the aggregate LVT is defined • ReadLatest : accessing base table
Read Consistency for Views : Cost • ReadAny : may return ver.0 (if record is absent) • ReadCritical : in case of stale record, read base table master • ReadLatest : • High cost for RVTs • Must access master base table every time • Cheaper than scanning the entire table • LVTs are always up-to-date w.r.t local replicas • LVT-on-RVT : • ReadCritical – cheap • ReadLatest - expensive
Read Consistency for Views : Range Scans • Range scanning over views • Stale records, Missing records • Challenges due to insert and delete during View Maintenance – asynchronous propagation • Insert arrives before delete : a record may appear twice • Delete appears before insert : missing record • Solution • Filter out multiple records that correspond to the same base record • Retain “tombstones” during deletes • Look-up base-table record using key stored in tombstones • Needs garbage collection • Not yet implemented in PNUTS
Evaluation • View Maintenance cost measured on • Latency : reasonable impact • Throughput : decreases • Staleness • Setup : C++ and Linux/FreeBSD • Evaluation of costs • 10 GB data on each server • MySQL buffer pool – 2GB • 90% reads served from cache • Thin views (indexed attribute and record primary key) • I/O bound
Experiment 1: Varying View Type • Need to provide enough capacity to accommodate extra view maintenance work
Experiment 2: Varying Read/Write Workload Latency increases with increase in write percentage
Experiment 3: Varying no. of views • Effect is larger for RVTs than LVTs
Query Evaluation • Index plans • Look-up on secondary attributes • Cost of index scan increases with size of result set • Aggregates • Count • Index scan • LVT on base • LVT on RVT
Query Evaluation : Aggregates • LVT approaches constant across all group sizes • LVT-on-base : most expensive • LVT on RVT : cheapest • Cost of index scan increases with group size
Query Evaluation : Aggregates • Fixed group size : 500 • Index scan and LVT on RVT unaffected by no of partitions • For small partitions, LVT on base beats index scan • LVT on RVT – best strategy
Conclusion Views are essential to enhance query power in distributed systems. RVTs : index, equi-join and selection views LVTs : group-by-aggregate views Deferred view maintenance : easier since it uses existing PNUTs mechanisms for replication and recovery.
References Asynchronous View maintenance for VLSD databases Parag Agrawal,Adam S, Brian C, Utkarsh S, Raghu Ramakrishnan, SIGMOD 2009 “PNUTS: Yahoo!’s Hosted Data Serving Platform” Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon,et al