310 likes | 433 Views
Sándor Héman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz. Positional Update Handling in Column Stores. Jim Gray, 1981. UPDATE IN PLACE: A Poison Apple?. “..for performance reasons, most disc-based systems have been seduced into updating the data in place.”.
E N D
SándorHéman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz Positional Update Handlingin Column Stores
Jim Gray, 1981 UPDATE IN PLACE:A Poison Apple? “..for performance reasons, most disc-based systems have been seduced into updating the data in place.” 30 years of hardware improvements in sequential/throughput beating random/latency…. in-placelessfeasibleeveryyear. alternative: differentialapproach. In column stores, in-place updating is bynowclearlyinfeasible
Problem: Column Store Updates • I/O proportional to number of attributes • I/O blocks large and compressed • Sometimes even replicated • Read-Optimized Update-Unfriendly • Table often kept ordered on sort-key (SK) attributes • Uniform update load scattered write access
Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan
Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan • Challenges: • Efficiently maintainable data-structure • Minimize Merge impact for read-only queries
Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • For each table, maintain two update friendly row-store tables: • INS(C1..Cn) • DEL(SK1..SKm) • MOD = DEL + INS Inserts table: INS Deletes table: DEL
Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) Inserts table: INS Deletes table: DEL
Naïve Approach: Delta Tables Actual table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) for up-to-date image • Expensive! • I/O to scan SK ‘merge’ columns; also if query does not need SK cols • Each query pays CPU effort to locate the same change positions over and over again
The Idea: Positional Updates • Remember the position of an update rather than its SK values • Merge once at write Read-Optimized approach • No need to scan SK columns • Scan can skip less CPU overhead Notation: • TABLEx state of TABLE at time x • SID(t): StableID • Position of tuple t in immutable base TABLE0 Stable • RIDx(t): RowID • Position of visible tuple t at time x VOLATILE! • SID(t) = RID0(t)
SID/RID Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 TABLE1
SIDs and RIDs • RID(t) = SID(t) + ∆(t) • ∆(t) = #inserts before t – #deletes before t = RID(t) – SID(t) • SID and RID are monotonically increasing • organize positional updates on SID in a counting B-Tree that keeps track cumulative deltas (∆) • Positional Delta Tree (PDT) • SIDs are stable • Only need to maintain cumulative ∆ on path root leaf
PDT Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value
PDT Example DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’ TABLE1 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value
PDT Example INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4) Insert at RID = 5 TABLE2 SID SID Insert Value Table Insert Value Table ∆ ∆ i0 i0 RID 5 > 0 + 2 i1 i1 SID SID SID SID i2 i2 type type type type value value value value
PDT Example INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4)INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4) SID Separator SIDs Subtree ∆ ∆ ∆ Running ∆ RID Separator RIDs SID SID ∆ ∆ ∆ ∆ RID RID SID SID SID SID type type type type value value value value ∆ ∆ ∆ ∆ RID RID RID RID
Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive t2=t1 t2 t0 t1 t2 t0 PDT vs are PDT PDT PDT PDT PDT PDT PDT t3 t1 t1 t2 t3 PDT PDT PDT PDT PDT PDT Table Table
Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive t2=t1 aligned t2=t0 “same base” t2 t0 PDT vs are PDT t3 t1 PDT PDT Table Table
Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive t2=t1 aligned t2=t0 “same base” overlapping [t2,t3] overlaps[t0,t1] “uncomparable” / “incompatible” t2 t0 PDT vs are PDT t3 t1 PDT PDT PDT Table Table
Stacking for Isolation • ‘lock’ PDT down for further updates • Immutable read-PDT BIG: main memory resident • ‘stack’ empty PDT on top • Updateable write-PDT SMALL: L2 cache resident • Note: PDTs are consecutive • once in a while changes are propagated • Propagate() operation • Requires consecutive PDTs TABLEx Write-PDT Propagate() Read-PDT Read-PDT Stable Table
Snapshot Isolation • Transaction creates snapshot copy of write-PDT • Updates go into trans-PDT • On commit, Propagate() trans-PDT into write-PDT TransactionState Trans PDT Copy Write-PDT Propagate() TABLEx Write-PDT Write-PDT Read-PDT Stable Table
Optimistic Concurrency Control • Two concurrent transactions TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table
Optimistic Concurrency Control • Two concurrent transactions • A commits before B TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT Propagate() TABLEx Write-PDT Read-PDT Stable Table
Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration TransB TransA Trans PDT Trans PDT Propagate() Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table
Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!! trans abort = succeeds if no conflict = write set intersection Consecutive! Trans PDT TransB TransA Trans PDT Trans PDT Serialize() TABLEx Write-PDT Read-PDT Stable Table
Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!! trans abort = succeeds if no conflict = write set intersection TransB TransA Trans PDT Trans PDT Serialize() Propagate() TABLEx Write-PDT Read-PDT Stable Table • Extend to any number of concurrent transactions by serializing • against all PDTs of transactions that committed during its lifetime • (a.k.a. backward looking OCC)
Concluding.. • PDTs speed-up differential update merging • Reduced I/O volume • Reduced CPU merge overhead • Tree structure • logarithmic lookup & maintenance of volatile RIDs • main operations: Merge(), Propagate(), Serialize() • PDTs are stackable, and capture Write-Set • Great structure for Snapshot Isolation • Formal definitions, algorithms and benchmarks in paper