1 / 29

Positional Update Handling in Column Stores

Sándor Héman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz. Positional Update Handling in Column Stores. Jim Gray, 1981. UPDATE IN PLACE: A Poison Apple?. “..for performance reasons, most disc-based systems have been seduced into updating the data in place.”.

holden
Download Presentation

Positional Update Handling in Column Stores

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SándorHéman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz Positional Update Handlingin Column Stores

  2. Jim Gray, 1981 UPDATE IN PLACE:A Poison Apple? “..for performance reasons, most disc-based systems have been seduced into updating the data in place.” 30 years of hardware improvements in sequential/throughput beating random/latency…. in-placelessfeasibleeveryyear. alternative: differentialapproach. In column stores, in-place updating is bynowclearlyinfeasible

  3. Problem: Column Store Updates • I/O proportional to number of attributes • I/O blocks large and compressed • Sometimes even replicated • Read-Optimized  Update-Unfriendly  • Table often kept ordered on sort-key (SK) attributes • Uniform update load  scattered write access

  4. Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan

  5. Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan • Challenges: • Efficiently maintainable data-structure • Minimize Merge impact for read-only queries

  6. Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • For each table, maintain two update friendly row-store tables: • INS(C1..Cn) • DEL(SK1..SKm) • MOD = DEL + INS Inserts table: INS Deletes table: DEL

  7. Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) Inserts table: INS Deletes table: DEL

  8. Naïve Approach: Delta Tables Actual table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) for up-to-date image • Expensive! • I/O to scan SK ‘merge’ columns; also if query does not need SK cols • Each query pays CPU effort to locate the same change positions over and over again

  9. The Idea: Positional Updates • Remember the position of an update rather than its SK values • Merge once at write  Read-Optimized approach • No need to scan SK columns • Scan can skip  less CPU overhead Notation: • TABLEx state of TABLE at time x • SID(t): StableID • Position of tuple t in immutable base TABLE0 Stable • RIDx(t): RowID • Position of visible tuple t at time x VOLATILE! • SID(t) = RID0(t)

  10. SID/RID Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 TABLE1

  11. SIDs and RIDs • RID(t) = SID(t) + ∆(t) • ∆(t) = #inserts before t – #deletes before t = RID(t) – SID(t) • SID and RID are monotonically increasing  • organize positional updates on SID in a counting B-Tree that keeps track cumulative deltas (∆) • Positional Delta Tree (PDT) • SIDs are stable • Only need to maintain cumulative ∆ on path root  leaf

  12. PDT Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value

  13. PDT Example DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’ TABLE1 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value

  14. PDT Example INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4) Insert at RID = 5 TABLE2 SID SID Insert Value Table Insert Value Table ∆ ∆ i0 i0 RID 5 > 0 + 2 i1 i1 SID SID SID SID i2 i2 type type type type value value value value

  15. PDT Example INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4)INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4) SID Separator SIDs Subtree ∆ ∆ ∆ Running ∆ RID Separator RIDs SID SID ∆ ∆ ∆ ∆ RID RID SID SID SID SID type type type type value value value value ∆ ∆ ∆ ∆ RID RID RID RID

  16. Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 t2 t0 t1 t2 t0 PDT vs are PDT PDT PDT PDT PDT PDT PDT t3 t1 t1 t2 t3 PDT PDT PDT PDT PDT PDT Table Table

  17. Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 aligned t2=t0 “same base” t2 t0 PDT vs are PDT t3 t1 PDT PDT Table Table

  18. Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 aligned t2=t0 “same base” overlapping  [t2,t3] overlaps[t0,t1] “uncomparable” / “incompatible” t2 t0 PDT vs are PDT t3 t1 PDT PDT PDT Table Table

  19. Stacking for Isolation • ‘lock’ PDT down for further updates • Immutable read-PDT  BIG: main memory resident • ‘stack’ empty PDT on top • Updateable write-PDT  SMALL: L2 cache resident • Note: PDTs are consecutive • once in a while changes are propagated • Propagate() operation • Requires consecutive PDTs TABLEx Write-PDT Propagate() Read-PDT Read-PDT Stable Table

  20. Snapshot Isolation • Transaction creates snapshot copy of write-PDT • Updates go into trans-PDT • On commit, Propagate() trans-PDT into write-PDT TransactionState Trans PDT Copy Write-PDT Propagate() TABLEx Write-PDT Write-PDT Read-PDT Stable Table

  21. Optimistic Concurrency Control • Two concurrent transactions TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

  22. Optimistic Concurrency Control • Two concurrent transactions • A commits before B TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT Propagate() TABLEx Write-PDT Read-PDT Stable Table

  23. Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration TransB TransA Trans PDT Trans PDT Propagate() Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

  24. Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!!  trans abort = succeeds if no conflict = write set intersection Consecutive! Trans PDT TransB TransA Trans PDT Trans PDT Serialize() TABLEx Write-PDT Read-PDT Stable Table

  25. Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!!  trans abort = succeeds if no conflict = write set intersection TransB TransA Trans PDT Trans PDT Serialize() Propagate() TABLEx Write-PDT Read-PDT Stable Table • Extend to any number of concurrent transactions by serializing • against all PDTs of transactions that committed during its lifetime • (a.k.a. backward looking OCC)

  26. Concluding.. • PDTs speed-up differential update merging • Reduced I/O volume • Reduced CPU merge overhead • Tree structure • logarithmic lookup & maintenance of volatile RIDs • main operations: Merge(), Propagate(), Serialize() • PDTs are stackable, and capture Write-Set • Great structure for Snapshot Isolation • Formal definitions, algorithms and benchmarks in paper

  27. Thank you!

  28. Microbenchmarks

  29. TPCH-30

More Related