Positional Update Handling in Column Stores

SándorHéman Marcin Zukowski Niels Nes Lefteris Sidirourgos Peter Boncz Positional Update Handlingin Column Stores

Jim Gray, 1981 UPDATE IN PLACE:A Poison Apple? “..for performance reasons, most disc-based systems have been seduced into updating the data in place.” 30 years of hardware improvements in sequential/throughput beating random/latency…. in-placelessfeasibleeveryyear. alternative: differentialapproach. In column stores, in-place updating is bynowclearlyinfeasible

Problem: Column Store Updates • I/O proportional to number of attributes • I/O blocks large and compressed • Sometimes even replicated • Read-Optimized  Update-Unfriendly  • Table often kept ordered on sort-key (SK) attributes • Uniform update load  scattered write access

Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan

Solution: Differential Structure • Maintain updates (INS/DEL/MOD) in a differential structure • Merge with base table during scan • Challenges: • Efficiently maintainable data-structure • Minimize Merge impact for read-only queries

Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • For each table, maintain two update friendly row-store tables: • INS(C1..Cn) • DEL(SK1..SKm) • MOD = DEL + INS Inserts table: INS Deletes table: DEL

Naïve Approach: Delta Tables Base table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) Inserts table: INS Deletes table: DEL

Naïve Approach: Delta Tables Actual table: inventorySort-Key (SK): [store, prod] • Rewrite table scans:MergeUnion[store,prod](Scan(INS),MergeDiff[store,prod]( Scan(Inventory), Scan(DEL))) for up-to-date image • Expensive! • I/O to scan SK ‘merge’ columns; also if query does not need SK cols • Each query pays CPU effort to locate the same change positions over and over again

The Idea: Positional Updates • Remember the position of an update rather than its SK values • Merge once at write  Read-Optimized approach • No need to scan SK columns • Scan can skip  less CPU overhead Notation: • TABLEx state of TABLE at time x • SID(t): StableID • Position of tuple t in immutable base TABLE0 Stable • RIDx(t): RowID • Position of visible tuple t at time x VOLATILE! • SID(t) = RID0(t)

SID/RID Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 TABLE1

SIDs and RIDs • RID(t) = SID(t) + ∆(t) • ∆(t) = #inserts before t – #deletes before t = RID(t) – SID(t) • SID and RID are monotonically increasing  • organize positional updates on SID in a counting B-Tree that keeps track cumulative deltas (∆) • Positional Delta Tree (PDT) • SIDs are stable • Only need to maintain cumulative ∆ on path root  leaf

PDT Example INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20) INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5) TABLE0 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value

PDT Example DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’ TABLE1 SID Insert Value Table ∆ i0 i1 SID SID i2 type type value value

PDT Example INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4) Insert at RID = 5 TABLE2 SID SID Insert Value Table Insert Value Table ∆ ∆ i0 i0 RID 5 > 0 + 2 i1 i1 SID SID SID SID i2 i2 type type type type value value value value

PDT Example INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4)INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4) SID Separator SIDs Subtree ∆ ∆ ∆ Running ∆ RID Separator RIDs SID SID ∆ ∆ ∆ ∆ RID RID SID SID SID SID type type type type value value value value ∆ ∆ ∆ ∆ RID RID RID RID

Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 t2 t0 t1 t2 t0 PDT vs are PDT PDT PDT PDT PDT PDT PDT t3 t1 t1 t2 t3 PDT PDT PDT PDT PDT PDT Table Table

Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 aligned t2=t0 “same base” t2 t0 PDT vs are PDT t3 t1 PDT PDT Table Table

Stacking PDTs • Arbitrary number of layers: “deltas on deltas on ..” • RID domain of child PDT = SID domain of parent PDT generalization: • PDT contains all differences in time [lo,hi] lo hi consecutive  t2=t1 aligned t2=t0 “same base” overlapping  [t2,t3] overlaps[t0,t1] “uncomparable” / “incompatible” t2 t0 PDT vs are PDT t3 t1 PDT PDT PDT Table Table

Stacking for Isolation • ‘lock’ PDT down for further updates • Immutable read-PDT  BIG: main memory resident • ‘stack’ empty PDT on top • Updateable write-PDT  SMALL: L2 cache resident • Note: PDTs are consecutive • once in a while changes are propagated • Propagate() operation • Requires consecutive PDTs TABLEx Write-PDT Propagate() Read-PDT Read-PDT Stable Table

Snapshot Isolation • Transaction creates snapshot copy of write-PDT • Updates go into trans-PDT • On commit, Propagate() trans-PDT into write-PDT TransactionState Trans PDT Copy Write-PDT Propagate() TABLEx Write-PDT Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions • A commits before B TransB TransA Trans PDT Trans PDT Copy Write-PDT Copy Write-PDT Propagate() TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration TransB TransA Trans PDT Trans PDT Propagate() Copy Write-PDT TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!!  trans abort = succeeds if no conflict = write set intersection Consecutive! Trans PDT TransB TransA Trans PDT Trans PDT Serialize() TABLEx Write-PDT Read-PDT Stable Table

Optimistic Concurrency Control • Two concurrent transactions • A commits before B • Can not commit B into modified write-PDT! • A changed RID enumeration • Serialize(A, B) • Makes aligned PDTs consecutive • MAY FAIL!!  trans abort = succeeds if no conflict = write set intersection TransB TransA Trans PDT Trans PDT Serialize() Propagate() TABLEx Write-PDT Read-PDT Stable Table • Extend to any number of concurrent transactions by serializing • against all PDTs of transactions that committed during its lifetime • (a.k.a. backward looking OCC)

Concluding.. • PDTs speed-up differential update merging • Reduced I/O volume • Reduced CPU merge overhead • Tree structure • logarithmic lookup & maintenance of volatile RIDs • main operations: Merge(), Propagate(), Serialize() • PDTs are stackable, and capture Write-Set • Great structure for Snapshot Isolation • Formal definitions, algorithms and benchmarks in paper

Thank you!

Microbenchmarks

TPCH-30

Positional Update Handling in Column Stores

Positional Update Handling in Column Stores

Presentation Transcript

Benign Positional Vertigo

4.3 Computation in Positional Systems

MEASURING POSITIONAL CHANGE

Column Theory - Column Strength Curve

Benign Positional Vertigo

Positional Asphyxia

Column-Stores vs. Row-Stores How Different are they Really?

Update: Visionworks Retail Stores Rebrand

Column-Stores vs. Row-Stores How Different are they Really?

Number Bases in Positional Systems

Scaling up analytical queries with column -stores

Positional Number Systems

Column Stores For Wide and Sparse Data

C-Store: How Different are Column-Stores and Row-Stores?

Positional Truth

Computation in Positional Systems

Positional Astronomy

Positional Association Rules

Positional Association Rules

C-Store: RDF Data Management Using Column Stores

Positional Cloning

Commodity Positional Calls