270 likes | 408 Views
LIVE A lineage-supported, versioned DBMS. Anish Das Sarma Martin Theobald Jennifer Widom. Agenda. ULDB Data Model and the Trio System Uncertainty & Lineage LIVE Data Model (LDM) Uncertainty, Lineage & Versioning Data Modifications Insert/Delete Tuples, Update Values, Update Confidences
E N D
LIVE A lineage-supported, versioned DBMS Anish Das Sarma Martin Theobald Jennifer Widom
Agenda • ULDB Data Model and the Trio System • Uncertainty & Lineage • LIVE Data Model (LDM) • Uncertainty, Lineage & Versioning • Data Modifications • Insert/Delete Tuples, Update Values, Update Confidences • Query Evaluation • Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity • Experiments/Conclusions LIVE - A lineage-supported, versioned DBMS
ULDB Data Model • Different types of uncertainty: • 1. Tuple Alternatives • 2. ‘?’ (Maybe) Annotations • 3. Confidences • Implementation of the ULDB data model: • Trio System • TriQL query language • TrioExplorer browser frontend, trioplus client, API • Enhanced PostgreSQL backend (SPI) • Search for “Stanford Trio” LIVE - A lineage-supported, versioned DBMS
ULDBs – Alternatives • 1. Alternatives:uncertainty about attribute values • 2. ‘?’ (Maybe) Annotations • 3. Confidences Three possible worlds LIVE - A lineage-supported, versioned DBMS
ULDBs – Maybe Annotations • 1. Alternatives • 2.‘?’ (Maybe): uncertainty about tuple presence • 3. Confidences ? Six possible worlds LIVE - A lineage-supported, versioned DBMS
ULDBs – Confidences • 1. Alternatives • 2. ‘?’ (Maybe) Annotations • 3. Confidences: weighted uncertainty ? Six possible worlds, each with a probability LIVE - A lineage-supported, versioned DBMS
ULDBs – Closure Suspects= πperson(Saw ⋈ Drives) CANNOT Does not correctly capture possible worlds in the result! ? ? ? LIVE - A lineage-supported, versioned DBMS
ULDBs – Lineage Suspects= πperson(Saw ⋈ Drives) ? ? ? λ(31) = (11,2)(21,2) λ(32,1) = (11,1)(22,1) ; λ(32,2) = (11,1)(22,2) λ(33) = (11,1)23 LIVE - A lineage-supported, versioned DBMS
ULDBs – Summary Uncertainty-Lineage Databases (ULDBs) • Alternatives • ‘?’ (Maybe) Annotations • Confidences • Lineage • ULDBs are closed and complete LIVE - A lineage-supported, versioned DBMS
Lineage & Confidences • Can exclusively utilize lineage in order to compute the confidence of a result tuple. • #P-complete for general Boolean formulas • Approximation algorithms: Luby-Karp, etc. Select distinct car from Saw; 0.99 λ(21) = (11 12 13) P(21) = 1 – (1-0.8) X (1-0.9) X (1-0.5) LIVE - A lineage-supported, versioned DBMS
Versioning (LDM Data Model) • Version intervals for tuples • Contiguous version numbers0,…, • Database has current version vD • Tuples have a validity intervals[s, e] • Valid-At Queries: • Select * from Photo valid-at 2; • Snapshot Queries: • View Photo at 2; • Possible Worlds: • LDM databases encode lists of sets of possible worlds. LIVE - A lineage-supported, versioned DBMS
Data Modifications – Insert • Insert Tuple: • Insert t with version [vD+1,] • commit; Increase vD (2) (1) (2) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Delete • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • commit; Increase vD (3) (2) (1) (2) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Update • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • commit; Increase vD (4) (3) (2) (1) (2) (4) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Update • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • Update Probability: • Set end(t) to vD • Insert t’=t with probability p’ and version [vD+1,] • commit; Increase vD (4) (3) (2) (1) (2) (4) (5) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Summary • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • Update Probability: • Set end(t) to vD • Insert t’=t with probability p’ and version [vD+1,] • Possible worlds: • Updates may create duplicate worlds, which are merged (at any version v). (4) (3) (2) (1) (2) (4) (5) LIVE - A lineage-supported, versioned DBMS
Query Evaluation implementation of Q • 1) Data Computation (regular SQL, including lineage) • 2) Interval Computation (stored procedure) D D + Result operational semantics possible worlds at versions encoding of possible worlds Qon each world D1, D2, …, Dn1 Q(D1), Q(D2), …, Q(Dn) @ (0) @ (0) … D1, D2, …, Dn2 @ (1) … D1, D2, …, Dnv @ (vD) LIVE - A lineage-supported, versioned DBMS
Lineage, Confidences & Versions • Can exclusively utilize lineage in order to compute the confidence of any result tuple. • Can exclusively utilize lineage in order to compute the version interval of any result tuple. LIVE - A lineage-supported, versioned DBMS
Version Interval Computation • Positive Lineage (disjunctions & conjunctions) • In the lineage formulaλ(t) • Replace every tuple t’ by its version interval • Replace every with and every with Select distinct car from Saw; [1,]: 0.99 λ(21) = (11 12 13) P(21) = 1 – (1-0.8) X (1-0.9) X (1-0.5) LIVE - A lineage-supported, versioned DBMS
Version & Confidence Computation • Positive Lineage (disjunctions & conjunctions) • In the lineage formulaλ(t) • Replace every tuple t’ by its version interval • Replace every with and every with Select distinct car from Saw; Select distinct car from Saw valid-at 2; [1,] : 0.98 λ(21) = (11 12) P(21) = 1 – (1-0.8) X (1-0.9) LIVE - A lineage-supported, versioned DBMS
Interval Computations & Query Plans • Can decouple interval computation from data computation • Or:push interval computation into query plans only when there is no negation. Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S ); Select R.A from R,S Where R.A=S.A; t=(a)[0,10] – t=(a)[5,10] r=(a)[0,10] u=(a)[0,10] – r=(a)[0,10] s=(a)[5,15] r=(a)[0,10] s=(a)[5,15] LIVE - A lineage-supported, versioned DBMS
Complexity Results • Positive Lineage (disjunctions & conjunctions) • Version interval computation • PTIME (linear) • Confidence computation • #P-complete • Arbitrary Lineage(including negation) • Version interval computation • PTIME (linear) if all confidences are known • NP-hard if confidences are not known (need to check for idempotence of negated tuples) • Confidence computation • #P-complete LIVE - A lineage-supported, versioned DBMS
Experiments – Setup • Probabilistic & versioned TPC-H setting • Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and1,500-15,000 tuples for Lineitem & Orders) • Update 0.1% to 1% of the input data • Assign probabilities within [0,1] uniform-randomly to tuples • Additional indexes for versioning • Two B+-trees on (start, end)and end points of intervals • Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end)predicates LIVE - A lineage-supported, versioned DBMS
Experiments – Results (I) • Join query • Overhead of versioned system vs. non-versioned system (versions not computed) (%) • Join query • Overhead of computing versions (versioned system) LIVE - A lineage-supported, versioned DBMS
Experiments – Results (II) • Join query • Progressive data updates (overwrite multiple times) • Join query • Valid-at queries vs. • full version computation LIVE - A lineage-supported, versioned DBMS
Experiments – Results (III) • Overhead of version computation, different query types (1% data modified) LIVE - A lineage-supported, versioned DBMS
Conclusions • LDMs are closed and complete • Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations) • Can employ lineage also for update propagations • Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations Uncertainty Versioning Lineage DBMS LIVE - A lineage-supported, versioned DBMS