190 likes | 205 Views
Explore research problems related to The Third Manifesto defining relational database systems. Focus on efficient constraint enforcement, semantic query optimization, and more using Datalog. Learn about materialized views, multiple assignment, and transaction repair.
E N D
Adrian Hudnott Warwick Postgraduate Colloquium in Computer Science 2008 Dependency Problems for Future Database Systems:Ingres Project D
Research Problems • The Third Manifesto is the subtitle of a book written by Chris Date and Hugh Darwen that defines what a relational database system should and should not do • The Third Manifesto and related publications do not include any implementation techniques • There is no scalable implementation of any part of The Manifesto currently in existence • There are concerns that some of the demands of The Manifesto cannot be met using extensions of techniques in use by SQL DBMS • When these were catalogued all of the difficult ones had a common theme: constraints • All the problems involve the DBMS efficiently deducing a fact using known properties about the data source • Little or no access to the actual data
Research Problems • Constraint Enforcement • Semantic Query Optimization • Multiple Simultaneous Assignment • Updates to Views • Interface with SQL Databases • Determination of Type (compile- and run-time)
Database Representation Use Datalog (like Prolog) E.g. supplier(s1, smith, 20, london). supplier(s2, jones, 10, paris). supplier(s3, blake, 30, paris). supplier(s4, clarke, 20, london). supplier(s5, adams, 30, athens).
Views Expressed using Datalog rules E.g. supplier_city(C) :- supplier(SN, N, ST, C).
Notation Extensional Database: EDB Intensional Database: IDB Statistics: Stat Constraints (invariant): Inv Assignments: Ri := Ui Queries: Qi(t) Boolean expressions: Bi(t) Cost function: c(exp)
Constraint Enforcement (Prove Safety) Or, alternatively: (Prove Failure) N.B. Minimize access to the database (EDB)
Updates in the Presence of Views • Example insertion: supplier(s6, carter, 30, oxford). • R is “supplier” • U is <previous definition> OR <tuple is the one above> • supplier_city(C) :- supplier(SN, N, ST, C); (SN=s6, N=carter, ST=30, C=oxford).
Materialized Views Materialized View: contents of a view cached to avoid re-calculation. Can be modelled using constraints CREATE TABLE supplier (City PRIMARY KEY REFERENCES supplier_city,... CREATE TABLE supplier_city (City PRIMARY KEY REFERENCES supplier); And vice versa: A constraint is a view that is always empty.
Multiple Assignment: Naive Implementation Example: X := Y, Y := X; Internal representation: atomic { Z := X; X := Y; Y := Z; }
Multiple Assignment .... R’:= U’, R1 := U1, …, Rn := Un; IIF U’ is the first assignment:
Multiple Assignment Idea • Semantically optimize RHS expressions • Place RHS expressions into canonical form and find subexpressions equal to other RHS expressions • Build a dependency graph • Depth first search to find cycles • Break cycles by creating simulated copies • Schedule and execute assignments
Transaction Repair Transaction repair is modifying an update request so that it conforms with the constraints Compensating actions E.g. ON DELETE CASCADE Research on more advanced repairs Generates more than one repair
View Update = Transaction Repair + Disambiguation DELETE FROM supplier_city WHERE City = ‘London’; Transaction repair needed: DELETE FROM supplier WHERE city=‘London’; For more complex cases, disambiguation is needed Default values, rules, etc.
Semantic Optimization (Equivalence) and: (Lower Cost) No access to the data
NULLs are flawed for many reasons Simple example: B OR NOT B is NULL if B is NULL Can consider missing data as free variables Problem is to efficiently determine if a formula is true for all possible values of the free variable Missing Information
Current Progress • Catalogued all known issues requiring investigation • Outlined solutions or literature for the lesser problems • E.g. comparison of large values • Devised formal semantics for type checking Tutorial D programs • Elaborated on the problems requiring original research in the context of dependency • Resolved perceived ambiguities in The Manifesto • Detailed ideas for future development into a solution to the multiple assignment problem • Completed a literature review on constraint checking techniques
Summing up Project D • Essentially about relieving the industry of the burdens that have been imposed by SQL implementations in the past 30 years • We think that the relational model can be properly implemented without SQL’s restrictions and ad hoc additions • … but requires innovative research • Project D is still in very early development • However, some small successes already