250 likes | 345 Views
CPSC 504: Data Management Review of Relational Model 2/2. Laks V.S. Lakshmanan Dept. of CS UBC. Getting at the data – Querying. Relational DBs are queried with SQL . But where did that come from what is the basis for it? Relational DBs can be queried using logic .
E N D
CPSC 504: Data ManagementReview of Relational Model 2/2 Laks V.S. Lakshmanan Dept. of CS UBC
Getting at the data – Querying • Relational DBs are queried with SQL. But where did that come from what is the basis for it? • Relational DBs can be queried using logic. • In fact, we will review some logic-based QLs. • SQL = logic + some practically crucial features like aggregation & nesting.
Logic Query Language(s) • stocks(Ticker, Company), prices(Date, Ticker, Type, Value), indexes(Date, DOW, TSX, S&P). • Find the ticker of “Syncrude Corp.”: • {T.Ticker | stocks(T) & T.Company = “Syncrude Corp.”}. • Find the Tickers of companies, company names, and the corresponding closing prices on those days when DOW was more than 12,000. • {(T.Ticker, T.Company, P.Date, P.Value) | stocks(T) & prices(P) & indexes(I) & T.Ticker=P.Ticker & P.Date=I.Date & I.DOW>=12000 & P.Type=`closing’}.
Logic QL(s) – Tuple Relational Calculus • TRC key features: • Tuple variables (basic unit) • Output tuple assembled from pieces of tuple vars • Conditions imposed as “built-in” predicates • Quantifiers • Quantifier example: Find stocks (tickers) which had a higher closing price than every other company on August 15, 2011. {(T.Ticker) | stocks(T) & (P1)[prices(P1) & T.Ticker=P1.Ticker & P1.Type=`closing’ & P1.Date=2011/08/15 & (P2)[prices(P2) & P2.Date=2011/08/15 & P2.Type=`closing’ P2.Value ≤ P1.Value]]}.
Logic QL – Datalog (in lieu of Domain Relational Calculus) • Rule-based query language. • Syntax similar to DRC. • Supports recursion. • E.g.: Q1: q1(T) stocks(T, `Syncrude Corp.’). Q2: q2(T, C, D, P) stocks(T, C) & prices(D, T, `closing’, P) & indexes(D, DJ, W1, W2) & DJ >= 12000.
Datalog (contd.) • Note the use of variables and constants as predicate arguments. • Database predicates vs. built-in predicates. • Base tables vs. derived tables (aka views). • Rule ::= Head Body. • Head – a DB predicate. • Body – a conjunction of DB and built-in predicates. • Query – a set of rules, defining a query predicate. • Rules need to be safe.
Datalog (contd.) • There is an implicit in front of every rule body. – e.g.? • Can we express at all? • E.g.: Q3: q3(T) stocks(T, C) & bad(T). bad(T1) stocks(T1, C1) & stocks(T2, C2) & prices(2007/08/15, T1, `closing’, V1) & prices(2007/08/15, T2, `closing’, V2) & V2 > V1.
Datalog (contd.) • Datalog can go beyond what we have just seen. • Recursion: e.g., let flights(F, T) denote there is a direct flight from city F to city T. Find all cities you can fly to from Vancouver, possibly in a series of hops. flyTo(X, Y) flights(X, Y). flyTo(X, Y) flights(X, Z) & flyTo(Z, Y). ?– flyTo(`Vancouver’, Y).
Datalog wrap up. • Efficient query answering – esp. when recursion, negation, aggregation(will see shortly), or combos are present. • Powerful QL. • Numerous efficient QP strategies have been developed.
Relational Algebra • RA is based on five simple ops – select, project, Cartesian (aka cross) product, union, minus. • When combined, it makes for a rather powerful QL, equiv. in expressive power, to TRC or Datalog w/o recursion. • You just need efficient algorithms for basic ops and useful macros. • And a query optimizer that chooses the best plan for evaluating a query based on estimated cost, using a cost model.
RA • Select: Company=`Sybcrude Corp.’(stocks) – filter out tuples whose value for Company is `Syncrude Corp.’ • Project: Ticker(stocks) – find all tickers. • Product: stocks x prices – find all combinations of tuples from the two relations. • Union: Ticker(stocks) Ticker(prices). • Minus: Ticker(stocks) Ticker(prices).
RA • Example “macros”: • Join and division – examples. • Other macros: In implementing operators, you want to piggyback when it makes sense: e.g., if we want to compute a Join;select;project cascade, we can do select and project “for free” on the fly, while paying only for joining. • Exercise: Express Q1—Q3 in RA.
SQL (Structured Query Language) • Inspired mostly by TRC. • Ad hoc additions – partly inspired by RA and partly by need. • “Natural join”, “left outer join”, etc. • SUM(Sal), AVG(Height), etc. • Nesting queries inside others. • SQL can also express updates, unlike the “pure” QLs seen so far.
SQL review (contd.) • Q1: select Ticker from stocks where Company=`Syncrude Corp.’ • What is the connection to TRC? • Q2: select S.Ticker, Company, P.Date, Value from stocks S, prices P, indexes I where S.Ticker=P.Ticker AND P.Date=I.Date AND I.DOW>=12000
SQL review (contd.) • Q3: select S.Ticker from stocks S where NOT EXISTS ( select * from stocks S2, prices P1, prices P2 where P1.Date=2007/08/15 AND P2.Date=2007/08/15 AND S.Ticker=P1.Ticker AND S2.Ticker=P2.Ticker AND P1.Value < P2.Value )
SQL review wrap up • Q3 can be expressed more concisely using grouping and aggregation. • Q4: Find the average value of each type of price. select Type, AVG(Value) from prices group by Type
SQL updates • We can explicitly insert a tuple of values into a table. • Can modify select fields of a specific tuple. • Can perform query-driven updates.
SQL DDL • Can define schema. • Can define ICs and triggers.
Intro. to Conjunctive Queries • In datalog, a rule of the form: H B1, ..., Bm. • range-restricted and safe. e.g., p(X,Y) a(X,Z), b(Z,W), c(Z,Y), W>1. In SQL, single block queries w/ no agg or grouping. In RA, SPJ queries. Tableau Queries.
Concurrency control • Supports access by multiple users/processes, while preserving integrity of data. • E.g.: child checking account balance. • father depositing money into account. • Mother making a withdrawal. • Each transaction = read;change; write. • Should be interleaved carefully to prevent incorrect state!
Transactions • Atomicity: either a transaction as a whole succeeds, or fails; nothing part way. • Consistency: only transactions that respect DB’s ICs are allowed. • Isolation: at any time, the schedule of actions (coming from diff. transactions) being performed is serializable, i.e., is equivalent to running them one transaction at a time. • Durability: after a commit, the effect of a trsnsaction persists.
Recovery • From disk failures – done through RAID. • From power failures – done by keeping a detailed log of transactions (actions) performed. Roll back if need be to preserve correct state.
Summing it all up • DBMS – one of the most sophisticated mission-critical software systems. • Real DBMSs – tend to be complex with many components. • Query Optimizer, Transaction Manager, Disk Space Manager – key components. • Based on decades of solid research. • In some ways, RDBMS as a model and as a technology – a gold standard: • For data models. • For software systems.
Further Reading • In addition to the list already seen: • P. Bernstein, V. Hadzilacos, and N. Goodman: Concurrency Control and Recovery in Database Systems. • J. Gray and A. Reuter: Transaction Processing: Concepts and Techniques. • M. Stonebraker and J. Hellerstein: Readings in DB Systems (the red book) – contains several great papers (on CC & Recovery and other topics).