SchemaSQL Semantics & Optimization

SchemaSQL Semantics & Optimization Review SQL semantics: select attrList, aggList from tupleVar declaration list where whereConditions having aggConditions Q: databases  relations (fixed I/O schema.) instantiation-based semantics. i: tuple vars in Q  tuples in D, input db.

i may (not) satisfy whereConditions: valid/invalid instantiations. • tuple-assembly_Q(i) – concatenateall i(t)[A] where “t.A” is in attrList. • one o/p tuple per valid instantiation. • may produce duplicates (multiset semantics). • What if aggList is non-empty? • Partition set of valid instantiations into classes: • i ~ j if they agree on all group-by attributes. • compute aggregate value(s) per block of partition. • Blocks/classes may/may not satisfy havingConditions: valid blocks.

For each valid block e, compute aggregate tuple: concatenate agg[i(t)[B] | i in e], where “agg(t.B)” in aggList. • Then append this aggregate tuple with a tuple over the attrList attributes assembled using any instantiation in e. (why does this work?) • That’s a 1 min. summary of SQL semantics (with simplifications). • Same approach carries over to SchemaSQL.

SchemaSQL semantics • See TODS 2001 paper for technical development. • Key issue: what happens when query/view restructures i/p schema? • Need to determine o/p schema first. • Then allocate data into it, putting each piece in the “right” slot. • Need to consider equiv. relations w.r.t. create view variables in addition to group-by (if any)!

SchamaSQL Query Evaluation Architecture Resident SQL engine Answers to local queries Final series of queries FA Input SchemaSQL Query Final answer SchemaSQL Server Optimized local query Optimized local query DBMS1 DBMSn

Query Processing • Phase I: rewrite conditions into CNF; generate instantiation tables for (groups of) vars declared in from clause. • Transform original SchemaSQL query/view into equivalent SQL one against the natural join of above VITs. • Assumption: all relevant meta-data (and stats) stored in a federation system table (FST). Not unlike DBMS catalog tables!

Query Processing – Example select RelC, C.salFloor from univC RelC, univC::RelC C, univD::salInfo D where RelC = D.dept AND C.salFloor > D.technician AND C.category = “technician”

QP – example: Phase I • Populate VIT_RelC(RelC) via: select rel-name as RelC from FST where db-name = “univC” • Schema of VIT_C? • Include related meta-var: RelC. • Include all relevant attributes/attr vars for C: C.salFloor; no need to include C.technician (why?).

QP – example: Phase I • Populate VIT_C(RelC,CsalFloor) by (against univC): select `r1’ as RelC, salFloor as CsalFloor from r1 where category = “technician” UNION … UNION select `rm’ as RelC, salFloor as CsalFloor from rm where category = “technician” where {r1, …, rm} are the bindings for RelC from VIT_RelC.

QP – example: Phase I & II • Populate VIT_D(Ddept, Dtechnician) by (against univD): select dept as Ddept, technician as Dtechnician fromsalFloor why did we include Ddept? • Phase II: Join VIT_RelC(RelC), VIT_C(RelC, CsalFloor), and VIT_D(Ddept, Dtechnician) and impose CsalFloor > Dtechnician. • how does this work? • Other cases: aggregation, dynamic o/p schema, etc. – see paper.

SchemaSQL Semantics &amp; Optimization

SchemaSQL Semantics &amp; Optimization

Presentation Transcript

SchemaSQL Semantics & Optimization

SchemaSQL Semantics & Optimization