180 likes | 282 Views
View Matching for Outer-Join Views. Paul Larson and Jingren Zhou Microsoft Research. Outline of talk. Motivation Definitions and notation Join-disjunctive normal form View-matching algorithm Experimental results. Outer-join usage. Outer joins can be used for several purposes
E N D
View Matching for Outer-Join Views Paul Larson and Jingren ZhouMicrosoft Research
Outline of talk • Motivation • Definitions and notation • Join-disjunctive normal form • View-matching algorithm • Experimental results VLDB 2005
Outer-join usage Outer joins can be used for several purposes • OLAP queries that preserve tuples from dimension tables • Construct hierarchical views (XML) that preserve objects with no children • Computing negative subqueries (NOT EXISTS, NOT IN, ALL…) SELECT Department.dname FROM Department WHERE NOT EXISTS (SELECT * FROM Employee WHERE Department.dno = Employee.dno) SELECT Department.dname FROM Department LEFT OUTER JOIN Employee ON (Department.dno = Employee.dno) WHERE Employee.dno IS NULL VLDB 2005
Motiving example SELECT ... FROM Part LEFT OUTER JOIN Lineitem ON (p_partkey=l_partkey) ? View V: SELECT * FROM Part LEFT OUTER JOIN (Orders LEFT OUTER JOIN Lineitem ON (o_orderkey=l_orderkey)) ON (p_partkey=l_partkey) NOTE:l_orderkey -> o_orderkey l_partkey -> p_partkey Rewrite:SELECT ... FROM VWHERE p_partkey is not null ? SELECT ... FROM Orders LEFT OUTER JOIN Lineitem ON (o_orderkey=l_orderkey) VLDB 2005
Definitions and notation (1) • Predicate P is strong (null-rejecting) if P is false when any referenced column is null • Outer union R ⊎ S • Schema is union of columns in R and S • Null-extend input tuples and take union VLDB 2005
Definitions and notation (2) • A tuple t1 subsumes as tuple t2 if • t1 agrees with t2 on all non-null columns and • t1 contains fewer null values than t2 • Removal of subsumed tuples T⇓ eliminates all subsumed tuples from T • Minimum union: R ⊕ S = (R ⊎ S)⇓ • Is associative and commutative • Left outer join: T1 o⋈p T2 = (T1 ⋈p T2) ⊕ T1 • Full outer join: T1 o⋈op T2 = (T1 ⋈p T2) ⊕ T1 ⊕ T2 VLDB 2005
Normalize σp1∧p2(P×O×L) σp1(O×L) O P Join-disjunctive normal form Full outer join P2 is null-rejecting σp2(P,O), σp1∧p2(P,O,L); P; O, σp1(O,L); Left outer join o⋈op2 Part o⋈p1 O; σp1(O,L) Orders Lineitem Minimum union σp1∧p2(P×O×L) ⊕ σp1(O×L)⊕O ⊕ P C. Galindo-Legaria, Sigmod, 1994 VLDB 2005
? Matching term by term View V Query 1 Query 2 o⋈p2 o⋈p1 o⋈p2 Part o⋈p1 Part Lineitem Orders Lineitem Foreign-key join from L to O Orders Lineitem σp1(O,L) ⊕ O σp2(P,L) ⊕ P σp1∧p2(P,O,L) ⊕ P VLDB 2005
Outer-join view matching View Step 1: Convert to normal form Step 2: Check containment of terms Step 3: Recover required terms - selects with not-null predicates Step 4: Select desired tuples - apply residual query predicates Step 5: Eliminate subsumed tuples - selects + [duplicate elimination] Step 6: Combine partial results - outer union of partial results - reduce no of scans Query VLDB 2005
Example view and query View V1: Select lok, ln, lq, lp, ok, od, otp, ck, cn, cnkfrom (select * from C where cnk < 10) Cr right outer join ((select * from O where otp > 50 ) Or full outer join (select * from L where lq < 100) Lr on (ok = lok) ) OLj on (ck = ock) Query Q1: Select lok, lq, lp, od, otpfrom (select * from O where otp > 150 ) Or right outer join (select * from L where lq < 100) Lr on (ok = lok) VLDB 2005
Step 1: Convert to normal form V1 = σ[cnk<10 & otp >50 & lq <100 & jp_co & jp_ol] (C,O,L) ⊕ σ[cnk<10 & otp >50 & jp_co] (C,O) ⊕ σ[otp >50 & lq < 100 & jp_ol] (O,L) ⊕σ[otp >50 ] (O) ⊕σ[lq < 100] (L) Q1 = σ[otp >150 & lq < 100 & jp_ol] (O,L) ⊕σ[lq < 100] (L) VLDB 2005
Step 2: Check containment Is every query tuple contained in the view? Test each term: pred(Q) ⇒pred(V) ? (O,L)-term: (otp >150 & lq < 100 & jp_ol) ⇒(otp >50 & lq < 100 & jp_ol) (L)-term: (lq < 100) ⇒ (lq < 100) VLDB 2005
Step 3: Term recovery • (O,L)-term: • Duplicate elimination not required because the (C,O,L) term has the same hub as the (O,L) term • Non-null columns available for O and L • σ[ok ≠ null & ln ≠ null] V1 • (L)-term: • Duplicate elimination not required because the terms (C,O,L) and (O,L) have the same hub as the (L) term • Non-null columns available for L • σ[ln ≠ null] V1 VLDB 2005
Step 4: Select desired tuples • (O,L)-term: • View predicate: (otp > 50 & lq < 100 & jp_ol) • Query predicate: (otp >150 & lq < 100 & jp_ol) • σ[otp > 150]σ[ok ≠ null & ln ≠ null] V1 • (L)-term: • View predicate: (lq < 100) • Query predicate: (lq < 100) • σ[ln ≠ null] V1 VLDB 2005
Step 5: Eliminate subsumed tuples • (O,L)-term: • Maximal term contains no subsumed tuples • σ[otp > 150 & ok ≠ null & ln ≠ null] V1 • (L)-term: • An (L)-tuple may be subsumed by a tuple in the (O,L)-term • Discard all tuples that also satisfy predicate of (O,L)-term • σ[ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) ] V1 VLDB 2005
Step 6: Combine partial results • Project each term onto required columns • Combine using outer union • Q1 =Πlok, lq, lp, od, otpσ[otp > 150 & ok ≠ null & ln ≠ null] V1⊎Πlok, lq, lp σ[ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) ] V1 • The two predicates are mutually exclusive • Combine the two scans to produce final substitute expression • Q1 =Πlok, lq, lp, cstmtσ[ln ≠ null ) ] V1 • cstmt = case when ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) then null, null else od, otp end VLDB 2005
Experimental results Database: TPC-R, 1GB View: σ(C)⊕ σ(C,O) ⊕σ(C,O,L) Query 1: σ(C,O,L) Query 2: σ(C,O) ⊕σ(C,O,L) Query 3: σ(C,O) ⊕σ(C,O,L), two scans, one with duplicate elimination Query 4: σ(C,O), duplicate elimination VLDB 2005
Conclusion • First general view-matching algorithm for outer-join views • Produces efficient substitute expressions • Extended to outer-join views with aggregation (see paper and tech report) • Additional results and proofs in tech report (MSR-TR-2005-78) available at www.research.microsoft.com VLDB 2005