320 likes | 435 Views
QUERY BY EXCEL. Witkowski, S. Bellamkonda, T. Bozkaya, A. Naimat, L. Sheng, S. Subramanian, A. Waingold. Oracle Corporation. Spreadsheets. Spreadsheets are established analytical tools: Attractive user interface Easy to use computational model Interactivity for what if analysis.
E N D
QUERY BY EXCEL • Witkowski, S. Bellamkonda, T. Bozkaya, • A. Naimat, L. Sheng, • S. Subramanian, A. Waingold Oracle Corporation
Spreadsheets Spreadsheets are established analytical tools: • Attractive user interface • Easy to use computational model • Interactivity for what if analysis But, they do not offer: • Scalability • Parallelization • A unified view of the data model
Our proposal QUERY BY EXCEL (QBX) Combines • Presentational interactive modeling power of Excel (spreadsheet tools) • Computational power and scalability of RDBMS via analytical extensions
QBX – How it works • Analyst builds a model using Excel. The model is translated to SQL and stored in relational views. • Analysts designate areas in Excel as relational sources (RTables). An RTable can be transformed into another RTable using Excel operations corresponding to Outer Join, Selection, Projection and Aggregation. Analyst does not write any SQL during this process. • Analysts write Excel formulas on samples of relational sources that fit in a spreadsheet. The translated SQL works on the whole data for scalability. • Business Reporting tools can access the relational views for consolidation.
Analytic SQL Extensions for QBX • SQL MODEL (Witkowski, et al. Sigmod 2003) • SQL PIVOT (Cunningham, et al. Vldb 2004)
QBX Architecture Excel Analyst Application RDBMS User Excel -> SQL Translation QBX generated SQL Objects RDBMS Interaction & Modeling QBX Database Schema Persistence EXCEL RDBMS
QBX Metadata • Excels( eid, name, owner, ExcelBinary, SQLView) • Cells( eid, sheet, row, col, x, f) For this Excel spreadsheet, we store five rows in Cells Table: C1, C2, C3, D1, D3 • RTables( eid, RTable, sheet, row, col, sample, RTableView, …)
QBX Infrastructure • Interaction and Modeling Component (VBA add on) • Menu interface (QBX) • QBX->Rtables manages Rtables (import, add column, save as relational view..) • QBX->Spreadsheet translates Excel to SQL, saves and loads it. • Persistence Component (VBA add-on) • Translation Component
Excel to SQL Translation • Fix Frame translation • Table Translation • Unified Translation
Fix Frame Translation SELECT sheet, row, col, x FROM cells MODEL DBY (sheet,row,col) MEA (x) RULES AUTOMATIC ORDER ( x[1,1,2] = x[1,1,1] + x[1,2,1], -- B1=A1+A2 x[1,2,2] = x[1,3,1] + 1, -- B2=A3+1 x[1,3,2] = sum(x)[1,1<=row<=3,1] –- B3=sum(A1:A3) );
Fix Frame Translation-VLOOKUP - HLOOKUP We use REFERENCE SQL MODEL: VLOOKUP(key, (<rs,cs>, <re,ce>), col): REFERENCE vlookup_ref ON ( SELECT k.x key, v.x value FROM cells k, cells v WHERE k.col=cs AND v.col=cs+col-1 AND k.row >= rs AND k.row <= re AND v.row=k.row ) DIMENSION BY (key) MEASURES (value)
C3 Fix Frame Translation-VLOOKUP - HLOOKUP EXAMPLE: A3 =Vlookup(C3, A1:B4, 2) SELECT row, col, x FROM cells MODEL REFERENCE vlookup_ref ON (SELECT k.x key,v.x value FROM cells k,cells v WHERE k.col = 1 AND v.col = 2 AND k.row >= 0 AND k.row <= 4 AND v.row = k.row) DIMENSION BY(key) MEASURES(value) MAIN DIMENSION BY (row, col) MEASURES (x) RULES ( x[3,1] = vlookup_ref.value[ x[3,3] ] ); 6
Table Translation • Table Translation creates named, protected regions within Excel named RTables. • We remember associated metadata for RTable regions (PK, PK-FK constraints, etc..) • A direct RTable represents a • An RDBMS table (entire table or sample) • An RDBMS view (Direct Rtables can be created through QBX menu) • A derived RTable represents the result of relational operations on other RTables.
Rtable Example TIME_D(I2:J4) PROD_D(I7:J9) FACT(A2:D10) REGION_D(I12:J14)
Table Translation - Operations • Inter-column calculations Adding new (calculated) column to an Rtable • Projection • Joining of Rtables The closest Excel operation to join is Hlookup/Vlookup, Which is similar to relational OUTER JOIN. Steps: (R1 LEFT OUTER JOIN R2 ON R1.col1=R2.col2) • A new column is added. • The new column is populated with (VLOOKUP(R1.Col1, R2, R2.Col2) • Aggregation
Inter-column Calculations • Computations involving columns of the same row. EX: A1 = B1+D1 MODEL DBY(row,col) MEASURES (x) RULES ( x[ANY, 1] = x[cv(row),2] + x[cv(row),4] )
Table Translation – Join Example E3=VLOOKUP(B3,I8:J9,2) F3=VLOOKUP(C3,I3:J4,2)
Join SQL SELECT f.city, f.prod, f.month, g.state, p.categ, t.year, sale, row_number() over (order by city, prod, month) rn FROM fact f outer join time_d t on f.month = t.month outer join prod_d p on f.prod = p.prod outer join geog_d g on f.city = g.city ORDER BY city NULLS LAST, prod NULLS LAST, month NULLS LAST;
Table Translation- Aggregation Aggregation in Excel can be done through DATA PIVOTTABLE operation. This corresponds to (in RDBMS): • Aggregation via SQL GROUP BY operator • Aggregation via SQL PIVOT operator
Aggregation – SQL GROUP BY SELECT state, year, sum(amt) amt, row_number() over (order by state,year) rn FROM fact f outer join time_d t on f.month = t.month outer join prod_d p on f.prod = p.prod outer join geog_d g on f.city = g.city GROUP BY GROUPING SETS ((state,year),(state)) ORDER BY state NULLS LAST, year NULLS LAST;
Translation of Fix Frame Operations on RTables Excel computation is possible once we map relational data to 2-D form. This is called linearization. • Assignment Linearization • Reference Linearization
Reference Linearization SELECT row, col, x FROM cells MODEL REFERENCE r ON ( SELECT rn, state, time, total FROM RT ) DIMENSION BY (rn) MEASURES (state, time, total) DIMENSION BY (row, col) MEASURES (x) ( x[3, 15] = r.total[1] / r.total[3], -- =N3/N5 x[4, 15] = r.total[2] / r.total[3], -- =N4/N5 x[5, 15] = r.total[3] / r.total[3], -- =N5/N5 x[6, 15] = r.total[4] / r.total[6], -- =N6/N5 x[7, 15] = r.total[5] / r.total[6], -- =N6/N5 x[8, 15] = r.total[6] / r.total[6] -- =N8/N5 );
Relative Referencing to RTables Introducing a new lookup function for referencing values in Rtables: RTLOOKUP(RTREGION, COL, {PKEYS}) O3 = N3/rtlookup(L2:N8,3,L3,NULL) O4 = N3/rtlookup(L2:N8,3,L3,NULL)
Relative Referencing to RTables CREATE VIEW AGG_Q AS SELECT state, year, ratio, sum(amt) amt, row_number() over () (order by state, year) rn FROM fact f outer join time_d t on f.month=t.month outer join prod_d p on f.prod = p.prod outer join geog_d g on f.city = g.city GROUP BY GROUPING SETS ((state,year),(state)) MODEL DBY (state, year) MEA (total, 0 ratio) ( ratio[ANY,ANY]= total[CV(state),CV(year)]/total[CV(state), null] ) ORDER BY state nulls last, year nulls last;
Optimizations • Collapsing of Equivalent Rules • For Loops vs. Existential Form • Existing optimization of SQL Model functionality • Rule pruning • Filter pushdown • Others…
A1=B1+D1 MODEL DBY(row,col) MEA(x) RULES ( x[for row from 1 to 20,1] = x[CV(row),2] + x[CV(row), 4] ); == Optimizations- Collapsing of Rules MODEL DBY(row,col) MEA(x) RULES ( x[1,1] = x[1,2] + x[1,4], x[2,1] = x[2,2] + x[2,4], … x[20,1] = x[20,2] +x[20,4] );
A1=B1+D1 MODEL DBY(row,col) MEA(x) RULES ( x[1<=row<=20,1] = x[CV(row),2] + x[CV(row), 4] ); VS Optimization – For Loops vs Existential Rules MODEL DBY(row,col) MEA(x) RULES ( x[for row from 1 to 20,1] = x[CV(row),2] + x[CV(row), 4] );
Use For Loops Use scan Optimization – For Loops vs Existential Rules
Conclusion • Our goal is to translate Excel computation to SQL so that Business Models built in Excel can be stored and queried in RDBMS. • We proposed translation techniques for expressing Excel computation in RDBMS SQL using new analytic extensions. • We proposed representation techniques for relational data in Excel by using Rtables and described how Excel operations on RTables can be simulated in SQL. • We discussed how this proposed system would fit into our RDBMS SQL execution engine and benefit from all its capabilities and optimizations.
What is ahead? • Excel • Pivoting and advanced filtering turned out to be essential, but a few more relational friendly extensions would go a long way, particularly in simulating joins, window function computations. • SQL • RDBMS SQL needs to be extended to cover the functionality provided in Excel, particularly financial functions.
Q & Q U E S T I O N S A N S W E R S A