430 likes | 593 Views
A Framework for Relational Interoperability in Federated Databases. 2002 Catharine M. Wyss All Rights Reserved. Talk Overview. Introduction Motivating scenario General problem statement and discussion Technical discussion terminology relational querying (SQL/RA)
E N D
A Framework for Relational Interoperability in Federated Databases 2002 Catharine M. Wyss All Rights Reserved
Talk Overview • Introduction • Motivating scenario • General problem statement and discussion • Technical discussion • terminology • relational querying (SQL/RA) • my extended framework (FISQL/FIRA) • Further issues, future work Catharine M. Wyss
Motivating Scenario • Amart Conglomerated – quarterly review • Average sales by store and clothing department • three midwestern cities: Indianapolis, Milwaukee, Chicago • most likely receive information in tabular or relational form Catharine M. Wyss
Indianapolis Data Catharine M. Wyss
Chicago Data Catharine M. Wyss
Milwaukee Data LincAve FreePk WashSt Catharine M. Wyss
Data Reconciliation • How do we reconcile the data? • re-entry • costly and inefficient • special purpose translation programs • semi-automated • require highly specialized knowledge • must be rewritten when data changes Catharine M. Wyss
relational data tabular data relational data tabular data relational data centralized or summarized database The Data Integration Problem Catharine M. Wyss
Caveats • The data integration problem has many facets • Requires multifaceted solution • very large scale • Other facets • semantic matching (“WashSt” vs. “WashSt”) • networking issues Catharine M. Wyss
Existing Solutions • FIS (Federated Information Systems) • case-by-case solutions • some overarching theories developed • structure of relations assumed known • Query Languages • HiLog, Ross Algebra • SchemaLog, SchemaSQL Catharine M. Wyss
Specific Problem Aspects Chi::AvgSales Indy::Sales Run an Indianapolis summarization program on the Chicago data. Which Indianapolis departments made 40% less than one of their Chicago counterparts? Catharine M. Wyss
The Relational Model • Universal standard for data storage and manipulation • Arose out of seminal work by E.F. Codd in the 1970s • SQL (Structured Query Language) • Oracle, DB2, SQLServer, Access • A database is a set of relations • relation: formalization of table Catharine M. Wyss
Chi::AvgSales Indy::Sales Relation = pathname + schema + tuples + schema + tuples + schema metadata data Terminology Catharine M. Wyss
SQL Which Indianapolis departments made more than $60,000 in this quarter? SELECT Dept FROM Indy::Sales WHERE AvgSales > 60000 Catharine M. Wyss
The Relational Algebra • Operational semantics of SQL • Allows query optimization • manipulation using algebraic identities • Six parametrized operators • Limited to data querying/manipulation • Codd considered metadata querying (1979) Catharine M. Wyss
( (Indy::Sales)) Dept AvgSales > 60,000 The Relational Algebra Which Indianapolis departments made more than $60,000 in this quarter? Catharine M. Wyss
Lifecycle of an SQL Query optimization SQL query RA query SELECT ... FROM ... WHERE ... ((......) input database output relation Catharine M. Wyss
Lifecycle of a FISQL Query optimization FISQL query FIRA query SELECT ... FROM ... WHERE ... ((......) output federation input federation Catharine M. Wyss
FISQL Federated Interoperable SQL Which Indianapolis departments made 40% less than one of their Chicago counterparts? SELECT I.Dept FROM Indy::Sales AS I, Chi::AvgSales AS C, Chi::AvgSales -> A WHERE(I.Dept = A) AND (I.AvgSales < (0.6)* P.A) Catharine M. Wyss
EBNF for Core FISQL <query> ::= SELECT <col_decls> INTO <name_term> WITHIN <name_term> FROM <variable_decl> {, <variable_decl>}* [WHERE <condition> {AND <condition>}*] <col_decls> ::= <col_decl> {, <col_decl>}* <col_decl> ::= <name_term> AS <string> | * [DROP <name_term> {, <name_term>}*] <variable_decl> ::= <base_var_decl> | (<query>) <base_var_decl> | (<fed_name>) <base_var_decl> <base_var_decl> ::= <varname(db)> <varname(rel)> <varname(att)> AS <varname(tup)> <condition> ::= (<condition>) | <name_term> <cond_operator> <name_term> <cond_operator> ::= = | != | <= | < | > | >= <name_term> ::= <string> | <varname(meta)> | <varname(tup)>.<varname(meta)> | <varname(tup)>}.<dom_elt> | <varname(tup)>.<varname(tup).<dom_elt> <varname(meta)> ::= <varname(db)> | <varname(rel)> | <varname(att)> <varname(X)> ::= <dom_elt> // X is db, rel, att, or tup <fed_name> ::= <dom_elt> <dom_elt> ::= ( a - z | A - Z | 0 - 9 ) {(a - z | A - Z | 0-9 | - )}* Catharine M. Wyss
FIRA Federated Interoperable RA • gives FISQL an operational semantics • RA is isomorphic to a proper sub-algebra • query optimization techniques apply directly • FISQL lifecycle parallels SQL lifecycle • Based on the relational model • no complex objects • legacy queries easily incorporated Catharine M. Wyss
Terminology II Definition 1.4.1 A tuple is a mapping from dom to dom {} that is almost everywhere. Definition 1.4.2 A (federated) relation is a pair pathn(R),R where R is a finite set of tuples and pathn(R) is a unique pathname. In addition, asch(R) denotes the active schema of R which is the union of the non-null-mapped domain elements of the tuples in R. Catharine M. Wyss
FIRA: Base Operators • Basic terms are federations: F1, F2, ... • Meta-metadata: D = {di, ri, ai} for iN. Definition 3.1.1 (Lift Operator) Let R be a canonical relation and iN. Then the i-th lift of R, lifti(R) is {metadata(R) × R}. In addition, pathn(lifti(R)) = pathn(R). Note that asch(lifti(R)) = asch(R) {di, ri, ai}. Catharine M. Wyss
Example lift3(Milw::FreePk) Catharine M. Wyss
FIRA: Relational Sub-Algebra • Federated counterparts for 6 RA operators: • Pathnames remain unchanged • Negative projection: • special counterparts: , , Definition (Federated Projection) Let Akdom D for 1kn. Then A1,...An(F) = {A1,...,An(R) | RF}. . . . Catharine M. Wyss
FIRA: Column Dereference Definition 3.1.6 Let R = pathn(R),R be a relation. Let asch(R) = {A1, ..., An} and Adom D, Xdom , X asch(R). Then AX(R) = {pathn(R), R }, where tR is obtained from tR as follows. X: aj, A1:a1, ..., An: an in case t[A] = Ajasch(R) and t[Aj] = aj t = X:, A1:a1, ..., An: an otherwise Catharine M. Wyss
Example Chi::AvgSales Use a column of data as attributes Catharine M. Wyss
FIRA: Generalized Union Definition 3.1.11 Let F be a federation and iN. Let d,r dom. Thend::ri(F) = { d::r, lifti(R) }. . RF Catharine M. Wyss
Milw::LincAve Milw::FreePk Milw::WashSt Catharine M. Wyss
FIRA: Partition Operator Definition 3.1.12 (Relational Partition) Let R be a relation having asch(R) dom = {A1, ..., An}. Let X,Y dom D. Let X,Y(R) = {X:x1, Y:y1, ..., X:xk, Y:yk} where xi xj and yiyj for 1 i,j k. Then PX::Y(R) = { x1::y1, A1,...,An(X=`x1'Y=`y1'(R)) , ..., xk::yk, A1,...,An(X=`xk'Y=`yk'(R)) | xi, yi } Catharine M. Wyss
Example: Distribute Quotas Milw::FreePk Indy::PineSt Chi::CtrSq Catharine M. Wyss
What’s Missing? Catharine M. Wyss
The Indianapolis to Chicago translation involves transforming data to attributes What’s Missing? Chi::AvgSales Indy::Sales Catharine M. Wyss
Data to Column Names Indy::Sales Catharine M. Wyss
Data to Column Names Indy::Sales Catharine M. Wyss
FIRA: Transpose Operator Definition 5.2.1 (Relational Transpose) Let R be a relation and A,BdomD. Then the transpose of A on B in R, denoted AB(R) is given as follows. Let asch(R) {A,B,A1,...,An}. 1. The active schema of AB(R) is asch(R) {v: v B(R), v }, and 2. Whenever t = A:a, B:b, A1:a1, ..., An:an R, b, then A:a, B:b, A1:a1, ..., An:an, b:aAB(R). Catharine M. Wyss
Example AvgSalesDept(Indy::Sales) Catharine M. Wyss
Data to Column Names • Optimal Tuple Merge: Can we “merge” tuples with null values to obtain a result that is minimal? • NP-Complete problem • FISQL counterpart to transpose • ON in SELECT clause Catharine M. Wyss
Main Theorem Theorem 5.3.1 Let F be a relational federation. 1. For each FISQL+ON query, Q, there is an equivalent FIRA+ expression Q' such that Q(F) = Q' (F). 2. For each FIRA+ expression, Q', there is an equivalent FISQL+ON query Q such that Q' (F) = Q(F). Catharine M. Wyss
Properties of FISQL/FIRA • General framework • don’t have to know exact structure in advance • Optimization • RA optimization techniques carry over • real time interoperation • FIRA complexity low • Ease of use • one FISQL query to assimilate each university’s data Catharine M. Wyss
Properties of FISQL/FIRA (See table 7.1) Catharine M. Wyss
Future Work • Implementation of FISQL/FIRA (chapter 6) • proceduralize FIRA operations • indirect index • Application domains • schema evolution • support web search engine for relational data • Extend pathnames • Multilevel pathnames • URLs Catharine M. Wyss
Projects • Theory • Algebraic identities • Equivalent relational calculus • (generally) properties of the languages • Implementation • Aggregation (indirect, block, …) • FISQL query optimization • Prototype systems Catharine M. Wyss