1 / 43

A Framework for Relational Interoperability in Federated Databases

A Framework for Relational Interoperability in Federated Databases.  2002 Catharine M. Wyss All Rights Reserved. Talk Overview. Introduction Motivating scenario General problem statement and discussion Technical discussion terminology relational querying (SQL/RA)

star
Download Presentation

A Framework for Relational Interoperability in Federated Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Framework for Relational Interoperability in Federated Databases  2002 Catharine M. Wyss All Rights Reserved

  2. Talk Overview • Introduction • Motivating scenario • General problem statement and discussion • Technical discussion • terminology • relational querying (SQL/RA) • my extended framework (FISQL/FIRA) • Further issues, future work Catharine M. Wyss

  3. Motivating Scenario • Amart Conglomerated – quarterly review • Average sales by store and clothing department • three midwestern cities: Indianapolis, Milwaukee, Chicago • most likely receive information in tabular or relational form Catharine M. Wyss

  4. Indianapolis Data Catharine M. Wyss

  5. Chicago Data Catharine M. Wyss

  6. Milwaukee Data LincAve FreePk WashSt Catharine M. Wyss

  7. Data Reconciliation • How do we reconcile the data? • re-entry • costly and inefficient • special purpose translation programs • semi-automated • require highly specialized knowledge • must be rewritten when data changes Catharine M. Wyss

  8. relational data tabular data relational data tabular data relational data centralized or summarized database The Data Integration Problem Catharine M. Wyss

  9. Caveats • The data integration problem has many facets • Requires multifaceted solution • very large scale • Other facets • semantic matching (“WashSt” vs. “WashSt”) • networking issues Catharine M. Wyss

  10. Existing Solutions • FIS (Federated Information Systems) • case-by-case solutions • some overarching theories developed • structure of relations assumed known • Query Languages • HiLog, Ross Algebra • SchemaLog, SchemaSQL Catharine M. Wyss

  11. Specific Problem Aspects Chi::AvgSales Indy::Sales Run an Indianapolis summarization program on the Chicago data. Which Indianapolis departments made 40% less than one of their Chicago counterparts? Catharine M. Wyss

  12. The Relational Model • Universal standard for data storage and manipulation • Arose out of seminal work by E.F. Codd in the 1970s • SQL (Structured Query Language) • Oracle, DB2, SQLServer, Access • A database is a set of relations • relation: formalization of table Catharine M. Wyss

  13. Chi::AvgSales Indy::Sales Relation = pathname + schema + tuples + schema + tuples + schema metadata data Terminology Catharine M. Wyss

  14. SQL Which Indianapolis departments made more than $60,000 in this quarter? SELECT Dept FROM Indy::Sales WHERE AvgSales > 60000 Catharine M. Wyss

  15. The Relational Algebra • Operational semantics of SQL • Allows query optimization • manipulation using algebraic identities • Six parametrized operators • Limited to data querying/manipulation • Codd considered metadata querying (1979) Catharine M. Wyss

  16.  ( (Indy::Sales)) Dept AvgSales > 60,000 The Relational Algebra Which Indianapolis departments made more than $60,000 in this quarter? Catharine M. Wyss

  17. Lifecycle of an SQL Query optimization SQL query RA query SELECT ... FROM ... WHERE ... ((......) input database output relation Catharine M. Wyss

  18. Lifecycle of a FISQL Query optimization FISQL query FIRA query SELECT ... FROM ... WHERE ... ((......) output federation input federation Catharine M. Wyss

  19. FISQL Federated Interoperable SQL Which Indianapolis departments made 40% less than one of their Chicago counterparts? SELECT I.Dept FROM Indy::Sales AS I, Chi::AvgSales AS C, Chi::AvgSales -> A WHERE(I.Dept = A) AND (I.AvgSales < (0.6)* P.A) Catharine M. Wyss

  20. EBNF for Core FISQL <query> ::= SELECT <col_decls> INTO <name_term> WITHIN <name_term> FROM <variable_decl> {, <variable_decl>}* [WHERE <condition> {AND <condition>}*] <col_decls> ::= <col_decl> {, <col_decl>}* <col_decl> ::= <name_term> AS <string> | * [DROP <name_term> {, <name_term>}*] <variable_decl> ::= <base_var_decl> | (<query>) <base_var_decl> | (<fed_name>) <base_var_decl> <base_var_decl> ::=  <varname(db)>  <varname(rel)>  <varname(att)> AS <varname(tup)> <condition> ::= (<condition>) | <name_term> <cond_operator> <name_term> <cond_operator> ::= = | != | <= | < | > | >= <name_term> ::= <string> | <varname(meta)> | <varname(tup)>.<varname(meta)> | <varname(tup)>}.<dom_elt> | <varname(tup)>.<varname(tup).<dom_elt> <varname(meta)> ::= <varname(db)> | <varname(rel)> | <varname(att)> <varname(X)> ::= <dom_elt> // X is db, rel, att, or tup <fed_name> ::= <dom_elt> <dom_elt> ::= ( a - z | A - Z | 0 - 9 ) {(a - z | A - Z | 0-9 | - )}* Catharine M. Wyss

  21. FIRA Federated Interoperable RA • gives FISQL an operational semantics • RA is isomorphic to a proper sub-algebra • query optimization techniques apply directly • FISQL lifecycle parallels SQL lifecycle • Based on the relational model • no complex objects • legacy queries easily incorporated Catharine M. Wyss

  22. Terminology II Definition 1.4.1 A tuple is a mapping from dom to dom {} that is  almost everywhere. Definition 1.4.2 A (federated) relation is a pair pathn(R),R where R is a finite set of tuples and pathn(R) is a unique pathname. In addition, asch(R) denotes the active schema of R which is the union of the non-null-mapped domain elements of the tuples in R. Catharine M. Wyss

  23. FIRA: Base Operators • Basic terms are federations: F1, F2, ... • Meta-metadata: D = {di, ri, ai} for iN. Definition 3.1.1 (Lift Operator) Let R be a canonical relation and iN. Then the i-th lift of R, lifti(R) is {metadata(R) × R}. In addition, pathn(lifti(R)) = pathn(R). Note that asch(lifti(R)) = asch(R)  {di, ri, ai}. Catharine M. Wyss

  24. Example lift3(Milw::FreePk) Catharine M. Wyss

  25. FIRA: Relational Sub-Algebra • Federated counterparts for 6 RA operators: • Pathnames remain unchanged • Negative projection: • special counterparts: , ,  Definition (Federated Projection) Let Akdom D for 1kn. Then A1,...An(F) = {A1,...,An(R) | RF}.  . . . Catharine M. Wyss

  26. FIRA: Column Dereference Definition 3.1.6 Let R = pathn(R),R be a relation. Let asch(R) = {A1, ..., An} and Adom D, Xdom , X asch(R). Then AX(R) = {pathn(R), R }, where tR is obtained from tR as follows. X: aj, A1:a1, ..., An: an in case t[A] = Ajasch(R) and t[Aj] = aj t = X:, A1:a1, ..., An: an otherwise Catharine M. Wyss

  27. Example Chi::AvgSales Use a column of data as attributes Catharine M. Wyss

  28. FIRA: Generalized Union Definition 3.1.11 Let F be a federation and iN. Let d,r dom. Thend::ri(F) = { d::r, lifti(R)  }. . RF Catharine M. Wyss

  29. Milw::LincAve Milw::FreePk Milw::WashSt Catharine M. Wyss

  30. FIRA: Partition Operator Definition 3.1.12 (Relational Partition) Let R be a relation having asch(R)  dom = {A1, ..., An}. Let X,Y dom D. Let X,Y(R) = {X:x1, Y:y1, ..., X:xk, Y:yk} where xi xj and yiyj for 1  i,j k. Then PX::Y(R) = { x1::y1, A1,...,An(X=`x1'Y=`y1'(R)) , ..., xk::yk, A1,...,An(X=`xk'Y=`yk'(R))  | xi, yi } Catharine M. Wyss

  31. Example: Distribute Quotas Milw::FreePk Indy::PineSt Chi::CtrSq Catharine M. Wyss

  32. What’s Missing? Catharine M. Wyss

  33. The Indianapolis to Chicago translation involves transforming data to attributes What’s Missing? Chi::AvgSales Indy::Sales Catharine M. Wyss

  34. Data to Column Names Indy::Sales Catharine M. Wyss

  35. Data to Column Names Indy::Sales Catharine M. Wyss

  36. FIRA: Transpose Operator Definition 5.2.1 (Relational Transpose) Let R be a relation and A,BdomD. Then the transpose of A on B in R, denoted AB(R) is given as follows. Let asch(R)  {A,B,A1,...,An}. 1. The active schema of AB(R) is asch(R) {v: v  B(R), v }, and 2. Whenever t = A:a, B:b, A1:a1, ..., An:an R, b, then A:a, B:b, A1:a1, ..., An:an, b:aAB(R). Catharine M. Wyss

  37. Example AvgSalesDept(Indy::Sales) Catharine M. Wyss

  38. Data to Column Names • Optimal Tuple Merge: Can we “merge” tuples with null values to obtain a result that is minimal? • NP-Complete problem • FISQL counterpart to transpose • ON in SELECT clause Catharine M. Wyss

  39. Main Theorem Theorem 5.3.1 Let F be a relational federation. 1. For each FISQL+ON query, Q, there is an equivalent FIRA+ expression Q' such that Q(F) = Q' (F). 2. For each FIRA+ expression, Q', there is an equivalent FISQL+ON query Q such that Q' (F) = Q(F). Catharine M. Wyss

  40. Properties of FISQL/FIRA • General framework • don’t have to know exact structure in advance • Optimization • RA optimization techniques carry over • real time interoperation • FIRA complexity low • Ease of use • one FISQL query to assimilate each university’s data Catharine M. Wyss

  41. Properties of FISQL/FIRA (See table 7.1) Catharine M. Wyss

  42. Future Work • Implementation of FISQL/FIRA (chapter 6) • proceduralize FIRA operations • indirect index • Application domains • schema evolution • support web search engine for relational data • Extend pathnames • Multilevel pathnames • URLs Catharine M. Wyss

  43. Projects • Theory • Algebraic identities • Equivalent relational calculus • (generally) properties of the languages • Implementation • Aggregation (indirect, block, …) • FISQL query optimization • Prototype systems Catharine M. Wyss

More Related