Relational-Style XML Query
Taro L. Saito, Shinichi Morishita
University of Tokyo
June 10th, SIGMOD 2008, Vancouver, Canada
Presented by Sangkeun-Lee
Reference slides: www.xerial.org/pub/paper/2008/leo-sigmod2008-slides.pdf
Intelligent Database Systems Lab
School of Computer Science & Engineering
Seoul National University, Seoul, Korea
If your Manager Says… It’s a tragedy… Center for E-Business Technology
Migration to XML Database
• Benefits of using XML
  • XML is a portable text-data format
  • Tree-structured XML can reduce redundancy of relational data
[Figure: relational data (Co, Emp, Office) shown next to the equivalent XML data]
Problem
• Querying relational data translated into XML
• Q: Retrieve a node tuple (Co, Emp, Office) from the XML data
  • E.g., with XPath, a path expression query: /Co/Emp/Office
[Figure: relational data (Co, Emp, Office) shown next to the equivalent XML data]
Structure Variations
• The tree representation of relational data is not unique
[Figure: three XML trees that encode the same (Co, Emp, Office) data with different nesting of Co, Emp (e1, e2), and Office (NY)]
Inconvenience of XPath Query
• Users must know the entire XML structure to write correct path queries
[Figure: three XML tree variations, each requiring a different path query: /Co/Office/Emp, //Office[Co]/Emp, //Co/Emp[Office]]
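
As a rough illustration of why path queries are tied to one structure, the sketch below evaluates the three path expressions against three small documents. The documents, the `<db>` wrapper element, and the names `DOCS`, `PATHS`, and `count_matches` are assumptions made for this example; the paths are written in the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Three hypothetical documents holding the same (Co, Emp, Office) data.
# The <db> wrapper is an assumption so that ".//" can reach every node.
DOCS = {
    "emp_over_office": "<db><Co><Emp>e1<Office>NY</Office></Emp>"
                       "<Emp>e2<Office>NY</Office></Emp></Co></db>",
    "office_over_emp": "<db><Co><Office>NY<Emp>e1</Emp><Emp>e2</Emp>"
                       "</Office></Co></db>",
    "office_over_co": "<db><Office>NY<Co/><Emp>e1</Emp><Emp>e2</Emp>"
                      "</Office></db>",
}

# One path per structure, mirroring the three queries on the slide.
PATHS = {
    "emp_over_office": ".//Co/Emp[Office]",
    "office_over_emp": "./Co/Office/Emp",
    "office_over_co": ".//Office[Co]/Emp",
}

def count_matches(doc_name, path_name):
    """Count the Emp nodes that the named path selects in the named document."""
    root = ET.fromstring(DOCS[doc_name])
    return len(root.findall(PATHS[path_name]))

for doc_name in DOCS:
    print(doc_name, {p: count_matches(doc_name, p) for p in PATHS})
```

Each path finds both employees only in the document it was written for and nothing in the other two, which is the inconvenience this slide points at.
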
Relational-Style XML Query
• Query relations in XML
• With an SQL-like syntax
  • SELECT Co, Emp, Office FROM (XML Data)
[Figure: input XML in three structural variations and the query result (Co, Emp, Office)]
To Retrieve Relations in XML
Problem Definition
• Convert an SQL query, SELECT A, B, C, into an XML structure query
• There can be many structural variations of (A, B, C)
  • For N nodes, there exist N^(N-1) structural variations
  • For N = 3: 3^2 = 9 …
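
The N^(N-1) count can be checked by brute force for small N. The sketch below assumes that a "structural variation" means a rooted tree over the N labeled nodes; the function names are hypothetical and the enumeration is only meant for tiny inputs.

```python
from itertools import product

def reaches_root(node, parent, root):
    """Follow parent pointers from node; True if the walk ends at root."""
    seen = set()
    while node != root:
        if node in seen:  # cycle: this parent assignment is not a tree
            return False
        seen.add(node)
        node = parent[node]
    return True

def rooted_labeled_trees(labels):
    """Yield every rooted tree over labels as a dict child -> parent."""
    labels = list(labels)
    for root in labels:
        others = [x for x in labels if x != root]
        # Every non-root node picks some other node as its parent.
        for choice in product(labels, repeat=len(others)):
            parent = dict(zip(others, choice))
            if any(child == p for child, p in parent.items()):
                continue
            if all(reaches_root(n, parent, root) for n in others):
                yield {root: None, **parent}

print(len(list(rooted_labeled_trees("ABC"))))  # 9, i.e. 3^2
```
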
An Example
This example involves various tree structures that denote data in the same relation.
Amoeba
• A node tuple (A, B, C) is an amoeba iff one of A, B, and C is a common ancestor of the others
• Amoeba join retrieves all amoeba structures in the XML data
…
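
A minimal sketch of the amoeba condition, assuming the tree is given as a child-to-parent map. The brute-force `naive_amoeba_join` below only illustrates the definition and is not the join algorithm from the paper; the node names and the `db` root are made up for the example.

```python
from itertools import product

def ancestors(node, parent):
    """Return the set of proper ancestors of node."""
    out = set()
    while parent.get(node) is not None:
        node = parent[node]
        out.add(node)
    return out

def is_amoeba(nodes, parent):
    """True if one node of the tuple is an ancestor of all the others."""
    return any(
        all(a in ancestors(n, parent) for n in nodes if n != a)
        for a in nodes
    )

def naive_amoeba_join(labels, label_of, parent):
    """Try every combination of one node per label and keep the amoebas."""
    by_label = {l: [n for n, x in label_of.items() if x == l] for l in labels}
    combos = product(*(by_label[l] for l in labels))
    return [t for t in combos if is_amoeba(t, parent)]

# Hypothetical tree: db -> co1 -> e1 -> o1 and db -> co2 -> e2 -> o2
parent = {"db": None, "co1": "db", "e1": "co1", "o1": "e1",
          "co2": "db", "e2": "co2", "o2": "e2"}
label_of = {"co1": "Co", "co2": "Co", "e1": "Emp", "e2": "Emp",
            "o1": "Office", "o2": "Office"}

print(naive_amoeba_join(["Co", "Emp", "Office"], label_of, parent))
# [('co1', 'e1', 'o1'), ('co2', 'e2', 'o2')]
```

Of the eight possible (Co, Emp, Office) combinations, only the two in which the company node sits above both other nodes pass the test.
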
Relation in XML
• A key observation
  • A relation is simply embedded in XML
[Figure: three XML trees embedding the same relation, matched by /Co/Office/Emp, //Office[Co]/Emp, and //Co/Emp[Office]]
Hidden Semantics in XML
• Some amoeba structures may not form a relation
• Why is this structure not allowed?
  • Because there are functional dependencies (FDs) implied in the XML structure
[Figure: ER diagram (Company 1:M Office, Office 1:N Employee) next to an XML tree of Company, Office, and Emp nodes]
Functional Dependencies (FD)
• FD: X -> Y (for a given X, Y is uniquely determined)
  • Employee -> Office (each employee belongs to one office)
  • Office -> Company (each office belongs to one company)
• A relation in XML must have an amoeba structure corresponding to each FD
• Relations and FDs are sufficient to describe the schema of XML data
[Figure: ER diagram (Company 1:M Office, Office 1:N Employee) and an XML tree marked "Invalid structure!!"]
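
To make the definition of X -> Y concrete, here is a small sketch that tests whether an FD holds over a list of tuples. The rows and the function name `fd_holds` are illustrative assumptions, not data from the paper.

```python
def fd_holds(rows, x, y):
    """True if each value of column x maps to exactly one value of column y."""
    seen = {}
    for row in rows:
        # setdefault stores the first y seen for this x, then returns it.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

# Hypothetical (Emp, Office, Co) tuples.
rows = [
    {"Emp": "e1", "Office": "NY", "Co": "c1"},
    {"Emp": "e2", "Office": "NY", "Co": "c1"},
    {"Emp": "e3", "Office": "LA", "Co": "c1"},
]

print(fd_holds(rows, "Emp", "Office"))  # True: each employee has one office
print(fd_holds(rows, "Office", "Emp"))  # False: NY has two employees
```
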
Examples of FDs
If FDs are ignored…
• The company has M offices, and each office has N employees
• # of (company, office, employee) tuples: M × (M × N)
  • When M = 100 and N = 5: 100 × (100 × 5) = 50,000
• Whereas the # of correct answers is only M × N = 500
[Figure: ER diagram (Company 1:M Office, Office 1:N Employee) next to an XML tree of Company, Office, and Emp nodes]
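
The two counts can be reproduced with a few lines of Python. This is a sketch under the assumption of a single company node that is an ancestor of every office and employee, so that without FDs any office can be combined with any employee; the node names are made up.

```python
M, N = 100, 5  # offices per company, employees per office

offices = [f"office{o}" for o in range(M)]
# Hypothetical employee -> office assignment (the FD Emp -> Office).
office_of = {f"emp{o}_{e}": f"office{o}" for o in range(M) for e in range(N)}

# Ignoring FDs: the company is a common ancestor of everything, so every
# (office, employee) pair forms an amoeba with it.
ignoring_fds = [("company", o, e) for o in offices for e in office_of]

# Respecting Emp -> Office: keep only each employee's own office.
respecting_fds = [t for t in ignoring_fds if office_of[t[2]] == t[1]]

print(len(ignoring_fds), len(respecting_fds))  # 50000 500
```
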
Detecting FDs
• The type of FD required to determine which XML structures to query is a one-to-many (or one-to-one) relationship
  • FD: Emp -> Office
  • Each employee belongs to an office
  • An office may have several employees (one-to-many)
• We can observe these relationships by counting node occurrences or directly from the ER diagram
[Figure: ER diagram (Company 1:M Office, Office 1:N Employee) next to an XML tree of Company, Office, and Emp nodes]
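
One possible reading of "counting node occurrences" is sketched below: for each X node, count the Y nodes among its ancestors and descendants, and treat X -> Y as plausible when that count is always exactly one. This heuristic, the sample document, and the function names are assumptions for illustration; the slides do not spell out the actual procedure.

```python
import xml.etree.ElementTree as ET

def related_counts(root, x_tag, y_tag):
    """For each x_tag node, count y_tag nodes above or below it."""
    parent = {child: p for p in root.iter() for child in p}
    counts = []
    for x in root.iter(x_tag):
        n = len(list(x.iter(y_tag)))  # descendants tagged y_tag
        node = x
        while node in parent:         # walk up to the document root
            node = parent[node]
            if node.tag == y_tag:
                n += 1
        counts.append(n)
    return counts

def suggests_fd(root, x_tag, y_tag):
    """Heuristic: x_tag -> y_tag if every x node relates to exactly one y."""
    counts = related_counts(root, x_tag, y_tag)
    return bool(counts) and all(c == 1 for c in counts)

# Hypothetical data: one company, two offices, three employees.
doc = ET.fromstring(
    "<Co><Office><Emp/><Emp/></Office><Office><Emp/></Office></Co>"
)

print(suggests_fd(doc, "Emp", "Office"))  # True
print(suggests_fd(doc, "Office", "Emp"))  # False
```

A check like this only reflects the sample data at hand, so an ER diagram remains the more reliable source when one is available.
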
Avoiding Invalid Relations
• However, an amoeba structure itself is a connected component of tree nodes, and thus invalid nodes may be connected, as illustrated in Figure 9.
• To avoid these irrelevant node connections while still allowing various tree structures in describing XML data, we introduce a restricted class of XML structures called a tree relation.
Query Translation using FDs
• Relational query into XML query
  • SELECT Co, Office, Emp
  • (with FDs: Emp -> Office, Office -> Co)
• The XML structures of interest are automatically determined from a relation and its functional dependencies
FD-Aware Amoeba Join
• FDs: Emp -> Office, Office -> Company
• Bottom-up construction of query results
  • Amoeba Join (Employee, Office)
  • Amoeba Join (Office, Company)
• FD-aware amoeba join avoids invalid XML structures
[Figure: ER diagram (Company 1:M Office, Office 1:N Employee) next to an XML tree of Company, Office, and Emp nodes]
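
The bottom-up idea can be sketched as two pairwise joins, one per FD, combined on the shared Office node. This is a simplified nested-loop illustration under the assumption that the tree is a child-to-parent map; it is not the paper's join algorithm, and the names are hypothetical.

```python
def ancestors(node, parent):
    """Return the set of proper ancestors of node."""
    out = set()
    while parent.get(node) is not None:
        node = parent[node]
        out.add(node)
    return out

def amoeba_pairs(xs, ys, parent):
    """Pairs (x, y) where one node is an ancestor of the other."""
    return [(x, y) for x in xs for y in ys
            if y in ancestors(x, parent) or x in ancestors(y, parent)]

def fd_aware_join(emps, offices, companies, parent):
    """Join along Emp -> Office first, then Office -> Company."""
    emp_office = amoeba_pairs(emps, offices, parent)
    office_co = amoeba_pairs(offices, companies, parent)
    return [(c, o, e) for e, o in emp_office
            for o2, c in office_co if o == o2]

# Hypothetical tree: co -> o1 -> {e1, e2}, co -> o2 -> {e3, e4}
parent = {"co": None, "o1": "co", "o2": "co",
          "e1": "o1", "e2": "o1", "e3": "o2", "e4": "o2"}

print(fd_aware_join(["e1", "e2", "e3", "e4"], ["o1", "o2"], ["co"], parent))
# [('co', 'o1', 'e1'), ('co', 'o1', 'e2'), ('co', 'o2', 'e3'), ('co', 'o2', 'e4')]
```

Because each employee is paired only with the office above it, combinations such as (co, o2, e1) never appear, even though co is a common ancestor of both nodes.
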
Query Performance
• FD-aware amoeba join scales well
  • For various sizes of XML data
Experiments – Query Set
Think in Relational-Style
• First, consider
  • XML := Relations + their annotations
• Steps
  • Detect the relational part of the XML data
  • Detect one-to-many (or one-to-one) relationships (FDs)
  • Write relational queries
    • SELECT Co, Emp, Office
• More in the paper
  • XML Algebra
  • Data Integration
  • Pushing Structural Constraints for better performance
  • Querying Incomplete Relations
  • Other experiments
Contributions & Conclusions
• Relation in XML
  • Defined using amoeba structure and FDs
  • XML Algebra
• Relational-Style XML Query
  • Retrieves relations in XML with an SQL-like query syntax (SQL over XML)
  • Allows structural variations of XML data
• Departure from path expression queries
  • Target XML structures are automatically determined
• Applications
  • Database integration
  • Managing relational data enhanced with XML syntax
• “It’s Just SQL”
  • A large amount of XML data and many XML queries are still relational
Contributions & Conclusions
It’s not a tragedy anymore…
Really? My thought on
Thoughts on The Paper
• Good Points
  • Generally, I think it’s a very good paper
  • The paper presents an interesting motivation and clearly defines the problem
  • The approach in the paper is intuitive and well explained
  • The authors performed well-designed experiments
  • The paper proposes many directions for future work
    • Maybe one of us can work on them
• However,
  • I doubt whether it is really useful – can we use this for an important business project?
  • Although the authors claim that their algorithm scales well, some queries ran out of memory and failed
    • That is a critical problem for use in real-world business
  • Isn’t XPath good enough?
    • People who use XML documents usually know their structure