190 likes | 305 Views
Storing and Querying Ordered XML Using a Relational Database System. By Khang Nguyen Based on the paper of Igor Tatarinov and Statis Viglas. Introduction.
E N D
Storing and Querying Ordered XML Using a Relational Database System By Khang Nguyen Based on the paper of Igor Tatarinov and Statis Viglas
Introduction • Researchers proposed using relational database system to decompose XML documents into relations and translate XML queries into SQL queries over these relation. • The paper will answer the question of how (and whether) the ordered XML data model can be effectively supported by the unordered relational data model.
Introduction (Cont.) • The paper proposes three order encoding methods (Global Order, Local Order, and Dewey Order) to represent XML order in the relational data model. • The paper will also answer the question of when and why to use which encoding method.
1. Ordered XML: Data Model, Query Language and Query Dimensions • 1.1 The XML Data Model • An XML can be viewed as a tree. • Leaf Nodes = Data Values (text). • Internal Nodes = XML elements. • Document Order = the orders of the elements in the XML document.
1. Ordered XML: Data Model, Query Language and Query Dimensions • 1.2 Order in XML Query Language • 1.2.1 XPath • is a language for specifying navigation within an XML document. • An XPath expression’s syntax: • Path ::= /Step1/Step2/…/Stepn • Path ::= /films/film/writers/writer/Murray Burnett • An XPath expression is evaluated sequentially, “step” by “step”.
1. Ordered XML: Data Model, Query Language and Query Dimensions (Cont.) • 1.2.2 XQuery • Is a more complex language based on XPath. • Has all the functionalities of XPath. • Includes Before and After operators that take two node sequences (XPath expressions) and • Return the nodes from the first sequence that are before or after some node in the second sequence.
1. Ordered XML: Data Model, Query Language and Query Dimensions (Cont.) • 1.3 Evaluation Modes for XML Queries • 1.3.1 Select Mode: • The nodes in an input XML document are assumed to have unique identifiers (IDs). • The results of executing an XPath expression is an ordered set of node IDs. • 1.3.2 Reconstruct Mode: • Combines selection and extraction. • The result of evaluating an XPath expression in reconstruct mode is an ordered set of XML elements.
2. XML Order Encoding Methods • To capture the document order in the relational data model is accomplished by encoding each node’s position in an XML document as a data value. • Unfortunately, there is no single encoding method, which is optimal for both queries and updates.
2. XML Order Encoding Methods (Cont.) • 2.1 Global Order Encoding: • Each node is assigned a number that represents the node’s absolute position in the document. • Poor insertion performance is the primary weakness. • To improve updates’ performance, use sparse numbering which does not require the remaining nodes be numbered when deleting XML fragments.
1 play 2 4 title act act 3 5 7 #text title scene 6 #text Global Order Global Order Encoding
2. XML Order Encoding Methods (Cont.) • 2.3 Local (Sibling) Order Encoding: • Each node is assigned a number that represents its relative position among its siblings. • Combining a node’s position with that of its ancestors yields a path vector that uniquely identifies the absolute position of the node within the document. • The advantage is the low overhead incurred by updates.
1 play 3 1 2 title act act 1 1 2 #text title scene 1 #text Local Order Local Order Encoding
2. XML Order Encoding Methods (Cont.) • 2.3 Dewey Order Encoding: • Each node is assigned a vector that represents the path from the document’s root to the node. • Is “lossless” because each path uniquely identifies the absolute position of the node within the document. • One potential disadvantage is the extra space required to store paths from the root to each node.
1 play 1.3 1.1 1.2 title act act 1.1.1 1.2.1 1.2.2 #text title scene 1.2.1.1 #text Dewey Order Dewey Order Encoding
3. Shredding Ordered XML into Relations • 3.1 The Schema-less Case: • The schema of input document is unknown. • The Edge shredding approach is proposed. • A single relation, the Edge table, is used to store an entire document. • The Edge table is defined • Edge(id, parent, name, value) • To reduce storage, a separate relation (the Path table) can be used to store paths and their identifiers.
3. Shredding Ordered XML into Relations (Cont.) • 3.1.1 Storing Order Information • Global Order • Edge(id, parent_id, end_des_id, path_id, value) • Local Order • Edge(id, parent_id, sIndex, path_id, value) • Dewey Order • Edge(dewey, path_id, value)
3. Shredding Ordered XML into Relations (Cont.) • 3.2 The Schema-aware Case: • Inlining shredding technique can be used when an XML Schema (DTD) is known. • One advantage is the possibility of more efficient navigation from an element to its sub-elements. • Another advantage is Inlining shreds and stores XML documents into a set of tables. Queries tend to access less data and perform better.
Summary • Relational database systems can support most ordered XML queries efficiently. • Global Order is best for query-mostly workloads. Dewey Order is best for a mix of queries and updates. Local Order is best for update-intensive environments. • Edge and Inlining methods are well known for translating ordered XML queries into SQL queries.