A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS

This paper introduces a Tree Algebra framework for processing and manipulating XML data as trees, detailing algebraic operators, relationships, and algorithms. Examples in a domain-specific XML query language and applications are discussed.

Presentation Transcript


  1. A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS Ali El bekai, Nick Rossiter School of Informatics, Northumbria University Email: ali.elbekai@unn.ac.uk, nick.rossiter@unn.ac.uk

  2. Overview • An algebraic framework for processing XML data • Review related work • Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees • Describe the input and output of the algebraic operators • Define the syntax of relationships/operators and their semantics in terms of algorithms • Give examples in a domain-specific XML query language • Discuss closure and application

  3. Related Work • IBM (Beech & Rys, 1999) • Lore (McHugh et al 1997) • YATL (Christophides et al 2000) • Niagara (Galanis et al 2001) • AT&T (W3C) • TAX (Jagadish et al 2001) • Problems identified in complexity and generality

  4. Tree Algebra • True tree • Each node one parent but many children • Root node • Leaves of tree • Correspond to different sources – object relational • Two types of operators • Algebraic operators • Relational operators

  5. Concepts in Tree Model • Root (ultimate ancestor or parent) • Node (parent or child) • Edge (link from a parent to a child) • Leaf (atomic values, nodes with no children) • Path (sequence of edges between nodes) • Descendants (all successor nodes for a node) • Ancestors (all parent nodes for a node)
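
The tree concepts above can be made concrete with a small node type. The following is a minimal Python sketch, not the authors' Java implementation. The class name `Node`, its fields and methods, and the sample labels and values are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """A tree node: a label, an optional atomic value (leaves), and ordered children."""
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)
    # Back-link to the single parent; excluded from repr/eq to avoid cycles.
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        # Attaching a child creates the edge parent -> child.
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def descendants(self) -> Iterator[Node]:
        # All successor nodes, depth-first in document order.
        for child in self.children:
            yield child
            yield from child.descendants()

    def ancestors(self) -> Iterator[Node]:
        # Parent, grandparent, ... up to the root.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def path(self) -> list[str]:
        # Labels on the path from the root down to this node.
        return [a.label for a in reversed(list(self.ancestors()))] + [self.label]


# Illustrative data only.
root = Node("collection")
obj = root.add(Node("object1"))
title = obj.add(Node("title", "Vase"))
print(title.path())  # ['collection', 'object1', 'title']
```

Storing a single `parent` reference enforces the "one parent, many children" rule of a true tree. The root is simply the node whose `parent` is `None`.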

  6. Mappings • XML Document → Tree • Element → Node (root, parent, child) • Leaf → child node, atomic values • Attribute → function, values
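
These mappings can be sketched in Python with the standard `xml.etree.ElementTree` parser. Three points are assumptions rather than the authors' design:

- Attributes are stored in a dictionary and read through a lookup function. This is one possible reading of "Attribute → function, values".
- The function names and the sample document are hypothetical.
- Mixed content (text that follows a child element, the `tail`) is ignored.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None                      # atomic value of a leaf
    attrs: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def from_element(elem: ET.Element) -> Node:
    # Element -> node; non-blank text -> atomic value; attributes -> name/value map.
    text = (elem.text or "").strip() or None
    node = Node(elem.tag, text, dict(elem.attrib))
    node.children = [from_element(child) for child in elem]
    return node


def attribute(node: Node, name: str) -> Optional[str]:
    # Attribute viewed as a function of the node: name -> value (None if absent).
    return node.attrs.get(name)


def parse(xml_text: str) -> Node:
    # The whole XML document maps to one tree rooted at the document element.
    return from_element(ET.fromstring(xml_text))


doc = parse('<collection>'
            '<object1 id="o1"><title>Vase</title></object1>'
            '<object3 id="o3"><title>Coin</title></object3>'
            '</collection>')
```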

  7. Example XML Tree • Root: collection element • Sub-elements: object1, object3

  8. Algebraic Relationships • Comparison of two trees • Universal (unary) • Defines tree containing all information • Similarity (binary) • Two trees have the same structure • Equivalence (binary) • Two trees are indistinguishable • Subsumption (binary) • One tree is subsumed in another
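
The two binary relationships that compare whole trees can be sketched in Python under the following assumptions:

- "Same structure" means identical labels and nesting, with leaf values ignored.
- "Indistinguishable" means that labels, nesting and values all match.
- Children are compared in document order.

The universal relationship is not sketched because the slide does not define it precisely. Function names and sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def shape(n: Node) -> tuple:
    # Structure only: labels and nesting, atomic values ignored.
    return (n.label, tuple(shape(c) for c in n.children))


def content(n: Node) -> tuple:
    # Structure plus atomic values.
    return (n.label, n.value, tuple(content(c) for c in n.children))


def similar(a: Node, b: Node) -> bool:
    return shape(a) == shape(b)


def equivalent(a: Node, b: Node) -> bool:
    return content(a) == content(b)


t1 = Node("collection", None, [Node("object1", None, [Node("title", "Vase")])])
t2 = Node("collection", None, [Node("object1", None, [Node("title", "Coin")])])
```

Under these definitions, equivalence implies similarity but not the reverse. For example, `t1` and `t2` are similar but not equivalent.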

  9. Example Equivalence Relationship • XML tree Collection3 is equivalent to Collection4: same node structure, no mismatch in content

  10. Example Subsumption Relationship • Collection3 is part of Collection4 (structure and content)
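
The following is a Python sketch of one way to test subsumption. It assumes that "part of" means every node of the smaller tree, with its label and value, can be matched inside the larger tree. The match may start at the larger tree's root or at any node below it. Extra nodes in the larger tree are allowed. The function names and data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def embeds(a: Node, b: Node) -> bool:
    # a matches at b: same label and value, and every child of a
    # embeds in some child of b (b may have extra children).
    return (a.label == b.label and a.value == b.value
            and all(any(embeds(ca, cb) for cb in b.children) for ca in a.children))


def subsumed(a: Node, b: Node) -> bool:
    # a is part of b if it embeds at b's root or at any node below it.
    return embeds(a, b) or any(subsumed(a, c) for c in b.children)


small = Node("collection", None, [Node("object1", None, [Node("title", "Vase")])])
large = Node("collection", None, [
    Node("object1", None, [Node("title", "Vase"), Node("date", "1900")]),
    Node("object3"),
])
```

In this sketch, two children of the smaller tree may match the same child of the larger one. Requiring a one-to-one matching would give a stricter variant.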

  11. Algebraic Operators for Trees • Join (binary, input two trees, output one tree, commutative, associative) • Joined on a predicate • Union (binary, input two trees, output one tree, commutative, associative, disjoint) • Summing trees together • Complement (binary, input two trees, output one tree, not commutative, not associative) • Nodes in one tree not found in another
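
The slides do not spell out how union merges trees or what shape a join result takes. The Python sketch below is therefore a hypothetical reading of both operators:

- `union` requires matching roots and merges children that share a label and value, so each node appears once in the result.
- `join` pairs child subtrees of the two inputs that satisfy a predicate, placing the pairs under a new root so the output is still one tree.
- The helper `canon` compares results while ignoring child order. It shows that this union is commutative up to ordering.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def union(a: Node, b: Node) -> Node:
    # Assumed semantics: roots must match; children with the same label and value
    # are merged recursively, so each node appears once in the output tree.
    if (a.label, a.value) != (b.label, b.value):
        raise ValueError("this sketch only unions trees with matching roots")
    merged = [copy.deepcopy(c) for c in a.children]
    for cb in b.children:
        for i, m in enumerate(merged):
            if (m.label, m.value) == (cb.label, cb.value):
                merged[i] = union(m, cb)
                break
        else:
            merged.append(copy.deepcopy(cb))
    return Node(a.label, a.value, merged)


def join(a: Node, b: Node, pred: Callable[[Node, Node], bool]) -> Node:
    # Hypothetical reading of "joined on a predicate": pair each child subtree of a
    # with each child subtree of b that satisfies pred, under a new root.
    pairs = [Node("pair", None, [copy.deepcopy(x), copy.deepcopy(y)])
             for x in a.children for y in b.children if pred(x, y)]
    return Node("join", None, pairs)


def canon(n: Node) -> tuple:
    # Order-insensitive form (None and "" values treated alike), for comparisons.
    return (n.label, n.value or "", tuple(sorted(canon(c) for c in n.children)))


a = Node("collection", None, [Node("object1", None, [Node("title", "Vase")])])
b = Node("collection", None, [Node("object1", None, [Node("date", "1900")]),
                              Node("object3")])
merged = union(a, b)
```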

  12. Algorithm for Complement Operator
      // Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree)
      // Output: DOCnm Tree = (DOCn Tree - DOCm Tree)
      1 Start from the root node of DOCn Tree
      2 If the root nodes of DOCn Tree and DOCm Tree have parent/child nodes
        2.1 Perform a depth-first traversal
        2.2 If DOCn Tree has a parent node not existing in DOCm Tree
          2.2.1 Add that parent node of DOCn Tree to the new DOCnm Tree
          2.2.2 While the parent node of DOCn Tree has a child node not existing in DOCm Tree
            2.2.2.1 Add the child node of DOCn Tree to DOCnm Tree
            2.2.2.2 If the child node of DOCn Tree has a leaf node not existing in DOCm Tree
              2.2.2.2.1 Add the leaf node of DOCn Tree to DOCnm Tree
            2.2.2.3 Set null to DOCnm Tree
          2.2.3 Repeat
        2.3 Set null to DOCnm Tree
      3 Add the root node to DOCnm Tree and terminate
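
The complement algorithm above can be approximated by the following Python sketch. It assumes that a node "exists in DOCm Tree" when DOCm contains the same root-to-node path of labels and leaf values. Ancestors of a kept node are retained so that the result is still one tree rooted at DOCn's root, consistent with the closure property. The "set null" steps are not modeled, and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def paths(tree: Node, prefix: tuple = ()) -> Iterator[tuple]:
    # Every root-to-node path as a tuple of (label, value) pairs.
    path = prefix + ((tree.label, tree.value),)
    yield path
    for child in tree.children:
        yield from paths(child, path)


def complement(n: Node, m: Node) -> Node:
    """DOCn - DOCm: nodes of n whose root-to-node path does not occur in m."""
    in_m = set(paths(m))

    def walk(node: Node, prefix: tuple) -> Optional[Node]:
        path = prefix + ((node.label, node.value),)
        # Depth-first: process the children before deciding on this node.
        kept = [k for k in (walk(c, path) for c in node.children) if k is not None]
        # Keep a node if it is missing from m, or if it is the ancestor of a
        # kept node (so the output stays a single tree).
        if path not in in_m or kept:
            return Node(node.label, node.value, kept)
        return None

    result = walk(n, ())
    # The root of n is always kept, so the complement is a (possibly empty) tree.
    return result if result is not None else Node(n.label, n.value)


n = Node("collection", None, [
    Node("object1", None, [Node("title", "Vase")]),
    Node("object3", None, [Node("title", "Coin")]),
])
m = Node("collection", None, [Node("object1", None, [Node("title", "Vase")])])
diff = complement(n, m)   # collection -> object3 -> title "Coin"
```

Swapping the arguments gives a different result: `complement(m, n)` is an empty collection. This matches the slide's statement that the operator is not commutative.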

  13. Projection Algebra Operator (unary, input one tree, output one tree): Example • Eliminates nodes other than those specified • Projection of object3
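
Projection can be sketched in Python as follows. The root is always kept. Every node whose label is listed is kept together with its subtree, and so are the intermediate nodes that connect it to the root. Keeping those connecting nodes is an assumption made so that the output remains a tree. The function name and sample data are hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def project(tree: Node, wanted: set[str]) -> Node:
    # Keep the root, every node whose label is in `wanted` (with its subtree),
    # and the ancestors needed to connect those nodes to the root.
    def walk(node: Node) -> Optional[Node]:
        if node.label in wanted:
            return copy.deepcopy(node)
        kept = [k for k in map(walk, node.children) if k is not None]
        return Node(node.label, node.value, kept) if kept else None

    kept = [k for k in map(walk, tree.children) if k is not None]
    return Node(tree.label, tree.value, kept)


collection = Node("collection", None, [
    Node("object1", None, [Node("title", "Vase")]),
    Node("object3", None, [Node("title", "Coin")]),
])
only_object3 = project(collection, {"object3"})
```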

  14. Algebra Operators (continued) • Select (unary, input one tree, output one tree) • Filters nodes according to a predicate • Expose (unary, input one tree, output one tree) • Retrieves specific elements/nodes given by parent/child boundaries • Vertex (unary, input one tree, output one tree) • Creates the vertex encompassing all nodes retrieved by the expose operator
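
Select is sketched below in Python. The sketch assumes the predicate is applied to the root's child subtrees, for example the objects in a collection. The root is kept, so the output is still a tree. The helper `has_child_value` and the sample data are hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def select(tree: Node, pred: Callable[[Node], bool]) -> Node:
    # Keep the root and only those child subtrees (e.g. objects) satisfying pred.
    return Node(tree.label, tree.value,
                [copy.deepcopy(c) for c in tree.children if pred(c)])


def has_child_value(label: str, value: str) -> Callable[[Node], bool]:
    # Hypothetical predicate builder: "has a child <label> with this atomic value".
    return lambda n: any(c.label == label and c.value == value for c in n.children)


collection = Node("collection", None, [
    Node("object1", None, [Node("title", "Vase")]),
    Node("object3", None, [Node("title", "Coin")]),
])
coins = select(collection, has_child_value("title", "Coin"))
```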

  15. Algorithm for Expose Operator
      // Input: one DOC tree or one XML document
      // Output: one DOC tree or one XML document
      1 Start with the entry point, which is the root node
      2 Perform a depth-first traversal
        2.1 If the parameter equals the specific node to be exposed
          2.1.1 Return the specific node
          2.1.2 Set the specific node in the new tree
        2.2 If the exposed element does not exist, terminate
      3 End/terminate
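
The expose algorithm and the vertex operator can be sketched together in Python. A depth-first search from the root returns copies of the nodes that match the parameter, which is assumed here to be an element label. Vertex then wraps those nodes under one new root, so the combined output is again a tree. The root label `vertex` and all function names are hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def expose(tree: Node, label: str) -> list[Node]:
    # Depth-first search from the root (the entry point); each node whose label
    # equals the parameter is copied, with its subtree, into the result.
    found: list[Node] = []

    def walk(node: Node) -> None:
        if node.label == label:
            found.append(copy.deepcopy(node))
            return  # the matched node is returned whole; no search inside it
        for child in node.children:
            walk(child)

    walk(tree)
    return found  # empty list: the exposed element does not exist


def vertex(nodes: Iterable[Node], label: str = "vertex") -> Node:
    # Wrap the exposed nodes under one new root so the output is a single tree.
    return Node(label, None, list(nodes))


collection = Node("collection", None, [
    Node("object1", None, [Node("title", "Vase")]),
    Node("object3", None, [Node("title", "Coin")]),
])
titles = vertex(expose(collection, "title"))
```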

  16. Results • Developed • Domain-specific algebra • Tree algebra • Algebraic relationships • Universal, similarity, equivalence, subsumption • Algebraic operators • Join, union, complement, project, select, expose, vertex • Closure – output is always a tree

  17. Verification • All operators: • Presented as algorithms • Implemented in Java • Case study: • Virtual museum application • Implemented code used to meet the museum's requirements

  18. Further Work • Investigate • Extent to which limitations in operators affect usability • Does the domain need extending? • Further experimentation • Examine feedback from museum study • Look at further areas