This paper introduces a Tree Algebra (TA) framework for processing and manipulating XML data as trees, detailing its algebraic operators, relationships, and algorithms. Examples are given in a domain-specific XML query language, and closure and application are discussed.
A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS Ali El bekai, Nick Rossiter School of Informatics, Northumbria University Email: ali.elbekai@unn.ac.uk , nick.rossiter@unn.ac.uk
Overview • Framework in algebra for processing XML data • Review related work • Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees • Describe input and output of the algebraic operators • Define the syntax of relationships/operators and their semantics in terms of algorithms • Examples are given in the domain-specific XML query language • Discuss closure and application
Related Work • IBM (Beech & Rys, 1999) • Lore (McHugh et al., 1997) • YATL (Christophides et al., 2000) • Niagara (Galanis et al., 2001) • AT&T (W3C) • TAX (Jagadish et al., 2001) • Problems identified in complexity and generality
Tree Algebra • True tree • Each node one parent but many children • Root node • Leaves of tree • Correspond to different sources – object relational • Two types of operators • Algebraic operators • Relational operators
Concepts in Tree Model • Root (ultimate ancestor or parent) • Node (parent or child) • Edge (link from a parent to a child) • Leaf (atomic values, nodes with no children) • Path (sequence of edges between nodes) • Descendants (all successor nodes for a node) • Ancestors (all parent nodes for a node)
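The concepts above can be captured in a small node class. This is a minimal sketch in Java (the language the slides name for the implementation); `TreeModelSketch`, `Node` and all method names are illustrative assumptions, not the authors' code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: the slides do not give a class design.
class TreeModelSketch {
    static class Node {
        final String label;
        Node parent;                                   // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        // Adds an edge from this node (parent) to the given child.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        // Leaf: a node with no children.
        boolean isLeaf() { return children.isEmpty(); }

        // Ancestors: follow parent links up to the root, nearest first.
        List<Node> ancestors() {
            List<Node> out = new ArrayList<>();
            for (Node n = parent; n != null; n = n.parent) out.add(n);
            return out;
        }

        // Descendants: all successor nodes, collected depth-first.
        List<Node> descendants() {
            List<Node> out = new ArrayList<>();
            for (Node c : children) {
                out.add(c);
                out.addAll(c.descendants());
            }
            return out;
        }

        // Path: the labels met on the way from the root down to this node.
        List<String> pathFromRoot() {
            List<String> out = new ArrayList<>();
            for (Node n = this; n != null; n = n.parent) out.add(0, n.label);
            return out;
        }
    }
}
```

Storing the parent link on each node keeps the "one parent, many children" property explicit and makes the ancestor walk a simple loop.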
Mappings • XML Document Tree • Element Node (root, parent, child) • Leaf child node, atomic values • Attribute function, values
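One way to see this mapping is to parse a document with the standard Java DOM API and print it as an indented tree. The sketch below is an assumption about how the mapping could be exercised, not the authors' implementation; `XmlMappingSketch`, `parseRoot` and `outline` are hypothetical names, and the sample element names in the usage note follow the example tree in the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: XML document -> tree, element -> node,
// element without child elements -> leaf with an atomic value, attribute -> name/value pair.
class XmlMappingSketch {
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();          // the root node of the tree
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Renders one line per element, indented by depth.
    static String outline(Element e, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sb.append(" @").append(a.getNodeName()).append('=').append(a.getNodeValue());
        }
        StringBuilder kids = new StringBuilder();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                kids.append(outline((Element) c, depth + 1));
            }
        }
        if (kids.length() == 0) {                     // leaf: show its atomic value, if any
            String text = e.getTextContent().trim();
            if (!text.isEmpty()) sb.append(" = ").append(text);
        }
        return sb.append('\n').append(kids).toString();
    }
}
```

For example, `outline(parseRoot("<collection><object1 id=\"1\"><title>Vase</title></object1><object3/></collection>"), 0)` lists `collection` at depth 0, `object1` (with its `id` attribute) and `object3` at depth 1, and the leaf `title` with its value at depth 2.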
Example XML Tree • Root: collection element • Sub-elements: object1, object3
Algebraic Relationships • Comparison of two trees • Universal (unary) • Defines tree containing all information • Similarity (binary) • Two trees have the same structure • Equivalence (binary) • Two trees are indistinguishable • Subsumption (binary) • One tree is subsumed in another
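The three binary relationships can be sketched as recursive comparisons. The slides do not give formal definitions, so the code below rests on stated assumptions: similarity compares labels and shape only, equivalence also compares leaf values, and subsumption requires every node of the first tree to have a counterpart in the second. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; children are compared in document order for
// similarity and equivalence, which is an assumption.
class RelationshipSketch {
    static class Node {
        final String label;
        final String value;                            // atomic value for leaves, else null
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node c) { children.add(c); return this; }
    }

    // Similarity: same structure (labels and shape), content ignored.
    static boolean similar(Node a, Node b) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!similar(a.children.get(i), b.children.get(i))) return false;
        }
        return true;
    }

    // Equivalence: same structure and no mismatch in content.
    static boolean equivalent(Node a, Node b) {
        if (!a.label.equals(b.label) || !Objects.equals(a.value, b.value)
                || a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!equivalent(a.children.get(i), b.children.get(i))) return false;
        }
        return true;
    }

    // Subsumption: a is part of b in both structure and content.
    static boolean subsumedBy(Node a, Node b) {
        if (!a.label.equals(b.label)) return false;
        if (a.value != null && !a.value.equals(b.value)) return false;
        for (Node ca : a.children) {
            boolean found = false;
            for (Node cb : b.children) {
                if (subsumedBy(ca, cb)) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }
}
```

Under these assumptions equivalence implies similarity and subsumption in both directions, while subsumption alone is not symmetric.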
Example Equivalence Relationship • XML tree Collection3 is equivalent to Collection4: same node structure, no mismatch in content
Example Subsumption Relationship • Collection3 is part of Collection4 (structure and content)
Algebraic Operators for Trees • Join (binary, input two trees, output one tree, commutative, associative) • Joined on a predicate • Union (binary, input two trees, output one tree, commutative, associative, disjoint) • Summing trees together • Complement (binary, input two trees, output one tree, not commutative, not associative) • Nodes in one tree not found in another
Algorithm for Complement Operator • // Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree) • // Output: DOCnm Tree = (DOCn Tree - DOCm Tree) • 1 Start from root node of DOCn Tree • 2 If root node of DOCn Tree and root node of DOCm Tree have parent/child nodes • 2.1 Perform depth-first algorithm • 2.2 If DOCn Tree has parent node not existing in DOCm Tree • 2.2.1 set parent node DOCn Tree to the new DOCnm Tree • 2.2.2 while parent node DOCn Tree has child node not existing in DOCm Tree • 2.2.2.1 set child node DOCn Tree to DOCnm Tree • 2.2.2.2 if child node DOCn Tree has leaf node not existing in DOCm Tree • 2.2.2.2.1 set leaf node DOCn Tree to DOCnm Tree • 2.2.2.3 set null to DOCnm Tree • 2.2.3 repeat • 2.3 set null to DOCnm Tree • 3 Set root node to DOCnm Tree and terminate • end/terminate
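The steps above can be approximated by a recursive depth-first subtraction. This is a sketch under assumptions rather than a transcription of the authors' Java code: two nodes are taken to match when label and value are equal, sibling labels of inner nodes are assumed unique, and the root is always kept so that the output remains a tree. `ComplementSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class ComplementSketch {
    static class Node {
        final String label;
        final String value;                            // atomic value for leaves, else null
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node c) { children.add(c); return this; }
    }

    // DOCnm = DOCn - DOCm: the parts of n that have no counterpart in m.
    static Node complement(Node n, Node m) {
        if (!matches(n, m)) return copy(n);            // different roots: nothing to subtract
        Node diff = subtract(n, m);
        // Assumption: keep a bare root when everything was subtracted (closure).
        return diff != null ? diff : new Node(n.label, n.value);
    }

    // Depth-first walk; returns null when n is fully present in m.
    private static Node subtract(Node n, Node m) {
        if (m == null) return copy(n);                 // no counterpart: keep whole subtree
        Node out = new Node(n.label, n.value);
        for (Node cn : n.children) {
            Node match = null;
            for (Node cm : m.children) {
                if (matches(cn, cm)) { match = cm; break; }
            }
            Node diff = subtract(cn, match);
            if (diff != null) out.children.add(diff);
        }
        return out.children.isEmpty() ? null : out;
    }

    static boolean matches(Node a, Node b) {
        return a.label.equals(b.label) && Objects.equals(a.value, b.value);
    }

    static Node copy(Node n) {
        Node c = new Node(n.label, n.value);
        for (Node k : n.children) c.children.add(copy(k));
        return c;
    }

    // Compact text form, e.g. collection(object3(title=Coin)).
    static String render(Node n) {
        StringBuilder sb = new StringBuilder(n.label);
        if (n.value != null) sb.append('=').append(n.value);
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(render(n.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```

Swapping the arguments gives a different result, which is consistent with the operator being described as not commutative.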
Projection Algebra Operator (unary, input one tree, output one tree): Example • Eliminates nodes other than those specified • Projection of object3
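A projection of this kind can be sketched as a depth-first filter that keeps the specified nodes (with their subtrees) and the ancestors needed to reach them. The slides do not define what happens to ancestors or to an empty result, so keeping the path from the root and returning a bare root when nothing matches are assumptions; `ProjectionSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ProjectionSketch {
    static class Node {
        final String label;
        final String value;                            // atomic value for leaves, else null
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node c) { children.add(c); return this; }
    }

    // Unary operator: one tree in, one tree out.
    static Node project(Node root, Set<String> keep) {
        Node out = projectRec(root, keep);
        // Assumption: a bare root is returned when no node matches (closure).
        return out != null ? out : new Node(root.label, null);
    }

    private static Node projectRec(Node node, Set<String> keep) {
        if (keep.contains(node.label)) return copy(node);   // specified node: keep its subtree
        Node out = new Node(node.label, node.value);
        for (Node c : node.children) {
            Node p = projectRec(c, keep);
            if (p != null) out.children.add(p);
        }
        // Nodes that neither match nor lead to a match are eliminated.
        return out.children.isEmpty() ? null : out;
    }

    static Node copy(Node n) {
        Node c = new Node(n.label, n.value);
        for (Node k : n.children) c.children.add(copy(k));
        return c;
    }
}
```

Projecting `object3` out of a `collection` holding `object1` and `object3` would, under these assumptions, yield a `collection` root whose only child is the `object3` subtree.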
Algebra Operators (continued) • Select (unary, input one tree, output one tree) • Filters nodes according to a predicate • Expose (unary, input one tree, output one tree) • Retrieve specific elements/nodes given by parent/child boundaries • Vertex (unary, input one tree, output one tree) • Creates the vertex encompassing all nodes created by the expose operator
Algorithm for Expose Operator • // Input: one DOC tree or one XML document • // Output: one DOC tree or one XML document • 1 start with entry point, which is the root node • 2 perform depth-first algorithm • 2.1 if parameter is equal to the specific node needed to expose • 2.1.1 return the specific node • 2.1.2 set specific node in the new tree • 2.2 if exposed element does not exist then terminate • 3 end/terminate
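The depth-first search in this algorithm can be sketched as follows. The code assumes that the parameter is an element name and that the first match in pre-order is the one exposed; neither detail is stated in the slides, and `ExposeSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ExposeSketch {
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node c) { children.add(c); return this; }
    }

    // Steps 1-2: start at the root and search depth-first for the target.
    // Step 2.1: on a match, place a copy of the node in a new tree and return it.
    // Step 2.2: an empty result signals that the element does not exist.
    static Optional<Node> expose(Node node, String target) {
        if (node.label.equals(target)) return Optional.of(copy(node));
        for (Node c : node.children) {
            Optional<Node> hit = expose(c, target);
            if (hit.isPresent()) return hit;
        }
        return Optional.empty();
    }

    // Deep copy, so the exposed tree is independent of the input tree.
    static Node copy(Node n) {
        Node c = new Node(n.label);
        for (Node k : n.children) c.children.add(copy(k));
        return c;
    }
}
```

Returning `Optional` keeps the "does not exist" case explicit for the caller instead of relying on `null`.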
Results • Developed • Domain specific algebra • Tree algebra • Algebraic relationships • Universal, similarity, equivalence, subsumption • Algebraic operators • Join, union, complement, project, select, expose, vertex • Closure – output is always a tree
Verification • All operators: • Presented as algorithms • Implemented in Java • Case study: • Virtual museum application • Implemented code employed for satisfaction of museum requirements
Further Work • Investigate • Extent to which limitations in operators affect usability • Does domain need extending? • Further experimentation • Examine feedback from museum study • Look at further areas