Learn how to create good XML documents for information exchange following XNF principles. Discover redundancy elimination and alignment with natural hierarchies. Generate XML scheme-trees using algorithms.
Producing XML Documents with Guaranteed “Good” Properties
David W. Embley, Brigham Young University
Wai Y. Mok, University of Alabama in Huntsville
Sponsored in part by the National Science Foundation under grant number IIS-0083127
“Good” ~ XNF
• Motivation
  • XML is for Information Exchange.
  • What constitutes a “good” XML document for Information Exchange?
• Principles
  • XML Document Properties
    • A Few Large Trees.
    • No Redundancy.
  • Information Modeling
    • Create a conceptual model.
    • Generate “good” XML.
• XNF
  • Align XML trees with natural hierarchies in the data.
  • Base redundancy elimination on FDs, naturally occurring MVDs, and inclusion dependencies (IDs).
Example: XNF
( F D ( S P ( H )* )* ( H )* )*
F      D     S      P    H        H
Kelly  CS    Pat    PhD  Hiking   Hiking
                         Skiing   Skiing
             Tracy  MS   Hiking
                         Sailing
             Chris  MS
Lynn   Math                       Sailing
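One way to read the scheme tree above is as nested records, where each "( ... )*" group is a repeating list. The following is a minimal sketch under that reading; the key names (`faculty`, `students`, `student_of`, and so on) are illustrative assumptions, not names from the slides. It lists every stored fact and checks that none is stored twice in this instance.

```python
# Hypothetical encoding of the scheme tree ( F D ( S P ( H )* )* ( H )* )*:
# every "( ... )*" group becomes a list; the key names are illustrative only.
university = [
    {
        "faculty": "Kelly",
        "department": "CS",
        "students": [
            {"student": "Pat", "program": "PhD", "hobbies": ["Hiking", "Skiing"]},
            {"student": "Tracy", "program": "MS", "hobbies": ["Hiking", "Sailing"]},
            {"student": "Chris", "program": "MS", "hobbies": []},
        ],
        "hobbies": ["Hiking", "Skiing"],
    },
    {
        "faculty": "Lynn",
        "department": "Math",
        "students": [],
        "hobbies": ["Sailing"],
    },
]


def stored_facts(data):
    """List every (kind, key, value) fact exactly as often as it is stored."""
    facts = []
    for f in data:
        facts.append(("department", f["faculty"], f["department"]))
        facts += [("faculty_hobby", f["faculty"], h) for h in f["hobbies"]]
        for s in f["students"]:
            facts.append(("student_of", s["student"], f["faculty"]))
            facts.append(("program", s["student"], s["program"]))
            facts += [("student_hobby", s["student"], h) for h in s["hobbies"]]
    return facts


facts = stored_facts(university)
# True when no fact appears more than once in the nested instance.
print(len(facts) == len(set(facts)))
```

The check is only an illustration on this one instance; XNF itself is a property of the scheme with respect to the dependencies, not of a particular data set.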
Example: More Trees Than Necessary
( S P F ( H )* )*
S      P    F      H
Pat    PhD  Kelly  Hiking
                   Skiing
Tracy  MS   Kelly  Hiking
                   Sailing
Chris  MS   Kelly
( D ( F ( H )* )* )*
D     F      H
CS    Kelly  Hiking
             Skiing
Math  Lynn   Sailing
Example: Redundancy
( H ( S P )* )*
H        S      P
Hiking   Pat    PhD
         Tracy  MS
Skiing   Pat    PhD
Sailing  Tracy  MS
( S ( H ( F )* )* )*
S      H        F
Pat    Hiking   Kelly
       Skiing   Kelly
Tracy  Hiking   Kelly
       Sailing  Lynn
Chris
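The redundancy in the scheme tree ( H ( S P )* )* can be made concrete with a short sketch. It nests the student facts from the example under each hobby and counts how often each (S, P) pair ends up stored. The function names are hypothetical and the code only illustrates the effect; it is not part of the authors' method.

```python
from collections import Counter

# (student, program, hobbies) facts taken from the example data;
# only students who have at least one hobby are listed.
students = [
    ("Pat", "PhD", ["Hiking", "Skiing"]),
    ("Tracy", "MS", ["Hiking", "Sailing"]),
]


def nest_by_hobby(rows):
    """Build an instance of the scheme tree ( H ( S P )* )*."""
    tree = {}
    for student, program, hobbies in rows:
        for hobby in hobbies:
            tree.setdefault(hobby, []).append((student, program))
    return tree


def repeated_facts(tree):
    """Return the (S, P) pairs that are stored more than once."""
    counts = Counter(pair for pairs in tree.values() for pair in pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


by_hobby = nest_by_hobby(students)
print(repeated_facts(by_hobby))
```

Because a student's program is repeated under every hobby the student has, an update to one copy can leave the others inconsistent, which is the kind of redundancy XNF is meant to rule out.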
XNF → XML
( F D ( S P ( H )* )* ( H )* )*
[Figure: the example instance from “Example: XNF”, repeated.]
Naive DTD Generation
( F D ( S P ( H )* )* ( H )* )*
<!DOCTYPE University[
<!ELEMENT University ( ( Faculty_Member, Department, ( Graduate_Student, Program, ( Hobby_S )* )*, ( Hobby_F )* )* )>
<!ELEMENT Faculty_Member (#PCDATA)>
…
]>
Naive DTD Generation
<!DOCTYPE University[
<!ELEMENT University ( ( Faculty_Member, Department, ( Graduate_Student, Program, ( Hobby_S )* )*, ( Hobby_F )* )* )>
<!ELEMENT Faculty_Member (#PCDATA)>
…
]>
<University>
  <Faculty_Member>Kelly</Faculty_Member>
  <Department>CS</Department>
  <Graduate_Student>Pat</Graduate_Student>
  <Program>PhD</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Skiing</Hobby_S>
  <Graduate_Student>Tracy</Graduate_Student>
  <Program>MS</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Sailing</Hobby_S>
  <Graduate_Student>Chris</Graduate_Student>
  <Program>MS</Program>
  <Hobby_F>Hiking</Hobby_F>
  <Hobby_F>Skiing</Hobby_F>
  <Faculty_Member>Lynn</Faculty_Member>
  <Department>Math</Department>
  <Hobby_F>Sailing</Hobby_F>
</University>
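The naive translation can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, assuming a nested-dictionary input whose key names are hypothetical; the element names follow the instance shown on the slide. Every value becomes a direct child of University, so which hobby belongs to which student is implied by document order only.

```python
import xml.etree.ElementTree as ET

# A subset of the example data; the dictionary keys are illustrative only.
university = [
    {"faculty": "Kelly", "department": "CS",
     "students": [{"student": "Pat", "program": "PhD",
                   "hobbies": ["Hiking", "Skiing"]}],
     "hobbies": ["Hiking", "Skiing"]},
    {"faculty": "Lynn", "department": "Math",
     "students": [], "hobbies": ["Sailing"]},
]


def naive_xml(data):
    """Emit one flat sequence of elements under a single University root."""
    root = ET.Element("University")

    def add(tag, text):
        ET.SubElement(root, tag).text = text

    for f in data:
        add("Faculty_Member", f["faculty"])
        add("Department", f["department"])
        for s in f["students"]:
            add("Graduate_Student", s["student"])
            add("Program", s["program"])
            for h in s["hobbies"]:
                add("Hobby_S", h)
        for h in f["hobbies"]:
            add("Hobby_F", h)
    return root


root = naive_xml(university)
print(ET.tostring(root, encoding="unicode"))
```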
Sophisticated DTD Generation
<!DOCTYPE University[
<!ELEMENT University (Faculty_Members)>
<!ELEMENT Faculty_Members (Faculty_Member)*>
<!ELEMENT Faculty_Member (Department, Grad_Students, Hobbies)>
<!ATTLIST Faculty_Member value CDATA #REQUIRED>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Grad_Students (Grad_Student)*>
<!ELEMENT Grad_Student (Program, Hobbies)>
…
]>
<University>
  <Faculty_Members>
    <Faculty_Member value="Kelly">
      <Department>CS</Department>
      <Grad_Students>
        <Grad_Student value="Pat">
          <Program>PhD</Program>
          <Hobbies>
            <Hobby>Hiking</Hobby>
            <Hobby>Skiing</Hobby>
          </Hobbies>
        </Grad_Student>
        <Grad_Student value="Tracy">
        …
  </Faculty_Members>
</University>
[Figure: scheme tree F D S P H H, with its nodes labeled Faculty Members, Grad_Students, Hobbies, Hobbies.]
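The nested layout can be sketched the same way with `xml.etree.ElementTree`. This is an illustration only, assuming a nested-dictionary input with hypothetical key names; the element and attribute names are the ones visible in the DTD and instance above, and the content of Hobbies is taken from the instance because its declaration is elided in the DTD.

```python
import xml.etree.ElementTree as ET

# A subset of the example data; the dictionary keys are illustrative only.
university = [
    {"faculty": "Kelly", "department": "CS",
     "students": [{"student": "Pat", "program": "PhD",
                   "hobbies": ["Hiking", "Skiing"]}],
     "hobbies": ["Hiking", "Skiing"]},
]


def add_hobbies(parent, hobbies):
    """Wrap a repeating group of Hobby elements in one Hobbies element."""
    wrapper = ET.SubElement(parent, "Hobbies")
    for h in hobbies:
        ET.SubElement(wrapper, "Hobby").text = h


def nested_xml(data):
    """Mirror the scheme-tree nesting: one wrapper element per repeating group."""
    root = ET.Element("University")
    members = ET.SubElement(root, "Faculty_Members")
    for f in data:
        member = ET.SubElement(members, "Faculty_Member", value=f["faculty"])
        ET.SubElement(member, "Department").text = f["department"]
        group = ET.SubElement(member, "Grad_Students")
        for s in f["students"]:
            student = ET.SubElement(group, "Grad_Student", value=s["student"])
            ET.SubElement(student, "Program").text = s["program"]
            add_hobbies(student, s["hobbies"])
        add_hobbies(member, f["hobbies"])
    return root


root = nested_xml(university)
print(ET.tostring(root, encoding="unicode"))
```

Here the ownership of each hobby is explicit in the element nesting rather than implied by element order.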
→ XNF
How do we generate XNF scheme-trees?
( F D ( S P ( H )* )* ( H )* )*
[Figure: the example instance from “Example: XNF”, repeated.]
Alg. 1
How do we generate XNF scheme-trees?
[Figure: CM hypergraph with vertex labels F D S P H H]
Algorithm 1
Until all vertices and edges are included:
  Find a start vertex:
    -- included in most enclosures
    -- back off by one, if possible
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add adjacent vertices:
       -- within node (for functional edges)
       -- below node (for non-functional edges)
Alg. 1: Start
[Figure: CM hypergraph with vertex labels F D S P H H, annotated with the numbers 3 2 1 and 2 1]
Alg. 1: Grow
[Figure sequence, three steps: vertex labels F D S P H H; then F D SP H H with one check mark; then F D SPH H with two check marks]
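The "add adjacent vertices" rule of Algorithm 1 can be sketched as a breadth-first traversal. This is a simplified illustration, not the full algorithm: it takes the start vertex as given, and it ignores the enclosure counts, the "cut out hierarchy" step, and optionals. The edge list, including which directions are functional, is an assumption made for the example; the names HS and HF stand for the students' and the faculty members' hobby vertices.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    attrs: list
    children: list = field(default_factory=list)


def grow(start, edges):
    """Grow one scheme tree from `start`.

    Each edge is (a, b, a_determines_b, b_determines_a). A vertex reached over
    a functional edge joins the current node; a vertex reached over a
    non-functional edge starts a new node below the current one.
    """
    root = Node([start])
    seen = {start}
    work = deque([(start, root)])  # (vertex, node that contains it)
    while work:
        v, node = work.popleft()
        for a, b, a_to_b, b_to_a in edges:
            if v == a:
                w, functional = b, a_to_b
            elif v == b:
                w, functional = a, b_to_a
            else:
                continue
            if w in seen:
                continue
            seen.add(w)
            if functional:
                node.attrs.append(w)
                work.append((w, node))
            else:
                child = Node([w])
                node.children.append(child)
                work.append((w, child))
    return root


def render(node):
    """Write a node in the slides' scheme-tree notation."""
    parts = node.attrs + [render(c) for c in node.children]
    return "( " + " ".join(parts) + " )*"


# Assumed edges for the example hypergraph (hypothetical functional directions).
edges = [
    ("F", "D", True, False),    # a faculty member has one department
    ("F", "S", False, True),    # a student belongs to one faculty member
    ("S", "P", True, False),    # a student is in one program
    ("S", "HS", False, False),  # students and hobbies: many-to-many
    ("F", "HF", False, False),  # faculty members and hobbies: many-to-many
]

tree = grow("F", edges)
print(render(tree))
```

Under these assumptions the traversal places D in the node with F, places P in the node with S, and puts each hobby vertex in its own node below, which matches the shape of the scheme tree used throughout the example.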
Algorithm 1 Yields XNF
Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok et al., TODS, 1996).
Callouts: What is this restriction? Can we enlarge the set of dependencies? Can we relax this constraint?
Non-Canonical CM Hypergraphs
A CM hypergraph is canonical if:
(1) No edge is redundant,
(2) No edge is losslessly decomposable, and
(3) No vertex is redundant.
If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.
[Figure labels: F D S P D H H; D F S S P H H]
A faculty member’s department is the same as the faculty member’s students’ department.
The set of students must be the same for every department.
Not Canonical: Decomposable Non-Binary CM Hypergraphs
Generating Scheme Trees from Non-Binary CM Hypergraphs
[Figure labels: C D T; N A M C A P C D T; or or …]
Alg. 2
[Figure: CM hypergraph with vertex labels N A M C A P C D T]
Algorithm 2
Until all vertices and edges are included:
  Find a start edge and configure it
  Grow a tree as large as possible
    -- cut out hierarchy (watch out for optionals)
    -- add and configure edges
Alg. 2: Start
[Figure: CM hypergraph with vertex labels N A M C A P C D T; one check mark]
Alg. 2: Grow
[Figure: CM hypergraph with vertex labels N A M C A P C D T]
Alg. 2: Start Again and Grow
[Figure: CM hypergraph with vertex labels N A M C A P C D T; one check mark]
Algorithm 2 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok et al., TODS, 1996).
Inclusion Dependencies (IDs)
[Figure callout: optional connections]
Inclusion Dependencies (IDs) This constraint makes this vertex redundant.
Generating Scheme Trees from Canonical CM Hypergraphs with IDs
[Figure: CM hypergraph with vertex labels F D S P HF HS]
Algorithm 3
Collapse G/S (generalization/specialization) hierarchies
If the edges are all binary
  Execute Algorithm 1
Else
  Execute Algorithm 2
Alg. 3: Collapse
[Figure: CM hypergraph with vertex labels F D S P HF HS]
Alg. 3: Execute
[Figure: CM hypergraph with vertex labels F D S P HF HS]
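The control flow of Algorithm 3 is a short dispatch, sketched below. The collapse step and Algorithms 1 and 2 are passed in as callables because their details are not spelled out here; the representation of an edge as a tuple of vertex names, and all the names in the usage lines, are assumptions made for illustration.

```python
def algorithm_3(edges, collapse, algorithm_1, algorithm_2):
    """Collapse G/S hierarchies, then pick the algorithm by edge arity.

    `edges` is a list of tuples of vertex names. `collapse`, `algorithm_1`,
    and `algorithm_2` are hypothetical callables supplied by the caller.
    """
    collapsed = collapse(edges)
    if all(len(edge) == 2 for edge in collapsed):
        return algorithm_1(collapsed)
    return algorithm_2(collapsed)


# Usage with placeholder callables that only report which branch ran.
def no_collapse(edges):
    return edges


def run_1(edges):
    return "Algorithm 1"


def run_2(edges):
    return "Algorithm 2"


binary_edges = [("F", "D"), ("F", "S"), ("S", "P"), ("S", "HS"), ("F", "HF")]
mixed_edges = [("X", "Y"), ("X", "Y", "Z")]  # hypothetical ternary edge

print(algorithm_3(binary_edges, no_collapse, run_1, run_2))
print(algorithm_3(mixed_edges, no_collapse, run_1, run_2))
```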
Algorithm 3 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H.
Proof: Based on NNF (Mok et al., TODS, 1996).
Conclusions
• XNF ~ “Good” XML
  • No redundancy
  • As few trees as possible
• Elegant DTD generation
• Algorithms to generate XNF
• Proofs of correctness
embley@cs.byu.edu
mokw@email.uah.edu