
Producing XML Documents with Guaranteed “Good” Properties

David W. Embley, Brigham Young University. Wai Y. Mok, University of Alabama in Huntsville. Sponsored in part by the National Science Foundation under grant number IIS-0083127.


Presentation Transcript


  1. Producing XML Documents with Guaranteed “Good” Properties David W. Embley Brigham Young University Wai Y. Mok University of Alabama in Huntsville Sponsored in part by the National Science Foundation under grant number IIS-0083127

  2. “Good” ~ XNF
     • Motivation
       • XML is for Information Exchange.
       • What constitutes a “good” XML document for Information Exchange?
     • Principles
       • XML Document Properties
         • A Few Large Trees.
         • No Redundancy.
       • Information Modeling
         • Create a conceptual model.
         • Generate “good” XML.
       • XNF
         • Align XML trees with natural hierarchies in the data.
         • Base redundancy elimination on FDs, naturally occurring MVDs, and inclusion dependencies (IDs).

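  3. Example: XNF
     Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
     (F = Faculty_Member, D = Department, S = Grad_Student, P = Program, H = Hobby, as named in the DTD on slide 7.)
     Sample instance:
     • Kelly, CS
       • Pat, PhD: Hiking, Skiing
       • Tracy, MS: Hiking, Sailing
       • Chris, MS
       • Kelly's hobbies: Hiking, Skiing
     • Lynn, Math
       • Lynn's hobbies: Sailing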

  4. Example: More Trees Than Necessary
     Scheme trees: ( S P F ( H )* )* and ( D ( F ( H )* )* )*
     Sample instance, first tree:
     • Pat, PhD, Kelly: Hiking, Skiing
     • Tracy, MS, Kelly: Hiking, Sailing
     • Chris, MS, Kelly
     Sample instance, second tree:
     • CS: Kelly (Hiking, Skiing)
     • Math: Lynn (Sailing)

  5. Example: Redundancy
     Scheme trees: ( H ( S P )* )* and ( S ( H ( F )* )* )*
     Sample instance, first tree:
     • Hiking: Pat PhD, Tracy MS
     • Skiing: Pat PhD
     • Sailing: Tracy MS
     Sample instance, second tree:
     • Pat: Hiking (Kelly), Skiing (Kelly)
     • Tracy: Hiking (Kelly), Sailing (Lynn)
     • Chris
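The redundancy in the first tree of slide 5 can be made concrete by counting how often each (student, program) pair is stored. The following is a minimal sketch, not the authors' method: the dictionaries and the names `by_student`, `by_hobby`, and the two counting helpers are hypothetical, and the data is read off the slides' sample instance.

```python
from collections import Counter

# Base facts read off the sample instance (hypothetical encoding).
program = {"Pat": "PhD", "Tracy": "MS", "Chris": "MS"}
student_hobbies = {
    "Pat": ["Hiking", "Skiing"],
    "Tracy": ["Hiking", "Sailing"],
    "Chris": [],
}

# Nesting hobbies under students, as in the ( S P ( H )* )* subtree:
# one entry per student, so each (student, program) pair is stored once.
by_student = [(s, program[s], student_hobbies[s]) for s in program]

# Nesting students under hobbies, as in ( H ( S P )* )*:
# a student's program is repeated under every hobby the student has.
by_hobby = {}
for s, hobbies in student_hobbies.items():
    for h in hobbies:
        by_hobby.setdefault(h, []).append((s, program[s]))


def pair_counts_by_student(tree):
    """Count stored (student, program) pairs in the student-rooted tree."""
    return Counter((s, p) for s, p, _ in tree)


def pair_counts_by_hobby(tree):
    """Count stored (student, program) pairs in the hobby-rooted tree."""
    return Counter(pair for pairs in tree.values() for pair in pairs)


student_counts = pair_counts_by_student(by_student)
hobby_counts = pair_counts_by_hobby(by_hobby)
```

Under these assumptions the hobby-rooted tree stores Pat's and Tracy's programs twice each, and Chris, who has no hobbies, does not appear in it at all, which matches the slide's instance.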

  6. XNF → XML
     Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
     [Figure: scheme tree and sample instance, as on slide 3]

  7. Naive DTD Generation
     <!DOCTYPE University[
     <!ELEMENT University ( ( Faculty_Member, Department, ( Grad_Student, Program, ( Hobby )* )*, ( Hobby )* )* )>
     <!ELEMENT Faculty_Member (#PCDATA)>
     …
     ]>
     Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
     [Figure: scheme tree and sample instance, as on slide 3]

  8. Naive DTD Generation
     <!DOCTYPE University[
     <!ELEMENT University ( ( Faculty_Member, Department, ( Graduate_Student, Program, ( Hobby )* )*, ( Hobby )* )* )>
     <!ELEMENT Faculty_Member (#PCDATA)>
     …
     ]>
     <University>
       <Faculty_Member>Kelly</Faculty_Member>
       <Department>CS</Department>
       <Graduate_Student>Pat</Graduate_Student>
       <Program>PhD</Program>
       <Hobby_S>Hiking</Hobby_S>
       <Hobby_S>Skiing</Hobby_S>
       <Graduate_Student>Tracy</Graduate_Student>
       <Program>MS</Program>
       <Hobby_S>Hiking</Hobby_S>
       <Hobby_S>Sailing</Hobby_S>
       <Graduate_Student>Chris</Graduate_Student>
       <Program>MS</Program>
       <Hobby_F>Hiking</Hobby_F>
       <Hobby_F>Skiing</Hobby_F>
       <Faculty_Member>Lynn</Faculty_Member>
       <Hobby_F>Sailing</Hobby_F>
     </University>
     [Figure: scheme-tree node labels F D S P H H]

  9. Sophisticated DTD Generation
     <!DOCTYPE University[
     <!ELEMENT University (Faculty_Members)>
     <!ELEMENT Faculty_Members (Faculty_Member)*>
     <!ELEMENT Faculty_Member (Department, Grad_Students, Hobbies)>
     <!ATTLIST Faculty_Member value CDATA #REQUIRED>
     <!ELEMENT Department (#PCDATA)>
     <!ELEMENT Grad_Students (Grad_Student)*>
     <!ELEMENT Grad_Student (Program, Hobbies)>
     …
     ]>
     <University>
       <Faculty_Members>
         <Faculty_Member value="Kelly">
           <Department>CS</Department>
           <Grad_Students>
             <Grad_Student value="Pat">
               <Program>PhD</Program>
               <Hobbies>
                 <Hobby>Hiking</Hobby>
                 <Hobby>Skiing</Hobby>
               </Hobbies>
             </Grad_Student>
             <Grad_Student value="Tracy">
             …
       </Faculty_Members>
     </University>
     [Figure: scheme tree F D S P H H annotated with the wrapper elements Faculty_Members, Grad_Students, Hobbies, and Hobbies]
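A document shaped like the one on slide 9 can be produced mechanically from nested data. The sketch below uses Python's standard `xml.etree.ElementTree`; it is an illustration under assumptions, not the authors' generator. The `FACULTY` dictionary and the helper names are hypothetical, and the `Hobbies`/`Hobby` structure is taken from the slide's XML fragment because the corresponding DTD declarations are elided on the slide.

```python
import xml.etree.ElementTree as ET

# Sample instance from the slides (hypothetical encoding):
# faculty name -> department, students (program, hobbies), own hobbies.
FACULTY = {
    "Kelly": {
        "department": "CS",
        "students": {
            "Pat": ("PhD", ["Hiking", "Skiing"]),
            "Tracy": ("MS", ["Hiking", "Sailing"]),
            "Chris": ("MS", []),
        },
        "hobbies": ["Hiking", "Skiing"],
    },
    "Lynn": {"department": "Math", "students": {}, "hobbies": ["Sailing"]},
}


def add_hobbies(parent, hobbies):
    """Append a <Hobbies> wrapper with one <Hobby> child per hobby."""
    wrapper = ET.SubElement(parent, "Hobbies")
    for hobby in hobbies:
        ET.SubElement(wrapper, "Hobby").text = hobby


def build_university(faculty):
    """Build the nested element tree following the slide's element names."""
    root = ET.Element("University")
    members = ET.SubElement(root, "Faculty_Members")
    for name, info in faculty.items():
        fm = ET.SubElement(members, "Faculty_Member", value=name)
        ET.SubElement(fm, "Department").text = info["department"]
        students = ET.SubElement(fm, "Grad_Students")
        for student, (prog, hobbies) in info["students"].items():
            gs = ET.SubElement(students, "Grad_Student", value=student)
            ET.SubElement(gs, "Program").text = prog
            add_hobbies(gs, hobbies)
        add_hobbies(fm, info["hobbies"])
    return root


root = build_university(FACULTY)
xml_text = ET.tostring(root, encoding="unicode")
```

Each fact (a faculty member's department, a student's program) is written exactly once, which is the property the nested layout is meant to preserve.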

  10. → XNF
      How do we generate XNF scheme-trees?
      Scheme tree: ( F D ( S P ( H )* )* ( H )* )*
      [Figure: scheme tree and sample instance, as on slide 3]

  11. Alg. 1
      How do we generate XNF scheme-trees?
      Algorithm 1
      Until all vertices and edges are included:
        Find a start vertex:
          -- included in most enclosures
          -- back off by one, if possible
        Grow a tree as large as possible:
          -- cut out hierarchy (watch out for optionals)
          -- add adjacent vertices:
             -- within node (for functional edges)
             -- below node (for non-functional edges)
      [Figure: diagram with vertices F, D, S, P, H, H]

  12. Alg. 1: Start
      [Figure: vertices F, D, S, P, H, H marked with the numbers 3 2 1 and 2 1; Algorithm 1 text as on slide 11]

  13. Alg. 1: Start
      [Figure: same marked diagram; Algorithm 1 text as on slide 11]

  14. Alg. 1: Grow
      [Figure: vertices F, D, S, P, H, H; Algorithm 1 text as on slide 11]

  15. Alg. 1: Grow
      [Figure: one step checked off (√); node labels F D SP H H; Algorithm 1 text as on slide 11]

  16. Alg. 1: Grow
      [Figure: two steps checked off (√ √); node labels F D SPH H; Algorithm 1 text as on slide 11]
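The "grow" step of Algorithm 1 (functional edges add a vertex within the node, non-functional edges add it below) can be sketched as follows. This is a simplified illustration under several assumptions, not the algorithm as the authors define it: the start vertex is given rather than found by counting enclosures, optionals and hierarchy cutting are ignored, the edge list `EDGES` and its functional directions are guesses read off the example, and the two hobby roles are modeled as separate vertices named `H_S` and `H_F` (after `Hobby_S` and `Hobby_F` on slide 8).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    attrs: list                               # vertices placed within this node
    children: list = field(default_factory=list)

    def render(self):
        """Render in the slides' notation, e.g. ( S P ( H )* )*."""
        inner = " ".join(self.attrs + [c.render() for c in self.children])
        return f"( {inner} )*"


# Binary edges as (a, b, a_determines_b, b_determines_a). Assumed semantics.
EDGES = [
    ("F", "D", True, False),     # a faculty member has one department
    ("S", "F", True, False),     # a student has one faculty member
    ("S", "P", True, False),     # a student is in one program
    ("S", "H_S", False, False),  # a student may have many hobbies
    ("F", "H_F", False, False),  # a faculty member may have many hobbies
]


def neighbors(v):
    """Yield (other_vertex, v_determines_other) for each edge touching v."""
    for a, b, ab, ba in EDGES:
        if a == v:
            yield b, ab
        elif b == v:
            yield a, ba


def grow(start):
    """Grow one scheme tree from the given start vertex."""
    seen = {start}
    root = Node([start])

    def expand(node, v):
        for other, functional in neighbors(v):
            if other in seen:
                continue
            seen.add(other)
            if functional:
                node.attrs.append(other)      # within node
                expand(node, other)
            else:
                child = Node([other])         # below node
                node.children.append(child)
                expand(child, other)

    expand(root, start)
    return root


tree = grow("F")
rendered = tree.render()
```

Starting from F under these assumptions, the sketch reproduces the shape of the slides' scheme tree, with `H_S` and `H_F` in place of the two H nodes. Choosing a different start vertex gives a different tree, which is why the real algorithm specifies how the start vertex is selected.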

  17. Algorithm 1 Yields XNF
      Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
      Proof: Based on NNF (Mok, et al., TODS, 1996)
      Callouts on the slide: What is this restriction? Can we enlarge the set of dependencies? Can we relax this constraint?

  18. Non-Canonical CM Hypergraphs
      A CM hypergraph is canonical if:
      (1) No edge is redundant,
      (2) No edge is losslessly decomposable, and
      (3) No vertex is redundant.
      If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.
      [Figure: two diagrams, with node labels F D S P D H H and D F S S P H H]
      Figure notes: A faculty member’s department is the same as the faculty member’s students’ department. The set of students must be the same for every department.
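Condition (1) can be illustrated for functional edges with the standard attribute-closure test: an FD is redundant if it follows from the remaining FDs. The sketch below is only that narrow illustration; it does not cover non-functional edges, lossless decomposition, or redundant vertices. The FD list is an assumption based on the slide's first figure note (a student's department equals the department of the student's faculty member), and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """Return every FD that is implied by the other FDs in the list."""
    found = []
    for i, (lhs, rhs) in enumerate(fds):
        others = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, others):
            found.append((lhs, rhs))
    return found


# Assumed functional edges: S -> F, F -> D, S -> D, S -> P.
FDS = [
    (frozenset({"S"}), frozenset({"F"})),
    (frozenset({"F"}), frozenset({"D"})),
    (frozenset({"S"}), frozenset({"D"})),
    (frozenset({"S"}), frozenset({"P"})),
]

redundant = redundant_fds(FDS)
```

With these FDs, only S → D is reported, since it follows from S → F and F → D by transitivity.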

  19. Not Canonical: Decomposable Non-Binary CM Hypergraphs

  20. Generating Scheme Trees from Non-Binary CM Hypergraphs
      [Figure labels: C D T N A M C A P C D T or or …]

  21. Alg. 2
      Algorithm 2
      Until all vertices and edges are included:
        Find a start edge and configure it
        Grow a tree as large as possible
          -- cut out hierarchy (watch out for optionals)
          -- add and configure edges
      [Figure labels: N A M C A P C D T]

  22. Alg. 2: Start
      [Figure: one step checked off (√); labels N A M C A P C D T; Algorithm 2 text as on slide 21]

  23. Alg. 2: Grow
      [Figure: labels N A M C A P C D T; Algorithm 2 text as on slide 21]

  24. Alg. 2: Start Again & Grow
      [Figure: one step checked off (√); labels N A M C A P C D T; Algorithm 2 text as on slide 21]

  25. Alg. 2: Start Again and Grow
      [Figure: one step checked off (√); labels N A M C A P C D T; Algorithm 2 text as on slide 21]

  26. Algorithm 2 Yields XNF Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H. Proof: Based on NNF (Mok, et al., TODS, 1996)

  27. Inclusion Dependencies (IDs)
      [Figure callout: optional connections]

  28. Inclusion Dependencies (IDs) This constraint makes this vertex redundant.

  29. Canonical CM Hypergraph with IDs

  30. Generating Scheme Trees from Canonical CM Hypergraph with IDs
      Algorithm 3
      Collapse G/S hierarchies
      If the edges are all binary
        Execute Algorithm 1
      Else
        Execute Algorithm 2
      [Figure labels: F D S P HF HS]

  31. Alg. 3: Collapse
      [Figure labels: F D S P HF HS; Algorithm 3 text as on slide 30]

  32. Alg. 3: Collapse
      [Figure labels: F D S P HF HS; Algorithm 3 text as on slide 30]

  33. Alg. 3: Execute
      [Figure labels: F D S P HF HS; Algorithm 3 text as on slide 30]

  34. Algorithm 3 Yields XNF Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H. Proof: Based on NNF (Mok, et al., TODS, 1996)

  35. Conclusions
      • XNF ~ “Good” XML
        • No redundancy
        • As few trees as possible
      • Elegant DTD generation
      • Algorithms to generate XNF
      • Proofs of correctness
      embley@cs.byu.edu
      mokw@email.uah.edu
