
Converting Disjunctive Data to Disjunctive Graphs

Lars Olson, Data Extraction Group. Funded by NSF.


Presentation Transcript


  1. Converting Disjunctive Data to Disjunctive Graphs
     Lars Olson
     Data Extraction Group
     Funded by NSF

  2. Introduction
     • Disjunctive databases
       • Needed to represent disjunctive data
       • Queries are CoNP-complete in general [Imielinski and Vadaparty, 1989]
     • Transitive closure in disjunctive graphs
       • CoNP-complete in general
       • Polynomial time, under certain circumstances [Lobo et al., 1995]

  3. The Problem
     • How do we convert the data into a disjunctive graph?
     • What is the complexity of the conversion?
       • Time
       • Space / Memory

  4. Implementation
     • XML data repository
       • Shore / Niagara (Univ. of Wisconsin)
       • Xerces XML parser (Apache.org)
     • How do we represent a disjunctive database in storage?
       • Needs to be easy to convert to disjunctive graph
       • Needs to minimize the changes to the DTD and thus, the existing data

  5. XML → Graph Conversion
     • XML → DOM tree
         <doc>
           <Node name="A">
             <EdgeTo ref="B"/>
           </Node>
           <Node name="B"></Node>
           ...
         </doc>
     [Figure: DOM tree (doc, Node, EdgeTo) and the resulting graph on A and B]
     • Use primary key to distinguish doc→Node edges
     • Use foreign key to perform join (EdgeTo.ref = Node.name)
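The two bullets above (primary key to identify records, foreign key to join) can be sketched in a few lines of Python. This is only an illustration under assumptions: the talk used Xerces over a Shore / Niagara repository, whereas this sketch uses Python's standard `xml.etree.ElementTree`, and the function name `xml_to_graph` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide, with the elided "..." omitted.
XML = """<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
  </Node>
  <Node name="B"></Node>
</doc>"""


def xml_to_graph(xml_text):
    """Return (vertices, edges) for an ordinary, non-disjunctive document."""
    root = ET.fromstring(xml_text)

    # Primary key: each <Node> record is identified by its name attribute.
    nodes = {node.get("name"): node for node in root.findall("Node")}

    # Foreign key join: keep an edge only if EdgeTo.ref matches a Node.name.
    edges = []
    for name, node in nodes.items():
        for edge in node.findall("EdgeTo"):
            ref = edge.get("ref")
            if ref in nodes:
                edges.append((name, ref))
    return sorted(nodes), edges


vertices, edges = xml_to_graph(XML)
print(vertices, edges)
```

Building the `nodes` dictionary first makes the join a constant-time lookup per `EdgeTo`, so the whole conversion stays linear in the number of elements.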

  6. Disjunctions in XML, 1st Case
         <Node name="A">
           <EdgeTo ref="B"/>
           <Disj>
             <EdgeTo ref="C"/>
             <EdgeTo ref="D"/>
           </Disj>
         </Node>
         ...
     [Figure: graph with nodes A, B, C, D]
     …but how do we represent a disjunctive tail?
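A minimal sketch of how the first case might be read into edges, assuming a disjunctive edge is stored as a pair `(tail, heads)` where `heads` is a set of alternatives. That representation and the name `node_edges` are assumptions for illustration, not something the slides specify.

```python
import xml.etree.ElementTree as ET

NODE_XML = """<Node name="A">
  <EdgeTo ref="B"/>
  <Disj>
    <EdgeTo ref="C"/>
    <EdgeTo ref="D"/>
  </Disj>
</Node>"""


def node_edges(node):
    """Return the edges leaving one <Node> as (tail, frozenset_of_heads)."""
    tail = node.get("name")
    edges = []
    for child in node:
        if child.tag == "EdgeTo":
            # Ordinary edge: a head set with exactly one member.
            edges.append((tail, frozenset([child.get("ref")])))
        elif child.tag == "Disj":
            # One disjunctive edge covering all children of <Disj>.
            heads = frozenset(e.get("ref") for e in child.findall("EdgeTo"))
            edges.append((tail, heads))
    return edges


edges = node_edges(ET.fromstring(NODE_XML))
print(edges)
```

Here `A` gets a definite edge to `B` and a single disjunctive edge whose head is "C or D". A tail that is itself a disjunction does not fit this shape, which is the question the slide raises.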

  7. Disjunctions in XML, 1st Case
         <Node name="A">
           <EdgeTo ref="B"/>
           <Disj>
             <EdgeTo ref="C"/>
             <EdgeTo ref="D"/>
           </Disj>
         </Node>
         <Disj>
           <Node name="E">
             <EdgeTo ref="G"/>
             <EdgeTo ref="H"/>
           </Node>
           <Node name="F">
             <EdgeTo ref="G"/>
             <EdgeTo ref="H"/>
           </Node>
         </Disj>
         ...
     [Figure: doc and graph with nodes E, F, G, H]
     or…

  8. Disjunctions in XML, 2nd Case
         <Disj>
           <Tail>
             <Node name="E"/>
             <Node name="F"/>
           </Tail>
           <Head>
             <EdgeTo ref="G"/>
             <EdgeTo ref="H"/>
           </Head>
         </Disj>
         ...
     [Figure: graph with nodes E, F, G, H]
     What if the disjunction isn’t the full cross-product?

  9. Disjunctions in XML, 3rd Case
         <Disj>
           <Tail>
             <Node name="I"/>
           </Tail>
           <Head>
             <EdgeTo ref="K"/>
           </Head>
           <Tail>
             <Node name="J"/>
           </Tail>
           <Head>
             <EdgeTo ref="K"/>
             <EdgeTo ref="L"/>
           </Head>
         </Disj>
         ...
     [Figure: graph with nodes I, J, K, L]
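The `<Tail>` / `<Head>` form can be read with one loop that pairs each `<Tail>` with the `<Head>` that follows it. The sketch below is an assumption-laden illustration: it stores each pair as `(tails, heads)` with both sides as sets, and the name `disj_edges` is hypothetical. How pairs inside the same `<Disj>` relate to one another is not modeled here.

```python
import xml.etree.ElementTree as ET

FULL = """<Disj>
  <Tail><Node name="E"/><Node name="F"/></Tail>
  <Head><EdgeTo ref="G"/><EdgeTo ref="H"/></Head>
</Disj>"""

PARTIAL = """<Disj>
  <Tail><Node name="I"/></Tail>
  <Head><EdgeTo ref="K"/></Head>
  <Tail><Node name="J"/></Tail>
  <Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
</Disj>"""


def disj_edges(disj):
    """Return one (tails, heads) edge per <Tail>/<Head> pair in a <Disj>."""
    edges = []
    tails = None
    for child in disj:
        if child.tag == "Tail":
            tails = frozenset(n.get("name") for n in child.findall("Node"))
        elif child.tag == "Head":
            if tails is None:
                raise ValueError("<Head> without a preceding <Tail>")
            heads = frozenset(e.get("ref") for e in child.findall("EdgeTo"))
            edges.append((tails, heads))
            tails = None  # each <Tail> is consumed by exactly one <Head>
    return edges


full_edges = disj_edges(ET.fromstring(FULL))
partial_edges = disj_edges(ET.fromstring(PARTIAL))
print(full_edges)
print(partial_edges)
```

The full cross-product yields a single edge from {E, F} to {G, H}, while the partial cross-product yields two edges, I to {K} and J to {K, L}, so the same reader covers both the 2nd and 3rd cases.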

  10. Time and Space Complexity
      • n = # of nodes in DOM tree
        • counts edges as well
        • not necessarily proportional to # of values in the database
      • Ordinary XML: traverse tree, add edges. Distinguish records with primary keys, add edges for foreign keys. O(n) time, O(n) space.

  11. Time and Space Complexity
      • <Disj>: same, except only one edge to all children. O(n) time, O(n) space.
      • <Disj> with <Tail> and <Head>: traverse tree, add <Tail> and <Head> elements to a list, add one edge, repeat for each Tail/Head pair. O(n) time, O(n) space.
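To make the linear bound concrete, the sketch below converts a document in one traversal and counts how many elements it touches. It is a simplified illustration, not the talk's implementation: `convert` is a hypothetical name, n counts XML elements only (not text or attribute nodes), edges are stored as assumed `(tails, heads)` set pairs, and a `<Disj>` that wraps whole `<Node>` records is not handled.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
    <Disj><EdgeTo ref="C"/><EdgeTo ref="D"/></Disj>
  </Node>
  <Disj>
    <Tail><Node name="I"/></Tail>
    <Head><EdgeTo ref="K"/></Head>
    <Tail><Node name="J"/></Tail>
    <Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
  </Disj>
</doc>"""


def convert(root):
    """Single traversal; returns (edges, number_of_elements_visited)."""
    edges = []
    visited = 0

    def collect(parent, tag, attr):
        # Read one attribute from each matching child, counting the visits.
        nonlocal visited
        found = parent.findall(tag)
        visited += len(found)
        return frozenset(e.get(attr) for e in found)

    def walk(elem, owner):
        nonlocal visited
        visited += 1
        if elem.tag == "Node":
            owner = elem.get("name")
        if elem.tag == "EdgeTo":
            edges.append((frozenset([owner]), frozenset([elem.get("ref")])))
        elif elem.tag == "Disj" and owner is not None:
            # <Disj> inside a <Node>: one edge to all children.
            edges.append((frozenset([owner]), collect(elem, "EdgeTo", "ref")))
        elif elem.tag == "Disj":
            # Top-level <Disj>: one edge per Tail/Head pair.
            tails = None
            for child in elem:
                visited += 1
                if child.tag == "Tail":
                    tails = collect(child, "Node", "name")
                elif child.tag == "Head":
                    edges.append((tails, collect(child, "EdgeTo", "ref")))
        else:
            for child in elem:
                walk(child, owner)

    walk(root, None)
    return edges, visited


root = ET.fromstring(DOC)
edges, visited = convert(root)
n = sum(1 for _ in root.iter())
print(len(edges), visited, n)
```

Every element is visited exactly once and each visit emits at most one edge, so both the running time and the size of the edge list are bounded by n for documents of this shape.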

  12. Summary
      • We need to introduce new XML constructs:
        • <Disj>
        • Helper constructs <Tail> and <Head>
      • Three cases
        • simple tail, compound head
        • full cross-product
        • partial cross-product
      • Time and space requirements consistent with the transitive closure algorithm

  13. Future Work
      • Solving path queries
      • Adding XML constructs for more complicated disjunctions, e.g. Tail (A or B), Head ((C and D) or E)
      • Determining frequency of disjunctive data in real-world data
      • Developing a normal form for disjunctive XML
        • Minimize redundancy
        • Minimize disjunctive tails
