Converting Disjunctive Data to Disjunctive Graphs
Lars Olson
Data Extraction Group
Funded by NSF
Introduction
• Disjunctive databases
  • Needed to represent disjunctive data
  • Queries are coNP-complete in general [Imielinski and Vadaparty, 1989]
• Transitive closure in disjunctive graphs
  • coNP-complete in general
  • Polynomial time under certain circumstances [Lobo et al., 1995]
The Problem
• How do we convert the data into a disjunctive graph?
• What is the complexity of the conversion?
  • Time
  • Space / memory
Implementation
• XML data repository
  • Shore / Niagara (Univ. of Wisconsin)
  • Xerces XML parser (Apache.org)
• How do we represent a disjunctive database in storage?
  • Needs to be easy to convert to a disjunctive graph
  • Needs to minimize changes to the DTD and thus to the existing data
XML → Graph Conversion
• XML → DOM tree
    <doc>
      <Node name="A">
        <EdgeTo ref="B"/>
      </Node>
      <Node name="B"></Node>
      ...
    </doc>
  [Figure: DOM tree rooted at doc with Node and EdgeTo elements, and the resulting graph on nodes A and B]
• Use primary key to distinguish doc→Node edges
• Use foreign key to perform join (EdgeTo.ref = Node.name)
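The conversion above can be sketched in a few lines of Python. This is only an illustration: the slides name Xerces and Shore / Niagara, whereas the sketch swaps in the standard-library `xml.etree.ElementTree` parser, and the function name `xml_to_graph` and the set-of-pairs edge representation are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide (the elided "..." records are left out).
XML = """<doc>
  <Node name="A"><EdgeTo ref="B"/></Node>
  <Node name="B"></Node>
</doc>"""

def xml_to_graph(xml_text):
    """Hypothetical helper: build (nodes, edges) from the XML above."""
    root = ET.fromstring(xml_text)
    # Primary key: Node.name distinguishes the records under <doc>.
    nodes = {node.get("name") for node in root.findall("Node")}
    edges = set()
    for node in root.findall("Node"):
        for edge_to in node.findall("EdgeTo"):
            ref = edge_to.get("ref")
            # Foreign-key join: keep the edge only if EdgeTo.ref = Node.name.
            if ref in nodes:
                edges.add((node.get("name"), ref))
    return nodes, edges

nodes, edges = xml_to_graph(XML)
```

Here `nodes` holds the primary-key values and `edges` holds one (tail, head) pair per resolved foreign key.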
Disjunctions in XML, 1st Case
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    ...
[Figure: graph on nodes A, B, C, D]
…but how do we represent a disjunctive tail?
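Setting the disjunctive tail aside for a moment, the head-side disjunction of this first case can be sketched as follows. The representation is an assumption made for illustration: each edge is a pair (tail, set of heads), so a plain `<EdgeTo>` yields a one-element head set and a `<Disj>` yields a single edge whose head set contains all of its children. The function name `node_edges` is hypothetical.

```python
import xml.etree.ElementTree as ET

def node_edges(node):
    """Hypothetical helper: edges leaving one <Node>, as (tail, heads) pairs."""
    tail = node.get("name")
    edges = []
    for child in node:
        if child.tag == "EdgeTo":
            # Ordinary edge: a head set with exactly one member.
            edges.append((tail, frozenset([child.get("ref")])))
        elif child.tag == "Disj":
            # Disjunctive head: one edge covering every child of <Disj>.
            heads = frozenset(e.get("ref") for e in child.findall("EdgeTo"))
            edges.append((tail, heads))
    return edges

node_a = ET.fromstring("""<Node name="A">
  <EdgeTo ref="B"/>
  <Disj>
    <EdgeTo ref="C"/>
    <EdgeTo ref="D"/>
  </Disj>
</Node>""")

edges_a = node_edges(node_a)
```

For the slide's example, `edges_a` contains the ordinary edge from A to B and one disjunctive edge from A to the set {C, D}.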
Disjunctions in XML, 1st Case (continued)
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    <Disj>
      <Node name="E">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
      <Node name="F">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
    </Disj>
    ...
[Figure: graph on nodes E, F, G, H, and the DOM tree rooted at doc]
or…
Disjunctions in XML, 2nd Case
    <Disj>
      <Tail>
        <Node name="E"/>
        <Node name="F"/>
      </Tail>
      <Head>
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Head>
    </Disj>
    ...
[Figure: graph on nodes E, F, G, H]
What if the disjunction isn't the full cross-product?
Disjunctions in XML, 3rd Case
    <Disj>
      <Tail>
        <Node name="I"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
      </Head>
      <Tail>
        <Node name="J"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
        <EdgeTo ref="L"/>
      </Head>
    </Disj>
    ...
[Figure: graph on nodes I, J, K, L]
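The second and third cases differ only in how many Tail/Head pairs a `<Disj>` contains, so one routine can cover both. The sketch below is an assumption-laden illustration, not the implementation behind the slides: it represents each edge as a (set of tails, set of heads) pair, assumes every `<Head>` directly follows its `<Tail>`, and uses the hypothetical name `disj_edges`.

```python
import xml.etree.ElementTree as ET

def disj_edges(disj):
    """Hypothetical helper: one (tails, heads) edge per Tail/Head pair."""
    edges = []
    tails = None
    for child in disj:
        if child.tag == "Tail":
            # Collect the tail nodes of the pair being read.
            tails = frozenset(n.get("name") for n in child.findall("Node"))
        elif child.tag == "Head" and tails is not None:
            # Close the pair: add a single edge from the tails to the heads.
            heads = frozenset(e.get("ref") for e in child.findall("EdgeTo"))
            edges.append((tails, heads))
            tails = None
    return edges

# 2nd case: full cross-product, a single Tail/Head pair.
second_case = ET.fromstring("""<Disj>
  <Tail><Node name="E"/><Node name="F"/></Tail>
  <Head><EdgeTo ref="G"/><EdgeTo ref="H"/></Head>
</Disj>""")

# 3rd case: partial cross-product, two Tail/Head pairs.
third_case = ET.fromstring("""<Disj>
  <Tail><Node name="I"/></Tail>
  <Head><EdgeTo ref="K"/></Head>
  <Tail><Node name="J"/></Tail>
  <Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
</Disj>""")

full = disj_edges(second_case)
partial = disj_edges(third_case)
```

Under these assumptions the full cross-product produces one edge from {E, F} to {G, H}, while the partial cross-product produces two edges, from {I} to {K} and from {J} to {K, L}, so the pair I, L is never connected.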
Time and Space Complexity
• n = # of nodes in DOM tree
  • counts edges as well
  • not necessarily proportional to # of values in the database
• Ordinary XML: traverse tree, add edges. Distinguish records with primary keys, add edges for foreign keys. O(n) time, O(n) space.
Time and Space Complexity (continued)
• <Disj>: same as ordinary XML, except that a single edge covers all children. O(n) time, O(n) space.
• <Disj> with <Tail> and <Head>: traverse tree, add <Tail> and <Head> elements to a list, add one edge, repeat for each Tail/Head pair. O(n) time, O(n) space.
Summary
• We need to introduce new XML constructs:
  • <Disj>
  • Helper constructs <Tail> and <Head>
• Three cases
  • simple tail, compound head
  • full cross-product
  • partial cross-product
• Time and space requirements are consistent with the transitive closure algorithm
Future Work
• Solving path queries
• Adding XML constructs for more complicated disjunctions, e.g. Tail (A or B), Head ((C and D) or E)
• Determining frequency of disjunctive data in real-world data
• Developing a normal form for disjunctive XML
  • Minimize redundancy
  • Minimize disjunctive tails