Keyword Proximity Search on XML Graphs • Vagelis Hristidis, Yannis Papakonstantinou, Andrey Balmin @UCSD • Presenter: Feng Shao
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Introduction • Keyword search is easy to use • No need to know the structure or a query language • XML: a labeled graph representing semistructured, self-describing data • Feb. 10: 5th birthday of XML (from www.w3c.org)
Problem: keyword proximity query • Input: a set of keywords • Results: trees of XML fragments (called target objects) that contain all the keywords, ranked by size • Assumes a schema exists; the schema facilitates the presentation of results and is used to optimize system performance
Example results • name[John] - person - supplier - lineitem - linepart - product - descr[set of VCR and DVD], size 6 • name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR], size 8
Challenges • Presentation of result graphs: • Semantically meaningful • Avoid a huge number of trivial results • Providing fast response time • Efficient storage of data • On-demand execution, guided according to user’s navigation
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Semantics • XML graph: a labeled graph • Node v: id(v), label λ(v), value val(v) • Edges: containment and reference edges • Schema graph: a directed graph • Node vs: label λ(vs), content type type(vs) (all or choice) • Edge es: containment or reference, annotated with a maximum occurrence occ(es) • An XML graph conforms to a schema graph
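The definitions on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class names, the `conforms` helper, and the encoding of the schema as a dictionary from (source label, destination label, edge kind) to a maximum occurrence are all assumptions, and the content type (all or choice) is ignored.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EdgeKind(Enum):
    CONTAINMENT = "containment"
    REFERENCE = "reference"


@dataclass
class Node:
    node_id: str                 # id(v)
    label: str                   # λ(v)
    value: Optional[str] = None  # val(v)


@dataclass
class XMLGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, dst_id, EdgeKind)

    def add_node(self, node_id, label, value=None):
        self.nodes[node_id] = Node(node_id, label, value)

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))


def conforms(graph, schema_edges):
    """Hypothetical conformance check.

    schema_edges maps (src_label, dst_label, kind) to a maximum
    occurrence (None means unbounded). Content types are not checked.
    """
    counts = Counter()
    for src, dst, kind in graph.edges:
        key = (graph.nodes[src].label, graph.nodes[dst].label, kind)
        if key not in schema_edges:
            return False  # edge not allowed by the schema graph
        counts[(src, key)] += 1
    for (_, key), n in counts.items():
        max_occ = schema_edges[key]
        if max_occ is not None and n > max_occ:
            return False  # exceeds occ(es)
    return True
```

A person node with one name child conforms to a schema that allows at most one name per person; adding a second name child makes the check fail.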
[Figure: schema graph and XML graph]
Query semantics • Result: the set of all possible Minimal Total Target Object Networks (MTTONs) • What is an MTTON? • Node network j: a subgraph of G without cycles, such that each edge in j is an edge in G • Total node network j of keywords {k1, …, km}: a node network where every keyword is contained in at least one node n of j • Minimal Total Node Network (MTNN): a total node network j from which no node can be removed such that j is still a total node network. Score: number of edges • Target object of node n: a segment of the XML graph that is large enough to be meaningful and to semantically identify the node n, and otherwise as small as possible
MTTON (cont.) • Given an MTNN j with nodes n1, …, nn, there is a corresponding MTTON t, which is a tree whose • nodes are a minimal set of target objects {t1, …, tm} such that for every node nk ∈ j there is a tl ∈ t with target(nk) = tl • There is an edge from a target object ti to a target object tj if there is an edge (or a path) from a node that belongs to ti to a node that belongs to tj • The score of an MTTON t is the score of its corresponding MTNN • Example: MTNN: name - person - nation; MTNN: name
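The node-network definitions above can be checked by brute force on small examples. The sketch below is illustrative only: it assumes a node network is a connected tree (treated as undirected), that `node_keywords` maps each node id to the set of keywords it contains, and all function names are made up for this example.

```python
def is_tree(nodes, edges):
    """True if (nodes, edges) is a connected, cycle-free undirected graph."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a not in nodes or b not in nodes:
            return False
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(adj[n] - seen)
    return seen == nodes


def is_total(nodes, node_keywords, keywords):
    """True if every keyword is contained in at least one node."""
    found = set()
    for n in nodes:
        found |= node_keywords.get(n, set())
    return set(keywords) <= found


def is_mtnn(nodes, edges, node_keywords, keywords):
    """True if the network is total and no single node can be removed."""
    if not (is_tree(nodes, edges) and is_total(nodes, node_keywords, keywords)):
        return False
    for n in nodes:
        rest = set(nodes) - {n}
        rest_edges = [(a, b) for a, b in edges if n not in (a, b)]
        if rest and is_tree(rest, rest_edges) and is_total(rest, node_keywords, keywords):
            return False  # a smaller total node network exists
    return True


def score(edges):
    """Score of an MTNN: its number of edges."""
    return len(edges)
```

Removing an internal node disconnects the tree, so in practice only keyword-free leaves can make a total node network non-minimal.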
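The MTNN-to-MTTON correspondence can be sketched as collapsing each node into its target object. This is a simplified illustration: it assumes every MTNN node has a target object (so the "or a path" case never arises), and the `target` dictionary and the function name are hypothetical.

```python
def to_mtton(mtnn_nodes, mtnn_edges, target):
    """Collapse an MTNN into target objects.

    target maps each node id to its target object id. Edges between
    nodes of the same target object disappear; all others are kept
    between the corresponding target objects.
    """
    to_nodes = {target[n] for n in mtnn_nodes}
    to_edges = set()
    for a, b in mtnn_edges:
        ta, tb = target[a], target[b]
        if ta != tb:
            to_edges.add((ta, tb))
    return to_nodes, to_edges


def mtton_score(mtnn_edges):
    """The MTTON inherits the score of its MTNN (number of MTNN edges)."""
    return len(mtnn_edges)
```

For an MTNN name - person - nation where name and person belong to one target object and nation to another, the MTTON has two target objects and one edge, while its score stays 2.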
MTNN & MTTON • name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR]
Target object • Defined by an administrator using the Target Schema Segment (TSS) graph • TSS graph G_TSS: defined by a partial mapping of the nodes of the schema graph G • A node t_S is created in G_TSS for each set S = {s1, …, sw} of nodes of G that are mapped to t_S • An edge (t_S, t_S') is created in G_TSS if the schema graph has nodes s ∈ S and s' ∈ S' that are connected directly through an edge (s, s') or indirectly through a path of dummy schema nodes • Target decomposition: given the TSS graph, decompose the XML graph into target objects that are connected to each other
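The TSS graph construction can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: schema nodes missing from the mapping `tss_of` are treated as dummy nodes, a direct edge between two schema nodes of the same TSS is assumed not to create a TSS edge (while a path through dummy nodes back to the same TSS does), and the schema used in the example is invented.

```python
def build_tss_graph(schema_edges, tss_of):
    """Derive TSS nodes and edges from a schema graph.

    schema_edges: iterable of directed (s, s') schema edges.
    tss_of: partial mapping schema node -> TSS node name.
    """
    succ = {}
    for a, b in schema_edges:
        succ.setdefault(a, []).append(b)

    tss_edges = set()
    for s, t_s in tss_of.items():
        # Each stack entry: (schema node, reached through a dummy node?)
        stack = [(n, False) for n in succ.get(s, [])]
        seen = set()
        while stack:
            n, via_dummy = stack.pop()
            if (n, via_dummy) in seen:
                continue
            seen.add((n, via_dummy))
            if n in tss_of:
                # Assumption: skip direct edges inside one TSS.
                if tss_of[n] != t_s or via_dummy:
                    tss_edges.add((t_s, tss_of[n]))
            else:
                # Dummy node: keep following the path.
                stack.extend((m, True) for m in succ.get(n, []))
    return set(tss_of.values()), tss_edges
```

With a hypothetical schema in which `person` reaches `part` through a dummy node `supplies`, and `part` reaches itself through a dummy node `subpart`, the result contains the edges (Person, Part) and (Part, Part).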
Presentation Graph • Naïve method: multiple threads evaluate various plans for producing MTTONs and output results as they come • Pro: fast response time • Con: many trivial results • Interactive interface: allows navigation and hides the trivial results
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Load Stage • Keyword index entries: <TO_id, node_id, schema_node> • Statistics: the number of nodes of each type, etc. • A decomposition of the TSS graph into fragments, which correspond to connection relations that allow efficient retrieval of MTTONs • Given an object id, instantly return the whole target object
Query processing • Example keywords: TV, VCR • Keyword index entries: <TO_id, node_id, schema_node>
Execution Plan • [Figure: Candidate Network (from the schema graph and TSS graph) → Candidate TSS Network → (connection relations schema) → Execution Plan] • Inputs shown: TSS graph, connection relations, schema graph
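The keyword entries of the form <TO_id, node_id, schema_node> suggest an inverted index from keywords to node locations. The sketch below is a guess at the shape of such an index: the whitespace tokenization, the lowercase normalization, and both function names are assumptions.

```python
from collections import defaultdict


def build_keyword_index(nodes):
    """Build keyword -> set of (TO_id, node_id, schema_node).

    nodes: iterable of (to_id, node_id, schema_node, value) tuples,
    one per XML node that carries a text value.
    """
    index = defaultdict(set)
    for to_id, node_id, schema_node, value in nodes:
        for word in value.lower().split():
            index[word].add((to_id, node_id, schema_node))
    return index


def lookup(index, keyword):
    """Return the sorted entries for one keyword (empty if absent)."""
    return sorted(index.get(keyword.lower(), set()))
```

For the query "TV, VCR", each keyword is looked up separately, and the resulting schema nodes indicate which candidate networks are worth generating.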
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
XML Decomposition • Decompose the TSS graph into fragments • Determines how the connections are stored in the database • The choice of decomposition dramatically changes performance • Example: [figure]
Decomposition Tradeoff • Number of fragments vs. performance • Minimal decomposition • A fragment is built for each edge of the TSS graph • A candidate TSS network C of size S requires S-1 joins • Maximal decomposition • A fragment F is built for every possible candidate TSS network C • C requires zero joins • Not feasible in practice
Tradeoff (cont.) • Clustering and indexing are critical • Maximal decomposition: multi-attribute indices • Non-maximal decomposition: a connection relation R is clustered on the direction in which R is used • Example • Classify the TSS graph based on the storage redundancy in the corresponding connection relations • 4NF, inlined (non-MVD, non-4NF) • Decomposition algorithm • See paper
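The two extremes above can be placed on one scale with a small calculation. This sketch assumes a path-shaped candidate TSS network of S edges and assumes a fragment exists for every sub-path of up to `max_fragment_size` edges; neither assumption comes from the slides, and the function name is made up.

```python
import math


def joins_needed(size, max_fragment_size):
    """Joins to evaluate a path-shaped network of `size` edges.

    The path is covered greedily by ceil(size / max_fragment_size)
    fragments, and joining k fragments takes k - 1 joins.
    """
    if size < 1 or max_fragment_size < 1:
        raise ValueError("sizes must be positive")
    return math.ceil(size / max_fragment_size) - 1
```

With single-edge fragments this reduces to the S-1 joins of the minimal decomposition, and with a fragment as large as the network it gives the zero joins of the maximal decomposition.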
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Execution • Goal: fast response time • Web search engine-like presentation • Use the inlined decomposition • Use a thread pool • Use nested-loop joins • Example: outermost loop over TSS partVCR,name • Optimization: store partial results
Execution • Presentation graphs (on-demand) • Initially, the Xkeyword decomposition is used to retrieve the top result of each candidate network (CN) • Then a combination of decompositions is used to find the minimal connection of the expanded nodes
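Nested-loop joins with stored partial results can be sketched as follows. This is a generic illustration rather than the system's execution engine: connection relations are modeled as lists of (left_id, right_id) pairs forming a chain, partial results are memoized with `functools.lru_cache`, and all names are hypothetical.

```python
from functools import lru_cache


def evaluate_chain(start_ids, relations):
    """Join a chain of connection relations with nested loops.

    relations[i] is a list of (left_id, right_id) pairs linking level i
    to level i + 1. Returns all id paths starting from start_ids.
    """
    # One lookup table per relation: left_id -> list of right_ids.
    indexes = []
    for rel in relations:
        idx = {}
        for left, right in rel:
            idx.setdefault(left, []).append(right)
        indexes.append(idx)

    @lru_cache(maxsize=None)
    def suffixes(level, node_id):
        # Partial results: every path from node_id to the end of the
        # chain. Cached, so a shared inner node is expanded only once.
        if level == len(indexes):
            return ((node_id,),)
        out = []
        for nxt in indexes[level].get(node_id, []):
            for tail in suffixes(level + 1, nxt):
                out.append((node_id,) + tail)
        return tuple(out)

    results = []
    for start in start_ids:  # outermost loop
        results.extend(suffixes(0, start))
    return results
```

When two outer-loop values reach the same inner node, the second one reuses the cached suffixes instead of repeating the inner loops, which is the effect the "store partial results" optimization aims for.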
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Experiments • Measure various decompositions, for top-K and full results • Evaluate the performance of the algorithm for the search engine-like presentation method and the on-demand expansion method • Data: DBLP XML database, 2 keywords • Maximum size of CTSSN: M = 6 • Maximum size of fragments: L = 2
Execution algorithm Speedup = optimized algorithm / naïve, non-caching algorithm
Execution algorithm • Keyword queries: the names of two authors, k1 and k2 • Candidate network: Author[k1] - Paper - Author[k2] • Time measured: average time to expand a Paper node
Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion
Conclusion • Xkeyword is built on a relational database and, hence, can accommodate very large graphs • Presents keyword proximity search semantics, extended to capture the novel result presentation method • Presents an architecture that allows choosing which connections are precomputed • Addresses the on-demand performance requirement • Demo: http://www.db.ucsd.edu/Xkeyword