
Keyword Proximity Search on XML Graphs


Presentation Transcript


  1. Keyword Proximity Search on XML Graphs • Vagelis Hristidis, Yannis Papakonstantinou, Andrey Balmin @UCSD • Presenter: Feng Shao

  2. Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion

  3. Introduction • Keyword search is easy to use • No need to know the structure or a query language • XML: a labeled graph representing semistructured, self-describing data • Feb. 10: 5th birthday of XML (from www.w3c.org)

  4. Problem: keyword proximity query • Input: a set of keywords • Results: trees of XML fragments (called target objects) that contain all the keywords, ranked by size • Assumes a schema exists; the schema facilitates the presentation of the results and is used to optimize the performance of the system

  5. Name[John]personsupplierlineitemlinepartproductdescr[set of VCR and DVD] , size 6 Name[John]personsupplierlineitemlinepartpartsubpartpartname[VCR], size 8

  6. Challenges • Presentation of result graphs: • Semantically meaningful • Avoid a huge number of trivial results

  7. Challenges • Presentation of result graphs: • Semantically meaningful • Avoid a huge number of trivial results • Providing fast response time • Efficient storage of data • On-demand execution, guided according to user’s navigation

  8. Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion

  9. Semantics • XML graph: a labeled graph • Node v: id(v), label λ(v), value val(v) • Edges: containment and reference edges • Schema graph: a directed graph • Node vs: label λ(vs), content type type(vs) (all or choice) • Edge es: containment or reference, annotated with a maximum occurrence occ(es) • An XML graph conforms to a schema graph

  10. schema graph XML Graph

  11. Query semantics • Result: the set of all possible Minimal Total Target Object Networks (MTTONs) • What is an MTTON? • Node network j: an uncycled subgraph of G, such that each edge in j is an edge in G • Total node network j of keywords {k1,…,km}: a node network where every keyword is contained in at least one node n of j • Minimal Total Node Network (MTNN): a total node network j from which no node can be removed such that j is still a total node network. Score: number of edges • Target object of node n: a segment of the XML graph that is large enough to be meaningful and to semantically identify n, while being as small as possible
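The MTNN definition above can be checked mechanically on a small example. The following is a minimal sketch, not the authors' implementation: it assumes a node network is connected, ignores edge direction when testing for cycles, and uses made-up node ids (`name1`, `person1`, ...) and a hypothetical `keywords_of` mapping from node id to the keywords its value contains.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def is_tree(nodes: Set[str], edges: List[Edge]) -> bool:
    """True if the network is connected and cycle-free (directions ignored)."""
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        if a not in adj or b not in adj:
            return False
        adj[a].add(b)
        adj[b].add(a)
    seen: Set[str] = set()
    stack = [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes


def is_total(nodes: Set[str], keywords_of: Dict[str, Set[str]], query: Set[str]) -> bool:
    """True if every query keyword is contained in at least one node."""
    covered = set().union(*(keywords_of.get(n, set()) for n in nodes))
    return query <= covered


def is_mtnn(nodes: Set[str], edges: List[Edge],
            keywords_of: Dict[str, Set[str]], query: Set[str]) -> bool:
    """Total node network from which no single node can be removed."""
    if not (is_tree(nodes, edges) and is_total(nodes, keywords_of, query)):
        return False
    for n in nodes:
        rest = nodes - {n}
        rest_edges = [e for e in edges if n not in e]
        if is_tree(rest, rest_edges) and is_total(rest, keywords_of, query):
            return False  # n was redundant, so the network is not minimal
    return True


# Hypothetical data: two name nodes carry the keywords.
keywords_of = {"name1": {"john"}, "name2": {"vcr"}}
query = {"john", "vcr"}
nodes = {"name1", "person1", "part1", "name2"}
edges = [("person1", "name1"), ("person1", "part1"), ("part1", "name2")]

minimal = is_mtnn(nodes, edges, keywords_of, query)
score = len(edges)  # score = number of edges
padded = is_mtnn(nodes | {"nation1"}, edges + [("person1", "nation1")],
                 keywords_of, query)
```

Here `minimal` holds for the four-node path, while `padded` does not, because the extra `nation1` leaf carries no keyword and can be removed.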

  12. MTTON (cont.) • Given an MTNN j with nodes v1, …, vn, there is a corresponding MTTON t, which is a tree whose nodes are a minimal set of target objects {t1, …, tm} such that for every node vk ∈ j there is a tl ∈ t with target(vk) = tl • There is an edge from a target object ti to a target object tj if there is an edge (or a path) from a node that belongs to ti to a node that belongs to tj • The score of an MTTON t is the score of its corresponding MTNN • MTNN: name - person - nation • MTNN: name
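The mapping from an MTNN to its MTTON amounts to replacing each node by its target object and merging duplicates. A minimal sketch follows, assuming every MTNN node has a target object and using hypothetical ids such as `Person#1`; the function name `to_mtton` is illustrative and does not come from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def to_mtton(mtnn_nodes: Iterable[str],
             mtnn_edges: List[Tuple[str, str]],
             target_of: Dict[str, str]) -> Tuple[Set[str], Set[Tuple[str, str]], int]:
    """Collapse MTNN nodes into target objects.

    Returns (target objects, edges between distinct target objects, score).
    The score is inherited from the MTNN: its number of edges.
    """
    objects = {target_of[n] for n in mtnn_nodes}
    edges = {
        (target_of[a], target_of[b])
        for a, b in mtnn_edges
        if target_of[a] != target_of[b]  # edges inside one target object vanish
    }
    return objects, edges, len(mtnn_edges)


# Hypothetical example: name1 and person1 belong to the same target object.
target_of = {"name1": "Person#1", "person1": "Person#1", "supplier1": "Supplier#7"}
objects, to_edges, mtton_score = to_mtton(
    ["name1", "person1", "supplier1"],
    [("person1", "name1"), ("supplier1", "person1")],
    target_of,
)
```

In this example three MTNN nodes collapse to two target objects joined by one edge, while the score stays at 2, the edge count of the MTNN.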

  13. MTNN & MTTON • Name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR]

  14. Target object • Defined by an administrator using the Target Schema Segment (TSS) graph • TSS graph: a partial mapping of nodes in G • A node tS is created in GTSS for each set S = {s1, …, sw} of nodes of G that are mapped to tS • An edge (tS, tS') is created in GTSS if the schema graph has nodes s ∈ S and s' ∈ S' that are connected directly through an edge (s, s') or indirectly through a path of dummy schema nodes • Target decomposition: given the TSS graph, decompose the XML graph into target objects that are connected to each other
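The edge rule for the TSS graph can be sketched as a walk over the schema graph that passes through unmapped (dummy) nodes only. This is an illustrative sketch under two assumptions that the slide does not state: a schema node is treated as a dummy exactly when it is missing from the mapping, and pairs that land in the same TSS node are skipped. The schema node names and the `build_tss_edges` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_tss_edges(schema_edges: Iterable[Tuple[str, str]],
                    tss_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Derive TSS graph edges from directed schema edges.

    tss_of is the partial mapping schema node -> TSS node; schema nodes
    absent from it are treated as dummy nodes (assumption).
    """
    succ = defaultdict(set)
    for a, b in schema_edges:
        succ[a].add(b)

    tss_edges: Set[Tuple[str, str]] = set()
    for s, t in tss_of.items():
        stack = list(succ[s])
        seen: Set[str] = set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in tss_of:
                if tss_of[n] != t:  # assumption: no edge within one TSS node
                    tss_edges.add((t, tss_of[n]))
            else:
                stack.extend(succ[n])  # dummy node: keep following the path
    return tss_edges


# Hypothetical schema: supplier_ref is an unmapped (dummy) node.
schema_edges = [
    ("lineitem", "supplier_ref"),
    ("supplier_ref", "supplier"),
    ("supplier", "name"),
]
tss_of = {"lineitem": "Lineitem", "supplier": "Supplier", "name": "Supplier"}
tss_edges = build_tss_edges(schema_edges, tss_of)
```

With this input the only TSS edge is `("Lineitem", "Supplier")`: the path through the dummy `supplier_ref` connects the two segments, and the `supplier` to `name` edge stays inside the `Supplier` segment.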

  15. Example

  16. MTNN & MTTON • Name[John] - person - supplier - lineitem - linepart - part - subpart - part - name[VCR]

  17. Presentation Graph • Naïve method: multiple threads evaluate various plans for producing MTTONs and output results as they arrive • Pro: fast response time • Con: many trivial results • Interactive interface: allows navigation and hides the trivial results

  18. Presentation Graph

  19. Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion

  20. Architecture

  21. Load Stage • Keyword: <TO_id, node_id, schema_node> • The number of nodes of each type, etc. • A decomposition of the TSS graph into fragments, which correspond to connection relations that allow efficient retrieval of MTTONs • Given an object id, instantly return the whole target object
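The keyword index built in the load stage maps each keyword to triples of the form <TO_id, node_id, schema_node>. A minimal in-memory sketch is shown below; the system described here stores this in a relational database, so the dictionary, the whitespace tokenization, and all ids are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Posting(NamedTuple):
    to_id: str        # id of the target object containing the node
    node_id: str      # id of the XML node whose value contains the keyword
    schema_node: str  # schema node (type) of that XML node


def build_keyword_index(
    nodes: Iterable[Tuple[str, str, str, str]]
) -> Dict[str, List[Posting]]:
    """nodes: (to_id, node_id, schema_node, text value) tuples."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for to_id, node_id, schema_node, text in nodes:
        # Assumption: lowercase whitespace tokenization, one posting per keyword.
        for keyword in set(text.lower().split()):
            index[keyword].append(Posting(to_id, node_id, schema_node))
    return index


# Hypothetical data.
keyword_index = build_keyword_index([
    ("Part#3", "n17", "name", "VCR"),
    ("Product#9", "n42", "descr", "set of VCR and DVD"),
    ("Part#5", "n23", "name", "TV"),
])
vcr_postings = keyword_index["vcr"]
```

A query such as "TV, VCR" would start from `keyword_index["tv"]` and `keyword_index["vcr"]`, which give the schema nodes and target objects where each keyword occurs.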

  22. Example of decomposition

  23. Query processing • Keywords: TV, VCR • Keyword: <TO_id, node_id, schema_node>

  24. Execution Plan • Figure: Candidate Network → Candidate TSS Network → Execution Plan • Inputs shown: schema graph, TSS graph, connection relations schema

  25. Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion

  26. XML Decomposition • Decompose the TSS graph into fragments • Determines how the connections are stored in the database • Dramatically changes performance • Example: (figure)

  27. Decomposition Tradeoff • Number of fragments vs. performance • Minimal decomposition • A fragment is built for each edge of the TSS graph • A candidate TSS network C of size S requires S-1 joins • Maximal decomposition • A fragment F is built for every possible candidate TSS network C • C requires zero joins • Not feasible in practice
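The two extremes above bracket a simple count. As a rough sketch, assume a path-shaped candidate TSS network with `size` edges and a decomposition that stores every path fragment of up to `max_fragment_size` edges; both assumptions, and the function name, are illustrative and not taken from the paper.

```python
import math


def joins_needed(size: int, max_fragment_size: int) -> int:
    """Joins required to evaluate a path-shaped candidate TSS network.

    size: number of edges of the network (S on the slide).
    max_fragment_size: largest number of edges in a stored fragment.
    Assumes every path fragment up to that size is available.
    """
    if size < 1 or max_fragment_size < 1:
        raise ValueError("size and max_fragment_size must be positive")
    fragments = math.ceil(size / max_fragment_size)  # fragments covering the path
    return fragments - 1                             # n relations need n-1 joins


minimal_joins = joins_needed(6, 1)   # one fragment per edge: S-1 joins
middle_joins = joins_needed(6, 2)    # fragments of two edges
maximal_joins = joins_needed(6, 6)   # one fragment covers the whole network
```

With single-edge fragments the count reduces to S-1, and once a fragment covers the whole network it drops to zero, matching the minimal and maximal decompositions on the slide.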

  28. Tradeoff (cont.) • Clustering and indexing are critical • Maximal decomposition: multi-attribute indices • Non-maximal decomposition: a connection relation R is clustered on the direction in which R is used • Example • Classify the TSS graph based on the storage redundancy in the corresponding connection relations • 4NF, inlined (non-MVD, non-4NF) • Decomposition algorithm • See paper

  29. Outline • Introduction • Proximity Keyword Query Semantics • Architecture • XML Decompositions • Execution • Experiment • Conclusion

  30. Execution • Goal: fast response time • Web search engine-like presentation • Use inlined decomposition • Use a thread pool • Use nested-loop joins • Example: outermost loop over TSS partVCR,name • Optimization: store partial results
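The combination of nested-loop joins and stored partial results can be illustrated with a small generator. This is a sketch under stated assumptions rather than the system's algorithm: connection relations are modeled as plain lookup functions (`probe_ab`, `probe_bc`), the cache is an in-memory dictionary, and all ids are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def nested_loop_join(
    outer: Iterable[str],
    probe_ab: Callable[[str], Iterable[str]],
    probe_bc: Callable[[str], Iterable[str]],
) -> Iterator[Tuple[str, str, str]]:
    """Join A-B-C with nested loops, caching inner lookups per B id.

    Results are yielded as soon as they are found, so a caller that
    wants only the top few can stop iterating early.
    """
    cache: Dict[str, List[str]] = {}
    for a in outer:                      # outermost loop
        for b in probe_ab(a):
            if b not in cache:           # reuse partial results for repeated b
                cache[b] = list(probe_bc(b))
            for c in cache[b]:
                yield (a, b, c)


# Hypothetical connection relations stored as dictionaries.
rel_ab = {"a1": ["b1"], "a2": ["b1"]}
rel_bc = {"b1": ["c1", "c2"]}
inner_probes: List[str] = []  # records each uncached inner lookup


def probe_bc(b: str) -> List[str]:
    inner_probes.append(b)
    return rel_bc.get(b, [])


results = list(nested_loop_join(["a1", "a2"], lambda a: rel_ab.get(a, []), probe_bc))
```

Both outer rows reach `b1`, but `inner_probes` records a single lookup, which is the saving that storing partial results is meant to provide.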

  31. Execution • Presentation graphs (on-demand) • Initially, the XKeyword decomposition is used to retrieve the top result of each CN • Then a combination of decompositions is used to find the minimal connection of the expanded nodes

  32. Outline • Introduction • Architecture • Proximity Keyword Query Semantics • XML Decompositions • Execution • Experiment • Conclusion

  33. Experiments • Measure various decompositions, for top-K and full results • Evaluate the performance of the algorithm for the search engine-like presentation method and the on-demand expansion method • Data: DBLP XML database, 2 keywords • Maximum size of CTSSN: M = 6 • Maximum size of fragments: L = 2

  34. Decompositions

  35. Execution algorithm Speedup = optimized algorithm / naïve, non-caching algorithm

  36. Execution algorithm • Keyword queries: the names of two authors, k1 and k2 • Candidate Network: Author(k1) - Paper - Author(k2) • Time measured: average time to expand a Paper node

  37. Outline • Introduction • Architecture • Proximity Keyword Query Semantics • XML Decompositions • Execution • Experiment • Conclusion

  38. Conclusion • XKeyword is built on a relational database and, hence, can accommodate very large graphs • Presents keyword proximity search semantics, extended to capture the novel result presentation method • Presents an architecture that allows choosing which connections will be precomputed • Addresses the on-demand performance requirement • Demo: http://www.db.ucsd.edu/Xkeyword
