
Effective Keyword Search for Valuable LCAs over XML Documents

Guoliang Li, Jianhua Feng, Jianyong Wang, Lizhu Zhou. Lin Shao, XML und Datenbanksysteme.



  1. Effective Keyword Search for Valuable LCAs over XML Documents • Guoliang Li, Jianhua Feng, Jianyong Wang, Lizhu Zhou • Lin Shao • XML und Datenbanksysteme

  2. Content • Introduction • Background and Motivation • Valuable LCA • Meaningful Dewey Code (MDC) • The Stack-Based Algorithm • Experimental Study • Conclusion

  3. Introduction • Existing proposals for keyword search over XML databases suffer from two problems • Meaningfulness and completeness of answers, and the scope of the search • The answer to a keyword search should not be limited to the LCAs of the keywords

  4. Introduction • To solve these problems, the authors propose • Valuable LCA (VLCA) • Compact VLCA (CVLCA) • An efficient stack-based algorithm

  5. Background and Motivation • Notations • u ≺ v: u is an ancestor of node v • u < v: u precedes v in the XML document • u ⪯ v: u ≺ v or u = v

  6. Background and Motivation • Notations • u ≺ v: u is an ancestor of node v • u < v: u precedes v in the XML document • u ⪯ v: u ≺ v or u = v • For example • conf(2) ≺ paper(15) • paper(15) ⪯ author(17) • title(6) < author(17)

  7. Background and Motivation • Example: false positive problem of LCA • Search for: {“IR”, “Tom”}

  8. Background and Motivation • Example: false positive problem of LCA • Search for: {“IR”, “Tom”} • False answer: conf(2) • Solutions • Meaningful LCA (MLCA) • Smallest LCA (SLCA) • XRank

  9. Background and Motivation • Example: false negative problem of SLCA • Search for: {“XML”, “Bob”}

  10. Background and Motivation • Example: false negative problem of SLCA • Search for: {“XML”, “Bob”} • paper(5) will not be in the SLCA set

  11. Background and Motivation • Example: false positive problem of SLCA • Search for: {“XML”, “John”}

  12. Background and Motivation • Example: false positive problem of SLCA • Search for: {“XML”, “John”} • False answer: conf(2)

  13. Content • Introduction • Background and Motivation • Valuable LCA • Meaningful Dewey Code (MDC) • The Stack-Based Algorithm • Experimental Study • Conclusion

  14. Valuable LCA • Based on the homogenous / heterogenous concept • Given two nodes u, v and w = LCA(u, v), uSet and vSet are the sets of nodes on the paths from w to u and from w to v, respectively • If u and v have the same elementary type, they are homogenous (denoted u ~ v)

  15. Valuable LCA • Avoids the false positives and false negatives introduced by SLCA • Definition: Given m nodes n1, n2, …, nm and v = LCA(n1, n2, …, nm), VLCA(n1, n2, …, nm) = v iff these m nodes are homogenous, that is, ∀ 1 ≤ i < j ≤ m, ni ~ nj.

  16. Valuable LCA • Example: heterogenous / homogenous • Search for: {“XML”, “John”}

  17. Valuable LCA • Example: heterogenous / homogenous • Search for: {“XML”, “John”} • conf(2): heterogenous • paper(23): homogenous

  18. Content • Introduction • Background and Motivation • Valuable LCA • Meaningful Dewey Code (MDC) • The Stack-Based Algorithm • Experimental Study • Conclusion

  19. Meaningful Dewey Code (MDC) • Novel numbering scheme • Inspired by Dewey code • Numbers/encodes the nodes based on the corresponding DTD • Ancestors and elementary types can be deduced from the code
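
The "ancestors" bullet relies on a general property of Dewey-style codes: a node's code extends its parent's code, so an ancestor test is a prefix test and the LCA is the longest common prefix. A minimal Python sketch, assuming codes are dot-separated integers as on the later slides; the helper names `parse`, `is_ancestor` and `lca` are illustrative and do not come from the slides.

```python
def parse(code):
    """Turn a dotted code such as '0.6.4' into a tuple of ints."""
    return tuple(int(part) for part in code.split("."))


def is_ancestor(u, v):
    """u is a proper ancestor of v iff u's code is a proper prefix of v's."""
    pu, pv = parse(u), parse(v)
    return len(pu) < len(pv) and pv[:len(pu)] == pu


def lca(u, v):
    """Longest common prefix of two codes; '' stands for the root (ε)."""
    common = []
    for a, b in zip(parse(u), parse(v)):
        if a != b:
            break
        common.append(a)
    return ".".join(str(c) for c in common)
```

For instance, `lca("0.2.0", "0.6.4")` is `"0"`, the conf node that contains both papers.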

  20. Meaningful Dewey Code (MDC)
      <!ELEMENT bib (conf)*>
      <!ELEMENT conf (name,year,paper*,chair)>
      <!ELEMENT paper (title,author+,bib?)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
      <!ELEMENT chair (#PCDATA)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>

  21. Meaningful Dewey Code (MDC) • ε: root element • Cn: MDC of the node n • On: ordered number of the node n • To encode a node: • Example: author(0.2.1)

  22. Meaningful Dewey Code (MDC) • k: the k-th label • m: number of children of parent(n) in the DTD

  23. Meaningful Dewey Code (MDC) • MDC example • Given MDC = 0.6.1 • Level 0 (root) = bib, m = 1 • Level 1 = conf, m = 4 • Level 2 = paper, m = 3
      <!ELEMENT bib (conf)*>
      <!ELEMENT conf (name,year,paper*,chair)>
      <!ELEMENT paper (title,author+,bib?)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
      <!ELEMENT chair (#PCDATA)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
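
The example can be replayed in code. The encoding formula itself did not survive in the transcript, so the sketch below rests on an assumption inferred from the examples (author(0.2.1), title(0.2.0), author(0.6.4)): each code component, taken modulo m, gives the 0-based position k of the element type among the parent's children in the DTD. `DTD_CHILDREN` and `decode_mdc` are hypothetical names.

```python
# Child element types per parent, in DTD order (from the DTD on the slide).
DTD_CHILDREN = {
    "bib": ["conf"],
    "conf": ["name", "year", "paper", "chair"],
    "paper": ["title", "author", "bib"],
}


def decode_mdc(code, root="bib"):
    """Return the element types along the path described by an MDC.

    Assumption (not stated on the slides): component % m selects the
    k-th child type of the parent, where m is the number of child types.
    """
    labels = []
    parent = root
    for part in code.split("."):
        children = DTD_CHILDREN[parent]  # KeyError if parent is a leaf type
        label = children[int(part) % len(children)]
        labels.append(label)
        parent = label
    return labels
```

Under that assumption, `decode_mdc("0.6.1")` walks bib (m = 1), conf (m = 4) and paper (m = 3) and ends at an author node, consistent with the levels listed on the slide.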

  24. Meaningful Dewey Code (MDC) • To check whether nodes are homogenous or heterogenous • If u and v have the same elementary type, λ(u) = λ(v) ⇒ |{λ(u) ∩ λ(v)}| = 1 • Heterogenous if |wSet| − |{λ(u) ∩ λ(v)}| > |lSet|, where wSet = uSet ∪ vSet and lSet = {λ(x) | x ∈ wSet} • Check u(0.2.0) and v(0.6.4) • wSet = {conf(0), paper(0.2), title(0.2.0), paper(0.6), author(0.6.4)} • |wSet| = 5, |lSet| = 4, and |{λ(u) ∩ λ(v)}| = 0, so 5 − 0 > 4 and u, v are heterogenous
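
The check can be sketched in Python as follows. This is an illustration under two assumptions: labels are decoded from MDC components modulo the number of child types (inferred from the slide examples, not stated), and wSet includes the LCA itself, as in the worked example where conf(0) is listed. All function names are hypothetical.

```python
DTD_CHILDREN = {
    "bib": ["conf"],
    "conf": ["name", "year", "paper", "chair"],
    "paper": ["title", "author", "bib"],
}


def parse(code):
    return tuple(int(part) for part in code.split("."))


def label_of(node, root="bib"):
    """Element type of a node given as a tuple of MDC components."""
    label = root
    for component in node:
        children = DTD_CHILDREN[label]
        label = children[component % len(children)]  # assumed decoding rule
    return label


def is_heterogenous(u_code, v_code):
    u, v = parse(u_code), parse(v_code)
    # Length of the common prefix = depth of the LCA.
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    # wSet: nodes on the paths LCA..u and LCA..v (LCA included, as on the slide).
    w_set = {u[:i] for i in range(k, len(u) + 1)}
    w_set |= {v[:i] for i in range(k, len(v) + 1)}
    l_set = {label_of(node) for node in w_set}
    same_type = 1 if label_of(u) == label_of(v) else 0
    return len(w_set) - same_type > len(l_set)
```

With the slide's nodes, `is_heterogenous("0.2.0", "0.6.4")` evaluates 5 − 0 > 4 and returns `True`, while a title and an author of the same paper, such as `"1.2.0"` and `"1.2.1"`, give 3 − 0 > 3 and return `False`.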

  25. Content • Introduction • Background and Motivation • Valuable LCA • Meaningful Dewey Code (MDC) • The Stack-Based Algorithm • Experimental Study • Conclusion

  26. The Stack-Based Algorithm • VLCAStack to improve the search efficiency • Algorithm for structure join and twig join • Different from the existing studies (CVLCA)

  27. The Stack-Based Algorithm • Compact VLCA (CVLCA) • Is more compact than VLCA • Answer is more meaningful • Connected subtree rooted at the CVLCA • Idea behind the compact connected tree • If node v is in a compact connected tree, it will not be in another, looser one that contains other, irrelevant nodes

  28. The Stack-Based Algorithm • Compact VLCA vs. SLCA • Example: false negative problem of SLCA • Search for: {“XML”, “Bob”}

  29. The Stack-Based Algorithm • Compact VLCA vs. SLCA • Example: false negative problem of SLCA • Search for: {“XML”, “Bob”} • SLCAset = {paper(12)} • CVLCAset = {paper(5), paper(12)}

  30. The Stack-Based Algorithm • VLCAStack • Input Elements are sorted in order by their MDCs • VLCAStack maintains another stack to preserve current LCAs
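
The sorted-input requirement can be illustrated with a small merge. This is only a sketch of how the next smallest node (nMin in the trace that follows) could be drawn from per-keyword lists, not the VLCAStack algorithm itself. The two keyword lists are hypothetical, chosen to match the nMin values shown on the next slides; `mdc_key` and `nodes_in_mdc_order` are illustrative names. Codes are compared as integer tuples, because a plain string comparison would order "0.10" before "0.2".

```python
import heapq


def mdc_key(code):
    """Sort key: compare MDCs component-wise as integers."""
    return tuple(int(part) for part in code.split("."))


# Hypothetical per-keyword node lists, each already sorted by MDC.
keyword_lists = {
    "XML": ["0.2.0", "1.2.0"],
    "John": ["0.6.4", "1.2.1"],
}


def nodes_in_mdc_order(lists):
    """Merge the sorted keyword lists into one stream in MDC order."""
    return list(heapq.merge(*lists.values(), key=mdc_key))
```

For the lists above the merged order is 0.2.0, 0.6.4, 1.2.0, 1.2.1.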

  31. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA is empty • nMin = 0.2.0

  32. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA = 0.2.0 • nMin = 0.6.4

  33. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA = 0.6.4 • nMin = 1.2.0

  34. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA = 0 • nMin = 1.2.0

  35. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA = 1.2.0 • nMin = 1.2.1

  36. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • sVLCA = 1.2 • nMin is empty

  37. The Stack-Based Algorithm • Example: Search for = {“XML”, “John”} • Answer of the keyword query = {(paper(1.2);title:XML(1.2.0);author:John(1.2.1))}
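
The final answer can be cross-checked with a brute-force enumeration, which is a different technique from the stack-based algorithm traced above: try every combination of one node per keyword, keep the combinations whose nodes are pairwise homogenous, and report their LCA. The sketch assumes the same hypothetical keyword lists as the trace suggests, the modulo decoding rule inferred from the MDC examples, and a wSet that includes the LCA; all names are illustrative.

```python
from itertools import combinations, product

DTD_CHILDREN = {
    "bib": ["conf"],
    "conf": ["name", "year", "paper", "chair"],
    "paper": ["title", "author", "bib"],
}


def parse(code):
    return tuple(int(part) for part in code.split("."))


def label_of(node, root="bib"):
    label = root
    for component in node:
        children = DTD_CHILDREN[label]
        label = children[component % len(children)]  # assumed decoding rule
    return label


def common_prefix(nodes):
    """Longest common prefix of several codes, i.e. the code of their LCA."""
    first, k = nodes[0], 0
    while all(len(n) > k for n in nodes) and all(n[k] == first[k] for n in nodes):
        k += 1
    return first[:k]


def heterogenous(u, v):
    k = len(common_prefix([u, v]))
    w_set = {u[:i] for i in range(k, len(u) + 1)}
    w_set |= {v[:i] for i in range(k, len(v) + 1)}
    l_set = {label_of(n) for n in w_set}
    same_type = 1 if label_of(u) == label_of(v) else 0
    return len(w_set) - same_type > len(l_set)


def brute_force_vlca(keyword_lists):
    """LCAs of all pairwise-homogenous combinations, one node per keyword."""
    answers = set()
    for combo in product(*keyword_lists):
        nodes = [parse(code) for code in combo]
        if all(not heterogenous(a, b) for a, b in combinations(nodes, 2)):
            answers.add(".".join(str(c) for c in common_prefix(nodes)))
    return answers
```

Called with `[["0.2.0", "1.2.0"], ["0.6.4", "1.2.1"]]`, the only surviving combination is title(1.2.0) with author(1.2.1), so the result is `{"1.2"}`, the paper node reported on the slide.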

  38. Content • Introduction • Background and Motivation • Valuable LCA • Meaningful Dewey Code (MDC) • The Stack-Based Algorithm • Experimental Study • Conclusion

  39. Experimental Study • Efficiency and Effectiveness Test • Datasets • Real Dataset: DBLP, SIGMOD Record, TreeBank • Synthetic Dataset: XMark • Tested Methods • Brute-Force • XSEarch • SLCA • GDMCT

  40. Experimental Study • Efficiency

  41. Experimental Study • Effectiveness • Precision • Recall • F-measure
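
The slide only names the three measures. For reference, their standard definitions are given below, with A the set of answers returned by a method and R the set of relevant answers (both symbols are introduced here for illustration). The balanced form of the F-measure is an assumption; the slide does not say which variant was used.

```latex
\[
\text{Precision} = \frac{|A \cap R|}{|A|}, \qquad
\text{Recall} = \frac{|A \cap R|}{|R|}, \qquad
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
```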

  42. Conclusion • Demonstrated the problems of keyword search over XML documents • Proposed VLCA and CVLCA to obtain meaningful results for keyword queries • Presented an optimization technique to compute CVLCAs and devised an efficient stack-based algorithm to identify meaningful compact connected trees

  43. Thank you for your attention
