
Processing XML Keyword Search by Constructing Effective Structured Queries

Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning. Swinburne University of Technology, Australia.


Presentation Transcript


  1. Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology, Australia

  2. Outline • Motivation of Keyword Search in XML • Brief Review of Related Work • Existing Problems • Construct Structured Query Templates • Ranking Function • Processing Algorithms • Conclusions

  3. Motivation of XML Keyword Search • Keyword search is easy to use. • Users don’t need to know the structure of the XML data or a specific query language. • XML data with different structures can be searched uniformly by one keyword query because the query doesn’t specify the structure of the retrieved results.

  4. Brief Review of Related Work • We focus on four references that use labels and terms as the keyword query format: • [YunyaoLi2004VLDB] Schema-Free XQuery. • [DanielaFlorescu2002ComputerNetworks] Integrating keyword search into XML query processing. • [SaraCohen2003VLDB] XSEarch: A semantic search engine for XML. • [WeidongYang2007CIT] Schema-aware keyword search over XML streams. • Other relevant work can be found in our paper.

  5. Brief Review of Related Work • All four works use labels and terms as the keyword query format. • The difference: the first three share a similar basic strategy that first retrieves the relevant keyword lists and then merges them into results, while the last one first generates a big template that covers all kinds of results w.r.t. the XML schema and then caches the possible results over XML streams. • The template-based strategy can obtain better performance [WeidongYang2007CIT].

  6. Existing Problems • [WeidongYang2007CIT] targets queries over XML streams, which is not enough for an XML data repository because of the following challenges: • Different templates may exist in one XML data repository. • Users prefer to see only part of the results, e.g., the top k results. • Domain knowledge can help process labels with the same meaning. • Therefore, the problem of applying a template-based keyword search strategy to an XML data repository needs to be studied.

  7. Construct Structured Query Templates • Example: There are two data sources that conform to schemas t1 and t2 respectively. • [Figure: Schema t1 and Schema t2] • Keyword query – (year:2006, title:xml, author:philip)

  8. Construct Structured Query Templates • Identifying context of keywords • Determine the master entities using the labels in the keyword query and the XML schema. • Generate a FOR clause for each master entity. • Check how many times each label occurs under each master entity. • Exactly once – generate WHERE clauses. • More than once – first cluster, then generate WHERE clauses.

  9. Schema t1, keyword query – (year:2006, title:xml, author:philip) • Step 1: determine each master entity and its corresponding label set. • Q1 = “For $b in bibliography/books/book” • Q2 = “For $a in bibliography/articles/article” • Step 2: each label occurs only once in each master entity. • Q1 += “Where $b/year=‘2006’ and $b/title.contains(xml) and $b/author.contains(philip)” • Q2 += “Where $a/year=‘2006’ and $a/title.contains(xml) and $a/author.contains(philip)”

  10. Schema t2, keyword query – (year:2006, title:xml, author:philip) • Step 1: determine the master entity and its corresponding label set. • Q = “For $bi in bibliography/bib” • Step 2: title and author each occur twice in the master entity, so cluster them using book and article respectively. • Q1 = Q + “For $bo in $bi/book” • Q2 = Q + “For $a in $bi/article” • Step 3: each label occurs only once in each cluster. • Q1 += “Where $bi/year=‘2006’ and $bo/title.contains(xml) and $bo/author.contains(philip)” • Q2 …
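The two construction examples (schemas t1 and t2) can be sketched in Python as follows. This is a minimal illustration, not the authors' Algorithm 1: the function names, the `occurrences` mapping (label to the cluster elements it appears under, with `""` meaning "directly under the master entity"), the two-letter variable naming, and the rule that all-digit terms use equality while other terms use `contains` are all assumptions made for the example. The output imitates the slides' query notation rather than valid XQuery.

```python
def predicate(var, label, term):
    # Assumption: all-digit terms are compared by equality, others by contains().
    if term.isdigit():
        return f"${var}/{label}='{term}'"
    return f"${var}/{label}.contains({term})"


def build_queries(master_path, master_var, occurrences, keywords):
    """Build query strings in the slides' notation for one master entity.

    occurrences: label -> list of cluster element names the label occurs under;
                 "" means the label sits directly under the master entity.
    keywords:    label -> search term, e.g. {"year": "2006", "title": "xml"}.
    """
    base = f"For ${master_var} in {master_path}"
    clusters = sorted({c for label in keywords for c in occurrences[label] if c})

    # Case 1: every label occurs once under the master entity, no clustering.
    if not clusters:
        preds = [predicate(master_var, l, t) for l, t in keywords.items()]
        return [base + " Where " + " and ".join(preds)]

    # Case 2: some labels occur more than once, so emit one query per cluster.
    queries = []
    for cluster in clusters:
        cvar = cluster[:2]  # hypothetical variable naming, e.g. book -> $bo
        preds = []
        for label, term in keywords.items():
            if cluster in occurrences[label]:
                preds.append(predicate(cvar, label, term))
            elif "" in occurrences[label]:
                preds.append(predicate(master_var, label, term))
            else:
                break  # label cannot be reached from this cluster
        else:
            queries.append(
                f"{base} For ${cvar} in ${master_var}/{cluster} Where "
                + " and ".join(preds)
            )
    return queries


keywords = {"year": "2006", "title": "xml", "author": "philip"}
t1_book = build_queries(
    "bibliography/books/book", "b",
    {"year": [""], "title": [""], "author": [""]}, keywords)
t2 = build_queries(
    "bibliography/bib", "bi",
    {"year": [""], "title": ["book", "article"], "author": ["book", "article"]},
    keywords)
```

For schema t1 the function is called once per master entity (book, then article); for schema t2 a single call yields one query per cluster.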

  11. Construct Structured Query Templates • Identifying returned nodes • Step 1: If the cardinality of a master entity is “*” and no cluster operation is activated, the master entity is taken as the return node of the constructed queries; • Step 2: If the cardinality of a master entity is “*” and clusters are generated, the root node of each cluster is checked recursively (back to Step 1); • Step 3: If the cardinality of a master entity is not “*”, its ancestor nodes are probed one by one until a node with cardinality “*” is found or the root of the XML schema is reached.

  12. Keyword query – (year:2006, title:xml, author:philip) • Schema t1: master entities are the returned nodes. • Q1 += “$b” • Q2 += “$a” • Schema t2: roots of clusters are the returned nodes. • Q1 += “$bo” • Q2 += “$a” • The constructed queries can be read in our paper.
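The three-step rule for identifying returned nodes could look roughly like the sketch below. It is one reading of the slide text, not the authors' code: `SchemaNode`, `return_nodes`, and the cardinalities assigned in the example are hypothetical, since the schema figures are not part of this transcript.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SchemaNode:
    name: str
    cardinality: str = "1"            # "*" marks a repeatable element
    parent: Optional["SchemaNode"] = None


def return_nodes(master: SchemaNode, cluster_roots: Sequence[SchemaNode] = ()):
    """Pick the schema nodes to return for one master entity."""
    # Step 3: cardinality is not "*", so probe ancestors one by one until a
    # "*" node is found or the schema root is reached.
    if master.cardinality != "*":
        node = master
        while node.parent is not None and node.cardinality != "*":
            node = node.parent
        return [node]
    # Step 1: "*" cardinality and no clustering, return the master entity.
    if not cluster_roots:
        return [master]
    # Step 2: clusters exist, so check each cluster root with the same rule.
    result = []
    for root in cluster_roots:
        result.extend(return_nodes(root))
    return result


# Assumed shape of schema t2: bib contains repeatable book and article elements.
bib = SchemaNode("bib", "*")
book = SchemaNode("book", "*", parent=bib)
article = SchemaNode("article", "*", parent=bib)
year = SchemaNode("year", "1", parent=book)
```

With these assumed cardinalities, `return_nodes(bib, [book, article])` yields the two cluster roots, matching the "$bo" and "$a" return nodes on the slide.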

  13. Ranking Function • vm denotes a master entity node; • ω(vi, ti) is calculated using the tf*idf weight model. • Feature of the function: Score() consists of two parts, ContextScore() and the tf*idf weight, and the former is an upper bound on the scores of the results.
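As a reminder of what a tf*idf weight looks like, here is a minimal sketch of one common variant (relative term frequency times the logarithm of inverse document frequency). The slide does not say which variant ω(vi, ti) uses or how it is combined with ContextScore(), so the function name, the whitespace tokenization, and the exact formula are assumptions.

```python
import math
from collections import Counter


def tf_idf(term, node_text, all_node_texts):
    """tf*idf weight of `term` in one node's text, relative to all node texts."""
    term = term.lower()
    tokens = node_text.lower().split()
    if not tokens:
        return 0.0
    # Term frequency: share of the node's tokens that equal the term.
    tf = Counter(tokens)[term] / len(tokens)
    # Document frequency: number of node texts containing the term.
    df = sum(1 for text in all_node_texts if term in text.lower().split())
    if df == 0:
        return 0.0
    return tf * math.log(len(all_node_texts) / df)


texts = ["xml keyword search", "relational databases", "xml xml streams"]
weight = tf_idf("xml", texts[2], texts)
```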

  14. Processing Strategy • Algorithm 1 generates the structured queries together with their corresponding context scores. • Algorithm 2 schedules the query plan according to the following conditions: • the users’ requirements, e.g., the number of results; • the context scores of all generated queries; • the intermediate results.
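Because the context score of a query is an upper bound on the scores of its results, one plausible way to schedule the queries is a threshold-style loop: evaluate queries in descending context score and stop once the current k-th best result already scores at least as high as the next query's bound. The sketch below illustrates that idea only; it is not the paper's Algorithm 2, and `top_k` and the `evaluate` callback are hypothetical names.

```python
import heapq


def top_k(queries, k, evaluate):
    """Return the k best (score, result) pairs.

    queries:  list of (context_score, query); context_score is assumed to be
              an upper bound on the score of any result of that query.
    evaluate: callable mapping a query to an iterable of (score, result).
    """
    best = []  # min-heap holding at most k (score, result) pairs
    for context_score, query in sorted(queries, key=lambda q: q[0], reverse=True):
        # Early termination: no remaining query can beat the k-th result.
        if len(best) == k and best[0][0] >= context_score:
            break
        for score, result in evaluate(query):
            if len(best) < k:
                heapq.heappush(best, (score, result))
            elif score > best[0][0]:
                heapq.heapreplace(best, (score, result))
    return sorted(best, reverse=True)


# Toy data: three generated queries and their (score, result) lists.
queries = [(0.9, "Q1"), (0.5, "Q2"), (0.3, "Q3")]
results = {
    "Q1": [(0.8, "r1"), (0.6, "r2")],
    "Q2": [(0.45, "r3")],
    "Q3": [(0.2, "r4")],
}
```

With k = 2 on this toy data, only Q1 needs to be evaluated, since its two results already score above the bound of Q2.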

  15. Experiments • Datasets: • SIGMOD Record • three variants of DBLP • Keyword queries: • q1 (author:David, title:XML) • q2 (year:2002, title:XML)

  16. Experimental Results • [Figures: results for q1 and q2; top-k results for q1 (k = 10) and q2 (k = 20)]

  17. Conclusions • XBridge is proposed to process keyword queries over an XML data repository; it can efficiently find the top k results by evaluating the generated structured queries. • A precise ranking function is provided to evaluate the relevance of the results. • Limitations of this work: • We treat XML schemas as tree patterns; • We didn’t consider reference relationships in XML data.
