eXtract: A Snippet Generation System for XML Search
Yu Huang, Ziyang Liu, Yi Chen
Arizona State University
http://eXtract.asu.edu

Motivation: Good snippets help users easily judge relevance and find the desired results.
Problem: How to generate good snippets for XML search?
No existing work on XML snippet generation yet.

Contributions: eXtract, the first system on snippet generation for XML search [Huang et al., SIGMOD ’08]

Challenge: What are good snippets?
Solution: Identified desirable properties
• Self-contained
• Distinguishable
• Representative
• Small

Challenge: What information in a result is significant for achieving these properties?
Solution: Designed an algorithm to generate IList
• The entities involved in the query result
• Keys of the query result
• Dominant features

Challenge: How to select instances in the result when generating a snippet, so as to maximally cover IList within a size bound?
Solution: Designed an efficient and effective algorithm that generates good snippets from IList
• Defined the Instance Selection Problem: how to select node instances in a query result so that the snippet covers as many items in IList as possible, in ranked order, within a size bound?
• Theorem: The Instance Selection Problem is NP-hard.
• Designed a greedy algorithm that generates good snippets efficiently.

Sample Query: retailer apparel Texas (Find the apparel retailers in Texas.)
[Figure: Sample Snippet (of size 11)]
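To make the Instance Selection Problem concrete, the following is a minimal sketch of one possible greedy strategy, not the algorithm used in eXtract itself. It assumes the query result is a tree stored as parent pointers, that snippet size is measured in edges, and that an item which does not fit within the bound is skipped rather than ending the selection. The function names `path_to_root` and `greedy_snippet` and the toy tree are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def path_to_root(node: int, parent: Dict[int, Optional[int]]) -> List[int]:
    """Return the nodes on the path from `node` up to the root, inclusive."""
    path = []
    cur: Optional[int] = node
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path


def greedy_snippet(
    parent: Dict[int, Optional[int]],
    label: Dict[int, str],
    ilist: List[str],
    bound: int,
) -> Tuple[Set[int], List[str]]:
    """Greedily grow a snippet tree from the root (illustrative assumption).

    For each IList item in ranked order, pick the instance that adds the
    fewest new nodes to the snippet, and keep it only if the snippet stays
    within `bound` edges.
    """
    root = next(n for n, p in parent.items() if p is None)
    selected: Set[int] = {root}
    covered: List[str] = []
    for item in ilist:
        # An item is already covered if a selected node carries its label.
        if any(label[n] == item for n in selected):
            covered.append(item)
            continue
        best: Optional[List[int]] = None
        for node, node_label in label.items():
            if node_label != item:
                continue
            new_nodes = [m for m in path_to_root(node, parent) if m not in selected]
            if best is None or len(new_nodes) < len(best):
                best = new_nodes
        # Each new node adds exactly one edge to the snippet tree.
        if best is not None and (len(selected) - 1) + len(best) <= bound:
            selected.update(best)
            covered.append(item)
    return selected, covered


# Toy query result (hypothetical): one retailer with two stores.
parent = {0: None, 1: 0, 2: 1, 3: 0, 4: 3, 5: 4, 6: 0, 7: 6, 8: 7, 9: 6, 10: 9}
label = {
    0: "retailer", 1: "name", 2: "Brook Brothers",
    3: "store", 4: "state", 5: "Texas",
    6: "store", 7: "state", 8: "Texas", 9: "city", 10: "Houston",
}
ilist = ["Texas", "retailer", "store", "Brook Brothers", "Houston"]
selected, covered = greedy_snippet(parent, label, ilist, bound=6)
print(covered)
```

With a bound of 6 edges, this toy run covers the first four items and skips Houston, whose path would push the snippet past the bound.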
A Query Result
[Figure: query result tree for the sample query. Node labels: retailer, name, product, store, state, city, merchandises, clothes, fitting, situation, category. Values shown include Brook Brothers, apparel, Galleria, Texas, Houston, men, casual, formal, outwear, suit. Panel labels: Bad, Good.]

Features and their occurrences
entity : attribute      value: occurrences
store : city            Houston: 2, Dallas: 1
clothes : fitting       men: 146, women: 101, children: 53
clothes : situation     casual: 223, formal: 77
clothes : category      outwear: 116, suit: 92, pants: 43, shirts: 39, shorts: 10, …

Dominance score (DS): DS(Houston) = 2/(3/2) = 1.33, DS(children) = 53/(300/3) = 0.53
In each example, the numerator is the occurrence count of the value, and the denominator is the total occurrence count of its feature type divided by the number of distinct values of that type.

IList: Texas, apparel, retailer (keywords); store (related entities); Brook Brothers (key); outwear, suit, casual, men (dominant features)

Experiments:
• Comparison of Google Desktop, Greedy (eXtract), and the Optimal algorithm for instance selection.
• User study scores are 2.3, 3.9, and 4.2 out of 5, respectively.
[Figures: Quality (precision, recall) and Speed (time in seconds)]

34th International Conference on Very Large Data Bases, August 23rd-28th, 2008, Auckland, New Zealand
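The dominance score arithmetic in the poster's example can be reproduced with a short sketch. It assumes, based only on the two worked examples, that the score of a value is its occurrence count divided by the average occurrence count per distinct value of the same feature type. The function name `dominance_scores` is hypothetical.

```python
from typing import Dict


def dominance_scores(occurrences: Dict[str, int]) -> Dict[str, float]:
    """Score each value of one feature type against the average count per value."""
    total = sum(occurrences.values())
    average = total / len(occurrences)  # total occurrences / number of distinct values
    return {value: count / average for value, count in occurrences.items()}


# Occurrence counts taken from the "Features and their occurrences" table.
city_ds = dominance_scores({"Houston": 2, "Dallas": 1})
fitting_ds = dominance_scores({"men": 146, "women": 101, "children": 53})

print(round(city_ds["Houston"], 2))      # 1.33
print(round(fitting_ds["children"], 2))  # 0.53
```

Values that occur more often than the average for their feature type score above 1, which is one way to read "dominant" here.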