Extracting Query Facets From Search Results

Extracting Query Facets From Search Results • Date : 2013/08/20 • Source : SIGIR’13 • Authors : Weize Kong and James Allan • Advisor : Dr. Jia-ling Koh • Speaker : Wei Chang

Outline • Introduction • Approach • Experiment • Conclusion

What is a query facet ? • Definition : a query facet is a set of coordinate terms (terms that share a semantic relationship by being grouped under a relationship) • Example query : Mars rovers

What can we do with query facets ? • Flight type : Domestic, International • Travel class : First, Business, Economy

Goal • Extract query facets from the top-k web search results $D = \{D_1, D_2, \ldots, D_k\}$

Outline • Introduction • Approach • Step 1 : Extracting candidate lists • Step 2 : Finding query facets from candidate lists • Experiment • Conclusion

Pattern-based semantic class extraction • Reference : Z. Dou, S. Hu, Y. Luo, R. Song, and J.-R. Wen. Finding dimensions for queries. • Example (text pattern) : There are many Mars rovers, such as Curiosity, Opportunity, and Spirit. • Example (HTML pattern) : <ul> <li>first class</li> <li>business class</li> <li>economy class</li> </ul>
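
The two examples on this slide can be turned into a small extraction sketch. The regular expression for "such as" and the `ListItemParser` class are illustrative assumptions, not the patterns used by the cited work; nested lists are not handled.

```python
import re
from html.parser import HTMLParser

# Assumed lexical pattern: "... such as A, B, and C." (illustrative only).
SUCH_AS = re.compile(r"such as ([^.]+)")

def extract_text_list(sentence):
    """Return the items that follow 'such as', split on commas and 'and'/'or'."""
    match = SUCH_AS.search(sentence)
    if not match:
        return []
    parts = re.split(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+", match.group(1))
    return [p.strip() for p in parts if p.strip()]

class ListItemParser(HTMLParser):
    """Collects the <li> texts of each flat <ul>/<ol> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lists = []
        self._current = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._current = []
        elif tag == "li" and self._current is not None:
            self._in_item = True
            self._current.append("")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._current is not None:
            self.lists.append(self._current)
            self._current = None
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Only text inside an open <li> is kept.
        if self._in_item and self._current:
            self._current[-1] += data

def extract_html_lists(html):
    parser = ListItemParser()
    parser.feed(html)
    return [[item.strip() for item in lst] for lst in parser.lists]
```

Each returned list is one candidate list; both functions return raw items that still need the normalization described on the next slide.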

Candidate lists • All list items are normalized by converting text to lowercase and removing non-alphanumeric characters. • Then, we remove stopwords and duplicate items in each list. • Finally, we discard all lists that contain fewer than two items or more than 200 items. • The candidate lists are usually noisy and may not be relevant to the issued query. • To address this problem, we use a supervised method.
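
The cleaning steps above can be sketched as follows. The stopword set is a tiny hypothetical stand-in, and two details are assumptions: spaces are kept when stripping non-alphanumeric characters, and "removing stopwords" is read as dropping items that are themselves stopwords.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of"}

def normalize_item(item):
    """Lowercase the item and strip non-alphanumeric characters (spaces kept)."""
    text = re.sub(r"[^a-z0-9\s]", "", item.lower())
    return " ".join(text.split())

def clean_list(items, min_items=2, max_items=200):
    """Normalize, drop stopwords and duplicates, then apply the size limits.

    Returns the cleaned list, or None if the list should be discarded.
    """
    seen = set()
    cleaned = []
    for item in items:
        norm = normalize_item(item)
        if not norm or norm in STOPWORDS or norm in seen:
            continue
        seen.add(norm)
        cleaned.append(norm)
    if len(cleaned) < min_items or len(cleaned) > max_items:
        return None
    return cleaned
```

The size check is applied after cleaning, so a list that shrinks below two items once duplicates are removed is discarded as well.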

Note : What is Supervised Method Example : LA-100 LA-99(Training Data)

Note : What is Supervised Learning • Training : training data (with features) → model • Prediction : new data → model → prediction

Outline • Introduction • Approach • Step 1 : Extracting candidate lists • Step 2 : Finding query facets from candidate lists • Experiment • Conclusion

Problem Definition • Whether a list item is a facet term • Whether a pair of list items is in one query facet

Features

Graph

logistic-based conditional probability distributions
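
A logistic conditional probability generally takes the shape sketched below. The feature vectors, the weights, and the rule that a pair can only be grouped when both items are facet terms are assumptions made for illustration; they are not taken from this slide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def term_probability(features, weights):
    """P(list item is a facet term) as a logistic function of its features."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

def pair_probability(pair_features, pair_weights, y_i, y_j):
    """P(two items belong to one facet).

    Assumption: the probability is zero unless both items are facet terms.
    """
    if not (y_i and y_j):
        return 0.0
    return sigmoid(sum(w * f for w, f in zip(pair_weights, pair_features)))
```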

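Parameter Estimation • Parameters are estimated by maximizing the log-likelihood of the training data with a gradient-based method (gradient descent on the negative log-likelihood).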
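
As a minimal sketch of this kind of estimation, the code below fits a plain logistic model by gradient ascent on the log-likelihood. It trains a single classifier on toy data and is not the joint training of the graphical model; the learning rate and epoch count are arbitrary assumptions.

```python
import math

def predict(x, w):
    """Logistic probability for one feature vector."""
    return 1.0 / (1.0 + math.exp(-sum(wk * xk for wk, xk in zip(w, x))))

def log_likelihood(X, y, w):
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict(xi, w)
        total += yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total

def train_logistic(X, y, lr=0.1, epochs=500):
    """Gradient ascent: move the weights along the log-likelihood gradient."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = yi - predict(xi, w)
            for k, xk in enumerate(xi):
                grad[k] += error * xk
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w
```

Minimizing the negative log-likelihood by gradient descent performs the same updates with the sign flipped.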

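Inference • Training is finished; the learned model is now used to label list items and pairs. • The graphical model does not force the labeling to produce a strict partitioning of the facet terms. For example, when $z_{i,j} = 1$ and $z_{j,k} = 1$, we may still have $z_{i,k} = 0$.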
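
The inconsistency described above can be detected with a short check. In this sketch the pair labels are given as a set of unordered pairs marked "same facet"; the function name and data layout are hypothetical.

```python
from itertools import combinations

def find_violations(terms, same_facet):
    """Return triples where exactly two of the three pairs are linked.

    `same_facet` is a set of frozensets, one per pair labeled as sharing a facet.
    Two links out of three means the labeling is not a strict partition.
    """
    def linked(a, b):
        return frozenset((a, b)) in same_facet

    violations = []
    for a, b, c in combinations(terms, 3):
        if linked(a, b) + linked(b, c) + linked(a, c) == 2:
            violations.append((a, b, c))
    return violations
```

An empty result means the pair labels already describe disjoint groups.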

Rephrase the optimization problem • The optimization target becomes finding the most likely query facet set in $\mathcal{F}$, where $\mathcal{F}$ is the set of all possible query facet sets that can be generated from L under the strict partitioning constraint. • This optimization problem is NP-hard, as can be shown by a reduction from the Multiway Cut problem. • Therefore, we propose two algorithms, QF-I and QF-J, to approximate the results.

QF-I • Select list items whose predicted facet-term probability exceeds a threshold as facet terms.
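
A rough sketch of this selection step, followed by a grouping step that yields disjoint facets. The thresholds and the greedy average-link grouping are assumptions chosen for brevity; they stand in for, and should not be read as, the clustering used by QF-I.

```python
def qf_i_sketch(term_probs, pair_probs, term_threshold=0.5, pair_threshold=0.5):
    """Select likely facet terms, then greedily group them into disjoint facets.

    term_probs: dict term -> probability of being a facet term.
    pair_probs: dict frozenset({a, b}) -> probability of sharing a facet
                (missing pairs count as 0.0).
    """
    ranked = sorted(term_probs.items(), key=lambda kv: -kv[1])
    selected = [term for term, p in ranked if p > term_threshold]

    facets = []
    for term in selected:
        for facet in facets:
            avg = sum(pair_probs.get(frozenset((term, m)), 0.0) for m in facet) / len(facet)
            if avg > pair_threshold:
                facet.append(term)  # join the first sufficiently similar facet
                break
        else:
            facets.append([term])   # otherwise start a new facet
    return facets
```

Because every selected term is placed in exactly one facet, the output always satisfies the strict partitioning constraint.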

QF-J

Ranking Query Facets • score for a query facet : • score for a facet term :

Outline • Introduction • Approach • Step 1 : Extracting candidate lists • Step 2 : Finding query facets from candidate lists • Experiment • Evaluation • Experiment Result • Conclusion

Data Using Top 10 query facets generated by different models.

Evaluation Metrics • Using “∗” to distinguish between system generated results and human labeled results, which we used as ground truth.
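
As one simple instance of comparing system output with the ground truth, the sketch below computes unweighted precision, recall, and F1 over facet terms. Any term weighting or grouping-aware component of the actual metrics is not modeled here; the function name is hypothetical.

```python
def term_prf(system_facets, truth_facets):
    """Precision, recall, and F1 of system facet terms against labeled terms."""
    system_terms = {t for facet in system_facets for t in facet}
    truth_terms = {t for facet in truth_facets for t in facet}
    hits = len(system_terms & truth_terms)
    precision = hits / len(system_terms) if system_terms else 0.0
    recall = hits / len(truth_terms) if truth_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```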

Clustering quality

Overall quality • fp-nDCG is weighted by • rp-nDCG is weighted by

Facet terms

Clustering facet terms

Overall

Conclusion • We developed a supervised method based on a graphical model to recognize query facets from the noisy facet candidate lists extracted from the top-ranked search results. • We proposed two algorithms for approximate inference on the graphical model. • We designed a new evaluation metric for this task that combines recall and precision of facet terms with grouping quality. • Experimental results showed that the supervised method significantly outperforms the unsupervised methods, suggesting that query facet extraction can be effectively learned.
