Compact Query Term Selection Using Topically Related Text K. Tamsin Maxwell, W. Bruce Croft SIGIR 2013
Outline • Introduction • Related Work • Principle for Term Selection • PhRank Algorithm • Evaluation Framework • Experiments • Conclusion
Introduction • Recent query reformulation techniques usually use pseudo-relevance feedback (PRF). But because they consider words that are not in the original query, the expansion may include peripheral words and cause query drift • PhRank also uses PRF, but uses it for in-query term selection: each candidate term contains 1-3 words and is ranked with a score derived from a word co-occurrence graph • Advantages of PhRank: • It is the first method to use PRF for in-query term selection • Only a small number of terms are selected, retaining the flexibility to add more or longer terms if required • The affinity graph captures aspects of both syntactic and non-syntactic word associations
Related Work • Markov chain framework • The Markov chain framework uses the stationary distribution of a random walk over an affinity graph $G$ to estimate the importance of vertices in the graph • A random walk describes a succession of random or semi-random steps between vertices $v_i$ and $v_j$ in $G$ • If we define the transition probability between $v_i$ and $v_j$ as $p_{ij}$, and $a_j^{(t)}$ as the affinity score of $v_j$ at time $t$, then $a_j^{(t+1)} = \sum_i p_{ij}\, a_i^{(t)}$ is the sum of scores for each $v_i$ connected to $v_j$
Related Work • Sometimes a step may reach a vertex that is unconnected, so we define a minimum probability $1/|V|$, where $|V|$ is the number of vertices in $G$. A factor $\lambda$ then controls the balance between the transition probability and the minimum probability: $p'_{ij} = \lambda\, p_{ij} + (1-\lambda)\,\frac{1}{|V|}$
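A minimal sketch of this random walk as power iteration, assuming a row-stochastic transition matrix `P` and an illustrative damping factor `lam`; the 0.0001 convergence threshold matches the stopping criterion mentioned later in these slides:

```python
import numpy as np

def stationary_distribution(P, lam=0.85, tol=1e-4, max_iter=1000):
    """Power iteration for a random walk with a uniform minimum probability.

    P   : (n, n) row-stochastic transition matrix, P[i, j] = p_ij
    lam : balance between following edges and jumping uniformly
    tol : stop when no vertex score changes by more than tol
    """
    n = P.shape[0]
    a = np.full(n, 1.0 / n)        # initial affinity scores
    uniform = np.full(n, 1.0 / n)  # minimum probability 1/|V|
    for _ in range(max_iter):
        a_next = lam * (a @ P) + (1.0 - lam) * uniform
        if np.max(np.abs(a_next - a)) < tol:
            return a_next
        a = a_next
    return a
```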
Principle for Term Selection • An informative word: • Is informative relative to a query: a word should represent the meaning of the query, but a query usually does not carry enough information on its own, so PRF is used to enrich the query representation • Is related to other informative words: the Association Hypothesis states that "if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this". With an affinity graph, we capture this by estimating how many words connect to a target word and the values on those edges
Principle for Term Selection • An informative term: • Contains informative words: since all good terms must contain informative words, we consider individual words when ranking terms • Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal, so we weight terms with a normalized tf.idf-inspired weight
The PhRank Algorithm • Graph construction • For a query $q$, we first retrieve the top-ranked documents, and define the set $R$ as the query itself plus its pseudo-relevant documents • Words in $R$ are stemmed, and each unique stem becomes a vertex in graph $G$ • Vertices $v_i$ and $v_j$ are connected by an edge if words $w_i$ and $w_j$ are adjacent in $R$ • Edge weights • The transition probability is based on a linear combination of the counts of $w_i$ and $w_j$ co-occurring within windows of size 2 and 10
The PhRank Algorithm • Edge weights are defined by $e_{ij} = \sum_{d \in R} p(d \mid q)\,\big(c_{ij}^{2}(d) + c_{ij}^{10}(d)\big)\,\delta_{ij}(d)$, where $p(d \mid q)$ is the probability of a document $d$ in which words $w_i$ and $w_j$ co-occur given $q$, and $c_{ij}^{2}(d)$ and $c_{ij}^{10}(d)$ are the counts of $w_i$ and $w_j$ co-occurring within windows of size 2 and 10 • $\delta_{ij}(d)$ is a style weight that reflects the importance of the association between $w_i$ and $w_j$ in $d$
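A rough sketch of accumulating these edge weights, assuming an even mix of the two window sizes and omitting the style weight $\delta$; the function names, `alpha`, and the per-document `doc_probs` input are illustrative:

```python
from collections import Counter

def cooccurrence_counts(tokens, window):
    """Count unordered stem pairs co-occurring within `window` tokens."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts

def edge_weights(docs_tokens, doc_probs, alpha=0.5):
    """Combine window-2 and window-10 counts, scaled by p(d | q).

    docs_tokens : list of stemmed token lists, one per document in R
    doc_probs   : p(d | q) for each document
    alpha       : mixing weight between the two windows (assumed)
    """
    weights = Counter()
    for tokens, p_d in zip(docs_tokens, doc_probs):
        c2 = cooccurrence_counts(tokens, 2)
        c10 = cooccurrence_counts(tokens, 10)
        for pair in set(c2) | set(c10):
            weights[pair] += p_d * (alpha * c2[pair] + (1 - alpha) * c10[pair])
    return weights
```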
The PhRank Algorithm • Random walk • A random walk over $G$ proceeds as described in Related Work • The edge weights are normalized to sum to one • The iteration stops when the change at every vertex does not exceed 0.0001 • Vertex weights • Words are also weighted so that they exhaustively represent the query. Some words, like "make", score highly in the affinity graph but are not particularly informative
The PhRank Algorithm • We define a saliency factor $s_i$ to balance exhaustiveness with global saliency, identifying stems that are poor discriminators between relevant and non-relevant documents • For a word $w_i$, $f_R(w_i)$ is the frequency of $w_i$ in $R$, and $f_C(w_i)$ is the frequency of $w_i$ in the collection $C$
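The exact weighting formula is not reproduced in these slides; as a stand-in, a tf.idf-flavoured saliency with the same inputs might look like the sketch below, where the formula itself and all names are assumptions:

```python
import math

def saliency(word, freq_R, freq_C, total_C):
    """tf.idf-flavoured saliency: reward stems that are frequent in the
    pseudo-relevant set R but rare in the whole collection C.

    freq_R  : dict, frequency of each stem in R
    freq_C  : dict, frequency of each stem in the collection C
    total_C : total token count of C
    """
    # Assumed stand-in, not the published PhRank formula.
    return freq_R.get(word, 0) * math.log(total_C / (1 + freq_C.get(word, 0)))
```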
The PhRank Algorithm • Term ranking • For a term $x$, a factor $g_x$ represents the degree to which the term is discriminative in a collection. $g_x$ is defined using $f_C(x)$, the frequency with which the words in $x$ co-occur in the collection within a window of 4 times the number of words in $x$; $f_R(x)$ is defined analogously over $R$ • Finally, the rank of a term $x$ for $q$ is defined by combining the affinity and saliency scores of its words with $g_x$
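A sketch of that final combination, under the assumption that per-word affinity and saliency scores multiply with the term-level factor $g_x$; the exact combination in the paper may differ:

```python
def score_term(term_words, affinity, sal, g_x):
    """Combine per-word affinity (random-walk score) and saliency with
    the term-level discriminativeness factor g_x.

    The multiplicative combination here is an assumption for illustration.
    """
    score = g_x
    for w in term_words:
        score *= affinity[w] * sal[w]
    return score
```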
The PhRank Algorithm • After ranking, some terms still include uninformative words. Because terms are ranked by their overall score, several terms may contain similar words, which reduces diversity • We apply a simple filter with top-down constraints, as sketched below • For a term $x$: if a higher-ranked term contains all the words in $x$, or $x$ contains all the words in a higher-ranked term, we discard $x$
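A minimal sketch of this subsumption filter, assuming terms arrive sorted best-first as whitespace-delimited strings:

```python
def filter_terms(ranked_terms):
    """Top-down subsumption filter over terms sorted best-first.

    Drop a term if an already-kept (higher-ranked) term contains all of
    its words, or if it contains all the words of a kept term.
    """
    kept = []
    for term in ranked_terms:
        words = set(term.split())
        if any(words <= set(k.split()) or words >= set(k.split())
               for k in kept):
            continue
        kept.append(term)
    return kept
```

For example, `filter_terms(["white house", "white house staff"])` keeps only `"white house"`, since the lower-ranked term contains all the words of the kept one.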
Evaluation Framework • Robustness • We compare against the sequential dependence variant of the Markov random field model. This model linearly combines query likelihood with bigram features over windows of size 2 and 8; an illustrative query is sketched after this list • Precision • The subset distribution model achieves high mean average precision and serves as the precision baseline • Succinctness • We use Key Concepts as the succinctness baseline. This approach linearly combines a bag-of-words query representation with a weighted bag-of-words representation
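For concreteness, a sketch of a sequential dependence query in the Indri query language, built here as a Python string. The `#1` (exact ordered) and `#uw8` (unordered window 8) operators and the 0.8/0.1/0.1 weights are the common SDM defaults, assumed rather than taken from these slides:

```python
def sdm_query(terms, w_t=0.8, w_o=0.1, w_u=0.1):
    """Build an Indri sequential-dependence query string from query terms."""
    bigrams = [f"{a} {b}" for a, b in zip(terms, terms[1:])]
    uni = " ".join(terms)
    ordered = " ".join(f"#1({b})" for b in bigrams)
    unordered = " ".join(f"#uw8({b})" for b in bigrams)
    return (f"#weight( {w_t} #combine({uni}) "
            f"{w_o} #combine({ordered}) "
            f"{w_u} #combine({unordered}) )")

# sdm_query(["white", "house", "staff"]) mixes unigram, ordered-bigram,
# and unordered-window evidence for the same terms.
```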
Evaluation Framework • Word dependence • We consider four models of phrase belief, as shown in the figure
Experiments • We use Indri on Robust04, WT10G and GOV2 for evaluation • Feature analysis • Here we list the results of using each of the features in PhRank
Experiments • Comparison with other models
Conclusion • PhRank is a novel method for selecting succinct terms within a query, built on the Markov chain framework • Although the selected terms are succinct, this is a risky strategy and can decrease MAP compared with sequential dependence