Integrating term dependencies according to their utility
Jian-Yun Nie, University of Montreal
Need for term dependency
• The meaning of a term often depends on other terms used in the same context
• Term dependency, e.g. computer architecture, hot dog, …
• A unigram model is unable to capture term dependency
  • hot + dog ≠ "hot dog"
• Dependency: a group of terms (here, a pair of terms)
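The independence assumption can be made concrete in a few lines. Below is a minimal illustrative sketch (not a model from the talk): a unigram language model scores a query as a product of per-term probabilities, so term order and adjacency are invisible to it.

```python
# Minimal unigram language model with add-one smoothing (illustrative).
from collections import Counter

def unigram_score(query_terms, doc_terms):
    counts = Counter(doc_terms)
    n, v = len(doc_terms), len(counts)
    score = 1.0
    for t in query_terms:
        score *= (counts[t] + 1) / (n + v)  # each term scored independently
    return score

doc = "the hot dog stand sells a hot dog".split()
```

Because each term is scored independently, reordering the query changes nothing: the model cannot distinguish the phrase "hot dog" from the two words occurring far apart.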
Previous approaches: phrase + unigram
• Two representations: a phrase model and a unigram model
• Interpolation (each model with a fixed weight)
• Assumption: phrases represent useful dependencies between terms for IR
• E.g. Q = "the price of hot dog"
  • Unigram representation: price, hot, dog
  • Phrase representation: price, hot_dog
• P(price hot dog|D) = a·P_phrase(price hot dog|D) + (1 − a)·P_unigram(price hot dog|D)
  • or: score = a·score_phrase + (1 − a)·score_unigram
• Effect: documents containing the phrase "hot dog" receive a higher score
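The interpolation above can be sketched as follows. The count-based component scorers here are simplified stand-ins, not the actual phrase and unigram language models:

```python
def phrase_count(phrase, doc_terms):
    """Count occurrences of a multi-word phrase, e.g. ["hot", "dog"]."""
    k = len(phrase)
    return sum(doc_terms[i:i + k] == phrase
               for i in range(len(doc_terms) - k + 1))

def interpolated_score(unigrams, phrases, doc_terms, a=0.3):
    """score = a * score_phrase + (1 - a) * score_unigram, with a fixed a."""
    n = max(len(doc_terms), 1)
    uni = sum(doc_terms.count(t) for t in unigrams) / n
    phr = sum(phrase_count(p, doc_terms) for p in phrases) / n
    return a * phr + (1 - a) * uni
```

Note that the weight a is the same for every query and every phrase, which is exactly the limitation discussed in the following slides.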
Dependency model
• Dependency language model (Gao et al. 2005)
• Determine the strongest dependencies among query terms (a parsing process): price — hot dog
• The determined dependencies define an additional requirement on documents:
  • Documents have to contain the unigrams
  • Documents have to contain the required dependencies
• The two criteria are linearly interpolated
Markov Random Field (MRF) (Metzler & Croft)
• Two variants: sequential dependence and full dependence
• Potential functions defined over cliques of query terms
• Sequential model: interpolation of a unigram model, an ordered bigram model, and an unordered bigram (window) model
Limitations
• The importance of a (type of) dependency is fixed in the combined model in the same way for all queries
• A fixed weight is assigned to each component model
  • price–dog is as important as hot–dog (dependency model)
  • price–hot is as important as hot–dog in the ordered model (MRF)
• Are they equally strong dependencies? No: hot–dog > price–dog, price–hot
• Intuition: a stronger dependency forms a stronger constraint
Limitations
• Can a phrase model solve this problem?
• Some phrases form a semantically stronger dependency than others
  • hot–dog > cute–dog
  • Sony digital-camera > Sony-digital camera, Sony-camera digital
• Is a semantically stronger dependency more useful for IR? Not necessarily
  • digital–camera could be less useful than Sony–camera
• The importance of a dependency in IR depends on its usefulness for retrieving better documents
Limitations
• MRF sequential model
  • Only considers consecutive pairs of terms; no dependency between distant terms
  • Sony digital camera: Sony–digital, digital–camera (but not Sony–camera)
• Full model
  • Can cover long-distance dependencies
  • But at a large increase in complexity
Proximity: more flexible dependency
• Tao & Zhai, 2007; Zhao & Yun, 2009
• ProxB(wi): proximity centrality
  • Min/average/sum distance from wi to the other query terms
• However, the weight λ is still fixed.
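One way to realize such a proximity-centrality feature is sketched below. The distance functions are illustrative, not the exact formulations of the cited papers:

```python
from itertools import product

def min_pair_distance(doc_terms, a, b):
    """Minimum distance between any occurrence of a and any occurrence of b."""
    pos_a = [i for i, t in enumerate(doc_terms) if t == a]
    pos_b = [i for i, t in enumerate(doc_terms) if t == b]
    if not pos_a or not pos_b:
        return float("inf")
    return min(abs(i - j) for i, j in product(pos_a, pos_b))

def proximity_centrality(doc_terms, term, other_terms, agg=min):
    """Aggregate (min/average/sum) distance from term to the other query terms."""
    dists = [min_pair_distance(doc_terms, term, o) for o in other_terms]
    return agg(dists)
```

Unlike a fixed window, this feature varies smoothly with how close the terms actually occur; but the weight given to the proximity component as a whole still remains fixed.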
A recent extension to the MRF model
• Bendersky, Metzler, Croft, 2010: weighted dependencies
• w_j^uni and w_j^bi: the importance of different features
• g_j^uni and g_j^bi: the weight of each unigram and bigram according to its utility
• However:
  • f_O and f_U (the ordered and unordered features) are mixed up
  • Only dependencies between pairs of adjacent terms are considered
Going further
• Use a discriminative model instead of MRF
• Can consider dependencies between more distant terms, without the exponential growth in complexity
  • We only consider pair-wise dependencies
  • Assumption: pair-wise dependencies capture the most important part of term dependence
• Consider several types of dependency between query terms
  • Ordered bigrams
  • Unordered pairs of terms within some distance (2, 4, 8, 16)
• Dependencies at different distances have different strengths
  • Co-occurrence dependency ~ variable proximity
General discriminative model
• Break down each component model to consider the strength/usefulness of each term dependency
• λ_U, λ_B, λ_C,w: importance of a unigram, a bigram, and a co-occurrence pair within distance w in documents
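The scoring shape this implies can be sketched as follows: each bigram and each co-occurrence pair within window w carries its own weight λ, instead of one fixed weight per component model. The feature functions and the example weights are hypothetical placeholders:

```python
def disc_score(doc, unigrams, bigram_w, cooc_w, lam_u=1.0):
    """Per-dependency weighted score.

    bigram_w: {(a, b): lambda_B} learned per bigram.
    cooc_w:   {((a, b), w): lambda_Cw} learned per pair and window size.
    """
    score = lam_u * sum(doc.count(t) for t in unigrams)
    for (a, b), lam in bigram_w.items():
        score += lam * sum(doc[i] == a and doc[i + 1] == b
                           for i in range(len(doc) - 1))
    for ((a, b), w), lam in cooc_w.items():
        score += lam * sum(t == a and b in doc[max(0, i - w):i + w + 1]
                           for i, t in enumerate(doc))
    return score
```

A strong pair such as pension–plans can now receive a large weight while a weak pair in the same query receives a near-zero one.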
An example
[Figure: dependency graph for the query "corporate pension plans funds", with a weight on each term pair; edge types bi, co2, co4, co8 (co16 omitted).]
Further development
• Set λ_U at 1 and vary the other λ
• Features:
How to determine the usefulness of a bigram or a co-occurrence pair (λ_B and λ_C,w)?
• Using a learning method based on a set of features
• Cross-validation
Learning method
• Parameters: λ_B, λ_C,w
• Goal: find the parameters maximizing the effectiveness measure on the training data
  • T_i: training data
  • R_λi: document ranking using the parameters
  • E: measure of effectiveness (MAP)
• Training data: {x_i, z_i} — a bigram or a pair of terms within distance w, and its best weight for the query
  • The best value is found by coordinate-level ascent search
• Regression: epsilon-SVM with a radial basis (RBF) kernel
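The coordinate-level ascent used to find each dependency's best weight can be sketched like this. The effectiveness function passed in is a stand-in; in the talk it is MAP computed over a document ranking:

```python
def coordinate_ascent(lams, evaluate, grid, sweeps=3):
    """Optimize one weight at a time over a grid, keeping any improvement.

    lams: initial list of weights; evaluate(lams) -> effectiveness (e.g. MAP).
    """
    best = evaluate(lams)
    for _ in range(sweeps):
        for i in range(len(lams)):
            for v in grid:
                trial = lams[:i] + [v] + lams[i + 1:]
                e = evaluate(trial)
                if e > best:
                    best, lams = e, trial
    return lams, best
```

The per-dependency values found this way serve as the training targets z_i for the epsilon-SVM regressor.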
Analysis
• Some intuitively strong dependencies should not be considered important in the retrieval process:
  • Disk1, query 088: "crude oil price trends"
    • Ideal weights (bi, co2..16) = 0, AP = 0.103
    • Learnt: bi = 0.2, co2..16 = 0, AP = 0.060
  • Disk1, query 003: "joint ventures"
    • Ideal weights (bi, co2..16) = 0, AP = 0.086
    • Learnt: bi = 0.07, co2..16 = 0, AP = 0.084
  • Disk1, query 094: "computer aided crime"
    • Ideal weights (bi, co2..16) = 0, AP = 0.223
    • Learnt: bi = 0.3, co2..16 = 0, AP = 0.158
Analysis
• Some intuitively weakly connected words should be considered strong dependencies:
  • Disk1, query 184: "corporate pension plans funds"
    • Ideal weights: bi = 0.5, co2 = 0.7, co4 = 0.2, AP = 0.253
    • Learnt weights: bi = 0.2, co8 = 0.01, co16 = 0.001, AP = 0.201 (unigram: 0.131)
  • Disk1, query 115: "impact 1986 immigration law"
    • Ideal weights: co2 = 0.1, co4 = 0.35, co8 = 0.05, AP = 0.511
    • Learnt weights: bi = 0, co16 = 0.01, AP = 0.492 (unigram: 0.437)
Disk1, query 115: "impact 1986 immigration law"
[Figure: dependency graph with ideal weights on term pairs; edge types bi, co2, co4, co8 (co16 omitted). Ideal AP = 0.511, unigram = 0.437, learnt = 0.492.]
Disk1, query 184: "corporate pension plans funds"
[Figure: dependency graph with ideal weights on term pairs; edge types bi, co2, co4, co8 (co16 omitted). Ideal AP = 0.253, unigram = 0.132, learnt = 0.201.]
Typical case 1: weak bigram dependency, weak co-occurrence dependency
Typical case 3: weak bigram dependency, strong co-occurrence dependency
Conclusions
• Different types of dependency between query terms should be considered
• They have variable importance/usefulness for IR, and should be integrated into the IR model with different weights
  • This importance does not necessarily correlate with semantic dependency strength
• The new model is better than the existing models in most cases (statistically significant in some cases)