Exploiting Semantics with Structured Queries
Jose Ramón Pérez-Agüera & Hugo Zaragoza
U. Complutense de Madrid / Yahoo! Research (Barcelona)
CLEF 2008
Query expansion makes term independence a big issue… we are double counting "meanings"!
Term independence assumption gets worse with query expansion… (example 1)

Verde que te quiero verde.
Verde viento. Verdes ramas.
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda,
verde carne, pelo verde,
con ojos de fría plata.
Bajo la luna gitana,
las cosas la están mirando
y ella no puede mirarlas. […]

(García Lorca, "Romance sonámbulo"; roughly: "Green, how I want you green. Green wind. Green branches. The ship on the sea and the horse on the mountain. With the shadow at her waist she dreams on her balcony, green flesh, green hair, with eyes of cold silver…")

The same passage with the senses of "verde" annotated:

verde3 que te quiero verde2.
verde3 viento. verde1 ramas.
El barco sobre la mar
y el caballo en la montaña.
Con la sombra en la cintura
ella sueña en su baranda,
verde5 carne, pelo verde1,
con ojos de fría plata.
Bajo la luna gitana,
las cosas la están mirando
y ella no puede mirarlas. […]

q:  verde pelo   [CLEF EFE94, 2001 Spanish topics]
q1: verde1 pelo
q2: verde1 verde2 pelo
q3: verde1 verde2 verde3 verde4 verde5 pelo
Term independence assumption gets worse with query expansion… (example 2)

[Figure: a 46% drop in retrieval performance; CLEF EFE94, 2001 Spanish topics]

[Pérez-Agüera, Zaragoza and Araujo, NLDB 2008]
Term independence assumption gets worse with query expansion… (example 3)
• BM25 dependence model:
[Figure: BM25 term-frequency saturation curves, plotted for tf = 1, 2, 3, 4, … 10]
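To see the double counting concretely, here is a minimal Python sketch (ours, not from the slides) of the BM25 saturation component, assuming the usual k1 = 1.2: five occurrences of one term saturate, but the same five occurrences spread over five "independent" synonyms do not.

    def bm25_tf_weight(tf, k1=1.2):
        # BM25 term-frequency saturation: steep for tf = 1..3, then it flattens
        return tf * (k1 + 1) / (tf + k1)

    print(bm25_tf_weight(5))       # one term, tf = 5        -> ~1.77 (saturated)
    print(5 * bm25_tf_weight(1))   # 5 synonyms, tf = 1 each -> 5.0 (no saturation)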
Query Expansion (example of state of the art)
• Term Selection: Divergence From Randomness (DFR) expansion model, Bo1 [8,6]:
  w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n), with P_n = F_t / N,
  where tf_x is the frequency of t in the top x = 1 ranked document; the top 40 terms are kept.
• Term Weighting: Rocchio [9] (β = 0.3):
  w'(t) = qtf(t) / max qtf + β · w(t) / max w
• Performance Prediction: AvICTF [5] (cheap); expand only when AvICTF > 9.0
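As a concrete reference, a minimal Python sketch of the two formulas as we reconstruct them from the slide (parameter names and the β value are our reading, not confirmed by the source):

    import math

    def bo1_weight(tf_x, F_t, N):
        # DFR Bo1 [8,6]: tf_x = term frequency in the top-x feedback docs,
        # F_t = frequency in the whole collection, N = number of documents
        P_n = F_t / N
        return tf_x * math.log2((1 + P_n) / P_n) + math.log2(1 + P_n)

    def rocchio_weight(qtf, max_qtf, w, max_w, beta=0.3):
        # Rocchio [9] reweighting of an (expanded) query term;
        # beta = 0.3 is an assumption based on the slide's remaining constant
        return qtf / max_qtf + beta * w / max_w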
Results in CLEF 2008 Robust-WSD Task:
• Standard query expansion: 3rd team out of 8 in CLEF Robust; the 1st team was well ahead of everyone.
• It seems no one improved GMAP, so teams reported MAP.
Query expansion makes term independence a big issue… we are double counting "meanings"!
Query Clauses Idea: "Cheap Barcelona Italian Restaurants"
{cheap, barcelona, italian, restaurant}
Expansion: {cheap, barcelona, italian, restaurant, inexpensive, affordable, Sagrada Familia, Ramblas, Gràcia, Barceloneta, pizzeria, trattoria, café}
Structure: collect related meanings in clauses:
{ {cheap, inexpensive, affordable},                                ← c1
  {barcelona, Sagrada Familia, Ramblas, Gràcia, Barceloneta, …},   ← c2
  {italian_restaurant, pizzeria, trattoria, café} }                ← c3
Clause independence, not term independence.
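A minimal Python sketch of the resulting structure: the query becomes a bag of clauses, each clause a bag of weighted terms (the weights here are invented for illustration).

    clauses = [
        {"cheap": 1.0, "inexpensive": 0.8, "affordable": 0.7},      # c1
        {"barcelona": 1.0, "sagrada_familia": 0.6, "ramblas": 0.6,
         "gracia": 0.5, "barceloneta": 0.5},                        # c2
        {"italian_restaurant": 1.0, "pizzeria": 0.9,
         "trattoria": 0.9, "cafe": 0.4},                            # c3
    ]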
Query Clauses Idea
[Diagram: a flat bag-of-words query; original terms (term 1, term 2) and expansion term (term e1) all scored independently]
Query Clauses Idea
[Diagram: terms grouped into clauses: c1 = {term1, term e1}, c2 = {term2, term e2, term e3}, c3 = {term e4}]
(same idea as BM25F on fields [10])
Query Clauses Model
Bag of words: q = {t1, …, tn}; score(d, q) = Σ_i w(t_i, d)
Query clauses (bag of bags of weighted words): q = {c1, …, cm}, where each clause c_j = {(t, w_t)} is a set of weighted terms
Matrix notation: let W = [w_jt] be the (clauses × vocabulary) weight matrix; then redefine each document vector d as d̃ = W·d, and score the clauses of q against d̃ as if they were terms.
Query Clauses Implementation of W1 and W2
In general the projection is query-dependent and needs to be done online:
• clause term frequency:        tf(c, d) = Σ_{t∈c} w_t · tf(t, d)
• clause collection frequency:  F(c) = Σ_{t∈c} w_t · F(t)
• clause document likelihood:   p(c|d) = Σ_{t∈c} w_t · p(t|d)
• clause collection likelihood: p(c|C) = Σ_{t∈c} w_t · p(t|C)
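A toy Python/NumPy sketch of this online projection under our reading of the model (the vocabulary, weights and counts are invented for illustration):

    import numpy as np

    # rows = clauses, columns = vocabulary [cheap, inexpensive, barcelona, ramblas]
    W = np.array([[1.0, 0.8, 0.0, 0.0],    # clause c1
                  [0.0, 0.0, 1.0, 0.6]])   # clause c2

    tf_d = np.array([2, 1, 0, 3])          # term frequencies in one document
    F_t  = np.array([120, 40, 300, 25])    # collection frequencies

    tf_c = W @ tf_d   # clause term frequencies:       [2.8, 1.8]
    F_c  = W @ F_t    # clause collection frequencies: [152., 315.]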
Query Clauses Implementation of W1 and W2
IDF is not straightforward; there are several possibilities:
• min, max or avg of the member terms' idf (leads to inconsistent situations for small weights)
• expected clause idf: idf(c) = Σ_{t∈c} p(t|c) · idf(t)
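For instance, a sketch of the expected clause idf, assuming p(t|c) is taken proportional to the clause weights (that choice is our assumption):

    def expected_clause_idf(clause, idf):
        # clause: {term: weight}; idf: {term: idf value}
        total = sum(clause.values())
        return sum((w / total) * idf[t] for t, w in clause.items())

    c1  = {"cheap": 1.0, "inexpensive": 0.8, "affordable": 0.7}
    idf = {"cheap": 2.1, "inexpensive": 3.5, "affordable": 3.2}
    print(expected_clause_idf(c1, idf))   # ~2.86, always between min and max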
How can we construct the clauses?
• Idea: use WordNet to expand each term in the query as a clause.
• Idea: use statistical methods to expand each term in the query.
• Idea: use query expansion to find terms, then use statistical methods to group them into clauses.
• Idea: use query expansion to find terms, then use WordNet to group them into clauses (our choice; sketch below).
• There exist several semantic similarity measures based on WordNet [11]: WN(s1, s2).
• We construct a clause for every original query term and add to it the expanded terms with WN(s1, s2) < k.
• To be conservative, all terms not assigned to an original clause are gathered together into a new "Other" clause.
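Clause construction can be sketched with NLTK's WordNet interface and the Wu-Palmer measure [11]. Note that NLTK's wup_similarity is a similarity (higher means closer), so the threshold test runs opposite to the slide's distance-style WN(s1, s2) < k; the value k = 0.8 and the assign-to-most-similar-term strategy are our assumptions.

    from nltk.corpus import wordnet as wn

    def wn_sim(w1, w2):
        # max Wu-Palmer similarity over all synset pairs (0 if either word is unknown)
        pairs = [(a, b) for a in wn.synsets(w1) for b in wn.synsets(w2)]
        return max((a.wup_similarity(b) or 0.0 for a, b in pairs), default=0.0)

    def build_clauses(query_terms, expansion_terms, k=0.8):
        clauses = {t: [t] for t in query_terms}   # one clause per original term
        other = []                                # conservative "Other" clause
        for e in expansion_terms:
            best = max(query_terms, key=lambda t: wn_sim(t, e))
            (clauses[best] if wn_sim(best, e) >= k else other).append(e)
        return list(clauses.values()) + [other]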
Results in CLEF 2008 Robust-WSD Task: implementation
• DFR expansion: 40 new terms extracted for each query.
• Query clauses: expanded terms grouped around the original terms by WordNet similarity (pipeline: DFR expansion → WordNet similarity → query clauses).
• Ranking: BM25 with standard parameters, computed on the clauses.
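A sketch of the final ranking step under our reading: standard BM25 with clause statistics substituted for term statistics, in the spirit of BM25F [10]; k1 = 1.2 and b = 0.75 are the standard parameters the slide refers to.

    def bm25_clause(tf_c, idf_c, dl, avgdl, k1=1.2, b=0.75):
        # length-normalized clause frequency, then the usual BM25 saturation
        norm = tf_c / (1 - b + b * dl / avgdl)
        return idf_c * norm * (k1 + 1) / (norm + k1)

    # one clause with tf_c = 2.8 and idf_c = 2.86 in an average-length document:
    print(bm25_clause(2.8, 2.86, dl=100, avgdl=100))   # ~4.40

The document score is the sum of this quantity over the query's clauses, so independence is assumed between clauses only, never between the terms inside one.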
Results in CLEF 2008 Robust-WSD Task:
• Query clauses: 4% relative improvement over standard query expansion. (overall results)
• 2nd team in CLEF Robust; the 1st team was well ahead without using WSD.
Biblio
[10] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC 13: Web and HARD tracks. In Proceedings of the 13th Text REtrieval Conference (TREC-13), 2004.
[11] Z. Wu and M. Palmer. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL), 1994.