Recognition and Classification of Noun Phrases in Queries for Effective Retrieval

Wei Zhang (1) wzhang@cs.uic.edu • Shuang Liu (2) shuang.liu@ask.com • Clement Yu (1) yu@cs.uic.edu • Chaojing Sun (3) chaojing@gmail.com • Fang Liu (4) fangliu@microsoft.com • Weiyi Meng (5) meng@cs.binghamton.edu • (1) Department of Computer Science, University of Illinois at Chicago • (2) Ask.com • (3) Broadcom Corporation • (4) Microsoft • (5) Department of Computer Science, Binghamton University • CIKM 2007

Outline • Motivation • Our definitions of the phrases • Proper noun and dictionary phrase recognition • Simple and complex phrase recognition • Experimental results

Motivation • Terms in a query are semantically related • E.g. “John Smith” • Recognize this relationship • Partition the query terms into groups (phrases) • Document retrieval using phrases • Add phrases to searching and ranking

Types of Noun Phrases • Phrases that have fixed written forms • Names of locations, people, companies, … • Well-defined concepts, e.g. “computer science” • Freely written phrases • Not formally defined but used in real language

Four Types of Noun Phrases • Proper Noun (PN) • A noun phrase that names a specific person, place or thing • The first letters of the content words are capitalized • E.g. “John Smith”, “Atlantic Ocean” • Dictionary Phrase (DP) • A phrase that has a definition in a dictionary, excluding PN • These two types may overlap • E.g. “Atlantic Ocean” • Neither type can replace the other • E.g. “Lina’s Pizza” (PN only), “public transportation” (DP only)

Four Types of Noun Phrases • Simple Noun Phrase (SNP) • A grammatically valid noun phrase other than PN and DP • 2 words • E.g. “white car”, “good hotel” • Complex Noun Phrase (CNP) • A grammatically valid noun phrase other than PN and DP • 3 or more words • May contain PN/DP/SNP • E.g. “small white car”, “city public transportation”

Noun Phrase Recognition • General procedure • Recognize PNs and dictionary phrases first • Then simple and complex noun phrases • For an n-word query • Check the whole query • Check the 2 (n-1)-word subsequences • … • Check the (n-1) 2-word subsequences • n*(n-1)/2 candidates in total • E.g. “World Trade Organization” • “World Trade” and “Trade Organization”
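
The candidate enumeration above can be sketched as follows. This is a minimal illustration, assuming candidates are contiguous word subsequences of length two or more, visited longest first; the function name `phrase_candidates` is hypothetical.

```python
def phrase_candidates(query: str) -> list:
    """Contiguous word subsequences of length >= 2, longest first.

    An n-word query yields 1 + 2 + ... + (n-1) = n*(n-1)/2 candidates.
    """
    words = query.split()
    n = len(words)
    candidates = []
    for length in range(n, 1, -1):            # n, n-1, ..., 2 words
        for start in range(n - length + 1):   # n - length + 1 positions
            candidates.append(" ".join(words[start:start + length]))
    return candidates


print(phrase_candidates("World Trade Organization"))
# ['World Trade Organization', 'World Trade', 'Trade Organization']
```

Visiting longer candidates first makes it easy to skip shorter ones that fall inside an already recognized PN or DP.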

Noun Phrase Recognition • Tools for phrase recognition • Dictionaries (Wikipedia, WordNet) • A large text corpus (Google in the experiments) • Parsers (Minipar, Collins parser) and a POS tagger

PN and DP Recognition • Wikipedia • For proper nouns and dictionary phrases • DP: an entry page for the phrase exists • PN: the content words of the first instance of the phrase in the main text are capitalized
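
The Wikipedia rule above can be sketched as a small decision function. This assumes the caller has already looked up whether an entry page exists and extracted the phrase's first occurrence in the main text; the stopword list that defines "content words" and all names here are hypothetical, not the authors' code.

```python
from typing import Optional

# Hypothetical stopword list: words NOT treated as "content words".
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}


def content_words(text: str) -> list:
    """Words of `text` that are not stopwords."""
    return [w for w in text.split() if w.lower() not in STOPWORDS]


def classify_with_wikipedia(has_entry_page: bool, first_occurrence: str) -> Optional[str]:
    """Return "PN", "DP" or None for a phrase.

    has_entry_page: whether Wikipedia has an entry page for the phrase
    first_occurrence: the phrase as first written in that page's main text
    """
    if not has_entry_page:
        return None                      # not recognized by this tool
    words = content_words(first_occurrence)
    if words and all(w[0].isupper() for w in words):
        return "PN"                      # every content word capitalized
    return "DP"
```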

PN and DP Recognition • WordNet • For PN and DP recognition • DP: defined in WordNet • PN: has one of these hypernyms: city, province, country, organization, geographic area, person, syndrome, region, building, or nation
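
A sketch of the WordNet test: walk up the hypernym chain and check it against the listed categories. To stay self-contained it uses a tiny hypothetical hypernym table instead of real WordNet; in practice the chain would come from a WordNet interface such as NLTK's corpus reader.

```python
# Hypernyms that mark a proper noun, taken from the slide.
PN_HYPERNYMS = {
    "city", "province", "country", "organization", "geographic area",
    "person", "syndrome", "region", "building", "nation",
}

# Toy, hypothetical stand-in for WordNet: entry -> its direct hypernym.
TOY_HYPERNYM = {
    "chicago": "city",
    "city": "municipality",
    "municipality": "urban area",
    "urban area": "geographic area",
    "computer science": "engineering",
    "engineering": "discipline",
}


def hypernym_chain(term: str) -> list:
    """All hypernyms reachable from `term`, nearest first."""
    chain, seen = [], set()
    while term in TOY_HYPERNYM and term not in seen:
        seen.add(term)
        term = TOY_HYPERNYM[term]
        chain.append(term)
    return chain


def classify_with_wordnet(phrase: str):
    key = phrase.lower()
    if key not in TOY_HYPERNYM:          # no entry: not recognized by this tool
        return None
    if any(h in PN_HYPERNYMS for h in hypernym_chain(key)):
        return "PN"
    return "DP"                          # defined, but no PN-type hypernym
```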

PN and DP Recognition • Minipar • For PN recognition only • “PN” label in the parse tree • Semantic label of person, country, corpname, location, corpdesig, fname, gname, or date

PN and DP Recognition • Lists of first names and last names, plus name-pattern rules • First_initial last_name • First_initial mid_initial last_name • First_name middle_initial last_name • First_name last_name
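
The four name patterns can be checked with a short matcher. The name lists below are tiny hypothetical samples, and an initial is assumed to be a single letter optionally followed by a period (case-insensitive, since queries are often lowercase).

```python
import re

FIRST_NAMES = {"john", "mary", "colin"}      # hypothetical sample list
LAST_NAMES = {"smith", "farrell", "jones"}   # hypothetical sample list

INITIAL = re.compile(r"^[A-Za-z]\.?$")       # "J" or "J."


def is_initial(tok: str) -> bool:
    return bool(INITIAL.match(tok))


def is_person_name(phrase: str) -> bool:
    toks = phrase.split()
    if len(toks) not in (2, 3) or toks[-1].lower() not in LAST_NAMES:
        return False
    first = toks[0]
    first_ok = is_initial(first) or first.lower() in FIRST_NAMES
    if len(toks) == 2:
        # First_initial last_name | First_name last_name
        return first_ok
    # First_initial mid_initial last_name | First_name middle_initial last_name
    return first_ok and is_initial(toks[1])
```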

PN and DP Recognition • Text corpus • For less well-known PNs • Three instances with the first letters of the content words capitalized • Each instance must not be a sub-phrase of a longer PN • “if you choose windows by Vista Window Company, …” counts • “if you choose windows by Super Vista Window Company, …” does not (part of “Super Vista Window Company”)
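
A sketch of the corpus test, assuming the corpus is available as a list of text snippets and that "three instances" means at least three. An instance counts only if its content words are capitalized and it is not glued to a neighbouring capitalized word; this neighbour check is a simplification (e.g. it would also reject a sentence-initial capitalized word before the phrase). All names are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "or", "the", "to"}
MIN_INSTANCES = 3   # assumption: "three instances" read as "at least three"


def tokens(text: str) -> list:
    # words (with apostrophes) and single punctuation marks as separate tokens
    return re.findall(r"[\w']+|[^\w\s]", text)


def is_capitalized(tok: str) -> bool:
    return tok[:1].isupper()


def corpus_pn_instances(phrase: str, snippets: list) -> int:
    target = [w.lower() for w in phrase.split()]
    k = len(target)
    count = 0
    for snippet in snippets:
        toks = tokens(snippet)
        for i in range(len(toks) - k + 1):
            window = toks[i:i + k]
            if [t.lower() for t in window] != target:
                continue
            # every content word must start with a capital letter
            if not all(is_capitalized(t) for t in window if t.lower() not in STOPWORDS):
                continue
            # a capitalized neighbour means a sub-phrase of a longer PN
            before = toks[i - 1] if i > 0 else ""
            after = toks[i + k] if i + k < len(toks) else ""
            if is_capitalized(before) or is_capitalized(after):
                continue
            count += 1
    return count


def is_corpus_pn(phrase: str, snippets: list) -> bool:
    return corpus_pn_instances(phrase, snippets) >= MIN_INSTANCES
```

With the two example sentences from the slide, only the first one counts as an instance of “Vista Window Company”.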

PN and DP Recognition • Overlapped phrases • Search with all the words together • Count the instances of each phrase in the returned documents • E.g. “Native American Casino” • “Native American” and “American Casino” • Compare Count(“Native American”) with Count(“American Casino”)
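
The overlap comparison can be sketched as counting each candidate in the returned documents and keeping the more frequent one. This assumes the returned documents are plain strings, matching is case-insensitive, and a tie yields no decision; the tie rule and function names are hypothetical.

```python
import re


def count_phrase(phrase: str, docs: list) -> int:
    # words of the phrase in order, separated by any whitespace, on word boundaries
    pattern = re.compile(
        r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b", re.IGNORECASE
    )
    return sum(len(pattern.findall(d)) for d in docs)


def resolve_overlap(p1: str, p2: str, docs: list):
    c1, c2 = count_phrase(p1, docs), count_phrase(p2, docs)
    if c1 == c2:
        return None          # assumption: no decision on a tie
    return p1 if c1 > c2 else p2
```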

SNP and CNP Recognition • Only check the phrase candidates that • are not sub-phrases of a recognized PN/DP • do not overlap with a recognized PN/DP
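
The filtering rule above can be sketched with half-open (start, end) word spans. Since a CNP may contain a PN/DP, this sketch keeps a candidate that is disjoint from, or strictly contains, every recognized span, and drops one that lies inside, equals, or partially overlaps a recognized span. The span representation and names are assumptions.

```python
def candidate_spans(n: int) -> list:
    """Half-open (start, end) word spans of length >= 2, longest first."""
    return [(s, s + length) for length in range(n, 1, -1) for s in range(n - length + 1)]


def keep_candidate(cand: tuple, recognized: list) -> bool:
    cs, ce = cand
    for rs, rend in recognized:
        disjoint = ce <= rs or rend <= cs
        # a candidate may contain a recognized PN/DP, but must not equal it
        strictly_contains = cs <= rs and rend <= ce and (cs, ce) != (rs, rend)
        if not (disjoint or strictly_contains):
            return False
    return True


# "city public transportation" with the DP "public transportation" at span (1, 3)
print([s for s in candidate_spans(3) if keep_candidate(s, [(1, 3)])])
# [(0, 3)]
```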

SNP and CNP Recognition • Implicit phrases • “and” / “or” • “main and contributing factor” → “main factor” and “contributing factor”
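
A simplified sketch of implicit-phrase expansion, assuming the pattern “X and/or Y head”: the conjunction joins modifiers, and the last word of the right conjunct is the shared head. Real coordination is more varied; this heuristic and its name are hypothetical.

```python
CONJUNCTIONS = {"and", "or"}


def expand_implicit(phrase: str) -> list:
    words = phrase.split()
    for i, w in enumerate(words):
        # need a left modifier and at least "modifier head" on the right
        if w.lower() in CONJUNCTIONS and 0 < i < len(words) - 2:
            left, right = words[:i], words[i + 1:]
            head = right[-1]                     # assumed shared head word
            return [" ".join(left + [head]), " ".join(right)]
    return [phrase]                              # no implicit phrases found


print(expand_implicit("main and contributing factor"))
# ['main factor', 'contributing factor']
```

A phrase such as “research and development” has no modifier-plus-head on the right side of the conjunction, so it is returned unchanged.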

SNP and CNP Recognition • Head word replacement • Replace the whole phrase by its head word • Collins parser • Labels each noun phrase with its head word • E.g. Best/JJS Compact/JJ Sedan/NN: two nested noun phrases, both labeled NP/sedan (head word “sedan”)

SNP and CNP Recognition • Phrase verification • Verify that a phrase is actually used in real-world text • For a CNP, this also means finding all of its words within a text window • E.g. “Colin Farrell wallpaper” and “wallpaper of Colin Farrell”

SNP and CNP Recognition • Overlapped phrases • Two potential SNPs/CNPs: search with all the words, then compare the numbers of instances of each • E.g. “sony dvd handycam” → “sony dvd” and “dvd handycam”

Document Retrieval Using Phrases • Search a phrase in a document • Exact match: PN/DP • Search all words in a text window: SNP/CNP
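
A sketch of the two matching modes: exact contiguous match for PN/DP, and "all words within a text window" for SNP/CNP, ignoring word order. The window size of 10 words is a hypothetical value (the slides do not give one), and tokenization is a simple lowercase word split.

```python
import re

WINDOW = 10  # hypothetical window size in words; not given in the slides


def words(text: str) -> list:
    return re.findall(r"[\w']+", text.lower())


def exact_match(phrase: str, doc: str) -> bool:
    p, d = words(phrase), words(doc)
    k = len(p)
    return any(d[i:i + k] == p for i in range(len(d) - k + 1))


def window_match(phrase: str, doc: str, window: int = WINDOW) -> bool:
    needed = set(words(phrase))
    d = words(doc)
    # slide a window of `window` words; a short document is one window
    for start in range(max(1, len(d) - window + 1)):
        if needed <= set(d[start:start + window]):
            return True
    return False


def phrase_in_doc(phrase: str, phrase_type: str, doc: str) -> bool:
    if phrase_type in ("PN", "DP"):
        return exact_match(phrase, doc)
    return window_match(phrase, doc)   # SNP / CNP
```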

Document Retrieval Using Phrases • Sim(Query, Doc) = <Sim_P, Sim_T> • Phrase similarity • Sim_P(P_i) = idf(P_i) • Sim_P = sum of Sim_P(P_i) over the query phrases P_i found in the document • Term similarity • Sim_T: Okapi BM25 similarity • Document ranking • D1 is ranked higher than D2 if • Sim_P(D1) > Sim_P(D2), OR (Sim_P(D1) = Sim_P(D2) AND Sim_T(D1) > Sim_T(D2))
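
The ranking rule is a lexicographic order on the pair <Sim_P, Sim_T>, which Python tuple comparison gives directly. In this sketch the term score is assumed to be a precomputed BM25 value, the idf formula log(N / df) is an assumption (the slides only say idf), and the `ScoredDoc` structure is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    matched_phrases: list   # query phrases found in the document
    sim_t: float            # term similarity, e.g. a precomputed Okapi BM25 score


def idf(df: int, n_docs: int) -> float:
    # assumption: plain log(N / df); the slides only say idf(P_i)
    return math.log(n_docs / df)


def sim_p(doc: ScoredDoc, phrase_df: dict, n_docs: int) -> float:
    return sum(idf(phrase_df[p], n_docs) for p in doc.matched_phrases)


def rank(docs: list, phrase_df: dict, n_docs: int) -> list:
    # tuples compare lexicographically: Sim_P first, Sim_T only breaks ties
    return sorted(docs, key=lambda d: (sim_p(d, phrase_df, n_docs), d.sim_t), reverse=True)
```

Because Sim_P dominates, a document with a high BM25 score but no matched phrase is ranked below every document that matches at least one phrase with positive idf.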

Experimental Results • Phrase recognition experiments • Tuned by using TREC queries

Experimental Results • Phrase recognition experiments • Tested by using Web queries

Experimental Results • Performance of individual tools • Wikipedia performs better than WordNet and Minipar • A complete dictionary is needed • The Collins parser alone is not enough for SNP/CNP recognition • It lacks real-world usage information

Experimental Results • Document retrieval experiments • Ad-hoc TREC 6, 7 and 8; robust TREC 12, 13 and 14 • (1) Retrieval without phrases • (2) Wikipedia for PN/DP and only the Collins parser for SNP/CNP • (3) Phrases from the full recognition algorithm • 33% MAP increase and 44.27% GMAP increase from (1) to (2) • 5.8% MAP increase and 12.58% GMAP increase from (2) to (3)

Conclusions • Our algorithm can effectively recognize the four types of phrases in short Web queries • The recognized phrases help improve retrieval effectiveness

Questions? • wzhang@cs.uic.edu • http://www.cs.uic.edu/~wzhang/
