An Information Retrieval Approach based on Discourse Type

NLDB 2006
D. Y. Wang, R. W. P. Luk, K.F. Wong1 and K.L. Kwok2
Department of Computing, The Hong Kong Polytechnic University
1Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
2Department of Computer Science, City University of New York
DY Wang @ 2006

Content
• Introduction
• Motivation
• Discourse Type
• Information Unit
• Problem Formulation
• Score of topic terms
• Score of discourse type
• Document Re-ranking
• Experimental Results
• Conclusion

Motivation
• The effectiveness of information retrieval (IR) systems varies substantially from one topic to another.
• One reason: users' information needs are very diverse.
• Our approach: identify the discourse type of the topic and adopt an appropriate retrieval strategy.

Discourse Type
• Definition of discourse type: the functions (including properties and relations that cannot exist independently) of the independent entities

Performance Difference
[Figure: performance difference; average = 0.2768]

Why Choose “Advantage / Disadvantage” as our example?
• Its performance is worse than the average.
  • 0.204 vs. 0.277
• It is relatively abstract and is therefore unlikely to have been investigated before.
  • Compared with concrete things (e.g., people, country)
• It is related to some cue phrases (e.g., “more than”) that are composed of stop words.
  • Conventional IR ignores stop words.

Why Choose “Advantage / Disadvantage” as our example? (cont.)
• It is a popular discourse type of information need.
  • We found at least 40 questions asking about the advantages and disadvantages of something on one website (http://www.answerbag.com).
• It has a reasonable number (i.e., eight) of TREC topics for investigation.
  • See next slide

Eight Queries with discourse type Advantage / Disadvantage

Information Unit (IU)
[Figure: a document containing occurrences of term1 and term2; an IU spans w words before and w words after a topic term t]

Why IU?
• Assumption: terms inside an IU (around topic terms) are more important to the relevance of a document than terms outside the IU.
• Simplify the processing of the documents:
  • Compute a score for each IU.
  • Aggregate the scores of all IUs as the score of the document.
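A minimal sketch of how IUs could be cut out of a tokenized document, assuming an IU is the window of w words on each side of every topic-term occurrence. The function name `extract_ius`, the whitespace tokenization, and the choice to leave overlapping windows unmerged are illustrative assumptions, not details given on the slides.

```python
def extract_ius(tokens, topic_terms, w):
    """Return one IU (a list of tokens) per occurrence of a topic term.

    Each IU holds up to w tokens before the term, the term itself,
    and up to w tokens after it. Windows are clipped at document edges.
    """
    ius = []
    for i, tok in enumerate(tokens):
        if tok in topic_terms:
            start = max(0, i - w)
            ius.append(tokens[start:i + w + 1])
    return ius


# Hypothetical toy document with two topic terms.
doc = "a b term1 c d e term2 f".split()
print(extract_ius(doc, {"term1", "term2"}, w=2))
# [['a', 'b', 'term1', 'c', 'd'], ['d', 'e', 'term2', 'f']]
```

Keeping one window per occurrence means the same word can appear in several IUs; merging overlapping windows would be an equally plausible reading.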

Score of Topic Terms
• sumtf = 4
• Dtf = 3 (d: distinct)
Graph-based Model:
• atS3 = 1/1 + 1/5 + 1/3
• atS4 = 1/5 + 1/3
[Figure: graph labelled 1, 5, 3]
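The quantities on this slide can be reproduced with a short sketch, assuming sumtf is the total number of topic-term occurrences in an IU and Dtf is the number of distinct topic terms among them. The slide does not say what the values 1, 5 and 3 measure, so the code only evaluates the two sums of reciprocals as written. The toy IU and the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def topic_term_counts(iu_tokens, topic_terms):
    """Return (sumtf, dtf): total and distinct topic-term counts in one IU."""
    counts = Counter(t for t in iu_tokens if t in topic_terms)
    return sum(counts.values()), len(counts)


def inverse_sum(values):
    """Sum of reciprocals, kept exact with Fraction."""
    return sum(Fraction(1, v) for v in values)


# Hypothetical IU: "nuclear" occurs twice, "power" and "waste" once each.
iu = ["nuclear", "power", "is", "cheaper", "than", "coal", "but", "nuclear", "waste"]
topic = {"nuclear", "power", "waste"}
print(topic_term_counts(iu, topic))  # (4, 3)

print(inverse_sum([1, 5, 3]))  # 23/15, the slide's atS3
print(inverse_sum([5, 3]))     # 8/15, the slide's atS4
```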

Example: Score of Discourse Type
• more (comparative words) = 3
support = ['back', 'confirm', 'contest', 'contrari', 'defend', 'encourag', 'endors', 'object', 'oppon', 'oppos', 'opposit', 'prove', 'quibbl', 'refer', 'sponsor', 'support'] (from www.answers.com)
• support = 2
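The counts in this example can be sketched as simple cue-word matching over the stemmed tokens of an IU. The support stems are the ones listed on the slide; the comparative word set, the function name `cue_counts`, and the toy IU are assumptions for illustration, and the slide does not state how the counts are combined into a single discourse-type score.

```python
# Stems listed on the slide for the "support" cue category.
SUPPORT_STEMS = {
    "back", "confirm", "contest", "contrari", "defend", "encourag",
    "endors", "object", "oppon", "oppos", "opposit", "prove",
    "quibbl", "refer", "sponsor", "support",
}

# Assumption: only "more" appears on the slide; the other words are illustrative.
COMPARATIVE_WORDS = {"more", "less", "better", "worse"}


def cue_counts(iu_stems):
    """Count cue-word occurrences per category in one stemmed IU."""
    return {
        "comparative": sum(1 for s in iu_stems if s in COMPARATIVE_WORDS),
        "support": sum(1 for s in iu_stems if s in SUPPORT_STEMS),
    }


# Hypothetical stemmed IU; stop words are kept because cue phrases rely on them.
iu = "more peopl support wind power more than coal but oppon want more proof".split()
print(cue_counts(iu))  # {'comparative': 3, 'support': 2}
```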

Document Re-ranking
• IU score before re-ranking: S0
  • S0: similarity score of the document that contains the IU
• IU re-ranking score S’, computed in one of three ways:
  • S’ = S0 * score of topic terms
  • S’ = S0 * score of discourse type
  • S’ = S0 * score of topic terms * score of discourse type
• Aggregate the re-ranking scores of all IUs in a document as the final score of the document.
• Re-rank the documents by the final score.
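The re-ranking steps above can be sketched as follows. The slides say only that IU scores are aggregated, so summation is an assumption here, as are the function name `rerank`, the input layout, and the toy scores.

```python
def rerank(docs, use_topic=True, use_discourse=True):
    """Re-rank documents by aggregated IU scores.

    docs maps doc_id -> (s0, ius), where s0 is the document's original
    similarity score and ius is a list of (topic_score, discourse_score)
    pairs, one per IU. Returns (ranked_doc_ids, final_scores).
    """
    final = {}
    for doc_id, (s0, ius) in docs.items():
        total = 0.0
        for topic_score, discourse_score in ius:
            s = s0  # every IU starts from the score of its document
            if use_topic:
                s *= topic_score
            if use_discourse:
                s *= discourse_score
            total += s  # assumption: aggregate by summing IU scores
        final[doc_id] = total
    ranked = sorted(final, key=final.get, reverse=True)
    return ranked, final


# Hypothetical scores: d2 ranks first initially, d1 after re-ranking.
docs = {
    "d1": (0.5, [(2.0, 1.0), (1.0, 3.0)]),
    "d2": (0.8, [(1.0, 1.0)]),
}
ranking, scores = rerank(docs)
print(ranking)  # ['d1', 'd2']
print(scores)   # {'d1': 2.5, 'd2': 0.8}
```

Switching off `use_topic` or `use_discourse` gives the two single-factor variants of S’ listed on the slide.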

Re-ranking Results in MAP

Conclusion
• Re-ranking based on topic terms and re-ranking based on discourse type can each improve retrieval performance.
• Combining the two improves the results most significantly (at the 95% confidence level, already accounting for the sample size).
• This approach is promising and is worth further investigation.

Acknowledgement: We thank the Center for Intelligent Information Retrieval, University of Massachusetts, for facilitating Robert Luk to develop the basic IR system when he was on leave there. This work is supported by the CERG Project # PolyU 5226/05E.
