A Discourse-based Information Retrieval Approach
Guided Study Presentation
WANG Da Yu
22 Dec 2005
Motivation
• IR’s task: to fill two gaps:
  • Between query and documents
  • Between information need and query
(Diagram labels: Information need in mind; Query; Documents in collection)
Query and Documents
• Assume no gap and use the query directly.
• Different IR models: Boolean model, VSM, probabilistic model, 2-Poisson model, language model
• The query is not adequate
• Query expansion: PRF (pseudo-relevance feedback), many methods for term selection…
• The query uses different words from the collection
• Interactive retrieval
Query and Information Need
• Shannon: information depends not only on the message but also on the receiver
• Language model: words are generated from the mind
• System-oriented and user-oriented relevance (K. L. Maglaughlin and D. H. Sonnenwald, 2002)
• Between information space and cognitive space (G. B. Newby, 2001)
TREC Queries
• Long, structured queries can express an information need better than short, unstructured ones.
• TREC ad hoc queries have title (T), description (D), and narrative (N) parts
• We assume that the information need can be obtained from TREC ad hoc queries
• The information need is studied based on 250 TREC queries
Concept of Discourse
• Discourse of information need: properties and relations that cannot exist independently in the description of the information need.
Discourse Performance
• Average = 0.2768
• 8 queries are in the Advantage/Disadvantage category
(Table columns: Category; Example)
Problems
• The query needs to contain “advantage”
• Containing “advantage” as a query term is not enough because:
  • Not all text containing “advantage” talks about an advantage, e.g. “take advantage of”
  • Text that talks about an advantage does not necessarily contain the term “advantage”
  • E.g. “it contains no chemicals capable of triggering an adverse reaction from the body's immune system.”
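The two failure cases on this slide can be illustrated with a minimal sketch of plain term matching. The function name `contains_term` and the simple lowercase word tokenization are assumptions made for illustration; they are not part of the presentation.

```python
import re

def contains_term(text: str, term: str = "advantage") -> bool:
    """Naive check: does the text contain the term as a whole word?

    Hypothetical helper; tokenization is a simple lowercase split on letters.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return term in tokens

# Contains the term but does not discuss an advantage (false positive).
idiom = "They take advantage of the situation."

# Discusses an advantage without using the term (false negative).
implicit = ("it contains no chemicals capable of triggering an adverse "
            "reaction from the body's immune system.")

false_positive = contains_term(idiom)     # matched although not about an advantage
false_negative = contains_term(implicit)  # missed although it describes an advantage
```

Here the idiom is matched and the implicit description is missed, which is why term presence alone is not a sufficient signal for this query category.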
Observation I
• The space-frame structure has the advantage of being able to maintain rigidity despite the adoption of materials less rigid than steel
• The major advantage is that it can be manufactured domestically. The disadvantage is that its capacity is too low and the total installed capacity is also too low.
• They held that the disadvantage of the plan is that it cannot solve all the problems at one fell swoop; nor can it demonstrate the superiority of socialism.
Observation II
• Example text: The theoretical advantages of fusion are that it uses virtually inexhaustible raw materials (deuterium extracted from sea-water and tritium made inside the fusion reactor from the light metal lithium), it produces far less radioactive waste than fission and it is inherently safe because the reaction stops as soon as anything goes wrong.
Concept of IU
• Assumption: terms around topic terms are more important to the relevance of a document than terms elsewhere.
• Definition of IU: given a set of topic terms and a document, information units (IUs) are the sliding windows of 2w+1 words whose (w+1)-th word is one of the topic terms.
(Diagram: w words, then topic term t, then w words; sequence numbers 1, 2, …, w, w+1, w+2, …, 2w+1, with t at position w+1)
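The IU definition can be sketched in a few lines of code. This is a minimal illustration, not the author's implementation: the function name `information_units`, case-insensitive matching, and clipping the window at document boundaries (so that edge windows may hold fewer than 2w+1 words) are all assumptions not stated on the slide.

```python
from typing import List, Set, Tuple

def information_units(
    words: List[str], topic_terms: Set[str], w: int
) -> List[Tuple[int, List[str]]]:
    """Return (center_index, window) for every topic-term occurrence.

    Each window spans up to w words on either side of the topic term,
    i.e. 2w+1 words with the topic term as the (w+1)-th word.
    """
    topics = {t.lower() for t in topic_terms}  # assumption: case-insensitive
    units = []
    for i, word in enumerate(words):
        if word.lower() in topics:
            start = max(0, i - w)              # assumption: clip at start
            end = min(len(words), i + w + 1)   # assumption: clip at end
            units.append((i, words[start:end]))
    return units

doc = "the major advantage is that it can be manufactured domestically".split()
units = information_units(doc, {"advantage"}, w=2)
# units holds one window of 5 words centered on "advantage"
```

With w=2 the single IU covers "the major advantage is that", so only the words near the topic term would be considered when judging relevance.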