Supporting Effective Access through User- and Topic-Based Language Models
Who we are…
Specifics… • People • Nick Belkin, Principal Investigator, Rutgers • Bruce Croft & James Allan, Principal Investigators, U Mass • Diane Kelly, Graduate Assistant, Rutgers • Jeong Hyun (Annie) Kim, Yang-woo Kim, Hyuk-Jin Lee, Anne Washington, Rutgers • Funding • NSF #99-11942, Rutgers awarded $243,771 for a 3-year period
The Problem • Information retrieval and filtering tools make mistakes that are obvious and aggravating to users. • Queries are often totally misinterpreted by search tools and, even when relevant documents are found, they are usually mixed with many others that are totally unrelated. • WHY? Lack of adequate models of the user and of the domain.
General Solution • Language Models • Assume that associated with every document or group of documents there are one or more probability distributions that model how the text in the document can be generated. • Can be viewed as models of the important topics that are covered by a document or a group of documents.
• Have the potential to represent topics of interest to a user or a group of users. • Assume that different types of information problems are expressed in language in different ways. These differences could be characterized by language models, understood as models of the user.
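To make the idea of a probability distribution that "generates" a document's text concrete, here is a minimal sketch of a unigram language model estimated by maximum likelihood. The slides do not say which kind of model the project uses, so the unigram assumption, the whitespace tokenizer, the function name `unigram_model`, and the two toy documents are all illustrative assumptions.

```python
from collections import Counter


def unigram_model(text):
    """Return a maximum-likelihood unigram distribution for `text`.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    # Each word's probability is its relative frequency in the text.
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy documents on two different "Java" topics.
travel_doc = "cheap flights to java island beaches java hotels"
coding_doc = "java class compiler java method object source code"

travel_lm = unigram_model(travel_doc)
coding_lm = unigram_model(coding_doc)

# Both models give "java" the same probability, but they disagree on
# the surrounding vocabulary, which is what separates the topics.
print(travel_lm["java"], coding_lm["java"])
print(travel_lm.get("compiler", 0.0), coding_lm.get("compiler", 0.0))
```

The shared word "java" does not distinguish the two documents; the rest of each distribution does.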
Types of Language Models • Topic-based • Identify characteristics of the language that is used in a particular domain or topic. • User-based • Identify characteristics of the language that a user or a group of users associate with a particular domain or topic.
An Example • Consider the following information needs that indicate a user’s interest in Java: • I want to find the best and least expensive way to have a 2 week holiday in Java. • I need to know where Java is. • I need to write a brief survey of the current political situation in Java.
• I need to learn the programming language Java. • I want to buy some dark roast Java coffee. • Consider the following premise: Documents that are relevant to each of these kinds of queries will have different language characteristics associated with them.
Our Goal • To identify the language characteristics of each of these types of needs (user) • To identify the language characteristics of the groups of documents that satisfy each of these types of needs (topic) • To match these needs and document clusters accordingly
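One way the matching step could work is sketched below: score each stated need against a language model for each document cluster and pick the highest-scoring cluster. The slides do not specify a matching method. Query likelihood with Jelinek-Mercer smoothing is swapped in here as a common choice, and the cluster texts, the weight `lam`, and the names `log_likelihood` and `best_topic` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, no punctuation handling.
    return text.lower().split()


# Hypothetical toy clusters standing in for groups of relevant documents.
topic_texts = {
    "travel": "holiday in java island flights hotels beaches cheap holiday",
    "programming": "learn java programming language compiler class code",
    "coffee": "buy dark roast java coffee beans espresso",
}
topic_counts = {name: Counter(tokenize(text)) for name, text in topic_texts.items()}

# The background model pools all clusters and is used only for smoothing.
background_counts = Counter()
for counts in topic_counts.values():
    background_counts.update(counts)


def log_likelihood(need, counts, lam=0.5):
    """Log P(need | topic), mixing topic and background word probabilities."""
    topic_total = sum(counts.values())
    background_total = sum(background_counts.values())
    score = 0.0
    for word in tokenize(need):
        p_topic = counts[word] / topic_total
        p_background = background_counts[word] / background_total
        p = lam * p_topic + (1 - lam) * p_background
        if p > 0:  # words unseen in every cluster are skipped
            score += math.log(p)
    return score


def best_topic(need):
    """Return the cluster whose language model best explains the need."""
    return max(topic_counts, key=lambda name: log_likelihood(need, topic_counts[name]))


for need in (
    "a 2 week holiday in Java",
    "I need to learn the programming language Java",
    "I want to buy some dark roast Java coffee",
):
    print(need, "->", best_topic(need))
```

Every need contains "Java", so the match depends on the other words in the statement. This is the premise of the example slides: different kinds of needs come with different language.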