Learn about Boolean and Ranked Retrieval Models along with insights from top sources in information retrieval. Explore the features, challenges, and success stories in this lecture.
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 3 • Boolean Retrieval Model • Ranked Retrieval Model
ACKNOWLEDGEMENTS The material in this lecture has been taken from the following sources • “Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze • “Managing Gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C. Bell • “Modern Information Retrieval” by Ricardo Baeza-Yates, Berthier Ribeiro-Neto • “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, Marco Brambilla
Outline • Boolean Retrieval Model • Information Retrieval Ingredients • Westlaw • Ranked retrieval models
Boolean Retrieval Model • D1 = {This is a pen} • D2 = {It is a pen} • Set(D1, D2) = {This, It, is, a, pen} • A set ignores order and duplicates: {a, b, c} = {b, a, c} • A bag keeps duplicates: {a, a, b, c}
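The set and bag views of D1 and D2 can be sketched in a few lines of Python. This is only an illustration: splitting on whitespace is an assumed tokenizer, and the variable names are not from the lecture.

```python
from collections import Counter

d1 = "This is a pen"
d2 = "It is a pen"

# Set view: the distinct terms across both documents.
# Order and repetition are dropped.
vocabulary = set(d1.split()) | set(d2.split())

# Bag view: the same terms, but each one keeps its occurrence count.
bag = Counter(d1.split()) + Counter(d2.split())

print(sorted(vocabulary))      # distinct terms only
print(bag["is"], bag["This"])  # "is" occurs in both documents, "This" in one
```

The Boolean model only needs the set view, since it asks whether a term occurs in a document, not how often.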
Boolean queries • The Boolean retrieval model can answer any query that is a Boolean expression. • Boolean queries are queries that use AND, OR and NOT to join query terms. • Views each document as a set of terms. • Is precise: a document either matches the condition or it does not. • Primary commercial retrieval tool for 3 decades. • Many professional searchers (e.g., lawyers) still like Boolean queries. • You know exactly what you are getting. • Many search systems you use are also Boolean: Spotlight, email, intranet etc.
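A minimal sketch of Boolean query evaluation, assuming each document is reduced to a set of lowercase terms and each term maps to the set of document IDs that contain it. The toy collection and the helper names `build_index` and `postings` are made up for illustration.

```python
# Toy collection; texts and IDs are invented for this example.
docs = {
    1: "Brutus killed Caesar",
    2: "Caesar praised Brutus",
    3: "Calpurnia warned Caesar",
    4: "Antony spoke of Brutus",
}

def build_index(docs):
    """Map each term to the set of IDs of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)
all_ids = set(docs)

def postings(term):
    """Return the document IDs for a term (empty set if the term is unseen)."""
    return index.get(term.lower(), set())

# AND is set intersection, OR is union, NOT is complement against all IDs.
both = postings("Brutus") & postings("Caesar")
either = postings("Brutus") | postings("Calpurnia")
brutus_only = postings("Brutus") & (all_ids - postings("Caesar"))

print(both, either, brutus_only)
```

Every document is either in the result set or not, which is the sense in which the model is precise.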
Information Retrieval Ingredients • Document representation • Query formulation • Query processing
Commercially successful Boolean retrieval: Westlaw • Largest commercial legal search service in terms of the number of paying subscribers • Over half a million subscribers performing millions of searches a day over tens of terabytes of text data • The service was started in 1975. • In 2005, Boolean search (called “Terms and Connectors” by Westlaw) was still the default, and used by a large percentage of users . . . • . . . although ranked retrieval has been available since 1992.
Westlaw: Example queries • Information need: Information on the legal theories involved in preventing the disclosure of trade secrets by employees formerly employed by a competing company • Query: “trade secret” /s disclos! /s prevent /s employe! • Information need: Requirements for disabled people to be able to access a workplace • Query: disab! /p access! /s work-site work-place (employment /3 place) • Information need: Cases about a host’s responsibility for drunk guests • Query: host! /p (responsib! liab!) /p (intoxicat! drunk!) /p guest
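The operators in these queries follow the description in Chapter 1 of IIR: `!` is a trailing wildcard, `/s` and `/p` require matches in the same sentence or paragraph, `/k` requires matches within k words, and a space between terms means OR. The sketch below imitates only the wildcard and the `/k` operator on an already tokenized text. The function names are hypothetical, and reading "within k words" as "token positions differ by at most k, in either order" is an assumption, not Westlaw's actual implementation.

```python
def matches_prefix(token, pattern):
    """'disclos!' matches any token that starts with 'disclos'."""
    if pattern.endswith("!"):
        return token.startswith(pattern[:-1])
    return token == pattern

def within_k(tokens, left, right, k):
    """True if a match of `left` and a match of `right` lie at most k positions apart."""
    left_pos = [i for i, t in enumerate(tokens) if matches_prefix(t, left)]
    right_pos = [i for i, t in enumerate(tokens) if matches_prefix(t, right)]
    return any(abs(i - j) <= k for i in left_pos for j in right_pos)

tokens = "a safe place of employment is required".split()

# employment /3 place: "place" and "employment" are two positions apart.
print(within_k(tokens, "employment", "place", 3))
print(matches_prefix("disclosure", "disclos!"))
```

The `/s` and `/p` operators could be handled the same way by comparing sentence or paragraph numbers instead of token positions.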
Problem with Boolean search: feast or famine • Requires query writing skills • Boolean queries often result in either too few (= 0) or too many (1000s) results. • It takes a lot of skill to come up with a query that produces a manageable number of hits. • AND gives too few; OR gives too many
Ranked retrieval models • Rather than a set of documents satisfying a query expression, in ranked retrieval, the system returns an ordering over the (top) documents in the collection for a query • Free text queries: Rather than a query language of operators and expressions, the user’s query is just one or more words in a human language • In principle, there are two separate choices here, but in practice, ranked retrieval has normally been associated with free text queries and vice versa
Feast or famine: not a problem in ranked retrieval • When a system produces a ranked result set, large result sets are not an issue • Indeed, the size of the result set is not an issue • We just show the top k ( ≈ 10) results • We don’t overwhelm the user • Premise: the ranking algorithm works
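Showing only the top k results can be sketched with a heap-based selection, so the whole result set never has to be sorted. The scores below are invented, and `top_k` is a hypothetical helper name.

```python
import heapq

def top_k(scores, k=10):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Invented scores for a query that matched five documents.
scores = {"d1": 0.12, "d2": 0.87, "d3": 0.45, "d4": 0.30, "d5": 0.05}

print(top_k(scores, k=2))
```

However many documents match, the user sees at most k of them, which is why result-set size stops being a problem as long as the ranking puts useful documents first.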
Scoring as the basis of ranked retrieval • We wish to return in order the documents most likely to be useful to the searcher • How can we rank-order the documents in the collection with respect to a query? • Assign a score – say in [0, 1] – to each document • This score measures how well document and query “match”.
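One simple way to get a score in [0, 1] is the Jaccard coefficient between the query's term set and the document's term set. It is used here only as an illustrative scoring function, not as the method this lecture settles on. Whitespace tokenization and the toy documents are assumptions.

```python
def jaccard(query, document):
    """Size of the term overlap divided by the size of the term union, in [0, 1]."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    if not q and not d:
        return 0.0  # avoid dividing by zero when both texts are empty
    return len(q & d) / len(q | d)

query = "ides of march"
docs = {
    "doc1": "caesar died in march",
    "doc2": "the long march",
}

# Rank-order the documents by descending score.
ranked = sorted(docs, key=lambda name: jaccard(query, docs[name]), reverse=True)
print(ranked)
```

Both documents share only the term "march" with the query, but doc2 is shorter, so its union with the query is smaller and its score is higher (1/5 against 1/6).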
Resources • Chapter 1 of IIR • Resources at http://ifnlp.org/ir • Boolean Retrieval