INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID

Learn about Boolean and Ranked Retrieval Models along with insights from top sources in information retrieval. Explore the features, challenges, and success stories in this lecture.

Presentation Transcript


  1. INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID • Lecture # 3 • Boolean Retrieval Model • Ranked Retrieval Model

  2. ACKNOWLEDGEMENTS The material in this lecture is taken from the following sources: • “Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze • “Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell • “Modern Information Retrieval” by Ricardo Baeza-Yates et al. • “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

  3. Outline • Boolean Retrieval Model • Information Retrieval Ingredients • Westlaw • Ranked retrieval models

  4. Boolean Retrieval Model

  5. Boolean Retrieval Model • D1 = “This is a pen”; D2 = “It is a pen” • Set of terms in (D1, D2) = {This, It, is, a, pen} • A set ignores order and duplicates: {a, b, c} = {b, a, c} • A bag (multiset) keeps duplicates: {a, a, b, c}
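A minimal sketch of the set and bag views of the two example documents, assuming plain whitespace tokenization with no case folding. The helper names `as_set` and `as_bag` are hypothetical; `collections.Counter` stands in for a bag (multiset).

```python
from collections import Counter

def as_set(text: str) -> set:
    # Set-of-terms view: word order and repeated occurrences are discarded.
    return set(text.split())

def as_bag(text: str) -> Counter:
    # Bag-of-words view: word order is discarded, but each term's count is kept.
    return Counter(text.split())

d1 = "This is a pen"
d2 = "It is a pen"

# Union of the two term sets (no case folding, so "This" and "It" stay distinct).
vocabulary = as_set(d1) | as_set(d2)
print(sorted(vocabulary))  # ['It', 'This', 'a', 'is', 'pen']

print(as_set("a b c") == as_set("b a c"))  # True: sets ignore order
print(as_bag("a a b c"))                   # Counter({'a': 2, 'b': 1, 'c': 1})
```

The Boolean model works with the set view. The bag view becomes relevant once term frequencies matter, as they do in many ranked models.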

  6. Boolean queries • The Boolean retrieval model can answer any query that is a Boolean expression. • Boolean queries use AND, OR and NOT to join query terms. • Views each document as a set of terms. • Is precise: a document either matches the condition or it does not. • Primary commercial retrieval tool for 3 decades • Many professional searchers (e.g., lawyers) still like Boolean queries. • You know exactly what you are getting. • Many search systems you use are also Boolean: Spotlight, email, intranet, etc.
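Because each document is treated as a set of terms, a Boolean query can be answered with set operations on per-term document sets (an inverted index). The sketch below assumes a hypothetical three-document toy collection and lowercase whitespace tokenization. AND maps to intersection, OR to union, and NOT to the complement against all document IDs.

```python
def build_index(docs: dict) -> dict:
    # Map each lowercased term to the set of IDs of documents that contain it.
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

# Hypothetical toy collection.
docs = {
    1: "Brutus killed Caesar",
    2: "Caesar praised Calpurnia",
    3: "Brutus and Calpurnia spoke",
}
index = build_index(docs)
all_ids = set(docs)

def postings(term: str) -> set:
    # Documents containing the term (empty set if the term never occurs).
    return index.get(term.lower(), set())

print(postings("Brutus") & postings("Caesar"))              # AND -> {1}
print(postings("Brutus") | postings("Calpurnia"))           # OR -> {1, 2, 3}
print(postings("Caesar") & (all_ids - postings("Brutus")))  # AND NOT -> {2}
```

Each result is exactly a set of matching documents with no ordering, which reflects the "precise: matches or not" property listed above.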

  7. Information Retrieval Ingredients

  8. Information Retrieval Ingredients • Document representation • Query formulation • Query processing

  9. Westlaw

  10. Commercially successful Boolean retrieval: Westlaw • Largest commercial legal search service in terms of the number of paying subscribers • Over half a million subscribers performing millions of searches a day over tens of terabytes of text data • The service was started in 1975. • In 2005, Boolean search (called “Terms and Connectors” by Westlaw) was still the default, and used by a large percentage of users . . . • . . . although ranked retrieval has been available since 1992.

  11. Westlaw: Example queries • Information need: Information on the legal theories involved in preventing the disclosure of trade secrets by employees formerly employed by a competing company • Query: “trade secret” /s disclos! /s prevent /s employe! • Information need: Requirements for disabled people to be able to access a workplace • Query: disab! /p access! /s work-site work-place (employment /3 place) • Information need: Cases about a host’s responsibility for drunk guests • Query: host! /p (responsib! liab!) /p (intoxicat! drunk!) /p guest
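In Westlaw's Terms and Connectors syntax, as described in Chapter 1 of IIR, `!` is a trailing wildcard. `/s` and `/p` require matches in the same sentence or the same paragraph, `/3` requires the terms to be within 3 words, and a space between terms means OR, not AND. The sketch below models only `!` and `/k`, using a positional scan over a regex-tokenized document. It ignores sentence and paragraph boundaries. The functions `matches`, `within`, and the sample sentence are hypothetical, and Westlaw's exact word-counting convention for `/k` may differ from the "at most k positions apart" rule assumed here.

```python
import re

def tokenize(text: str) -> list:
    # Lowercase and keep runs of letters, digits and hyphens ("work-place" stays one token).
    return re.findall(r"[a-z0-9-]+", text.lower())

def matches(pattern: str, token: str) -> bool:
    # A trailing "!" matches any word with that prefix (disclos! -> disclose, disclosure).
    if pattern.endswith("!"):
        return token.startswith(pattern[:-1])
    return token == pattern

def positions(pattern: str, tokens: list) -> list:
    # Word positions at which the pattern matches.
    return [i for i, tok in enumerate(tokens) if matches(pattern, tok)]

def within(a: str, b: str, k: int, text: str) -> bool:
    # "a /k b" (assumed semantics): some match of a and some match of b
    # are at most k word positions apart.
    tokens = tokenize(text)
    pa, pb = positions(a, tokens), positions(b, tokens)
    return any(abs(i - j) <= k for i in pa for j in pb)

doc = "The court held that the employer must make the place of employment accessible."
print(within("employment", "place", 3, doc))  # True
print(within("employ!", "court", 3, doc))     # False
```

A production engine would read positions from a positional index instead of re-tokenizing each document. The pairwise check is kept here only for readability.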

  12. Problem with Boolean search: feast or famine • Requires query writing skills • Boolean queries often result in either too few (= 0) or too many (1000s) results. • It takes a lot of skill to come up with a query that produces a manageable number of hits. • AND gives too few; OR gives too many
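The "AND gives too few; OR gives too many" effect can be seen on a tiny hypothetical collection. The five documents and the four-word query below are made up for illustration. The same query terms are combined once with AND and once with OR.

```python
import operator
from functools import reduce

# Hypothetical toy collection; document IDs are list positions.
docs = [
    "cheap flights to paris",
    "paris hotels and flights",
    "cheap hotels in rome",
    "rome flights",
    "weather in paris",
]

def matching_docs(terms: list, combine) -> set:
    # One set of matching document IDs per term, folded with AND or OR.
    per_term = [{i for i, d in enumerate(docs) if t in d.split()} for t in terms]
    return reduce(combine, per_term)

query = ["cheap", "flights", "paris", "hotels"]
print(len(matching_docs(query, operator.and_)))  # 0: famine
print(len(matching_docs(query, operator.or_)))   # 5: every document, feast
```

No single document contains all four terms, yet every document contains at least one of them. On a real collection the OR result can run into the thousands.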

  13. Ranked retrieval models

  14. Ranked retrieval models • Rather than a set of documents satisfying a query expression, in ranked retrieval, the system returns an ordering over the (top) documents in the collection for a query • Free text queries: Rather than a query language of operators and expressions, the user’s query is just one or more words in a human language • In principle, there are two separate choices here, but in practice, ranked retrieval has normally been associated with free text queries and vice versa

  15. Feast or famine: not a problem in ranked retrieval • When a system produces a ranked result set, large result sets are not an issue • Indeed, the size of the result set is not an issue • We just show the top k (k ≈ 10) results • We don’t overwhelm the user • Premise: the ranking algorithm works
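Showing only the top k results means the system never has to sort or display the whole result set. A minimal sketch, assuming scores have already been computed and stored in a dict. The helper `top_k` and the sample scores are hypothetical. `heapq.nlargest` keeps only k candidates while scanning, which is roughly O(n log k) instead of a full O(n log n) sort.

```python
import heapq

def top_k(scores: dict, k: int = 10) -> list:
    # Return the k (doc_id, score) pairs with the highest scores, best first.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
print(top_k(scores, k=2))  # [('d2', 0.9), ('d4', 0.7)]
```

If fewer than k documents have scores, all of them are returned in order. A large result set only costs a linear scan.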

  16. Scoring as the basis of ranked retrieval • We wish to return in order the documents most likely to be useful to the searcher • How can we rank-order the documents in the collection with respect to a query? • Assign a score – say in [0, 1] – to each document • This score measures how well document and query “match”.
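One simple way to get a query-document score in [0, 1] is the Jaccard coefficient |q ∩ d| / |q ∪ d| over term sets. It is used here only as an illustration and is not necessarily the scoring function this lecture goes on to use. It ignores term frequency and document length, which later ranked models take into account. The documents and query are hypothetical, and tokenization is lowercase whitespace splitting.

```python
def jaccard(query: str, doc: str) -> float:
    # |q ∩ d| / |q ∪ d| over term sets; always in [0, 1].
    q, d = set(query.lower().split()), set(doc.lower().split())
    if not q and not d:
        return 0.0
    return len(q & d) / len(q | d)

docs = {"d1": "ides of march", "d2": "caesar died in march", "d3": "the long march"}
query = "ides of march"

# Rank documents by descending score.
ranked = sorted(docs, key=lambda doc_id: jaccard(query, docs[doc_id]), reverse=True)
for doc_id in ranked:
    print(doc_id, round(jaccard(query, docs[doc_id]), 3))
# d1 1.0
# d3 0.2
# d2 0.167
```

d2 and d3 each share only "march" with the query, but d2 has more non-matching terms, so its union is larger and its score lower. This length sensitivity is one known weakness of Jaccard as a ranking score.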

  17. Resources • Chapter 1 of IIR • Resources at http://ifnlp.org/ir • Boolean Retrieval
