Information Retrieval

Acknowledgements: Dr Mounia Lalmas (QMW) and Dr Joemon Jose (Glasgow). Course text: Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: 0-201-39829-X.

Presentation Transcript


  1. Information Retrieval
     Acknowledgements: Dr Mounia Lalmas (QMW), Dr Joemon Jose (Glasgow)

  2. Course Text
     • Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: 0-201-39829-X

  3. Introduction
     • Example of an information need in the context of the World Wide Web:
       “Find all documents containing information on computer courses which: (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements, and e-mail and phone number for contact purpose.”
     • → Information Retrieval

  4. Information Retrieval
     • Representation, storage, organisation of, and access to information items
     • (Usually) keyword-based representation
     • [Diagram labels: Information Need; Query; Documents; Search Engine / Retrieval System; Set of retrieved documents; Useful or relevant information to the user]
     • Primary goal of an IR system: “Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”

  5. User tasks
     • Pull technology
       • User requests information in an interactive manner
       • 3 retrieval tasks
         • Browsing (hypertext)
         • Retrieval (classical IR systems)
         • Browsing and retrieval (modern digital libraries and web systems)
     • Push technology
       • Automatic and permanent pushing of information to the user
       • Software agents
       • Example: news service
       • Filtering (retrieval task): relevant information for later inspection by the user

  6. Documents
     • Unit of retrieval
     • A passage of free text
       • composed of text, strings of characters from an alphabet
       • composed of natural language
       • e.g., a newspaper article, a journal paper, a dictionary definition, email messages
     • Size of documents
       • arbitrary
       • newspaper article vs. journal paper vs. email

  7. What is a document?

  8. Representation of documents
     Document term descriptors to access texts
     • Set of index terms or keywords
       • extracted directly from the text
       • specified by human subjects (information science) → metadata
       • Most concise representation
       • Poor quality of retrieval
     • Full text representation
       • Most complete representation
       • High computational cost
     • Large collections
       • Reduce the set of representative keywords
         • Elimination of stop words
         • Stemming
         • Identification of noun phrases
       • Further compression
     • Structure representation
       • Chapter, section, sub-section, etc.
     • Generation of descriptors for text
       • By hand
       • By analysing the text
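
The keyword-reduction steps on this slide (stop-word elimination and stemming) can be sketched as follows. This is a minimal illustration, not the course's method: the stop-word list is a tiny hypothetical one, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real systems use longer ones.
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "on", "the", "to"}

# Suffixes tried in order by the naive stemmer (an assumption, not Porter's rules).
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lower-case, tokenise, drop stop words, stem, and return the set of terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {naive_stem(t) for t in tokens if t not in STOP_WORDS}


print(sorted(index_terms("Indexing the documents of a collection")))
```

Returning a set reflects the "set of index terms" view of a document: word order and term frequency are discarded, which is what makes the representation concise and also limits retrieval quality.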

  9. The retrieval process
     [Diagram labels: Information need; Formulation; Query; Documents; Indexing; Document representation; Retrieval functions; Retrieved documents; Relevance feedback]

  10. Queries
      User term descriptors characterising the user need
      • Information need
      • Simple queries
        • composed of two or three, perhaps even dozens, of keywords
        • e.g., as in web retrieval
      • Boolean queries
        • “neural networks AND speech recognition”
      • Context queries
        • Proximity search, phrase queries
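
As a rough sketch of the query types on this slide, the functions below evaluate a phrase query, a proximity query, and a Boolean AND of phrases by scanning each document's tokens. All names (`contains_phrase`, `within_proximity`, `boolean_and`) and the whitespace tokenisation are assumptions made for illustration; a real system would answer these queries from an index rather than by scanning text.

```python
def tokenize(text):
    """Split on whitespace after lower-casing (a simplifying assumption)."""
    return text.lower().split()


def contains_phrase(tokens, phrase):
    """Phrase query: the phrase's words must appear consecutively, in order."""
    words = tokenize(phrase)
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def within_proximity(tokens, a, b, window):
    """Proximity query: terms a and b occur at most `window` positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def boolean_and(docs, phrases):
    """Boolean AND: ids of documents that contain every phrase."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(contains_phrase(tokenize(text), p) for p in phrases)
    ]


docs = {
    "d1": "neural networks for speech recognition",
    "d2": "speech recognition with hidden markov models",
    "d3": "recognition of speech by neural networks",
}
print(boolean_and(docs, ["neural networks", "speech recognition"]))
print(within_proximity(tokenize(docs["d3"]), "speech", "recognition", 2))
```

Document d3 fails the phrase query for "speech recognition" because the words are not adjacent, yet it satisfies the proximity query, which shows how the two context queries differ.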

  11. Best-Match Retrieval
      • Compare the terms in a document and in the query
      • Compute the similarity between each document in the collection and the query, based on the terms they have in common
      • Sort the documents in order of decreasing similarity with the query
      • The output is a ranked list displayed to the user; the top documents are the most relevant as judged by the system
      [Diagram labels: Document term descriptors to access texts; User term descriptors characterising the user need]
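
The best-match steps above can be sketched in a few lines. The slide does not name a similarity measure; the Jaccard coefficient used here is an assumption chosen for brevity, and `rank` is a hypothetical helper name.

```python
def jaccard(a, b):
    """Similarity of two term sets: shared terms divided by all distinct terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query_terms, doc_terms):
    """Score every document against the query and sort by decreasing similarity.

    Ties are broken by document id so the ordering is deterministic.
    """
    scored = [(jaccard(query_terms, terms), doc_id)
              for doc_id, terms in doc_terms.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored]


doc_terms = {
    "d1": {"computer", "course", "university"},
    "d2": {"cooking", "course"},
    "d3": {"football"},
}
for doc_id, score in rank({"computer", "course"}, doc_terms):
    print(doc_id, round(score, 3))
```

Unlike a Boolean query, every document receives a score, so partially matching documents such as d2 still appear in the list, below better matches.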

  12. Conceptual View of Text Retrieval
      [Diagram labels: Queries; Documents; Similarity Computation; Retrieved Documents]

  13. Expanded view of text retrieval system
      [Diagram labels: Queries; Documents; Indexing; Indexed Documents; Similarity Computation; Retrieved Documents; Ranked Documents]

  14. Process of retrieving info
      [Diagram labels: User Interface; User need; User feedback; Text; Text Operations; Logical view; Query Operations; Query; Indexing; Inverted file; Index; Document Repository Manager; Text repository; Similarity Computation; Retrieved docs; Ranking; Ranked docs]
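
The "Inverted file" in this diagram is the index structure that maps each term to the documents containing it. A minimal sketch, assuming whitespace tokenisation and integer document ids (both are illustrative choices, and `build_inverted_file` is a hypothetical name):

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "information retrieval", 2: "information extraction"}
index = build_inverted_file(docs)
print(index["information"])
print(index.get("ranking", []))
```

Looking up a query term is then a single dictionary access instead of a scan over every document, which is why the index sits between the text repository and the similarity computation.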

  15. Key Topics
      • Indexing text documents
      • Retrieving text documents
      • Evaluation
      • Query reformulations
      Search Engines = IR + Link Structure + Name Interpretation

  16. Information Retrieval vs Information Extraction
      • Information Retrieval
        • Given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant ones [recall].
      • Information Extraction
        • Extract from the text what the document means.
      • IR systems can FIND documents but need not “understand” them
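
Precision and recall, mentioned in brackets above, have standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below assumes the relevant set is known in advance (as in an evaluation collection); the function name and the choice to return 0.0 for an empty set are illustrative conventions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 (a convention chosen for this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so retrieving more documents could raise recall while lowering precision.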