1 / 14

Information Retrieval and Web Search

What is this course about?. ProcessingIndexingRetrieving? textual dataFits in four lines, but much more complex and interesting than that. Need for IR. With the advance of WWW - more than 8 Billion documents indexed on Yahoo, GoogleVarious needs for information:Search for documents that f

clyde
Download Presentation

Information Retrieval and Web Search

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea

    2. What is this course about? Processing Indexing Retrieving … textual data Fits in four lines, but much more complex and interesting than that

    3. Need for IR With the advance of WWW - more than 8 Billion documents indexed on Yahoo, Google Various needs for information: Search for documents that fall in a given topic Search for a specific information Search an answer to a question Search for information in a different language … Search for images Search for music Search for a (candidate) friend

    4. Some definitions of Information Retrieval (IR)

    5. Examples of IR systems Conventional (library catalog) Search by keyword, title, author, etc. E.g. : You are probably familiar with www.library.unt.edu Text-based (Lexis-Nexis, Google, FAST). Search by keywords. Limited search using queries in natural language. Multimedia (QBIC, WebSeek, SaFe) Search by visual appearance (shapes, colors,… ). Question answering systems (AskJeeves, Answerbus) Search in (restricted) natural language Other: cross language information retrieval, music retrieval

    8. IR systems on the Web Search for Web pages http://www.google.com Search for images http://www.picsearch.com Search for image content http://wang14.ist.psu.edu/ Search for answers to questions http://www.askjeeves.com Music retrieval http://www.rotorbrain.com/foote/musicr/

    9. Course information Instructor: Rada Mihalcea Contact info: F228, rada@cs.unt.edu Teaching assistant: Veronica Pérez-Rosas Class meets TTh, 12:30-1:50pm Office hours Instructor: Th 3:00-5:00pm Teaching assistant: TBA Any time electronically

    10. Course resources Class webpage: http://www.cs.unt.edu/~rada/CSCE5200 check periodically for updates, announcements, etc. Textbook: Introduction to Information Retrieval Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze Recommended: Readings in Information Retrieval K.Sparck Jones and P. Willett Modern Information Retrieval Ricardo Baeza-Yates and Berthier Ribeiro-Neto

    11. Grading (tentative) Homeworks: 30% Start early! Some may be time consuming 3 days late policy Exam I: 20% Exam II: 20% Project: 25% Class participation: 5% No final – final is replaced by the project

    12. Programming language All assignments / project will be in Perl Makes life much much more easier for text processing problems and for Web based applications Information Retrieval involves a lot of text processing, and often involves Web access Code reusability Code must run on the CSP Linux machines.

    13. Tentative schedule Course Overview Introduction to IR models and methods Perl tutorial Text analysis / Web spidering Text properties Boolean model Vector-based model Probabilistic model; other IR models IR evaluation and IR test collections Relevance feedback, query expansion

    14. Tentative schedule Web search: link based and content based Query-based and content sensitive link analysis Search engine technologies Text classification and clustering Question Answering on offline and online collections Cross Language IR Personalized IR Web 2.0: wikis, blogs, etc. …. Exam I, Exam II, Project presentations

More Related