1. Information Retrieval and Web Search: Course overview
Instructor: Rada Mihalcea
2. What is this course about?
Processing
Indexing
Retrieving
… textual data
It fits in four lines, but it is much more complex and interesting than that.
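The three steps (processing, indexing, retrieving) can be illustrated with a toy example. This is a minimal sketch, not the course's method: it assumes a simple regex tokenizer for processing, an in-memory inverted index (term to set of document IDs) for indexing, and Boolean AND matching for retrieval. The function names `process`, `build_index`, and `retrieve` and the sample documents are hypothetical. The course assignments use Perl; Python is used here only for illustration.

```python
import re
from collections import defaultdict

def process(text):
    """Processing: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: build an inverted index mapping each term to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in process(text):
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieving: return the IDs of documents containing every query term (Boolean AND)."""
    terms = process(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample collection.
docs = {
    1: "Information retrieval deals with textual data.",
    2: "Web search engines index billions of documents.",
    3: "Retrieval of textual documents from the Web.",
}
index = build_index(docs)
print(sorted(retrieve(index, "textual retrieval")))  # prints [1, 3]
```

Real systems add steps such as stop-word removal, stemming, and ranked (rather than Boolean) retrieval; the sketch keeps only the bare pipeline.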
3. Need for IR
With the advance of the WWW, more than 8 billion documents are indexed by Yahoo and Google
Various needs for information:
Search for documents that fall in a given topic
Search for specific information
Search for an answer to a question
Search for information in a different language
…
Search for images
Search for music
Search for a (candidate) friend
4. Some definitions of Information Retrieval (IR)
5. Examples of IR systems
Conventional (library catalog): search by keyword, title, author, etc. E.g., you are probably familiar with www.library.unt.edu
Text-based (Lexis-Nexis, Google, FAST): search by keywords; limited search using queries in natural language
Multimedia (QBIC, WebSeek, SaFe): search by visual appearance (shapes, colors, …)
Question answering systems (AskJeeves, Answerbus): search in (restricted) natural language
Other: cross-language information retrieval, music retrieval
8. IR systems on the Web
Search for Web pages: http://www.google.com
Search for images: http://www.picsearch.com
Search for image content: http://wang14.ist.psu.edu/
Search for answers to questions: http://www.askjeeves.com
Music retrieval: http://www.rotorbrain.com/foote/musicr/
9. Course information
Instructor: Rada Mihalcea
Contact info: F228, rada@cs.unt.edu
Teaching assistant: Veronica Pérez-Rosas
Class meets TTh, 12:30-1:50pm
Office hours
Instructor: Th 3:00-5:00pm
Teaching assistant: TBA
Any time electronically
10. Course resources
Class webpage:
http://www.cs.unt.edu/~rada/CSCE5200
check periodically for updates, announcements, etc.
Textbook:
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
Recommended:
Readings in Information Retrieval
K. Sparck Jones and P. Willett
Modern Information Retrieval
Ricardo Baeza-Yates and Berthier Ribeiro-Neto
11. Grading (tentative)
Homework: 30%
Start early! Some assignments may be time-consuming
3-day late policy
Exam I: 20%
Exam II: 20%
Project: 25%
Class participation: 5%
No final exam: the final is replaced by the project
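Because the components add up to 100%, the final grade is a weighted sum: 0.30·Homework + 0.20·Exam I + 0.20·Exam II + 0.25·Project + 0.05·Participation. A minimal sketch, assuming every component is scored on a 0 to 100 scale (the slides do not say) and ignoring the late policy; the component keys and the sample scores are hypothetical.

```python
# Weights from the tentative grading scheme, expressed as fractions of the final grade.
WEIGHTS = {
    "homework": 0.30,
    "exam1": 0.20,
    "exam2": 0.20,
    "project": 0.25,
    "participation": 0.05,
}

def final_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, for illustration only.
example = {"homework": 90, "exam1": 80, "exam2": 85, "project": 88, "participation": 100}
print(round(final_grade(example), 2))  # prints 87.0
```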
12. Programming language
All assignments and the project will be in Perl
Makes life much easier for text processing problems and for Web-based applications
Information Retrieval involves a lot of text processing, and often involves Web access
Code reusability
Code must run on the CSP Linux machines.
13. Tentative schedule
Course Overview
Introduction to IR models and methods
Perl tutorial
Text analysis / Web spidering
Text properties
Boolean model
Vector-based model
Probabilistic model; other IR models
IR evaluation and IR test collections
Relevance feedback, query expansion
14. Tentative schedule
Web search: link-based and content-based
Query-based and content-sensitive link analysis
Search engine technologies
Text classification and clustering
Question Answering on offline and online collections
Cross-language IR
Personalized IR
Web 2.0: wikis, blogs, etc.
… Exam I, Exam II, Project presentations