Course Overview: An Introduction to Information Retrieval and Applications J. H. Wang Feb. 23, 2011
Instructor & TA • Instructor • J. H. Wang (王正豪) • Assistant Professor, CSIE, NTUT • Office: R1534, Technology Building • E-mail: jhwang@csie.ntut.edu.tw • Tel: ext. 4238 • Office Hours: 10:00-12:00, every Wednesday and Thursday • TA • Mr. Lin (林承翰): 2011.ir.ta@gmail.com • R1424, Technology Building NTUT CSIE
Course Description • Course Web Page • http://www.ntut.edu.tw/~jhwang/IR/ • Time: 13:10-16:00, Wed. • Classroom: R327, 6th Teaching Building • Textbook: • Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008. • Available online • International Student Edition, imported by Kai-Fa (開發) Publishing • Prerequisites: • Basic knowledge of data structures and algorithms, linear algebra, and probability theory • Programming experience is necessary for projects
Additional References • References: • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011. • This is the second edition of their 1999 book Modern Information Retrieval. (華通) • Stefan Buettcher, Charles L.A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. • Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. (全華)
More Books on IR • Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968. • Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983. • Two classics, but out of print. • C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979. • The classic. More than 30 years old, but still worth reading. • K. Sparck Jones, P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997. • A collection of classical IR papers. (out of print) • I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes, Morgan Kaufmann, 1999. • The authority on index construction and compression.
Grading Policy • Homework assignments and programming exercises: 40% • Mid-term exam: 25% • Term project (including the proposal): 35%
Programming Exercises and Term Project • At least two programming exercises • Team-based (at most 4 persons per team) • You can either write your own code or reuse existing open source code • Topics: (to be announced…) • The term project • Either team-based system development (the same as for the programming exercises) • Or an academic paper presentation • This option must be done on your own (only 1 person), NOT team-based • A proposal is required around midterm (Apr. 2011) • Introduction, methods, experiment designs
Online Submission • Submission instructions • Programs, project proposals, and project reports in electronic files must be submitted to the TA online at: • http://140.124.183.39/ir/ • Before submission: • User name: Your student ID • Please change your default password at your first login
What This Course Is NOT About • This course will NOT tell you • Tips and tricks for using search engines, although power users might have better ideas on how to improve them • There are plenty of books and websites on that… • How to find books in libraries, although it is somewhat related to the basic concepts of IR • How to make money on the Web, although the largest search engine today has done so
What’s Information Retrieval?
On Wikipedia
On GeoNet
On Google Maps
On Google News
On Blogs
Or More Related Keywords • South Island • Christchurch • Canterbury • Christchurch Cathedral • …
What if We Search in Chinese?
And More… • 南島 (South Island) • 第二大城 (second-largest city) • 基督城 (Christchurch) • 大教堂 (cathedral) • … • And other languages…
What Is Information Retrieval? • “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
Goal • Information retrieval (IR): a research field that aims to search for information in text and multimedia documents effectively and efficiently • In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR
A Big Picture
[Figure: block diagram of an IR system. Components: User Interface, Text Operations, Query Expansion, Indexing, Inverted Index, Retrieval, Ranking, Document Collection. Labeled flows: user need, text, doc representation (logical view), user feedback, query, inverted file, retrieved docs, ranked docs.]
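The indexing and retrieval stages named in the big picture can be illustrated with a minimal sketch. It assumes whitespace tokenization as a stand-in for real text operations, and the three toy documents and the function names (`tokenize`, `build_inverted_index`, `boolean_and`) are hypothetical, chosen only for illustration; they are not the course's own code.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real text operations)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, query):
    """Return the IDs of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical toy collection, reusing keywords from the earlier slides.
docs = {
    1: "Christchurch is a city on the South Island",
    2: "Christchurch Cathedral stands in Christchurch",
    3: "Canterbury is a region of the South Island",
}
index = build_inverted_index(docs)
print(boolean_and(index, "south island"))
print(boolean_and(index, "christchurch cathedral"))
```

A real system would add the ranking stage on top of this: Boolean retrieval only decides which documents match, not in what order to show them.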
Topics • Text IR • Indexing and Searching • Query Languages and Operations • Retrieval Evaluation • Modeling • Boolean model • Vector space model • Probabilistic model • Applications for IR • Multimedia IR • Web Search • Digital Libraries
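As a preview of the vector space model listed under Modeling, here is a minimal sketch that weights terms by tf-idf and ranks documents by cosine similarity to the query. The weighting variant (raw term frequency times log(N/df)), the function names, and the toy documents are assumptions made for illustration; the textbook discusses several alternative weighting schemes.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return indices of matching documents, best cosine score first."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q_vec = {term: tf * idf[term] for term, tf in q_tf.items() if term in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    ordered = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ordered if score > 0]

# Hypothetical toy collection.
docs = [
    "christchurch cathedral",
    "south island travel",
    "christchurch south island",
]
print(rank("south island", docs))
```

Unlike the Boolean model, this produces an ordering: a document about little besides the query terms scores higher than one that also contains a rare, unrelated term.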
Organization of the Textbook • Basics in IR (focus) • Inverted indexes for Boolean queries (Ch. 1-5) • Term weighting and vector space model (Ch. 6-7) • Evaluation in IR (Ch. 8) • Advanced Topics • Relevance feedback (Ch. 9) • XML retrieval (Ch. 10) • Probabilistic IR (Ch. 11) • Language models (Ch. 12) • Machine learning in IR • Text classification (Ch. 13-15) • Document clustering (Ch. 16-18) • Web Search • Web crawling and indexes (Ch. 19-20) • Link analysis (Ch. 21)
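Evaluation in IR commonly starts from two set-based measures, precision and recall. The sketch below computes both for a single query; the function name and the document IDs are hypothetical, and ranked-list measures are deliberately left out.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents actually relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures differ.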
Pointers to Other Topics • Cross-language IR • Image, video, and multimedia IR • Speech retrieval • Music retrieval • User interfaces • Parallel, distributed, and P2P IR • Digital libraries • Information science perspective • Logic-based approaches to IR • Natural language processing techniques
Tentative Schedule • Before midterm • Boolean retrieval (1 wk) • Indexing (2 wks) • Vector space model and evaluation (2 wks) • Relevance feedback (1 wk) • Probabilistic IR (2 wks) • After midterm • Text classification (1 wk) • Document clustering (1 wk) • Web search (2 wks) • Advanced topics: CLIR, IE, … (2 wks) • Term Project Presentation (3 wks)
Generic Resources • Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_retrieval • Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html
Academic Resources • Journals • ACM TOIS: Transactions on Information Systems • JASIST: Journal of the American Society for Information Science and Technology • IP&M: Information Processing and Management • Conferences • ACM SIGIR: International ACM SIGIR Conference on Research and Development in Information Retrieval • ACM CIKM: Conference on Information and Knowledge Management • JCDL: ACM/IEEE Joint Conference on Digital Libraries • TREC: Text REtrieval Conference
Thanks for Your Attention!