CSM06 Information Retrieval Lecture 1c – Module Overview Dr Andrew Salway a.salway@surrey.ac.uk
AIM • The module introduces a range of data structures and algorithms for information retrieval, as well as some of the underlying theory and approaches. The emphasis will be on information retrieval for the web, but organisation-wide archives and personal media collections will also be considered. • Students are encouraged to develop practical experience through case studies of current information retrieval systems and applications, and through the development and evaluation of part of an information retrieval system.
LEARNING OUTCOMES By the end of the module you should be able to: • Compare and contrast the theory, techniques and applicability of the Boolean Model of IR, the Vector Space Model of IR and the ranking techniques used by web search engines • Apply a range of techniques for information retrieval and justify the selection of techniques for a particular information retrieval application • Demonstrate an appreciation of the challenges currently facing the developers of information retrieval systems for the web, and the extent to which current web search engines address these challenges • Explain how visual data presents different challenges from text data, and compare and contrast approaches and systems for image and video retrieval • Discuss current research themes in the field of information retrieval, and research and evaluate advanced IR techniques
How the module runs… • Lectures: Tuesdays 10-1: reading the slides should not be a substitute for attending • Assessment: 60% unseen written examination; 40% one piece of coursework (set next week) • Coursework tutorials: some of the scheduled lecture time will be used to give feedback on coursework – extra coursework tutorials will be arranged if required
Outline of Module Content Text Information Retrieval Lecture 2 · Tokenization, stemming, stop lists → inverted index; STAIRS data structure · Boolean Model: underlying set theory · The challenges of synonymy and polysemy · Vector space model: texts and queries as vectors; cosine distance Lecture 3 Enhancing the vector space model: · Term Frequency – Inverse Document Frequency · Relevance Feedback · Latent Semantic Indexing · Generating clusters for query expansion
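As a rough preview of the Lecture 2 topics, the sketch below builds an inverted index from tokenized text, answers Boolean queries with set operations, and compares texts as term-count vectors using the cosine measure. It is a minimal illustration under stated assumptions, not the module's own implementation: the toy documents, the stop list and all function names are hypothetical, stemming is omitted, and raw term counts stand in for the TF-IDF weights covered in Lecture 3.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy collection and stop list, for illustration only.
DOCS = {
    1: "The cat sat on the mat",
    2: "The dog chased the cat",
    3: "Dogs and cats make good pets",
}
STOP_WORDS = {"the", "on", "and", "a", "of"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Boolean AND is the intersection of the terms' posting sets."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Boolean OR is the union of the terms' posting sets."""
    return set().union(*(index.get(t, set()) for t in terms))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


index = build_inverted_index(DOCS)
print(boolean_and(index, "cat", "dog"))
print(boolean_or(index, "cat", "dog"))

# Rank the documents against a free-text query by cosine similarity.
query = "cat dog"
ranking = sorted(DOCS, key=lambda d: cosine_similarity(query, DOCS[d]), reverse=True)
print(ranking)
```

Because there is no stemming here, document 3 ("Dogs and cats ...") matches neither "cat" nor "dog", which is one small illustration of why stemming appears in the Lecture 2 outline.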
Outline of Module Content Web Search Lecture 4 · Characteristics of information on the web · Ranking techniques for web search engines · Link analysis - PageRank algorithm Lecture 5 · Finding similar pages: companion and co-citation algorithms · Enhancing users’ queries: geographic queries; questions into queries - TRITUS Lecture 6 · Visualising the results set: InfoCrystal, TileBars, Vivisimo, KartOO · Further features of web search engines
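To give a flavour of the link analysis topic in Lecture 4, here is a minimal sketch of PageRank computed by simple iteration over a tiny link graph. The graph, the function name, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration; this is one common textbook formulation, not necessarily the variant presented in the lecture or used by any search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict of page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its damped rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: D links out but nothing links to D.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph page C, which receives links from three other pages, ends up with the highest score, while D, which has no incoming links, keeps only the baseline share.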
Outline of Module Content Visual Information Retrieval Lecture 7 · Kinds of image metadata · The challenge of the semantic gap and the sensory gap · Content-based Image Retrieval – QBIC, BlobWorld · Web image retrieval using collateral text – WebSEEK, Google, Munson, Yanai Lecture 8 · Video retrieval: Informedia, Google Video, Blinkx TV · Learning associations between visual features and words using web data
Outline of Module Content Lecture 9 · Group coursework presentations Lecture 10 · REVISION
How the module runs… “Resource based, independent learning” • Consider spending about 3 hours per week on reading and doing exercises, in addition to the coursework • Resources include: • Lectures and lecture slides, and some additional notes • Exercises • Set reading and further reading • Your lecturer (a.salway@surrey.ac.uk) • Module web-page: bookmark this and check it regularly. Lecture slides and other important information will normally be available in advance of the lecture. • Any problems? Let me know!
What is ‘Set Reading’? • Set Reading is considered essential for the module • You are expected to make your own notes from the set reading to give yourself a broader and deeper understanding of the material covered in lectures • Key sections / pages / ideas will be pointed out when the reading is set • All the Set Reading will be available either online or from the Library Article Collection service
What is ‘Further Reading’? • It is NOT essential to do Further Reading • The Further Reading is given if you want more details, and / or alternative explanations for the topics covered in the lecture • The alternative explanations may be useful to help you understand the lecture better • The extra detail may be appropriate if you are doing coursework in the area, or if you are just interested! • BUT, be careful not to overload yourself with ‘Further Reading’ – it is NOT essential for the module
Is there a set textbook? • There is NOT a textbook that you have to buy for this module. • Some reading will be set from the books listed on the next slide, but copies will be available in the library article collection.
Books Referred To During the Module… • Baeza-Yates and Ribeiro-Neto (1999), Modern Information Retrieval, Addison Wesley. • Baldi, Frasconi and Smyth (2003), Modeling the Internet and the Web, Wiley. • Belew (2000), Finding Out About, Cambridge University Press. • Weiss et al. (2005), Text Mining: predictive methods for analysing unstructured information, Springer. • Hock (2001), The Extreme Searcher’s Guide to Web Search Engines, 2nd Edition, CyberAge Books. • Del Bimbo (1999), Visual Information Retrieval, Morgan Kaufmann. • Kowalski and Maybury (2000), Information Storage and Retrieval Systems, Kluwer Academic Publishers. • Sparck Jones and Willett (1997), Readings in Information Retrieval, Morgan Kaufmann.