A comprehensive overview of the field of information retrieval, covering topics such as document representation, query analysis, retrieval models, user interface, evaluation, and machine learning. The course aims to provide students with the necessary knowledge and skills to conduct research in information retrieval or apply advanced techniques to real-world applications.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign
Course Goal
• Advanced (graduate-level) introduction to the field of information retrieval (IR)
• Goal
  • Provide an overview of IR research in the past several decades
  • Systematically review the core research topics in IR
  • Discuss the most recent research progress (customized toward the interests of students)
  • Give students enough training for doing research in IR or applying advanced IR techniques to applications
• More in-depth treatment of topics than CS410: less emphasis on practical skills, more on understanding of principles, models, and algorithms
IR Research Topics (Broad View)
[Figure: diagram of IR research topics. Labels: Users; Applications; Retrieval; Summarization; Visualization; Analytics; Filtering; Mining; Information Organization; Information Access; Text Mining; Search; Extraction; Categorization; Clustering; Natural Language Content Analysis; Text; Text Acquisition]
IR Topics (narrow view)
[Figure: search system diagram. Components: docs, INDEXING, Doc Rep, query, Query Rep, SEARCHING, Ranking, results, INTERFACE, User, judgments, Feedback, QUERY MODIFICATION, LEARNING. The numbered topics below annotate the diagram.]
1. Evaluation
2. Retrieval (Ranking) Models
3. Document representation/structure
4. Efficiency & scalability
5. Search result summarization/presentation
6. User interface (browsing)
7. Feedback/Learning
Our focus: 1, 2, 7
IR Topics covered & Related Topics
[Figure: the search system diagram from the narrow view, with the same numbered topics 1 to 7, plus related fields marked alongside: Parallel Prog., HCI, ML, NLP]
Our focus: 1, 2, 7
Core Knowledge that You Should Know
• IR Evaluation Methodology (Cranfield Lab Test)
  • Emphasis on realistic task modeling
  • Test set creation/sharing
  • Measures
  • Comparative analysis of components
  • Statistical significance test
• Retrieval Models
  • Vector-Space (retrieval heuristics)
  • Probabilistic (language models, statistical estimation)
  • Machine learning (basic idea)
• Topic models
• EM algorithm
Check out the midterm topics for details.
You’ll likely find these useful for your research in general.
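As a small illustration of the "Measures" and "Statistical significance test" items above, the sketch below computes average precision for a ranked list and runs a two-sided sign test on paired per-query scores from two systems. The function names and toy data are hypothetical, and the sign test is only one simple choice among the paired tests used in IR evaluation; it is not necessarily the one the course emphasizes.

```python
from math import comb


def average_precision(ranking, relevant):
    """Average precision of a ranked list of doc ids against a set of relevant ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, judgments):
    """MAP over queries; both arguments are dicts keyed by query id."""
    scores = [average_precision(rankings[q], judgments[q]) for q in judgments]
    return sum(scores) / len(scores)


def sign_test_p_value(scores_a, scores_b):
    """Two-sided sign test on paired per-query scores (ties are dropped)."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy example (made-up data): relevant documents appear at ranks 2 and 4.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"})
```

With only a handful of queries, even a system that wins on every query may not reach conventional significance levels under the sign test, which is one reason shared test collections use many queries.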
Be Familiar with Some Frontier Topics
• Document Representation and Content Analysis (e.g., text representation, document structure, linguistic analysis, non-English IR, cross-lingual IR, information extraction, sentiment analysis, clustering, classification, topic models, facets)
• Queries and Query Analysis (e.g., query representation, query intent, query log analysis, question answering, query suggestion, query reformulation)
• Users and Interactive IR (e.g., user models, user studies, user feedback, search interface, summarization, task models, personalized search)
• Retrieval Models and Ranking (e.g., IR theory, language models, probabilistic retrieval models, feature-based models, learning to rank, combining searches, diversity)
• Search Engine Architectures and Scalability (e.g., indexing, compression, MapReduce, distributed IR, P2P IR, mobile devices)
• Filtering and Recommending (e.g., content-based filtering, collaborative filtering, recommender systems, profiles)
• Evaluation (e.g., test collections, effectiveness measures, experimental design)
• Web IR and Social Media Search (e.g., link analysis, query logs, social tagging, social network analysis, advertising and search, blog search, forum search, CQA, adversarial IR, vertical and local search)
• IR and Structured Data (e.g., XML search, ranking in databases, desktop search, entity search)
• Multimedia IR (e.g., image search, video search, speech/audio search, music IR)
• Other Applications (e.g., digital libraries, enterprise search, genomics IR, legal IR, patent search, text reuse)
Learn more about these from the project presentations!
Beyond Information Retrieval: Take Other Related Courses
[Figure: Information Retrieval shown among related fields. Group labels: Applications, Models, Systems. Fields: Web, Bioinformatics…; Machine Learning; Pattern Recognition; Data Mining; Human-Computer Interaction; Library & Info Science; Statistics; Optimization; Computer Vision; Databases; Natural Language Processing; Software engineering; Computer systems; Algorithms]
Remaining Tasks for You
• Work on your projects: let me know ASAP if you need help
• Present your project 1:30-4:30pm on Friday, Dec. 12 (room 1304 SC)
• Submit your reports by Friday, Dec. 19