WIRED Week 3 • Syllabus Update (next week) • Readings Overview • Quick Review of Last Week’s IR Models (if time) • Evaluating IR Systems • Understanding Queries • Assignment Overview & Scheduling • Leading WIRED Topic Discussions • Web Information Retrieval System Evaluation & Presentation • Projects and/or Papers Discussion • Initial Ideas • Evaluation • Revise & Present
Evaluating IR Systems • Recall and Precision • Alternative Measures • Reference Collections • What • Why • Trends
Why Evaluate IR Systems? • Leave it to the developers? • No bugs • Fully functional • Let the market (users) decide? • Speed • (Perceived) accuracy • Relevance is relevant • Different types of searches, data and users • “How precise is the answer set?” (p. 73)
Retrieval Performance Evaluation • Task • Batch or Interactive • Each needs a specific interface • Setting • Context • New search • Monitoring • Usability • Lab tests • Real world (search log) analysis
Recall and Precision • Basic evaluation measures for IR system performance • Recall: the fraction of relevant documents that are retrieved • 100% is perfect recall • Every relevant document is found • Precision: the fraction of retrieved documents that are relevant • 100% is perfect precision • Every retrieved document is relevant
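To make the two definitions concrete, here is a minimal sketch (not from the slides) computing both measures from sets of document IDs; the IDs and counts are made up for illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for sets of document IDs."""
    hits = len(retrieved & relevant)              # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 3 of the 4 retrieved documents are relevant.
r, p = recall_precision(retrieved={1, 2, 3, 4}, relevant={2, 3, 4, 9, 10})
print(f"recall={r:.0%}, precision={p:.0%}")       # recall=60%, precision=75%
```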
Recall and Precision Goals • Everything relevant is found (recall) • Only relevant documents are pulled into the result set (precision) • What about ranking? • Ranking is a relative, not absolute, measure of relevance for the query • Rankings are ordinal in almost all cases
Recall and Precision Considered • 100 documents have been analyzed • 10 documents relevant to the query in the set • 4 documents are found and all are relevant • ??% recall, ??% precision • 8 documents are found, but 4 are relevant • ??% recall, ??% precision • Which is more important?
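The ??% values above are presumably left open for class discussion; for reference, applying the definitions to the slide's own numbers works out as follows (a quick check in Python):

```python
relevant_in_collection = 10                # the query has 10 relevant documents

# Case 1: 4 documents retrieved, all 4 relevant
print(4 / relevant_in_collection, 4 / 4)   # recall = 0.4 (40%), precision = 1.0 (100%)

# Case 2: 8 documents retrieved, 4 relevant
print(4 / relevant_in_collection, 4 / 8)   # recall = 0.4 (40%), precision = 0.5 (50%)
```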
Recall and Precision Appropriate? • Disagreements over what the perfect answer set is • User errors in using results • Redundancy of results • Result diversity • Metadata • Dynamic data • Is it indexable? • Recency of information may be key • Would a single, combined measure be better? • User evaluation
Back to the User • User evaluation • Is one answer good enough? Rankings • Satisficing • Studies of Relevance are key
Other Evaluation Measures • Harmonic Mean (F measure) • Single, combined measure • Between 0 (none) & 1 (all) • Only high when both P & R are high • Still reported as a percentage • E measure • A user-set parameter weights the relative value of R & P • Different tasks (legal, academic) weight them differently • An interactive search?
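A minimal sketch of both combined measures, assuming the standard formulas F = 2PR / (P + R) and E = 1 − (1 + b²)PR / (b²P + R); the parameter name b and the sample values are illustrative.

```python
def harmonic_mean_f(precision, recall):
    """Harmonic mean of precision and recall; high only when both are high."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision, recall, b=1.0):
    """E measure: b shifts the balance between P and R; b = 1 gives 1 - F."""
    denom = b * b * precision + recall
    if denom == 0:
        return 1.0
    return 1.0 - (1 + b * b) * precision * recall / denom

print(harmonic_mean_f(0.5, 0.4))    # 0.444...
print(e_measure(0.5, 0.4, b=1.0))   # 0.555... = 1 - F when b = 1
```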
Coverage and Novelty • System effects • Relative recall • Relative effort • More natural, user-understandable measures • User already knows some of the relevant documents • Coverage = % of those known relevant documents that are retrieved • Novelty = % of the relevant documents retrieved that the user didn’t know of • Content of document • Document itself • Author of document • Purpose of document
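A minimal sketch, assuming the usual textbook definitions: coverage is the fraction of the relevant documents already known to the user that the system retrieves, and novelty is the fraction of the relevant retrieved documents the user did not already know. All document IDs are hypothetical.

```python
def coverage_and_novelty(retrieved_relevant, known_relevant):
    """retrieved_relevant: relevant docs the system returned;
    known_relevant: relevant docs the user already knew about."""
    known_and_found = retrieved_relevant & known_relevant
    new_to_user = retrieved_relevant - known_relevant
    coverage = len(known_and_found) / len(known_relevant) if known_relevant else 0.0
    novelty = len(new_to_user) / len(retrieved_relevant) if retrieved_relevant else 0.0
    return coverage, novelty

# User knew 4 relevant documents; the system returned 5 relevant ones, 3 of them known.
c, n = coverage_and_novelty({1, 2, 3, 8, 9}, {1, 2, 3, 4})
print(f"coverage={c:.0%}, novelty={n:.0%}")   # coverage=75%, novelty=40%
```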
Reference Collections • Testbeds for IR evaluation • TREC (Text Retrieval Conference) set • Industry focus • Topic-based or General • Summary tables for tasks (queries) • R & P averages • Document analysis • Measures for each topic • CACM (Communications of the ACM; general CS) • ISI (Institute for Scientific Information; academic, indexed, industrial)
Trends in IR Evaluation • Personalization • Dynamic Data • Multimedia • User Modeling • Machine Learning (CPU/$)
Understanding Queries • Types of Queries: • Keyword • Context • Boolean • Natural Language • Pattern Matching • More like this… • Metadata • Structural Environments
Boolean • AND, OR, NOT • Used in combination or individually • Easy for the system to parse into a decision tree • Not so easy for the user to compose advanced queries • Hard to backtrack and see differences in results
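To show why Boolean queries are easy for the system, here is a minimal sketch (not from the slides) that evaluates AND, OR, and NOT as set operations over a tiny, made-up inverted index; a real engine would add a query parser on top.

```python
# Tiny made-up inverted index: term -> set of document IDs containing it.
index = {
    "web":        {1, 2, 4},
    "retrieval":  {2, 3, 4},
    "evaluation": {3, 4, 5},
}
all_docs = {1, 2, 3, 4, 5}

print(index["web"] & index["retrieval"])    # "web AND retrieval"  -> {2, 4}
print(index["web"] | index["evaluation"])   # "web OR evaluation"  -> {1, 2, 3, 4, 5}
print(all_docs - index["evaluation"])       # "NOT evaluation"     -> {1, 2}
```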
Keyword • Single word (most common) • Sets of words • “Phrases” • Context • “Phrases” • Near (a # value in characters, words, documents, links)
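A minimal sketch of a NEAR check, assuming a positional index that records word offsets; the window size and positions are illustrative.

```python
def near(positions_a, positions_b, k):
    """True if some occurrence of term A is within k words of some occurrence of term B."""
    return any(abs(pa - pb) <= k for pa in positions_a for pb in positions_b)

# Hypothetical word positions of two terms within one document.
print(near([3, 57], [60, 112], k=5))   # True: positions 57 and 60 are 3 words apart
print(near([3], [60], k=5))            # False
```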
Natural Language • Asking • Quoting • Fuzzy matches • Different evaluation methods might be needed • Dynamic data “indexing” problematic • Multimedia challenges
Pattern Matching • Words • Prefixes “comput*” • Suffixes “*ology” • Substrings “*exas*” • Ranges “four ?? years ago” • Regular Expressions (GREP) • Error threshold • User errors
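A minimal sketch of the pattern types using Python regular expressions; the slide's wildcard syntax (comput*, *ology, *exas*, four ?? years ago) is translated to regexes by hand and the word list is made up.

```python
import re

words = ["computer", "computing", "biology", "texas", "geology", "recompute"]

prefix    = re.compile(r"^comput")   # comput*  -> computer, computing
suffix    = re.compile(r"ology$")    # *ology   -> biology, geology
substring = re.compile(r"exas")      # *exas*   -> texas

print([w for w in words if prefix.search(w)])
print([w for w in words if suffix.search(w)])
print([w for w in words if substring.search(w)])

# "four ?? years ago": ?? as exactly two characters, like a shell-style wildcard
phrase = re.compile(r"^four .. years ago$")
print(bool(phrase.match("four 20 years ago")))   # True
```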
Query Protocols • HTTP • Z39.50 • Client–Server API • WAIS • Information/database connection • ODBC • JDBC • P2P
Assignment Overview & Scheduling • Leading WIRED Topic Discussions • # in class = # of weeks left? • Web Information Retrieval System Evaluation & Presentation • 5 page written evaluation of a Web IR System • technology overview (how it works) • a brief history of the development of this type of system (why it works better) • intended uses for the system (who, when, why) • (your) examples or case studies of the system in use and its overall effectiveness
How can (Web) IR be better? • Better IR models • Better User Interfaces • More to find vs. easier to find • Scriptable applications • New interfaces for applications • New datasets for applications • Projects and/or Papers Overview
Project Idea #1 – simple HTML • Graphical Google • What kind of document? • When was the document created?
Project Ideas • Google History: keeps track of what I’ve seen and not seen • Searching when it counts: Financial and Health information requires guided, quality search