WIRED Week 3 • Syllabus Update (next week) • Readings Overview • Quick Review of Last Week’s IR Models (if time) • Evaluating IR Systems • Understanding Queries • Assignment Overview & Scheduling • Leading WIRED Topic Discussions • Web Information Retrieval System Evaluation & Presentation • Projects and/or Papers Discussion • Initial Ideas • Evaluation • Revise & Present
Evaluating IR Systems • Recall and Precision • Alternative Measures • Reference Collections • What • Why • Trends
Why Evaluate IR Systems? • Leave it to the developers? • No bugs • Fully functional • Let the market (users) decide? • Speed • (Perceived) accuracy • Relevance is relevant • Different types of searches, data and users • “How precise is the answer set?” (p. 73)
Retrieval Performance Evaluation • Task • Batch or Interactive • Each needs a specific interface • Setting • Context • New search • Monitoring • Usability • Lab tests • Real world (search log) analysis
Recall and Precision • Basic evaluation measures for IR system performance • Recall: the fraction of relevant documents that are retrieved • 100% is perfect recall • Every relevant document is found • Precision: the fraction of retrieved documents that are relevant • 100% is perfect precision • Every retrieved document is relevant
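To make the two definitions concrete, here is a minimal sketch (not from the slides) computing both measures from sets of document IDs; the IDs and counts are made up for illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for sets of document IDs."""
    hits = len(retrieved & relevant)              # relevant documents that were retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 3 of the 4 retrieved documents are relevant.
r, p = recall_precision(retrieved={1, 2, 3, 4}, relevant={2, 3, 4, 9, 10})
print(f"recall={r:.0%}, precision={p:.0%}")       # recall=60%, precision=75%
```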
Recall and Precision Goals • Everything relevant is found (recall) • Only relevant documents are pulled into the result set (precision) • What about ranking? • Ranking is a relative, not absolute, measure of relevance for the query • Rankings are ordinal in almost all cases
Recall and Precision Considered • 100 documents have been analyzed • 10 documents relevant to the query in the set • 4 documents are found and all are relevant • ??% recall, ??% precision • 8 documents are found, but 4 are relevant • ??% recall, ??% precision • Which is more important?
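The ??% values above are presumably left open for class discussion; for reference, applying the definitions to the slide's own numbers works out as follows (a quick check in Python):

```python
relevant_in_collection = 10                # the query has 10 relevant documents

# Case 1: 4 documents retrieved, all 4 relevant
print(4 / relevant_in_collection, 4 / 4)   # recall = 0.4 (40%), precision = 1.0 (100%)

# Case 2: 8 documents retrieved, 4 relevant
print(4 / relevant_in_collection, 4 / 8)   # recall = 0.4 (40%), precision = 0.5 (50%)
```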
Recall and Precision Appropriate? • Disagreements over what the perfect answer set is • User errors in using results • Redundancy of results • Result diversity • Metadata • Dynamic data • Is it indexable? • Recency of information may be key • Would a single, combined measure be better? • User evaluation
Back to the User • User evaluation • Is one answer good enough? Rankings • Satisficing • Studies of Relevance are key
Other Evaluation Measures • Harmonic Mean (F measure) • Single, combined measure • Between 0 (none) & 1 (all) • Only high when both P & R are high • Still reported as a percentage • E measure • A user-set parameter weights the relative value of R & P • Different tasks (legal, academic) weight them differently • An interactive search?
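A minimal sketch of both combined measures, assuming the standard formulas F = 2PR / (P + R) and E = 1 − (1 + b²)PR / (b²P + R); the parameter name b and the sample values are illustrative.

```python
def harmonic_mean_f(precision, recall):
    """Harmonic mean of precision and recall; high only when both are high."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision, recall, b=1.0):
    """E measure: b shifts the balance between P and R; b = 1 gives 1 - F."""
    denom = b * b * precision + recall
    if denom == 0:
        return 1.0
    return 1.0 - (1 + b * b) * precision * recall / denom

print(harmonic_mean_f(0.5, 0.4))    # 0.444...
print(e_measure(0.5, 0.4, b=1.0))   # 0.555... = 1 - F when b = 1
```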
Coverage and Novelty • System effects • Relative recall • Relative effort • More natural, user-understandable measures • User already knows some of the relevant documents • Coverage = % of those known relevant documents that are retrieved • Novelty = % of the relevant documents retrieved that the user didn’t know of • Content of document • Document itself • Author of document • Purpose of document
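A minimal sketch, assuming the usual textbook definitions: coverage is the fraction of the relevant documents already known to the user that the system retrieves, and novelty is the fraction of the relevant retrieved documents the user did not already know. All document IDs are hypothetical.

```python
def coverage_and_novelty(retrieved_relevant, known_relevant):
    """retrieved_relevant: relevant docs the system returned;
    known_relevant: relevant docs the user already knew about."""
    known_and_found = retrieved_relevant & known_relevant
    new_to_user = retrieved_relevant - known_relevant
    coverage = len(known_and_found) / len(known_relevant) if known_relevant else 0.0
    novelty = len(new_to_user) / len(retrieved_relevant) if retrieved_relevant else 0.0
    return coverage, novelty

# User knew 4 relevant documents; the system returned 5 relevant ones, 3 of them known.
c, n = coverage_and_novelty({1, 2, 3, 8, 9}, {1, 2, 3, 4})
print(f"coverage={c:.0%}, novelty={n:.0%}")   # coverage=75%, novelty=40%
```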
Reference Collections • Testbeds for IR evaluation • TREC (Text Retrieval Conference) set • Industry focus • Topic-based or General • Summary tables for tasks (queries) • R & P averages • Document analysis • Measures for each topic • CACM (Communications of the ACM; general CS) • ISI (Institute for Scientific Information; academic, indexed, industrial)
Trends in IR Evaluation • Personalization • Dynamic Data • Multimedia • User Modeling • Machine Learning (CPU/$)
Understanding Queries • Types of Queries: • Keyword • Context • Boolean • Natural Language • Pattern Matching • More like this… • Metadata • Structural Environments
Boolean • AND, OR, NOT • Used in combination or individually • Easy for the system to parse into a decision tree • Not so easy for the user to compose advanced queries • Hard to backtrack and see differences in results
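To show why Boolean queries are easy for the system, here is a minimal sketch (not from the slides) that evaluates AND, OR, and NOT as set operations over a tiny, made-up inverted index; a real engine would add a query parser on top.

```python
# Tiny made-up inverted index: term -> set of document IDs containing it.
index = {
    "web":        {1, 2, 4},
    "retrieval":  {2, 3, 4},
    "evaluation": {3, 4, 5},
}
all_docs = {1, 2, 3, 4, 5}

print(index["web"] & index["retrieval"])    # "web AND retrieval"  -> {2, 4}
print(index["web"] | index["evaluation"])   # "web OR evaluation"  -> {1, 2, 3, 4, 5}
print(all_docs - index["evaluation"])       # "NOT evaluation"     -> {1, 2}
```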
Keyword • Single word (most common) • Sets of words • “Phrases” • Context • “Phrases” • Near (a # value in characters, words, documents, links)
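A minimal sketch of a NEAR check, assuming a positional index that records word offsets; the window size and positions are illustrative.

```python
def near(positions_a, positions_b, k):
    """True if some occurrence of term A is within k words of some occurrence of term B."""
    return any(abs(pa - pb) <= k for pa in positions_a for pb in positions_b)

# Hypothetical word positions of two terms within one document.
print(near([3, 57], [60, 112], k=5))   # True: positions 57 and 60 are 3 words apart
print(near([3], [60], k=5))            # False
```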
Natural Language • Asking • Quoting • Fuzzy matches • Different evaluation methods might be needed • Dynamic data “indexing” problematic • Multimedia challenges
Pattern Matching • Words • Prefixes “comput*” • Suffixes “*ology” • Substrings “*exas*” • Ranges “four ?? years ago” • Regular Expressions (GREP) • Error threshold • User errors
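A minimal sketch of the pattern types using Python regular expressions; the slide's wildcard syntax (comput*, *ology, *exas*, four ?? years ago) is translated to regexes by hand and the word list is made up.

```python
import re

words = ["computer", "computing", "biology", "texas", "geology", "recompute"]

prefix    = re.compile(r"^comput")   # comput*  -> computer, computing
suffix    = re.compile(r"ology$")    # *ology   -> biology, geology
substring = re.compile(r"exas")      # *exas*   -> texas

print([w for w in words if prefix.search(w)])
print([w for w in words if suffix.search(w)])
print([w for w in words if substring.search(w)])

# "four ?? years ago": ?? as exactly two characters, like a shell-style wildcard
phrase = re.compile(r"^four .. years ago$")
print(bool(phrase.match("four 20 years ago")))   # True
```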
Query Protocols • HTTP • Z39.50 • Client–Server API • WAIS • Information/database connection • ODBC • JDBC • P2P
Assignment Overview & Scheduling • Leading WIRED Topic Discussions • # in class = # of weeks left? • Web Information Retrieval System Evaluation & Presentation • 5 page written evaluation of a Web IR System • technology overview (how it works) • a brief history of the development of this type of system (why it works better) • intended uses for the system (who, when, why) • (your) examples or case studies of the system in use and its overall effectiveness
How can (Web) IR be better? • Better IR models • Better User Interfaces • More to find vs. easier to find • Scriptable applications • New interfaces for applications • New datasets for applications • Projects and/or Papers Overview
Project Idea #1 – simple HTML • Graphical Google • What kind of document? • When was the document created?
Project Ideas • Google History: keeps track of what I’ve seen and not seen • Searching when it counts: Financial and Health information requires guided, quality search