2ID10: Information Retrieval
Lecture 2: IR Evaluation & Queries
Lora Aroyo, 4 April 2006
Lecture 1 Summary
• Compare the information need with the information
• Generate a ranking which reflects relevance
(Diagram: Information Need → User Query → IR System → Ranked list of documents, with a feedback loop back to the user)
Lecture 1: Summary
• IR Classic Models
• Document Representation
• Query Representation
• Indexing
• Weighting & Similarity
• TF-IDF
Lecture 2: Overview
• Types of evaluation
• Relevance and test collections
• Effectiveness measures
• Recall and precision
• Significance tests
• Query languages
Types of IR Evaluation
• Assistance in formulating queries
• Speed of retrieval
• Resources required
• Presentation of documents
• Ability to find relevant documents
• Appeal to users (market evaluation)
• Evaluation is generally comparative: system A vs. B, or A vs. A'
• Cost-benefit analysis is possible
• Most common evaluation: retrieval effectiveness
IR Evaluation
• Functional analysis: test each system functionality (includes error analysis)
• Performance analysis: response time & space required (balance/tradeoffs); shorter response time and smaller space used → better system
• Performance evaluation: performance of indexing structures, OS interactions, delays
• Retrieval performance evaluation: how precise is the answer set; for a given retrieval strategy S, similarity between retrieved docs & expert-judged docs → goodness of S
IR Evaluation
• Effectiveness: the ability of an IR system to retrieve relevant documents and suppress non-relevant documents; related to the relevancy of retrieved items
• Relevancy: typically not binary
  • Subjective: depends upon a specific user's judgment
  • Situational: relates to the user's current needs
  • Cognitive: depends on human perception and behavior
  • Dynamic: changes over time
Relevancy
• Relevant (not relevant) according to User
• Relevant (not relevant) according to System
• Four main situations:
  • User – Relevant & System – Not Relevant
  • User – Not Relevant & System – Relevant
  • User – Not Relevant & System – Not Relevant
  • User – Relevant & System – Relevant
Relevancy Aspects
• Logical relevancy: e.g. "Bosch" (trademark) vs. "Den Bosch" (the Dutch city)
• Usability:
  • date and origin of the document
  • format of the document
  • other users
Test Collection
• Real collections: we never know the full set of relevant documents
• So we compare retrieval performance with a test collection:
  • a set of documents
  • a set of queries
  • a set of relevance judgments (which docs are relevant to each query)
Test Collections
• To compare the performance of two techniques:
  • each technique is used to evaluate the test queries
  • results (set or ranked list) are compared using some performance measure
  • the most common measures are precision and recall
• Usually use multiple measures, to get different views of performance
• Usually test with multiple collections, since performance is collection dependent
Sample Test Collection
Test Collection Creation
• Manual method:
  • every document is judged against every query by experts
• Pooling method (sketched below):
  • queries are run against several IR systems first
  • results are pooled, and the top proportion is chosen for judging
  • only the top documents are judged
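A minimal Python sketch of the pooling method; the run lists, system names, and pool depth are illustrative assumptions, not from the slides:

```python
def build_pool(runs, depth=100):
    """Pool the top-`depth` documents from each system's ranked run.

    runs  : dict mapping system name -> ranked list of doc ids
    depth : how many top-ranked docs per system enter the pool
    Returns the set of doc ids that human assessors will judge.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

# Illustrative runs from three hypothetical systems:
runs = {
    "sysA": ["d3", "d1", "d7", "d2"],
    "sysB": ["d1", "d5", "d3", "d9"],
    "sysC": ["d8", "d1", "d2", "d4"],
}
print(sorted(build_pool(runs, depth=2)))  # only these docs get judged
```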
Text REtrieval Conference (TREC)
• Established in 1992 to evaluate large-scale IR: retrieving documents from a gigabyte collection
• Run by NIST's Information Access Division
• Initially sponsored by DARPA as part of the TIPSTER program
• Now supported by many, including DARPA, ARDA, and NIST
• The best-known IR evaluation setting
• Proceedings available at http://trec.nist.gov
Text REtrieval Conference (TREC)
• Consists of IR research tracks:
  • ad-hoc retrieval, routing, cross-language, scanned documents, speech recognition, query, video, filtering, Spanish, question answering, novelty, Chinese, high precision, interactive, Web, database merging, NLP, …
• Each track works on roughly the same model:
  • NIST carries out the evaluation
  • you learn how well your site did and how others tackled the problem
  • successful approaches are generally adopted in the next cycle
Lecture 2: Overview
• Types of evaluation
• Relevance and test collections
• Effectiveness measures
• Recall and precision
• Significance tests
• Query languages
Precision & Recall
• The purpose of every IR system is to retrieve relevant information
• Precision: retrieved documents that are relevant; the ability of the search to retrieve top-ranked documents that are mostly relevant
• Recall: relevant documents that are retrieved; the ability of the search to find all of the relevant documents in the corpus
(Venn diagram: the whole document collection, with overlapping sets of relevant documents and retrieved documents)
Query Match
• Match = a retrieved document satisfying (relevant to) the information need: character strings in descriptor and query keywords match
• Miss = a not retrieved document satisfying (relevant to) the information need: character strings in descriptor and query keywords do not match, though they are semantically similar
• False match = a retrieved document which satisfies the query but is not relevant to the information need: character strings in descriptor and query keywords match but are semantically different
(Diagram: retrieved vs. not retrieved crossed with relevant vs. irrelevant)
Retrieval Evaluation Setting
• Q – query
• R – set of relevant documents; |R| – number of relevant documents
• A = S(Q) – answer set; |A| – number of documents in the answer set
• Ra = R ∩ A – relevant documents in the answer set; |Ra| – number of docs in R ∩ A
(Venn diagram: relevant docs |R| and answer set |A| overlapping in |Ra|)
Precision
• Fraction of the retrieved documents (A) which are relevant:
  Precision = |Ra| / |A|
• In contingency-table terms:
  Precision = (System: Yes & User: Yes) / [(System: Yes & User: Yes) + (System: Yes & User: No)]
• Equivalently: Precision = relevant documents retrieved / all documents retrieved
• High precision when there are relatively few false matches
• Can be determined exactly
Recall
• Fraction of the relevant documents (R) which are retrieved:
  Recall = |Ra| / |R|
• In contingency-table terms:
  Recall = (System: Yes & User: Yes) / [(System: Yes & User: Yes) + (System: No & User: Yes)]
• Equivalently: Recall = relevant documents retrieved / all relevant documents
• High recall when there are relatively few misses
• Cannot be determined exactly: requires knowledge of all relevant documents in the collection
(Both measures are sketched in code below)
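A minimal Python sketch of the two measures over sets of document ids; the example sets A and R are illustrative, not from the slides:

```python
def precision(retrieved, relevant):
    """|Ra| / |A|: fraction of retrieved docs that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(retrieved))

def recall(retrieved, relevant):
    """|Ra| / |R|: fraction of relevant docs that are retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

# Illustrative answer set A and relevant set R:
A = {"d1", "d2", "d3", "d4"}
R = {"d1", "d3", "d5", "d6", "d7", "d8"}
print(precision(A, R))  # 2/4 = 0.5
print(recall(A, R))     # 2/6 ~ 0.333
```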
Determining Recall Is Difficult
• The total number of relevant items is often not available; two workarounds:
  • sample across the database and perform relevance judgments on those items
  • apply different retrieval algorithms to the same database for the same query; the aggregate of relevant items is taken as the total relevant set
Trade-off between Recall & Precision
• We aim to obtain the highest value for both, but:
  • an IR system trying to increase the number of relevant docs retrieved will also retrieve increasing numbers of non-relevant docs
  • efforts to increase one measure tend to decrease the other
(Plot: precision vs. recall, both from 0 to 1; "the ideal" is the top-right corner; high precision/low recall returns relevant documents but misses many useful ones, while high recall/low precision returns most relevant docs but includes lots of junk)
Computing Recall/Precision Points
• For a given query, produce the ranked list of retrievals
• Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures
• Mark each document in the ranked list that is relevant
• Compute a recall/precision pair for each position in the ranked list that contains a relevant document
Computing Example
• Let the total number of relevant docs be 6; the relevant docs occur at ranks 1, 2, 4, 6, and 13 of the ranked list
• Check each new recall point:
  • R = 1/6 = 0.167; P = 1/1 = 1
  • R = 2/6 = 0.333; P = 2/2 = 1
  • R = 3/6 = 0.5; P = 3/4 = 0.75
  • R = 4/6 = 0.667; P = 4/6 = 0.667
  • R = 5/6 = 0.833; P = 5/13 = 0.38
• One relevant document is never retrieved, so we never reach 100% recall
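A short Python sketch that recomputes the recall/precision points above from a ranked list; the relevance flags encode ranks 1, 2, 4, 6, and 13, inferred from the slide's numbers:

```python
def recall_precision_points(ranked_relevance, total_relevant):
    """Yield a (recall, precision) pair at each rank holding a relevant doc.

    ranked_relevance : list of booleans, True if the doc at that rank is relevant
    total_relevant   : |R|, the total number of relevant docs in the collection
    """
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            yield hits / total_relevant, hits / rank

# Relevant docs at ranks 1, 2, 4, 6 and 13; |R| = 6 overall.
flags = [r in (1, 2, 4, 6, 13) for r in range(1, 14)]
for recall, precision in recall_precision_points(flags, total_relevant=6):
    print(f"R={recall:.3f}  P={precision:.3f}")
```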
Example
• http://www.googlewhack.com/ – find that elusive query (two words, no quote marks) with a single, solitary result!
• http://www.webology.ir/2005/v2n2/a12.html – a comparison of precision and recall in search engines
Low Recall & Solutions
• Words exist in several forms, e.g. limit, limits, limited, limitation
• Stemming to increase recall (a sketch follows below):
  • suffix removal allows word variants to match, e.g. word roots often precede modifiers
  • Boolean systems often allow manual truncation
  • stemming does automatic truncation
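A deliberately naive suffix-stripping sketch in Python to illustrate the idea; real systems use a proper stemmer such as Porter's algorithm, and the suffix list here is an illustrative assumption:

```python
SUFFIXES = ("ation", "ing", "ed", "s")  # illustrative, not a real stemmer's rules

def naive_stem(word):
    """Strip the first matching suffix so word variants share one root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# 'limits', 'limited', 'limitation' all reduce to the root 'limit':
for w in ["limit", "limits", "limited", "limitation"]:
    print(w, "->", naive_stem(w))
```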
Low Recall & Solutions
• Synonymy: many words with similar meanings
  • Synonym(w1, w2) ≡ ∃m [w1 Means m ∧ w2 Means m]
  • Recall increased by:
    • thesaurus-based query expansion (sketched below)
    • latent semantic indexing
• Polysemy: one word has dissimilar meanings
  • PolySem(w) ≡ ∃m1 ∃m2 [w Means m1 ∧ w Means m2 ∧ m1 ≠ m2]
  • Precision increased by word sense disambiguation:
    • indexing word meanings rather than words
    • context provides clues to word meaning
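A minimal sketch of thesaurus-based query expansion; the tiny synonym table is an illustrative assumption standing in for a real thesaurus such as WordNet:

```python
# Illustrative hand-made thesaurus; a real system would use WordNet or similar.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}

def expand_query(terms):
    """Add synonyms of each query term to raise recall."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(THESAURUS.get(term, []))
    return expanded

print(expand_query(["fast", "car"]))
# ['fast', 'quick', 'rapid', 'car', 'automobile', 'vehicle']
```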
Query Languages (QL)
• Determine which queries can be formulated; dependent on the underlying IR model
• Use content (semantics) and content structure (text syntax) to find relevant documents
• Query enhancement techniques: e.g. synonyms, thesauri, stemming, etc.
• Query: a formulation of the user's info need; words or combinations of words & operations
Keyword-based Querying
• Keywords: contained in documents
• Retrieval unit: the retrieved document, which contains the answer to the query
• Intuitive, easy to express, and allows for fast ranking
• Basic queries: single & multiple words
Single-word Queries
• Text documents are searched for the keywords
• The set of docs is ranked according to its degree of similarity to the query
• Ranking is based on word occurrences inside the text:
  • term frequency counts the number of times a word appears inside a document (see the sketch below)
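A minimal sketch of single-word ranking by raw term frequency; the toy documents are illustrative, and a real system would use TF-IDF weights as in Lecture 1:

```python
def rank_by_tf(docs, keyword):
    """Rank documents by how often `keyword` occurs in them.

    docs : dict mapping doc id -> text
    Returns (doc_id, term_frequency) pairs, most frequent first.
    """
    scores = {
        doc_id: text.lower().split().count(keyword.lower())
        for doc_id, text in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "d1": "retrieval of information and retrieval of documents",
    "d2": "information need and information retrieval",
    "d3": "query languages",
}
print(rank_by_tf(docs, "retrieval"))  # d1 first (tf=2), then d2 (tf=1), then d3 (tf=0)
```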
Context Queries
• Complement single-word queries with a search for "context": words which appear near other words
• Phrase context query: a sequence of single-word queries
• Proximity context query: a more relaxed version of the phrase query
  • a sequence of single-word queries with a max allowed distance between them
  • distance measured in characters or words
Examples: Context Queries
• Phrase query: 'information retrieval'
• Proximity queries matching it, with distance in words:
  • 'information about retrieval' – distance 1
  • 'information with respect to the retrieval' – distance 4
• Ranking is similar to single-word queries (see the proximity sketch below)
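A small Python sketch of proximity matching in words; the distance convention (number of words allowed between the two terms) follows the slide's examples, and everything else is an illustrative assumption:

```python
def proximity_match(text, w1, w2, max_gap):
    """True if w1 occurs before w2 with at most `max_gap` words between them."""
    words = text.lower().split()
    positions1 = [i for i, w in enumerate(words) if w == w1]
    positions2 = [i for i, w in enumerate(words) if w == w2]
    return any(0 <= p2 - p1 - 1 <= max_gap
               for p1 in positions1 for p2 in positions2)

print(proximity_match("information about retrieval",
                      "information", "retrieval", 1))  # True  (gap 1)
print(proximity_match("information with respect to the retrieval",
                      "information", "retrieval", 1))  # False (gap 4)
print(proximity_match("information with respect to the retrieval",
                      "information", "retrieval", 4))  # True  (gap 4)
```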
Boolean Queries
• Oldest form of keyword query: words + operators
• Atoms (basic queries) + Boolean operators: A OR B, A AND B, A NOT B
• A query is represented as a syntax tree, e.g. for (white OR paper) AND chocolate: an AND at the root, with an OR node over the leaves 'white' and 'paper', and the leaf 'chocolate'
Boolean Query Mechanics
• Basic query: Find X → return all documents containing term X
  • X = single words or phrases
  • simple text or string matching
• Complex query: Boolean connectives AND, OR, NOT
Boolean IR
• Boolean operators approximate natural language, e.g. find documents about colour printers that are not made by Hewlett-Packard
• AND can denote relationships between concepts, e.g. colour AND printer
• OR can denote alternate terminology, e.g. colour AND (printer OR laser-printer)
• NOT can exclude alternate meanings, e.g. colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP)
(A set-based evaluation sketch follows below)
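A minimal sketch of Boolean evaluation over an inverted index, using Python set operations; the tiny index is an illustrative assumption:

```python
# Illustrative inverted index: term -> set of doc ids containing it.
INDEX = {
    "colour":        {"d1", "d2", "d3", "d4"},
    "printer":       {"d1", "d3", "d5"},
    "laser-printer": {"d2"},
    "hp":            {"d3"},
}

def docs(term):
    """Posting set for a term; empty set if the term is unindexed."""
    return INDEX.get(term, set())

# colour AND (printer OR laser-printer) NOT hp
result = (docs("colour") & (docs("printer") | docs("laser-printer"))) - docs("hp")
print(sorted(result))  # ['d1', 'd2']
```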
Google Search
• Google basic search: http://www.google.com/help/basics.html
• Google advanced search: http://www.google.com/help/refinesearch.html
Natural Language Queries
• An enumeration of words & context queries
• All docs matching a portion of the query are retrieved
• Higher ranking is given to docs matching more parts of the query
• Negation: the user specifies words to be eliminated → lower ranking
• A threshold cuts off docs ranked too low
• Boolean queries are a simplified version of NL queries
• Vector of term weights (doc & query)
Natural Language Queries
Pattern Matching
• A more specific form of query formulation, based on the concept of a pattern:
  • a set of syntactic features that occur in a text segment
  • segments that fulfil the pattern specifications are a pattern match
• Retrieves pieces of text that have some property; useful for linguistics, text statistics, data extraction
• Pattern types: words, prefixes, suffixes, substrings, ranges, errors, regular expressions, extended patterns
Examples: Pattern Matching
• Words: a string, i.e. a sequence of chars
• Prefixes: 'program' matches programmer
• Suffixes: 'er' matches computer, monster, poster
• Substrings: 'tal' matches coastal, talk, metallic; 'any flow' will match inside 'many flowers'
• Ranges: a pair of strings which matches any word lying between them in lexicographical order, e.g. the range between held and hold retrieves strings such as hoax, hissing, helm, help
(All four are sketched below)
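A small Python sketch of these four pattern types; the word list is illustrative:

```python
words = ["programmer", "computer", "coastal", "metallic",
         "hoax", "helm", "help", "hold"]

prefix_hits    = [w for w in words if w.startswith("program")]  # prefix 'program'
suffix_hits    = [w for w in words if w.endswith("er")]         # suffix 'er'
substring_hits = [w for w in words if "tal" in w]               # substring 'tal'
range_hits     = [w for w in words if "held" <= w <= "hold"]    # lexicographic range

print(prefix_hits)     # ['programmer']
print(suffix_hits)     # ['programmer', 'computer']
print(substring_hits)  # ['coastal', 'metallic']
print(range_hits)      # ['hoax', 'helm', 'help', 'hold']
```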
Examples: Pattern Matching
• Allowing errors:
  • a word together with an error threshold retrieves all text words similar to the given word
  • errors are caused by typing, spelling, etc.
  • the most widely accepted model is the Levenshtein distance, or edit distance
Examples: Pattern Matching
• Regular expression: a general pattern built up from simple strings & the operators union (|), concatenation, and repetition (*)
  • e.g. 'pro(blem|tein)(s|ε)(0|1|2)*' will match words like problem02 and proteins (see the sketch below)
• Extended patterns: a subset of the regular expressions, with
  • conditional expressions (part of the pattern may not always appear)
  • wild characters matching any sequence in the text
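The same pattern written for Python's re module; ε, the empty string, becomes the optional 's?':

```python
import re

# pro(blem|tein)(s|e)(0|1|2)* from the slide, as a Python regex:
pattern = re.compile(r"^pro(blem|tein)s?(0|1|2)*$")

for word in ["problem02", "proteins", "protein1", "prose"]:
    print(word, bool(pattern.match(word)))
# problem02 True, proteins True, protein1 True, prose False
```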
Example: Edit Distance
• The distance between COLOR and COLOUR is 1; between SURVEY and SURGERY it is 2
• The query must specify the maximum number of allowed errors for a word to match the pattern
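A standard dynamic-programming implementation of Levenshtein distance in Python, reproducing the two distances above:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("COLOR", "COLOUR"))    # 1
print(levenshtein("SURVEY", "SURGERY"))  # 2
```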
Structural Queries
• Based on the structure of the text
• Structure in text is usually very restrictive
• Languages exist to represent structured documents (e.g. HTML)
• Three structures: fixed (form-like), hypertext, hierarchical
• Current query languages integrate both content and structural queries
Fixed Structure
• Docs have a fixed set of fields
• Some fields may be absent from some docs
• No nesting or overlap between fields is allowed
• Each model refers to a concrete structure of a collection
Hypertext
• Maximum freedom with respect to structuring power
• A directed graph where the nodes hold some text and the links represent connections between nodes or positions outside of nodes
• The user manually traverses the hypertext, following links, to search
• http://xanadu.com/zigzag/
Hierarchical Structure
• An intermediate model, between fixed structure and hypertext
• Recursive decomposition of text, typical of many text collections
• The simplification from hypertext to a hierarchy allows for faster algorithms to solve queries
• In general, the more powerful the model, the less efficiently it can be implemented
• Example: retrieve a figure on a page whose title contains "car" and whose introduction contains "blue", querying the page/title/introduction/figure structure (see the sketch below)
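A small sketch of such a hierarchical query over an XML-like document tree, using Python's xml.etree.ElementTree; the document and tag names are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Illustrative hierarchical collection: pages with title, introduction, figure.
doc = ET.fromstring("""
<collection>
  <page>
    <title>car</title>
    <introduction>a blue model</introduction>
    <figure>fig-1</figure>
  </page>
  <page>
    <title>boat</title>
    <introduction>a blue hull</introduction>
    <figure>fig-2</figure>
  </page>
</collection>
""")

# Retrieve figures on pages whose title contains 'car'
# and whose introduction contains 'blue'.
for page in doc.iter("page"):
    if ("car" in page.findtext("title", "")
            and "blue" in page.findtext("introduction", "")):
        print(page.findtext("figure"))  # fig-1
```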