Precision & Recall
• Relevancy: the likelihood that a search result (i.e. a “hit”) meets the user’s expected information need; a relevant result fully or partially satisfies the need or answers the question at hand
• Precision: the proportion of retrieved documents (or of a pre-defined subset of them) that are relevant, as opposed to irrelevant
• Recall: the proportion of all relevant documents (or of a pre-defined subset of them) that are successfully retrieved
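As a rough illustration of the two definitions, the sketch below treats the retrieved and relevant documents as sets and computes both ratios. The document IDs are invented for the example, and returning 0.0 for an empty set is an assumption made here, not part of the definitions.

```python
def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumption: define precision as 0 when nothing is retrieved
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Share of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumption: define recall as 0 when nothing is relevant
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search: 4 documents come back, 5 documents are actually relevant.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5", "doc6", "doc7"}

print(precision(retrieved, relevant))  # 2 of the 4 retrieved are relevant
print(recall(retrieved, relevant))     # 2 of the 5 relevant were retrieved
```

Both measures share the same numerator (the relevant documents that were retrieved); only the denominator differs.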
Precision & Recall
[Figure labels: Retrieved; Relevant Documents]
High Precision, Low Recall
[Figure labels: Retrieved; Relevant Documents; Retrieved but not relevant]
Low Precision, High Recall
[Figure labels: Relevant but not retrieved; Retrieved; Relevant Documents]
Precision & Recall
• There is a trade-off between precision and recall: raising precision tends to lower recall, and raising recall tends to lower precision
• Both rely on a good query
• To get the best recall and precision, the user must have:
  • An understanding of the optimal search syntax accepted by the system
  • An understanding of what types of content are stored in the system
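One way to see the trade-off is to take a ranked result list and vary how many results are kept. The sketch below uses an invented ranking and an invented set of relevant documents; it only illustrates the tendency and is not a claim about any particular search system.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top k ranked results are kept."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / len(top_k), hits / len(relevant)


# Hypothetical ranked results (best first) and hypothetical relevant set.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d2", "d5", "d8"}

for k in (2, 5, 8):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

In this toy data, keeping more results raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.50, which mirrors a broad query versus a narrow one.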
Indexing
• Manual: done by professional indexers with subject knowledge; intellectually labor-intensive; expensive
  • Indexers have experience using thesauri
  • Indexers understand the formatting styles to be extracted from and conformed to
• Machine: done with computer algorithms; can process much more data but is prone to mistakes, which cause loss of recall/precision on the user’s end; little or no labor; relatively inexpensive
Search Types
• Thesaurus
  • Controlled vocabulary, uniform headings, etc.
• Fields
  • Author, Title, Publication Year, etc.
• Full-Text
  • Natural Language, Keyword (Google)
• Cited Reference
• Related (Similar)
  • Find documents related to or similar to target
• A comprehensive search requires all of these
Specialized Search Types
• Image matching
  • Find documents with similar or matching images
• Chemical Structure
  • Find documents with similar or matching chemical structures
• LaTeX
  • Scientific & mathematical formulas
• Video & Audio
Search Modifiers & Operators
• Boolean (AND, OR and NOT)
  • Combinations of search terms
  • rifle AND pistol; rifle OR pistol; rifle NOT pistol
  • Correct syntax is important
• Nesting
  • Combining terms in one search field with terms in another search field
  • SU=(Biology OR Ecology) AND AU=White
Boolean Concepts
[Figure: each circle represents a different subject concept]
• AND: finds citations containing both concepts
• OR: finds citations containing at least one of the concepts
• NOT: finds citations which contain A but not B
Stimulate 4, October 2004, VUB Brussel
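The three Boolean operators correspond to set intersection, union and difference. The sketch below builds a tiny word-to-document index over four invented sample texts and runs the rifle/pistol searches against it. The index layout and the `build_index` helper are illustrative assumptions, not how any particular database works internally.

```python
def build_index(docs):
    """Map each lowercase word to the set of document IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical mini-collection.
docs = {
    1: "the rifle was found near the pistol",
    2: "a rifle for target shooting",
    3: "pistol maintenance guide",
    4: "fishing rods and reels",
}

index = build_index(docs)
rifle = index.get("rifle", set())
pistol = index.get("pistol", set())

rifle_and_pistol = rifle & pistol  # AND: documents containing both terms
rifle_or_pistol = rifle | pistol   # OR: documents containing at least one term
rifle_not_pistol = rifle - pistol  # NOT: documents with rifle but without pistol

print(sorted(rifle_and_pistol), sorted(rifle_or_pistol), sorted(rifle_not_pistol))
```

AND narrows the result set (favoring precision), while OR widens it (favoring recall).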
Search Modifiers & Operators
• Exact Phrase
  • Usually quotation marks; “pride and prejudice”
• Proximity
  • Search for terms adjacent to each other or within a specified proximity of each other; order may or may not matter
  • Christmas ADJ Eve
  • Adam NEAR\7 Eve
  • Meteorite SAME Earth
  • Emergency BEFORE\2 responder
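Proximity operators can be pictured as a check on word positions. The sketch below is a minimal illustration that assumes distance is counted in whole words and that ADJ and BEFORE require the first term to come first. Real databases define these operators differently, so the `within` helper and its parameters are hypothetical.

```python
import re


def within(text, first, second, max_distance, ordered=False):
    """True if the two terms occur at most max_distance words apart.

    With ordered=True, `first` must appear before `second`.
    """
    tokens = re.findall(r"\w+", text.lower())
    first_pos = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_pos = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in first_pos:
        for j in second_pos:
            gap = j - i
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False


# Christmas ADJ Eve: adjacent, in this order (assumed meaning of ADJ).
adjacent = within("It was Christmas Eve", "christmas", "eve", 1, ordered=True)
# Adam NEAR\7 Eve: within seven words, either order (assumed meaning of NEAR).
near = within("Eve met Adam", "adam", "eve", 7)
# Emergency BEFORE\2 responder: within two words, in this order.
before = within("emergency medical responder", "emergency", "responder", 2, ordered=True)

print(adjacent, near, before)
```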
Search Modifiers & Operators
• Truncation (wildcard)
  • Search for variant forms of a word or stem
  • Some databases offer auto-stemming, which is similar to, but not the same as, truncation
Search Modifiers & Operators
• Truncation: right, left and center; not all databases offer all types
  • Right: environment?, hydrolog?
  • Left: ?phobia
  • Center: Wom?n, Organi?ation, Col?r
• Some databases allow for mixed truncation
  • ?librar?
• Some have special symbols to represent
  • Exactly one character
  • One or more characters
  • Zero or more characters
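Truncation can be approximated by translating the search pattern into a regular expression. The sketch below assumes that `?` stands for zero or more word characters, which is only one of the conventions listed above. The helper names are hypothetical, and a real database may treat the symbol differently.

```python
import re


def truncation_to_regex(pattern):
    """Turn a truncation pattern into a regex, reading '?' as zero or more word characters."""
    literal_parts = [re.escape(part) for part in pattern.split("?")]
    return re.compile(r"\w*".join(literal_parts), re.IGNORECASE)


def matches(pattern, word):
    """True if the whole word fits the truncation pattern."""
    return truncation_to_regex(pattern).fullmatch(word) is not None


print(matches("environment?", "environmental"))  # right truncation
print(matches("?phobia", "arachnophobia"))       # left truncation
print(matches("Col?r", "colour"))                # center truncation
print(matches("?librar?", "interlibrary"))       # mixed truncation
```

Under this reading a pattern such as Col?r also matches unintended words like "collar", which shows how truncation can raise recall at the cost of precision.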
Search Logs
• The practice of recording search strategies, search strings, keywords used, fields searched, search modifiers, etc.
• Many modern databases have built-in features to help you keep track