Controlled Vocabularies in Searching • Tamas Doszkocs, Ph.D., Computer Scientist, doszkocs@nlm.nih.gov
Controlled Vocabularies • Definition • Purpose and Role • A Brief History • Who is in Control? • Spell Checkers • Folksonomies • Tagging • Search Focus • Search Refinement • Web X.Y
Definition and Purpose • A controlled vocabulary is a list of terms that have been enumerated explicitly. • In library and information science, a controlled vocabulary is a carefully selected list of words and phrases that are used to tag units of information so that they can be retrieved more easily by a search. The terms are chosen and organized by trained professionals (including librarians and information scientists) who have expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is about, even if the terms themselves do not occur in the document's text. Fully developed controlled vocabulary systems, such as the Library of Congress Subject Headings, are often published in a reference work called a thesaurus. Controlled vocabularies form part of a larger universe of nomenclatural approaches to data classification called metadata. (Wikipedia)
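The definition above can be illustrated with a minimal sketch. It assumes a toy collection in which each document carries tags drawn from an explicitly enumerated term list, plus an entry vocabulary that maps a searcher's wording to the preferred term. The document texts, the mapping, and all function names are hypothetical and chosen only for illustration; the two headings are used as sample preferred terms, not as a representation of any real thesaurus.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: the explicitly enumerated preferred terms.
PREFERRED_TERMS = {"Myocardial Infarction", "Neoplasms"}

# Hypothetical entry vocabulary: maps user wording to one preferred term.
ENTRY_VOCABULARY = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "cancer": "Neoplasms",
    "tumor": "Neoplasms",
    "neoplasms": "Neoplasms",
}

# Toy documents; the tags stand in for terms assigned by a trained indexer.
DOCUMENTS = {
    "doc1": {"text": "Outcomes after acute coronary occlusion in older adults.",
             "tags": {"Myocardial Infarction"}},
    "doc2": {"text": "Cancer screening rates in rural clinics.",
             "tags": {"Neoplasms"}},
    "doc3": {"text": "Heart attack symptoms differ between patients.",
             "tags": {"Myocardial Infarction"}},
}


def build_index(documents):
    """Build a term -> document-id index, rejecting tags outside the vocabulary."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for tag in doc["tags"]:
            if tag not in PREFERRED_TERMS:
                raise ValueError(f"{tag!r} is not in the controlled vocabulary")
            index[tag].add(doc_id)
    return index


def controlled_search(query, index):
    """Map the query to a preferred term, then look up the tagged documents."""
    preferred = ENTRY_VOCABULARY.get(query.strip().lower())
    if preferred is None:
        return set()
    return set(index.get(preferred, set()))


def free_text_search(query, documents):
    """Plain substring match against the document text, for comparison."""
    needle = query.strip().lower()
    return {doc_id for doc_id, doc in documents.items()
            if needle in doc["text"].lower()}


if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    print(sorted(controlled_search("heart attack", index)))
    print(sorted(free_text_search("heart attack", DOCUMENTS)))
```

In this sketch the controlled search for "heart attack" also finds doc1, whose text never uses that phrase, while the free-text search finds only the document that contains the words literally.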
More Information • Bridging the gap between languages used by authors, search systems and users: • http://sky.fit.qut.edu.au/~middletm/cont_voc.html • http://www.controlledvocabulary.com/ • http://php.iupui.edu/~kcmcreyn/su03/control.html • http://www.hsl.creighton.edu/hsl/Searching/c-vocab1.html • http://www.dlese.org/Metadata/vocabularies/term_expln.htm
A Brief History • The 1970s and 1980s: bloody battles and casualties • Controlled vocabularies vs. natural language • Command languages vs. free-form queries • CVs vs. abstracts vs. full text • Librarians vs. end users • The 1990s and the Web: natural language for the masses • The 21st century: the best of both worlds
Vocabulary Control for Information Retrieval, 1972 • by F. Wilfrid Lancaster • Contents: Why Vocabulary Control?; Pre-coordinate & Post-coordinate Systems; Vocabulary Structure & Display; Gathering the Raw Material; Standards & Guidelines; Organization of Terms: The Hierarchical Relationship; Organization of Terms: The Associative Relationship; Terms: Form & Compounding; The Entry Vocabulary; Homography & Scope Notes; Thesaurus Display; Vocabulary Growth Updating; The Role of the Computer; Identifiers & Checklists; The Influences of Vocabulary on the Performance of a Retrieval System; Evaluation of Thesauri; Natural-language Searching & the Post-controlled Vocabulary; Hybrid Systems; Compatibility & Convertibility; Multilingual Aspects; Automatic Approaches to Thesaurus Construction; Some Cost-effectiveness Aspects of Vocabulary Control; Bibliography; Index. • "The publisher's announcement claims that the original edition is an information science classic that has emerged as the 'bible' of indexing & retrieval vocabularies, & (is the) first definitive monograph devoted exclusively to controlled vocabularies in information retrieval. ..."
An Associative Interactive Dictionary for Online Searching, 1978 • Title: AID, an Associative Interactive Dictionary for Online Searching. • Author: Doszkocs, Tamas E. • Descriptors: Dictionaries; Information Retrieval; Online Systems; Search Strategies; Tables (Data); Word Frequency • Source: On-Line Review, v2 n2 p163-73, Jun 1978 • AID meta-searched MEDLINE, TOXLINE and the Hepatitis Databank and displayed result clusters of keywords and MeSH headings
CITE, 1979 • Doszkocs T. E., Rapp B. A. Searching Medline in English: A prototype user interface with natural language query, ranked output and relevance feedback. Proc. ASIS Annu. Meet., Vol. 16, pp. 131-137, 1979. • Automatic suggestion of Medical Subject Headings • Used as NLM’s OPAC 1979-1984
WebLine, 1994 • The first Web interface to an online retrieval system • Associative Concept Navigation in MEDLINE and other NLM Databases via a Mosaic - Forms - WWW Interface Combining Natural Language Processing, Expert Systems and (un)Conventional Information Retrieval Techniques; Tamas E. Doszkocs, Seth B. Widoff, Bruno M. Vasta, National Library of Medicine; in Proceedings of the Second World Wide Web Conference, Chicago, 1994 • http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/doszkocs/doszkocs.html • see also WebCrawler (Brian Pinkerton) • The Open Web and the Hidden Web
Jerry’s Guide to the Web, 1994 • Jerry Yang and David Filo’s Yahoo! 1995 • a directory of web sites, organized in a hierarchy of subject descriptors • Librarians at Yahoo • Surfing is to Yahoo! what the Dewey Decimal System is to libraries. In other words, Surfing is the categorization of websites. It also happens to be how Yahoo! began. Today our Surfing team continues its passion for finding, evaluating, and organizing information on the Internet. They have a voracious appetite for learning about new topics. They are curious individuals who are skilled at intuitively and efficiently analyzing and classifying diverse, unstructured pieces of information across the Yahoo! network. Surfers are critical to the relevance and intuitive nature of information presented on Yahoo!. http://careers.yahoo.com/job_descriptions.html
Clustering and Search Refinement with Natural Language and Controlled Vocabularies