Organizing Information: Vocabulary Control
From last week: index syntax • Pre/post-coordinate syntax: what should constitute a terminological unit? Uniterms vs. compound terms (e.g. Alcohol Studies Database) • In pre-coordinate syntax, a class representation is a combination of multiple elemental concepts, made at indexing time (e.g. Heart diseases – diagnostic methods – infants) • In post-coordinate syntax, the basic unit of representation is the term (elemental concept), and terms are combined by the searcher at retrieval time
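To make the pre/post-coordinate contrast concrete, here is a minimal Python sketch. The document IDs, headings, and helper names (`build_inverted_index`, `post_coordinate_search`, `pre_coordinate_search`) are hypothetical; the post-coordinate side is modeled as a plain inverted index searched with Boolean AND, which is one common way to implement it, not the only one.

```python
# Minimal sketch: pre- vs. post-coordinate indexing (hypothetical data).

# Pre-coordinate: concepts are combined into one heading at indexing time.
PRE_COORDINATE = {
    "doc1": ["Heart diseases -- diagnostic methods -- infants"],
    "doc2": ["Heart diseases -- therapy -- adults"],
}

# Post-coordinate: each document gets separate elemental terms,
# which the searcher combines at retrieval time.
POST_COORDINATE = {
    "doc1": {"heart diseases", "diagnostic methods", "infants"},
    "doc2": {"heart diseases", "therapy", "adults"},
}


def build_inverted_index(doc_terms):
    """Map each elemental term to the set of documents indexed with it."""
    index = {}
    for doc, terms in doc_terms.items():
        for term in terms:
            index.setdefault(term, set()).add(doc)
    return index


def post_coordinate_search(index, *terms):
    """Boolean AND: intersect the postings of every query term."""
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


def pre_coordinate_search(doc_headings, heading):
    """Exact match on a complete pre-coordinated heading string."""
    return {doc for doc, headings in doc_headings.items() if heading in headings}


INDEX = build_inverted_index(POST_COORDINATE)
print(sorted(post_coordinate_search(INDEX, "heart diseases", "infants")))  # ['doc1']
print(sorted(post_coordinate_search(INDEX, "heart diseases")))  # ['doc1', 'doc2']
```

The post-coordinate index can answer any combination of its elemental terms, whereas the pre-coordinate search finds doc1 only when the searcher reproduces the whole heading.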
Balance between human and computer indexing • Balance between keyword (free-text) and subject (controlled-vocabulary) search • Human indexing on the Web (Yahoo! directory, INFOMINE)
Vocabulary control: why bother? • The complexity of natural language • Homographs: impact on retrieval? bow (the front part of a ship); bow (to bend); bow (a decorative knot); CELLS (BIOLOGY), CELLS (ELECTRIC) *parenthetical qualifiers distinguish the homographs • Synonyms: impact on retrieval? MERCY KILLING/EUTHANASIA; SUPPLEMENTARY EARNING/PERKS; POSTAL SERVICE/MAIL SERVICE; ALIENS/FOREIGNERS
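The retrieval impact can be shown with a toy collection: a homograph such as "cells" lowers precision in free-text search (unrelated documents match), while a synonym pair such as MERCY KILLING/EUTHANASIA lowers recall (relevant documents are missed). The sketch below assumes each document carries both its free text and descriptors assigned by an indexer; the documents and function names are invented for illustration.

```python
# Toy collection (hypothetical): free text plus descriptors assigned by an indexer.
DOCS = {
    "d1": {"text": "stem cells in tissue repair", "terms": {"CELLS (BIOLOGY)"}},
    "d2": {"text": "solar cells and battery cells", "terms": {"CELLS (ELECTRIC)"}},
    "d3": {"text": "the debate over mercy killing", "terms": {"EUTHANASIA"}},
    "d4": {"text": "euthanasia laws in europe", "terms": {"EUTHANASIA"}},
}


def keyword_search(word):
    """Free-text search: the word must appear verbatim in the text."""
    return {d for d, rec in DOCS.items() if word.lower() in rec["text"].split()}


def subject_search(descriptor):
    """Controlled-vocabulary search on the assigned descriptors."""
    return {d for d, rec in DOCS.items() if descriptor in rec["terms"]}


# Homograph: free text cannot tell biological cells from electric cells.
print(sorted(keyword_search("cells")))            # ['d1', 'd2']
print(sorted(subject_search("CELLS (BIOLOGY)")))  # ['d1']

# Synonym: free text misses the document that says "mercy killing".
print(sorted(keyword_search("euthanasia")))       # ['d4']
print(sorted(subject_search("EUTHANASIA")))       # ['d3', 'd4']
```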
Ambiguity of reference of proper names • Changing names: Jacqueline Bouvier, Jacqueline Kennedy, Jacqueline Onassis; ASIS, later ASIST (American Society for Information Science and Technology) • Variants: Jackie Kennedy, Jacqueline Bouvier Kennedy, Mrs. Kennedy, JBK, … • Pseudonyms, aliases, …: Twain, Mark / Clemens, Samuel
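An authority file resolves this by linking every variant to one authorized heading. A minimal sketch, assuming a hand-built table: the authorized forms shown are illustrative and are not copied from actual Library of Congress authority records, and the function names are hypothetical.

```python
# Hypothetical authority records: authorized heading -> variant forms.
AUTHORITY = {
    "Onassis, Jacqueline Kennedy": [
        "Jacqueline Bouvier", "Jacqueline Kennedy", "Jacqueline Onassis",
        "Jackie Kennedy", "Jacqueline Bouvier Kennedy", "Mrs. Kennedy", "JBK",
    ],
    "Twain, Mark": ["Clemens, Samuel", "Samuel Clemens", "Mark Twain"],
}


def normalize(name):
    """Case-fold and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.casefold().split())


# Invert the records once: every form (heading or variant) -> authorized heading.
LOOKUP = {
    normalize(form): heading
    for heading, variants in AUTHORITY.items()
    for form in [heading, *variants]
}


def authorized_form(name):
    """Return the authorized heading for a name, or None if it is not on file."""
    return LOOKUP.get(normalize(name))


print(authorized_form("jackie  kennedy"))  # Onassis, Jacqueline Kennedy
print(authorized_form("Clemens, Samuel"))  # Twain, Mark
```

A real authority file would also need qualifiers such as dates, because a variant like "Mrs. Kennedy" can refer to more than one person.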
Ambiguity of acronyms • IRA: Irish Republican Army; IRA: individual retirement account • CIA: Central Intelligence Agency; CIA: Culinary Institute of America • ASIST (see Acronym Finder)
Degree of vocabulary control • Strings, terms, concepts • Between strings and terms (morphological): digital “archive” = (archive or archived or archiving or archiver or archives or archivers) • Between terms and a concept (semantic): ARCHIVE Use for Collection; See also LIBRARY *(~ search)
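Morphological control at the string level is often left to truncation in the search system. The sketch below (the function name and word list are hypothetical) shows why the truncation point matters: `archive*` misses *archiving*, while `archiv*` catches all the variants listed above but also *archival*, which may or may not be wanted.

```python
import re


def truncation_search(pattern, words):
    """Right truncation: 'archiv*' matches any word that starts with 'archiv'.
    A pattern without a trailing '*' must match the whole word."""
    if pattern.endswith("*"):
        regex = re.compile(re.escape(pattern[:-1]) + r"\w*", re.IGNORECASE)
    else:
        regex = re.compile(re.escape(pattern), re.IGNORECASE)
    return [w for w in words if regex.fullmatch(w)]


WORDS = ["archive", "archived", "archiving", "archiver",
         "archives", "archivers", "archival", "architect"]

print(truncation_search("archive*", WORDS))
# ['archive', 'archived', 'archiver', 'archives', 'archivers']
print(truncation_search("archiv*", WORDS))
# ['archive', 'archived', 'archiving', 'archiver', 'archives', 'archivers', 'archival']
```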
Controlled vocabulary • A subset of the vocabulary of a natural language from which certain types of words and syntactic forms have been excluded by rules. • Controlled vocabularies are generally recorded in subject heading lists or thesauri.
Tools for vocabulary control • Authority file: Library of Congress Authorities (heading search) • MeSH scope notes (SN) restrict and clarify the meaning of otherwise ambiguous terms, e.g. INCOME SN Income of an individual, organization, or person; otherwise use NATIONAL INCOME • Thesaurus: e.g. ERIC Thesaurus • Subject heading list: e.g. MeSH, LCSH (searchable database)
Thesaurus Construction 1. Term admission (preferred terms, sources of terms) 2. Level of pre-coordination (compound terms and elemental descriptors) 3. Structure and relationships 4. Categorize/display terms
Preferred and non-preferred terms • Preferred term (descriptor): A term in an indexing language chosen as the preferred or authorized representation of a concept conveyed by the text of a document or a feature of it. • Entry (lead-in) term: a non-preferred term that directs the user to the descriptor, e.g. Sex Bias USE SEXISM
Warrant: criteria for term admission (sources of terms) • Literary warrant: the vocabulary of a subject language should be empirically derived from the literature it is intended to describe • Use warrant: accommodate users’ vocabulary as lead-in or descriptor vocabulary • Structural warrant: terms introduced for collocation and navigation purposes, e.g. AERIAL SPORTS NT GLIDING; PARACHUTING; PARAGLIDING; SKY DIVING
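Literary warrant can be approximated by counting how many documents in a sample of the literature use each candidate term; running the same count over a log of user queries gives a rough measure of use warrant. This is a minimal sketch: the corpus, the candidate list, the `warrant` function, and the `min_docs=2` threshold are all arbitrary assumptions, and structural-warrant terms such as AERIAL SPORTS would be added by the thesaurus builder regardless of counts.

```python
import re


def warrant(candidates, texts, min_docs=2):
    """Return candidates that occur (whole-word, case-insensitive) in at least
    `min_docs` of the given texts, in the order the candidates were given."""
    admitted = []
    for term in candidates:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = sum(1 for text in texts if pattern.search(text))
        if hits >= min_docs:
            admitted.append(term)
    return admitted


# Hypothetical sample of titles from the literature being indexed.
CORPUS = [
    "Paragliding accidents in the Alps",
    "Training for sky diving and paragliding",
    "Gliding clubs and sky diving safety",
    "A history of parachuting",
]
CANDIDATES = ["paragliding", "sky diving", "gliding", "parachuting", "hang gliding"]

print(warrant(CANDIDATES, CORPUS))  # ['paragliding', 'sky diving']
```

The whole-word match keeps "gliding" from being counted inside "paragliding", so it appears in only one document here.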
Compound terms and factored terms • “it is a general rule that terms in a thesaurus should represent simple or unitary concepts as far as possible, and compound terms should be factored (i.e. split) into simple elements, except when this is likely to affect the users’ understanding” (ISO 2788; ISO is the International Organization for Standardization) • e.g. “workload of dentists in Scotland” is factored into WORKLOAD + DENTISTS + SCOTLAND
Structure of compound terms • The focus: the genus term that identifies the broader class of things or events to which the term as a whole refers, e.g. PHILOSOPHY in PHILOSOPHY OF EDUCATION • The difference: the modifier or species term, which refers to a characteristic or logical difference, e.g. EDUCATION in PHILOSOPHY OF EDUCATION
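For English compound terms, a rough heuristic separates focus and difference: in an "X OF Y" phrase the focus precedes "of", and in an adjectival or noun-modifier phrase it is the final word. The sketch below encodes only that heuristic (the function name is hypothetical); it is not a rule taken from ISO 2788 and will misparse many real terms.

```python
def focus_and_difference(term):
    """Heuristically split an English compound term into (focus, difference).

    'X OF Y'   -> focus X, difference Y     (e.g. PHILOSOPHY OF EDUCATION)
    'MOD NOUN' -> focus NOUN, difference MOD (e.g. COTTON SPINNING)
    A one-word term is all focus, with an empty difference.
    """
    words = term.split()
    upper = [w.upper() for w in words]
    if "OF" in upper[1:-1]:  # 'OF' with at least one word on each side
        i = upper.index("OF", 1)
        return " ".join(words[:i]), " ".join(words[i + 1:])
    return words[-1], " ".join(words[:-1])


print(focus_and_difference("PHILOSOPHY OF EDUCATION"))  # ('PHILOSOPHY', 'EDUCATION')
print(focus_and_difference("COTTON SPINNING"))          # ('SPINNING', 'COTTON')
```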
Factoring of compound terms • Semantic factoring: “a term which expresses a complex notion is re-expressed in the form of simpler or definitional elements” e.g. CARDIAC FAILURE → HEART + OUTPUT + BELOW + NORMAL *Leads to loss of precision and goes against common usage; not recommended by ISO 2788
Factoring of compound terms (Cont.) • Syntactical factoring: applied to compound terms which are “amenable to morphological analysis into separate components, each of which can be accepted as an indexing term in its own right” e.g. PROTOTYPE FUZZY QUERY PROCESSORS → PROTOTYPE + FUZZY QUERY + PROCESSORS; COTTON SPINNING → COTTON + SPINNING; HOUSING MANAGEMENT → HOUSING + MANAGEMENT
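Syntactical factoring can be sketched as a left-to-right, longest-match split of the compound against the list of already admitted descriptors, so that FUZZY QUERY stays together when it is itself a descriptor. The vocabulary, the `factor` function, and the greedy strategy are assumptions for illustration. A greedy split can fail on inputs that a backtracking search would factor, and it returns `None` when some part is not an admitted term (as with CARDIAC FAILURE, which only semantic factoring would break down).

```python
def factor(compound, vocabulary):
    """Split a compound term into admitted descriptors, longest match first.

    Returns the list of components, or None if some part of the compound
    cannot be matched to any admitted descriptor.
    """
    words = compound.upper().split()
    admitted = {v.upper() for v in vocabulary}
    parts, i = [], 0
    while i < len(words):
        # Try the longest remaining span first, then shorter ones.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in admitted:
                parts.append(candidate)
                i = j
                break
        else:  # no span starting at word i is an admitted descriptor
            return None
    return parts


# Hypothetical list of admitted descriptors.
VOCABULARY = {"PROTOTYPE", "FUZZY QUERY", "PROCESSORS",
              "COTTON", "SPINNING", "HOUSING", "MANAGEMENT"}

print(factor("PROTOTYPE FUZZY QUERY PROCESSORS", VOCABULARY))
# ['PROTOTYPE', 'FUZZY QUERY', 'PROCESSORS']
print(factor("cotton spinning", VOCABULARY))  # ['COTTON', 'SPINNING']
print(factor("CARDIAC FAILURE", VOCABULARY))  # None
```

If the full compound has itself been admitted as a descriptor, the longest-match rule returns it whole rather than splitting it.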
Structure and relationships • Basic thesaural relationships • Equivalence: COGNITIVE PROCESS UF Thinking Process; Thinking Process USE COGNITIVE PROCESS • Hierarchical: COGNITIVE PROCESS NT Abstract Reasoning, NT Memory; MEMORY BT Cognitive Process • Associative: COGNITIVE PROCESS RT COGNITIVE STYLE, RT COGNITIVE TEST; COGNITIVE TEST RT COGNITIVE STYLE
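Each of these relationships has a reciprocal (UF/USE, BT/NT, RT/RT), and thesaurus management software commonly records the reciprocal automatically. A minimal sketch of that bookkeeping, using the COGNITIVE PROCESS examples; the `Thesaurus` class and its method names are hypothetical.

```python
from collections import defaultdict

# Each relationship tag and the tag recorded on the other term.
RECIPROCAL = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Stores relationships and always records the reciprocal entry."""

    def __init__(self):
        # term -> tag -> set of related terms
        self._rels = defaultdict(lambda: defaultdict(set))

    def add(self, term, tag, other):
        """Record 'term TAG other' plus 'other RECIPROCAL[TAG] term'."""
        self._rels[term][tag].add(other)
        self._rels[other][RECIPROCAL[tag]].add(term)

    def related(self, term, tag):
        """Sorted list of terms linked to `term` by `tag` (empty if none)."""
        return sorted(self._rels.get(term, {}).get(tag, set()))


thesaurus = Thesaurus()
thesaurus.add("COGNITIVE PROCESS", "UF", "Thinking process")
thesaurus.add("COGNITIVE PROCESS", "NT", "ABSTRACT REASONING")
thesaurus.add("COGNITIVE PROCESS", "NT", "MEMORY")
thesaurus.add("COGNITIVE PROCESS", "RT", "COGNITIVE STYLE")
thesaurus.add("COGNITIVE PROCESS", "RT", "COGNITIVE TEST")
thesaurus.add("COGNITIVE TEST", "RT", "COGNITIVE STYLE")

print(thesaurus.related("Thinking process", "USE"))  # ['COGNITIVE PROCESS']
print(thesaurus.related("MEMORY", "BT"))             # ['COGNITIVE PROCESS']
print(thesaurus.related("COGNITIVE TEST", "RT"))     # ['COGNITIVE PROCESS', 'COGNITIVE STYLE']
```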
Equivalence relationships 1. Synonyms: “terms whose meanings can be regarded as the same in a wide range of contexts.” DISADVANTAGED UF UNDERPRIVILEGED 2. Lexical variants: different word forms for the same expression (e.g. spelling variants, grammatical variation, irregular plurals, direct versus indirect word order, and abbreviated forms): FIBER OPTICS UF FIBRE OPTICS; VOCABULARY CONTROL UF Control of Vocabulary 3. Quasi-synonyms: RACIAL SEGREGATION UF APARTHEID
Equivalence relationships (cont.) 4. Upward posting: narrower terms are treated as if they were equivalent to, rather than species of, their broader terms: SOCIAL CLASS UF Elite; Middle class; Working class, and Elite USE SOCIAL CLASS 5. Factored and unfactored forms of compound terms: Cotton spinning USE COTTON + SPINNING; COTTON UF Cotton spinning (+ SPINNING); SPINNING UF Cotton spinning (+ COTTON)
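At indexing or search time, the equivalence relationships act as a lookup from an entry term to the descriptor(s) to use. A minimal sketch with the examples from these slides; the `USE` table and `index_terms` function are hypothetical. Upward posting deliberately collapses Elite, Middle class, and Working class into SOCIAL CLASS, so a search can no longer tell them apart, and a factored entry maps to more than one descriptor.

```python
# Hypothetical entry vocabulary: non-preferred term -> descriptor(s) to use.
USE = {
    "underprivileged": ["DISADVANTAGED"],             # synonym
    "fibre optics": ["FIBER OPTICS"],                 # lexical variant (spelling)
    "control of vocabulary": ["VOCABULARY CONTROL"],  # lexical variant (word order)
    "apartheid": ["RACIAL SEGREGATION"],              # quasi-synonym
    "elite": ["SOCIAL CLASS"],                        # upward posting
    "middle class": ["SOCIAL CLASS"],
    "working class": ["SOCIAL CLASS"],
    "cotton spinning": ["COTTON", "SPINNING"],        # USE COTTON + SPINNING
}
DESCRIPTORS = {d for targets in USE.values() for d in targets}


def index_terms(entry):
    """Return the descriptor(s) for an entry term; [] if it is not in the vocabulary."""
    key = " ".join(entry.lower().split())
    if key.upper() in DESCRIPTORS:  # already a preferred term
        return [key.upper()]
    return USE.get(key, [])


print(index_terms("Fibre  Optics"))    # ['FIBER OPTICS']
print(index_terms("Elite"))            # ['SOCIAL CLASS']
print(index_terms("cotton spinning"))  # ['COTTON', 'SPINNING']
```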
Hierarchical relationships 1. The generic relationship (kind of): TEACHERS NT Adult educators; School teachers; Special education teachers 2. The hierarchical whole-part relationship (part of): EAR NT EXTERNAL EAR; LABYRINTH (NT SEMICIRCULAR CANALS; VESTIBULAR APPARATUS); MIDDLE EAR
Hierarchical relationships (cont.) • The instance relationship: SEAS NT Baltic Sea; Caspian Sea; Mediterranean Sea • Polyhierarchical relationships: EAR NT Acoustic nerve; NERVES NT Acoustic nerve; ACOUSTIC NERVE BT Ear, BT Nerves
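Because hierarchical links are transitive and a term may have several broader terms, retrieval systems often offer an "explode" search that adds every narrower term (MeSH searching in PubMed is a well-known example). A minimal sketch over the EAR/NERVES examples; the dictionaries and function names are hypothetical, and generic, whole-part, and instance links are not distinguished here.

```python
# Hypothetical NT links (broader term -> narrower terms), including a polyhierarchy:
# ACOUSTIC NERVE sits under both EAR and NERVES.
NT = {
    "EAR": ["EXTERNAL EAR", "LABYRINTH", "MIDDLE EAR", "ACOUSTIC NERVE"],
    "LABYRINTH": ["SEMICIRCULAR CANALS", "VESTIBULAR APPARATUS"],
    "NERVES": ["ACOUSTIC NERVE"],
    "SEAS": ["BALTIC SEA", "CASPIAN SEA", "MEDITERRANEAN SEA"],
}

# Derive the reciprocal BT links (narrower term -> broader terms).
BT = {}
for broader, narrower_terms in NT.items():
    for narrower in narrower_terms:
        BT.setdefault(narrower, []).append(broader)


def closure(term, links):
    """All terms reachable from `term` by following `links` repeatedly (excluding `term`)."""
    seen, stack = set(), list(links.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(links.get(t, []))
    return seen


def explode(term):
    """The term plus every narrower term at any depth."""
    return {term} | closure(term, NT)


print(BT["ACOUSTIC NERVE"])                        # ['EAR', 'NERVES']
print(sorted(closure("SEMICIRCULAR CANALS", BT)))  # ['EAR', 'LABYRINTH']
print(sorted(explode("LABYRINTH")))
# ['LABYRINTH', 'SEMICIRCULAR CANALS', 'VESTIBULAR APPARATUS']
```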
The associative relationship • Two terms are closely associated when: • “One of the terms should be strongly implied, according to the frames of reference shared by the users of the index, whenever the other is employed as an indexing term” • “it will frequently be found that one of the terms is a necessary component in any definition or explanation of the other.” (ISO 2788) • e.g. VIOLENCE RT Violence victims; ART THERAPY RT Psychiatric patients; WOMEN RT Femininity
Organize and display terms • Alphabetical • Topics • Facets
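The same hierarchical data can be displayed in more than one way. A minimal sketch of two of them, assuming a small NT table built from the earlier examples: an alphabetical list of all terms, and a hierarchical display in which each level of indentation is marked with a dot, a convention used in some printed thesauri. The data and function names are hypothetical; a faceted display would additionally group the top terms by facet.

```python
# Hypothetical NT table (broader term -> narrower terms).
NT = {
    "TEACHERS": ["SCHOOL TEACHERS", "ADULT EDUCATORS", "SPECIAL EDUCATION TEACHERS"],
    "EAR": ["MIDDLE EAR", "EXTERNAL EAR", "LABYRINTH"],
    "LABYRINTH": ["VESTIBULAR APPARATUS", "SEMICIRCULAR CANALS"],
}


def alphabetical_display():
    """Every term in the table, sorted alphabetically."""
    terms = set(NT) | {t for narrower in NT.values() for t in narrower}
    return sorted(terms)


def hierarchical_display(term, depth=0):
    """Indented tree under `term`; each level is marked by '. '."""
    lines = [". " * depth + term]
    for child in sorted(NT.get(term, [])):
        lines.extend(hierarchical_display(child, depth + 1))
    return lines


print("\n".join(hierarchical_display("EAR")))
# EAR
# . EXTERNAL EAR
# . LABYRINTH
# . . SEMICIRCULAR CANALS
# . . VESTIBULAR APPARATUS
# . MIDDLE EAR
```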
Thesaurus online resources • American Society of Indexers: “Thesauri Online”, prepared by Jessica L. Milstead • Thesaurus construction software: a list of thesaurus construction software compiled by Willpower; Thesaurus Builder • For graphics and cultural artifacts: museum material thesaurus; Art & Architecture Thesaurus
End-user thesauri as search tools • Traditionally, thesauri have been used in information retrieval to guide the indexer rather than the searcher. • There have been recent efforts to use the structure of thesauri to guide users’ browsing and searching: http://www.aim25.ac.uk/search/ http://bailando.sims.berkeley.edu/flamenco.html
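One way a thesaurus can support end-user searching is query expansion: map the user's entry term to its descriptor, then optionally add narrower (and related) terms before running the search. A minimal sketch, assuming a one-record thesaurus taken from the COGNITIVE PROCESS example; the record layout and the `expand` function are hypothetical and are not taken from either of the sites listed above.

```python
# Hypothetical thesaurus record (descriptor -> relationship tag -> terms).
THESAURUS = {
    "COGNITIVE PROCESS": {
        "UF": ["Thinking process"],
        "NT": ["ABSTRACT REASONING", "MEMORY"],
        "RT": ["COGNITIVE STYLE", "COGNITIVE TEST"],
    },
}

# Entry vocabulary derived from the UF lists: lead-in term -> descriptor.
ENTRY = {
    uf.lower(): descriptor
    for descriptor, record in THESAURUS.items()
    for uf in record.get("UF", [])
}


def expand(query, include_rt=False):
    """Return the descriptor for `query` plus its narrower (and optionally related) terms."""
    descriptor = ENTRY.get(query.lower(), query.upper())
    record = THESAURUS.get(descriptor, {})
    terms = [descriptor, *record.get("NT", [])]
    if include_rt:
        terms += record.get("RT", [])
    return terms


print(" OR ".join(expand("thinking process")))
# COGNITIVE PROCESS OR ABSTRACT REASONING OR MEMORY
```

Adding narrower terms raises recall; adding related terms raises it further at some cost in precision, which is why the sketch leaves them off by default.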