
Organizing Information


maeko

Presentation Transcript


  1. Organizing Information: Vocabulary Control

  2. From last week: index syntax
     • Pre-/post-coordinate syntax: what should constitute a terminological unit? Uniterms vs. compound terms (e.g. Alcohol Studies Database)
     • In pre-coordinate syntax, a class representation is a combination of several elemental concepts, made at indexing time (e.g. Heart diseases – diagnostic methods – infants)
     • In post-coordinate syntax, the basic unit of representation is the single term (elemental concept); terms are combined by the searcher at search time
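
A minimal sketch of the difference, assuming a hypothetical three-document collection indexed both ways (the documents and function names are illustrative, not taken from any real system). The pre-coordinate search can only match the heading string as the indexer built it; the post-coordinate search treats each term separately and ANDs whatever terms the searcher supplies.

```python
# Hypothetical toy collection, indexed two ways.
# "pre":  one heading string whose concept combination was fixed by the indexer.
# "post": an unordered set of single-concept terms.
DOCS = {
    "doc1": {"pre": "Heart diseases – diagnostic methods – infants",
             "post": {"heart diseases", "diagnostic methods", "infants"}},
    "doc2": {"pre": "Heart diseases – surgery – infants",
             "post": {"heart diseases", "surgery", "infants"}},
    "doc3": {"pre": "Lung diseases – diagnostic methods – adults",
             "post": {"lung diseases", "diagnostic methods", "adults"}},
}

def pre_coordinate_search(heading):
    """Match the whole pre-combined heading (combination made at indexing time)."""
    return sorted(d for d, rec in DOCS.items() if rec["pre"] == heading)

def post_coordinate_search(*terms):
    """Boolean AND of single terms (combination made at search time)."""
    wanted = set(terms)
    return sorted(d for d, rec in DOCS.items() if wanted <= rec["post"])

print(pre_coordinate_search("Heart diseases – diagnostic methods – infants"))  # ['doc1']
print(post_coordinate_search("diagnostic methods"))         # ['doc1', 'doc3']
print(post_coordinate_search("heart diseases", "infants"))  # ['doc1', 'doc2']
```

The post-coordinate index answers combinations the indexer never anticipated, at the cost of possible false combinations that a fixed pre-coordinated string would avoid.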

  3. Balance between human and computer indexing
     • Balance between keyword (free-text) and subject (controlled-vocabulary) search
     • Human indexing on the Web (Yahoo directory, INFOMINE)

  4. Vocabulary control: why bother?
     • The complexity of natural language
     • Homographs: impact on retrieval?
       bow, the front part of a ship; bow, to bend; bow, a decorative knot
       CELLS (BIOLOGY), CELLS (ELECTRIC) *the parenthetical qualifier distinguishes the homographs
     • Synonyms: impact on retrieval?
       MERCY KILLING / EUTHANASIA; SUPPLEMENTARY EARNINGS / PERKS; POSTAL SERVICE / MAIL SERVICE; ALIENS / FOREIGNERS
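
To make the two "impact on retrieval?" questions concrete, here is a small sketch over a hypothetical four-document collection. Each document has a free-text title and headings an indexer might assign. The titles, headings, and function names are invented for illustration. The qualified headings keep the two senses of CELLS apart, and the single heading EUTHANASIA gathers both synonyms.

```python
# Hypothetical mini-collection: free-text titles plus controlled headings.
# Qualifiers separate homographs; synonyms are brought under one heading.
DOCS = [
    {"id": 1, "title": "Cells under the microscope", "headings": {"CELLS (BIOLOGY)"}},
    {"id": 2, "title": "Dry cells and batteries", "headings": {"CELLS (ELECTRIC)"}},
    {"id": 3, "title": "The ethics of mercy killing", "headings": {"EUTHANASIA"}},
    {"id": 4, "title": "Euthanasia laws in Europe", "headings": {"EUTHANASIA"}},
]

def keyword_search(word):
    """Free-text search: exact match against the words of each title."""
    word = word.lower()
    return [d["id"] for d in DOCS if word in d["title"].lower().split()]

def heading_search(heading):
    """Controlled-vocabulary search: exact match against assigned headings."""
    return [d["id"] for d in DOCS if heading in d["headings"]]

print(keyword_search("cells"))            # [1, 2]  both senses of the homograph
print(heading_search("CELLS (BIOLOGY)"))  # [1]
print(keyword_search("euthanasia"))       # [4]     misses the "mercy killing" title
print(heading_search("EUTHANASIA"))       # [3, 4]
```

In this toy case the homograph hurts precision (an unwanted document is retrieved) and the synonym hurts recall (a relevant document is missed).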

  5. Ambiguity of reference of proper names
     • Changing names: Jacqueline Bouvier, Jacqueline Kennedy, Jacqueline Onassis; ASIS → ASIST (American Society for Information Science and Technology)
     • Variants: Jackie Kennedy, Jacqueline Bouvier Kennedy, Mrs. Kennedy, JBK, …
     • Pseudonyms, aliases, …: Twain, Mark / Clemens, Samuel
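
A name authority file resolves this ambiguity by choosing one authorized heading and redirecting every variant to it. The sketch below is a minimal model of that idea. The chosen headings ("Onassis, Jacqueline Kennedy", "Twain, Mark") are illustrative assumptions, not copied from any real authority file, and `authorize` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical authority records: authorized heading -> variant forms that
# should be redirected to it. The headings chosen here are illustrative only.
AUTHORITY = {
    "Onassis, Jacqueline Kennedy": [
        "Jacqueline Bouvier", "Jacqueline Kennedy", "Jacqueline Onassis",
        "Jackie Kennedy", "Jacqueline Bouvier Kennedy", "Mrs. Kennedy", "JBK",
    ],
    "Twain, Mark": ["Clemens, Samuel"],
}

# Invert once so that the heading itself or any variant resolves to the heading.
LOOKUP = {}
for heading, variants in AUTHORITY.items():
    for form in [heading, *variants]:
        LOOKUP[form.casefold()] = heading

def authorize(name: str) -> Optional[str]:
    """Return the authorized heading for a name form, or None if it is unknown."""
    return LOOKUP.get(name.strip().casefold())

print(authorize("JBK"))              # Onassis, Jacqueline Kennedy
print(authorize("clemens, samuel"))  # Twain, Mark
```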

  6. Ambiguity of acronyms
     • IRA: Irish Republican Army; IRA: individual retirement account
     • CIA: Central Intelligence Agency; CIA: Culinary Institute of America
     • ASIST
     • Acronym Finder

  7. Degree of vocabulary control
     • Strings, terms, concepts
     • Between terms and strings (morphological): Digital “archive” = (archive OR archived OR archiving OR archiver OR archives OR archivers)
     • Between terms and a concept (semantic): ARCHIVE Use for Collection; See also LIBRARY *(~ search)
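
The morphological level can be handled either by listing every word form, as the slide does, or by right truncation. Here is a minimal sketch of both, assuming a hypothetical list of words found in documents. The `*` truncation symbol is a common convention but is an assumption here, not any particular system's syntax.

```python
# The explicit OR-list of forms from the slide, and a small hypothetical set
# of words found in documents.
ARCHIVE_FORMS = ["archive", "archived", "archiving", "archiver", "archives", "archivers"]
DOC_WORDS = ["archive", "archives", "archiving", "archivist", "library", "collection"]

def or_search(forms, words):
    """Enumerated variants: match only the word forms listed explicitly."""
    wanted = {f.lower() for f in forms}
    return [w for w in words if w.lower() in wanted]

def truncation_search(pattern, words):
    """Right truncation: 'archiv*' matches every word that starts with 'archiv'."""
    if not pattern.endswith("*"):
        return [w for w in words if w.lower() == pattern.lower()]
    stem = pattern[:-1].lower()
    return [w for w in words if w.lower().startswith(stem)]

print(or_search(ARCHIVE_FORMS, DOC_WORDS))      # ['archive', 'archives', 'archiving']
print(truncation_search("archiv*", DOC_WORDS))  # ['archive', 'archives', 'archiving', 'archivist']
```

Truncation also picks up "archivist", which the slide's OR-list leaves out. Neither approach reaches LIBRARY or Collection; that needs the semantic level, i.e. the thesaurus's Use for and See also links.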

  8. Controlled vocabulary
     • A subset of the vocabulary of a natural language from which certain types of words and syntactic forms have been excluded by rules.
     • Controlled vocabularies are generally recorded in subject heading lists or thesauri.

  9. Tools for vocabulary control
     • Authority file: Library of Congress authority heading search
     • MeSH scope note (SN): restricts and clarifies the meaning of otherwise ambiguous terms, e.g. INCOME SN Income of an individual, organization or person; otherwise use NATIONAL INCOME
     • Thesaurus: ERIC Thesaurus
     • Subject heading list: e.g. MeSH, LCSH (searchable database)

  10. Thesaurus construction
     1. Term admission (preferred terms, sources of terms)
     2. Level of pre-coordination (compound terms and elemental descriptors)
     3. Structure and relationships
     4. Categorizing and displaying terms

  11. Preferred and non-preferred terms
     • Preferred term (descriptor): a term in an indexing language chosen as the preferred or authorized representation of a concept conveyed by the text of a document or a feature of it.
     • Entry (lead-in) term: a non-preferred term that points the user to the preferred term, e.g. Sex Bias USE SEXISM

  12. Warrant: criteria for term admission (sources of terms)
     • Literary warrant: the vocabulary of a subject language should be empirically derived from the literature it is intended to describe
     • Use warrant: accommodate users’ vocabulary as lead-in or descriptor vocabulary
     • Structure warrant: terms introduced for collocation and navigation purposes, e.g.
       AERIAL SPORTS
         NT GLIDING
            PARACHUTING
            PARAGLIDING
            SKY DIVING

  13. Compound terms and factored terms
     • “it is a general rule that terms in a thesaurus should represent simple or unitary concepts as far as possible, and compound terms should be factored (i.e. split) into simple elements, except when this is likely to affect the users’ understanding” (ISO 2788; ISO is the International Organization for Standardization)
     • e.g. “workload of dentists in Scotland” is factored into WORKLOAD + DENTISTS + SCOTLAND

  14. Structure of compound terms
     • The focus: the genus term that identifies the broader class of things or events to which the term as a whole refers, e.g. PHILOSOPHY in PHILOSOPHY OF EDUCATION
     • The difference: the modifier or species term, which refers to a characteristic or logical difference, e.g. EDUCATION in PHILOSOPHY OF EDUCATION

  15. Factoring of compound terms
     • Semantic factoring: “a term which expresses a complex notion is re-expressed in the form of simpler or definitional elements,” e.g. CARDIAC FAILURE → HEART + OUTPUT + BELOW + NORMAL
     * Leads to loss of precision and runs against common usage; not recommended by ISO 2788

  16. Factoring of compound terms (cont.)
     • Syntactical factoring: applied to compound terms which are “amenable to morphological analysis into separate components, each of which can be accepted as an indexing term in its own right,” e.g.
       PROTOTYPE FUZZY QUERY PROCESSORS → PROTOTYPE + FUZZY QUERY + PROCESSORS
       COTTON SPINNING → COTTON + SPINNING
       HOUSING MANAGEMENT → HOUSING + MANAGEMENT
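
Syntactical factoring depends on each component already being an accepted term, so one way to sketch it is a greedy longest-match split against a list of admitted descriptors. The descriptor list and the `factor` function below are hypothetical. In practice the indexer decides the split; this only shows why FUZZY QUERY survives as one unit while the rest of the string is broken up.

```python
# Hypothetical set of descriptors accepted "in their own right".
DESCRIPTORS = {"PROTOTYPE", "FUZZY", "FUZZY QUERY", "PROCESSORS",
               "COTTON", "SPINNING", "HOUSING", "MANAGEMENT"}

def factor(compound, descriptors=DESCRIPTORS):
    """Split a compound term into the longest accepted descriptors, left to right."""
    words = compound.upper().split()
    parts, i = [], 0
    while i < len(words):
        # Try the longest run of words starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in descriptors:
                parts.append(candidate)
                i = j
                break
        else:
            raise ValueError(f"no descriptor covers {words[i]!r} in {compound!r}")
    return parts

print(factor("Prototype fuzzy query processors"))  # ['PROTOTYPE', 'FUZZY QUERY', 'PROCESSORS']
print(factor("Cotton spinning"))                   # ['COTTON', 'SPINNING']
```

Because the longest match wins, FUZZY QUERY is kept whole even though FUZZY is also in the list. A compound with a component that is not an admitted term (e.g. "cotton weaving" here) raises an error rather than being factored.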

  17. Structure and relationships
     • Basic thesaural relationships
       Equivalence:
         COGNITIVE PROCESS UF Thinking Process
         Thinking Process USE COGNITIVE PROCESS
       Hierarchical:
         COGNITIVE PROCESS NT Abstract Reasoning, NT Memory
         MEMORY BT Cognitive Process
       Associative:
         COGNITIVE PROCESS RT COGNITIVE STYLE, RT COGNITIVE TEST
         COGNITIVE TEST RT COGNITIVE STYLE
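
Each relationship is reciprocal: UF pairs with USE, NT with BT, and RT with itself. A minimal sketch of a thesaurus store that records the reciprocal automatically is shown below. The `Thesaurus` class and its methods are hypothetical names, and the data reproduce the COGNITIVE PROCESS examples on this slide.

```python
from collections import defaultdict

# Reciprocal tags: recording one side of a relationship implies the other.
RECIPROCAL = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}

class Thesaurus:
    def __init__(self):
        # term -> relationship tag -> set of related terms
        self.rels = defaultdict(lambda: defaultdict(set))

    def relate(self, term, tag, other):
        """Store term --tag--> other together with its reciprocal."""
        self.rels[term][tag].add(other)
        self.rels[other][RECIPROCAL[tag]].add(term)

    def entry(self, term):
        """Return a term's entry as {tag: sorted list of terms}."""
        return {tag: sorted(others) for tag, others in self.rels.get(term, {}).items()}

t = Thesaurus()
t.relate("COGNITIVE PROCESS", "UF", "Thinking Process")
t.relate("COGNITIVE PROCESS", "NT", "ABSTRACT REASONING")
t.relate("COGNITIVE PROCESS", "NT", "MEMORY")
t.relate("COGNITIVE PROCESS", "RT", "COGNITIVE STYLE")
t.relate("COGNITIVE PROCESS", "RT", "COGNITIVE TEST")
t.relate("COGNITIVE TEST", "RT", "COGNITIVE STYLE")

print(t.entry("Thinking Process"))  # {'USE': ['COGNITIVE PROCESS']}
print(t.entry("MEMORY"))            # {'BT': ['COGNITIVE PROCESS']}
```

Generating reciprocals in one place keeps the two sides of every relationship consistent. A thesaurus maintained by hand often ends up with an NT that has no matching BT.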

  18. Equivalence relationships
     1. Synonyms: “terms whose meanings can be regarded as the same in a wide range of contexts,” e.g. DISADVANTAGED UF UNDERPRIVILEGED
     2. Lexical variants: different word forms for the same expression (spelling, grammatical variation, irregular plurals, direct versus indirect order, and abbreviated forms), e.g.
        FIBER OPTICS UF FIBRE OPTICS
        VOCABULARY CONTROL UF Control of Vocabulary
     3. Quasi-synonyms, e.g. RACIAL SEGREGATION UF APARTHEID

  19. Equivalence relationships (cont.)
     4. Upward posting: narrower terms are treated as if they were equivalent to, rather than species of, their broader term, e.g.
        SOCIAL CLASS UF Elite, UF Middle class, UF Working class
        Elite USE SOCIAL CLASS
     5. Factored and unfactored forms of compound terms, e.g.
        Cotton spinning USE COTTON + SPINNING
        COTTON UF Cotton spinning (+ SPINNING)
        SPINNING UF Cotton spinning (+ COTTON)
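
From the indexer's side, every equivalence relationship comes down to one lookup: an entry term resolves to the descriptor or descriptors to assign. The sketch below assumes a hypothetical USE table in which a multi-descriptor value models the "USE COTTON + SPINNING" convention. It also assumes that any term without a USE reference is already a descriptor.

```python
# Hypothetical USE references (entry term -> descriptors to assign).
# A value with several descriptors models "USE COTTON + SPINNING".
USE = {
    "elite": ["SOCIAL CLASS"],              # upward posting
    "middle class": ["SOCIAL CLASS"],
    "working class": ["SOCIAL CLASS"],
    "cotton spinning": ["COTTON", "SPINNING"],  # factored compound term
    "sex bias": ["SEXISM"],                 # entry term from slide 11
}

def index_terms(entry_term):
    """Resolve an entry term to the descriptor(s) an indexer should assign.
    Terms without a USE reference are assumed to be descriptors already."""
    return USE.get(entry_term.casefold(), [entry_term.upper()])

print(index_terms("Elite"))            # ['SOCIAL CLASS']
print(index_terms("Cotton spinning"))  # ['COTTON', 'SPINNING']
print(index_terms("Sexism"))           # ['SEXISM']
```

With upward posting, a search on SOCIAL CLASS can no longer tell elite from working class. That loss of specificity is what the lookup trades for a smaller vocabulary.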

  20. Hierarchical relationships
     1. The generic relationship (kind of), e.g.
        TEACHERS
          NT Adult educators
             School teachers
             Special education teachers
     2. The hierarchical whole-part relationship, e.g.
        EAR
          NT EXTERNAL EAR
             LABYRINTH
               SEMICIRCULAR CANALS
               VESTIBULAR APPARATUS
             MIDDLE EAR

  21. Hierarchical relationships (cont.)
     • The instance relationship, e.g.
       SEAS NT Baltic Sea, Caspian Sea, Mediterranean Sea
     • Polyhierarchical relationships, e.g.
       EAR NT Acoustic nerve
       NERVES NT Acoustic nerve
       ACOUSTIC NERVE BT Ear, BT Nerves
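
In a polyhierarchy a term can have more than one BT, so "all broader terms" becomes a walk over a graph rather than up a single chain. The sketch below stores only BT links, taken from the examples on slides 20 and 21, and derives NT lists by inverting them. The function names are hypothetical.

```python
from collections import deque

# BT links (term -> broader terms) from the slide examples. ACOUSTIC NERVE has
# two broader terms, which is what makes the hierarchy polyhierarchical.
BT = {
    "ACOUSTIC NERVE": ["EAR", "NERVES"],
    "EXTERNAL EAR": ["EAR"],
    "MIDDLE EAR": ["EAR"],
    "LABYRINTH": ["EAR"],
    "SEMICIRCULAR CANALS": ["LABYRINTH"],
    "VESTIBULAR APPARATUS": ["LABYRINTH"],
    "BALTIC SEA": ["SEAS"],
    "CASPIAN SEA": ["SEAS"],
    "MEDITERRANEAN SEA": ["SEAS"],
}

def narrower(term):
    """Invert the BT links to get the NT list shown under a term."""
    return sorted(t for t, broader in BT.items() if term in broader)

def all_broader(term):
    """Every ancestor reachable through BT links (breadth-first, no duplicates)."""
    found, queue = [], deque(BT.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in found:
            found.append(t)
            queue.extend(BT.get(t, []))
    return found

print(all_broader("ACOUSTIC NERVE"))       # ['EAR', 'NERVES']
print(all_broader("SEMICIRCULAR CANALS"))  # ['LABYRINTH', 'EAR']
print(narrower("NERVES"))                  # ['ACOUSTIC NERVE']
```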

  22. The associative relationship
     Two terms are closely associated when:
     • “One of the terms should be strongly implied, according to the frames of reference shared by the users of the index, whenever the other is employed as an indexing term”
     • “it will frequently be found that one of the terms is a necessary component in any definition or explanation of the other.” (ISO 2788)
     e.g. VIOLENCE RT Violence victims; ART THERAPY RT Psychiatric patients; WOMEN RT Femininity

  23. Organizing and displaying terms
     • Alphabetical
     • Topics
     • Facets
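
The same relationship data can drive more than one display. Below is a minimal sketch of two of the three displays listed: an alphabetical display (one entry per term with its BT and NT references) and an indented hierarchical display similar to a topical arrangement. It assumes hypothetical NT data taken from the slide 20 examples and does not attempt a faceted display.

```python
# NT links (term -> narrower terms), taken from the slide 20 examples.
NT = {
    "TEACHERS": ["Adult educators", "School teachers", "Special education teachers"],
    "EAR": ["EXTERNAL EAR", "LABYRINTH", "MIDDLE EAR"],
    "LABYRINTH": ["SEMICIRCULAR CANALS", "VESTIBULAR APPARATUS"],
}

def hierarchical_display(term, depth=0):
    """Indented display: each level of NT is indented one more step."""
    lines = ["  " * depth + term]
    for child in sorted(NT.get(term, [])):
        lines.extend(hierarchical_display(child, depth + 1))
    return lines

def alphabetical_display():
    """A-Z display: one entry per term, followed by its BT and NT references."""
    terms = set(NT) | {c for kids in NT.values() for c in kids}
    out = []
    for t in sorted(terms, key=str.casefold):
        out.append(t)
        out += [f"  BT {p}" for p in sorted(p for p, kids in NT.items() if t in kids)]
        out += [f"  NT {n}" for n in sorted(NT.get(t, []))]
    return out

print("\n".join(hierarchical_display("EAR")))
print("\n".join(alphabetical_display()[:6]))
```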

  24. Thesaurus online resources
     • American Society of Indexers: “Thesauri Online”, prepared by Jessica L. Milstead
     • Thesaurus construction software: a list of thesaurus construction software compiled by Willpower Information; Thesaurus Builder
     • For graphics and cultural artifacts: Museum material thesaurus; Art & Architecture Thesaurus

  25. End-user thesauri as search tools
     • Traditionally, thesauri have been used in information retrieval to guide the indexer rather than the searcher.
     • More recent efforts use the structure of thesauri to guide users’ browsing and searching, e.g. http://www.aim25.ac.uk/search/ and http://bailando.sims.berkeley.edu/flamenco.html
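
A searcher can use a thesaurus through query expansion. The user's own word is mapped to the preferred term via USE, then narrower (and optionally related) terms are added, and the result is ORed. The sketch below is a generic illustration under hypothetical data drawn from earlier slides. It is not a description of how the two systems linked above work, and it expands NT by one level only.

```python
# Hypothetical thesaurus fragment built from examples in these slides.
USE = {"thinking process": "COGNITIVE PROCESS", "sex bias": "SEXISM"}
NT = {"COGNITIVE PROCESS": ["ABSTRACT REASONING", "MEMORY"]}
RT = {"COGNITIVE PROCESS": ["COGNITIVE STYLE", "COGNITIVE TEST"]}

def expand(query_term, include_related=False):
    """Map the user's word to the preferred term, then add narrower
    (and optionally related) terms. NT is expanded one level only."""
    descriptor = USE.get(query_term.casefold(), query_term.upper())
    terms = [descriptor] + NT.get(descriptor, [])
    if include_related:
        terms += RT.get(descriptor, [])
    return terms

def to_boolean(terms):
    """Combine the expanded terms into a single OR query string."""
    return " OR ".join(f'"{t}"' for t in terms)

print(to_boolean(expand("Thinking process")))
# "COGNITIVE PROCESS" OR "ABSTRACT REASONING" OR "MEMORY"
```

Adding RT terms (`include_related=True`) raises recall further but usually costs more precision than adding NT terms, so an interface might offer it as an explicit choice rather than a default.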
