
Module 7b: Term Control and Semantic Relationships

IMT530: Organization of Information Resources. Winter 2008. Michael Crandall.


Presentation Transcript


  1. Module 7b: Term Control and Semantic Relationships IMT530: Organization of Information Resources Winter 2008 Michael Crandall

  2. Steps in Constructing CVs
     • Define your domain
     • Gather concepts
       • From user interviews, search logs, content analysis, preexisting vocabularies
     • Select your approach
     • Extract terminology
     • Control your terms
     • Organize your terms
     • Maintain, maintain, maintain
     IMT530- Organization of Information Resources

  3. Elements of Building CVs
     • Select your approach
       • Pre- or post-coordinated (sixteenth century lute music or sixteenth century and lutes and music)
       • Open or closed (indexers can add terms or not)
       • Enumeration vs. synthesis (facets)
     • Extract terms
       • Warrant (from users or domain or both)
     • Control terms
       • Specificity (cats or Siamese cats?)
       • Control of homographs (qualifications)
       • Term consistency and word form (plurals, etc.)
       • Multiword/phrase sequence and form (inverted, normal form?)
       • Term definitions (scope notes)
       • Syntax (citation order)
       • Semantic factoring
     • Organize terms
       • Semantic relationships

  4. Term Control

  5. Term control
     • Specificity (cats or Siamese cats?)
     • Control of homographs (qualifications)
     • Term consistency and word form (plurals, etc.)
     • Multiword/phrase sequence and form (inverted, normal form?)
     • Term definitions (scope notes)
     • Syntax (citation order)
     • Semantic factoring

  6. Specificity
     • Depends on user needs and time available
     • Should be consistent throughout the CV to avoid user confusion
     • May be influenced by choice of approach
       • If faceted, some facets may be more specific than others
       • If hierarchical, you should be consistent throughout

  7. Homographs
     • Sometimes a single word or phrase has multiple meanings, e.g., “power”, “drum”, “Java”, “Jupiter”
     • Controlled vocabularies “disambiguate” these terms so that each term has a single meaning
     • In thesauri and subject heading lists, parenthetical qualifiers are added, e.g., these LCSH terms: “Power (Mechanics)”; “Power (Christian theology)”; “Power (Social sciences)”; “Power (Philosophy)”
     • In taxonomies and classifications, the meaning of a homograph is contextualized by its placement in a particular hierarchy. Following the example above, Power will appear in the Philosophy, Christianity, Social Sciences, and Mechanics hierarchies, and the terms are disambiguated by their location (and thus their different notation)
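
A minimal sketch of how parenthetical qualifiers might be handled in software, assuming the four LCSH headings quoted on the slide are the whole vocabulary. The names `QUALIFIED_HEADINGS`, `split_heading`, and `disambiguate` are illustrative only and do not come from any real library.

```python
# Hypothetical mini-vocabulary: a bare homograph maps to its qualified headings.
QUALIFIED_HEADINGS = {
    "power": [
        "Power (Mechanics)",
        "Power (Christian theology)",
        "Power (Social sciences)",
        "Power (Philosophy)",
    ],
}


def split_heading(heading):
    """Split 'Term (Qualifier)' into (term, qualifier); qualifier is None if absent."""
    if heading.endswith(")") and " (" in heading:
        term, qualifier = heading[:-1].rsplit(" (", 1)
        return term, qualifier
    return heading, None


def disambiguate(word, vocabulary=QUALIFIED_HEADINGS):
    """Return the single-meaning headings a searcher must choose among."""
    return vocabulary.get(word.lower(), [])


if __name__ == "__main__":
    for heading in disambiguate("Power"):
        print(split_heading(heading))
```

A search interface built this way would show the qualified headings to the user rather than guessing which meaning of “power” was intended.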

  8. Word Form
     • Single word form should be consistent
       • Choose verbs or nouns
       • Singular or plural
       • Standard form
     • Phrases should be standard form
       • Either direct (Constitutional government)
       • Or inverted (government, constitutional)
         • Allows closer grouping of like terms in alphabetic display; not used much anymore
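
The direct and inverted phrase forms can be sketched as a pair of string conversions. This is only an illustration under simple assumptions (two-word phrases, a single comma in the inverted form); the function names are made up for this example.

```python
def to_direct_form(term):
    """Convert 'government, constitutional' to 'Constitutional government'."""
    if ", " not in term:
        return term  # already direct (or a single word)
    head, modifier = term.split(", ", 1)
    direct = f"{modifier} {head}"
    return direct[0].upper() + direct[1:]


def to_inverted_form(term):
    """Convert 'Constitutional government' to 'government, constitutional'.

    Only two-word phrases are handled; anything else is returned unchanged.
    """
    words = term.split()
    if len(words) != 2:
        return term
    modifier, head = words
    return f"{head.lower()}, {modifier.lower()}"


if __name__ == "__main__":
    phrases = ["Constitutional government", "Federal government", "Creativity"]
    # Sorting the inverted forms groups the 'government' phrases together.
    print(sorted(to_inverted_form(p) for p in phrases))
```

Sorting the inverted forms places the “government” phrases next to each other, which is the grouping benefit the slide mentions for alphabetic displays.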

  9. Scope Notes
     • Scope notes are term definitions in a thesaurus or controlled vocabulary
     • Scope notes tell indexers the precise meaning of a term, and help users judge whether they are searching on the correct term

  10. Syntax
     • Syntax describes how terms are built (especially, how multiple concepts may be combined), and citation order (order of facets)
     • Syntax is an issue when concepts are pre-coordinated in an indexing term (whether the syntax is consistent or not)
     • Syntax is an issue for CVs that use synthesis with facets, in that the rules for synthesis (also called citation order in classification schemes) determine term syntax
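
As a rough sketch of synthesis under a citation order, the snippet below builds a pre-coordinated term from facet values. The facet names (`period`, `instrument`, `form`) and their order are assumptions chosen to reproduce the “sixteenth century lute music” example from slide 3; a real scheme defines its own facets and order.

```python
# Hypothetical citation order: the fixed sequence in which facets are combined.
CITATION_ORDER = ["period", "instrument", "form"]


def synthesize(facets, citation_order=CITATION_ORDER):
    """Combine facet values into one term, always in citation order."""
    parts = [facets[name] for name in citation_order if name in facets]
    return " ".join(parts)


if __name__ == "__main__":
    # The facet values can be supplied in any order; the output order is fixed.
    print(synthesize({"form": "music", "period": "sixteenth century", "instrument": "lute"}))
```

Because the order is fixed by the rule rather than by the indexer, two indexers describing the same item should arrive at the same term.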

  11. Semantic Factoring
     • “The process of analyzing some or all of the categories of an ontology into a collection of primitives” Sowa, J. F. (2003). Ontology. Glossary. http://www.jfsowa.com/ontology/gloss.htm
     • Essentially, you are trying to decompose terms into their elemental concepts, to minimize duplication and maximize reuse
     • For example: ship = vehicle + water transport
     • Not always possible, especially with non-concrete concepts
     • “Creating a thesaurus without doing semantic factoring is like trying to put together furniture from Ikea without following the instructions. You will get interesting configurations, but you will not save time.” Ezzo, J. (2005). Bella and Yakov and Tillie's Panties: What I Learned in “Construction and Maintenance of Indexing Languages and Thesauri”. Bulletin of the American Society for Information Science and Technology 31(4), April/May 2005. http://www.asis.org/Bulletin/Apr-05/ezzo.html
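
The “ship = vehicle + water transport” example can be sketched as a mapping from terms to sets of primitives. Only the `ship` entry comes from the slide; the other entries are hypothetical and added just to show primitives being reused.

```python
# Each term is factored into a set of primitive concepts.
FACTORS = {
    "ship": {"vehicle", "water transport"},         # from the slide
    "airplane": {"vehicle", "air transport"},       # hypothetical
    "canal": {"infrastructure", "water transport"}, # hypothetical
}


def terms_sharing(primitive, factors=FACTORS):
    """List every term whose factoring includes the given primitive."""
    return sorted(term for term, parts in factors.items() if primitive in parts)


def compose(*primitives, factors=FACTORS):
    """List the terms whose factoring is exactly the given primitives."""
    wanted = set(primitives)
    return sorted(term for term, parts in factors.items() if parts == wanted)


if __name__ == "__main__":
    print(terms_sharing("water transport"))
    print(compose("vehicle", "water transport"))
```

Once terms are factored this way, each primitive is defined once and reused, which is the reduction in duplication the slide describes.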

  12. Relationships in CVs

  13. Relationships in Controlled Vocabularies
     • There are three major types of relationships between subject concepts
       • Equivalence Relationships
       • Hierarchical Relationships
       • Associative Relationships

  14. Equivalence Relationships
     • In natural language one word or phrase can refer to one or more concepts; and multiple terms can refer to a single concept
     • In other words, there is no one-to-one correspondence between words/phrases and concepts

  15. Preferred Terms and Cross references (Synonyms)
     • Controlled vocabularies create a one-to-one relationship between terms and concepts by controlling synonyms (multiple words or phrases that share a similar meaning)
     • To do this we:
       • Select a preferred term (descriptor, subject heading)
       • Create cross references from non-preferred terms (entry vocabulary, lead-in terms)

  16. Example Equivalence Display
     • Sample display for descriptor (preferred term) “Creativity” from the ERIC Thesaurus:
         Creativity
           UF Creative ability
              Originality
     • If you searched on “Originality” or “Creative ability” in the ERIC database, you would see these references:
       • “Creative ability” see “Creativity” OR
       • “Originality” use “Creativity”
     • In other words, you would be led from the unused (lead-in) terms to the used (preferred) term.
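
The UF/USE pairing above can be sketched as a lookup table that sends lead-in terms to their preferred term. This is a simplified illustration using only the three terms from the slide; it is not how the ERIC database is actually implemented, and the function names are invented for the example.

```python
# Preferred term -> its "UF" (lead-in) terms, as in the Creativity display.
USED_FOR = {
    "Creativity": ["Creative ability", "Originality"],
}


def build_use_references(used_for):
    """Invert the UF lists into lead-in -> preferred ("USE") references."""
    use = {}
    for preferred, lead_ins in used_for.items():
        for lead_in in lead_ins:
            use[lead_in.lower()] = preferred
    return use


def resolve(query, used_for=USED_FOR):
    """Return the preferred term for a query, or None if it is not in the vocabulary."""
    key = query.lower()
    use = build_use_references(used_for)
    if key in use:
        return use[key]
    for preferred in used_for:
        if preferred.lower() == key:
            return preferred
    return None


if __name__ == "__main__":
    for query in ["Originality", "creative ability", "Creativity", "Innovation"]:
        print(query, "->", resolve(query))
```

Storing only the UF lists and deriving the USE references from them keeps the two directions of the cross reference from drifting apart.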

  17. Equivalence Relationships - Summary
     • Exist between words or phrases that share the same (or similar) meaning
     • Equivalent terms are considered synonymous (whether they actually are or are not)
     • When controlling vocabulary, one equivalent term is selected as the preferred term (e.g., descriptor); the other equivalent terms are treated as “lead-in” terms or cross references
     • References used in the CV to show equivalence relationships include “UF” (used for), “Use”, “See”, and “Search under”

  18. Hierarchical Relationships
     • Hierarchical Relationships:
       • May be strictly defined as:
         • Genus-species (also called class inclusion or “is-a”) relationships
         • Whole-part relationships (sometimes these are treated as associative relationships)

  19. Hierarchical Relationships
     • Hierarchical Relationships:
       • May be illustrated by set notation: Set G (green) is a subset of Set B (blue)
       • All Gs are also Bs (in other words, a G is a B)
       • Using a real-world analogy, if Gs are gorillas, and Bs are animals, all gorillas are animals
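
Written out in standard set notation, with G and B as the two sets named on the slide, the subset statement reads:

```latex
G \subseteq B \iff \forall x \, (x \in G \Rightarrow x \in B)
```

With G as gorillas and B as animals, the right-hand side says that anything that is a gorilla is also an animal.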

  20. Ideal CV Hierarchical Relationships
     • Ideally, all hierarchical relationships indicated in a controlled vocabulary are also controlled and defined as genus-species (and sometimes also whole-part) relationships
     • ALL other relationships between terms are associative relationships
     • In real life CVs, this is not always the case!

  21. References for Hierarchical Relationships
     • Hierarchically related terms are shown by the BT (broader term), NT (narrower term), and sometimes See also/Search also references.
     • Examples of two entries in the ERIC thesaurus:
         Creativity
           BT Psychological characteristics
         Psychological characteristics
           NT Creativity
              Intelligence
              Cognitive style
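
BT and NT are reciprocal, so one direction can be derived from the other. The sketch below stores only BT links and generates the NT display. The BT links for “Intelligence” and “Cognitive style” are inferred from the NT list on the slide, and the function name is illustrative.

```python
from collections import defaultdict

# Term -> its broader term (BT). Only the Creativity link is shown directly
# on the slide; the other two are implied by the NT list.
BROADER = {
    "Creativity": "Psychological characteristics",
    "Intelligence": "Psychological characteristics",
    "Cognitive style": "Psychological characteristics",
}


def narrower_terms(broader):
    """Derive the reciprocal NT lists from the BT links."""
    narrower = defaultdict(list)
    for term, broader_term in broader.items():
        narrower[broader_term].append(term)
    return {bt: sorted(terms) for bt, terms in narrower.items()}


if __name__ == "__main__":
    for bt, terms in narrower_terms(BROADER).items():
        print(bt)
        for term in terms:
            print("  NT", term)
```

Note that this sketch lists narrower terms alphabetically, which may differ from the order shown in a given thesaurus display.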

  22. BTs & NTs
     • In the previous slide, both Creativity and Psychological characteristics are preferred terms
     • Each has its own display; the display for the preferred term Creativity shows the reference to its broader preferred term, “Psychological characteristics”

  23. Testing for Hierarchical Relationships
     • To test for a hierarchical relationship between terms, use the “is-a” test.
     • The relationship between “robin” and “bird”? (A robin is a (type of) bird, so the relationship is hierarchical; Bird is the broader term, Robin is the narrower)
     • The relationship between Water and Hydronomy? (Water is not a hydronomy or a type of hydronomy; Hydronomy is not a water or a type of water; so the relationship here is an associative relationship)

  24. Examples of Hierarchical Relationships
     • What is the relationship between these sets of terms?
       • books and library materials
       • water and floods
       • buildings and chimneys
       • painting and acrylic paints
       • water and groundwater

  25. Answers
     • Books and library materials (hierarchical)
     • Water and floods (associative, because a flood is not the same type of thing as water; one way you can tell is that one is a count noun and the other is not. Hierarchical may be acceptable depending on context)
     • Buildings and chimneys (hierarchical if you include whole-part relationships; associative if you don’t)
     • Painting and acrylic paints (associative)
     • Water and groundwater (hierarchical)

  26. More on Hierarchical Relationships
     • A characteristic of the relationship between terms that are strictly hierarchically related (genus-species only, not whole-part) is Hierarchical Force
     • When a narrower term is hierarchically related to a broader term, the narrower term (NT) inherits all of the characteristics of the terms above it in the hierarchy
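
One way to picture hierarchical force is as a transitive “is-a” check: whatever holds for a broader term holds for every term below it. In the sketch below the robin/bird and gorilla/animal links come from earlier slides, while the bird/animal link is an added assumption; the function names are illustrative.

```python
# Term -> its immediate broader term (genus-species links only).
BROADER = {
    "robin": "bird",      # slide 23
    "gorilla": "animal",  # slide 19
    "bird": "animal",     # assumption added for this example
}


def ancestors(term, broader=BROADER):
    """Return every broader term above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against an accidental cycle in the data
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_a(narrow, broad, broader=BROADER):
    """The 'is-a' test, applied through every level of the hierarchy."""
    return broad in ancestors(narrow, broader)


if __name__ == "__main__":
    print(ancestors("robin"))
    print(is_a("robin", "animal"), is_a("animal", "robin"))
```

If a link in the chain fails the is-a test (for example a whole-part link), the inheritance no longer holds, which is why the slide restricts hierarchical force to genus-species relationships.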

  27. Associative Relationships
     • Include all relationships not encompassed by equivalence and hierarchical relationships
     • In Controlled Vocabularies, these relationships are shown by the following references:
       • Related Term (RT), see also (SA)
     • Examples of types of associative relationships (there are many of these!):
       • Thing and property (rubber, elasticity)
       • Complementary activities (teaching, learning)
       • Agent and activity (artist, painting)

  28. Associative Relationships
     • Many of these are semantic relationships
     • Some of these are syntactic relationships too:
       • Children see related term Games
     • Problems: when to stop? How close in meaning or syntactic relation do two terms have to be before the relationship is shown in a CV?
     • Note: associative relationships are rarely shown in classifications and taxonomies

  29. Example Associative Relationship Display
     • From the ERIC thesaurus:
         Comprehension
           RT Concept formation
              Misconceptions
              Scientific literacy
              Thinking skills
     • Again, remember that both Comprehension and all of the RTs are preferred terms; however, this is the display for the preferred term Comprehension
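
Since every RT is itself a preferred term with its own display, a related-term link is normally recorded in both directions. The sketch below assumes that reciprocity and generates the reverse links from the Comprehension display; the function name is illustrative.

```python
# RT links as shown on the Comprehension display.
RELATED = {
    "Comprehension": [
        "Concept formation",
        "Misconceptions",
        "Scientific literacy",
        "Thinking skills",
    ],
}


def symmetric_related(related):
    """Make every RT link reciprocal: if A lists B, then B lists A."""
    links = {}
    for term, related_terms in related.items():
        for other in related_terms:
            links.setdefault(term, set()).add(other)
            links.setdefault(other, set()).add(term)
    return {term: sorted(others) for term, others in links.items()}


if __name__ == "__main__":
    for term, others in symmetric_related(RELATED).items():
        print(term, "RT", others)
```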

  30. Some Guidelines
     • Does the taxonomy cover the domain appropriately?
     • Is it within scope?
     • Do draft definitions for concepts express them clearly?
     • Are duplicate concepts removed?
     • Are basic-level concepts represented?
     • Does extracted terminology express them?
     • Is the structure useful and sensible?

  31. Questions?
     • If not, take a break!!!

  32. Exercise 7b
     • Take your concept lists from the last exercise, and use those in Exercise 7b to begin building a controlled vocabulary
     • Do as much as you can in class today, and work on the rest during the week
     • Each group should send me its initial controlled vocabulary by email by next Friday
