
INLS 520



Presentation Transcript


  1. INLS 520 Information Organization INLS 520 – Fall 2007 Erik Mitchell

  2. Review • Last week • Types of categorization & classification structures • Classification • Definitions • Look at library classification systems: Dewey & Library of Congress

  3. Today • Controlled vocabularies • Types • Basic concepts • Related technologies • Metadata standards • Example Systems • Knowledge organization systems • Term Lists, Thesauri, Taxonomies, Ontologies

  4. Concepts & definitions • Controlled Vocabularies • “organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.” (Warner via Leise, Fast) • “the primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval” (ANSI Z39.19) • Knowledge organization systems • “tools that present the organized interpretation of knowledge structures” (Hjørland) • “classification schemes that organize materials at a general level…, subject headings that provide more detailed access, and authority files that control variant versions of key information” (Hodge) • “It depends on what the meaning of the word 'is' is.” (Clinton)

  5. Uses of controlled vocabulary (1) • Define scope, content, and context of information • Navigation, breadcrumbs • Map to user terminology • Enhance browsing, searching • Term consistency and relationships

  6. Functions of a CV • Removes ambiguity • Synonyms, homonyms, polysemes • Defines relationships • Equivalence, hierarchical, associative (BT, NT, RT, CR), reciprocity • Provides context • Category, scope, qualifiers, modifiers, scope notes
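
One way to picture the relationship types and their reciprocity is a small in-memory thesaurus in which recording a relationship in one direction automatically records its inverse. This is only a sketch: the `Thesaurus` class, the `RECIPROCAL` table, and the sample terms are illustrative assumptions, not part of any standard or tool mentioned in the slides.

```python
from collections import defaultdict

# Each relationship code paired with its inverse (BT/NT, USE/UF; RT is symmetric).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Hypothetical minimal thesaurus that keeps relationships reciprocal."""

    def __init__(self):
        # term -> relationship code -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term, code, other):
        """Record `term --code--> other` and the inverse link on `other`."""
        self.relations[term][code].add(other)
        self.relations[other][RECIPROCAL[code]].add(term)

    def related(self, term, code):
        """Return the terms linked to `term` by `code`, sorted alphabetically."""
        return sorted(self.relations[term][code])


t = Thesaurus()
t.relate("thesauri", "BT", "controlled vocabularies")  # hierarchical
t.relate("thesauri", "RT", "taxonomies")               # associative
t.relate("thesaurus", "USE", "thesauri")               # equivalence

print(t.related("controlled vocabularies", "NT"))  # ['thesauri']
print(t.related("thesauri", "UF"))                 # ['thesaurus']
```

Storing the inverse at write time keeps lookups trivial, at the cost of having to remove both directions if a relationship is ever deleted.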

  7. Types of Controlled Vocabularies • Term Lists • Glossaries, Dictionaries, Gazetteers, Folksonomies • Synonym rings • Z39.19 example • Oracle Text • Taxonomies • Website navigation scheme • Thesauri / Ontologies • Authority files, subject thesauri, topic maps
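
A synonym ring is the simplest of these structures to sketch: a set of terms treated as equivalent at search time, with no preferred term. The example below is a minimal illustration; the rings and the `expand_query` helper are made up for this sketch and do not reflect how Oracle Text or any other product implements the idea.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"car", "automobile", "motor vehicle"},
    {"film", "movie", "motion picture"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return the term plus all of its ring-mates, so a search matches every variant."""
    needle = term.lower()
    expanded = {needle}
    for ring in rings:
        if needle in ring:
            expanded |= ring
    return sorted(expanded)


print(expand_query("Movie"))  # ['film', 'motion picture', 'movie']
print(expand_query("zebra"))  # ['zebra']
```

Because no term in a ring is preferred, the expansion happens on the back end and the user never sees it, which is what distinguishes a ring from an authority file.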

  8. A conceptual map http://www.taxotips.com/

  9. CV Concepts • Content Analysis • Ambiguity • Synonymy • Exhaustivity • Specificity • Co-extensivity • Aboutness • Semantic structure • Warrant (User, Literary, Organization) • Form Analysis • Linguistics • Grammar • Semiotics • Single / Multiple terms • Indexing & Retrieval • Pre vs. Post Coordinate • Recall vs. Precision • Natural language processing (NLP)

  10. Content Analysis (1) • Ambiguity • Each term should relate to a single concept • Synonymy • Each concept should be identified by a single entry • Specificity • Using the most specific word or phrase expressing the subject • Exhaustivity • The extent to which the entire document is indexed (Summarization, depth) • Co-extensivity • “Assign as many terms as needed to bring out the main theme, and according to guidelines sub-themes.” (p. 29, Lancaster) • “nothing more, nothing less” • Semantic Structure • Terms can be related with equivalence, hierarchical, or associative relationships (Use, See, NT, BT, RT)

  11. Content Analysis (2) • Aboutness = Subject/topic? • Wilson (1968) • Author intent, topicality, relationship to other resources, textual analysis • Fairthorne (1969) • Intentional aboutness (author), extensional aboutness (document) • Maron (1977) • objective about (document), subjective about (user), and retrieval about (information retrieval) • Hjørland (2001) • “Closely related to theories of meaning, interpretation, and epistemology”

  12. Content Analysis (3) • Wilson’s criteria for evaluating aboutness (1968) • Identify author’s purpose (intent) • Weigh the predominant topics, elements (topical analysis) • Group/count a document’s use of concepts and references (bibliometrics) • Identify essential elements (text analysis)

  13. Content Analysis (4) • Literary Warrant • “The inclusion of a vocabulary term in a controlled vocabulary based on its appearance in one or more content items. For example, a medical text may use the term ‘oncology.’ Based on literary warrant, that term would be included in the controlled vocabulary even though the general public uses the term ‘cancer.’” (Glosso-Thesaurus) • User Warrant • “The inclusion of a vocabulary term in a controlled vocabulary based on use by users. Such terms can be identified through search log analysis or free listing.” (Glosso-Thesaurus) • Organizational Warrant • “Justification for the...selection of a preferred term due to the characteristics and context of the organization using the resource” (ANSI Z39.19)
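
User warrant through search log analysis can be approximated by counting normalized queries and flagging the frequent ones that the vocabulary does not yet contain. The sketch below assumes a log that is simply a list of query strings; the `candidate_terms` function, its `min_count` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def candidate_terms(search_log, vocabulary, min_count=2):
    """Return (query, count) pairs seen at least `min_count` times and missing from the CV."""
    counts = Counter(q.strip().lower() for q in search_log)
    known = {t.lower() for t in vocabulary}
    return [
        (query, n)
        for query, n in counts.most_common()
        if n >= min_count and query not in known
    ]


# Made-up log: users search for "cancer", while the CV only holds "oncology".
log = ["cancer", "Cancer", "oncology", "cancer ", "tumor", "tumor"]
print(candidate_terms(log, vocabulary=["oncology"]))
# [('cancer', 3), ('tumor', 2)]
```

The output is a list of candidates for an editor to review, not terms to add automatically; a frequent query may warrant an entry term pointing to an existing preferred term rather than a new concept.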

  14. Form Analysis • Linguistics • Syntax/Form (grammar) • Morphology (internal word structure) • Semantics (meaning) • Pragmatics, discourse analysis (word/phrase use) • Semiotics • study of signs/symbols • Lexical structure • Document layout, markup, tags (think DOM)

  15. Indexing & Retrieval • Pre/Post-Coordinate • Organization prior to retrieval • Organization at the point of retrieval • Recall / Precision • Recall: number of relevant docs retrieved / total number of relevant docs in collection • Precision: number of relevant docs retrieved / total number of docs retrieved • Natural language processing • Uses semantics and syntax to automatically distill ‘aboutness’

  16. Recall & Precision • A collection of 100 documents • Searches • “Vocabularies” • Recall 100/100 = 1 • Precision 100/100 = 1 • “Facet” • Recall 20/100 = .2 • Precision 20/28 = .71 • “OWL” • Recall 1/100 = .01 • Precision 1/1 = 1 • Recall = # of relevant docs retrieved / total # of relevant docs in collection • Precision = # of relevant docs retrieved / total # of docs retrieved • (The figures are consistent with these definitions if all 100 documents are treated as relevant.)
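
Under the standard definitions, recall divides the relevant documents retrieved by all relevant documents in the collection, and precision divides them by all documents retrieved. A short sketch reproduces the slide's figures; it assumes all 100 documents in the collection are relevant and that the "Facet" search returned 28 documents, which is what the slide's ratios imply. The function names are illustrative.

```python
def recall(relevant_retrieved, relevant_in_collection):
    """Share of all relevant documents that the search actually found."""
    return relevant_retrieved / relevant_in_collection


def precision(relevant_retrieved, total_retrieved):
    """Share of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


# "Facet": 28 documents retrieved, 20 of them relevant, 100 relevant overall (assumed).
print(round(recall(20, 100), 2))    # 0.2
print(round(precision(20, 28), 2))  # 0.71

# "OWL": a single relevant document retrieved and nothing else.
print(recall(1, 100))    # 0.01
print(precision(1, 1))   # 1.0
```

The "OWL" search shows the usual trade-off: a very narrow query can reach perfect precision while missing almost everything relevant.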

  17. Term List Examples • Authority files – Maps to preferred terms • Library of Congress • Encoded Archival Context • Union List of Artist Names • Glossaries/Dictionaries – Words & definitions, sometimes topic focused • Glosso-Thesaurus • Folksonomies – Contextualization, Trend discovery, Personal Information • Synonym rings – Used for back-end equivalence in searching • Princeton WordNet

  18. Thesauri & taxonomy examples • List of vocabularies • http://www.slais.ubc.ca/resources/indexing/database1.htm • Taxonomy warehouse • Two Examples • Health & Ageing Thesaurus • Thesaurus of Geographic Names

  19. Interoperable system example • NCBI Entrez • 35 databases using interoperable controlled vocabulary systems to provide rich meta-searching • Cross-database discovery – search for “heart attack” • Cross-database linking – search for aconitase, follow the “other links” tab

  20. Vocabulary and Classification systems - exercise • Break into groups, discuss & list • Goal • Structure • Issues • Benefits • Resources • Kwasnik, Boxes & arrows • Organization structures • Term Lists / Enumerative systems • Hierarchies • Trees • Paradigms • Facets / Associative relationships • Folksonomies

  21. Choosing a framework • Use questions • Who is your user, what are their needs? • What systems are your users familiar with? • Will this system be internal/external? • Content questions • How extensive, defined is the information? • Is your subject matter static or fluid? • What organizational framework best describes your content? • System Questions • What access are you trying to provide? • What external pressures exist? • What external entities/theories will interact with this system?

  22. Interoperability issues • Similarity of subject matter in domains • Multiple CVs accepted in a domain • Specificity/granularity of content indexing • Use of synonyms, warrant • Intended use, purpose of system

  23. Creating a CV (1) • Design methods • Re-use existing, start with content & desired use ideas • Committee / community approach • Top-down • Concept driven • Bottom-up • Document driven • Empirical approach • Deductive approach • Select terms, create relationships, perform term control • Inductive approach • Establish CV at outset, build hierarchies on an as-needed basis

  24. Creating a CV (2) • Top-Down • Identify audience • Identify all topics, concepts, uses, and context of the domain • Sort topics identified into an appropriate organization scheme (enumerative, hierarchical, faceted) • Solidify structure and clean up gaps & redundancies • Assign documents to categories, test retrieval • Bottom-up • Identify audience • Survey documents for topics/concepts • Build system on the fly – let content drive structure and limits of system • Identify gaps & redundancies in system • Test retrieval

  25. Creating a CV (3) • Think about scope, use, content, maintenance • Gather Terms • Based on existing systems, content • Based on user needs/expectations • Investigate issues of specificity, exhaustivity, granularity • Build hierarchies, relationships • Broader/narrower terms, Related terms, Use/Use for, see/see also • Establish Rules • Implement • Evaluate • Maintain http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary

  26. Evaluating a CV • Goals • Determine if the CV solves retrieval needs of user/system • Determine if CV matches user’s content model/term expectations • Methods • Expert evaluation of CV • User-based card sorting compared to actual CV • Identification of non-included documents • Analysis of use of system - HCI

  27. CV Maintenance • Primary responsibility • Editor, board, committee • New terms • Is it really new or a different view? • What is the proper form & placement? • Modified terms • Include a change log • Use a “USE” reference to point to new term • Deleted terms • Unused / Overused terms • May want to keep for historical retrieval purposes • Modification history • Use modification notes, date/time stamps
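
The maintenance practices listed here (a change log, “USE” references for modified terms, date/time stamps) can be sketched as a small class. This is an illustration under assumptions, not a prescribed design: the `Vocabulary` class, its method names, and the sample terms are hypothetical.

```python
from datetime import datetime, timezone


class Vocabulary:
    """Hypothetical CV that logs changes and leaves USE references behind."""

    def __init__(self, terms):
        self.terms = set(terms)
        self.use = {}      # retired term -> term to use instead
        self.history = []  # (UTC timestamp, action, note)

    def _log(self, action, note):
        self.history.append((datetime.now(timezone.utc), action, note))

    def rename(self, old, new):
        """Replace `old` with `new` and keep `old` as a USE reference."""
        self.terms.discard(old)
        self.terms.add(new)
        self.use.pop(new, None)  # a reinstated term is no longer a reference
        self.use[old] = new
        self._log("modify", f"{old} USE {new}")

    def resolve(self, term):
        """Follow USE references until a current term is reached."""
        seen = set()
        while term in self.use and term not in seen:
            seen.add(term)
            term = self.use[term]
        return term


cv = Vocabulary(["web logs"])
cv.rename("web logs", "weblogs")
cv.rename("weblogs", "blogs")

print(cv.resolve("web logs"))             # blogs
print([note for _, _, note in cv.history])
# ['web logs USE weblogs', 'weblogs USE blogs']
```

Keeping retired terms as references rather than deleting them means documents indexed under an old form remain retrievable, which matches the note about historical retrieval.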

  28. Class exercise • Protégé overview • Orientation • Object types (Classes, Slots, Instances) • Relationships (hierarchies, associative) • Replication of the Glosso-Thesaurus • Visit the Boxes & Arrows Glosso-Thesaurus • Look at the data there and come up with a structure in Protégé that allows replication of the thesaurus • Some issues to consider are: • Do you want terms to be classes or instances? • What is the easiest way to show the relationships (broader term, narrower term, etc.)? • Do you need to allow multiple relationships for a given type (BT, RT, etc.)? • If you have multiple classes, at what level should you create the slots?

  29. Next Week • More on Knowledge organization systems • Taxonomies, Ontologies • More work with Protégé
