Dive deep into characterizing knowledge organization structures through the exploration of thesauri, taxonomy, and ontology. Learn about controlled vocabularies, semantic relationships, and the application of terms.
SKOS-2-HIVE GWU workshop
Introductions Hollie White hcwhite1@email.unc.edu Jane Greenberg janeg@email.unc.edu
Morning Session Schedule • Introductions • Section 1: Characterizing Knowledge Organization Structures • Section 2: Thesauri and What They Represent • BREAK • Section 3: From Thesauri to SKOS • Section 4: From SKOS to HIVE • Exploring HIVE
Types of knowledge organization structures, from least to most structured: • Term lists • Controlled vocabularies • Thesauri • Taxonomy • Ontology
Languages for aboutness • Indexing languages (terminological tools): thesauri (controlled vocabularies, CV); subject heading lists; authority files for named entities (people, places, structures, organizations) • Classification / classificatory systems • Keyword lists • Natural language systems (broad interpretation)
Term lists: controlled but largely unstructured lists. Term list in practice: http://library.lib.asu.edu/search/y
Authority files: standardized forms of names, subjects, and titles that make information easier to identify and more interoperable. Authority files in practice: http://authorities.loc.gov/
Thesauri • Less-structured and structured thesauri • Lexical semantic relationships • Composed of indexing terms (descriptors) • Descriptors: representations of concepts • Concepts: units of meaning
Thesaurus basics • Preferred vs. non-preferred terms, e.g., dress vs. clothing • Semantic relations between terms: broader, narrower, related • Rules and guidelines for applying terms • Scope notes
Common thesaural identifiers • SN Scope Note (instructions for use, e.g., don’t invert phrases) • USE Use (another term in preference to this one) • UF Used For • BT Broader Term • NT Narrower Term • RT Related Term
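Because each identifier prefixes a relationship, a printed thesaurus entry can be read mechanically. The sketch below parses one record into a term and its tagged relationships. The one-identifier-per-line layout and the function name `parse_record` are hypothetical choices for illustration, not a format defined by any particular thesaurus.

```python
from collections import defaultdict

# The thesaural identifiers listed on the slide.
TAGS = {"SN", "USE", "UF", "BT", "NT", "RT"}


def parse_record(text: str) -> tuple[str, dict[str, list[str]]]:
    """Parse one tagged thesaurus record into (term, {identifier: [values]}).

    Hypothetical layout: the first non-blank line is the term itself, and each
    following line is an identifier, a space, and its value ("BT Cakes").
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    term, relations = lines[0], defaultdict(list)
    for line in lines[1:]:
        tag, _, value = line.partition(" ")
        if tag not in TAGS:
            raise ValueError(f"unknown identifier {tag!r} in {line!r}")
        relations[tag].append(value.strip())
    return term, dict(relations)


term, relations = parse_record("""
Birthday cakes
BT Cakes
""")
print(term, relations)  # Birthday cakes {'BT': ['Cakes']}
```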
Controlled vocabularies (less-structured thesauri, also referred to as subject heading lists) • Library of Congress Subject Headings (LCSH) • Sears Subject Headings • Medical Subject Headings (MeSH) http://www.nlm.nih.gov/mesh/MBrowser.html
Thesauri in practice • ERIC • NBII http://thesaurus.nbii.gov/portal/server.pt • NASA Thesaurus http://www.sti.nasa.gov/thesfrm1.htm
Taxonomy • Associated with Carl von Linné (Linnaeus), who used hierarchical classification to organize living organisms • A grouping of terms representing topics or subject categories • Typically structured so that its terms exhibit hierarchical relationships (broader and narrower concepts) to one another • Taxonomy: a subject-based classification that arranges the terms in a controlled vocabulary into a hierarchy (Garshol 2004)
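To make "arranges the terms into a hierarchy" concrete, here is a minimal sketch that stores a taxonomy as a narrower-to-broader map and prints it as an indented tree. The function names and the terms "Baked goods" and "Cookies" are hypothetical; only "Cakes" and "Birthday cakes" come from this workshop's examples.

```python
def build_children(broader_of: dict[str, str]) -> dict[str, list[str]]:
    """Invert a narrower -> broader map into broader -> [narrower terms]."""
    children: dict[str, list[str]] = {}
    for narrower, broader in broader_of.items():
        children.setdefault(broader, []).append(narrower)
    return children


def render(term: str, children: dict[str, list[str]], depth: int = 0) -> list[str]:
    """Return the hierarchy under `term` as lines indented two spaces per level."""
    lines = ["  " * depth + term]
    for child in sorted(children.get(term, [])):
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical example; "Baked goods" and "Cookies" are illustrative terms.
taxonomy = {"Birthday cakes": "Cakes", "Cakes": "Baked goods", "Cookies": "Baked goods"}
print("\n".join(render("Baked goods", build_children(taxonomy))))
```

Storing only the broader term for each entry keeps every term in exactly one place, which matches the strict single-parent hierarchy a simple taxonomy usually has.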
Ontology • In general (in the LIS domain), an ontology is: • a tool to help organize knowledge • a way to convey or represent a class (or classes) of things and the relationships among them • There is no exact definition; the meaning depends on the community you come from
KOS used in digital libraries: a study of 269 online digital libraries and collections found these KOS in use: • Locally developed taxonomy (113) • LCSH (78) • Author list (34) • Thesauri (26) • Alphabetical listing (20) • Geographic arrangement (16). Shiri, A. and Chase-Kruszewski, S. (2009). Knowledge organization systems in North American digital library collections. Program: Electronic Library and Information Systems, 43(2), 121-139.
Discussion: Think about your own organization. What types of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work? How do these vocabulary choices help you meet the goals of your institution?
Hodge’s Types of Knowledge Organization Systems • Term lists: authority files, glossaries, gazetteers, dictionaries • Classifications and categories: subject headings, classification schemes, taxonomies, categorization schemes • Relationship lists: thesauri, semantic networks, ontologies. Hodge, G. (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. http://www.clir.org/pubs/abstract/pub91abst.html
Figure source: McGuinness, D. L. (2003). Ontologies come of age. In D. Fensel et al. (Eds.), Spinning the Semantic Web (p. 175; see also pp. 181, 189). Cambridge, MA: MIT Press.
Greenberg’s Ontology Continuum (classical view of ILS languages), from least to most structured: Keyword lists → CV → Simple thesauri → Thesauri / deeper taxonomies → Low-level ontologies (WordNet) → Full/intricate ontologies (OWL)
Examples of different types of “thesauri” • Cook’s Thesaurus http://www.foodsubs.com/ • BZZURKK! Thesaurus of Champions http://epe.lac-bac.gc.ca/100/200/300/ktaylor/kaboom/bzzurkk.htm • General Multilingual Environmental Thesaurus http://www.eionet.europa.eu/gemet
Syndetic Relationships • Hierarchical • Equivalent • Associative
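Hierarchical • Expresses level of generality; both terms are preferred terms • BT (Broader Term): Birthday cakes BT Cakes • NT (Narrower Term): Cakes NT Birthday cakes • Remember inheritance: a narrower term inherits the characteristics of its broader term (every birthday cake is a cake)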
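Inheritance means a term sits under every broader term reachable by following BT links upward. The sketch below walks that chain and uses it for an "is every X a Y?" check. It assumes each term has a single broader term (real thesauri may allow several), and "Baked goods" is a hypothetical term added for illustration.

```python
def broader_chain(term: str, bt: dict[str, str]) -> list[str]:
    """Follow BT links upward from `term`, nearest broader term first.

    `bt` maps each term to its single broader term (a monohierarchy is
    assumed here for simplicity; real thesauri may allow several BTs).
    """
    chain, seen = [], {term}
    while term in bt:
        term = bt[term]
        if term in seen:  # guard against a BT cycle in bad data
            raise ValueError(f"BT cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(narrower: str, broader: str, bt: dict[str, str]) -> bool:
    """Inheritance check: `narrower` falls under `broader` if BT links reach it."""
    return broader in broader_chain(narrower, bt)


bt = {"Birthday cakes": "Cakes", "Cakes": "Baked goods"}  # "Baked goods" is hypothetical
print(broader_chain("Birthday cakes", bt))  # ['Cakes', 'Baked goods']
```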
Equivalent • When two or more terms represent the same concept • One is the preferred term (descriptor), where all the information is collected • The other is the non-preferred and helps the user to find the appropriate term
Equivalent • Non-preferred term USE Preferred term, e.g., Biological diversification USE Biodiversity • Preferred term UF (Used For) Non-preferred term, e.g., Biodiversity UF Biological diversification
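USE and UF are two views of one equivalence relationship, so one can be derived from the other. This minimal sketch records only USE references, derives the reciprocal UF entries, and redirects a searcher's term to the preferred term. The function names are hypothetical.

```python
def build_uf(use: dict[str, str]) -> dict[str, list[str]]:
    """Derive reciprocal UF entries from USE references (non-preferred -> preferred)."""
    uf: dict[str, list[str]] = {}
    for non_preferred, preferred in use.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return uf


def resolve(term: str, use: dict[str, str]) -> str:
    """Return the preferred term to index or search with (the term itself if already preferred)."""
    return use.get(term, term)


use = {"Biological diversification": "Biodiversity"}
print(resolve("Biological diversification", use))  # Biodiversity
print(build_uf(use))  # {'Biodiversity': ['Biological diversification']}
```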
Associative • One preferred term is related to another preferred term • Non-hierarchical • “See also” function • In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy
Associative • Related Terms (RT) can be used to show these links within the thesaurus • Bed RT Bedding • Paint Brushes RT Painting • Vandalism RT Hostility • Programming RT Software
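Unlike BT/NT, the RT relationship is symmetric: if Bed RT Bedding, then Bedding RT Bed. A minimal sketch (the function name `build_rt` is hypothetical) records each pair from the slide in both directions so either term leads to the other.

```python
from collections import defaultdict


def build_rt(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Record each RT link in both directions, since association is symmetric."""
    rt: dict[str, set[str]] = defaultdict(set)
    for a, b in pairs:
        rt[a].add(b)
        rt[b].add(a)
    return dict(rt)


rt = build_rt([("Bed", "Bedding"), ("Paint Brushes", "Painting"),
               ("Vandalism", "Hostility"), ("Programming", "Software")])
print(sorted(rt["Bedding"]))  # ['Bed']
```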
Exercise: Thesauri Building • Montages • Digital photographs • Illustrations • Pictures • Photographic prints • Drawings • Photographs • Daguerreotypes • Negatives
Where to start: • Look at the overall offering • Determine the aboutness • Identify the “root” element or broadest term • Identify groups/categories of information • Start structuring based on the syndetic relations you know • Create hierarchies based on the semantic relations • Use the appropriate identifiers to show the relationships
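Once relationships are written with the identifiers, a quick check can catch one-sided entries: every BT should have a matching NT, every USE a matching UF, and every RT a matching RT. The sketch below assumes a hypothetical in-memory layout (term → {identifier: [terms]}) and uses the cake example rather than the exercise terms.

```python
# Each identifier and the identifier expected on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "USE": "UF", "UF": "USE", "RT": "RT"}


def missing_reciprocals(thesaurus: dict[str, dict[str, list[str]]]) -> list[str]:
    """List relationships whose reciprocal entry is absent.

    `thesaurus` maps each term to {identifier: [related terms]} (a hypothetical
    layout). For "A BT B" we expect "B NT A", and so on.
    """
    problems = []
    for term, relations in thesaurus.items():
        for tag, targets in relations.items():
            inverse = RECIPROCAL.get(tag)
            if inverse is None:  # e.g. SN has no reciprocal
                continue
            for target in targets:
                if term not in thesaurus.get(target, {}).get(inverse, []):
                    problems.append(f"{term} {tag} {target}: missing {target} {inverse} {term}")
    return problems


draft = {
    "Cakes": {"NT": ["Birthday cakes"]},
    "Birthday cakes": {},  # forgot "BT Cakes"
}
print(missing_reciprocals(draft))
# ['Cakes NT Birthday cakes: missing Birthday cakes BT Cakes']
```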
Simple Knowledge Organization Systems (SKOS) on the classical continuum of ILS languages, from least to most structured: Keyword lists → CV → Simple thesauri → Thesauri / deeper taxonomies → Low-level ontologies (e.g., WordNet) → Full/intricate ontologies (e.g., OWL), with SKOS marked on the continuum
Descriptive markup: “the markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as 'semantic'.” --from Wikipedia
Markup languages: a markup language “is a system for annotating a text in a way which is syntactically distinguishable from that text.” --from Wikipedia. Markup is set apart from the text either with tags, e.g., <tag>content to be rendered</tag>, or with keywords in brackets.
HTML (HyperText Markup Language) • The language used to mark up web pages • Mixes descriptive markup with processing instructions
HTML encoding
<!doctype html>
<html>
  <head>
    <title>Hello HTML</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>
<a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_9',null,[['synonym','Heterozygotes']]);">Heterozygotes</a></td>
<td class="valign"><table><tbody id="result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:tbody_element">
<tr class="odd"><td class="type">BT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:0:j_id_jsp_1679715049_18',null,[['synonym','Genotypes']]);">Genotypes</a></td></tr>
<tr class="even"><td class="type">NT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:1:j_id_jsp_1679715049_18',null,[['synonym','Carriers (genetics)']]);">Carriers (genetics)</a></td></tr>
<tr class="odd"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:2:j_id_jsp_1679715049_18',null,[['synonym','Heterozygosity']]);">Heterozygosity</a></td></tr>
<tr class="even"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:3:j_id_jsp_1679715049_18',null,[['synonym','Homozygotes']]);">Homozygotes</a></td></tr>
<tr class="odd"><td class="type">SC</td><td class="synonym">LSC Life Sciences</td></tr>
</tbody></table></td></tr>
<tr class="even"><td class="valign"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_9',null,[['synonym','Homozygotes']]);">Homozygotes</a></td>
<td class="valign"><table><tbody id="result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:tbody_element">
<tr class="odd"><td class="type">BT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:0:j_id_jsp_1679715049_18',null,[['synonym','Genotypes']]);">Genotypes</a></td></tr>
<tr class="even"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:1:j_id_jsp_1679715049_18',null,[['synonym','Heterozygotes']]);">Heterozygotes</a></td></tr>
<tr class="odd"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:2:j_id_jsp_1679715049_18',null,[['synonym','Homozygosity']]);">Homozygosity</a></td></tr>
<tr class="even"><td class="type">SC</td><td class="synonym">LSC Life Sciences</td></tr>
</tbody></table></td></tr>
NBII in HTML
XML (Extensible Markup Language) • Developed by the World Wide Web Consortium (W3C) • Used to mark up documents on the Web and other electronic documents • Users define their own tags and how they are used
<CONCEPT>
  <DESCRIPTOR>Zygotes</DESCRIPTOR>
  <UF>Ookinetes</UF>
  <BT>Ova</BT>
  <NT>Oocysts</NT>
  <RT>Hemizygosity</RT>
  <RT>Reproduction</RT>
  <RT>Zygosity</RT>
  <SC>ASF Aquatic Sciences and Fisheries</SC>
  <SC>LSC Life Sciences</SC>
  <STA>Approved</STA>
  <TYP>Descriptor</TYP>
  <INP>2007-08-14</INP>
  <UPD>2007-08-14</UPD>
</CONCEPT>
NBII in XML
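Because the XML version labels each relationship with its own element, a program can pull the thesaurus structure out directly, which is much harder with the HTML version. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `concept_relations` is hypothetical, and only the UF/BT/NT/RT elements are extracted.

```python
import xml.etree.ElementTree as ET

# The NBII <CONCEPT> record above, reproduced as a string.
NBII_XML = """
<CONCEPT>
  <DESCRIPTOR>Zygotes</DESCRIPTOR>
  <UF>Ookinetes</UF>
  <BT>Ova</BT>
  <NT>Oocysts</NT>
  <RT>Hemizygosity</RT>
  <RT>Reproduction</RT>
  <RT>Zygosity</RT>
  <SC>ASF Aquatic Sciences and Fisheries</SC>
  <SC>LSC Life Sciences</SC>
  <STA>Approved</STA>
  <TYP>Descriptor</TYP>
  <INP>2007-08-14</INP>
  <UPD>2007-08-14</UPD>
</CONCEPT>
"""


def concept_relations(xml_text: str) -> tuple[str, dict[str, list[str]]]:
    """Return the descriptor and its UF/BT/NT/RT relationships, grouped by element name."""
    concept = ET.fromstring(xml_text.strip())
    descriptor = concept.findtext("DESCRIPTOR")
    relations: dict[str, list[str]] = {}
    for child in concept:
        if child.tag in {"UF", "BT", "NT", "RT"}:
            relations.setdefault(child.tag, []).append(child.text)
    return descriptor, relations


term, rel = concept_relations(NBII_XML)
print(term, rel["RT"])  # Zygotes ['Hemizygosity', 'Reproduction', 'Zygosity']
```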
RDF (Resource Description Framework) “is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats.” --from Wikipedia
The RDF data model is similar to entity-relationship or class diagrams: it makes statements about resources in subject-predicate-object expressions called “triples”. • Subject: the resource being described • Predicate: a trait or aspect of the resource; it expresses a relationship between the subject and the object • Object: the value of that trait, or another resource
Example: “The sky has the color blue” as an RDF triple: • a subject denoting “the sky” • a predicate denoting “has the color” • an object denoting “blue”
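A set of triples is enough to store statements and answer simple questions about them. The sketch below uses plain strings to keep the idea visible. Real RDF identifies subjects and predicates with URIs (SKOS, for instance, defines `skos:broader` for the BT relationship), and the "has broader term" triple and the `objects` helper are hypothetical illustrations.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("the sky", "has the color", "blue"),
    # Hypothetical thesaurus example: a BT relationship written as a triple.
    ("Birthday cakes", "has broader term", "Cakes"),
}


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(objects(graph, "the sky", "has the color"))  # {'blue'}
```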
OWL (Web Ontology Language): a knowledge representation language for authoring ontologies, grounded in formal logic