This presentation outlines the importance and implementation of controlled vocabularies at the Library of Congress, covering types of vocabularies, SKOS introduction, concept databases, and examples including ISO 639-2 and PREMIS. Various types of controlled vocabularies such as enumerated lists, code lists, taxonomies, thesauri, and their application in metadata standards are discussed. The role of SKOS (Simple Knowledge Organisation System) in structuring vocabularies and creating relationships between concepts is highlighted, along with examples and advantages of using SKOS. The establishment of a controlled vocabularies registry at the Library of Congress is detailed, showcasing ongoing projects and potential applications in maintaining standards.
A Registry for Controlled Vocabularies at the Library of Congress
Rebecca Guenther
Network Development & MARC Standards Office, Library of Congress
October 29, 2008
Outline of presentation
• Types of controlled vocabularies
• Vocabularies maintained at LC
• An introduction to SKOS
• Establishing concept databases at LC
• Examples of concept schemes: ISO 639-2 and PREMIS event type
• Providing the registry as a web service
ASIST 2008
Why establish controlled vocabularies?
• Control values that occur in metadata
• Document and publish for reuse
• Reduce ambiguity
• Control synonyms
• Establish formal relationships among terms (where appropriate)
• Test and validate terms
Types of controlled vocabularies used in metadata standards
• Lists of enumerated values
• Code lists (e.g. language, country)
• Taxonomies
• Formal thesauri
• Locally controlled enumerated lists
Enumerated lists
• Simple list of terms used in a pull-down menu or Web site pick list
• Values enumerated in an XML schema
• Little additional information or structure about each value
• Examples:
  • Code and value from a MARC 21 fixed field, e.g. code "e" in Leader/06 is "cartographic material"
  • Enumerated value "MD5" for METS CHECKSUMTYPE
  • Enumerated value "born digital" in MODS digitalOrigin
Code lists
• Some are established as ISO standards and used worldwide in many communities for many purposes
• The standard fixes the code, not a particular name for it
• Codes are used as identifiers
• Examples (maintained by LC):
  • ISO 639-2 (language codes)
  • MARC relator codes
  • MARC country codes
Thesauri
• A thesaurus is a controlled vocabulary with multiple types of relationships
Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw
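The Rice entry above can be sketched as a small data structure. This is a minimal sketch, assuming a plain Python dictionary and the customary correspondence between thesaurus abbreviations and SKOS properties (UF to altLabel, BT to broader, NT to narrower, RT to related). The names `THESAURUS_TO_SKOS`, `rice_entry` and `to_skos_statements` are illustrative and not part of any LC system.

```python
# Illustrative sketch: the "Rice" thesaurus entry as plain data.
# THESAURUS_TO_SKOS, rice_entry and to_skos_statements are hypothetical names.

THESAURUS_TO_SKOS = {
    "UF": "skos:altLabel",   # "used for": a non-preferred synonym
    "BT": "skos:broader",    # broader term
    "NT": "skos:narrower",   # narrower term
    "RT": "skos:related",    # related (associative) term
}

rice_entry = {
    "term": "Rice",
    "UF": ["paddy"],
    "BT": ["Cereals", "Plant products"],
    "NT": ["Brown rice"],
    "RT": ["Rice straw"],
}


def to_skos_statements(entry):
    """Return (subject, property, value) tuples for one thesaurus entry."""
    statements = [(entry["term"], "skos:prefLabel", entry["term"])]
    for abbreviation, skos_property in THESAURUS_TO_SKOS.items():
        for value in entry.get(abbreviation, []):
            statements.append((entry["term"], skos_property, value))
    return statements


for statement in to_skos_statements(rice_entry):
    print(statement)
```

Keeping the abbreviation-to-property table separate from the entry means the same entry data could be rendered in either thesaurus notation or SKOS terms.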
Standards maintained at LC that use controlled vocabularies
• MARC (including code lists)
• MODS
• METS
• MIX (XML schema for Z39.87 Technical metadata for digital still images)
• PREMIS
• ISO 639-2 (language codes)
• Thesaurus of Graphic Materials
• LCSH
• … and some others
SKOS: What is it?
Simple Knowledge Organisation System(s)
SKOS is …
• for declaring and publishing taxonomies, thesauri or classification schemes, for use in a distributed, decentralised information system (i.e. a semantic web)
• for describing Concepts and creating relationships between Concepts and Terms
• a practical application of RDF
• a formal language for representing controlled, structured vocabularies
The SKOS data model
… views a knowledge organization system as a concept scheme comprising a set of conceptual resources (concepts).
• These concept schemes and conceptual resources are identified by URIs.
• The model is multilingual and extensible.
Concepts can be…
labeled with any number of strings. In any given language, one label can be marked as the "preferred" label for that language; the others are "alternate" or "hidden" labels. A concept can also carry a notation:
• skos:prefLabel
• skos:altLabel
• skos:hiddenLabel
• skos:notation
Concepts can be…
linked to other concepts within the same concept scheme.
• Hierarchical links:
  • skos:broader and skos:narrower
  • skos:broaderTransitive and skos:narrowerTransitive
• Associative links:
  • skos:related
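In SKOS, skos:broader records a direct link only, while skos:broaderTransitive covers every ancestor reachable through a chain of such links. The following is a minimal sketch of deriving the transitive set from the direct links, reusing terms from the Rice thesaurus example. The function name `broader_transitive` and the dictionary layout are assumptions made for illustration.

```python
# Illustrative sketch: deriving skos:broaderTransitive from direct skos:broader links.
# The data reuses the "Rice" thesaurus example; the function name is hypothetical.

broader = {
    "Brown rice": {"Rice"},
    "Rice": {"Cereals", "Plant products"},
}


def broader_transitive(concept, broader_links):
    """Return every concept reachable by following skos:broader one or more times."""
    seen = set()
    stack = list(broader_links.get(concept, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also stops loops in cyclic data
        seen.add(current)
        stack.extend(broader_links.get(current, ()))
    return seen


print(sorted(broader_transitive("Brown rice", broader)))
```

Here "Brown rice" has only "Rice" as a direct broader concept, but its transitive set also contains the concepts broader than "Rice".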
Concepts can be…
grouped into collections, which can be labeled and/or ordered. A concept can be in one or more collections.
• skos:Collection
• skos:OrderedCollection
• skos:member
• skos:memberList
Concepts can be…
mapped to other concepts in different concept schemes.
• Hierarchical mapping:
  • skos:broadMatch
  • skos:narrowMatch
• Associative mapping:
  • skos:relatedMatch
  • skos:closeMatch
  • skos:exactMatch
Advantages of using SKOS
• SKOS has a defined element set that is particularly relevant for controlled vocabularies
• Relationships between entries in a thesaurus can be expressed (broader, narrower, etc.)
• Relationships between entries in different thesauri can be expressed (exactMatch, related)
• Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards
Controlled vocabularies registry at LC
• Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains
• Controlled lists are represented using SKOS as well as alternative syntaxes
• Lists currently in progress:
  • ISO 639-2 and MARC language code list
  • MARC geographic area codes
  • MARC country code list
  • MARC relators
  • PREMIS controlled value lists
  • Thesaurus of Graphic Materials
• Other possibilities:
  • Enumerated values in MODS schema
  • Coded and uncoded value lists in MARC
Reasons for developing a registry
• Facilitate the development and maintenance process
• Make controlled lists openly available
• Develop a web service where comprehensive information about controlled terms is available
• Experiment with semantic web technologies
• Expose vocabularies to wider communities
Example: ISO 639-2 vocabulary
• One in the family of ISO 639 language coding standards
• Has a close relationship with other language coding standards (ISO 639-1 and -3, MARC)
• LC is the maintenance agency
• The standard is the CODE, not the language name; multiple names are given
ISO 639-2 language code example
<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
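A consumer of the registry could read a record like this with an ordinary XML parser. The following is a minimal sketch using Python's standard library. It assumes the fragment is wrapped in an rdf:RDF root with namespace declarations, which the slide omits; the skos namespace URI is inferred from the rdf:type value in the record, and `summarize` is a hypothetical helper name. Only a subset of the record's properties is included.

```python
# Illustrative sketch: extracting the code, names and mappings from the ISO 639-2 record.
# The rdf:RDF wrapper and namespace declarations are assumptions; the slide shows
# only the rdf:Description fragment.
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2008/05/skos#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

RECORD = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2008/05/skos#">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>
"""


def summarize(record_xml):
    """Return the code, the language names keyed by language tag, and exact matches."""
    root = ET.fromstring(record_xml)
    description = root.find("rdf:Description", NS)
    names = {
        label.get(XML_LANG): label.text
        for label in description.findall("skos:altLabel", NS)
    }
    matches = [
        match.get("{%s}resource" % NS["rdf"])
        for match in description.findall("skos:exactMatch", NS)
    ]
    return {
        "code": description.findtext("skos:prefLabel", namespaces=NS),
        "names": names,
        "exactMatch": matches,
    }


print(summarize(RECORD))
```

Note that the record treats the code as the preferred label and the language names as alternate labels, consistent with the point that the standard is the code rather than any one name.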
PREMIS controlled lists
• PREMIS Data Dictionary for Preservation Metadata
• Some semantic units call for controlled vocabularies and have suggested lists
• A central registry could document them and make them available
• Users could submit their own terms
• The PREMIS schema could be enhanced with dynamically generated enumerated values for validation
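The idea of generating enumerated values for a schema from registry terms might look like the following. This is a minimal sketch, not the PREMIS schema itself: the type name `eventTypeValues` and the function `build_enumeration` are hypothetical, and the three terms are the event types that appear in this presentation's PREMIS example.

```python
# Illustrative sketch: generating an XML Schema enumeration from registry terms.
# The type name "eventTypeValues" and the function name are hypothetical.
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize schema elements with the "xs" prefix


def build_enumeration(type_name, terms):
    """Return an xs:simpleType element restricting xs:string to the given terms."""
    simple_type = ET.Element("{%s}simpleType" % XS, {"name": type_name})
    restriction = ET.SubElement(
        simple_type, "{%s}restriction" % XS, {"base": "xs:string"}
    )
    for term in terms:
        ET.SubElement(restriction, "{%s}enumeration" % XS, {"value": term})
    return simple_type


event_types = ["creation", "migration", "normalization"]
print(ET.tostring(build_enumeration("eventTypeValues", event_types), encoding="unicode"))
```

Because the list of terms is an input, the same function could be rerun whenever the registry's list changes, which is the sense of "generated dynamically" here.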
PREMIS event type example
<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/creation">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="en-latn">creation</skos:prefLabel>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/migration"/>
  <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/normalization"/>
  <skos:definition xml:lang="en-latn">the act of creating a new object</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents"/>
</rdf:Description>
Registry Web service
• User: sends an HTTP request
• XML database using XQuery (eXist): interprets the URI and formulates a SPARQL query
• RDF triple store (Sesame): runs the query, gets the results, and sends them back to the database and then to the user
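The "interprets URI, formulates SPARQL query" step can be illustrated in a few lines. This is a sketch under stated assumptions, not the registry's actual implementation, which the slide places in XQuery on eXist: the function name `formulate_query`, the host and path checks, and the choice of a DESCRIBE query are all hypothetical.

```python
# Illustrative sketch: turning a requested concept URI into a SPARQL query string.
# The function name, the validation rules and the DESCRIBE query form are assumptions.
from urllib.parse import urlsplit

REGISTRY_PREFIX = "/standards/registry/vocabulary/"
FORBIDDEN = '<>"{}|\\^` \n\r\t'  # characters that must not appear inside a SPARQL IRI


def formulate_query(request_uri):
    """Validate a registry URI and return a SPARQL DESCRIBE query for it."""
    if any(ch in request_uri for ch in FORBIDDEN):
        raise ValueError("URI contains characters not allowed in a SPARQL IRI")
    parts = urlsplit(request_uri)
    if parts.netloc != "www.loc.gov" or not parts.path.startswith(REGISTRY_PREFIX):
        raise ValueError("not a registry URI: %r" % request_uri)
    return "DESCRIBE <%s>" % request_uri


print(formulate_query("http://www.loc.gov/standards/registry/vocabulary/iso639-2/por"))
```

Rejecting unexpected characters before building the query string keeps a malformed request from altering the query that is passed on to the triple store.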
Further development
• Consider programming changes to improve speed
• Develop mechanisms to output all public documentation from the database
• Include additional coding about relationships to other concept schemes and controlled vocabularies (facilitating crosswalks)
• Encourage experimentation
Questions?
• Contacts:
  • Rebecca Guenther: rgue@loc.gov
  • Clay Redding: cred@loc.gov