INLS 520 Information Organization INLS 520 Erik Mitchell
Review
• Controlled vocabularies
  • Term Lists, Hierarchies, Trees, Paradigms, Facets, Folksonomies
• Knowledge organization systems
  • Term Lists, Thesauri, Taxonomies, Ontologies
Today
• Protégé tutorial
  • Create a thesaurus
  • Create an ontology
• Ontologies
  • Basic concept
  • Building in Protégé
  • RDF (?)
  • OWL (?)
Assignment 1 recap
• Required XML tags
  • <?xml ... ?>
• Required DC elements
  • None, but you need a content wrapper <dc> and at least one element (<title>, <creator>, etc.)
• Advanced Concepts
  • Namespaces
  • Schemas/DTDs
• MARC & DC
  • Advantages / disadvantages
• Techniques for discovering data
  • View Source
  • DC-dot metadata generator
CV Concepts & definitions
• Controlled Vocabularies
  • Organized Lists
  • Relationships between concepts
• Knowledge organization systems
  • Typed relationships
  • Direct / inferable knowledge
Thesauri Definitions
• “Guide to use of terms, showing relationships between them, for the purpose of providing standardized, controlled vocabulary for information storage and retrieval” (Monash)
• “A list of words showing similarities, differences, dependencies, and other relationships to each other” (USG)
Thesauri Concepts
• Preferred terms
• Non-preferred terms
• Semantic relations between terms
• How to apply terms (guidelines, rules)
• Scope notes
• Adding terms (how to produce terms that are not listed explicitly in the thesaurus)
Common thesaural identifiers
• SN Scope Note
  • Instruction, e.g. don’t invert phrases
• USE Use (another term in preference to this one)
• UF Used For
• BT Broader Term
• NT Narrower Term
• RT Related Term
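One way to picture these identifiers is as fields on a thesaurus entry. The sketch below is only an illustration: the `Term` class, the helper functions, and the sample terms (Cars, Automobiles, Vehicles, Roads) are assumptions made for this example and do not come from any thesaurus standard or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """One thesaurus entry; each field mirrors an identifier from the slide."""
    label: str
    scope_note: str = ""                               # SN
    use: Optional[str] = None                          # USE (preferred term)
    used_for: Set[str] = field(default_factory=set)    # UF
    broader: Set[str] = field(default_factory=set)     # BT
    narrower: Set[str] = field(default_factory=set)    # NT
    related: Set[str] = field(default_factory=set)     # RT


Thesaurus = Dict[str, Term]


def entry(thesaurus: Thesaurus, label: str) -> Term:
    """Fetch a term, creating an empty entry the first time it is mentioned."""
    return thesaurus.setdefault(label, Term(label))


def add_broader(thesaurus: Thesaurus, narrow: str, broad: str) -> None:
    """Record BT on the narrower term and the reciprocal NT on the broader one."""
    entry(thesaurus, narrow).broader.add(broad)
    entry(thesaurus, broad).narrower.add(narrow)


def add_use(thesaurus: Thesaurus, non_preferred: str, preferred: str) -> None:
    """Record USE on the non-preferred term and the reciprocal UF."""
    entry(thesaurus, non_preferred).use = preferred
    entry(thesaurus, preferred).used_for.add(non_preferred)


def add_related(thesaurus: Thesaurus, a: str, b: str) -> None:
    """RT is symmetric, so it is stored on both terms."""
    entry(thesaurus, a).related.add(b)
    entry(thesaurus, b).related.add(a)


def preferred_label(thesaurus: Thesaurus, label: str) -> str:
    """Follow a USE reference if there is one; otherwise keep the label."""
    term = thesaurus.get(label)
    return term.use if term is not None and term.use else label


# Hypothetical sample vocabulary.
thesaurus: Thesaurus = {}
add_use(thesaurus, "Automobiles", "Cars")
add_broader(thesaurus, "Cars", "Vehicles")
add_related(thesaurus, "Cars", "Roads")

print(preferred_label(thesaurus, "Automobiles"))  # Cars
print(sorted(thesaurus["Vehicles"].narrower))     # ['Cars']
```

Storing each relationship together with its reciprocal (BT/NT, USE/UF) keeps the entries consistent whichever term a searcher starts from.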
Thesauri Guides
• National Information Standards Organization. (2005). Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO Z39.19-2005. Bethesda, MD: NISO Press.
  • http://www.niso.org/standards/resources/Z39-19-2005.pdf?CFID=5559601&CFTOKEN=31747314
• Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997.
• Willpower Information Management Consultants
  • http://www.willpower.demon.co.uk/thesprin.htm
Ontology Definitions
• “The study of being or existence”
• “A specification of a conceptualization” (Gruber)
• “An ontology formally defines a common set of terms that are used to describe and represent a domain.” (OWL)
Webster’s Dictionary
• Webster’s Third New International Dictionary defines Ontology as:
  • A science or study of being, specifically a branch of metaphysics* relating to the nature and relations of being.
  • A theory concerning the kinds of entities and specifically the kinds of abstract entities that are to be admitted to a language system.
*Metaphysics: Nature of being “or” existence.
Ontology Concepts
• Classes
  • Names of objects in the domain
• Relationships between classes
  • Connections between classes
• Properties of classes
  • Background or identifying knowledge of these objects
• Constraints on these properties & relationships
  • Limits and parameters of the relationships
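The four concepts above can be sketched in a few lines of code. This is a toy model under stated assumptions, not how Protégé or any ontology tool stores data: `Slot`, `OntClass`, `validate`, and the Beverage/Beer classes are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Slot:
    """A property of a class, with two simple constraints on its value."""
    name: str
    value_type: type
    required: bool = False


@dataclass
class OntClass:
    """A class, its parent (the hierarchical relationship), and its own slots."""
    name: str
    parent: Optional["OntClass"] = None
    slots: List[Slot] = field(default_factory=list)

    def all_slots(self) -> List[Slot]:
        """Own slots plus everything inherited from ancestor classes."""
        inherited = self.parent.all_slots() if self.parent else []
        return inherited + self.slots


def validate(cls: OntClass, values: Dict[str, Any]) -> List[str]:
    """Return the constraint violations for a would-be instance of cls."""
    problems = []
    known = {slot.name: slot for slot in cls.all_slots()}
    for slot in known.values():
        if slot.required and slot.name not in values:
            problems.append(f"missing required slot: {slot.name}")
    for name, value in values.items():
        slot = known.get(name)
        if slot is None:
            problems.append(f"unknown slot: {name}")
        elif not isinstance(value, slot.value_type):
            problems.append(
                f"wrong type for {name}: expected {slot.value_type.__name__}"
            )
    return problems


# Hypothetical two-level hierarchy.
beverage = OntClass("Beverage", slots=[Slot("name", str, required=True)])
beer = OntClass("Beer", parent=beverage, slots=[Slot("abv", float)])

print(validate(beer, {"name": "Stout", "abv": 6.0}))  # []
print(validate(beer, {"abv": "strong"}))
```

The second call reports two problems: the inherited `name` slot is missing, and `abv` has the wrong type.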
Class exercise
• Protégé overview
  • Orientation
  • Object types (Classes, Slots, Instances)
  • Relationships (hierarchies, associative)
• As a group, we will work through the Protégé training guide
  • http://protege.stanford.edu/doc/tutorial/get_started/get-started.pdf
What is the semantic web?
• URI (Uniform Resource Identifier)
• OWL/RDFS
• All built on top of the regular web
• RDF is the underlying language of the semantic web
  • XML represents data (document based)
  • RDF represents pure information (anyone can use it, re-harvestable); you could call this knowledge
• Examples
  • Swoogle
  • Goog411
Ontologies (review)
• “A common set of terms that are used to describe and represent a domain”
• Classes, Relationships, Properties, Constraints
• A formal organization of knowledge
• The primary role of an ontology is to define a language which people and computers in a given domain can share
A good ontology has
• Features:
  • Meaningful – all classes have instances
  • Accurate / correct
  • Non-redundant – each class/instance is represented in a single way
  • Rich in description – context, content
• Enabled functionality:
  • Able to use queries to connect new pieces of information
  • Use XML & definitions to integrate knowledge across domains
Ontology Continuum
• Keyword Lists
• Basic Thesauri
• Complex Thesauri
• Taxonomies
• Simple Ontologies (WordNet)
• Complex Ontologies (OWL)
SHOE Ontology project
• Possible to build an ontology for anything
• Simple HTML Ontology Extensions (SHOE) Project
  • http://www.cs.umd.edu/projects/plus/SHOE/
  • http://www.cs.umd.edu/projects/plus/SHOE/html-pages.html
• Sample projects
  • Beer Ontology
    • http://www.cs.umd.edu/projects/plus/SHOE/onts/index.html#beer
  • Document Ontology
    • http://www.cs.umd.edu/projects/plus/SHOE/onts/docmnt1.0.html
Ontology Concepts
• Multiple inheritance
• Vertical and horizontal relationships
• Decomposed subject/object
  • Predicate-based description (isRelatedto, hasVersion)
• First Order Predicate Logic
  • Statements broken down into subjects/predicates
  • Proposition
    • All men are mortal, Socrates is a man
  • Therefore
    • Socrates is mortal
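The Socrates example can be run mechanically once the statements are broken into subject/predicate/object form. The sketch below uses a deliberately tiny rule format ("anything with this predicate and object also gets that predicate and object"), which is an assumption made for illustration and far weaker than full first-order logic; `infer`, `is_a`, and the tuple layouts are hypothetical names.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, str, str]       # (subject, predicate, object)
Rule = Tuple[str, str, str, str]  # (if_predicate, if_object, then_predicate, then_object)


def infer(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply every rule repeatedly until no new facts appear (forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for if_pred, if_obj, then_pred, then_obj in rules:
            for subject, predicate, obj in list(known):
                conclusion = (subject, then_pred, then_obj)
                if predicate == if_pred and obj == if_obj and conclusion not in known:
                    known.add(conclusion)
                    changed = True
    return known


facts = {("Socrates", "is_a", "man")}        # "Socrates is a man"
rules = [("is_a", "man", "is_a", "mortal")]  # "All men are mortal"

print(("Socrates", "is_a", "mortal") in infer(facts, rules))  # True
```

The conclusion "Socrates is mortal" is never stated directly; it is derived from the one fact and the one rule, which is the sense in which an ontology holds inferable knowledge.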
Creating a CV review
• Design methods
  • Re-use existing, start with content & desired use ideas
  • Committee / community approach
    • Top-down
      • Concept driven
    • Bottom-up
      • Document driven
  • Empirical approach
    • Deductive approach
      • Select terms, create relationships, perform term control
    • Inductive approach
      • Establish CV at outset, build hierarchies on an as-needed basis
Creating a CV review (2)
Top-Down
• Identify audience
• Identify all topics, concepts, uses, and context of the domain
• Sort topics identified into an appropriate organization scheme (enumerative, hierarchical, faceted)
• Solidify structure and clean up gaps & redundancies
• Assign documents to categories, test retrieval
Bottom-up
• Identify audience
• Survey documents for topics/concepts
• Build system on the fly – let content drive structure and limits of system
• Identify gaps & redundancies in system
• Test retrieval
Creating a CV review (3)
• Think about scope, use, content, maintenance
• Gather Terms
  • Based on existing systems, content
  • Based on user needs/expectations
• Investigate issues of specificity, exhaustivity, granularity
• Build hierarchies, relationships
  • Broader/narrower terms, Related terms, Use/Used for, see/see also
• Establish Rules
• Implement
• Evaluate
• Maintain
http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary
Creating an Ontology
• Determine scope of field, define boundaries
• Check for existing ontologies, vocabularies
• Select a top-down/bottom-up approach
• Identify concepts, vocabulary, parameters, constraints
• Identify relationships
  • Multiple hierarchies, inheritance
• Build, test, maintain
Class exercise
• Design your own ontology
• In groups, pick a domain of knowledge
  • Type of food (pizza, soup, beer), field of study (library science, math), etc.
• Come up with a basic ontological framework and begin creating it in Protégé
• Be prepared to share a brief overview with the class, which will include
  • Domain area
  • Top level classes
  • Instance definitions
  • Relationships
Assignment 2
• Overview
  • In this assignment you will create an ontology on a topic of your choice. Your ontology should contain multiple classes and instances and be focused on a specific purpose. This assignment includes an implementation of the ontology in Protégé and a brief paper explaining your ontology.
• Guidelines
  • Select a topic of interest and determine the top level (e.g. Basketball, Chocolate, etc.).
  • Define the scope (depth/breadth) and purpose of the ontology. Define specific classes and facets (known as slots in Protégé) that describe those classes. Your ontology should have between 5 and 10 classes with multiple (2-5) slots for each class. Think about the use of hierarchy and multiple inheritance in your ontology.
  • Summarize your ontology in a short paper (no more than two pages). Outline your ontology and discuss your rationale and key decisions (e.g. scope, purpose, classes and slots, defining relationships).
  • Implement the ontology in Protégé. Define your classes and instances. Create two queries that illustrate ways in which the data could be retrieved.
• Dates & groupwork
  • Due – November 6th
  • Groupwork is acceptable
RDF
• Subject, property, object triples
• Commonly transmitted as XML
• RDFS extends RDF with an ontology language
  • Properties, specialization
• OWL
  • More powerful extension of RDFS
  • Uses the same syntax as RDF
RDF Model
• Subject (Resource): Webpage: http://www.stuff.com
• Predicate (Property type): Author
• Object (Value): “Saki Knafo”
• “The author of the stuff webpage is Saki Knafo”
  • A literal, a triple, a statement
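The statement above can be held in code as a plain three-part record. The sketch below is an illustration only: `Triple` and `match` are hypothetical names, and the short predicate strings `author` and `title` stand in for the full property URIs a real RDF store would use.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property type
    object: str     # the value


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every part of the pattern that is given."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


statements = [
    Triple("http://www.stuff.com", "author", "Saki Knafo"),
    Triple("http://www.stuff.com", "title", "The Hang: The Island of Black Jeans"),
]

for t in match(statements, predicate="author"):
    print(f"The {t.predicate} of {t.subject} is {t.object}")
```

Because every statement has the same three-part shape, one pattern-matching function is enough to answer questions such as "who is the author of this page?" or "what has Saki Knafo written?".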
How is RDF different?
• RDF is a descriptive model that
  • Allows variable contextualized description
  • Deconstructs the descriptive process
  • Allows more granular automated processing of data
  • Uses exact markup to indicate the context of values (namespaces, schemas)
Encoding RDF in XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.stuff.com/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator>SAKI KNAFO</dc:creator>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
</rdf:RDF>
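To see how this markup maps back onto triples, the record can be read with Python's standard XML parser. This is a minimal sketch, not a real RDF/XML parser: it assumes flat `rdf:Description` blocks whose children all hold literal text, the record is trimmed to two of the properties from the slide, and `extract_triples` is a hypothetical helper name. A dedicated RDF library would be the usual choice for real data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the record from the slide.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.stuff.com/">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator>SAKI KNAFO</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml: str) -> List[Tuple[str, str, str]]:
    """Turn each child of each rdf:Description into a (subject, predicate, object) tuple."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for child in description:
            # ElementTree writes tags as "{namespace}local"; dropping the braces
            # leaves the full property URI, e.g. http://purl.org/dc/elements/1.1/title
            predicate = child.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (child.text or "").strip()))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

The namespace prefix (`dc:`) disappears in the output and the full URI takes its place, which is how namespaces give each value an unambiguous context.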
Iterative RDF description
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vcard="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_schemas/vcard.xsd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.stuff.com">
    <dc:title>The Hang: The Island of Black Jeans</dc:title>
    <dc:creator rdf:resource="#Creator_001"/>
    <dc:identifier>http://www.stuff.com</dc:identifier>
    <dc:date>Sun, 16 Sep 2007 01:04:40 GMT</dc:date>
    <dc:description>descriptive content</dc:description>
  </rdf:Description>
  <rdf:Description ID="Creator_001"
                   rdf:about="http://dli.grainger.uiuc.edu/publications/metadatacasestudy/dc_,,,">
    <vcard:given>Saki</vcard:given>
    <vcard:family>Knafo</vcard:family>
    <vcard:email>
      <vcard:userid>knafo@www.nytimes.com</vcard:userid>
    </vcard:email>
  </rdf:Description>
</rdf:RDF>
RDFS
• RDF Schema
• Defines additional RDF elements that help type relationships
  • Special Classes
  • Based on RDF Classes / Properties / Attributes with additional
  • http://www.w3schools.com/rdf/rdf_reference.asp
• Allows the creation of vocabularies / ontologies
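One thing a schema layer adds is class hierarchy: in RDFS, `rdfs:subClassOf` says that every member of one class is also a member of another. The sketch below shows the kind of inference this enables. It is an illustration under stated assumptions: the dictionary layout, the function names, and the Stout/Beer/Beverage classes are hypothetical, and no RDF library is involved.

```python
from typing import Dict, Set

# Each class maps to its direct superclasses (more than one allows multiple inheritance).
SubClassOf = Dict[str, Set[str]]


def superclasses(cls: str, sub_class_of: SubClassOf) -> Set[str]:
    """Collect all direct and indirect superclasses of cls."""
    found: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(declared_type: str, sub_class_of: SubClassOf) -> Set[str]:
    """A resource typed as one class also belongs to every superclass of it."""
    return {declared_type} | superclasses(declared_type, sub_class_of)


# Hypothetical hierarchy.
hierarchy: SubClassOf = {
    "Stout": {"Beer"},
    "Beer": {"Beverage", "FermentedProduct"},
}

print(sorted(inferred_types("Stout", hierarchy)))
# ['Beer', 'Beverage', 'FermentedProduct', 'Stout']
```

A resource declared only as a Stout can therefore be found by a query for Beverages, without anyone recording that fact explicitly.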
OWL (Web Ontology Language)
• An ontology language geared towards representing information on the web
• Classes, properties, and relationships that describe URIs and their facets
• Based on the triple concept
  • Subject, Predicate, Object
• 3 versions: OWL-Lite, OWL-DL, OWL-Full
• Formatted in RDF/XML
• Uses RDF and RDFS as a foundation
• Adds new elements in the owl namespace
OWL Versions
• OWL-Lite
  • Simple hierarchies, constraints
• OWL-DL
  • Uses description logics
  • Logic-based semantic markup based on first-order predicate logic
  • Still guarantees finite relationship processing
  • Best suited for automation
• OWL-Full
  • Most complex
  • Open ended, possible to get into infinite processing
OWL Example
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/part.owl#"
         xml:base="http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/part.owl">
  <owl:Ontology rdf:about="">
    <owl:versionInfo rdf:datatype="http://www.w3.org/2001/X...">1.0</owl:versionInfo>
    <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      An ontology containing the basic part relations: partOf, hasPart,
      partOf_directly, and hasPart_directly. These are described in the
      accompanying note. Author: Chris Welty
    </rdfs:comment>
  </owl:Ontology>
  <owl:TransitiveProperty rdf:ID="partOf">
    <owl:inverseOf>
      <owl:TransitiveProperty rdf:ID="hasPart"/>
    </owl:inverseOf>
  </owl:TransitiveProperty>
  <owl:ObjectProperty rdf:ID="hasPart_directly">
    <rdfs:subPropertyOf rdf:resource="#hasPart"/>
    <owl:inverseOf>
      <owl:ObjectProperty rdf:ID="partOf_directly">
        <rdfs:subPropertyOf rdf:resource="#partOf"/>
      </owl:ObjectProperty>
    </owl:inverseOf>
  </owl:ObjectProperty>
</rdf:RDF>
(Chris Welty)
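The declarations in this ontology license three kinds of inference: a sub-property statement implies the parent property, a transitive property chains, and an inverse property runs the other way. The sketch below works through them on a hypothetical car example. The data and function names are assumptions for illustration, and the code imitates only these three rules rather than acting as an OWL reasoner.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (part, whole)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Keep chaining (a, b) and (b, d) into (a, d) until nothing new appears."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def inverse(pairs: Set[Pair]) -> Set[Pair]:
    """Swap the two ends of every pair."""
    return {(b, a) for (a, b) in pairs}


# Hypothetical asserted facts, stated with partOf_directly only.
part_of_directly = {("piston", "engine"), ("engine", "car")}

# partOf_directly is a sub-property of partOf, so each direct pair is a partOf pair;
# partOf is transitive, so the chain piston -> engine -> car is closed.
part_of = transitive_closure(part_of_directly)

# hasPart is declared as the inverse of partOf.
has_part = inverse(part_of)

print(("piston", "car") in part_of)   # True
print(("car", "piston") in has_part)  # True
```

Only two facts were asserted, yet six hold once the property declarations are applied (three `partOf` pairs and their three `hasPart` inverses).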
More OWL Examples
• Airport
• Pizza
Next Week(s)
• Fall Break – Enjoy
• 10/30 – Guest speaker Lorrie Eakin
• 11/6 – First Group presentations