Introduction to Ontologies Adding Meaning to Metadata Brian Lowe Metadata Working Group February 16, 2007
So… what the heck are we talking about, exactly?
Ontologies are really, really simple. [Diagram labels: Thing, Person, Food; Person eats Food]
Let’s back up a bit. We store data and metadata in all kinds of ways. We’re probably all familiar with a database record:

Record
  Record Number: 289425
  Title: Metamorphosis
  Author: Kafka, Franz
  Publication date: 1946
  Publisher: Vanguard Press
  Type: book

What do we do when we want to express something else? We need to add another field.

Record
  Record Number: 289425
  Title1: Metamorphosis
  Title2: Die Verwandlung
  Author: Kafka, Franz
  Publication date: 1946
  Publisher: Vanguard Press
  Type: book

Say we want unlimited titles. We need to add another table.

Thing
  Record Number: 289425
  Author: Kafka, Franz
  Publication date: 1946
  Publisher: Vanguard Press
  Type: book

Title
  289425  Metamorphosis
  289425  Die Verwandlung
  20027   Dr. Strangelove
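The two-table layout above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (thing, title, record_number and so on) are assumptions, not taken from any real catalog schema.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        record_number INTEGER PRIMARY KEY,
        author TEXT,
        publication_date TEXT,
        publisher TEXT,
        type TEXT
    );
    CREATE TABLE title (
        record_number INTEGER,
        title TEXT
    );
""")

# One row per record in "thing" ...
conn.execute(
    "INSERT INTO thing VALUES (?, ?, ?, ?, ?)",
    (289425, "Kafka, Franz", "1946", "Vanguard Press", "book"),
)
# ... and any number of rows per record in "title".
# (The slide does not show the "thing" row for record 20027, so none is added.)
conn.executemany(
    "INSERT INTO title VALUES (?, ?)",
    [
        (289425, "Metamorphosis"),
        (289425, "Die Verwandlung"),
        (20027, "Dr. Strangelove"),
    ],
)


def titles_for(record_number):
    """Return every title stored for one record, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM title WHERE record_number = ? ORDER BY rowid",
        (record_number,),
    )
    return [row[0] for row in rows]


kafka_titles = titles_for(289425)
print(kafka_titles)
```

Adding a third or fourth title is now just another row in the title table; no new column is needed.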
Well-designed databases tend to deal with lots of relationships between different elements of data. The way the relationships are set up is called the data model. Relational databases are great, until you want to share your data with someone else who isn’t running the same database software or who doesn’t understand what you’ve done.
OK, no problem. Why don’t we just create a standardized way of shipping data around? Let’s call this standard XML.

<?xml version="1.0" encoding="UTF-8"?>
<things>
  <thing id="289425">
    <title>Metamorphosis</title>
    <title noindex="4">Die Verwandlung</title>
    <author>Kafka, Franz</author>
    <publisher>Vanguard Press</publisher>
    <publicationDate>1946</publicationDate>
    <type>book</type>
  </thing>
  <thing id="20027">
    <title>Dr. Strangelove</title>
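As a rough sketch of the "standardized tools" point, the first record can be read with Python's standard xml.etree.ElementTree module. The document string below is an assumption: it keeps only the complete first thing from the slide and closes the root element so that it parses.

```python
import xml.etree.ElementTree as ET

# Only the complete first <thing> from the slide, with the root element closed
# so the fragment is well-formed (an assumption made for this example).
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<things>
  <thing id="289425">
    <title>Metamorphosis</title>
    <title noindex="4">Die Verwandlung</title>
    <author>Kafka, Franz</author>
    <publisher>Vanguard Press</publisher>
    <publicationDate>1946</publicationDate>
    <type>book</type>
  </thing>
</things>
"""

root = ET.fromstring(DOC.encode("utf-8"))

# Map each thing's id attribute to the list of its <title> children.
titles_by_id = {
    thing.get("id"): [t.text for t in thing.findall("title")]
    for thing in root.findall("thing")
}
print(titles_by_id)
```

Repeating the title element is all it takes to carry several titles, which is the tree-shaped counterpart of the extra table.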
XML is great. • we can use standardized tools • XML is readable by both machines and humans (in theory) • we can create rich schemas that will let us check whether an XML document is valid
XML alone is all about trees. But sometimes trees aren’t enough. What about all those complex relationships?
One (nonstandard) way of breaking out of a tree:

<eml:eml packageId="gss1.37.2" system="knb" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1 eml.xsd" scope="system">
  <dataset scope="document">
    <title>Test GIS data upload</title>
    <creator id="1170948373895" scope="document">
      <individualName>
        <surName>steinhart</surName>
      </individualName>
      <organizationName>mann</organizationName>
      <positionName>librarian</positionName>
    </creator>
    <abstract>
      <para>test upload of a 58MB GIS data file w/eml record</para>
    </abstract>
    <contact scope="document">
      <references system="document">1170948373895</references>
    </contact>
Let’s use one standard data model. RDF: Resource Description Framework. Graphs instead of trees. [Graph nodes: Dataset1.37.2, Gail, librarian, Mann Library; edge labels: creator, contact, position, organization]
Everything is expressed as statements, or triples: Subject, Property (predicate), Object.

  Thing289435  title   “Metamorphosis”
  Thing289435  title   “Die Verwandlung”
  Thing289435  author  “Kafka, Franz”
  Thing289435  type    book

If everything’s a triple, we can store new things very easily.

  Thing289435  title       “Metamorphosis”
  Thing289435  title       “Die Verwandlung”
  Thing289435  author      “Kafka, Franz”
  Thing289435  type        book
  Thing289435  comment     “This is the one where Gregor Samsa wakes up as a cockroach.”
  Thing289435  callNumber  “PT2621.A25 V5 1946”
“Triple Stores” [Diagram: S-P-O triples in a triple store] There are various query languages for RDF, similar to SQL. We can select all the triples where the subject is Thing289435, or select all the triples where the property is “title.”
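The two selections described above can be sketched with a plain Python list standing in for a triple store. This is not a real RDF query language; the match helper and the Thing20027 identifier are assumptions made for illustration.

```python
# A toy triple store: each entry is (subject, property, object).
# "Thing20027" is a hypothetical identifier added so that the
# property query returns triples about more than one subject.
TRIPLES = [
    ("Thing289435", "title", "Metamorphosis"),
    ("Thing289435", "title", "Die Verwandlung"),
    ("Thing289435", "author", "Kafka, Franz"),
    ("Thing289435", "type", "book"),
    ("Thing20027", "title", "Dr. Strangelove"),
]


def match(triples, subject=None, prop=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (obj is None or o == obj)
    ]


# All the triples where the subject is Thing289435.
about_kafka_book = match(TRIPLES, subject="Thing289435")

# All the triples where the property is "title".
all_titles = match(TRIPLES, prop="title")

print(about_kafka_book)
print(all_titles)
```

Real RDF query languages follow the same basic idea: fix some positions of the triple and leave the others as variables.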
RDF: Resource Description Framework What’s a resource? Something we assign a specific identifier or URI (Uniform Resource Identifier). http://www.somerandomlibrary.org/ourthings/Thing289435 We use this URI as the subject or object of a triple. We can now mash this up with a whole bunch of other triples and not get confused about which thing we’re describing.
RDF: Resource Description Framework. We also assign URIs to the properties, for example http://purl.org/dc/elements/1.1/title.
  Subject: http://www.somerandomlibrary.org/ourthings/Thing289435
  Property: http://purl.org/dc/elements/1.1/title
  Object: “Metamorphosis”
Now, anything that understands what a Dublin Core title is can find the title of our book.
RDF: Resource Description Framework We even use URIs with things that aren’t resources. Subject http://www.somerandomlibrary.org/ourthings/Thing289435 Property http://purl.org/dc/elements/1.1/title Object “Metamorphosis”^^http://www.w3.org/2001/XMLSchema#string
“Semantics” This is “semantic” metadata in its simplest sense. We’ve explicitly stated what kind of relationship exists between two things. But it’s still up to software or humans to understand what the different properties actually “mean”
Ontologies • describe what we mean in some ways that machines can understand. • are a standardized way of modeling the ways the different pieces of data relate to one another • Ontologies have been around for decades, but there is an increasing interest in sharing them over the Web.
So how do we make an ontology? • We need to decide what kinds of things we want to talk about (classes) • We also need to describe what kind of relationships they can have (properties)
Class hierarchy (also called the taxonomy or the terminology box (“TBox”)) Thing Person Employee Academic Employee Non-Academic Employee Faculty member Librarian Cataloger Programmer
Class hierarchy. Arrows represent “subclass of” or “is a” relationships.
• A faculty member is an academic employee
• A faculty member is an employee
• A faculty member is a person
• (A faculty member is a thing.)
[Diagram classes: Thing, Person, Employee, Academic Employee, Non-Academic Employee, Faculty member, Librarian, Cataloger, Programmer]
Class hierarchy The classes here are not disjoint. We can assert that someone is a librarian. We can assert that the same individual is also a faculty member, and that’s not a problem. Thing Person Employee Academic Employee Non-Academic Employee Faculty member Librarian Cataloger Programmer
Class hierarchy. Let’s make some classes disjoint. Now if we try to assert that something is both a faculty member and a cow, the ontology will tell us that these statements are inconsistent with our model. [Diagram classes: Thing, Person, Farm Animal (marked disjoint), Employee, Cow, Academic Employee, Non-Academic Employee, Faculty member, Librarian, Cataloger, Programmer]
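A minimal sketch of how such a disjointness check could work, in plain Python rather than a real reasoner. It assumes that Person and Farm Animal are the two classes marked disjoint and that Cow sits under Farm Animal, which is how the diagram appears to read; the function names are made up for this example.

```python
# Direct "subclass of" links, child -> parent (a small part of the hierarchy).
SUBCLASS_OF = {
    "Person": "Thing",
    "Farm Animal": "Thing",
    "Employee": "Person",
    "Academic Employee": "Employee",
    "Faculty member": "Academic Employee",
    "Cow": "Farm Animal",
}

# Pairs of classes declared disjoint (assumed from the diagram).
DISJOINT = [("Person", "Farm Animal")]


def ancestors(cls):
    """Return cls plus every superclass reachable through SUBCLASS_OF."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def is_consistent(asserted_types):
    """Return False if the asserted classes imply two disjoint classes."""
    implied = set()
    for cls in asserted_types:
        implied |= ancestors(cls)
    return not any(a in implied and b in implied for a, b in DISJOINT)


print(is_consistent({"Faculty member"}))          # one class only
print(is_consistent({"Faculty member", "Cow"}))   # faculty member and cow
```

The check works on implied classes, not just asserted ones: Faculty member and Cow are never declared disjoint directly, but their superclasses Person and Farm Animal are.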
Making a class hierarchy can be tricky. How do we model an organization? Organization charts are typically organized by what things are part of. [Org chart labels: Cornell University; CUL, CALS, A&S; Asian Studies, LTS, Plant Pathology, Crop & Soil Sciences, IRIS]
Making a class hierarchy can be tricky This is not a valid class hierarchy. Why not? University Library System College College Department Library Department
Making a class hierarchy can be tricky This is not a valid class hierarchy. Why not? Plant Biology is a College Department. Plant Biology is a College. (NO!) Plant Biology is a University. (NO!) University Library System College College Department Library Department Plant Biology
Making a class hierarchy can be tricky. Let’s try this instead. [Diagram: Organization, with sibling classes College, Library System, Department, University; siblings disjoint] Maybe not the best model, but it works.
Let’s add a property: subunitOf. [Diagram: Organization with property subunitOf; sibling classes College, Library System, Department, University; siblings disjoint; Plant Biology]
Let’s add a property: subunitOf. Now we can assert things like:

  subject        property   object
  CALS           subunitOf  Cornell
  CUL            subunitOf  Cornell
  Arts&Sciences  subunitOf  Cornell
  Plant Biology  subunitOf  CALS
  LTS            subunitOf  CUL

and model our organization chart.
Property hierarchies. As with classes, properties can be arranged in a hierarchy: subunitOf is a subproperty of partOf.
Property hierarchies: subunitOf. Now if we assert statements like:

  subject        property   object
  CALS           subunitOf  Cornell
  CUL            subunitOf  Cornell
  Arts&Sciences  subunitOf  Cornell
  Plant Biology  subunitOf  CALS
  LTS            subunitOf  CUL

our ontology tells us these statements must also be true:

  subject        property  object
  CALS           partOf    Cornell
  CUL            partOf    Cornell
  Arts&Sciences  partOf    Cornell
  Plant Biology  partOf    CALS
  LTS            partOf    CUL
Property hierarchies. Another example: headOf is a subproperty of memberOf.
Things that are tricky to do with ontologies / statements What if we want to express things that aren’t simple subject-predicate-object statements? Mike took a picture of a moose with a Nikon camera in Maine.
Event-based ontologies ABC Ontology / Harmony Project http://metadata.net/harmony/ - events - participants in events - tools used in events - outcomes of events
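One way to read the list above is that the event itself becomes a node, and the participant, tool and outcome hang off it as ordinary triples. The sketch below applies that idea to the moose sentence. The identifiers and property names (Event1, hasParticipant, usesTool and so on) are hypothetical and are not taken from the ABC Ontology.

```python
# "Mike took a picture of a moose with a Nikon camera in Maine."
# All identifiers and property names below are hypothetical.
EVENT = "Event1"  # the picture-taking event, modeled as its own node

triples = [
    (EVENT, "type", "PhotographingEvent"),
    (EVENT, "hasParticipant", "Mike"),
    (EVENT, "usesTool", "Nikon camera"),
    (EVENT, "hasLocation", "Maine"),
    (EVENT, "hasOutcome", "Picture1"),
    ("Picture1", "depicts", "moose"),
]


def describe(node, triples):
    """Collect the properties of one node into a dictionary."""
    return {p: o for (s, p, o) in triples if s == node}


event_description = describe(EVENT, triples)
print(event_description)
```

A sentence with four or five parts does not fit in one subject-predicate-object statement, but it does fit in several statements that share the event as their subject.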
W3C “Technologies”
RDF: Gives us the simple standard data model that lets us draw graphs and show how things are related to one another.
RDF Schema (RDFS): Lets us construct basic ontologies and build class and property hierarchies.
Web Ontology Language (OWL): Lets us do significantly more complex things.
RDF Schema Inferencing. To make an inference is to add new statements based on existing ones. Software that understands RDF Schema can make the kinds of simple inferences we’ve seen so far.

From:
  Dr.Smith   type    Faculty Member
  Joe Jones  headOf  Finance Committee

RDFS inferencing adds:
  Dr.Smith   type      Person
  Joe Jones  memberOf  Finance Committee

Why? Faculty Member is a subclass of Person. headOf is a subproperty of memberOf.
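The two inferences above can be reproduced with a small fixed-point loop. This is a sketch of the idea only, not an RDFS implementation: it covers just the subclass and subproperty rules, and the function name rdfs_closure is an assumption.

```python
# Schema statements: (narrower, broader) pairs.
SUBCLASS_OF = {("Faculty Member", "Person")}
SUBPROPERTY_OF = {("headOf", "memberOf")}


def rdfs_closure(asserted):
    """Apply the subclass and subproperty rules until nothing new appears."""
    triples = set(asserted)
    while True:
        new = set()
        for s, p, o in triples:
            if p == "type":
                # An instance of a subclass is also an instance of the superclass.
                new |= {(s, "type", sup) for sub, sup in SUBCLASS_OF if sub == o}
            # A statement with a subproperty also holds for the superproperty.
            new |= {(s, sup, o) for sub, sup in SUBPROPERTY_OF if sub == p}
        if new <= triples:
            return triples
        triples |= new


asserted = {
    ("Dr.Smith", "type", "Faculty Member"),
    ("Joe Jones", "headOf", "Finance Committee"),
}
inferred = rdfs_closure(asserted) - asserted
print(sorted(inferred))
```

Looping until no new triples appear means longer chains (a subclass of a subclass, say) are followed as well, without any extra code.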
RDF Schema Limitations Usually when we relate two things with a property, it’s very useful if the relationship is bidirectional. David Skorton presidentOf Cornell University implies Cornell University hasPresident David Skorton RDF Schema doesn’t come with a very good way of handling this.
“RoleNoun” One way of dealing with this is to use a naming convention: president / is president of. Software that assumes this convention can automatically generate the text to display for the inverse property. The Dublin Core properties are largely compatible with this convention: publisher / is publisher of; contributor / is contributor of. (Doesn’t work!)
Inferencing. More complex inferencing with OWL usually requires a separate inference engine (also known as a reasoner or classifier).
Flavors of OWL, from least to most expressive: OWL “Tiny”, OWL Lite, OWL DL (Description Logics), OWL Full. Inference engines get increasingly complex. OWL Full: very expressive; bad for reasoning. Inference engines choke.
OWL Basics Object Properties relate resources to other resources Datatype Properties relate resources to literals Most software supports only string and integer datatypes. Classes overlap by default Must specify which classes are disjoint. (But can’t do this if we’re using OWL Lite!)
Stuff OWL Gets Us: Explicit Inverse Properties (hasPresident, presidentOf). OWL allows us to specify that these two properties are inverses of each other. If we assert:
  Cornell University hasPresident David Skorton
OWL inferencing automatically adds:
  David Skorton presidentOf Cornell University
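A rough sketch of what an inverse-property rule does, in plain Python. An OWL reasoner does much more than this; the add_inverses helper and the INVERSE_OF table are assumptions made for illustration.

```python
# Declared inverse pairs; each pair is read in both directions.
INVERSE_OF = {"hasPresident": "presidentOf"}


def add_inverses(asserted, inverse_of):
    """Return the asserted triples plus the flipped triple for each inverse pair."""
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})
    result = set(asserted)
    for s, p, o in asserted:
        if p in both_ways:
            # Swap subject and object and use the inverse property.
            result.add((o, both_ways[p], s))
    return result


asserted = {("Cornell University", "hasPresident", "David Skorton")}
with_inverses = add_inverses(asserted, INVERSE_OF)
print(sorted(with_inverses))
```

Because the table is read in both directions, asserting the presidentOf statement instead would produce the hasPresident one.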
Stuff OWL Gets Us Transitive Properties partOf If Ithaca is part of Tompkins County and Tompkins County is part of New York State and New York State is part of the United States then Ithaca is part of the United States.
Stuff OWL Gets Us: Transitive Properties (partOf). OWL lets us specify that this property is transitive. If we assert these statements…
  Ithaca partOf Tompkins County
  Tompkins County partOf New York State
  New York State partOf United States
Stuff OWL Gets Us Transitive Properties an OWL reasoner fills in these additional statements: Ithaca partOf New York State Ithaca partOf United States Tompkins County partOf United States
Stuff OWL Gets Us: Transitive Properties. This time, let’s also say that partOf and hasPart are inverses of each other. Again, we’ll assert:
  Ithaca partOf Tompkins County
  Tompkins County partOf New York State
  New York State partOf United States
Stuff OWL Gets Us Transitive Properties Now the OWL reasoner adds these: Ithaca partOf New York State Ithaca partOf United States Tompkins County partOf United States United States hasPart New York State United States hasPart Tompkins County United States hasPart Ithaca New York State hasPart Tompkins County New York State hasPart Ithaca Tompkins County hasPart Ithaca
Stuff OWL Gets Us: Transitive Properties. We put in three statements manually and got nine more for free. What good is this? It makes it easier to query the data in different ways. We sacrifice some space (store more stuff) to make it faster to get the answer we want.
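The count of nine can be checked with a short sketch. It starts from the three asserted partOf statements (Ithaca in Tompkins County, Tompkins County in New York State, New York State in the United States), computes the transitive closure, and then flips every pair to get hasPart. The helper name is an assumption, and a real OWL reasoner works quite differently internally.

```python
def transitive_closure(pairs):
    """Repeatedly chain (a, b) and (b, c) into (a, c) until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


# The three asserted partOf statements, as (part, whole) pairs.
asserted = {
    ("Ithaca", "Tompkins County"),
    ("Tompkins County", "New York State"),
    ("New York State", "United States"),
}

# partOf is transitive, so close it under chaining.
part_of = transitive_closure(asserted)

# hasPart is the inverse of partOf, so flip every pair.
has_part = {(whole, part) for (part, whole) in part_of}

# Everything beyond the three asserted statements was inferred.
inferred_count = len(part_of) + len(has_part) - len(asserted)
print(inferred_count)
```

Storing the inferred statements is the space-for-speed trade: a query such as "what is part of the United States?" becomes a direct lookup in has_part instead of a walk through the chain.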