Introduction to Protégé for Absolute Beginners University at Buffalo August 11-12, 2012
Goal and Content of Tutorial • The goal of the tutorial is to explain how to translate ontologies into a language that can be processed by computers • Three main sections by content: • Overview of the Web Ontology Language (OWL) • Hands-on training in Protégé, an OWL editor • Overview of SPARQL Protocol and RDF Query Language (SPARQL), a query language for retrieving and modifying ontologically grounded information
The Current State of Data Integration on the Web • Search engines return some remarkably precise results, but precision degrades as topics become less standardized
The Current State of Data Integration in the Enterprise • Using more than one software application carries a risk of added cost to combine the information the applications create. • Databases carry very little metadata about the content of the information they contain • Spreadsheets usually carry even less
In the Social Network, Hashtags Cluster Information Into Categories • But the ambiguities of language reappear in the categories • and the lack of rigor in relating one category to another is an obstacle to machine-based validation of usage.
The Value Added by OWL Ontologies to Data Integration • Ontologies endow terms with machine-processable definitions and disambiguate different senses of the same expression • Ontologies place restrictions on how terms can be related to other terms so that misuse and inconsistencies can be detected.
The Ontologized Web, Enterprise and Social Network • What if creators of web pages, databases, and blogs used terminology from curated ontologies to annotate their content? • Standardized ways of describing the structures that represent data are already accepted; why not extend that acceptance to the annotation of content? • Expected Benefits: • The precision of search increases dramatically • Data from different sources can be merged • Gaps in information can be identified • Falsehoods and incoherent expressions can be detected
Resource Description Framework (RDF) • Designed to be a language for making assertions about resources • A Resource* is • an electronic document, an image, a source of information with a consistent purpose • not necessarily accessible via the Internet; e.g., human beings, corporations, and books in a library can also be resources. • an abstract concept such as the operators and operands of a mathematical equation or types of a relationship (e.g., “parent” or “employee”) *derived from RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax from http://tools.ietf.org/html/rfc3986
Expressing Information in RDF • Statements are always expressed in the form of a triple: • Subject – Predicate – Object (a.k.a. RDF Triple) • Translating the statement “Austria’s GDP per capita is 30,500 Euros” into RDF requires breaking it into triples
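A minimal Python sketch of that decomposition, assuming triples are stored as plain tuples of prefixed names (a simplification; RDF libraries store full URIs). The resource and property names are the ones used in the RDF graph later in this tutorial; the intermediate resource for the indicator is what lets the statement be split into two triples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject - predicate - object."""
    subject: str
    predicate: str
    obj: str


# "Austria's GDP per capita is 30,500 Euros" does not fit in one triple,
# so an intermediate resource for the economic indicator is introduced.
austria_gdp = [
    Triple("dbpedia:Austria", "example:has_economic_indicator",
           "example:Austrian_GDPperCapita"),
    Triple("example:Austrian_GDPperCapita", "example:has_value",
           '"30,500 Euros"^^xsd:string'),
]

for t in austria_gdp:
    print(t.subject, t.predicate, t.obj)
```

The object of the first triple is the subject of the second, which is what chains the two statements together.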
Uniform Resource Identifiers (URIs) and Literals • URIs are unique names of resources • http://dbpedia.org/page/Austria • http://en.wikipedia.org/wiki/Austria • Literals • can be a simple raw text value • can be annotated with a language tag as in “Austria”@en • can be typed with a datatype as in “30,500 Euros”^^xsd:string
Rules for RDF Statements • Subject and Predicate have to be URI named resources • Object – can be either a URI named resource or a literal
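The rules above can be checked mechanically. This is a sketch, not a real RDF library: the `URI` and `Literal` classes and the `has_name` property are hypothetical, and the check follows the simplified rules on this slide (full RDF also allows blank nodes as subjects and objects, which are omitted here).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the raw text value
    lang: Optional[str] = None      # language tag, e.g. "en" in "Austria"@en
    datatype: Optional[URI] = None  # datatype, e.g. xsd:string


Term = Union[URI, Literal]


def is_well_formed(subject: Term, predicate: Term, obj: Term) -> bool:
    """Subject and predicate must be URIs; the object may be a URI or a literal."""
    return (isinstance(subject, URI)
            and isinstance(predicate, URI)
            and isinstance(obj, (URI, Literal)))


XSD_STRING = URI("http://www.w3.org/2001/XMLSchema#string")
austria = URI("http://dbpedia.org/page/Austria")
has_name = URI("http://www.myexample.com/resource/has_name")  # hypothetical property

print(is_well_formed(austria, has_name, Literal("Austria", lang="en")))  # True
print(is_well_formed(Literal("Austria"), has_name, austria))             # False: literal subject
print(is_well_formed(austria, Literal("has GDP"),
                     Literal("30,500 Euros", datatype=XSD_STRING)))      # False: literal predicate
```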
Applying the Rules Using “dbpedia:”, “ro:”, and “example:” as prefixes for: http://dbpedia.org/page, http://www.obofoundry.org/ro, and http://www.myexample.com/resource respectively, Which of the following are well-formed RDF statements?
RDF Graphs • [Figure: dbpedia:Austria → example:has_economic_indicator → example:Austrian_GDPperCapita → example:has_value → “30,500 Euros”^^xsd:string; resources and literals are nodes, predicates are edges] • The direction of the edges is always away from the subject and towards the object of the statement
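A small sketch of how triples become a directed graph, assuming an adjacency-list representation keyed by subject (an illustrative choice, not part of RDF itself). Each triple yields one labeled edge from subject to object, so a literal only ever appears as an edge target.

```python
from collections import defaultdict

triples = [
    ("dbpedia:Austria", "example:has_economic_indicator", "example:Austrian_GDPperCapita"),
    ("example:Austrian_GDPperCapita", "example:has_value", '"30,500 Euros"^^xsd:string'),
]

# Each triple becomes a directed edge labeled with the predicate,
# pointing from the subject to the object.
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))

for node, out in edges.items():
    for label, target in out:
        print(f"{node} --{label}--> {target}")
```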
Graphing RDF How would the following be represented in a RDF Graph?
Graphing RDF • [Figure: RDF graph of a Monopoly game with nodes game1:MonopolyTokenBoot_Game1, game1:MonopolyGame_Game1, game1:MonopolyPlayer_1, mnply:MonopolyPlayer and game1:MonopolyBanker_Game1, and edges labeled mnply:represented_by, mnply:competes_in, rdf:type and mnply:has_role]
How far does RDF take us toward our goal? • The value of RDF lies in the use of URIs, as it allows distinct information sources to share a common meaning for terms • Every occurrence of the same URI is a reference to the same resource • RDF provides no inference and no way to validate how URIs are used.
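A sketch of why shared URIs matter, using two hypothetical sources (the `example:has_capital` property and `dbpedia:Vienna` are illustrative). Because both sources use the same URI for Austria, a plain set union already connects their statements; note that nothing here checks whether either predicate is used sensibly, which is the gap RDFS and OWL address.

```python
# Two independent sources (hypothetical data) that use the same URI for Austria.
source_a = {("dbpedia:Austria", "example:has_capital", "dbpedia:Vienna")}
source_b = {("dbpedia:Austria", "example:has_economic_indicator",
             "example:Austrian_GDPperCapita")}

# Same URI means same resource, so merging is just a union of the triples.
merged = source_a | source_b


def describe(resource, graph):
    """Return every (predicate, object) pair whose subject is `resource`."""
    return sorted((p, o) for s, p, o in graph if s == resource)


print(describe("dbpedia:Austria", merged))
```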
RDF Schema (RDFS) “RDF Schema defines classes and properties that may be used to describe classes, properties and other resources”* RDFS defines terms that can describe classes of things and the relationships that hold between these classes *RDF Vocabulary Description Language 1.0: RDF Schema from http://www.w3.org/TR/rdf-schema/
The Need for RDFS • RDF can name, but not define, resources or the relationships that hold between them • But what about…
The Need for RDFS • Machines cannot process elements of an expression that lie outside of RDF. To a machine our example looks like: • We need language elements that enable a machine to process relationships between entities
RDFS Types • Allows a resource to be typed as a class (i.e. a collection of individuals) • Allows a class to be defined as a subclass of another class (i.e. all individuals that it contains are contained in the other) • Allows a property to be defined as a subproperty of another property
RDFS Taxonomies • Enables the creation of taxonomies of both classes and properties • [Figure: a class taxonomy and a property taxonomy]
RDFS Vocabulary • rdfs:Resource, rdfs:Class, rdfs:Literal, rdfs:Datatype, rdfs:range, rdfs:domain, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:label, rdfs:comment, rdfs:ContainerMembershipProperty, rdfs:member, rdfs:seeAlso, rdfs:isDefinedBy
RDFS Vocabulary in Action • rdfs:subClassOf is used to assert that every instance of one class is also an instance of another. • If a resource is rdf:type dbpedia:Apple, a reasoner will infer that the resource is also rdf:type dbpedia:Fruit • [Figure: example:NewtonsApple rdf:type dbpedia:Apple; dbpedia:Apple rdfs:subClassOf dbpedia:Fruit; inferred: example:NewtonsApple rdf:type dbpedia:Fruit]
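A minimal forward-chaining sketch of the rdfs:subClassOf rule, assuming triples are tuples of prefixed names. It is not a full RDFS reasoner; it applies only this one rule, repeating until nothing new is produced so that chains of subclasses are followed.

```python
def infer_subclass_types(triples):
    """If x rdf:type C and C rdfs:subClassOf D, then x rdf:type D.
    Repeats until no new triple appears, so subclass chains are followed."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        new = {(x, "rdf:type", d)
               for x, p, c in inferred if p == "rdf:type"
               for c2, d in sub if c2 == c}
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("dbpedia:Apple", "rdfs:subClassOf", "dbpedia:Fruit"),
    ("example:NewtonsApple", "rdf:type", "dbpedia:Apple"),
}
print(("example:NewtonsApple", "rdf:type", "dbpedia:Fruit") in infer_subclass_types(facts))  # True
```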
RDFS Vocabulary in Action • rdfs:subPropertyOf is used to assert that every pair of resources that are related by a property are also related by another. • If Ann is the sister of Ben and “is sister of” is a subproperty of “is sibling of”, then a reasoner will infer that Ann is a sibling of Ben
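The same forward-chaining idea applied to rdfs:subPropertyOf, as a sketch. The property names `example:is_sister_of` and `example:is_sibling_of` are hypothetical renderings of the slide's "is sister of" and "is sibling of".

```python
def infer_superproperties(triples):
    """If x P y and P rdfs:subPropertyOf Q, then x Q y (repeated to a fixpoint)."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == "rdfs:subPropertyOf"}
        new = {(x, q, y) for x, p, y in inferred for p2, q in sub if p2 == p}
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("example:is_sister_of", "rdfs:subPropertyOf", "example:is_sibling_of"),
    ("example:Ann", "example:is_sister_of", "example:Ben"),
}
print(("example:Ann", "example:is_sibling_of", "example:Ben") in infer_superproperties(facts))  # True
```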
RDFS Vocabulary in Action • rdfs:domain is used to assert that any resource that is the subject of a property is an instance of one or more classes. • If Ann is related to Ben via the ex:is_sister_of property, a reasoner will infer that Ann is rdf:type ex:Female • [Figure: example:Ann example:is_sister_of example:Ben; example:is_sister_of rdfs:domain example:Female; inferred: example:Ann rdf:type example:Female]
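A sketch of the rdfs:domain rule under the same tuple representation. Note that RDFS does not reject Ann's triple when Ann is not already known to be female; it adds the type. A single pass is enough for this example because the rule only adds rdf:type triples.

```python
def infer_domain_types(triples):
    """If P rdfs:domain C and x P y, then x rdf:type C."""
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    inferred = set(triples)
    for x, p, _ in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((x, "rdf:type", cls))
    return inferred


facts = {
    ("example:is_sister_of", "rdfs:domain", "example:Female"),
    ("example:Ann", "example:is_sister_of", "example:Ben"),
}
print(("example:Ann", "rdf:type", "example:Female") in infer_domain_types(facts))  # True
```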
RDFS Vocabulary in Action • rdfs:range is used to assert that the object of a property is always an instance of one or more classes or datatypes • If Newton’s apple is related to Newton’s apple tree via the ex:is_borne_by property, a reasoner will infer that Newton’s apple tree is rdf:type dbpedia:Plant • [Figure: example:NewtonsApple example:is_borne_by example:NewtonsAppleTree; example:is_borne_by rdfs:range dbpedia:Plant; inferred: example:NewtonsAppleTree rdf:type dbpedia:Plant]
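The rdfs:range rule mirrors the domain rule but types the object instead of the subject. A sketch with the same assumptions; `example:NewtonsApple` and `example:NewtonsAppleTree` are hypothetical identifiers for the resources on the slide.

```python
def infer_range_types(triples):
    """If P rdfs:range C and x P y, then y rdf:type C."""
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    inferred = set(triples)
    for _, p, y in triples:
        for prop, cls in ranges:
            if p == prop:
                inferred.add((y, "rdf:type", cls))
    return inferred


facts = {
    ("example:is_borne_by", "rdfs:range", "dbpedia:Plant"),
    ("example:NewtonsApple", "example:is_borne_by", "example:NewtonsAppleTree"),
}
print(("example:NewtonsAppleTree", "rdf:type", "dbpedia:Plant") in infer_range_types(facts))  # True
```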
RDFS Vocabulary in Action • rdfs:label is used to provide a human readable version of a resource’s name. • If a GUID is used as the identifier for the class of Apple, then use rdfs:label to assign as many human readable versions as desired.
RDFS Vocabulary in Action • rdfs:comment is used to provide a human-readable description of a resource Both comments are reused from http://dbpedia.org/page/Apple
RDFS Vocabulary in Action • rdfs:seeAlso is used to assert that a resource provides additional information about the subject resource.
RDFS Vocabulary in Action • rdfs:isDefinedBy is used to assert that a resource defines the subject resource.
How far does RDFS take us toward our goal? • Contains elements that enable machine inferencing on necessary conditions (e.g. apples are the fruit of the apple tree) • Doesn’t allow restrictions on classes that would enable inferencing on sufficient conditions (e.g. anything that is the fruit of an apple tree is an apple) • Doesn’t provide a way to exclude resources from class membership, so it can’t validate assertions.
Web Ontology Language (OWL*) • OWL descends from the knowledge representation languages of the 1990s, such as Simple HTML Ontology Extensions (SHOE) and Ontology Inference Layer (OIL), and from the DARPA Agent Markup Language (DAML) • The initial version of OWL became a formal W3C Recommendation on February 10, 2004 • OWL 2 became a W3C Recommendation on October 27, 2009 * why “OWL” instead of “WOL”: http://lists.w3.org/Archives/Public/www-webont-wg/2001Dec/0169.html
The Need for OWL • RDFS lacks the expressive power to allow inferences about individuals beyond their class membership. • Based on this equivalence, a machine can infer only that the two classes have the same instances. • We want to enable a machine to infer the attributes of an individual from the definition of the class of which it is a member
OWL Usage “The W3C OWL 2 Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things. OWL is a computational logic-based language such that knowledge expressed in OWL can be reasoned with by computer programs either to verify the consistency of that knowledge or to make implicit knowledge explicit.”* * http://www.w3.org/TR/owl2-primer/
Defining Classes - Enumeration • Use owl:oneOf to enumerate the members of a class • In Manchester Syntax: Class: MonopolyToken EquivalentTo: {Battleship, Boot, Car, Dog, Thimble, Top_Hat, Wheelbarrow, Iron} SubClassOf: Thing
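A sketch of what the enumeration means for membership, assuming individuals are identified by plain strings. Because OWL does not assume unique names, an individual declared owl:sameAs a listed token is also a member; the `same_as` mapping and the individuals "Scottie" and "Cannon" are hypothetical. A `False` result here means only that membership is not entailed by the given facts, which is how an OWL reasoner would treat it too.

```python
from typing import Dict, FrozenSet, Optional

# owl:oneOf lists every member of the class explicitly.
MONOPOLY_TOKENS: FrozenSet[str] = frozenset({
    "Battleship", "Boot", "Car", "Dog", "Thimble", "Top_Hat", "Wheelbarrow", "Iron",
})


def is_monopoly_token(individual: str, same_as: Optional[Dict[str, str]] = None) -> bool:
    """Membership is entailed if the individual is listed, or is declared
    owl:sameAs a listed individual (OWL has no unique-name assumption)."""
    same_as = same_as or {}
    return individual in MONOPOLY_TOKENS or same_as.get(individual) in MONOPOLY_TOKENS


print(is_monopoly_token("Boot"))                        # True
print(is_monopoly_token("Scottie", {"Scottie": "Dog"}))  # True
print(is_monopoly_token("Cannon"))                      # False (not entailed)
```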
Defining Classes - Restrictions • owl:Restriction creates a class defined using an object property and either: • a value constraint which places a constraint on the range of the property when applied to this particular class • e.g. the rdfs:range of the is_borne_by property might be plant, but when defining apple we would constrain the range to the class of apple trees • a cardinality constraint which places a constraint on the number of values a property can take in the context of a particular class • e.g. there can be no more than 8 players in a game of Monopoly
Additional Inferences Gained Through Restrictions Without a restriction all that can be inferred about an improved property is that it must also be a property Class: MonopolyImprovedProperty SubClassOf: MonopolyProperty Adding a restriction adds the information that an improved property must be a property and that it must be the location of some building Class: MonopolyImprovedProperty EquivalentTo: location_of some MonopolyBuilding SubClassOf: MonopolyProperty
rdfs:subClassOf vs. owl:equivalentClass • If improved property is only a subclass of “property that is the location of a building”, then knowing that Virginia Place is the location of House 1 does not let a reasoner conclude that Virginia Place is an improved property • If improved property is an equivalent class of “property that is the location of a building”, then a reasoner can infer that Virginia Place is an improved property
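A sketch contrasting the two cases, assuming a tiny hand-written set of facts about Virginia Place and House 1 (identifiers hypothetical). With SubClassOf the restriction is only a necessary condition, so nothing new follows; with EquivalentTo it is also sufficient, so any property that is the location of some building is classified as an improved property. This hard-codes one definition and is not a general OWL classifier.

```python
facts = {
    ("VirginiaPlace", "rdf:type", "MonopolyProperty"),
    ("House1", "rdf:type", "MonopolyBuilding"),
    ("VirginiaPlace", "location_of", "House1"),
}


def is_property_located_with_building(x, facts):
    """True if x is a MonopolyProperty that is the location of some MonopolyBuilding."""
    return ((x, "rdf:type", "MonopolyProperty") in facts
            and any(s == x and p == "location_of"
                    and (o, "rdf:type", "MonopolyBuilding") in facts
                    for s, p, o in facts))


def classify_improved(x, facts, definition_is_equivalent):
    """SubClassOf: only an explicit type assertion counts.
    EquivalentTo: meeting the definition is enough to classify x."""
    if (x, "rdf:type", "MonopolyImprovedProperty") in facts:
        return True
    return definition_is_equivalent and is_property_located_with_building(x, facts)


print(classify_improved("VirginiaPlace", facts, definition_is_equivalent=False))  # False
print(classify_improved("VirginiaPlace", facts, definition_is_equivalent=True))   # True
```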
owl:allValuesFrom vs. owl:someValuesFrom • owl:allValuesFrom constrains the object property so that all of its values must come from the specified class or data range • Example: A mortgaged property is one such that it is owned only by the bank • owl:someValuesFrom constrains the object property so that at least one of its values must come from the specified class or data range • Example: An improved property is the location of some building
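A sketch of the difference between the two restrictions, read as closed-world checks over a finite set of facts. This is an assumption for illustration: an OWL reasoner works under the open-world assumption and would typically infer new types (for example, that an owner of a mortgaged property is a bank) rather than report a failed check. The property `owned_by`, the class `MonopolyBank`, and all individuals are hypothetical.

```python
def all_values_from(x, prop, cls, facts):
    """`prop only cls`: every known value of prop on x is typed cls.
    Vacuously true when x has no values, as in OWL."""
    values = [o for s, p, o in facts if s == x and p == prop]
    return all((v, "rdf:type", cls) in facts for v in values)


def some_values_from(x, prop, cls, facts):
    """`prop some cls`: at least one value of prop on x is typed cls."""
    return any(s == x and p == prop and (o, "rdf:type", cls) in facts
               for s, p, o in facts)


facts = {
    ("MarvinGardens", "owned_by", "Bank1"),
    ("Bank1", "rdf:type", "MonopolyBank"),
    ("Boardwalk", "owned_by", "Player1"),
    ("Boardwalk", "location_of", "Hotel1"),
    ("Hotel1", "rdf:type", "MonopolyBuilding"),
}

print(all_values_from("MarvinGardens", "owned_by", "MonopolyBank", facts))     # True
print(some_values_from("Boardwalk", "location_of", "MonopolyBuilding", facts))  # True
```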
owl:hasValue • The owl:hasValue constraint limits a property (an object property or a data property) to a given value, which can be either an individual or a data value. For example, we could use this constraint to assert that all Monopoly railroads have a price of 200. Class: MonopolyRailroad SubClassOf: has_price value 200, MonopolyProperty • Given a resource that is a Monopoly railroad, a reasoner will infer that its price is 200. • [Figure: game1:ReadingRailroad rdf:type mnply:MonopolyRailroad; mnply:MonopolyRailroad rdfs:subClassOf (mnply:has_price = 200); inferred: game1:ReadingRailroad mnply:has_price 200]
owl:hasValue • To define the class of New York City buildings, we can use owl:hasValue on the property located_in and the individual NewYorkCity Class: NewYorkCityBuilding SubClassOf: located_in value NewYorkCity, Building • Given a resource that is a New York City building, a reasoner will infer that its location is New York City. • [Figure: example:EmpireStateBuilding rdf:type example:NewYorkCityBuilding; example:NewYorkCityBuilding rdfs:subClassOf (example:located_in NYC); inferred: example:EmpireStateBuilding example:located_in example:NewYorkCity]
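A sketch of the inference described in both owl:hasValue examples, assuming the restrictions are stored as (class, property, value) tuples and that prefixes are dropped for brevity. It only looks at direct rdf:type assertions and does not follow subclass chains, so it is an illustration of the rule rather than a reasoner.

```python
# hasValue restrictions as (class, property, value), taken from the two examples above.
HAS_VALUE = [
    ("MonopolyRailroad", "has_price", 200),
    ("NewYorkCityBuilding", "located_in", "NewYorkCity"),
]


def infer_has_value(facts):
    """If x rdf:type C and C SubClassOf (P value v), infer x P v."""
    inferred = set(facts)
    for s, p, o in facts:
        if p == "rdf:type":
            for cls, prop, value in HAS_VALUE:
                if o == cls:
                    inferred.add((s, prop, value))
    return inferred


facts = {
    ("ReadingRailroad", "rdf:type", "MonopolyRailroad"),
    ("EmpireStateBuilding", "rdf:type", "NewYorkCityBuilding"),
}
result = infer_has_value(facts)
print(("ReadingRailroad", "has_price", 200) in result)                     # True
print(("EmpireStateBuilding", "located_in", "NewYorkCity") in result)      # True
```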
Cardinality Constraints • Useful in expressing that a class has an exact number of relationships to another class or data range. Example: A turn has exactly one player as a participant and exactly one integer as its ordinal value Class: MonopolyTurn Annotations: rdfs:label "Monopoly turn"^^xsd:string SubClassOf: has_ordinal_value exactly 1 xsd:integer, has_participant exactly 1 MonopolyPlayer, occurs_containing some MonopolyRollOfDice, occurs_during some MonopolyRound, MonopolyEvent
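A sketch of checking `has_participant exactly 1` over a finite set of facts. This is a closed-world validation, which is an assumption that differs from OWL semantics: an OWL reasoner would not flag a turn with no known participant, and would treat two participants as a contradiction only if they were declared different individuals (otherwise it would infer they are the same). The individuals below are hypothetical.

```python
from collections import Counter


def exactly_one_violations(facts, cls, prop):
    """Closed-world check of `prop exactly 1`: members of cls whose number
    of distinct prop-values is not exactly one."""
    members = {s for s, p, o in facts if p == "rdf:type" and o == cls}
    counts = Counter(s for s, p, o in facts if p == prop and s in members)
    return sorted(m for m in members if counts[m] != 1)


facts = {
    ("Turn1", "rdf:type", "MonopolyTurn"),
    ("Turn1", "has_participant", "Player1"),
    ("Turn2", "rdf:type", "MonopolyTurn"),
    ("Turn2", "has_participant", "Player1"),
    ("Turn2", "has_participant", "Player2"),
    ("Turn3", "rdf:type", "MonopolyTurn"),  # no participant recorded
}

print(exactly_one_violations(facts, "MonopolyTurn", "has_participant"))  # ['Turn2', 'Turn3']
```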