Explore the world of semantic web technologies, markup languages, and reasoning models used to represent data on the web. Learn about RDF, XML, and OWL, and how they facilitate data integration and processing on the World Wide Web.
Semantic Basics: Markup, Querying, and Reasoning Marlon Pierce Community Grids Lab Indiana University With Slides and Help from Sean Bechhofer, Carole Goble, Line Pouchard, and Dave De Roure
Reductio ad Absurdum “Physics is the study of the harmonic oscillator.” • H. L. Richards “Statistical Mechanics is the study of the Ising Model” • H. L. Richards “Web Service standards are the study of <xsd:any> sequences” • M. E. Pierce, soon to be anonymous
Which Web Service Specs?
<xs:element name="Header" type="tns:Header" />
<xs:complexType name="Header">
  <xs:sequence>
    <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
  <xs:anyAttribute namespace="##other" processContents="lax" />
</xs:complexType>

<xsd:complexType name="SecurityHeaderType">
  <xsd:sequence>
    <xsd:any processContents="lax" minOccurs="0" maxOccurs="unbounded">
    </xsd:any>
  </xsd:sequence>
  <xsd:anyAttribute namespace="##other" processContents="lax" />
</xsd:complexType>
Which, What, and Why? Which is what? • Left is the definition of the SOAP header. • Right is taken from Web Service Secure Messaging Specification. • You will find this pattern repeated pretty often in web service specifications. Why? • We have limited ways of linking several XML schema data models. Imagine schemas for science applications and computing resources. • XML maps relationships to trees. Link application and computer schemas with <xsd:any>. In my application+computer schema, does application contain computer as child node, or vice versa? • Graphs are a more natural way of expressing many inter-relationships of concepts.
XML is not enough
“The Creator of the Resource http://www.w3.org/Home/Lassila is Ora Lassila”
• XML defines grammars to verify and structure documents.
• The grammar enforces constraints on tags.
• Different grammars can define the same content.
• XML lacks a semantic model. It only has a surface model, which is a tree.
[Figure: the resource http://www.w3.org/Home/Lassila linked by a Creator arc to Ora Lassila]
Three XML encodings of the same statement:
<Creator>
  <uri>http://www.w3.org/Home/Lassila</uri>
  <name>Ora Lassila</name>
</Creator>

<Document uri="http://www.w3.org/Home/Lassila">
  <Creator>Ora Lassila</Creator>
</Document>

<Document uri="http://www.w3.org/Home/Lassila" Creator="Ora Lassila"/>
XML is not enough
The meaning of XML documents is intuitively clear
• “Semantic” markup tags are domain terms
But computers do not have intuition
• Tag names per se do not provide semantics
• The semantics are encoded outside the XML specification
XML makes no commitment on:
• Domain-specific ontological vocabulary
• Ontological modeling primitives
So XML requires pre-arranged agreement on both. This is feasible for closed collaboration
• agents in a small & stable community
• pages on a small & stable intranet
Semantic Web markups are often expressed in XML, but they carry extra meaning.
Enter the Semantic Web/Grid “The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming.”
The Semantic Stack
• XML: Defines the syntax for structured documents.
• XML Schema: Defines rules for XML dialects (SVG, GML, etc.) and also built-in data types.
• RDF: A data model definition language with XML bindings.
• RDF Schema: A way to define RDF-based languages (DAML+OIL, OWL).
• OWL: An extension of RDF/RDFS with extensive property/relationship definitions for expressing logical relationships.
Semantic Markups All semantic markup languages should be understood as assertion languages. • We will assert that certain relationships between resources exist. • We will express this using RDF, RDFS, and OWL using XML We must still provide tools for processing (and verifying) the assertions.
Resource Description Framework Overview of RDF basic ideas and XML encoding.
Resource Description Framework (RDF) RDF is the simplest of the semantic languages. Basic Idea #1: Triples • RDF is based on a subject-verb-object statement structure. • RDF subjects are called resources (classes) • Verbs (predicates) are called properties. • Objects (values) may be simple literals or other resources. Basic Idea #2: Everything is a resource that is named with a URI • RDF nouns, verbs, and objects are all labeled with URIs • Recall that a URI is just a name for a resource. • It may be a URL, but not necessarily. • A URI can name anything that can be described Web pages, creators of web pages, organizations that the creator works for,….
RDF Graph Model RDF is defined by a graph model. Resources are denoted by ovals (nodes). Lines (arcs) indicate properties. Squares indicate string literals (no URI). Resources and properties are labeled by a URI.
[Figure: the resource http://.../CMCS/Entries/X has an arc http://purl.org/dc/elements/1.1/creator to the resource http://.../CMCS/People/DrY and an arc http://purl.org/dc/elements/1.1/title to the literal H2O]
Encoding RDF in XML The graph represents two statements. • Entry X has a creator, Dr. Y. • Entry X has a title, H2O. In RDF XML, we have the following tags • <RDF> </RDF> denote the beginning and end of the RDF description. • <Description>’s “about” attribute identifies the subject of the sentence. • <Description></Description> enclose the properties and their values. • We import Dublin Core conventional properties (creator, title) from outside RDF proper.
RDF XML: The Gory Details
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <rdf:Description rdf:about='http://.../X'>
    <dc:creator rdf:resource='http://…/people/MEP'/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
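To make the link between the XML encoding and the underlying statements concrete, here is a minimal Python sketch that reads a small RDF/XML document with the standard library and lists the triples it contains. The example.org URIs are assumptions standing in for the elided addresses on the slide, and extract_triples is a hypothetical helper, not part of any RDF toolkit. It only handles flat rdf:Description blocks.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: the example.org URIs stand in for the elided ones.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/CMCS/Entries/X">
    <dc:creator rdf:resource="http://example.org/CMCS/People/DrY"/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree renders qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            predicate = namespace + local
            # An rdf:resource attribute points at another resource;
            # otherwise the element text is taken as a string literal.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

A real application would use an RDF toolkit such as Jena, which handles nested descriptions, typed literals, and the other syntax variations this sketch ignores.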
Encoding RDF as Triples In addition to graphs and XML, RDF may be written as triple “sentences”. A triple is just the subject, predicate, and object (in that order) of a graph segment.
<http://.../CMCS/Entries/X> <http://purl.org/dc/elements/1.1/creator> <http://.../CMCS/People/DrY>
• This structure may look trivial but is useful in expressing queries (more later).
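As a rough illustration of the triple form, the sketch below holds each statement as a small Python record and prints it as a one-line sentence. The Triple type, the to_sentence helper, and the example.org URIs are all assumptions made for this example.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_sentence(triple):
    """Write a triple as a one-line sentence: URIs in <>, literals in quotes."""
    if triple.obj_is_literal:
        obj = f'"{triple.obj}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical URIs standing in for the elided ones on the slide.
ENTRY = "http://example.org/CMCS/Entries/X"
statements = [
    Triple(ENTRY, DC + "creator", "http://example.org/CMCS/People/DrY"),
    Triple(ENTRY, DC + "title", "H2O", obj_is_literal=True),
]

for statement in statements:
    print(to_sentence(statement))
```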
Creating RDF Documents Writing RDF XML (or DAML or OWL) by hand is not easy. • It’s a good way to learn to read and write it, but once you understand it, automate it. Authoring tools are available • OntoMat: buggy • Protégé: preferred by CGL grad students • IsaViz: another nice tool with very good graphics. You can also generate these documents programmatically using Hewlett Packard Labs’ Jena toolkit for Java. • This is what I did in the previous example.
What is the Advantage? So far, properties are just conventional URI names. • All semantic web properties are conventional assertions about relationships between resources. • RDFS and OWL will offer more precise property capabilities. But there is a powerful feature we are about to explore… • Properties provide a powerful way of linking different RDF resources, or “nuggets” of information. For example, a publication is a resource that can be described by RDF • Author, publication date, and URL are all metadata property values. • But publications have references that are just other publications • DC’s “hasReference” can be used to point from one publication to another. Publications also have authors • An author is more than a name • An author is also an RDF resource with a collection of properties: name, email, telephone number, and so on.
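Graph Model Depicting vCard and DC Linking
[Figure: the resource http://.../CMCS/Entry/1 has a dc:title arc to the literal H2O and a dc:creator arc to the resource http://.../People/DrY. That resource has a vcard:EMAIL arc to dry@stateu.edu and a vcard:N arc to a node with vcard:Family and vcard:Given arcs.]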
What Else Does RDF Do? Collections: typically used as the object of an RDF statement • Bag: unordered collection of resources or literals. • Sequence: ordered collection of resources or literals. • Alternative: collection of resources or literals, from which only one value may be chosen. And that’s about it. RDF does not define properties, it just tells you where to put them. • Definitions are done by specific groups for specific fields (Dublin Core Metadata Initiative, for example). • RDF Schema provides the rules for defining specific resource classes and properties. But the graph model has opened some doors • Linked querying across data models. • Reasoning about information.
RDF Schema RDF Schema is a rules system for building RDF languages. • RDF and RDFS are defined in terms of RDFS • DAML+OIL and OWL are defined by RDFS. Take our Dublin Core RDF encoding as an example: • Can we formalize this process, defining a consistent set of rules? Previous example was valid RDF but how do I formalize the process of writing sentences about creators of entries? • Can we place restrictions and use inheritance to define resources? What really is the value of “creator”? Can I derive it from another class, like “person”? • Can we provide restrictions and rules for properties? How can I express the fact that “title” should only appear once? • Current DC encoding in fact is defined by RDFS.
Some RDFS Classes (Subjects and Values)
• RDFS:Resource: The RDFS root element. All other tags derive from Resource.
• RDFS:Class: The Class class. Literals and Datatypes are example classes. Classes consist of entities that share properties.
• RDFS:Literal: The class for holding strings and integers. Literals are dead ends in RDF graphs.
• RDFS:Datatype: A type of data, a member of the Literal class.
• RDFS:XMLLiteral: A datatype for holding XML data.
• RDFS:Property: This is the base class for all properties (that is, verbs).
Some RDFS Properties
• subClassOf: Indicates the subject is a subclass of the object in a statement.
• subPropertyOf: The subject is a subProperty of the property (masquerading as an object).
• Domain: Restricts a property to only apply to certain classes of subjects.
• Range: Restricts the values of a property to be members of an indicated class or one of its subclasses.
• type: Denotes an instance of a particular class. Actually from RDF, not RDFS.
Sample RDFS: Defining <Property>
<rdfs:Class rdf:ID="Property">
  <rdfs:isDefinedBy rdf:resource="http://.../some/uri"/>
  <rdfs:label>Property</rdfs:label>
  <rdfs:comment>The class of RDF properties.</rdfs:comment>
  <rdfs:subClassOf rdf:resource="http://.../#Resource"/>
</rdfs:Class>
This is the definition of <Property>, taken from the RDF schema. The “ID” attribute names this nugget. <Property> has several properties
• <label> and <comment> are self-explanatory.
• <subClassOf> means <Property> is a subclass of <Resource>.
• <isDefinedBy> points to the human-readable documentation.
Property Relationships and Simple Reasoning subClassOf: • Carole is a member of the class <Professor> • <Professor> is a subclass of <UniversityEmployee> • So Carole works for a university. subPropertyOf: • Marlon hasSibling Susan • hasSibling is a subproperty of hasRelative • So Marlon and Susan are related. Domain and Range: • hasSibling applies to animal subjects and animal objects, so Marlon is a member of the class <Animal>.
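The three kinds of reasoning above can be sketched as a small rule loop in Python. This is only an illustration under simplifying assumptions: short names replace URIs, infer is a hypothetical helper, and the four rules are a hand-picked subset rather than the full RDFS semantics.

```python
SCHEMA_TERMS = {"type", "subClassOf", "subPropertyOf", "domain", "range"}


def infer(triples):
    """Apply four RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "type":
                # A member of a class is also a member of its superclasses.
                new |= {(s, "type", parent)
                        for child, q, parent in facts
                        if q == "subClassOf" and child == o}
            elif p not in SCHEMA_TERMS:
                for prop, q, target in facts:
                    if prop != p:
                        continue
                    if q == "subPropertyOf":
                        new.add((s, target, o))       # inherited property
                    elif q == "domain":
                        new.add((s, "type", target))  # subject's class
                    elif q == "range":
                        new.add((o, "type", target))  # object's class
        if new <= facts:
            return facts
        facts |= new


asserted = [
    ("Carole", "type", "Professor"),
    ("Professor", "subClassOf", "UniversityEmployee"),
    ("Marlon", "hasSibling", "Susan"),
    ("hasSibling", "subPropertyOf", "hasRelative"),
    ("hasSibling", "domain", "Animal"),
    ("hasSibling", "range", "Animal"),
]

# Print only the statements that were inferred rather than asserted.
for fact in sorted(infer(asserted) - set(asserted)):
    print(fact)
```

None of the printed statements was asserted directly. Each follows from the schema-level statements about classes and properties.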
Web Ontology Language (OWL) Eeyore: W-O-L. That spells owl. Owl: Bless my soul! So it does! (Many Slides Courtesy of Sean Bechhofer)
What’s an Ontology? English definitions tend to be vague to non-specialists • “A formal, explicit specification of a shared conceptualization” Clearer definition: an ontology is a taxonomy combined with inference rules • T. Berners-Lee, J. Hendler, O. Lassila But really, if you sit down to describe a subject in terms of its classes and their relationships, you are creating an ontology.
RDFS Limitations RDFS too weak to describe resources in sufficient detail • No localised range and domain constraints Can’t say that the range of hasChild is person when applied to persons and elephant when applied to elephants • No existence/cardinality constraints Can’t say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents • No transitive, inverse or symmetrical properties Can’t say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical Difficult to provide reasoning support • No “native” reasoners for non-standard semantics • May be possible to reason via FO axiomatisation
OWL Semantic Layering
Three language “layers”:
• OWL Lite: A subset of OWL useful for expressing classifications and simple relationships.
• OWL DL (Description Logic): Contains all OWL constructions but with limitations that guarantee computational completeness and decidability.
• OWL Full: All OWL constructs with no restrictions but no guaranteed processability.
Syntactic and semantic layering:
• Layers should agree on semantics.
• All legal Lite ontologies are legal DL ontologies.
• All legal DL ontologies are legal Full ontologies.
[Figure: nested layers, with Lite inside DL inside Full]
OWL Lite Synopsis Built on RDFS, with usual RDFS classes (see previous table in these slides). • Includes a special class, <Thing>, that is the superclass of all OWL classes. • Built in class <Nothing> that is the most specific class (has no instances or subclasses). • Built-in class <Individual> for instances of classes. In OWL, properties may apply to either individuals or to all members of a class. So <worksForIU> applies to Marlon but not Dave. Expresses concepts such as equivalent classes, synonymous properties. Allows you to assert that properties can be inverse, transitive, and symmetric.
Some OWL DL and OWL Full Extensions Class Axioms: • oneOf: a class can be defined by its members, that is, an enumeration class (ex: daysOfWeek defined by its members) • disjointWith More Boolean Relationships: • unionOf, complementOf, intersectionOf Unrestricted cardinality • Ex: daysOfWeek has a cardinality of 7
Differences Between DL and Full Both DL and Full use the same OWL vocabulary • See previous slide. Difference #1: DL classes and properties cannot also be individuals (instances), and vice versa. • That is, there is a strict separation between type and subClassOf. • So if you make <Merlot> an instance (<rdf:type>) of <Wine>, you can’t subclass <Merlot> to add additional properties in OWL DL. • “subClass versus instance” decisions should be made based on the intended use of the ontology. Don’t make Merlot an instance if you are developing an ontology to describe your wine collection, which consists of many bottles of Merlot (instances), and you want to use OWL DL. Difference #2: All DL properties are required to be either • owl:ObjectProperty: used to connect instances of two classes. • owl:DatatypeProperty: used to connect class instances with XML schema types and RDF literal strings. • (OWL Full allows us to tag DatatypeProperties as owl:InverseFunctionalProperty, so we can create a string literal instance that uniquely identifies a class instance.)
An OWL Example An Earth Systems Grid example (Courtesy of Line Pouchard)
An Example Ontology: Climate Data The example shows how to construct a really simple ontology and instance. We don’t use it to encode all data but rather to encode metadata about data files. • Where is the data file (URI) that has the temperature associated with this dataset? Two classes: • dataset • Parameter One property: • hasParameter Several parameters: cloud_medium, bounds_latitude, temperature Line Pouchard (ORNL) created this for ESG using Protégé and OilEd.
Let’s Begin Front matter: OWL ontologies begin with the <Ontology> header. • A useful place to put metadata about the document. • Line uses the Dublin Core to establish authorship. Next, define two classes: dataset and parameter. • Class definitions are almost trivial. • We really state what something is by its properties. Deep philosophical arguments here, I’m sure. Most of the work will go into defining the property, hasParameter. • Begins on the bottom of the next slide • But the full extent of the definition requires a separate slide.
[Figure: OWL listing with three annotated parts: the ontology header with Dublin Core parameters, the class definitions, and the start of the hasParameter definition]
Defining hasParameter hasParameter domain: it applies to the dataset class. hasParameter range: it applies to a list of 3 OWL Things • cloud_medium, bounds_latitude, and temperature. • This is done using the awkward RDF list structure. “Give me the first of the rest recursively until I get to nil” These three OWL Things are then defined. • They are each of type “parameter”, that is, members of the parameter class. • Each may also be further defined by additional properties and classes. Temperature has units, for example, and bounds_latitude needs starting and stopping values in decimal degrees, etc. • Or it may be out of scope. I may just need to know that the bounds_latitude for a particular dataset is located in some resource with a specific URI.
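The “first of the rest until nil” structure is easier to see in code. Below is a hedged Python sketch that stores the three-parameter list as rdf:first/rdf:rest triples and walks it until it reaches rdf:nil. The blank-node labels (_:l1 and so on), the fragment names, and the read_list helper are assumptions for illustration, and the walk is written as a loop rather than a recursion.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

# Each list cell has one rdf:first (the item) and one rdf:rest (the next cell).
# The "_:l1" style cell names are hypothetical blank-node labels.
list_triples = [
    ("_:l1", FIRST, "#cloud_medium"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "#bounds_latitude"),
    ("_:l2", REST, "_:l3"),
    ("_:l3", FIRST, "#temperature"),
    ("_:l3", REST, NIL),
]


def read_list(head, triples):
    """Follow rdf:rest links from head, collecting each rdf:first, until rdf:nil."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != NIL:
        items.append(lookup[(node, FIRST)])
        node = lookup[(node, REST)]
    return items


print(read_list("_:l1", list_triples))
```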
[Figure: OWL listing with three annotated definitions: parameter cloud_medium, parameter bounds_latitude, and parameter temperature]
Finally, Apply It to Something What is the file PCM.B06.10.dataset1? • It’s a member of the dataset class, which we have defined. What properties does it have? • bounds_latitude and cloud_medium, as all such members do. Where can I get the bounds_latitude for this data set? • It’s in the file indicated by the rdf:resource.
OWL Enriched RDF Metadata about PCM.B06.10.dataset1
Is It Lite, DL, or Full? Our ontology example is (at least) DL because we include the oneOf property.
OWL Equivalence and Inheritance
<owl:Class rdf:ID="user">
  <owl:equivalentClass rdf:resource="person"/>
</owl:Class>

<owl:Class rdf:about="#magneticSpectrometer">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMagnets"/>
      <owl:allValuesFrom rdf:resource="#Spectrometer"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
Other logical relationships that can be asserted:
• inverseOf
• TransitiveProperty
• SymmetricProperty
• FunctionalProperty
• InverseFunctionalProperty
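To show what asserting inverse, transitive, and symmetric properties buys you, here is a minimal Python sketch that expands a set of facts using those three kinds of axiom. It borrows the isPartOf, hasPart, and touches properties mentioned earlier in the deck; the individuals (magnet, spectrometer, lab, coil) and the apply_property_axioms helper are hypothetical, and a real OWL reasoner does far more than this.

```python
def apply_property_axioms(facts, symmetric=(), transitive=(), inverse_of=None):
    """Expand facts using symmetric, transitive, and inverse property axioms."""
    inverse_of = inverse_of or {}
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                # Chain s -> o -> o2 into s -> o2 for the same property.
                new |= {(s, p, o2) for s2, p2, o2 in known
                        if p2 == p and s2 == o}
        if new <= known:
            return known
        known |= new


# Hypothetical individuals used only for illustration.
asserted_facts = [
    ("magnet", "isPartOf", "spectrometer"),
    ("spectrometer", "isPartOf", "lab"),
    ("magnet", "touches", "coil"),
]

expanded = apply_property_axioms(
    asserted_facts,
    symmetric={"touches"},
    transitive={"isPartOf"},
    inverse_of={"isPartOf": "hasPart", "hasPart": "isPartOf"},
)

for fact in sorted(expanded - set(asserted_facts)):
    print(fact)
```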
Querying Semantic Data The Data Access Working Group (DAWG)
What Is Semantic Querying? Don’t confuse querying with inference. Querying just means retrieving data from Semantic data models. • Post a query to the world of distributed RDF data nuggets. For RDF-like structures, this amounts to querying triples Examples • Finding an Email address from a person’s vCard. • Searching across subgraphs: get me the email of the author of this document (Dublin Core + vCard). • Persistent/scheduled queries on updates to several multimedia databases.
The DAWG Working Group Unfortunately, there are no standards for querying RDF, etc. • There are solutions, like RDQL/SquishQL • These are just not “official” The W3C Data Access Working Group (DAWG) is filling the query gap. • Formed Feb 2004. This is a work in progress: • Use Cases and Requirements: http://www.w3.org/TR/rdf-dawg-uc/ • BRQL Query Language: http://www.w3.org/2001/sw/DataAccess/rq23/
A Simple Query Consider the following RDF triple • <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "BRQL Tutorial" • Recall this is equivalent to the sentence “book1 [has] title ‘BRQL Tutorial’” • We may have a large set of such triples in our data store. We want to make a query on this data like this: “What is the title of book1?”
The Query and the Results We can construct queries on any of the parts of the triple, such as
SELECT ?title
WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
This just means “what is the title of book1?”
Result: ?title = "BRQL Tutorial"
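Conceptually, the query is a triple pattern in which any part may be a variable. The Python sketch below shows the idea with a hypothetical match helper over an in-memory list of triples. It is not how a query engine is implemented, and the second book is invented data added so the store has something to filter out.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# A tiny in-memory triple store; book2 is invented for the example.
store = [
    ("http://example.org/book/book1", DC_TITLE, "BRQL Tutorial"),
    ("http://example.org/book/book2", DC_TITLE, "Another Title"),
]


def match(pattern, triples):
    """Return one {variable: value} binding for each triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A variable binds to whatever it meets, but a repeated
                # variable must meet the same value each time.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(binding)
    return results


# "What is the title of book1?"
print(match(("http://example.org/book/book1", DC_TITLE, "?title"), store))
```

The same function answers the reverse question (which book has a given title?) by moving the variable to the subject position.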
So What? This was a trivial example in which we posed a query on the triple’s object, which was a string. But the object of the triple may be a URI (an RDF resource), not just a literal. • Or we may construct queries against subjects or verbs of triples. For complicated graphs, this means that the query returns a “pointer” to another section of the graph. This means that we can make linked queries that allow us to navigate graphs.
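A linked query of this kind can be sketched by chaining two triple patterns that share a variable, here to find the email of the author of an entry (Dublin Core plus vCard). The query helper, the prefixed names such as dc:creator, and the example.org URIs are assumptions made for illustration.

```python
# Hypothetical data; prefixed names stand in for full property URIs.
ENTRY = "http://example.org/CMCS/Entry/1"
AUTHOR = "http://example.org/People/DrY"
graph = [
    (ENTRY, "dc:title", "H2O"),
    (ENTRY, "dc:creator", AUTHOR),
    (AUTHOR, "vcard:EMAIL", "dry@stateu.edu"),
]


def query(patterns, triples, binding=None):
    """Yield the variable bindings that satisfy every pattern in turn."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for want, got in zip(first, triple):
            if want.startswith("?"):
                # Variables bound by an earlier pattern must agree here.
                if candidate.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            yield from query(rest, triples, candidate)


# "Get me the email of the author of this entry."
answers = list(query(
    [(ENTRY, "dc:creator", "?author"),
     ("?author", "vcard:EMAIL", "?email")],
    graph,
))
print(answers)
```

The first pattern returns a “pointer” (the author's URI), and the second pattern follows it into the vCard part of the graph.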