RDF Query Languages Flavius Frasincar flaviusf@win.tue.nl ISA
Contents
• Why RDF Query Languages?
• RDF Features (Recap)
• RDF Query Language Requirements
• RDF Query Languages
• RQL (RDF Query Language):
  • Select: variables
  • From: path expressions
  • Where: condition
• Summary
Why RDF QLs?
• RDF is the standard representation language for Web metadata (foundation of the Semantic Web)
• RDF is already used in:
  • Large description schemas: ODP (Open Directory Project), a web site classification with 385,965 topics; UNSPSC (United Nations Standard Products and Services Code), a product classification with 16,506 classes
  • Large description bases: ODP classifies 3,339,355 sites
• RDF QLs are needed in order to access data from (large) RDF representations
RDF Primitive
Semantics: Subject Predicate Object (one statement)
Three alternative notations:
• Graph: http://example.com/sb.jpg --painted_by--> "Rembrandt"
• Triple: (http://example.com/sb.jpg, painted_by, "Rembrandt")
• RDF/XML:
    <rdf:Description rdf:about="http://example.com/sb.jpg">
      <painted_by>Rembrandt</painted_by>
    </rdf:Description>
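The triple notation maps directly onto a three-element tuple. A minimal Python sketch, assuming plain strings for URIs and literals; the `Triple` type and the `statement` and `model` names are illustrative and not part of RDF or of any RDF library:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (illustrative type)."""
    subject: str
    predicate: str
    object: str


# The statement from the slide, written as a triple.
statement = Triple("http://example.com/sb.jpg", "painted_by", "Rembrandt")

# Under this simplification, an RDF model is a set of such statements.
model = {statement}
```

Real RDF stores distinguish resources from literals; the sketch ignores that distinction.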
RDF Features
• RDF:
  • Data Model: Directed Labeled Graph
  • Nodes: Resources (with or without URIs) or Literals
  • Edges: Properties (attributes or relationships)
  • Labels: Nodes (URI) or Edges (Property URI)
• RDF Schema:
  • Multiple classification of resources
  • Specialization of both classes/properties (simple and multiple)
  • Unordered, optional, and multivalued properties
  • Domain and range polymorphism of properties
RDF vs. XML
• Different Data Models:
  • RDF data model: a directed graph with labels on both edges and nodes
  • XML data model: a tree with labels on edges or nodes
• Different Semantics:
  • RDF is able to model complex semantic relations (e.g. class/property hierarchies based on specialization)
  • XML has only one type of semantics, inclusion semantics (an element contains another element)
• RDF has an XML syntax (RDF/XML), but XML QLs do not support RDF semantics: we need an RDF QL
Requirements for an RDF QL
• Understand the RDF Data Model (RDF graph or RDF triples)
• Path expressions can use labels from both nodes and edges
• Compose queries: the output of one query can be used as input for the next query
• Declarative: not bound to any implementation (closer to human language!)
• Support RDF Schema
RDF Query Languages
• Triple-based: querying the structure
  • RDQL
  • Triple [successor of SiLRI] (Horn logic)
  e.g. Find statements whose subject is … and object is …
• XML-based: querying the syntax
  • RDF Query
  • RQuery (XQuery)
  e.g. Find description elements whose attribute value contains …
• Graph-based (but not graphical): querying the semantics
  • RQL (OQL)
  e.g. Find resources classified under … whose property value is …
RDF Query Language (RQL)
• Declarative query language for RDF
• Language proposal (not yet a standard)
• Based on the RDF-graph representation
• Supports RDF Schema (only a few of the existing RDF QLs do)
• References (with small differences between them):
  • RQL from ICS-FORTH (Greece) (http://139.91.183.30:9090/RDF/RQL/)
  • Sesame from Aidministrator (Holland) (http://sesame.aidministrator.nl/)
• The rest of the presentation refers to the Sesame implementation
RQL Input
• The input to an RQL query is a complete RDF model, i.e. a model that contains its RDFS-closure (defined in RDF Semantics)
• Note that the RDFS-closure includes the RDF-closure
  • [RDF-closure] e.g. rdf1: if (xxx aaa yyy) then add (aaa rdf:type rdf:Property)
  • [RDFS-closure] e.g. rdfs9: if (xxx rdfs:subClassOf yyy) and (aaa rdf:type xxx) then add (aaa rdf:type yyy)
• Operator variants (marked with an appended ^) discard this inferred data (intensional data) and consider only the statements given in the RDF model (extensional data)
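The two entailment rules quoted above can be applied repeatedly until nothing new is produced. The following Python sketch does that for rdf1 and rdfs9 only; the full RDFS-closure uses more rules, and the `closure` function, the prefixed-name strings, and the `ex:rembrandt` resource are assumptions made for illustration:

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def closure(triples):
    """Apply rules rdf1 and rdfs9 until no new triple can be added."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            # rdf1: every predicate in use is an rdf:Property.
            new.add((p, RDF_TYPE, RDF_PROPERTY))
            # rdfs9: an instance of a subclass is an instance of the superclass.
            if p == RDFS_SUBCLASSOF:
                for (s2, p2, o2) in inferred:
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred
        inferred |= new


# Extensional data: the statements actually given (hypothetical example).
given = {
    ("cult:Painter", RDFS_SUBCLASSOF, "cult:Artist"),
    ("ex:rembrandt", RDF_TYPE, "cult:Painter"),
}

closed = closure(given)
# Intensional data: what the rules added on top of the given statements.
intensional = closed - given
```

In these terms, an operator with ^ would work on `given`, while the plain operator would work on `closed`.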
Example: Web Resources
• &r1: http://www.european-history.com/rembrandt.html
• &r2: http://www.artchive.com/rembrandt/abraham.jpg
Select-From-Where
    select X, Y
    from {X}cult:paints{Y}, {X}cult:first_name{Xfname}
    where Xfname like "Rembrandt"
    using namespace cult=http://www.icom.com/schema.rdf#
• select: list of variables
• from: list of path expressions
• where: condition (optional)
• Variables stand for graph labels
• Path expressions and conditions use variables and constants
• The RQL result is a table of tuples (a relation) with one column per variable and one row per binding of values to those variables
RQL Result
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Bag rdf:ID="query_result">
        <rdf:li>
          <rdf:Seq>
            <rdf:li rdf:resource="http://www.european-history.com/rembrandt.html"/>
            <rdf:li rdf:resource="http://www.artchive.com/rembrandt/artist_at_his_easel.jpg"/>
          </rdf:Seq>
        </rdf:li>
        <rdf:li>…abraham.jpg …</rdf:li>
      </rdf:Bag>
    </rdf:RDF>
Why Is the RQL Result a Bag?
    select X
    from {X}cult:paints{Y}, {X}cult:first_name{Xfname}
    where Xfname like "Rembrandt"
    using namespace cult=http://www.icom.com/schema.rdf#
• If only one variable is returned, there might be multiple bindings of this variable with the same value, so the result has to be a Bag rather than a set
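The duplicate bindings can be reproduced on toy data. In this Python sketch the two lists stand in for the cult:paints and cult:first_name properties; the `ex:` resources are hypothetical, and the exact match on "Rembrandt" stands in for the `like` condition:

```python
from collections import Counter

# Hypothetical toy data: one painter with two paintings.
paints = [
    ("ex:rembrandt", "ex:abraham.jpg"),
    ("ex:rembrandt", "ex:artist_at_his_easel.jpg"),
]
first_name = [("ex:rembrandt", "Rembrandt")]

# Counterpart of:
#   select X from {X}cult:paints{Y}, {X}cult:first_name{Xfname}
#   where Xfname like "Rembrandt"
result = [
    x
    for (x, y) in paints
    for (x2, fname) in first_name
    if x == x2 and fname == "Rembrandt"
]

# Y is projected away, so the same X appears once per painting.
bag = Counter(result)
```

A set would collapse the two rows into one and lose the fact that there were two matching paths.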
Namespaces
• All the labels for nodes and edges are associated with a certain namespace
    using namespace cult=http://www.icom.com/schema.rdf# ,
                    adm=http://www.oclc.org/schema.rdf#
• cult contains information intended for museum specialists (e.g. artists, artifacts, museum descriptions)
• adm contains information for portal administrators (e.g. title, file_size, mime-type of a certain external resource)
• (Web) Resources are orthogonally classified using the two schemas above
Select: Variables
• There are three kinds of variables:
  • Instance: e.g. X
  • Class: e.g. $C
  • Property: e.g. @P
• “Find all resources together with their associated classes, properties, and property values”:
    select X, $C, @P, Y
    from {X : $C}@P{Y}
  is equivalent to
    select *
    from {X : $C}@P{Y}
  (* = all variables)
• “A resource X has type C” has two syntaxes: X : C (not standalone) or C{X} (a path expression that limits a node)
From: Path Expressions
• Path expressions specify a linear path through the RDF data model
• Each variable used in a path expression is bound to labels from the model
• “Find all painters and their associated paintings”
    select Painter, Painting
    from {Painter}cult:paints{Painting}
    using namespace cult=http://www.icom.com/schema.rdf#
The ‘.’ in Path Expressions
• Path expressions can be arbitrarily long
• The ‘.’ is used to specify a join condition between the object and the subject of two consecutive properties
    select Painter, Painting, Technique
    from {Painter}cult:paints{Painting}.cult:technique{Technique}
    using namespace cult=http://www.icom.com/schema.rdf#
• In the above example Painting is the object of cult:paints and the subject of cult:technique
• If Painting is not of interest, it can be omitted:
    from {Painter}cult:paints.cult:technique{Technique}
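The join expressed by ‘.’ can be sketched in Python as a join between two lists of (subject, object) pairs. The `join_path` helper and the `ex:` resources are hypothetical; only the property names and the "oil on canvas" value come from the slides:

```python
# Hypothetical extensions of the two properties as (subject, object) pairs.
paints = [
    ("ex:rembrandt", "ex:abraham.jpg"),
    ("ex:rubens", "ex:samson.jpg"),
]
technique = [("ex:abraham.jpg", "oil on canvas")]


def join_path(first, second):
    """Join two properties where the object of the first is the subject of the second."""
    return [
        (s, mid, o)
        for (s, mid) in first
        for (mid2, o) in second
        if mid == mid2
    ]


# Counterpart of {Painter}cult:paints{Painting}.cult:technique{Technique}
rows = join_path(paints, technique)

# Dropping the middle column corresponds to omitting {Painting} from the path.
short_rows = [(painter, tech) for (painter, _painting, tech) in rows]
```

The painting without a recorded technique drops out, since a path expression only binds variables along complete paths.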
List of Path Expressions
• Since path expressions are linear, it is not possible to express two paths with the same origin in one path expression
• Use a list of path expressions that share variables instead:
    select Painter, Painting, Painter_lname
    from {Painter}cult:paints{Painting},
         {Painter}cult:last_name{Painter_lname}
    using namespace cult=http://www.icom.com/schema.rdf#
Class of a Resource
Q1 (better):
    select Painter, $Painter, Painting
    from {Painter : $Painter}cult:paints{Painting}
    using namespace cult=http://www.icom.com/schema.rdf#
Q2:
    select Painter, Painter_type, Painting
    from {Painter}rdf:type{Painter_type},
         {Painter}cult:paints{Painting}
    using namespace rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns# ,
                    cult = http://www.icom.com/schema.rdf#
• Q1 returns the most specific type (class) of a resource, while Q2 returns all types of this resource
Class Restriction for Resources
Q1:
    select Painter
    from {Painter : cult:Flemish}cult:paints{Painting}
    using namespace cult=http://www.icom.com/schema.rdf#
• Note that cult:Flemish must be part of the domain of cult:paints, otherwise the query returns 0 results
Q2 (better):
    select Painter
    from cult:Flemish{Painter}
    using namespace cult=http://www.icom.com/schema.rdf#
• Q1 returns a Flemish painter once for each of their paintings, so a painter with more than one painting appears multiple times; Q2 returns each painter only once
Domain and Range
Q1 (better):
    select $Domain, $Range
    from {:$Domain}cult:has_style{:$Range}
    using namespace cult=http://www.icom.com/schema.rdf#
Q2:
    select domain(@P), @P, range(@P)
    from {}@P{}
    where @P = cult:has_style
    using namespace cult=http://www.icom.com/schema.rdf#
• Q1 returns data from the schema with its RDFS-closure, while Q2 returns only the data present in the schema without the RDFS-closure (both are independent of the model instance)
Where: Condition
• The where clause is optional
• The condition constrains the values of variables bound in the from clause. It uses two kinds of operators:
  • Comparison: <, <=, =, >, >=, !=, like (with *) [lexical], in [set]
  • Logical: and, or, not
• The first 5 comparison operators are overloaded: they apply to sets or to single values (classes, properties, reals, integers, and literals/resources), using set comparison or the matching single-value comparison (subClassOf, subPropertyOf, real comparison, integer comparison, and lexical comparison)
Comparison Operators
• “Select all artists, their type, and their first name that have a painting resource containing the string ‘abraham’”
    select Artist, $Artist, ArtistFName
    from {Artist : $Artist} cult:first_name {ArtistFName}
    where Artist in
          select Painter
          from {Painter} cult:paints {Painting}
          where Painting like "*abraham*"
    using namespace cult = http://www.icom.com/schema.rdf#
Logical Operators
• “Select all painters with a first name that starts with R and all sculptors with a first name that does not start with M”
    select Artist, ArtistFName
    from {Artist : $Artist} cult:first_name {ArtistFName}
    where ($Artist <= cult:Painter and ArtistFName like "R*") or
          ($Artist <= cult:Sculptor and not (ArtistFName like "M*"))
    using namespace cult = http://www.icom.com/schema.rdf#
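The condition above combines a class comparison (`<=` read as "is the same class as or a subclass of") with `like` patterns. The following Python sketch evaluates the same condition on toy data. The class hierarchy, the `ex:` resources, and both helper functions are assumptions; `fnmatchcase` is only an approximation of `like`, since it is case-sensitive and also treats `?` and `[...]` as wildcards, which RQL may not:

```python
from fnmatch import fnmatchcase

# Hypothetical schema: class -> direct superclass.
SUBCLASS_OF = {
    "cult:Painter": "cult:Artist",
    "cult:Sculptor": "cult:Artist",
    "cult:Flemish": "cult:Painter",
}


def is_subclass_or_equal(cls, ancestor):
    """Counterpart of `cls <= ancestor` on classes: walk up the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def like(value, pattern):
    """Approximate `like` with shell-style matching, where * matches any string."""
    return fnmatchcase(value, pattern)


# Hypothetical bindings of (Artist, $Artist, ArtistFName).
artists = [
    ("ex:rembrandt", "cult:Painter", "Rembrandt"),
    ("ex:rubens", "cult:Flemish", "Peter"),
    ("ex:rodin", "cult:Sculptor", "Auguste"),
    ("ex:michelangelo", "cult:Sculptor", "Michelangelo"),
]

selected = [
    (artist, fname)
    for (artist, cls, fname) in artists
    if (is_subclass_or_equal(cls, "cult:Painter") and like(fname, "R*"))
    or (is_subclass_or_equal(cls, "cult:Sculptor") and not like(fname, "M*"))
]
```

On this data the painter named Rembrandt and the sculptor named Auguste are selected; the other two fail their respective name conditions.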
Standard Functions
• Standard functions are used to retrieve standard RDFS relationships
• We have already seen domain() and range()
• Other examples: Class, Property, subClassOf(), subPropertyOf(), typeOf(), etc.
• The standard functions can also be used as standalone queries:
    Class
    subClassOf( http://www.icom.com/schema.rdf#Artist )
    typeOf( http://www.european-history.com/rembrandt.html )
    etc.
Strict Interpretation with ‘^’
• “Retrieve the direct subclasses of Artist”
    subClassOf^( http://www.icom.com/schema.rdf#Artist )
• “Retrieve all subclasses of Artist”
    subClassOf( http://www.icom.com/schema.rdf#Artist )
• “Retrieve the most specific classes to which the resource http://www.european-history.com/rembrandt.html belongs”
    typeOf^( http://www.european-history.com/rembrandt.html )
• “Retrieve the classes to which the resource http://www.european-history.com/rembrandt.html belongs”
    typeOf( http://www.european-history.com/rembrandt.html )
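The difference between the strict and the plain form of subClassOf can be sketched in Python. The class hierarchy is hypothetical, both function names are illustrative, and the sketch assumes the schema states only direct subclass links, so that the stated links and the direct links coincide:

```python
# Hypothetical given rdfs:subClassOf statements as (subclass, superclass) pairs.
SUBCLASS_STATEMENTS = {
    ("cult:Painter", "cult:Artist"),
    ("cult:Sculptor", "cult:Artist"),
    ("cult:Flemish", "cult:Painter"),
}


def direct_subclasses(cls):
    """Counterpart of subClassOf^(cls): only the stated subclasses."""
    return {sub for (sub, sup) in SUBCLASS_STATEMENTS if sup == cls}


def all_subclasses(cls):
    """Counterpart of subClassOf(cls): follow subclass links transitively."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub in direct_subclasses(current):
            if sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


direct = direct_subclasses("cult:Artist")
everything = all_subclasses("cult:Artist")
```

Here `direct` holds the two stated subclasses of Artist, while `everything` also includes the class reached through Painter.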
Standalone Queries
• The standard functions: Class, subClassOf, Property, subPropertyOf, etc.
• Any class (resource of type rdf:Class): returns the extension (resources) of this class
    http://www.icom.com/schema.rdf#Artist
• Any property (resource of type rdf:Property): returns the extension (subject-object pairs) of this property
    http://www.icom.com/schema.rdf#creates
Set Operations
• The query results can be combined using the following operators: union, intersect, and minus
• “Retrieve the first name and the last name of all painters”
    (select PainterR, PainterLName, PainterFName
     from cult:Painter{PainterR}.cult:last_name{PainterLName},
          {PainterR}cult:first_name{PainterFName})
    union
    (select PainterR, PainterLName, NULL
     from cult:Painter{PainterR}.cult:last_name{PainterLName}
     where not (PainterR in select PainterR from {PainterR}cult:first_name))
    using namespace cult = http://www.icom.com/schema.rdf#
• Note that not all painters have a first name in the input model, so the union emulates an outer join
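The outer-join pattern above can be mirrored in Python: one comprehension per select, then a union of the two results. The data is hypothetical, `None` plays the role of NULL, and list concatenation is used for the union on the assumption that duplicates are kept:

```python
# Hypothetical extensions of cult:last_name and cult:first_name for painters.
last_name = [("ex:rembrandt", "van Rijn"), ("ex:vermeer", "Vermeer")]
first_name = [("ex:rembrandt", "Rembrandt")]

# First select: painters that have both a last name and a first name.
with_first = [
    (p, lname, fname)
    for (p, lname) in last_name
    for (p2, fname) in first_name
    if p == p2
]

# Second select: painters with a last name but no first name.
has_first = {p for (p, _fname) in first_name}
without_first = [
    (p, lname, None) for (p, lname) in last_name if p not in has_first
]

# union of the two results
result = with_first + without_first
```

Without the second branch, the painter lacking a first name would be missing from the result entirely.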
Summary
• There is a need for RDF query languages (XML query languages cannot handle RDF semantics)
• RQL: declarative query language for uniformly querying RDF schemas and RDF descriptions
  • Select: list of variables (variables to be returned)
  • From: list of path expressions (variables are bound)
  • Where: condition (constrains the value of variables)
• Compositional (in and set operations)
• Very expressive
• Well-defined semantics; the syntax can be improved …
• … but not yet a standard!
Appendix
• Try your own queries at: http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
• The result of the query can be shown as:
  • HTML Table
  • RDF-Bag
  • XML
• Explore the Museum example (with or without inferred statements):
  • Schema (ontology)
  • Instance (data statements)
Exercise 1
• “Find the first name of painters that have paintings using the ‘oil on canvas’ technique and return also these paintings”
    select Painter_fname, Painting
    from {Painter}cult:paints{Painting}.cult:technique{Painting_technique},
         {Painter}cult:first_name{Painter_fname}
    where Painting_technique like "oil on canvas"
    using namespace cult=http://www.icom.com/schema.rdf# ,
                    adm=http://www.oclc.org/schema.rdf#
Exercise 2
• “Find the first name of the painters that have a painting stored in a file with size greater than 5”
    select Painter_fname
    from {Painter}cult:paints{Painting}.adm:file_size{Painting_fsize},
         {Painter}cult:first_name{Painter_fname}
    where Painting_fsize > 15
    using namespace cult=http://www.icom.com/schema.rdf# ,
                    adm=http://www.oclc.org/schema.rdf#
Exercise 3
• “Find the resources which are not of type ExtResource”
• First solution:
    select R
    from rdfs:Resource{R}
    where not (R in select R from adm:ExtResource{R})
    using namespace rdfs=http://www.w3.org/2000/01/rdf-schema# ,
                    adm=http://www.oclc.org/schema.rdf#
Exercise 3 (cont’d)
• Second solution:
    (select R from rdfs:Resource{R})
    minus
    (select R from adm:ExtResource{R})
    using namespace rdfs=http://www.w3.org/2000/01/rdf-schema# ,
                    adm=http://www.oclc.org/schema.rdf#
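Both solutions to Exercise 3 compute the same difference between two class extensions. A Python sketch with hypothetical extensions, modelled as sets so that duplicates play no role:

```python
# Hypothetical extensions of rdfs:Resource and adm:ExtResource.
resources = {"ex:rembrandt", "ex:abraham.jpg", "ex:artist_at_his_easel.jpg"}
ext_resources = {"ex:abraham.jpg", "ex:artist_at_his_easel.jpg"}

# First solution: filter with "not (R in ...)".
first_solution = {r for r in resources if r not in ext_resources}

# Second solution: minus.
second_solution = resources - ext_resources
```

The two forms agree on this data; the `minus` form states the intent more directly.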