RAL: an RDF Algebra • Flavius Frasincar, Geert-Jan Houben, Richard Vdovjak, Peter Barna • WISE 2002
Contents • Introduction • RAL Goals • RAL Data Model • RAL Operators • Conclusion
1. Introduction • Metadata is machine-understandable information about web resources or other things [Source: Tim Berners-Lee, “Metadata Architecture”] • RDF (Resource Description Framework) is the metadata language of the Web • RDF extends the syntactic interoperability of XML to semantic interoperability and is the foundation of the Semantic Web
Semantic Web Architecture “Layer Cake” [Source: Tim Berners-Lee, Director W3C, keynote speech at XML2000 “RDF and the Semantic Web” (Washington DC, 6 Dec. 2000)]
Hera • Hera research project: Web Information Systems (WIS) and web (hypermedia) generation in WIS • WIS use RDF to represent and query application data for: • Semantic integration of data coming from heterogeneous sources • Semantic information presentation • Semantic querying • Huge quantities of data and metadata need to be processed in real time, so optimization is crucial
Hera Methodology/Suite
RDF Representations • Primitive semantics: Subject, Predicate, Object • Three alternative notations: • Triple: (http://example.com/sb.jpg, painted_by, “Rembrandt”) • RDF/XML: <rdf:Description rdf:about="http://example.com/sb.jpg"> <painted_by>Rembrandt</painted_by> </rdf:Description> • Graph (figure: the node http://example.com/sb.jpg with a painted_by edge to the literal “Rembrandt”)
RDF Query Languages • Triple-based: • Triple [successor of SiLRI] (Horn logic) • Metalog (Datalog) • XML-based: • RDF Query • RQuery (XQuery) • Graph-based (but not graphical): • RQL (OQL)
2. RAL Goals • Support the formal specification of RDF query languages • Provide a reference framework to compare different RDF query languages • Consider the result construction phase, presently neglected by RDF query languages, which focus only on extraction • Enable algebraic query optimization
RAL • RAL Data Model: specifies which information in an RDF graph is accessible to the RAL operators • Nodes: Resources and Literals • Edges: Properties • RAL Operators: operators working on collections of nodes from the RAL Data Model • Extraction Operators • Loop Operators • Construction Operators
3. RAL Data Model • R is the set of resources, $R = U \cup B$ • U is the set of URI references, $\text{rdf:Property} \in U$ • B is the set of blank nodes • L is the set of literals • U, B, L are pairwise disjoint • P is the set of properties, $P \subseteq R$, $\text{rdf:type} \in P$ (figure: diagram of the sets R, U, B, L and P)
An RDF model M is a finite set of triples (statements), $M \subseteq R \times U \times (R \cup L)$ • The set of properties of an RDF model M is $P_M = \{p \mid (s, p, o) \in M \lor (p, \text{rdf:type}, \text{rdf:Property}) \in M\}$ • The RDF graph model is similar to a directed labeled graph (DLG) • It is not a DLG, since it allows multiple edges between two nodes • It is not a general multigraph, because different edges between the same two nodes cannot share the same label
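To make the definitions concrete, here is a minimal Python sketch that assumes triples are plain tuples, URI references are strings, and literals are wrapped in a hypothetical `Literal` class so that U and L stay disjoint. The `properties` function reads P_M as the predicates occurring in M together with the resources explicitly typed as rdf:Property; that reading of the definition is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper that keeps literals (L) apart from URI strings (U)."""
    value: str


RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"

# An RDF model M: a finite set of (subject, predicate, object) triples.
M = {
    ("http://example.com/sb.jpg", "painted_by", Literal("Rembrandt")),
    ("exemplified_by", RDF_TYPE, RDF_PROPERTY),  # declared, but never used as a predicate
}


def properties(model):
    """P_M under the assumed reading: predicates used in M plus declared rdf:Property resources."""
    used = {p for (_, p, _) in model}
    declared = {s for (s, p, o) in model if p == RDF_TYPE and o == RDF_PROPERTY}
    return used | declared


print(sorted(properties(M)))  # ['exemplified_by', 'painted_by', 'rdf:type']
```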
The RDF graph model corresponding to an RDF model M is defined by • $G_M = (N, E, l_N, l_E)$, with $l_N: N \to R \cup L$ and $l_E: E \to P$ • using the following construction mechanism: • for each $(s, p, o) \in M$ • add nodes $n_s$, $n_o$ to N (distinct only if $s \neq o$) • assign $l_N(n_s) = s$, $l_N(n_o) = o$ • add $e_p$ to E as a directed edge from $n_s$ to $n_o$ • assign $l_E(e_p) = p$ • Observations: • $l_N(\cdot)$ is an injective partial function • $l_E(\cdot)$ is a total function
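The construction mechanism can be sketched in Python as follows, assuming the same tuple encoding of triples and a hypothetical `Literal` wrapper; `build_graph` and the example resources (`ex:r4`, `ex:p1`) are illustrative names, not part of RAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper keeping literals distinct from URI strings."""
    value: str


def build_graph(model):
    """Construct (N, E, l_N, l_E) from a set of (s, p, o) triples (illustrative sketch)."""
    node_of = {}  # label -> node id (the inverse of l_N)
    l_N = {}      # node id -> resource or literal
    E = []        # directed edges as (edge id, source node, target node)
    l_E = {}      # edge id -> property

    def node(label):
        # Reuse the node if this label was seen before, so l_N stays injective.
        if label not in node_of:
            n = len(node_of)
            node_of[label] = n
            l_N[n] = label
        return node_of[label]

    # Sorting only makes node and edge ids deterministic.
    for s, p, o in sorted(model, key=repr):
        e = len(E)
        E.append((e, node(s), node(o)))
        l_E[e] = p  # every edge gets a label, so l_E is total
    return set(l_N), E, l_N, l_E


# Hypothetical example data.
M = {
    ("ex:r4", "paints", "ex:p1"),
    ("ex:r4", "signed", "ex:p1"),  # a second edge between the same two nodes
    ("ex:p1", "title", Literal("The Night Watch")),
}
N, E, l_N, l_E = build_graph(M)
print(len(N), len(E))  # 3 3
```

Because M is a set, the same triple cannot occur twice, so two edges between the same pair of nodes always carry different labels, which is why the RDF graph model is neither a plain DLG nor a general multigraph.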
Basic Properties • Two non-blank nodes are equal if they have the same id • Two blank nodes are equal if they have the same properties and the corresponding property values are equal
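A recursive Python sketch of the two equality rules. It assumes a hypothetical convention that blank-node ids start with `_:`, reads “the same properties” as the same set of property names, and reads “corresponding property values are equal” as every value having an equal counterpart on the other side. It also assumes blank nodes do not form cycles; on a cyclic structure this naive recursion would not terminate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical wrapper for literal values."""
    value: str


def is_blank(node):
    # Hypothetical convention: blank node ids start with "_:".
    return isinstance(node, str) and node.startswith("_:")


def props(model, node):
    """All (property, value) pairs whose subject is node."""
    return {(p, o) for (s, p, o) in model if s == node}


def nodes_equal(model, a, b):
    """Node equality under the assumed reading; blank nodes must not form cycles."""
    if not (is_blank(a) and is_blank(b)):
        return a == b  # non-blank nodes (or a mixed pair): compare ids
    pa, pb = props(model, a), props(model, b)
    names = {p for p, _ in pa}
    if names != {p for p, _ in pb}:
        return False  # different sets of property names
    for name in names:
        va = [o for p, o in pa if p == name]
        vb = [o for p, o in pb if p == name]
        # Every value on one side must have an equal counterpart on the other.
        if not all(any(nodes_equal(model, x, y) for y in vb) for x in va):
            return False
        if not all(any(nodes_equal(model, x, y) for x in va) for y in vb):
            return False
    return True


# Hypothetical example data.
M = {
    ("_:b1", "name", Literal("Caravaggio")),
    ("_:b2", "name", Literal("Caravaggio")),
    ("_:b3", "name", Literal("Rembrandt")),
}
print(nodes_equal(M, "_:b1", "_:b2"), nodes_equal(M, "_:b1", "_:b3"))  # True False
```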
RDF(S)-Closure • RDF Model Theory defines the RDF-closure and RDFS-closure of an RDF model M through a set of rules that generate new triples • Extensional data: the triples of the original model M • Intensional data: the new triples generated by the RDF(S)-closure • RAL operators work on both extensional and intensional data • Variants of the operators can be defined that ignore the intensional data (similar to the RQL strict interpretation)
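The split between extensional and intensional data can be illustrated with a small fixpoint computation. This sketch applies only two RDFS entailment rules (transitivity of rdfs:subClassOf, and propagation of rdf:type along rdfs:subClassOf), not the full rule set of the RDF Model Theory, and the `ex:` classes are hypothetical.

```python
SUB, TYPE = "rdfs:subClassOf", "rdf:type"


def rdfs_closure(model):
    """Apply two RDFS rules until no new triple appears (a fixpoint); a partial sketch."""
    closure = set(model)
    while True:
        new = set()
        for (a, p, b) in closure:
            for (c, q, d) in closure:
                if q == SUB and b == c:
                    if p == SUB:
                        new.add((a, SUB, d))   # subclass transitivity
                    elif p == TYPE:
                        new.add((a, TYPE, d))  # instances inherit superclasses
        if new <= closure:
            return closure
        closure |= new


# Hypothetical example data.
M = {
    ("ex:Painter", SUB, "ex:Artist"),
    ("ex:Artist", SUB, "ex:Person"),
    ("ex:rembrandt", TYPE, "ex:Painter"),
}
extensional = M
intensional = rdfs_closure(M) - M
print(sorted(intensional))
```

A strict variant of an operator would read only `extensional`, while the default RAL operators would see `extensional | intensional`.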
4. RAL Operators • All operators have the form o[f](x1, x2, …, xn: expression), where an expression is a collection of nodes and f is a function that takes and returns collections of nodes • Extraction Operators: retrieve the needed information from an RDF graph • Loop Operators: control the repeated application of certain operators • Construction Operators: build new RDF graphs from the extracted data
4.1 Extraction Operators Projection π[re_name](e: expression) computes, for the nodes of the input collection given by e, the values of the properties whose names match the regular expression re_name (over strings). Example: π[(P|p)aint[s]#](r4) returns the resources painted by r4
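A minimal Python sketch of Projection, assuming a model stored as a set of string triples. The resources `r4`, `p1`, `p2` are hypothetical, the pattern is written in Python `re` syntax as `(P|p)aints?` (the slide's own regular-expression notation may differ), and requiring a full match on the property name is an assumption.

```python
import re

# Hypothetical example data.
M = {
    ("r4", "paints", "p1"),
    ("r4", "Paints", "p2"),
    ("r4", "name", "Rembrandt"),
    ("r1", "paints", "p3"),
}


def project(re_name, e, model):
    """Values of the properties of the nodes in e whose names fully match re_name."""
    pattern = re.compile(re_name)
    return [o
            for node in e
            for (s, p, o) in sorted(model)  # sorted only for a stable output order
            if s == node and pattern.fullmatch(p)]


print(project(r"(P|p)aints?", ["r4"], M))  # ['p2', 'p1']
```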
Selection σ[condition](e: expression) selects the nodes of the input collection that fulfill the given condition. Example: σ[π[tname] = “Chiaroscuro”](c), where c is the collection of input resources r1, r2, r3, and r4, returns the resources representing the painting technique named “Chiaroscuro”
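Selection can be sketched the same way, modelling the condition as a Python predicate over a node. Reading the condition on tname as “some tname value of the node equals Chiaroscuro” is an assumption, and the resources and their properties are hypothetical example data.

```python
# Hypothetical example data.
M = {
    ("r1", "rdf:type", "Technique"), ("r1", "tname", "Chiaroscuro"),
    ("r2", "rdf:type", "Technique"), ("r2", "tname", "Sfumato"),
    ("r3", "rdf:type", "Painter"), ("r3", "pname", "Caravaggio"),
    ("r4", "rdf:type", "Painter"), ("r4", "pname", "Rembrandt"),
}


def values(node, prop, model):
    """Projection of a single node on a single property name."""
    return {o for (s, p, o) in model if s == node and p == prop}


def select(condition, e, model):
    """Keep the nodes of collection e for which condition(node, model) holds."""
    return [n for n in e if condition(n, model)]


c = ["r1", "r2", "r3", "r4"]
chiaroscuro = select(lambda n, m: "Chiaroscuro" in values(n, "tname", m), c, M)
print(chiaroscuro)  # ['r1']
```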
Cartesian Product • (x: expression) × (y: expression) • for each element in the Cartesian product of the input collections, a blank node that has all properties of both originating nodes is added to the result • Example • σ[π[rdf:type] = Technique](c) × σ[π[rdf:type] = Painter](c) • returns a collection of blank nodes, each having all the properties of the corresponding pair from the Cartesian product (the new nodes have both types, Technique and Painter)
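The Cartesian Product sketch below mints one blank node per pair and copies onto it every property of both originating nodes. The `_:bN` naming scheme, returning the extended model alongside the result collection, and the example data are assumptions made for illustration.

```python
import itertools

# Hypothetical example data.
M = {
    ("r1", "rdf:type", "Technique"), ("r1", "tname", "Chiaroscuro"),
    ("r2", "rdf:type", "Technique"), ("r2", "tname", "Sfumato"),
    ("r3", "rdf:type", "Painter"), ("r3", "pname", "Caravaggio"),
    ("r4", "rdf:type", "Painter"), ("r4", "pname", "Rembrandt"),
}


def values(node, prop, model):
    return {o for (s, p, o) in model if s == node and p == prop}


def cartesian_product(x, y, model):
    """For each pair (a, b), mint a blank node carrying all properties of a and b.

    Returns the result collection and the model extended with the new triples.
    """
    ids = itertools.count()
    result, extended = [], set(model)
    for a, b in itertools.product(x, y):
        blank = f"_:b{next(ids)}"  # hypothetical blank-node naming scheme
        extended.update((blank, p, o) for (s, p, o) in model if s in (a, b))
        result.append(blank)
    return result, extended


c = ["r1", "r2", "r3", "r4"]
techniques = [n for n in c if "Technique" in values(n, "rdf:type", M)]
painters = [n for n in c if "Painter" in values(n, "rdf:type", M)]
pairs, M2 = cartesian_product(techniques, painters, M)
print(len(pairs), sorted(values(pairs[0], "rdf:type", M2)))  # 4 ['Painter', 'Technique']
```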
Join (x: expression) ⋈[condition] (y: expression) ≡ σ[condition](x × y), i.e. a Cartesian product followed by a selection. Example: (x: σ[π[rdf:type] = Technique](c)) ⋈[π[exemplified_by](x) = π[paints](y)] (y: σ[π[rdf:type] = Painter](c)) returns a collection of blank nodes, each having all the properties of the corresponding pair from the Cartesian product that satisfies the given condition
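Join can be sketched as the Cartesian product with the selection condition tested on each pair. The condition compares the exemplified_by values of the technique with the paints values of the painter; treating that equality as “the two value sets share at least one painting” is an assumption, as are the resource names and the `_:jN` blank-node ids.

```python
import itertools

# Hypothetical example data.
M = {
    ("t1", "rdf:type", "Technique"), ("t1", "exemplified_by", "night_watch"),
    ("t2", "rdf:type", "Technique"), ("t2", "exemplified_by", "mona_lisa"),
    ("r3", "rdf:type", "Painter"), ("r3", "paints", "calling_of_matthew"),
    ("r4", "rdf:type", "Painter"), ("r4", "paints", "night_watch"),
}


def values(node, prop, model):
    return {o for (s, p, o) in model if s == node and p == prop}


def join(x, y, condition, model):
    """Selection applied to the Cartesian product: one blank node per qualifying pair.

    Testing the condition before minting a node yields the same pairs as building
    the full product first and selecting afterwards.
    """
    ids = itertools.count()
    result, extended = [], set(model)
    for a, b in itertools.product(x, y):
        if not condition(a, b, model):
            continue
        blank = f"_:j{next(ids)}"  # hypothetical blank-node naming scheme
        extended.update((blank, p, o) for (s, p, o) in model if s in (a, b))
        result.append(blank)
    return result, extended


def shares_painting(a, b, model):
    # Assumed reading of the join condition: the two value sets overlap.
    return bool(values(a, "exemplified_by", model) & values(b, "paints", model))


joined, M2 = join(["t1", "t2"], ["r3", "r4"], shares_painting, M)
print(joined, sorted(values(joined[0], "rdf:type", M2)))  # ['_:j0'] ['Painter', 'Technique']
```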
Union, Difference, Intersection (x: expression) θ (y: expression), where θ ∈ {∪, −, ∩} is defined as in set theory. Example: σ[π[rdf:type] = Technique](c) ∪ σ[π[rdf:type] = Painter](c) returns the collection of resources obtained by combining the two collections (each obtained using a selection)
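Treating collections as Python sets gives the three set-theoretic operators directly. This sketch assumes that collections may be deduplicated and unordered for these operators; the example types, including `Etcher`, are hypothetical.

```python
# Hypothetical example data.
M = {
    ("r1", "rdf:type", "Technique"), ("r2", "rdf:type", "Technique"),
    ("r3", "rdf:type", "Painter"), ("r4", "rdf:type", "Painter"),
    ("r4", "rdf:type", "Etcher"),
}
c = ["r1", "r2", "r3", "r4"]


def select_type(t, e, model):
    """Selection on rdf:type, returning a set so the set operators apply."""
    return {n for n in e if (n, "rdf:type", t) in model}


techniques = select_type("Technique", c, M)
painters = select_type("Painter", c, M)
etchers = select_type("Etcher", c, M)

print(sorted(techniques | painters))  # union: ['r1', 'r2', 'r3', 'r4']
print(sorted(painters - etchers))     # difference: ['r3']
print(sorted(painters & etchers))     # intersection: ['r4']
```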
4.2 Loop Operators Map map[f](e: expression) applies the function f to each element of the input collection and adds the function results to the output collection. Example: map[π[rdfs:subClassOf]](Painting, Painter) computes, via the property rdfs:subClassOf, the parent classes of each class in the collection consisting of Painting and Painter
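A Python sketch of Map, assuming that f is applied to each element as a one-element collection and that the per-element results are concatenated; the class hierarchy is a hypothetical example.

```python
# Hypothetical example data.
M = {
    ("Painting", "rdfs:subClassOf", "Artwork"),
    ("Painter", "rdfs:subClassOf", "Artist"),
    ("Artist", "rdfs:subClassOf", "Person"),
}


def project(prop, e, model):
    """Values of property prop for the nodes in collection e."""
    return [o for n in e for (s, p, o) in sorted(model) if s == n and p == prop]


def rdf_map(f, e):
    """Apply f to each element (as a one-element collection) and concatenate the results."""
    out = []
    for n in e:
        out.extend(f([n]))
    return out


parents = rdf_map(lambda col: project("rdfs:subClassOf", col, M), ["Painting", "Painter"])
print(parents)  # ['Artwork', 'Artist']
```

Map performs a single step: `Person`, the superclass of `Artist`, is not reached.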
Kleene Star *[f](e: expression) applies the function f repeatedly, possibly infinitely many times, starting with the given input collection; at each iteration the function results are added to the input of the next iteration. Example: *[π[rdfs:subClassOf]](Painting) computes the transitive closure of the property rdfs:subClassOf starting from Painting, i.e. Painting and all its superclasses
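Kleene Star can be sketched as a fixpoint loop that keeps feeding the accumulated collection back into f. On a finite graph the iteration can stop as soon as f contributes no new node, which is the stopping rule assumed here; the class hierarchy is hypothetical.

```python
# Hypothetical example data.
M = {
    ("Painting", "rdfs:subClassOf", "Artwork"),
    ("Artwork", "rdfs:subClassOf", "Artifact"),
}


def project(prop, e, model):
    """Values of property prop for the nodes in collection e."""
    return [o for n in e for (s, p, o) in sorted(model) if s == n and p == prop]


def kleene_star(f, e):
    """Feed the growing collection back into f until f adds no new node."""
    current = list(e)
    while True:
        # dict.fromkeys drops duplicate results while keeping their order.
        new = [n for n in dict.fromkeys(f(current)) if n not in current]
        if not new:
            return current  # fixpoint reached
        current.extend(new)


superclasses = kleene_star(lambda col: project("rdfs:subClassOf", col, M), ["Painting"])
print(superclasses)  # ['Painting', 'Artwork', 'Artifact']
```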
4.3 Construction Operators Create Node node[type, id]() adds a new node with the given type and id to the graph (the id is omitted for blank nodes) and returns this node; if a resource is created, an rdf:type edge is added from the resource to the node representing rdfs:Resource. The Create Node operator assigns each created node an internal identifier that is unique within the resulting RDF graph
Example: node[Resource]() and node[Literal, “Caravaggio”]() create a Resource representing a blank node and a Literal representing the string “Caravaggio” (figure: a blank node with an rdf:type edge to rdfs:Resource, next to the literal “Caravaggio”)
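Create Node can be sketched with a small in-memory graph. The `Graph` class, its `(kind, value)` node labels, and the on-demand creation of the rdfs:Resource node are assumptions of this sketch; internal identifiers come from a counter, so each created node gets an id that is unique within the graph.

```python
import itertools


class Graph:
    """Hypothetical in-memory graph: internal ids, node labels and labelled edges."""

    def __init__(self):
        self.labels = {}               # internal id -> (kind, id or literal value)
        self.edges = set()             # (subject id, property, object id)
        self._ids = itertools.count()  # source of unique internal identifiers

    def _resource_class(self):
        """Return the node for rdfs:Resource, creating it on first use."""
        for nid, label in self.labels.items():
            if label == ("Resource", "rdfs:Resource"):
                return nid
        nid = next(self._ids)
        self.labels[nid] = ("Resource", "rdfs:Resource")
        return nid

    def node(self, kind, value=None):
        """Sketch of node[kind, value](); value is None for a blank node."""
        nid = next(self._ids)
        self.labels[nid] = (kind, value)
        if kind == "Resource":
            # Every created resource is typed as rdfs:Resource.
            self.edges.add((nid, "rdf:type", self._resource_class()))
        return nid


g = Graph()
blank = g.node("Resource")
lit = g.node("Literal", "Caravaggio")
print(blank, lit, g.edges)  # 0 2 {(0, 'rdf:type', 1)}
```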
Create Edge edge[name, subject](object: expression) adds edges from the subject node to each node in the object collection and returns the subject node; the edges are labeled with name, the id of a property resource. The Create Node and Create Edge operators abort if the “well-formed RDF(S) graph” conditions (e.g. the object of rdf:type cannot be a literal, literals cannot have properties, etc.) are not met after construction
Example: edge[name, node[Resource]()](node[Literal, “Caravaggio”]()) creates an edge labeled name between the nodes defined in the previous example (figure: the blank node, typed rdfs:Resource, linked by a name edge to the literal “Caravaggio”)
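Create Edge can be sketched as a function that validates before committing, so an aborted call leaves the graph unchanged. It checks only the two example conditions named for Create Edge; the `IllFormedGraph` exception and the `(kind, value)` label encoding are hypothetical.

```python
class IllFormedGraph(Exception):
    """Hypothetical error raised when an edge would break a well-formedness condition."""


def create_edge(labels, edges, name, subject, objects):
    """Sketch of edge[name, subject](objects): add subject -name-> o for each o, return subject.

    labels maps internal ids to (kind, value) with kind "Resource" or "Literal".
    Only two example conditions are checked, before any edge is committed.
    """
    objects = list(objects)  # allow any iterable, consumed more than once below
    if labels[subject][0] == "Literal":
        raise IllFormedGraph("literals cannot have properties")
    if name == "rdf:type" and any(labels[o][0] == "Literal" for o in objects):
        raise IllFormedGraph("the object of rdf:type cannot be a literal")
    edges.update((subject, name, o) for o in objects)
    return subject


# Hypothetical graph: a blank resource, the rdfs:Resource node, and a literal.
labels = {0: ("Resource", None), 1: ("Resource", "rdfs:Resource"), 2: ("Literal", "Caravaggio")}
edges = {(0, "rdf:type", 1)}
create_edge(labels, edges, "name", 0, [2])
try:
    create_edge(labels, edges, "name", 2, [0])
except IllFormedGraph as err:
    print("aborted:", err)  # aborted: literals cannot have properties
```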
5. Conclusion • The RAL algebra is developed from a database perspective and proposes a set of operators similar to their relational algebra counterparts: • Extraction Operators: Projection, Selection, Cartesian Product, Join, Union, Difference, Intersection • Like existing semi-structured query languages, RAL provides powerful repetition operators: • Loop Operators: Map, Kleene Star • Unlike present RDF query languages, RAL supports result construction: • Construction Operators: Create Node, Create Edge
Future Work • Analyze the expressive power of RAL compared to RQL, currently a popular RDF query language (by building a translation scheme from RQL to RAL) • Formally specify the semantics of other RDF query languages in terms of RAL • Compare the expressive power of different RDF query languages using RAL as a reference language • Explore equivalence rules for RAL expressions to be used in query optimization • Develop an RDF query optimization algorithm based on RAL