RDF Aggregate Queries and Views. Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan. Maintenance of RDF Aggregate Views. Introduction of RDF and RDQL RDQL Extension for Aggregate Views Aggregate View Maintenance Algorithms AMX
RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan
Maintenance of RDF Aggregate Views • Introduction of RDF and RDQL • RDQL Extension for Aggregate Views • Aggregate View Maintenance Algorithms AMX • Implementation and Experiments • Related Work
Introduction • Resource Description Framework (RDF) • W3C Recommendation • Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI)) • Triple: (Resource, Property, Value) • (Artist, rdf:type, rdfs:Class) • (Painter, rdf:type, rdfs:Class) • (Painter, rdfs:subClassOf, Artist)
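A minimal sketch of the triple model, assuming plain Python tuples stand in for RDF statements. The `subclasses_of` helper is hypothetical and only illustrates how the three example triples can be read.

```python
# Each RDF statement is a (resource, property, value) triple.
triples = {
    ("Artist", "rdf:type", "rdfs:Class"),
    ("Painter", "rdf:type", "rdfs:Class"),
    ("Painter", "rdfs:subClassOf", "Artist"),
}


def subclasses_of(cls, graph):
    """Return the resources declared as direct subclasses of cls (hypothetical helper)."""
    return {s for (s, p, o) in graph if p == "rdfs:subClassOf" and o == cls}


direct_subclasses = subclasses_of("Artist", triples)
```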
RDF Schema
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xml:base="http://www.auctionschema.com/schema1#">
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Painter"><rdfs:subClassOf rdf:resource="#Artist"/></rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;string"/>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="&xsd;string"/>
  </rdf:Property>
</rdf:RDF>
RDF Instance
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:ns1="http://www.auctionschema.com/schema1#">
  <rdf:Description rdf:about="http://www.artist.net#guyrose">
    <rdf:type rdf:resource="ns1:Painter"/>
    <ns1:fname rdf:datatype="&xsd;string"> Guy </ns1:fname>
  </rdf:Description>
</rdf:RDF>
RDF Graph (figure) • Artist --fname--> String • Painter --subClassOf--> Artist • &r1 --fname--> "Guy" • &r1 = http://www.artist.net#guyrose
RDQL: RDF Query Language SELECT ?highprice WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#> • The triple patterns in the WHERE clause form the graph pattern
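To make the idea of a graph pattern concrete, here is a rough sketch of matching triple patterns against a small in-memory graph. It is a naive backtracking matcher, not RDQL or Jena; the data (`a1`, `w1`, `p1`, ...) is made up, the functions `is_var` and `match` are hypothetical, and the date constraint of the query is left out for brevity.

```python
def is_var(term):
    """Variables are strings starting with '?', as in RDQL."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding under which all pattern triples occur in graph."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, graph, new)


# Hypothetical data: one artist with two works and their high estimates.
graph = [
    ("a1", "ns1:lname", "Rose"),
    ("a1", "ns1:fname", "Guy"),
    ("a1", "ns1:creates", "w1"),
    ("a1", "ns1:creates", "w2"),
    ("w1", "ns1:estimated", "p1"),
    ("w2", "ns1:estimated", "p2"),
    ("p1", "ns1:high", 800000),
    ("p2", "ns1:high", 500000),
]
pattern = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
]
high_prices = sorted(theta["?highprice"] for theta in match(pattern, graph))
```

Each result is one binding of all pattern variables; selecting `?highprice` from the bindings gives the values the query returns.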
RDQL Extension for Aggregates and Views CREATE VIEW AS SELECT max(?highprice) WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#>
Aggregate Query • Aggregate operators, e.g. min, max, sum, count, average • GROUP BY clause • Output can be (i) an RDF instance or (ii) a table of tuples • Advantage of (i): the result can itself be queried further • However, (ii) is more general: a table can take any form, and it can still be output as an RDF instance when its rows are RDF triples.
We extend the syntax of RDQL to allow constants in the SELECT clause, which effectively creates new resources from those constants. • For example, the previous query can be modified as follows: CREATE VIEW AS SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice) WHERE (?artist, <ns1:lname>, "Rose"), (?artist, <ns1:fname>, "Guy"), (?artist, <ns1:creates>, ?artifact), (?artifact, <ns1:estimated>, ?price), (?price, <ns1:high>, ?highprice), (?artifact, <ns1:presented>, ?date) AND 2004-04-01 <= ?date <= 2004-04-30 USING ns1 FOR <http://www.auctionschema.com/schema1#> • The result is a valid RDF statement: (<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)
Aggregate View Maintenance • Relational Approach • Store all triples in one relational table with schema (Resource, Property, Value), OR • Store the resources and values of each property in a separate relational table with schema (Resource, Value) • #self-joins = (#triples in the WHERE clause) - 1 • The large number of delta rules makes relational view maintenance expensive
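The self-join count can be illustrated with a small sketch, assuming a single triple table in SQLite (via Python's standard `sqlite3` module) and made-up data. A WHERE clause with three triple patterns needs two self-joins of the table; this only illustrates the relational encoding and is not the experimental setup of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (resource TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical data: one artist, two works, two high estimates.
        ("a1", "ns1:creates", "w1"),
        ("a1", "ns1:creates", "w2"),
        ("w1", "ns1:estimated", "p1"),
        ("w2", "ns1:estimated", "p2"),
        ("p1", "ns1:high", "800000"),
        ("p2", "ns1:high", "500000"),
    ],
)

# Three triple patterns -> the table is joined with itself twice.
row = conn.execute(
    """
    SELECT MAX(CAST(t3.value AS INTEGER))
    FROM triples t1
    JOIN triples t2 ON t2.resource = t1.value
    JOIN triples t3 ON t3.resource = t2.value
    WHERE t1.property = 'ns1:creates'
      AND t2.property = 'ns1:estimated'
      AND t3.property = 'ns1:high'
    """
).fetchone()
max_high = row[0]
```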
Aggregate View Maintenance • Our Approach • Localized search in RDF graphs • Modified version of breadth-first search starting at the inserted/deleted edge • auxiliary data are needed for certain aggregate views • min, max, avg
Distributive Aggregate Function • An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from the old value and the update alone, without reference to the source. • Examples: count, sum and average w.r.t. insertion, deletion and update • For average, an additional attribute size, which stores the size of the intermediate result S, is needed to compute the correct updated value (alternatively, the average can be calculated from sum and count) • max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update • Auxiliary data computed from S help to avoid the need to refer to the source.
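As an illustration of distributivity, the sketch below maintains sum and count from their old values and the changed tuples only, and derives the average from the pair. The function names `apply_update` and `average` and the sample numbers are assumptions made for this example.

```python
def apply_update(total, count, inserted=(), deleted=()):
    """Update (sum, count) from the old values and the changed tuples only."""
    total += sum(inserted) - sum(deleted)
    count += len(inserted) - len(deleted)
    return total, count


def average(total, count):
    """Derive the average from sum and count; undefined (None) for an empty result."""
    return total / count if count else None


# Old state for the values [800000, 500000]; the source is never re-read.
state = apply_update(1300000, 2, inserted=[60000])
state_after_delete = apply_update(*state, deleted=[800000])
avg_after_delete = average(*state_after_delete)
```

No comparable shortcut exists for max under deletion: if the deleted value equals the old maximum, the old value and the update alone do not determine the new maximum.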
Example (figure) • Query: SELECT max(?highprice) • Matching the graph pattern against the RDF graph yields BAG = [800000, 500000]
Compute Aggregates Algorithm CAA Algorithm CAA(I, Q) /* Input: RDF graph I, query Q */ /* Output: table T(Q, I) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)]; • return T(Q, I) ← TCompute(S, Q);
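The steps of CAA can be sketched as follows. This is a simplified reading, not the authors' implementation: pattern matchings are assumed to be given as Python dicts (standing in for the output of MSearchAll), there is a single aggregate variable, and the signatures of `vretrieve`, `tcompute` and `caa` are invented for the example.

```python
from collections import defaultdict

AGGREGATES = {
    "max": max,
    "min": min,
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
}


def vretrieve(theta, variables):
    """Project a pattern matching (a dict) onto the given variables."""
    return tuple(theta[v] for v in variables)


def tcompute(S, f, n_group_vars):
    """Group the tuples of S by their leading GROUP BY values and aggregate the rest."""
    groups = defaultdict(list)
    for row in S:
        groups[row[:n_group_vars]].append(row[n_group_vars])
    return {key: AGGREGATES[f](values) for key, values in groups.items()}


def caa(matchings, f, x, y):
    """matchings stands in for MSearchAll(GP, Q, I); x is the aggregate variable, y the GROUP BY variables."""
    S = [vretrieve(theta, y + [x]) for theta in matchings]
    return tcompute(S, f, len(y))


# Hypothetical matchings for the running example.
matchings = [
    {"?artist": "a1", "?highprice": 800000},
    {"?artist": "a1", "?highprice": 500000},
]
grouped = caa(matchings, "max", "?highprice", ["?artist"])
ungrouped = caa(matchings, "max", "?highprice", [])
```

Without GROUP BY variables the table has a single row keyed by the empty tuple.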
Aggregate View Maintenance Algorithms AMX • AMI – Insertion • AMD – Deletion • AMT – Triple Modification • AMR – Resource Modification
Update: Insertion (figure) • A "paints" edge is inserted into the RDF graph • For SELECT max(?highprice), the bag grows from BAG = [800000, 500000] to BAG = [800000, 500000, 60000]
AMI for Insertion Algorithm AMI(I, Q, A(Q, I), T(Q, I), t) /* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */ /* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • if TMatch(GP, t) == TRUE, then • ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)]; • return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintainI(T(Q, I), ΔS, A(Q, I), Q); • else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));
Algorithm MSearch(GP, Q, t, I) /* Input: graph pattern GP, query Q, triple t, RDF graph I */ /* Output: Θ = {θ | θ is a pattern matching} */ • Θ ← ∅; • for each t’ ∈ GP s.t. ∃θ’, tθ’ = t’θ’, • for each θ ∈ bSearch(t, t’, GP, I), • if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ}; • return Θ;
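The idea of seeding the search at the updated triple can be sketched as below. This is a simplification under stated assumptions: the breadth-first bSearch is replaced by a naive backtracking extension that scans the whole graph, query constraints are omitted, and the names `unify`, `extend` and `msearch` as well as the data are hypothetical. A real implementation would follow edges adjacent to the already bound nodes instead of scanning.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern_triple, triple, binding):
    """Extend binding so that pattern_triple equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern_triple, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def extend(pattern, graph, binding):
    """Yield all completions of binding that match every remaining pattern triple."""
    if not pattern:
        yield binding
        return
    for triple in graph:
        new = unify(pattern[0], triple, binding)
        if new is not None:
            yield from extend(pattern[1:], graph, new)


def msearch(gp, t, graph):
    """Return the pattern matchings of gp in graph that use the updated triple t."""
    results = []
    for i, t_prime in enumerate(gp):
        seed = unify(t_prime, t, {})  # does t match this pattern triple?
        if seed is None:
            continue
        rest = gp[:i] + gp[i + 1:]
        for theta in extend(rest, graph, seed):
            if theta not in results:  # Θ is a set of matchings
                results.append(theta)
    return results


gp = [
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
]
t = ("a1", "ns1:creates", "w3")  # the inserted triple
graph = [
    ("a1", "ns1:creates", "w1"),
    ("w1", "ns1:estimated", "p1"),
    ("p1", "ns1:high", 800000),
    ("w3", "ns1:estimated", "p3"),
    ("p3", "ns1:high", 60000),
    t,
]
delta = [theta["?highprice"] for theta in msearch(gp, t, graph)]
```

Only matchings that pass through the inserted triple are returned, so the existing matching through `w1` is not recomputed.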
Handling GROUP BY • With a GROUP BY clause, each tuple in ΔS affects a particular group. • TMaintainI maintains only the affected groups (and their corresponding auxiliary data), using the tuples that affect them. • Empty groups are deleted and new groups are inserted.
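A small sketch of group-wise maintenance, assuming a count view stored as a dict from group key to count. The function `maintain_counts`, the `sign` parameter and the group keys are invented for illustration, and deletions are assumed to remove only tuples that are actually in S.

```python
def maintain_counts(table, delta, sign):
    """Apply ΔS to a GROUP BY count view; sign is +1 for insertion, -1 for deletion."""
    for group, _value in delta:
        # Only the group named by the tuple is touched; new groups start at 0.
        table[group] = table.get(group, 0) + sign
        if table[group] == 0:
            del table[group]  # drop groups that became empty
    return table


view = {"a1": 2}
maintain_counts(view, [("a1", 60000), ("a2", 120000)], +1)
after_insert = dict(view)
maintain_counts(view, [("a2", 120000)], -1)
after_delete = dict(view)
```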
TMaintainI • Handling sum, count, min, max • No auxiliary data required • Suppose f(x) is an aggregate function on attribute x, F the original result, F’ the new result • F’ = F + Σ_{v ∈ πx(ΔS)} v if f = sum • F’ = F + |ΔS| if f = count • F’ = min([F] ∪ πx(ΔS)) if f = min • F’ = max([F] ∪ πx(ΔS)) if f = max • πx(ΔS) projects a bag of values of x from ΔS
TMaintainI • Handling average • We need the size of S • size’ = size + |ΔS| • F’ = (F × size + Σ_{v ∈ πx(ΔS)} v) / size’
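The insertion rules for the five aggregates can be written as one small function. This is a sketch under assumptions: `tmaintain_i` and its signature are invented here, `delta` is the bag πx(ΔS) as a Python list, and the average is updated as (F × size + sum of inserted values) / size’, which follows from the definition of the mean.

```python
def tmaintain_i(f, old, delta, size=None):
    """Return (new aggregate, new size) after inserting the values in delta.

    `size` is only needed for f == "avg"; it is the number of tuples in S.
    """
    if f == "sum":
        return old + sum(delta), size
    if f == "count":
        return old + len(delta), size
    if f == "min":
        return min([old, *delta]), size
    if f == "max":
        return max([old, *delta]), size
    if f == "avg":
        new_size = size + len(delta)
        return (old * size + sum(delta)) / new_size, new_size
    raise ValueError(f"unsupported aggregate: {f}")


# Old results for the bag [800000, 500000]; 60000 is inserted.
new_max, _ = tmaintain_i("max", 800000, [60000])
new_min, _ = tmaintain_i("min", 500000, [60000])
new_avg, new_size = tmaintain_i("avg", 650000.0, [60000], size=2)
```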
Update: Deletion (figure) • A "paints" edge is deleted from the RDF graph • For SELECT max(?highprice), the bag shrinks from BAG = [800000, 500000, 60000] to BAG = [500000, 60000]
AMD for Deletion Algorithm AMD(I, Q, A(Q, I), T(Q, I), t) /* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */ /* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */ • GP ← BuildGP(Q); • X ← aggregate variables of Q; • Y ← GROUP BY variables of Q; • if TMatch(GP, t) == TRUE, then • ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)]; • return (T(Q, I - t), A(Q, I - t)) ← TMaintainD(T(Q, I), ΔS, A(Q, I), Q); • else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));
TMaintainD • Handling min, max • Min and max are not distributive w.r.t. deletion • We need to store πx(S), which projects a bag of values of x from S, as auxiliary data • The new aggregate value F’ is obtained by: • F’ = min(πx(S - ΔS)) if f = min • F’ = max(πx(S - ΔS)) if f = max • The stored bag is then updated: πx(S) becomes πx(S) - πx(ΔS)
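A minimal sketch of the deletion case, assuming the auxiliary bag πx(S) is kept as a `collections.Counter` (a multiset). The function name `tmaintain_d` and its signature are invented for this example.

```python
from collections import Counter


def tmaintain_d(f, bag, delta):
    """Return (new aggregate, new bag) after deleting the values in delta.

    `bag` is the stored auxiliary data πx(S); `delta` is πx(ΔS) as a list.
    """
    remaining = bag - Counter(delta)  # bag difference, keeps positive counts only
    values = list(remaining.elements())
    if not values:
        return None, remaining  # the result became empty
    return (min(values) if f == "min" else max(values)), remaining


bag = Counter([800000, 500000, 60000])
new_max, new_bag = tmaintain_d("max", bag, [800000])
```

The new maximum is read from the stored bag, so the RDF source itself is not consulted.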
Implementation and Experiments • Implemented in Java • Built on Jena, HP's RDF toolkit, and its RDQL engine • Comparison with the Relational Approach (standard view maintenance algorithm on relational tables) • Counting Algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993 • Dataset: Chef Moz Project RDF dump • Data stored in memory
Other Related Work • Volz, Oberle, Studer [DBFUSION’02] • The first to introduce a view mechanism for RDF data • Their views require that • the results contain class instances (i.e., a subject or object variable), or • the result itself has the pattern of an RDF statement (i.e., a triple containing subject, predicate and object). • Magkanaraki et al. [ISWC’03] • Proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies, so that new resources, property values, classes and property types can be created. • None of these works specifically addresses (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.
Summary • Aggregate Views are important for RDF applications • RDQL Extension for Views and Aggregates • Aggregate View Maintenance Algorithms AMX • Localized search in RDF graphs
Thank you very much! Questions and Answers