This paper explores light-weight ontology versioning using multi-temporal RDF schema, discussing temporal RDF data models, database models, triples, elements, integrity constraints, and memory-saving techniques for efficient storage. The example provided illustrates the benefits in large RDF datasets.
Fifth International Conference on Advances in Semantic Processing - SEMAPRO 2011
Light-weight Ontology Versioning with Multi-temporal RDF Schema
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna
Introduction
• Some application fields require the maintenance of past versions of an ontology after changes
• For instance, in the legal domain:
  • Ontologies evolve as a natural consequence of the dynamics involved in normative systems
  • Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past)
• Moreover, several time dimensions are usually important for applications in such domains
SEMAPRO 2011 – F. Grandi – Light-weight Ontology Versioning with Multi-temporal RDF Schema
Multi-temporal versioning
• Time dimensions of interest in the legal domain:
  • Validity time is the time a norm is in force in the real world
  • Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm remains efficacious even though it is no longer in force
  • Transaction time is the time a norm is stored in the computer system
  • Publication time is the time a norm is published in the Official Journal
Temporal RDF Data Models
• Temporal RDF data models have been proposed recently; notable proposals include:
  • [Gutierrez, Hurtado & Vaisman, 2007]
  • [Pugliese, Udrea & Subrahmanian, 2008]
  • [Tappolet & Bernstein, 2009]
• Interval timestamping of RDF triples is adopted
• A single time dimension (valid time) is usually considered
• Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries
A Multi-temporal RDF Database Model
• N-dimensional time domain:
  $\mathcal{T} = T_1 \times T_2 \times \dots \times T_N$, with $T_i = [0, \mathit{UC})_i$ ($\mathit{UC}$ stands for "until changed")
• Multi-temporal RDF triple:
  $(s, p, o \mid T)$, where $s$ is a subject, $p$ is a predicate, $o$ is an object and $T \subseteq \mathcal{T}$ is a timestamp
• Multi-temporal RDF database:
  $\text{RDF-TDB} = \{ (s, p, o \mid T) \mid T \subseteq \mathcal{T} \}$
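A minimal Python sketch of these definitions, under assumptions of its own: time points are numbers, all intervals are half-open, `float("inf")` stands in for UC, and a timestamp is stored as a tuple of multidimensional boxes (the temporal elements used by this model). The names `TemporalTriple` and `valid_at` and the sample IRIs are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

UC = float("inf")  # stand-in for UC ("until changed"); an assumption of this sketch

Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # one interval per time dimension T_1 x ... x T_N
TimePoint = Tuple[float, ...]   # one coordinate per time dimension


@dataclass(frozen=True)
class TemporalTriple:
    """A multi-temporal triple (s, p, o | T), with T stored as a tuple of boxes."""
    s: str
    p: str
    o: str
    t: Tuple[Box, ...]

    def valid_at(self, point: TimePoint) -> bool:
        # point belongs to T iff some box contains it in every dimension
        return any(
            all(lo <= x < hi for (lo, hi), x in zip(box, point))
            for box in self.t
        )


# RDF-TDB as a set of temporal triples over a bitemporal (valid, transaction) domain;
# the IRIs are made-up examples.
rdf_tdb: FrozenSet[TemporalTriple] = frozenset({
    TemporalTriple("ex:norm1", "ex:status", "ex:inForce", (((0, 10), (0, UC)),)),
})
print([tt.valid_at((5, 3)) for tt in rdf_tdb])   # [True]
print([tt.valid_at((10, 3)) for tt in rdf_tdb])  # [False]: the valid interval is half-open
```

The frozen dataclass keeps triples hashable, so the database can be an ordinary set, mirroring the set-based definition of RDF-TDB.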
Multi-temporal RDF Triples
• A temporal triple $(s, p, o \mid T)$ assigns a temporal pertinence to an RDF triple $(s, p, o)$
• The non-temporal triple $(s, p, o)$ is the value (or the contents) of the temporal triple $(s, p, o \mid T)$
• The temporal pertinence $T$ is a subset of the time domain $\mathcal{T}$ represented by a temporal element
Temporal Elements
• A temporal element [Gadia 98] is a disjoint union of temporal intervals
• Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension:
  $T = \bigcup_{1 \le j \le m} I_j = \bigcup_{1 \le j \le m} [t_j^s, t_j^e)_1 \times [t_j^s, t_j^e)_2 \times \dots \times [t_j^s, t_j^e)_N$
• $I_j \cap I_k = \emptyset$ for all $1 \le j < k \le m$
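The disjointness requirement can be checked directly. This Python sketch assumes half-open intervals and uses `float("inf")` for UC; the function names are hypothetical. Two boxes (Cartesian products of intervals) intersect exactly when their intervals intersect in every dimension.

```python
from itertools import combinations
from typing import Sequence, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # multi-temporal interval: one interval per dimension


def boxes_overlap(a: Box, b: Box) -> bool:
    # Cartesian products intersect iff their intervals intersect in every dimension.
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def is_temporal_element(boxes: Sequence[Box]) -> bool:
    # Every box must be non-empty, and I_j ∩ I_k = Ø must hold for all j < k.
    non_empty = all(lo < hi for box in boxes for lo, hi in box)
    return non_empty and not any(boxes_overlap(a, b) for a, b in combinations(boxes, 2))


# The bitemporal pertinence of (s, p, o1) from the example in this presentation,
# [t0,t1) x [t0,UC)  U  [t1,UC) x [t0,t1), with t0 = 0 and t1 = 1.
print(is_temporal_element([((0, 1), (0, UC)), ((1, UC), (0, 1))]))  # True
print(is_temporal_element([((0, 2), (0, 2)), ((1, 3), (1, 3))]))    # False: boxes overlap
```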
Integrity Constraint
• No value-equivalent distinct triples exist:
  $\forall (s, p, o \mid T), (s', p', o' \mid T') \in \text{RDF-TDB}: s = s' \wedge p = p' \wedge o = o' \Rightarrow T = T'$
• The constraint is made possible by the adoption of temporal element timestamping
• Temporal elements lead to space saving whenever the temporal pertinence of a triple is not a convex interval
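A small Python sketch of checking the constraint over a list of (content, timestamp) pairs; the function name is hypothetical. It also illustrates a design choice assumed here rather than prescribed by the slides: keying the store by (s, p, o) makes the constraint hold by construction.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Timestamp = Tuple[Tuple[Tuple[float, float], ...], ...]  # temporal element as a tuple of boxes


def value_equivalent_violations(tdb: Iterable[Tuple[Triple, Timestamp]]) -> List[Triple]:
    """Return the contents (s, p, o) that appear in more than one temporal triple."""
    counts = Counter(content for content, _timestamp in tdb)
    return sorted(content for content, n in counts.items() if n > 1)


# Two value-equivalent triples with interval timestamps violate the constraint ...
rows = [(("s", "p", "o"), (((1, 2),),)), (("s", "p", "o"), (((3, 4),),))]
print(value_equivalent_violations(rows))  # [('s', 'p', 'o')]

# ... while a dict keyed by (s, p, o) with an element timestamp satisfies it by construction.
tdb: Dict[Triple, Timestamp] = {("s", "p", "o"): (((1, 2),), ((3, 4),))}
print(value_equivalent_violations(tdb.items()))  # []
```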
Memory Saving with Temporal Elements
• For example, even with a monodimensional time domain, the two value-equivalent triples with interval timestamping ($t_2 < t_3$) $(s, p, o \mid [t_1, t_2))$ and $(s, p, o \mid [t_3, t_4))$ can be merged into a single triple with element timestamping, $(s, p, o \mid [t_1, t_2) \cup [t_3, t_4))$: the timestamps require the same space in both cases (four time points), but the contents of the triple are stored twice in the former case and only once in the latter
• Different triple versions are stored only once with a complex timestamp, instead of as multiple copies (value-equivalent triples) each with a simple timestamp
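The merge can be illustrated with a Python sketch that groups interval-stamped rows by their contents and compares storage under a deliberately simple cost model (an assumption: a fixed triple size, two time points per interval, no other overhead). The 120-byte triple and 8-byte time point are illustrative values in line with the figures quoted in this presentation; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Interval = Tuple[float, float]  # half-open [start, end) on a single time dimension


def merge_value_equivalent(rows: List[Tuple[Triple, Interval]]) -> Dict[Triple, List[Interval]]:
    """Group interval-stamped rows by content: one element-stamped entry per (s, p, o)."""
    merged: Dict[Triple, List[Interval]] = defaultdict(list)
    for content, interval in rows:
        merged[content].append(interval)
    return dict(merged)


def interval_bytes(rows: List[Tuple[Triple, Interval]], triple_size: int, ts_size: int) -> int:
    # every row stores the contents plus 2 time points
    return len(rows) * (triple_size + 2 * ts_size)


def element_bytes(merged: Dict[Triple, List[Interval]], triple_size: int, ts_size: int) -> int:
    # contents stored once per key; still 2 time points per interval
    return sum(triple_size + 2 * ts_size * len(ivs) for ivs in merged.values())


t1, t2, t3, t4 = 1, 2, 3, 4  # with t2 < t3, as on the slide
rows = [(("s", "p", "o"), (t1, t2)), (("s", "p", "o"), (t3, t4))]
merged = merge_value_equivalent(rows)
print(merged)                        # {('s', 'p', 'o'): [(1, 2), (3, 4)]}
print(interval_bytes(rows, 120, 8))  # 272
print(element_bytes(merged, 120, 8)) # 152
```

The 120-byte difference is exactly one copy of the triple contents, matching the argument above.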
An Example
• The memory saving obtained with temporal elements grows with the dimensionality of the time domain!
• The memory saving also grows with the ratio of the triple size to the timestamp size
• In very large RDF benchmark datasets, the average triple size ranges from 80–140 bytes (DBpedia, US Census, LUBM, BSBM) to more than 600 bytes (UniProtKB)
• The timestamp (date+time) data size in SQL is 6–8 bytes
• In the example that follows, we assume a bitemporal domain (valid + transaction time)
Representation of the Evolution of a Triple
[Figure: regions of the bitemporal plane (axes marked t0, t1, t2, UC) occupied by the versions (s, p, o1), (s, p, o2) and (s, p, o3)]
• With temporal elements (3 triples needed):
  $(s, p, o_1 \mid [t_0, t_1) \times [t_0, \mathit{UC}) \cup [t_1, \mathit{UC}) \times [t_0, t_1))$
  $(s, p, o_2 \mid [t_1, t_2) \times [t_1, \mathit{UC}) \cup [t_2, \mathit{UC}) \times [t_1, t_2))$
  $(s, p, o_3 \mid [t_2, \mathit{UC}) \times [t_2, \mathit{UC}))$
• With temporal intervals (5 triples needed):
  $(s, p, o_1 \mid [t_0, t_1) \times [t_0, \mathit{UC}))$
  $(s, p, o_1 \mid [t_1, \mathit{UC}) \times [t_0, t_1))$
  $(s, p, o_2 \mid [t_1, t_2) \times [t_1, \mathit{UC}))$
  $(s, p, o_2 \mid [t_2, \mathit{UC}) \times [t_1, t_2))$
  $(s, p, o_3 \mid [t_2, \mathit{UC}) \times [t_2, \mathit{UC}))$
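The count above can be reproduced with a short Python sketch, assuming numeric time points (t0 = 0, t1 = 1, t2 = 2), `float("inf")` for UC, valid time as the first coordinate, illustrative 120-byte triples and 8-byte time points, and a simple cost model with no per-element overhead (an assumption, not necessarily the model behind the figures in this presentation).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Triple = Tuple[str, str, str]
Box = Tuple[Tuple[float, float], Tuple[float, float]]  # (valid interval, transaction interval)

t0, t1, t2 = 0, 1, 2
# The five interval-stamped triples listed above: (content, valid x transaction).
interval_rows: List[Tuple[Triple, Box]] = [
    (("s", "p", "o1"), ((t0, t1), (t0, UC))),
    (("s", "p", "o1"), ((t1, UC), (t0, t1))),
    (("s", "p", "o2"), ((t1, t2), (t1, UC))),
    (("s", "p", "o2"), ((t2, UC), (t1, t2))),
    (("s", "p", "o3"), ((t2, UC), (t2, UC))),
]

# Element timestamping: value-equivalent rows collapse into one triple each.
elements: Dict[Triple, List[Box]] = defaultdict(list)
for content, box in interval_rows:
    elements[content].append(box)

TRIPLE_SIZE, TS_SIZE, N_DIMS = 120, 8, 2  # illustrative sizes in bytes
box_size = 2 * N_DIMS * TS_SIZE           # 4 time points per bitemporal box

with_intervals = len(interval_rows) * (TRIPLE_SIZE + box_size)
with_elements = sum(TRIPLE_SIZE + len(boxes) * box_size for boxes in elements.values())
print(len(interval_rows), len(elements))            # 5 3
print(with_intervals, with_elements)                # 760 520
print(f"{1 - with_elements / with_intervals:.2%}")  # 31.58%
```

Both representations store the same five boxes; the saving comes entirely from storing each value once instead of once per box.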
Memory Saving Figures
• [Table: percentage space saving with temporal element vs. interval timestamping; average number of versions per triple in columns, triple size in bytes in rows; 8-byte timestamps assumed]
• For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of:
  • 721 GB with temporal elements
  • 1.14 TB with temporal intervals
Query Operators
• The only retrieval operator we consider in this work is a snapshot extraction operator, which can be used to extract an ontology version from a multi-version ontology represented as a temporal RDF database
• Given a time point $t = (t_1, t_2, \dots, t_N) \in \mathcal{T}$, we define the RDF database snapshot valid at $t$ as
  $\text{RDF-TDB}(t) = \{ (s, p, o) \mid \exists\, (s, p, o \mid T) \in \text{RDF-TDB} \wedge t \in T \}$
• The result is a (non-temporal) RDF graph, which can be used to represent the ontology version valid at $t$
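A direct Python transcription of the snapshot operator, assuming each (s, p, o) maps to a list of half-open boxes (one entry per triple, as the integrity constraint allows) and `float("inf")` for UC. It is queried on the bitemporal history from the example in this presentation, with t0 = 0, t1 = 1, t2 = 2 and valid time as the first coordinate; the function name `snapshot` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Triple = Tuple[str, str, str]
Box = Tuple[Tuple[float, float], ...]  # one half-open interval per time dimension
TimePoint = Tuple[float, ...]


def snapshot(tdb: Dict[Triple, List[Box]], t: TimePoint) -> Set[Triple]:
    """RDF-TDB(t): the non-temporal triples whose temporal element contains t."""
    return {
        content
        for content, element in tdb.items()
        if any(all(lo <= x < hi for (lo, hi), x in zip(box, t)) for box in element)
    }


tdb = {
    ("s", "p", "o1"): [((0, 1), (0, UC)), ((1, UC), (0, 1))],
    ("s", "p", "o2"): [((1, 2), (1, UC)), ((2, UC), (1, 2))],
    ("s", "p", "o3"): [((2, UC), (2, UC))],
}
print(snapshot(tdb, (1.5, 5)))    # {('s', 'p', 'o2')}: valid at 1.5, as currently recorded
print(snapshot(tdb, (1.5, 0.5)))  # {('s', 'p', 'o1')}: valid at 1.5, as recorded at 0.5
print(snapshot(tdb, (3, 5)))      # {('s', 'p', 'o3')}
```

The second query shows why the transaction dimension matters: the same valid time point yields a different ontology version depending on when the database is consulted.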
Modification Operators – Insertion
• Assuming an $(N-1)$-dimensional temporal element $t_v$ (for any modification, transaction time $[now, \mathit{UC})$ is implied), the insertion operation
  INSERT DATA { s,p,o } VALID tv
  can be defined via its effects on the database state as follows (using a triple calculus):
  $\text{RDF-TDB}' = \text{RDF-TDB} \cup \{ (s, p, o \mid T') \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge T' = \mathrm{coalesce}(T \cup (t_v \times [now, \mathit{UC}))) \} \cup \{ (s, p, o \mid t_v \times [now, \mathit{UC})) \mid \neg\exists\, T: (s, p, o \mid T) \in \text{RDF-TDB} \}$
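A Python sketch of the insertion operator. It assumes the database is a dict keyed by (s, p, o), so the coalesced element replaces the previous timestamp and the integrity constraint keeps holding; union is computed as T1 ∪ (T2 \ T1) by splitting boxes, and `coalesce` is one simple possibility (greedy merging of boxes adjacent along a single dimension), since the slides do not fix an algorithm. Transaction time is appended as the last coordinate, `float("inf")` stands in for UC, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # one interval per time dimension
Element = List[Box]             # temporal element: pairwise-disjoint boxes
Triple = Tuple[str, str, str]


def overlaps(a: Box, b: Box) -> bool:
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_diff(a: Box, b: Box) -> Element:
    # a minus b as disjoint boxes: carve the slabs of a outside b, one dimension at a time
    if not overlaps(a, b):
        return [a]
    pieces: Element = []
    rest = list(a)
    for d, (b_lo, b_hi) in enumerate(b):
        a_lo, a_hi = rest[d]
        if a_lo < b_lo:
            pieces.append(tuple(rest[:d] + [(a_lo, b_lo)] + rest[d + 1:]))
        if b_hi < a_hi:
            pieces.append(tuple(rest[:d] + [(b_hi, a_hi)] + rest[d + 1:]))
        rest[d] = (max(a_lo, b_lo), min(a_hi, b_hi))
    return pieces  # the remaining `rest` is a ∩ b and is dropped


def union(t1: Element, t2: Element) -> Element:
    # T1 ∪ T2 = T1 ∪ (T2 minus T1), which keeps the boxes pairwise disjoint
    extra = list(t2)
    for box in t1:
        extra = [piece for e in extra for piece in box_diff(e, box)]
    return list(t1) + extra


def try_merge(a: Box, b: Box) -> Optional[Box]:
    # merge two boxes that differ only in one dimension, where they are adjacent
    diff = [d for d in range(len(a)) if a[d] != b[d]]
    if len(diff) != 1:
        return None
    d = diff[0]
    (a_lo, a_hi), (b_lo, b_hi) = a[d], b[d]
    if a_hi == b_lo or b_hi == a_lo:
        return a[:d] + ((min(a_lo, b_lo), max(a_hi, b_hi)),) + a[d + 1:]
    return None


def coalesce(element: Element) -> Element:
    # greedy pairwise merging until no pair can be merged (one possible coalesce)
    boxes, changed = list(element), True
    while changed:
        changed = False
        for i, j in combinations(range(len(boxes)), 2):
            merged = try_merge(boxes[i], boxes[j])
            if merged is not None:
                boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
                changed = True
                break
    return boxes


def insert(tdb: Dict[Triple, Element], content: Triple, tv: Element, now: float) -> None:
    """INSERT DATA { s,p,o } VALID tv, with transaction time [now, UC) implied."""
    new = [box + ((now, UC),) for box in tv]  # tv x [now, UC)
    if content in tdb:
        tdb[content] = coalesce(union(tdb[content], new))
    else:
        tdb[content] = new


tdb: Dict[Triple, Element] = {}
insert(tdb, ("s", "p", "o"), [((0, 10),)], now=0)
insert(tdb, ("s", "p", "o"), [((10, 20),)], now=0)
print(tdb)  # {('s', 'p', 'o'): [((0, 20), (0, inf))]}
```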
Maintenance of temporal elements
• In order to ensure that the results are still temporal elements, union and difference operations must be carefully defined
• In particular, if $T_i$ ($i = 1, 2$) are temporal elements defined as $T_i = \bigcup_{1 \le j \le m_i} I_{ij}$, where the $I_{ij}$ are multidimensional intervals, then the difference can be computed as
  $T_1 \setminus T_2 = \bigcup_{1 \le j \le m_1} (I_{1j} \setminus T_2)$
  and is ensured to be a temporal element if $I_{1j} \setminus T_2$ is a temporal element for each $j$
• Given the difference, the union can be computed as
  $T_1 \cup T_2 = T_1 \cup (T_2 \setminus T_1)$
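These rules translate into the Python sketch below. The box difference used here (carving, dimension by dimension, the slabs of a box that lie outside the subtracted box, which yields at most 2N disjoint pieces) is a standard technique chosen for illustration, not necessarily the one the author uses; it guarantees that each $I_{1j} \setminus T_2$ is a temporal element. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # multidimensional interval
Element = List[Box]             # temporal element: pairwise-disjoint boxes


def overlaps(a: Box, b: Box) -> bool:
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_diff(a: Box, b: Box) -> Element:
    """Box a minus box b, as at most 2N pairwise-disjoint boxes."""
    if not overlaps(a, b):
        return [a]
    pieces: Element = []
    rest = list(a)
    for d, (b_lo, b_hi) in enumerate(b):
        a_lo, a_hi = rest[d]
        if a_lo < b_lo:  # slab of `rest` lying below b along dimension d
            pieces.append(tuple(rest[:d] + [(a_lo, b_lo)] + rest[d + 1:]))
        if b_hi < a_hi:  # slab of `rest` lying above b along dimension d
            pieces.append(tuple(rest[:d] + [(b_hi, a_hi)] + rest[d + 1:]))
        rest[d] = (max(a_lo, b_lo), min(a_hi, b_hi))
    return pieces  # what is left in `rest` is a ∩ b, which is discarded


def difference(t1: Element, t2: Element) -> Element:
    """T1 minus T2 = union over j of (I_1j minus T2), each computed box by box."""
    result: Element = []
    for box in t1:
        pieces = [box]
        for other in t2:
            pieces = [p for piece in pieces for p in box_diff(piece, other)]
        result.extend(pieces)
    return result


def union(t1: Element, t2: Element) -> Element:
    """T1 ∪ T2 = T1 ∪ (T2 minus T1): the added pieces never overlap T1."""
    return list(t1) + difference(t2, t1)


def is_disjoint(element: Element) -> bool:
    return not any(overlaps(a, b) for a, b in combinations(element, 2))


def volume(element: Element) -> float:
    total = 0.0
    for box in element:
        size = 1.0
        for lo, hi in box:
            size *= hi - lo
        total += size
    return total


T1 = [((0, 4), (0, 4))]
T2 = [((2, 6), (2, 6))]
print(difference(T1, T2))          # [((0, 2), (0, 4)), ((2, 4), (0, 2))]
print(union(T1, T2))               # [((0, 4), (0, 4)), ((4, 6), (2, 6)), ((2, 4), (4, 6))]
print(is_disjoint(union(T1, T2)))  # True
print(volume(union(T1, T2)))       # 28.0, i.e. 16 + 16 - 4
```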
Modification Operators - Deletion
• Assuming an $(N-1)$-dimensional temporal element $t_v$ and a selection predicate pred(s,p,o), the deletion operation
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o)
  can be defined via its effects on the database state as follows:
  $\text{RDF-TDB}' = \text{RDF-TDB} \setminus \{ (s, p, o \mid T) \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge \mathit{pred}(s, p, o) \wedge T \cap (t_v \times [now, \mathit{UC})) \ne \emptyset \} \cup \{ (s, p, o \mid T') \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge \mathit{pred}(s, p, o) \wedge T \cap (t_v \times [now, \mathit{UC})) \ne \emptyset \wedge T' = \mathrm{coalesce}(T \setminus (t_v \times [now, \mathit{UC}))) \}$
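A Python sketch of the deletion operator over a dict keyed by (s, p, o), with `float("inf")` for UC and transaction time as the last coordinate. Two assumptions are labeled explicitly: `pred` is a plain Python callable (hypothetical), and a triple whose remaining timestamp is empty is dropped, a case the formula leaves implicit. `coalesce` is the same simple greedy merge used for illustration, not a prescribed algorithm.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # one interval per time dimension
Element = List[Box]             # temporal element: pairwise-disjoint boxes
Triple = Tuple[str, str, str]


def overlaps(a: Box, b: Box) -> bool:
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_diff(a: Box, b: Box) -> Element:
    # a minus b as disjoint boxes: carve the slabs of a outside b, one dimension at a time
    if not overlaps(a, b):
        return [a]
    pieces: Element = []
    rest = list(a)
    for d, (b_lo, b_hi) in enumerate(b):
        a_lo, a_hi = rest[d]
        if a_lo < b_lo:
            pieces.append(tuple(rest[:d] + [(a_lo, b_lo)] + rest[d + 1:]))
        if b_hi < a_hi:
            pieces.append(tuple(rest[:d] + [(b_hi, a_hi)] + rest[d + 1:]))
        rest[d] = (max(a_lo, b_lo), min(a_hi, b_hi))
    return pieces


def difference(t1: Element, t2: Element) -> Element:
    result: Element = []
    for box in t1:
        pieces = [box]
        for other in t2:
            pieces = [p for piece in pieces for p in box_diff(piece, other)]
        result.extend(pieces)
    return result


def try_merge(a: Box, b: Box) -> Optional[Box]:
    # merge two boxes that differ only in one dimension, where they are adjacent
    diff = [d for d in range(len(a)) if a[d] != b[d]]
    if len(diff) != 1:
        return None
    d = diff[0]
    (a_lo, a_hi), (b_lo, b_hi) = a[d], b[d]
    if a_hi == b_lo or b_hi == a_lo:
        return a[:d] + ((min(a_lo, b_lo), max(a_hi, b_hi)),) + a[d + 1:]
    return None


def coalesce(element: Element) -> Element:
    # greedy pairwise merging until no pair can be merged (one possible coalesce)
    boxes, changed = list(element), True
    while changed:
        changed = False
        for i, j in combinations(range(len(boxes)), 2):
            merged = try_merge(boxes[i], boxes[j])
            if merged is not None:
                boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
                changed = True
                break
    return boxes


def delete(tdb: Dict[Triple, Element], tv: Element, now: float,
           pred: Callable[[str, str, str], bool]) -> None:
    """DELETE { s,p,o } VALID tv WHERE pred(s,p,o), transaction time [now, UC) implied."""
    window = [box + ((now, UC),) for box in tv]  # tv x [now, UC)
    for content in list(tdb):
        element = tdb[content]
        if not pred(*content) or not any(overlaps(a, w) for a in element for w in window):
            continue  # unaffected: pred is false or T ∩ (tv x [now, UC)) = Ø
        remaining = coalesce(difference(element, window))
        if remaining:
            tdb[content] = remaining
        else:
            del tdb[content]  # assumption: a triple left with an empty timestamp is dropped


tdb = {("s", "p", "o"): [((0, 20), (0, UC))]}
# valid time [5, 10) is removed, but only from transaction time 3 onwards
delete(tdb, [((5, 10),)], now=3, pred=lambda s, p, o: p == "p")
print(tdb)
```

Because only $[now, \mathit{UC})$ is subtracted along transaction time, the past belief about $[5, 10)$ survives as the box `((5, 10), (0, 3))`.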
Modification Operators - Update
• Assuming an $(N-1)$-dimensional temporal element $t_v$, the update operation
  UPDATE { s,p,o } SET { s’,p’,o’ } VALID tv WHERE pred(s,p,o)
  is not primitive, as it can be defined as a delete operation followed by an insert operation:
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o) ;
  INSERT DATA { s’,p’,o’ } VALID tv
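The composition can be sketched in Python over the same dict-based store. Assumptions of this sketch: the new contents are computed by a hypothetical `rewrite` callable from each triple actually affected by the delete (collected before deleting), coalescing is omitted for brevity (timestamps stay valid temporal elements but may be fragmented), and `float("inf")` stands in for UC. The sample IRIs are made up.

```python
from typing import Callable, Dict, List, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Interval = Tuple[float, float]  # half-open interval [start, end)
Box = Tuple[Interval, ...]      # one interval per time dimension (valid..., transaction)
Element = List[Box]             # temporal element: pairwise-disjoint boxes
Triple = Tuple[str, str, str]
Pred = Callable[[str, str, str], bool]


def overlaps(a: Box, b: Box) -> bool:
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_diff(a: Box, b: Box) -> Element:
    # a minus b as disjoint boxes: carve the slabs of a outside b, one dimension at a time
    if not overlaps(a, b):
        return [a]
    pieces: Element = []
    rest = list(a)
    for d, (b_lo, b_hi) in enumerate(b):
        a_lo, a_hi = rest[d]
        if a_lo < b_lo:
            pieces.append(tuple(rest[:d] + [(a_lo, b_lo)] + rest[d + 1:]))
        if b_hi < a_hi:
            pieces.append(tuple(rest[:d] + [(b_hi, a_hi)] + rest[d + 1:]))
        rest[d] = (max(a_lo, b_lo), min(a_hi, b_hi))
    return pieces


def difference(t1: Element, t2: Element) -> Element:
    result: Element = []
    for box in t1:
        pieces = [box]
        for other in t2:
            pieces = [p for piece in pieces for p in box_diff(piece, other)]
        result.extend(pieces)
    return result


def union(t1: Element, t2: Element) -> Element:
    return list(t1) + difference(t2, t1)


def delete(tdb: Dict[Triple, Element], tv: Element, now: float, pred: Pred) -> List[Triple]:
    """DELETE { s,p,o } VALID tv WHERE pred(s,p,o); returns the affected contents."""
    window = [box + ((now, UC),) for box in tv]
    affected = [c for c, el in tdb.items()
                if pred(*c) and any(overlaps(a, w) for a in el for w in window)]
    for c in affected:
        remaining = difference(tdb[c], window)
        if remaining:
            tdb[c] = remaining
        else:
            del tdb[c]
    return affected


def insert(tdb: Dict[Triple, Element], content: Triple, tv: Element, now: float) -> None:
    """INSERT DATA { s,p,o } VALID tv (coalescing omitted)."""
    tdb[content] = union(tdb.get(content, []), [box + ((now, UC),) for box in tv])


def update(tdb: Dict[Triple, Element], pred: Pred,
           rewrite: Callable[[str, str, str], Triple], tv: Element, now: float) -> None:
    """UPDATE { s,p,o } SET { s',p',o' } VALID tv WHERE pred(s,p,o), as DELETE then INSERT."""
    for content in delete(tdb, tv, now, pred):
        insert(tdb, rewrite(*content), tv, now)


# A range change in the spirit of CHANGE_PROPERTY_RANGE (made-up IRIs).
tdb = {("ex:age", "rdfs:range", "xsd:int"): [((0, UC), (0, UC))]}
update(tdb,
       pred=lambda s, p, o: s == "ex:age" and p == "rdfs:range",
       rewrite=lambda s, p, o: (s, p, "xsd:decimal"),
       tv=[((10, UC),)], now=5)
for content, element in tdb.items():
    print(content, element)
```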
Derivation of a new Ontology Version (1)
• We assume the new version is obtained by applying changes to an existing ontology version. The parameters needed are:
  • OS_Validity: the valid time point used to select the ontology version used as the base for the derivation
  • The sequence of schema changes to be applied to the selected version in order to produce the new ontology version
  • OC_Validity: the valid time interval used to assign the validity to the new version (possibly in the past or future)
Derivation of a new Ontology Version (2)
[Figure: along the valid time axis (marked t1, t2, t3, t4), the version selected at OS_Validity undergoes schema changes and the resulting version is assigned the validity SC_Validity = [t4, UC]]
Transaction
• On …
  BEGIN TRANSACTION ;
  CREATE GRAPH <workVersion> ;
  INSERT INTO <workVersion> { ?s, ?p, ?o }
    WHERE { TGRAPH <tOntology> { ?s, ?p, ?o | ?t } .
            FILTER ( VALID(?t) CONTAINS OS_Validity && TRANSACTION(?t) CONTAINS current-date() ) } ;
  => a sequence of ontology changes acting on the (non-temporal) workVersion graph goes here
  DELETE FROM <tOntology> { ?s, ?p, ?o } VALID OC_Validity ;
  INSERT INTO <tOntology> { ?s, ?p, ?o } VALID OC_Validity
    WHERE { GRAPH <workVersion> { ?s, ?p, ?o } } ;
  DROP GRAPH <workVersion> ;
  COMMIT TRANSACTION
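The effect of this transaction can be simulated in Python over a dict-based store. Assumptions of this sketch: a bitemporal domain (N = 2, valid time first), OS_Validity is a valid time point, OC_Validity is a list of disjoint valid-time intervals, `now` is the current transaction time, schema changes are a function over a set of triples, and `float("inf")` stands in for UC. Coalescing is omitted, so timestamps may stay fragmented while still denoting the right regions; the example therefore prints snapshots. The name `derive_version` and the IRIs are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

UC = float("inf")  # stand-in for "until changed" (assumption of this sketch)
Interval = Tuple[float, float]   # half-open interval [start, end)
Box = Tuple[Interval, Interval]  # (valid interval, transaction interval)
Element = List[Box]
Triple = Tuple[str, str, str]


def overlaps(a: Box, b: Box) -> bool:
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def box_diff(a: Box, b: Box) -> Element:
    # a minus b as disjoint boxes: carve the slabs of a outside b, one dimension at a time
    if not overlaps(a, b):
        return [a]
    pieces: Element = []
    rest = list(a)
    for d, (b_lo, b_hi) in enumerate(b):
        a_lo, a_hi = rest[d]
        if a_lo < b_lo:
            pieces.append(tuple(rest[:d] + [(a_lo, b_lo)] + rest[d + 1:]))
        if b_hi < a_hi:
            pieces.append(tuple(rest[:d] + [(b_hi, a_hi)] + rest[d + 1:]))
        rest[d] = (max(a_lo, b_lo), min(a_hi, b_hi))
    return pieces


def difference(t1: Element, t2: Element) -> Element:
    result: Element = []
    for box in t1:
        pieces = [box]
        for other in t2:
            pieces = [p for piece in pieces for p in box_diff(piece, other)]
        result.extend(pieces)
    return result


def snapshot(tdb: Dict[Triple, Element], valid: float, tx: float) -> Set[Triple]:
    return {c for c, el in tdb.items()
            if any(v_lo <= valid < v_hi and t_lo <= tx < t_hi
                   for (v_lo, v_hi), (t_lo, t_hi) in el)}


def derive_version(tdb: Dict[Triple, Element], os_validity: float,
                   changes: Callable[[Set[Triple]], Set[Triple]],
                   oc_validity: List[Interval], now: float) -> None:
    # workVersion: the version valid at OS_Validity, as currently recorded
    work = changes(snapshot(tdb, os_validity, now))
    window: Element = [(iv, (now, UC)) for iv in oc_validity]  # OC_Validity x [now, UC)
    # DELETE FROM <tOntology> { ?s, ?p, ?o } VALID OC_Validity
    for c in list(tdb):
        remaining = difference(tdb[c], window)
        if remaining:
            tdb[c] = remaining
        else:
            del tdb[c]
    # INSERT the working graph VALID OC_Validity; after the delete no stored box
    # overlaps the window, so appending it keeps every timestamp disjoint
    for c in work:
        tdb[c] = tdb.get(c, []) + list(window)


tdb = {("ex:A", "rdf:type", "rdfs:Class"): [((0, UC), (0, UC))]}
derive_version(tdb, os_validity=5,
               changes=lambda g: g | {("ex:B", "rdfs:subClassOf", "ex:A")},
               oc_validity=[(10, UC)], now=7)
print(sorted(snapshot(tdb, 12, 8)))  # both triples: the new version, recorded at 7
print(sorted(snapshot(tdb, 5, 8)))   # only ex:A: valid time before OC_Validity
print(sorted(snapshot(tdb, 12, 6)))  # only ex:A: what was recorded before the change
```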
Operators for Ontology Management
• On the basis of the primitives introduced so far, high-level macro operators for the management of a multi-version RDF ontology can also be defined:
  CREATE_CLASS(Name,Validity)
  RENAME_CLASS(Class,NewName,Validity)
  DROP_CLASS(Class,Validity)
  ADD_SUBCLASS(SubClass,Class,Validity)
  DEL_SUBCLASS(SubClass,Class,Validity)
  CREATE_PROPERTY(Name,Range,Validity)
  RENAME_PROPERTY(Property,NewName,Validity)
  CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
  DROP_PROPERTY(Property,Validity)
  ADD_PROPERTY(Class,Property,Validity)
  DEL_PROPERTY(Class,Property,Validity)
  ADD_SUBPROPERTY(SubProperty,Property,Validity)
  DEL_SUBPROPERTY(SubProperty,Property,Validity)
  …
Sample Operator Definitions
• For example, the definitions of some of the property management operators are the following:
  ADD_PROPERTY(Class,Property,Range,Validity)
    INSERT DATA { Property rdfs:domain Class ; rdfs:range Range . } VALID Validity
  CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
    UPDATE { Property rdfs:range ?range } SET { Property rdfs:range NewRange } VALID Validity
  DEL_PROPERTY(Class,Property,Validity)
    DELETE { Property rdfs:domain Class ; rdfs:range ?range . } VALID Validity
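Since these macros are templates over the primitives, they can be sketched as string expansion in Python. The function names and sample IRIs are hypothetical, and the generated text follows the multi-temporal syntax proposed in this presentation (with VALID clauses), which is not standard SPARQL 1.1 Update.

```python
def add_property(cls: str, prop: str, rng: str, validity: str) -> str:
    """ADD_PROPERTY(Class,Property,Range,Validity) expanded into the primitive insertion."""
    return (f"INSERT DATA {{ {prop} rdfs:domain {cls} ; rdfs:range {rng} . }} "
            f"VALID {validity}")


def change_property_range(prop: str, new_range: str, validity: str) -> str:
    """CHANGE_PROPERTY_RANGE(Property,NewRange,Validity) expanded into an update."""
    return (f"UPDATE {{ {prop} rdfs:range ?range }} "
            f"SET {{ {prop} rdfs:range {new_range} }} VALID {validity}")


def del_property(cls: str, prop: str, validity: str) -> str:
    """DEL_PROPERTY(Class,Property,Validity) expanded into a deletion."""
    return (f"DELETE {{ {prop} rdfs:domain {cls} ; rdfs:range ?range . }} "
            f"VALID {validity}")


for stmt in (add_property("ex:Person", "ex:age", "xsd:integer", "[t1, UC)"),
             change_property_range("ex:age", "xsd:decimal", "[t2, UC)"),
             del_property("ex:Person", "ex:age", "[t3, UC)")):
    print(stmt)
```

Keeping macros as pure expansions means their temporal semantics is inherited entirely from INSERT, DELETE and UPDATE, so no extra maintenance logic is needed per macro.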
Conclusions
• We presented a temporal RDF database model whose distinctive features with respect to previously proposed models are:
  • It is defined on a multi-dimensional time domain
  • It employs triple timestamping with temporal elements
• The adoption of temporal elements in the multi-temporal setting best preserves the scalability property enjoyed by triple storage technologies, as it minimizes the database growth (the absence of value-equivalent triples is an integrity constraint)
• The data model has been equipped with manipulation operators for the extraction of a temporal snapshot and for the maintenance of the database; moreover, high-level operators can also be defined to manage a multi-version RDF ontology
Future Work
• Some design choices were motivated by the application requirements of an ontology-based personalization service in the legal (or medical) domain. We plan to explore the applicability of the approach also in application fields with more generic requirements
• We also plan to consider extensions of the proposed RDF database model, including the development of a complete multi-temporal SPARQL-like query language and the adoption of suitable multi-temporal index structures