
Light-weight Ontology Versioning with Multi-temporal RDF Schema

This paper explores light-weight ontology versioning using multi-temporal RDF schema, discussing temporal RDF data models, database models, triples, elements, integrity constraints, and memory-saving techniques for efficient storage. The example provided illustrates the benefits in large RDF datasets.


Presentation Transcript


  1. Fifth International Conference on Advances in Semantic Processing - SEMAPRO 2011 • Light-weight Ontology Versioning with Multi-temporal RDF Schema • Fabio Grandi, Alma Mater Studiorum - Università degli Studi di Bologna

  2. Introduction • Some application fields require the maintenance of past versions of an ontology after changes • For instance, in the legal domain: • Ontologies evolve as a natural consequence of the dynamics involved in normative systems • Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past) • Moreover, several time dimensions are usually important for applications in such domains SEMAPRO 2011 – F. Grandi – Light-weight Ontology Versioning with Multi-temporal RDF Schema

  3. Multi-temporal versioning • Time dimensions of interest in the legal domain: • Validity time is the time a norm is in force in the real world • Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm continues its efficacy though no longer in force • Transaction time is the time a norm is stored in the computer system • Publication time is the time a norm is published in the Official Journal

  4. Temporal RDF Data Models • Temporal RDF data models have recently been proposed; notable proposals include: [Gutierrez, Hurtado & Vaisman, 2007] [Pugliese, Udrea & Subrahmanian, 2008] [Tappolet & Bernstein, 2009] • Interval timestamping of RDF triples is adopted • A single time dimension (valid time) is usually considered • Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries

  5. A Multi-temporal RDF Database Model • N-dimensional time domain: 𝒯 = T_1 × T_2 × … × T_N, where T_i = [0, UC)_i • Multi-temporal RDF triple: ( s,p,o | T ), where s is a subject, p is a predicate, o is an object and T ⊆ 𝒯 is a timestamp • Multi-temporal RDF database: RDF-TDB = { ( s,p,o | T ) | T ⊆ 𝒯 }

  6. Multi-temporal RDF Triples • A temporal triple ( s,p,o | T ) assigns a temporal pertinence to an RDF triple ( s,p,o ) • The non-temporal triple ( s,p,o ) is the value (or the contents) of the temporal triple ( s,p,o | T ) • The temporal pertinence T is a subset of the time domain 𝒯 represented by a temporal element

  7. Temporal Elements • A temporal element [Gadia 98] is a disjoint union of temporal intervals • Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension • T = ∪_{1≤j≤m} I_j = ∪_{1≤j≤m} [t_j^s, t_j^e)_1 × [t_j^s, t_j^e)_2 × … × [t_j^s, t_j^e)_N • I_j ∩ I_k = Ø for all 1≤j<k≤m
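
The disjointness condition on slide 7 can be checked mechanically. The following is a minimal sketch, not taken from the presentation: it assumes each multi-temporal interval is stored as a tuple of half-open `(start, end)` pairs, one per dimension, and uses `float("inf")` as a stand-in for UC. The function names are hypothetical.

```python
from itertools import combinations

UC = float("inf")  # assumed stand-in for the open "until changed" bound


def boxes_overlap(a, b):
    """True if two N-dimensional intervals share at least one time point.

    Each box is a tuple of half-open (start, end) pairs, one per dimension.
    """
    return all(max(s1, s2) < min(e1, e2) for (s1, e1), (s2, e2) in zip(a, b))


def is_temporal_element(boxes):
    """Check the pairwise condition I_j ∩ I_k = Ø over a list of boxes."""
    return not any(boxes_overlap(a, b) for a, b in combinations(boxes, 2))


# Bitemporal example with t0 = 0 and t1 = 10 (dimension order: valid, transaction).
disjoint = [((0, 10), (0, UC)), ((10, UC), (0, 10))]
overlapping = [((0, 10), (0, UC)), ((5, UC), (0, 10))]
print(is_temporal_element(disjoint), is_temporal_element(overlapping))
```

Two boxes intersect only if their intervals overlap in every dimension, which is why a single non-overlapping dimension is enough to make them disjoint.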

  8. Integrity Constraint • No value-equivalent distinct triples exist: ∀ ( s,p,o | T ), ( s′,p′,o′ | T′ ) ∈ RDF-TDB: s=s′ ∧ p=p′ ∧ o=o′ ⇒ T=T′ • The constraint is made possible by the adoption of temporal element timestamping • Temporal elements lead to space saving whenever the temporal pertinence of a triple is not a convex interval

  9. Memory Saving with Temporal Elements • For example, even with a monodimensional time domain, the two value-equivalent triples with interval timestamping ( t2 < t3 ): ( s,p,o | [t1, t2) ) and ( s,p,o | [t3, t4) ) can be merged into a single triple with element timestamping: ( s,p,o | [t1, t2) ∪ [t3, t4) ), where the same space is required for the timestamps in both cases (i.e. the space needed by 4 time points) and the contents of the triple are stored twice in the former case and only once in the latter • Different triple versions are stored only once with a complex timestamp instead of storing multiple copies (value-equivalent triples) with a simple timestamp
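
The arithmetic behind the monodimensional example on slide 9 can be sketched as follows. The 120-byte triple and 8-byte time point are illustrative assumptions, and the function names are hypothetical; this sketch covers only the one-dimensional case and is not meant to reproduce the bitemporal figures quoted later in the deck.

```python
def interval_storage(triple_bytes, n_intervals, time_point_bytes=8):
    """Interval timestamping: one copy of the triple contents per interval,
    each copy carrying 2 time points (start, end)."""
    return n_intervals * (triple_bytes + 2 * time_point_bytes)


def element_storage(triple_bytes, n_intervals, time_point_bytes=8):
    """Element timestamping: one copy of the triple contents,
    plus 2 time points for every interval in the temporal element."""
    return triple_bytes + n_intervals * 2 * time_point_bytes


triple_bytes = 120  # assumed triple size
with_intervals = interval_storage(triple_bytes, n_intervals=2)
with_element = element_storage(triple_bytes, n_intervals=2)
saving = 1 - with_element / with_intervals
print(with_intervals, with_element, round(100 * saving, 1))
```

Both layouts store the same 4 time points, so the difference is exactly one copy of the triple contents.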

  10. An Example • The memory saving obtained with temporal elements grows with the dimensionality of the time domain • The saving is also greater the larger the triple is relative to the timestamp • In very large RDF benchmark datasets, the average triple size ranges from 80 to 140 bytes (DBpedia, UScensus, LUBM, BSBM) to more than 600 bytes (UniProtKB) • The timestamp (date+time) data size in SQL is 6 to 8 bytes • In the example which follows we assume a bitemporal domain (valid + transaction time)

  11. Representation of the Evolution of a Triple • [Figure: bitemporal plane with both axes marked t0, t1, t2, UC, showing the regions covered by (s, p, o1), (s, p, o2) and (s, p, o3)] • With temporal elements (3 triples needed):
      ( s, p, o1 | [t0,t1)×[t0,UC) ∪ [t1,UC)×[t0,t1) )
      ( s, p, o2 | [t1,t2)×[t1,UC) ∪ [t2,UC)×[t1,t2) )
      ( s, p, o3 | [t2,UC)×[t2,UC) )
    • With temporal intervals (5 triples needed):
      ( s, p, o1 | [t0,t1)×[t0,UC) )
      ( s, p, o1 | [t1,UC)×[t0,t1) )
      ( s, p, o2 | [t1,t2)×[t1,UC) )
      ( s, p, o2 | [t2,UC)×[t1,t2) )
      ( s, p, o3 | [t2,UC)×[t2,UC) )

  12. Memory Saving Figures • [Table: percentage space saving with temporal-element vs interval timestamping; average number of versions per triple in columns, triple size in bytes in rows] • We assume 8-byte timestamps • For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of • 721 GB with temporal elements • 1.14 TB with temporal intervals

  13. Query Operators • The only retrieval operator we consider in this work is a snapshot extraction operator, which can be used to extract an ontology version from a multi-version ontology represented as a temporal RDF database • Given a time point t = (t1, t2, …, tN) ∈ 𝒯, we define the RDF database snapshot valid at t as RDF-TDB(t) = { ( s,p,o ) | ( s,p,o | T ) ∈ RDF-TDB ∧ t ∈ T } • The result is a (non-temporal) RDF graph, which can be used to represent the ontology version valid at t
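
The snapshot operator of slide 13 can be sketched in a few lines. This is an illustration under assumptions, not the author's implementation: the database is a Python dict from `(s, p, o)` to a list of boxes (tuples of half-open intervals, ordered valid then transaction), `float("inf")` stands in for UC, and the sample data reuses the three versions of slide 11 with the hypothetical values t0 = 0, t1 = 10, t2 = 20.

```python
UC = float("inf")  # assumed stand-in for "until changed"


def contains(element, t):
    """True if time point t lies in some box of the temporal element."""
    return any(
        all(start <= ti < end for (start, end), ti in zip(box, t))
        for box in element
    )


def snapshot(tdb, t):
    """RDF-TDB(t): the non-temporal triples whose timestamp contains t."""
    return {spo for spo, element in tdb.items() if contains(element, t)}


# The evolution of slide 11, with t0 = 0, t1 = 10, t2 = 20 (valid, transaction).
tdb = {
    ("s", "p", "o1"): [((0, 10), (0, UC)), ((10, UC), (0, 10))],
    ("s", "p", "o2"): [((10, 20), (10, UC)), ((20, UC), (10, 20))],
    ("s", "p", "o3"): [((20, UC), (20, UC))],
}
print(snapshot(tdb, (15, 25)))  # valid time 15 as recorded at transaction time 25
print(snapshot(tdb, (15, 5)))   # valid time 15 as recorded at transaction time 5
```

Keying the dict on the triple value means two value-equivalent triples cannot coexist, which mirrors the integrity constraint of slide 8.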

  14. Modification Operators – Insertion • Assuming an (N-1)-dimensional temporal element tv (for any modification, transaction time [now, UC) is implied), the insertion operation INSERT DATA { s,p,o } VALID tv can be defined via its effects on the database state as follows (using a triple calculus):
      RDF-TDB′ = RDF-TDB ∪ { ( s,p,o | T′ ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ T′ = coalesce( T ∪ tv × [now, UC) ) } ∪ { ( s,p,o | tv × [now, UC) ) | ¬∃ ( s,p,o | T ) ∈ RDF-TDB }
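
A minimal sketch of the insertion semantics follows, under simplifying assumptions that are not in the slides: time is discrete, `HORIZON` is a hypothetical finite stand-in for UC, and a timestamp is stored as a set of (valid, transaction) points rather than as a list of intervals, so coalescing is implicit. The sketch also assumes that the merged timestamp replaces the stored one, so that value-equivalent triples never coexist.

```python
from itertools import product

HORIZON = 6  # hypothetical finite stand-in for UC


def stamp(tv, now):
    """Point set for tv x [now, UC), with pairs ordered (valid, transaction)."""
    return frozenset(product(tv, range(now, HORIZON)))


def insert_data(tdb, spo, tv, now):
    """INSERT DATA { s,p,o } VALID tv.

    If the triple already exists, its timestamp is extended (set union);
    otherwise a new temporal triple is created.
    """
    tdb[spo] = tdb.get(spo, frozenset()) | stamp(tv, now)


tdb = {}
insert_data(tdb, ("s", "p", "o"), range(0, 2), now=0)  # valid [0, 2), stored at 0
insert_data(tdb, ("s", "p", "o"), range(3, 5), now=1)  # valid [3, 5), stored at 1
print(len(tdb), len(tdb[("s", "p", "o")]))
```

After the second call the database still holds a single triple whose timestamp is not a convex interval, which is the situation where element timestamping pays off.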

  15. Maintenance of temporal elements • In order to ensure the results are still temporal elements, union and difference operations must be carefully defined • In particular, if T_i (i=1,2) are temporal elements defined as T_i = ∪_{1≤j≤m_i} I_ij, where the I_ij are multidimensional intervals, then the difference can be computed as T_1 \ T_2 = ∪_{1≤j≤m_1} ( I_1j \ T_2 ), which is ensured to be a temporal element if I_1j \ T_2 is a temporal element for each j • Given the difference, the union can be computed as T_1 ∪ T_2 = T_1 ∪ ( T_2 \ T_1 )
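
One possible way to realise the difference and union of slide 15 is sketched below. The slides do not say how I_1j \ T_2 is computed; the dimension-by-dimension slicing used here is an assumption chosen because it always yields disjoint boxes. Boxes are tuples of half-open `(start, end)` pairs, and no coalescing of adjacent boxes is attempted.

```python
def intersect(a, b):
    """Intersection of two boxes, or None if they are disjoint."""
    box = tuple((max(s1, s2), min(e1, e2)) for (s1, e1), (s2, e2) in zip(a, b))
    return box if all(s < e for s, e in box) else None


def box_minus_box(a, b):
    """Split a \\ b into disjoint boxes by slicing one dimension at a time."""
    c = intersect(a, b)
    if c is None:
        return [a]
    pieces, rest = [], list(a)
    for d, ((s, e), (cs, ce)) in enumerate(zip(a, c)):
        if s < cs:  # part of a lying before the overlap in dimension d
            pieces.append(tuple(rest[:d] + [(s, cs)] + rest[d + 1:]))
        if ce < e:  # part of a lying after the overlap in dimension d
            pieces.append(tuple(rest[:d] + [(ce, e)] + rest[d + 1:]))
        rest[d] = (cs, ce)  # narrow dimension d to the overlap before moving on
    return pieces


def difference(t1, t2):
    """T1 \\ T2: subtract every box of T2 from every box of T1."""
    result = list(t1)
    for b in t2:
        result = [piece for a in result for piece in box_minus_box(a, b)]
    return result


def union(t1, t2):
    """T1 ∪ T2 computed as T1 ∪ (T2 \\ T1), so the boxes stay disjoint."""
    return list(t1) + difference(t2, t1)


t1 = [((0, 10), (0, 10))]
t2 = [((5, 15), (5, 15))]
print(difference(t1, t2))
print(union(t1, t2))
```

Because the pieces produced for one box never overlap each other or the subtracted box, the result of `union` satisfies the disjointness condition required of a temporal element.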

  16. Modification Operators - Deletion • Assuming an (N-1)-dimensional temporal element tv and a selection predicate pred(s,p,o), the deletion operation DELETE { s,p,o } VALID tv WHERE pred(s,p,o) can be defined via its effects on the database state as follows:
      RDF-TDB′ = RDF-TDB \ { ( s,p,o | T ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ pred(s,p,o) ∧ T ∩ tv × [now, UC) ≠ Ø } ∪ { ( s,p,o | T′ ) | ∃ ( s,p,o | T ) ∈ RDF-TDB ∧ pred(s,p,o) ∧ T ∩ tv × [now, UC) ≠ Ø ∧ T′ = coalesce( T \ tv × [now, UC) ) }

  17. Modification Operators - Update • Assuming an (N-1)-dimensional temporal element tv, the update operation UPDATE { s,p,o } SET { s’,p’,o’ } VALID tv WHERE pred(s,p,o) is not primitive, as it can be defined as a delete operation followed by an insert operation as follows:
      DELETE { s,p,o } VALID tv WHERE pred(s,p,o) ;
      INSERT DATA { s’,p’,o’ } VALID tv
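
The delete-then-insert reading of UPDATE can be exercised on the three-version evolution of slide 11. As before this is only a sketch under assumptions: discrete time, a hypothetical finite `HORIZON` in place of UC, timestamps stored as sets of (valid, transaction) points, and triples whose timestamp becomes empty being dropped. The helper names and the values t0 = 0, t1 = 2, t2 = 4 are illustrative.

```python
from itertools import product

HORIZON = 6  # hypothetical finite stand-in for UC


def stamp(tv, now):
    """Point set for tv x [now, UC), with pairs ordered (valid, transaction)."""
    return frozenset(product(tv, range(now, HORIZON)))


def insert_data(tdb, spo, tv, now):
    """INSERT DATA: extend the timestamp of spo, creating the triple if needed."""
    tdb[spo] = tdb.get(spo, frozenset()) | stamp(tv, now)


def delete(tdb, pred, tv, now):
    """DELETE ... VALID tv WHERE pred: cut tv x [now, UC) out of matching triples."""
    cut = stamp(tv, now)
    for spo in [k for k in tdb if pred(*k)]:
        remaining = tdb[spo] - cut
        if remaining:
            tdb[spo] = remaining
        else:
            del tdb[spo]  # assumption: a triple with an empty timestamp is dropped


def update(tdb, pred, new_spo, tv, now):
    """UPDATE as a DELETE followed by an INSERT DATA with the same validity."""
    delete(tdb, pred, tv, now)
    insert_data(tdb, new_spo, tv, now)


def has_object(o):
    """Selection predicate matching the triple (s, p, o) of the running example."""
    return lambda s, p, obj: (s, p, obj) == ("s", "p", o)


tdb = {}
insert_data(tdb, ("s", "p", "o1"), range(0, HORIZON), now=0)              # t0 = 0
update(tdb, has_object("o1"), ("s", "p", "o2"), range(2, HORIZON), now=2)  # t1 = 2
update(tdb, has_object("o2"), ("s", "p", "o3"), range(4, HORIZON), now=4)  # t2 = 4
print({spo: len(ts) for spo, ts in tdb.items()})
```

Each of the three triples ends up stored once, with the first two carrying the L-shaped regions [t0,t1)×[t0,UC) ∪ [t1,UC)×[t0,t1) and [t1,t2)×[t1,UC) ∪ [t2,UC)×[t1,t2).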

  18. Derivation of a new Ontology Version (1) • We assume the new version is obtained by applying changes to an existing ontology version. The parameters needed are: • OS_Validity: the valid time point used to select the ontology version used as base for the derivation • The sequence of schema changes to be applied to the selected version in order to produce the new ontology version • OC_Validity: the valid time interval used to assign the validity to the new version (possibly in the past or future)

  19. Derivation of a new Ontology Version (2) • [Figure: valid-time axis t1, t2, t3 with OS_Validity marked; schema changes; resulting valid-time axis t1, t2, t3, t4 with SC_Validity = [ t4, UC ]]

  20. Transaction • On … •
      BEGIN TRANSACTION ;
      CREATE GRAPH <workVersion> ;
      INSERT INTO <workVersion> { ?s, ?p, ?o }
        WHERE { TGRAPH <tOntology> { ?s, ?p, ?o | ?t } .
                FILTER ( VALID(?t) CONTAINS OS_Validity && TRANSACTION(?t) CONTAINS current-date() ) } ;
      => a sequence of ontology changes acting on the (non-temporal) workVersion graph goes here
      DELETE FROM <tOntology> { ?s, ?p, ?o } VALID OC_Validity ;
      INSERT INTO <tOntology> { ?s, ?p, ?o } VALID OC_Validity
        WHERE { GRAPH <workVersion> { ?s, ?p, ?o } } ;
      DROP GRAPH <workVersion> ;
      COMMIT TRANSACTION
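
The transaction of slide 20 can be mimicked on an in-memory store to see what each step does. This is a sketch under the same simplifying assumptions as a point-based model (discrete time, a hypothetical finite `HORIZON` for UC, timestamps as sets of (valid, transaction) points); it has no real transaction handling, and the class names `Norm`, `Decree` and `Document` are invented for the example.

```python
from itertools import product

HORIZON = 8  # hypothetical finite stand-in for UC


def stamp(tv, now):
    """Point set for tv x [now, UC), with pairs ordered (valid, transaction)."""
    return frozenset(product(tv, range(now, HORIZON)))


def snapshot(tdb, t):
    """Non-temporal triples whose timestamp contains the time point t."""
    return {spo for spo, ts in tdb.items() if t in ts}


def derive_version(tdb, os_validity, changes, oc_validity, now):
    """Derive a new ontology version from the one valid at os_validity."""
    # Extract the currently stored version valid at os_validity into a work graph.
    work = snapshot(tdb, (os_validity, now))
    # Apply the ontology changes to the non-temporal work graph.
    work = changes(work)
    # DELETE FROM <tOntology> ... VALID oc_validity
    cut = stamp(oc_validity, now)
    for spo in list(tdb):
        remaining = tdb[spo] - cut
        if remaining:
            tdb[spo] = remaining
        else:
            del tdb[spo]  # assumption: empty timestamps are dropped
    # INSERT INTO <tOntology> ... VALID oc_validity, from the work graph
    for spo in work:
        tdb[spo] = tdb.get(spo, frozenset()) | cut


norm = ("Norm", "rdfs:subClassOf", "Document")
decree = ("Decree", "rdfs:subClassOf", "Norm")
tdb = {norm: stamp(range(0, HORIZON), now=0)}
derive_version(tdb, os_validity=3, changes=lambda g: g | {decree},
               oc_validity=range(5, HORIZON), now=2)
print(snapshot(tdb, (6, 3)), snapshot(tdb, (4, 3)))
```

Triples left unchanged by the derivation get back exactly the region that was cut out, so their timestamps are unaffected and no extra copy is stored.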

  21. Operators for Ontology Management • On the basis of the primitives introduced so far, high-level macro operators for the management of a multi-version RDF ontology can also be defined:
      CREATE_CLASS(Name,Validity)
      RENAME_CLASS(Class,NewName,Validity)
      DROP_CLASS(Class,Validity)
      ADD_SUBCLASS(SubClass,Class,Validity)
      DEL_SUBCLASS(SubClass,Class,Validity)
      CREATE_PROPERTY(Name,Range,Validity)
      RENAME_PROPERTY(Property,NewName,Validity)
      CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
      DROP_PROPERTY(Property,Validity)
      ADD_PROPERTY(Class,Property,Validity)
      DEL_PROPERTY(Class,Property,Validity)
      ADD_SUBPROPERTY(SubProperty,Property,Validity)
      DEL_SUBPROPERTY(SubProperty,Property,Validity)
      …………

  22. Sample Operator Definitions • For example, the definitions of some of the property management operators are the following:
      ADD_PROPERTY(Class,Property,Range,Validity):
        INSERT DATA { Property rdfs:domain Class ; rdfs:range Range . } VALID Validity
      CHANGE_PROPERTY_RANGE(Property,NewRange,Validity):
        UPDATE { Property rdfs:range ?range } SET { Property rdfs:range NewRange } VALID Validity
      DEL_PROPERTY(Class,Property,Validity):
        DELETE { Property rdfs:domain Class ; rdfs:range ?range . } VALID Validity

  23. Conclusions • We presented a temporal RDF database model whose distinctive features with respect to previously proposed models are: • It is defined on a multi-dimensional time domain • It employs triple timestamping with temporal elements • The adoption of temporal elements in the multi-temporal setting best preserves the scalability property enjoyed by triple storage technologies, as it minimizes the database growth (the absence of value-equivalent triples is an integrity constraint) • The data model has been equipped with manipulation operators for the extraction of a temporal snapshot and for the maintenance of the database; moreover, high-level operators can also be defined to manage a multi-version RDF ontology

  24. Future Work • Some design choices were motivated by application requirements of an ontology-based personalization service in the legal (or medical) domain. We plan to explore the applicability of the approach also in application fields with more generic requirements • We also plan to consider extensions of the proposed RDF database model, including the development of a complete multi-temporal SPARQL-like query language and the adoption of suitable multi-temporal index structures
