Validation of Integrated Enterprise Data With RIF

David Schaengold, Revelytix, Inc., October 21, 2011


Presentation Transcript


  1. Validation of Integrated Enterprise Data With RIF
     David Schaengold, Revelytix, Inc.
     October 21, 2011

  2. Background: enterprise data integration

  3. The Integrated Data Needs Validation
     - Even if the ultimate data sources are themselves validated
     - Even if transmission and manipulation of the data are perfect
     Example: "No person can have more than one ID #"
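The example constraint can be sketched in a few lines of Python, assuming the integrated data is available as (subject, predicate, object) triples. The `ex:` names and the sample data are hypothetical, chosen only to show how two individually valid sources can disagree once merged.

```python
from collections import defaultdict

# Hypothetical integrated data as (subject, predicate, object) triples.
# The ex: names are illustrative and not taken from any real ontology.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:idNumber", "123"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:idNumber", "456"),  # from source A
    ("ex:bob", "ex:idNumber", "789"),  # from source B
]


def people_with_multiple_ids(triples):
    """Return {person: sorted IDs} for every person with more than one distinct ID."""
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Person"}
    ids = defaultdict(set)
    for s, p, o in triples:
        if p == "ex:idNumber" and s in persons:
            ids[s].add(o)
    return {s: sorted(v) for s, v in ids.items() if len(v) > 1}


violations = people_with_multiple_ids(triples)
print(violations)
```

Each source here could pass its own validation, since each holds only one ID for `ex:bob`; the conflict only exists in the integrated view.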

  4. Why not use just OWL or SPARQL?
     - More complex examples can't be easily modeled in OWL
     - It's helpful to have a single list of rules
     - It's helpful to have a rule syntax
     Example from financial services: "The ninth digit of a CUSIP must be equal to the sum mod 10 of every other digit of the first eight digits of the CUSIP"
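The slide states the CUSIP rule informally. As a rough illustration of why such a check is awkward in OWL, here is a Python sketch of the check digit as it is commonly documented (a "modulus 10 double-add-double" scheme), which is somewhat more involved than the slide's one-line summary. The function names are hypothetical; in a rules engine the arithmetic would presumably be carried by built-in predicates instead.

```python
# Character values: digits 0-9, letters A-Z as 10-35, then * @ # as 36-38.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ*@#"


def cusip_check_digit(base: str) -> int:
    """Compute the check digit for the first eight characters of a CUSIP."""
    if len(base) != 8:
        raise ValueError("expected exactly eight characters")
    total = 0
    for i, ch in enumerate(base.upper()):
        value = ALPHABET.index(ch)  # raises ValueError on an unknown character
        if i % 2 == 1:  # every second character (2nd, 4th, 6th, 8th) is doubled
            value *= 2
        total += value // 10 + value % 10  # add the decimal digits of the value
    return (10 - total % 10) % 10


def is_valid_cusip(cusip: str) -> bool:
    """True if the ninth character matches the computed check digit."""
    if len(cusip) != 9 or cusip[8] not in "0123456789":
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False


print(is_valid_cusip("037833100"))
print(is_valid_cusip("037833101"))
```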

  5. Why not use SWRL or SPIN?
     - RIF is a standard
     - RIF has an abstract syntax designed to be interchangeable
     - RIF is implementation-agnostic (though so is SWRL)
     - RIF has a more expressive set of built-in predicates
     - RIF has production rules
     - RIF has a framework for extensions
     - RIF can work with RDF, but isn't chained to it

  6. How do we use RIF for validation?
     - The data is exposed in a single format (RDF)
     - The data is exposed as a single schema (a "domain ontology")
     - All the constants in our rules are URIs from the domain ontology (and mapping ontologies, if you want provenance rules as well)
     - Triples are represented as frames
     - Rule consequents are always rif:error()

  7. Example
     Forall ?p1 ?p2 ?x ?y (
       rif:error() :-
         And(
           ?p1[owl:propertyDisjointWith->?p2]
           ?x[?p1->?y]
           ?x[?p2->?y]
         )
     )
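To make the rule's meaning concrete, the following Python sketch evaluates the same three-frame pattern over a small set of triples. It is not how a RIF engine works internally; it only mirrors the join the rule describes, and the `ex:` data is hypothetical.

```python
def disjoint_property_errors(triples):
    """Find every binding (?p1, ?p2, ?x, ?y) that would fire rif:error()."""
    triples = set(triples)
    errors = []
    for p1, pred, p2 in triples:
        if pred != "owl:propertyDisjointWith":
            continue  # frame 1: ?p1[owl:propertyDisjointWith->?p2]
        for x, p, y in triples:
            # frame 2: ?x[?p1->?y] and frame 3: ?x[?p2->?y]
            if p == p1 and (x, p2, y) in triples:
                errors.append((p1, p2, x, y))
    return sorted(errors)


# Hypothetical data: hasParent and hasSpouse are declared disjoint.
data = [
    ("ex:hasParent", "owl:propertyDisjointWith", "ex:hasSpouse"),
    ("ex:carol", "ex:hasParent", "ex:dave"),
    ("ex:carol", "ex:hasSpouse", "ex:dave"),  # violates the disjointness
    ("ex:erin", "ex:hasParent", "ex:frank"),
]

errors = disjoint_property_errors(data)
print(errors)
```

Because the pattern reads the disjointness declaration from the data itself, a single rule covers every pair of disjoint properties in the ontology.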

  8. Problem: RIF-BLD doesn't have NOT()
     Validation rules have the general form:
       rif:error() :-
         And( X NOT(Y) )
     Sometimes you can get the negation using built-ins, but sometimes you need to negate a graph
     Solution: Add Not(), but use only rif:error() in consequents to keep things nice and monotonic
     In principle, any zero-argument predicate would work
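A rule of the form And(X NOT(Y)) can be sketched in Python as a set difference: find everything matching X, then drop whatever also matches Y. The rule chosen here ("every person must have an ID") and the `ex:` names are assumptions for illustration, not taken from the slides.

```python
def persons_without_id(triples):
    """X = ?x is an ex:Person; Y = ?x has some ex:idNumber. Report X and not Y."""
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Person"}
    has_id = {s for s, p, o in triples if p == "ex:idNumber"}
    return sorted(persons - has_id)


# Hypothetical data: bob has no ID in the integrated view.
data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:idNumber", "123"),
    ("ex:bob", "rdf:type", "ex:Person"),
]

missing = persons_without_id(data)
print(missing)
```

The result is only reported, never written back into `data`, so no other rule can match against it. That mirrors the slide's restriction that the negated rules conclude nothing except rif:error().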

  9. How do we express violations?
     - Our rules engine writes all its output to a triple store
     - Violation output looks like this:
         [ a :Violation ;
           :violatesRule "rule label here"^^xsd:string ;
           :reproducedBy "sparql here"^^xsd:string ]
     - Very important to keep output segregated from input
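As a sketch of how such a record might be produced, the Python below renders one violation as a Turtle blank node in the shape shown on the slide. The helper names are hypothetical, the `:` and `xsd:` prefixes are assumed to be declared elsewhere, and the escaping covers only the common cases (backslash, double quote, newline, carriage return).

```python
def turtle_string(value: str) -> str:
    """Render a Python string as an xsd:string literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^xsd:string'


def violation_turtle(rule_label: str, sparql: str) -> str:
    """Build one :Violation blank node; prefixes are assumed to exist already."""
    return (
        "[ a :Violation ;\n"
        f"  :violatesRule {turtle_string(rule_label)} ;\n"
        f"  :reproducedBy {turtle_string(sparql)} ] .\n"
    )


example = violation_turtle(
    "one ID per person",
    'ASK { ?x :idNumber "123", "456" }',
)
print(example)
```

Storing the reproducing SPARQL query alongside the rule label lets a reviewer rerun the check against the input data without going through the rules engine.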

  10. Entailments are a bonus
      - OWL-RL can be exhaustively expressed as RIF
      - There may be domain-specific entailments
      - We entail before we validate
      Example:
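The following is a minimal sketch of entail-then-validate, not the slide's own example. It assumes a single entailment rule, the sub-property rule from OWL 2 RL (if ?p1 rdfs:subPropertyOf ?p2 and ?x ?p1 ?y, then ?x ?p2 ?y), followed by a "no more than one ID" check. All `ex:` names are hypothetical.

```python
def entail_subproperties(triples):
    """Forward-chain the sub-property rule until no new triples appear."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in closed if p == "rdfs:subPropertyOf"]
        for p1, p2 in sub:
            for s, p, o in list(closed):
                if p == p1 and (s, p2, o) not in closed:
                    closed.add((s, p2, o))
                    changed = True
    return closed


def multiple_id_errors(triples):
    """Subjects that carry more than one distinct ex:idNumber value."""
    ids = {}
    for s, p, o in triples:
        if p == "ex:idNumber":
            ids.setdefault(s, set()).add(o)
    return sorted(s for s, values in ids.items() if len(values) > 1)


# Hypothetical data: two source-specific properties both specialize ex:idNumber.
data = [
    ("ex:ssn", "rdfs:subPropertyOf", "ex:idNumber"),
    ("ex:passportNo", "rdfs:subPropertyOf", "ex:idNumber"),
    ("ex:alice", "ex:ssn", "111"),
    ("ex:alice", "ex:passportNo", "P22"),
]

before = multiple_id_errors(data)
after = multiple_id_errors(entail_subproperties(data))
print(before, after)
```

Validating the raw triples finds nothing, because no `ex:idNumber` statements exist until the entailment step has run. This ordering is what makes "entail before we validate" matter.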

  11. Try it out
      Spyder: expose non-RDF data sources to SPARQL queries
        http://www.revelytix.com/content/spyder
      Spinner: federated query across SPARQL services
        http://www.revelytix.com/content/spinner
      Rex: forward-chaining RIF rules engine
        http://www.revelytix.com/content/rex
