Explore the use of RIF (Rule Interchange Format) for validating integrated enterprise data. Learn why RIF is a better choice than OWL, SPARQL, SWRL, or SPIN. Discover how to express validation rules, handle negation, and represent violations using RIF. Take advantage of RIF's ability to entail and validate data simultaneously.
Validation of Integrated Enterprise Data With RIF
David Schaengold
Revelytix, Inc
December 7, 2011
The Integrated Data Needs Validation
- Even if the ultimate data sources are themselves validated
- Even if transmission and manipulation of the data are perfect
Example: "No person can have more than one ID #"
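Why integration alone can break this rule: two sources may each hold one valid ID per person and still disagree once merged. A minimal sketch in Python, assuming the integrated data is a list of (subject, predicate, object) triples; the predicate name `ex:hasID` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical predicate name; the slides do not name the ID property.
HAS_ID = "ex:hasID"

def persons_with_multiple_ids(triples):
    """Return {person: sorted IDs} for subjects with more than one distinct ID."""
    ids = defaultdict(set)
    for s, p, o in triples:
        if p == HAS_ID:
            ids[s].add(o)
    return {s: sorted(v) for s, v in ids.items() if len(v) > 1}

# Each source is internally valid; the conflict only appears after merging.
merged = [
    ("ex:alice", HAS_ID, "A-1"),  # from source 1
    ("ex:alice", HAS_ID, "A-1"),  # from source 2, same value
    ("ex:bob", HAS_ID, "B-7"),    # from source 1
    ("ex:bob", HAS_ID, "B-9"),    # from source 2, conflicting value
]
violations = persons_with_multiple_ids(merged)
print(violations)
```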
Why not use just OWL or SPARQL?
- More complex examples can't be easily modeled in OWL
- It's helpful to have a single list of rules
- It's helpful to have a rule syntax
Example from financial services: "The ninth digit of a CUSIP must be equal to the sum mod 10 of every other digit of the first eight digits of the CUSIP"
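To show why a rule like this wants arithmetic built-ins rather than class axioms, here is the rule exactly as worded on the slide, sketched in Python. Assumptions: the identifier is all-numeric, and "every other digit" means positions 1, 3, 5 and 7 (1-based). The slide's wording is a simplification; the actual CUSIP check-digit scheme is a Luhn-style variant, which this sketch does not implement.

```python
def satisfies_slide_cusip_rule(cusip):
    """Check the simplified rule as worded on the slide (not the full CUSIP scheme)."""
    # Assumption: only 9-character, all-numeric identifiers are considered.
    if len(cusip) != 9 or not all(c in "0123456789" for c in cusip):
        return False
    digits = [int(c) for c in cusip]
    # Assumption: "every other digit" = 1-based positions 1, 3, 5, 7.
    expected = sum(digits[0:8:2]) % 10
    return digits[8] == expected

# 1 + 3 + 5 + 7 = 16, and 16 mod 10 = 6
ok = satisfies_slide_cusip_rule("123456786")
bad = satisfies_slide_cusip_rule("123456780")
print(ok, bad)
```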
Why not use SWRL or SPIN?
- RIF is a standard
- RIF has an abstract syntax designed to be interchangeable
- RIF is implementation-agnostic (though so is SWRL)
- RIF has a more expressive set of built-in predicates
- RIF has production rules
- RIF has a framework for extensions
- RIF can work with RDF, but isn't chained to it
How do we use RIF for validation?
- The data is exposed in a single format (RDF)
- The data is exposed as a single schema (a "domain ontology")
- All the constants in our rules are URIs from the domain ontology (and mapping ontologies, if you want provenance rules as well)
- Triples are represented as frames
- Rule consequents are always rif:error()
Example

Forall ?p1 ?p2 ?x ?y (
    rif:error() :- And(
        ?p1[owl:propertyDisjointWith -> ?p2]
        ?x[?p1 -> ?y]
        ?x[?p2 -> ?y]
    )
)
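What the rule above matches, sketched in Python over a plain set of triples: each frame `?s[?p -> ?o]` corresponds to one (s, p, o) triple, and every binding of ?p1, ?p2, ?x, ?y that satisfies all three frames is one error. This is an illustrative evaluation only, not how a RIF engine is implemented; the `ex:` property names and data are hypothetical.

```python
DISJOINT = "owl:propertyDisjointWith"

def disjoint_property_errors(triples):
    """Return (x, p1, p2, y) bindings where x is linked to y by two disjoint properties."""
    facts = set(triples)
    errors = []
    for p1, pred, p2 in facts:
        if pred != DISJOINT:
            continue  # first frame: ?p1[owl:propertyDisjointWith -> ?p2]
        for x, p, y in facts:
            # second and third frames: ?x[?p1 -> ?y] and ?x[?p2 -> ?y]
            if p == p1 and (x, p2, y) in facts:
                errors.append((x, p1, p2, y))
    return sorted(errors)

data = [
    ("ex:manages", DISJOINT, "ex:reportsTo"),
    ("ex:ann", "ex:manages", "ex:raj"),
    ("ex:ann", "ex:reportsTo", "ex:raj"),   # conflicts with the line above
    ("ex:raj", "ex:reportsTo", "ex:ann"),   # fine: different subject/object pair
]
errors = disjoint_property_errors(data)
print(errors)
```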
Problem: RIF-BLD doesn't have NOT()
- Validation rules have the general form:
    rif:error() :- And( X NOT(Y) )
- Sometimes you can get the negation using built-ins, but sometimes you need to negate a graph
Solution: Add Not(), but use only rif:error() in consequents to keep things nice and monotonic
- In principle, any zero-argument predicate would work
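A rough picture of the X-and-Not(Y) shape, sketched in Python: X is "?x is a Person", Y is "?x has some ID", and the negation is evaluated against the graph as it stands. The class and property names are hypothetical. Because the only thing the rule produces is an error report, nothing it derives is added back to the facts that other rules read, which is the point of restricting consequents to rif:error().

```python
def missing_required(triples, cls, prop):
    """Return members of cls for which no triple with predicate prop exists."""
    facts = set(triples)
    members = {s for s, p, o in facts if p == "rdf:type" and o == cls}  # X
    have_prop = {s for s, p, o in facts if p == prop}                   # Y
    # And(X Not(Y)): in X, and absent from Y.
    return sorted(members - have_prop)

data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:hasID", "A-1"),
    ("ex:bob", "rdf:type", "ex:Person"),  # no ex:hasID triple for bob
]
errors = missing_required(data, "ex:Person", "ex:hasID")
print(errors)
```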
How do we express violations?
- Our rules engine writes all its output to a triple store
- Violation output looks like this:
    [ a :Violation ;
      :violatesRule "rule label here"^^xsd:string ;
      :reproducedBy "sparql here"^^xsd:string ]
- Very important to keep output segregated from input
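One way a violation record of this shape could be serialized, as a minimal Python sketch. It is not the engine's actual writer: the function names are hypothetical, the `:` and `xsd:` prefixes are assumed to be declared elsewhere, and a trailing `.` is added so the record is a complete Turtle statement.

```python
def turtle_string(text):
    """Quote text as an xsd:string literal, escaping characters Turtle requires."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^xsd:string'

def violation_to_turtle(rule_label, sparql):
    """Render one violation as a blank node with the two properties from the slide."""
    return (
        "[ a :Violation ;\n"
        f"  :violatesRule {turtle_string(rule_label)} ;\n"
        f"  :reproducedBy {turtle_string(sparql)} ] ."
    )

record = violation_to_turtle(
    'one "ID" per person',
    "ASK { ?x ex:hasID ?a , ?b . FILTER(?a != ?b) }",
)
print(record)
```

Keeping these records in a store or graph separate from the integrated data means a later validation run never mistakes old violations for input.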
Entailments are a bonus
- OWL-RL can be exhaustively expressed as RIF
- There may be domain-specific entailments
- We entail before we validate
Example:
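The slide's own example is not reproduced in this text. As a hypothetical stand-in, the Python sketch below shows why the order matters: a single entailment rule (type inheritance through rdfs:subClassOf, one of the OWL RL rules) is run to a fixpoint, and only then is a "every Person needs an ID" check applied. All `ex:` names and data are assumptions.

```python
def entail_subclass_types(triples):
    """Forward-chain: x a C1, C1 rdfs:subClassOf C2 => x a C2, until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        for x, p, c in list(facts):
            if p != "rdf:type":
                continue
            for c1, c2 in sub:
                if c == c1 and (x, "rdf:type", c2) not in facts:
                    facts.add((x, "rdf:type", c2))
                    changed = True
    return facts

def persons_without_id(facts):
    """Validation step: Persons with no ex:hasID triple."""
    persons = {s for s, p, o in facts if p == "rdf:type" and o == "ex:Person"}
    with_id = {s for s, p, o in facts if p == "ex:hasID"}
    return sorted(persons - with_id)

data = [
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Employee"),  # never asserted to be a Person
]
before = persons_without_id(set(data))              # validate only
after = persons_without_id(entail_subclass_types(data))  # entail, then validate
print(before, after)
```

Validating the raw data finds nothing, because carol is only known to be an Employee; after entailment she is also a Person and the missing ID is caught.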
Try it out
- Spyder: expose non-RDF data sources to SPARQL queries
  http://www.revelytix.com/content/spyder
- Spinner: federated query across SPARQL services
  http://www.revelytix.com/content/spinner
- Rex: forward-chaining RIF rules engine
  http://www.revelytix.com/content/rex