Validation of Integrated Enterprise Data With RIF
David Schaengold
Revelytix, Inc.
October 21, 2011
The Integrated Data Needs Validation
- Even if the ultimate data sources are themselves validated
- Even if transmission and manipulation of the data are perfect
Example: "No person can have more than one ID #"
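The point of the example can be sketched in a few lines of Python: two sources that each pass the "one ID per person" check can still violate it once merged. This is only an illustration, not the talk's implementation; the property URI `HAS_ID`, the `ex:` identifiers, and the sample triples are all hypothetical.

```python
from collections import defaultdict

# Hypothetical property URI; the slides do not name the actual ontology terms.
HAS_ID = "http://example.org/ontology#hasId"

def find_multiple_id_violations(triples):
    """Return {person: sorted IDs} for every subject with more than one distinct ID."""
    ids_by_person = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == HAS_ID:
            ids_by_person[subject].add(obj)
    return {p: sorted(ids) for p, ids in ids_by_person.items() if len(ids) > 1}

# Each source is consistent on its own (made-up sample data).
source_a = [("ex:alice", HAS_ID, "1001"), ("ex:bob", HAS_ID, "1002")]
source_b = [("ex:alice", HAS_ID, "A-77"), ("ex:bob", HAS_ID, "1002")]

# The violation only appears in the integrated view.
violations = find_multiple_id_violations(source_a + source_b)
print(violations)
```

Run separately, neither source reports a violation; run on the merged triples, `ex:alice` is flagged with two IDs.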
Why not use just OWL or SPARQL?
- More complex examples can't be easily modeled in OWL
- It's helpful to have a single list of rules
- It's helpful to have a rule syntax
Example from financial services: "The ninth digit of a CUSIP must be equal to the sum mod 10 of every other digit of the first eight digits of the CUSIP"
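To show why a rule like this is awkward to model in OWL, here is a rough Python sketch of the rule exactly as quoted on the slide. It rests on two assumptions that the slide does not settle: "every other digit" is read as positions 1, 3, 5 and 7 (1-based), and the identifier is purely numeric. The official CUSIP check digit uses a different weighted scheme and allows letters, so this checks the slide's simplified wording only.

```python
def satisfies_slide_cusip_rule(cusip):
    """Check the simplified rule quoted on the slide (not the official algorithm)."""
    # Assumption: only 9-character, all-numeric identifiers are considered.
    if len(cusip) != 9 or any(ch not in "0123456789" for ch in cusip):
        return False
    digits = [int(ch) for ch in cusip]
    # Assumption: "every other digit" means positions 1, 3, 5, 7 (1-based).
    expected_check_digit = sum(digits[0:8:2]) % 10
    return digits[8] == expected_check_digit

print(satisfies_slide_cusip_rule("123456786"))  # 1 + 3 + 5 + 7 = 16, so the check digit is 6
print(satisfies_slide_cusip_rule("123456780"))
```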
Why not use SWRL or SPIN?
- RIF is a W3C standard
- RIF has an abstract syntax designed to be interchangeable
- RIF is implementation-agnostic (though so is SWRL)
- RIF has a more expressive set of built-in predicates
- RIF has production rules
- RIF has a framework for extensions
- RIF can work with RDF, but isn't chained to it
How do we use RIF for validation?
- The data is exposed in a single format (RDF)
- The data is exposed as a single schema (a "domain ontology")
- All the constants in our rules are URIs from the domain ontology (and mapping ontologies, if you want provenance rules as well)
- Triples are represented as frames
- Rule consequents are always rif:error()
Example
  Forall ?p1 ?p2 ?x ?y (
    rif:error() :- And(
      ?p1[owl:propertyDisjointWith->?p2]
      ?x[?p1->?y]
      ?x[?p2->?y] ) )
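For readers less familiar with RIF frames, the rule above can be read as a join over triples. The following Python sketch mimics that join on an in-memory set of (subject, predicate, object) tuples; it is not how a RIF engine evaluates the rule, and the `ex:` property and resource names are made up for illustration.

```python
OWL_PROPERTY_DISJOINT_WITH = "http://www.w3.org/2002/07/owl#propertyDisjointWith"

def disjoint_property_violations(triples):
    """Find (x, p1, p2, y) where p1 and p2 are declared disjoint but both link x to y."""
    triple_set = set(triples)
    # Matches the frame ?p1[owl:propertyDisjointWith->?p2]
    disjoint_pairs = [(s, o) for s, p, o in triple_set if p == OWL_PROPERTY_DISJOINT_WITH]
    violations = []
    for p1, p2 in disjoint_pairs:
        for x, p, y in triple_set:
            # Matches ?x[?p1->?y] together with ?x[?p2->?y]
            if p == p1 and (x, p2, y) in triple_set:
                violations.append((x, p1, p2, y))
    return sorted(violations)

# Hypothetical sample data.
data = [
    ("ex:parentOf", OWL_PROPERTY_DISJOINT_WITH, "ex:childOf"),
    ("ex:a", "ex:parentOf", "ex:b"),
    ("ex:a", "ex:childOf", "ex:b"),   # conflicts with the line above
    ("ex:a", "ex:parentOf", "ex:c"),  # no conflict
]
print(disjoint_property_violations(data))
```

Each returned tuple corresponds to one binding of ?x, ?p1, ?p2, ?y for which the rule body holds, that is, one case in which rif:error() would be derived.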
Problem: RIF-BLD doesn't have NOT()
Validation rules have the general form:
  rif:error() :- And( X NOT(Y) )
Sometimes you can get the negation using built-ins, but sometimes you need to negate a graph
Solution: Add Not(), but use only rif:error() in consequents to keep things nice and monotonic
In principle, any zero-argument predicate would work
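A rule of the form "X and Not(Y)" can be pictured as a set difference over the data. The Python sketch below uses a hypothetical rule, "every person must have an ID", with made-up URIs for the class and property. Note that the errors are returned separately and never written back into the triples being checked, which is the spirit of restricting consequents to rif:error().

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical ontology terms, used only for this illustration.
PERSON = "http://example.org/ontology#Person"
HAS_ID = "http://example.org/ontology#hasId"

def persons_without_id(triples):
    """Subjects matching X (typed as Person) for which Y (any hasId triple) is absent."""
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == PERSON}   # X
    with_id = {s for s, p, o in triples if p == HAS_ID}                     # Y
    return sorted(persons - with_id)                                        # X and Not(Y)

# Made-up sample data.
data = [
    ("ex:alice", RDF_TYPE, PERSON),
    ("ex:alice", HAS_ID, "1001"),
    ("ex:carol", RDF_TYPE, PERSON),
]
errors = persons_without_id(data)  # kept apart from `data`; nothing is added to the input
print(errors)
```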
How do we express violations?
- Our rules engine writes all its output to a triple store
- Violation output looks like this:
    [ a :Violation ;
      :violatesRule "rule label here"^^xsd:string ;
      :reproducedBy "sparql here"^^xsd:string ]
- Very important to keep output segregated from input
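As a rough illustration of the output shape above, the following Python sketch renders one violation as a Turtle blank node. It is not the rules engine's actual code: the function names are invented here, the `:` and `xsd:` prefixes are assumed to be declared elsewhere in the output document, and a terminating " ." is added so the node forms a complete Turtle statement.

```python
def escape_turtle_string(value):
    """Escape characters that cannot appear raw in a short Turtle string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r")
                 .replace("\t", "\\t"))

def violation_to_turtle(rule_label, sparql):
    """Render one violation; assumes ':' and 'xsd:' prefixes are declared elsewhere."""
    return (
        "[ a :Violation ;\n"
        f'  :violatesRule "{escape_turtle_string(rule_label)}"^^xsd:string ;\n'
        f'  :reproducedBy "{escape_turtle_string(sparql)}"^^xsd:string ] .'
    )

# Violations are collected in their own list, separate from the input triples.
violation_output = [violation_to_turtle("one-id-per-person", 'ASK { ?s ?p "x" }')]
print(violation_output[0])
```

Storing the reproducing SPARQL query alongside the rule label lets someone re-run the query against the integrated data to see the offending triples.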
Entailments are a bonus
- OWL 2 RL can be exhaustively expressed as RIF
- There may be domain-specific entailments
- We entail before we validate
Example:
Try it out
- Spyder: expose non-RDF data sources to SPARQL queries
  http://www.revelytix.com/content/spyder
- Spinner: federated query across SPARQL services
  http://www.revelytix.com/content/spinner
- Rex: forward-chaining RIF rules engine
  http://www.revelytix.com/content/rex