Validation of Integrated Enterprise Data With RIF

David Schaengold
Revelytix, Inc.
October 21, 2011

Background: enterprise data integration

The Integrated Data Needs Validation
- Even if ultimate data sources are themselves validated
- Even if transmission and manipulation of data is perfect
Example: "No person can have more than one ID #"
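To make the point concrete, here is a minimal Python sketch (not Rex or any Revelytix API) that checks the "one ID per person" rule over integrated triples. The property name `:hasID`, the function name, and the sample data are hypothetical. Each source passes the check on its own; only the merged data violates it.

```python
from collections import defaultdict

# A triple is (subject, predicate, object); all names below are illustrative.
Triple = tuple[str, str, str]


def subjects_with_multiple_ids(
    triples: list[Triple], id_property: str = ":hasID"
) -> dict[str, set[str]]:
    """Return each subject that has more than one distinct value for id_property."""
    ids: dict[str, set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == id_property:
            ids[subject].add(obj)
    return {s: values for s, values in ids.items() if len(values) > 1}


if __name__ == "__main__":
    # Each (hypothetical) source satisfies the rule on its own...
    source_a = [(":alice", ":hasID", "1001")]
    source_b = [(":alice", ":hasID", "2002"), (":bob", ":hasID", "3003")]
    assert not subjects_with_multiple_ids(source_a)
    assert not subjects_with_multiple_ids(source_b)
    # ...but the integrated data does not.
    print(subjects_with_multiple_ids(source_a + source_b))
```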

Why not just use OWL or SPARQL?
- More complex examples can't easily be modeled in OWL
- It's helpful to have a single list of rules
- It's helpful to have a rule syntax
Example from financial services: "The ninth digit of a CUSIP must be equal to the sum mod 10 of every other digit of the first eight digits of the CUSIP"
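The check-digit rule quoted above is stated informally. As a sketch of the arithmetic such a rule has to capture, the Python below implements the standard CUSIP check digit (the modulus-10 "double-add-double" scheme, in which every second character is doubled and the decimal digits of each value are summed). The function names are illustrative, not part of any rules engine.

```python
DIGITS = "0123456789"
SPECIAL = {"*": 36, "@": 37, "#": 38}


def cusip_check_digit(base: str) -> int:
    """Check digit for the first eight characters of a CUSIP (modulus 10, double-add-double).

    Digits keep their value, letters A-Z map to 10-35 and '*', '@', '#' to 36-38.
    Characters in even positions (2, 4, 6, 8) are doubled, and the decimal
    digits of every resulting value are added together.
    """
    if len(base) != 8:
        raise ValueError("expected the first eight characters of a CUSIP")
    total = 0
    for position, ch in enumerate(base, start=1):
        if ch in DIGITS:
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch in SPECIAL:
            value = SPECIAL[ch]
        else:
            raise ValueError(f"invalid CUSIP character: {ch!r}")
        if position % 2 == 0:
            value *= 2
        total += value // 10 + value % 10
    return (10 - total % 10) % 10


def is_valid_cusip(cusip: str) -> bool:
    """The validation rule: the ninth character must equal the computed check digit."""
    if len(cusip) != 9 or cusip[8] not in DIGITS:
        return False
    try:
        return int(cusip[8]) == cusip_check_digit(cusip[:8])
    except ValueError:
        return False
```

For example, `is_valid_cusip("037833100")` returns `True`, while changing the final digit to `1` makes it return `False`.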

Why not use SWRL or SPIN?
- RIF is a W3C standard
- RIF has an abstract syntax designed to be interchangeable
- RIF is implementation-agnostic (though so is SWRL)
- RIF has a more expressive set of built-in predicates
- RIF has production rules
- RIF has a framework for extensions
- RIF can work with RDF, but isn't chained to it

How do we use RIF for validation?
- The data is exposed in a single format (RDF)
- The data is exposed as a single schema (a "domain ontology")
- All the constants in our rules are URIs from the domain ontology (and from mapping ontologies, if you want provenance rules as well)
- Triples are represented as frames
- Rule consequents are always rif:error()
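The "triples as frames" convention can be illustrated with a small Python helper that renders triples in RIF presentation syntax, merging triples that share a subject into one frame of the form `s[p1->o1 p2->o2]`. It assumes terms are already written as CURIEs or RIF literals; the helper and the `ex:` names are hypothetical, not part of any RIF tool.

```python
def triples_to_frames(triples: list[tuple[str, str, str]]) -> list[str]:
    """Render triples as RIF presentation-syntax frame facts.

    Triples sharing a subject are merged into one frame, s[p1->o1 p2->o2].
    Terms are assumed to be already written as CURIEs or RIF literals.
    """
    slots: dict[str, list[str]] = {}
    for subject, predicate, obj in triples:
        slots.setdefault(subject, []).append(f"{predicate}->{obj}")
    # dicts preserve insertion order, so frames come out in first-seen order
    return [f"{subject}[{' '.join(pairs)}]" for subject, pairs in slots.items()]
```

For instance, `triples_to_frames([("ex:alice", "ex:hasID", '"1001"^^xs:string'), ("ex:alice", "ex:worksFor", "ex:acme")])` returns `['ex:alice[ex:hasID->"1001"^^xs:string ex:worksFor->ex:acme]']`.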

Example
Forall ?p1 ?p2 ?x ?y (
  rif:error() :- And( ?p1[owl:propertyDisjointWith->?p2]
                      ?x[?p1->?y]
                      ?x[?p2->?y] )
)
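As a rough illustration of what this rule computes, the Python sketch below evaluates its body over a set of triples and returns every binding of `?p1 ?p2 ?x ?y` that satisfies it; `rif:error()` would be derived whenever the result is non-empty. This is a hand-written matcher, not how a forward-chaining engine such as Rex is implemented, and any property names used with it are hypothetical.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


def disjoint_property_violations(triples: Iterable[Triple]) -> set[tuple[str, str, str, str]]:
    """Evaluate the body of

        rif:error() :- And( ?p1[owl:propertyDisjointWith->?p2] ?x[?p1->?y] ?x[?p2->?y] )

    and return every (?p1, ?p2, ?x, ?y) binding that satisfies it.
    rif:error() is derived exactly when the result is non-empty.
    """
    facts = set(triples)
    disjoint_pairs = [(s, o) for s, p, o in facts if p == "owl:propertyDisjointWith"]
    bindings = set()
    for p1, p2 in disjoint_pairs:
        for x, p, y in facts:
            # this triple matches ?x[?p1->?y]; the same ?x and ?y must also match ?x[?p2->?y]
            if p == p1 and (x, p2, y) in facts:
                bindings.add((p1, p2, x, y))
    return bindings
```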

Problem: RIF-BLD doesn't have Not()
- Validation rules have the general form: rif:error() :- And( X Not(Y) )
- Sometimes you can get the negation using built-ins, but sometimes you need to negate a graph pattern
Solution: add Not(), but use only rif:error() in consequents to keep things nice and monotonic
- Because no rule concludes anything but rif:error(), no rule can change the facts that another rule's Not() tests
- In principle, any zero-argument predicate would work
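A case where no built-in helps is "every person must have an ID": the condition is the absence of a matching graph pattern. The Python sketch below evaluates a hypothetical rule of that form, reading Not() as negation as failure over the integrated data (a closed-world reading). The class `:Person`, the property `:hasID`, and the function name are assumptions for illustration.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


def missing_property_violations(
    triples: Iterable[Triple], cls: str, prop: str
) -> set[str]:
    """Evaluate, reading Not() as negation as failure over the integrated data:

        rif:error() :- And( ?x[rdf:type->cls] Not( Exists ?v ( ?x[prop->?v] ) ) )

    and return the bindings of ?x that make the body true.
    """
    facts = set(triples)
    instances = {s for s, p, o in facts if p == "rdf:type" and o == cls}
    # Subjects for which the negated pattern ?x[prop->?v] has at least one match
    has_value = {s for s, p, _ in facts if p == prop}
    return instances - has_value
```

Since the only conclusion is `rif:error()`, evaluating this rule never adds a triple that could change what the `Not()` sees.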

How do we express violations?
- Our rules engine writes all its output to a triple store
- Violation output looks like this:
  [ a :Violation ;
    :violatesRule "rule label here"^^xsd:string ;
    :reproducedBy "SPARQL query here"^^xsd:string ] .
- It is very important to keep the output segregated from the input
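A minimal Python sketch of producing output in this shape: it serializes (rule label, SPARQL query) pairs as Turtle blank nodes, escaping the literals as Turtle requires. The namespace `http://example.org/validation#` and the function names are assumptions; the idea is that the resulting text is loaded into a graph or store kept separate from the input data.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslash, quote and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def violations_to_turtle(
    violations: list[tuple[str, str]],
    namespace: str = "http://example.org/validation#",  # hypothetical namespace
) -> str:
    """Serialize (rule label, reproducing SPARQL query) pairs as :Violation blank nodes."""
    lines = [f"@prefix : <{namespace}> .", f"@prefix xsd: <{XSD}> .", ""]
    for label, query in violations:
        lines.append(
            "[ a :Violation ;\n"
            f"  :violatesRule {turtle_string(label)}^^xsd:string ;\n"
            f"  :reproducedBy {turtle_string(query)}^^xsd:string ] ."
        )
    return "\n".join(lines)
```

Storing the reproducing query alongside the rule label lets someone re-run it against the input data to see exactly which records triggered the violation.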

Entailments are a bonus
- OWL 2 RL can be exhaustively expressed as RIF
- There may be domain-specific entailments
- We entail before we validate
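A minimal Python sketch of entailing before validating, using only two OWL 2 RL rules (prp-spo1 for rdfs:subPropertyOf and cax-sco for rdfs:subClassOf) and naive fixpoint iteration rather than a real RIF engine. It shows why the order matters: a duplicate-ID violation hidden behind a hypothetical sub-property `:hasEmployeeID` only appears once the entailed `:hasID` triples exist. All names other than the RDF/RDFS vocabulary are assumptions.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


def entail(triples: Iterable[Triple]) -> set[Triple]:
    """Naive forward chaining to a fixpoint over two OWL 2 RL rules:

    prp-spo1: ?p1 rdfs:subPropertyOf ?p2 . ?x ?p1 ?y       => ?x ?p2 ?y
    cax-sco:  ?c1 rdfs:subClassOf ?c2 .    ?x rdf:type ?c1 => ?x rdf:type ?c2
    """
    facts = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in facts if p == "rdfs:subPropertyOf"}
        sub_classes = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        derived = set()
        for x, p, y in facts:
            derived.update((x, p2, y) for p1, p2 in sub_props if p == p1)
            if p == "rdf:type":
                derived.update((x, "rdf:type", c2) for c1, c2 in sub_classes if y == c1)
        new_facts = derived - facts
        if not new_facts:
            return facts  # fixpoint: nothing new can be derived
        facts |= new_facts


def subjects_with_multiple_values(facts: Iterable[Triple], prop: str) -> set[str]:
    """Validation step: subjects with more than one distinct value for prop."""
    values: dict[str, set[str]] = {}
    for s, p, o in facts:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return {s for s, v in values.items() if len(v) > 1}
```

With `:hasEmployeeID rdfs:subPropertyOf :hasID`, `:alice :hasID "1001"` and `:alice :hasEmployeeID "2002"`, validating the raw triples flags nobody, while validating `entail(...)` of the same triples flags `:alice`.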

Try it out
- Spyder: expose non-RDF data sources to SPARQL queries (http://www.revelytix.com/content/spyder)
- Spinner: federated query across SPARQL services (http://www.revelytix.com/content/spinner)
- Rex: forward-chaining RIF rules engine (http://www.revelytix.com/content/rex)
