WP3: Provenance and Access Control
Irini Fundulaki, Giorgos Flouris
Institute of Computer Science, FORTH
1st year review, Luxembourg, December 2011
WP3: Work Plan View
• Timeline ticks (figure): 0, 6, 12, 18, 24, 30, 36, 42, 48
• Task 3.1 Provenance Management (FORTH)
  • D 3.2 Provenance management and propagation through SPARQL query and update languages
• Task 3.2 Privacy, DRM and Access Control (FORTH)
  • D 3.1 Access control specification language, reasoning and enforcement mechanisms
  • D 3.3 Access control system and privacy-aware language
• Task 3.3 Trust management (EPFL)
  • D 3.4 Trust management and inference system
Research Topics, Tasks and Partners
• Objective: manage annotations of different forms and semantics over data, related to data access
• Research Topics: Provenance, Access Control, Privacy, Digital Rights Management (DRM), Trust Management
• Partners: FORTH, EPFL, KIT
Provenance
• Wikipedia: “… the origin or source of something or the history of the ownership or location of an object”
• W3C Incubator Group: “… is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. […] Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
Provenance
• W3C Incubator Group: “With the arrival of massive amounts of Semantic Web Data […], provenance becomes an important factor in developing new Semantic Web applications.”
• Applications
  • Data Trustworthiness, Reputation and Reliability
  • Information Quality
  • Data Integration and Exchange
  • Reproducibility
  • Argumentation (Decision Justification)
  • Access Control
  • Accountability
  • Reasoning
Types of Provenance
• I, P, O: coarse grained (workflow or dataflow provenance)
• I´, P´, O´: fine grained (data provenance)
• Coarse grained provenance is used to reproduce a digital object or repeat an experiment (complex programs)
• Fine grained provenance refers to the transport of annotations between input and output data (query languages)
Workflow Provenance: Sensor Scenario
• Input: readings of sea temperature and wind from sensors S1 and S2
• Process: complex computation to predict the height of waves
• Provenance: the complex program executed on the input data
Data Provenance: Sensor Scenario
• The sensor readings (R1) and the sensor database (R2) are stored on a DB server
• Provenance: annotations of the input tuples that contributed to the query results

R1 (sensor readings)
Sensor | Readgs | Time  | Annot.
S1     | 8B     | 00:19 | t1
S2     | 2B     | 01:50 | t2

R2 (sensor database)
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | t3
S2     | 23° 26’ 21”N | t4

Join of R1 and R2
Sensor | Latitude     | Readgs | Time  | Annot.
S1     | 23° 26’ 21”N | 8B     | 00:19 | {t1,t3}
S2     | 23° 26’ 21”N | 2B     | 01:50 | {t2,t4}
Data Provenance Models
• Annotation Models: provenance computation is coupled with a particular application and a particular assignment of the provenance of source data
• The annotation of a join tuple is computed using the operator x: 0 x 0 = 0, 1 x 0 = 0, 1 x 1 = 1
• When the annotation of an input tuple changes, we must re-execute the query to obtain the annotation of the result tuples

R1
Sensor | Readgs | Time  | Annot.
S1     | 8B     | 00:19 | 1
S2     | 2B     | 01:50 | 0

R2
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | 1
S2     | 23° 26’ 21”N | 1

Join of R1 and R2
Sensor | Latitude     | Readgs | Time  | Annot.
S1     | 23° 26’ 21”N | 8B     | 00:19 | 1
S2     | 23° 26’ 21”N | 2B     | 01:50 | 0
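The annotation model above can be sketched in a few lines. This is a minimal illustration, assuming 0/1 annotations and plain multiplication for the operator x; the relation contents mirror the sensor tables, while the function name `annotated_join` and the updated relation are hypothetical.

```python
# Relations from the sensor scenario; the last field of each tuple is its 0/1 annotation.
R1 = [("S1", "8B", "00:19", 1), ("S2", "2B", "01:50", 0)]
R2 = [("S1", "23° 26' 21\"N", 1), ("S2", "23° 26' 21\"N", 1)]


def annotated_join(r1, r2):
    """Join on the sensor column; each result annotation is a1 x a2 (here: product)."""
    out = []
    for sensor1, reading, time, a1 in r1:
        for sensor2, latitude, a2 in r2:
            if sensor1 == sensor2:
                out.append((sensor1, latitude, reading, time, a1 * a2))
    return out


result = annotated_join(R1, R2)

# Hypothetical update: the annotation of the S2 reading changes from 0 to 1.
# In an annotation model the whole query has to be run again.
R1_updated = [("S1", "8B", "00:19", 1), ("S2", "2B", "01:50", 1)]
result_updated = annotated_join(R1_updated, R2)
```

The second call illustrates the drawback named on the slide: the concrete values are baked into the result, so a changed input annotation forces a full re-execution.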
Data Provenance Models
• Abstract Models: provenance annotations (referred to as tokens) and operators are abstract
• The annotation of a join tuple is modeled by the “x” operator
• When the annotation of an input tuple changes, the annotation of the result tuple is re-computed by evaluating the annotation expression only

R1
Sensor | Readgs | Time  | Annot.
S1     | 8B     | 00:19 | T1
S2     | 2B     | 01:50 | T2

R2
Sensor | Latitude     | Annot.
S1     | 23° 26’ 21”N | T3
S2     | 23° 26’ 21”N | T4

Join of R1 and R2
Sensor | Latitude     | Readgs | Time  | Annot.
S1     | 23° 26’ 21”N | 8B     | 00:19 | T1 x T3
S2     | 23° 26’ 21”N | 2B     | 01:50 | T2 x T4
Data Provenance Models
• Abstract Models: abstract tokens and operators are assigned concrete values only when the concrete value of an annotation must be computed
• Data Quality Application:
  • abstract tokens T1, T2, T3, T4 take values 1 and 0
  • abstract operator “x” is replaced by logical AND

R1
Sensor | Readgs | Time  | Annot. | Value
S1     | 8B     | 00:19 | T1     | 1
S2     | 2B     | 01:50 | T2     | 0

R2
Sensor | Latitude     | Annot. | Value
S1     | 23° 26’ 21”N | T3     | 1
S2     | 23° 26’ 21”N | T4     | 1

Join of R1 and R2
Sensor | Latitude     | Readgs | Time  | Annot.  | Value
S1     | 23° 26’ 21”N | 8B     | 00:19 | T1 x T3 | 1
S2     | 23° 26’ 21”N | 2B     | 01:50 | T2 x T4 | 0
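A small sketch of the abstract model may help here. It keeps the join annotations as symbolic expressions over tokens and assigns concrete values only at evaluation time. The class names `Token` and `Times` and the function `evaluate` are illustrative assumptions, not part of the presented system.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Token:
    """An abstract provenance token such as T1."""
    name: str


@dataclass(frozen=True)
class Times:
    """The abstract 'x' operator applied to two sub-expressions."""
    left: "Expr"
    right: "Expr"


Expr = Union[Token, Times]


def evaluate(expr, values, times):
    """Replace tokens by concrete values and 'x' by the concrete operator `times`."""
    if isinstance(expr, Token):
        return values[expr.name]
    return times(evaluate(expr.left, values, times),
                 evaluate(expr.right, values, times))


T1, T2, T3, T4 = (Token(f"T{i}") for i in range(1, 5))

# Stored once with the query result: one abstract expression per join tuple.
join_annotations = {"S1": Times(T1, T3), "S2": Times(T2, T4)}

# Data quality application: tokens take values 1 and 0, "x" becomes logical AND.
quality = {"T1": 1, "T2": 0, "T3": 1, "T4": 1}
concrete = {s: evaluate(e, quality, lambda a, b: a & b)
            for s, e in join_annotations.items()}

# If T2 changes, only the stored expression is re-evaluated; the query is not re-run.
quality_updated = dict(quality, T2=1)
concrete_updated = {s: evaluate(e, quality_updated, lambda a, b: a & b)
                    for s, e in join_annotations.items()}
```

A different application could reuse `join_annotations` unchanged and pass, for example, numeric trust scores with `min` as the concrete operator.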
Abstract Data Provenance Models
• Benefits:
  • when the provenance of the input is updated, only the provenance expressions of the affected tuples need to be re-evaluated
  • different applications can assign different concrete values to abstract tokens and operators, for the same data
• Challenges: trade-off between provenance storage and computation efficiency
  • storage of large provenance expressions
  • efficient computation of provenance for dynamic data
Data Provenance
• RDFS reasoning
  • Given a set of RDF triples whose explicit provenance is known, and RDFS reasoning rules, what is the provenance of the implicit RDF triples?
• SPARQL
  • Given a set of RDF triples whose explicit provenance is known, and a SPARQL query, what is the provenance of the query result?
RDFS Reasoning
Given a set of RDF triples (RDF Graph) whose explicit provenance is known, and RDFS entailment rules, what is the provenance of the implicit RDF triples?
• (A1, sc, A2), (A2, sc, A3) ⟹ (A1, sc, A3): provenance ?
• (&r, type, A1), (A1, sc, A2) ⟹ (&r, type, A2): provenance ?
• Figure: SSN Ontology fragment with the classes System, Device, Sensor and Sensing Device and the instance &s1; explicit triples are colored C1 to C4 and the colors of the implied triples are marked “?”. Edge legend: type, sc (subclassOf).
RDFS Reasoning
• Colors capture the provenance of explicit and implicit data and schema RDF triples
• Quadruples represent provenance information
• Provenance model: commutative semigroup structure (C, +)
  • C: set of colors
  • binary operation “+” to compose the colors of the input triples
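The color model can be illustrated with a short sketch. It assumes that a color is a set of source names and that “+” is set union, which is one commutative and associative choice among many; the function names, the rule selection (subclass transitivity and type inheritance only) and the sample colors are hypothetical.

```python
def compose(c1, c2):
    """Hypothetical '+' on colors: set union (associative and commutative)."""
    return c1 | c2


def rdfs_closure(quads):
    """Apply sc-transitivity and type inheritance until no new quadruple appears.

    Each quadruple is (subject, predicate, object, color). An implied
    quadruple gets the composition of the colors of its two premises.
    """
    quads = set(quads)
    while True:
        new = set()
        for (s1, p1, o1, c1) in quads:
            for (s2, p2, o2, c2) in quads:
                if o1 == s2 and p2 == "sc" and p1 in ("sc", "type"):
                    new.add((s1, p1, o2, compose(c1, c2)))
        if new <= quads:
            return quads
        quads |= new


# Hypothetical coloring of a small fragment of the sensor ontology.
explicit = {
    ("SensingDevice", "sc", "Device", frozenset({"C4"})),
    ("Device", "sc", "System", frozenset({"C1"})),
    ("&s1", "type", "SensingDevice", frozenset({"C3"})),
}
closure = rdfs_closure(explicit)
```

Because union is associative and commutative, the two derivations of (&s1, type, System) yield the same color, so the order in which the rules fire does not matter.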
RDFS Reasoning
• Pediaditis P., Flouris G., Fundulaki I., Christophides V. On Explicit Provenance Management in RDF/S Graphs. In Theory and Practice of Provenance (TaPP-2009).
• Flouris G., Fundulaki I., Pediaditis P., Theoharis Y., Christophides V. Coloring RDF Triples to Capture Provenance. In ISWC 2009.
Provenance for SPARQL
Given a set of RDF triples (RDF Graph) whose explicit provenance is known, and a SPARQL query, what is the provenance of the result?
• We showed that existing provenance models for positive relational algebra can capture the provenance of SPARQL (without OPTIONAL)
• We follow the approach by Karvounarakis et al. in Provenance Semirings (PODS 2007) to develop a model for full SPARQL
  • the model records the input tuples and the operators used to compute the query results
Provenance Model for SPARQL+
• K: set of provenance tokens
• a binary operator for SPARQL join
• a binary operator for SPARQL union
• SPARQL Query (return the sensor and its latitude): select ?s, ?l where { ?s type Sensor . ?s latitude ?l }

prov | subject | predicate | object
t1   | S1      | type      | Sensor
t2   | S1      | Latitude  | 23° 26’ 21”N
t3   | S2      | type      | Sensor
t4   | S2      | Latitude  | 23° 26’ 21”N
t5   | S1      | Readgs    | &r1
t6   | S2      | Readgs    | &r2
t7   | &r1     | value     | 8B
t8   | &r1     | time      | 00:19
t9   | &r2     | value     | 2B
t10  | &r2     | time      | 01:50
Provenance Model for SPARQL+
Q = ?s type Sensor . ?s Latitude ?l
The evaluation of a triple pattern over the triple set T (t1 to t10) is a set of mappings (?variable, value), each annotated with the provenance token of the matching triple.

?s type Sensor
?s | prov
S1 | t1
S2 | t3

?s Latitude ?l
?s | ?l           | prov
S1 | 23° 26’ 21”N | t2
S2 | 23° 26’ 21”N | t4
Provenance Model for SPARQL+
Q = ?s type Sensor . ?s Latitude ?l
The result of a join between two triple patterns contains all mappings that have the same value for their common variable(s). The provenance of each result mapping combines the tokens of the joined mappings with the join operator.

Join of (?s type Sensor) and (?s Latitude ?l) over the triples t1 to t10
?s | ?l           | prov
S1 | 23° 26’ 21”N | join of t1 and t2
S2 | 23° 26’ 21”N | join of t3 and t4
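The two evaluation steps above (triple pattern matching, then join on common variables) can be sketched as follows. This is an illustration under stated assumptions: provenance is kept as a symbolic tuple `("join", p1, p2)` because the operator symbol is not fixed here, the helper names `match` and `join` are hypothetical, and the predicate is spelled `Latitude` as in the triple table.

```python
TRIPLES = [
    ("S1", "type", "Sensor", "t1"),
    ("S1", "Latitude", "23° 26' 21\"N", "t2"),
    ("S2", "type", "Sensor", "t3"),
    ("S2", "Latitude", "23° 26' 21\"N", "t4"),
    ("S1", "Readgs", "&r1", "t5"),
    ("S2", "Readgs", "&r2", "t6"),
]


def match(pattern, triples):
    """Evaluate one triple pattern; return a list of (mapping, provenance token)."""
    results = []
    for *triple, prov in triples:
        mapping = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value everywhere.
                if mapping.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append((mapping, prov))
    return results


def join(left, right):
    """Keep pairs of mappings that agree on their common variables."""
    out = []
    for m1, p1 in left:
        for m2, p2 in right:
            if all(m1[v] == m2[v] for v in m1.keys() & m2.keys()):
                out.append(({**m1, **m2}, ("join", p1, p2)))
    return out


sensors = match(("?s", "type", "Sensor"), TRIPLES)
latitudes = match(("?s", "Latitude", "?l"), TRIPLES)
answers = join(sensors, latitudes)
```

Keeping the provenance symbolic means the same `answers` can later be interpreted by any concrete choice of join operator.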
Provenance for SPARQL
• Theoharis Y., Fundulaki I., Karvounarakis G., Christophides V. On Provenance of Queries on Linked Web Data. In IEEE Internet Computing: Provenance in Web Applications, 2011.
Access Control
• Refers to the ability to permit or deny the use of a particular resource by a particular entity
• Crucial for sensitive content, since it ensures the selective exposure of information to different classes of users
RDF Access Control
• In general, an access control model specifies
  • the access annotations
  • the conflict resolution policy used to resolve ambiguous access annotations
  • the default semantics used to annotate data that are not in the scope of any authorization
• Access Authorizations specify (by a query) the access annotations for data
Access Control
• Access Annotations can be
  • boolean values: true/false (grant/deny access permission)
  • confidentiality levels: low, medium, high
• Conflict Resolution Policy depends on the type of access annotations
  • boolean values: a deny annotation overrides a grant annotation
  • confidentiality levels: high confidentiality overrides medium, medium overrides low
• Default Semantics depend on the type of access annotations
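The two conflict resolution policies listed above can be written down directly. The sketch below is illustrative: the function names are hypothetical, and the default values (deny, respectively `high`) are assumptions, since the slide only says that default semantics depend on the annotation type.

```python
# Hypothetical numeric ranking of the confidentiality levels.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def resolve_boolean(annotations, default=False):
    """Deny overrides grant: access is granted only if every annotation grants it.

    `default` is used when no authorization covers the data (assumed: deny).
    """
    annotations = list(annotations)
    if not annotations:
        return default
    return all(annotations)


def resolve_level(annotations, default="high"):
    """High overrides medium, medium overrides low: keep the strictest level.

    `default` is used when no authorization covers the data (assumed: high).
    """
    annotations = list(annotations)
    if not annotations:
        return default
    return max(annotations, key=LEVELS.__getitem__)
```

For example, `resolve_boolean([True, False])` denies access, and `resolve_level(["low", "medium"])` returns `"medium"`.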
Fine-grained Access Control Framework for RDF Data
• We encode access annotations of RDF triples using quadruples
• We propose an abstract access control model, defined by a set of abstract tokens and abstract operators, to model the computation of access annotations of RDF triples, considering
  • RDFS inference
  • the propagation of access annotations
  • conflicting and missing access annotations
Abstract Tokens
• L: set of abstract access control tokens
• a default access token in L
  • assigned to triples that do not have an explicitly assigned access token
Abstract Operators
• Entailment Operator ⊙: computes the access annotations of implied quadruples
• Propagation Operator: models the propagation of access annotations
• Conflict Resolution Operator: resolves ambiguous access annotations
Entailment Operator ⊙
• binary operator to model the computation of the annotation of an implicit RDF quadruple for the subclass, subproperty and type hierarchies in an RDF graph
• Example: (A1, sc, A2, l1), (A2, sc, A3, l2) ⟹ (A1, sc, A3, l1 ⊙ l2)
• Properties:
  • Associativity: (l1 ⊙ l2) ⊙ l4 = l1 ⊙ (l2 ⊙ l4)
  • Commutativity: l1 ⊙ l4 = l4 ⊙ l1
• The order of the application of inference rules is not important
Entailment Operator ⊙
• (A1, sc, A2, l1), (A2, sc, A3, l2) ⟹ (A1, sc, A3, l1 ⊙ l2)
• (&r, type, A2, l1), (A2, sc, A3, l2) ⟹ (&r, type, A3, l1 ⊙ l2)
• Figure: hierarchy with rdfs:Class, System, Device, Sensor, Sensing Device and the instance &s1; explicit triples carry the labels l0 to l4, and implied triples carry entailment expressions such as l4 ⊙ l1. Edge legend: type, sc (subclassOf).
Propagation Operator
• unary operator to model the propagation of access annotations along the subclass/subproperty and type hierarchies in an RDF Graph
  • a class inherits the annotation of its superclass, an instance of a class inherits the annotation of its class, etc.
• Example, writing prop(l) for the operator applied to an annotation l: (&r1, type, A1, l2), (A1, type, class, l1) ⟹ (&r1, type, A1, prop(l1))
• Properties:
  • Idempotence: prop(prop(l0)) = prop(l0)
• We do not care how many times an annotation is propagated
Propagation Operator
• Writing prop(l) for the propagated annotation: (&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⟹ (&r, type, A1, prop(l2))
• Figure: the hierarchy with rdfs:Class (l0), System, Device, Sensor, Sensing Device and the instance &s1, shown before and after propagation; explicit triples carry the labels l0 to l4. Edge legend: type, sc (subclassOf).
Conflict Resolution Operator
• binary operator to resolve ambiguous access labels, written here as res(·, ·)
• Example: (A1, sc, A2, l1), (A1, sc, A2, l2) ⟹ (A1, sc, A2, res(l1, l2))
• Properties:
  • Associativity: res(res(l0, l1), l2) = res(l0, res(l1, l2))
  • Commutativity: res(l0, l1) = res(l1, l0)
  • Idempotence: res(l1, l1) = l1
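Whether a candidate concrete operator satisfies the algebraic properties required of the abstract operators can be checked by brute force on a small domain. The sketch below is only an illustration: the helper names, the numeric encoding of confidentiality levels and the choice of candidates are assumptions.

```python
from itertools import product


def is_associative(op, domain):
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(domain, repeat=3))


def is_commutative(op, domain):
    return all(op(a, b) == op(b, a) for a, b in product(domain, repeat=2))


def is_idempotent(op, domain):
    return all(op(a, a) == a for a in domain)


BOOLEANS = [False, True]
LEVELS = [0, 1, 2]  # hypothetical encoding of low, medium, high

candidates = [
    ("disjunction", lambda a, b: a or b, BOOLEANS),
    ("conjunction", lambda a, b: a and b, BOOLEANS),
    ("strictest level", max, LEVELS),
    # Deliberately unsuitable candidate, included as a counterexample.
    ("implication", lambda a, b: (not a) or b, BOOLEANS),
]

# name -> (associative, commutative, idempotent)
checks = {
    name: (is_associative(op, dom), is_commutative(op, dom), is_idempotent(op, dom))
    for name, op, dom in candidates
}
```

Disjunction, conjunction and the maximum over levels pass all three checks, while implication fails each of them and so could not serve as a conflict resolution operator.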
Computing Abstract Access Control Annotations
• assign access annotations to triples of the RDF graph to obtain quadruples
• apply RDFS inference rules on quadruples to obtain the implicit annotated quadruples
• apply propagation rules on quadruples to compute their propagated annotations
• apply the conflict resolution operator to resolve ambiguities
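The four steps can be sketched as a small pipeline over symbolic labels. Everything below is an illustrative assumption rather than the presented system: an entailment expression is stored as a sorted tuple of tokens (which builds in associativity and commutativity), a propagated label as `("prop", label)`, and an unresolved conflict as a set of labels. Only two inference rules and one propagation rule are covered, and the class hierarchy is assumed to be acyclic.

```python
from collections import defaultdict


def entail(quads):
    """Step 2: sc-transitivity and type inheritance with the entailment operator."""
    quads = set(quads)
    while True:
        new = {
            (s1, p1, o2, tuple(sorted(l1 + l2)))
            for (s1, p1, o1, l1) in quads
            for (s2, p2, o2, l2) in quads
            if o1 == s2 and p2 == "sc" and p1 in ("sc", "type")
        }
        if new <= quads:
            return quads
        quads |= new


def propagate(quads):
    """Step 3: an instance triple also receives the propagated label of its class."""
    extra = {
        (s1, "type", o1, ("prop", l2))
        for (s1, p1, o1, l1) in quads
        for (s2, p2, o2, l2) in quads
        if p1 == "type" and p2 == "type" and o2 == "rdfs:Class" and o1 == s2
    }
    return set(quads) | extra


def resolve(quads):
    """Step 4: collect all labels of the same triple for conflict resolution."""
    grouped = defaultdict(set)
    for s, p, o, label in quads:
        grouped[(s, p, o)].add(label)
    return {triple: frozenset(labels) for triple, labels in grouped.items()}


# Step 1: hypothetical explicit quadruples (labels are one-token tuples).
quads = {
    ("&s1", "type", "Device", ("l1",)),
    ("Device", "sc", "System", ("l2",)),
    ("System", "type", "rdfs:Class", ("l0",)),
}
annotations = resolve(propagate(entail(quads)))
```

In this example the implied triple (&s1, type, System) ends up with two candidate labels, the entailment expression over l1 and l2 and the propagated l0, which a concrete policy would then combine with its conflict resolution operator.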
Computing Abstract Access Control Annotations (example)
• Figure: hierarchy with rdfs:Class, System, Device, Sensor and Sensing Device; explicit triples carry the labels l0, l1, l2, l3 and l5
• The annotation computed for (SensingDevice, type, rdfs:Class) is built from the entailment expressions (l1 ⊙ l2) ⊙ l0 and (l5 ⊙ l3) ⊙ l0 and from the propagated label l0
Concrete Policies
• A concrete policy assigns concrete values to the abstract tokens and operators
• Example
  • Boolean values assigned to abstract tokens
    • false: deny access
    • true: grant access
  • Conjunction assigned to the entailment operator
    • an implied triple is accessible iff all its implying triples have been granted access
  • Disjunction assigned to the conflict resolution operator
    • a grant annotation overrides a deny annotation
  • Identity assigned to the propagation operator
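Evaluating an abstract expression under a concrete policy can be sketched as a small interpreter. The expression encoding (nested tuples tagged `"ent"`, `"prop"`, `"res"`), the `Policy` class, the sample expression and the token values are all hypothetical; only the operator assignments of the first policy follow the example above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Policy:
    tokens: Dict[str, Any]                # concrete value of each abstract token
    entail: Callable[[Any, Any], Any]     # concrete entailment operator
    propagate: Callable[[Any], Any]       # concrete propagation operator
    resolve: Callable[[Any, Any], Any]    # concrete conflict resolution operator


def evaluate(expr, policy):
    """Recursively replace abstract tokens and operators by concrete ones."""
    if isinstance(expr, str):
        return policy.tokens[expr]
    op, *args = expr
    values = [evaluate(arg, policy) for arg in args]
    if op == "ent":
        return policy.entail(*values)
    if op == "prop":
        return policy.propagate(*values)
    if op == "res":
        return policy.resolve(*values)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical abstract annotation: an entailed label in conflict with a propagated one.
expression = ("res", ("ent", "l1", "l2"), ("prop", "l0"))

boolean_policy = Policy(
    tokens={"l0": True, "l1": True, "l2": False},  # assumed values
    entail=lambda a, b: a and b,                   # conjunction
    propagate=lambda a: a,                         # identity
    resolve=lambda a, b: a or b,                   # disjunction
)
accessible = evaluate(expression, boolean_policy)

# A second, assumed policy over confidentiality levels (0 = low, 2 = high).
level_policy = Policy(tokens={"l0": 0, "l1": 1, "l2": 2},
                      entail=max, propagate=lambda a: a, resolve=max)
level = evaluate(expression, level_policy)
```

The same abstract `expression` is evaluated twice without recomputing inference or propagation, which is the point of separating the abstract model from the concrete policy.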
Concrete Policy (example)
• Assignment of abstract tokens to values
  • l0, l1, l2: false (F)
  • l3, l5: true (T)
• Assignment of abstract operators to concrete ones
  • propagation: negation (¬)
  • entailment (⊙): conjunction
  • conflict resolution: disjunction
• The abstract annotation of (SensingDevice, type, rdfs:Class), built from (l1 ⊙ l2) ⊙ l0, (l5 ⊙ l3) ⊙ l0 and the propagated l0, is evaluated by substituting these values and operators
References Flouris G., Fundulaki I., Michou M., Papakonstantinou V., Antoniou G. Access Control for RDFS Graphs Using Abstract Models. Ongoing work.
Computing Abstract Access Control Expressions (example)
• (A1, sc, A2, l1), (A2, sc, A3, l2) ⟹ (A1, sc, A3, l1 ⊙ l2)
• (&r, type, A2, l1), (A2, sc, A3, l2) ⟹ (&r, type, A3, l1 ⊙ l2)
• (&r, type, A1, l1), (A1, type, rdfs:Class, l2) ⟹ (&r, type, A1, propagated l2)
• Figure: hierarchy with rdfs:Class (l0), System, Device, Sensor and Sensing Device, with explicit labels l0, l1, l2, l3 and l5 and the entailment expressions l5 ⊙ l3, (l1 ⊙ l2) ⊙ l0 and (l5 ⊙ l3) ⊙ l0 contributing to the annotation of (SensingDevice, type, rdfs:Class)