WP3: Data Provenance and Access Control. Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou, FORTH September 9-10, 2013, Heraklion. Presentation Outline. WP3 status and outline Research achievements D3.2 status Review comments Health use case description Demo
WP3: Data Provenance and Access Control Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou, FORTH September 9-10, 2013, Heraklion
Presentation Outline • WP3 status and outline • Research achievements • D3.2 status • Review comments • Health use case description • Demo • Next steps (on demo)
WP3: Work Plan View • [Gantt chart; timeline axis: 0, 6, 12, 18, 24, 30, 36, 42] • Task 3.1 Provenance Management (FORTH) • Task 3.2 Privacy, DRM and Access Control (FORTH, KIT) • Task 3.3 Trust Management (EPFL) • D3.1 Access control specification language, reasoning and enforcement mechanisms • D3.2 Provenance management and propagation through SPARQL query and update languages • D3.3 Access control system and privacy-aware language • D3.4 Trust management and inference system
Research So Far (Outline) • Abstract models for access control (FORTH) • Abstract models for provenance (FORTH) • Provenance for SPARQL query • Provenance for SPARQL update • Privacy (KIT) • Privacy in smart grids (not integrated) • Some integration in the demo • Problems (non-critical) – to be discussed • Trust (EPFL)
Access Control • The selective exposure of information to different users/roles • Useful for applications involving sensitive information • In the context of LOD: • Encourages publication of data that may include sensitive information • Standard approach: • Data is annotated with specific tags that determine whether it is accessible to specific users/roles
Abstract Labels • Triples associated with abstract labels • A set of abstract tokens (a1, a2, …) • Explicit triples associated with such tokens via authorizations • Abstract operators (⊙, ⊗, ⊕) • a1 ⊙ a2: the triple occurred via inference from triples with labels a1, a2 • ⊗a1: the triple occurred via propagation from a triple with label a1 • a1 ⊕ a2: the triple occurred in two different manners, one via a1, one via a2 (e.g., two different authorizations) • a1 ⊕ (a2 ⊙ ⊗(a3)): …
Determining Accessibility • Concrete policy • Associate tokens to concrete values • Associate operators to concrete operations • Determine whether the final value corresponds to an accessible triple (access function) • Example • a1 = 1, a2 = 2, a3 = 3 • ⊙ = min, ⊕ = max, ⊗ = ID function • Accessible iff result > 1 • a1 ⊕ (a2 ⊙ ⊗(a3)) evaluates to max(1, min(2, 3)) = 2 (i.e., triple is accessible)
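The evaluation of an abstract label under a concrete policy can be sketched in Python. This is a minimal sketch, not the WP3 implementation: the nested-tuple encoding of expressions, the operator names `infer`, `prop` and `alt` (standing for the inference, propagation and alternative-derivation operators), and the function names are assumptions made for illustration. The token values, operator mappings and access threshold are those of the example above.

```python
from typing import Callable, Dict, Union

# Assumed encoding: an abstract expression is either a token name (str)
# or a tuple (operator_name, operand, ...).
Expr = Union[str, tuple]


def evaluate(expr: Expr,
             tokens: Dict[str, int],
             ops: Dict[str, Callable[..., int]]) -> int:
    """Evaluate an abstract expression under a concrete policy."""
    if isinstance(expr, str):
        return tokens[expr]          # token -> concrete value
    op, *args = expr
    return ops[op](*(evaluate(a, tokens, ops) for a in args))


def accessible(value: int) -> bool:
    """Access function of the example: accessible iff result > 1."""
    return value > 1


# Concrete policy from the example.
tokens = {"a1": 1, "a2": 2, "a3": 3}
ops = {
    "infer": min,            # inference operator
    "alt": max,              # alternative derivations
    "prop": lambda v: v,     # propagation = identity function
}

# The label: a1 combined (as an alternative) with the inference of
# a2 and the propagation of a3.
label = ("alt", "a1", ("infer", "a2", ("prop", "a3")))

value = evaluate(label, tokens, ops)

# A second, hypothetical policy: only the token values change,
# the stored abstract label is reused as-is.
stricter_tokens = {"a1": 1, "a2": 0, "a3": 3}
stricter_value = evaluate(label, stricter_tokens, ops)
```

Here `value` is 2, so the triple is accessible. Under the second, hypothetical policy the same stored label evaluates to 1, so the triple is not accessible. Only the evaluation is repeated; the label itself is not recomputed.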
SPARQL Query Provenance • What is the provenance of the result of a complex SPARQL query? • Adapting relational solutions • Positive fragment (semirings) • Works fine • Non-monotonic fragment (m-semirings) • Problem with OPTIONAL, DIFFERENCE • Different semantics than SQL • Two alternative approaches • m-semirings: translation to SQL • spm-semirings: a new operation (and the corresponding properties) to capture the provenance of OPTIONAL, DIFFERENCE
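For the positive fragment, the semiring idea can be sketched as follows: every solution mapping carries an annotation, a join multiplies annotations and a union adds them. This is a generic illustration of semiring-annotated evaluation, not the D3.2 algorithms. The class `Semiring`, the functions `join` and `union`, and the sample mappings are all hypothetical, and OPTIONAL and DIFFERENCE (which need the additional operation mentioned above) are deliberately left out.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]    # combines alternative derivations
    times: Callable[[Any, Any], Any]   # combines jointly used inputs


# A solution mapping is a sorted tuple of (variable, value) pairs.
Mapping = Tuple[Tuple[str, str], ...]
Annotated = Dict[Mapping, Any]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(r1: Annotated, r2: Annotated, k: Semiring) -> Annotated:
    out: Annotated = {}
    for m1, p1 in r1.items():
        for m2, p2 in r2.items():
            if compatible(m1, m2):
                merged = tuple(sorted({**dict(m1), **dict(m2)}.items()))
                p = k.times(p1, p2)
                out[merged] = k.plus(out[merged], p) if merged in out else p
    return out


def union(r1: Annotated, r2: Annotated, k: Semiring) -> Annotated:
    out = dict(r1)
    for m, p in r2.items():
        out[m] = k.plus(out[m], p) if m in out else p
    return out


# Symbolic semiring: annotations are provenance expressions (strings).
POLY = Semiring(plus=lambda a, b: f"({a} + {b})",
                times=lambda a, b: f"({a} * {b})")
# Counting semiring: annotations are numbers of derivations.
COUNT = Semiring(plus=operator.add, times=operator.mul)

xy = (("x", "alice"), ("y", "bob"))
r1 = {(("x", "alice"),): "t1"}
r2 = {xy: "t2"}
r3 = {xy: "t3"}
symbolic = union(join(r1, r2, POLY), r3, POLY)

c1 = {(("x", "alice"),): 1}
c2 = {xy: 2}
c3 = {xy: 1}
counted = union(join(c1, c2, COUNT), c3, COUNT)
```

With the symbolic semiring the single result mapping is annotated with `((t1 * t2) + t3)`. With the counting semiring the same query yields 1 * 2 + 1 = 3 derivations.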
SPARQL Update Provenance • What is the provenance of a new triple, inserted via a complex SPARQL Update? • Similar to CONSTRUCT (query) • But still different • CONSTRUCT creates a new triple but does not modify the dataset • Updates specify explicitly the named graph to put the new triple(s) • Triples with different provenance may be put in the same named graph • Named graphs alone are not sufficient for capturing the provenance of updates
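The last two points can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the quad encoding, the `ex:` identifiers, and the helper `copy_into` (a stand-in for an update that inserts triples selected from one named graph into another) are hypothetical and not part of the deliverable.

```python
from typing import Dict, Set, Tuple

# (subject, predicate, object, named graph)
Quad = Tuple[str, str, str, str]


def copy_into(dataset: Set[Quad], source: str, target: str,
              provenance: Dict[Quad, str]) -> Set[Quad]:
    """Insert every triple found in `source` into the named graph `target`.

    The side table `provenance` records where each inserted quad came from.
    """
    inserted = {(s, p, o, target) for (s, p, o, g) in dataset if g == source}
    for quad in inserted:
        provenance[quad] = source
    return dataset | inserted


dataset: Set[Quad] = {
    ("ex:sally", "ex:diagnosis", "ex:c1", "ex:hospitalA"),
    ("ex:john", "ex:diagnosis", "ex:c2", "ex:hospitalB"),
}
provenance: Dict[Quad, str] = {}

# Two updates with different sources write into the same named graph.
dataset = copy_into(dataset, "ex:hospitalA", "ex:registry", provenance)
dataset = copy_into(dataset, "ex:hospitalB", "ex:registry", provenance)

registry = {q for q in dataset if q[3] == "ex:registry"}
graphs_seen = {q[3] for q in registry}          # what the named graph tells us
origins = {provenance[q] for q in registry}     # what the side table tells us
```

Both inserted triples end up in `ex:registry`, so the named graph alone cannot tell them apart. The separate provenance record still distinguishes the two origins.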
D3.2 Status • Contents of D3.2 • Abstract models for provenance (very similar to the abstract models for access control) • Provenance for SPARQL query results • Provenance for SPARQL update (inserted triples) • Review version uploaded on the wiki on 05/09/13 • http://wiki.planet-data.eu/web/D3.2 • Only one reviewer at the moment (Oscar) • Volunteers?
Review Comments • Generally happy (“impressed by D3.1”) • Applicability • Usefulness: convince industry to look into that • Focus on a real-world use case to demonstrate value • In a nutshell • Some implementation to show value • Solution: demo (use case) • Health use case • Also suitable to show synergy
Health Use Case • A use case to show applicability and usefulness • In collaboration with Computational Medicine Laboratory (CML) of FORTH • Health-related data are sensitive • Proposed by the reviewers (Anders Tornquist) • Insurance companies need controlled access to sensitive medical data to determine premiums, insurance policies, contract terms, etc. • Relevant to access control/privacy challenges • But also related to streaming, data quality and trust
Personal Health Record • Personal Health Record (PHR) • Collection of data regarding a patient • Diseases, personal information, medications, clinical observations and findings, measurements, … • Properties • Sensitive • Dynamic, sometimes streaming • Not always of good quality
Relation to Other WPs • Relation to WP1 • Part of the PHR data may be of streaming nature • E.g., vital signs’ measurements of hospitalized patients • Relation to WP2 • Data often of poor quality • Up to 26.9% of the data can be erroneous • Patient-provided data, faulty readings, sensors, etc. • Suggestion (for the review) • Outline how the technologies developed in WP1, WP2 could be used (potentially) to address these issues • Specific and concrete, but no implementation needed
Access Control and Privacy • PHR (normally) accessible only by the patient • Sensitive data • Doctors, nurses, hospitals, insurance companies, public services may require access • Informed Consent • Patient allows access to (parts of) their PHR to specific entities, for a specific purpose, in a specific timeframe, etc. • Via Consent Forms • Formal, legal document
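The dimensions of a consent form listed above (entity, purpose, parts of the PHR, timeframe) can be captured in a small data structure. The following Python sketch is only an assumption about how such a record might look: the class `Consent`, the function `permits`, and all field names and sample values are hypothetical and do not describe the demo's actual consent handling.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class Consent:
    patient: str
    entity: str                # who is granted access
    purpose: str               # what the access is for
    parts: FrozenSet[str]      # which parts of the PHR are covered
    valid_from: date           # start of the timeframe (inclusive)
    valid_until: date          # end of the timeframe (inclusive)


def permits(consent: Consent, entity: str, purpose: str,
            part: str, on: date) -> bool:
    """True iff the request matches every dimension of the consent."""
    return (consent.entity == entity
            and consent.purpose == purpose
            and part in consent.parts
            and consent.valid_from <= on <= consent.valid_until)


consent = Consent(
    patient="patient-1",
    entity="insurer-1",
    purpose="premium-calculation",
    parts=frozenset({"diagnoses"}),
    valid_from=date(2013, 9, 1),
    valid_until=date(2013, 12, 31),
)

in_scope = permits(consent, "insurer-1", "premium-calculation",
                   "diagnoses", date(2013, 10, 1))
wrong_part = permits(consent, "insurer-1", "premium-calculation",
                     "medications", date(2013, 10, 1))
expired = permits(consent, "insurer-1", "premium-calculation",
                  "diagnoses", date(2014, 1, 15))
```

A request is granted only when entity, purpose, requested part and date all fall within the consent. Here the first request is in scope, while the other two are refused (uncovered part, expired timeframe).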
Objectives • We will use this use case to demonstrate the benefits of our approach • Different entities have access to the same data, without accessing sensitive information • Unless the owner of the data has explicitly allowed it (via the consent form) • Without replication
Health Use Case Setting • [Figure: Dataset (collection of PHRs)]
Architecture (Data Access) • [Architecture diagram] • Components: PACEM API; SPARQL to SQL Translation Module; AAC API; Annotation Module; Evaluation Module; Update Module; DB (MonetDB, abstract expressions) • AUTH API; AUTH Module; AUTH DB • User credentials for authentication • User interface: authentication, queries • CPRP API; CPRP Module; CPRP DB • Purpose and role hierarchy • Assignment of concrete policies to accessing entities • Flow labels: user request (accessing entity, SPARQL query); accessing entity; SPARQL; concrete policy; SQL, concrete policy; result (triples)
Dataset • Advanced Patient Data Generator (APDG) • Synthetic, but realistic data • Developed in the context of EURECA (FP7 IP) • Data associated with large medical schemas • HL7-RIM, SNOMED-CT • 10K patients • 750K instance triples
Data on HL7-RIM (2/2) • [Figure] • HL7-RIM classes: Role, Participation, Entity, Observation • Entity: http://kandel…./entityno/BC_ZSH2012A1000000, with foaf:name “Sally Berry” • Observation: http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3 • …
Data on SNOMED-CT (1/2) • http://purl.bioontology…./408643008 skos:prefLabel “Infiltrating duct carcinoma of breast” • http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3: Observation indicating that the patient has “infiltrating duct carcinoma of breast”
Data on SNOMED-CT (2/2) • [Figure: SNOMED-CT concept hierarchy] • Neoplasm of breast • Carc. in situ of breast • Malignant tumor of breast • Carc. of breast • Lobular carc. in situ of breast • Intraductal carc. in situ of breast • Infiltrating duct carc. of breast • Infiltrating lobular carc. of breast
HL7-RIM and SNOMED-CT • [Figure] • SNOMED-CT concepts: Neoplasm of breast; Carc. in situ of breast; Malignant tumor of breast; Carc. of breast; Lobular carc. in situ of breast; Intraductal carc. in situ of breast; Infiltrating duct carc. of breast; Infiltrating lobular carc. of breast • Entity: http://kandel…./entityno/BC_ZSH2012A1000000, with foaf:name “Sally Berry” • Observation: http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3 • …
Demo Scenario • Breast Cancer Action Fund (BCAF) provides benefits for cancer patients • Requires info on patients’ status to give the benefit • Sally Berry wants to apply for the benefit • Alternative: insurance company wants access to (part of) the data for determining the insurance premium and the contract terms • Demo: http://daphne.ics.forth.gr:8084/pd-demo/login.jsp
Next Steps • Make more explicit the benefit of abstract models • Efficient updates (no recomputation required) • Efficient change of policies (no recomputation required) • Try more scenarios • Purpose and role hierarchies • More functionality