
A Model for Sharing of Confidential Provenance Information in a Query Based System

Meiyappan Nagappan, Mladen A. Vouk. North Carolina State University. June 17th, 2008, IPAW 2008.


Presentation Transcript


  1. A Model for Sharing of Confidential Provenance Information in a Query Based System. Meiyappan Nagappan, Mladen A. Vouk, North Carolina State University. June 17th, 2008, IPAW 2008

  2. Agenda: Problem Motivation; A Scenario: Sharing Provenance; Research Objective; Implementation Model; Discussions; Conclusion; Future Work

  3. Problem Motivation
     • Provenance is increasingly being used as part of analyses to speed up the process, extend its scope beyond raw data, and enable handling of very large data sets.
     • Attendant problem: sharing provenance information while keeping it appropriately but selectively confidential/protected.
     • Confidentiality: "Ensuring that information is accessible only to those authorized to have access" (ISO/IEC 17799).

  4. Problem Motivation
     • Unauthorized access to provenance could be used to:
        • reverse engineer a process
        • compromise the privacy of the user
        • etc.
     • On the other hand, withholding data for the sake of confidentiality could hinder scientific discovery
     • Frequent current solution: export and mail the data to be shared
        • Duplicates data: metadata sets are large and growing (a typical simulation may generate ~1 GB of metadata)
        • Access cannot be revoked

  5. Scenario: Sharing Provenance
     [Figure: simulation runs R1, R2, R3; data subsets S11, S12, S21, S31, S32, S33; users A, B, C]

  6. Research Goal
     The goal of the current work is to develop a model, in the context of provenance for scientific simulations, that:
     • enables easy sharing of provenance data
     • allows dynamic changes in confidentiality levels to serve multiple, different users
     • does not compromise the confidentiality of the provenance data (including privacy)

  7. Implementation Model: Architecture
     [Figure: a laptop running Kepler and a supercomputer running simulations connect through a Record API, a Query API, and a MGMT. (management) API to the Provenance Store; an Authorization Service and a Web Interface to query provenance sit in front of the APIs]
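The architecture places an Authorization Service in front of the Query API. A minimal Python sketch of that gate follows, assuming password-based login (slide 13 shows a username and password check) and salted PBKDF2 hashing, which is our choice rather than something the slides specify. The names `AuthorizationService` and `query_api` are hypothetical, and an in-memory dict stands in for the User Authorization DB.

```python
import hashlib
import hmac
import secrets


class AuthorizationService:
    """Hypothetical gate that authenticates users before any query runs."""

    ITERATIONS = 100_000  # PBKDF2 work factor (assumed value)

    def __init__(self):
        # In-memory stand-in for the User Authorization DB:
        # username -> (salt, password hash)
        self._users = {}

    def _hash(self, password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)

    def register(self, username, password):
        salt = secrets.token_bytes(16)
        self._users[username] = (salt, self._hash(password, salt))

    def authenticate(self, username, password):
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self._hash(password, salt), expected)


def query_api(auth, username, password, run_query):
    """Hypothetical Query API entry point: authenticate first, then execute."""
    if not auth.authenticate(username, password):
        raise PermissionError(f"authentication failed for {username!r}")
    return run_query()
```

Keeping the check in a separate service, rather than inside each API, mirrors the figure: the Record, Query, and MGMT. APIs can all defer to the same gate.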

  8. Sub Goals
     • Goal: build a model that enables sharing provenance in an environment where the confidentiality level changes dynamically
     • We attempt to achieve the goal through the following 5 objectives (sub goals):
        Sub Goal 1: The person who generates simulation data is the owner of the original provenance data
        Sub Goal 2: Users cannot edit or delete data; the administrator can, but must leave an audit trail
        Sub Goal 3: Owners can annotate their data
        Sub Goal 4: Owners can choose collaborators
        Sub Goal 5: Auditors have full read-only access

  9. Sub Goal 1
     What? The person who generates simulation data is the owner of the original provenance data.
     Why? Each dataset is clearly traced to one owner.
     Risk? Disputes over who has the authority to share the data in the first place.
     How? A 3-tiered approach: Client, Application Logic, Database.
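A minimal sketch of Sub Goal 1, assuming the Record API takes the submitter's identity from authentication rather than from the submitted payload; `ProvenanceRecord` and `RecordAPI` are hypothetical names. The record is frozen, so its owner cannot be reassigned after creation.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    record_id: int
    owner: str        # set once, from the authenticated submitter
    payload: str
    created: datetime


class RecordAPI:
    """Hypothetical Record API: whoever submits simulation data owns it."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._records = {}

    def record(self, authenticated_user, payload):
        rec = ProvenanceRecord(next(self._ids), authenticated_user, payload,
                               datetime.now(timezone.utc))
        self._records[rec.record_id] = rec
        return rec

    def may_share(self, user, record_id):
        # Exactly one owner per dataset, so the authority to share is unambiguous.
        return self._records[record_id].owner == user
```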

  10. Sub Goal 2
     What? Editing and audit trail: no edits or deletes by the owner, collaborators, or other users; the administrator can edit, but must leave an audit trail.
     Why? Consistency of data (particularly shared data); auditing.
     Risk? Otherwise, the collaborator may get different results each time.
     How? Restrict privileges at the DB level; log all super-user actions.
     [Figure: Provenance Store, MGMT. API]
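Sub Goal 2 can be sketched with Python's built-in `sqlite3` module. SQLite has no GRANT/REVOKE, so this hypothetical `ProvenanceStore` checks privileges in application code as a stand-in for the DB-level restrictions the slide calls for; the table layout and the `admins` set are assumptions.

```python
import sqlite3
from datetime import datetime, timezone


class ProvenanceStore:
    """Users may only insert and read; administrators may edit or delete,
    and every such change is written to an audit trail."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.db = sqlite3.connect(":memory:")
        self.db.executescript("""
            CREATE TABLE provenance (id INTEGER PRIMARY KEY, owner TEXT, value TEXT);
            CREATE TABLE audit_trail (ts TEXT, actor TEXT, action TEXT,
                                      record_id INTEGER, old_value TEXT, new_value TEXT);
        """)

    def insert(self, owner, value):
        cur = self.db.execute("INSERT INTO provenance (owner, value) VALUES (?, ?)",
                              (owner, value))
        return cur.lastrowid

    def _admin_change(self, actor, record_id, action, new_value):
        if actor not in self.admins:
            raise PermissionError(f"{actor!r} may not {action} provenance data")
        row = self.db.execute("SELECT value FROM provenance WHERE id = ?",
                              (record_id,)).fetchone()
        if row is None:
            raise KeyError(record_id)
        with self.db:  # the change and its audit entry commit (or roll back) together
            if action == "edit":
                self.db.execute("UPDATE provenance SET value = ? WHERE id = ?",
                                (new_value, record_id))
            else:
                self.db.execute("DELETE FROM provenance WHERE id = ?", (record_id,))
            self.db.execute("INSERT INTO audit_trail VALUES (?, ?, ?, ?, ?, ?)",
                            (datetime.now(timezone.utc).isoformat(), actor, action,
                             record_id, row[0], new_value))

    def edit(self, actor, record_id, new_value):
        self._admin_change(actor, record_id, "edit", new_value)

    def delete(self, actor, record_id):
        self._admin_change(actor, record_id, "delete", None)

    def audit_trail(self):
        return self.db.execute("SELECT actor, action, record_id, old_value, new_value"
                               " FROM audit_trail").fetchall()
```

On a server database such as PostgreSQL, the same policy would more naturally be expressed by granting ordinary users only SELECT and INSERT on the provenance tables.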

  11. Sub Goal 3
     What? Data annotation.
     Why? User-specified metadata; collaborators may have different interpretations.
     Risk? Loss of valuable metadata about provenance; users cannot flag inaccurate data, so they would need delete privileges.
     How? An annotation field in all tables of the schema; through the Web Interface (WI), users annotate provenance data and saved queries.
     [Figure: Query, WI, API, Provenance Store]
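One way to let users annotate without touching the underlying records is an append-only annotation log, similar in spirit to the Annot Table on slide 13. The sketch below assumes that layout rather than an annotation column in every table; `Annotation`, `AnnotationLog`, and the `(kind, id)` target keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    author: str
    target: tuple     # hypothetical key, e.g. ("provenance", 42) or ("saved_query", 7)
    text: str
    created: datetime


class AnnotationLog:
    """Annotations are appended next to the data instead of editing it, so a
    collaborator can flag a record as inaccurate without delete privileges."""

    def __init__(self):
        self._notes = []

    def annotate(self, author, target, text):
        note = Annotation(author, target, text, datetime.now(timezone.utc))
        self._notes.append(note)
        return note

    def for_target(self, target):
        return [n for n in self._notes if n.target == target]
```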

  12. Sub Goal 4
     What? Data sharing with dynamically changing confidentiality levels.
     Why? To share data on a "What You See Is What You Want To Share" basis, with a different subset of the data each time.
     Risk? Sharing either the entire data set or nothing; disk space wasted on saving a separate copy of each subset.
     How? Query sharing.

  13. Sub Goal 4 (contd.)
     [Figure: the user authenticates with a username and password against the User Authorization DB through the API; a data request then executes the saved query and returns the data]
     • Query Table: Query ID, Saved by, Saved for, Query, Timestamp, Allow Cascading, Revoke Active
     • Annot Table: Query ID, User ID, Annotation, Viewable
     • Operations: save data for a collaborator (save the query); view queries saved for me by other collaborators; view data saved in a query for me by other collaborators; annotate the query
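The Query Table fields above suggest how query sharing could work. Below is a minimal `sqlite3` sketch, assuming snake_case column names (`query_text` and `saved_at` stand in for Query and Timestamp), SQLite types, and a toy `provenance` table; all helper functions are hypothetical. Sharing stores only the query text, revocation sets `revoke_active`, and re-sharing is allowed only when `allow_cascading` is set.

```python
import sqlite3
from datetime import datetime, timezone

# Columns mirror the Query Table and Annot Table; names, types and the toy
# provenance table are assumptions.
SCHEMA = """
CREATE TABLE provenance (run TEXT, step TEXT, value REAL);
CREATE TABLE query_table (
    query_id INTEGER PRIMARY KEY, saved_by TEXT, saved_for TEXT, query_text TEXT,
    saved_at TEXT, allow_cascading INTEGER, revoke_active INTEGER DEFAULT 0);
CREATE TABLE annot_table (
    query_id INTEGER, user_id TEXT, annotation TEXT, viewable INTEGER);
"""


def save_query(db, saved_by, saved_for, query_text, allow_cascading=False):
    """Share a subset by storing the query that selects it, not a copy of its rows."""
    cur = db.execute(
        "INSERT INTO query_table (saved_by, saved_for, query_text, saved_at, allow_cascading)"
        " VALUES (?, ?, ?, ?, ?)",
        (saved_by, saved_for, query_text,
         datetime.now(timezone.utc).isoformat(), int(allow_cascading)))
    return cur.lastrowid


def queries_saved_for(db, user):
    """List the active queries other collaborators have saved for `user`."""
    return db.execute(
        "SELECT query_id, saved_by, query_text FROM query_table"
        " WHERE saved_for = ? AND revoke_active = 0", (user,)).fetchall()


def revoke(db, owner, query_id):
    """Only the user who saved the query can revoke it."""
    db.execute("UPDATE query_table SET revoke_active = 1"
               " WHERE query_id = ? AND saved_by = ?", (query_id, owner))


def _active_share(db, user, query_id):
    row = db.execute(
        "SELECT query_text, allow_cascading FROM query_table"
        " WHERE query_id = ? AND saved_for = ? AND revoke_active = 0",
        (query_id, user)).fetchone()
    if row is None:
        raise PermissionError("no active query shared with this user")
    return row


def view_shared_data(db, user, query_id):
    """Re-run the saved query, so the result reflects the current data."""
    query_text, _ = _active_share(db, user, query_id)
    return db.execute(query_text).fetchall()


def reshare(db, user, query_id, new_collaborator):
    """Pass a shared query on, but only if the owner allowed cascading."""
    query_text, allow_cascading = _active_share(db, user, query_id)
    if not allow_cascading:
        raise PermissionError("cascading not allowed for this query")
    return save_query(db, user, new_collaborator, query_text)
```

This sketch executes the stored SQL as-is and does not revoke re-shared copies when the original is revoked; a real system would need to restrict stored queries to read-only SELECT statements and decide how revocation cascades.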

  14. Sub Goal 4 (contd.): Why Query Sharing?
     • Dynamically decide what to share
     • The set of information to be shared is large
     • Share a subset of information rather than individual records

  15. Sub Goal 5
     What? Data audit and verification.
     Why? Prevent tampering by malicious users; maintain accuracy.
     Risk? Collaborators may try to break the system; administrators may misuse super-user privileges.
     How? Authorized and authenticated auditors get full read-only access to the original data, provenance data, and annotations, as well as to edit trails and logs of super-user actions.
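A sketch of Sub Goal 5 under two assumptions the slides do not state: the store is a plain dict of named sections, and edit trails are protected with a SHA-256 hash chain (a swapped-in technique for tamper evidence, not the authors' stated method). `AuditorView`, `append_entry`, and `chain_entry` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash placed before the first entry (assumed convention)


def chain_entry(prev_hash, entry):
    """Hash an edit-trail entry together with the previous entry's hash."""
    data = json.dumps(entry, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(data).hexdigest()


def append_entry(trail, entry):
    """Append an entry whose hash commits to everything logged before it."""
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"entry": entry, "hash": chain_entry(prev, entry)})


class AuditorView:
    """Read-only facade for authenticated, authorized auditors."""

    SECTIONS = ("original_data", "provenance_data", "annotations",
                "edit_trails", "superuser_logs")

    def __init__(self, store):
        self._store = store  # section name -> list of entries

    def read(self, section):
        if section not in self.SECTIONS:
            raise KeyError(section)
        # A tuple copy: the auditor cannot add or remove entries through it.
        return tuple(self._store.get(section, ()))

    def write(self, *args, **kwargs):
        raise PermissionError("auditors have read-only access")

    def verify_edit_trail(self):
        """Recompute the chain; an entry edited, reordered, or removed (other
        than the last) without re-hashing makes this return False."""
        prev = GENESIS
        for item in self._store.get("edit_trails", ()):
            if chain_entry(prev, item["entry"]) != item["hash"]:
                return False
            prev = item["hash"]
        return True
```

A plain hash chain can be recomputed by anyone with write access, so it only helps against a misbehaving administrator if the latest hash is also stored somewhere that administrator cannot change.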

  16. Issues
     • The model is query-centric.
     • Automatic run-time collection of provenance data is required.
     • It is restricted to provenance data from scientific workflow systems.
     • Collaborators can annotate a shared subset only as a whole.
     • It does not address long-term storage or scalability.

  17. Conclusion
     • With the increasing emphasis on provenance data collection in scientific workflows, the confidentiality of that data becomes more important.
     • Little research has been done in this area of provenance.
     • This model addresses confidentiality in a collaborative environment.
     • Tradeoff: Disk Space : Time :: Query Sharing : Data Sharing (query sharing saves disk space at the cost of re-running queries; data sharing saves time at the cost of storing copies).

  18. Future Work
     • Validating our model against other solutions using different threat scenarios
     • The responsibility for sharing data lies with the user, so the user's privacy is at stake: tools are required to foresee inferences from provenance data
     • Large data sets: provenance data and shared queries grow steadily in size, which makes them difficult to access; tools are required to improve the HCI aspect

  19. Questions?

