
Provenance Views for Module Privacy

Susan B. Davidson (U. Penn), Sanjeev Khanna (U. Penn), Tova Milo (Tel-Aviv U.), Debmalya Panigrahi (MIT), Sudeepa Roy (U. Penn). Data-oriented workflows must be secure. Ref.: Tova Milo’s keynote, PODS 2011.


Presentation Transcript


  1. Susan B. Davidson (U. Penn), Sanjeev Khanna (U. Penn), Tova Milo (Tel-Aviv U.), Debmalya Panigrahi (MIT), Sudeepa Roy (U. Penn): Provenance Views for Module Privacy

  2. Data-oriented Workflows Must Be Secure. Ref.: Tova Milo’s keynote, PODS 2011

  3. Workflows
     • Vertices = Modules/Programs
     • Edges = Dataflow
     • In an execution of the workflow, data (values) appear on the edges
     [Figure: example workflow over sequence entries (TGCCGTGTGGCTAAAT…) with modules Split Entries, Align Sequences, Functional Data, Curate Annotations, Format-1, Format-2, Format-3 and Construct Trees, connected by data edges d1 to d7]

  4. Need for Provenance
     • Enable sharing and reuse
     • Ensure repeatability and debugging
     • How has this tree been generated?
     • Which sequences have been used to produce this tree?
     [Figure: biologist’s workspace showing the workflow from source s to sink t with modules Split Entries, Align Sequences, Format, Curate Annotations, Functional Data and Construct Trees]

  5. Need for Provenance Need for Privacy s How has this result been produced? Split Entries … TGCC… ATGGCC Align Sequences All data values Workflow USER Workflow OWNER Format Curate Annotations Functional Data The flow/structure should not be revealed! My module is proprietary! Format Format My data is sensitive! Construct Trees t Provenance Views for Module Privacy

  6. Module Privacy
     • Module f takes input x, produces output y = f(x)
     • User should not be able to guess (x, f(x)) pairs with high probability (over any number of executions)
     • Output value f(x) is private, not the algorithm for f
     [Figure: Module f with inputs x1, x2, x3, x4 and outputs y1, y2, y3; f(x1, x2, x3, x4) = <y1, y2, y3>]

  7. Module Privacy: Motivation
     • Patient P’s concern: whether P has AIDS should not be inferred given his medical record
     • Module owner’s concern: no one should be able to simulate the module and use it elsewhere
     [Figure labels: Medical Record of patient P; Process Record; f = Check for AIDS; Check for Cancer; Create Report; x; x’; f(x) = Does P have AIDS?; Does P have cancer?; report]

  8. Module Privacy in a Workflow
     • Private Modules (no a priori knowledge to the user), e.g. a module for AIDS detection
     • Public Modules (full knowledge to the user), e.g. sorting and reformatting modules
     • n modules are connected as a DAG, with data sharing
     • For a private module f and every input x, f(x) should not be revealed
     [Figure: modules m1, m2, m3 over attributes a1 to a7]

  9. Module Privacy with Secure View
     • Privacy definition: L-diversity [MGKV ’06]
     • By hiding some input/output attributes, each x has L different equivalent possibilities for f(x)
     • The output view is called a ‘Secure-view’
     Differential privacy? [Dwork ’06, DMNS ’06, …]
     • (Usual) random noise cannot be added
     • Scientific experiments must be repeatable
     • Any f should always map any x to the same f(x)

  10. Standalone Module Privacy
     • A view: projection of R on visible attributes
     • Privacy parameter Γ (e.g. Γ = 2)
     • Γ-standalone-private view: every input x can be mapped to Γ different outputs by the “possible worlds”
     • Possible world: relation that agrees with R on visible attributes (and respects the functional dependency)
     • Functional dependency: x1, x2 → y
     [Figure labels: Module f with inputs x1, x2 and output y; Relation R for f; y = (x1  x2); y = (x1 ≠ x2)]
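The definition on this slide can be checked directly on a tiny module by enumerating possible worlds. The following is a minimal sketch, not the authors' implementation. It assumes Boolean attributes, assumes every input combination appears in R, treats a possible world as any function on the same inputs whose visible projection (as a set) equals that of R, and uses the slide's y = (x1 ≠ x2) as the example module. The names `out_sets` and `is_standalone_private` are hypothetical.

```python
from itertools import product

def project(row, attrs, visible):
    """Keep only the values of the visible attributes, in schema order."""
    return tuple(v for a, v in zip(attrs, row) if a in visible)

def out_sets(inputs, outputs, f, visible, domain=(0, 1)):
    """For each input x, collect every output that some possible world assigns to x."""
    attrs = inputs + outputs
    xs = list(product(domain, repeat=len(inputs)))
    ys = list(product(domain, repeat=len(outputs)))
    view = {project(x + f(x), attrs, visible) for x in xs}
    out = {x: set() for x in xs}
    # Each candidate world is a function g with g(xs[i]) = world[i],
    # so the functional dependency inputs -> outputs holds by construction.
    for world in product(ys, repeat=len(xs)):
        if {project(x + y, attrs, visible) for x, y in zip(xs, world)} == view:
            for x, y in zip(xs, world):
                out[x].add(y)
    return out

def is_standalone_private(inputs, outputs, f, visible, gamma):
    """True if every input has at least gamma candidate outputs under the view."""
    return all(len(s) >= gamma
               for s in out_sets(inputs, outputs, f, visible).values())

# Example module from the slide: y = (x1 != x2).
INPUTS, OUTPUTS = ("x1", "x2"), ("y",)
f_xor = lambda x: (int(x[0] != x[1]),)

print(is_standalone_private(INPUTS, OUTPUTS, f_xor, {"x1", "x2", "y"}, 2))  # False
print(is_standalone_private(INPUTS, OUTPUTS, f_xor, {"x1", "y"}, 2))        # True
```

Under these assumptions, showing everything leaves a single possible output per input, while hiding x2 alone already gives each input two candidate outputs. The enumeration is exponential in the number of rows, so it only serves to illustrate the definition.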

  11. Workflow Module Privacy
     • Workflow W: Relation R has n functional dependencies
       1. a1, a2 → a3, a4, a5
       2. a3, a4 → a6
       3. a4, a5 → a7
     • A view: same as before
     • Γ-workflow-private view: privacy for each private module as before
     • Possible world: relation that agrees with R on visible attributes (and respects ALL functional dependencies)
     [Figure: modules m1, m2, m3 over attributes a1 to a7]
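To make the workflow relation concrete, here is a small sketch that builds R by running three modules with the attribute wiring from this slide over all Boolean initial inputs, then checks the three functional dependencies. The module bodies are hypothetical placeholders (the slide gives only the wiring), and `workflow_relation` and `satisfies_fd` are illustrative names.

```python
from itertools import product

# (name, input attributes, output attributes, function), in topological order.
# The wiring follows the slide; the Boolean functions themselves are made up.
WORKFLOW = [
    ("m1", ("a1", "a2"), ("a3", "a4", "a5"),
     lambda a1, a2: (a1 | a2, 1 - (a1 & a2), 1 - (a1 ^ a2))),
    ("m2", ("a3", "a4"), ("a6",), lambda a3, a4: (a3 & a4,)),
    ("m3", ("a4", "a5"), ("a7",), lambda a4, a5: (a4 | a5,)),
]

def workflow_relation(workflow, initial_attrs, domain=(0, 1)):
    """One row per execution: values of every attribute a1..a7."""
    rows = []
    for values in product(domain, repeat=len(initial_attrs)):
        row = dict(zip(initial_attrs, values))
        for _name, ins, outs, fn in workflow:
            row.update(zip(outs, fn(*(row[a] for a in ins))))
        rows.append(row)
    return rows

def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on a list of rows."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

R = workflow_relation(WORKFLOW, ("a1", "a2"))
for _name, ins, outs, _fn in WORKFLOW:
    print(ins, "->", outs, satisfies_fd(R, ins, outs))
```

A possible world for the workflow is then any relation over a1 to a7 that matches R on the visible attributes and passes `satisfies_fd` for all three dependencies at once.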

  12. Secure-View Optimization Problem
     • Conflicting interests of Owner and User (Owner: privacy; User: provenance)
     • Hiding each data/attribute has a cost
     • Secure-view problem: minimize the sum of the costs of the hidden attributes while guaranteeing Γ-workflow-privacy of all private modules

  13. Let’s start with a Single Module
     How hard is the secure-view problem for a standalone module?
     • PROBLEM-1: given a set V of visible attributes, is V safe?
     • PROBLEM-2: given an ORACLE that answers “is V safe?”, find a safe subset V* with minimum cost
     PROBLEM-1
     • Communication complexity: Ω(N), N = #rows in R (R is given explicitly)
     • Computation complexity: co-NP-hard in k = #attributes of R (R is given succinctly)
     PROBLEM-2
     • Communication complexity: 2^Ω(k) oracle calls are needed

  14. Any Upper Bound?
     • The trivial brute-force algorithm solves the problem in time O(2^k N^2), where k = #attributes of R and N = #rows of R
     • Can return ALL safe subsets: useful for the next step
     • Not so bad:
       • k is not too large for a single module
       • A module is reused in many workflows
       • Expert knowledge from the module designers can be used to speed up the process
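A brute-force search of this shape can be sketched as follows: try all 2^k hidden subsets and run a quadratic safety test on each. This is an illustrative sketch, not the paper's code. The safety test assumes R lists every input, that all attributes share one domain size, and that an output y is possible for x exactly when some row agrees with x on the visible inputs and with y on the visible outputs. The costs and the XOR example module are hypothetical.

```python
from itertools import combinations

def is_safe(rows, inputs, outputs, hidden, gamma, domain_size=2):
    """Quadratic check: count candidate outputs for every row of R."""
    vis_in = [a for a in inputs if a not in hidden]
    vis_out = [a for a in outputs if a not in hidden]
    hidden_out = len(outputs) - len(vis_out)
    for r in rows:
        # Visible output values seen among rows that look like r on visible inputs.
        seen = {tuple(s[a] for a in vis_out)
                for s in rows
                if all(s[a] == r[a] for a in vis_in)}
        # Each hidden output attribute can take any value in its domain.
        if len(seen) * domain_size ** hidden_out < gamma:
            return False
    return True

def safe_subsets(rows, inputs, outputs, cost, gamma):
    """Return (cost, hidden set) for ALL safe hidden subsets, cheapest first."""
    attrs = inputs + outputs
    result = []
    for size in range(len(attrs) + 1):
        for hidden in combinations(attrs, size):
            if is_safe(rows, inputs, outputs, set(hidden), gamma):
                result.append((sum(cost[a] for a in hidden), frozenset(hidden)))
    return sorted(result, key=lambda t: t[0])

# Hypothetical example: y = x1 XOR x2 with made-up hiding costs.
ROWS = [{"x1": a, "x2": b, "y": a ^ b} for a in (0, 1) for b in (0, 1)]
COST = {"x1": 1, "x2": 2, "y": 3}
SAFE = safe_subsets(ROWS, ("x1", "x2"), ("y",), COST, gamma=2)
print(SAFE[0])  # cheapest safe choice: hide x1 at cost 1
```

Returning the full list rather than only the cheapest entry matches the slide's remark that all safe subsets are needed when modules are later combined in a workflow.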

  15. Moving on to General Workflows
     • Workflows have
       • Arbitrary data sharing, arbitrary (DAG) connection
       • Interactions between private and public modules
     • Trivial algorithms are not good: they lead to running time exponential in n
     • We use the (list of) standalone safe subsets for private modules
     • First consider: workflows with all private modules
     • Two steps:
       • Show that any combination of safe subsets for standalone privacy is also safe for workflow privacy (Composability)
       • Find the minimum-cost safe subset for the workflow (Optimization)

  16. Composability
     • Key idea: when a module m is placed in a workflow and the same attribute subset V is hidden, #possible worlds shrinks but not #possible outputs of the inputs
     • Proof involves showing existence of a possible world
     • “All-private workflow” assumption is necessary

  17. Optimally Combining Standalone Solutions
     • Any combination of safe subsets works
     • We want one with minimum cost
     • Solve the optimization problem for the workflow given the list of options for each individual module
     • The simplest version (no data sharing) is NP-hard
     • In the paper:
       • Approximation and matching hardness results for different versions
       • Bounded data sharing has a better approximation ratio
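The optimization step can be illustrated with an exhaustive search: pick one safe hidden set per module and minimize the cost of their union. This is only a sketch for tiny instances. It is exponential in the number of modules and is not the approximation algorithm from the paper. The option lists, costs, and the name `min_cost_combination` are hypothetical, and the search relies on composability, so it applies to all-private workflows.

```python
from itertools import product

def min_cost_combination(options, cost):
    """options: module -> list of safe hidden sets; returns (cost, hidden union)."""
    best = None
    for choice in product(*options.values()):
        hidden = frozenset().union(*choice)
        # A shared attribute is paid for once, however many modules hide it.
        total = sum(cost[a] for a in hidden)
        if best is None or total < best[0]:
            best = (total, hidden)
    return best

# Hypothetical standalone solutions for the three modules of the example workflow.
OPTIONS = {
    "m1": [{"a1"}, {"a3", "a4"}],
    "m2": [{"a3"}, {"a6"}],
    "m3": [{"a4"}, {"a7"}],
}
COST = {"a1": 4, "a3": 1, "a4": 1, "a6": 3, "a7": 3}
BEST = min_cost_combination(OPTIONS, COST)
print(BEST)  # hiding a3 and a4 serves all three modules at total cost 2
```

In this made-up instance, data sharing helps: the attributes a3 and a4 are outputs of m1 and inputs of m2 and m3, so hiding them once covers every module.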

  18. Workflows with Public Modules
     • Public modules are difficult to handle
     • Composability does not work
     Solution: privatize some public modules
     • Names of “privatized” modules are not revealed
     • Now composability works
     • Privatization has an additional cost
     • Worse approximation results
     [Figure labels: f1(x) = y, Private; f2(y) = y, Public]

  19. Related Work
     • Workflow privacy (mainly access control): Chebotko et al. ’08, Gil et al. ’07, ’10
     • Secure provenance: Braun et al. ’08, Hasan et al. ’07, Lyle-Martin ’10
     • Privacy-preserving data mining: surveys by Aggarwal-Yu ’08, Verykios et al. ’04
     • Privacy in statistical databases: survey by Dwork ’08

  20. Conclusion and Future Work
     • This is a first step toward handling module privacy in a network of modules
     • Future directions:
       • Explore alternative notions of privacy/partial background knowledge
       • Explore alternative “privatization” techniques for public modules
       • Handle infinite/very large domains of attributes

  21. Thank You. Questions?
