Provenance Views for Module Privacy
Susan B. Davidson (U. Penn), Sanjeev Khanna (U. Penn), Tova Milo (Tel-Aviv U.), Debmalya Panigrahi (MIT), Sudeepa Roy (U. Penn)
Data-oriented Workflows Must Be Secure
Ref. Tova Milo’s keynote, PODS 2011
Workflows
• Vertices = modules/programs
• Edges = dataflow
• In an execution of the workflow, data (values) appear on the edges
[Figure: an example workflow over DNA sequence data, with modules Split Entries, Align Sequences, three Format modules, Curate Annotations (taking functional data), and Construct Trees, connected by data items d1–d7]
Need for Provenance
• Enable sharing and reuse
• Ensure repeatability and debugging
[Figure: the biologist’s workspace over the same workflow, with the questions "How has this tree been generated?" and "Which sequences have been used to produce this tree?" posed about the output t]
Need for Provenance vs. Need for Privacy
• Workflow USER: "How has this result been produced?" (wants all data values)
• Workflow OWNER: "My module is proprietary!", "My data is sensitive!", "The flow/structure should not be revealed!"
Module Privacy
• Module f takes input x and produces output y = f(x)
• The user should not be able to guess (x, f(x)) pairs with high probability (over any number of executions)
• The output value f(x) is private, not the algorithm for f
[Figure: module f with inputs x1, x2, x3, x4 and outputs f(x1, x2, x3, x4) = <y1, y2, y3>]
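The later slides treat a module as the relation tabulating its input/output behavior, with the functional dependency inputs → outputs. A minimal Python sketch of that representation (the module, domains, and names here are illustrative, not from the paper):

```python
from itertools import product

def module_relation(f, domains):
    """Tabulate module f as its I/O relation R: one row per input
    tuple, so the functional dependency inputs -> output holds."""
    return {x: f(*x) for x in product(*domains)}

# Hypothetical 2-input boolean module: y = x1 AND x2.
R = module_relation(lambda x1, x2: x1 & x2, [(0, 1), (0, 1)])
# R has one row per input: {(0,0): 0, (0,1): 0, (1,0): 0, (1,1): 1}
```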
Module Privacy: Motivation
• Patient P’s concern: whether P has AIDS should not be inferable from his medical record
• Module owner’s concern: no one should be able to simulate the module and use it elsewhere
[Figure: a pipeline over the medical record of patient P: Process Record, then f = Check for AIDS with f(x) = "Does P have AIDS?", Check for Cancer producing "Does P have cancer?", and Create Report producing the report]
Module Privacy in a Workflow
• Private modules (no a priori knowledge to the user), e.g., the module for AIDS detection
• Public modules (full knowledge to the user), e.g., sorting and reformatting modules
• n modules are connected as a DAG, with data sharing
• For a private module f with input x, f(x) should not be revealed
[Figure: modules m1, m2, m3 over attributes a1–a7, with attribute a4 shared]
Module Privacy with Secure Views
• Privacy definition: L-diversity [MGKV ’06]
• By hiding some input/output attributes, each x has L different equivalent possibilities for f(x)
• The output view is called a ‘secure view’
• Why not differential privacy [Dwork ’06, DMNS ’06, …]?
  • The usual random noise cannot be added: scientific experiments must be repeatable
  • Any f must always map a given x to the same f(x)
Standalone Module Privacy
• A view: projection of R on the visible attributes
• Privacy parameter Γ (e.g., Γ = 2)
• Γ-standalone-private view: every input x can be mapped to Γ different outputs by the "possible worlds"
• Possible world: a relation that agrees with R on the visible attributes (and respects the functional dependency)
[Example: module f with relation R and functional dependency x1, x2 → y; R computes y = (x1 ∧ x2), while a possible world may compute y = (x1 ≠ x2)]
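A brute-force sketch of the Γ-standalone-privacy check for the boolean AND module above, under the simplifying assumptions that possible worlds agree with R row by row on the visible attributes and that only the output attribute is hidden (hiding inputs raises subtler possible-world semantics that the paper treats formally; all names here are illustrative):

```python
from itertools import product

# Rows of R for the AND module: (x1, x2, y); FD: x1, x2 -> y.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
DOM = (0, 1)   # every attribute ranges over {0, 1}
N_IN = 2       # the first two attributes are inputs

def possible_worlds(rows, hidden):
    """All relations agreeing with `rows` row-wise on the visible
    attributes and satisfying the FD inputs -> output."""
    k = len(rows[0])
    worlds = []
    # Independently re-assign every hidden cell of every row.
    for choice in product(DOM, repeat=len(hidden) * len(rows)):
        it = iter(choice)
        world = [tuple(next(it) if j in hidden else row[j]
                       for j in range(k)) for row in rows]
        fn = {}
        if all(fn.setdefault(r[:N_IN], r[N_IN]) == r[N_IN] for r in world):
            worlds.append(world)   # the FD holds: keep this world
    return worlds

def gamma(rows, hidden):
    """Min over inputs of #distinct outputs across possible worlds."""
    outs = {}
    for world in possible_worlds(rows, hidden):
        for r in world:
            outs.setdefault(r[:N_IN], set()).add(r[N_IN])
    return min(len(v) for v in outs.values())

print(gamma(ROWS, hidden={2}))   # hiding y: every input has 2 candidates
```

Hiding nothing gives Γ = 1 (the only possible world is R itself), while hiding the output y gives Γ = 2 for this module.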
Workflow Module Privacy
• A view: same as before
• Γ-workflow-private view: privacy for each private module, as before
• Possible world: a relation that agrees with R on the visible attributes (and respects ALL functional dependencies)
[Example: workflow W with modules m1, m2, m3 over attributes a1–a7; relation R has n functional dependencies, here: 1. a1, a2 → a3, a4, a5   2. a3, a4 → a6   3. a4, a5 → a7]
Secure-View Optimization Problem
• Conflicting interests: the owner wants privacy, the user wants provenance
• Hiding each data item/attribute has a cost
• Secure-view problem: minimize the total cost of the hidden attributes while guaranteeing Γ-workflow-privacy of all private modules
Let’s Start with a Single Module
How hard is the secure-view problem for a standalone module?
• PROBLEM-1: given a subset V of visible attributes, is V safe?
  • R given explicitly: communication complexity Ω(N), where N = #rows of R
  • R given succinctly: co-NP-hard in k = #attributes of R
• PROBLEM-2: find a safe subset V* with minimum cost, given an oracle for "is V safe?"
  • 2^Ω(k) oracle calls are needed
Any Upper Bound?
• The trivial brute-force algorithm solves the problem in time O(2^k · N^2), where k = #attributes and N = #rows of R
• It can return ALL safe subsets: useful for the next step
• Not so bad in practice:
  • k is not too large for a single module
  • A module is reused in many workflows
  • Expert knowledge from the module designers can be used to speed up the process
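The brute-force enumeration can be sketched as follows for the AND module, again restricted for simplicity to hiding output attributes (so every completion of the hidden cells is a possible world, since the visible, pairwise-distinct inputs keep the FD intact); the setup is illustrative, not the paper’s implementation:

```python
from itertools import chain, combinations, product

# AND module tabulated as rows (x1, x2, y); FD: x1, x2 -> y.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
OUT = [2]       # indices of output attributes (candidates for hiding)
DOM = (0, 1)
GAMMA = 2

def candidates(row, hidden):
    """All completions of a row's hidden cells over the domain."""
    opts = [DOM if j in hidden else (row[j],) for j in range(len(row))]
    return set(product(*opts))

def is_safe(rows, hidden, gamma=GAMMA):
    """Every input must admit >= gamma distinct output tuples."""
    return all(
        len({tuple(r[j] for j in OUT) for r in candidates(row, hidden)})
        >= gamma
        for row in rows)

def all_safe_subsets(rows, hide_from=OUT):
    """Enumerate all 2^|hide_from| hidden subsets; keep the safe ones."""
    subsets = chain.from_iterable(
        combinations(hide_from, r) for r in range(len(hide_from) + 1))
    return [set(s) for s in subsets if is_safe(rows, set(s))]

print(all_safe_subsets(ROWS))   # only hiding y is safe here
```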
Moving on to General Workflows
• Workflows have arbitrary data sharing, arbitrary (DAG) connections, and interactions between private and public modules
• Trivial algorithms are not good: running time exponential in n
• We use the (list of) standalone safe subsets for the private modules
• First consider workflows with all-private modules; two steps:
  • Composability: show that any combination of standalone-safe subsets is also safe for workflow privacy
  • Optimization: find the minimum-cost safe subset for the workflow
Composability
• Key idea: when a module m is placed in a workflow and the same attribute subset V is hidden, the set of possible worlds shrinks, but not the set of possible outputs for each input
• The proof involves showing the existence of a possible world
• The "all-private workflow" assumption is necessary
Optimally Combining Standalone Solutions
• Any combination of safe subsets works; we want one with minimum cost
• Solve the optimization problem for the workflow given the list of options for each individual module
• Even the simplest version (no data sharing) is NP-hard
• In the paper: approximation and matching hardness results for different versions; bounded data sharing admits a better approximation ratio
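The combination step can be sketched by brute force: pick one standalone-safe hidden subset per private module and minimize the cost of the union (shared attributes are paid for once). The attribute costs and per-module safe lists below are hypothetical placeholders; the exponential search over all combinations is exactly what the paper’s NP-hardness result says must in general be replaced by approximation:

```python
from itertools import product

# Hypothetical per-attribute hiding costs for a 3-module workflow.
cost = {'a1': 1, 'a2': 1, 'a3': 2, 'a4': 5, 'a5': 2, 'a6': 1, 'a7': 1}

# Standalone-safe hidden subsets per private module (assumed
# precomputed by the single-module brute force).
safe = {
    'm1': [{'a3', 'a5'}, {'a4'}],
    'm2': [{'a6'}, {'a3', 'a4'}],
    'm3': [{'a7'}, {'a4', 'a5'}],
}

def cheapest_secure_view(safe, cost):
    """Try every combination of one safe subset per module; by
    composability each combination is workflow-safe, so return the
    union with minimum total cost."""
    best, best_cost = None, float('inf')
    for combo in product(*safe.values()):
        hidden = set().union(*combo)   # shared attributes counted once
        c = sum(cost[a] for a in hidden)
        if c < best_cost:
            best, best_cost = hidden, c
    return best, best_cost

print(cheapest_secure_view(safe, cost))
```

For these numbers the cheapest secure view hides {a3, a5, a6, a7} at cost 6, beating the combination that hides the expensive shared attribute a4.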
Workflows with Public Modules
• Public modules are difficult to handle: composability does not work
  [Example: private f1 with f1(x) = y feeding public f2 with f2(y) = y, which passes the private output straight through]
• Solution: privatize some public modules; the names of "privatized" modules are not revealed, and composability works again
• Privatization has an additional cost and leads to worse approximation results
Related Work
• Workflow privacy (mainly access control): Chebotko et al. ’08; Gil et al. ’07, ’10
• Secure provenance: Braun et al. ’08; Hasan et al. ’07; Lyle-Martin ’10
• Privacy-preserving data mining: surveys by Aggarwal-Yu ’08 and Verykios et al. ’04
• Privacy in statistical databases: survey by Dwork ’08
Conclusion and Future Work
• This is a first step toward handling module privacy in a network of modules
• Future directions:
  • Explore alternative notions of privacy / partial background knowledge
  • Explore alternative "privatization" techniques for public modules
  • Handle infinite/very large attribute domains
Thank You. Questions?