A Propagation Model for Provenance Views of Public/Private Workflows

Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington).

Presentation Transcript


  1. A Propagation Model for Provenance Views of Public/Private Workflows. Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington). ICDT 2013

  3. Workflows
     • A visual representation of a number of processes that interact to produce one or more outputs given some inputs
     • Modeled as a directed acyclic graph: vertices = modules/processes, edges = dataflow
     • In an execution of the workflow, data values appear on the edges
     [Figure: example workflow running from "start" to "end" with node labels Split Entries, Align Sequences, Functional Data, Curate Annotations, Format, and Construct Trees; edges d1, d2, d3 and data values <x1, x2, x3>, <y1, y2>, <z1>]
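
The graph model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the talk: the module names (`split`, `upper`, `count`, `report`), their toy string functions, and the `run_workflow` helper are all hypothetical. It runs the modules in topological order and records the data value that appears on every edge, which is exactly what full provenance would reveal.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(modules, preds, workflow_input):
    """Execute modules in topological order; return outputs and per-edge values."""
    outputs = {}      # module name -> value it produced
    edge_values = {}  # (producer, consumer) -> data value on that edge
    for name in TopologicalSorter(preds).static_order():
        sources = preds[name]
        if sources:
            args = [outputs[p] for p in sources]
            for p in sources:
                edge_values[(p, name)] = outputs[p]
        else:
            # Modules without predecessors read the workflow input
            args = [workflow_input]
            edge_values[("start", name)] = workflow_input
        outputs[name] = modules[name](*args)
    return outputs, edge_values


# Hypothetical toy workflow: vertices are modules, `preds` encodes the dataflow edges.
modules = {
    "split": lambda s: s.split(","),
    "upper": lambda parts: [p.upper() for p in parts],
    "count": lambda parts: len(parts),
    "report": lambda names, n: f"{n}:{'+'.join(names)}",
}
preds = {
    "split": [],
    "upper": ["split"],
    "count": ["split"],
    "report": ["upper", "count"],
}

outputs, edge_values = run_workflow(modules, preds, "a,b,c")
print(outputs["report"])
print(edge_values)
```

Keeping the edge values in their own dictionary makes it easy to see what a provenance view would later have to hide.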

  4. Data Provenance in Workflows
     • Track provenance: record and show all data values in all executions
     • Helps validate the experiment
     • Ensures repeatability and supports debugging
     • Typical questions: How has this tree been generated? Which processes were executed?
     • But many elements are private/proprietary. Our focus: module privacy
     [Figure: the example workflow (start, Split Entries, Align Sequences, Functional Data, Curate Annotations, Format, Construct Trees, end) with provenance recorded on edges d1, d2, d3]

  5. Motivation: Module Privacy
     • Goal: partially hide provenance to protect the privacy of modules when they belong to a workflow
     • Revealing all data as provenance in an execution can reveal module behavior
     [Figure: edges d1, d2, d3 with data values <x1, x2, x3>, <y1, y2>, <z1>]

  6. Public/Private Workflows
     • Private modules (no a priori knowledge to the user), e.g. modules for gene sequencing, drug synthesis, etc.
     • Public modules (full knowledge to the user), e.g. modules for reformatting, sorting, display, etc.
     [Figure: workflow with modules labeled Private or Public, including a public Reformatting module]

  7. Definition: Module Privacy
     • Module f takes input x and produces output y = f(x), e.g. f(x1, x2, x3, x4) = <y1, y2, y3>
     • Given a privacy requirement L, for all inputs x to a private module f, f(x) has ≥ L 'equivalent' candidate values w.r.t. the visible provenance data
     • (Similar to L-diversity [MKGV'07])

  8. 'Equivalent Candidates' and Provenance Views
     • Module executions are modeled as relations with functional dependencies (x → y, y → z)
     • Output a provenance view (incomplete provenance): the projection on the visible attributes
     • Possible worlds: relations with the same projection that respect the functional dependency
     • Standalone-private view: each input maps to L = 2 different outputs by possible worlds
     • Workflow-private view: possible worlds should respect all functional dependencies
     [Figure: workflow x → y → z with relations for Run 1 and Run 2 (Run 1 = Run 2), their possible worlds, and one relation marked "Not a possible world"]
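
The possible-worlds definition above can be checked by brute force on a tiny standalone module. The sketch below is an illustration under assumptions that are not in the slides: attributes are Boolean, the module's relation contains every input, and the helper names (`candidate_outputs`, `is_standalone_private`) and the example modules `xor` and `and_` are hypothetical. It enumerates every function from inputs to outputs (each one respects the dependency x → y), keeps those whose projection on the visible attributes matches the real module's, and collects the outputs each input can take.

```python
from itertools import product


def candidate_outputs(f, n_in, n_out, hidden):
    """Map each input x to the set of outputs it takes in some possible world.

    Attributes are indexed 0..n_in-1 (inputs), then n_in..n_in+n_out-1 (outputs).
    `hidden` is the set of attribute indices removed from the provenance view.
    """
    visible = [i for i in range(n_in + n_out) if i not in hidden]
    inputs = list(product((0, 1), repeat=n_in))
    outputs = list(product((0, 1), repeat=n_out))

    def view(g):
        # Projection of the relation {(x, g(x))} on the visible attributes
        return {tuple((x + g[x])[i] for i in visible) for x in inputs}

    target = view({x: f(*x) for x in inputs})
    candidates = {x: set() for x in inputs}
    # Every total function g respects the functional dependency x -> y
    for choice in product(outputs, repeat=len(inputs)):
        g = dict(zip(inputs, choice))
        if view(g) == target:  # g is a possible world
            for x in inputs:
                candidates[x].add(g[x])
    return candidates


def is_standalone_private(f, n_in, n_out, hidden, L):
    cands = candidate_outputs(f, n_in, n_out, hidden)
    return all(len(c) >= L for c in cands.values())


def xor(a, b):
    return (a ^ b,)


def and_(a, b):
    return (a & b,)


print(is_standalone_private(xor, 2, 1, {2}, 2))    # hide the output: True
print(is_standalone_private(xor, 2, 1, set(), 2))  # hide nothing: False
print(is_standalone_private(xor, 2, 1, {0}, 2))    # hide the first input: True
print(is_standalone_private(and_, 2, 1, {0}, 2))   # same hiding, AND module: False
```

The enumeration is exponential in the size of the relation, so this is only meant to make the definition concrete on toy modules.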

  9. Previous Work: Module Privacy for Workflow Provenance [Davidson-Khanna-Milo-Panigrahi-R.: PODS'11]
     • "Composability Theorem": if all modules are private (no public modules), any combination of standalone-private views gives workflow-private views for all of them
     • The combination hides the union of the hidden attributes in the standalone solutions
     [Figure: chain x → y → z with no public modules]

  10. Why care about composability?
     • Local standalone-private solutions can be composed arbitrarily to get a global workflow-private solution, which is hard to find directly
     • Local solutions are NP-hard too, but only in the number of attributes of a single module, which is smaller than the number of attributes in the whole workflow
     • We can do preprocessing, or exploit module designers' knowledge
     • But composability fails with public modules, which are common in workflows

  11. Problem with Public Modules
     • The composability theorem does not hold any more
     • Our solution in [DKMPR '11]: "privatize" some public modules
     • This does not work when a module's identity can be guessed from attribute names, connections, etc.
     • This work: propagate hiding through public modules
     [Figure: a private module feeding a public module, with values 0, 1 and "= 1" on the edges]

  12. This paper: A Propagation Model
     • Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work in public/private workflows)
     • In a workflow, propagate hiding of attributes through public successors
     • Repeatedly propagate hiding
     • Can we stop at a private successor? Yes for single-predecessor workflows; no for general workflows

  13. Single-Predecessor Workflows
     • (Intuitively) every public module has at most one private predecessor
     • Can still have complex structure
     • Special cases: chains/trees
     • Propagate hiding in the "public closure" (public modules reachable through an undirected public path from a hidden output attribute)
     • Next: how much to hide
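
One way to read the "public closure" on this slide is as a breadth-first search that starts at the hidden output attributes and only ever steps through public modules, in either edge direction. The sketch below follows that reading. The data model (a dictionary of modules with `public`, `inputs`, and `outputs` fields), the function name `public_closure`, and the example workflow are assumptions made for illustration, not definitions from the paper.

```python
from collections import deque


def public_closure(modules, hidden_outputs):
    """Public modules reachable from hidden attributes via undirected public paths."""
    # attribute -> public modules that read or write it
    touching = {}
    for name, m in modules.items():
        if m["public"]:
            for attr in m["inputs"] + m["outputs"]:
                touching.setdefault(attr, set()).add(name)

    closure = set()
    seen_attrs = set(hidden_outputs)
    queue = deque(hidden_outputs)
    while queue:
        attr = queue.popleft()
        for name in touching.get(attr, ()):
            if name in closure:
                continue
            closure.add(name)
            # Continue through every attribute of this public module (undirected step)
            for nxt in modules[name]["inputs"] + modules[name]["outputs"]:
                if nxt not in seen_attrs:
                    seen_attrs.add(nxt)
                    queue.append(nxt)
    return closure


# Hypothetical workflow: p1, p2 are private; q1..q4 are public.
workflow = {
    "p1": {"public": False, "inputs": ["a1"], "outputs": ["a2", "a3"]},
    "q1": {"public": True, "inputs": ["a2"], "outputs": ["a4"]},
    "q2": {"public": True, "inputs": ["a4"], "outputs": ["a5"]},
    "p2": {"public": False, "inputs": ["a5"], "outputs": ["a6"]},
    "q3": {"public": True, "inputs": ["a6"], "outputs": ["a7"]},
    "q4": {"public": True, "inputs": ["a3"], "outputs": ["a8"]},
}

print(sorted(public_closure(workflow, {"a2"})))
print(sorted(public_closure(workflow, {"a2", "a3"})))
```

Because private modules never enter the `touching` index, the search stops at the private successor `p2`, so `q3` stays outside the closure of `p1`'s hidden outputs.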

  14. Upstream/Downstream Safety for Public Modules
     • Visible attributes of public modules should not reveal any information
     • Upstream/downstream-safe (UD-safe): equivalent inputs → equivalent outputs, and equivalent outputs → (all) equivalent inputs
     • Hiding everything is trivially UD-safe
     [Figure: a public module with attributes a1, a2, a3, a4 split into inputs and outputs, showing one UD-safe and one not UD-safe choice of hidden attributes]
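
The two implications on this slide can be tested exhaustively for a small public module. The sketch below takes a literal reading of them, and that reading is an assumption: two tuples are "equivalent" when they agree on the visible attributes, so the module is UD-safe when two inputs agree on their visible part exactly when their outputs agree on theirs. Attributes are assumed Boolean, and `is_ud_safe` and the example module `identity` are hypothetical names.

```python
from itertools import product


def is_ud_safe(f, n_in, n_out, hidden):
    """Check: visible inputs agree  <=>  visible outputs agree, for all input pairs.

    Attributes are indexed 0..n_in-1 (inputs), then n_in..n_in+n_out-1 (outputs).
    """
    visible_in = [i for i in range(n_in) if i not in hidden]
    visible_out = [j for j in range(n_out) if (n_in + j) not in hidden]
    inputs = list(product((0, 1), repeat=n_in))
    for x1, x2 in product(inputs, repeat=2):
        y1, y2 = f(*x1), f(*x2)
        same_in = all(x1[i] == x2[i] for i in visible_in)
        same_out = all(y1[j] == y2[j] for j in visible_out)
        # same_in without same_out breaks downstream safety;
        # same_out without same_in breaks upstream safety.
        if same_in != same_out:
            return False
    return True


def identity(a):
    return (a,)


print(is_ud_safe(identity, 1, 1, set()))   # nothing hidden
print(is_ud_safe(identity, 1, 1, {0}))     # only the input hidden
print(is_ud_safe(identity, 1, 1, {1}))     # only the output hidden
print(is_ud_safe(identity, 1, 1, {0, 1}))  # everything hidden
```

For the identity module, hiding only one side fails the check while hiding both sides passes, which matches the slide's remark that hiding everything is trivially UD-safe.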

  15. Composability Theorem for Single-Predecessor Workflows
     • Theorem: each private module is workflow-private if the hidden attributes satisfy: (1) the private module is standalone-private; (2) public modules in the public closure are UD-safe; (3) no unnecessary hiding
     • Two levels of composability: inside the public closure for a given private module, and among different private modules
     • The single-predecessor condition and UD-safety are both necessary

  16. Optimal Composition for Single-Predecessor Workflows
     • Theorem: each private module is workflow-private if the hidden attributes satisfy: (1) the private module is standalone-private; (2) public modules in the public closure are UD-safe; (3) no unnecessary hiding
     • Find a list of standalone-private solutions for the private modules
     • Find a list of UD-safe solutions for the public modules
     • Optimally compose to find a solution for a single private module: NP-hard for general DAGs, PTIME for trees/chains
     • Arbitrarily compose to find a solution for all private modules: easy for single-predecessor workflows

  17. Proof Sketch of Composability Theorem - 1
     • Analysis for a single private module is sufficient: public closures are disjoint
     • Step 1: assume only one composite module in the public closure. If the individual modules are UD-safe, the composite module is also UD-safe (by induction)

  18. Proof Sketch of Composability Theorem - 2
     • Step 2: standalone to workflow privacy
     • Privacy → many candidates for f(x)
     • If y is a candidate for f(x) when f is standalone, y is still a candidate when f is in a workflow
     • Show existence of possible worlds by redefining private modules
     • Need to handle new conflicts at other inputs/outputs
     • Cannot redefine public modules: UD-safety helps
     • More complex structure in general
     [Figure: modules f, g, h with attributes x, y, z, comparing expected and observed values in a conflict case and a no-conflict case]

  19. About General Workflows
     • Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work with public modules)
     • In a workflow, propagate hiding of attributes through public successors
     • Repeatedly propagate hiding
     • Can we stop at a private successor? Yes for single-predecessor workflows; no for general workflows
     • Solution: propagate through private successors as well

  20. Related Work
     • Workflow privacy (mainly access control): Chebotko et al. '08, Gil et al. '07, '10
     • Secure provenance: Tan et al. '06, Hasan et al. '07, Braun et al. '08, Ni et al. '09, Chong '09, Cadenhead et al. '11, Cheney '11
     • L-diversity and its limitations: Machanavajjhala et al. '06, Ganta et al. '08, Kifer '09, Fang et al. '08, Cormode et al. '11, Xiao et al. '10, Wong et al. '07
     • Privacy-preserving data mining: surveys by Aggarwal-Yu '08, Verykios et al. '04
     • Differential privacy/privacy in statistical databases: survey by Dwork '08

  21. Conclusions
     • Workflow privacy of modules by data hiding in public/private workflows
     • Propagating hiding through public modules
     • Composability theorem and optimization problems
     Future work:
     • Extend to a stronger notion of privacy
     • Differential privacy? Randomization may not work for scientific experiments
     • Can our possible-world model be useful?
     • Applicability in practice

  22. Thank You. Questions?
