A Propagation Model for Provenance Views of Public/Private Workflows
Susan Davidson (U. of Pennsylvania), Tova Milo (Tel Aviv U.), Sudeepa Roy (U. of Washington)
ICDT 2013

Workflows
• Visual representation of a number of processes that interact to produce one or more outputs given some inputs
• Modeled as a directed acyclic graph: vertices = modules/processes, edges = dataflow
• In an execution of the workflow, data values appear on the edges
[Figure: example workflow from start to end with module labels Split Entries, Align Sequences, Format, Functional Data, Curate Annotations, Construct Trees; data items d1, d2, d3 and values <x1, x2, x3>, <y1, y2>, <z1> on the edges]

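As a minimal sketch of this model (not the slide's example: the module names, the attribute names a1 to a4, and the Boolean functions are all hypothetical), a workflow can be run by visiting modules in topological order and recording the value on every edge. The recorded dictionary is the full provenance of one execution.

```python
def run_workflow(modules, inputs):
    """Execute a workflow and return the value of every data attribute.

    modules: list of (name, input_attrs, output_attrs, function),
             assumed to be given in topological order of the DAG.
    inputs:  dict mapping the workflow's initial attributes to values.
    """
    values = dict(inputs)
    for name, ins, outs, func in modules:
        results = func(*(values[a] for a in ins))  # tuple of outputs
        values.update(zip(outs, results))          # values on outgoing edges
    return values


# Hypothetical two-module chain: m1 computes a3 = a1 XOR a2, m2 negates a3.
WORKFLOW = [
    ("m1", ["a1", "a2"], ["a3"], lambda a, b: (a ^ b,)),
    ("m2", ["a3"], ["a4"], lambda c: (1 - c,)),
]

provenance = run_workflow(WORKFLOW, {"a1": 1, "a2": 0})
print(provenance)
```

Showing `provenance` in full for every execution is what the following slides call revealing all data; a provenance view would show only some of its keys.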
Data Provenance in Workflows
• Track provenance:
  • Record and show all data values in all executions
  • Helps validate the experiment
  • Ensures repeatability and debugging
• Typical questions: How has this tree been generated? Which processes were executed?
• But, many private/proprietary elements …
• Our focus: Module Privacy
[Figure: the example workflow annotated with provenance]

Motivation: Module Privacy
• Goal: partially hide provenance to protect the privacy of modules when they belong to a workflow
• Revealing all data as provenance in an execution can reveal module behavior
[Figure: data items d1, d2, d3 with values <x1, x2, x3>, <y1, y2>, <z1>]

Public/Private Workflows
• Private modules (the user has no a priori knowledge of them), e.g., modules for gene sequencing, drug synthesis, etc.
• Public modules (the user has full knowledge of them), e.g., modules for reformatting, sorting, display, etc.
[Figure: workflow with modules labeled Private and Public; Reformatting shown as a public module]

Definition: Module Privacy
• Module f takes input x and produces output y = f(x), e.g., f(x1, x2, x3, x4) = <y1, y2, y3>
• Given a privacy requirement L, for all inputs x to a private module f, f(x) has ≥ L ‘equivalent’ candidate values w.r.t. the visible provenance data (similar to L-diversity [MKGV’07])

‘Equivalent Candidates’ and Provenance Views
• Module executions as relations with functional dependencies (func. dep.), e.g., x → y for one module and y → z for its successor in a workflow
• Output a provenance view (incomplete provenance): projection on the visible attributes
• Possible worlds: relations with the same projection that respect the functional dependency
• Standalone-private view: each input maps to at least L different outputs across the possible worlds (L = 2 in the example)
• Workflow-private view: possible worlds should respect all func. dep. in the workflow
[Figure: relations for Run 1 and Run 2, examples of possible worlds, and a relation that is not a possible world]

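The definitions above can be checked by brute force on a tiny module. The sketch below is only an illustration under stated assumptions: the attribute names, the Boolean module `f`, and the restriction of possible worlds to total functions over the full input domain are all choices made here, not taken from the slides. It enumerates every candidate function, keeps those whose projection on the visible attributes matches the real one, and counts the distinct outputs each input can take.

```python
from itertools import product

INPUTS = ("x1", "x2")            # hypothetical attribute names
OUTPUTS = ("y1", "y2")
ATTRS = INPUTS + OUTPUTS
XS = list(product((0, 1), repeat=len(INPUTS)))    # all input tuples
YS = list(product((0, 1), repeat=len(OUTPUTS)))   # all output tuples


def f(x):
    """Hypothetical private module: y1 = x1 XOR x2, y2 = x1 AND x2."""
    x1, x2 = x
    return (x1 ^ x2, x1 & x2)


def project(relation, visible):
    """Project a set of full tuples onto the visible attributes."""
    idx = [i for i, a in enumerate(ATTRS) if a in visible]
    return {tuple(t[i] for i in idx) for t in relation}


def candidate_outputs(visible):
    """For each input x, collect every y = g(x) over all possible worlds g."""
    view = project({x + f(x) for x in XS}, visible)
    candidates = {x: set() for x in XS}
    # Each assignment of outputs to inputs is a function, so the
    # functional dependency inputs -> outputs holds by construction.
    for images in product(YS, repeat=len(XS)):
        world = {x + y for x, y in zip(XS, images)}
        if project(world, visible) == view:
            for x, y in zip(XS, images):
                candidates[x].add(y)
    return candidates


def is_standalone_private(visible, L):
    """True if every input has at least L candidate outputs."""
    return all(len(ys) >= L for ys in candidate_outputs(visible).values())


print(is_standalone_private({"x1", "x2", "y1", "y2"}, 2))  # nothing hidden
print(is_standalone_private({"x1", "x2", "y1"}, 2))        # y2 hidden
```

The enumeration is exponential in the size of the input domain, so this only serves to make the definition concrete on toy modules.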
Previous Work: Module Privacy for Workflow Provenance [Davidson-Khanna-Milo-Panigrahi-R.: PODS’11]
• “Composability Theorem”: if all modules are private (no public modules), any combination of standalone-private views gives workflow-private views for all of them
• The combined solution hides the union of the hidden attributes in the standalone solutions
[Figure: workflow with no public modules over attributes x, y, z]

Why care about composability?
• Compose local standalone-private solutions arbitrarily to get a global workflow-private solution (which is hard to find directly)
• Local solutions are NP-hard too, but in the number of attributes of a single module, which is smaller than the number of attributes in the whole workflow
• We can do preprocessing, or exploit module designers’ knowledge
• But composability fails with public modules, which are common in workflows

Problem with Public Modules
• Composability theorem does not hold any more
• Our solution in [DKMPR ’11]: “privatize” some public modules
• Does not work when a module’s identity can be guessed from attribute names, connections, etc.
• This work: propagate hiding through public modules
[Figure: example with a private module and a public module]

This paper: A Propagation Model
• Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work in public/private workflows)
• In a workflow, propagate the hiding of attributes through public successors
• Repeatedly propagate hiding
• Can we stop at a private successor?
  • Yes: for single-predecessor workflows
  • No: for general workflows

Single-Predecessor Workflows
• (Intuitively) Every public module has at most one private predecessor
• Can still have complex structure
• Special cases: chains/trees
• Propagate hiding in the “public closure” (modules reachable through an undirected public path from a hidden output attribute)
• Next: how much to hide

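One way to read the public closure is as a breadth-first search that starts at the hidden output attributes and only ever steps through public modules, ignoring edge direction. The sketch below follows that reading; the workflow encoding (a dict from module name to a public flag plus input and output attribute sets) and the example chain are hypothetical, and the paper's formal definition may differ in details.

```python
from collections import deque


def public_closure(modules, hidden_attrs):
    """Public modules reachable from hidden attributes via public modules.

    modules: dict name -> (is_public, set of input attrs, set of output attrs)
    hidden_attrs: hidden output attributes of a private module
    """
    closure = set()
    seen_attrs = set(hidden_attrs)
    queue = deque(seen_attrs)
    while queue:
        attr = queue.popleft()
        for name, (is_public, ins, outs) in modules.items():
            # Undirected step: the module may consume or produce the attribute.
            if not is_public or name in closure:
                continue
            if attr in ins or attr in outs:
                closure.add(name)
                for other in (ins | outs) - seen_attrs:
                    seen_attrs.add(other)
                    queue.append(other)
    return closure


# Hypothetical chain: m1 (private) -> p1 -> p2 -> m2 (private) -> p3
EXAMPLE = {
    "m1": (False, {"s"}, {"a"}),
    "p1": (True, {"a"}, {"b"}),
    "p2": (True, {"b"}, {"c"}),
    "m2": (False, {"c"}, {"d"}),
    "p3": (True, {"d"}, {"e"}),
}

print(sorted(public_closure(EXAMPLE, {"a"})))
```

In this example the search never crosses the private module m2, so p3 stays outside the closure of m1's hidden output.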
Upstream/Downstream Safety for Public Modules
• Visible attributes of public modules should not reveal any information
• Upstream/Downstream-safe (UD-safe): equivalent inputs ⇒ equivalent outputs, and equivalent outputs ⇒ (all) equivalent inputs
• Hiding everything is trivially UD-safe
[Figure: a module with input and output attributes a1, a2, a3, a4, showing one UD-safe and one not UD-safe choice of hidden attributes]

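The two implications can be tested exhaustively on a small public module. This is a hedged sketch based on the slide's wording, where two tuples count as equivalent if they agree on the visible attributes; the function name `is_ud_safe`, the index-based encoding of visible attributes, and the identity module are assumptions made for illustration.

```python
from itertools import product


def is_ud_safe(func, n_inputs, visible_in, visible_out):
    """Check UD-safety of a Boolean module by exhaustive comparison.

    func:        maps an input tuple to an output tuple
    visible_in:  indices of visible input attributes
    visible_out: indices of visible output attributes
    """
    def proj(t, idx):
        return tuple(t[i] for i in idx)

    runs = [(x, func(x)) for x in product((0, 1), repeat=n_inputs)]
    for (x, y), (x2, y2) in product(runs, repeat=2):
        same_in = proj(x, visible_in) == proj(x2, visible_in)
        same_out = proj(y, visible_out) == proj(y2, visible_out)
        # Downstream: same_in must imply same_out.
        # Upstream:   same_out must imply same_in.
        if same_in != same_out:
            return False
    return True


def identity(x):
    """Hypothetical public module that copies its two inputs to its outputs."""
    return x


print(is_ud_safe(identity, 2, [1], [1]))     # first input and first output hidden
print(is_ud_safe(identity, 2, [1], [0, 1]))  # only the first input hidden
print(is_ud_safe(identity, 2, [], []))       # everything hidden
```

For the identity module, hiding an input without hiding the matching output breaks the downstream condition, because the visible output still exposes the hidden value.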
Composability Theorem for Single-Predecessor Workflows
Theorem: Each private module is workflow-private if the hidden attributes satisfy:
1. The private module is standalone-private
2. Public modules in the public closure are UD-safe
3. No unnecessary hiding
• Two levels of composability:
  • Inside the public closure for a given private module
  • Among different private modules
• Both the single-predecessor assumption and UD-safety are necessary

Optimal Composition for Single-Predecessor Workflows
Theorem: Each private module is workflow-private if the hidden attributes satisfy:
1. The private module is standalone-private
2. Public modules in the public closure are UD-safe
3. No unnecessary hiding
Steps:
• Find a list of standalone-private solutions for private modules
• Find a list of UD-safe solutions for public modules
• Optimally compose to find a solution for a single private module
  • NP-hard for general DAG
  • PTIME for trees/chains
• Arbitrarily compose to find a solution for all private modules (easy for single-predecessor workflows)

Proof Sketch of Composability Theorem - 1
• Analysis for a single private module is sufficient: public closures are disjoint
• Step 1: Assume only one composite module in the public closure
• If individual modules are UD-safe, the composite module is also UD-safe (by induction)

Proof Sketch of Composability Theorem - 2
• Step 2: Standalone to workflow privacy
• Privacy means many candidates for f(x)
• If y is a candidate of f(x) when f is standalone, y is still a candidate when f is in a workflow
• Show existence of possible worlds by redefining private modules
• Need to handle new conflicts at other inputs/outputs
• Cannot redefine public modules: UD-safety helps
• More complex structure in general
[Figure: modules f, g, h over attributes x, y, z; expected vs. observed values, conflict vs. no conflict]

About General Workflows
• Find a standalone-private solution for each private module (only outputs are hidden; hiding inputs may not work with public modules)
• In a workflow, propagate the hiding of attributes through public successors
• Repeatedly propagate hiding
• Can we stop at a private successor?
  • Yes: for single-predecessor workflows
  • No: for general workflows
• Solution: propagate through private successors as well

Related Work
• Workflow privacy (mainly access control): Chebotko et al. ’08, Gil et al. ’07, ’10
• Secure provenance: Tan et al. ’06, Hasan et al. ’07, Braun et al. ’08, Ni et al. ’09, Chong ’09, Cadenhead et al. ’11, Cheney ’11
• L-diversity and its limitations: Machanavajjhala et al. ’06, Ganta et al. ’08, Kifer ’09, Fang et al. ’08, Cormode et al. ’11, Xiao et al. ’10, Wong et al. ’07
• Privacy-preserving data mining: surveys by Aggarwal-Yu ’08, Verykios et al. ’04
• Differential privacy/privacy in statistical databases: survey by Dwork ’08

Conclusions
• Workflow-privacy of modules by data hiding in public/private workflows
• Propagating hiding through public modules
• Composability Theorem and optimization problems
Future Work:
• Extend to a stronger notion of privacy
  • Differential privacy? Randomization may not work for scientific experiments
  • Can our possible world model be useful?
• Applicability in practice

Thank You
Questions?