Privacy-Preserving Schema Matching Using Mutual Information
Isabel F. Cruz, University of Illinois at Chicago
Roberto Tamassia and Danfeng Yao, Brown University
Supported in part by the National Science Foundation under ITR awards IIS–0326284, IIS–0324846, and IIS–0513553
DBSec 2007, Redondo Beach, CA
Heterogeneous databases
• Query: join patients’ records in medical databases A, B, and C
The need for schema matching
• [Figure: example tables from Database A and Database B]
• How do we find the correspondence between attribute names in A and B?
Privacy in schema matching
• Data interoperability requires schema matching
• However, data owners may consider their schemas sensitive
• We need to develop privacy-preserving schema matching methods
• Related work:
  • Privacy-preserving data sharing [Clifton Kantarcioglu Doan Schadow Vaidya Elmagarmid Suciu 04]
  • Privacy-preserving ontology matching [Mitra Liu Pan 05]
  • Privacy-preserving access control to heterogeneous databases [Mitra Liu Pan Atluri 06]
  • Privacy-preserving schema and data matching [Scannapieco Figotin Bertino Elmagarmid 07]
Overview of our approach
• Key observation 1: the same attribute in different databases has a similar data distribution and correlates similarly with other attributes
  • Probability distribution of attributes (e.g., heart rate, age, height, blood pressure)
  • Mutual information (MI) captures the correlation of attributes (e.g., age and blood pressure)
• Key observation 2: private schema matching reduces to 2-party private set intersection
  • Only the intersected elements are returned and nothing else
• Our approach for private schema matching
  • View the self and mutual information (MI) values of each schema as sets of numbers
  • Match the MIs of the two schemas using private set intersection
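For reference, the standard information-theoretic definitions behind "self" and mutual information can be written as follows. The slides do not state them, so the notation (random variables X and Y for two attributes, p for their empirical distributions, base-2 logarithms) is an assumption.

```latex
H(X)   = -\sum_{x} p(x) \log_2 p(x)
I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) + H(Y) - H(X,Y)
I(X;X) = H(X)
```

The last identity is why the entropy of an attribute can be read as its self MI.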
Building block: pair-wise mutual information (MI)
• Assume that schemas are not private information [Kang Naughton 03]
  1. Party A with schema A computes MI and constructs a graph
  2. Party B with schema B constructs its graph
  3. Both parties then find a correspondence between the two graphs
• [Figure: two MI graphs, one with nodes Patient type, Heart rate, and Blood type, the other with nodes Node A, Node B, and Node C; both carry the values 1.5, 1.0, 1.5, 2.0, 1.0, 1.0]
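Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' or Kang and Naughton's code: the function names and the table layout (a dict from attribute name to a list of column values) are hypothetical, and probabilities are estimated from raw value counts.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(col):
    """Empirical entropy (in bits) of a column of values."""
    n = len(col)
    return -sum(c / n * log2(c / n) for c in Counter(col).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def dependency_graph(table):
    """Nodes carry attribute entropies; edges carry pair-wise MI."""
    nodes = {name: entropy(col) for name, col in table.items()}
    edges = {(a, b): mutual_information(table[a], table[b])
             for a, b in combinations(table, 2)}
    return nodes, edges

# Hypothetical toy table: z duplicates x, y is independent of x.
table = {"x": [0, 0, 1, 1], "y": [0, 1, 0, 1], "z": [0, 0, 1, 1]}
nodes, edges = dependency_graph(table)
```

Each party would run this on its own data only; the attribute names never need to leave the party.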
Building block: Privacy-preserving set intersection
• An efficient protocol based on homomorphic encryption was proposed by [Freedman Nissim Pinkas 04]
• Interactive protocol between Alice and Bob
  • Example inputs: {3, 6, 15, 20, 88} and {3, 7, 17, 20, 80}
  • Output: {3, 20}
• Alice and Bob only learn the intersected elements
• Secure against malicious adversaries
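The idea of intersecting sets under homomorphic encryption can be illustrated with a toy sketch. Everything below is an assumption made for illustration: it uses a textbook Paillier scheme with two small Mersenne primes, follows only the basic polynomial-evaluation idea (the client encrypts a polynomial whose roots are its elements, the server evaluates it blindly), covers at best the semi-honest setting, and is not secure or efficient enough for real use. Only the client learns the result here.

```python
import secrets
from math import gcd

# Toy Paillier key from two known Mersenne primes (far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                            # modular inverse (Python 3.8+)

def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def poly_coeffs(roots):
    """Coefficients (lowest degree first) of prod (y - x) for x in roots, mod N."""
    coeffs = [1]
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for k, a in enumerate(coeffs):
            nxt[k] = (nxt[k] - a * x) % N
            nxt[k + 1] = (nxt[k + 1] + a) % N
        coeffs = nxt
    return coeffs

def server_respond(enc_coeffs, y):
    """Compute Enc(r * P(y) + y) from encrypted coefficients, without the key."""
    enc_py = 1  # trivial encryption of 0
    for k, c in enumerate(enc_coeffs):
        enc_py = enc_py * pow(c, pow(y, k, N), N2) % N2
    r = secrets.randbelow(N - 1) + 1
    return pow(enc_py, r, N2) * encrypt(y) % N2

def private_intersection(client_set, server_set):
    client_set = set(client_set)
    enc_coeffs = [encrypt(a) for a in poly_coeffs(sorted(client_set))]  # client -> server
    replies = [server_respond(enc_coeffs, y) for y in server_set]       # server -> client
    # A reply decrypts to y if y is a root of P, and to a random-looking value otherwise.
    return sorted(v for v in map(decrypt, replies) if v in client_set)

result = private_intersection({3, 6, 15, 20, 88}, {3, 7, 17, 20, 80})
```

With the inputs from the slide, `result` holds the two shared elements 3 and 20; elements of the server's set outside the intersection decrypt to blinded values that reveal nothing useful to the client.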
Our approach: Privacy-preserving schema mapping
• Two players: A and B, each with a private schema
• A and B each compute the MI of their schema attributes and build their graphs
• Each attribute has an MI set: the attribute entropy and its pair-wise MI values
• A and B each sort the entropies (self MIs) of their attributes
• For each attribute, A and B carry out a private set intersection
• If the attributes match, then the set intersection returns the entire MI set
• [Figure: schema A graph with attributes Patient type, Blood type, and Heart rate and values 1.5, 1.0, 1.5, 2.0, 1.0, 1.0; example MI sets (1.5, 1.5, 1.0) and (1.0, 1.0, 1.0)]
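The matching loop described above can be sketched in the clear as follows. This is an illustration under several assumptions: the MI values are hypothetical, MI sets are modeled as multisets, `set_intersection` is a plain stand-in for the private set intersection protocol, and entropies are compared directly, which a real private protocol would have to avoid.

```python
from collections import Counter

def mi_sets(entropy, mi):
    """MI set of each attribute: its entropy plus its MI with every other attribute."""
    sets = {}
    for a, h in entropy.items():
        pairwise = [v for pair, v in mi.items() if a in pair]
        sets[a] = Counter([h] + pairwise)
    return sets

def set_intersection(s, t):
    # Stand-in for the private set intersection protocol (assumption).
    return s & t

def match(schema_a, schema_b):
    """Map attributes of A to attributes of B whose MI sets fully intersect."""
    (ent_a, mi_a), (ent_b, mi_b) = schema_a, schema_b
    sets_a, sets_b = mi_sets(ent_a, mi_a), mi_sets(ent_b, mi_b)
    mapping = {}
    for a in sorted(sets_a, key=ent_a.get):       # attributes ordered by entropy
        for b in sorted(sets_b, key=ent_b.get):
            if ent_a[a] != ent_b[b]:
                continue                           # only equal entropies are candidates
            if set_intersection(sets_a[a], sets_b[b]) == sets_a[a]:
                mapping[a] = b
    return mapping

# Hypothetical entropies and pair-wise MI values for two schemas.
schema_a = ({"patient_type": 1.5, "blood_type": 2.0, "heart_rate": 1.0},
            {("patient_type", "blood_type"): 1.0,
             ("patient_type", "heart_rate"): 0.5,
             ("blood_type", "heart_rate"): 0.25})
schema_b = ({"c1": 2.0, "c2": 1.0, "c3": 1.5},
            {("c1", "c2"): 0.25, ("c1", "c3"): 1.0, ("c2", "c3"): 0.5})
mapping = match(schema_a, schema_b)
```

In practice the real-valued MI estimates of two databases will rarely coincide exactly, so some rounding or quantization before the intersection is presumably needed; the slides do not specify how.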
Properties
• Support for three types of schema mappings
  • One-to-one, onto, and partial mappings
• Security property
  • Basic method: secure against semi-honest adversaries
  • Advanced method (uses zero knowledge): secure against malicious adversaries
• Complexity property
  • Assuming the entropies of distinct attributes are different, we perform a linear number of privacy-preserving set intersections (proportional to the number of attributes of A and B)
• [Figure: illustrations of partial, onto, and one-to-one mappings]
Theorems
• Theorem 1 (Security): Assuming the existence of a private set intersection protocol secure against malicious adversaries, our privacy-preserving schema matching protocol for one-to-one, onto, and partial mappings is secure against malicious adversaries.
• Definition: The multiplicity value $m_i$ of element $a_i$ in a list $L$ with $l$ elements and $k$ distinct elements is the number of times element $a_i$ ($1 \le i \le k$) appears in $L$. The multiplicity sequence of $L$ is $(m_1, m_2, \ldots, m_k)$, where $m_1 + m_2 + \cdots + m_k = l$.
• Theorem 2 (Complexity): Consider a schema A with $m$ attributes and a schema B with $n$ attributes. Let $(m_1, m_2, \ldots, m_k)$ be the multiplicity sequence of the entropy list of A, and let $(n_1, n_2, \ldots, n_k)$ be the multiplicity sequence of the entropy list of B after removing the elements not present in the entropy list of A. Then the number of set intersections performed in our privacy-preserving schema matching protocol is at most $\sum_{i=1}^{k} m_i n_i$.
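The bound in Theorem 2 can be evaluated directly from the two entropy lists. The sketch below is an illustration with a hypothetical function name and made-up entropy values; it treats an entropy value of A that does not occur in B as having multiplicity 0 there.

```python
from collections import Counter

def max_set_intersections(entropies_a, entropies_b):
    """Sum of m_i * n_i over the distinct entropy values of schema A."""
    mult_a, mult_b = Counter(entropies_a), Counter(entropies_b)
    # Counter returns 0 for entropy values that do not occur in B.
    return sum(m * mult_b[h] for h, m in mult_a.items())

# A has entropy 1.0 twice; B has entropy 1.5 twice; 2.0 and 3.0 are unmatched.
bound = max_set_intersections([1.0, 1.0, 1.5, 2.0], [1.0, 1.5, 1.5, 3.0])
```

Here the bound is 2·1 + 1·2 + 1·0 = 4. When all entropies within each schema are distinct, every product is at most 1 and the bound is linear in the number of attributes, in line with the complexity property stated earlier.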