10 likes | 90 Views
2D representation of the original and generalized table. Name. Age. Start-year. Salary. Alice. 25. 2001. 7k. Bob. 30. 2004. 1k. Christina. 35. 1990. 2k. Complexity and Approximation Ratio d : dimensionality n : the size of dataset. Daniel. 40. 1995. 3k. Emily.
E N D
2D representation of the original and generalized table. Name Age Start-year Salary Alice 25 2001 7k Bob 30 2004 1k Christina 35 1990 2k Complexity and Approximation Ratio d: dimensionality n: the size of dataset Daniel 40 1995 3k Emily 45 2000 6k William 55 1985 3k Algorithm Time Complexity Approximation Ratio The original payroll table DAG O(3ddnlog2n) 8d MMG O(dn2d+1) 2d+1 Age Start-year Salary NNG O(dn2) 6d [25, 45] [2000, 2004] 7k [25, 45] [2000, 2004] 1k [35, 55] [1985, 1995] 2k [35, 55] [1985, 1995] 3k [25, 45] [2000, 2004] 6k [35, 55] [1985, 1995] 3k A 3-anonymous generalization The Institute for Information AssuranceOn Multidimensional k-Anonymity with Local Recoding GeneralizationPresented by: Yang DuCollege of Computer and Information ScienceNortheastern University, Boston, MA 02115duy@ccs.neu.edu • Overview • Study an important privacy preserving method, namely k-anonymity • Show it is provably hard, even to find a good enough approximate answer • Develop three algorithms with different tradeoffs between the approximation ratio and complexity • Introduction • Motivation is privacy preserving • Publish sensitive data to allow accurate analysis without revealing the privacy • Simply removing the id column is not enough • Attackers can use some other attributions, called quasi-identifiers, to restore the identities • Generalization is necessarily • The quasi-identifiers are replaced by values in more general forms • K-anonymity is often a requirement • Make the quasi-identifiers of each tuple undistinguishable with at least those of other (k-1) tuples • Approximation Algorithms • The Divide-and-Group (DAG) Algorithm • Divide the space into square cells with proper size • Find a set of non-overlapping tiles of 2 x 2 cells to cover the points, such that each tile covers at least k points • Assign the rest of (uncovered) points to the nearest tile • Problem Mapping • Given a table R containing d quasi-identifier attributes • Map each quasi-identifier attribute to one dimension • Map each tuple in the table to a point in d-dimensional space • Map the k-anonymous generalization problem to a partition problem • Partition a set of d-dimensional points into some groups • Each point belongs to one and only one group • Each group contains at least k points • Each point is generalized to the minimum bounding rectangle (MBR) of its group • Quality Measuring • The smaller the MBRs are, the more accurate the analysis results are. • The size of each MBR is measured by its perimeter. • Objective • Find the optimal partition that minimizes the maximum size (perimeter) among all MBRs. • The Min-MBR-Group (MMG) Algorithm • For each point p, find the smallest MBR which covers at least k points including p • Find a set of non-overlapping MBRs from the result of previous step • Assign the points to the nearest MBR • The Nearest-MBR-Group (NNG) Algorithm • For each point p, find the MBR which covers p and its k-1 nearest neighbors • Find a set of non-overlapping MBRs from the result of previous step • Assign the points to the nearest MBR • Hardness of the Problem • Finding the optimal partition is NP-hard (cannot be done within polynomial time). • Finding a partition with approximation ratio less than 5/4, i.e. the maximum perimeter is 5/4 of the maximum perimeter of the optimal partition, is also NP-hard. • For more information: • http://www.ccs.neu.edu/research/dblab • Prof. Donghui Zhang – donghui@ccs.neu.edu • Yang Du – duy@ccs.neu.edu