On method-specific record linkage for risk assessment
Jordi Nin, Javier Herranz, Vicenç Torra
Contents
• Disclosure Risk Scenario: how an intruder re-identifies an individual
• Preliminaries: protection methods and record linkage
• Location record linkage: a new way to compute the disclosure risk
• Conclusions and future work
Disclosure Risk Scenario: Attribute classification
• Identifiers: passport number
• Quasi-identifiers: age, postal code
• Confidential: income
[Figure: data set X, with its dimensions labeled a and n.]
Disclosure Risk Scenario: Re-identification scenario
• Original file: X = id || Xnc || Xc
• Protected file: X’ = X’nc || Xc
• Privacy is ensured: the identifiers are removed and the quasi-identifiers (Xnc) are anonymized.
• Data quality is preserved: the confidential attributes (Xc) are released unchanged.
Disclosure Risk Scenario: Record linkage
Problem: find the correct mapping between the records of data set 1 and data set 2.
[Figure: records X1 to X4 of data set 1 matched against records X’1 to X’4 of data set 2.]
Disclosure Risk Scenario
Distance-based record linkage
• The nearest pairs of records are considered linked pairs
• Very easy to tune
• Results depend heavily on the parameters
• Moderate time cost
Probabilistic record linkage
• Linked pairs are computed using conditional probabilities
• Tuning is difficult
• Few parameters
• High time cost
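The distance-based variant can be illustrated with a minimal sketch. It assumes numeric attributes, plain Euclidean distance with no standardization, and that record i in both files belongs to the same individual (used only to count correct links). The function names are hypothetical and do not come from the slides.

```python
import numpy as np

def distance_based_linkage(original, protected):
    """For each protected record, return the index of the nearest original record."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_protected, n_original).
    diff = protected[:, None, :] - original[None, :, :]
    distances = (diff ** 2).sum(axis=2)
    return distances.argmin(axis=1)

def correct_links(links):
    """Count links that hit the true record (assumes record i matches record i)."""
    return int((links == np.arange(len(links))).sum())
```

In practice the attributes would usually be standardized first so that no single attribute dominates the distance; that step is left out here to keep the sketch short.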
Preliminaries: Rank swapping - p
Algorithm
  For all attr_j where 1 ≤ j ≤ n
    attr_j is sorted
    All values x_ij are swapped with x_lj, where i < l ≤ i + p
    The sorting of attr_j is undone
  End for
End algorithm
Properties
• Simple
• Preserves µ and σ
• All original combinations of values disappear
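The algorithm above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: p is read as a percentage of the number of records, each value is swapped at most once, and the swap partner is drawn uniformly from the free ranks in the allowed range. The name rank_swap and the seed parameter are hypothetical.

```python
import numpy as np

def rank_swap(data, p, seed=None):
    """Rank swapping sketch: each attribute is protected independently."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_records, n_attrs = data.shape
    window = max(1, int(n_records * p / 100))  # assumption: p is a percentage
    protected = data.copy()
    for j in range(n_attrs):
        order = np.argsort(data[:, j], kind="stable")  # order[r] = record at rank r
        values = data[order, j].copy()                 # attribute j, sorted
        swapped = np.zeros(n_records, dtype=bool)
        for i in range(n_records):
            if swapped[i]:
                continue
            last = min(i + window, n_records - 1)
            free = [l for l in range(i + 1, last + 1) if not swapped[l]]
            if not free:
                continue  # no free partner in range: the value stays in place
            l = rng.choice(free)
            values[i], values[l] = values[l], values[i]
            swapped[i] = swapped[l] = True
        protected[order, j] = values  # undo the sort
    return protected
```

Because values are only permuted within each attribute, the mean and standard deviation of every attribute are unchanged, while the pairing of values across attributes is not.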
Preliminaries: Rank swapping - p example, p = 20%
[Figure: worked example on ten values; the two sequences shown are 1 2 3 4 5 6 7 8 9 10 and 8 6 10 7 9 2 1 4 5 3.]
Preliminaries: Microaggregation - k
[Figure: example with k = 3.]
• a = 1: optimal
• a > 1: NP-hard, heuristic
Preliminaries: Optimal univariate microaggregation
[Figure: elements x1, x2, x3, x4 with k = 2.]
• Result 1. When the elements are sorted according to an attribute, the elements in each cluster of any optimal partition are contiguous (clusters do not overlap).
• Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
• Clusters are built from the nodes of the path found by a shortest path algorithm.
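The two results make an exact solution tractable: only contiguous groups of k to 2k-1 sorted elements need to be considered. The sketch below is one way to do this, written as dynamic programming over the sorted values, which is equivalent to a shortest path in a graph whose edges are the admissible clusters. It assumes the cost of a cluster is its within-cluster sum of squared errors; the function name is hypothetical.

```python
import numpy as np

def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in an optimal partition."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = np.argsort(values, kind="stable")
    x = values[order]
    # Prefix sums give the squared error of any contiguous group in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def sse(i, j):  # cost of the cluster x[i:j]
        return (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)

    best = np.full(n + 1, np.inf)  # best[j]: minimal cost of partitioning x[:j]
    prev = np.full(n + 1, -1)      # prev[j]: start of the last cluster in that partition
    best[0] = 0.0
    for j in range(k, n + 1):
        # Result 2: the last cluster x[i:j] has between k and 2k-1 elements.
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    protected = np.empty(n)
    j = n
    while j > 0:  # walk the path back from node n to node 0
        i = prev[j]
        protected[order[i:j]] = x[i:j].mean()
        j = i
    return protected
```

For example, with k = 2 the values 10, 1, 21, 2, 20, 11 are grouped as {1, 2}, {10, 11}, {20, 21}, and each value is replaced by its group mean.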
Preliminaries: MDAV microaggregation
[Figure: X and X’ with k = 2.]
MDAV is a multivariate heuristic microaggregation method.
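The slide does not spell out the MDAV steps, so the sketch below follows the usual description of the heuristic in the microaggregation literature: repeatedly take the record farthest from the centroid and the record farthest from that one, and build a cluster of k records around each. It assumes numeric attributes and squared Euclidean distance; the helper names are hypothetical.

```python
import numpy as np

def mdav(data, k):
    """MDAV sketch: returns the protected data and the list of clusters."""
    data = np.asarray(data, dtype=float)
    remaining = np.arange(len(data))
    clusters = []

    def farthest_from(point, idx):
        d = ((data[idx] - point) ** 2).sum(axis=1)
        return idx[d.argmax()]

    def take_cluster(center, idx):
        # The k remaining records closest to the record `center`.
        d = ((data[idx] - data[center]) ** 2).sum(axis=1)
        chosen = idx[np.argsort(d, kind="stable")[:k]]
        return chosen, np.setdiff1d(idx, chosen)

    while len(remaining) >= 3 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        s = farthest_from(data[r], remaining)
        cluster, remaining = take_cluster(r, remaining)
        clusters.append(cluster)
        cluster, remaining = take_cluster(s, remaining)
        clusters.append(cluster)
    if len(remaining) >= 2 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        cluster, remaining = take_cluster(r, remaining)
        clusters.append(cluster)
    if len(remaining) > 0:
        clusters.append(remaining)  # the leftover records form the last cluster

    protected = np.empty_like(data)
    for cluster in clusters:
        protected[cluster] = data[cluster].mean(axis=0)  # replace by the centroid
    return protected, clusters
```

Every record is replaced by the centroid of its cluster, so a protected record is built from all k (or more) records of that cluster at once.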
Preliminaries: Score, protection method evaluation
Score = 0.5 IL + 0.5 DR
IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
• IL1 = mean of absolute error
• IL2 = mean variation of average
• IL3 = mean variation of variance
• IL4 = mean variation of covariance
• IL5 = mean variation of correlation
DR = 0.25 DLD + 0.25 PLD + 0.5 ID
• DLD = number of links using DBRL (distance-based record linkage)
• PLD = number of links using PRL (probabilistic record linkage)
• ID = protected values near the original ones
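The score is simple arithmetic, and a small sketch makes the weighting explicit. The function names are hypothetical, and the inputs are assumed to be already expressed on the scale the formulas expect; the slides do not state the units.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)

def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure

def score(il, dr):
    """Score = 0.5 IL + 0.5 DR; lower means a better trade-off."""
    return 0.5 * il + 0.5 * dr
```

With the made-up inputs IL1 = … = IL5 = 0.1, DLD = 20, PLD = 40 and ID = 30, this gives IL = 10, DR = 30 and Score = 20.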
Location Problem Description: L-RL (Location Record Linkage)
• Standard record linkage compares all records.
• Rank swapping, univariate microaggregation and other methods use only some of the original records to create each protected record.
• So it is unnecessary to compare all the records.
Location record linkage: Method description
[Figure: X_ext and X’.]
Location record linkage: Example with rank swapping, p = 20%
[Figure: example with a Distance column showing the values 17, 6, 13, 14, 16, 19, 12, 5, 16.]
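The slides do not give the L-RL procedure in text form, so the following is only one possible reading of the idea for rank swapping. Since a protected value can only have moved a limited number of ranks, each attribute restricts the original records that a protected record can come from, and intersecting these restrictions over all attributes shrinks the candidate set. It assumes the intruder knows p and that p is a percentage of the number of records. The function names are hypothetical.

```python
import numpy as np

def candidate_sets(original, protected, p):
    """Boolean matrix: entry (i, m) is True if original m may be behind protected i."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    n_records, n_attrs = original.shape
    window = max(1, int(n_records * p / 100))  # assumption: p is a percentage
    candidates = np.ones((len(protected), n_records), dtype=bool)
    for j in range(n_attrs):
        column = np.sort(original[:, j])
        # Rank interval of each original value (an interval because of ties).
        orig_lo = np.searchsorted(column, original[:, j], side="left")
        orig_hi = np.searchsorted(column, original[:, j], side="right") - 1
        # Rank interval of each protected value among the original values.
        prot_lo = np.searchsorted(column, protected[:, j], side="left")
        prot_hi = np.searchsorted(column, protected[:, j], side="right") - 1
        # Keep originals whose rank is within `window` of the protected value's rank.
        near = (orig_hi[None, :] >= prot_lo[:, None] - window) & (
            orig_lo[None, :] <= prot_hi[:, None] + window
        )
        candidates &= near
    return candidates

def location_record_linkage(original, protected, p):
    """Link a protected record only when exactly one candidate remains (-1 otherwise)."""
    candidates = candidate_sets(original, protected, p)
    links = np.full(len(candidates), -1)
    unique = candidates.sum(axis=1) == 1
    links[unique] = candidates[unique].argmax(axis=1)
    return links
```

A record with several remaining candidates is left unlinked here; a fuller version could fall back to distance-based linkage restricted to the candidate set.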
Location record linkage: Rank swapping experiments
• Data sets: Census (1080 records, 13 attributes) and EIA (4092 records, 10 attributes)
• Rank swapping configurations: p = 2 … 20
• Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID (LLD: number of links using L-RL, by analogy with DLD and PLD)
Location record linkage: L-RL rank swapping linkage results
Location record linkage: L-RL rank swapping score results
Location record linkage: Univariate microaggregation experiments
• Data sets: Census (1080 records, 13 attributes) and EIA (4092 records, 10 attributes)
• Univariate microaggregation configurations: k = 10 … 50
• Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
Location record linkage: L-RL univariate microaggregation linkage results
Location record linkage: L-RL univariate microaggregation score results
Location record linkage: MDAV experiments
• Data sets: Census (1080 records, 13 attributes) and EIA (4092 records, 10 attributes)
• Microaggregation configurations: k = 10 … 50
• Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
Location record linkage: L-RL MDAV linkage results
Location record linkage: L-RL MDAV score results
Conclusions and future work
Conclusions
• We have presented a new type of record linkage designed to exploit the limitations of some protection methods.
• The L-RL method gives a more accurate disclosure risk (DR) evaluation for rank swapping and univariate microaggregation.
• MDAV is immune to the location problem.
Future work
• We plan to study the DR of MDAV and other protection methods using other ad-hoc methods.