On method-specific record linkage for risk assessment

Presentation Transcript


  1. On method-specific record linkage for risk assessment. Jordi Nin, Javier Herranz, Vicenç Torra

  2. Contents • Disclosure Risk Scenario: how an intruder re-identifies an individual • Preliminaries: protection methods and record linkage • Location record linkage: a new way to compute the disclosure risk • Conclusions and future work

  3. Outline: Disclosure Risk Scenario • Preliminaries • Location Record Linkage • Conclusions and future work

  4. Disclosure Risk Scenario: Attribute classification (data file X with n records and a attributes) • Identifiers: passport number • Quasi-identifiers: age, postal code • Confidential: income

  5. Disclosure Risk Scenario: Re-identification scenario
       Original file: X = id || Xnc || Xc
       Protected file: X’ = X’nc || Xc
       • Privacy is ensured: quasi-identifiers are anonymized
       • Data quality is preserved: confidential attributes are preserved

  6. Disclosure Risk Scenario: Record Linkage. Problem: find a correct mapping between data file 1 and data file 2. [Figure: records X1, X2, X3, X4 of data set 1 linked to records X’1, X’2, X’3, X’4 of data set 2]

  7. Disclosure Risk Scenario
       Distance-based record linkage
       • The nearest pairs of records are considered linked pairs
       • It is very easy to tune
       • Results depend heavily on the parameters
       • Moderate time cost
       Probabilistic record linkage
       • Linked pairs are computed using conditional probabilities
       • Tuning is difficult
       • Few parameters
       • High time cost
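
The distance-based variant on this slide can be illustrated with a minimal sketch. It assumes plain squared Euclidean distance on numeric attributes and that record t of the protected file comes from record t of the original file (needed only to count correct links). The function names are hypothetical, and real implementations usually standardize the attributes first.

```python
def distance_based_linkage(original, protected):
    """Link each protected record to the index of its nearest original record."""
    links = []
    for prot in protected:
        nearest = min(
            range(len(original)),
            # Squared Euclidean distance; an assumption, other distances are possible.
            key=lambda i: sum((a - b) ** 2 for a, b in zip(original[i], prot)),
        )
        links.append(nearest)
    return links


def reidentification_rate(links):
    """Fraction of protected records linked to their true original record.

    Assumes protected record t was derived from original record t.
    """
    return sum(1 for t, i in enumerate(links) if t == i) / len(links)
```

An intruder would not know the true mapping; the rate is something only the data protector can compute when assessing risk.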

  8. Outline: Disclosure Risk Scenario • Preliminaries • Location Record Linkage • Conclusions and future work

  9. Preliminaries: Rank swapping - p
       Algorithm
         For all attr_j where 1 ≤ j ≤ n
           attr_j is sorted
           each value x_ij is swapped with a value x_lj where i < l ≤ i + p
           the sorting of attr_j is reversed
         End for
       End algorithm
       • Simple
       • Preserves µ and σ
       • All combinations disappear
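
The per-attribute loop above can be sketched as follows for a single column. This is an illustration under assumptions, not the authors' implementation: p is read as a percentage of the number of records, each value is swapped at most once, and the swap partner is chosen uniformly at random inside the window. The name `rank_swap_column` is hypothetical.

```python
import random


def rank_swap_column(values, p_percent, rng):
    """Rank-swap one attribute; returns the protected column in the original record order."""
    n = len(values)
    # Window size in ranks; reading p as a percentage of n is an assumption.
    window = max(1, int(n * p_percent / 100))
    order = sorted(range(n), key=lambda i: values[i])   # sort the attribute
    ranked = [values[i] for i in order]
    swapped = [False] * n
    for i in range(n):
        if swapped[i]:
            continue
        # Candidate partners l with i < l <= i + window that are still unswapped.
        candidates = [l for l in range(i + 1, min(n, i + window + 1)) if not swapped[l]]
        if not candidates:
            continue
        l = rng.choice(candidates)
        ranked[i], ranked[l] = ranked[l], ranked[i]
        swapped[i] = swapped[l] = True
    # Undo the sorting so that position t again refers to record t.
    protected = [None] * n
    for rank, idx in enumerate(order):
        protected[idx] = ranked[rank]
    return protected
```

Because values are only permuted within the column, the mean and variance of the attribute are unchanged, and no value moves more than `window` ranks away from its original position.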

  10. Preliminaries: Rank swapping - p example, p = 20%. [Figure: two rows of values, 1 2 3 4 5 6 7 8 9 10 and 8 6 10 7 9 2 1 4 5 3]

  11. Preliminaries: Microaggregation - k. [Figure: example with k = 3 on a data file with a attributes] • a = 1 → optimal • a > 1 → NP-hard, heuristic

  12. Preliminaries: Optimal univariate microaggregation. [Figure: elements x1, x2, x3, x4 with k = 2]
       Result 1. When the elements are sorted according to an attribute, for any optimal partition, the elements in each cluster are contiguous (non-overlapping clusters exist).
       Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
       Clusters are built using the nodes of the shortest path algorithm.
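
Results 1 and 2 make an exact search feasible: only contiguous groups of k to 2k-1 sorted elements need to be considered. The sketch below uses dynamic programming over the sorted values, which is equivalent to a shortest path in a graph whose edges are such groups. It assumes the within-cluster sum of squared errors as the cost and replaces each value by its cluster mean; the function name is hypothetical.

```python
def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in an SSE-optimal partition."""
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    # Prefix sums give the SSE of any contiguous group in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared errors of the group xs[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimum cost of partitioning xs[:j]
    prev = [-1] * (n + 1)    # prev[j]: start index of the last group in that partition
    best[0] = 0.0
    for j in range(k, n + 1):
        # The last group xs[i:j] must have between k and 2k-1 elements (Result 2).
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            if best[i] < inf and best[i] + sse(i, j) < best[j]:
                best[j] = best[i] + sse(i, j)
                prev[j] = i
    # Walk back along the cheapest path to recover the groups.
    protected = [0.0] * n
    j = n
    while j > 0:
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            protected[order[pos]] = mean
        j = i
    return protected
```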

  13. Preliminaries: MDAV microaggregation, k = 2. [Figure: X and X’] MDAV is a multivariate heuristic microaggregation method

  14. Preliminaries. Score: protection method evaluation
       Score = 0.5 IL + 0.5 DR
       IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
       DR = 0.25 DLD + 0.25 PLD + 0.5 ID
       • IL1 = mean of absolute error
       • IL2 = mean variation of average
       • IL3 = mean variation of variance
       • IL4 = mean variation of covariance
       • IL5 = mean variation of correlation
       • DLD = number of links using DBRL
       • PLD = number of links using PRL
       • ID = protected values near original
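
The arithmetic of the score can be written out directly. This is a minimal sketch of the three formulas on this slide; it assumes the IL components and the DLD, PLD and ID measures are already computed and expressed on comparable scales. The function names are hypothetical.

```python
def information_loss(il_components):
    """IL = 100 * (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    if len(il_components) != 5:
        raise ValueError("expected the five components IL1..IL5")
    return 100 * sum(0.2 * c for c in il_components)


def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    """Score = 0.5 IL + 0.5 DR."""
    return 0.5 * il + 0.5 * dr
```

With these weights, information loss and disclosure risk contribute equally, so a method cannot obtain a good score by optimizing only one of them.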

  15. Outline: Disclosure Risk Scenario • Preliminaries • Location Record Linkage • Conclusions and future work

  16. Location Problem Description. L-RL: Location Record Linkage • Standard record linkage compares all records • Rank swapping, univariate microaggregation and other methods use only some of the original records to create each protected value • It is therefore unnecessary to compare all the records
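
One way to read this idea for rank swapping is sketched below. It is an illustration, not the authors' L-RL algorithm: it assumes the intruder knows the swap window (in ranks), that attribute values are distinct, and that a protected value at rank r can only come from an original record whose rank on that attribute is within the window of r. The function names are hypothetical.

```python
def ranks(column):
    """Rank (0-based position in sorted order) of every entry of a column."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    rank_of = [0] * len(column)
    for rank, idx in enumerate(order):
        rank_of[idx] = rank
    return rank_of


def location_candidates(original, protected, window):
    """For each protected record, the original records compatible on every attribute."""
    n = len(original)
    candidates = [set(range(n)) for _ in protected]
    for j in range(len(original[0])):
        orig_rank = ranks([row[j] for row in original])
        prot_rank = ranks([row[j] for row in protected])
        for t, r in enumerate(prot_rank):
            # Keep only originals whose rank on attribute j is within the window.
            candidates[t] &= {i for i in range(n) if abs(orig_rank[i] - r) <= window}
    return candidates


def location_linkage(original, protected, window):
    """Link each protected record to its nearest candidate (None if no candidate)."""
    links = []
    for t, cands in enumerate(location_candidates(original, protected, window)):
        links.append(min(
            cands,
            key=lambda i, p=protected[t]: sum((a - b) ** 2 for a, b in zip(original[i], p)),
            default=None,
        ))
    return links
```

The intersection across attributes is what shrinks the search: a record that is compatible on one attribute is often ruled out by another, so far fewer comparisons are needed than in standard record linkage.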

  17. Location record linkage: Method Description. [Figure: Xext and X’]

  18. Location record linkage. Example: rank swapping, p = 20%. [Figure: distances 17, 6, 13, 14, 16, 19, 12, 5, 16]

  19. Location record linkage: Rank Swapping Experiments
       • Data sets: Census (1080 records & 13 attributes), EIA (4092 records & 10 attributes)
       • Rank swapping configurations: p = 2 … 20
       • Score modifications: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID

  20. Location record linkage. L-RL: Rank Swapping Linkage Results

  21. Location record linkage. L-RL: Rank Swapping Score Results

  22. Location record linkage: Univariate Microaggregation Experiments
       • Data sets: Census (1080 records & 13 attributes), EIA (4092 records & 10 attributes)
       • Univariate microaggregation configurations: k = 10 … 50
       • Score modifications: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID

  23. Location record linkage. L-RL: Univariate Microaggregation Linkage Results

  24. Location record linkage. L-RL: Univariate Microaggregation Score Results

  25. Location record linkage: MDAV Experiments
       • Data sets: Census (1080 records & 13 attributes), EIA (4092 records & 10 attributes)
       • Microaggregation configurations: k = 10 … 50
       • Score modifications: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID

  26. Location record linkage. L-RL: MDAV Linkage Results

  27. Location record linkage. L-RL: MDAV Score Results

  28. Outline: Disclosure Risk Scenario • Preliminaries • Location Problem Description • Location Record Linkage • Conclusions and future work

  29. Conclusions and future work
       Conclusions
       • We have presented a new type of record linkage designed to exploit the limitations of some protection methods
       • The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation
       • MDAV is immune to the location problem
       Future work
       • We plan to study the DR of MDAV and other protection methods using other ad hoc methods

  30. On method-specific record linkage for risk assessment. Jordi Nin, Javier Herranz, Vicenç Torra
