ESSnet DI WP2: Record Linkage

ESSnet DI WP2: Record Linkage. Luca Valentino Istat. Task 2.1: Record linkage – a practical problem. The problem, already illustrated in the Den Haag meeting, is to link these registers: P4 (New born inclusion in the residents administrative register)


Presentation Transcript


  1. ESSnet DI WP2: Record Linkage. Luca Valentino, Istat

  2. Task 2.1: Record linkage – a practical problem. The problem, already illustrated at the Den Haag meeting, is to link these registers:
     • P4 (inclusion of newborns in the administrative register of residents)
     • CEDAP (survey on the certificates of assistance at childbirth)
     One data set holds the characteristics of the newborn, such as weight, type of birth, number of siblings and week of birth, while the other holds the characteristics of the household, such as marital status, nationality and education of the parents.

  3. Harmonization of populations. Common units: live newborns; newborns in Italy; Italian residents. [Diagram of the P4 and CEDAP populations; remaining labels: Italian newborns in other countries; non-residents; Molise and Calabria; dead newborns.]

  4. Common variables. The linkage was performed on data for the month of March 2005. Excluding the non-eligible newborns led to the following file sizes: (table not reproduced in this transcript). The common variables available for linkage are: (list not reproduced in this transcript).

  5. Different approaches. The objective was to apply different approaches and compare their results:
     • deterministic record linkage approach: a rule defined by survey experts
     • probabilistic record linkage approach: based on the Fellegi-Sunter method (Relais)
     • Liseo and Tancredi approach: to be done

  6. Deterministic approach. This approach is based on a deterministic rule defined by survey experts: agreement on all the common variables, or on all but one of them. The rule is implemented with ad hoc SAS procedures, and its results are considered very reliable (declared matches are treated as actual matches). The number of declared matches in this case is 32’595.
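The "all, or all but one" rule on slide 6 can be sketched in a few lines of Python. This is only an illustration, not the SAS procedures used in the study: the variable names (`birth_date`, `gender`, `mother_birth_date`, `municipality`) are hypothetical, and treating a missing value as a disagreement is an assumption the slides do not confirm.

```python
from typing import Mapping, Optional, Sequence

Record = Mapping[str, Optional[str]]


def agrees(a: Optional[str], b: Optional[str]) -> bool:
    """Two values agree only if both are present and equal (assumption)."""
    return a is not None and b is not None and a == b


def expert_rule_match(rec_a: Record, rec_b: Record,
                      common_vars: Sequence[str],
                      max_disagreements: int = 1) -> bool:
    """Declare a match if the records agree on all common variables,
    or on all but `max_disagreements` of them."""
    disagreements = sum(
        1 for var in common_vars if not agrees(rec_a.get(var), rec_b.get(var))
    )
    return disagreements <= max_disagreements


# Hypothetical common variables and records, for illustration only.
COMMON_VARS = ["birth_date", "gender", "mother_birth_date", "municipality"]

p4_rec = {"birth_date": "2005-03-01", "gender": "F",
          "mother_birth_date": "1975-06-12", "municipality": "Roma"}
cedap_one_diff = dict(p4_rec, municipality="Milano")
cedap_two_diff = dict(p4_rec, municipality="Milano", gender="M")

one_diff_is_match = expert_rule_match(p4_rec, cedap_one_diff, COMMON_VARS)
two_diff_is_match = expert_rule_match(p4_rec, cedap_two_diff, COMMON_VARS)
```

With these inputs the pair that differs on one variable is declared a match, and the pair that differs on two is not.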

  7. Probabilistic approach. Search space reduction by blocking on the variables newborn’s birthdate and newborn’s gender. Matching variables: (list not reproduced in this transcript). Reduction to a 1:1 solution.
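Blocking, as described on slide 7, restricts the comparison to pairs that share the same values of the blocking variables. The sketch below shows the general idea in Python; it is not how Relais implements it, and the field names `birth_date` and `gender` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def candidate_pairs(file_a: Sequence[Record], file_b: Sequence[Record],
                    block_keys: Tuple[str, ...] = ("birth_date", "gender")
                    ) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) whose records share every blocking value."""
    # Index file_b by its blocking key so each block is looked up directly.
    blocks: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for j, rec in enumerate(file_b):
        blocks[tuple(rec[k] for k in block_keys)].append(j)

    pairs: List[Tuple[int, int]] = []
    for i, rec in enumerate(file_a):
        key = tuple(rec[k] for k in block_keys)
        for j in blocks.get(key, []):
            pairs.append((i, j))
    return pairs


# Hypothetical records, for illustration only.
p4 = [
    {"birth_date": "2005-03-01", "gender": "F"},
    {"birth_date": "2005-03-01", "gender": "M"},
    {"birth_date": "2005-03-02", "gender": "F"},
]
cedap = [
    {"birth_date": "2005-03-01", "gender": "F"},
    {"birth_date": "2005-03-01", "gender": "F"},
    {"birth_date": "2005-03-02", "gender": "M"},
]

pairs = candidate_pairs(p4, cedap)
full_search_space = len(p4) * len(cedap)
```

Here only two of the nine possible pairs survive blocking. The trade-off is that a true match with an error in a blocking variable is never compared.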

  8. Probabilistic approach. After applying the EM algorithm and the 1:1 reduction, the probabilistic approach yields a set of pairs, each with a probability of being a match (P_POST). The final result depends on the choice of the match threshold, which in turn depends on the quality required for the linkage. In this case high precision is required, in order to prevent false matches as far as possible, so the match threshold is fixed at 0.9. The number of declared matches is 36’562.
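To make the role of the threshold on slide 8 concrete, the following Python sketch computes a posterior match probability in the usual Fellegi-Sunter setting with binary agreement patterns and conditional independence between matching variables. It is a textbook formulation, not the Relais code: the parameters `m`, `u` and `p` would normally be EM estimates, and the values below are hypothetical.

```python
from typing import Sequence


def posterior_match_probability(gamma: Sequence[int],
                                m: Sequence[float],
                                u: Sequence[float],
                                p: float) -> float:
    """P(match | gamma) for a binary agreement vector `gamma`.

    m[k] = P(agreement on variable k | pair is a match)
    u[k] = P(agreement on variable k | pair is a non-match)
    p    = prior proportion of matches among the compared pairs
    """
    like_match = 1.0
    like_nonmatch = 1.0
    for g, mk, uk in zip(gamma, m, u):
        like_match *= mk if g else (1.0 - mk)
        like_nonmatch *= uk if g else (1.0 - uk)
    numerator = p * like_match
    return numerator / (numerator + (1.0 - p) * like_nonmatch)


MATCH_THRESHOLD = 0.9  # the threshold chosen on the slide

# Hypothetical parameter values, standing in for EM estimates.
m = [0.95, 0.90, 0.90]
u = [0.05, 0.10, 0.01]
p = 0.01

post_full_agreement = posterior_match_probability((1, 1, 1), m, u, p)
post_one_disagreement = posterior_match_probability((1, 1, 0), m, u, p)

declared_full = post_full_agreement >= MATCH_THRESHOLD
declared_partial = post_one_disagreement >= MATCH_THRESHOLD
```

With these made-up parameters, full agreement clears the 0.9 threshold while a pair disagreeing on the third variable does not. Raising the threshold trades missed matches for fewer false matches.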

  9. Comparison between approaches. The comparison between the deterministic approach (the expert’s rule) and the probabilistic approach (Relais) shows strong agreement. The pairs declared as matches by both approaches number 31’931:
     • 87% of the matches according to Relais are also matches under the expert’s rule
     • 98% of the matches according to the expert’s rule are also matches for Relais
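The two percentages on slide 9 follow directly from the match counts reported on slides 6, 8 and 9. A short Python check of the arithmetic (the variable names are ours):

```python
# Counts reported in the slides.
expert_matches = 32_595   # deterministic approach (expert's rule)
relais_matches = 36_562   # probabilistic approach (Relais), threshold 0.9
common_matches = 31_931   # pairs declared as matches by both approaches

# Share of each approach's matches that the other approach also declares.
share_of_relais = common_matches / relais_matches
share_of_expert = common_matches / expert_matches

# Pairs declared by one approach only.
relais_only = relais_matches - common_matches
expert_only = expert_matches - common_matches

print(f"Relais matches confirmed by the expert's rule: {share_of_relais:.1%}")
print(f"Expert's-rule matches confirmed by Relais: {share_of_expert:.1%}")
print(f"Declared only by Relais: {relais_only}; only by the expert's rule: {expert_only}")
```

The pairs declared by only one approach (4,631 for Relais and 664 for the expert's rule) are the ones where the two approaches disagree.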

  10. Comparison by clerical review. The quality of the linkage procedures can be assessed by drawing samples of pairs and evaluating them carefully through clerical review. The clerical review consists of analysing all the common variables of the two records:
     • if the variables that do not coincide show only minimal differences, including cases where these variables are missing, the pair is classified as a true link
     • otherwise the pair is classified as a false link

  11. Comparison by clerical review.
     • A - common matches
     • B - common matches consisting of twins
     • C - pairs declared as matches only by the expert’s rule
     • D - pairs declared as matches only by Relais that are similar on at least half of the variables not used in the probabilistic linkage procedure
     • E - other pairs declared as matches by Relais

  12. Comparison by clerical review.
     • F - pairs in the Relais solution whose p_post value is below the match threshold
     • G - pairs that coincide in at least one of the most significant variables

  13. Comparison by clerical review. The results obtained on the checked samples give the false match and false non-match rates for the deterministic approach (expert’s rule) and the probabilistic approach (Relais). (Rate tables not reproduced in this transcript.)
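Since the rate tables are not available here, the sketch below only shows how the two rates named on slide 13 are commonly defined once the clerical review has classified pairs as true or false links. Definitions vary between authors, so this is one common convention rather than necessarily the one used in the study, and the counts are hypothetical placeholders, not results from the slides.

```python
from typing import Tuple


def error_rates(true_declared: int, false_declared: int,
                missed_true: int) -> Tuple[float, float]:
    """Return (false match rate, false non-match rate).

    true_declared  : declared matches judged true links by clerical review
    false_declared : declared matches judged false links
    missed_true    : true links that the procedure did not declare
    """
    # False match rate: share of declared matches that are wrong.
    false_match_rate = false_declared / (true_declared + false_declared)
    # False non-match rate: share of true links that were missed.
    false_non_match_rate = missed_true / (true_declared + missed_true)
    return false_match_rate, false_non_match_rate


# Hypothetical counts, for illustration only (not from the study).
fmr, fnmr = error_rates(true_declared=90, false_declared=10, missed_true=30)
```

When the review is done on samples drawn from classes such as A to G, the counts would first be scaled up to each class's size before being combined.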
