ESSnet DI WP2: Record Linkage. Luca Valentino Istat. Task 2.1: Record linkage – a practical problem. The problem, already illustrated in the Den Haag meeting, is to link these registers: P4 (New born inclusion in the residents administrative register)
ESSnet DI WP2: Record Linkage. Luca Valentino, Istat
Task 2.1: Record linkage – a practical problem
The problem, already illustrated at the Den Haag meeting, is to link these registers:
• P4 (inclusion of newborns in the administrative register of residents)
• CEDAP (survey on the certificates of assistance at childbirth)
One data set contains the characteristics of the newborn, such as weight, type of birth, number of siblings and week of birth, while the other contains the characteristics of the household, such as marital status, nationality and education of the parents.
Harmonization of populations
Common units: live newborns
[Diagram: coverage of the P4 and CEDAP populations, with the labels: newborns in Italy; Italian residents; Italian newborns in other countries; non-residents; Molise and Calabria; dead newborns]
Common variables
The linkage was performed on data for the month of March 2005.
Excluding the non-eligible newborns led to the following file sizes:
The common variables available for linkage are:
Different approaches
The objective was to apply different approaches and compare their results:
• deterministic record linkage approach: rule defined by survey experts
• probabilistic record linkage approach: based on the Fellegi–Sunter method (Relais)
• Liseo and Tancredi approach: to be done
Deterministic approach
This approach is based on a deterministic rule defined by survey experts: two records are linked if they agree on all the common variables, or on all but one of them.
The rule is implemented with ad hoc SAS procedures, and the results are considered very reliable (declared matches are treated as actual matches).
The number of declared matches in this case is 32’595.
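The expert rule ("agreement on all common variables, or on all but one") can be sketched in a few lines of Python. This is only an illustration, not the SAS procedure used in the study: the variable names are hypothetical, since the list of common variables is not given here, and treating two missing values as an agreement is an assumption.

```python
# Hypothetical variable names: the actual common variables are not listed in the source.
COMMON_VARS = ["birth_date", "gender", "birth_municipality", "mother_birth_date"]


def is_deterministic_match(rec_a, rec_b, variables=COMMON_VARS, max_disagreements=1):
    """Return True when the records agree on all variables, or on all but one."""
    # Assumption: a value missing in both records (None == None) counts as agreement.
    disagreements = sum(1 for v in variables if rec_a.get(v) != rec_b.get(v))
    return disagreements <= max_disagreements


# Made-up records, for illustration only.
p4_record = {"birth_date": "2005-03-01", "gender": "F",
             "birth_municipality": "Roma", "mother_birth_date": "1975-06-12"}
cedap_one_diff = dict(p4_record, birth_municipality="Milano")
cedap_two_diff = dict(cedap_one_diff, gender="M")

print(is_deterministic_match(p4_record, cedap_one_diff))  # one disagreement: match
print(is_deterministic_match(p4_record, cedap_two_diff))  # two disagreements: no match
```

Setting `max_disagreements=0` gives the stricter "all variables agree" variant of the rule.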
Probabilistic approach
Search space reduction by blocking on the variables: newborn’s birthdate, newborn’s gender
Matching variables:
Reduction to a 1:1 solution
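Blocking on birthdate and gender means that only pairs sharing both values are compared, instead of the full cross product of the two files. A minimal Python sketch of this idea follows; it is not the Relais implementation, and the field names (`id`, `birth_date`, `gender`) and the records are hypothetical.

```python
from collections import defaultdict


def block_key(rec):
    """Blocking key: newborn's birthdate and gender (hypothetical field names)."""
    return (rec["birth_date"], rec["gender"])


def candidate_pairs(file_a, file_b):
    """Return the id pairs whose records fall in the same block."""
    blocks_b = defaultdict(list)
    for rec in file_b:
        blocks_b[block_key(rec)].append(rec)
    pairs = []
    for rec_a in file_a:
        # Only records of file_b with the same birthdate and gender are compared.
        for rec_b in blocks_b.get(block_key(rec_a), []):
            pairs.append((rec_a["id"], rec_b["id"]))
    return pairs


# Made-up records, for illustration only.
p4 = [
    {"id": "p1", "birth_date": "2005-03-01", "gender": "F"},
    {"id": "p2", "birth_date": "2005-03-01", "gender": "M"},
    {"id": "p3", "birth_date": "2005-03-02", "gender": "F"},
]
cedap = [
    {"id": "c1", "birth_date": "2005-03-01", "gender": "F"},
    {"id": "c2", "birth_date": "2005-03-02", "gender": "F"},
    {"id": "c3", "birth_date": "2005-03-02", "gender": "F"},
]

pairs = candidate_pairs(p4, cedap)
print(pairs)  # 3 candidate pairs instead of the 9 in the full cross product
```

In this toy example `p3` has two candidates in its block; choosing between them is what the matching variables and the 1:1 reduction are for.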
Probabilistic approach
Before applying the EM algorithm and the 1:1 reduction, the probabilistic approach finds a set of pairs, each with a probability of being a match (P_POST).
The final result depends on the choice of the match threshold, which in turn depends on the quality required for the linkage.
In this case high precision is required, in order to prevent false matches as much as possible. Hence the match threshold is fixed at 0.9.
The number of declared matches is 36’562.
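To illustrate how a posterior match probability and the 0.9 threshold interact, the following Python sketch computes the posterior under the standard Fellegi–Sunter model with conditionally independent agreement indicators. All numbers are made up for illustration: in practice the m- and u-probabilities and the match proportion would be estimated (for example by EM), and the names used here are not taken from Relais.

```python
def posterior_match_probability(agreement, m, u, p):
    """Posterior probability that a pair is a match, given its agreement pattern.

    agreement: one boolean per matching variable (True = values agree)
    m[k]: P(agreement on variable k | pair is a match)
    u[k]: P(agreement on variable k | pair is a non-match)
    p:    prior proportion of matches among the compared pairs
    """
    like_m = 1.0
    like_u = 1.0
    for agrees, m_k, u_k in zip(agreement, m, u):
        like_m *= m_k if agrees else 1.0 - m_k
        like_u *= u_k if agrees else 1.0 - u_k
    return p * like_m / (p * like_m + (1.0 - p) * like_u)


# Hypothetical parameters for three matching variables (not estimates from the study).
M_PROBS = [0.95, 0.90, 0.85]
U_PROBS = [0.05, 0.10, 0.02]
PRIOR = 0.1
MATCH_THRESHOLD = 0.9  # threshold used in the study

post_all_agree = posterior_match_probability([True, True, True], M_PROBS, U_PROBS, PRIOR)
post_one_agrees = posterior_match_probability([True, False, False], M_PROBS, U_PROBS, PRIOR)

for label, post in [("all agree", post_all_agree), ("one agrees", post_one_agrees)]:
    decision = "match" if post >= MATCH_THRESHOLD else "not a match"
    print(f"{label}: P_POST = {post:.3f} -> {decision}")
```

Raising the threshold trades recall for precision: fewer pairs are declared matches, but those declared are less likely to be false matches.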
Comparison between approaches
The comparison between the deterministic approach (expert’s rule) and the probabilistic approach (Relais) shows strong agreement.
The pairs declared as matches by both approaches are 31’931:
• 87% of the matches according to Relais are also matches for the expert’s rule
• 98% of the matches according to the expert’s rule are also matches for Relais
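The two percentages follow directly from the three counts reported in the slides. A short Python check, using only those counts (the variable names are illustrative):

```python
expert_matches = 32_595   # declared matches, deterministic approach
relais_matches = 36_562   # declared matches, probabilistic approach
common_matches = 31_931   # declared matches by both approaches

share_of_relais = common_matches / relais_matches   # Relais matches confirmed by the expert's rule
share_of_expert = common_matches / expert_matches   # expert's rule matches confirmed by Relais

only_expert = expert_matches - common_matches   # declared only by the expert's rule
only_relais = relais_matches - common_matches   # declared only by Relais

print(f"{share_of_relais:.1%} of Relais matches are also expert's rule matches")
print(f"{share_of_expert:.1%} of expert's rule matches are also Relais matches")
print(only_expert, only_relais)  # 664 4631
```

The 664 and 4,631 pairs declared by only one of the two approaches are the ones where the procedures disagree.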
Comparison by clerical review
The quality of the linkage procedures can be assessed by drawing samples of pairs and evaluating them carefully by clerical review.
The clerical review consists of an analysis of all the common variables of the two records:
• if the variables that do not coincide differ only minimally (including cases where a value is missing), the pair is classified as a true link
• otherwise the pair is classified as a false link
Comparison by clerical review
A - common matches
B - common matches consisting of twins
C - pairs declared as matches only by the expert’s rule
D - pairs declared as matches only by Relais that are similar on at least half of the variables not used in the probabilistic linkage procedure
E - other pairs declared as matches by Relais
Comparison by clerical review
F - pairs in the Relais solution whose P_POST value is below the match threshold
G - pairs that coincide in at least one of the most significant variables
Comparison by clerical review
The results obtained on the checked samples give the following false match and false non-match rates:
Deterministic approach (expert’s rule)
Probabilistic approach (Relais)
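As a reminder of how such rates are commonly defined, the sketch below computes a false match rate (false matches among declared matches) and a false non-match rate (missed true matches among all true matches). These definitions are the usual ones and are assumed here, since the slides do not spell them out. The counts are purely hypothetical and are not results of the study.

```python
def false_match_rate(declared_matches, false_matches):
    """Share of declared matches that clerical review classifies as false links."""
    return false_matches / declared_matches


def false_non_match_rate(true_matches_declared, true_matches_missed):
    """Share of true matches that the procedure failed to declare."""
    return true_matches_missed / (true_matches_declared + true_matches_missed)


# Hypothetical counts for illustration only (not taken from the study).
fmr = false_match_rate(declared_matches=1_000, false_matches=20)
fnmr = false_non_match_rate(true_matches_declared=980, true_matches_missed=70)

print(f"false match rate: {fmr:.1%}")
print(f"false non-match rate: {fnmr:.1%}")
```

When the review is carried out on samples drawn from classes of different sizes, the sample counts would first have to be scaled up to the class totals before computing the rates.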