
SHARELIFE Meeting Vienna – November, 5-6 The Italian experience in SHARE data cleaning



Presentation Transcript


  1. SHARELIFE Meeting Vienna – November 5-6. The Italian experience in SHARE data cleaning. Omar Paccagnella, SHARELIFE meeting, November 6, 2007

  2. In general … The topics of this presentation are based ONLY on the Italian experience of data cleaning in the 2 waves of SHARE. Checks are divided into 4 groups.

  3. In general …
     • No general rules, every household is a different story
     • Use remarks!
     • Strong cooperation with survey agency (interviewers…)
     • Check data that can be compared with information available from other sources (e.g. gross sample, booklet, other administrative data?)
     • Be conservative!

  4. Group 1: demographic info - id matching
     Matching within wave:
     • Gender and year of birth must be the same in the CV, DN & XT sections and in the drop-off.
     • At least one household member must have the same gender and year of birth as the selected individual (gross sample information).
     • Check for mix-ups of respondents within a household (hh), e.g. in a couple, the husband's interview was recorded in the wife's SMS row.
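The within-wave checks above could be sketched roughly as follows. This is only an illustration, not the procedure actually used for SHARE: the record layout (one dict per respondent, keyed by section, with `gender` and `yrbirth` fields), the function names and the example values are all assumptions made for the sketch.

```python
# Hypothetical layout: one dict per respondent, keyed by data source
# (CV, DN, XT, drop-off), each holding gender and year of birth.
SECTIONS = ("cv", "dn", "xt", "dropoff")


def inconsistent_fields(respondent):
    """Return the fields whose values differ across the available sections."""
    problems = []
    for field in ("gender", "yrbirth"):
        values = {
            respondent[s][field]
            for s in SECTIONS
            if s in respondent and respondent[s].get(field) is not None
        }
        if len(values) > 1:
            problems.append(field)
    return problems


def matches_gross_sample(household, sampled):
    """True if at least one member has the sampled person's gender and birth year."""
    return any(
        m["cv"]["gender"] == sampled["gender"]
        and m["cv"]["yrbirth"] == sampled["yrbirth"]
        for m in household
    )


# Made-up couple: the husband's DN answers look like the wife's.
husband = {
    "cv": {"gender": "M", "yrbirth": 1940},
    "dn": {"gender": "F", "yrbirth": 1943},
}
wife = {
    "cv": {"gender": "F", "yrbirth": 1943},
    "dn": {"gender": "F", "yrbirth": 1943},
}
flags = inconsistent_fields(husband)
in_sample = matches_gross_sample([husband, wife], {"gender": "M", "yrbirth": 1940})
```

A flagged respondent is only a candidate for review: in line with the "be conservative" advice, the remarks and the survey agency would still need to be consulted before any correction.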

  5. Group 1: demographic info - id matching
     Matching between waves:
     • Gender and year of birth must be the same in the CV, DN & XT sections and in the drop-off of both waves.
     • In case of a mix-up of respondents within a household (hh), check whether the error was made when linking the respondents (baseline vs longitudinal interview) or when selecting the wrong individual row in the SMS (preload info).
     • Check household composition (eligible & non-eligible individuals who moved in, moved out or died between waves).
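A between-wave comparison of this kind might look like the sketch below. The person ids, the roster layout (person id mapped to gender and year of birth) and the report categories are assumptions for illustration only; they do not reproduce the SHARE data structure.

```python
def compare_waves(wave1, wave2):
    """Compare two household rosters keyed by a (hypothetical) person id.

    Each roster maps person id -> (gender, year_of_birth).
    """
    report = {"mismatch": [], "possible_swap": [], "only_wave1": [], "only_wave2": []}
    for pid in sorted(wave1.keys() | wave2.keys()):
        if pid not in wave2:
            report["only_wave1"].append(pid)  # moved out, died, or not interviewed
        elif pid not in wave1:
            report["only_wave2"].append(pid)  # moved in or newly eligible
        elif wave1[pid] != wave2[pid]:
            # Does the wave 2 record look like another member's wave 1 record?
            others = [p for p in wave1 if p != pid and wave1[p] == wave2[pid]]
            report["possible_swap" if others else "mismatch"].append(pid)
    return report


# Made-up household: the couple's records appear swapped in wave 2.
wave1 = {"IT-001-01": ("M", 1940), "IT-001-02": ("F", 1943), "IT-001-03": ("F", 1920)}
wave2 = {"IT-001-01": ("F", 1943), "IT-001-02": ("M", 1940), "IT-001-04": ("M", 1970)}
report = compare_waves(wave1, wave2)
```

The sketch only detects that a swap is likely. Deciding whether it came from the linkage or from the wrong SMS row still requires looking at the preload information.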

  6. Group 2: Amounts
     In all questions where an amount is asked:
     • Check values that are too large: typing errors? Pre-Euro currency?
     • Check values that are too small: are they plausible? Typing errors?
     • Zero values (financial questions): a way to avoid UBs? A way of treating it as a very small value?
     • Compare the distribution of the variable with distributions from other sources: are there important differences? Could there be a problem in the text of the question?
     • Results by interviewer
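As a rough sketch of the amount checks, assuming a plausible range is chosen for each question: the thresholds, interviewer ids and reported values below are invented for the example. Only the lira/euro conversion rate (1936.27 lire per euro) is a fixed fact.

```python
from collections import Counter

LIRE_PER_EURO = 1936.27  # official fixed conversion rate


def flag_amount(value, low, high):
    """Classify a reported euro amount against a plausible range [low, high]."""
    if value == 0:
        return "zero"
    if value < low:
        return "too_small"
    if value > high:
        # A value that becomes plausible after conversion may be in lire.
        if low <= value / LIRE_PER_EURO <= high:
            return "maybe_lire"
        return "too_large"
    return "ok"


# (interviewer id, reported monthly amount); all values are made up.
answers = [
    ("int01", 1200),
    ("int01", 2_400_000),
    ("int02", 0),
    ("int02", 0),
    ("int02", 15),
    ("int03", 90_000_000),
]
flags = [(who, flag_amount(value, low=100, high=20_000)) for who, value in answers]

# Number of flagged answers per interviewer.
by_interviewer = Counter(who for who, flag in flags if flag != "ok")
```

Counting flags by interviewer gives a first view of the "results by interviewer" check: an interviewer with many zeros or implausible values is worth discussing with the survey agency.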

  7. Group 3: Physical and cognitive test results
     For all tests whose results were reported in the booklet:
     • Check values that are too large: typing errors?
     • Check the value of 1 in the “Ten words recall test”: the total number of words recorded instead of the cited words.
     • Results by interviewer (tests not completed, rounding of the results, same result, etc.)
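The by-interviewer diagnostics for test results could be sketched like this. The maximum plausible value, the rounding step and the sample records are assumptions chosen for the example, not values taken from the SHARE booklet.

```python
from collections import defaultdict


def interviewer_summary(records, max_value, round_to=5):
    """Summarise booklet test results per interviewer.

    records: iterable of (interviewer id, result or None if not completed).
    """
    grouped = defaultdict(list)
    for who, result in records:
        grouped[who].append(result)

    summary = {}
    for who, results in grouped.items():
        done = [r for r in results if r is not None]
        summary[who] = {
            "not_completed": len(results) - len(done),
            "out_of_range": sum(1 for r in done if not 0 <= r <= max_value),
            "rounded": sum(1 for r in done if r % round_to == 0),
            "all_same": len(done) > 1 and len(set(done)) == 1,
        }
    return summary


# Made-up results for three interviewers.
records = [
    ("int01", 31), ("int01", 28), ("int01", 44),
    ("int02", 30), ("int02", 30), ("int02", 30), ("int02", None),
    ("int03", 27), ("int03", 310),  # 310 is likely a typing error
]
summary = interviewer_summary(records, max_value=100)
```

Here `int02` stands out because every completed test has the same rounded result, which is the kind of pattern the slide suggests looking for.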

  8. Group 4: Other checks
     All issues that could be misunderstood (by respondent and/or interviewer):
     • Answer category: if the question also includes the “Other” option, check whether some of the answers can be recoded into one of the categories already defined. A large number of “other” answers: are we missing something?
     • Year/age of some events: are they compatible with the age of the respondents?
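Both checks in this group can be illustrated with a short sketch. The event names, answer codes and years are hypothetical, and the rule used (an event must fall between the year of birth and the interview year) is the simplest possible compatibility test, assumed here for illustration.

```python
def event_year_problems(yrbirth, interview_year, events):
    """Return names of events dated before birth or after the interview."""
    return [
        name
        for name, year in events.items()
        if year is not None and not yrbirth <= year <= interview_year
    ]


def other_share(answers, other_code="other"):
    """Fraction of answers that fall in the 'Other' category."""
    return sum(1 for a in answers if a == other_code) / len(answers)


# Made-up respondent born in 1940 and interviewed in 2006.
problems = event_year_problems(
    yrbirth=1940,
    interview_year=2006,
    events={"first_job": 1958, "marriage": 1936, "retirement": 2011},
)
share = other_share(["employed", "other", "retired", "other"])
```

A high share of "other" answers does not say what is wrong. It only points to questions whose open-text answers deserve a manual read before any recoding.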

  9. Some final thoughts
     Data cleaning is not only the correction of some errors; it is also a way to check and evaluate the quality of our datasets: we can find the sections where data are less good (compared to other similar surveys) and the variables that need more attention (both when analyzing the data and when preparing the briefings).
     Good data cleaning begins at the beginning of the fieldwork.
