1 / 19

Assessing quality of income data in a survey vs an administrative source

Assessing quality of income data in a survey vs an administrative source. Dmitri Romanov and Yuri Gubman. Q2014 Vienna, 5 June 2014. Outline. Motivation Literature Research hypotheses Data Methodology Key findings and conclusions. Motivation.

kemal
Download Presentation

Assessing quality of income data in a survey vs an administrative source

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Assessing quality of income data in a survey vs an administrative source Dmitri Romanov and Yuri Gubman Q2014 Vienna, 5 June 2014

  2. Outline Motivation Literature Research hypotheses Data Methodology Key findings and conclusions Romanov&Gubman: Quality of income data in survey vs. administrative file

  3. Motivation Income items are often presented in the surveys by a banded question, when income variable does not serve for substantive analysis, but as a socioeconomic classificatory variable, e.g. in Israel’s Social Survey. This form is considered to be mitigating the issues of item non-response and inaccuracy associated with the sensitivity of income inquiry (Tourangeau and Smith, 1996). Administrative data on incomes, usually from the national tax or social security authorities, are available in many NSO. These are regarded as a (partial) substitute for the survey data. Romanov&Gubman: Quality of income data in survey vs. administrative file

  4. Motivation (cont.) Income data from both types of sources, administrative and survey, differ in definitions, timeliness, periods of reference, and populations covered. Any unadjusted comparison should yield discrepancies. Even after a thorough adjustment on both sides, meaningful discrepancies are normally found. There is a lively debate on the issue: Which source better represents the true income? Is under-reporting in a survey preferable--for statistical purposes--to cheating to tax authorities? Without taking a stand on that, we analyze the distribution of discrepancies, by income source. Romanov&Gubman: Quality of income data in survey vs. administrative file

  5. Literature Micklewright and Schnepf (2007): the use of a banded question may result in loss of information within the bands and, in turn, may impair the accuracy of the aggregate estimates derived from the survey. Moore et al. (JOS 2001): “Response bias estimates for wage/salary income amounts are generally small and without a consistent sign, indicating neither under- nor over-reporting bias given accurate source reporting”. Martin et al. (1996): reporting of income from business and self-employed activity, is more difficult in terms of conceptualization, definition and questioning than reporting wage/salary income and transfer payments. Romanov&Gubman: Quality of income data in survey vs. administrative file

  6. Literature (cont.) Romanov and Furman (2006): found a “regression to the mean” of the reportage of income in a census around the data from an administrative file. If errors among the wealthy in one direction are offset by errors among the poor in the opposite direction, the average error may be negligible. Abowd and Stinson (2011): both survey and administrative data on earnings are viewed as noisy measures of some underlying true amount of annual earnings. Romanov&Gubman: Quality of income data in survey vs. administrative file

  7. Research hypotheses The distribution of discrepancies between the survey and administrative income measures exhibits a “regression to the mean”, i.e. the discrepancies at the ends of the distribution are of opposite sign and, possibly, heteroscedastic. Different factors explain the discrepancies on each end of the income distribution. The factors are related to patterns of employment and regularity of earnings: number of jobs, part-time employment and periods of unemployment, fringe benefits. Different factors explain the discrepancies among the wage-earners and among the self-employed individuals. A leading factor is a cognitive complexity of response on the income item for the SE. Romanov&Gubman: Quality of income data in survey vs. administrative file

  8. Data In 2008’ Social Survey (CAPI, response rate 83%), 4,493 working individuals, employees or self-employed, were asked the GROSS INCOME question Employees: Last month, what was your gross income, before deductions, from all places where you worked? Self-employed: Last month, what was your gross income, before deductions, from all places where you worked, including wages and income from a business? Overall item non-response rate was 8.26% The respondents were asked to choose a value from a card listing 10 predetermined income bands Romanov&Gubman: Quality of income data in survey vs. administrative file

  9. Data (cont.) These records were matched, by unique PIN, with the administrative file that contains information on all taxable incomes from wage/salary and self-employment/business of individual taxpayers. Annual administrative records were adjusted by the number of months of individual’s employment in all jobs (12 for the SE), and traced to the “last month” inferred to in the Social Survey. Romanov&Gubman: Quality of income data in survey vs. administrative file

  10. Data (cont.) For the 2008’ Social Survey, Blaise Audit Trail (AT) log files are available. This log captures, inter alia, time of entrance to fill in a field (a question), time of exit, value entered by interviewer. If the respondent corrects a previous response, as many lines are recorded in the relevant field in the AT file as the number of times the interviewer corrected the value of the field. Matching between the survey, the administrative file, and the AT log file yielded 3,417 records. Romanov&Gubman: Quality of income data in survey vs. administrative file

  11. Data (cont.) Thick line—income from administrative file by survey reportage bands. Thin line—income reported in survey. Broken line—continuous estimator of income reported in survey based on fitted Romanov&Gubman: Quality of income data in survey vs. administrative file

  12. Methodology Dependent variable: discrepancy (in %) between the value of individual’s income in the administrative file and an approximation of her income from the survey. The approximation of the banded values of gross income was found by ML fitting of different theoretic distributions. Best fit was received for the skew-t distribution. “No errors” are considered when both values fall within the same band. Romanov&Gubman: Quality of income data in survey vs. administrative file

  13. Methodology (cont.) Explanatory variables: Basic socio-demographic characteristics: gender, age, immigrant status, marital status, ethnic group, education. Employment traits for WE: number of jobs held, usual working hours, part-time main job, more/less-than usual hours last week, job satisfaction, wage raise/cut last year. Contract-dependent fringe benefits: receives full pay for sick days, pension and advance training fund contributions paid by employer, participation in profit-sharing, receives company car, reimbursement of transportation expenses. Ln of income from the administrative file. Romanov&Gubman: Quality of income data in survey vs. administrative file

  14. Methodology (cont.) Audit Trail variables: Response time. After the GROSS income question, a NET income question was asked. When facing it, some respondents realized they had given net income value (which people in Israel are more familiar with) instead of gross. Then they returned to the gross income question and corrected the previous answer. Two variables constructed on this basis: • Dummy variable of correction of response about gross income during interview. • Size of response correction. Romanov&Gubman: Quality of income data in survey vs. administrative file

  15. Key findings and conclusions • Regression to the mean occurs in the income reported in the survey as against that obtained from the administrative source. Positivediscrepancies were more common among low-income respondents while negativediscrepancies were more common among those of high income. • A significant negative monotonic relation was found between the income recorded in the administrative file and the measurement error. The relation was stronger for negative discrepancies than for positive ones (elasticity -0.5 and -0.2). Romanov&Gubman: Quality of income data in survey vs. administrative file

  16. Key findings and conclusions • High-income persons tend to “forget” income that they received from additional jobs, overtime, self-employed income, fringe benefits, and windfall gains such as bonuses and profit-sharing. As a result, they tend to under-report their labor income in the survey relative to the administrative data. • In contrast, low-income workers, who hold part-time and/or irregular jobs, tend to report the income they receive in a full month of work, a level that may be not representative of their average income. Consequently, they are likely to over-report their income in the survey relative to the administrative data. • This proves that the factors related to negative discrepancies (at the high end of income distribution) are different from those that are associated with negative discrepancies (at the low end of income distribution). Romanov&Gubman: Quality of income data in survey vs. administrative file

  17. Key findings and conclusions • Employees and self-employed should not be pooled into one model due to material differences in the conceptual definition of income, how income is measured, and volatility in income level during the year. • The response time to the gross income question was 27% longer among the self-employed than among employees, indicating that the former found the question harder to answer. Among the self-employed, positive discrepancies were almost twice as frequent as negative ones but the average was close to zero. • Only a few factors were found to be related to discrepancies among the self-employed: subjective variables such as satisfaction with income and expectations of business progress in the near future. Romanov&Gubman: Quality of income data in survey vs. administrative file

  18. Key findings and conclusions • Inserting a question about NET income immediately after inquiring about GROSS income, as a logical way to allow respondents to control their responses, caused 22% of the respondents to go back and check the accuracy of their responses to the gross income question. Reversionto the gross income question during the interview and correction of the answer reduced the discrepancies among both employees and the self-employed. • When analyzing measurement errors, one would be advised to do it separately for the employees and the self-employed, for positive and negative discrepancies, and with different factors for each group. Romanov&Gubman: Quality of income data in survey vs. administrative file

  19. Thank you

More Related