Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009). Accuracy evaluation of NUTS level 2 hypercubes with the adoption of a sampling strategy in the 2011 Italian Population Census. Giancarlo Carbonetti, Mariangela Verrascina
Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009)
Accuracy evaluation of NUTS level 2 hypercubes with the adoption of a sampling strategy in the 2011 Italian Population Census
Giancarlo Carbonetti, Mariangela Verrascina
Istat – Italian National Institute of Statistics, Division for General Censuses
Geneva, October 29th 2009
Why do we adopt sampling techniques in the Italian Census?
The 2011 population census has been planned in order to:
• improve the efficiency of the survey operations;
• reduce the workload of the municipalities;
• minimize the statistical burden for the people.
The main solutions proposed are related to:
• use of population registers;
• census forms mail out;
• mixed mode of data collection.
A high response rate is needed. Sampling is crucial for the new census strategy.
What are the effects of adopting a sampling technique?
Advantages
• A high level of quality can be maintained by reducing the sources of non-sampling error → this is an opportunity
• Timeliness, since a smaller amount of data has to be processed (hypercubes must be delivered to Eurostat by 1 April 2014) → this is a constraint
Disadvantages
• Introduction of sampling error → an evaluation of the accuracy of the sampling estimates is required
The framework
• POPULATION: private households.
• LISTS: population registers managed by the municipalities.
• VARIABLES: non-demographic variables.
• DOMAINS: Census Areas.
• DIFFERENT STRATEGY: municipality demographic thresholds.
A simulation study has been conducted in order to identify the methodology that yields the most accurate estimates.
Some results of the simulation study
• Simple Random Sampling of HOUseholds (SRSHOU) from population registers.
• Area Frame Sampling where reliable population registers are not available.
• Calibrated Estimators.
• Definition of Census Areas of about 15,000 inhabitants.
• Sampling Ratio of 33%.
Distribution of average and maximum cv% by class of cell counts for the three tested sampling ratios (SRSHOU design)
• for cells of 1,000 units cv is about 4%
• for cells of 100 units cv is about 13%
• for cells of 10 units cv is about 40%
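The order of magnitude of these figures can be checked with a textbook approximation. The sketch below is not the simulation used in the study: it assumes simple random sampling of persons (not households), a plain expansion estimator (not the calibrated one) and a cell that is a small share of the population, so that cv ≈ sqrt((1 − f) / (f · cell count)), with f the sampling ratio. The function name is hypothetical.

```python
import math


def approx_cv_percent(cell_count, sampling_ratio):
    """Approximate cv% of an estimated cell count.

    Assumptions (not from the slides): simple random sampling of persons,
    expansion estimator, and a cell that is a small share of the population,
    so that cv ~= sqrt((1 - f) / (f * cell_count)).
    """
    if cell_count <= 0 or not 0 < sampling_ratio <= 1:
        raise ValueError("cell_count must be > 0 and sampling_ratio in (0, 1]")
    return 100.0 * math.sqrt((1.0 - sampling_ratio) / (sampling_ratio * cell_count))


# Cell sizes quoted on the slide, evaluated at the 33% sampling ratio.
for count in (1000, 100, 10):
    print(count, round(approx_cv_percent(count, 0.33), 1))
```

Under these assumptions the approximation gives roughly 4.5%, 14% and 45% for cells of 1,000, 100 and 10 units, slightly above the simulated values, which is plausible given that the study relies on calibrated estimators.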
Sampling error curves drawn from the simulation results
Relevant issue
• What is the impact of the sampling error on the dissemination hypercubes?
The answer is the core of this presentation, which explains the impact of the sampling strategy on the final results.
Impact of sampling errors on dissemination hypercubes
Having chosen the sampling strategy (SRSHOU; calibrated estimator), for an area and a dissemination hypercube:
“When can the quality of the statistical table be considered acceptable?”
For a fixed cv (for instance, a critical level of 12.5%), the global quality of a dissemination hypercube can be considered acceptable:
→ if the percentage of cell counts estimated with a cv higher than the critical value is low (Example 1: fewer than 1/3 of cell counts have a cv > 12.5%);
→ if the percentage of persons classified in those cells is low (Example 2: fewer than 10% of persons are classified in cells where cv > 12.5%).
Evaluation of the sets of estimates with critical accuracy by means of a sampling errors curve
The lower the amount of information estimated with high levels of cv (referred to persons classified in cells with absolute frequencies lower than the threshold TS), the higher the quality of the related dissemination hypercubes.
[Figure: sampling error (cv) plotted against absolute frequencies. The critical threshold cv_max = 12.5% identifies the frequency TS on the curve. Absolute frequencies below TS form the set of estimates with cv > 12.5% (high sampling errors, critical quality); frequencies above TS form the set of estimates with cv < 12.5%.]
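In the study the threshold TS is read off the sampling errors curve obtained by simulation. As a rough closed-form stand-in, the sketch below inverts the textbook approximation cv ≈ sqrt((1 − f) / (f · frequency)), which assumes simple random sampling of persons and a small cell; the function name and the formula are illustrative assumptions, not the authors' procedure.

```python
def critical_threshold(cv_max_percent, sampling_ratio):
    """Absolute frequency below which the approximate cv exceeds cv_max.

    Assumption (not from the slides): cv ~= sqrt((1 - f) / (f * frequency)),
    i.e. simple random sampling of persons and a small cell. Solving for the
    frequency gives TS = (1 - f) / (f * cv_max**2).
    """
    if cv_max_percent <= 0 or not 0 < sampling_ratio <= 1:
        raise ValueError("cv_max_percent must be > 0 and sampling_ratio in (0, 1]")
    cv_max = cv_max_percent / 100.0
    return (1.0 - sampling_ratio) / (sampling_ratio * cv_max ** 2)


# Critical level of 12.5% with a one-third sample.
ts = critical_threshold(12.5, 1 / 3)
print(round(ts))
```

With cv_max = 12.5% and a one-third sample, this approximation puts TS at about 128 units; the empirical curve from the simulation may give a different value.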
Quality evaluations for hypercubes at NUTS level 2
• Evaluations are related to 8 Eurostat hypercubes crossing demographic variables with one or more long form variables and referred to NUTS level 2.
• The considered hypercubes contain topics with breakdowns used in the 2001 Italian Census dissemination, close (in terms of number and information content) to the breakdowns to be provided for the next census round.
• The number of cells ranges from 1,000 to more than 20,000, depending on the complexity of the statistical table.
Hypercubes at NUTS level 2 considered in the study (draft version, April 2009)
Hypercube computations are simulated with 2001 Census data.
Long Form variables:
• Each non-demographic variable has been individually crossed with sex and age (single ages).
• Combinations of more than one non-demographic variable have been crossed with sex and age (age classes).
Number of potential cells and acceptable cells for each hypercube considered in the study
Number of potential cells = the product of the number of categories of the crossed variables
Number of acceptable cells = the number of potential cells without “structural zeros”
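As a small worked example of the first definition, the sketch below multiplies the category counts of the two hypercubes used in the examples of this presentation (H.B1.E1.R3 and H.B1.E1.R4). The function name is hypothetical; the number of structural zeros cannot be derived from the category counts alone, so the acceptable cells are not computed here.

```python
import math


def potential_cells(category_counts):
    """Number of potential cells = product of the numbers of categories."""
    return math.prod(category_counts)


# Category counts as listed for the two example hypercubes.
h_b1_e1_r3 = (2, 21, 6, 17, 7)   # sex, age, current activity status, industry, education
h_b1_e1_r4 = (2, 13, 10, 17, 7)  # sex, age, occupation, industry, education

print(potential_cells(h_b1_e1_r3))  # 29988
print(potential_cells(h_b1_e1_r4))  # 30940
```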
Indicators of global accuracy
Two indicators are proposed to measure the global accuracy of census data produced by adopting a sampling strategy and referred to a dissemination hypercube:
1) Percentage of critical cells = number of cell counts (>0) lower than the critical threshold TS / number of acceptable cells
2) Percentage of persons in critical cells = persons classified in critical cells / total of persons
In particular, the second indicator quantifies the percentage of people classified in cells which will be estimated with a low accuracy (10% could be considered a tolerable limit).
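The two indicators can be sketched as follows. The function name, the input layout (one count per acceptable cell, structural zeros already excluded) and the sample counts are illustrative assumptions; the threshold would normally come from the sampling errors curve.

```python
def accuracy_indicators(acceptable_cell_counts, threshold):
    """Return (% of critical cells, % of persons in critical cells).

    acceptable_cell_counts: one count per acceptable cell, i.e. structural
    zeros are assumed to be excluded already (empty cells may remain).
    A cell is critical if its count is > 0 and lower than the threshold.
    """
    counts = list(acceptable_cell_counts)
    total_persons = sum(counts)
    if not counts or total_persons == 0:
        raise ValueError("need at least one acceptable cell with persons in it")
    critical = [c for c in counts if 0 < c < threshold]
    pct_critical_cells = 100.0 * len(critical) / len(counts)
    pct_persons_in_critical = 100.0 * sum(critical) / total_persons
    return pct_critical_cells, pct_persons_in_critical


# Made-up counts for six acceptable cells, with a hypothetical threshold of 128.
pct_cells, pct_persons = accuracy_indicators([5, 50, 0, 200, 1000, 3000], 128)
print(round(pct_cells, 2), round(pct_persons, 2))
```

In this made-up table one third of the cells are critical, yet they hold only about 1.3% of the persons, which illustrates why the second indicator is the more informative of the two.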
Example 1: Hypercube H.B1.E1.R3. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia
Hypercube H.B1.E1.R3: sex (2) by age (21) by current activity status (6) by industry (17) by educational attainment (7).
Number of acceptable cells = 5,574 (structural zeros excluded).
A cell is critical if its absolute frequency is lower than the threshold TS corresponding to cv_max = 12.5%.
Example 2: Hypercube H.B1.E1.R4. Quality indicators related to NUTS2 areas of Italy: Molise, Marche and Sicilia
Hypercube H.B1.E1.R4: sex (2) by age (13) by occupation (10) by industry (17) by educational attainment (7).
Number of acceptable cells = 26,350 (structural zeros excluded).
A cell is critical if its absolute frequency is lower than the threshold TS corresponding to cv_max = 12.5%.
Expected quality for hypercubes at NUTS level 2
Distribution of all 20 Italian NUTS2 areas by percentage of persons classified in critical cells, for the Eurostat hypercubes considered in the study and the three tested sampling ratios.
Concluding remarks
• The adoption of a sampling strategy does not seem to reduce accuracy.
• The sampling error could have a considerable impact only on the estimates of very small frequencies.
• NUTS2 hypercubes of different complexity could be estimated with good accuracy even with lower sampling ratios.
• The revised version of the hypercubes considered in this work seems to be less detailed. This should improve accuracy.
Some solutions to enhance accuracy
Estimates regarding rare events and small domains can be enhanced in order to increase their efficiency and to reduce the number of critical cells:
• Adopting small area estimators.
• Increasing the set of variables to be observed on the whole population and reducing the set of variables to be surveyed on samples of households, that is, adopting a medium/long form.
Thank you for your attention.
carbonet@istat.it - verrasci@istat.it