
Urban Audit II Methodological Workshop Paris, 7-8 April 2003 INSEE Dir. Gén. - Room 1245



Presentation Transcript


  1. Urban Audit II Methodological Workshop Paris, 7-8 April 2003 INSEE Dir. Gén. - Room 1245 Estimation for Domains and Small Areas: A REVIEW Prof. Risto Lehtonen University of Jyväskylä, Finland Outline • Definitions of technical concepts • Basic approaches for domain estimation • Computational tools • Selected literature

  2. GENERAL UAII STRATEGY Output harmonisation • Key principle: The use of jointly agreed definitions of the basic concepts behind the indicators • “Definitions Handbook” of UAII • The production of the indicators is based on Current Best Methods (CBMs) available in a country

  3. Current best methods, CBM A CBM for a given indicator in a given country depends on the statistical infrastructure, data availability, the availability of computational tools, and similar conditions • No overall standardized methodology is assumed for every special case in every country • The feasibility of the methods will be assessed by the Landsis expert group, and alternative methods will be proposed if needed • Co-ordination of the methodology will take place, and similar solutions will be encouraged for relevant countries when possible • Here, an active role of the country coordinators is important • See “Methods Handbook” of UAII (forthcoming)

  4. ESTIMATION • Estimation: A procedure to produce the type B indicators for UAII • Alternative data sources for estimation: • Census data, register data (population census, administrative registers, statistical registers, population registers, business registers) • Sample survey data covering only a part of the population • Combinations of census/register data and sample survey data • Unit-level: Data at the micro level • Area-level (domain-level): Data at aggregate level

  5. ESTIMATION FOR DOMAINS Population subgroup = Domain of interest • Non-overlapping subgroups of population • EXAMPLE: Regional subgroups: Nuts4, Nuts5, Core City, LUZ, Sub-City • Breakdown by sex-age groupings within regional areas

  6. ESTIMATION FOR DOMAINS (contd.) Estimation for domains: The estimation of unknown population parameters for domains – regional areas – Core City, LUZ, Sub-City EXAMPLE 1: Use of sample survey data. The estimation of the total number of ILO unemployed for Core City areas using unit-level data from LFS

  7. ESTIMATION FOR DOMAINS (contd.) EXAMPLE 2: Use of register data. The estimation of disposable income per OECD consumer unit for Core City sub-regions using data from a census register EXAMPLE 3: Combined use of register data and sample survey data. The estimation of the total number of ILO unemployed for Core City regions using data from LFS strengthened by data from a register of unemployed job-seekers

  8. EXAMPLE 2: Disposable Income based on register data

  9. SMALL AREA ESTIMATION Small area: Refers to a domain of interest in a sample survey where the number of sampled units happens to be small EXAMPLE: A division of the population by sex-age groupings within Core City regions using data from an LFS

  10. SAE (contd.) Small area estimation (SAE): The estimation of unknown population parameters for small areas NOTE: SAE is a statistical concept specific to sample surveys! EXAMPLE: The estimation of the total numbers of ILO unemployed and ILO employed for Core City regions by sex-age groupings using data from an LFS and the available auxiliary register data sources

  11. DOMAIN ESTIMATION AND SAE • Estimation for domains and small areas using sample survey data: • Large domains (where the number of sampled units is large): Standard statistical techniques usually apply • Small domains (where the number of sampled units is small): Special SAE techniques are often needed

  12. EXAMPLE Simple example: Direct HT estimators for domains • Indicator: ILO unemployment rate • Concepts: ILO unemployed, ILO employed • Definitions: “A person who…” • Data set: LFS, pooled annual data at unit level • Variables: y1 = ILO unemployed (0: No, 1: Yes), y2 = ILO employed (0: No, 1: Yes) • Population parameters: Population totals for regional areas d=1,…,D

  13. EXAMPLE (Contd.) Direct Horvitz-Thompson (HT) estimators of unknown population totals for regional areas d=1,…,D • #Unemployed • #Employed • Indicator: UE rate in area d
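The formulas on this slide did not survive in the transcript. As a minimal sketch of what a direct HT computation looks like, the standard Horvitz-Thompson estimator of a domain total sums y_k / π_k over the sampled units in domain d, where π_k is the inclusion probability of unit k. The unemployment rate below is taken as 100 × unemployed / (unemployed + employed), which is an assumption about how the indicator is defined. The function names, the record layout, and the toy data are all hypothetical and are not taken from the presentation or from any LFS.

```python
from collections import defaultdict


def ht_domain_totals(sample):
    """Direct Horvitz-Thompson totals per domain.

    sample: iterable of (domain, inclusion_prob, y1, y2) records, where
    y1 = 1 if ILO unemployed and y2 = 1 if ILO employed (else 0).
    Only y-data from domain d contribute to the estimate for domain d.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for domain, pi, y1, y2 in sample:
        weight = 1.0 / pi  # design weight a_k = 1 / pi_k
        totals[domain][0] += weight * y1
        totals[domain][1] += weight * y2
    return {d: (t[0], t[1]) for d, t in totals.items()}


def ue_rate(unemployed, employed):
    """Unemployment rate in percent (assumed: share of the labour force)."""
    return 100.0 * unemployed / (unemployed + employed)


# Hypothetical toy sample: (domain, inclusion probability, y1, y2).
sample = [
    ("A", 0.01, 1, 0),
    ("A", 0.01, 0, 1),
    ("A", 0.02, 0, 1),
    ("A", 0.02, 0, 0),  # outside the labour force
    ("B", 0.05, 1, 0),
    ("B", 0.05, 0, 1),
]

totals = ht_domain_totals(sample)
rates = {d: ue_rate(u, e) for d, (u, e) in totals.items()}
```

With only a handful of sampled units per domain, as here, such direct estimates are very unstable, which is the situation the small area techniques on the later slides address.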

  14. BASIC APPROACHES (1) Direct methods: Use of y-data from domain d only. EXAMPLE: HT estimator (2) Indirect methods: Use of y-data from other domains also (“borrowing strength”). EXAMPLE: EBLUP estimator (3) Design-based methods: Design-based model-assisted estimation. Option: Use of models and auxiliary information. EXAMPLE: Generalized regression estimator GREG (4) Model-based methods: Model-dependent estimators. Option: Use of models and auxiliary information. EXAMPLE: Synthetic estimator

  15. BASIC APPROACHES (Contd.) (5) Methods based on data at unit (micro) level. EXAMPLES: GREG estimator, unit-level EBLUP estimator (6) Methods based on aggregate data at domain level. EXAMPLE: Fay-Herriot estimator NOTE: (1) – (6) are aspects of estimation methodologies for domains and small areas based on the use of combinations of sample survey data with auxiliary information from administrative registers and similar data sources

  16. TRADITIONAL DIRECT ESTIMATORS

  17. TRADITIONAL INDIRECT ESTIMATORS

  18. TRADITIONAL INDIRECT ESTIMATORS (Contd.)

  19. CURRENT SAE METHODS Key features: • Use of complex statistical models • Incorporation of efficient auxiliary information into the estimation procedure • Borrowing strength if possible • In temporal dimension • In spatial dimension

  20. EURAREA Project: Enhancing Small Area Estimation Techniques to Meet European Needs EURAREA is a three-year project, funded by the European Community, which is being undertaken by a consortium of 6 European National Statistical Institutes (NSIs) and 5 universities, covering 7 European countries. The overall aim of the project is to improve small area estimation methods currently used within European NSIs. The project runs from 1st January 2001 to 31st December 2003. Main project objectives: • To assess the effectiveness of 'standard' small area estimation techniques in estimating European data, taking into account the survey designs used to collect the data. • To develop enhancements to the 'standard' techniques which reflect the requirements, and strengths, of European statistical systems. • To provide an extensive external validation of the estimators, which will be carried out using design-based simulation experiments conducted using real (or as real as possible) population data from six of the seven European countries involved. • To ensure the results of the project can be easily implemented by third parties, we will be producing fully documented and tested pieces of program code written in the SAS language to accompany the theory and results. • To ensure the project results are effectively disseminated, a 'project reference volume' will be produced at the end of the project containing all the outcomes from this project. This will be available on this website and presented at an end-of-project conference during 2004. Expected achievements/impact: The results of this validation exercise will enable statisticians from countries all over Europe to make an informed judgement about the practical benefits of adopting small area estimation methods; and the rest of the project will provide them with the theory and software needed to apply the methods. In addition, the enhancements made to the standard methods as a result of this project should provide a greater range of small area estimators at a statistician's disposal, which will result in improved estimates using more appropriate estimators.

  21. EURAREA Project (Contd.) Objectives: The first part of the project consists of assessing the effectiveness of 'standard' small area estimation techniques. By 'standard' techniques we mean the techniques of domain estimation (synthetic estimators, GREGs, and composite estimators) which entered into use in the United States and Canada in the 1980s, and have been the subject of steady theoretical refinement since. In this part of the project we will be focusing on up-to-date, but relatively straightforward, linear and logistic versions of these estimators. The project will be assessing their effectiveness in estimating European data, taking into account the survey designs used to collect the data. In the main, theoretically innovative, part of the project we will be enhancing the 'standard' techniques in four major ways which reflect the requirements, and strengths, of European statistical systems. The four major themes for the innovative research will be: • borrowing strength over time - using time series data • borrowing strength over space (i.e. taking account of spatial correlation and allowing for the modifiable areal unit problem) • investigating the effect of complex sample designs and developing sample design criteria that are optimal for small area estimation • providing improved estimates of cross-classifications (using a modified version of the SPREE approach) PARTNERS: ONS, U. Southampton, Statistics Finland, U. Jyväskylä, SCB, ISTAT, U. Roma 3, Statistics Norway, INE, UMH, U. Poznan See: www.statistics.gov.uk/methods_quality/eurarea/

  22. CURRENT SAE METHODS (Contd.) Duality between estimator and model • Estimators of parameters for areas d=1,…,D. EXAMPLE: Area totals, Area means • Synthetic estimator • GREG estimator

  23. CURRENT SAE METHODS (Contd.) • Fitted (predicted) values for all population elements are obtained from the same model by $\hat{y}_k = f(x_k; \hat{\beta}, \hat{u}_d)$, where $f$ refers to the functional form of the model, $x_k$ are the vectors of auxiliary variable values, $\hat{\beta}$ are the estimated fixed effects, and $\hat{u}_d$ are the estimated domain-specific random effects, d=1,…,D
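The estimator formulas for slides 22 and 23 are missing from the transcript. In their usual textbook form, the synthetic estimator of a domain total is the sum of the fitted values over all population elements of the domain, and the GREG estimator adds to that sum the design-weighted residuals y_k − ŷ_k of the sampled elements in the domain. The sketch below illustrates this duality with the simplest possible model: a linear model with fixed effects only and no domain-specific random effects, which is a deliberate simplification of the mixed models mentioned on the slides. All function names and data are hypothetical.

```python
import numpy as np


def fit_linear(x_s, y_s, a_s):
    """Weighted least squares fit of y = b0 + b1 * x on the whole sample.

    a_s are design weights (1 / inclusion probability). Fixed effects only.
    """
    X = np.column_stack([np.ones(len(x_s)), x_s])
    W = np.diag(a_s)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y_s)


def predict(beta, x):
    """Fitted values from the common model, for any set of elements."""
    return beta[0] + beta[1] * np.asarray(x, dtype=float)


def synthetic_total(beta, x_pop_d):
    """Synthetic estimator: sum of fitted values over the domain population."""
    return float(predict(beta, x_pop_d).sum())


def greg_total(beta, x_pop_d, x_s_d, y_s_d, a_s_d):
    """GREG estimator: synthetic part plus weighted sample residuals in d."""
    residuals = np.asarray(y_s_d, dtype=float) - predict(beta, x_s_d)
    correction = float((np.asarray(a_s_d, dtype=float) * residuals).sum())
    return synthetic_total(beta, x_pop_d) + correction


# Hypothetical data: a pooled sample from two domains and register x-values.
x_s = np.array([1.0, 2.0, 3.0, 4.0])
y_s = np.array([3.0, 5.0, 8.0, 10.0])
a_s = np.array([2.0, 2.0, 2.0, 2.0])
dom_s = np.array([0, 0, 1, 1])
x_pop = {0: np.array([1.0, 2.0, 1.5, 2.5]), 1: np.array([3.0, 4.0, 3.5, 4.5])}

beta = fit_linear(x_s, y_s, a_s)
estimates = {
    d: (
        synthetic_total(beta, x_pop[d]),
        greg_total(beta, x_pop[d], x_s[dom_s == d], y_s[dom_s == d], a_s[dom_s == d]),
    )
    for d in x_pop
}
```

Both estimators use the same fitted values; they differ only in the residual correction, which uses y-data from the domain itself and is what protects GREG against a misspecified model at the cost of higher variance in small domains.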

  24. CURRENT SAE METHODS (Contd.) NOTE: The model used in the estimation of the predicted values can be complex, for example: • Family of generalized linear mixed models • Linear mixed models (multilevel model) • Nonlinear mixed models • The models also can incorporate temporal data and/or spatial data by using appropriate parametrization

  25. CURRENT MODEL-BASED METHODS • Composite estimators • Empirical Best Linear Unbiased Predictor, EBLUP • “Pseudo” EBLUP incorporating sampling weights • Bayesian techniques • Empirical Bayes (EB) procedures • Hierarchical Bayes (HB) procedures • Markov Chain Monte Carlo (MCMC) techniques
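To make the idea of a composite estimator concrete, the sketch below combines a direct estimate and a synthetic estimate for one area using the shrinkage weight familiar from area-level (Fay-Herriot type) models: the weight on the direct estimate is the model variance divided by the sum of the model variance and the sampling variance of the direct estimate. It assumes that both variances are known, so it is a BLUP-style illustration; an EBLUP would estimate the model variance from the data, which is not shown here. The function name and the numbers are hypothetical.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Composite of a direct and a synthetic estimate for one area.

    gamma = model_var / (model_var + sampling_var) is the weight on the
    direct estimate. Both variances are assumed known in this sketch.
    Returns (estimate, gamma).
    """
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma


# Hypothetical area with a noisy direct estimate: most weight goes to the
# synthetic (model-based) part.
small_area, gamma_small = composite_estimate(
    direct=10.0, synthetic=20.0, sampling_var=3.0, model_var=1.0
)

# Hypothetical area with a precise direct estimate: the composite stays
# close to the direct estimate.
large_area, gamma_large = composite_estimate(
    direct=10.0, synthetic=20.0, sampling_var=0.1, model_var=1.0
)
```

The weight adapts to the domain sample size through the sampling variance, so large domains are left close to their direct estimates while small domains borrow strength from the model.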

  26. CURRENT DESIGN-BASED METHODS • Extended family of generalized regression (GREG) estimators • Multilevel-model assisted GREG estimators • Multinomial logistic GREG estimators

  27. COMPUTATIONAL TOOLS Design-based techniques • SAS procedures and macros: PROC SURVEYMEANS – HT estimation for domains; PROC SURVEYREG – Direct linear regression estimation; Macro CLAN – Linear GREG estimation and calibration; others may exist as well… • Other software products: Stata, S+ and R programs,…

  28. COMPUTATIONAL TOOLS (Contd.) Model-based techniques • SAS procedures: PROC MIXED – linear mixed models, EBLUP, “Pseudo” EBLUP type estimators • MLwiN, HLM – Multilevel modelling • WinBUGS – Bayesian techniques, MCMC • Other software products: Stata, S+ and R programs,…

  29. SELECTED LITERATURE Ghosh M. (2001) Model-dependent small area estimation: Theory and practice. In Lehtonen R. and Djerf K. (eds.) Lecture Notes on Estimation for Population Domains and Small Areas. Helsinki: Statistics Finland, Reviews 2001/5, 51-108. Lehtonen R., Särndal C.-E. and Veijanen A. (2003) The effect of model choice in estimation for domains, including small domains. Survey Methodology 29 (in press). Rao J.N.K. (1999) Some recent advances in model-based small area estimation. Survey Methodology 25, 175-186.

  30. SELECTED LITERATURE (Contd.) Rao J.N.K. (2003) Small Area Estimation. New York: Wiley. Särndal C.-E., Swensson B. and Wretman J. (1992) Model Assisted Survey Sampling. New York: Wiley. Särndal C.-E. (2001) Design-based methodologies for domain estimation. In Lehtonen R. and Djerf K. (eds.) Lecture Notes on Estimation for Population Domains and Small Areas. Helsinki: Statistics Finland, Reviews 2001/5, 5-49.
