
SAE-SPORD Project Team Statistics Research and Innovation Division Statistics Canada, Ottawa

Statistics Canada’s Small Area Estimation Product: BUPF 1.0 (Best Unbiased Prediction via Filtering). SAE-SPORD Project Team Statistics Research and Innovation Division Statistics Canada, Ottawa (for presentation to FLMM_LMIWG Workshop on Oct 17, 2007, Vancouver, BC).


Presentation Transcript


  1. Statistics Canada’s Small Area Estimation Product: BUPF 1.0 (Best Unbiased Prediction via Filtering) SAE-SPORD Project Team, Statistics Research and Innovation Division, Statistics Canada, Ottawa (for presentation to FLMM_LMIWG Workshop on Oct 17, 2007, Vancouver, BC)

  2. Project: SAE-SPORD (Small Area Estimation for Statistical Product Oriented R&D) Team: Avi Singh (Project Leader), François Verret, Claude Nadeau, Pin Yuan. Acknowledgments: Meth Res Block Fund, Labour Stat Div, FLMM-LMIWG

  3. Outline 1. SAE: Introduction 2. SAE: Visual Depiction 3. Product BUPF: Description 4. BUPF Application to Labour Force Survey 5. BUPF Demonstration (GUI Sample Screen-shots) 6. Concluding Remarks and Future Work

  4. 1. SAE: Introduction • Direct estimates for small areas (or domains) not reliable; e.g., for provinces, annual LFS estimates of Managers in Manufacturing and Utilities (a three-digit occupation code A39) are not reliable. Here provinces could be deemed as small areas. • Data Requirements: Provincial estimates of employment by 3-digit occupation codes

  5. 1. SAE: Introduction …cont. • Need more sample to get more reliable estimates • A cost-effective alternative: use a model such as the common mean model; e.g., the proportion employed in A39 is common across provinces • Quality of estimates depends on the validity of the model.

  6. 1. SAE: Introduction …cont. • Model provides an indirect (or synthetic) estimate at the area level. • For the common mean model, multiply the national total by the provincial population proportion to get the indirect estimate, e.g., for NL • about 1.7% times 92,734 ≈ 1,582
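The common mean calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `synthetic_estimate` is hypothetical, and the share is entered as the rounded 1.7% quoted on the slide. With that rounded share the product comes to about 1,576, slightly below the slide's figure of about 1,582, presumably because the slide used an unrounded share (an assumption, since the unrounded value is not given).

```python
def synthetic_estimate(national_total: float, population_share: float) -> float:
    """Indirect (synthetic) estimate for one area under the common mean model.

    The national total is allocated to the area in proportion to its
    share of the population.
    """
    return national_total * population_share


# Figures quoted on the slide: national total 92,734, NL share about 1.7%.
nl_estimate = synthetic_estimate(92_734, 0.017)
print(round(nl_estimate))  # 1576 with the rounded share
```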

  7. 1. SAE: Introduction …cont. • A combination of the two estimates (direct and indirect) may provide a reasonable estimate with adequate precision, depending on the level of small area. • The direct estimate is unbiased but not precise, while the indirect estimate is generally precise but biased.

  8. 1. SAE: Introduction …cont. • SAE combines the direct and the indirect in an optimal way: • SAE for Area d = (shrinkage factor for d) x (direct estimate for d) + (1 - shrinkage factor for d) x (indirect estimate for d) • If the shrinkage factor is 10%, then only 10% of direct and 90% of indirect are used for SAE. If it is 50%, then both direct and indirect have equal say in compositing the two for SAE.
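The composite formula on this slide translates directly into code. The sketch below uses a hypothetical function name and made-up input values purely for illustration; it is not taken from BUPF.

```python
def composite_estimate(direct: float, indirect: float, shrinkage: float) -> float:
    """Weighted combination of a direct and an indirect estimate.

    `shrinkage` is the weight given to the direct estimate; the
    indirect estimate receives the remaining weight.
    """
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage factor must lie in [0, 1]")
    return shrinkage * direct + (1.0 - shrinkage) * indirect


# Hypothetical inputs: a direct estimate of 2000 and an indirect estimate of 1000.
print(composite_estimate(direct=2000.0, indirect=1000.0, shrinkage=0.10))
print(composite_estimate(direct=2000.0, indirect=1000.0, shrinkage=0.50))
```

With a 10% shrinkage factor the result stays close to the indirect estimate; with 50% it falls halfway between the two.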

  9. 1. SAE: Introduction …cont. • The relative size of the shrinkage factor depends on variability in modeling error (in the indirect estimate) and sampling error (in the direct estimate). • Effective sample size for SAE is more than that for the direct estimate.
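The slides do not give a formula for the shrinkage factor. A common choice in area-level models of this kind is the ratio of the model error variance to the sum of the model error and sampling error variances; the sketch below assumes that form, and the function and parameter names are hypothetical.

```python
def shrinkage_factor(model_variance: float, sampling_variance: float) -> float:
    """Weight on the direct estimate, assuming the standard area-level form.

    A large sampling variance (an imprecise direct estimate) pushes the
    factor toward 0; a large model variance (an unreliable indirect
    estimate) pushes it toward 1.
    """
    if model_variance < 0 or sampling_variance < 0:
        raise ValueError("variances must be non-negative")
    total = model_variance + sampling_variance
    if total == 0:
        raise ValueError("at least one variance must be positive")
    return model_variance / total


# Hypothetical variances: equal variances give equal weight to both estimates.
print(shrinkage_factor(model_variance=1.0, sampling_variance=1.0))
print(shrinkage_factor(model_variance=1.0, sampling_variance=9.0))
```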

  10. 1. SAE: Introduction (Modeling Requirements) • Direct estimates from other small areas (termed indirect data) are needed for modeling purposes, i.e., for predicting the estimate for the area of interest. • Need enough small areas for adequate modeling, so subdivide provinces into subprovincial areas: • ER (economic region) or ER by age by gender instead of province, although it is the province level that is of interest.

  11. 1: SAE: Introduction (Modeling Requirements) • Beneficial to have an Auxiliary Information Source (Administrative/ Census): need true population totals at the area level for all areas. • Using auxiliary source can improve modeling with the indirect data.

  12. 1. SAE: Introduction (Modeling Requirements…cont.) • Examples of Auxiliary Information for LFS Application • Administrative Source: number of employment beneficiary claims at the area level; number with employment income • Population Census based demographic projections: subpopulation counts

  13. 1. SAE: Introduction (Modeling Requirements) • The model predictor based on indirect data and auxiliary data provides an indirect estimate for the area of interest. • The model can be simple, such as the common mean model, which uses no auxiliary data, or it can be more advanced.

  14. 1: SAE: Introduction (Modeling Requirements) • All indirect estimates are biased but bias can be low if model is good. • Combining direct and indirect estimates gives rise to estimates more precise than either one. • Benchmarking (Sum of small area total estimates within a subgroup of areas equals the direct estimate of the subgroup) helps in reducing model bias.
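The benchmarking condition described here (small area estimates within a subgroup summing to the subgroup's direct estimate) can be illustrated with a simple ratio adjustment. Note that this is a stand-in technique chosen for brevity: BUPF is described as self-benchmarking, which builds the constraint into the estimation itself, and the slides do not spell out that method. All names and values below are hypothetical.

```python
def ratio_benchmark(area_estimates: dict[str, float], direct_total: float) -> dict[str, float]:
    """Scale area estimates so that they sum to the direct subgroup total."""
    model_total = sum(area_estimates.values())
    if model_total == 0:
        raise ValueError("area estimates sum to zero; cannot rescale")
    factor = direct_total / model_total
    return {area: estimate * factor for area, estimate in area_estimates.items()}


# Hypothetical small area estimates that sum to 1000, while the direct
# estimate for the whole subgroup is 1100.
unbenchmarked = {"area_a": 400.0, "area_b": 600.0}
benchmarked = ratio_benchmark(unbenchmarked, direct_total=1100.0)
print(benchmarked)
print(sum(benchmarked.values()))
```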

  15. SAE: Introduction (User Concerns) • Detailed area-level requirements may vary from user to user. • However, cannot go to a very low level for two reasons: precision of SAEs may not be adequate, and auxiliary data may not be available. • Bias concerns due to use of indirect estimates for borrowing information; models may not be perfect but one chosen with care may be useful. • SAE methodology involves a trade-off between bias and precision

  16. SAE: Introduction (User Concerns…cont.) • External validation of SAE can be done periodically using the census. • Validation by ‘local area’ knowledge is also possible. • Confidentiality concerns (these may or may not be a problem: the smaller the area, the larger the error in the SAE, which gives built-in protection)

  17. 2. SAE: A Visual Depiction For Employment in A39 • However, with the usual SAE model the overall total is not preserved!

  18. 2. SAE: A Visual Depiction...cont. For Employment in A39 • Benchmarking ensures that the total stays the same after modeling

  19. 3. Product BUPF: Description • STC’s SAE product is based on client needs identified at the SAE Workshop in Feb ’05 (see www.flmm-lmi.org for proceedings) • Main Features • Menu-driven software system • Sampling design is fully taken into account • Self-benchmarking for protection against model breakdowns • Area collapsing to include areas with no or few observations in the modeling process • Extensive model diagnostics and evaluation of estimates • Existing software packages (such as SAS PROC MIXED, MLwiN, WinBUGS) are not satisfactory

  20. 3. Product BUPF 1.0: Description • Part I : Data Preparations • Part II: Modeling Preparations • Part III: Model Selection and Diagnostics • Part IV: Small Area Estimation and Evaluation • Part V: Summary Report

  21. 4. BUPF Application to LFS • Empirical results presented here are not yet final. • Two main components of the product • Modeling component (for increasing effective sample size) • Estimation component (combining direct and indirect)

  22. 4. BUPF Application to LFS…cont. • Model: Direct Estimate for Area d = True Value + Sampling Error • True Value = Predictor + Model Error • Predictor = x1β1 + x2β2 + …; it gives rise to indirect or synthetic estimates. • X-variables considered: number with reported income, number of employment beneficiaries, age-sex counts, etc., all at the small area level
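The two-level model on this slide can be written compactly as below. The symbols are assumptions introduced here for illustration and do not appear in the slides: \hat{\theta}_d is the direct estimate for area d, \theta_d the true value, e_d the sampling error, v_d the model error, and x_{dk} the k-th auxiliary variable for area d.

```latex
\begin{align}
\hat{\theta}_d &= \theta_d + e_d, \\
\theta_d &= x_{d1}\beta_1 + x_{d2}\beta_2 + \dots + v_d .
\end{align}
```

The predictor x_{d1}\beta_1 + x_{d2}\beta_2 + \dots on its own is the indirect (synthetic) estimate for area d.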

  23. 5. STC’s SAE Product Demonstration BUPF 1.0 Demo

  24. Part I: Data Preparations

  25. Part II: Modeling Preparations

  26. Part II: Modeling Preparations

  27. Part III: Model Selection and Diagnostics

  28. Part III: Model Selection and Diagnostics

  29. Part IV: Small Area Estimation and Evaluation

  30. 6. Concluding Remarks and Future Work • Several unique features in the BUPF product for SAE such as self-benchmarking, domain collapsing for nonsampled domains, and extensive diagnostics. • The Graphical User Interface (GUI) for the product is useful as a systematic checklist or as a virtual analyst for efficient production; also useful for training and product demonstration.

  31. 6. Concluding Remarks and Future Work • Complete the beta version of BUPF 1.0; the current version is only an alpha (prototype) and is not suitable for production. • Plan a validation study with Census 2006.

  32. For more information, please contact avi.singh@statcan.ca Thank you…Merci

  33. Appendix Product BUPF 1.0: Detailed Description

  34. A1. Product BUPF 1.0: Description • Part I : Data Preparations • M1 : Data Specification • M2 : Task Specification • The definition of Small Area Modeling domains (SAM domains) is very important • Direct estimates, population counts and auxiliary data must be available at this level • # of SAM domains should be high enough for proper modeling • Here, SAM domain = ER(73) by Age(4) by Gender(2)
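As a quick check on whether the number of SAM domains is "high enough", the cross-classification on this slide can be enumerated. The sketch below only counts the cells implied by ER(73) by Age(4) by Gender(2); the integer labels are placeholders, not actual region or age-group codes.

```python
from itertools import product

N_ER, N_AGE, N_GENDER = 73, 4, 2  # category counts quoted on the slide

# Each SAM domain is one (ER, age group, gender) cell.
sam_domains = list(product(range(N_ER), range(N_AGE), range(N_GENDER)))
print(len(sam_domains))  # 584
```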

  35. A2. Product BUPF 1.0: Description • Part II : Modeling Preparations • M3 : Benchmark Constraints & Baseline Model • Self-benchmarking is important to protect against model breakdowns as no model is perfect • Option: No BC, Global BC, Regional BC • M4 : Domain Collapsing • Improved alternative to leaving small sample size SAM domains outside of the model • M5 : Variance Smoothing

  36. A3. Product BUPF 1.0: Description • Part III : Model Selection and Diagnostics • M6 : Model Selection • Standard Forward and Backward procedures implemented • M7 : Variance Component • Needed to find the proper shrinkage to move indirect to direct • M8 : Innovation Sequence • Makes it possible to diagnose the model with standard “iid N(0,1)” error tests • M9 : Model Diagnostics • Residual Plots, QQ-plots, R-square, Chi-square test for overdispersion and for model adequacy…

  37. A4. Product BUPF 1.0: Description • Part IV : Small Area Estimation and Evaluation • M10 : Small Area Estimation • M11 : Evaluation of Estimates • Check for relative difference between direct and SAE • Other measures
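The relative difference check in M11 is not defined in the slides. One plausible definition, assumed here, is the difference between the SAE and the direct estimate expressed as a fraction of the direct estimate; the function name and input values are hypothetical.

```python
def relative_difference(direct: float, sae: float) -> float:
    """Relative difference of the SAE from the direct estimate.

    Positive values mean the SAE is above the direct estimate,
    negative values mean it is below.
    """
    if direct == 0:
        raise ValueError("direct estimate is zero; relative difference undefined")
    return (sae - direct) / direct


# Hypothetical values: the SAE sits 5% below the direct estimate.
print(f"{relative_difference(direct=2000.0, sae=1900.0):.1%}")
```

Large relative differences for areas with adequate sample sizes could then be flagged for a closer look at the model.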

  38. A5. Product BUPF 1.0: Description • Part V : Summary Report • M12 : Overall Summary • Sampling Design and Data Sources (Part I) • Input Diagnostics (Part II) • Modeling Diagnostics (Part III) • Output Diagnostics (Part IV)
