APPLICATION OF THE DEVELOPED SAS MACRO FOR EDITING AND IMPUTATION AT STATISTICS LITHUANIA



  1. APPLICATION OF THE DEVELOPED SAS MACRO FOR EDITING AND IMPUTATION AT STATISTICS LITHUANIA. Jurga Rukšėnaitė, Chief specialist, Methodology and Quality division. Work Session on Statistical Data Editing

  2. TOPICS
     • Methods of detection of errors and outliers
     • Methods of data imputation
     • The use of the developed SAS Macro at Statistics Lithuania (practical example)

  3. I Methods of detection of errors and outliers
     For quantitative variables:
     • Universal method
     • Interval method
     • Standard deviation rule
     • Testing of hypothesis

  4. I.1 Universal method

  5. I.2 Interval method

  6. I.3 Standard deviation rule
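
A minimal Python sketch of the standard deviation rule, assuming its common form: a value is flagged when it lies more than k sample standard deviations from the sample mean. The function name `sd_rule_outliers`, the threshold `k`, and the sample data are illustrative assumptions and are not taken from the SAS Macro.

```python
from statistics import mean, stdev


def sd_rule_outliers(values, k=3.0):
    """Return the indices of values lying more than k sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        return []  # all values are identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]


# Hypothetical quarterly incomes; the last value is suspiciously large.
incomes = [102.0, 98.0, 105.0, 97.0, 101.0, 99.0, 100.0, 103.0, 96.0, 104.0, 500.0]
flagged = sd_rule_outliers(incomes, k=2.0)
print(flagged)
```

Because the mean and the standard deviation are themselves inflated by extreme values, a large outlier can mask smaller ones; this is one reason a choice of `k` below 3 is sometimes used for small samples.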

  7. I.4 Testing of hypothesis

  8. II Imputation
     • Imputation using distributions
     • Imputation using donors
     • Imputation using models

  9. II.1 Imputation using distributions (1)
     Figure 1. Statistical models

  10. II.1 Imputation using distributions (2)

  11. II.2 Imputation using donors
     • Historical (cold-deck) imputation replaces the missing value of an item with a constant value from an external source, such as a previous survey.
     • Hot-deck imputation replaces missing data with comparable data from the same data set.
     • Nearest neighbor imputation replaces missing data with the value of a donor. The donor is the unit that minimizes a distance function calculated from a set of auxiliary variables.
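
The nearest neighbor idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS Macro's implementation: the Euclidean distance, the function name `nearest_neighbor_impute`, the use of `None` for a missing value, and the field names `employees` and `income` are all hypothetical choices.

```python
import math


def nearest_neighbor_impute(records, target, aux):
    """Fill missing `target` values (None) with the value of the closest donor.

    Donors are the records whose `target` is observed. Distance is the
    Euclidean distance over the auxiliary variables named in `aux`.
    The input records are not modified; a new list is returned.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donors with an observed target value")

    imputed = []
    for r in records:
        if r[target] is not None:
            imputed.append(dict(r))
            continue
        point = [r[a] for a in aux]
        nearest = min(donors, key=lambda d: math.dist(point, [d[a] for a in aux]))
        imputed.append({**r, target: nearest[target]})
    return imputed


# Hypothetical enterprises: the third one has a missing income.
records = [
    {"employees": 10, "income": 200.0},
    {"employees": 50, "income": 900.0},
    {"employees": 12, "income": None},
]
result = nearest_neighbor_impute(records, target="income", aux=["employees"])
print(result[2])
```

With several auxiliary variables on different scales, it is usually sensible to standardize them first so that no single variable dominates the distance.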

  12. II.3 Imputation using models

  13. Practical examples

  14. Example 1. Detection of outliers
     • Quarterly statistical survey on short-term statistics on service enterprises.
     • The study variable is income in each quarter (PAJ3).
     • The auxiliary variable is the number of employees.
     The output

  15. Example 2. Verification of imputation for quantitative data
     The verification table shows the percentage difference between the predicted and the real value.
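
One way such a verification table could be computed is sketched below in Python. The formula (difference expressed relative to the real value) is an assumption, since the exact definition used in the table is not stated; the function name `percentage_difference` and the sample pairs are hypothetical.

```python
def percentage_difference(predicted, real):
    """Percentage difference of the predicted value relative to the real value."""
    if real == 0:
        raise ZeroDivisionError("percentage difference is undefined when the real value is 0")
    return 100.0 * (predicted - real) / real


# Hypothetical (predicted, real) pairs for units whose real value is known.
pairs = [(110.0, 100.0), (45.0, 50.0), (200.0, 200.0)]
table = [(p, r, percentage_difference(p, r)) for p, r in pairs]
for predicted, real, diff in table:
    print(f"{predicted:>8.1f} {real:>8.1f} {diff:>+7.1f}%")
```

A positive value means the imputation overestimated the real value, and a negative value means it underestimated it.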

  16. Example 3. Verification of imputation for qualitative data
     Simulated data was used. The study variable y4 has two possible values: 1 and 2. There are three auxiliary variables: x1, x2, and x3.

  17. Conclusions and future work
     • The SAS Macro program consists of five parts: detection of errors, detection of outliers, imputation using the nearest neighbor method, imputation using models, and imputation using distributions.
     • Several training sessions were organized for the employees of Statistics Lithuania. 37 employees attended training on the program. Half of them use or plan to use the SAS Macro in their work.
     • The program was tested using real data. The results showed that the time spent on data editing and imputation was reduced.
     • The program not only produces a new data set with imputed values but also calculates several statistics (sample mean before and after imputation, standard deviation before and after imputation), which can be used to assess the quality of imputation.
     • The latest improvement to the program allows a stratum variable to be specified, so that errors or outliers can be found and missing values imputed separately in each stratum, group, or domain.
     • The methods currently programmed are the simplest ones; more complicated methods for imputation and detection of outliers will be added to the program later.
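
The before-and-after statistics and the per-stratum processing mentioned above can be illustrated with a short Python sketch. It is not the SAS Macro's code: stratum-mean imputation is swapped in here only as the simplest possible placeholder method, and the function name `impute_by_stratum`, the field names, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def impute_by_stratum(rows, stratum, target):
    """Replace missing `target` values (None) with the stratum mean and report
    the mean and standard deviation before and after imputation per stratum.

    Returned rows are grouped by stratum, in order of first appearance.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum]].append(row)

    completed, report = [], {}
    for key, members in groups.items():
        observed = [m[target] for m in members if m[target] is not None]
        if len(observed) < 2:
            raise ValueError(f"stratum {key!r} needs at least two observed values")
        fill = mean(observed)
        filled = [fill if m[target] is None else m[target] for m in members]
        completed.extend({**m, target: v} for m, v in zip(members, filled))
        report[key] = {
            "mean_before": mean(observed),
            "sd_before": stdev(observed),
            "mean_after": mean(filled),
            "sd_after": stdev(filled),
        }
    return completed, report


rows = [
    {"stratum": "A", "y": 10.0},
    {"stratum": "A", "y": 20.0},
    {"stratum": "A", "y": None},
    {"stratum": "B", "y": 100.0},
    {"stratum": "B", "y": 300.0},
]
completed, report = impute_by_stratum(rows, "stratum", "y")
print(report["A"])
```

In this sketch the mean of stratum A is unchanged by imputation while its standard deviation shrinks, which is the kind of effect the before-and-after comparison is meant to reveal.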

  18. Questions?

