United States Department of Agriculture, National Agricultural Statistics Service. International Conference on Establishment Surveys III, Montreal • June 18-21, 2007. Generalized Census Processing System at the National Agricultural Statistics Service.
Slide 1
Generalized Census Processing System at the National Agricultural Statistics Service
Thomas Jacob, Carol House
National Agricultural Statistics Service, United States Department of Agriculture
International Conference on Establishment Surveys III, Montreal • June 18-21, 2007
Slide 2: Presentation Outline
• Census of Agriculture Overview
• 2002 Census Processing System
• Reasons for redesign
• Redesign initiatives
• Dashboard for continuous monitoring
• Can the system be more generalized?
• Acknowledgements
• Questions
Slide 3: Census of Agriculture Overview
• In 1997 the Census of Agriculture was transferred from the U.S. Bureau of the Census
• 2002: 3 million report forms mailed out
• 400+ system users in Headquarters and Field Offices
• Over 1,500 variables
• Over 110 published tables per state and for the U.S.
• Volume, volume, volume
Slide 4: 2002 Census Processing System
• NASS contracted the National Processing Center (NPC) for:
 - Mail out, check in, capturing images, and capturing data (OMR + ICR)
• SAS-based system for edit, imputation, and analysis using Sybase and Redbrick databases
 - Edit specifications captured using decision logic tables (DLT)
 - Micro-level and macro-level analysis
 - Automated edit using DLT
 - Tried to implement Fellegi-Holt (FH) methodology and DLT as a two-tier edit
 - Goal of 80% of data not touched by analysts
OMR = Optical Mark Recognition; ICR = Intelligent Character Recognition
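The slides do not show what a decision logic table looks like, so the following is only a minimal sketch of the general idea: each row pairs a condition with a corrective action, and an automated edit applies the rows in order. The field names (`harvested_acres`, `cropland_acres`, `cattle_head`) and both rules are hypothetical and are not taken from the NASS edit specifications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass
class Rule:
    """One row of a (hypothetical) decision logic table."""
    name: str
    condition: Callable[[Record], bool]   # when does the rule fire?
    action: Callable[[Record], None]      # how is the record corrected?


def set_field(field: str, value_fn: Callable[[Record], float]) -> Callable[[Record], None]:
    """Build an action that overwrites one field with a computed value."""
    def act(rec: Record) -> None:
        rec[field] = value_fn(rec)
    return act


# Illustrative rules only; real edit specifications are far larger.
RULES: List[Rule] = [
    Rule("harvested acres cannot exceed cropland",
         lambda r: r["harvested_acres"] > r["cropland_acres"],
         set_field("harvested_acres", lambda r: r["cropland_acres"])),
    Rule("negative cattle count set to zero",
         lambda r: r["cattle_head"] < 0,
         set_field("cattle_head", lambda r: 0.0)),
]


def apply_dlt(record: Record, rules: List[Rule]) -> Tuple[Record, List[str]]:
    """Apply each rule in order to a copy of the record.

    Returns the edited copy and the names of the rules that fired.
    """
    edited = dict(record)
    fired: List[str] = []
    for rule in rules:
        if rule.condition(edited):
            rule.action(edited)
            fired.append(rule.name)
    return edited, fired
```

Keeping the list of fired rules alongside the edited record is one way to support a "percent of records touched" measure, since a record with an empty list passed the automated edit unchanged.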
Slide 5: What Worked Well
• Completed Census on schedule
• Questionnaire imaging
• Analysis: macro and micro tools
• % of records touched
• Disclosure routines worked well but independently
Slide 6: Reasons for Redesign
• Increase system speed
 - Edit and imputation were extremely slow (could only edit 75 records at a time)
 - Issues with loads between databases
 - Slow communication lines
 - Database design was inefficient
 - Nearest neighbor imputation used a sequential search
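To illustrate why a sequential donor search can become a bottleneck, here is a minimal sketch of nearest neighbor imputation that scans every donor for each recipient. It assumes a plain Euclidean distance over numeric matching fields; the actual distance function, matching variables, and donor eligibility rules used by NASS are not given in the slides, and the field names in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def distance(a: Record, b: Record, match_fields: Sequence[str]) -> float:
    """Euclidean distance over the matching fields (an assumed metric)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in match_fields))


def nearest_donor(recipient: Record, donors: List[Record],
                  match_fields: Sequence[str]) -> Optional[Record]:
    """Sequential search: compare the recipient with every donor in turn."""
    best, best_dist = None, math.inf
    for donor in donors:
        d = distance(recipient, donor, match_fields)
        if d < best_dist:
            best, best_dist = donor, d
    return best


def impute(recipient: Record, donors: List[Record],
           match_fields: Sequence[str], target_field: str) -> Record:
    """Fill target_field on a copy of the recipient from its nearest donor."""
    eligible = [d for d in donors if d.get(target_field) is not None]
    donor = nearest_donor(recipient, eligible, match_fields)
    result = dict(recipient)
    if donor is not None:
        result[target_field] = donor[target_field]
    return result
```

For example, with donors `[{"acres": 100, "cattle": 50, "sales": 1000}, {"acres": 500, "cattle": 10, "sales": 9000}]`, a recipient `{"acres": 110, "cattle": 45, "sales": None}` matched on `acres` and `cattle` would take its `sales` value from the first donor. Each imputation costs one pass over the whole donor pool, so the total work grows with the number of recipients times the number of donors.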
Slide 7: Reasons for Redesign
• Increase effectiveness and quality of process
 - Minimize data capture errors
 - Time-consuming analysis
 - Inadequate dashboard for identifying influential records
 - Need for true interactive edit (IE)
 - Disclosure routine in old FORTRAN code
Slide 8
[System architecture diagram. Labels: Paper Forms, SCAN, KFI, CATI, Web, Raw Data, Images, PRD, Donor Pool, Batch Edit, DLT Edit, Edit/Imputation/IE, Data Review and Interactive Edit (three instances), Sybase/OLTP, Replication Server (two instances), Redbrick/OLAP, Analysis, Disclosure/Tabulation]
Slide 9
[System architecture diagram, same labels as Slide 8, plus one truncated label: "Qua"]
Slide 10: Redesign Initiatives
• Multiple modes of data collection (CATI, Web, KFI, …), but using the same module for loading data
• Key from Image (KFI) instead of scanning (OCR & OMR)
• Create an indicator denoting that additional information appeared on the report form (respondent notes, remarks, altered stubs)
• Create images for respondents who responded through CATI or Web
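One way to picture "multiple modes, same loading module" is a single entry point that accepts records from any collection mode in one common shape. The sketch below is an assumption about how that could look, not a description of the NASS loader: the `RawRecord` layout, the `has_notes` indicator name, and the item code `K101` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Collection modes named on the slide; the set itself is an illustrative choice.
VALID_MODES = {"CATI", "WEB", "KFI"}


@dataclass
class RawRecord:
    """Common shape for a raw report, regardless of collection mode."""
    report_id: str
    mode: str
    items: Dict[str, str]      # item code -> value as captured
    has_notes: bool = False    # flag for respondent notes, remarks, altered stubs


def load_raw(store: List[RawRecord], report_id: str, mode: str,
             items: Dict[str, str], has_notes: bool = False) -> RawRecord:
    """Single loading routine shared by every collection mode."""
    normalized_mode = mode.upper()
    if normalized_mode not in VALID_MODES:
        raise ValueError(f"unknown collection mode: {mode!r}")
    record = RawRecord(report_id, normalized_mode, dict(items), has_notes)
    store.append(record)
    return record
```

Because every mode passes through the same function, validation and the additional-information indicator are applied in one place rather than once per mode.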
Slide 11
[System architecture diagram, same labels as Slide 8]
Slide 12: Redesign Initiatives
• Batch edit in Unix, IE on PC (local), using the same code and the same donors
• True interactive edit (IE)
• Dual screens for data review and image comparisons
• Improve donor search strategies: scalable using daemons and SAS/SHARE
• More use of Previously Reported Data (PRD)
Slide 13
[System architecture diagram, same labels as Slide 8]
Slide 14: Redesign Initiatives
• Creating new data models for both transactional (OLTP) and analytic (OLAP) databases
• Editing is in the OLTP environment; analysis is in the OLAP environment
• Introduce a replication server that moves and synchronizes data between OLTP and OLAP
• Perform more server-side processing using SAS/CONNECT to reduce interactive response times
OLTP = Online Transaction Processing; OLAP = Online Analytical Processing
Slide 15: Redesign Initiatives
• Disclosure module converted to Base SAS
• The system is more metadata driven
• Provide quality control grids to monitor the effects of editing on the data
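The slides do not describe the layout of the quality control grids. As a rough sketch of the underlying idea, the function below compares each variable's total before and after editing and counts how many records changed. It assumes the two lists hold the same records in the same order; the variable name `acres` in the usage example is hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, float]


def edit_effect_grid(before: List[Record], after: List[Record],
                     variables: Sequence[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Summarize, per variable, how editing changed the data.

    Assumes before[i] and after[i] are the same record pre- and post-edit.
    """
    grid: Dict[str, Dict[str, Optional[float]]] = {}
    for var in variables:
        total_before = sum(rec[var] for rec in before)
        total_after = sum(rec[var] for rec in after)
        records_changed = sum(1 for b, a in zip(before, after) if b[var] != a[var])
        # Percent change is undefined when the pre-edit total is zero.
        pct_change = ((total_after - total_before) / total_before * 100
                      if total_before else None)
        grid[var] = {
            "before": total_before,
            "after": total_after,
            "records_changed": records_changed,
            "pct_change": pct_change,
        }
    return grid
```

For instance, if two records report `acres` of 100 and 300 before editing and 100 and 200 afterward, the grid row for `acres` shows totals of 400 and 300, one record changed, and a change of -25 percent.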
Slide 16: Dashboard for Continuous Monitoring
• Implementing a Quality Control module to track four major areas in a proactive mode
 - Administrative: Management Information System (MIS) reports to track weekly progress
 - Data: monitor what the system is doing to the data
   Tables, maps, graphs, outlier grids
   Independent check of record-level inconsistencies
 - Elapsed Times: track how long key processes are taking to run
 - System Stability: track key indicators that can impact performance of databases, UNIX machines, SAS, etc.
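The "Elapsed Times" area could be approximated with a small timing wrapper around each key process. This is only a sketch under assumed names (`timed`, `slow_runs`, and the process name `batch_edit` are hypothetical); the slides do not say how NASS collects these measurements.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def timed(process_name: str, log: Dict[str, List[float]]) -> Iterator[None]:
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.setdefault(process_name, []).append(time.perf_counter() - start)


def slow_runs(log: Dict[str, List[float]], threshold_seconds: float) -> Dict[str, float]:
    """Return each process whose longest run exceeded the threshold."""
    return {name: max(times) for name, times in log.items()
            if times and max(times) > threshold_seconds}
```

A process would be wrapped as `with timed("batch_edit", log): ...`, and a dashboard could call `slow_runs(log, threshold_seconds)` periodically to flag processes that are taking longer than expected.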
Slide 17: Can the system be more generalized?
• Wanted to have one system for surveys and censuses
• Metadata can handle both
• Imputation can handle different types of imputation
• A few surveys are using the system
• Survey analysts are reluctant to use DLT for survey edits
• FH methodology sent back to research for further evaluation
Slide 18: Acknowledgment
We want to thank each and every member of the 2007 Census Team for their tireless efforts to make the redesign initiatives a reality.
Slide 19: Questions?