Ethiopian 2007 CENSUS DATA CAPTURING AND PROCESSING • CENTRAL STATISTICAL AGENCY (CSA) • APRIL 2008
Background Information • A Population and Housing Census is the largest data capturing exercise a country can undertake. • It involves capturing millions of forms. • The Central Statistical Agency (CSA) started out with older techniques, such as punched card readers, as early as the 1960s. • Two Population and Housing Censuses had been conducted in Ethiopia before the 2007 Census. • The first Population and Housing Census was carried out in 1984.
Background Information Cont’d . . . • During the 1984 Census: • Data capture was done by manual keyboard entry on a mainframe computer • The FORMSPEC data entry system was used • It took more than 2 years to capture the data for about 42 million people. • In the case of the 1994 Census: • Data capture was again done by manual keyboard entry, this time on PCs • The CENTRY data entry system (IMPS) was used
Background Information Cont’d . . . • It took about 18 months to capture the data for a population of about 53 million. • About 180 data entry clerks were involved • Around 90 PCs were used • The entry work was done on a 2-shift basis
Some Limitations of the Manual Keyboard Entry Method • Time consuming • Does not allow timely availability of data • By the time the data are released, they represent the current situation less well • Subject to additional non-sampling errors • Human error due to manual keying • Due to the volume of the data, 100% verification, as done for sample surveys, is difficult.
Limitations Cont . . . • Involves a great deal of human resource management. • Large number of data entry operators and equipment required
The Need for Alternative Solutions • The need for timely census results, together with the limitations discussed above, forced the Agency to look for alternatives • This is especially important for large volumes of data such as a census. • Hence the need to use the Scanning Technology
The Scanning Technology • The Scanning Technology in general implements two basic techniques • Mark recognition, as in the Optical Mark Reader (OMR) • Character recognition, as in Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR)
Scanning Technology Cont . . . • OMR is the recognition of shaded marks (blobs) on the forms • The positioning of these blobs on a form determines the alphanumeric characters they represent • Character recognition is the recognition of alphanumeric characters on forms and is of 2 types: • OCR, which is the recognition of machine-printed characters, and . .
Scanning Technology Cont . . . • ICR, which refers to the capture of hand-printed characters from a form • For scanning of the 2007 Census the Optical Mark Reader (OMR) technique has been selected • The Scanning Technology we use: • PhotoScribe Series PS900 Scanners (DRS Scanning Technology Product)
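How the position of a shaded blob can stand for a character is easiest to see in a small sketch. The Python below is only an illustration: it assumes a hypothetical numeric field laid out as one column of ten bubbles (digits 0 to 9) per digit. The real form layout and the decode tables used with the PS900 scanners are not described in this presentation.

```python
from typing import List, Optional


def column_for(digit: int) -> List[bool]:
    """Build a hypothetical ten-bubble column with only `digit` shaded."""
    return [i == digit for i in range(10)]


def decode_digit_column(column: List[bool]) -> Optional[str]:
    """Return the digit encoded by one bubble column, or None if unreadable.

    column[i] is True when the bubble for digit i is shaded (assumed layout).
    """
    marked = [i for i, shaded in enumerate(column) if shaded]
    if len(marked) != 1:
        # No mark or several marks: the value cannot be decoded automatically.
        return None
    return str(marked[0])


def decode_field(columns: List[List[bool]]) -> str:
    """Decode a multi-digit field; unreadable columns are shown as '?'."""
    digits = []
    for column in columns:
        digit = decode_digit_column(column)
        digits.append(digit if digit is not None else "?")
    return "".join(digits)
```

For example, `decode_field([column_for(4), column_for(2)])` yields `"42"`, while a column with no shaded bubble is returned as `"?"` so it can be sent for manual correction.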
DRS PhotoScribe Series PS900 • High-speed imaging mark reader • Windows XP Professional • CD-R/RW drive • Network connectivity • A TFT monitor, keyboard, mouse • Speed: up to 8,500 forms / hour
The Scanning Process in General It mainly involves: • Scanning / Data Capture, including IMAGE capturing • Validation and Key-correction of scanned data • Exporting the scanned and key-corrected data into ASCII or Text format, the format suitable for electronic processing
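The export step can be pictured as writing one plain-text line per record. The sketch below is an assumption-laden illustration in Python, not the export done by the DRS software: the field names, their widths and the fixed-width layout are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (field name, width in characters).
LAYOUT: List[Tuple[str, int]] = [
    ("ea_id", 8),
    ("household", 3),
    ("age", 3),
    ("sex", 1),
]


def to_fixed_width(record: Dict[str, str]) -> str:
    """Format one record as a fixed-width ASCII line (without newline)."""
    parts = []
    for name, width in LAYOUT:
        value = record.get(name, "")
        if len(value) > width:
            raise ValueError(f"{name}={value!r} does not fit in {width} characters")
        # Right-justify; a field left blank stays as spaces.
        parts.append(value.rjust(width))
    return "".join(parts)


def export_records(records: Iterable[Dict[str, str]], path: str) -> None:
    """Write all records to a text file, one line per record."""
    with open(path, "w", encoding="ascii", newline="\n") as out:
        for record in records:
            out.write(to_fixed_width(record) + "\n")
```

A fixed-width layout is used here only because such files are easy to describe with a data dictionary; a delimited text file would serve the same purpose.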
Learning from Experiences of Other Countries • Study tour made to two African countries • Tanzania • To learn from their successes • Data capture of the 2002 Census of Tanzania was done in about 26 days • General report tables were produced within 3 months from the start of the scanning
Experiences of Other Countries . . . • Ghana • To learn from their difficulties • Data capture of the 2000 Census took about 6 months (forms from 29,000 EAs) • 3 scanners were used (Kodak, Fujitsu) • The larger scanner was the Kodak 500D • Speed: about 500 forms/min • Power failure was one of the major problems • Some data was lost as a result • A large generator was installed to minimize the effect of the frequent power cuts
Major Benefits of the Scanning Technology • Significant decrease in the time required to capture the data • This helps to get timely data • Users’ needs are satisfied (policy makers, planners, researchers, etc.) • No need to store millions of paper forms for a long time • Scanning captures the whole content of a questionnaire in an electronic image format
Requirements for Effective Scanning • Proper training • Both on Hardware and Software • This helps to “own” the technology • Being able to use the technology after the departure of the trainers / technical advisors • A reliable Network System • A well organized space for forms and data flow
[Figure: STRUCTURED SPACE FOR FILE FLOW in the Data Processing Center. Areas shown, numbered 1 to 8 in the original diagram: Retrieval Warehouse; Receiving the Questionnaires; Registering & Organizing EAs Received from the Field; Registering EAs for Scanning; Waiting Room; Scanning Room; Store; Key-Correction Room; Processing Center.]
Requirements for Effective Scanning - - - • Proper file management and care • Checking Batch (EA) IDs and orientation of forms • Ensuring the EA code on each box is the same as the one on the questionnaires • Proper recording of the incoming and outgoing questionnaires • Close attention in detecting errors in the scanning process is required
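The check that the EA code on a box matches the code on every questionnaire inside it can be sketched in a few lines of Python. The function name and the idea of holding the codes as strings are assumptions made for illustration only.

```python
from typing import Iterable, List


def mismatched_forms(box_ea_code: str, form_ea_codes: Iterable[str]) -> List[int]:
    """Return the 0-based positions of forms whose EA code differs from the box label.

    Surrounding whitespace is ignored so that padding does not cause false alarms.
    """
    expected = box_ea_code.strip()
    return [
        position
        for position, code in enumerate(form_ea_codes)
        if code.strip() != expected
    ]
```

An empty result means the whole batch is consistent; any positions returned point to forms that should be pulled and checked before the batch is scanned.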
Requirements for Effective Scanning - - - • Ensuring the proper paper throughput through the scanner • Ensuring smooth running of the scanning machines • Maintenance • Cleaning (daily) • An arrangement to minimize the effect of Power Interruption is required
Major Activities Accomplished in the Course of the Census Taking • Data from the Pilot Census was successfully scanned (OMR), key-corrected, exported to text format, tabulated and tested. • One scanner (PS 900 Photo Scribe) was used to capture the pilot data • Technical experts from the DRS company assisted in capturing, validating and exporting the pilot data • Training in scanning technology was given : • 16 professionals were trained
Major Activities Accomplished - - - • Hardware and Software training conducted • The training in general took about 7 working days • SOSKITW for Windows: a DRS software package for scanning was introduced • Components of the SOSKITW Software: • SOSGen: used to generate scanning decodes for completed OMR forms (how marks on forms are interpreted and stored) • SOSInp: used to scan, validate and export scanned data.
Major Activities Accomplished - - - • Equipment purchased and installed • 10 additional PS900 iM2 DRS Scanners • 16 high capacity PCs for key-correction • Census data processing work plan prepared • Recruitment of temporary staff • Staff training (scanning technology, CSPro) • Retrieval and organization of completed forms • Scanning and validation • Computer editing and tabulation (For each activity: duration and responsible body are indicated)
Major Activities Accomplished - - - • Census data processing teams organized • Batch header database group • Scanning and validation team • Technical desk heads • Shift supervisors • Two senior programmers responsible for the overall scanning process • Other sub-professional staff assigned • 4 batch header scanning technicians • 16 data validation workers
Major Activities Accomplished - - - • The scanning room organized • An air conditioner for the scanning room installed • A high capacity automatic generator installed to ensure uninterrupted power supply • Batch Header Database organized • EA Control Forms completed in 2 parts during dispatch • Same EA ID on both parts of the control form • Same Enumerator Number on each part • No. of Households in the EA filled in • The scannable part detached and scanned in office
Completed Census Forms • Completed forms retrieved from the field (about 90,000 EAs) • Reception and organization of filled-in forms completed • About 33 teams for registering and organizing forms were organized • 3 persons assigned per team • Retrieval of each EA checked and registered • Presence of all form types checked (each EA) • Control forms are also used to check the completeness of EAs
Completed Census Forms - - - • Types of the 2007 Census Forms • Short questionnaires • Long questionnaires • Household Listing Forms • Summary Forms • Community Level Forms • EA Control Forms (Batch Header Forms) • EA IDs and no. of households filled in • Unique Enumerator No. assigned • Scanned to create EA Database
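Checking that every form type is present for an EA amounts to a set comparison. The Python sketch below assumes, for illustration, that each EA box should contain all six form types listed above and that each form type is identified by a short label; both the labels and that requirement are assumptions, not something stated by the Agency.

```python
from typing import Iterable, List

# Hypothetical short labels for the six form types listed on the slide.
REQUIRED_FORM_TYPES = frozenset(
    {"short", "long", "listing", "summary", "community", "control"}
)


def missing_form_types(received: Iterable[str]) -> List[str]:
    """Return the form types not found in an EA box, in alphabetical order."""
    return sorted(REQUIRED_FORM_TYPES - set(received))
```

An EA would be registered as complete only when the returned list is empty.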
[Images: Batch Control Form; Summary Form]
Actual Scanning Process - Census Forms • Organized forms taken from store to the waiting room • Batch Header information printed and associated with its respective EA box • The existence of each EA verified • Checked EAs sent to the scanning room • Scanned forms are finally sent back to the stores • Captured data are validated and key-corrected • Key-correction involved checking and correcting: • Missing marks • Multi-marks • Partial marks
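The three problem types handled in key-correction can be illustrated with a small classifier. This is a sketch under stated assumptions, written in Python: it supposes that each bubble of a single-answer question comes with a darkness reading between 0.0 and 1.0, and the two thresholds are invented for the example. How the scanning software actually flags these cases is not described here.

```python
from typing import Sequence


def classify_field(
    darkness: Sequence[float],
    full_threshold: float = 0.6,   # hypothetical: at or above this is a clear mark
    faint_threshold: float = 0.25, # hypothetical: below this is treated as blank
) -> str:
    """Classify one single-answer field as 'ok', 'missing', 'multi' or 'partial'."""
    full_marks = sum(1 for d in darkness if d >= full_threshold)
    partial_marks = sum(1 for d in darkness if faint_threshold <= d < full_threshold)

    if full_marks > 1:
        return "multi"      # more than one clear mark
    if full_marks == 1 and partial_marks == 0:
        return "ok"         # exactly one clear mark, nothing ambiguous
    if partial_marks > 0:
        return "partial"    # a faint or incomplete mark needs a human decision
    return "missing"        # nothing marked at all
```

Only fields classified as `"ok"` would pass straight through; the other three outcomes correspond to the cases an operator resolves by looking at the scanned image.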
Actual Scanning Process - - - • Scanned and validated data is exported to TEXT format • Format suitable for computer editing and tabulation • Backup of the scanned / captured data is taken: • on the Database Server • externally, on high capacity tape cartridges (HP Ultrium Data Cartridge, 400 GB)
Actual Scanning Process - - - • All Census forms have been scanned: • The scanning of the 10 sedentary Regions was carried out from mid Aug. 2007 to mid Dec. 2007 • The scanning for Affar and Somali Regions took about one month including checking (mid Jan. to mid Feb. 2008) • 44 scanning operators were assigned • 11 scanners used • 2 shifts per day, 7 days per week • Validation and key-correction of the scanned data is done
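A rough ceiling on daily scanning capacity follows from the figures given in this presentation: 11 scanners, a rated speed of up to 8,500 forms per hour, and 2 shifts per day. The Python sketch below does that arithmetic. The length of a shift is not stated anywhere in the slides, so `hours_per_shift` is a hypothetical input, and the result is an upper bound rather than an observed rate, since rated speed ignores batch changes, jams and cleaning.

```python
def max_forms_per_day(
    scanners: int,
    forms_per_hour: int,
    shifts_per_day: int,
    hours_per_shift: float,      # hypothetical: not given in the presentation
    utilisation: float = 1.0,    # fraction of the time scanners actually run
) -> float:
    """Upper bound on forms scanned per day at the rated scanner speed."""
    return scanners * forms_per_hour * shifts_per_day * hours_per_shift * utilisation
```

With an assumed 8-hour shift, `max_forms_per_day(11, 8500, 2, 8)` gives 1,496,000 forms per day; lowering `utilisation` shows how quickly real-world stoppages reduce that figure.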
[Images: Census Forms Scanning Process: Scanning; Key-Correction]
Data Cleaning / Computer Editing • Input: the scanned, key-corrected and exported data • A Batch Edit Program, based on Edit Specs provided by subject matter specialists, is developed and run on the data. • The software used in editing the data is the Census and Survey Processing System (CSPro) • The Batch Edit Application (.bch) is the component of CSPro used to clean the data through editing and imputation processes
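The idea of an edit followed by an imputation can be shown with a very small example. The sketch below is written in Python rather than in CSPro's own language, and it is not the Agency's edit specification: the valid codes for `sex`, the default value and the simple "use the most recent valid value" (hot deck) rule are all assumptions chosen for illustration.

```python
from typing import Dict, List

VALID_SEX = {"1", "2"}  # hypothetical codes for the two valid answers


def edit_and_impute(
    records: List[Dict[str, str]], default_sex: str = "1"
) -> List[Dict[str, str]]:
    """Replace invalid 'sex' codes with the most recent valid value (simple hot deck).

    Imputed records are flagged so that imputation rates can be reported later.
    """
    last_valid = default_sex
    cleaned = []
    for record in records:
        record = dict(record)  # work on a copy; the input list is left untouched
        if record.get("sex") in VALID_SEX:
            last_valid = record["sex"]
        else:
            record["sex"] = last_valid
            record["sex_imputed"] = "1"
        cleaned.append(record)
    return cleaned
```

Flagging every imputed value, as done with `sex_imputed` here, keeps the edited file auditable against the key-corrected input.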
Report Generation / Tabulation • Raising factors attached to the edited long questionnaire data • Tabulation programs (in CSPro) are prepared and tested • Tables in accordance with the Tabulation Plan will be produced • Final data will be organized in various formats (ASCII, SPSS) • Final data will be sent to the Central Databank for archiving and dissemination purposes.
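Once a raising factor is attached to each long questionnaire record, a table cell is the sum of the raising factors of the records falling in that cell rather than a simple count. A minimal Python sketch, assuming each record is a dictionary with a hypothetical `raising_factor` field and using made-up values:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def weighted_counts(records: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Sum raising factors by category of `key` to estimate population totals."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["raising_factor"]
    return dict(totals)


# Illustrative records only; the values are invented for the example.
sample = [
    {"sex": "M", "raising_factor": 5.0},
    {"sex": "F", "raising_factor": 5.0},
    {"sex": "M", "raising_factor": 4.0},
]
```

Here `weighted_counts(sample, "sex")` returns `{"M": 9.0, "F": 5.0}`: three sample records stand for an estimated fourteen persons.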
Problems Encountered I. Scanning: • A batch might slip through unscanned during data capture • A batch might also be scanned only in part • Misplacement of scanned forms in wrong boxes • Limited storage space on the scanning machines • Scanners become full, which makes scanning difficult • Scanned images have to be moved constantly to the storage server • Scanned images may sometimes not be found at their location on the storage server
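Batches that slip through unscanned, or are scanned only in part, can be caught by comparing what was scanned with what the batch header database says to expect. The Python sketch below is one hypothetical way to do this; it assumes one questionnaire per household, expected counts keyed by EA ID, and a per-EA count of scanned questionnaires, none of which is specified in the presentation.

```python
from typing import Dict, List


def reconcile_batches(
    expected_households: Dict[str, int],
    scanned_questionnaires: Dict[str, int],
) -> Dict[str, List[str]]:
    """Compare expected and scanned counts per EA and list the problem batches."""
    unscanned = sorted(
        ea for ea in expected_households if ea not in scanned_questionnaires
    )
    partial = sorted(
        ea
        for ea, expected in expected_households.items()
        if ea in scanned_questionnaires and scanned_questionnaires[ea] < expected
    )
    unexpected = sorted(
        ea for ea in scanned_questionnaires if ea not in expected_households
    )
    return {"unscanned": unscanned, "partial": partial, "unexpected": unexpected}
```

Running such a comparison at the end of each shift would point to the boxes to pull back from the store while they are still easy to locate.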
Problems Encountered - - - II. Key Correction: • Problems in retrieving scanned images for key correction were encountered • Key correction took a long time as it is done manually • The key correction process, as stated earlier, was based on fixing: • Missing marks • Multi-marks • Partial marks
Problems Encountered - - - III. Processing the data: • Large volume of data: processing takes a long time (8 hrs) • Frequent power failures badly disrupt the processing sessions • The tabulation component of the CSPro software sometimes fails unpredictably (it is a newly developed tabulation system)
In summary: • Registration and organization of all completed Census Forms done • The scanning and key correction of the Census questionnaires completed • The scanning of the Household Listing forms is done • Draft Census preliminary results have been produced Additional Comment: • Quick manual review (editing and coding) of the filled-in forms might be needed prior to the scanning process