DATA CAPTURE IN CENSUS OF INDIA
Registrar General & Census Commissioner, India
Visit our website at www.censusindia.gov.in
FEATURES OF INDIAN CENSUS
• India is a large country with a population of more than a billion, so its census is one of the world's largest administrative and statistical exercises
• Diversity in languages: schedules are filled in 16 languages
• 2 million enumerators were deployed in the 2001 Census; the number is likely to increase in the 2011 Census
FEATURES OF INDIAN CENSUS (Contd.)
• The census is conducted using the 'canvasser' method in two phases: House-listing and Population Enumeration
• The Census Organization has experimented with new IT innovations since the beginning
• Technology is required particularly for data capture and processing, mainly because of the large volume of data and the need for speedier tabulation and release of census results
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Important Considerations
• Conventional data entry is not suitable for such a large volume of data (228 million schedules for a population of 1,028 million)
• Availability of advanced IT tools and techniques
• Need to capture and process all the collected information
• Complexities in data entry due to the multiplicity of languages and responses and the size (A3) of the Census Schedule
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Important Considerations (Contd.)
• Retrieval of original documents for correction is labor-intensive
• Reduce the time span from 5-8 years to 3-5 years
• Compact, reliable and efficient archival system
• Better workflow management
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Selection and Consequent Action
• Evaluation of the available technologies (OMR/OCR/ICR)
• Trial run with NCS and DRS OMR
• Trial run with various ICR vendors
• Opted for ICR technology (TIS eFlow)
• IT infrastructure in all 15 Data Centers upgraded to meet the new requirements
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Model Conceived for Implementation
• Services of a System Integrator (SI) hired to guide and assist in the implementation of ICR technology
• A unique model for outsourcing:
  • SI works in our premises for better communication and control
  • Data security, safety and confidentiality are maintained
  • Capacity building (training and guiding IT staff)
  • Production-linked payment to the SI
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Work Flow of ORGI (TIS eFlow characteristics)
• Designs the data capture workflow
• Presents a graphical view of the system
• Monitors the processing and workflow in real time
• Allows applications to be customized and custom features to be added
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Work Flow Modules
• Scan Portal, File Portal, Controller
• FormID, Manual FormID
• RC Processing [OCR/ICR]
• Tile, Completion, CAC & Exception
• Export
DATA CAPTURE & PROCESSING IN 2001 CENSUS
ORGI Workflow Stages
[Workflow diagram: Prepare Batch, Scanning, Recognition, Tiling, Completion, Exception, Export/Archival, ASCII file, Server]
DATA CAPTURE & PROCESSING IN 2001 CENSUS
LAN Setup: ORGI Data Centers
• Scanning station: forms are fed through the scanner(s) batch by batch
• Server: form images are stored on a network disk
• Recognition stations: field-by-field character images are recognized automatically
• Tiling & Completion stations (Tile/Correction station): unrecognized characters are corrected by operators
• Exception stations: supervisors handle exceptional cases referred by operators
• Controller station: a supervisor monitors the workflow and balances the load at different stages of operation
• Export station: a supervisor exports completed batches as an ASCII file for further processing
DATA CAPTURE & PROCESSING IN 2001 CENSUS
eFlow Customization
• Customization of the scanning software for batching the images
• Optimization of batch size for network movement of images and data
• Customization of workflow management to reduce the workload on the Manual Identification station
DATA CAPTURE & PROCESSING IN 2001 CENSUS
eFlow Customization (Contd.)
• Development of new management information tools for operators, daily production status, etc.
• Creation of JUSTICR.mdb to recognize the writing patterns of Indian enumerators
• Creation and implementation of various static and dynamic dictionaries for CAC
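The idea of static and dynamic dictionaries for CAC can be pictured with a minimal Python sketch. It assumes that a static dictionary is a fixed list of known responses and codes, and that a dynamic dictionary grows as operators confirm new spellings; the slides do not define either term. The class name, the sample responses and the codes are all hypothetical.

```python
class CodingDictionary:
    """Hypothetical lookup helper for computer assisted coding (CAC)."""

    def __init__(self, static_entries):
        # Static dictionary: fixed response -> code pairs loaded up front.
        self._static = {self._norm(k): v for k, v in static_entries.items()}
        # Dynamic dictionary: spellings confirmed by operators during coding.
        self._dynamic = {}

    @staticmethod
    def _norm(text):
        # Ignore case and extra spaces when comparing keyed-in responses.
        return " ".join(text.lower().split())

    def lookup(self, text):
        """Return the code for an exact (normalized) match, else None."""
        key = self._norm(text)
        if key in self._static:
            return self._static[key]
        return self._dynamic.get(key)

    def suggest(self, prefix):
        """Return sorted (response, code) pairs starting with the typed prefix."""
        key = self._norm(prefix)
        merged = {**self._dynamic, **self._static}
        return sorted((k, v) for k, v in merged.items() if k.startswith(key))

    def learn(self, text, code):
        """Record an operator-confirmed spelling in the dynamic dictionary."""
        self._dynamic[self._norm(text)] = code


# Made-up codes for illustration only; these are not census code lists.
languages = CodingDictionary({"Hindi": "001", "Bengali": "002", "Telugu": "003"})
languages.learn("Bangla", "002")   # operator confirms a variant spelling
print(languages.lookup("bangla"))  # code learned in the previous line
print(languages.suggest("b"))      # candidates offered to the operator
```

In this sketch the operator types the first letters of a response seen on the form image and picks from the suggestions, so that repeated spellings need to be resolved only once.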
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Results Achieved
• For the first time, 100% of the data was captured, processed and released within five years of the Census
• Auto recognition rate of 90% with false positives below 2%
• Considerable financial saving
• Assimilation of IT skills within the organisation
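The two recognition figures can be made concrete with a small Python sketch. The slides give only the percentages, so the definitions here are assumptions: the auto recognition rate is taken as the share of characters accepted without operator intervention, and the false positive rate as the share of those auto-accepted characters that turn out to be wrong. The counts in the example are hypothetical.

```python
def recognition_metrics(total_chars, auto_accepted, auto_accepted_wrong):
    """Return (auto recognition rate, false positive rate) as fractions."""
    if total_chars <= 0 or not 0 <= auto_accepted_wrong <= auto_accepted <= total_chars:
        raise ValueError("need 0 <= wrong <= accepted <= total and total > 0")
    recognition_rate = auto_accepted / total_chars
    # Assumed definition: errors among the characters accepted automatically.
    false_positive_rate = auto_accepted_wrong / auto_accepted if auto_accepted else 0.0
    return recognition_rate, false_positive_rate


# Hypothetical sample of 10,000 audited characters.
rate, fp = recognition_metrics(total_chars=10_000, auto_accepted=9_000,
                               auto_accepted_wrong=150)
print(f"auto recognition rate: {rate:.1%}, false positives: {fp:.2%}")
```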
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Results Achieved (Contd.)
• Manual coding was replaced by Computer Assisted Coding for:
  • Scheduled Caste/Scheduled Tribe
  • Languages spoken, education level
  • Migration particulars, NIC and NCO
• Indigenous data capture for other projects:
  • Economic Census
  • Sample Registration System
  • Verbal Autopsy
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Difficulties Experienced
• Unable to use color drop-out at the scanning stage
• Difficult to handle bad images during the scanning stages
• Bad/back images due to variation in paper and print quality
• Overwriting and use of whitener; grid lines recognized as the digit 1
• Limitations in recognizing Indian languages affected throughput
DATA CAPTURE & PROCESSING IN 2001 CENSUS
Difficulties Experienced (Contd.)
• Operational constraints in Manual Identification
• No powerful tools for online load balancing among the various stages of eFlow
• Lack of concurrent quality checks at each stage of eFlow
• Lack of auto-coding features for textual responses
• Failure to recognize even a single image means the whole batch has to be redone
LESSONS LEARNT FOR FUTURE
• Outsourcing in a controlled environment is beneficial and cost-effective
• Good quality of paper
• ICR-friendly form design
• Use of bar codes for better workflow and inventory management
• Good quality printing
LESSONS LEARNT FOR FUTURE (Contd.)
• Special training for enumerators in filling in the forms
• For CAC, use knowledge-based dictionaries to increase throughput
• Use concurrent quality check procedures along the lines of those used in the USA and UK
DATA CAPTURE & PROCESSING
Technology for 2011 Census
• Continuation of ICR technology
• International and national experience shows that, as of now, there is no better substitute for scanning and ICR technology
• Expertise and competence in using ICR technology are available within the organization
DATA CAPTURE & PROCESSING
Technology for 2011 Census (Contd.)
• Use more efficient scanners with facilities for image enhancement, noise removal, color drop-out, better throughput, and on-the-spot detection and correction of bad images (through built-in software)
• Use an improved version of the ICR software with better recognition and built-in enhanced workflow management
• Use the new Auto/Computer Assisted Coding features in the ICR software
Thank you. Visit Our Website at www.censusindia.gov.in
STEPS INVOLVED IN eFLOW PROCESS
Intelligent Character Recognition (ICR) technology is used to extract handwritten or machine-printed (typeset) characters from the scanned images and generate a computer-processable data file. In brief, the following steps are involved in using ICR technology.
• Scanning: Paper-based forms are scanned to create bitmap image files.
• File Portal: The image file registration module in eFlow; it registers the scanned images as input to the next activity.
• Form Identification: Automatically identifies the images of the various schedules based on the Empty Form Image (EFI) templates created during the design stage.
STEPS INVOLVED IN eFLOW PROCESS (Contd.)
• Manual Identification: Forms left unidentified because of bad images are matched manually by an operator on the computer with the help of the EFIs.
• Processing: This module is the heart and brain of the ICR technology. It automatically recognizes the data (numerals/alphabetic characters) from the images with the help of various engines (CGK, AEG, KADMOS, TISICR, etc.).
• Tile: This module displays the images of the same recognized digit together so that any character wrongly recognized by the system can be identified and corrected, which enhances the accuracy and quality of the data.
STEPS INVOLVED IN eFLOW PROCESS (Contd.)
• Completion: Characters that were not recognized, or that were marked as wrongly recognized during Tiling, are presented for correction alongside their images.
• Exception: Any character image that the operator at the Completion station (module) cannot make out is corrected at the Exception station by an officer competent to make the decision.
• Export: The system exports the data generated in the above steps to the server for further processing such as editing, aggregation and tabulation.
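The order of these steps, including the two detours (Manual Identification and Exception), can be sketched in Python as a simple routing function. This is only an illustration of the sequence described in the slides, not eFlow's actual interface: the `Stage` enum, `next_stage` and `trace` are hypothetical names, and the sketch treats one item as moving through the stages as a unit, which simplifies the batch, form and character levels at which the real stations work.

```python
from enum import Enum


class Stage(Enum):
    SCANNING = "Scanning"
    FILE_PORTAL = "File Portal"
    FORM_ID = "Form Identification"
    MANUAL_ID = "Manual Identification"
    PROCESSING = "Processing"
    TILE = "Tile"
    COMPLETION = "Completion"
    EXCEPTION = "Exception"
    EXPORT = "Export"


def next_stage(stage, form_identified=True, operator_resolved=True):
    """Return the stage that follows `stage`, or None after Export."""
    if stage is Stage.FORM_ID:
        # Forms the system cannot match to a template go to an operator.
        return Stage.PROCESSING if form_identified else Stage.MANUAL_ID
    if stage is Stage.MANUAL_ID:
        return Stage.PROCESSING
    if stage is Stage.COMPLETION:
        # Characters the operator cannot settle are referred to an officer.
        return Stage.EXPORT if operator_resolved else Stage.EXCEPTION
    if stage is Stage.EXCEPTION:
        return Stage.EXPORT
    if stage is Stage.EXPORT:
        return None
    # Remaining stages simply follow one another in a fixed order.
    order = [Stage.SCANNING, Stage.FILE_PORTAL, Stage.FORM_ID,
             Stage.PROCESSING, Stage.TILE, Stage.COMPLETION]
    return order[order.index(stage) + 1]


def trace(form_identified=True, operator_resolved=True):
    """List the stage names visited from Scanning through Export."""
    path, stage = [], Stage.SCANNING
    while stage is not None:
        path.append(stage.value)
        stage = next_stage(stage, form_identified, operator_resolved)
    return path


# Worst case: a bad image plus a character the operator cannot read.
print(" -> ".join(trace(form_identified=False, operator_resolved=False)))
```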
EXAMPLE: USE OF WHITENER
Casual writing pattern
VOTING IN PROCESSING
ICR 1   ICR 2   ICR 3   ICR 4
  3       3       8       3
Majority = 3
Unanimous = ?
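The example on this slide can be reproduced with a short Python sketch of voting across engines. It assumes that "majority" means more than half of the engines agree and that "unanimous" means all of them agree, with a failed vote leaving the character unresolved (the "?" on the slide). The function name `vote` and its `rule` parameter are hypothetical; the slides do not describe how eFlow implements voting.

```python
from collections import Counter


def vote(readings, rule="majority"):
    """Combine one character's readings from several recognition engines.

    Returns the accepted character, or None if the vote fails and the
    character has to be shown to an operator.
    """
    if not readings:
        return None
    winner, count = Counter(readings).most_common(1)[0]
    if rule == "unanimous":
        return winner if count == len(readings) else None
    if rule == "majority":
        # Assumed threshold: strictly more than half of the engines.
        return winner if count > len(readings) / 2 else None
    raise ValueError(f"unknown rule: {rule}")


readings = ["3", "3", "8", "3"]      # ICR 1 to ICR 4, as on the slide
print(vote(readings, "majority"))    # 3
print(vote(readings, "unanimous"))   # None
```

A stricter rule accepts fewer characters automatically but lets fewer wrong readings through, which is the trade-off between recognition rate and false positives.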
COMPLETION STATION [Field mode display]
EXCEPTION STATION [Screen areas: Form Field Date, Original Form Image Viewer, Exception Area]
HOUSEHOLD SCHEDULE: SIDE A [Fields labelled: Religion, Name of SC/ST, Mother Tongue & Other Languages, Education]
HOUSEHOLD SCHEDULE: SIDE B [Fields labelled: NCO, NCO, NIC, Place of Birth & Last Residence]
DATA CAPTURE & PROCESSING
Selection of Technology (OMR/OCR/ICR) in 2001
• Recognition of handwritten descriptive entries in different languages was beyond the capabilities of the known ICR software. A conscious decision was therefore taken to recognize only numeric characters and to handle the rest through image-enabled Computer Assisted Coding (CAC). The following key features were introduced in the data capture solution.
• Parameters for selecting the ICR software:
  • Highest recognition rate and lowest percentage of false positives, with customization and assured support and training
  • Organized workflow in a LAN environment with centralized controls and a Computer Assisted Coding facility
  • Built-in quality enhancement tools to trap wrongly recognized characters and facilitate corrective action
  • Use of multiple engines with a voting algorithm
  • Ability to incorporate validation rules to trap inconsistent entries and wrong recognition
  • Learning capabilities of the engines
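The idea of validation rules that trap inconsistent entries or wrong recognition can be illustrated with a minimal Python sketch. The field names, codes and limits below are hypothetical and are not the actual census edit rules; the point is only that each rule is a named check applied to a captured record, and a failed check flags the record for correction.

```python
def validate(record, rules):
    """Return the names of all rules that the record fails."""
    return [name for name, check in rules if not check(record)]


# Hypothetical rules for illustration; not the actual census edit rules.
RULES = [
    # Range check on a single field.
    ("age_in_range", lambda r: 0 <= r["age"] <= 120),
    # Code-list check on a single field.
    ("sex_code_known", lambda r: r["sex"] in (1, 2)),
    # Cross-field check: an education level implies the person is literate.
    ("literate_if_educated", lambda r: r["education_level"] == 0 or r["literate"]),
]

# An age of 31 misread as 131, for instance when a grid line is taken as a 1.
record = {"age": 131, "sex": 1, "literate": True, "education_level": 3}
print(validate(record, RULES))   # ['age_in_range']
```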
DATA CAPTURE & PROCESSING
Parameters for Selecting the Scanner
• Speed to match our volume
• Duty cycle (life and production tolerance)
• Duplex scanning is a must
• Minimum resolution of 200 dpi
• Image enhancement facilities such as noise removal, deskewing, cropping and contrast adjustment
• Hopper size and scanning path (U, J or flat belt)
• Maintenance and training services
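What "speed to match our volume" means in practice can be estimated with a back-of-the-envelope Python sketch. The 228 million schedules and 15 data centers come from the 2001 figures earlier in the presentation; the number of scanners per center, the scanning speed and the working hours are hypothetical values chosen only to show the arithmetic. The estimate ignores rescans, downtime and duty-cycle limits.

```python
import math


def scanning_days(total_forms, centers, scanners_per_center,
                  forms_per_minute, hours_per_day):
    """Working days needed if forms are split evenly across centers."""
    forms_per_center = total_forms / centers
    forms_per_day = scanners_per_center * forms_per_minute * 60 * hours_per_day
    return math.ceil(forms_per_center / forms_per_day)


days = scanning_days(
    total_forms=228_000_000,   # schedules in the 2001 Census
    centers=15,                # data centers in 2001
    scanners_per_center=4,     # hypothetical
    forms_per_minute=60,       # hypothetical scanning speed
    hours_per_day=8,           # hypothetical single shift
)
print(days)   # 132
```

Under these assumed values each center handles 15.2 million schedules, so halving the scanner speed or the number of scanners would roughly double the scanning period.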