CENSUS DATA ANALYSIS TOOLS, AREAS, ISSUES & NEEDS. Neena Sharma, IAS, Director of Census Operations, Uttar Pradesh. Office of the Registrar General & Census Commissioner, India.
Census of India 2011 - Data Collection • Census 2011 is the 15th Census of India since 1872 • Census 2011 was held in two phases: • Houselisting & Housing Census (April to September 2010) • Population Enumeration (9th to 28th February 2011), with reference date 00:00 hours of 1st March 2011 • In snow-bound areas, the Population Enumeration was conducted from 11th to 30th September 2010, with reference date 00:00 hours of 1st October 2010
Capturing Information and Processing huge volume of Census Data • Indian Census - always at the forefront of using the latest technology • 1961 Census – Unit Record machines used • 1971 Census – Key-punching (electrical-cum-mechanical) machines used; an IBM 1401 computer with an IBM card reader used • 1981 Census – Data entry made using Key-to-Disk machines. Processing by HP 1000 CD-Cyber 730 & NEC - 1000 Computer System at NIC
Capturing Information and Processing huge volume of Census Data • 1991 Census – Medha 930 mainframe computer system used for data processing. Unix-based dumb terminals used for data entry • 2001 Census – First large country to use image-based Automatic Form Processing technology; high-speed duplex scanners used for image capturing • 2011 Census – Using more developed ICR (Intelligent Character Recognition) technology with advanced features.
Census Data Processing - 17 locations • [Workflow diagram; stages shown: Prepare Batch, Scanning, Recognition, Tiling, Completion, Exception, Export/Archival, ASCII File]
Tile • The TILE module improves data accuracy by displaying similar characters grouped together in a systematic layout, which makes them easy to check • The operator can see which characters are correct and which are not, and can mark the incorrect ones as rejects • This makes the completion step more accurate
IMAGE BASED FORMS PROCESSING TILING STATION
Data Analysis • Provisional Population Totals for India and States, compiled manually from the Enumerator’s Abstract, declared within about four weeks • Covers Population, 0-6 population, No. of literates • Filled-in Schedules are collected, scanned and processed in two phases – Houselisting & Housing Census and Population Enumeration • Extensive quality checks and data validation undertaken • CSPro software used for tabulation • More than 300 tables to be published on Census 2011 at National, State and District levels, including Primary Census Abstracts
Administrative Units in India • [Hierarchy diagram; units shown: Country, State, District, Sub-district, CD Block, Panchayat, Village, Town, Village, Ward]
Census - Not merely a head count Biggest source of comprehensive data with information on • Literacy & Educational Status • Economic activity • Migration & Urbanization • Fertility & Mortality • Disability • Housing • Availability of amenities. • Population • Age • Marital Status • Scheduled Castes • Scheduled Tribes • Mother Tongue & Language • Religion • Village Directory and Town Directory
Issues in Data Analysis • Census creates two separate databases • Houselisting & Housing Census Data (at Household level) (April to September 2010) • Population Enumeration Data (at the level of individual Household members) (February 2011) • In Census 2011, an attempt is being made to link these two databases to cross-tabulate information (an issue in the past, to be tested now) • This would make it possible to cross-tabulate Condition of Housing with Economic Condition, etc.
Issues in Data Analysis • Boundary of the Enumeration Areas (EA) kept unchanged during the two phases of operation • Provision made in the Household Questionnaire (Phase 2 Operation) to record the Household Number marked in the Phase 1 Operation • The EA and HH Numbers to serve as link fields in the two databases
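The linking described above can be illustrated with a minimal sketch. It assumes the two databases are available as simple lists of records and that the EA and HH numbers together identify a household; the field names (`ea`, `hh`, `condition`, `worker`) and all sample values are hypothetical and are not taken from the Census schedules.

```python
from collections import Counter

# Hypothetical Phase 1 records (Houselisting & Housing Census): one per household.
housing = [
    {"ea": "0001", "hh": 1, "condition": "Good"},
    {"ea": "0001", "hh": 2, "condition": "Livable"},
    {"ea": "0002", "hh": 1, "condition": "Dilapidated"},
]

# Hypothetical Phase 2 records (Population Enumeration): one per person.
persons = [
    {"ea": "0001", "hh": 1, "worker": "Main worker"},
    {"ea": "0001", "hh": 1, "worker": "Non-worker"},
    {"ea": "0001", "hh": 2, "worker": "Marginal worker"},
    {"ea": "0002", "hh": 1, "worker": "Non-worker"},
    {"ea": "0003", "hh": 5, "worker": "Main worker"},  # no Phase 1 match
]


def link_and_crosstab(housing, persons):
    """Join persons to households on (EA, HH) and count condition x worker status."""
    # Index the household file by the two link fields.
    by_key = {(h["ea"], h["hh"]): h for h in housing}
    table = Counter()
    unmatched = []
    for p in persons:
        h = by_key.get((p["ea"], p["hh"]))
        if h is None:
            # Keep unlinked person records aside for review instead of dropping them silently.
            unmatched.append(p)
            continue
        table[(h["condition"], p["worker"])] += 1
    return table, unmatched


crosstab, unmatched = link_and_crosstab(housing, persons)
for (condition, worker), count in sorted(crosstab.items()):
    print(condition, "|", worker, "|", count)
print("Unmatched person records:", len(unmatched))
```

Collecting the unmatched records separately is a design choice for the sketch: in a pilot test of the link, the share of records that fail to match is itself a useful quality indicator.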
Issues in Data Analysis • Generating time series tables from the previous censuses • As the boundaries of Enumeration Areas (EAs) are not permanent, it is not possible to link an EA from one census to the next • EAs are carved out on the basis of population size, so when the population changes the number of EAs carved out also changes • Consequently, every Census has generated a stand-alone database
Issues in Data Analysis • New districts, sub-districts, towns and villages have been created, and this has impeded time series analysis • The number of these administrative units has changed significantly over the last three Censuses • An attempt is underway to link the databases available since the 1991 Census on jurisdictional changes down to Town and Village levels.
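One common way to handle such jurisdictional changes is a concordance table that maps each later unit back to its parent unit in a base year, so that figures can be re-aggregated on fixed boundaries. The sketch below assumes that approach; the district names, populations and the `concordance` mapping are all hypothetical, and it is not a description of the method actually used by the Census office.

```python
from collections import defaultdict

# Hypothetical concordance: each 2011 district mapped to its 1991 parent district.
concordance = {
    "District A": "District A",
    "District A-North": "District A",  # carved out of District A after 1991
    "District B": "District B",
}

# Hypothetical population counts on each census's own boundaries.
pop_1991 = {"District A": 1000, "District B": 500}
pop_2011 = {"District A": 900, "District A-North": 400, "District B": 650}


def to_base_boundaries(population, concordance):
    """Re-aggregate counts from later units onto base-year units."""
    totals = defaultdict(int)
    for unit, count in population.items():
        totals[concordance[unit]] += count
    return dict(totals)


pop_2011_on_1991 = to_base_boundaries(pop_2011, concordance)
growth = {d: pop_2011_on_1991[d] - pop_1991[d] for d in pop_1991}
print(pop_2011_on_1991)
print(growth)
```

This simple mapping only works when a new unit is carved wholly out of a single parent. A unit formed from parts of several parents would need its counts apportioned, for example by building the concordance at village or town level.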
Issues in Data Analysis • In Census 2001, 1%/5% micro-data files on housing census released • India and States (1% data) • States and Districts (5% for large states and 10% for smaller states) • Sample micro-data files from census on population enumeration not released in public domain • Planning to make available micro-data files for research in institutions/universities through work-stations
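As an illustration of what a sample micro-data file involves, the sketch below draws a fixed fraction of household records and drops the locating fields before release. The slides do not state the sampling design, so simple random sampling is an assumption here, and the function name `draw_sample`, the field names and the records are hypothetical.

```python
import random


def draw_sample(records, fraction, seed=2001, drop_fields=("ea", "hh")):
    """Draw a simple random sample and strip fields that locate a household."""
    rng = random.Random(seed)  # fixed seed so the same file can be reproduced
    n = max(1, round(len(records) * fraction))
    chosen = rng.sample(records, n)
    # Return copies without the dropped fields; the input records are not modified.
    return [{k: v for k, v in r.items() if k not in drop_fields} for r in chosen]


# Hypothetical household records: 1000 households spread over 10 EAs.
records = [
    {"ea": f"{i // 100:04d}", "hh": i % 100 + 1, "rooms": i % 5 + 1}
    for i in range(1000)
]

sample = draw_sample(records, 0.05)  # a 5% file
print(len(sample), sample[0])
```

Dropping identifier fields is only a first step. Rare combinations of the remaining variables can still identify a household, which is one reason access through supervised work-stations is attractive.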
Needs • Linking of files pilot-tested • Enhancing capacity of staff members in data processing and analysis unit in SPSS, SAS etc. at national and state levels • Organizing jurisdictional changes (redistricting) for trend analysis
Needs • Developing architecture for data warehousing and mining to enable trend and in-depth analysis • Feasibility study to be undertaken • Support in setting up work-stations for research on anonymized micro-data, drawing on good practices from other countries