Explore the critical questions and considerations for high-speed data processing to capture census data accurately. Learn about form design factors, preventing data loss strategies, and the importance of Optical Character Recognition (OCR) and Optical Mark Recognition (OMR) in achieving quality data.
U.S. Census Bureau Decennial Response Integration System (DRIS) Presenter: Tracy Wessler June 5, 2007 The Use of High Speed Data Processing to Capture Census Data
Often Overlooked Critical Success Questions • Is the form design compatible with the data capture system design? • How does the system prevent loss of data? • Can the optical mark recognition pick the intended answer correctly in complex situations? • Is it appreciated that using Optical Character Recognition effectively requires a significant investment in tuning and testing? • Does the system have adequate quality assurance and quality control?
Form Design Considerations • Chief cause of automation failures • Respondent confusion sets the stage for data capture errors • Respondent-friendly vs. machine-friendly design • Form design significantly impacts data capture accuracy • Forms are inputs to the data capture system, so their design must account for the variability contributed by respondents.
Forms Design Success Factors for Census • Established a representative team of people knowledgeable in content, layout, printing and mailing, and data capture considerations. • The acquisition required vendors to demonstrate a thorough understanding of the complexities, interactions, and tradeoffs. • Technologically superior systems are capable of processing forms optimized for the respondent rather than forms optimized for the computer.
Preventing Data Loss • How does the system control inventory? • Bar Code tracking • Detecting Double Feeds during scanning • Forms Check Out Process Established • Data acknowledgement (receipt for delivered data)
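As an illustration of the inventory control questions above, here is a minimal Python sketch of bar code tracking from check-in through acknowledged delivery. It is not the DRIS implementation: the class name `FormInventory`, its method names, and the three-stage lifecycle are assumptions made for this example, and double-feed detection during scanning is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FormInventory:
    """Hypothetical tracker: follows each form by bar code through three stages."""
    checked_in: set = field(default_factory=set)
    scanned: set = field(default_factory=set)
    acknowledged: set = field(default_factory=set)

    def check_in(self, barcode: str) -> None:
        # A form enters inventory when its bar code is first read.
        self.checked_in.add(barcode)

    def record_scan(self, barcode: str) -> None:
        # Refuse scans for forms that were never checked in.
        if barcode not in self.checked_in:
            raise ValueError(f"unknown form scanned: {barcode}")
        self.scanned.add(barcode)

    def acknowledge(self, barcode: str) -> None:
        # The data receipt: downstream confirms it received this form's data.
        if barcode not in self.scanned:
            raise ValueError(f"acknowledgement for unscanned form: {barcode}")
        self.acknowledged.add(barcode)

    def missing_scans(self) -> list:
        # Forms received but never scanned (candidates for data loss).
        return sorted(self.checked_in - self.scanned)

    def unacknowledged(self) -> list:
        # Forms scanned whose data has not yet been acknowledged.
        return sorted(self.scanned - self.acknowledged)

    def can_check_out(self, barcode: str) -> bool:
        # Assumed rule: a form may leave only after its data is acknowledged.
        return barcode in self.acknowledged
```

In this sketch a form can be checked out only once its data has been acknowledged, so any bar code listed by `missing_scans()` or `unacknowledged()` marks a form whose data could still be lost.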
OMR Considerations • Defined as capture of data from multiple choice boxes • Placed emphasis on what we call Optical Answer Recognition. Census wants to know the answer the respondent was trying to communicate, and not just which boxes contained some sort of a mark. • Optical Answer Recognition is a specialized form of OMR – Many OMR products do not do Optical Answer Recognition
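The difference between detecting marks and recognizing answers can be sketched in Python as follows. This is only an illustration under stated assumptions, not the rule set used by Census: the thresholds `BLANK_MAX` and `SCRIBBLE_MIN`, the function `resolve_answer`, and the idea that a very heavily filled box next to a normal mark is a crossed-out answer are all hypothetical.

```python
# Hypothetical thresholds on the fraction of dark pixels inside a check box.
BLANK_MAX = 0.05     # below this, treat the box as empty
SCRIBBLE_MIN = 0.60  # at or above this, treat the box as heavily filled


def resolve_answer(fill_by_box):
    """Return (status, box) for one single-choice question.

    fill_by_box maps a box label to its fill fraction (0.0 to 1.0).
    status is "answered", "blank", or "review".
    """
    marks = [b for b, f in fill_by_box.items() if BLANK_MAX <= f < SCRIBBLE_MIN]
    heavy = [b for b, f in fill_by_box.items() if f >= SCRIBBLE_MIN]

    if len(marks) == 1:
        # One normal mark; any heavy box is assumed to be a crossed-out answer.
        return ("answered", marks[0])
    if not marks and len(heavy) == 1:
        # The respondent filled a single box solid.
        return ("answered", heavy[0])
    if not marks and not heavy:
        return ("blank", None)
    # Several competing marks: send to a human rather than guess.
    return ("review", None)
```

A plain mark detector would report both boxes for `{"yes": 0.9, "no": 0.3}`, while this sketch returns `("answered", "no")` on the assumption that the respondent scribbled out "yes" and then marked "no".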
Optical Character Recognition Considerations • Beware of exaggerated vendor claims for data accuracy • Census believes that obtaining both a high percentage of work captured by OCR (80% or higher accept rate) and a high accuracy rate (99% or higher measured at the field level) requires a significant upfront development investment
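The two OCR targets can be made concrete with a short Python sketch. The definitions here are assumptions for illustration: accept rate is taken as accepted fields divided by all fields, and field-level accuracy as accepted fields whose entire value matches a verified truth value, divided by accepted fields. The function name `ocr_metrics` and the input layout are hypothetical.

```python
def ocr_metrics(fields):
    """Compute (accept_rate, field_accuracy) from a verified sample.

    fields: iterable of (was_accepted, ocr_value, truth_value) tuples, where
    was_accepted is True if OCR kept the field instead of routing it to keying.
    """
    total = accepted = correct = 0
    for was_accepted, ocr_value, truth_value in fields:
        total += 1
        if was_accepted:
            accepted += 1
            # Field level: one wrong character makes the whole field wrong.
            if ocr_value == truth_value:
                correct += 1
    accept_rate = accepted / total if total else 0.0
    field_accuracy = correct / accepted if accepted else 0.0
    return accept_rate, field_accuracy


def meets_targets(accept_rate, field_accuracy):
    # Targets quoted on the slide: 80% accept rate, 99% field-level accuracy.
    return accept_rate >= 0.80 and field_accuracy >= 0.99
```

Because a single misread character counts the whole field as an error under this definition, a field-level accuracy figure is harder to reach than a character-level figure of the same percentage, which is worth checking when comparing vendor claims.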
OCR Considerations Continued • For Census, return on investment is significant for most forms due to extremely large census volumes. • For example, prior to Census 2000 and the use of OCR, it was not possible to capture full names on all forms. Name capture was critical to resolving the large number of duplicates experienced.
Quality Data • Census experience is that many commercial applications lack adequate quality assurance and quality control. • Significant focus by Census: quality is engineered into all processes • Testing is not enough to ensure quality data – the most rigorous testing cannot completely simulate the live Census environment. • Data quality is measured during live processing so that errors can be detected and corrected