The Census Challenge: Data Capture for Census Projects (December 2007)
“Counted” by eFLOW worldwide: 1,374,026,304
TIS’s Experience in Census Projects: largest market share worldwide in information capture for census projects
Governmental projects:
• Australian Department of Defense
• Brazilian Department of Statistics
• Chilean Social Security Office
• US Social Security Administration
• Turkish Ministry of Finance
• Argentinean National Institute of Statistics
• Population censuses worldwide: India, Italy, South Africa, Brazil, Ireland, Kenya, …
And from other segments:
• IBM
• T-Mobile
• BP (British Petroleum)
• 3M Europe
• AQA Examination Board (UK)
• Comicrom Service Bureau
• BCBG Vermont
• BKK
• And a few more…
What do we supply? The eFLOW Unified Content Platform (diagram labels: eFlow application, forms, image files, office docs, email, web pages, PDFs, text files, export)
eFLOW platform in censuses (diagram labels: paper, file, census images, eFlow, census data)
Traditional Data Capture
Back-office stages: mail room, scanning, data entry, end users
• Document prep
• Sorting
• Manual key from image
Intelligent Document Capture
Back-office stages: mail room, scanning, data entry, end users
• Document prep
• No sorting
• Reduces manual data entry by 40-70%
• Increases accuracy and consistency
TIS’s Experience in Census Projects
• India 2002 (912 million A4 images in 18 months)
• Italy 2002 (800 million forms in 180 days)
• Brazil 2000 (333 million forms in 100 days)
• South Africa 2001 (144 million forms in 130 days)
• Ireland 2001 (40 million forms in 120 days)
• Germany (DP) 1999 (36 million forms in 30 days)
• Cyprus 2002 (5.2 million forms in 80 days)
• Turkey 1997, 2000 (18 million forms in 30 days)
• Kenya 2000 (16 million forms in 80 days)
• Slovak Republic 2001 (16 million forms in 80 days)
• Hong Kong 2001 (10 million forms in 50 days)
• Irish Census 2006 (50 million forms in 100 days)
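To compare the peak loads implied by the project list above, the volumes can be turned into an average daily rate. The sketch below is only illustrative arithmetic on the figures quoted on the slide; it assumes a month is 30 days for the India project and treats "images" and "forms" as the same unit, neither of which the slide states.

```python
# Figures copied from the slide: (project, volume in millions, duration in days).
# Assumption: India's "18 months" is approximated as 18 * 30 days.
PROJECTS = [
    ("India 2002", 912, 18 * 30),
    ("Italy 2002", 800, 180),
    ("Brazil 2000", 333, 100),
    ("South Africa 2001", 144, 130),
    ("Ireland 2001", 40, 120),
    ("Germany (DP) 1999", 36, 30),
    ("Cyprus 2002", 5.2, 80),
    ("Turkey 1997, 2000", 18, 30),
    ("Kenya 2000", 16, 80),
    ("Slovak Republic 2001", 16, 80),
    ("Hong Kong 2001", 10, 50),
    ("Irish Census 2006", 50, 100),
]


def daily_rate(volume_millions, days):
    """Average number of forms processed per day."""
    return volume_millions * 1_000_000 / days


def busiest(projects):
    """Return the (name, volume, days) entry with the highest average daily rate."""
    return max(projects, key=lambda p: daily_rate(p[1], p[2]))


if __name__ == "__main__":
    for name, volume, days in PROJECTS:
        print(f"{name:22s} {daily_rate(volume, days):>12,.0f} forms/day")
    print("Highest average rate:", busiest(PROJECTS)[0])
```

These are averages over the whole project; actual peak days would be higher.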
TIS Main Advantages in the Census Arena
• Largest market share worldwide in the processing of census projects
• Extensive experience in the design, development and implementation of data capture for census projects
• Proven solution in census data capture
• A data capture platform (paper, electronic, mobile), not a recognition product
• Successful cooperation with local partners in providing census solutions (knowledge transfer, co-implementation, support)
• Coding tasks and data validations performed on the data capture platform: a ‘cost-effective’ solution
The evolution of data capture in census projects: eFLOW, from OCR to an IDR solution
The evolution of data capture in census projects: key from paper, key from image
• Manual data entry (key from paper):
  • Slow
  • High error rate in the data entry process
  • Requires recruitment, training and management of personnel
• Key from image:
  • Archive
  • Approx. 30-40% faster than key from paper
The evolution of data capture in census projects: OMR (hardware readers for checkboxes)
• Requires specially printed forms and special scanners
• Cannot handle handwritten or printed data
• Forms are not user-friendly
• Cannot handle double-sided forms
• OMR requires more answers => more space => increased paper expenditure => more handling and printing costs
• Not flexible; difficult to adapt to other applications once the census is over
• No possibility of adding business rules: computation, validations, coding
The evolution of data capture in census projects: automated data capture (eFLOW)
• Requires less human intervention, so census data capture can be completed much faster (less space, lower salary costs, less hardware)
• Ensures data integrity: supports both automatic and manual online validations, exception handling, and coding
• The most advanced and proven technology today, recommended by the UN and used by most countries for census projects
• Full flexibility in the type of data gathered (checkbox, handwritten, alpha and numeric, barcode, …)
• Provides all the capabilities of OMR, plus much more
• Creates a correlation between the image and the actual form
• Remote capabilities enable all forms to be scanned locally and then sent to a central site for processing
Intelligent Data Capture
The evolution of data capture in census projects: intelligent data recognition (IDR)
• Automated data capture, plus:
• Smart: automatic classification of documents; understands and differentiates between various types of documents and languages, based on state-of-the-art machine learning algorithms
• Freedom: artificial intelligence algorithms that provide enough information for the system to find the location of the fields on its own
Census-Specific Issues (common issues) and how TIS answers them
• Peak volume challenge
• Long-term project
• Data integrity
• Capture of form identification
• Data validation procedures
• Automated recognition
• Voting algorithms
• Data storage
• Image storage
• Personal data confidentiality
• Statistical coding
Peak Volume Challenge
• The Challenge:
  • Process very high volumes of forms in a pre-defined period
• The Goal:
  • Successfully gather population data while meeting a planned schedule and budget
• Proposed Solution:
  • Use a data capture platform approach, not a “character capture” approach
  • Optimal combination of technological and operational solutions
  • Use the data capture platform for coding and edits
  • Reduce risk by using an ‘off the shelf’ product backed by extensive experience in similar projects
  • On-line operation control tools
On-line operation control tools: eFLOW’s Controller workstation
A Long-Term Project
• Challenge No. 1:
  • Rapid changes in technology
• The Goal:
  • Use new technologies in the actual census
• Proposed Solution:
  • Open system (recognition engines, connectivity)
  • Continuously developed product
  • Census-focused company
A Long-Term Project (cont.)
• Challenge No. 2:
  • Post-census use of the data capture system
• The Goal:
  • Use the system for ongoing data capture
• Proposed Solution:
  • Business: outsourcing, renting, or purchase
  • Technical:
    • Break the system down into a few smaller, independent systems (scalable system, flexible software and hardware infrastructure)
    • A powerful set-up utility makes it possible to reuse the system later for other ongoing projects (statistical surveys, governmental service bureau)
Statistical Coding & Editing
• The Challenge:
  • ‘Bottlenecks’ occur due to an insufficient number of statistical experts and/or inefficient procedures
• The Goal:
  • Maintain the overall ‘throughput’ of the system by avoiding pre- and post-data-capture coding
• Proposed Solution:
  • Use automated recognition and/or ‘key from image’
  • Computer-assisted coding as part of the data capture system
  • Code and edit tasks performed on the data capture platform: a ‘cost-effective’ solution
ICR & look-up table: computer-assisted coding by statistical experts as part of the data capture system (2nd-level repair)
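The idea of pairing ICR output with a look-up table can be sketched as follows. This is not eFLOW's implementation; it is a minimal illustration using Python's standard `difflib` for approximate matching, and the occupation table, its codes, and the function name `suggest_codes` are all hypothetical. The coder is shown candidate codes for an imperfect ICR reading instead of typing the code from scratch.

```python
import difflib

# Hypothetical look-up table: occupation text -> statistical code.
# Both the entries and the codes are made up for illustration.
OCCUPATION_CODES = {
    "TEACHER": "2320",
    "FARMER": "6110",
    "NURSE": "2230",
    "DRIVER": "8320",
}


def suggest_codes(icr_text, table=OCCUPATION_CODES, limit=3, cutoff=0.6):
    """Return (entry, code) candidates for an ICR reading, best match first.

    An exact hit returns a single candidate; otherwise the closest table
    entries are offered so a coder can confirm or correct them.
    """
    key = icr_text.strip().upper()
    if key in table:
        return [(key, table[key])]
    matches = difflib.get_close_matches(key, list(table), n=limit, cutoff=cutoff)
    return [(m, table[m]) for m in matches]


if __name__ == "__main__":
    print(suggest_codes("teacher"))  # exact match after normalisation
    print(suggest_codes("TEACHR"))   # imperfect ICR reading, closest entries offered
    print(suggest_codes("XYZ"))      # nothing close enough: left for manual coding
```

An empty result would route the field to a statistical expert, which corresponds to the manual part of the repair step.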
Data Storage
• The Challenge:
  • The need to store large volumes of data and images
• The Goal:
  • Optimization of resources (network, storage facilities)
• Proposed Solution:
  • Use TIS’s unique “FormOut!” module:
    • Reduces network traffic
    • Reduces storage media
    • No need for dropout ink (saves printing costs)
An uncompressed census form (200 dpi) occupies 950 Kb; compression with CCITT Group 4 reduces it to 100 Kb; FormOut! reduces the same form to only 6 Kb.
How do we do it? (figure labels: original TIFF, ROI, EFI, DIF)
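The size figures above translate into compression ratios and total storage as follows. The sketch is plain arithmetic on the three numbers from the slide; it assumes "Kb" means kilobytes and uses decimal units (1 TB = 10^9 KB). The 50 million form volume is borrowed from the Irish Census 2006 entry earlier in the deck purely as an example.

```python
# Per-form sizes quoted on the slide (assumed to be kilobytes).
SIZE_KB = {
    "uncompressed": 950,
    "ccitt_group4": 100,
    "formout": 6,
}


def ratio(before_kb, after_kb):
    """Compression ratio, e.g. 9.5 means 9.5:1."""
    return before_kb / after_kb


def total_terabytes(size_kb, forms):
    """Total storage in decimal terabytes for a given number of forms."""
    return size_kb * forms / 1e9


if __name__ == "__main__":
    forms = 50_000_000  # example volume, taken from the Irish Census 2006 entry
    base = SIZE_KB["uncompressed"]
    for name, size in SIZE_KB.items():
        print(f"{name:13s} ratio {ratio(base, size):6.1f}:1  "
              f"total {total_terabytes(size, forms):6.2f} TB")
```

The ratios themselves do not depend on the unit assumption; only the terabyte totals do.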
Personal Data Confidentiality (Security)
• The Challenge:
  • Avoid the exposure of personal information
• The Goal:
  • Minimize image and data exposure in the data capture system through complete access control
• Proposed Solution: multi-level access control:
  • Overall system/segment level: a set number of workstations
  • User level: personal log-in and permissions for each user
  • Computer screens: anonymous images in ‘field mode’
  • On-line centralized security control (“Controller”)
Data Validation Procedures
• The Challenge:
  • Substitution errors (“computer mistakes”) occur
• The Goal:
  • Eliminate substitution errors and handle invalid responses during the data capture stage, so that results can be released sooner
• Proposed Solution:
  • Limit the possible answers, e.g. look-up tables, dictionaries, dates, single OMR response, set numeric range, …
  • Use multiple recognition engines (“voting”)
  • Multi-level comparisons: field level, form level, batch level
  • Logical validations: automatic and manual
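The "limit the possible answers" idea can be illustrated with a few field-level checks. This is a minimal sketch, not the product's rule engine: the field names, the look-up table contents, the age range, and the date format are all assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical look-up table of allowed answers for one field.
MARITAL_STATUS = {"SINGLE", "MARRIED", "WIDOWED", "DIVORCED"}


def in_lookup(value, table):
    """Answer must be one of the entries in a look-up table."""
    return value.strip().upper() in table


def in_numeric_range(value, low, high):
    """Answer must be a whole number within [low, high]."""
    return value.isascii() and value.isdigit() and low <= int(value) <= high


def is_valid_date(value, fmt="%d/%m/%Y"):
    """Answer must be a real calendar date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def single_omr_response(marks):
    """Exactly one checkbox in the group may be marked."""
    return sum(1 for m in marks if m) == 1


def validate_form(form):
    """Return the names of fields that fail their field-level check."""
    checks = {
        "marital_status": in_lookup(form["marital_status"], MARITAL_STATUS),
        "age": in_numeric_range(form["age"], 0, 120),
        "birth_date": is_valid_date(form["birth_date"]),
        "sex_marks": single_omr_response(form["sex_marks"]),
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    form = {"marital_status": "Married", "age": "1O7",
            "birth_date": "31/02/1980", "sex_marks": [True, False]}
    print(validate_form(form))  # fields to route to manual repair
```

Here "1O7" (letter O instead of zero) stands in for a typical substitution error: the range check rejects it, so the field can be sent to an operator rather than stored as-is.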
OCR/ICR engines: JUSTICR, ABBYY, KADMOS, RICOH, OCE, INLITE, EXPERVISION, PARASCRIPT, A2IA, TIS
Virtual Engine example (voting method): three ICR engines (A, B, C) return the partial readings *oshua, Jo*hu* and J*sh*a; voting combines them into “Joshua”
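One simple way to combine such readings is a per-character majority vote. The slides do not say how the virtual engine actually weighs its engines, so the sketch below is an assumption: each engine returns a string of equal length, `*` marks a character it could not recognise, and ties are left unresolved.

```python
from collections import Counter

UNKNOWN = "*"  # marker for a character an engine could not recognise


def vote(readings):
    """Combine equal-length engine readings by per-position majority vote.

    Positions where no engine recognised a character, or where the top
    candidates tie, stay UNKNOWN so they can be sent to manual repair.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty and of equal length")
    result = []
    for chars in zip(*readings):
        votes = Counter(c for c in chars if c != UNKNOWN)
        ranked = votes.most_common()
        if not ranked:
            result.append(UNKNOWN)  # nobody recognised this position
        elif len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(UNKNOWN)  # tie between engines
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    print(vote(["*oshua", "Jo*hu*", "J*sh*a"]))  # the slide's example readings
```

With the slide's three readings, every position has at least two agreeing engines, so the combined result is complete. A production system would more likely weight votes by each engine's confidence score.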
Form Design
• The Challenge:
  • System efficiency (throughput)
• The Goal:
  • Improve recognition results
• Proposed Solution:
  • Recommended guidelines (paper developed by TIS):
    • Consider restricting the optional answers to a limited number of desired possibilities
    • Choose between:
      • Mark response (checkbox)
      • Numeric response
      • Alphabetic response
      • A combination of the above
Data Types
• OCR: Optical Character Recognition (machine print and barcodes)
• ICR: Intelligent Character Recognition (handwriting)
• OMR: Optical Mark Recognition (checkboxes)
Data Types OCR ICR OMR
Why choose TIS?
• Extensive experience in real census and other high-volume form processing projects; largest market share worldwide in the processing of census projects
• A data capture platform (paper, electronic, mobile), not a recognition product
• Successful cooperation with local partners in providing census solutions (knowledge transfer, co-implementation, support)
• Maximum flexibility and redundancy, which ensures the timetable for releasing census results is met
• Financially stable company (NASDAQ since 1996)