UN Workshop on Data Capture, Bangkok. Session 7: Data Capture. Richard Lang, International Manager
Agenda • OCR: Optical Character Recognition • ICR: Intelligent Character Recognition • DFR: Dynamic Form Recognition
OCR = Optical Character Recognition • The technology was first invented in 1929 • Gustav Tauschek obtained a patent on OCR in Germany • A mechanical device that used templates • The first commercial system was installed at Reader's Digest in 1955 • Years later it was donated to the Smithsonian Institution • Today • Recognition of machine-written text is now considered a largely solved problem • Accuracy rates exceed 99%
OCR • Beta Systems has extensive experience with these recognition engines in banks • Germany: OCR-A (special symbols ⑁ Chair, ⑀ Hook, ⑂ Fork) • Austria: OCR-B+ (Plus)
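The three OCR-A special symbols named above also exist as characters in Unicode's "Optical Character Recognition" block (U+2440 to U+245F), which matters when recognized bank-line data has to be stored or exchanged as text. The sketch below simply looks up their code points and official names with Python's standard `unicodedata` module; the dictionary name `OCR_A_SYMBOLS` is illustrative only.

```python
import unicodedata

# OCR-A special symbols from the slide, keyed by the labels used there.
# Code points come from the Unicode OCR block (U+2440..U+245F).
OCR_A_SYMBOLS = {"Chair": "\u2441", "Hook": "\u2440", "Fork": "\u2442"}

for label, ch in OCR_A_SYMBOLS.items():
    # Print the slide label, the glyph, its code point and its Unicode name.
    print(f"{label:5} {ch} U+{ord(ch):04X} {unicodedata.name(ch)}")
```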
ICR: Intelligent Character Recognition • Thanks to its ongoing development, ICR has moved far ahead of OCR • A handwriting recognition system • Allows a computer to learn different styles of handwriting during or before processing, to improve accuracy and recognition rates
ICR Process: • Capturing the image with scanners • Processing by ICR and/or OCR • Segmentation is a very important step • Decide whether regions with homogeneous properties belong to the foreground or to the background • Human editors can do this based on the context • Compare computer tomography: from X-ray projections taken at different angles, the computer reconstructs the picture • The first step only provides a suitable starting point (seed sets of pixels) • The growing process then links neighbouring pixels (computing valleys and peaks with a high degree of confidence)
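The seed-then-grow segmentation described above reads like classic region growing. Below is a minimal sketch under stated assumptions: the image is a grayscale list of rows (0 = black ink, 255 = white paper), neighbours are 4-connected, and a pixel joins the region when its intensity is within a hypothetical `tolerance` of the seed's intensity. A production engine would use richer criteria than this.

```python
from collections import deque


def region_grow(image, seeds, tolerance):
    """Grow a region outward from seed pixels in a grayscale image.

    image: list of rows of intensities; seeds: list of (row, col).
    A 4-connected neighbour is added when its intensity differs from
    the seed's intensity by at most `tolerance`.
    Returns the set of (row, col) pixels in the grown region.
    """
    height, width = len(image), len(image[0])
    region = set()
    for seed in seeds:
        seed_value = image[seed[0]][seed[1]]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            if (r, c) in region or abs(image[r][c] - seed_value) > tolerance:
                continue  # already grown, or too different from the seed
            region.add((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width and (nr, nc) not in region:
                    queue.append((nr, nc))
    return region


# A dark stroke near the top left and a separate dark blob on the right.
image = [
    [250, 250, 250, 250, 250],
    [250, 20, 30, 250, 250],
    [250, 25, 250, 250, 15],
    [250, 250, 250, 250, 10],
]
stroke = region_grow(image, seeds=[(1, 1)], tolerance=40)
```

Growing from the seed at (1, 1) links only the connected dark pixels; the blob on the right stays outside the region because no chain of similar pixels reaches it.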
ICR Process: • Pre-processing • Deskew • Shift, rotate • Stretch
ICR Process: • Enhance • Less / more contrast • Clean up (de-noise, halftone removal) • to enable the recognition engine to give the best results
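Two of the enhancement steps above can be sketched in a few lines, assuming a grayscale image stored as a list of rows: a linear contrast stretch, and a 3×3 median filter as one common de-noising choice (the slide names no specific filter, and halftone removal is not shown). Both functions are illustrative, not the engine's actual pre-processing.

```python
def stretch_contrast(image, low=0, high=255):
    """Linearly map the darkest pixel to `low` and the brightest to `high`."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    scale = (high - low) / (hi - lo)
    return [[round(low + (p - lo) * scale) for p in row] for row in image]


def median_filter(image):
    """3x3 median filter that removes isolated specks ("salt and pepper").

    Border pixels use only the neighbours that exist.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            out_row.append(window[len(window) // 2])
        out.append(out_row)
    return out
```

A median filter is preferred over plain averaging here because it deletes a single bright or dark speck outright instead of smearing it into its neighbours, which keeps character edges sharp.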
ICR Process: • Feature extraction • Data reduction
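Feature extraction and data reduction can be illustrated with zoning, a simple technique assumed here for illustration (the slide does not name the features used): the character bitmap is divided into a grid of zones and each zone is reduced to its ink density. A 16×16 bitmap (256 pixels) then becomes 16 numbers.

```python
def zone_features(bitmap, zones=4):
    """Reduce a square binary character bitmap to zones x zones densities.

    Each feature is the fraction of foreground (1) pixels in one zone,
    read row by row from the top-left zone.
    """
    size = len(bitmap)
    if size % zones:
        raise ValueError("bitmap size must be divisible by the number of zones")
    step = size // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            block = [
                bitmap[r][c]
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            ]
            features.append(sum(block) / len(block))
    return features


# A 16x16 "1": a vertical stroke in columns 6 to 9.
one = [[1 if 6 <= c <= 9 else 0 for c in range(16)] for _ in range(16)]
features = zone_features(one)  # 256 pixels reduced to 16 values
```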
ICR Process: • Classification • Example: a one was written • 90 % = 1 • 8 % = 7 • 2 % = 4
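The classification example above, where a written "1" yields ranked alternatives with confidences, can be sketched by normalizing per-class scores and sorting them. The raw scores 45, 4 and 1 are hypothetical inputs chosen so that the output reproduces the slide's 90 % / 8 % / 2 %; a real engine produces its scores from the features of the previous step.

```python
def rank_candidates(scores):
    """Turn non-negative per-class scores into (char, confidence) pairs,
    best candidate first; confidences sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must contain a positive value")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(char, score / total) for char, score in ranked]


# Hypothetical raw scores for a handwritten "1".
candidates = rank_candidates({"1": 45, "7": 4, "4": 1})
print(candidates)  # 1 at 90 %, 7 at 8 %, 4 at 2 %
```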
ICR Algorithms: • Neural network • kNN: k-Nearest Neighbour • SVM: Support Vector Machine; SVMs simultaneously minimize the empirical classification error and maximize the geometric margin, hence they are also known as maximum-margin classifiers
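Of the three algorithms listed, k-nearest neighbour is the easiest to show compactly: a sample takes the majority label of the k closest training samples. The two-dimensional feature vectors below are hypothetical stand-ins for real character features such as zone densities; neural networks and SVMs are not sketched here.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Classify a feature vector by majority vote of its k nearest neighbours.

    training: list of (feature_vector, label) pairs; distance is Euclidean.
    Ties go to the label of the closest neighbour, because Counter keeps
    insertion order and the neighbours are inserted nearest first.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features for handwritten "1"s and "7"s.
training = [
    ((0.0, 1.0), "1"), ((0.1, 0.9), "1"), ((0.2, 1.0), "1"),
    ((1.0, 0.2), "7"), ((0.9, 0.1), "7"), ((1.0, 0.0), "7"),
]
label = knn_classify((0.15, 0.95), training)
```

kNN needs no training phase beyond storing labelled samples, which fits the slide's point that handwriting styles can be learned "before processing": adding new writers' samples immediately changes the results.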
ICR Process: • After the classification alternatives have been computed, each is given an appropriate confidence • Recognition can be limited to the most probable characters, e.g. if only the characters 3, 6 and 0 are possible, the engine can be limited to this set and the results are much better • Voting machine • Usability: • security, • efficiency and • accuracy
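The character-set limitation and the voting machine above can be sketched as two small functions. Restriction drops impossible candidates and renormalizes the rest; the vote, assumed here to simply sum each candidate's confidence across engines and average by engine count, is only one plausible scheme, since the slide does not say how the voting machine combines results. All inputs are hypothetical.

```python
from collections import defaultdict


def restrict(candidates, allowed):
    """Keep only candidates in the allowed character set and renormalize.

    Returns an empty list when nothing plausible is left, so the
    character can be rejected for manual correction.
    """
    kept = [(ch, p) for ch, p in candidates if ch in allowed]
    total = sum(p for _, p in kept)
    if total == 0:
        return []
    return [(ch, p / total) for ch, p in kept]


def vote(engine_results):
    """Combine several engines' candidate lists by summing confidences.

    Returns (best_char, average_confidence), or (None, 0.0) if no engine
    produced any candidate.
    """
    if not engine_results:
        raise ValueError("need at least one engine result")
    totals = defaultdict(float)
    for candidates in engine_results:
        for ch, p in candidates:
            totals[ch] += p
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best] / len(engine_results)


# A field that may only contain 3, 6 or 0: the "8" is ruled out.
limited = restrict([("8", 0.5), ("3", 0.3), ("6", 0.15), ("0", 0.05)], {"3", "6", "0"})
```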
Dynamic Field Recognition • No fixed position is required • If only half of the form is available, that half is still readable • No special forms are required • No timing tracks are necessary on the forms for OMR, yet the OMR results are available at the same time, and the scanner's LEDs need no cleaning • Robust against vertical / horizontal stretching or shrinking (e.g. different printers)
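The slides do not describe how Dynamic Field Recognition locates fields. A generic anchor-relative approach, assumed here purely for illustration, shows why no fixed position is needed and why stretching does no harm: find two reference points on the scan (for example a printed label and a box corner), then solve scale and offset separately for each axis and map the template's field box onto the page. The function name and coordinates are hypothetical, and rotation is assumed to have been removed by deskewing first.

```python
def map_field(template_box, template_anchors, found_anchors):
    """Map a field box (left, top, right, bottom) from template coordinates
    onto a scanned page.

    template_anchors / found_anchors: two reference points
    ((x1, y1), (x2, y2)) located on the template and on the scan.
    Scale and offset are solved independently per axis, so horizontal
    and vertical stretching or shrinking are both absorbed.
    """
    (tx1, ty1), (tx2, ty2) = template_anchors
    (fx1, fy1), (fx2, fy2) = found_anchors
    if tx1 == tx2 or ty1 == ty2:
        raise ValueError("anchors must differ in both x and y")
    sx = (fx2 - fx1) / (tx2 - tx1)
    sy = (fy2 - fy1) / (ty2 - ty1)
    left, top, right, bottom = template_box
    return (fx1 + (left - tx1) * sx, fy1 + (top - ty1) * sy,
            fx1 + (right - tx1) * sx, fy1 + (bottom - ty1) * sy)


# The page was printed 25 % wider and shifted: the field box follows.
box = map_field((200, 300, 400, 340), ((100, 100), (500, 700)), ((120, 90), (620, 690)))
```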
Dynamic Field Recognition • Recognizes: • features (a word as a pixel cloud), • boxes, • lines and • symbols
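Line detection, one of the structures listed above, can be sketched with run lengths on a binary image: a horizontal run of ink much longer than any pen stroke is taken to be a printed line or box edge. The threshold `min_length` is a hypothetical parameter; the actual recognizer's method is not described in the slides.

```python
def find_horizontal_lines(bitmap, min_length):
    """Return (row, start_col, end_col) for every horizontal run of
    foreground pixels at least `min_length` long.

    Long runs are treated as printed lines or box edges; short runs
    (handwriting strokes, specks of dirt) are ignored.
    """
    lines = []
    for r, row in enumerate(bitmap):
        start = None
        for c, value in enumerate(list(row) + [0]):  # sentinel closes a run at the edge
            if value and start is None:
                start = c
            elif not value and start is not None:
                if c - start >= min_length:
                    lines.append((r, start, c - 1))
                start = None
    return lines


bitmap = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],  # a printed line
    [0, 1, 0, 0, 1, 0, 0, 0],  # short handwriting strokes
    [0, 0, 0, 0, 0, 0, 0, 0],
]
lines = find_horizontal_lines(bitmap, min_length=5)
```

Vertical lines follow from running the same function on the transposed bitmap, `[list(col) for col in zip(*bitmap)]`; a box is then a pair of horizontal and a pair of vertical lines whose ends meet.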
Hardware / Software Requirements • Hardware • Scanner • PC • Network • Disk storage, only necessary if images are needed for audit purposes • Software • Scan software • One recognition and voting software for OMR, OCR, ICR and barcode
ICR Advantages • Better than: • Manual keying • 90 % (plus) correct keystrokes; manual keying has a higher substitution rate than automated recognition • Time-consuming • Deliberate manipulation possible • OMR, because OMR is space-consuming • OCR, because OCR only reads machine-written text and is therefore of limited use
ICR Advantages • Clear accuracy for OMR, because the software removes dirt depending on the mark size and shape • Can detect lines and ignore dirt • Clear result
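Software dirt removal for OMR, depending on mark size, can be sketched as follows: split the ink inside a tick box into 8-connected blobs, discard blobs smaller than a minimum size as dirt, and call the box marked if the remaining ink covers enough of its area. Both thresholds (`min_blob`, `fill_threshold`) are hypothetical values that would need tuning per form; shape tests are not included.

```python
def components(cells):
    """Split a set of (row, col) pixels into 8-connected components."""
    remaining = set(cells)
    result = []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        remaining.remove(n)
                        comp.add(n)
                        stack.append(n)
        result.append(comp)
    return result


def is_marked(box, min_blob=3, fill_threshold=0.2):
    """Decide whether an OMR box (binary list of rows) is marked.

    Blobs smaller than `min_blob` pixels are treated as dirt and ignored;
    the box counts as marked if the remaining ink covers at least
    `fill_threshold` of its area.
    """
    ink = {(r, c) for r, row in enumerate(box) for c, v in enumerate(row) if v}
    kept = sum(len(comp) for comp in components(ink) if len(comp) >= min_blob)
    area = len(box) * len(box[0])
    return kept / area >= fill_threshold
```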
ICR Advantages • Barcode, • OCR, • OMR • and ICR recognition with one software package
ICR Advantages • Pro: • Only rejected characters/fields need correction; the rest of the form is left untouched • Open to new technologies in the future: faster, better quality • Standardized correction mode • Handwriting of the respective country is recognized • The advantages mentioned previously are not repeated here
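The "only rejected fields need correction" workflow can be sketched as a confidence threshold that splits recognized fields into an accepted set and a correction queue. The threshold of 0.85 and the field names are hypothetical; the same routing works per character instead of per field.

```python
CONFIDENCE_THRESHOLD = 0.85  # hypothetical value; tuned per survey in practice


def route_fields(fields, threshold=CONFIDENCE_THRESHOLD):
    """Split recognized fields into accepted ones and ones needing correction.

    fields: dict mapping field name to (value, confidence).
    Only fields below the threshold go to the correction queue; the rest
    pass through untouched.
    """
    accepted, to_correct = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else to_correct
        target[name] = value
    return accepted, to_correct


accepted, to_correct = route_fields({
    "age": ("34", 0.97),
    "district": ("Bangrak", 0.91),
    "household_size": ("7", 0.42),
})
```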