UN Workshop on Data Capture, Dar es Salaam. Session 7: Data Capture. Richard Lang, International Manager
Agenda • OCR: Optical Character Recognition • ICR: Intelligent Character Recognition • DFR: Dynamic Form Recognition
OCR = optical character recognition • The technology dates back to 1929, when Gustav Tauschek obtained a patent on OCR in Germany • Mechanical device that used templates • The first commercial system was installed at Reader's Digest in 1955 • Years later it was donated to the Smithsonian Institution • Today • Recognition of machine-written text is now considered largely a solved problem • Accuracy rates exceed 99 %
OCR • Beta Systems has extensive experience with these recognition engines in banks • Germany: OCR-A (⑁ Chair, ⑀ Hook, ⑂ Fork) • Austria: OCR-B (+ Plus)
ICR Intelligent Character Recognition • The technique goes well beyond OCR and is still under ongoing development • Handwriting recognition system • Allows different styles of handwriting to be learned by a computer during or before processing, to improve accuracy and recognition rates
ICR Process: • Capturing the image with scanners • Processing by ICR and/or OCR • Segmentation is a very important step • Decision whether regions that meet a homogeneity criterion belong to the foreground or to the background • Human editors can make that decision depending on the context • Compare computed tomography: from X-ray measurements taken at different angles the computer can reconstruct the picture • The first step yields only suitable starting points (sets of pixels) • The growing process then links all neighbouring pixels (computation of valleys and peaks with a high degree of confidence)
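The segmentation idea above (start from a set of pixels, then link nearby pixels) can be sketched as simple region growing. This is a minimal illustration, not the engine's actual method: the function name `grow_region`, the 4-neighbour rule, and the `tolerance` value are assumptions made for the example.

```python
from collections import deque

def grow_region(image, seed, tolerance=40):
    """Grow a region from a seed pixel (hypothetical helper).

    image: list of rows of grayscale values (0 = black ink, 255 = white paper).
    A 4-connected neighbour joins the region when its value is within
    `tolerance` of the seed value (an assumed homogeneity criterion).
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# A dark stroke in the middle and one unconnected dark speck in the corner.
page = [
    [255, 255, 255, 255],
    [255,  20,  30, 255],
    [255,  25, 255, 255],
    [255, 255, 255,  10],
]
stroke = grow_region(page, seed=(1, 1))
```

Starting from the seed at (1, 1), the region collects the three connected dark pixels and leaves the isolated speck in the corner out, because it has no dark neighbour linking it to the stroke.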
ICR Process: • Pre-processing • Deskew • Shift, rotate • Stretch
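As a rough sketch of the deskew step, assume the skew is estimated from two points on a line that should be horizontal (for example the ends of a printed form line). The function names `skew_angle` and `deskew_point` are hypothetical, and a real system would resample the whole image rather than single points.

```python
import math

def skew_angle(left, right):
    """Angle (radians) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = left, right
    return math.atan2(y2 - y1, x2 - x1)

def deskew_point(point, angle, origin=(0.0, 0.0)):
    """Rotate a point by -angle around origin, undoing the measured skew."""
    x, y = point[0] - origin[0], point[1] - origin[1]
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return (origin[0] + x * cos_a - y * sin_a,
            origin[1] + x * sin_a + y * cos_a)

# A form line that should be horizontal rises 10 pixels over 100 pixels.
left_end, right_end = (0.0, 0.0), (100.0, 10.0)
angle = skew_angle(left_end, right_end)
corrected = deskew_point(right_end, angle, origin=left_end)
```

After the correction the right end of the line lies at the same height as the left end, and the length of the line is unchanged.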
ICR Process: • Enhance • Less / more contrast • Clean up (de-noise, halftone removal) • To enable the recognition engine to give the best results
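The two enhancement ideas can be illustrated with a linear contrast stretch and a very simple speck filter. Both helpers are assumptions for illustration: `stretch_contrast` and `remove_specks` are hypothetical names, and production de-noising is usually more sophisticated.

```python
def stretch_contrast(image):
    """Linearly map the darkest value to 0 and the brightest to 255."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in image]

def remove_specks(binary):
    """Clear foreground pixels (1) that have no foreground neighbour."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1:
                neighbours = sum(
                    binary[nr][nc]
                    for nr in range(max(0, r - 1), min(rows, r + 2))
                    for nc in range(max(0, c - 1), min(cols, c + 2))
                    if (nr, nc) != (r, c)
                )
                if neighbours == 0:
                    cleaned[r][c] = 0
    return cleaned

faint = [[10, 20], [30, 40]]
enhanced = stretch_contrast(faint)

noisy = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
clean = remove_specks(noisy)
```

The faint scan is spread over the full 0 to 255 range, and the single isolated pixel in the noisy image is dropped while the two connected pixels are kept.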
ICR Process: • Feature extraction • Data reduction
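One common way to combine feature extraction with data reduction is zoning: the character image is cut into a small grid and only the ink density per zone is kept. The slides do not say which features are used, so this is only an assumed example; `zone_densities` is a hypothetical name and the image size is assumed to be divisible by the number of zones.

```python
def zone_densities(glyph, zones=2):
    """Reduce a binary glyph to zones*zones ink-density features.

    glyph: list of rows with 1 = ink, 0 = background.
    Assumes the height and width are divisible by `zones`.
    """
    rows, cols = len(glyph), len(glyph[0])
    zone_h, zone_w = rows // zones, cols // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                glyph[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features

glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
features = zone_densities(glyph)
```

Here 16 pixel values are reduced to 4 numbers, one per quadrant, which is the kind of compact description a classifier can work with.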
ICR Process: • Classification • Example: a "1" was written • 90 % = 1 • 8 % = 7 • 2 % = 4
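The 90 % / 8 % / 2 % example can be read as a ranked list of candidates with confidences. A minimal sketch, assuming the classifier produces non-negative raw scores per character (the scores below are invented to reproduce the slide's percentages, and `rank_candidates` is a hypothetical name):

```python
def rank_candidates(scores):
    """Turn raw per-character scores into confidences that sum to 1,
    sorted from most to least likely."""
    total = sum(scores.values())
    confidences = {label: score / total for label, score in scores.items()}
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)

# Invented raw scores for a handwritten "1".
raw_scores = {"7": 4, "1": 45, "4": 1}
ranking = rank_candidates(raw_scores)
for label, confidence in ranking:
    print(f"{confidence:.0%} = {label}")
```

The first entry of the ranking is the engine's answer; the remaining entries are the alternatives a later step can fall back on.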
ICR Algorithm: • Neural network • kNN (k-nearest neighbour) • SVM (support vector machine): SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers
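Of the three approaches listed, k-nearest neighbour is the easiest to sketch in a few lines. The feature vectors and labels below are invented for illustration, and `knn_classify` is a hypothetical helper, not part of any particular ICR product.

```python
import math
from collections import Counter

def knn_classify(sample, training, k=3):
    """Label a feature vector by majority vote among its k nearest
    training examples (Euclidean distance).

    training: list of (feature_vector, label) pairs.
    Returns (label, share of the k neighbours that voted for it).
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

# Invented two-dimensional features for handwritten "1" and "7".
training = [
    ((0.90, 0.10), "1"),
    ((0.80, 0.20), "1"),
    ((0.85, 0.15), "1"),
    ((0.20, 0.90), "7"),
    ((0.10, 0.80), "7"),
]
label, share = knn_classify((0.82, 0.18), training, k=3)
```

The share of agreeing neighbours gives a crude confidence value; learning a new handwriting style amounts to adding more labelled examples to the training list.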
ICR Process: • For each classification alternative the corresponding confidence is provided • Recognition can be limited to the most probable characters, e.g. if only the characters 3, 6, 0 are possible, the engine can be limited to this set and the results are much better • Voting Machine • Usability: • security, • efficiency and • accuracy
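Limiting the character set and voting can both be sketched briefly. The sketch assumes that voting means comparing the answers of several recognition engines and accepting a character only when enough of them agree; the slides do not spell this out. The names `restrict_to` and `vote` and the confidence values are hypothetical.

```python
from collections import Counter

def restrict_to(confidences, allowed):
    """Drop candidates outside the allowed set and renormalise the rest."""
    kept = {ch: p for ch, p in confidences.items() if ch in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {ch: p / total for ch, p in kept.items()}

def vote(engine_answers, min_agreement=2):
    """Return the character most engines agree on, or None (reject)
    when fewer than `min_agreement` engines agree."""
    winner, count = Counter(engine_answers).most_common(1)[0]
    return winner if count >= min_agreement else None

# Unrestricted, the engine slightly prefers "8" over "3".
confidences = {"3": 0.40, "8": 0.45, "6": 0.10, "0": 0.05}
limited = restrict_to(confidences, allowed={"3", "6", "0"})
best = max(limited, key=limited.get)

accepted = vote(["3", "3", "8"])
rejected = vote(["3", "8", "0"])
```

With the field limited to 3, 6 and 0, the misleading "8" disappears and "3" becomes the clear answer. A rejected vote returns `None`, which corresponds to a character that is passed on for manual correction.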
Dynamic Field Recognition • No fixed position is required • If only ½ of the form is available, that ½ is still readable • No special forms are required • No timing tracks are necessary on the forms for OMR, but OMR results are still available at the same time; no cleaning of LEDs in the scanner is necessary • Robust against vertical / horizontal stretching or shrinking (e.g. different printers)
Dynamic Field Recognition • Recognizes: • features (word as pixel cloud), • boxes, • lines and • symbols
Hardware / Software Requirements • Hardware • Scanner • PC • Network • Disk storage (only necessary if images are needed for audit purposes) • Software • Scan software • One recognition and voting software for OMR, OCR, ICR, barcode
ICR Advantages • Better than: • Manual keying • 90 % (plus) correct keystrokes: manual keying has a higher substitution rate than automated recognition • Time consuming • Deliberate manipulation possible • OMR, because OMR is space consuming • OCR, because OCR handles only machine-written text and is therefore of limited use
ICR Advantages • Clear accuracy for OMR because the software removes dirt, depending on the mark size and shape • Can detect lines and ignore dirt • Clear result
ICR Advantages • Barcode, • OCR, • OMR, • and ICR recognition with one software
ICR Advantages • Pro: • Only rejected characters/fields need correction; the rest of the form stays untouched • Open to future technologies: faster, better quality • Standardized correction mode • The handwriting of the corresponding country will be recognized • The previously mentioned advantages do not have to be repeated here