OCR at INIS
Database Production & Imaging Group
Yves Reynaud, Y.Reynaud-Pulido@iaea.org
INIS Training Seminar, 14-16 November 2011, Vienna, Austria
Some OCR features
We can find the needle in the haystack
• OCR offers a basic search of an unstructured document.
• OCR brings your digitized collection to life.
• OCR adds extra value to your image.
OCR is a software technology that
• Translates images of handwritten or typewritten text into machine-editable text.
• Translates pictures of characters into a standard encoding scheme that represents them (e.g. ASCII or Unicode).
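To make the "standard encoding scheme" concrete, here is a minimal sketch of what a recognized character becomes once it is text rather than pixels. The helper name `describe` is hypothetical and only uses Python built-ins.

```python
def describe(ch: str) -> dict:
    """Return the Unicode code point and UTF-8 bytes for one character."""
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",      # position in the Unicode table
        "is_ascii": ord(ch) < 128,             # ASCII covers code points 0-127
        "utf8": ch.encode("utf-8").hex(),      # bytes actually stored on disk
    }

# One Latin letter and one Chinese character, as an OCR engine might emit them.
for ch in "A滤":
    print(describe(ch))
```

Once the page is stored this way, a search engine can match the characters directly, which is not possible with the scanned pixels alone.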
• Scanned image (paper or micrographic)
• Vector image (created from a native application), shown here as a raster image for the sake of comparison
“Do not see the trees (letters), try to see the forest (sentences)”
F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
Verdana
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
Brush Script MT (Windows Font)
FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.
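The digit-for-letter sentence shown earlier can be undone mechanically once we know the text should contain only letters. The sketch below assumes a hypothetical look-alike table (`LOOKALIKES`) built for this one example. It would corrupt any genuine numbers, which is why a rule like this only works when the context is known in advance.

```python
# Hypothetical table: digits that resemble capital letters in this example.
# 0->O, 1->I, 3->E, 4->A, 6->G, 7->T, 8->S
LOOKALIKES = str.maketrans("0134678", "OIEAGTS")

def restore_letters(text: str) -> str:
    """Replace look-alike digits with letters; assumes the text has no real numbers."""
    return text.translate(LOOKALIKES)

garbled = "F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N"
print(restore_letters(garbled))  # FOR ASSURING THE LONGEVITY OF INFORMATION
```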
PCs ≠ Humans
• OCR compares patterns and selects the closest match; it can be constrained to a specific context, but that requires customization.
• People adapt to circumstances and can work around misspellings if the context is clear.
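"Compares patterns and selects the closest match" can be illustrated with a toy classifier. This is a minimal sketch, not how any particular OCR product works. The 3x5 bitmaps in `TEMPLATES` are invented for illustration, and the `allowed` parameter is a hypothetical stand-in for restricting recognition to a known context.

```python
# Toy 3x5 bitmaps (1 = ink). The shapes are illustrative assumptions, not real font data.
TEMPLATES = {
    "O": ("111", "101", "101", "101", "111"),
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
}

def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def classify(glyph, allowed=None):
    """Return the template name with the fewest differing pixels.

    `allowed` optionally restricts the candidates to a known context.
    """
    candidates = allowed or TEMPLATES.keys()
    return min(candidates, key=lambda name: distance(glyph, TEMPLATES[name]))

noisy_l = ("100", "100", "100", "100", "110")  # an "L" with one pixel lost in scanning
print(classify(noisy_l))                        # L
print(classify(noisy_l, allowed=["O", "I"]))    # O (best guess when L is excluded)
```

The second call shows the trade-off: narrowing the context removes some errors, but the engine still returns its nearest candidate even when the true character is not on the list.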
True or false: Usually, an image is adequately sampled if the strokes of each letter are at least two pixels thick.
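Taking the two-pixel rule of thumb at face value, the arithmetic linking scan resolution to stroke thickness can be sketched as follows. The 0.2 mm stroke width is an assumed example value, not a figure from the slides, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4

def stroke_pixels(stroke_mm: float, dpi: int) -> float:
    """Approximate stroke thickness in pixels at a given scan resolution."""
    return stroke_mm / MM_PER_INCH * dpi

def min_dpi(stroke_mm: float, min_pixels: float = 2.0) -> float:
    """Lowest resolution at which a stroke spans `min_pixels` pixels."""
    return min_pixels * MM_PER_INCH / stroke_mm

# Assumed example: a thin printed stroke of about 0.2 mm.
for dpi in (150, 300, 600):
    print(dpi, "dpi ->", round(stroke_pixels(0.2, dpi), 2), "pixels")
print("minimum dpi for 2 pixels:", round(min_dpi(0.2)))
```

Under this assumption a 150 dpi scan falls below two pixels per stroke, while 300 dpi clears it.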
Zoom in
Zoom in
Results from OCR
It is in this context that I…
… and an additional protocol on the basis…
Chinese in pixels
Chinese vector images from OCR 滤器
Arabic in pixels
Arabic vector images from OCR هذ ا وشملت
InftyReader - an OCR System for Math Documents (12) where a . The indices now range from 1 to 5. The bosonic fields obey the commutation rules (13)
Thank you