OCR at INIS

OCR at INIS. INIS Training Seminar, 7-11 October 2013, Vienna, Austria. Branko Krznarić, INIS Unit (based on the presentation by Yves Reynaud). Outline: What is OCR?, OCR Objectives, Principles, Techniques, Software. What is OCR? (source: pcmag.com).

Presentation Transcript


  1. OCR at INIS. INIS Training Seminar, 7-11 October 2013, Vienna, Austria. Branko Krznarić, INIS Unit (based on the presentation by Yves Reynaud)

  2. Outline • What is OCR? • OCR Objectives • Principles • Techniques • Software

  3. What is OCR? (source: pcmag.com)

  4. Optical Character Recognition (OCR) • OCR is the “conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.” [1] • Make digitized images of printed documents searchable. • Font encoding issues.

  5. OCR Objectives We can “find the needle in the haystack” • OCR offers basic search within an unstructured document. • OCR adds extra value to your image. • OCR brings your digitized collection to life.

  6. OCR Techniques • Pre-processing • De-skew • Despeckle • Binarization (optional) • Line removal • Layout analysis (zoning) • Post-processing (dictionary)
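Two of the pre-processing steps listed above, binarization and despeckling, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used at INIS: the slide does not name a thresholding method, so Otsu's method is swapped in here as one common choice, and the 5x5 grayscale "page" is invented for the example.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def otsu_threshold(gray: Image) -> int:
    """Pick the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray: Image, threshold: int) -> Image:
    """Map pixels at or below the threshold to ink (1), the rest to paper (0)."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


def despeckle(binary: Image) -> Image:
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical scan: a three-pixel stroke on row 1 and one stray speck at (3, 3).
page = [
    [220, 220, 220, 220, 220],
    [220, 30, 30, 30, 220],
    [220, 220, 220, 220, 220],
    [220, 220, 220, 20, 220],
    [220, 220, 220, 220, 220],
]
threshold = otsu_threshold(page)
binary = binarize(page, threshold)
clean = despeckle(binary)
```

After these steps the stroke on row 1 survives while the isolated speck is cleared. A real pipeline would also de-skew the page and remove ruling lines before layout analysis.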

  7. Scanned vs. Vector Image

  8. “Do not look at the trees (letters), try to see the forest (sentences)” F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N, P3RH4P8 7H3 M087 1MP0R74N7 R0L3 1N 7H3 0P3R4710N 0F 4 D16174L 4RCH1V3 18 M4N461N6 7H3 1D3N717Y, 1N736R17Y 4ND QU4L17Y 0F 7H3 4RCH1V38 1783LF 48 4 7RU873D 80URC3 0F 7H3 CUL7UR4L R3C0RD.
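The sample text above swaps letters for look-alike digits, which is the kind of confusion a dictionary-based post-processing step tries to undo. The sketch below is only an illustration: the substitution table is read off the slide's sample, while the `VOCABULARY` set and the function names are hypothetical stand-ins for a real word list and a real OCR post-processor.

```python
# Digit-to-letter substitutions observed in the slide's sample text.
LOOKALIKES = str.maketrans(
    {"0": "O", "1": "I", "3": "E", "4": "A", "6": "G", "7": "T", "8": "S"}
)

# Hypothetical mini-dictionary; a real system would load a full word list.
VOCABULARY = {"FOR", "ASSURING", "THE", "LONGEVITY", "OF", "INFORMATION", "DIGITAL"}


def undo_lookalikes(text: str) -> str:
    """Blindly replace every look-alike digit with its letter."""
    return text.translate(LOOKALIKES)


def correct_token(token: str, vocabulary: set) -> str:
    """Accept the substitution only if it produces a known word."""
    candidate = undo_lookalikes(token)
    return candidate if candidate in vocabulary else token


sample = "F0R 488UR1N6 7H3 L0N63V17Y 0F 1NF0RM4710N"
print(undo_lookalikes(sample))  # FOR ASSURING THE LONGEVITY OF INFORMATION
print(" ".join(correct_token(t, VOCABULARY) for t in "7H3 D16174L 2013".split()))
```

The dictionary check matters because blind substitution would also rewrite genuine numbers: `2013` stays unchanged in the second call because its substituted form is not a known word.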

  9. Verdana Font FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

  10. Brush Script MT (Windows Font) FOR ASSURING THE LONGEVITY OF INFORMATION, PERHAPS THE MOST IMPORTANT ROLE IN THE OPERATION OF A DIGITAL ARCHIVE IS MANAGING THE IDENTITY, INTEGRITY AND QUALITY OF THE ARCHIVES ITSELF AS A TRUSTED SOURCE OF THE CULTURAL RECORD.

  11. PCs ≠ Humans • OCR compares patterns and selects the closest match. It can be forced to a specific context, but requires customization. • People adapt to circumstances and can circumvent misspellings if context is clear.
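"Compares patterns and selects the closest match" can be illustrated with a toy template matcher. This is a sketch under simplifying assumptions, not how any particular OCR product works: the 3x5 glyph bitmaps and the names `TEMPLATES`, `hamming` and `classify` are invented for the example, and the distance is a plain count of differing pixels.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical 3x5 reference bitmaps ('#' = ink, '.' = paper).
TEMPLATES: Dict[str, Tuple[str, ...]] = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def classify(glyph: Sequence[str]) -> Tuple[str, int]:
    """Return the label of the closest template and its distance."""
    return min(
        ((label, hamming(glyph, template)) for label, template in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )


# A 'T' with one stray ink pixel in the fourth row.
noisy_t = ("###", ".#.", ".#.", "##.", ".#.")
print(classify(noisy_t))  # ('T', 1)
```

In this toy set the templates for `I` and `T` differ by only two pixels, so a little more noise would flip the result. That is the situation where a human reader falls back on context and a matcher does not.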

  12. True or false: Usually, printed text is adequately sampled if each line is at least two pixels in thickness.
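The two-pixel rule of thumb can be turned into a quick calculation that relates stroke width to scan resolution. The 0.2 mm stroke width below is an assumed example value, not a figure from the presentation; the only fixed constant is 25.4 mm per inch.

```python
import math

MM_PER_INCH = 25.4


def stroke_pixels(stroke_mm: float, dpi: float) -> float:
    """Width in pixels of a stroke of the given physical width at a scan resolution."""
    return stroke_mm / MM_PER_INCH * dpi


def min_dpi(stroke_mm: float, pixels_needed: float = 2.0) -> float:
    """Lowest resolution at which the stroke spans the required number of pixels."""
    return pixels_needed * MM_PER_INCH / stroke_mm


# Assumed example: a thin printed stroke about 0.2 mm wide.
print(f"{stroke_pixels(0.2, 150):.2f}")  # 1.18 (under two pixels)
print(f"{stroke_pixels(0.2, 300):.2f}")  # 2.36 (meets the rule)
print(math.isclose(min_dpi(0.2), 254.0))  # True
```

Under this assumption a 150 dpi scan undersamples the stroke, while 300 dpi clears the two-pixel mark with a small margin.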

  13. Zoom in

  14. Zoom in

  15. Results from OCR It is in this context that I… … and an additional protocol on the basis…

  16. Chinese Raster Image (scanned)

  17. Chinese Vector Image (OCR) 滤器

  18. Arabic Raster Image (scanned)

  19. Arabic Vector Image (OCR) هذ ا وشملت

  20. Japanese Raster Image (scanned)

  21. Japanese Vector Image (OCR)

  22. Font Encoding

  23. Font Encoding (cont.)

  24. OCR Software • ABBYY FineReader (multilingual OCR) • Adobe Acrobat • InftyReader

  25. ABBYY FineReader (interface)

  26. InftyReader - an OCR System for Math Documents (12) where a . The indices now range from 1 to 5. The bosonic fields obey the commutation rules (13)

  27. Reference [1] “Optical character recognition”, http://en.wikipedia.org/wiki/Optical_character_recognition. Retrieved 2013-09-23.

  28. Thank you!
