Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing

Presentation Transcript


  1. Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing Stefan Pletschacher; Marcel Eckert; Arved C. Hübler

  2. GEB1150 Digitization of Historical Documents Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006

  3. Alphabet and Font Extraction

  4. Vectorization: Raster-to-Vector Conversion (diagram labels: encoded text, e.g. ASCII, 41 hex (the ASCII code for "A"); font assignment; OCR; RIP; bitmap graphic; vector font; vectorization)

  5. DIA System and Workflow

  6. DIA System and Workflow (diagram label: &#xE000;)

  7. DIA System and Workflow (diagram label: XML)
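The label &#xE000; on slide 6 is the XML numeric character reference for U+E000, the first code point of the Unicode Private Use Area. One plausible reading of the workflow is that each extracted glyph receives such a code point and the recognized text is stored in XML using these references. The sketch below illustrates that idea under stated assumptions: glyphs are grouped by exact bitmap equality (real scans need a similarity measure), and the names `build_alphabet`, `encode_line`, `to_xml` and the `<line>` element are hypothetical, not taken from the authors' system.

```python
from typing import Dict, List, Tuple

# A hashable 1-bit glyph image (1 = ink, 0 = background).
Bitmap = Tuple[Tuple[int, ...], ...]

PUA_START = 0xE000  # first code point of the Unicode Private Use Area


def build_alphabet(glyphs: List[Bitmap]) -> Dict[Bitmap, int]:
    """Assign each distinct glyph image the next free PUA code point.

    Assumption: identical bitmaps are the same character; this is a
    simplification, since scanned instances of a letter rarely match exactly.
    """
    alphabet: Dict[Bitmap, int] = {}
    for glyph in glyphs:
        if glyph not in alphabet:
            alphabet[glyph] = PUA_START + len(alphabet)
    return alphabet


def encode_line(glyphs: List[Bitmap], alphabet: Dict[Bitmap, int]) -> str:
    """Re-encode a sequence of glyphs as XML numeric character references."""
    return "".join(f"&#x{alphabet[g]:X};" for g in glyphs)


def to_xml(line_text: str) -> str:
    # Hypothetical element name; the slides do not show the actual XML schema.
    return f"<line>{line_text}</line>"
```

For a line whose glyph sequence is a, b, a, `encode_line` produces `&#xE000;&#xE001;&#xE000;`, so each distinct shape is stored once and every occurrence becomes a single character.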

  8. Vectorization Approaches • Contour-based • Skeleton-based

  9. Applied Algorithms
     • Pre-processing
       - Finding connected components (region growing)
       - Contour extraction (contour following)
     • Polygonal approximation based on relaxation
       - Phase 1: Clustering of polygonal points
       - Phase 2: Relaxation (error correction)
     • Automatic parameter control
       - Rasterization of the resulting glyph images
       - Ascertaining a weighted error (ground truth)
       - Selecting appropriate vectorization parameters

  10. Finding Connected Components (example glyphs: Ü Ö Ä % “ !)

  11. Region Growing
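Region growing can be sketched as a breadth-first flood fill that starts at an unvisited black pixel and absorbs all black neighbours. The sketch assumes 8-connectivity and a binary image given as nested lists (1 = black); the function name is illustrative. Note that glyphs such as Ü, %, “ or ! fall apart into several components (dots, diaeresis, separate strokes), so a later step has to group them into one glyph; how the authors do this is not shown on the slides.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)

# Offsets of the eight neighbours of a pixel (8-connectivity).
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]


def connected_components(image: List[List[int]]) -> List[Set[Pixel]]:
    """Label black pixels (value 1) into 8-connected components by region growing."""
    rows, cols = len(image), len(image[0]) if image else 0
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            # Grow a new region from this seed pixel with a breadth-first search.
            region: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(region)
    return components
```

On a tiny "Ü"-like bitmap (two dots above a U shape) this yields three components: one per dot and one for the body.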

  12. Contour Following (diagram legend: white pixel, black pixel, starting point, examination order)
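The legend on slide 12 (starting point, examination order of neighbouring pixels) matches the general scheme of neighbour-based contour following. The sketch below uses Moore-neighbour tracing as an assumed, common variant, not necessarily the authors' exact procedure: from the topmost-leftmost black pixel it examines the eight neighbours clockwise, starting just after the last white pixel visited, and stops when the first move away from the start pixel is about to repeat. It traces only the outer contour; holes (as in "O") would need a separate pass.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]  # (row, column)

# The eight neighbour offsets in clockwise order (rows grow downwards),
# starting at the west neighbour.
CLOCKWISE = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
             (0, 1), (1, 1), (1, 0), (1, -1)]


def _is_black(image: List[List[int]], p: Pixel) -> bool:
    r, c = p
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def _next_step(image: List[List[int]], p: Pixel,
               back: Pixel) -> Optional[Tuple[Pixel, Pixel]]:
    """Examine p's neighbours clockwise, starting just after the backtrack pixel."""
    i = CLOCKWISE.index((back[0] - p[0], back[1] - p[1]))
    for k in range(1, 9):
        dr, dc = CLOCKWISE[(i + k) % 8]
        q = (p[0] + dr, p[1] + dc)
        if _is_black(image, q):
            # The last white neighbour examined becomes the new backtrack pixel.
            pr, pc = CLOCKWISE[(i + k - 1) % 8]
            return q, (p[0] + pr, p[1] + pc)
    return None  # isolated pixel: no black neighbours


def trace_contour(image: List[List[int]]) -> List[Pixel]:
    """Moore-neighbour tracing of the outer contour containing the first black pixel."""
    start = next(((r, c) for r, row in enumerate(image)
                  for c, v in enumerate(row) if v == 1), None)
    if start is None:
        return []
    contour = [start]
    p, back = start, (start[0], start[1] - 1)  # west of the first pixel is white
    while True:
        step = _next_step(image, p, back)
        if step is None:
            return contour
        q, back = step
        # Stop when the first move (start -> second pixel) is about to repeat.
        if p == start and len(contour) > 1 and q == contour[1]:
            contour.pop()  # drop the duplicate start pixel appended on return
            return contour
        contour.append(q)
        p = q
```

For a 2 x 2 block of black pixels the result is the clockwise sequence (0, 0), (0, 1), (1, 1), (1, 0). A one-pixel-wide stroke is walked along both sides, so its inner pixels appear twice.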

  13. Clustering of Polygonal Points

  14. Relaxation
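The slides name a two-phase polygonal approximation (clustering of polygonal points, then relaxation for error correction) but give no details of either phase. To illustrate what polygonal approximation of a traced contour does, the sketch below substitutes a different, well-known technique, the Ramer-Douglas-Peucker algorithm; it is NOT the authors' relaxation method. It shares the key property that a tolerance parameter trades the number of vertices against fidelity to the pixel contour.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only the vertices needed to stay within tolerance."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the two endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Otherwise split at that point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split point
```

The function works on an open polyline; a closed glyph contour can be handled by splitting it at two far-apart points and simplifying both halves.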

  15. SVG Representation
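The slides do not show the SVG markup itself. As a sketch under that caveat, the code below writes polygonal outlines into a minimal SVG document as a single `<path>` that uses only linear primitives (M, L, Z), in line with the conclusion that linear primitives already give good results. `fill-rule="evenodd"` makes inner contours, such as the counter of "O", appear as holes regardless of their orientation. Points are assumed to be (x, y) pixel coordinates, so (row, column) pairs from contour tracing must be swapped first. SVG 1.1 also defines `<font>` and `<glyph>` elements whose `d` attribute takes the same path data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel units
SVG_NS = "http://www.w3.org/2000/svg"


def path_data(polygons: List[List[Point]]) -> str:
    """Encode closed polygons as SVG path data using only M, L and Z commands."""
    parts = []
    for poly in polygons:
        (x0, y0), rest = poly[0], poly[1:]
        parts.append(f"M{x0:g} {y0:g}")
        parts.extend(f"L{x:g} {y:g}" for x, y in rest)
        parts.append("Z")  # close the outline
    return " ".join(parts)


def glyph_svg(polygons: List[List[Point]], width: int, height: int) -> str:
    """Serialize one glyph as a standalone SVG document."""
    ET.register_namespace("", SVG_NS)  # emit a default xmlns instead of ns0:
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": f"0 0 {width} {height}"})
    ET.SubElement(svg, f"{{{SVG_NS}}}path",
                  {"d": path_data(polygons), "fill-rule": "evenodd"})
    return ET.tostring(svg, encoding="unicode")
```

Because the output is ordinary XML, it can be parsed, transformed or embedded by the same XML tooling as the rest of the document.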

  16. Visual Quality

  17. Formal Quality Measurement
     • Ground truth
     • Error function
       - absolute number of wrong pixels
       - weighted by the distance to the nearest true component
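The two error measures on slide 17 can be sketched by comparing the re-rasterized glyph with the ground-truth bitmap pixel by pixel. The exact weighting is not given, so the sketch assumes one interpretation: each wrong pixel is weighted by the Euclidean distance to the nearest ground-truth pixel that really has the colour it was wrongly given. A stray pixel right at the glyph edge then costs 1, while errors farther from the true shape cost more. The function names are hypothetical, and the brute-force search is only meant for small glyph images.

```python
import math
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black (ink), 0 = white


def _nearest_distance(gt: Bitmap, y: int, x: int, value: int) -> float:
    """Euclidean distance from (y, x) to the nearest ground-truth pixel equal to value."""
    best = math.inf  # stays inf if the ground truth has no pixel of that value
    for gy, row in enumerate(gt):
        for gx, v in enumerate(row):
            if v == value:
                best = min(best, math.hypot(gy - y, gx - x))
    return best


def glyph_errors(gt: Bitmap, rendered: Bitmap) -> Tuple[int, float]:
    """Return (number of wrong pixels, distance-weighted error).

    Assumption: a wrongly black pixel is weighted by its distance to the
    nearest true black pixel, a wrongly white one by its distance to the
    nearest true white pixel.
    """
    wrong, weighted = 0, 0.0
    for y, (gt_row, r_row) in enumerate(zip(gt, rendered)):
        for x, (g, r) in enumerate(zip(gt_row, r_row)):
            if g != r:
                wrong += 1
                weighted += _nearest_distance(gt, y, x, r)
    return wrong, weighted
```

For a ground-truth row `[1, 0, 0]` rendered as `[1, 1, 1]`, two pixels are wrong, at distances 1 and 2 from the true ink, giving `(2, 3.0)`.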

  18. Results

  19. Adaptive Parameter Control
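Following the steps listed for automatic parameter control (vectorize, rasterize the result, compute a weighted error against the ground truth, select parameters), the selection loop can be sketched as a search over candidate tolerances. The selection rule is an assumption: keep the coarsest tolerance (usually the fewest vertices, hence the smallest SVG) whose error stays within a threshold. `vectorize`, `rasterize` and `weighted_error` are placeholders for the pipeline's own steps, and `select_tolerance` and `max_error` are hypothetical names.

```python
from typing import Any, Callable, Iterable, Optional


def select_tolerance(
    candidates: Iterable[float],
    vectorize: Callable[[float], Any],
    rasterize: Callable[[Any], Any],
    weighted_error: Callable[[Any], float],
    max_error: float,
) -> Optional[float]:
    """Return the coarsest tolerance whose re-rasterized glyph stays within max_error.

    Returns None if no candidate is accurate enough.
    """
    best = None
    for tol in sorted(candidates):
        # Vectorize, render back to pixels and compare against the ground truth.
        err = weighted_error(rasterize(vectorize(tol)))
        if err <= max_error:
            best = tol  # a coarser tolerance that still passes replaces a finer one
    return best
```

The loop does not assume the error grows monotonically with the tolerance; it simply evaluates every candidate and keeps the largest one that passes.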

  20. Compression Rates
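The slides do not state how the compression rates were computed. One plausible way to estimate them for a document re-encoded with its own alphabet is to compare the raw bitmaps of all glyph occurrences with the cost of storing each distinct glyph once as SVG plus one character code per occurrence. The sketch below makes that assumption explicit; it ignores layout and position data, and all names and figures are hypothetical.

```python
def bitmap_bytes(width: int, height: int) -> int:
    """Size of an uncompressed 1-bit glyph image, packed 8 pixels per byte."""
    return (width * height + 7) // 8


def compression_ratio(occurrences: int, bytes_per_bitmap: int,
                      unique_glyphs: int, svg_bytes_per_glyph: int,
                      code_bytes: int = 3) -> float:
    """Raster size of all glyph occurrences divided by the re-encoded size.

    The re-encoded form stores every distinct glyph once as SVG and each
    occurrence as one character; a Private Use Area code point such as
    U+E000 takes 3 bytes in UTF-8, hence the default code_bytes.
    """
    raster_size = occurrences * bytes_per_bitmap
    encoded_size = unique_glyphs * svg_bytes_per_glyph + occurrences * code_bytes
    return raster_size / encoded_size
```

With purely hypothetical figures of 1,000 occurrences of 32 x 32 glyphs (128 bytes each), 50 distinct glyphs and 400 bytes of SVG per glyph, the estimate is 128,000 / 23,000, roughly 5.6. The ratio grows with the number of repeated occurrences, because each repeat costs only one character code.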

  21. Conclusions
     • Good vectorization results already with linear primitives
     • High compression rates can be achieved
     • Extracted fonts can easily be scaled and further formatted
     • Known vectorization methods have been extended towards an adaptive system for automatic parameter control
     • These methods can be applied to the preservation and handling of unknown typefaces in digitized documents
     • Originals may be re-encoded using a document-specific alphabet and font
     • Direct integration into XML/SVG-based processes is possible
     • Various output formats can be supported by means of XSL transformations

  22. Questions: Thank you very much! stefan.pletschacher@mb.tu-chemnitz.de
