Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing
Stefan Pletschacher; Marcel Eckert; Arved C. Hübler
GEB1150 Digitization of Historical Documents
Pletschacher • Vectorization of Glyphs and Their Representation in SVG for XML‑based Processing • ELPUB 2006
Alphabet and Font Extraction
Vectorization - Raster to Vector Conversion
[Figure: conversion paths between a bitmap graphic, encoded text (e.g. ASCII, 41 hex), and a vector font, via OCR, font assignment, RIP, and vectorization]
DIA System and Workflow
[Figure: document image analysis (DIA) system and workflow diagram, built up over three slides; the final build adds XML]
Vectorization Approaches
• Contour-based
• Skeleton-based
Applied Algorithms
• Pre-processing
  - Finding connected components (region growing)
  - Contour extraction (contour following)
• Polygonal approximation based on relaxation
  - Phase 1: Clustering of polygonal points
  - Phase 2: Relaxation (error correction)
• Automatic parameter control
  - Rasterization of the resulting glyph images
  - Computing a weighted error against the ground truth
  - Selecting appropriate vectorization parameters
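Automatic parameter control starts by rasterizing the vectorized glyphs so they can be compared with the original pixels. The slides do not say how glyphs are rendered; the following is a minimal sketch, assuming each glyph is a list of closed polygons given as (x, y) vertex lists and that the even-odd fill rule decides what is inside. The names `point_in_polygon` and `rasterize` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (crossing number) test of point (x, y) against one closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, width, height):
    """Render outer contours and holes to a 0/1 bitmap.

    A pixel is set when its centre lies inside an odd number of polygons,
    so a hole nested in an outer contour cancels it out.
    """
    bitmap = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # sample at the pixel centre
            hits = sum(point_in_polygon(cx, cy, p) for p in polygons)
            bitmap[row][col] = hits % 2
    return bitmap
```

For example, a 4×4 square outline with a 2×2 square hole, `rasterize([outer, hole], 4, 4)`, yields a one-pixel-wide ring. Sampling only at pixel centres keeps the sketch short; a production renderer would add anti-aliasing or coverage estimation.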
Finding Connected Components
[Figure: glyphs made of several connected components: Ü Ö Ä % “ !]
Region Growing
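Region growing collects every black pixel reachable from a seed pixel into one component, which is why a glyph such as "!" or "Ü" yields several components. The sketch below assumes a 0/1 bitmap (1 = black) and 8-connectivity; the slides state neither, and the function name is hypothetical.

```python
from collections import deque


def connected_components(bitmap):
    """Label 8-connected black pixels (value 1) by region growing.

    Returns a list of components, each a list of (row, col) pixels.
    """
    height, width = len(bitmap), len(bitmap[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if bitmap[r][c] != 1 or seen[r][c]:
                continue
            # Grow a new region breadth-first from this unvisited black seed.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(region)
    return components
```

A breadth-first queue is used instead of recursion so that large glyphs cannot exceed Python's recursion limit.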
Contour Following
[Figure legend: white pixel, black pixel, starting point, examination order]
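The figure labels (starting point, examination order) match classic Moore-neighbour tracing, which the sketch below uses as an assumed stand-in; the slides do not confirm the exact variant. It starts at the first black pixel in raster order and examines the eight neighbours clockwise, beginning just after the last white pixel seen. `OFFSETS` and `trace_contour` are hypothetical names.

```python
# Clockwise neighbour offsets (dy, dx) starting at west; rows grow downward.
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_contour(bitmap):
    """Moore-neighbour tracing of the outer contour of the first black component.

    Returns the boundary pixels as (row, col) tuples in clockwise order.
    """
    height, width = len(bitmap), len(bitmap[0])

    def is_black(y, x):
        # Pixels outside the image count as white.
        return 0 <= y < height and 0 <= x < width and bitmap[y][x] == 1

    # Starting point: first black pixel in raster order, so its west neighbour is white.
    start = next(((y, x) for y in range(height) for x in range(width)
                  if bitmap[y][x] == 1), None)
    if start is None:
        return []
    contour = [start]
    current, back = start, 0  # back: OFFSETS index pointing at the last white pixel
    while True:
        # Examination order: clockwise, beginning just after the backtrack pixel.
        for i in range(1, 8):
            k = (back + i) % 8
            ny, nx = current[0] + OFFSETS[k][0], current[1] + OFFSETS[k][1]
            if is_black(ny, nx):
                # The (white) neighbour examined just before becomes the new backtrack.
                py, px = OFFSETS[(k - 1) % 8]
                back = OFFSETS.index((current[0] + py - ny, current[1] + px - nx))
                current = (ny, nx)
                break
        else:
            return contour  # isolated single pixel
        # Simple stopping criterion: stop on re-entering the start pixel.
        if current == start:
            return contour
        contour.append(current)
```

For a 2×2 black square inside a white border, the trace is `[(1, 1), (1, 2), (2, 2), (2, 1)]`. Stopping on the first return to the start pixel can end early on some thin shapes; Jacob's stopping criterion is a common, more robust alternative.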
Clustering of Polygonal Points
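The slides do not describe how contour points are clustered. As an assumed stand-in, not the authors' procedure, the sketch below greedily groups consecutive contour points into runs that stay within a distance `tolerance` of the chord joining the run's end points; the run boundaries become the polygon vertices. Points are (row, col) tuples of an open contour, and all names are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (py, px), (ay, ax), (by, bx) = p, a, b
    length = math.hypot(by - ay, bx - ax)
    if length == 0:
        return math.hypot(py - ay, px - ax)
    # |cross(b - a, a - p)| / |b - a| is the distance to the infinite line.
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length


def cluster_vertices(points, tolerance):
    """Return indices of the points kept as polygon vertices."""
    if len(points) <= 2:
        return list(range(len(points)))
    vertices = [0]
    start, end = 0, 1
    while end < len(points) - 1:
        candidate = end + 1
        # Can the current run be extended by one more point?
        fits = all(point_line_distance(points[i], points[start], points[candidate]) <= tolerance
                   for i in range(start + 1, candidate))
        if fits:
            end = candidate
        else:
            # Close the run: its last point becomes a vertex and starts the next run.
            vertices.append(end)
            start, end = end, end + 1
    vertices.append(len(points) - 1)
    return vertices
```

On an L-shaped run of seven points with `tolerance=0.5`, the corner survives as a vertex (`[0, 3, 6]`), while a large tolerance collapses the run to its end points. For a closed contour, the start point can simply be treated as a fixed vertex.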
Relaxation
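Relaxation is listed as an error-correction phase without further detail. One simple reading, sketched here as an assumption rather than the method from the talk, is a local search: each interior vertex may move one contour point forward or back whenever that lowers the worse deviation of its two adjacent segments, until a full round changes nothing. Contour points are (row, col) tuples and vertices are indices into them; all names are hypothetical.

```python
import math


def deviation(points, i, j):
    """Largest distance of the points strictly between i and j from chord i-j."""
    (ay, ax), (by, bx) = points[i], points[j]
    length = math.hypot(by - ay, bx - ax) or 1.0  # guard against duplicate points
    return max((abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length
                for py, px in points[i + 1:j]), default=0.0)


def local_cost(points, prev_v, v, next_v):
    """Worse of the two deviations of the segments meeting at vertex v."""
    return max(deviation(points, prev_v, v), deviation(points, v, next_v))


def relax(points, vertices, max_rounds=100):
    """Nudge interior vertices along the contour while their local cost drops."""
    vertices = list(vertices)
    for _ in range(max_rounds):
        changed = False
        for k in range(1, len(vertices) - 1):
            prev_v, next_v = vertices[k - 1], vertices[k + 1]
            best = vertices[k]
            best_cost = local_cost(points, prev_v, best, next_v)
            for candidate in (best - 1, best + 1):
                if prev_v < candidate < next_v:
                    cost = local_cost(points, prev_v, candidate, next_v)
                    if cost < best_cost:
                        best, best_cost = candidate, cost
            if best != vertices[k]:
                vertices[k] = best
                changed = True
        if not changed:
            break
    return vertices
```

Given an L-shaped contour whose corner vertex was placed one point too early (`[0, 2, 6]`), `relax` moves it onto the corner (`[0, 3, 6]`). The `max_rounds` cap bounds the run time even if costs were to oscillate.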
SVG Representation
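A polygonal glyph maps naturally onto an SVG `path`: each closed contour becomes one `M … L … Z` subpath, and `fill-rule="evenodd"` lets inner contours render as holes. The slides do not show the actual SVG structure used, so the layout below (one standalone document per glyph, the `glyph_id` attribute) is a hypothetical illustration; polygons are lists of (x, y) vertices.

```python
def polygons_to_path_data(polygons):
    """Encode closed polygons as SVG path data: one 'M ... Z' subpath each."""
    parts = []
    for polygon in polygons:
        (x0, y0), *rest = polygon
        commands = [f"M{x0} {y0}"] + [f"L{x} {y}" for x, y in rest] + ["Z"]
        parts.append(" ".join(commands))
    return " ".join(parts)


def glyph_svg(polygons, width, height, glyph_id="glyph"):
    """Wrap one glyph outline in a standalone SVG document.

    fill-rule="evenodd" makes inner contours (e.g. the counter of an 'O')
    render as holes regardless of their orientation.
    """
    d = polygons_to_path_data(polygons)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">'
        f'<path id="{glyph_id}" d="{d}" fill="black" fill-rule="evenodd"/>'
        f"</svg>"
    )
```

Because the output is plain XML, it can be parsed with any XML toolkit and fed into XSL transformations, which is what makes the SVG route convenient for XML-based processing.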
Visual Quality
Formal Quality Measurement
- Ground-truth error function
  - absolute number of wrong pixels
  - weighted by the distance to the nearest true component
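Both error measures compare the rasterized vector glyph with the original bitmap (the ground truth). The sketch below counts wrong pixels and also sums a distance weight per wrong pixel. The exact weighting is an assumption: extra ink is measured to the nearest ground-truth black pixel, and missing ink to the nearest black pixel that the vectorization did produce. Since a wrong pixel is never itself black in the other bitmap, each weight is at least 1. Names are hypothetical; bitmaps are same-size 0/1 lists.

```python
import math


def foreground(bitmap):
    """All (row, col) positions of black pixels."""
    return [(r, c) for r, row in enumerate(bitmap) for c, v in enumerate(row) if v]


def glyph_error(rendered, truth):
    """Compare a rasterized glyph with its ground-truth bitmap.

    Returns (absolute, weighted): the number of wrong pixels, and the sum of
    each wrong pixel's distance to the nearest black pixel in the other bitmap.
    """
    height, width = len(truth), len(truth[0])
    truth_fg, rendered_fg = foreground(truth), foreground(rendered)
    fallback = math.hypot(height, width)  # used if there is nothing to measure to
    absolute, weighted = 0, 0.0
    for r in range(height):
        for c in range(width):
            if rendered[r][c] == truth[r][c]:
                continue
            absolute += 1
            # Extra ink is measured to the true component; missing ink is
            # measured to the nearest pixel the vectorization did produce.
            reference = truth_fg if rendered[r][c] else rendered_fg
            weighted += min((math.hypot(r - y, c - x) for y, x in reference),
                            default=fallback)
    return absolute, weighted
```

The brute-force nearest-pixel search is quadratic in the number of black pixels, which is acceptable for single glyphs; a distance transform would be the usual speed-up for larger images.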
Results
Adaptive Parameter Control
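Adaptive parameter control closes the loop: vectorize, rasterize, measure the error, and choose parameters accordingly. The selection rule is not given on the slides; the sketch assumes one plausible rule, namely trying candidate tolerances from coarsest to finest and keeping the first whose error stays within a bound `max_error`, since coarser tolerances give fewer vertices. `vectorize` and `measure_error` are hypothetical callables supplied by the caller.

```python
def choose_tolerance(candidates, vectorize, measure_error, max_error):
    """Pick the coarsest tolerance whose vectorization stays within max_error.

    vectorize(tolerance) -> outline; measure_error(outline) -> float.
    Returns (tolerance, outline).
    """
    for tolerance in sorted(candidates, reverse=True):
        outline = vectorize(tolerance)
        if measure_error(outline) <= max_error:
            return tolerance, outline
    # Nothing met the bound: fall back to the finest tolerance tried.
    finest = min(candidates)
    return finest, vectorize(finest)
```

In a full pipeline, `vectorize` would run polygonal approximation at the given tolerance and `measure_error` would rasterize the result and compare it with the ground-truth bitmap. Passing them in as functions keeps the control loop independent of any particular vectorization method.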
Compression rates
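A compression rate compares the size of the vector representation with that of the bitmap it replaces. The slides do not state the baseline; the sketch assumes an uncompressed 1-bit-per-pixel bitmap with rows padded to whole bytes, compared against the UTF-8 size of the SVG text. Both function names are hypothetical, and no figures from the talk are reproduced here.

```python
import math


def bitmap_bytes(width, height):
    """Size of an uncompressed 1-bit-per-pixel bitmap, rows padded to whole bytes."""
    return math.ceil(width / 8) * height


def compression_ratio(width, height, svg_text):
    """How many times smaller the SVG text (UTF-8) is than the raw bitmap."""
    return bitmap_bytes(width, height) / len(svg_text.encode("utf-8"))
```

For example, a 64×64 glyph bitmap occupies 512 bytes, so a 128-byte SVG outline gives a ratio of 4.0. Because the SVG size depends on the number of vertices rather than the pixel resolution, the ratio tends to grow as scan resolution increases.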
Conclusions
• Good vectorization results are achieved even with linear primitives alone
• High compression rates can be achieved
• Extracted fonts can easily be scaled and further formatted
• Known vectorization methods have been extended towards an adaptive system for automatic parameter control
• These methods can be applied to preserve and handle unknown typefaces in digitized documents
• Originals may be re-encoded using a document-specific alphabet and font
• Direct integration into XML/SVG-based processes is possible
• Various output formats can be supported by means of XSL transformations
Questions
Thank you very much!
stefan.pletschacher@mb.tu-chemnitz.de