Extraction and segmentation of tables from Chinese ink documents based on a matrix model. Zhang Xi-Wen, CSE, CUHK and HCI Lab., ISCAS. 2005.10.24.
Outline • 1 Tables in an ink document. • 2 A matrix for an ink document. • 3 Ink tables are extracted and segmented. • 4 Experimental results. • 5 Conclusion.
Ink documents • Ink documents are produced by digital ink capturers. • Many objects are contained in an ink document. • There are many components in an ink table.
1.1 Objects in an ink document • Strokes. • Objects.
Text • Paragraph. • Text-line, Expression. • Character, Word, Symbols.
Graphics • Long. • Parts of tables and flowcharts.
Table • Text (simple). • Graphics. • Bordering lines. • Separating lines.
1.2 Components in an ink table • Strokes. • Row, Column. • Header. • Cell. • Sub-header. • Caption. • Lines.
1.3 Our approach • Previous approaches. • A matrix model.
2 A matrix for an ink document • Components in an ink document are extracted. • An ink document can be modeled by a matrix.
2.1 Ink components • An ink character. • An ink line. • An ink row.
2.2 Extract components in an ink document • Ink characters. • Ink lines. • Ink rows.
2.3 A matrix model • Multiple levels. • Context.
3 Ink tables are extracted and segmented • Extraction. • Segmentation.
3.1 Table extraction • An identical distribution of writing lines. • The same drawing rows (if available) associated.
A seed-table. • The same distribution. • The seed-table grows.
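The seed-and-grow idea on this slide can be illustrated with a minimal sketch. It is not the author's implementation. It assumes, purely for illustration, that each writing line is summarized by the x-positions where its text segments start, and that two lines have "the same distribution" when they have equally many segments at roughly the same positions. The names `same_distribution`, `grow_table`, and the tolerance `tol` are hypothetical.

```python
from typing import List, Sequence, Tuple


def same_distribution(a: Sequence[int], b: Sequence[int], tol: int = 10) -> bool:
    """Assumed similarity test: equal segment counts, and corresponding
    segment start positions that differ by at most `tol` units."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def grow_table(lines: List[Sequence[int]], seed: int, tol: int = 10) -> Tuple[int, int]:
    """Grow a one-line seed upward and downward while the neighboring
    line matches the seed's distribution.

    Returns the inclusive (first, last) line indices of the grown table.
    """
    first = last = seed
    while first > 0 and same_distribution(lines[first - 1], lines[seed], tol):
        first -= 1
    while last < len(lines) - 1 and same_distribution(lines[last + 1], lines[seed], tol):
        last += 1
    return first, last


# Hypothetical document: segment start positions for five writing lines.
lines = [
    [0],            # ordinary paragraph line
    [0, 100, 200],  # table-like line
    [2, 98, 203],   # table-like line (used as the seed)
    [1, 101, 199],  # table-like line
    [0, 150],       # unrelated text
]
print(grow_table(lines, seed=2))
```

In this toy input the seed at index 2 grows to cover lines 1 through 3 and stops at the lines whose segment counts differ.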
3.2 Table segmentation • Rows. • Columns. • Headers. • Cells.
4.2 Performance analyses • Strokes, captions, headers, cells, rows, and columns. • The precision rate and the recall rate.
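For reference, precision and recall are commonly computed as below. This is a small sketch using the standard definitions (correct detections divided by all detections, and correct detections divided by all ground-truth items). The function name and the cell identifiers are hypothetical, and the numbers are not results from the experiments.

```python
from typing import Hashable, Set, Tuple


def precision_recall(detected: Set[Hashable], ground_truth: Set[Hashable]) -> Tuple[float, float]:
    """Standard precision and recall over sets of component identifiers."""
    correct = len(detected & ground_truth)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up example: 4 cells detected, 5 cells in the ground truth, 3 in common.
detected_cells = {"c1", "c2", "c3", "c4"}
true_cells = {"c1", "c2", "c3", "c5", "c6"}
precision, recall = precision_recall(detected_cells, true_cells)
print(precision, recall)
```

The same function could be applied separately to each component type listed on the slide (strokes, captions, headers, cells, rows, and columns).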
4.3 Performance comparison • Quality. • Quantity.
5 Conclusion • A matrix model for extracting and segmenting ink tables. • More ink tables can be processed. • Extracted ink tables are decomposed.
Thank you very much for your criticism, comments and suggestions! • Email: xwzhang@cse.cuhk.edu.hk • Tel: 3163-4260