Imaged Document Text Retrieval without OCR. IEEE Trans. on PAMI, vol. 24, no. 6, June 2002. Presenter: 周遵儒
Outline • Introduction • HTD and VTD • Class of Character Objects • Similarity Measure of Documents • Experimental Results • Conclusions
Introduction • Retrieval of Imaged Documents • Processing with OCR vs. without OCR • Language dependence vs. language independence
Procedure • Image preprocessing • Feature extraction of character objects: Horizontal Traverse Density (HTD) and Vertical Traverse Density (VTD) • Clustering, to identify classes of character objects • Document representation: hash table and N-gram, to construct indexes for imaged document retrieval
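The slides name HTD and VTD without defining them. The sketch below assumes that a traverse density is the number of strokes a scan line crosses: one count per row for HTD and one per column for VTD. The function name `traverse_density`, the list-of-lists bitmap format, and the 5x5 example glyph are illustrative assumptions, not taken from the paper.

```python
def traverse_density(bitmap):
    """Return (htd, vtd) for a binary character bitmap.

    bitmap: list of equal-length rows, 1 = ink, 0 = background.
    Assumed definition: htd[r] is the number of strokes crossed by a
    horizontal scan line along row r; vtd[c] is the same for a vertical
    scan line down column c.
    """
    def crossings(line):
        # Count background-to-ink transitions; the area outside the
        # bitmap is treated as background.
        count, previous = 0, 0
        for pixel in line:
            if pixel and not previous:
                count += 1
            previous = pixel
        return count

    htd = [crossings(row) for row in bitmap]
    vtd = [crossings(column) for column in zip(*bitmap)]
    return htd, vtd


# Hypothetical 5x5 bitmap of the letter "H".
H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
htd, vtd = traverse_density(H)
print(htd)  # [2, 2, 1, 2, 2]
print(vtd)  # [1, 1, 1, 1, 1]
```

Each horizontal scan line crosses the two vertical strokes of the "H" except at the crossbar, where it meets a single run of ink. No character recognition is involved, which is what makes such features language independent.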
Class of Character Objects • Unsupervised clustering with HTD and VTD • Distance measure of character objects
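The slides do not give the distance measure or the clustering algorithm. As a minimal sketch of how unsupervised clustering over HTD and VTD vectors could work, the code below swaps in two assumed techniques: a mean absolute difference over length-normalized vectors as the distance, and simple leader clustering with a fixed threshold. The names `resample`, `distance`, `cluster`, the vector length of 8, and the threshold of 0.25 are all hypothetical.

```python
def resample(vector, length=8):
    """Nearest-neighbour resampling of a non-empty vector to a fixed length,
    so character objects of different sizes can be compared."""
    n = len(vector)
    return [vector[i * n // length] for i in range(length)]


def distance(a, b, length=8):
    """Mean absolute difference between two (htd, vtd) pairs.

    This is an assumed measure, not the one used in the paper.
    """
    total = 0
    for va, vb in zip(a, b):
        ra, rb = resample(va, length), resample(vb, length)
        total += sum(abs(x - y) for x, y in zip(ra, rb))
    return total / (2 * length)


def cluster(objects, threshold=0.25):
    """Leader clustering: assign each object to the first class whose
    representative lies within the threshold, else open a new class."""
    representatives, labels = [], []
    for obj in objects:
        for class_id, representative in enumerate(representatives):
            if distance(obj, representative) <= threshold:
                labels.append(class_id)
                break
        else:
            representatives.append(obj)
            labels.append(len(representatives) - 1)
    return labels


# Hypothetical (htd, vtd) features: a small "H", an "O"-like shape, a larger "H".
small_h = ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1])
round_o = ([1, 2, 2, 2, 1], [1, 2, 2, 2, 1])
large_h = ([2, 2, 2, 1, 2, 2], [1, 1, 1, 1, 1, 1])
labels = cluster([small_h, round_o, large_h])
print(labels)  # [0, 1, 0]
```

The resulting class labels stand in for character codes: a document becomes a sequence of class IDs without any character ever being recognized.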
Similarity Measure of Documents • N-Gram Algorithm • Cosine angle between two documents
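A minimal sketch of the n-gram and cosine-angle idea, assuming each document has already been reduced to a sequence of character-class IDs. A `Counter` plays the role of the hash table of n-gram counts. The names `ngram_vector` and `cosine_similarity`, the choice of n = 3, and the sample sequences are illustrative assumptions; any weighting the paper applies to the counts is not reproduced here.

```python
from collections import Counter
from math import sqrt


def ngram_vector(class_ids, n=3):
    """Count n-grams of character-class IDs in a hash table (Counter)."""
    return Counter(
        tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1)
    )


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a document with no n-grams matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical class-ID sequences for three documents.
doc_a = ngram_vector([0, 1, 2, 0, 1, 2])
doc_b = ngram_vector([0, 1, 2, 3])
doc_c = ngram_vector([3, 3, 3, 3])

sim_ab = cosine_similarity(doc_a, doc_b)  # shares the n-gram (0, 1, 2)
sim_ac = cosine_similarity(doc_a, doc_c)  # no n-gram in common
```

Documents that share many n-grams of character classes score close to 1, and documents with none in common score 0, so ranking by this value gives a retrieval order.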
Corpus • UW1 database (600 dpi)
Experimental Results • Corpus I • E01-E26
Experimental Results • Corpus II
Conclusion and Future Work • A new method for imaged document retrieval without OCR • Language-independent retrieval • Improving robustness to different fonts and noisy documents