1.12k likes | 1.35k Views
Document Image Retrieval using Bag of Visual Words Model. Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar. Motivation. Large number of printed books are digitized. Motivation. Large number of printed books are digitized
E N D
Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad • Advisor : Prof. C.V. Jawahar
Motivation • Large number of printed books are digitized
Motivation • Large number of printed books are digitized • Digital libraries like Universal Digital library (UDL), Digital library of India (DLI) and Google Books etc. Digital Library Database
Motivation • Large number of printed books are digitized • Digital libraries like Universal Digital library (UDL), Digital library of India (DLI) and Google Books etc. • Need to design efficient and effective methodology for content level access Digital Library Database
Process Overview Processing Input Query Scanning Index Database Matching Retrieved Documents Documents Matching can be done by two levels : “Text” and “Image”
Matching Approaches • Recognition Based Approach (Text Level Matching) • Optical Character Recognition (OCR) • Recognition Free Approach (Image Level Matching) • Word Spotting
Recognition Based Approach • Optical Character Recognition (OCR) • Binarization of Document • Segmentation using connected components • Line level • Word level • Character level • Character recognition using different features like patch, profile etc • Classification using ANN or SVM
Limitations of Recognition Based Approach • Cuts • Merges
Limitations of Recognition Based Approach • Cuts • Merges • Variation in Script
Limitations of Recognition Based Approach • Cuts • Merges • Variation in Script • Variation in Font and Typesetting
Limitations of Recognition Based Approach • Cuts • Merges • Variation in Script • Variation in Font and Typesetting • Underline and Over Written
Recognition Free Approach • Word Spotting • Representation of word image using global (profile) features
Recognition Free Approach • Word Spotting • Representation of word image using global (profile) features • Matching features using different distance measures like L1, L2 etc
Recognition Free Approach • Word Spotting • Representation of word image using global (profile) features • Matching features using different distance measures like L1, L2 etc • Comparison of different size word images using Dynamic time warping (DTW)
Why Recognition Free Approach ? • Robust OCRs are unavailable for many non-Latin languages • These languages have rich heritage and there is a need for content level search • Word Spotting based methods are too slow for real time system • Most of the existing retrieval methods are memory intensive • Scalability is an immediate challenge
Bag of Visual Words (BoVW) • Bag of Words (BoW) representation is the most popular representation for text retrieval • BoW based efficient systems like Lucene are publically available • Bag of Visual Words (BoVW) performs excellently for image and video retrieval • BoVW based system is flexible, powerful and scalable to Billions of images
BoVW Representation • Word Images are represented using Histogram of Visual Words
BoVW Representation • Code Book generation • Subset of Images is used • Clustering is done using Hierarchical K-Means (HKM) • HKM is faster than K-Means both in building tree and finding nearest neighbours
BoVW based Representation Histogram of Visual Words
BoVW based Representation Cuts Histogram of Visual Words
BoVW based Representation Merges
BoVW based Representation Merges Histogram of Visual Words
Advantages of BoVW based Representation • Fixed size representation
Advantages of BoVW based Representation • Fixed size representation Clean Clean
Advantages of BoVW based Representation • Fixed size representation • Robust against degradation
Advantages of BoVW based Representation • Fixed size representation • Robust against degradation Cuts Merge Clean
Advantage of BoVW based Representation • Fixed size representation • Robust against degradation • Scalable to Billions of images
Advantages of BoVW based Representation • Fixed size representation • Robust against degradation • Scalable to Billions of Images • Language independent
Spatial Verification • Lost Geometry
Spatial Verification • Lost Geometry Clean Clean
Spatial Verification • Lost Geometry Clean Clean Clean
Spatial Verification • Lost Geometry Clean Clean Clean
Spatial Verification • Lost Geometry • Spatial Verification
Spatial Verification • Lost Geometry • Spatial Verification
Spatial Verification • Lost Geometry • Spatial Verification
Re-ranking • SIFT based re-ranking • Higher the Total Score, better the match
Experimentations • Books Used in Experimentations
Quantitative Results • Performance Statistics
Quantitative Results • Performance Statistics
Quantitative Results • mAPVs Query Length
Quantitative Results • mAPVs Query Length • More the # characters, better the results
Quantitative Results Retrieval Time and Index Size