Document Image Retrieval using Bag of Visual Words Model

Ravi Shekhar, CVIT, IIIT Hyderabad • Advisor: Prof. C.V. Jawahar

Motivation
• A large number of printed books are being digitized
• Digital libraries such as the Universal Digital Library (UDL), the Digital Library of India (DLI), and Google Books
• Need for an efficient and effective methodology for content-level access
[Figure: Digital Library Database]

Process Overview
[Figure: pipeline diagram with stages labeled Documents, Scanning, Processing, Index, Database, Input Query, Matching, Retrieved Documents]
• Matching can be done at two levels: “Text” and “Image”

Matching Approaches
• Recognition-Based Approach (Text-Level Matching)
  • Optical Character Recognition (OCR)
• Recognition-Free Approach (Image-Level Matching)
  • Word Spotting

Recognition-Based Approach
• Optical Character Recognition (OCR)
  • Binarization of the document
  • Segmentation using connected components
    • Line level
    • Word level
    • Character level
  • Character recognition using features such as patch and profile features
  • Classification using an ANN or SVM
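The first two OCR steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the fixed threshold of 128, the 4-connectivity, and the tiny `gray` patch are all assumptions made for the example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """1 for ink (pixels darker than the assumed threshold), 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Find 4-connected ink regions with breadth-first search.

    Returns one bounding box (top, left, bottom, right) per region,
    in row-major order of each region's first pixel.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x7 grayscale patch: two dark strokes on a light page.
gray = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255,  30, 255],
    [255,  15, 255, 255, 255,  25, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
boxes = connected_components(binarize(gray))
```

Because segmentation depends on pixel connectivity, a broken stroke yields extra components and touching characters yield a single merged one.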

Limitations of Recognition-Based Approach
• Cuts
• Merges
• Variation in script
• Variation in font and typesetting
• Underlined and overwritten text

Recognition-Free Approach
• Word Spotting
  • Representation of the word image using global (profile) features
  • Matching features using distance measures such as L1 and L2
  • Comparison of word images of different sizes using Dynamic Time Warping (DTW)
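To make the DTW step concrete, here is a small sketch under stated assumptions: each word image is reduced to a sequence of per-column feature vectors (the one-dimensional "profiles" below are hypothetical), the local cost is the L2 distance, and no warping-window constraint or length normalisation is applied.

```python
from math import inf


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""

    def l2(u, v):
        # Euclidean (L2) distance between two per-column feature vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = l2(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


# Hypothetical profile features: one value per image column.
word = [[0.0], [1.0], [2.0], [1.0], [0.0]]
stretched = [[0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]  # same shape, wider
other = [[2.0], [2.0], [0.0], [0.0], [2.0]]

same_cost = dtw_distance(word, stretched)
diff_cost = dtw_distance(word, other)
```

The table has `len(seq_a) * len(seq_b)` cells and must be filled for every query-candidate pair, which is one reason this kind of matching is costly on large collections.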

Why a Recognition-Free Approach?
• Robust OCRs are unavailable for many non-Latin languages
• These languages have a rich heritage, and there is a need for content-level search
• Word-spotting-based methods are too slow for a real-time system
• Most existing retrieval methods are memory intensive
• Scalability is an immediate challenge

Word Image Retrieval using Bag of Visual Words

Bag of Visual Words (BoVW)
• The Bag of Words (BoW) representation is the most popular representation for text retrieval
• Efficient BoW-based systems such as Lucene are publicly available
• Bag of Visual Words (BoVW) performs excellently for image and video retrieval
• A BoVW-based system is flexible, powerful, and scalable to billions of images

BoVW Representation
• Word images are represented using a histogram of visual words
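A minimal sketch of how such a histogram could be built: each local descriptor is assigned to its nearest codebook entry and the counts are normalised. The 2-D descriptors, the three-entry codebook, and the L1 normalisation are assumptions made for illustration; real local descriptors such as SIFT are 128-dimensional.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest (squared L2) to the descriptor."""

    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """L1-normalised visual-word histogram with one bin per codebook entry."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook and descriptors for two word images of different size.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
short_word = [[0.1, 0.0], [0.9, 0.1]]
long_word = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.1, 0.0]]

h_short = bovw_histogram(short_word, codebook)
h_long = bovw_histogram(long_word, codebook)
```

Both histograms have the same length regardless of how many descriptors each image produced, so they can be compared directly.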

BoVW Representation
• Codebook generation
  • A subset of the images is used
  • Clustering is done using Hierarchical K-Means (HKM)
  • HKM is faster than flat K-Means both in building the tree and in finding nearest neighbours
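The idea behind HKM can be sketched as below: k-means is applied recursively with a small branching factor, and a descriptor is quantised by descending the tree. This is an illustrative toy, not the implementation used in the talk; the branching factor, depth, random seed, and 2-D points are all assumptions.

```python
import random


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def mean(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd k-means; returns the non-empty clusters."""
    centres = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:
            j = min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
            clusters[j].append(p)
        clusters = [c for c in clusters if c]  # drop empty clusters
        centres = [mean(c) for c in clusters]
    return clusters


def build_tree(points, branching, depth, rng):
    """Recursively split the points; every node stores its centre."""
    node = {"centre": mean(points), "children": []}
    if depth == 0 or len(points) < branching:
        return node
    clusters = kmeans(points, branching, rng)
    if len(clusters) < 2:
        return node  # cannot split further
    for members in clusters:
        node["children"].append(build_tree(members, branching, depth - 1, rng))
    return node


def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def quantise(descriptor, node):
    """Greedy descent: at most `branching` comparisons per level."""
    while node["children"]:
        node = min(node["children"], key=lambda ch: sq_dist(descriptor, ch["centre"]))
    return node["centre"]


# Hypothetical descriptors: four tight groups near the corners of a square.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [[bx + dx, by + dy]
          for bx, by in [(0, 0), (0, 10), (10, 0), (10, 10)]
          for dx, dy in offsets]

tree = build_tree(points, branching=2, depth=2, rng=random.Random(0))
vocab = leaves(tree)            # the leaves act as visual words
word = quantise([10.2, 10.4], tree)
```

With branching factor b and depth d, a lookup compares against roughly b × d centres instead of all b^d leaves, which is where the speed-up over a flat codebook comes from. The greedy descent is approximate: it can occasionally return a leaf that is not the true nearest one.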

BoVW-based Representation
[Figure panels: Histogram of Visual Words for a word image, for a word image with Cuts, and for a word image with Merges]

Proposed Architecture

Advantages of BoVW-based Representation
• Fixed-size representation
• Robust against degradation
• Scalable to billions of images
• Language independent

Spatial Verification
• Lost geometry: the histogram records which visual words occur but not where they occur in the word image
• Spatial verification

Re-ranking
• SIFT-based re-ranking
• The higher the total score, the better the match

Experiments
• Books used in the experiments

Quantitative Results
• Performance statistics

Quantitative Results
• mAP vs. query length
• The more characters in the query, the better the results
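For readers unfamiliar with the metric, mAP (mean average precision) can be computed as in the sketch below. The slides do not state the exact variant used, so the standard definition is assumed here, and the ranked lists are hypothetical.

```python
def average_precision(ranked_relevance, num_relevant):
    """AP for one query: precision@k averaged over the ranks k holding a relevant item.

    `num_relevant` is the number of relevant items in the whole collection,
    so relevant items that were never retrieved lower the score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0


def mean_average_precision(queries):
    """Mean of the per-query AP values."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


# Hypothetical ranked results: True marks a retrieved word image matching the query.
queries = [
    ([True, True, False, True], 3),    # AP = (1/1 + 2/2 + 3/4) / 3
    ([False, True, False, False], 1),  # AP = (1/2) / 1
]
m_ap = mean_average_precision(queries)
```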

Quantitative Results
• Retrieval time and index size

Qualitative Results HI

Qualitative Results
