Discovering Collocation Patterns: from Visual Words to Visual Phrases. Junsong Yuan, Ying Wu and Ming Yang. CVPR’07.
An exciting idea: detour • Related Work: J. Sivic et al. CVPR04, B. C. Russell et al. CVPR06, G. Wang et al. CVPR06, T. Quack et al. CIVR06, S. C. Zhu et al. IJCV05, …
Confrontation • Spatial characteristics of images • Over-counting of co-occurrence frequency • Uncertainty in visual patterns • Continuous visual features are quantized into discrete visual words • Visual synonymy and polysemy
Selecting visual phrases • Visual collocations may occur by chance • Selecting phrases by a likelihood ratio test: • H0: the occurrence of phrase P is randomly generated • H1: phrase P is generated by a hidden pattern • Prior: • Likelihood: • Check whether the words are co-located by chance or whether their co-location is statistically meaningful
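The test above can be illustrated with a minimal sketch. The prior and likelihood formulas from the slide are not reproduced here; instead this assumes the simplest reading of H0, namely that the words of a phrase occur independently, and compares the observed frequency of the phrase with the frequency expected under that assumption. The function name `likelihood_ratio` and the toy group database are hypothetical.

```python
from math import prod

def likelihood_ratio(groups, phrase):
    """Observed frequency of `phrase` divided by the frequency expected
    if its words occurred independently (assumed H0, not the paper's formula)."""
    n = len(groups)
    phrase = frozenset(phrase)
    # Fraction of groups that contain every word of the phrase.
    joint = sum(1 for g in groups if phrase <= g) / n
    # Product of the individual word frequencies (independence assumption).
    indep = prod(sum(1 for g in groups if w in g) / n for w in phrase)
    return joint / indep if indep > 0 else 0.0

# Toy group database: each group is a set of visual words (single letters).
groups = [frozenset(g) for g in ["AB", "AB", "ABC", "CD", "D", "C"]]

ratio_ab = likelihood_ratio(groups, "AB")  # A and B co-occur more often than chance
ratio_ac = likelihood_ratio(groups, "AC")  # A and C co-occur less often than chance
print(ratio_ab, ratio_ac)
```

A ratio well above 1 suggests that the words co-occur more often than independence would predict; a ratio near or below 1 suggests a chance collocation.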
Discovery of visual phrases • [Figure: group database → closed FIM → frequent word-sets (|P| ≥ 2), e.g. AB, AE, CD, CE, DE, AF, BE, BF, CDE, ABF, ABE → pair-wise Student t-test and likelihood ratio → ranked by L(P) → Visual Phrase Lexicon (VPL)] • Example VPL entries: AB, AF, ABF, BF, CD • Example likelihood ratios, in ranked order: 15.7, 14.3, 12.2, 10.9, 9.7
Frequent Itemset Mining (FIM) • If an itemset is frequent then all of its subsets must also be frequent
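The property above is what allows level-wise pruning: a candidate word-set needs to be counted only if all of its smaller subsets are already known to be frequent. A minimal sketch of plain Apriori-style mining follows; it is an illustration of the pruning rule, not the closed FIM variant named on the previous slide, and the function name `apriori` and the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that appears in
    at least `min_support` transactions (plain Apriori, not closed FIM)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent,
        # then confirm its own support against the database.
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= min_support}
    return frequent

# Toy group database of visual words.
result = apriori(["ABF", "ABE", "AB", "CDE", "CD", "BF"], min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

In this toy database the candidate ABF is discarded without counting, because its subset AF is not frequent.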
Phrase Summarization • Measuring the similarity between visual phrases by KL-divergence (Yan et al., SIGKDD 05) • Clustering visual phrases by Normalized-cut
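As a rough illustration of a KL-based dissimilarity, the sketch below computes a symmetrized KL-divergence between two discrete distributions. How each visual phrase is turned into a distribution is not stated on the slide, so the dictionaries here are hypothetical placeholders, and the smoothing constant `eps` is an assumption added to avoid division by zero. The Normalized-cut clustering step is not shown.

```python
from math import log

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for discrete distributions given as {outcome: probability}.
    Both are smoothed by `eps` and renormalized so missing outcomes are safe."""
    keys = set(p) | set(q)
    ps = {k: p.get(k, 0.0) + eps for k in keys}
    qs = {k: q.get(k, 0.0) + eps for k in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum((ps[k] / zp) * log((ps[k] / zp) / (qs[k] / zq)) for k in keys)

def symmetric_kl(p, q):
    """Symmetrized KL-divergence; smaller values mean more similar phrases."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical distributions attached to three phrases.
phrase_1 = {"img1": 0.5, "img2": 0.5}
phrase_2 = {"img1": 0.45, "img2": 0.55}
phrase_3 = {"img3": 1.0}

d_close = symmetric_kl(phrase_1, phrase_2)
d_far = symmetric_kl(phrase_1, phrase_3)
print(d_close, d_far)
```

A pairwise matrix of such values could serve as the affinity input to a graph-clustering method such as Normalized-cut.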
Pattern Summarization Results • Face database: summarizing top-10 phrases into 6 semantic phrase patterns • Car database: summarizing top-10 phrases into 2 semantic phrase patterns
Partition of visual word lexicon • Metric learning method: • Neighborhood component analysis (NCA), Goldberger et al., NIPS05 • Improves the leave-one-out performance of the nearest neighbor classifier
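To make the leave-one-out idea concrete, the sketch below evaluates the NCA objective for a given linear map: each point picks a neighbor stochastically, with probability decreasing in the squared distance after projection, and the objective is the expected number of points whose chosen neighbor has the same label. Only the objective is evaluated; the gradient-based optimization of the map is omitted, and the function name `nca_objective` and the toy data are hypothetical.

```python
from math import exp

def nca_objective(points, labels, A):
    """Expected number of points correctly classified by a stochastic
    leave-one-out nearest neighbor rule after the linear map A (list of rows)."""
    def project(x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]

    z = [project(x) for x in points]
    total = 0.0
    for i in range(len(z)):
        # Softmax weights over all other points; a point never picks itself.
        weights = [0.0 if i == j else
                   exp(-sum((a - b) ** 2 for a, b in zip(z[i], z[j])))
                   for j in range(len(z))]
        norm = sum(weights)
        if norm == 0.0:
            continue
        total += sum(w for w, lj in zip(weights, labels) if lj == labels[i]) / norm
    return total

# Toy data: the class is determined by the first coordinate only.
points = [(0, 0), (0, 5), (3, 0), (3, 5)]
labels = [0, 0, 1, 1]

score_informative = nca_objective(points, labels, [[1, 0]])  # keep first coordinate
score_noise = nca_objective(points, labels, [[0, 1]])        # keep second coordinate
print(score_informative, score_noise)
```

A map that keeps the informative coordinate scores close to the number of points, while a map that keeps only the uninformative coordinate scores close to zero.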
Evaluation • K-NN spatial group: K=5 • Two image-category databases: car (123 images) and face (435 images) • Precision of visual phrase lexicon • the percentage of visual phrases Pi ∈ Ψ that are located in the foreground object • Precision of background word lexicon • the percentage of background words Wi ∈ Ω⁻ that are located in the background • Percentage of images that are retrieved:
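The K-NN spatial grouping and the precision measures above can be sketched as follows, assuming each local feature is given as an (x, y, word) triple and that a foreground or background flag is already available for each lexicon entry. The function names `knn_groups` and `precision` and the toy coordinates are hypothetical; a small K is used so the example stays readable.

```python
def knn_groups(features, k=5):
    """Build one word-set per feature: its own visual word plus the words
    of its k spatially nearest neighbors. `features` is a list of (x, y, word)."""
    groups = []
    for i, (xi, yi, wi) in enumerate(features):
        others = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: (features[j][0] - xi) ** 2 + (features[j][1] - yi) ** 2,
        )
        groups.append(frozenset([wi] + [features[j][2] for j in others[:k]]))
    return groups

def precision(hits):
    """Fraction of True entries, e.g. phrases located on the foreground object."""
    return sum(hits) / len(hits) if hits else 0.0

# Two well-separated clusters of features in one toy image.
features = [(0, 0, "A"), (1, 0, "B"), (0, 1, "C"),
            (10, 10, "D"), (11, 10, "E"), (10, 11, "F")]
groups = knn_groups(features, k=2)

# Hypothetical foreground flags for four discovered phrases.
phrase_precision = precision([True, True, False, True])
print(groups[0], groups[3], phrase_precision)
```

The resulting word-sets play the role of transactions in the group database that the mining step consumes.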
Results: visual phrases from car category • Visual phrase pattern 1: wheels • Visual phrase pattern 2: car bodies • Different colors represent different semantic meanings