280 likes | 420 Views
Mobile Visual Search. Bernd Girod Stanford University bgirod@stanford.edu. Mobile Visual Search. Mobile Augmented Reality. Outline. Computer vision: “Bag of Words” matching Scalability for large data bases Feature compression Phone-based vs. server-based processing. d x. d y. scale.
E N D
Mobile Visual Search Bernd Girod Stanford University bgirod@stanford.edu
Outline • Computer vision: “Bag of Words” matching • Scalability for large data bases • Feature compression • Phone-based vs. server-based processing
dx dy scale SIFT Descriptor SURF Descriptor y x Computing Visual Words Σdx Σdy Σ|dx| Σ|dy| Σ Σ Σ Σ Σ Σ Σ Σ Color Gray Dxx Σdx Σdy Σ|dx| Σ|dy| Maxima Dxy … … DxxDyy-(0.9Dxy)2 Σdx Σdy Σ|dx| Σ|dy| Dyy Orient along dominant gradient Oriented Patch Gradient Field Filters Blob Response
Pairwise Comparison “Bag of Words” Matching & Geometric Consistency Check
Growing Vocabulary Tree [Nistér and Stewenius, 2006]
Growing Vocabulary Tree [Nistér and Stewenius, 2006]
Growing Vocabulary Tree [Nistér and Stewenius, 2006]
Growing Vocabulary Tree k = 3 [Nistér and Stewenius, 2006]
Growing Vocabulary Tree k = 3 [Nistér and Stewenius, 2006]
Querying Vocabulary Tree Query
Real-time System: Send Image 20 kbps 20 sec Image Wireless Network Information Camera Client Server Feature Extraction VocTreeImage Matching
Real-time System: Send Features Features Wireless Network Information Camera FeatureExtraction Coding Client Server VocTree Image Matching
Advanced Feature Compression • Transform Coding of SIFT/SURF descriptors[Chandrasekhar et al., VCIP 09] • Direct compression of oriented image patch [M. Makar et al., ICASSP 09] • Descriptor designed for compressibility: CHoG[Chandrasekhar et al., CVPR 09] • Tree-Structured Vector QuantizationTree Histogram Coding [Chen et al., DCC 09] • Compression of Location Information[Tsai et al., Mobimedia 09]
Transform Coding of SURF/SIFT Descriptors OrthonormalTransform Quantizer Entropy Coder OriginalDescriptor Client Channel Channel Server Inverse Transform Inverse Quantizer Entropy Decoder DecodedDescriptor [Chandrasekhar et al., VCIP 2009]
Patch Compression Compute Descriptors Compress Descriptors Detect Keypoints Decompress Descriptors Decompress Patches Matching Compress Patches Client Server [M. Makar et al., ICASSP 2009]
Patch CHoG: Compressed Histogram of Gradients Gradient distributions for each bin Gradients dx dx dy dy 011101 Spatial binning 0100101 01101 101101 Histogram compression 0100011 111001 0010011 01100 1010100 CHoGDescriptor
0.46 1/2 0.21 1/4 0.46 0.16 1/8 0.08 0.09 1/16 1/16 0.21 Gradient distribution 0.08 0.16 0.09 CHoG: Histogram Compression Huffman treeapproximatesprobabilities Gradient binning
Rooted binary trees with nleaf nodes Enumerating Huffman Trees
Ground truth data setof matching patches Liberty data set [Winder & Brown CVPR ’09] Feature Matching Performance
Location Histogram Coding Feature Locations (x,y) Spatial Binning Context-based Arithmetic Coding - Refinement Bits Quantize + [Tsai et al., MobiMedia 2009]
Compressed Feature Vector 52 84 1024 1088 59 Size (bits) SIFT Location x,y 1088 bits CHoG Location x,y ~ 84 bits Compressed x,yCHoG ~ 59 bits • [Tsai et al., MobiMedia 2009]
Timing Analysis Nokia N95 330 MHz ARM 64 MB RAM Server Delay Execution Time (sec) Upload Image 40 kByte Server Delay Upload Features 2.2 kByte Extract Features “Send Features” “Send Image”
Timing Analysis Nokia N95 330 MHz ARM 64 MB RAM Execution Time (sec) Server Delay Upload Image 40 kByte Server Delay Upload 2.2 kByte Extract Features “Send Features” “Send Image”
Timing Analysis Nokia N95 330 MHz ARM 64 MB RAM Execution Time (sec) Server Delay Server Delay Extract Features “Send Features” “Send Image”
Concluding Remarks • Mobile Visual Search is ready for prime-time: • Bag of Words approach: > 95% recall for > 1M database • Vocabulary trees for fast retrieval (1 sec for > 1M database) • Feature compression helps: • Sending a JPEG image over wireless link can be rather slow send salient image features to server • Small database: send compressed database features to phone • CHoG outperforms SIFT and SURF in rate-constrained recognition performance: ~60 bits incl. location • Feature extraction ~1 sec on 330 MHz smartphone
Acknowledgments Radek Grzeszczuk David Chen Vijay Chandrasekhar YuriyReznik Jatinder Singh Gabriel Takacs Wei-Chao Chen Natasha Gelfand Yingen Xiong Kari Pulli Sam Tsai Rahul Swaminathan Thanos Bismpigiannis Mina Makar