Local Visual Words Coding for Low Bit Rate Mobile Visual Search

Tao Mei, Media Computing Group, MSR Asia. Motivation: search for similar or near-duplicate images with a query image captured by a mobile device. Potential usage: location recognition, product search, landmark retrieval, …


Presentation Transcript


  1. Local Visual Words Coding for Low Bit Rate Mobile Visual Search • Tao Mei • Media Computing Group, MSR Asia

  2. Motivation • Search for similar or near-duplicate images with a query image captured by a mobile device • Potential Usage: • Location Recognition • Product Search • Landmark retrieval • … • Existing applications: • Google Goggles • Bing Mobile App • Nokia Point and Find • Kooaba • Ricoh iCandy • Amazon Snaptell

  3. Mobile Visual Search • Scheme 1: Transmit query image: • Best accuracy • Large system latency • Large energy consumption • Scheme 2: Transmit compact descriptors: • Similar accuracy • Lower system latency • Not compact enough

  4. Challenges • Varying capture conditions & viewpoint changes • Insufficient invariance of visual features • Limited energy on mobile devices • Poor network connection

  5. Approach: Scheme 3

  6. Visual Words Representation • Suppose: • leaf nodes in the vocabulary tree • descriptors extracted from query image • is quantized to leaf node , we can obtain the corresponding visual word: where: : frequency that is visited in total : orientation of , equals when • Visual Words:

  7. Visual Words Representation • Given: • D: depth of the vocabulary tree • B: branching factor of the vocabulary tree • For each descriptor, we can obtain the corresponding visual word • If visual words are visited as the figure shows, the visual words collection will be:
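
The quantization step described above can be sketched as follows. This is a minimal illustration, assuming a standard vocabulary tree with greedy nearest-centroid descent; the toy centroids, the function names (`quantize`, `build_visual_words`), and the dictionary layout are hypothetical and not taken from the slides. A tree with branching factor B and depth D has B^D leaf nodes, so D = 5 and B = 10 give 100,000 leaves.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, centroids, branch, depth):
    """Descend the tree greedily and return the leaf id in [0, branch**depth).

    centroids[level] holds branch**(level + 1) centroids; the children of
    node i at one level are nodes i*branch .. i*branch + branch - 1 below it.
    """
    node = 0
    for level in range(depth):
        first_child = node * branch
        children = range(first_child, first_child + branch)
        node = min(children, key=lambda i: sq_dist(descriptor, centroids[level][i]))
    return node


def build_visual_words(descriptors, orientations, centroids, branch, depth):
    """Group descriptors by leaf: id -> term frequency and orientations."""
    words = defaultdict(lambda: {"tf": 0, "orientations": []})
    for desc, ori in zip(descriptors, orientations):
        leaf = quantize(desc, centroids, branch, depth)
        words[leaf]["tf"] += 1
        words[leaf]["orientations"].append(ori)
    return dict(sorted(words.items()))  # sorted by visual word id


# Hypothetical toy tree: B = 2, D = 2, 2-D descriptors (real ones are e.g. 64-D).
CENTROIDS = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 0.0), (0.0, 3.0), (10.0, 10.0), (10.0, 13.0)],
]
descriptors = [(0.2, 2.9), (10.0, 12.5), (0.1, 2.8)]
orientations = [0.3, 1.5, 0.4]  # radians, one per descriptor
words = build_visual_words(descriptors, orientations, CENTROIDS, branch=2, depth=2)
```

Each entry of `words` carries the three pieces the slides name for a visual word: its id, its term frequency, and the orientations of the descriptors quantized to it.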

  8. Visual Words Coding • Example: • Sort by • Three ordered parts: • Collection of IDs of visual words visited • Collection of term frequencies of visual words visited • Collection of orientations of descriptors

  9. Visual Words Coding • Example: • For each visited visual word, set the element of the binary vector indexed by its ID to 1, otherwise 0 • Adaptive Arithmetic Coding
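
The ID-coding step can be illustrated with a short sketch. It assumes the visited IDs are turned into a 0/1 indicator vector over all leaf nodes, and it does not implement an arithmetic coder: instead it estimates the ideal code length under a simple adaptive binary model with Laplace-smoothed counts, which an adaptive arithmetic coder driven by the same model would come close to. The model choice and the function names are assumptions, not the presenter's method.

```python
import math


def indicator_vector(visited_ids, num_leaves):
    """Binary vector with a 1 at every visited leaf id, 0 elsewhere."""
    bits = [0] * num_leaves
    for leaf_id in visited_ids:
        bits[leaf_id] = 1
    return bits


def adaptive_code_length(bits):
    """Ideal code length in bits under an adaptive binary model.

    The probability of each symbol is estimated from the symbols seen so
    far (counts start at 1 for both symbols), then the counts are updated.
    """
    counts = [1, 1]
    total = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])
        total += -math.log2(p)
        counts[b] += 1
    return total


bits = indicator_vector([1, 3], num_leaves=8)
estimated_bits = adaptive_code_length(bits)
```

Because only a small fraction of the leaves is visited by one query image, the indicator vector is sparse and the adaptive model spends far less than one bit per leaf on it.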

  10. GV-based Reranking • The query image is rotated from the database image by a global rotation angle • Two matched descriptors should share the same orientation distance to the global rotation angle • These kinds of matches are used to evaluate the score of candidate images • Different matches vote differently to the score according to their

  11. GV-based Reranking Suppose • Orientations of descriptors of a database image • One-to-one matched with orientations of query image Orientation Distances: where: , is set empirically The geometric verification score: where :the global rotation angle
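
The scoring formulas are not preserved in this transcript, so the following is only a sketch of the idea under stated assumptions: the global rotation angle is estimated as the mean of the most populated bin in a histogram of orientation distances, and each match votes with a weight that decreases linearly with its deviation from that angle, up to a tolerance. The bin count, the tolerance, the weighting, and all function names are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def orientation_distance(theta_query, theta_db):
    """Orientation difference of a matched pair, wrapped into [0, 2*pi)."""
    return (theta_query - theta_db) % TWO_PI


def estimate_global_rotation(distances, num_bins=36):
    """Mean of the orientation distances in the most populated histogram bin."""
    width = TWO_PI / num_bins
    bins = [[] for _ in range(num_bins)]
    for d in distances:
        bins[int(d / width) % num_bins].append(d)
    best = max(bins, key=len)
    return sum(best) / len(best)


def gv_score(query_oris, db_oris, tolerance=math.radians(10)):
    """Sum of votes from matches that agree with the global rotation."""
    distances = [orientation_distance(q, d) for q, d in zip(query_oris, db_oris)]
    if not distances:
        return 0.0
    rotation = estimate_global_rotation(distances)
    score = 0.0
    for d in distances:
        deviation = abs(d - rotation)
        deviation = min(deviation, TWO_PI - deviation)  # circular distance
        if deviation <= tolerance:
            score += 1.0 - deviation / tolerance  # closer matches vote more
    return score


# Four matches consistent with a 0.5 rad rotation, plus one outlier match.
db_oris = [0.0, 1.0, 2.0, 3.0, 0.0]
query_oris = [0.5, 1.5, 2.5, 3.5, 2.0]
score = gv_score(query_oris, db_oris)
```

Candidate images whose matches agree on one rotation collect a high score, while candidates with scattered orientation distances collect few votes, which is what allows reranking without a full geometric model.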

  12. Experiments • Experiment Setup • Dataset: • Training data: 1 million images (Bing Mobile Data) • 2,367 images from the Stanford Mobile Visual Search Dataset (SMVS), 6 categories • Hardware: • Client: HTC Nexus One phone (1 GHz CPU, 512 MB RAM) • Cloud: Windows Server (Xeon E5540 2.53 GHz CPU, 36 GB RAM) • Evaluation: • Time & Memory Cost • Bit Rate • Accuracy

  13. Time & Memory Cost • Memory Cost: 5.3 MB on the client (with D = 5, B = 10) • Time Cost Table: Processing Time (msec)

  14. Bit rate • Table: Bit rate for transmission in mobile visual search (see note 1) • Notes: 1: Number of descriptors is set to 500. 2: H. Bay, T. Tuytelaars, and L. J. V. Gool. SURF: Speeded up robust features. In ECCV, pages 404–417, 2006. 3: V. Chandrasekhar, G. Takacs, D. Chen, S. S. Tsai, J. Singh, and B. Girod. Transform coding of image feature descriptors. In Visual Communications and Image Processing, 2009. 4: V. Chandrasekhar, G. Takacs, D. Chen, S. S. Tsai, R. Grzeszczuk, and B. Girod. CHoG: Compressed histogram of gradients, a low bit-rate feature descriptor. In IEEE Computer Vision and Pattern Recognition, pages 2504–2511, 2009.

  15. Search Accuracy • MAP of different categories • Recall of top N retrieval results • GV_ORI_IDF denotes the fast re-ranking with the IDF of leaf nodes considered.

  16. Conclusion & Future Work • Contributions: • Low extra cost: • 5.3 MB memory • Lightweight computation of visual words • Benefits: • 3 times bit rate reduction compared to sending descriptors • 30 times compared to sending images • Future Work: • Building a larger vocabulary tree for better accuracy • More discriminative and efficient descriptors • More effective geometric verification method
