A Hybrid Diagnosis to Real-time Image De-duplication

A Hybrid Diagnosis to Real-time Image De-duplication. Global Media – Photo track. Hong-Ming Chen, hmchen@yahoo-inc.com. Image duplication: sometimes it is good for art, but most of the time it is annoying … Case 1: Yahoo! News.



Presentation Transcript


  1. A Hybrid Diagnosis to Real-time Image De-duplication • Global Media – Photo track • Hong-Ming Chen, hmchen@yahoo-inc.com

  2. Image duplication • Sometimes it is good for art. • But most of the time it is annoying …

  3. Case 1: Yahoo! News

  4. A real usage instance of the Yahoo! dynamic slide show

  5. Case 2: Yahoo! omg

  6. Our Recipe for Yahoo! • A hybrid real-time de-duplication system • Submitted to Yahoo! Tech Pulse 2012 • Will be in production soon

  7. Concerns and solutions • Users' concerns: 1. Short response time: faster than fast! 2. Good de-dup result: sweep off all the duplicates and keep all the others. • Solutions: 1. Fast approach: “fingerprint” comparison per image pair; not accurate enough. 2. Accurate approach: sophisticated image matching; impossible to run in real time.

  8. Difficulty and limitation 1/2 • Huge computation vs. real time • Pair-wise comparison • Number of pairs # = C(N, 2) = N(N − 1)/2, where N is the total number of images. • Computation grows quadratically with the size of the image set. • N = 10, # = 45 • N = 20, # = 190 • N = 100, # = 4950
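The pair counts on this slide follow from C(N, 2) = N(N − 1)/2. A minimal sketch that reproduces them is shown here; the function name `pair_count` is an illustrative choice, not something from the talk.

```python
from math import comb


def pair_count(n_images: int) -> int:
    """Number of unordered image pairs, C(N, 2) = N * (N - 1) / 2."""
    return comb(n_images, 2)


# Reproduce the three data points quoted on the slide.
for n in (10, 20, 100):
    print(f"N = {n:3d}  ->  {pair_count(n)} pairs")
```

Doubling N roughly quadruples the number of comparisons, which is why the per-pair cost matters so much for a real-time system.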

  9. Difficulty and limitation 2/2 • Limited storage space • Photos are described by limited information. • Photo CCM (meta-data) fields: Name, URL, Created date, Info for de-dup, …

  10. Proposed Solution • Hybrid referral system: • First consultation: fast approach • Subsequent consultation: accurate approach, which examines ambiguous pairs
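The referral idea can be sketched as a two-stage decision. This is only an illustration under stated assumptions: the callables `fast_distance` and `accurate_check`, the thresholds `t1` and `t2`, and the convention that a small distance means "duplicate" are all hypothetical and not taken from the slides.

```python
def is_duplicate_pair(a, b, fast_distance, accurate_check, t1, t2):
    """Two-stage duplicate decision for one image pair.

    Assumption: a small fast distance means "clearly duplicate" and a
    large one means "clearly different"; only the band in between is
    referred to the slower, more accurate check.
    """
    d = fast_distance(a, b)
    if d < t1:
        return True            # confident duplicate, no second opinion needed
    if d > t2:
        return False           # confident non-duplicate
    return accurate_check(a, b)  # ambiguous: ask the accurate approach
```

Because the expensive check runs only on the ambiguous band, the overall cost stays close to that of the fast approach when that band is small.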

  11. Fast consultation: Grid Color Moment • Discovers statistical properties • 5x5 grid • HSV color space • 3 moments per color channel in each grid cell: mean, variance, skewness • Feature extraction yields an image descriptor with elements 1, 2, 3, …, 224, 225 • Vector length: 5x5x3x3 = 225
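A rough sketch of how such a 225-element descriptor could be computed is given below. It is not the author's implementation. Assumptions: the image is a list of rows of `(r, g, b)` floats in [0, 1], it is at least 5x5 pixels, skewness is taken as the signed cube root of the third central moment, and hue is treated as an ordinary number rather than an angle.

```python
import colorsys
import math


def grid_color_moments(image, grid=5):
    """Return a grid x grid x 3 channels x 3 moments descriptor (225 values for grid=5)."""
    height, width = len(image), len(image[0])
    descriptor = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            # Convert every pixel of this grid cell from RGB to HSV.
            cell = [colorsys.rgb_to_hsv(*image[y][x])
                    for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):  # H, S, V
                values = [pixel[channel] for pixel in cell]
                n = len(values)
                mean = sum(values) / n
                variance = sum((v - mean) ** 2 for v in values) / n
                third = sum((v - mean) ** 3 for v in values) / n
                # Assumed skewness definition: signed cube root of the
                # third central moment.
                skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
                descriptor.extend((mean, variance, skewness))
    return descriptor
```

For example, `len(grid_color_moments(image))` is 225 for any image of at least 5x5 pixels, matching the vector length on the slide.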

  12. Fast consultation: Grid Color Moment • [Figure: feature extraction is applied to each of two images; subtracting one 225-element image descriptor from the other gives the similarity.]
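The slide shows only that the two descriptors are subtracted; it does not name the distance measure. The sketch below assumes a Euclidean distance, and the name `gcm_distance` is illustrative.

```python
import math


def gcm_distance(desc_a, desc_b):
    """Euclidean distance between two equal-length descriptors (assumed metric)."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(desc_a, desc_b)))
```

Identical descriptors give a distance of 0, and the value grows as the images' color statistics diverge.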

  13. Concerns and solutions • Users' concerns: 1. Short response time: faster than fast! 2. Good de-dup result: sweep off all the duplicates and keep all the others. • Solutions: 1. Fast approach: “fingerprint” comparison per image pair; not accurate enough. 2. Accurate approach: sophisticated image matching; impossible to run in real time. • Comparing time: ~1 µs/pair, i.e. 1,000,000 pairs/sec.
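To put the quoted comparing time in context, the following back-of-the-envelope sketch estimates how long the pair-wise pass would take for a set of N images. It assumes the roughly 1 µs per pair figure applies uniformly to every pair; `fast_pass_seconds` is a hypothetical helper name.

```python
from math import comb


def fast_pass_seconds(n_images, microseconds_per_pair=1.0):
    """Estimated time to compare every image pair once, in seconds."""
    return comb(n_images, 2) * microseconds_per_pair / 1_000_000


for n in (100, 1000):
    print(f"N = {n}: about {fast_pass_seconds(n):.4f} s")
```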

  14. How about accuracy? More than 99.6% on average!

  15. How about accuracy? Not high enough?

  16. Result and Observation • Number of non-duplicated image pairs: 460 • Number of duplicated image pairs: 257,454 • Pairs located in [T1, T2] = 1,770 • Pairs located outside [T1, T2] = 256,144 • On average, only 1,770 of the 257,914 pairs (about 0.7%) need to be re-examined. • For a set of 50 images, only 8 out of 1,225 pairs need to be re-examined. • [Figure: pair amount versus GCM distance for non-duplicated and duplicated image pairs, with thresholds [T1, T2] = [5, 25] marked.]
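The re-examination share can be checked with a few lines of arithmetic. The sketch below uses the counts quoted on the slide and the thresholds [T1, T2] = [5, 25]; the helper `ambiguous_fraction` and the variable names are illustrative only.

```python
from math import comb

T1, T2 = 5, 25  # thresholds quoted on the slide


def ambiguous_fraction(distances, t1=T1, t2=T2):
    """Share of pair distances that fall inside the band [t1, t2]."""
    inside = sum(1 for d in distances if t1 <= d <= t2)
    return inside / len(distances)


# Counts from the slide: pairs inside and outside [T1, T2].
pairs_inside, pairs_outside = 1770, 256144
reported_fraction = pairs_inside / (pairs_inside + pairs_outside)

# Expected number of pairs to re-examine for a 50-image set.
expected_recheck_50 = reported_fraction * comb(50, 2)

print(f"fraction to re-examine: {reported_fraction:.2%}")
print(f"pairs to re-examine for 50 images: {expected_recheck_50:.1f} of {comb(50, 2)}")
```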

  17. Accurate Consultation: LIPM – Local Interest Point Matching • Local interest points are described by SURF feature.
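The slide does not say how the interest points are matched. One common way to match local descriptors is a nearest-neighbour search with a ratio test, sketched here as a stand-in technique. The descriptors are assumed to be plain numeric tuples already extracted elsewhere (SURF extraction itself is not shown), and both `count_matches` and the ratio of 0.75 are hypothetical choices.

```python
import math


def count_matches(desc_a, desc_b, ratio=0.75):
    """Count descriptors in desc_a with a clearly best match in desc_b.

    A descriptor counts as matched when its nearest neighbour in desc_b
    is closer than `ratio` times the second-nearest neighbour.
    """
    matches = 0
    for a in desc_a:
        dists = sorted(math.dist(a, b) for b in desc_b)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches
```

A pair of images could then be called a duplicate when the number of matches exceeds some threshold, which would have to be tuned on real data.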

  18. The system provides: • Fast 1st-round de-duplication • Accurate 2nd-round de-duplication (optional) • Similarity scores for: • Removing duplicates • Clues for rearranging the photo layout to increase diversity
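Once a pair-wise duplicate decision is available, removing duplicates from a photo list can be done greedily. This is a minimal sketch, not the system described in the talk: `deduplicate` and the `is_duplicate` callable are hypothetical names, and the first photo of each duplicate group is the one kept.

```python
def deduplicate(items, is_duplicate):
    """Keep each item only if it duplicates none of the items kept so far."""
    kept = []
    for item in items:
        if not any(is_duplicate(item, other) for other in kept):
            kept.append(item)
    return kept
```

The input order decides which copy survives, so a caller that prefers, say, the highest-resolution copy would sort the list accordingly first.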

  19. Successful Duplication detection

  20. Successful Duplication detection

  21. Successful Duplication detection

  22. Successful Duplication detection

  23. Successful Non-Duplication detection

  24. De-Duplication Demo

  25. De-Duplication Demo
