
Spatial Business Detection and Recognition from Images

Spatial Business Detection and Recognition from Images. Alexander Darino. Weeks 10 & 11. STR implementation of “Automatic Detection and Recognition of Signs From Natural Scenes”. Multiresolution-based potential character detection.


Presentation Transcript


  1. Spatial Business Detection and Recognition from Images Alexander Darino Weeks 10 & 11

  2. STR Implementation • STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” • Multiresolution-based potential character detection • Character/layout geometry and color properties analysis • Refined detection • Local affine rectification

  3. Refined Detection • One font per classifier (a-z, A-Z) • Generate alphabet templates • Resize and center templates; divide each into a 7x7 grid • Apply several 2D Gabor filters to each grid patch • Different orientations, frequencies, and variances • For each pixel, yields the real and imaginary components of the filter response • Feed data into Linear Discriminant Analysis (LDA) • Reduces features and forms the classifier at the same time

  4. 2D Gabor Filter • Kernel is a Gaussian envelope multiplied by a sinusoidal wave; the kernel is convolved with the image
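
A minimal sketch of the kernel described on this slide, assuming the common parameterization of a complex Gabor filter (Gaussian envelope times a complex sinusoid). The function name `gabor_kernel` and the parameters `sigma`, `theta`, `wavelength`, `psi`, and `gamma` are illustrative choices, not taken from the presentation.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, psi=0.0, gamma=1.0):
    """Return (real, imag) parts of a size x size Gabor kernel as nested lists.

    size should be odd so the kernel has a center pixel.
    """
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; sigma controls its spread (the "variance").
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            # Sinusoidal carrier; wavelength sets the spatial frequency.
            phase = 2 * math.pi * xr / wavelength + psi
            real_row.append(envelope * math.cos(phase))
            imag_row.append(envelope * math.sin(phase))
        real.append(real_row)
        imag.append(imag_row)
    return real, imag

# One kernel per (orientation, wavelength) pair gives a small filter bank.
bank = [gabor_kernel(7, 2.0, k * math.pi / 4, 4.0) for k in range(4)]
```

Convolving a grid patch with the real and imaginary kernels would yield the two components per pixel that the previous slide mentions.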

  5. Training Process

  6. Character Determination • Each grid patch has its own LDA classifier; the classifier returns a vector of probabilities, one per symbol • To classify the overall character, recursively consider all 3x3 (9-patch) neighborhoods and multiply the corresponding probabilities together • When only one grid patch remains, the highest probability wins
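
The combination step above can be read in more than one way; the sketch below assumes that every 3x3 neighborhood is collapsed into one cell by multiplying the per-symbol probabilities, so a 7x7 grid shrinks to 5x5, then 3x3, then 1x1. The name `combine_patches` is hypothetical, and the sums of logarithms stand in for the products only to avoid floating-point underflow.

```python
import math

def combine_patches(grid, symbols):
    """Collapse an n x n grid of per-patch probability vectors to one symbol.

    grid[r][c] is a list of probabilities, one per entry in symbols.
    n must be odd (e.g. 7) so that repeated 3x3 reduction ends at 1x1.
    """
    if len(grid) % 2 == 0:
        raise ValueError("grid side must be odd")
    eps = 1e-12  # guard against log(0)
    logs = [[[math.log(max(p, eps)) for p in cell] for cell in row] for row in grid]
    while len(logs) > 1:
        n = len(logs) - 2  # each pass removes a one-cell border
        logs = [
            [
                [
                    # Sum of logs over the 3x3 window == product of probabilities.
                    sum(logs[r + dr][c + dc][k] for dr in range(3) for dc in range(3))
                    for k in range(len(symbols))
                ]
                for c in range(n)
            ]
            for r in range(n)
        ]
    scores = logs[0][0]
    best = max(range(len(symbols)), key=scores.__getitem__)
    return symbols[best]

# Toy input: every patch slightly prefers "G" over "B".
toy_grid = [[[0.4, 0.6] for _ in range(7)] for _ in range(7)]
winner = combine_patches(toy_grid, ["B", "G"])
```

Note that under this reading, patches near the center of the grid contribute to more neighborhoods than patches at the border, so they carry more weight in the final decision.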

  7. Recognition Process • Color properties analysis: choose the channel with the highest confidence of best distinguishing foreground from background • Binarization threshold: 50% of the threshold given by Otsu’s method • Intermediate representation: trim, resize, and center the binary image • Perform OCR on variations of the intermediate representation: stretched, eroded (Gaussian-based), dilated • Aggregate and return votes
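
A rough sketch of the binarization step, assuming "50% of Otsu's Method" means the Otsu threshold scaled by 0.5. The helper names `otsu_threshold` and `binarize` are illustrative, pixels are taken to be 8-bit gray levels in a flat list, and which polarity counts as text is left open.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def binarize(pixels, scale=0.5):
    """Mark pixels above scale * Otsu threshold as 1, the rest as 0."""
    threshold = scale * otsu_threshold(pixels)
    return [1 if p > threshold else 0 for p in pixels]

sample = [5, 5, 5, 100, 100, 250, 250, 250]
half_otsu = binarize(sample)             # threshold at 50% of Otsu
full_otsu = binarize(sample, scale=1.0)  # plain Otsu, for comparison
```

In this toy sample the halved threshold keeps the mid-gray pixels that plain Otsu would discard, which is the kind of effect a lowered threshold has on faint strokes.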

  8. Recognition Process Example: “G” using Trebuchet-MS Classifier • Query Character (Actual Size) • Intermediate Representation (Actual Size)

  9. abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

  10. Recognition Process Example: “G” using Trebuchet-MS Classifier • Variation (Actual Size), Identified Character: s • Variation (Actual Size), Identified Character: g • Variation (Actual Size), Identified Character: G

  11. Recognition Process Example: “G” using Trebuchet-MS Classifier • Variation (Actual Size), Identified Character: g • Variation (Actual Size), Identified Character: g • Variation (Actual Size), Identified Character: B

  12. Recognition Process Example: “G” using Trebuchet-MS Classifier • Variation (Actual Size), Identified Character: G • Variation (Actual Size), Identified Character: G • Variation (Actual Size), Identified Character: B

  13. Recognition Process Example: “G” using Trebuchet-MS Classifier • Variation (Actual Size), Identified Character: B • Variation (Actual Size), Identified Character: B • Variation (Actual Size), Identified Character: G

  14. Recognition Process Example: “G” using Trebuchet-MS Classifier • Variation (Actual Size), Identified Character: B • Variation (Actual Size), Identified Character: G • Variation (Actual Size), Identified Character: a

  15. Recognition Process Example: “G” using Trebuchet-MS Classifier • Final Results: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 (6.7%) • s: 1/15 (6.7%)
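
The tally on this slide can be reproduced from the fifteen identified characters listed on slides 10 through 14. The snippet below is only an illustration of the vote aggregation, using Python's `collections.Counter`; the variable names are made up for the example.

```python
from collections import Counter

# Identified characters, in the order they appear on slides 10-14.
votes = ["s", "g", "G",
         "g", "g", "B",
         "G", "G", "B",
         "B", "B", "G",
         "B", "G", "a"]

tally = Counter(votes)

# Characters that share the top vote count; more than one means a tie.
top_count = max(tally.values())
leaders = sorted(ch for ch, n in tally.items() if n == top_count)

for ch, n in tally.most_common():
    print(f"{ch}: {n}/{len(votes)}")
```

Here `leaders` contains both "B" and "G", so a plain majority vote cannot settle this character on its own.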

  16. “GEORGE” (Trebuchet-MS) • Votes: • E: 14/15 • t: 1/15

  17. “GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 2/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing

  18. “GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 1/15 • M: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing

  19. “GEORGE” (Trebuchet-MS) • Votes: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 • s: 1/15

  20. “GEORGE” (Trebuchet-MS) • Votes: • j: 12/15 • Y: 2/15 • X: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing or training

  21. Note on the “Inversion Problem” • Easy to fix; common problem in OCR systems • Will likely detect and correct during the preprocessing stage as opposed to training • More training data: slower, less reliable • Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens
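
One possible way to handle inversion in preprocessing, in the spirit of the "many lenses" analogy: classify both the binary image and its inverse, and keep whichever result the classifier is more confident about. This is an assumption about how the fix could look, not the presentation's implementation; `classify` is a hypothetical callable returning a `(label, confidence)` pair, and `toy_classifier` is a stand-in so the example runs.

```python
def invert(binary):
    """Flip foreground and background in a 0/1 image (nested lists)."""
    return [[1 - v for v in row] for row in binary]

def classify_both_polarities(binary, classify):
    """Run the classifier on both polarities and keep the more confident result."""
    results = [classify(img) for img in (binary, invert(binary))]
    return max(results, key=lambda r: r[1])

def toy_classifier(binary):
    """Hypothetical stand-in: confident only when ink covers under half the image."""
    ink = sum(sum(row) for row in binary)
    area = sum(len(row) for row in binary)
    return ("G", 0.9) if ink < area / 2 else ("j", 0.1)

# A light-on-dark glyph: mostly ones, so the raw image confuses the toy classifier.
inverted_glyph = [[1, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]]
label, confidence = classify_both_polarities(inverted_glyph, toy_classifier)
```

The cost is one extra classification per character, which is small next to retraining every classifier on inverted templates.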

  22. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

  23. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • B: 9/15 • j: 3/15 • H: 2/15 • F: 1/15

  24. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • A: 9/15 • j: 5/15 • n: 1/15

  25. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • K: 12/15 • j: 2/15 • H: 1/15

  26. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • E: 5/15 • j: 3/15 • L: 3/15 • r: 2/15 • F: 2/15

  27. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • p: 12/15 • j: 3/15 PR

  28. “BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • Y: 12/15 • j: 3/15

  29. “UNIVERSITY”(Used: Times New Roman) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

  30. “UNIVERSITY”(Used: Times New Roman) • Votes: • U: 8/15 • C: 3/15 • j: 2/15 • s: 1/15 • O: 1/15

  31. “UNIVERSITY”(Used: Times New Roman) • Votes: • N: 12/15 • j: 3/15

  32. “UNIVERSITY”(Used: Times New Roman) • Votes: • l(‘el’): 9/15 • I(‘eye’): 6/15

  33. “UNIVERSITY”(Used: Times New Roman) • Votes: • v: 9/15 • j: 3/15 • V: 3/15

  34. “UNIVERSITY”(Used: Times New Roman) • Votes: • F: 9/15 • L: 5/15 • l (‘el’): 1/15

  35. “UNIVERSITY”(Used: Times New Roman) • Votes: • G: 9/15 • j: 6/15

  36. “UNIVERSITY”(Used: Times New Roman) • Votes: • j: 12/15 • x: 2/15 • w: 1/15

  37. “UNIVERSITY”(Used: Times New Roman) • Votes: • j: 5/15 • C: 4/15 • O: 4/15 • x: 2/15

  38. “UNIVERSITY”(Used: Times New Roman) • Votes: • T: 9/15 • l: 3/15 • i: 1/15 • j: 1/15 • L: 1/15

  39. “UNIVERSITY”(Used: Times New Roman) • Votes: • Y: 10/15 • j: 3/15 • i: 2/15

  40. Evaluation • Biggest weaknesses in preprocessing stage • OCR sensitive to thresholding/color inversion • Occasionally color modeling chooses a bad channel to use for OCR – happens more often on low-resolution images • Works surprisingly well for low-resolution images • Font does not need to be exact, but proportions need to be roughly the same

  41. How do I use this information?

  42. The Big Picture • [Diagram; labels: Image, STR, Detected Text, Business Name Matching, Business Spatial Detection, Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Business Identification]

  43. Old Approach • Form words from highest-voted characters • Compare to lexicon using Levenshtein distance • Use existing ranking system afterwards • Examples: BOKFRY > BAKERY (L-DIST = 2); GFQRGF > GEORGE (L-DIST = 3)
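
The lexicon comparison in the old approach can be sketched with the standard dynamic-programming Levenshtein distance. The function names and the three-word lexicon below are illustrative only; the two distances are the ones quoted on the slide.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def closest_word(query, lexicon):
    """Return the lexicon entry with the smallest distance to query."""
    return min(lexicon, key=lambda word: levenshtein(query, word))

lexicon = ["BAKERY", "GEORGE", "UNIVERSITY"]  # hypothetical lexicon
print(levenshtein("BOKFRY", "BAKERY"))  # 2
print(levenshtein("GFQRGF", "GEORGE"))  # 3
print(closest_word("BOKFRY", lexicon))  # BAKERY
```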

  44. New Approach (Lexicon-assisted STR) • Minimize Levenshtein distance over the best combination of voted characters (one candidate per position) • Use existing ranking system afterwards • Example: candidate rows “B O K F P Y”, “G U H E R I”, “J A j L I l” >>> BAKERY (L-DIST = 0)
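
A sketch of how the lexicon-assisted matching could work, assuming each character position keeps a set of voted candidates. A substitution costs nothing when the lexicon character is among that position's candidates, which gives the same result as trying every combination of candidates and taking the smallest Levenshtein distance. The name `candidate_distance` and the small lexicon are hypothetical; the candidate columns are read off the slide's example.

```python
def candidate_distance(candidates, word):
    """Edit distance between a sequence of candidate sets and a word.

    candidates[i] is the set of characters voted for at position i.
    """
    prev = list(range(len(word) + 1))
    for i, options in enumerate(candidates, 1):
        cur = [i]
        for j, ch in enumerate(word, 1):
            cur.append(min(
                prev[j] + 1,                              # drop this position
                cur[j - 1] + 1,                           # insert ch
                prev[j - 1] + (0 if ch in options else 1),  # match any candidate
            ))
        prev = cur
    return prev[-1]

# Columns of the slide's example: first, second, and third choice per position.
candidates = [set("BGJ"), set("OUA"), set("KHj"),
              set("FEL"), set("PRI"), set("YIl")]

lexicon = ["BAKERY", "GEORGE", "UNIVERSITY"]  # hypothetical lexicon
best = min(lexicon, key=lambda w: candidate_distance(candidates, w))
```

With a single candidate per position this reduces to the ordinary Levenshtein distance used in the old approach.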

  45. The End Result • Bruegger's Bagels • Category: Bagels • Address: Market Sq, Pittsburgh, PA 15222 • Phone: (412) 281-2515 • Rating: Not Rated

  46. Next Steps • Fix STR Preprocessing • Bug in Color Modeling code found online • Inversion determination • Multiple thresholds • Word matching: Generate templates of words/logos instead of letters • Text detector: fix character/word fragmentation by reading papers that address the issue

  47. Next Steps • Test more images; fix problems as they arise • Ideas to consider: • Feed grid-patch probability vectors into an SVM instead of “smoothing” • Generate “disambiguation classifiers” to differentiate: • Between top contending votes. Remember how ‘G’ and ‘B’ got confused? Dynamically create a classifier to tell them apart • Between commonly confused letters, e.g. E/F, l/i/j, o/c • Don’t consider statistically insignificant confidences

  48. Next Steps • Text Detection • Look into after more work has been done on STR • Need to address issues: • Intracharacter segmentation • Intercharacter segmentation • Word segmentation • Needed to make STR system automated like before

  49. Thank You
