Spatial Business Detection and Recognition from Images. Alexander Darino Weeks 10 & 11. STR Implementation. STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”. Multiresolution -based potential characters detection.
Spatial Business Detection and Recognition from Images Alexander Darino Weeks 10 & 11
STR Implementation • STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” • Multiresolution-based potential characters detection • Character/layout geometry and color properties analysis • Refined Detection • Local affine rectification
Refined Detection • One font per classifier, a-z A-Z • Generate alphabet templates • Resize & center templates; divide into a 7x7 grid • Apply several 2D Gabor filters to each grid patch • Different orientations, frequencies, variances • For each pixel, yields the real and imaginary components of the transformation • Feed data into Linear Discriminant Analysis (LDA) • Reduces features and forms the classifier at the same time
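The grid step above can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: `split_into_grid` is a hypothetical helper, the template is assumed to be a list of pixel rows, and its size is assumed to be an exact multiple of the grid size.

```python
def split_into_grid(image, n=7):
    """Split a 2D image (list of rows) into an n x n grid of patches.

    Assumption: both image dimensions are exact multiples of n.
    grid[r][c] is the patch in grid row r, grid column c.
    """
    rows, cols = len(image), len(image[0])
    if rows % n or cols % n:
        raise ValueError("image size must be a multiple of n")
    ph, pw = rows // n, cols // n  # patch height and width
    return [
        [
            [image_row[c * pw:(c + 1) * pw] for image_row in image[r * ph:(r + 1) * ph]]
            for c in range(n)
        ]
        for r in range(n)
    ]


# A 14x14 template yields 7x7 patches of 2x2 pixels each.
template = [[r * 14 + c for c in range(14)] for r in range(14)]
grid = split_into_grid(template)
```

Each patch would then be passed through the Gabor filters and its responses fed to that patch's LDA classifier.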
2D Gabor Filter • Kernel: Gaussian envelope multiplied by a sine wave • The kernel is convolved with the image
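As a rough sketch of the kernel, the code below builds a complex 2D Gabor filter with an isotropic Gaussian envelope. The parameter names (`size`, `wavelength`, `theta`, `sigma`) and the helper `filter_response` are illustrative assumptions, not taken from the slides; `size` is assumed to be odd.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size complex Gabor kernel as a list of rows.

    Each entry is a Gaussian envelope multiplied by a complex sinusoid,
    so the real and imaginary parts give the two filter components.
    Assumption: size is odd, so the kernel has a centre pixel.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate measured along the direction of the wave.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(2j * math.pi * x_rot / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Complex response of a kernel over an equally sized patch (sum of products)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


kernel = gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0)
response = filter_response([[1.0] * 7 for _ in range(7)], kernel)
```

Varying `theta`, `wavelength`, and `sigma` gives the bank of filters with different orientations, frequencies, and variances.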
Character Determination • Each grid patch has its own LDA classifier; the classifier returns a vector of probabilities, one per symbol • To classify the overall character, recursively consider all 9-neighborhoods (3x3 blocks of patches) and multiply the corresponding probabilities together • When only one grid patch remains, the highest probability wins
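One way to read the reduction above is sketched here. Several details are assumptions rather than facts from the slides: neighborhoods are taken without padding (so a 7x7 grid shrinks to 5x5, 3x3, then 1x1), the grid is square with an odd side, no renormalization is done between steps, and log-probabilities are summed instead of multiplying raw probabilities to avoid floating-point underflow.

```python
import math


def combine_neighborhoods(grid):
    """One reduction step: sum log-probabilities over every 3x3 neighborhood."""
    n = len(grid)
    symbols = list(grid[0][0].keys())
    reduced = []
    for r in range(n - 2):
        row = []
        for c in range(n - 2):
            cell = {
                s: sum(grid[r + dr][c + dc][s] for dr in range(3) for dc in range(3))
                for s in symbols
            }
            row.append(cell)
        reduced.append(row)
    return reduced


def classify(prob_grid, floor=1e-12):
    """Reduce an odd-sized square grid of per-patch probabilities to one symbol.

    prob_grid[r][c] maps each symbol to the probability reported by the
    classifier of patch (r, c). `floor` guards against log(0).
    """
    # Work in log space: adding logs is equivalent to multiplying probabilities.
    grid = [
        [{s: math.log(max(p, floor)) for s, p in cell.items()} for cell in row]
        for row in prob_grid
    ]
    while len(grid) > 1:
        grid = combine_neighborhoods(grid)
    scores = grid[0][0]
    return max(scores, key=scores.get)


# Toy input: every patch slightly prefers 'G' over 'B'.
toy = [[{"G": 0.6, "B": 0.4} for _ in range(7)] for _ in range(7)]
winner = classify(toy)
```

Because central patches take part in more neighborhoods than border patches, this scheme weights the middle of the character more heavily.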
Recognition Process • Color Properties Analysis: choose the channel with the highest confidence of best distinguishing foreground from background • Binarization threshold (50% of the threshold given by Otsu’s method) • Intermediate representation: trim, resize, and center the binary image • Perform OCR on variations of the intermediate representation: stretched, eroded (Gaussian-based), dilated • Aggregate and return votes
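The binarization step can be sketched as follows, assuming the chosen channel is a flat list of 8-bit grey levels and that "50% of Otsu" means scaling the Otsu threshold by 0.5. Both `otsu_threshold` and `binarize` are illustrative helpers, and which side of the threshold counts as foreground is also an assumption.

```python
def otsu_threshold(pixels):
    """Otsu's threshold for a list of 8-bit grey levels (integers 0..255).

    Returns the level t that maximizes the between-class variance when
    pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # all pixels are at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, scale=0.5):
    """Binarize with a scaled Otsu threshold; pixels above it become 1."""
    threshold = scale * otsu_threshold(pixels)
    return [1 if p > threshold else 0 for p in pixels]


channel = [10] * 50 + [200] * 50
t = otsu_threshold(channel)
mask = binarize(channel, scale=1.0)
```

Lowering `scale` below 1.0 moves the cut toward the dark end, so more pixels land on the bright side of the threshold.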
Recognition Process Example: “G” using Trebuchet-MS Classifier • [Images: query character (actual size); intermediate representation (actual size)]
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
Recognition Process Example: “G” using Trebuchet-MS Classifier • Characters identified for the 15 variations (images shown at actual size), in slide order: s, g, G, g, g, B, G, G, B, B, B, G, B, G, a
Recognition Process Example: “G” using Trebuchet-MS Classifier • Final Results: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 • s: 1/15
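The tally above can be reproduced from the 15 per-variation results. The sketch below uses a hypothetical helper, `aggregate_votes`, that also reports ties, since this example ends with 'B' and 'G' level at 5 votes each.

```python
from collections import Counter


def aggregate_votes(identified):
    """Count votes; return (tally, leaders). More than one leader means a tie."""
    tally = Counter(identified)
    top = max(tally.values())
    leaders = sorted(ch for ch, n in tally.items() if n == top)
    return tally, leaders


# Identified characters for the 15 variations of the query "G", in slide order.
votes = list("sgGggBGGBBBGBGa")
tally, leaders = aggregate_votes(votes)
```

Here `leaders` holds both 'B' and 'G', which is the kind of case a tie-breaking step would need to resolve.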
“GEORGE” (Trebuchet-MS) • Votes: • E: 14/15 • t: 1/15
“GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 2/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing
“GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 1/15 • M: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing
“GEORGE” (Trebuchet-MS) • Votes: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 • s: 1/15
“GEORGE” (Trebuchet-MS) • Votes: • j: 12/15 • Y: 2/15 • X: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing or training
Note on the “Inversion Problem” • Easy to fix; common problem in OCR systems • Will likely detect and correct during the preprocessing stage as opposed to training • More training data: slower, less reliable • Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens
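The slides do not say how inversion would be detected. One possible reading of the "many lenses" idea is to run recognition on both polarities and keep the more decisive result, as in the sketch below. Everything here is an assumption: `recognize` is a hypothetical callable returning the list of per-variation votes, and votes for the fallback symbol 'j' are ignored when measuring decisiveness, which means a genuine 'j' would be penalized.

```python
from collections import Counter


def invert(binary_image):
    """Swap foreground and background in a binary image (list of rows of 0/1)."""
    return [[1 - px for px in row] for row in binary_image]


def top_vote_share(votes, default_symbol="j"):
    """Share of all votes won by the strongest symbol other than the fallback."""
    tally = Counter(v for v in votes if v != default_symbol)
    return max(tally.values(), default=0) / len(votes)


def recognize_either_polarity(binary_image, recognize, default_symbol="j"):
    """Run `recognize` on both polarities and keep the more decisive vote list."""
    results = [recognize(binary_image), recognize(invert(binary_image))]
    return max(results, key=lambda votes: top_vote_share(votes, default_symbol))


def fake_recognize(image):
    # Stand-in recognizer for illustration only: it "fails" on one polarity.
    return list("j" * 13 + "ii") if image[0][0] == 1 else list("E" * 14 + "t")


best_votes = recognize_either_polarity([[1, 0], [0, 0]], fake_recognize)
```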
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • B: 9/15 • j: 3/15 • H: 2/15 • F: 1/15
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • A: 9/15 • j: 5/15 • n: 1/15
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • K: 12/15 • j: 2/15 • H: 1/15
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • E: 5/15 • j: 3/15 • L: 3/15 • r: 2/15 • F: 2/15
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • p: 12/15 • j: 3/15 PR
“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • Y: 12/15 • j: 3/15
“UNIVERSITY”(Used: Times New Roman) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
“UNIVERSITY”(Used: Times New Roman) • Votes: • U: 8/15 • C: 3/15 • j: 2/15 • s: 1/15 • O: 1/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • N: 12/15 • j: 3/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • l(‘el’): 9/15 • I(‘eye’): 6/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • v: 9/15 • j: 3/15 • V: 3/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • F: 9/15 • L: 5/15 • l (‘el’): 1/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • G: 9/15 • j: 6/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • j: 12/15 • x: 2/15 • w: 1/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • j: 5/15 • C: 4/15 • O: 4/15 • x: 2/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • T: 9/15 • l: 3/15 • i: 1/15 • j: 1/15 • L: 1/15
“UNIVERSITY”(Used: Times New Roman) • Votes: • Y: 10/15 • j: 3/15 • i: 2/15
Evaluation • Biggest weaknesses in preprocessing stage • OCR sensitive to thresholding/color inversion • Occasionally color modeling chooses a bad channel to use for OCR – happens more often on low-resolution images • Works surprisingly well for low-resolution images • Font does not need to be exact, but proportions need to be roughly the same
The Big Picture • [Diagram; labels only: Image, STR, Detected Text, Latitude, Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Business Spatial Detection, Business Name Matching, Business Identification]
Old Approach • Form words from highest-voted characters • Compare to lexicon using Levenshtein distance • Use existing ranking system afterwards BOKFRY > BAKERY (L-DIST = 2) GFQRGF > GEORGE (L-DIST = 3)
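For reference, the Levenshtein comparison used in the old approach can be sketched as below. The lexicon is a made-up three-word list for illustration, and `closest_word` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def closest_word(guess, lexicon):
    """Lexicon entry with the smallest edit distance to the guessed word."""
    return min(lexicon, key=lambda word: levenshtein(guess, word))


lexicon = ["BAKERY", "GEORGE", "UNIVERSITY"]  # illustrative lexicon
d_bakery = levenshtein("BOKFRY", "BAKERY")    # 2, as on the slide
d_george = levenshtein("GFQRGF", "GEORGE")    # 3, as on the slide
match = closest_word("BOKFRY", lexicon)
```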
New Approach (Lexicon-assisted STR) • Minimize Levenshtein distance over the best combination of voted characters (one candidate per character position) • Use existing ranking system afterwards • Example candidates per position (columns):
B O K F P Y
G U H E R I
J A j L I l
>>> BAKERY (L-DIST = 0)
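A sketch of the lexicon-assisted matching: instead of enumerating every combination of voted characters, the edit-distance recurrence can treat a substitution as free whenever the lexicon character appears among the candidates for that position. This yields the minimum distance over all one-per-position combinations. The function names and the small lexicon are illustrative assumptions, not the author's code.

```python
def set_edit_distance(candidates, word):
    """Smallest Levenshtein distance between `word` and any string formed
    by picking one candidate character per position.

    candidates: list of strings; candidates[i] holds the voted characters
    for position i.
    """
    prev = list(range(len(word) + 1))
    for i, options in enumerate(candidates, start=1):
        curr = [i]
        for j, ch in enumerate(word, start=1):
            cost = 0 if ch in options else 1  # free if any candidate matches
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_lexicon_match(candidates, lexicon):
    """Return (word, distance) for the closest lexicon entry."""
    return min(((w, set_edit_distance(candidates, w)) for w in lexicon),
               key=lambda pair: pair[1])


# Candidate characters per position, read column by column from the slide.
candidates = ["BGJ", "OUA", "KHj", "FEL", "PRI", "YIl"]
lexicon = ["BAKERY", "GEORGE", "UNIVERSITY"]  # illustrative lexicon
word, dist = best_lexicon_match(candidates, lexicon)
```

With these candidates the match is BAKERY at distance 0, in line with the slide.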
The End Result • Bruegger's Bagels • Category: Bagels • Address: Market Sq, Pittsburgh, PA 15222 • Phone: (412) 281-2515 • Rating: Not Rated
Next Steps • Fix STR Preprocessing • Bug in Color Modeling code found online • Inversion determination • Multiple thresholds • Word matching: Generate templates of words/logos instead of letters • Text detector: fix character/word fragmentation by reading papers that address the issue
Next Steps • Test more images; fix problems as they arise • Ideas to consider: • Feed grid-patch probability vectors into an SVM instead of “smoothing” • Generate “disambiguation classifiers” to differentiate: • Between top contending votes. Remember how ‘G’ and ‘B’ got confused? Dynamically create a classifier to tell them apart • Between commonly confused letters, e.g. E/F, l/i/j, o/c • Don’t consider statistically insignificant confidences
Next Steps • Text Detection • Look into after more work has been done on STR • Need to address issues: • Intracharacter segmentation • Intercharacter segmentation • Word segmentation • Needed to make STR system automated like before