Business Identification: Spatial Detection Alexander Darino Week 5
Outline • Recap of Previous Work • Business Name Detection • Business Name Matching • Business Spatial Detection • Weaknesses to Current Approach • Alternatives to Current Approach • Acknowledgements
Outline [pipeline diagram; labels: Week 4: Business Name Matching; Week 5: Business Spatial Detection; Latitude, Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; OCR; Detected Text; Business Identification]
Previous Work Image Where Am I? Latitude, Longitude 65 George S Aiken Co Winghart's Burger & Whiskey Bar Market Square Bella Sera On the Square Chipotle NOLA Las Velas … Latitude, Longitude Geocoding Reverse Geocoding Nearby Businesses
Business Name Detection [pipeline diagram repeated from the Outline slide]
Business Name Detection
…
<line dy="95" dx="1573" y="420" x="11" value="1">
  <space dy="26" dx="9" y="379" x="11"/>
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
  <box dy="24" dx="5" y="401" x="57" value="."/>
  <box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
  <box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
  <box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
  <box dy="10" dx="12" y="444" x="71" value="."/>
</line>
…
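The OCR output above can be read back into per-character records before any name matching. The following is a minimal sketch, not the presenter's code. It assumes that `x`/`y` give a box's position, `dx`/`dy` its width and height, `weights` a comma-separated list of recognition confidences, and `achars` alternative characters. Those readings are guesses from the attribute names, and `SAMPLE` and `read_boxes` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample in the same shape as the OCR output above.
SAMPLE = """
<line dy="95" dx="1573" y="420" x="11" value="1">
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
</line>
"""

def read_boxes(xml_text):
    """Return one dict per <box> element of a single <line> element."""
    line = ET.fromstring(xml_text)
    boxes = []
    for elem in line:
        if elem.tag != "box":
            continue  # skip <space> elements
        # Assumption: the first weight belongs to the reported character.
        weights = [int(w) for w in elem.get("weights", "").split(",") if w]
        boxes.append({
            "char": elem.get("value"),
            "x": int(elem.get("x")),
            "y": int(elem.get("y")),
            "width": int(elem.get("dx")),
            "height": int(elem.get("dy")),
            "weight": weights[0] if weights else None,
            "alternatives": list(elem.get("achars", "")),
        })
    return boxes

boxes = read_boxes(SAMPLE)
# Keep only characters the OCR engine scored; boxes without weights are dropped.
scored = [b for b in boxes if b["weight"] is not None]
```

Dropping unscored boxes is one simple way to reduce noise before matching; whether the original system filtered this way is not stated in the slides.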
Business Name Matching [pipeline diagram repeated from the Outline slide]
Business Name Matching • Developed Confidence Attribution Algorithm • Confidence of OCR Token being Name Token • Example: Confidence of “ESTUANT” representing “RESTAURANT” • Point-based system • Confidence of Name appearing in Image • Sum of points of matching OCR Text • Use logarithmically-normalized points to determine business inclusion threshold
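The slides do not spell out how points are assigned, so the following is only a sketch of one way a point-based confidence could work. It uses longest common subsequence (LCS) as a stand-in for the unspecified token comparison: "ESTUANT" is a subsequence of "RESTAURANT", so it recovers 7 of 10 characters. The function names, the `min_conf` cutoff, and the use of `log1p` for the logarithmic normalization are all assumptions.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def token_confidence(ocr_token, name_token):
    """Fraction of name_token's characters found, in order, in ocr_token."""
    if not name_token:
        return 0.0
    return lcs_length(ocr_token.upper(), name_token.upper()) / len(name_token)

def name_points(ocr_tokens, business_name, min_conf=0.6):
    """Sum points over the name's tokens; each token takes its best OCR match.

    Assumption: a matched token is worth confidence * token length, and
    matches below min_conf are ignored.
    """
    total = 0.0
    for name_token in business_name.split():
        best = max((token_confidence(t, name_token) for t in ocr_tokens),
                   default=0.0)
        if best >= min_conf:
            total += best * len(name_token)
    return total

def log_normalized(points):
    """Assumed normalization: compress raw points with log(1 + points)."""
    return math.log1p(points)

conf = token_confidence("ESTUANT", "RESTAURANT")
points = name_points(["ESTUANT", "xx"], "Tambellini Restaurant")
```

A business would then be included when `log_normalized(points)` exceeds some threshold; the slides do not give that value.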
Business Name Matching Note: k is usually 2 or 3
Business Name Matching Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.
Business Spatial Identification [pipeline diagram repeated from the Outline slide]
Business Spatial Identification Aiken George S Co Category: Food, Grocery Address: 218 Forbes Ave Pittsburgh, PA 15222 Phone: (412) 391-6358 Rating: 4.5/5 (2 Reviews)
Business Spatial Identification Bruegger's Bagels Category: Bagels Address: Market Sq Pittsburgh, PA 15222 Phone: (412) 281-2515 Rating: Not Rated
Weaknesses to Current Approach [pipeline diagram repeated from the Outline slide]
Weaknesses to Current Approach Lots of Garbage
Weaknesses to Current Approach Fragmented Word Detection
Weaknesses to Current Approach Fails with non-orthogonal perspective. Did I already mention lots of garbage?
Weaknesses to Current Approach Fails with non-Roman text. Not scale-invariant.
Two different Alternative Approaches
Alternative #1: Image Matching [pipeline diagram; labels: Latitude, Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Image; Match to Storefront Image; Business Spatial Detection; Business Identification]
Alternative #1: Image Matching • Weaknesses • Storefront images aren’t always available for matching • Computationally Expensive • Hundreds of images to compare to • Nothing new • Boring!
Alternative #2: Template Matching [pipeline diagram; labels: Latitude, Longitude; Geocoding; Reverse Geocoding; Nearby Businesses; Render Templates of Business Names in Different Fonts; Template Images; Image; Image Matching (e.g. SIFT, Haar); Business Spatial Detection; Business Identification]
Alternative #2: Template Matching • Tambellini (the same name rendered eight times, each in a different font)
Alternative #2: Template Matching
OCR: • Not Scale Invariant • Unbounded Search • Fragmented Recognition • Roman-only font
Alternative #2: • Scale Invariant • Bounded Search • Whole-word recognition • All fonts
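The bounded-search idea can be illustrated with a toy sketch: only templates rendered for nearby businesses are tried against the image. This is not the proposed SIFT or Haar matching. It swaps in plain pixel agreement on tiny binary grids so that it runs without an imaging library, and unlike the proposal it is not scale invariant. The bitmaps, business names, and function names are all hypothetical.

```python
def match_score(image, template, top, left):
    """Fraction of template pixels equal to the image window at (top, left)."""
    rows, cols = len(template), len(template[0])
    agree = sum(
        image[top + r][left + c] == template[r][c]
        for r in range(rows)
        for c in range(cols)
    )
    return agree / (rows * cols)

def best_match(image, template):
    """Slide the template over the image; return (best score, (top, left))."""
    rows, cols = len(template), len(template[0])
    best = (0.0, None)
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            score = match_score(image, template, top, left)
            if score > best[0]:
                best = (score, (top, left))
    return best

def identify_business(image, templates_by_name):
    """Bounded search: score only the templates of nearby businesses."""
    scores = {
        name: max(best_match(image, t)[0] for t in templates)
        for name, templates in templates_by_name.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical stand-ins for names rendered in different fonts.
TEMPLATE_A = [[1, 0, 1],
              [0, 1, 0]]
TEMPLATE_B = [[1, 1, 1],
              [1, 0, 1]]
TEMPLATES = {"Business A": [TEMPLATE_A], "Business B": [TEMPLATE_B]}

IMAGE = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 0, 0, 0, 0]]

name, scores = identify_business(IMAGE, TEMPLATES)
```

Because the candidate list comes from the nearby-business lookup, the number of templates grows with the number of nearby businesses times the number of fonts, not with every possible word.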
Acknowledgements • Subh • Provided several ideas regarding template matching using SIFT, Haar features, etc.