OCRdroid is a framework that combines OCR technology with mobile phones to efficiently recognize text in camera-captured images. The system addresses challenges like lighting conditions, text skew, perception distortion, and text misalignment to provide real-time processing solutions. Experimental results demonstrate high accuracy even in varying conditions.
OCRdroid: A Framework to Digitize Text Using Mobile Phones • Authors • Mi Zhang, Anand Joshi, Ritesh Kadmawala, Karthik Dantu, Sameera Poduri, and Gaurav Sukhatme • University of Southern California • Presenter • Mi Zhang
Outline • What is OCRdroid ? • Related Work • Design Considerations • System Architecture • Experimental Results • Summary
What is OCRdroid? • Why? • Huge demand for recognizing text in camera-captured pictures • Mobile phones are ubiquitous and powerful • What? • OCRdroid = OCR + Mobile Phone • Two Applications • PocketPal: Personal Receipt Management Tool • PocketReader: Personal Mobile Screen Reader
Related Work • Design and implementation of a Card Reader based on build-in camera. X.P. Luo, J. Li, and L.X. Zhen • Automatic detection and recognition of signs from natural scenes. X. Chen, J. Yang, and A. Waibel • A morphological image preprocessing suite for OCR on natural scene images. M. Elmore and M. Martonosi
Design Considerations • Real-Time Processing • Lighting Conditions • Text Skew • Perception Distortion (Tilt) • Text Misalignment • Blur (Out-of-Focus)
Real-Time Processing • Issues : • Limited memory • Relatively low processing power • Quick response required • Our Solutions : • Multi-Thread System Architecture • Image Compression • Computationally Efficient Algorithms
Lighting Conditions • Issues : • Uneven Lighting (Shadows, Reflection, Flooding, etc.)
Lighting Conditions • Our Solution : • Local Binarization : Fast Sauvola’s Algorithm
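The slides name Sauvola's algorithm but do not show how it is made fast. One well-known way is to compute the local mean and standard deviation from integral images, so the cost per pixel does not depend on the window size. The sketch below follows that approach under stated assumptions: the window size, `k = 0.5`, and `R = 128` are commonly used defaults for 8-bit images, not values taken from OCRdroid, and the function name is hypothetical.

```python
import numpy as np

def sauvola_binarize(gray, window=15, k=0.5, R=128.0):
    """Return a boolean mask that is True where a pixel is classified as text (dark).

    Sauvola threshold per pixel: T = m * (1 + k * (s / R - 1)),
    where m and s are the mean and standard deviation of the local window.
    window, k and R are assumed defaults, not values from the slides.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    r = window // 2
    n = 2 * r + 1  # effective (odd) window side
    # Replicate edge pixels so every pixel has a full n-by-n window.
    padded = np.pad(gray, r, mode="edge")

    def integral(a):
        # Summed-area table with a leading row and column of zeros.
        return np.pad(a, ((1, 0), (1, 0)), mode="constant").cumsum(axis=0).cumsum(axis=1)

    def window_sums(table):
        # Sum of each n-by-n window from four table lookups.
        return (table[n:n + h, n:n + w] - table[:h, n:n + w]
                - table[n:n + h, :w] + table[:h, :w])

    area = float(n * n)
    mean = window_sums(integral(padded)) / area
    mean_sq = window_sums(integral(padded ** 2)) / area
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    threshold = mean * (1.0 + k * (std / R - 1.0))
    return gray <= threshold
```

Because the threshold follows the local mean, a dark stroke can be separated from the page even when a shadow makes one side of the image much darker than the other, which is the case a single global threshold handles poorly.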
Text Skew • Issues : • When perspective is not fixed, text lines may get skewed from their original orientation
Text Skew • Our Solution : • Branch-and-Bound text line finding algorithm + Auto-rotation
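The branch-and-bound text line finder named above is not described in enough detail on the slides to reproduce. To illustrate the general detect-then-rotate idea, the sketch below swaps in a simpler and slower technique, a brute-force projection-profile search: for each candidate angle, text pixels are projected onto the axis perpendicular to the assumed line direction, and the angle that produces the sharpest row histogram wins. The function name, the angle step, and the sign convention (positive means lines slope downward to the right in image coordinates) are assumptions; the 30-degree search range mirrors the skew tolerance reported in the results.

```python
import numpy as np

def estimate_skew_degrees(text_mask, max_angle=30.0, step=0.5):
    """Estimate text skew with a projection-profile search.

    This is a stand-in for the branch-and-bound method, not that method itself.
    text_mask is a 2-D boolean array, True on text pixels.
    """
    ys, xs = np.nonzero(np.asarray(text_mask, dtype=bool))
    if ys.size == 0:
        return 0.0  # nothing to measure
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        t = np.deg2rad(angle)
        # Coordinate of each text pixel perpendicular to lines at this angle.
        rows = ys * np.cos(t) - xs * np.sin(t)
        bins = np.arange(np.floor(rows.min()), np.ceil(rows.max()) + 2)
        profile, _ = np.histogram(rows, bins=bins)
        # Well-aligned lines give a few tall peaks, hence a high variance.
        score = float(np.var(profile))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

Auto-rotation would then rotate the image by the opposite of the returned angle before passing it to the OCR engine.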
Perception Distortion (Tilt) • Issues : • Occurs when the text plane is not parallel to the imaging plane • Mobile phones are susceptible to tilts • Even a small perception distortion causes OCR to fail
Perception Distortion (Tilt) • Our Solution : • Use Embedded Orientation Sensor (Pitch and Roll) • Calibration
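A minimal sketch of how pitch and roll readings could gate image capture, assuming the calibration step records a reference pose and that a capture is accepted only when both angles stay within a tolerance of it. The `Orientation` type, the function name, and the way the tolerance is applied are hypothetical; the 10-degree default mirrors the perception distortion tolerance reported in the results.

```python
from dataclasses import dataclass

@dataclass
class Orientation:
    pitch: float  # degrees, as reported by the orientation sensor
    roll: float   # degrees, as reported by the orientation sensor

def tilt_ok(reading, calibration, max_tilt_deg=10.0):
    """True when the phone is within max_tilt_deg of the calibrated pose on both axes.

    The per-axis comparison is an assumption made for illustration.
    """
    return (abs(reading.pitch - calibration.pitch) <= max_tilt_deg
            and abs(reading.roll - calibration.roll) <= max_tilt_deg)
```

For example, with a calibrated pose of `Orientation(5.0, 0.0)`, a reading of `Orientation(14.0, 2.0)` is accepted, while a reading of `Orientation(20.0, 0.0)` is rejected.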
Text Misalignment • Issues : • Camera screen covers a partial text region • Irregular shapes of text characters
Text Misalignment • Our Solution : • Step#1 : Modified version of Sauvola’s algorithm • [Figure: image frame with Top, Left, Right, and Bottom Border regions]
Text Misalignment • Our Solution : • Step#1(Cont) : Routes to perform Sauvola’s algorithm
Text Misalignment • Our Solution : • Step#2 : Noise Reduction • [Figure: Top, Left, Right, and Bottom Border regions, with a span labeled W]
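The two steps can be sketched as follows, with several assumptions: the input is already a binarized text mask (the slides run a modified Sauvola pass on the border regions only, which is not reproduced here), W is treated as the border width, and noise reduction is approximated by discarding foreground pixels with too few foreground neighbours. All names and thresholds are hypothetical.

```python
import numpy as np

def denoise(mask, min_neighbors=2):
    """Drop foreground pixels with fewer than min_neighbors foreground 8-neighbours."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    m = np.pad(mask.astype(np.int32), 1, mode="constant")
    # Add up the eight shifted copies of the mask to count neighbours per pixel.
    neighbors = sum(m[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    return mask & (neighbors >= min_neighbors)

def misaligned_borders(text_mask, border_width=8, min_pixels=5):
    """Return the names of the borders that still contain text after denoising.

    An empty list means no text crosses a border, so the frame is taken as aligned.
    border_width and min_pixels are assumed values.
    """
    mask = np.asarray(text_mask, dtype=bool)
    W = border_width
    strips = {
        "top": mask[:W, :],
        "bottom": mask[-W:, :],
        "left": mask[:, :W],
        "right": mask[:, -W:],
    }
    return [name for name, strip in strips.items()
            if denoise(strip).sum() >= min_pixels]
```

Restricting the work to four thin strips keeps the check cheap, which fits the real-time constraint: only a small fraction of the frame is examined before deciding whether to prompt the user to re-aim the camera.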
Blur (Out Of Focus) • Issues : • OCR needs sharp edge response
Blur (Out Of Focus) • Our Solution : • Android autofocus mechanism
System Architecture • Components: Android Phone, Internet, Web Server, OCR Engine – Tesseract • 1. Photo of a receipt • 2. Front end processing • 3. Upload image • 4. Perform Backend Processing & OCR • 5. Return OCR Results • 6. Results returned • 7. Information Extraction
Front-End Architecture • Orientation Handler • Camera Preview • Capture Image • Upload • Alignment Checker • Internet • Proper Alignment Detected • Improper Alignment Detected • OCR Data Receiver • Information Extraction • Mobile Database • Internet
Back-End Architecture • Store Image • Skew Detection & Auto-rotation • Binarization • Internet • OCR Text Output • Internet • Sends Results back to Mobile Device • Tesseract OCR Engine
Experimental Results • Test Corpus • Ten distinct black & white images • Three distinct lighting conditions • Normal: Adequate light • Poor: Dim • Flooding: Light source focused on a particular portion of the image • Performance Metrics • Character Accuracy • Word Accuracy • Timing
Experimental Results • Binarization: (Measured by Character Accuracy) • Normal: Around 97% • Poor: Around 60% • Flooding: Around 60% • Skew tolerance: Up to 30 degrees • Perception Distortion: Up to 10 degrees
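The slides do not define how character and word accuracy are computed. A common definition, assumed here, is one minus the edit distance between the OCR output and the ground truth, divided by the ground-truth length, with the distance counted over characters or over whitespace-separated words. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def accuracy(ref, hyp):
    """1 - distance / len(ref), clamped at 0. This definition is assumed, not from the slides."""
    if len(ref) == 0:
        return 1.0 if len(hyp) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))

def character_accuracy(ref_text, ocr_text):
    return accuracy(ref_text, ocr_text)

def word_accuracy(ref_text, ocr_text):
    return accuracy(ref_text.split(), ocr_text.split())
```

Under this definition, a single misread character lowers word accuracy much more than character accuracy, since the whole word counts as wrong.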
Experimental Results • Misalignment Detection: [Figure: misalignment detection results] • Timing Performance: • Misalignment Detection: Less Than 6 seconds • Overall Process: Less Than 11 seconds
More Information • Project Website: http://www-scf.usc.edu/~ananddjo/ocrdroid/index.php • Test Cases & Results • Demo Video • Paper • Presentation Slides • Tools Information (Mobile Phone + Software)
Summary • OCRdroid: A Generic Framework for Developing OCR-based Applications on Mobile Phones • Six Design Considerations & Our Solutions • In particular, we propose a new real-time, computationally efficient algorithm for text misalignment detection • Experimental Results