OCRdroid: A Framework to Digitize Text Using Mobile Phones • Authors • Mi Zhang, Anand Joshi, Ritesh Kadmawala, Karthik Dantu, Sameera Poduri, and Gaurav Sukhatme • University of Southern California • Presenter • Mi Zhang
Outline • What is OCRdroid ? • Related Work • Design Considerations • System Architecture • Experimental Results • Summary
What is OCRdroid ? • Why? • Huge demand for recognizing text in camera-captured pictures • Mobile phones are ubiquitous and powerful • What? • OCRdroid = OCR + Mobile Phone • Two Applications • PocketPal: Personal Receipt Management Tool • PocketReader: Personal Mobile Screen Reader
Related Work • Design and implementation of a Card Reader based on build-in camera. X.P. Luo, J. Li, and L.X. Zhen • Automatic detection and recognition of signs from natural scenes. X. Chen, J. Yang, and A. Waibel • A morphological image preprocessing suite for OCR on natural scene images. M. Elmore and M. Martonosi
Design Considerations • Real-Time Processing • Lighting Conditions • Text Skew • Perception Distortion (Tilt) • Text Misalignment • Blur (Out-of-Focus)
Real-Time Processing • Issues : • Limited memory • Relatively low processing power • Quick response required • Our Solutions : • Multi-Threaded System Architecture • Image Compression • Computationally Efficient Algorithms
Lighting Conditions • Issues : • Uneven Lighting (Shadows, Reflection, Flooding, etc.)
Lighting Conditions • Our Solution : • Local Binarization : Fast Sauvola’s Algorithm
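Local binarization can be illustrated with a minimal sketch of Sauvola thresholding. The slides do not say how the "fast" variant is implemented; the sketch below assumes the commonly used integral-image speed-up, and the function names, the window size, and the parameters k = 0.5 and R = 128 are illustrative assumptions rather than OCRdroid's actual settings. Each pixel is compared against the threshold T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of its local window.

```python
import math

def integral_images(img):
    """Summed-area tables of pixel values and squared values (with a zero border)."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    sq = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = row_sq = 0
        for x in range(w):
            v = img[y][x]
            row_sum += v
            row_sq += v * v
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
            sq[y + 1][x + 1] = sq[y][x + 1] + row_sq
    return s, sq

def sauvola_binarize(img, window=15, k=0.5, r=128.0):
    """Return a 0/1 image where 1 marks dark (text) pixels.

    img is a list of rows of grayscale values in 0..255.
    window, k and r are assumed defaults, not values taken from the slides.
    """
    h, w = len(img), len(img[0])
    s, sq = integral_images(img)
    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            # Window sums in O(1) from the summed-area tables.
            total = s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]
            total_sq = sq[y1][x1] - sq[y0][x1] - sq[y1][x0] + sq[y0][x0]
            mean = total / n
            var = max(0.0, total_sq / n - mean * mean)
            threshold = mean * (1.0 + k * (math.sqrt(var) / r - 1.0))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out
```

Because the threshold follows the local mean and contrast, a shadowed region and a brightly lit region each get their own cut-off, which a single global threshold cannot provide.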
Text Skew • Issues : • When perspective is not fixed, text lines may get skewed from their original orientation
Text Skew • Our Solution : • Branch-and-Bound text line finding algorithm + Auto-rotation
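The slides do not give the details of the branch-and-bound text line finder, so the sketch below swaps in a much simpler and slower technique, a brute-force projection-profile search, purely to illustrate what skew detection computes. The function names, the 1-degree step, and the choice of foreground pixel coordinates as input are assumptions; the ±30 degree search range mirrors the skew tolerance reported in the results.

```python
import math
from collections import Counter

def _rotated_rows(points, angle_deg):
    """Row index of each (x, y) point after rotating the page by -angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    return [round(-x * sin_a + y * cos_a) for x, y in points]

def estimate_skew(points, max_angle=30, step=1):
    """Return the candidate angle (degrees) that best aligns text pixels into rows.

    points are (x, y) coordinates of foreground pixels, y growing downward.
    A well-aligned page concentrates pixels in few rows, so the sum of
    squared row counts peaks at the true skew angle.
    """
    best_angle, best_score = 0, -1
    for angle in range(-max_angle, max_angle + 1, step):
        counts = Counter(_rotated_rows(points, angle))
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Auto-rotation then amounts to rotating the image by the negative of the estimated angle before it is handed to the OCR engine.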
Perception Distortion (Tilt) • Issues : • Occurs when the text plane is not parallel to the imaging plane • Mobile phones are susceptible to tilts • Even a small perception distortion can cause OCR to fail
Perception Distortion (Tilt) • Our Solution : • Use Embedded Orientation Sensor (Pitch and Roll) • Calibration
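A tilt check based on pitch and roll readings might look like the sketch below. The function name and parameters are hypothetical, and the interpretation of "Calibration" as recording a reference pitch and roll for the resting pose is an assumption; the 10 degree default only echoes the tolerance reported in the experimental results.

```python
def tilt_ok(pitch, roll, ref_pitch=0.0, ref_roll=0.0, tolerance_deg=10.0):
    """Return True if the phone is close enough to the calibrated pose.

    pitch and roll are orientation-sensor readings in degrees.
    ref_pitch and ref_roll are assumed calibration offsets, recorded while
    the phone is held parallel to the text plane.
    """
    return (abs(pitch - ref_pitch) <= tolerance_deg
            and abs(roll - ref_roll) <= tolerance_deg)
```

An application could call such a check on every sensor update and allow image capture only while it returns True.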
Text Misalignment • Issues : • Camera screen covers a partial text region • Irregular shapes of text characters
Text Misalignment • Our Solution : • Step#1 : Modified version of Sauvola’s algorithm • [Figure: image with Top, Left, Right, and Bottom Borders marked]
Text Misalignment • Our Solution : • Step#1(Cont) : Routes to perform Sauvola’s algorithm
Text Misalignment • Our Solution : • Step#2 : Noise Reduction • [Figure: window of width W along the Top, Left, Right, and Bottom Borders]
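The two steps above can be sketched roughly as follows, assuming the image has already been binarized (1 = text pixel) and that the W in the figure denotes the width of a strip along each border. Both points are assumptions, as are the function name and the pixel-count threshold used here as a crude stand-in for the noise reduction step. The slides suggest the real system binarizes only along routes near the borders to save time; this sketch skips that optimization.

```python
def touched_borders(binary, strip=2, min_pixels=3):
    """Return the names of borders whose strip contains text.

    binary is a list of rows of 0/1 values (1 = text).
    strip is the assumed border-strip width (the W of the figure).
    min_pixels is an assumed noise threshold: strips with fewer text
    pixels are treated as noise and not reported.
    """
    h, w = len(binary), len(binary[0])

    def count(rows, cols):
        return sum(binary[y][x] for y in rows for x in cols)

    strips = {
        "top": count(range(0, strip), range(w)),
        "bottom": count(range(h - strip, h), range(w)),
        "left": count(range(h), range(0, strip)),
        "right": count(range(h), range(w - strip, w)),
    }
    return [name for name, n in strips.items() if n >= min_pixels]
```

An empty result means no text appears to be cut off by the frame; a non-empty one tells the user which way to move the camera.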
Blur (Out Of Focus) • Issues : • OCR needs sharp edge response
Blur (Out Of Focus) • Our Solution : • Android autofocus mechanism
System Architecture • Android Phone : • 1. Photo of a receipt • 2. Front-end processing • 3. Upload image (over the Internet) • Web Server (OCR Engine: Tesseract) : • 4. Perform back-end processing & OCR • 5. Return OCR results (over the Internet) • Android Phone : • 6. Results returned • 7. Information extraction
Front-End Architecture • Orientation Handler • Camera Preview • Capture Image • Alignment Checker (Proper Alignment Detected / Improper Alignment Detected) • Upload (to Internet) • OCR Data Receiver (from Internet) • Information Extraction • Mobile Database
Back-End Architecture • Store Image (received over the Internet) • Skew Detection & Auto-rotation • Binarization • Tesseract OCR Engine • OCR Text Output • Sends results back to mobile device (over the Internet)
Experimental Results • Test Corpus • Ten distinct black & white images • Three distinct lighting conditions • Normal: Adequate light • Poor: Dim light • Flooding: Light source focused on a particular portion of the image • Performance Metrics • Character Accuracy • Word Accuracy • Timing
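The slides do not define the two accuracy metrics. A common definition, assumed here, is one minus the edit distance between the OCR output and the ground truth, divided by the length of the ground truth, computed over characters or over whitespace-separated words. All function names below are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def accuracy(ref, hyp):
    """Assumed metric: 1 - errors / reference length, clamped at 0."""
    if not ref:
        raise ValueError("reference must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))

def character_accuracy(truth, ocr):
    return accuracy(truth, ocr)

def word_accuracy(truth, ocr):
    return accuracy(truth.split(), ocr.split())
```

For example, reading "TOTAL 12.50" as "T0TAL 12.50" costs one character error out of eleven characters, but one word error out of two words, which is why word accuracy is usually the harsher of the two figures.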
Experimental Results • Binarization: (Measured by Character Accuracy) • Normal: Around 97% • Poor: Around 60% • Flooding: Around 60% • Skew tolerance: Up to 30 degrees • Perception Distortion: Up to 10 degrees
Experimental Results • Misalignment Detection: [figure not recovered] • Timing Performance: • Misalignment Detection: Less than 6 seconds • Overall Process: Less than 11 seconds
More Information • Project Website @: http://www-scf.usc.edu/~ananddjo/ocrdroid/index.php • Test Cases & Results • Demo Video • Paper • Presentation Slide • Tools Information (Mobile Phone + Software)
Summary • OCRdroid: A Generic Framework for Developing OCR-based Applications on Mobile Phones • Six Design Considerations & Our Solutions • In particular, we propose a new real-time, computationally efficient algorithm for text misalignment detection • Experimental Results