OCRdroid : A Framework to Digitize Text Using Mobile Phones

OCRdroid is a framework that combines OCR technology with mobile phones to efficiently recognize text in camera-captured images. The system addresses challenges like lighting conditions, text skew, perception distortion, and text misalignment to provide real-time processing solutions. Experimental results demonstrate high accuracy even in varying conditions.



Presentation Transcript


  1. OCRdroid: A Framework to Digitize Text Using Mobile Phones • Authors • Mi Zhang, Anand Joshi, Ritesh Kadmawala, Karthik Dantu, Sameera Poduri, and Gaurav Sukhatme • University of Southern California • Presenter • Mi Zhang

  2. Outline • What is OCRdroid ? • Related Work • Design Considerations • System Architecture • Experimental Results • Summary

  3. What is OCRdroid? • Why? • Huge demand for recognizing text in camera-captured pictures • Mobile phones are ubiquitous and powerful • What? • OCRdroid = OCR + Mobile Phone • Two Applications • PocketPal: Personal Receipt Management Tool • PocketReader: Personal Mobile Screen Reader

  4. Related Work • Design and implementation of a Card Reader based on build-in camera. X.P. Luo, J. Li, and L.X. Zhen • Automatic detection and recognition of signs from natural scenes. X. Chen, J. Yang, and A. Waibel • A morphological image preprocessing suite for OCR on natural scene images. M. Elmore and M. Martonosi

  5. Design Considerations • Real-Time Processing • Lighting Conditions • Text Skew • Perception Distortion (Tilt) • Text Misalignment • Blur (Out-of-Focus)

  6. Real-Time Processing • Issues : • Limited memory • Relatively low processing power • Quick response required • Our Solutions : • Multi-Thread System Architecture • Image Compression • Computationally Efficient Algorithms

  7. Lighting Conditions • Issues : • Uneven Lighting (Shadows, Reflection, Flooding, etc.)

  8. Lighting Conditions • Our Solution : • Local Binarization : Fast Sauvola’s Algorithm
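A minimal sketch of local binarization with Sauvola's threshold, T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of a window around each pixel. The slides do not say how the "fast" variant is implemented; the integral-image (summed-area table) speed-up used here is a common choice and is an assumption, as are the parameter values `window=15`, `k=0.2`, `r=128` and all function names.

```python
import math

def integral_image(img, power=1):
    """Summed-area table of img**power, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def window_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]

def sauvola_binarize(img, window=15, k=0.2, r=128.0):
    """Return a 0/1 image where 1 marks pixels darker than the local threshold.

    img is a list of rows of grayscale values in 0..255.
    window, k and r are illustrative defaults, not values from the slides.
    """
    h, w = len(img), len(img[0])
    half = window // 2
    sums = integral_image(img, 1)
    squares = integral_image(img, 2)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = window_sum(sums, y0, x0, y1, x1) / n
            var = max(window_sum(squares, y0, x0, y1, x1) / n - mean * mean, 0.0)
            threshold = mean * (1.0 + k * (math.sqrt(var) / r - 1.0))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out
```

Because the threshold follows the local mean, a dark stroke on a brightly lit part of the page and a dark stroke inside a shadow can both be separated from their backgrounds, even when no single global threshold would work. Sharp shadow edges can still produce false foreground pixels.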

  9. Text Skew • Issues : • When perspective is not fixed, text lines may get skewed from their original orientation

  10. Text Skew • Our Solution : • Branch-and-Bound text line finding algorithm + Auto-rotation
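The branch-and-bound text line finder named on the slide is not reproduced here. As a simpler stand-in that illustrates the same goal (find the skew angle, then rotate by its negative), the sketch below uses a brute-force projection-profile search: for each candidate angle, foreground pixels are rotated back and counted per row, and the angle giving the most sharply peaked row histogram wins. The function names, the 0.5 degree step, and the 30 degree search range (borrowed from the skew tolerance reported later in the deck) are assumptions.

```python
import math

def profile_score(points, angle_deg, bin_height=1.0):
    """Sum of squared row counts after undoing a rotation of angle_deg.

    points is an iterable of (x, y) coordinates of foreground pixels.
    The score is highest when each text line collapses into few rows.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    counts = {}
    for x, y in points:
        y_rot = -x * sin_t + y * cos_t  # y-coordinate after rotating by -angle
        row = math.floor(y_rot / bin_height)
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(points, max_angle=30.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest row profile."""
    points = list(points)
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        score = profile_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Auto-rotation would then rotate the image by the negative of the returned angle before passing it to the OCR engine. The exhaustive search costs one pass over the foreground pixels per candidate angle, which is the kind of cost a branch-and-bound search is meant to avoid.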

  11. Perception Distortion (Tilt) • Issues : • Occurs when the text plane is not parallel to the imaging plane • Mobile phones are susceptible to tilts • Even a small perception distortion can cause OCR to fail

  12. Perception Distortion (Tilt) • Our Solution : • Use Embedded Orientation Sensor (Pitch and Roll) • Calibration
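A minimal sketch of how pitch and roll readings plus a calibration step could gate image capture. It assumes the sensor reports both angles in degrees, that calibration means averaging a few readings taken while the phone is held parallel to the text, and that the tolerance is 10 degrees (the perception distortion limit reported in the experimental results). The `Orientation` type and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Orientation:
    pitch: float  # degrees
    roll: float   # degrees

def calibrate(samples):
    """Average readings taken while the phone is held parallel to the text."""
    n = len(samples)
    return Orientation(
        pitch=sum(s.pitch for s in samples) / n,
        roll=sum(s.roll for s in samples) / n,
    )

def tilt_exceeded(reading, reference, tolerance_deg=10.0):
    """True if pitch or roll deviates from the calibrated reference too much."""
    return (
        abs(reading.pitch - reference.pitch) > tolerance_deg
        or abs(reading.roll - reference.roll) > tolerance_deg
    )
```

In a capture loop, the camera button would stay disabled while `tilt_exceeded` returns True, so the user is steered into an acceptable pose before a picture is taken.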

  13. Text Misalignment • Issues : • Camera screen covers a partial text region • Irregular shapes of text characters

  14. Text Misalignment • Our Solution : • Step#1 : Modified version of Sauvola’s algorithm [Figure: image with Top, Left, Right, and Bottom Border regions marked]

  15. Text Misalignment • Our Solution : • Step#1(Cont) : Routes to perform Sauvola’s algorithm

  16. Text Misalignment • Our Solution : • Step#2 : Noise Reduction [Figure: Top, Left, Right, and Bottom Border regions, with a dimension labelled W]
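The two steps above can be sketched roughly as follows, under strong assumptions: the input is already binarized (1 = text), each border is a strip `border_width` pixels wide, noise reduction is approximated by ignoring foreground pixels with no foreground neighbour, and a border is flagged when at least `min_pixels` text pixels remain in its strip. None of these details, names, or default values come from the slides.

```python
def has_neighbor(binary, y, x):
    """True if any of the 8 neighbours of (y, x) is a foreground pixel."""
    h, w = len(binary), len(binary[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and binary[ny][nx]:
                return True
    return False

def misaligned_borders(binary, border_width=3, min_pixels=4):
    """Return the names of borders whose strip still contains text pixels.

    binary is a list of rows of 0/1 values, with 1 marking text.
    Isolated pixels are treated as noise and not counted.
    """
    h, w = len(binary), len(binary[0])
    strips = {
        "top": [(y, x) for y in range(border_width) for x in range(w)],
        "bottom": [(y, x) for y in range(h - border_width, h) for x in range(w)],
        "left": [(y, x) for y in range(h) for x in range(border_width)],
        "right": [(y, x) for y in range(h) for x in range(w - border_width, w)],
    }
    flagged = []
    for name, cells in strips.items():
        count = sum(
            1 for y, x in cells if binary[y][x] and has_neighbor(binary, y, x)
        )
        if count >= min_pixels:
            flagged.append(name)
    return flagged
```

An empty result means no text appears to be cut off, so the frame can be captured. A non-empty result tells the user which way to move the camera. Only the four strips are scanned, which keeps the check cheap enough to run on preview frames.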

  17. Blur (Out Of Focus) • Issues : • OCR needs sharp edge response

  18. Blur (Out Of Focus) • Our Solution : • Android autofocus mechanism

  19. System Architecture • Android Phone : • 1. Photo of a receipt • 2. Front end processing • 3. Upload image (Internet) • Web Server (OCR Engine: Tesseract) : • 4. Perform Backend Processing & OCR • 5. Return OCR Results • Android Phone : • 6. Results returned • 7. Information Extraction

  20. Front-End Architecture • Orientation Handler • Camera Preview • Capture Image • Upload (Internet) • Alignment Checker : Proper Alignment Detected / Improper Alignment Detected • OCR Data Receiver (Internet) • Information Extraction • Mobile Database

  21. Back-End Architecture • Internet • Store Image • Skew Detection & Auto-rotation • Binarization • Tesseract OCR Engine • OCR Text Output • Sends Results back to Mobile Device (Internet)

  22. Experimental Results • Test Corpus • Ten distinct black & white images • Three distinct lighting conditions • Normal: Adequate light • Poor: Dim • Flooding: Light source focused on a particular portion of the image • Performance Metrics • Character Accuracy • Word Accuracy • Timing
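The slides do not define the two accuracy metrics. A common definition, assumed here, is (n − errors) / n, where n is the length of the ground-truth text and errors is the edit (Levenshtein) distance between ground truth and OCR output, counted over characters or over whitespace-separated words. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]

def accuracy(ref, hyp):
    """(n - errors) / n, clamped at 0, where n is the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return max(0.0, (len(ref) - edit_distance(ref, hyp)) / len(ref))

def character_accuracy(truth, ocr):
    return accuracy(truth, ocr)

def word_accuracy(truth, ocr):
    return accuracy(truth.split(), ocr.split())
```

For example, with ground truth `"TOTAL 12.50"` and OCR output `"T0TAL 12.50"`, one of eleven characters is wrong, so character accuracy is 10/11, while word accuracy is 1/2 because one of the two words contains an error.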

  23. Experimental Results • Binarization: (Measured by Character Accuracy) • Normal: Around 97% • Poor: Around 60% • Flooding: Around 60% • Skew tolerance: Up to 30 degrees • Perception Distortion: Up to 10 degrees

  24. Experimental Results • Misalignment Detection: [figure not preserved in transcript] • Timing Performance: • Misalignment Detection: Less than 6 seconds • Overall Process: Less than 11 seconds

  25. More Information • Project Website @: http://www-scf.usc.edu/~ananddjo/ocrdroid/index.php • Test Cases & Results • Demo Video • Paper • Presentation Slide • Tools Information (Mobile Phone + Software)

  26. Summary • OCRdroid: A Generic Framework for Developing OCR-based Applications on Mobile Phones • Six Design Considerations & Our Solutions • In particular, we propose a new real-time, computationally efficient algorithm for text misalignment detection • Experimental Results

  27. Questions ?

  28. Thank You
