Camera Based Document Image Analysis David Doermann University of Maryland, College Park
What defines the problem? • Traditional Document Analysis • Deals primarily with paper representations • Acquired with flatbed or sheetfed scanners • Camera Based Analysis • Clearly defined by the acquisition device, its properties, the impacts of its use, etc. … but … The devices open up a wide range of new and interesting applications (and problems) and extend what we may consider document analysis….
Scanner Acquisition • Advantages • Reasonable quality – • Controlled lighting, high resolution, fixed imaging plane • Rapid Acquisition • Relatively cheap • Disadvantages • Specialized Device • Fixed – Documents must come to device • Requires handling of documents or documents in a sheet form for feeders
“Book” Camera/Scanner Acquisition • One step removed from traditional scanners • Controlled environment – lighting, image plane, orientation • Changes the nature of the content • Easier to image atypical documents • Rare, historic, fragile… • Often very expensive • Can be relatively slow • sheet scanners are hundreds of pages per minute • …although robotic cameras can image 10s of pages a minute….
Industrial Cameras • Removes the constraints on configuration • Often still in a controlled environment • Custom (and expensive) solutions are common • Processing power bounded only by cost • Usage • Postal applications, document inspection (newspapers, etc), industrial applications
(Portable) Digital Cameras • Provide much greater flexibility than scanners • Multiple uses • Devices go to documents • Potentially removes the bottleneck of acquisition for simple tasks … fewer constraints (lighting, image plane, focus, …) increase the complexity of the resulting image and image processing … yet allow a wider variation of applications. A significant tradeoff
Roadmap… • Discussion of some key related research and applications of non-scanner DIA • Influences of mobile devices on applications • Issues with processing “traditional” documents vs processing text • Future of camera based capture…. • Open issues • New opportunities
What has been done? • Applications primarily centered on “Image Text” • Text in Video Graphics • Text from WWW pages • Text in Scenes • Some work on key challenges • Imaging of text in controlled environments such as parking lots, meeting rooms, assembly lines, etc • Limited work on actually processing traditional documents….
Video Text Recognition • Indexing content from graphic or scene text in videos used to supplement speech, closed captions,… • Countless papers published • Challenges are well known • Low Resolution, Complex background, Different font style and size, Lighting, Camera motion, Text/Object motion, Occlusion/distortion, … all magnified for scene text • Benefits of multiple frames, repeated content
WWW Text Image Analysis • Applications • Identifying graphic text for indexing and retrieval • Identification of SPAM email in attachments • Uncovering hidden information…. • Issues • Text style variations (font, font style, orientation). • Text Quality (Color, Size, Anti-aliasing) • Image resolution
Visual Input • Applications • General input for computer systems • Passive verification of signatures from cameras mounted over the writing surface • Has general implications for mobile devices that don’t have “traditional” keyboard input • Challenges: • Pen tip tracking • Identifying the temporal relations • Online recognition
Whiteboard Reading • Reading handwritten and printed material for meeting scenarios • Challenges: • Must deal with unconstrained handwriting • Distinguish text from graphics and sketches • Parse and Interpret graphics (electronic ink) • Content can appear and disappear – dynamically produced
Meeting and Lecture Processing • Meetings • Reading name plates and tags • Identifying and linking references to documents • Processing whiteboards • Lectures • Reading text on projected presentations • Detection, normalization and matching of text with source content (PowerPoint) • Challenges: • Variable content • Animations
License Plate Reading • Applications • Parking lot tracking • Red-light and Speed Camera • Vehicle Surveillance • Challenges • Moving Vehicles • Complex plates • Night and all weather imaging • Limited use of context
Road Sign Recognition • Applications • Driver Assisted Systems, Automated Mapping, Sign guideline enforcement (location, quality) • Challenges • Low resolution, motion blur • Real-time systems • Detecting signs under a variety of conditions… [Figures: sign image, detected sign shape, and Hausdorff matching results for three cases: a typical rectangular freeway sign under nominal conditions, a rural caution sign under nominal conditions, and a freeway sign under adverse glint conditions]
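The slide's figures refer to Hausdorff matching of detected sign shapes. As a rough illustration of the distance involved (not the system's actual matcher, which the slide does not describe), the sketch below computes the symmetric Hausdorff distance between two small point sets. The function names and the toy rectangle are assumptions made for this example.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical data: corners of a rectangular sign template, and the same
# shape detected one pixel to the right.
template = [(0, 0), (4, 0), (4, 2), (0, 2)]
detected = [(x + 1, y) for x, y in template]
shift_distance = hausdorff(template, detected)

# A single spurious edge point dominates the distance, which is why
# practical matchers often use a partial (rank-based) variant instead.
with_outlier = hausdorff(template, detected + [(10, 10)])
```

A small distance means every point of each shape lies close to some point of the other, so thresholding it gives a simple accept/reject test for a candidate sign outline.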
Sign Recognition and Translation • Application: Integrated identification, recognition and translation of text found on foreign signs, maps, menus, transportation schedules, etc. • Extremely useful for other character sets… • Primarily PDA or Mobile Phone Based Hardware • Networked or Standalone solutions are being marketed… • Ultimately software solutions are desirable….
Systems for the Visually Impaired • Allows legally blind consumers access to a variety of information sources • Transportation, shopping, … • System builds an end-to-end application of detection, enhancement, recognition and speech transcription. A. Zandifar, A. Chahine, R. Duraiswami and L.S. Davis, “A Video-based interface to textual information for the visually impaired”, IEEE Computer Society ICMI 2002, pp 325-330.
Commonalities • Most of these systems can be/have been engineered and with the right constraints more are technically feasible… but perhaps not cost effective. But what is the catalyst that will promote more general applications? • Mobile devices and wireless networking are providing a platform which no longer requires special hardware
Mobile Devices • Examples: PDAs, Digital Cameras, Cellular Phones • Devices are becoming common and pervasive • They are becoming increasingly powerful (processor, memory, power, resolution…) • 3G networks promise multimedia support • They are easy to use • Devices go to the documents • Rapid and Flexible Acquisition • Acquisition becomes just another application of the device
How do they compare? (subjectively)
Property      Scanner                   Camera
Resolution    Adequate(?) 150-600dpi    Improving
Distortion    Minimal                   Lens/Perspective
Lighting      Controlled                Sensor and Environment
Background    Domain Dependent          Often Complex
Zoom/Focus    N/A                       Variable
Blur          N/A                       Motion, focus
Noise         Minimal                   Sensor
Are Digital Cameras being used for text? Yes…. • Active capture of information sources [Paris 03] • Note taking during presentations [ICDAR 03] • Japan – Signs prohibit the use of digital cameras in bookstores! They are being used as portable photocopiers…. How about hardcopy documents? • Falcon MT system currently testing high resolution cameras for input to standard OCR systems … What are the challenges of imaging traditional documents?
Resolution and Large Documents • Related Work • Super-resolution • Irani (1991), Patti (1997), Capel (2000), Fekri (2000) • Mosaicing • Taylor (1997, 1999), Mirmehdi (2001), … • State of the Art • Digital Cameras: > 6 megapixels (can provide effective 300dpi) • PDAs: > 1.3 megapixels • Mobile Phones: 1 megapixel • But better cameras are on the way…. • (4-megapixel phones by 2005)
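To make the megapixel figures above concrete, the following sketch estimates the effective dpi when a whole page just fills the frame. The 4:3 sensor aspect ratio, the US Letter page size, and the function name `effective_dpi` are assumptions chosen for illustration; none of them come from the slides.

```python
import math


def effective_dpi(megapixels, page_w_in, page_h_in, aspect=4 / 3):
    """Approximate dpi when the page just fits inside the camera frame."""
    pixels = megapixels * 1_000_000
    # Frame dimensions in pixels, derived from pixel count and aspect ratio.
    long_px = math.sqrt(pixels * aspect)
    short_px = pixels / long_px
    page_long = max(page_w_in, page_h_in)
    page_short = min(page_w_in, page_h_in)
    # The page must fit in both directions, so the tighter one sets the dpi.
    return min(long_px / page_long, short_px / page_short)


# Device classes and pixel counts as listed on the slide.
for name, mp in [("digital camera", 6.0), ("PDA", 1.3), ("mobile phone", 1.0)]:
    print(f"{name}: {effective_dpi(mp, 8.5, 11.0):.0f} dpi on US Letter")
```

Under these assumptions a 6-megapixel frame gives roughly 250 dpi across a full Letter page, so reaching an effective 300 dpi presumably relies on a smaller page, tighter framing, or mosaicing of several shots.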
Blur from Focus/Depth of Field • The imaging plane may not be parallel to the document, resulting in increased blur • Frequency domain strategy • Tsai (1984), Tom (1994), Kim (1990, 1993), … • Iterative solution • Stark (1989), Tekalp (1992), Irani (1991), … • Bayesian methods • Schultz (1994, 1995), …
Lighting • Natural lighting can be uneven • Providing lighting can be challenging • Lighting correction • Global brightness / contrast • Uneven brightness • Adaptive thresholding • Too many to list …
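As one example of the adaptive thresholding mentioned above, the sketch below binarizes an image by comparing each pixel with the mean of its local window, computed with an integral image. This is a generic local-mean method, not a specific algorithm from the talk; the function name, the `window` and `offset` parameters, and the synthetic test page are assumptions.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel is marked as text (1) when it is darker than the mean of its
    window x window neighbourhood by more than `offset`.
    """
    h, w = len(gray), len(gray[0])
    # Integral image: sat[y][x] holds the sum of gray[0:y][0:x].
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out


# Synthetic page with uneven lighting: brightness ramps from 100 to 220,
# with two dark strokes 60 levels below the local background.
page = [[100 + 4 * x - (60 if x in (5, 25) else 0) for x in range(31)]
        for _ in range(9)]
binary = adaptive_threshold(page, window=7, offset=10)
```

In this synthetic page the stroke on the bright side (value 140) is lighter than the background on the dark side (100 to 136), so no single global threshold separates both strokes, while the local comparison does.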
…Motion Blur • Controlled with adequate lighting and shutter speed
Warping and Perspective Distortion • Completely arbitrary viewing angles may not be realistic, however…. • Remove perspective distortion of planar document pages • Clark (2000, 2001), … • Unwarp curled pages using 3D shape: • Brown (2001), Pilu (2001), …
… the imaged document surface is not guaranteed to be smooth • But can we simply enhance the images and use existing tools?
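For a flat page, removing perspective distortion comes down to estimating a homography from the page's four corners in the photo to an upright rectangle. The sketch below solves for that 3x3 matrix directly from four point correspondences. It is a textbook construction, not the method of the cited papers; the function names and the corner coordinates are made up for illustration, and it assumes no three corners are collinear.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 homography mapping four src points to four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and same for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(hm, pt):
    """Map one point through the homography (with perspective division)."""
    x, y = pt
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical page corners in the photo (TL, TR, BR, BL) and the target
# upright rectangle, here 850 x 1100 pixels.
corners = [(120, 80), (700, 140), (660, 980), (60, 900)]
target = [(0, 0), (850, 0), (850, 1100), (0, 1100)]
H = homography(corners, target)
```

Rectifying the image then means mapping each output pixel back through the inverse of `H` and sampling the photo there; locating the four corners automatically is the harder part in practice.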
For controlled imaging…. From scanner: 300 dpi From camera: ~200 dpi
Commercial OCR • OCR is almost identical…. From scanner From camera
Simple Rectification Original Rectified
Better OCR Results Original Rectified
Simple Unwarping • Text line straightening • Unwarping book-spine type deformation
OCR Result t(,_~.t catc+go'r'jZ(tN071 Cap("rinI,ClLt c~~'. et,io-r1. "i,•~~,tO 1 fl classes A hd°d 10 zr~ for n.tc goriz~ r (l f n(,-l'(Lll ~+`~fSet ' illi111t of t11-~'1epo7't 011, OUT ea;Pc?'i,rn.e~~ts ), 'J1~~r ~n ij11111•/iii' ~~~ c(ItPy..i,(Lt-io?L of optically jfl address the 1111119 l1fs' . the of feCts OCR errors ifl(Ly )LL v(" Olt {1~~1Et1t q~I.d-at197,,Sionahi,ty red ction. (Lnd c~,teynr.izfL_ iTII1i d t, 1 ~ep~~rt on 'uvaq~s that cateyvri.zatiorf, If~f l. f ~o~ recti,orL and rctr•ie•11al cf fect%VCflF,ss. f`11f1, help f,foJ _iiiCtion
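Rough Text Extraction [Figure panels: Original, Edge Areas, Text Areas, Threshold Surface, Extracted Binary Text]
Cylindrical Model [Figure: cylindrical surface with its generatrix and directrix labeled, page corners A, B, C, D, origin O, and surface coordinates u, v]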
OCR Result 4 successful text categor°izataon, capcri'raent divides tcrtual collection into pre-defined clu,.sses. A true ,lirr>sentatzve for eachh class is generally obtained tiiryttg training of the ca.tcgorzzer. jri this paper, Yale report on o•a.r ca;periro.erats o-n, t,pivtrtg and categorization of optically recognized (pr tartaerrts. In partzctalar. ure in'll address the is- ;,ie,s regarding the of fcct.s OCR, cr-ror.s m ay have on 10itaing. dirraensionality reduction, gnd categoriza_ tip. lire further report on ways that, categorization pla,Uhelp error* cor•rectlor( and retrieval effectiveness.
Line Extraction • Using Extrapolation with overlapping direction estimates
Improved OCR Result A successful text categorization experiment divides textual collection into pre-defined. classes. A true ~presentative for each class is generally obtained ing training of the categorizer. In this paper, we report on our experiments on joining and categorization of optically recognized pcuments. In particular, we will address the is- oes regarding the effects OCR errors may have on joining, dimensionality reduction, and categoriza- hon. We further- report on ways that categorization moy help error correction and retrieval effectiveness.
Open Questions • Are existing tools good enough? • Can we simply “enhance” the images • or do we need to develop new tools…. • or new constraints? • Can we make use of degradation knowledge? • apply constraints from clear parts of the document to recognize similar text blurred by perspective?
Will such devices replace scanners? No • Will they open up the market to new applications? • They already have… • Integrated Information Services • Map locations • Tourist information • Promise of DIA on text captured with a digital camera (business cards, nametags, pages of notes, …) • Grand Challenges • Image Quality • Immediate feedback for “processability” • Moving processing to the device • Killer Applications
MyLifeLog • Record and Index anything and everything • Text from everywhere you have been • Identification of every document you have ever looked at (not necessarily read…) • Recall of everything you have ever written • Is it possible?
IJDAR Special Issue • Special Issue on Camera Based Text and Document Recognition • Papers Due: November 2003 http://ijdar.cfar.umd.edu/special_issues/TD-SI.html