This deck covers document image retrieval: how scanned page images are represented and how they are retrieved. It touches on document image processing, indexing, and the use of transducers such as OCR to make more content searchable, and it lists examples and applications.
Scanned Documents INST 734 Module 10 Doug Oard
Agenda • Document image retrieval • Representation • Retrieval Thanks to David Doermann for most of these slides
Expanding the Search Space Scanned Docs Identity: Harriet “… Later, I learned that John had not heard …”
Fun with Transducers • OCR • MT • Handwriting • Speech [Figure labels: Searchable Fraction, Transducer Capabilities]
Document Image Processing Document Image Retrieval Information Retrieval
Indexing Page Images [Diagram labels: Document; Scanner; Page Image; Page Decomposition; Structure Representation; Text Regions; Optical Character Recognition; Character or Shape Codes]
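A minimal sketch of the last step in the diagram above, assuming OCR has already turned each page's text regions into plain character strings. It builds a simple inverted index over that text and answers conjunctive keyword queries. The page identifiers, the sample page text, and the helper names (`tokenize`, `build_index`, `search`) are all hypothetical and not taken from the slides. A real system would also need to cope with OCR errors, which this sketch ignores.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(index, query):
    """Return the page ids that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical OCR output for two scanned pages.
pages = {
    "page-1": "Later, I learned that John had not heard",
    "page-2": "John wrote to Harriet",
}
index = build_index(pages)
matches = search(index, "John heard")
```

Here `matches` contains only the pages whose recognized text includes both query terms. Indexing shape codes instead of characters would follow the same structure with a different `tokenize` step.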
Some Examples • Google Books • http://books.google.com/ • Legacy Tobacco Documents Library • http://legacy.library.ucsf.edu/ • George Washington’s Papers • http://ciir.cs.umass.edu/irdemo/hw-demo/
More Applications • Collection management • Who has other versions of the same document? • Knowledge management • Find everything I have ever xeroxed • Declassification • How have similar documents been redacted? • Case management for litigation • Which documents to use at trial?
Agenda • Document image retrieval • Representation • Retrieval Thanks to David Doermann for most of these slides