
In-House Digitization: The National Digital Newspaper Program at the University of Kentucky

Eric Weig, Head, Digital Programs; J. Wendel Cox, Manager, UK National Digital Newspaper Program Project.



Presentation Transcript


  1. In-House Digitization: The National Digital Newspaper Program at the University of Kentucky. Eric Weig, Head, Digital Programs; J. Wendel Cox, Manager, UK National Digital Newspaper Program Project

  2. National Digital Newspaper Program (NDNP): a long-term effort to digitize the nation’s historic newspapers • 20 years • all states and territories • 1836-1923

  3. NDNP is in its two-year test phase – we learn and share our experiences • six projects • newspapers from years 1900-1910 • each project produces 100,000 pages

  4. NDNP awardees create digital content in accordance with strict standards • 300-400 dpi grayscale images • scan from print master microfilm • TIFF 6.0 format scans • deliver a TIFF, a JPEG 2000, a PDF, and page and reel metadata files
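
The per-page deliverables listed on this slide lend themselves to a simple completeness check before delivery. The sketch below is illustrative only: the file extensions, the naming scheme, and the `missing_deliverables` helper are assumptions, not part of the NDNP specification or the UK workflow, and reel-level metadata is not covered.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed extensions for the per-page deliverables (hypothetical, not from the slides).
REQUIRED = {".tif", ".jp2", ".pdf", ".xml"}


def missing_deliverables(filenames):
    """Map each page stem to the sorted list of required extensions it lacks."""
    found = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        found[path.stem].add(path.suffix.lower())
    return {
        stem: sorted(REQUIRED - exts)
        for stem, exts in found.items()
        if REQUIRED - exts
    }


# Hypothetical file listing: page 0001 is complete, page 0002 is not.
files = ["0001.tif", "0001.jp2", "0001.pdf", "0001.xml", "0002.tif", "0002.jp2"]
print(missing_deliverables(files))  # {'0002': ['.pdf', '.xml']}
```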

  5. UK NDNP draws on our experience and expertise with newspapers, microfilm, and digitization

  6. Microfilm evaluation collects information – and reveals physical problems • dirty film • circulated master negatives • redox • rings from hydration

  7. Microfilm evaluation collects information – and reveals intellectual problems. Page order on film: [1], [2], [1], [2], [3], [4], [5], [6] <05.27.1903> | splice | [3], [4], [8], [blank], [1], [2], [7], [8] <05.24.1905>

  8. Microfilm evaluation collects information – and sees metadata challenges
     Title: The Owingsville Outlook
     Frequency: Weekly
     Location: Owingsville, KY
     File Number: S/83-5
     Date: 1906: January 25, December 20
     Notes: some pages are mutilated; *Issues this month are missing (June)
     Present: 1906-01-25, 1906-02-01, 1906-02-15, 1906-02-22, 1906-03-01, 1906-03-08, 1906-03-15, 1906-04-05, 1906-04-12, 1906-04-19, 1906-04-26, 1906-05-03, 1906-05-10, 1906-05-17, 1906-07-26, 1906-08-02, 1906-08-16, 1906-09-27, 1906-10-11, 1906-11-08, 1906-11-22, 1906-12-20
     Missing: 1906-02-08, 1906-03-22, 1906-03-29, 1906-05-24, 1906-07-12, 1906-07-19, 1906-08-09, 1906-08-23, 1906-09-06, 1906-09-13, 1906-09-20, 1906-10-04, 1906-10-18, 1906-10-25, 1906-11-01, 1906-11-15, 1906-11-29, 1906-12-06, 1906-12-13
     Incomplete: 1906-07-05
     Codes: check mark=present, M=missing, I=incomplete, Mu=mutilated, NP=not published
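
A present/missing list like the one on this slide can be cross-checked by generating the expected issue dates for a weekly title and comparing them with what the evaluator found. This is a minimal sketch, assuming the paper appeared every seven days with no change of schedule; the function names are hypothetical, and a computed list is only a starting point, since it cannot distinguish a missing issue from one that was never published.

```python
from datetime import date, timedelta


def expected_weekly(first, last):
    """Yield every seventh day from first through last, inclusive."""
    current = first
    while current <= last:
        yield current
        current += timedelta(days=7)


def find_missing(present, first, last):
    """Return expected weekly dates that do not appear in the present list."""
    have = set(present)
    return [d for d in expected_weekly(first, last) if d not in have]


# The first four issues recorded as present on the slide.
present = [date(1906, 1, 25), date(1906, 2, 1), date(1906, 2, 15), date(1906, 2, 22)]
missing = find_missing(present, date(1906, 1, 25), date(1906, 2, 22))
print([d.isoformat() for d in missing])  # ['1906-02-08']
```

The single gap it reports, 1906-02-08, matches the first entry in the slide's Missing list.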

  9. We have decades of experience with microfilm production – but little experience with negative duplication. But Shell Dunn taught herself how to make print master negatives, troubleshot problematic film, and helped solve a mystery of mottled film …

  10. How is an $84,000 scanner like a sports car? It’s fast, fun, occasionally unpredictable and sometimes just plain dangerous – and you’ll know your mechanics by name …

  11. Large-format microfilm (IA) + NDNP image specifications = scanning and storage challenges • 72 MB for a TIFF of each IA newspaper page • 576 MB for each eight-page issue • 29,952 MB for one year of an eight-page weekly paper • … and that’s just the TIFFs (we produce four for each page, so that’s actually 119,808 MB)
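
The storage figures on this slide follow from straightforward multiplication. The sketch below reproduces them, assuming 52 issues per year (implied by the slide's totals but not stated) and, as the slide does, counting each of the four files per page at the TIFF's 72 MB, which likely overstates the real total.

```python
MB_PER_TIFF = 72       # from the slide: one TIFF of an IA newspaper page
PAGES_PER_ISSUE = 8    # from the slide: eight-page issue
ISSUES_PER_YEAR = 52   # assumption: one issue per week, implied by the totals
FILES_PER_PAGE = 4     # from the slide: "we produce four for each page"

mb_per_issue = MB_PER_TIFF * PAGES_PER_ISSUE
mb_per_year = mb_per_issue * ISSUES_PER_YEAR
mb_all_files = mb_per_year * FILES_PER_PAGE

print(f"{mb_per_issue:,} MB per issue")         # 576 MB per issue
print(f"{mb_per_year:,} MB per year of TIFFs")  # 29,952 MB per year of TIFFs
print(f"{mb_all_files:,} MB for all files")     # 119,808 MB for all files
```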

  12. What makes a good image? … and, remember, newspapers aren’t printed on white paper.

  13. Digital Production Application Framework manages the digitization process: ingest, manual process, automation

  14. Digitization Steps Before Post Processing: 1. Ingest (automated) 2. Split/Deskew/Crop (manual) 3. Structural Metadata (manual) 4. Zoning for OCR (manual)

  15. 1. Ingest (Automated) • Import images and CSV file into application framework. • Create derivative images for use in the application framework. • Create new work container in database manager.

  16. 2. Split/Deskew/Crop (Manual) • Split any images from IIB-oriented film so that each page image is a distinct file. • Deskew by text line for better OCR/OWR. • Crop to include page edges.

  17. 3. Structural Metadata (Manual) • Key data for page numbers, reel sequence, newspaper section, and any targets included on the film.

  18. 4. Zoning for OCR/OWR (Manual) • Plot division lines over page images to create templates that guide the OCR/OWR engines during their recognition process. • Ensure preservation of correct reading order in the generated searchable text.

  19. Quality Control Example: Scan through thumbnails of every page image to check for proper skew, split and crop.

  20. Output: Post Processing (automated process)

  21. Validation of Data (Automated) • LC Digital Viewer and Validation software parses output to ensure data is present and properly formatted. • Writes digital signatures into XML files that have validated successfully.

  22. In-House Digitization: The National Digital Newspaper Program at the University of Kentucky. Eric Weig, Head, Digital Programs; J. Wendel Cox, Manager, UK National Digital Newspaper Program Project
