
DML-CZ: Scanning and adjusting the images

DML-CZ: Scanning and adjusting the images. Martin Lhoták Academy of Sciences Library Launching the DML-CZ 11. 5. 2008 Prague. DML-CZ Workflow. Preparation Scanning and adjusting the images OCR Metadata harvesting (MR, ZBL) Integration Digital Library. Content.


Presentation Transcript


  1. DML-CZ: Scanning and adjusting the images • Martin Lhoták • Academy of Sciences Library • Launching the DML-CZ, 11. 5. 2008, Prague

  2. DML-CZ Workflow • Preparation • Scanning and adjusting the images • OCR • Metadata harvesting (MR, ZBL) • Integration • Digital Library

  3. Content • Digitization Centre of the AS Library • Scanning • Adjusting the images • Basic metadata • OCR • Backup and movement of the data • Production till now

  4. Digitization Centre of the AS Library • In operation since 1. 1. 2004 • Built with support from the EU Solidarity Fund after the 2002 floods in Czechia • Main aim: to build a digital library of scientific publications published by the Academy of Sciences of the Czech Republic (Digital Library of ASCR) • Partner of the DML-CZ project since 2005

  5. The Academy of Sciences of the Czech Republic • > 50 scientific institutes • 7,500 employees (4,000 in R&D) • > 11,000 articles, reports, etc. a year • publishes > 90 journals (circa 3,000 articles) • > 100 years of history

  6. Digitization Centre of the AS Library • 2 x A2 bw scanners Zeutschel OS 7000 • 1 x A1 color scanner Digibook 10000 • 1 x A4 fast production scanner Panasonic • Staff: 8 to 10 people • Monthly production: 40,000 to 50,000 pages • Overall production: > 2,000,000 pages

  7. DML-CZ: Scanning • 2 x A2 bw scanners Zeutschel OS 7000 • 600 DPI • 4-bit greyscale • 1 page = 1 file • page size usually A5 • TIFF with lossless LZW compression, circa 10 MB per file
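
As a rough plausibility check of the file size quoted on this slide, the sketch below estimates the uncompressed size of one A5 page scanned at 600 DPI with 4 bits per pixel. The A5 dimensions (148 x 210 mm) come from ISO 216; treating the scan area as exactly one A5 page with no margin is an assumption made here, not something stated in the presentation.

```python
MM_PER_INCH = 25.4

def uncompressed_size_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) size of a scan of the given physical size."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumption: the scanned area is exactly one A5 page (148 x 210 mm).
size = uncompressed_size_bytes(148, 210, dpi=600, bits_per_pixel=4)
print(f"{size / 1e6:.1f} MB")  # prints "8.7 MB"
```

The estimate of about 8.7 MB is in the same range as the circa 10 MB per file given on the slide. The real scan area is likely somewhat larger than the bare page, and how much LZW saves on scanned greyscale images varies, so this is only an order-of-magnitude check.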

  8. Image Adjusting • Software Book Restorer from i2S • Designed to process scanned books • Geometrical correction • Crop • Blur • Binarization • Despeckle

  9. Basic Metadata • XML (DTD of the Czech National Library) • Title, basic bibliographic data • Physical size of the journal • Number of pages • Software Sirius (CZ)

  10. OCR • FineReader 8.1 • 2 runs: 1. recognize the language of each paragraph; 2. run OCR with the correct language • OCR workflow developed by the team of Dr. P. Sojka • Output: double-layer PDF: 1. layer: scanned image; 2. layer: "OCRed" text
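
The two-run idea on this slide can be sketched as follows. This is a minimal illustration only: `two_run_ocr`, `fake_detect`, and `fake_recognize` are hypothetical names, the paragraphs are plain strings standing in for image regions, and nothing here reflects the FineReader interface or the actual workflow built by Dr. Sojka's team, which the slide does not describe in detail.

```python
from typing import Callable, List, Tuple

def two_run_ocr(
    paragraphs: List[str],
    detect_language: Callable[[str], str],
    recognize: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Run 1 picks a language per paragraph; run 2 recognizes text with it."""
    results = []
    for paragraph in paragraphs:
        language = detect_language(paragraph)   # run 1: language of the paragraph
        text = recognize(paragraph, language)   # run 2: OCR with that language
        results.append((language, text))
    return results

# Hypothetical stand-ins so the sketch runs without an OCR engine.
def fake_detect(paragraph: str) -> str:
    return "cs" if "ř" in paragraph else "en"

def fake_recognize(paragraph: str, language: str) -> str:
    return f"[{language}] {paragraph}"

result = two_run_ocr(["Věta o řadách", "A theorem on series"], fake_detect, fake_recognize)
print(result)
```

Passing the detector and recognizer as callables keeps the per-paragraph control flow separate from whichever OCR engine is actually used.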

  11. Backup and movement of the data • Main steps and outputs: 1. scanning: TIFF; 2. image adjustment and basic metadata: TIFF, XML; 3. OCR: PDF • After each step above: one copy to the server in Brno, two copies on LTO tapes
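
The "one copy to the server, two copies on tape" rule could be scripted roughly as below. This is a sketch under stated assumptions: the three targets are modelled as ordinary directories, the file and directory names are hypothetical, and the SHA-256 verification is an addition for illustration; the slide does not say how copies are made or checked.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def replicate(source: Path, destinations: List[Path]) -> List[Path]:
    """Copy source into each destination directory and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = Path(shutil.copy2(source, directory))
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies

# Temporary directories stand in for the Brno server and the two LTO tapes.
with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    page = root_path / "page_0001.tif"  # hypothetical file name
    page.write_bytes(b"placeholder image data")
    targets = [root_path / name for name in ("server_brno", "lto_tape_1", "lto_tape_2")]
    copies = replicate(page, targets)
    copy_names = [copy.parent.name for copy in copies]
    print(copy_names)
```

Running the same function after scanning, after image adjustment, and after OCR would give each intermediate output the same three copies.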

  12. Production for DML-CZ till now • Scanning: 97,268 pages • Image adjustment: 123,961 pages • Basic metadata: 96,009 pages • OCR: 126,278 pages • The counts differ between steps because some data was obtained from GDZ Göttingen rather than scanned in-house
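
The size of the disproportion can be read off directly from the figures on this slide. The short calculation below only subtracts the scanning count from the other counts; reading the surplus as pages supplied by GDZ is an interpretation suggested by the slide's note, not a number the presentation states.

```python
# Page counts as given on the slide.
pages = {
    "scanning": 97_268,
    "image adjustment": 123_961,
    "basic metadata": 96_009,
    "OCR": 126_278,
}

# Difference of each later step relative to the number of scanned pages.
difference = {
    step: count - pages["scanning"]
    for step, count in pages.items()
    if step != "scanning"
}

for step, delta in difference.items():
    print(f"{step}: {delta:+,} pages relative to scanning")
```

Image adjustment and OCR exceed scanning by 26,693 and 29,010 pages respectively, while basic metadata trails scanning by 1,259 pages.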

  13. Alternative output of the Acad. of Sci. mathematic http://kramerius.lib.cas.cz

  14. Thank you! Questions? Martin Lhoták lhotak@knav.cz www.knav.cz
