Digitization and scientific digital libraries Martin Lhoták Knihovna AV ČR, v. v. i. Academy of Sciences Library 3.6.2009 UISK, Charles University in Prague
Content • Digitization Centre of Acad. of Sci. Library • Kramerius – software for dissemination • Digital Library of the Academy of Sciences • Software for metadata creation • „Digitization Registry CZ“ project
Digitization Centre of the AS Library • In operation since 1.1.2004 • Built with support from the EU Solidarity Fund after the floods in Czechia in 2002 • Main aim - to build a digital library of scientific publications (books, articles, …) published within the Academy of Sciences of the Czech Republic (Digital Library of ASCR) • Partner of DML-CZ: Czech Digital Mathematical Library project since 2005
The Academy of Sciences of the Czech Republic • > 50 scientific institutes • 8000 employees (4000 in R&D) • > 11 000 articles, reports, etc. a year • publishes > 90 journals (circa 3000 articles) • > 100 years of history
Digitization Centre of the AS Library • 1 x A0 color scanner ProServ ScanTech 600i • 1 x A1 color scanner Digibook 10000 • 2 x A2 B/W scanners Zeutschel OS 7000 • 1 x A4 fast production scanner Panasonic • Staff – 8 to 10 people • Also provides services to other institutions • Monthly production 40.000 - 50.000 pages • Overall production > 2.000.000 pages • Planned acquisition – ScanRobot http://www.treventus.com/
Image Adjustment • Book Restorer software from i2S • Designed to process scanned books • Geometrical correction • Crop • Blur • Binarization • Despeckle
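To make the last two steps concrete, here is a minimal sketch of binarization and despeckling on a tiny grayscale grid. It only illustrates the general idea; it is not how Book Restorer implements these steps, and the fixed threshold of 128 and the "no ink neighbours" rule are assumptions chosen for the example.

```python
# Illustrative only: a generic sketch, not Book Restorer's algorithm.

def binarize(gray, threshold=128):
    """Map each 0-255 grayscale pixel to 1 (ink) or 0 (paper).

    The fixed threshold is an assumption; real tools usually adapt it.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(binary):
    """Remove ink pixels that have no ink pixel among their 8 neighbours."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4 x 5 "scan": a small ink stroke plus one isolated dark pixel.
page = [
    [250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250],
    [250,  25, 250, 250, 250],
    [250, 250, 250, 250,  10],  # isolated dark pixel: a speck
]
clean_page = despeckle(binarize(page))
```

The stroke in the upper left survives because its pixels touch each other, while the lone pixel in the bottom-right corner is dropped.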
Basic Metadata • XML (DTD of the Czech National Library) • Title: basic bibliographic data • Book/Journal structure • Physical size of the book/journal • Numbers of pages • Sirius software (CZ)
OCR • FineReader 8.1 • 2 runs: - 1. to recognize the language of each paragraph - 2. to do OCR with the right language • OCR workflow developed by the DML-CZ team of Dr. P. Sojka • Output – double-layer PDF: - 1st layer: scanned picture - 2nd layer: „OCRed“ text
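The two-run idea can be sketched as follows. This is a minimal illustration, not the DML-CZ workflow itself: `two_pass_ocr`, `toy_detect_language` and `toy_recognize` are hypothetical names, and the toy functions work on plain strings as stand-ins for calls into a real OCR engine.

```python
from typing import Callable, List, Tuple


def two_pass_ocr(
    paragraphs: List[str],
    detect_language: Callable[[str], str],
    recognize: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Run 1 picks a language per paragraph; run 2 does OCR with that language."""
    # Run 1: language detection only.
    languages = [detect_language(p) for p in paragraphs]
    # Run 2: recognition, now configured with the detected language.
    return [(lang, recognize(p, lang)) for p, lang in zip(paragraphs, languages)]


# Hypothetical stand-ins so the sketch runs without an OCR engine.
def toy_detect_language(paragraph: str) -> str:
    """Guess Czech when typical Czech diacritics occur, otherwise English."""
    return "cs" if any(ch in "ěščřžýáíéůú" for ch in paragraph.lower()) else "en"


def toy_recognize(paragraph: str, language: str) -> str:
    """Pretend OCR: a real engine would use `language` to choose its dictionary."""
    return paragraph.strip()


result = two_pass_ocr(
    ["  Knihovna Akademie věd ", "Digital library "],
    toy_detect_language,
    toy_recognize,
)
```

Keeping detection and recognition as separate passes means the recognizer never has to guess between languages inside a single paragraph.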
Kramerius – development group and used technology • Open source – in development since 2003 • Main purpose – accessing/dissemination of digitized documents (monographs and periodicals) • Czech National Library, Academy of Sciences Library, Qbizm technologies, Moravian Library in Brno • Funded mostly by the Ministry of Culture and the Academy of Sciences Grant Agency • Used technologies: Java, Linux, Apache, Tomcat, PostgreSQL, Lucene
Kramerius – current status • version: 3.3.0, build: 29.7.2008
Kramerius – current status • DTD for periodicals and monographs • Import of XML, TXT and graphic files • Graphic formats: DjVu, JPG, PNG, PDF • Fulltext search (Lucene) • Replication of data between individual installations • OAI-PMH – for metadata harvesting • METS, PREMIS, MIX – metadata standards
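For readers unfamiliar with OAI-PMH harvesting, the sketch below shows how a harvester pages through records. The verb `ListRecords`, the `metadataPrefix` and `resumptionToken` arguments, and the `oai_dc` prefix come from the OAI-PMH 2.0 protocol; `BASE_URL` and the sample response are hypothetical and do not describe a real Kramerius endpoint.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_resumption_token(response_xml):
    """Return the token for the next page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


BASE_URL = "http://example.org/oai"  # hypothetical endpoint, for illustration only

first_url = list_records_url(BASE_URL)

# Hypothetical, heavily shortened response from the repository.
sample_response = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>page-2</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)
token = next_resumption_token(sample_response)
next_url = list_records_url(BASE_URL, resumption_token=token)
```

A harvester repeats the second request until the response carries no (or an empty) `resumptionToken`.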
Kramerius – current status • International and national connections: - The European Library http://www.theeuropeanlibrary.org - Uniform Information Gateway JIB http://www.jib.cz/ • Links to library OPACs • Persistent URLs enable persistent linking
Kramerius – new plans of development • Fundamental change – use of the FEDORA repository (open source, USA) • Reasons – FEDORA is a robust engine with support for compound objects and it is also useful for long-term preservation • Enhancement of administration – users and access rights • Batch operations with digitized documents • New types of documents (maps, audio, video, …)
Kramerius – institutional users • Czech National Library, Moravian Library in Brno, State Technical Library, Academy of Sciences Library • Regional Scientific Libraries: Havlíčkův Brod, Hradec Králové, Olomouc, Ostrava, Zlín • Museum Libraries: UPM Praha, ŽM Praha, DA Praha, MVČ Hradec Králové • In total circa 5.500.000 pages (circa 500 periodical titles and 4500 monographs)
Academy of Sciences Digital Library • Funded by the Academy of Sciences (2004-2009) • Digitization of historical issues (1890-1990) • Circa 1 500 000 pages digitized • Development of the Kramerius system • 1 000 000 pages accessible (no separation into articles) • Fulltext search • http://kramerius.knav.cz
Academy of Sciences Digital Library • New issues – different approach • Open source EPrints (University of Southampton) • Agreements with the Academy institutes – conditions of dissemination • Final goal – merging both digital libraries (solution probably Drupal/FEDORA – Islandora?)
Collaboration with Google • Digitized journals from the Kramerius system - indexing of fulltexts, automatic detection of articles, link from Google to the article’s first page or abstract • New articles in EPrints - indexing of fulltexts, link from Google
Academy of Sciences Central Data Repository • Huge amount of data from digitization • Disk array 30 TB with mirror • Tape library up to 500 tapes • 3 different locations for long-term storage • Long-term preservation for R&D outputs of the Czech Academy of Sciences • Institutional Repository
System for journal publishing administration • Proven professional system (Manuscript Central, Editorial Manager) • Better prices for implementation and annual service fees when purchasing as a consortium • On-line submission system • Complete records of authors, reviewers and articles • Automated administration of peer review • Currently 8 journals