Processing electronic literature: CERN case study C. Pettenati (ETT-SI) M. Draper (ETT-DH) CERN
Presentation plan (1) • The CERN Library • Definitions • Grey literature management • Current services • CERN grey literature collection • Submission & Acquisition services • Consultation & Dissemination services
Presentation plan (2) • Tools available to the readers • Future perspectives for grey literature at CERN • Architecture • Hardware configuration • Software architecture • Re-usability
CERN - European Organization for Nuclear Research • European Laboratory for Particle Physics • Fundamental research • Founded in 1954 in Geneva, Switzerland • 20 member states • 540 universities and laboratories, 7000 researchers, 90 nationalities • 5 accelerators, more than 1000 experiments and collaborations • Current year budget: 939 MCHF
The CERN Library • A central unit and four satellites • A modest monograph collection (fewer than 40,000) • 500 open subscriptions to scientific journals • 400 titles available electronically in full text • A major grey literature collection of more than 350,000 documents (full text available electronically from February 1994 onwards)
Definitions (1) • The CERN grey literature collection comprises: • Documents prepared for submission to scientific journals • Documents submitted to conferences • Theses • CERN internal notes (committee papers, proposals) • External reports • Pictures (photos and diagrams) • Videotapes of Academic Training lectures (partly webcast) • Administrative documents (separate, protected access) • CERN internal publication (weekly bulletin)
Definitions (2) • Open Archive • A submission mechanism • A long-term storage system • A management policy for submission and preservation • An open interface that lets third parties collect data from the archive • The CERN Preprint Server was an Open Archive long before this definition was established last year in Santa Fe (see http://www.openarchives.org)
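The four elements of the Open Archive definition map naturally onto a small data structure. The following Python sketch is illustrative only: the `Record`, `OpenArchive`, `submit` and `harvest` names are hypothetical, the preservation policy is not modelled, and this is neither the CERN Preprint Server code nor the protocol published at openarchives.org. It shows submission, storage keyed by identifier, and an open interface that lets a third party collect every record added or changed since a given date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    identifier: str
    title: str
    datestamp: date  # date the record entered (or last changed in) the archive


@dataclass
class OpenArchive:
    """Toy open archive (hypothetical): submission, storage, harvesting."""
    records: dict = field(default_factory=dict)

    def submit(self, record: Record) -> None:
        # Submission mechanism: store or replace the record by its identifier.
        self.records[record.identifier] = record

    def harvest(self, since: date) -> list:
        # Open interface: return records changed on or after `since`,
        # in a stable order so that repeated harvests are predictable.
        return sorted(
            (r for r in self.records.values() if r.datestamp >= since),
            key=lambda r: (r.datestamp, r.identifier),
        )


if __name__ == "__main__":
    archive = OpenArchive()
    # Identifiers below are made up for the example.
    archive.submit(Record("EX-0001", "First preprint", date(2000, 1, 10)))
    archive.submit(Record("EX-0002", "Second preprint", date(2000, 3, 5)))
    print([r.identifier for r in archive.harvest(date(2000, 2, 1))])  # ['EX-0002']
```

Keeping the datestamp on each record is what makes incremental collection possible: a harvester only needs to remember the date of its last visit.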
Grey literature acquisition procedures • Direct electronic submissions • Official series • Open series • Theses • Downloading from other grey literature servers • Los Alamos, DESY, SLAC, Fermilab, etc. • E-mail-based application: the Uploader • Digitization of paper documents • Exchange with other labs (annual reports) • Harmonization of the record description
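Records downloaded from Los Alamos, DESY, SLAC, Fermilab and other servers arrive with different field conventions, which is why the record description has to be harmonized. A minimal Python sketch of that step, assuming hypothetical source keys and field names (`FIELD_MAPS`, `harmonize`) rather than the real exchange formats of those servers:

```python
# Hypothetical per-source field names; the real exchange formats differ.
FIELD_MAPS = {
    "los_alamos": {"ti": "title", "au": "authors", "id": "report_number"},
    "slac": {"TITLE": "title", "AUTHOR": "authors", "REPORT": "report_number"},
}


def harmonize(source: str, raw: dict) -> dict:
    """Map a raw record from `source` onto one common record description."""
    mapping = FIELD_MAPS[source]
    record = {"source": source}
    for raw_key, value in raw.items():
        common_key = mapping.get(raw_key)
        if common_key is None:
            continue  # fields without a mapping are dropped in this sketch
        if common_key == "authors" and isinstance(value, str):
            # Normalize an author string such as "A. Smith; B. Jones" into a list.
            value = [name.strip() for name in value.split(";") if name.strip()]
        record[common_key] = value
    return record


if __name__ == "__main__":
    raw = {"TITLE": "Top quark mass", "AUTHOR": "A. Smith; B. Jones", "NOTE": "x"}
    print(harmonize("slac", raw))
    # {'source': 'slac', 'title': 'Top quark mass', 'authors': ['A. Smith', 'B. Jones']}
```

A table-driven mapping keeps the per-server differences in data rather than code, so adding a new source server means adding one dictionary entry.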
E-Submission • Web submission options: • Bibliographic notice input/update • Full-text document transfer or link (TeX, Word, PDF, HTML) • Revised version transfer • Alert to an e-mail distribution list • Forwarding to the Printshop and Mail Office • Request for approval (internal and scientific notes)
Provenance • More than 40,000 documents processed per year • Internal to CERN 10% • External 90%
Documents prepared for publication: preprints • They are sent to the CERN Library and, at the same time, submitted to the publisher of a scientific journal • They are distributed via the Library Web server the day after submission • In general they are published much later, 8-24 months after submission
Preprints processing procedure • Submission to a journal and, in parallel, submission to the Library (record, text, figures) • Visibility on the Internet the day after (1 day) • Weekly list preparation (1 week) • Article publication (8-18 months) • Input of the publication note (delay unknown) • Record updating from INSPEC, conference proceedings, the SLAC database, authors, ...
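The processing timeline can be read as a linear sequence of stages that every preprint record passes through. The Python sketch below encodes one reading of the slide as an ordered enumeration; the stage names and the `advance` helper are hypothetical, and the delays in the comments only repeat the slide's indicative values.

```python
from enum import IntEnum


class Stage(IntEnum):
    # One reading of the processing slide (hypothetical names).
    SUBMITTED = 1         # record, text and figures sent to the Library
    VISIBLE = 2           # on the Internet the day after submission
    WEEKLY_LIST = 3       # included in the weekly list (about one week)
    PUBLISHED = 4         # journal article appears (8-18 months later)
    PUB_NOTE_ENTERED = 5  # publication note added to the record
    RECORD_UPDATED = 6    # updated from INSPEC, proceedings, SLAC db, authors, ...


def advance(current: Stage) -> Stage:
    """Move a preprint to the next stage; stages cannot be skipped."""
    if current is Stage.RECORD_UPDATED:
        raise ValueError("record already fully processed")
    return Stage(current + 1)


if __name__ == "__main__":
    stage = Stage.SUBMITTED
    while stage is not Stage.RECORD_UPDATED:
        stage = advance(stage)
        print(stage.name)
```

Making the order explicit is mainly useful for reporting: for instance, counting how many records are still waiting between `PUBLISHED` and `PUB_NOTE_ENTERED`.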
Access to the preprint full text and to the published text [diagram: a record in the CERN Preprint full-text server (Record # 123, Author: …, Title …, URL: …) carries a publication note (Tit. AA, vol. pp …); a CERN algorithm turns this note into an external (EXT) URL pointing to the electronic journal on the publisher server]
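The linking step can be sketched as follows: parse the publication note into journal title, volume, year and starting page, then fill in a per-journal URL template. This Python sketch assumes a note of the form `Phys. Lett. B 456 (1999) 123` and uses hypothetical templates on `publisher.example.org`; it is not the actual CERN algorithm, and real publisher link schemes differ from journal to journal.

```python
import re
from typing import Optional

# Hypothetical URL templates per journal; real publisher link schemes differ.
LINK_TEMPLATES = {
    "Phys. Lett. B": "https://publisher.example.org/plb/{volume}/{page}",
    "Nucl. Phys. B": "https://publisher.example.org/npb/{volume}/{page}",
}

# Assumed note format: "<journal> <volume> (<year>) <first page>[-<last page>]".
PUB_NOTE = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>\d+)\s+\((?P<year>\d{4})\)\s+(?P<page>\d+)"
)


def published_link(pub_note: str) -> Optional[str]:
    """Return a URL to the published article, or None if none can be built."""
    match = PUB_NOTE.match(pub_note.strip())
    if match is None:
        return None  # note not in the expected format
    template = LINK_TEMPLATES.get(match["journal"])
    if template is None:
        return None  # journal not covered: no dynamic link
    return template.format(volume=match["volume"], page=match["page"])


if __name__ == "__main__":
    print(published_link("Phys. Lett. B 456 (1999) 123-130"))
    # https://publisher.example.org/plb/456/123
```

Because the link is computed from the note each time the record is displayed, a change in a publisher's URL scheme only requires updating one template.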
Document formats • Accepted: TeX/LaTeX, Word, TIFF, HTML, ... • Distributed: PDF, PS, HTML, TIFF, GIF, ...
Formats produced by the electronic submission • Conversion from TeX/LaTeX to PS • Conversion from Word to PS • Conversion from PS to PDF
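The conversions listed above form a small graph of formats, and the chain of steps needed for a given submission can be found by a breadth-first search. In this Python sketch the tool names are assumptions: `latex` and `dvips` are the usual route from LaTeX to PS via DVI, and `ps2pdf` (Ghostscript) converts PS to PDF; the slide does not say which tool handled Word, so that edge is left as a placeholder. The function only plans the steps and does not run any converter.

```python
from collections import deque

# Conversion graph: format -> list of (target format, tool). Tool names are
# assumptions; the Word -> PS tool is unspecified and left as a placeholder.
CONVERSIONS = {
    "tex": [("dvi", "latex")],
    "dvi": [("ps", "dvips")],
    "doc": [("ps", "word-to-ps (unspecified)")],
    "ps": [("pdf", "ps2pdf")],
}


def conversion_plan(source: str, target: str):
    """Breadth-first search for the shortest chain of conversion steps.

    Returns a list of (from_format, to_format, tool) tuples, [] if the source
    is already the target, or None if no route exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for next_fmt, tool in CONVERSIONS.get(fmt, []):
            if next_fmt not in seen:
                seen.add(next_fmt)
                queue.append((next_fmt, steps + [(fmt, next_fmt, tool)]))
    return None


if __name__ == "__main__":
    for step in conversion_plan("tex", "pdf"):
        print(step)
```

Breadth-first search returns the route with the fewest conversions, which also tends to minimize the loss of quality that each conversion can introduce.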
Text transmission • FTP by the author him/herself • FTP requested by the CERN Document Server • Automatic transfer from a Web server
Citations management • The PostScript (PS) version of the document is analysed and citations are extracted automatically • If the cited document is also in the CERN database, a link is inserted next to the citation • Citations cannot always be processed safely by automatic means
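A simplified version of citation linking, assuming the plain text has already been extracted from the PostScript file and that citations can be recognised by their identifiers. The two patterns below (old-style e-print numbers such as `hep-ph/9901234` and a CERN-style report number such as `CERN-TH/99-123`) are illustrative assumptions, as is the `link_citations` name; citations given only as journal references are not handled, which is one reason automatic processing cannot always be trusted.

```python
import re

# Assumed identifier patterns: old-style e-print numbers (e.g. hep-ph/9901234)
# and a CERN-style report number (e.g. CERN-TH/99-123). Real citations vary more.
IDENTIFIER = re.compile(r"\b(?:[a-z]+(?:-[a-z]+)?/\d{7}|CERN-[A-Z]+/\d{2}-\d+)")


def link_citations(text: str, database: dict) -> list:
    """Return (identifier, record_url or None) for each citation found in `text`."""
    links = []
    for ident in IDENTIFIER.findall(text):
        # Only identifiers already in the database get a link;
        # the others stay unlinked for manual checking.
        links.append((ident, database.get(ident)))
    return links


if __name__ == "__main__":
    db = {"hep-ph/9901234": "https://cds.example.org/record/1"}  # hypothetical
    text = "See hep-ph/9901234 and CERN-TH/99-123, also astro-ph/0001001."
    for ident, url in link_citations(text, db):
        print(ident, url)
```

Returning unmatched identifiers with `None`, instead of dropping them, leaves a trail for the manual checking that the slide says is still needed.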
Documents submitted to conferences • In general they are prepared at the last minute … • Submission to the Library is often forgotten • These documents are published later: on the conference server, as printed conference proceedings, as an independent monograph, or in a specialized journal • Tracking them down requires hard, intensive work
Annual reports • Generally received on exchange • Increasingly available electronically • Now processed as periodicals • One record, several issues • Automatic claiming • Link to a new title if required
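Automatic claiming amounts to comparing the issues expected for a periodical record with those actually received. A Python sketch, assuming a hypothetical `issues_to_claim` function, issue labels such as years, and a 60-day grace period chosen only for illustration:

```python
from datetime import date


def issues_to_claim(expected: dict, received: set, today: date,
                    grace_days: int = 60) -> list:
    """List unreceived issues whose expected date plus a grace period has passed.

    `expected` maps an issue label (e.g. "1999") to the date it should arrive;
    the grace period is an illustrative assumption.
    """
    overdue = []
    for issue, due in sorted(expected.items(), key=lambda item: item[1]):
        if issue not in received and (today - due).days > grace_days:
            overdue.append(issue)
    return overdue


if __name__ == "__main__":
    expected = {
        "1997": date(1998, 6, 1),
        "1998": date(1999, 6, 1),
        "1999": date(2000, 6, 1),
    }
    print(issues_to_claim(expected, {"1997"}, date(2000, 3, 1)))  # ['1998']
```

Treating an annual report as one record with several issues is what makes this check possible: the expected schedule is attached to the title, not to each report.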
Theses • Degree and post-graduate • Prepared on CERN equipment and/or under CERN staff supervision • Generally defended 12-18 months later • Difficult to track down
Preprints electronic submission [diagram: PC, Mac and X workstations on the CERN LAN; FTP; full-text server; Aleph and ILAS; formats TeX, LaTeX, Word, HTML, TIFF, ...]
Preprints distribution of CERN grey literature [diagram: full-text server, Aleph and ILAS; access over HTTP from PC, Mac and X workstations on the CERN LAN and from the Internet; formats GIF, TIFF, PS, PDF, HTML]
Architecture: software and hardware configuration • EDS (Electronic Document Submission): submission and services; MySQL database, PHP/Perl scripting, configuration DB (EDS); SUN SPARC 450, 4 CPUs at 250 MHz, 80 GB • CDS (CERN Document Server): metadata database (Oracle DB) managed by Aleph; SUN SPARC 450, 3 CPUs at 300 MHz; Link Manager; query access via the Aleph APIs (C programming) • WebLib: WWW interface; C programming (CGI), Java interface; MySQL database, PHP/Perl scripting, configuration DB (WebLib)
Re-usability • Complete system • Modular: parts can be re-used • Software: all sources are freely distributed • Databases: • Aleph integrated system: commercial (Oracle-based) • MySQL databases: freeware • Existing configuration tools • New functions are easy to add
Tools available to the end users • Readers need to be directly involved in the search • Four groups of tools, to: • Search • Access • Transfer • Manage
Consultation & Dissemination (1) • Graphical User Interface: WebLib • All catalogues with “Find” and “Browse” • Indexes available on authors, titles, subjects, report numbers, etc. • Words searchable in all fields (including abstracts) • Output sort options • Record metadata available in HTML, LaTeX or PDF • Navigation & Search can be set up by institute, year, subject, etc. • Search history available • Download mechanism for many formats (PS, PDF, GIF, etc.) • Links from book records to booksellers' records
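Searching words in all fields, including abstracts, is typically done with an inverted index that maps each word to the records containing it. The Python sketch below shows that technique with AND semantics across query words; the function names are hypothetical, it is not WebLib's implementation, and per-field indexes (authors, titles, report numbers) would simply be separate indexes built the same way.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # Lower-case alphanumeric words; punctuation is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: dict) -> dict:
    """Map each word to the set of record ids containing it in any field."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for value in fields.values():
            for word in tokenize(value):
                index[word].add(rec_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of records containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    records = {
        1: {"title": "Higgs search at LEP", "abstract": "Limits on the Higgs mass"},
        2: {"title": "Neutrino oscillations", "abstract": "Results from LEP data"},
    }
    idx = build_index(records)
    print(sorted(search(idx, "higgs lep")))  # [1]
```

Building the index once and intersecting small sets at query time is what keeps all-fields word search fast even on a collection of several hundred thousand records.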
Consultation & Dissemination (2) • Personal Virtual Library • Results displayed in various formats (brief, detailed or personal) • Individual alert mechanism (SDI) that e-mails new records • Personal shelf (basket) to keep searches, items, formats and profiles • E-prints • Record description updated with the publication note (journal title + volume/year + starting page number) • Dynamic linking from the record to the published article • Dynamic linking from the document's citations to the cited articles • Link available to users with a subscription to the e-journal
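The SDI alert mechanism can be sketched as matching each batch of newly catalogued records against the readers' stored profiles. In this Python sketch the profile format (an e-mail address mapped to a set of single-word keywords), the matched fields (title and subject) and the `sdi_alerts` name are assumptions; the actual e-mail delivery is left out.

```python
import re


def words_of(text: str) -> set:
    # Whole-word matching, so a keyword such as "lep" does not match "leptons".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sdi_alerts(new_records: list, profiles: dict) -> dict:
    """Match new records against each reader's keyword profile.

    `profiles` maps an e-mail address to a set of keywords (hypothetical format).
    Returns e-mail -> list of matching record ids, ready to be mailed.
    """
    alerts = {}
    for email, keywords in profiles.items():
        wanted = {k.lower() for k in keywords}
        hits = [
            rec["id"]
            for rec in new_records
            if wanted & words_of(rec["title"] + " " + rec["subject"])
        ]
        if hits:
            alerts[email] = hits
    return alerts


if __name__ == "__main__":
    records = [
        {"id": "R1", "title": "Higgs boson search", "subject": "Particle Physics"},
        {"id": "R2", "title": "Accelerator optics", "subject": "Accelerators"},
    ]
    profiles = {"a@example.org": {"Higgs"}, "b@example.org": {"optics"}}
    print(sdi_alerts(records, profiles))
```

Readers with no matching record are simply omitted, so no empty alert e-mails are sent.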
Document access tools • Web • Z39.50 client/server • Different formats (PDF, PS, TIFF, GIF, HTML, …) • Document size continuously increasing • Strong need for increased bandwidth
Usage measurement • Statistics collection • By country • By IP domain • By IP number • By type of format • By time slice
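Most of these breakdowns are simple counts over the server's access log. A Python sketch, assuming each download event has already been reduced to a resolved hostname, a file format and a timestamp; the `usage_stats` name and sample hostnames are hypothetical, the top-level domain stands in for the country (generic domains such as `.edu` or `.com` do not identify one), and per-IP-number statistics are not shown.

```python
from collections import Counter
from datetime import datetime


def usage_stats(log: list) -> dict:
    """Aggregate download events by domain suffix, format and hour of day.

    Each event is (hostname, file_format, timestamp). The top-level domain
    of a resolved hostname is used as a rough proxy for the country.
    """
    by_domain = Counter(host.rsplit(".", 1)[-1].lower() for host, _, _ in log)
    by_format = Counter(fmt.upper() for _, fmt, _ in log)
    by_hour = Counter(ts.hour for _, _, ts in log)
    return {"domain": by_domain, "format": by_format, "hour": by_hour}


if __name__ == "__main__":
    log = [  # made-up sample events
        ("pc1.cern.ch", "pdf", datetime(2000, 5, 2, 9, 15)),
        ("lab.desy.de", "ps", datetime(2000, 5, 2, 9, 40)),
        ("x.slac.stanford.edu", "PDF", datetime(2000, 5, 2, 14, 5)),
    ]
    stats = usage_stats(log)
    print(stats["format"]["PDF"])  # 2
```

Coarser time slices (day, month) follow the same pattern with a different key, for example `ts.strftime("%Y-%m")`.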
How to prepare a virtual library • The final goal is to provide the end reader with a complete toolbox to search, find, reach, download, use and manage the documents • There are no universal recipes • The CERN Library tries to find its own balance between traditional and electronic literature
Basic components of the CERN virtual (digital) library • An integrated library automation system • A graphical user interface • A network with enough bandwidth • A CD-ROM LAN • An electronic document delivery tool • A collection of external electronic resources • Electronic journals • Grey literature servers • Use of the HTTP and Z39.50 (SR-U) protocols
Future of grey literature at CERN • Use of the XML format • More intensive distribution before publication • Metadata prepared directly by the author • Use of specialized network search engines
[diagram: the author prepares the article and its DC (Dublin Core) metadata; components shown: author, webmaster, website, network search service, and a step converting the DC metadata]
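One way for authors to supply Dublin Core metadata directly is to embed it in the HTML page of the article, following the `DC.element` naming convention for `<meta>` tags (RFC 2731). The Python sketch below generates such tags; the `dc_meta_tags` name is hypothetical, and an XML serialization, also listed among the future directions for grey literature at CERN, would be an equally valid target.

```python
from html import escape

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(metadata: dict) -> str:
    """Render author-supplied Dublin Core elements as HTML <meta> tags.

    `metadata` maps a DC element name (title, creator, date, ...) to a value
    or a list of values; repeated elements become repeated tags.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, values in metadata.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            # Escape &, <, > and quotes so the attribute value stays well formed.
            lines.append(
                f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)


if __name__ == "__main__":
    print(dc_meta_tags({
        "title": "Grey literature & e-prints",
        "creator": ["A. Author", "B. Author"],
        "date": "2000-05-02",
    }))
```

Tags of this form can be read by a network search service from the website without any change to the visible page.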
Conclusion • An increasingly important role for grey literature • Contraction in the number of traditional scientific publications • Exponential growth of spontaneous electronic journals