CERN Document Server
Document Management System for Grey Literature in Networked Environment
Martin Vesely
CERN, Geneva, Switzerland
Overview
• Searching Scholarly Publications
  • Why not use Google?
• Institutional Repositories
  • A natural way of managing documents at their place of origin
• Open Archives Initiative (OAi)
  • develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content
  • enhances access to e-print archives as a means of increasing the availability of scholarly communication
• Protocol for Metadata Harvesting (PMH)
  • application-independent interoperability framework
• CERN Document Server
  • Implementation of an institutional repository and information services with searching and harvesting capabilities
Searching Scholarly Publications
“Electronic capabilities should be used to provide wide access to scholarship, encourage interdisciplinary research, and enhance interoperability and searchability. Development of common standards will be particularly important in the electronic environment.”
Principles for Emerging Systems of Scholarly Publishing, Tempe, Arizona, March 2-4, 2000
Institutional Repositories
“Digital collections capturing, preserving and disseminating the intellectual output of a single or multi-university community”
SPARC, The Scholarly Publishing & Academic Resource Coalition
http://www.arl.org/sparc/
Open Archives Initiative
• Milestones of OAi:
  • Oct 1999: Santa Fe Convention
  • Nov 2000: OAi TC meeting at CERN
  • Jun 2002: OAi-PMH v.2.0 released
• Next:
  • CERN 3rd Workshop on Innovations in Scholarly Communication: Implementing the benefits of OAi
  • 12-14 February 2004, CERN, Geneva, Switzerland
  • http://info.web.cern.ch/info/OAIP/
Protocol for Metadata Harvesting
• Services: across institutional repositories (Institutional Repositories)
• Application: e.g. search engine (Information Services)
• Metadata harvesting: OAi XML
• Transfer: HTTP
  • other options exist
  • (+) HTTP is widely deployed
• Transport: communication subsystem
  • TCP/IP (internet)
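To make the transfer layer concrete, the following is a minimal sketch of how a harvester might assemble an OAi-PMH request as a plain HTTP GET URL. The base URL and the helper name `build_oai_request` are assumptions made for illustration and do not come from the slides; the verb and argument names are those of OAi-PMH v.2.0.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH v2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_' instead.
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used only for this example.
BASE_URL = "http://repository.example.org/oai2d"

url = build_oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2004-01-01"
)
print(url)
```

Because the request is an ordinary URL, any HTTP client can issue it, which is the practical benefit of choosing a widely deployed transfer protocol.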
Protocol for Metadata Harvesting
• Data provider and service provider exchange XML over HTTP (Web Services)
• Independent of:
  • Storage technology
  • Local metadata format
  • Communication subsystem
• Unified:
  • XML Schema (structure)
  • HTTP transfer
  • Data encoding
  • Data flow control
  • Common transfer metadata format
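As an illustration of the unified side, here is a minimal sketch of a service provider reading a `ListRecords` response that carries Dublin Core metadata. The sample response, its identifiers and the function name `parse_list_records` are invented for this example; only the XML namespaces and element names follow OAi-PMH v.2.0 and the `oai_dc` format.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH v2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; a real one would arrive over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-02-12T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai2d</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2004-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:creator>Doe, J</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.findall(".//dc:title", NS)],
            "creators": [e.text for e in record.findall(".//dc:creator", NS)],
        })
    # A non-empty token means the provider has more records to send.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None


records, token = parse_list_records(SAMPLE_RESPONSE)
print(records, token)
```

The harvester never needs to know how the data provider stores its records; it only relies on the shared XML structure.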
CERN Document Server
• CDS: digital library for the HEP community
• CDSware: system developed in-house
• MySQL RDBMS, Apache, Python, PHP
• MARC21 metadata format, http://www.loc.gov/
• Document submission (with flow control)
• Multilingual: Unicode
• CDSware is available under the GPL, http://cdsware.cern.ch/
  • CVS repository access
  • Free download and usage
CDSware Search Engine
• Metadata organized into navigable collections
• In-house indexing technique providing fast user-perceived search times (a fraction of a second for a typical query on a database of up to 10^6 records)
• User friendliness, Google-like guidance
• Personalization:
  • Alert engine
  • User baskets
• Combined metadata/reference/fulltext searching
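The slides do not describe the in-house indexing technique itself. As a generic illustration of why a word index keeps query times low, the sketch below uses a plain inverted index; it is an assumption chosen for the example, not the CDSware implementation, and the records and function names are made up.

```python
from collections import defaultdict


def build_index(records):
    """Map each lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, query):
    """Return ids of records containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up records, for illustration only.
RECORDS = {
    1: "Higgs boson search at LEP",
    2: "Search for supersymmetry",
    3: "Detector calibration notes",
}

index = build_index(RECORDS)
hits = search(index, "search higgs")
print(hits)
```

A query then costs a few set intersections instead of a scan over every record, which is the general idea behind sub-second search on large collections.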
CDSware overview
[Diagram. Actors: admin, author, system librarian, user. Modules: WebAccess, WebSubmit, BibConvert, BibUpload, BibHarvest, BibSched, BibIndex, BibFormat, BibData, WebPerso, WebSearch, WebBasket. External: OAI/non-OAI data provider, OAI services. Store: CDSware metadata + data.]
CDSware OAi compliancy
[Diagram. Elements: OAi request, HTTP, request parsing, flow control, database query, CDS metadata, cache, MARC XML / DC XML, OAi XML, OAi response.]
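Flow control in OAi-PMH is handled with resumption tokens: the data provider returns a partial list plus a token, and the harvester sends the token back to get the next part. The sketch below shows one way a provider could do this. The token encoding, the chunk size and the function names are assumptions for illustration and are not taken from CDSware; `badResumptionToken` is the error code defined by the protocol.

```python
import base64
import json

CHUNK_SIZE = 2  # records per response; kept tiny for the example


def encode_token(offset):
    """Pack the next offset into an opaque, URL-safe token."""
    payload = json.dumps({"offset": offset}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_token(token):
    """Recover the offset, or fail with the protocol's error code."""
    try:
        payload = base64.urlsafe_b64decode(token.encode())
        return int(json.loads(payload)["offset"])
    except (ValueError, KeyError, TypeError):
        raise ValueError("badResumptionToken") from None


def list_records(all_records, resumption_token=None):
    """Return one chunk of records and the token for the next chunk."""
    offset = decode_token(resumption_token) if resumption_token else 0
    chunk = all_records[offset:offset + CHUNK_SIZE]
    next_offset = offset + CHUNK_SIZE
    has_more = next_offset < len(all_records)
    return chunk, (encode_token(next_offset) if has_more else None)


# Harvester side: keep asking until no token comes back.
stored = ["rec1", "rec2", "rec3", "rec4", "rec5"]
harvested, token = [], None
while True:
    chunk, token = list_records(stored, token)
    harvested.extend(chunk)
    if token is None:
        break
print(harvested)
```

Keeping the token opaque lets the provider change its paging strategy later without affecting harvesters.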
CDSware References
• CDSware used or being considered by:
  • University of Missouri-Columbia, USA
  • Fundação Oswaldo Cruz (Ministry of Health), Rio de Janeiro, Brazil
  • ISDN-ENSSIB, France
  • Montreal International
  • Bologna University, Italy
  • ETH Zurich, Switzerland
  • EPF Lausanne, Switzerland
  • UN Population Fund, New York, USA
  • Instituto de Investigaciones Eléctricas, Mexico
  • Casalini Libri, Italy
  • HBZ-NRW, Germany
  • SDSC, USA
  • Aristotle University of Thessaloniki, Greece
  • RERO: Consortium de toutes les bibliothèques publiques de Suisse Romande, Switzerland
CDS at CERN
[Chart: Documents at CERN. Category labels: Articles, preprints, theses; Books; Archived items; Talks (slides, videos); Conferences; Multimedia items (photos, clips, press cuttings…); Journals. Values shown: 500 000, 50 000, 50 000, 20 000, 15 000, 14 000, 2 500.]
• 650 000 records (Grey Literature > 80%)
  • 220 000 full texts
  • 350 different collections
• 1000 new preprints per week:
  • 70% from arXiv
  • 5% from CERN
  • 25% from 80 other sources
Interoperability Issues
• Standardization efforts
  • XML Schemata and XSLT stylesheets have been specified (e.g. OAi-PMH)
  • Common metadata formats are defined (e.g. Dublin Core, MARC21)
• Semantic interoperability research
  • Structural approaches (e.g. RDF/XML)
  • Ontological interoperability
  • Subject of research in DL
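Moving between common metadata formats is usually done with a crosswalk, a table that maps fields of one format onto elements of another. The sketch below maps a handful of MARC21 fields to Dublin Core elements. The mapping table is a simplified illustration, not a complete or authoritative crosswalk, and the record and function name are made up.

```python
# Simplified, illustrative MARC21 (tag, subfield) to Dublin Core mapping.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "c"): "date",
}


def marc_to_dc(marc_fields):
    """Convert (tag, subfield, value) triples into a Dublin Core dict.

    Fields without a mapping are dropped, which shows why such
    conversions are typically lossy.
    """
    dc = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Made-up record, for illustration only.
marc_record = [
    ("100", "a", "Doe, J"),
    ("245", "a", "An example preprint"),
    ("650", "a", "Particle physics"),
    ("650", "a", "Detectors"),
    ("980", "a", "PREPRINT"),  # no mapping defined, so it is dropped
]

dc_record = marc_to_dc(marc_record)
print(dc_record)
```

A table like this handles syntactic and structural mapping only; deciding whether two fields really mean the same thing is the semantic question the slide leaves to research.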
Conclusions
• Search engines for grey literature are being widely deployed and represent a central information service in scholarly communication
• Institutional repositories are gaining momentum and becoming dominant over disciplinary repositories
• Standardized frameworks for distributed and federated document processing have been established
• Information interoperability has been achieved at the syntactic and structural/schematic levels, whereas semantic interoperability remains a research issue
• CDSware implements OAi-PMH and is freely available (GNU GPL)
Contact
• CERN Document Server
  • http://cds.cern.ch/
  • http://cdsweb.cern.ch/
• CDSware sources and demo
  • http://cdsware.cern.ch/
  • http://cdsware.cern.ch:8000/DEMOPLUS/
• Contact
  • cds.support@cern.ch
  • martin.vesely@cern.ch