Explore the challenges and solutions for managing scientific data in the digital age at CERN. Discover the significance of digital libraries, open access publishing, archival preservation, and more in scholarly communication. Learn about data retrieval, indexing, and the transition from print to digital formats. Join the discussion on the future of information dissemination and collaboration in the scientific community.
Digital Libraries and e-Archiving at CERN: Challenges and Solutions for the Scientific Community “First” 28th September 2006 Tim Smith CERN/IT
Why Such A Hot Topic? • Software: ... • National repositories: ... • National strategies: ... • International initiatives: The European Library ... • Conferences: ECDL, iPres, ... • Industry: Google Scholar / Book • WWW + Google + Internet archive • Not enough? • Data ≠ Information ≠ Knowledge
Scholarly Communication Publisher Copy editing Consistency Conventions Refereeing Publication Dissemination Library Subscription Collection mgmt Classification Cataloguing Indexing Reference retrieval Archival Search Access Reader Library/Journal Subscription Communities Find Author Manuscript preparation Digital Library WWW
Digital Library Services Aggregation Collection Conversion > 100 sources Expose CERN authored material Organisation Enrichment Stamping Watermarking Indexing Ranking Clustering Classifying
Open Access • Scholarly publication ≠ trade publication • Signatory of Berlin Declaration • Author grants • free, irrevocable, worldwide, perpetual right of access, … • Store in repository • Unrestricted distribution, interoperability, long-term archiving, …
Digital Age Services • Thus far, changed form not function • Reproduced paper chain • Take advantage of native digital services • Collaboration • Comments, reviews, baskets • Immediacy • Email alerts, RSS feeds • Intensive tasks • Keyword & citation extraction • Full text indexing & ranking • Conversion services: multiple download formats • Flexible formats • Remove constraints of print versions • Internationalisation
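As an illustration of the "full text indexing & ranking" item above, here is a minimal sketch of an inverted index with a simple TF-IDF style score. It is not the ranking used by CDS; the class name `InvertedIndex`, the tokenizer, the scoring formula, and the record identifiers are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index: term -> {record id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one record's full text."""
        self.doc_count += 1
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Return record ids ordered by descending score (ties by id)."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rarer terms weigh more: a smoothed inverse document frequency.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("r1", "Search for the Higgs boson")
    index.add("r2", "Detector calibration log book")
    index.add("r3", "Higgs couplings")
    print(index.search("higgs boson"))
```

A record matching both query terms ranks above one matching only the more common term, which is the behaviour a reader would expect from ranked full-text search.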
Digital Age Processes • Thus far, same actors and processes • Print medium was difficult to produce, distribute, archive, duplicate • Not so for electronic media! • Publisher's role: certification and dissemination • How to get in (digital world) • Authority, Authenticity, Quality • Exploring new forms of peer review • Open Access publishing: CERN initiative • Author-pay model • Break the vicious circle: Tenure / grant allocation
Advocacy and Coverage • Legal deposit • Natural focal point: everything passed through publisher/printer • Encouraging / promoting deposit • CERN publishing policy – deposit in eArchive • Harvesting • CDS missing submissions • Theoretical papers: close to 100% • Experimental papers: average, about 70% • Instrumentation papers: only 30%
Digital Age Content • Multimedia • CPU intensive services: web download format preparation from masters • Data behind the publication • Experimental data sets • Log books • Institutional information • Multimedia records of the experiment life-cycle • Financial, social etc • Dissemination of unfinished, unrefereed work
Video Archives EGEE Interview: Bob Jones 0120kbps (2439 kb), 0480kbps (9814 kb), 1000kbps (20702 kb) 2000kbps (40092 kb), Multirate120 1000kbps (32977 kb)
CDS Content and Usage 70% non-CERN
Not “born-digital” • Multimedia archive project • Metadata: key to retrieval • Photo-caption project (retirees) VHS 1980s Beta SP 1980s Open reel Audio 1950s U-matic 1970s
Digitisation for Preservation • Deposit in Digital Library • Improve access • Halt deterioration of objects • Archiving of knowledge to preserve perennial access • Institutional archives • Subject Archives • Digital preservation needs • Strategies • Certification • Networks of backups • Storage model
Perpetual Access • Active curation • Used to be largely passive until conservation work required • Technology obsolescence • Not always possible to create exact digital copy or replicate appearance • Changing media or file format • Need to verify integrity, authenticity, reliability • Audit trails and checksums: to catch transcription errors (or deliberate changes) • Associated metadata • Digital object and metadata encapsulation: ISO 14721 OAIS • Multiple copies for security • Across different administrations: Los Alamos declass reps • LOCKSS and CLOCKSS
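The integrity check mentioned above (audit trails and checksums) can be sketched as a fixity audit: record a checksum manifest when objects are deposited, then recompute and compare after each media or format migration. This is a minimal sketch, not a description of CERN's tooling; the function names `sha256_of`, `build_manifest` and `audit`, and the choice of SHA-256, are assumptions for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive's current state with a previously stored manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in shared if manifest[k] != current[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp)
        (archive / "report.txt").write_bytes(b"original text")
        stored = build_manifest(archive)           # taken at deposit time
        (archive / "report.txt").write_bytes(b"0riginal text")  # simulated corruption
        print(audit(archive, stored))
```

A checksum only detects that a copy has changed; repairing it still depends on the "multiple copies" strategy, with each copy audited against the same manifest.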
Outlook • CERN is implementing solutions to manage 100s of PBs of LHC data • CERN’s knowledge is being amassed in a Digital Library which is “safe on a 10yr timescale” • DB migration, redundancy, backups • Long term preservation (100yr timescale) is an unsolved problem, but lots of initiatives • Bringing together IT specialists, librarians, archivists, museum curators, (authors) ...