Metadata management and digitization project for CERN Council documents
Sandrine Reyes, CERN Scientific Information Service
Outline: Council, Project, Analysis, Goal, Procedure, Results
CERN Council
• The highest authority of the Organization
• 4 meetings per year
• Delegations from 20 Member States
• Controls CERN’s activities in scientific, technical and administrative matters:
  • Appointment of the Director-General and senior staff
  • Adoption of budgets
  • Approval of the scientific programmes
  • Research and development for present and future accelerators at CERN, etc.
• According to the CERN Convention, adopted on 1 July 1953, the Council is assisted by:
  • the Scientific Policy Committee (SPC)
  • the Finance Committee (FC)
Digitization project
• In September 2008 the CERN Council "approved the President’s proposal to have all paper copies of past Council documents scanned with a view to making them available electronically from the new Council web pages".
• The digitization project started in January 2009 with the Council documents, followed by the Finance Committee documents.
Documents
• More than 21 000 minutes and related documents, published in the two official languages of CERN: French and English
  • 7 500 for the Council (C) and the Committee of Council (CC)
  • 12 500 for the Finance Committee (FC)
  • 1 500 for the Scientific Policy Committee (SPC)
• A document may concern 1, 2 or 3 committees
• Each document has at least one report number: CERN/****, CERN/CC/****, CERN/FC/****, CERN/SPC/****
• CERN access rules (to respect the confidentiality of the documents):
  • 30 years for Committee of Council, FC and SPC documents
  • 5 years for non-confidential Council documents
  • 30 years for confidential Council documents
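The report-number prefixes and the access periods above can be combined into a small lookup. The Python sketch below is hypothetical: `committee_of`, `release_year` and the example report numbers are illustrative, the asterisks in CERN/**** are assumed to stand for digits, the periods are assumed to count in years from the document's year, and a document concerning several committees is assumed to follow the longest period.

```python
import re
from typing import Optional

# ASSUMPTION: the slide writes report numbers as CERN/****, CERN/CC/****, etc.;
# here the asterisks are taken to be a numeric sequence.
REPORT_PATTERNS = {
    "C": re.compile(r"^CERN/\d+"),
    "CC": re.compile(r"^CERN/CC/\d+"),
    "FC": re.compile(r"^CERN/FC/\d+"),
    "SPC": re.compile(r"^CERN/SPC/\d+"),
}


def committee_of(report_number: str) -> Optional[str]:
    """Return the committee code encoded in a report number, or None."""
    for committee, pattern in REPORT_PATTERNS.items():
        if pattern.match(report_number):
            return committee
    return None


def restriction_years(committee: str, confidential: bool) -> int:
    """Access period in years, following the rules listed on the slide."""
    if committee == "C":
        return 30 if confidential else 5
    return 30  # Committee of Council, FC and SPC documents


def release_year(doc_year: int, report_numbers: list, confidential: bool) -> int:
    """Year from which a document could be opened (hypothetical helper).

    ASSUMPTION: a document concerning several committees follows the
    longest applicable period.
    """
    committees = {committee_of(rn) for rn in report_numbers} - {None}
    if not committees:
        raise ValueError(f"no recognised report number in {report_numbers}")
    return doc_year + max(restriction_years(c, confidential) for c in committees)
```

For instance, a non-confidential Council document from 1965 that also carries an FC number would follow the 30-year rule rather than the 5-year one.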
Metadata
• More than 21 000 records accessible in CDS
• CDS metadata imported from the Council Secretariat's FileMaker Pro database
• Metadata not in MARC21 format
• 2 records per document: English and French
• Records missing (in one language or both)
• Full texts: existing, incorrect or missing
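Finding which language versions are missing amounts to grouping records by report number. The minimal Python sketch below assumes a hypothetical record shape (`recid`, `lang`, `report`), a hypothetical `missing_languages` helper, and a complete list of report numbers (for example from the Secretariat's FileMaker Pro export), which is needed to catch documents missing in both languages.

```python
from collections import defaultdict


def missing_languages(records, all_reports, expected=("en", "fr")):
    """Return {report number: [missing language codes]}.

    records: iterable of dicts such as {"recid": 12, "lang": "en", "report": "CERN/0589"}
             (hypothetical shape).
    all_reports: every report number known to exist, so that documents with
                 no record at all (missing in both languages) are also reported.
    """
    found = defaultdict(set)
    for rec in records:
        found[rec["report"]].add(rec["lang"])

    missing = {}
    for report in all_reports:
        absent = [lang for lang in expected if lang not in found.get(report, set())]
        if absent:
            missing[report] = absent
    return missing
```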
Tools
• CDSWeb:
  • BibEdit: MARC21 editor (cataloguing module)
  • Invenio database (OPAC module)
  • Submission with GMI
• 2 working methods:
  • Item by item, with BibEdit as editor
  • Global method: CDS data extraction with Boolean searches, on a Unix operating system with the Emacs editor
Tools
Records in BibEdit: French and English versions
Goal
• Obtain 1 record per document: merge the English and French records
• Standardize the data
• Adapt the records to the MARC21 format
• Complete the cataloguing
• Provide access to the English and French full texts, including Optical Character Recognition (OCR)
• Respect the access rules
• Improve the CDS visualization (brief and detailed formats)
Procedure: cataloguing template and standardization
• Establish a cataloguing template describing all the fields used
• Standardize fields 111, 711 and 269, for example:
  269__c : 16 / 17 juin 1965 ➩ 269__c : 16 - 17 Jun 1965
  269__c : 6 février 1965 ➩ 269__c : 06 Feb 1965
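The 269__c conversion shown above can be automated. A minimal Python sketch, assuming the source values follow the two patterns on the slide (a day or a "day / day" range, a French month name, a four-digit year), that both days are zero-padded, and that "1er" may appear for the first of the month; `normalize_269c` is a hypothetical name, not an Invenio function.

```python
import re
import unicodedata

# French month names (accents removed) -> English abbreviations used in 269__c
FRENCH_MONTHS = {
    "janvier": "Jan", "fevrier": "Feb", "mars": "Mar", "avril": "Apr",
    "mai": "May", "juin": "Jun", "juillet": "Jul", "aout": "Aug",
    "septembre": "Sep", "octobre": "Oct", "novembre": "Nov", "decembre": "Dec",
}

# day [/ day] month year, e.g. "16 / 17 juin 1965" or "1er aout 1972"
DATE_RE = re.compile(
    r"^\s*(\d{1,2})(?:er)?(?:\s*/\s*(\d{1,2})(?:er)?)?\s+([A-Za-z]+)\s+(\d{4})\s*$"
)


def strip_accents(text: str) -> str:
    """Remove combining marks so that 'février' and 'fevrier' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_269c(value: str) -> str:
    """Convert a French-style date to 'DD Mon YYYY' or 'DD - DD Mon YYYY'."""
    match = DATE_RE.match(strip_accents(value))
    if not match:
        raise ValueError(f"unrecognised 269__c value: {value!r}")
    day1, day2, month, year = match.groups()
    abbrev = FRENCH_MONTHS.get(month.lower())
    if abbrev is None:
        raise ValueError(f"unknown month in 269__c value: {value!r}")
    days = f"{int(day1):02d}"
    if day2:
        days += f" - {int(day2):02d}"
    return f"{days} {abbrev} {year}"
```

Values that do not match raise an error instead of being guessed, so they can be corrected item by item in BibEdit.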
Cataloguing template
Procedure: Uploader and MARC21 format
• With a specific configuration, the Uploader program adapts the metadata to the MARC21 format
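The slide does not describe the Uploader's configuration. As a hedged illustration of what adapting flat metadata to MARC21 involves, the Python sketch below maps columns of an export row to Invenio-style field codes and writes MARCXML with the standard library. `FIELD_MAP`, its column names and the tag choices are hypothetical; only the MARCXML namespace and the record/datafield/subfield layout are standard.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARC elements without a prefix

# Hypothetical mapping: export column -> Invenio-style field code
# (3-digit tag, 2 indicators with '_' meaning blank, 1 subfield code)
FIELD_MAP = {
    "report_number": "088__a",
    "title": "245__a",
    "date": "269__c",
    "meeting": "111__a",
}


def split_field_code(code: str):
    """'269__c' -> ('269', ' ', ' ', 'c')."""
    tag, ind1, ind2, subfield = code[:3], code[3], code[4], code[5]
    return tag, ind1.replace("_", " "), ind2.replace("_", " "), subfield


def to_marcxml(row: dict) -> str:
    """Build one MARCXML <record> from a flat export row; empty values are skipped."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for column, field_code in FIELD_MAP.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue
        tag, ind1, ind2, subfield = split_field_code(field_code)
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": subfield}).text = value
    return ET.tostring(record, encoding="unicode")
```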
Procedure: barcodes and digitization
• Equip the documents with barcodes
• Add the barcode data to field 088__9 of the English records (data extraction and import)
• Send the documents to the scanning service (CERN Printshop or a digitization company in India)
Diagram labels: Council databases, Excel, Emacs (Unix), CDS-Invenio, barcode data, report number, import
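Adding the barcodes to field 088__9 is essentially a join between the barcode list and the English records. A minimal Python sketch, assuming the barcode data is exported (for example from Excel) as CSV with hypothetical `report_number` and `barcode` columns, and that a report-number-to-system-number lookup for the English records already exists; `barcode_updates` and the barcode values in the test are invented for illustration.

```python
import csv
import io


def barcode_updates(barcode_csv: str, english_recids: dict):
    """Pair barcodes with English records by report number.

    barcode_csv: CSV text with 'report_number,barcode' columns (hypothetical layout).
    english_recids: report number -> system number of the English record.
    Returns (updates, unmatched): updates are (recid, field code, barcode) tuples;
    unmatched lists report numbers with no English record, for manual review.
    """
    updates, unmatched = [], []
    for row in csv.DictReader(io.StringIO(barcode_csv)):
        report = row["report_number"].strip()
        barcode = row["barcode"].strip()
        recid = english_recids.get(report)
        if recid is None:
            unmatched.append(report)
        else:
            updates.append((recid, "088__9", barcode))
    return updates, unmatched
```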
Procedure: Excel – Emacs (Unix)
Procedure: French titles
• Add the French titles to field 246__a of the English records by extracting and importing the data (with Boolean formulae) ➩ respect the UTF-8 character encoding
• Take the content of field 245__a in the French records and add it to field 246__a in the English record, together with $$iTitre francais
Diagram: 245__a ➩ 246__a
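The French-title step can be sketched as follows. This is a hypothetical Python illustration, not the Boolean-formula extraction used in the project: it pairs records by report number, decodes the French 245__a strictly as UTF-8 (so a title stored in another encoding fails instead of being imported garbled), and builds the field in the `246__ $$a<title>$$iTitre francais` notation used on the slide. The function names and input shapes are assumptions.

```python
import unicodedata


def french_title_246(french_245a: bytes) -> str:
    """Build the 246 field for an English record from the French record's 245__a."""
    # Strict UTF-8 decoding: a title extracted in another encoding raises
    # UnicodeDecodeError here instead of being imported garbled.
    title = unicodedata.normalize("NFC", french_245a.decode("utf-8")).strip()
    return f"246__ $$a{title}$$iTitre francais"


def french_titles_for_english(english_recids: dict, french_titles: dict) -> dict:
    """Map English record id -> 246 field, pairing records by report number.

    english_recids: report number -> system number of the English record
    french_titles:  report number -> raw 245__a bytes of the French record
    """
    fields = {}
    for report, raw_title in french_titles.items():
        recid = english_recids.get(report)
        if recid is not None:
            fields[recid] = french_title_246(raw_title)
    return fields
```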
Procedure: French records, digitization, OCR, ChKall
• Delete the French records (980__a:DELETE…)
• Digitization of documents:
  • by the CERN Printshop:
    • Run the Xenu program to detect incorrect URLs
    • Send the URLs to the Computing Service for OCR with the OCRopus program (developed by Google)
  • by the Indian company:
    • Download the full texts by FTP from the Indian server to CDSWeb
    • A script finds the system numbers and the language of each document
    • URLs are generated in the records with BibDocFile
• Check program (ChKall): verification tool developed to check the correct formatting of the metadata
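ChKall itself is not described in detail. The Python sketch below only illustrates the kind of formatting checks such a tool can run on a merged record; the record shape, the hypothetical `check_record` name and the field choices (088__a for the report number in particular) are assumptions.

```python
import re

# Standardized 269__c form, e.g. "06 Feb 1965" or "16 - 17 Jun 1965"
DATE_OK = re.compile(
    r"^\d{2}( - \d{2})? (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}$"
)
# ASSUMPTION: report numbers are CERN/, CERN/CC/, CERN/FC/ or CERN/SPC/ plus digits
REPORT_OK = re.compile(r"^CERN/((CC|FC|SPC)/)?\d+")


def check_record(record: dict) -> list:
    """Return a list of formatting problems for one merged record.

    record: Invenio-style field code -> list of values (hypothetical shape).
    """
    problems = []
    reports = record.get("088__a", [])
    if not reports:
        problems.append("088__a: missing report number")
    for rn in reports:
        if not REPORT_OK.match(rn):
            problems.append(f"088__a: unexpected report number {rn!r}")
    for d in record.get("269__c", []):
        if not DATE_OK.match(d):
            problems.append(f"269__c: non-standard date {d!r}")
    if not record.get("245__a"):
        problems.append("245__a: missing English title")
    if not record.get("246__a"):
        problems.append("246__a: missing French title")
    if "DELETE" in record.get("980__a", []):
        problems.append("980__a: record is marked as deleted")
    return problems
```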
Results: visualization in CDS
• Brief format
• Detailed format
Results: search in CDS metadata
Results: search in CDS full texts
Questions?