OAI @ CERN

OAI @ CERN. OAI Open Day for Europe, February 26th 2001, Berlin, Germany. Jean-Yves Le Meur, CERN Document Server project leader. Background: the CERN Library contains HEP documents: preprints, books, journals, photos, notes, presentations, meeting agendas, etc.

Presentation Transcript


  1. OAI @ CERN
     OAI Open Day for Europe, February 26th 2001, Berlin, Germany
     Jean-Yves Le Meur, CERN Document Server project leader, CERN

  2. Background: CERN Library
     It contains:
     • HEP documents: preprints, books, journals, photos, notes, presentations, meeting agendas, etc.
     • 430 000 bibliographic records; 170 000 full-text documents
     • Aleph 300 library system (Ex Libris)
     • Customized Web interface: WebLib
     • Software built on top of the Aleph APIs (RPC)
     • Two main servers: weblib and doc
     • A separate MySQL database for "non-library" documents

  3. Community
     Users are:
     • Physicists at CERN and all over the world
     • Distinct hosts counted in 2000: a total of 127 000 distinct hosts, of which 8 000 at CERN, 93 000 outside CERN and 26 000 with unresolved IP addresses
     • On average, 20 000 distinct hosts per month

  4. OAI @ CERN history
     Metadata acquisition (since 1994):
     • Manual: collection of scanned documents
     • Electronic:
       • Web and email submission mechanism
       • Uploader application for metadata transformation
       • Checked by a human
     • Long-term storage system with an open interface for collecting the metadata
     Involvement in OAI (1999):
     • Close follow-up since the Santa Fe meeting
     • Straightforward objectives for CERN:
       • Simplified metadata exchange
       • Savings on metadata proofreading

  5. OAI 1.0 @ CERN status
     A test collection:
     • composed of books and eprints
     • 30 000 records extracted from our library system
     • stored in a MySQL database (based on MARC 21)
     OAI 1.0 compliance:
     • three metadata formats supported: oai_dc, oai_marc and oai_rfc1807
     • all six verbs implemented: Identify, ListSets, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords
     • identifiers of the form oai:cerncds:xxxx ready, but not yet in production
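The six verbs and three formats listed above are all a harvester needs to address this repository. The following is a minimal Python sketch of how a client might build OAI 1.0 GET requests and reject obviously invalid ones. The `REQUIRED_ARGS` table reflects our reading of the 1.0 specification, resumption tokens are deliberately left out, and `oai_request_url` is a hypothetical helper, not part of the CERN software.

```python
from urllib.parse import urlencode

# Required arguments per OAI 1.0 verb, as we read the 1.0 specification.
# Optional arguments (from, until, set, identifier for ListMetadataFormats)
# are passed through unchecked; resumptionToken handling is left out.
REQUIRED_ARGS = {
    "Identify": set(),
    "ListSets": set(),
    "ListMetadataFormats": set(),
    "GetRecord": {"identifier", "metadataPrefix"},
    "ListIdentifiers": set(),
    "ListRecords": {"metadataPrefix"},
}

# The three metadata formats this repository advertises.
SUPPORTED_PREFIXES = {"oai_dc", "oai_marc", "oai_rfc1807"}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI 1.0 GET request URL after basic sanity checks."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown verb: {verb}")
    missing = REQUIRED_ARGS[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} is missing {sorted(missing)}")
    prefix = args.get("metadataPrefix")
    if prefix is not None and prefix not in SUPPORTED_PREFIXES:
        raise ValueError(f"unsupported metadataPrefix: {prefix}")
    # urlencode percent-encodes reserved characters, e.g. ':' becomes %3A.
    return base_url + "?" + urlencode({"verb": verb, **args})
```

With the base URL from the Identify example, `oai_request_url(base, "GetRecord", identifier="oai:cerncds:2229111", metadataPrefix="oai_dc")` produces the same query string as the GetRecord request URL shown in the examples, because `urlencode` encodes each `:` in the identifier as `%3A`.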

  6. Implementation
     Existing infrastructure:
     • MARC 21 in use at CERN
     • MySQL database with a PHP interface
     • Advanced search interface
     • Multiple output (display) formats
     OAI "plug-ins":
     • New arguments (verb=, etc.) added to the search.php engine
     • New output formats added to the supported set
     • Effort: about three full working days
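The "plug-in" approach described above, one extra argument plus a few extra output formats on an existing search engine, can be sketched as follows. The real engine is PHP (search.php); this Python version is only an illustration, and the in-memory record store, the `recid` and `of` parameter names, and the tag-to-element mapping (MARC 21 100 to DC creator, 245 to DC title) are all assumptions.

```python
from xml.sax.saxutils import escape

# Hypothetical in-memory record store, keyed by internal system number.
# Keys are MARC 21 tags: 100 = main entry (personal name), 245 = title.
RECORDS = {
    2229111: {
        "100": "Katz, Ulrich F",
        "245": "Deep inelastic positron-proton scattering in the "
               "high-momentum-transfer regime of HERA",
    },
}


def format_brief(rec: dict) -> str:
    """A pre-existing (hypothetical) display format."""
    return f"{rec['100']}: {rec['245']}"


def format_oai_dc(rec: dict) -> str:
    """New output format: a minimal, un-namespaced oai_dc body."""
    return ("<dc>"
            f"<creator>{escape(rec['100'])}</creator>"
            f"<title>{escape(rec['245'])}</title>"
            "</dc>")


# OAI formats are simply registered next to the existing display formats.
OUTPUT_FORMATS = {"brief": format_brief, "oai_dc": format_oai_dc}


def search(params: dict) -> str:
    """Single entry point; the presence of 'verb' switches to OAI mode."""
    if "verb" in params:
        if params["verb"] != "GetRecord":
            raise NotImplementedError("this sketch only handles GetRecord")
        # oai:cerncds:<sysno> -> <sysno>
        sysno = int(params["identifier"].rsplit(":", 1)[1])
        formatter = OUTPUT_FORMATS[params["metadataPrefix"]]
    else:
        sysno = int(params["recid"])
        formatter = OUTPUT_FORMATS[params.get("of", "brief")]
    return formatter(RECORDS[sysno])
```

The design keeps the record lookup shared: OAI requests differ from ordinary searches only in how the record is selected and which formatter renders it, which is consistent with the small effort quoted on the slide.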

  7. Example: Identify

         <?xml version="1.0" encoding="UTF-8" ?>
         <Identify xmlns="http://www.openarchives.org/OAI/1.0/OAI_Identify"
             xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_Identify http://www.openarchives.org/OAI/1.0/OAI_Identify.xsd">
           <responseDate>2001-02-23T10:59:44+01:00</responseDate>
           <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=Identify</requestURL>
           <repositoryName>CERN Document Server</repositoryName>
           <baseURL>http://cdsdev.cern.ch/casalini/search.php</baseURL>
           <protocolVersion>1.0</protocolVersion>
           <adminEmail>mailto:cds.support@cern.ch</adminEmail>
           <description>
             <oai-identifier xmlns="http://www.openarchives.org/OAI/oai-identifier"
                 xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                 xsi:schemaLocation="http://www.openarchives.org/OAI/oai-identifier http://www.openarchives.org/OAI/oai-identifier.xsd">
               <scheme>oai</scheme>
               <repositoryIdentifier>cerncds</repositoryIdentifier>
               <delimiter>:</delimiter>
               <sampleIdentifier>oai:cerncds:1</sampleIdentifier>
             </oai-identifier>
           </description>
           <description>
             <eprints xmlns="http://www.openarchives.org/OAI/eprints"
                 xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                 xsi:schemaLocation="http://www.openarchives.org/OAI/eprints http://www.openarchives.org/OAI/eprints.xsd">
               <content>
                 <URL>http://cdsdev.cern.ch/casalini/</URL>
               </content>
               <metadataPolicy>
                 <text>Free and unlimited use by anybody.</text>
                 <URL>http://cdsdev.cern.ch/casalini/</URL>
               </metadataPolicy>
               <dataPolicy>
                 <text>Full content, i.e. preprints may not be harvested by robots</text>
               </dataPolicy>
               <submissionPolicy>
                 <URL>http://cdsdev.cern.ch/casalini/</URL>
               </submissionPolicy>
             </eprints>
           </description>
         </Identify>
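On the harvesting side, the Identify response tells a client where the repository lives and how its identifiers are built. A minimal sketch using Python's standard `xml.etree.ElementTree`: the namespace URIs are copied from the response above, while `parse_identify` and the keys of the returned dictionary are our own illustrative choices.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the Identify response above.
NS = {
    "idf": "http://www.openarchives.org/OAI/1.0/OAI_Identify",
    "oid": "http://www.openarchives.org/OAI/oai-identifier",
}


def parse_identify(xml_data: bytes) -> dict:
    """Extract the fields a harvester needs from an OAI 1.0 Identify response."""
    root = ET.fromstring(xml_data)
    info = {
        tag: root.findtext(f"idf:{tag}", namespaces=NS)
        for tag in ("repositoryName", "baseURL", "protocolVersion", "adminEmail")
    }
    # The optional oai-identifier description says how identifiers are built.
    oai_id = root.find("idf:description/oid:oai-identifier", NS)
    if oai_id is not None:
        for tag in ("scheme", "repositoryIdentifier", "delimiter", "sampleIdentifier"):
            info[tag] = oai_id.findtext(f"oid:{tag}", namespaces=NS)
    return info
```

For the response above this yields `repositoryIdentifier` `cerncds` and delimiter `:`, so a harvester can check that every identifier it receives starts with `oai:cerncds:`.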

  8. Example: ListMetadataFormats

         <?xml version="1.0" encoding="UTF-8" ?>
         <ListMetadataFormats xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats"
             xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats.xsd">
           <responseDate>2001-02-23T11:04:25+01:00</responseDate>
           <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=ListMetadataFormats</requestURL>
           <metadataFormat>
             <metadataPrefix>oai_dc</metadataPrefix>
             <schema>http://www.openarchives.org/OAI/dc.xsd</schema>
             <metadataNamespace>http://purl.org/dc/elements/1.1/</metadataNamespace>
           </metadataFormat>
           <metadataFormat>
             <metadataPrefix>oai_marc</metadataPrefix>
             <schema>http://www.openarchives.org/OAI/oai_marc.xsd</schema>
           </metadataFormat>
           <metadataFormat>
             <metadataPrefix>oai_rfc1807</metadataPrefix>
             <schema>http://www.openarchives.org/OAI/rfc1807.xsd</schema>
           </metadataFormat>
         </ListMetadataFormats>
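A harvester typically reads this list once and then picks a prefix for its ListRecords requests. A minimal sketch: `parse_metadata_formats` maps each advertised prefix to its schema URL, and `choose_prefix` picks the first prefix from a preference list. Both helpers are hypothetical, and the preference order (oai_marc before oai_dc, on the grounds that MARC carries the full record) is an assumption a real harvester would tune.

```python
import xml.etree.ElementTree as ET

NS = {"lmf": "http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats"}


def parse_metadata_formats(xml_data: bytes) -> dict:
    """Map each advertised metadataPrefix to its schema URL."""
    root = ET.fromstring(xml_data)
    return {
        fmt.findtext("lmf:metadataPrefix", namespaces=NS):
            fmt.findtext("lmf:schema", namespaces=NS)
        for fmt in root.findall("lmf:metadataFormat", NS)
    }


def choose_prefix(formats: dict, preferred=("oai_marc", "oai_dc")) -> str:
    """Return the first preferred prefix the repository supports."""
    for prefix in preferred:
        if prefix in formats:
            return prefix
    raise LookupError("no common metadata format")
```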

  9. Example: GetRecord

         <?xml version="1.0" encoding="UTF-8" ?>
         <GetRecord xmlns="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
             xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_GetRecord http://www.openarchives.org/OAI/1.0/OAI_GetRecord.xsd">
           <responseDate>2001-02-23T11:09:17+01:00</responseDate>
           <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=GetRecord&amp;identifier=oai%3Acerncds%3A2229111&amp;metadataPrefix=oai_dc</requestURL>
           <record>
             <header>
               <identifier>oai:cerncds:2229111</identifier>
               <datestamp>2000-11-16</datestamp>
             </header>
             <metadata>
               <dc xmlns="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                   xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://www.openarchives.org/OAI/dc.xsd">
                 <subject>Accelerators and Storage Rings</subject>
                 <creator>Katz, Ulrich F</creator>
                 <title>Deep inelastic positron-proton scattering in the high-momentum-transfer regime of HERA</title>
               </dc>
             </metadata>
           </record>
         </GetRecord>
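Note that `<record>`, `<header>` and `<metadata>` inherit the GetRecord default namespace, while `<dc>` declares its own. A minimal parsing sketch that respects this split; `parse_get_record` and its output shape are illustrative choices, and Dublin Core elements are collected into lists because they may repeat.

```python
import xml.etree.ElementTree as ET

NS = {
    "gr": "http://www.openarchives.org/OAI/1.0/OAI_GetRecord",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_get_record(xml_data: bytes) -> dict:
    """Return the header fields and the Dublin Core elements of one record."""
    root = ET.fromstring(xml_data)
    record = root.find("gr:record", NS)
    header = record.find("gr:header", NS)
    result = {
        "identifier": header.findtext("gr:identifier", namespaces=NS),
        "datestamp": header.findtext("gr:datestamp", namespaces=NS),
        "dc": {},
    }
    dc = record.find("gr:metadata/dc:dc", NS)
    if dc is not None:
        for element in dc:
            # ElementTree tags look like "{namespace}creator"; keep the local part.
            name = element.tag.split("}", 1)[-1]
            # DC elements may repeat (several creators, subjects...), so keep lists.
            result["dc"].setdefault(name, []).append(element.text or "")
    return result
```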

  10. Example: ListIdentifiers

          <?xml version="1.0" encoding="UTF-8" ?>
          <ListIdentifiers xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
              xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
              xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers.xsd">
            <responseDate>2001-02-23T11:15:37+01:00</responseDate>
            <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=ListIdentifiers</requestURL>
            <identifier>oai:cerncds:101</identifier>
            <identifier>oai:cerncds:103</identifier>
            <identifier>oai:cerncds:105</identifier>
            <identifier>oai:cerncds:107</identifier>
            <identifier>oai:cerncds:108</identifier>
            <identifier>oai:cerncds:109</identifier>
            <identifier>oai:cerncds:110</identifier>
            <identifier>oai:cerncds:112</identifier>
            <identifier>oai:cerncds:113</identifier>
            <identifier>oai:cerncds:117</identifier>
            <identifier>oai:cerncds:118</identifier>
            <identifier>oai:cerncds:119</identifier>
            <identifier>oai:cerncds:120</identifier>
            <identifier>oai:cerncds:121</identifier>
            …
          </ListIdentifiers>
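Each identifier above follows the oai-identifier pattern declared in the Identify response: scheme, repository identifier and local identifier joined by `:`. A minimal sketch that collects the identifiers from one response and recovers the local part; both helpers are hypothetical, and the defaults `cerncds` and `:` are taken from the Identify example.

```python
import xml.etree.ElementTree as ET

NS = {"li": "http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"}


def parse_list_identifiers(xml_data: bytes) -> list:
    """Return the identifiers listed in one ListIdentifiers response."""
    root = ET.fromstring(xml_data)
    return [el.text for el in root.findall("li:identifier", NS)]


def local_id(oai_id: str, repository: str = "cerncds", delimiter: str = ":") -> str:
    """Split 'oai:cerncds:101' into its parts and return the local part '101'."""
    scheme, repo, local = oai_id.split(delimiter, 2)
    if scheme != "oai" or repo != repository:
        raise ValueError(f"not an identifier of {repository}: {oai_id}")
    return local
```

Splitting at most twice keeps any further delimiters inside the local part, which the oai-identifier scheme allows.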

  11. Example: ListRecords (oai_marc)

          …
            <varfield id="072" i1=" " i2="7">
              <subfield label="a">Mathematical Physics and Mathematics</subfield>
              <subfield label="2">CERN-CDS</subfield>
            </varfield>
            <varfield id="245" i1="1" i2=" ">
              <subfield label="a">Sechs Vorträge über ausgewählte Gegenstände aus der reine Mathematik und mathematischen Physik</subfield>
            </varfield>
            <varfield id="909" i1="c" i2="a">
              <subfield label="a">BOO</subfield>
              <subfield label="b">21</subfield>
            </varfield>
          </oai_marc>
          …
              <subfield label="a">Biography, Geography, History</subfield>
              <subfield label="2">CERN-CDS</subfield>
            </varfield>
            <varfield id="100" i1=" " i2=" ">
              <subfield label="a">Leroy, Francis</subfield>
            </varfield>
            <varfield id="245" i1="1" i2=" ">
              <subfield label="a">Dictionnaire encyclopédique des prix Nobel de médecine</subfield>
            </varfield>
            <varfield id="650" i1=" " i2=" ">
              <subfield label="a">Nobel prize winners</subfield>
              <subfield label="a">chemistry</subfield>
            </varfield>
            <varfield id="909" i1="c" i2="a">
              <subfield label="a">BOO</subfield>
              <subfield label="b">21</subfield>
            </varfield>
          </oai_marc>
          …
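In oai_marc each MARC field is a `varfield` whose `id` is the MARC tag and whose `subfield` children carry a `label` (the subfield code). A minimal sketch that turns one record into a tag-indexed dictionary. Because the fragment above does not show the root element's namespace declaration, the sketch matches on local names only; `parse_oai_marc` is a hypothetical helper and the indicators (`i1`, `i2`) are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an ElementTree '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def parse_oai_marc(xml_data: bytes) -> dict:
    """Map each varfield tag (e.g. '245') to its occurrences.

    Every occurrence is a list of (label, value) subfield pairs, so
    repeated fields and repeated subfields are both preserved in order.
    """
    root = ET.fromstring(xml_data)
    fields = {}
    for var in root.iter():
        if local_name(var.tag) != "varfield":
            continue
        subfields = [(sub.get("label"), sub.text or "")
                     for sub in var if local_name(sub.tag) == "subfield"]
        fields.setdefault(var.get("id"), []).append(subfields)
    return fields
```

For the second record above, the 650 field yields two `a` subfields, "Nobel prize winners" and "chemistry", kept in document order.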

  12. Implementation Issues
      How to limit the OAI collection?
      • A subset of the whole collection
      • Selected on an existing or extra field, e.g. "CERN" in the report number, or an OAI tag inside all records
      • A fully separate test collection
      Which identifier to use?
      • Document number (meaningful, but may not exist)
      • Internal system number (always exists, but meaningless)
      How to define sets?
      • Subjects (sets) differ between HEP data providers
      • Is there a limit on the length of a set name (GET vs POST)? E.g.: Library_Catalogue:Articles_and_Preprints:Theses:Detectors_and_Experimental_Techniques
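Three of the questions above can be made concrete in a few lines. The sketch below assumes the colon-separated setSpec convention used in the example, where a set such as `A:B` is treated as a subset of `A`; it selects the OAI subset with a report-number rule and builds identifiers from the internal system number. The `report_number` and `sysno` field names are hypothetical, and the system-number choice is only one of the two options weighed on the slide.

```python
def set_ancestors(set_spec: str) -> list:
    """Expand a colon-separated setSpec into its parent sets and itself."""
    parts = set_spec.split(":")
    return [":".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


def in_oai_collection(record: dict) -> bool:
    """Hypothetical selection rule: export records whose report number mentions CERN."""
    return "CERN" in record.get("report_number", "")


def oai_identifier(record: dict, repository: str = "cerncds") -> str:
    """Build the identifier from the internal system number, which always exists."""
    return f"oai:{repository}:{record['sysno']}"
```

A record filed under the four-level setSpec in the example would then be listed in four sets, which also shows why deep hierarchies make the `set=` argument of a GET request long.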

  13. General Issues
      Harvester distinction?
      • A kind of "OAI intranet" would be useful
      • Different sets for different partners?
      OpenURL in OAI?
      • OAI is already available as a Web output format in our test collection (e.g. search by author and get OAI output)
      • An agreed protocol is necessary for searching many OAI-compliant sites in parallel
      Full-text data provider within OAI?
      • Full-text exchange with an agreed protocol
      Increase metadata quality?
      • Too few mandatory tags in DC
      • Specific tags agreed for specific communities

  14. Future
      Short term:
      • CERN as data provider … for CERN-specific collections
      • CERN as data harvester (and service provider) … for high-energy physicists
      Long-term hopes:
      • All HEP institutes OAI-compliant … for metadata AND data
      • Parallel searching possible (with the OpenURL protocol)
      • OAI also used inside CERN between various applications (Engineering Database, Administrative Documents…) to build the CERN electronic archive

  15. Questions?
      http://cds.cern.ch
      Note: Workshop on the Open Archives Initiative and Peer Review Journals in Europe, CERN, Geneva, March 22-24 2001. http://doc.cern.ch/OAI/
