The Open Archives Initiative Story Thomas Krichel http://openlib.org/home/krichel Uni. of Surrey, Hitotsubashi Uni., Long Island Uni.
About this talk • Follows essentially a historical approach • mixes in a few digital library concepts, interrupt me if you do not get some of them • does not represent an official statement • botches together various ideas from different people • benefited from funding by DLF, LANL, CLIR, JISC, DINI
UPS call 1999-07 • Ginsparg, Luce and Van de Sompel “The purpose of this call is the mobilisation of a core group to work towards achieving a universal service for author-archived literature” • emphasis on a pragmatic level of interoperability
UPS protoproto • By Krichel, Nelson and Van de Sompel • found that the main problems of interoperability between eprint initiatives are • poor metadata • no uniform identifier structure • unclear legal terms and conditions • lack of selective harvesting
Santa Fe meeting 1999-10 • Representatives of arXiv, cogprints, Highwire, NCSTRL, NDLTD, RePEc, SLAC/SPIRES and others • chaired by Lynch and Waters • sponsored by CLIR, LANL and SPARC
basic concepts • “Managed” or formal e-print archive; not papers on the web • Open e-print archive means that there is a machine interface • “record” can be metadata or metadata & full text • archive may be partitioned
business model • Inspired by RePEc initiative • Separation between data providers and service providers • Many archives • Many metadata collections • Many services
requirements & realisations • Metadata harvesting (not distributed database): OA Dienst subset • Namespace: full id=archive|record • mandatory metadata & parallel sets: OAMS and XML transport • acceptable use: gentleperson’s agreement in a provider statement • registration: primitive templates
technical model • Subset of Dienst protocol used by NCSTRL • A compatible archive responds to 4 requests • List-Partitions • List-Meta-Formats • List-Contents (partitionspec, file-after, meta-format) • Disseminate (fullID, meta-format, content-type)
Dublin Core-ish Minimal Metadata for selective harvesting • mandatory: Title, Date of Accession, Full ID, Author [R] • optional: Display ID [R], Abstract, Subject [R], Comment [R], Date for Discovery [R]
Implementation efforts • Implementation of Dienst subset • arXiv.org done • Cornell NCSTRL server done • WCR done • RePEc fails • Harvesting arXiv NCSTRL for a test library
Critique • Why OAMS, not Dublin Core? • Dienst subset carries a lot of legacy from the full Dienst protocol
development in DL community • Interest in interoperability for a long time, stated interest of the Digital Library Federation • trouble: two approaches • union catalogue: causes friction • distributed search: high entry requirement, problematic to implement
Harvard meeting 2000-05 • Vision statement: SFc a new way forward for interoperability • could the OAi develop in a more general fashion such that it can be used by different communities? • political agenda of OAi (free access) perceived as problem
San Antonio meeting 2000-06 • 45 people show a broad range of interest, which leads to the problem of not getting lost • View that SFc is a technical support infrastructure • Communities with different business and content models can adopt the framework for interoperability
San Antonio meeting 2000-06 • Carl’s reverse bubble • First there was the OAi that made the SFc. • Now there is the SFc that is implemented by more than the original OAi • discussion of what changes are required to the OAi • steering committee • attract funding to develop other application domains
Ithaca meeting 2000-09 • Experience gained with implementing & discussing the current SFc specs • aim: new spec by the end of 2000 • stable for experimentation but not definite • hope to minimise risks for implementors and maximise chances for interoperability • SFc+ to translate from eprint domain interoperability towards general domain interoperability
Abstract concepts to keep • open eprint archive --> open archive • data provider / service provider • archive management • issue of records needs to be discussed: OAMS confuses metadata and full text
Implementation features to keep • Metadata harvesting • OAi namespace • shared metadata and parallel metadata • acceptable use • registration of data and service providers
All change please, all change... • OAi DIENST replaced by OA protocol • OAi ID revised • OAMS replaced by wrapped DC • introduction of the concept of native metadata • generalised and marginalised partitions • revisited registrations
New OAi metadata • Accession date to be renamed datestamp and stripped of semantic link to the records • Full ID kept, colon used as canonical separator • unqualified DC is mandatory, but empty DC may be returned • introduction of the idea of native metadata • OAMS scrapped, Krichel and Warner to lead an EPMS discussion
Solution: encapsulate metadata <oai> <oai:fullid>dini:01</oai:fullid> <oai:datestamp>2000-09-21</oai:datestamp> <dc xmlns:dc="…"> <dc:title>Someone’s paper</dc:title> </dc> </oai>
Identifier • Identifiers point to metadata records • Concatenate • Case sensitive archive name • delimiter is a colon • anything internal to the archive appearing after that • prefixed by OAI as a pointer to a resolution mechanism
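The concatenation rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `make_full_id` and `split_full_id` are hypothetical, the archive name is assumed not to contain a colon itself, and the OAI prefix is left out because the slide does not give its exact form.

```python
def make_full_id(archive: str, local_id: str) -> str:
    """Join a case-sensitive archive name and an archive-internal id with a colon."""
    if not archive or ":" in archive:
        # Assumption: the archive name itself must not contain the delimiter.
        raise ValueError("archive name must be non-empty and contain no colon")
    return f"{archive}:{local_id}"


def split_full_id(full_id: str) -> tuple[str, str]:
    """Split at the first colon; everything after it is internal to the archive."""
    archive, sep, local_id = full_id.partition(":")
    if not sep or not archive:
        raise ValueError(f"not a full id: {full_id!r}")
    return archive, local_id
```

Splitting at the first colon only means the archive stays free to use further colons inside its own part of the identifier. No case folding is done anywhere, so archive names that differ only in case remain distinct.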
Sets • replace partitions • ONLY for a local community to implement selective harvesting • there can be zero or more sets in an archive • records can exist at interior nodes in the set hierarchy • asking for records in a set returns records in the set and in all its subsets.
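The last rule, that a request for a set also returns the records of all its subsets, might look like the following sketch. It assumes that a set is named by a colon-separated path, as the colon separator mentioned for sets in the OA protocol suggests; the set names, the record ids and the helpers `in_set` and `select` are made up for illustration.

```python
def in_set(record_set: str, requested: str) -> bool:
    """True if the record sits in the requested set or in any subset below it."""
    return record_set == requested or record_set.startswith(requested + ":")


def select(records: list[tuple[str, str]], requested: str) -> list[str]:
    """Return the ids of all (full_id, set_path) pairs matching the requested set."""
    return [full_id for full_id, set_path in records if in_set(set_path, requested)]


# Hypothetical archive content: a record may sit at an interior node ("physics").
records = [
    ("a:1", "physics"),
    ("a:2", "physics:hep"),
    ("a:3", "math"),
    ("a:4", "physicsx"),
]
```

Matching on the requested name plus a trailing colon, rather than on a bare string prefix, keeps an unrelated set such as `physicsx` from being picked up when `physics` is requested.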
OA protocol • Identify (no arguments, no exceptions) • ListMetadataFormats ([fullId]), response is the same as for the SFc • ListSets (no arguments, empty response ok) • ListRecord ([Sets] colon as separator)
OA protocol • ListContents ([sets][recordbefore] [recordafter][metaformat]) • response as before but may contain • resumption token (set,recordbefore,recordafter) • errors 206,503,302 • GetRecord (fullId) • response as before • error 404
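A harvester that follows resumption tokens could be sketched as below. This is not the protocol's wire format: the archive is represented by a hypothetical callable that takes an optional token and returns a list of full ids plus the next token (or `None` when the list is complete), and `fake_archive` is a made-up two-page archive standing in for a real ListContents response.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]


def harvest(list_contents: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield all ids, re-issuing the request while a resumption token is returned."""
    token: Optional[str] = None
    while True:
        ids, token = list_contents(token)
        yield from ids
        if token is None:
            break


def fake_archive(token: Optional[str]) -> Page:
    """Hypothetical archive that splits its answer into two partial responses."""
    pages = {
        None: (["dini:01", "dini:02"], "t1"),
        "t1": (["dini:03"], None),
    }
    return pages[token]
```

The harvester treats the token as opaque and simply hands it back, so the archive alone decides how to encode the set and record bounds it needs to resume.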
Encoding via cgi • General syntax baseurl?verb=verbname&argname=argval... • baseurl is the location of the OA v1 protocol as registered at openarchives.org • verbname is the name of the verb • argname is the name of the attribute • argval is the value of the attribute
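The general syntax above can be assembled with Python's standard library, as in this minimal sketch. The helper name `build_request` and the base URL are assumptions for illustration; the verb `GetRecord` and its `fullId` argument are taken from the protocol slides.

```python
from urllib.parse import urlencode


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build baseurl?verb=verbname&argname=argval... with URL-escaped values."""
    # The verb goes first; the remaining arguments are sorted for a stable URL.
    query = [("verb", verb)] + sorted(args.items())
    return f"{base_url}?{urlencode(query)}"


# Hypothetical base URL; a real one would be registered at openarchives.org.
url = build_request("http://example.org/oa", "GetRecord", fullId="dini:01")
```

Escaping matters here because full ids contain a colon, which `urlencode` percent-encodes in the argument value.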
Registration of archives • Metadata format registration as now, names alphanumeric and underscore • Self-description introduced in the OA protocol through the identify verb • Fields of data provider templates • Natural language name • description url • archive id • maintainer (of OA interface) email • version of OA protocol used • OA base url
Conclusion • After the Ithaca work, the OAi is set for another period of testing, with a broader set of tests than the first time. • Many idiosyncrasies of the old SFc have been removed, and that will increase the overall acceptability. • The new version one of the OAi protocol may be a bit more complicated than the SFc, but it is a lot more sound. • It still is not definite.