The OAI-PMH Harvester Plugin for the Omeka Content Management System. LIS 654 Building Digital Libraries, Fall 2011. November 03, 2011. James R. Griffin III 100356891.
The OAI-PMH Harvester Plugin for the Omeka Content Management System
LIS 654 Building Digital Libraries, Fall 2011
November 03, 2011
James R. Griffin III 100356891
Defining the OAI-PMH
• "The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP." [1]
• Thus, the OAI-PMH is a means by which digital repositories can openly and freely exchange and share metadata detailing their collections with the world.
[1] Open archives initiative protocol for metadata harvesting. (2011). Retrieved from http://www.openarchives.org/pmh/
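The "six verbs" in the quotation can be made concrete with a short sketch. The verb names below are those defined by version 2.0 of the protocol; the function name `build_request_url` and the base URL `http://example.org/oai` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request sent as an HTTP GET."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # The verb is passed as an ordinary query parameter, followed by
    # any verb-specific arguments such as metadataPrefix.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL, for illustration only.
print(build_request_url("http://example.org/oai", "Identify"))
```

Each verb is simply a query parameter on an ordinary HTTP request, which is why the protocol is described as "low-barrier".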
Installing the OAI-PMH Harvester Plugin for Omeka
• Download the plug-in from the following source: http://omeka.org/add-ons/plugins/oai-pmh-harvester/ (Note: This is a ZIP archive [like other plug-ins for Omeka])
• Upload the ZIP archive to the server wotan (Note: This can be done using any scp client such as WinSCP)
• Decompress the archive into the appropriate directory for your installation of Omeka (Note: This is typically the path /home/[USER NAME]/omeka/plugins/)
• Using the web interface, install the harvester plug-in
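The decompression step can also be scripted. The following is a minimal sketch, assuming Python is available on the server; the function name `unpack_plugin` is hypothetical, and the archive name in the commented usage line is a placeholder rather than the real file name of the download.

```python
import zipfile
from pathlib import Path


def unpack_plugin(archive_path, plugins_dir):
    """Extract a plugin ZIP archive into the Omeka plugins directory.

    Returns the top-level names the archive created (usually one folder).
    """
    plugins_dir = Path(plugins_dir)
    plugins_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(plugins_dir)
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Example call (placeholder archive name; substitute your own user name):
# unpack_plugin("harvester.zip", "/home/[USER NAME]/omeka/plugins/")
```

The plug-in still has to be installed through the web interface afterwards; extracting the archive only makes it visible to Omeka.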
The Purpose Behind the OAI-PMH
• Metadata shared using the OAI-PMH is structured in a uniform manner, so the metadata for any collection exposed on the World Wide Web can be harvested regardless of the application that hosts it
• For example, one institution can archive content using the Drupal application as a repository, while another institution can archive content using Omeka
• Using the OAI-PMH, both repositories can be configured to exchange information detailing the contents of their archived collections.
Repository Interoperability
• Unfortunately, not every digital repository has been developed using the same framework (or even the same programming language[s])
• Thus, if OAI-PMH were to attempt to institute language-specific standards for exchanging metadata, inevitably some repository application would be developed in an unsupported language
• The solution to this is the software object
OAI-PMH Metadata Objects
• For the purposes of this presentation, a software object is a means by which to structure data in a language-independent manner
• The Open Archives Initiative seeks to establish its protocol as the definitive standard for the exchange of repository metadata; language independence increases the likelihood that future repository applications (some of which will be written in languages that do not yet exist) will still employ this protocol
OAI-PMH Metadata Objects
• The metadata objects are transferred over the HyperText Transfer Protocol (HTTP)
• This means that no platform-specific binaries are needed in order to harvest OAI-PMH-compliant metadata
• (e.g. Anyone can access information detailing the contents of these archived collections using a web browser; you do not need to purchase or install any additional software)
OAI-PMH Metadata Objects
• The metadata objects are serialized using the eXtensible Markup Language (XML)
• This is mentioned for the sake of those who are enrolled in LIS650, those who have previously taken LIS650, or those who are familiar with web design
• For those unfamiliar with XML or web design itself, this simply means that this metadata can be extended and manipulated easily by web designers as well as developers
An Instance of an OAI-PMH Metadata Object
• In order to generate OAI-PMH-compliant metadata objects for one’s collection, one must first install and configure another plugin: The OAI-PMH Repository (http://omeka.org/add-ons/plugins/oai-pmh-repository/)
• Retrieving metadata from the repository: http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc
• The parameter “verb” specifies to wotan precisely what is being requested
• (e.g. “ListRecords” requests the metadata records for the items in my collections)
• The parameter “metadataPrefix” specifies to wotan precisely which metadata framework to use in the formatting of the response
• (e.g. “oai_dc” is the OAI’s format which is based upon the Dublin Core framework)
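To see how the request URL breaks down into its base address and its two parameters, here is a small sketch using Python's standard library. The variable names are chosen for illustration only; the URL is the one given on the slide.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request"
       "?verb=ListRecords&metadataPrefix=oai_dc")

parts = urlsplit(url)

# parse_qs returns a list of values per key; each key occurs once here.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Everything before the "?" identifies the repository itself.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

print(base_url)
print(params["verb"])
print(params["metadataPrefix"])
```

The base address stays the same for every request to a given repository; only the parameters change from one request to the next.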
An Instance of an OAI-PMH Metadata Object
This was retrieved by requesting the following resource: http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc

    <OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2011-11-03T19:46:59Z</responseDate> <!-- When I requested this object -->
      <request verb="ListRecords" metadataPrefix="oai_dc"> <!-- Which parameters were passed to wotan -->
        http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request
      </request>
      <ListRecords> <!-- A detailed listing of the collection records -->
        <record>
          <header>
            <identifier>oai:wotan.liu.edu/omeka/jgriffin/:5</identifier>
            <datestamp>2011-10-22T00:48:49Z</datestamp> <!-- When the record was created or last modified -->
            <setSpec>6</setSpec>
          </header>
          <metadata>
            <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.ope[...]"> <!-- The Dublin Core Elements -->
              <dc:title>/src/bin/psql/psql.c</dc:title>
              <dc:creator>Regents of the University of California</dc:creator>
              <dc:publisher> […]
          </metadata>
        </record>
      </ListRecords>
    </OAI-PMH>
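A response like this one can be read with any XML parser. The sketch below uses Python's standard library on an abbreviated copy of the record. Two assumptions are worth stating: the sample adds the namespace declarations that the slide's listing omits (the standard OAI-PMH 2.0, oai_dc and Dublin Core namespaces), and it leaves out the elided portions rather than guessing at them. The function name `parse_records` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 responses and Dublin Core elements.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated sample; namespace declarations added for well-formedness.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:wotan.liu.edu/omeka/jgriffin/:5</identifier>
        <datestamp>2011-10-22T00:48:49Z</datestamp>
        <setSpec>6</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>/src/bin/psql/psql.c</dc:title>
          <dc:creator>Regents of the University of California</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dictionary per <record> in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NAMESPACES):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NAMESPACES),
            "datestamp": record.findtext(
                "oai:header/oai:datestamp", namespaces=NAMESPACES),
            "titles": [el.text for el in record.iterfind(".//dc:title", NAMESPACES)],
            "creators": [el.text for el in record.iterfind(".//dc:creator", NAMESPACES)],
        })
    return records


for item in parse_records(SAMPLE_RESPONSE):
    print(item["identifier"], item["titles"])
```

Titles and creators are kept as lists because Dublin Core allows an element to repeat within a single record.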
Harvesting Metadata from Remote Repositories in Omeka
• The plugin is useful because it can import the metadata describing items archived in a remote repository directly into one’s own repository
• Conceptually, the mechanisms underlying this process are similar to those used in the practice of “copy cataloging”
Harvesting Metadata from Remote Repositories in Omeka
• As previously specified, the remote server must be running an OAI-PMH repository for the archived collections
• In order to demonstrate this, I can harvest from my own OAI-PMH repository: http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request
• …as well as from the Bibliothèque numérique of Université Rennes 2*: http://bibnum.univ-rennes2.fr/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc
*This source was specified by Sheila Brennan of the Roy Rosenzweig Center for History and New Media. Please see http://omeka.org/blog/2011/08/29/do-you-share-your-data/
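For those curious about what a harvest involves underneath the web interface, the following is a rough sketch of a single ListRecords request. It is not the plugin's own code (the plugin is written in PHP); the function names `fetch_over_http` and `harvest_identifiers` are hypothetical. It also reads only the first response, so a repository that delivers its records in several parts would need additional handling that is not shown here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_over_http(url):
    """Default fetcher: return the body of an HTTP GET response as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, fetch=fetch_over_http):
    """Send one ListRecords request and return the record identifiers.

    `fetch` can be replaced with any function that takes a URL and
    returns the response body, which allows testing without a network.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    root = ET.fromstring(fetch(f"{base_url}?{query}"))
    return [el.text
            for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
```

Calling `harvest_identifiers` with either of the base addresses above (everything before the "?") would issue the same request that the harvester form submits, provided the repository is reachable at the time.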
Harvesting Metadata from Remote Repositories in Omeka
• Metadata sets can be re-harvested or deleted
• While a set of records is being harvested, one is offered the ability to “kill” the process
• Should there be problems regarding the memory required by the harvester, one can modify the settings of the plugin
• The “Memory Limit” field should only be modified if a harvest fails due to a memory error
• The path for the PHP binary should always be ‘/usr/bin/php5’ on wotan
The OAI-PMH Harvester Plug-In for the Omeka Digital Archive
• Questions?
• Comments?