Open Archives Initiative – Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by Stefaan Ternier, Bram Vandenputte and Joris Klerkx (KUL)
What is OAI?
• Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html
• Six service verbs
  • Identify
  • ListMetadataFormats
  • GetRecord
  • ListRecords
  • ListIdentifiers
  • ListSets
• Allows multiple metadata formats
  • DC (Dublin Core) format is mandatory
How OAI works
• The service provider runs a harvester; the metadata provider runs a repository
• The harvester sends an HTTP request that carries an OAI verb (Identify, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords, ListSets)
• The repository answers with an HTTP response containing valid XML
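The request side of this exchange can be sketched in a few lines of Java. This is only an illustration of how a harvester might assemble the request URL: the class OaiRequestUrl and its method build are hypothetical names, not part of OAICat or of any harvester library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Builds OAI-PMH request URLs (hypothetical helper, not part of OAICat). */
final class OaiRequestUrl {

    /** Appends the verb and any further arguments as a URL-encoded query string. */
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> arg : args.entrySet()) {
            url.append('&').append(encode(arg.getKey()))
               .append('=').append(encode(arg.getValue()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Called with the base URL of the demo application, the verb ListRecords and the argument metadataPrefix=oai_lom, this yields a URL that can be pasted into a browser or fetched by a harvester.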
Try it
• Install Apache Tomcat or any other Java servlet container
• Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war
• Deploy the WAR
• Open the demo HTML page at http://localhost:8080/OAILREApp/
• Or type a service verb directly, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify
The raw XML
• By default, the resulting XML has a stylesheet attached for pretty rendering
• To remove the stylesheet, comment out the line OAIHandler.styleSheet=testoai/oaicat.xsl (prefix it with #) in the file oaicat.properties (in the WAR file or the web-app dir)
OAI XML example
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
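A harvester mostly needs two things from such a response: the record identifiers and the resumption token. The sketch below reads both with the JDK's DOM parser. The class OaiResponseReader is a hypothetical helper, and it assumes the response declares the OAI 2.0 namespace as in the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Reads identifiers and the resumption token from an OAI-PMH response (hypothetical helper). */
final class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the namespace-based lookups below
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML response", e);
        }
    }

    /** Header identifiers only; identifier elements in other namespaces (DC, LOM) are ignored. */
    static List<String> identifiers(Document doc) {
        List<String> ids = new ArrayList<String>();
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    /** Returns null when the token is absent or empty, i.e. the list is complete. */
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) {
            return null;
        }
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }
}
```

Looking elements up by namespace keeps the identifiers inside the metadata payload out of the result.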
OAICat - a Java implementation
• OAICat home at http://www.oclc.org/research/software/oai/cat.htm
• Takes care of
  • web service details
  • OAI XML specification
• The implementer has to provide three kinds of classes
  • RepositoryOAICatalog
  • RepositoryRecordFactory
  • Repository2oai_dc (Repository2oai_lom, ...), usually more than one
A sample implementation
(Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)
• Create a new web module
• Add the servlet oaiHandler to web.xml
  <servlet>
    <servlet-name>LreOAIHandler</servlet-name>
    <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
    <load-on-startup>5</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>LreOAIHandler</servlet-name>
    <url-pattern>/oaiHandler</url-pattern>
  </servlet-mapping>
(cont)
• Define the properties file location
  <context-param>
    <param-name>properties</param-name>
    <param-value>oaicat.properties</param-value>
  </context-param>
• Welcome file for testing
  <welcome-file-list>
    <welcome-file>testoai/index.html</welcome-file>
  </welcome-file-list>
Sample record
• A record with the basic fields id, url, title, descr and date
• SampleOAICatalog contains an array with 3 sample records
SampleOAICatalog.listIdentifiers
• Parameters
  • from – date to harvest from (String in ISO 8601 format)
    • date or datetime, depending on the granularity
  • to – date to harvest to (the OAI-PMH request argument is named until)
  • set – a set name, list only records from this set (if null, list all records)
    • set names classify objects in natural groups
    • every record may belong to multiple sets (or none)
  • metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
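The selection implied by these parameters can be sketched as a plain filter. The class HarvestFilter and its nested Item type are hypothetical stand-ins for the catalog and the sample record. The sketch assumes that all datestamps and bounds are UTC strings of the same granularity, so that comparing them as strings matches chronological order, and that both bounds are inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Selects records by date range and set (hypothetical sketch, not the OAICat API). */
final class HarvestFilter {

    /** Stand-in for a native record: an id, a datestamp and the sets it belongs to. */
    static final class Item {
        final String id;
        final String datestamp;   // UTC, e.g. 2007-06-09T22:38:28Z
        final List<String> sets;  // may be empty

        Item(String id, String datestamp, String... sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = Arrays.asList(sets);
        }
    }

    /** A null argument means "no restriction" for that criterion. */
    static List<Item> select(List<Item> all, String from, String until, String set) {
        List<Item> result = new ArrayList<Item>();
        for (Item item : all) {
            boolean afterFrom = from == null || item.datestamp.compareTo(from) >= 0;
            boolean beforeUntil = until == null || item.datestamp.compareTo(until) <= 0;
            boolean inSet = set == null || item.sets.contains(set);
            if (afterFrom && beforeUntil && inSet) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Filtering by metadataPrefix is left out here; it would add one more condition that asks each crosswalk whether the record is available in that format.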
SampleOAICatalog.listIdentifiers
• Must return a map with two fields
  • headers – a String iterator of OAI headers
  • identifiers – a String iterator of OAI identifiers
• Both are created by the call (rec is a SampleRecord)
  String[] header = getRecordFactory().createHeader(rec);
  headers.add(header[0]);
  identifiers.add(header[1]);
• Create the result
  Map<String, Object> listIdMap = new HashMap<String, Object>();
  listIdMap.put("headers", headers.iterator());
  listIdMap.put("identifiers", identifiers.iterator());
  return listIdMap;
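Put together, the two fragments above amount to a loop over the selected records. The sketch below shows that loop in isolation. The HeaderFactory interface is a hypothetical stand-in for the record factory, assumed to return the header XML at index 0 and the identifier at index 1, as on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Assembles the result map of listIdentifiers (sketch with a stand-in factory). */
final class ListIdentifiersResult {

    /** Hypothetical stand-in for the record factory: returns {header XML, identifier}. */
    interface HeaderFactory {
        String[] createHeader(Object nativeRecord);
    }

    static Map<String, Object> build(List<?> records, HeaderFactory factory) {
        List<String> headers = new ArrayList<String>();
        List<String> identifiers = new ArrayList<String>();
        for (Object rec : records) {
            String[] header = factory.createHeader(rec);
            headers.add(header[0]);      // serialized <header> element
            identifiers.add(header[1]);  // full OAI identifier
        }
        Map<String, Object> listIdMap = new HashMap<String, Object>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }
}
```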
getRecordFactory().createHeader(rec)
• Creates the header by calling the methods in SampleRecordFactory
  • String getOAIIdentifier(Object rec)
    • returns the full OAI identifier, e.g. "oai:oay.rep.com:id001"
  • String getDatestamp(Object rec)
    • returns the date in ISO 8601 format
  • Iterator<String> getSetSpecs(Object rec)
      ArrayList<String> list = new ArrayList<String>();
      list.add(...);
      return list.iterator();
  • Iterator<String> getAbouts(Object rec)
  • String fromOAIIdentifier(String id)
    • helper method – converts the OAI identifier to a local id
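The mapping between local ids and full OAI identifiers is a simple prefix operation. The following sketch illustrates it. The class OaiIdentifiers is hypothetical, and it assumes identifiers of the form oai:<repository identifier>:<local id>, as in the XML example earlier in the deck.

```java
/** Converts between local ids and full OAI identifiers (hypothetical helper). */
final class OaiIdentifiers {
    private final String prefix; // e.g. "oai:oai.xyz-repository.com:"

    OaiIdentifiers(String repositoryIdentifier) {
        this.prefix = "oai:" + repositoryIdentifier + ":";
    }

    /** Local id to global identifier. */
    String toOAIIdentifier(String localId) {
        return prefix + localId;
    }

    /** Global identifier back to the local id; rejects identifiers of other repositories. */
    String fromOAIIdentifier(String oaiId) {
        if (oaiId == null || !oaiId.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(prefix.length());
    }
}
```

Stripping the known prefix, instead of splitting on colons, keeps local ids intact even when they contain a colon or a slash.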
SampleOAICatalog.listSets
• takes no parameters, returns the list of all sets in this repository
• each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set
SampleOAICatalog.getSchemaLocations
• like GetRecord, but returns the Vector of all metadata schema locations the record supports
• to obtain them, just call getRecordFactory().getSchemaLocations(rec);
SampleOAICatalog.getRecord
• String getRecord(String id, String metadataPrefix)
• finds the record and converts it to an XML string (<record> element)
• id is in global format – to get the local value call getRecordFactory().fromOAIIdentifier(id)
• throw IdDoesNotExistException if the record is not found
• to generate the XML use constructRecord(rec, metadataPrefix)
SampleOAICatalog.listRecords
• just like ListIdentifiers, only generates a list of XML <record> elements
• returns a map with one element
  Map<String, Object> listRecMap = new HashMap<String, Object>();
  listRecMap.put("records", records.iterator());
  return listRecMap;
Crosswalks
• Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc
• Only two methods per implementation
  • boolean isAvailableFor(Object rec)
  • String createMetadata(Object rec)
      SampleRecord record = (SampleRecord) rec;
      return LOMFormat.writeStringWithSchema(record.toLOM());
    • throw CannotDisseminateFormatException if the metadata is not available in this format
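For the mandatory DC format, a crosswalk can also write the XML by hand. The sketch below does that for the sample fields url, title, descr and date. The class SampleDcCrosswalk and the choice of DC elements for each field are assumptions made for illustration, not the code shipped with the sample application.

```java
/** Writes an oai_dc metadata element for the sample fields (hypothetical sketch). */
final class SampleDcCrosswalk {

    static String createMetadata(String url, String title, String descr, String date) {
        StringBuilder xml = new StringBuilder();
        xml.append("<oai_dc:dc")
           .append(" xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
           .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"")
           .append(" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"")
           .append(" xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/")
           .append(" http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">");
        element(xml, "dc:title", title);
        element(xml, "dc:description", descr);
        element(xml, "dc:identifier", url);
        element(xml, "dc:date", date);
        xml.append("</oai_dc:dc>");
        return xml.toString();
    }

    /** Skips fields that are not set, so no empty elements are written. */
    private static void element(StringBuilder xml, String name, String value) {
        if (value == null) {
            return;
        }
        xml.append('<').append(name).append('>')
           .append(escape(value))
           .append("</").append(name).append('>');
    }

    /** Escapes the characters that are not allowed in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Escaping matters in practice because URLs with query strings contain ampersands, which would otherwise make the whole response invalid XML.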
SampleRecord.toLOM
• uses the LOM-j lib (http://sourceforge.net/projects/lom-j/) to quickly hack together a LOM record
• automatic serialization/deserialization of LOM and DC XML formats
• Example
  lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
  lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
  lom.newTechnical().newLocation(-1).setString(url);
  lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
  lom.newGeneral().newTitle().newString(0).setString(title);
Resumption
• A repository usually has a fixed limit on the number of records it returns in one call
• if more are available, it returns a resumption token, which lets the harvester request the next batch
• Implemented by the functions listIdentifiers(String resumptionToken) and listRecords(String resumptionToken)
• see XYZOAICatalog for details
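The idea can be illustrated with a minimal pager. In this sketch the resumption token is simply the offset of the next record, which is an assumption made for brevity; real catalogs may encode the token differently or keep the state on the server. The class Pager and its Page type are hypothetical, and IllegalArgumentException stands in for whatever error the catalog reports for a bad token.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a result list into pages linked by resumption tokens (hypothetical sketch). */
final class Pager {

    static final class Page {
        final List<String> items;
        final String resumptionToken; // null when the list is complete

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    /** First call: no token yet, start at the beginning. pageSize must be positive. */
    static Page firstPage(List<String> all, int pageSize) {
        return page(all, 0, pageSize);
    }

    /** Follow-up call: the token tells where the previous page stopped. */
    static Page nextPage(List<String> all, String resumptionToken, int pageSize) {
        int offset;
        try {
            offset = Integer.parseInt(resumptionToken);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Bad resumption token: " + resumptionToken);
        }
        if (offset <= 0 || offset >= all.size()) {
            throw new IllegalArgumentException("Bad resumption token: " + resumptionToken);
        }
        return page(all, offset, pageSize);
    }

    private static Page page(List<String> all, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, all.size());
        String token = end < all.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<String>(all.subList(offset, end)), token);
    }
}
```

A harvester keeps calling nextPage with the token from the previous page until the token comes back as null.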
References
• http://www.openarchives.org/OAI/openarchivesprotocol.html
• http://www.fmf.uni-lj.si/~kavkler/
• http://www.oclc.org/research/software/oai/cat.htm
• http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
• http://sourceforge.net/projects/lom-j/
• SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/