METS - API (application programming interface)
METS Implementors Meeting, May 8th, 2007
Markus Enders, SUB Göttingen
Jens Ludwig, SUB Göttingen
Why? necessity of an API
Why?
• METS has a complex data model.
• The most common instantiation of METS is its XML form.
• An API should be based on the data model and be (theoretically) independent of its XML representation.
Why?
• The API should focus on METS elements and their attributes and relationships.
• The API should support creation of METS as well: creating invalid data (e.g. elements in the wrong order) should not be possible.
• Goal: 100% valid METS data.
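The "invalid data should not be possible" goal can be illustrated without any METS library. The following is only a minimal sketch in plain Java: `MiniMets`, `addDmdSec` and `addStructMap` are hypothetical names invented for this illustration and are not part of METSbeans. Callers may add sections in any order, and the serializer always writes them in schema order (`<dmdSec>` before `<structMap>`) and refuses to write a document without a `<structMap>`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written stand-in for a schema-derived class (not METSbeans).
class MiniMets {
    private final List<String> dmdSecIds = new ArrayList<>();
    private final List<String> structMapTypes = new ArrayList<>();

    void addDmdSec(String id) { dmdSecIds.add(id); }

    void addStructMap(String type) { structMapTypes.add(type); }

    // Always writes elements in schema order, whatever order the caller used.
    // Attribute values are not escaped: this is a sketch, not a serializer.
    String toXml() {
        if (structMapTypes.isEmpty()) {
            throw new IllegalStateException("METS requires at least one structMap");
        }
        StringBuilder sb = new StringBuilder("<mets xmlns=\"http://www.loc.gov/METS/\">");
        for (String id : dmdSecIds) {
            sb.append("<dmdSec ID=\"").append(id).append("\"/>");
        }
        for (String type : structMapTypes) {
            sb.append("<structMap TYPE=\"").append(type).append("\"><div/></structMap>");
        }
        return sb.append("</mets>").toString();
    }

    public static void main(String[] args) {
        MiniMets m = new MiniMets();
        m.addStructMap("LOGICAL"); // added first ...
        m.addDmdSec("DMDID01");    // ... but dmdSec is still written before structMap
        System.out.println(m.toXml());
    }
}
```

The design choice is that element order lives in one place (the serializer) instead of being the responsibility of every caller.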
Why?
• Multi-tier applications: the API connects the application with the serialization level.
• The API serves as a framework for METS creation and parsing.
Why? [Diagram: the application sits on top of the METS API, which sits on top of the serialization level: XML, repository, or database]
Implementation Issues:
• Maintenance: changes in the METS schema must be reflected by the API.
   • Derive classes from the XML schema: e.g. Apache XMLBeans or Sun JAXB generate Java classes from an XML schema.
• Programming language: more than one language should be supported.
   • php-java bridge: http://php-java-bridge.sourceforge.net
   • Inline-Java Perl module: http://search.cpan.org/~patl/Inline-Java/
• Multi-level access (granularity of access):
   • access to single elements / attributes
   • a higher level for more widespread functionality
Implementation Issues:
• Apache XMLBeans based API for Java
• Creates an interface for each schema object and an implementation that reads / writes this object from / to XML
• Other implementations are possible (e.g. backed by a repository)
• Can create a DOM tree at any time, e.g. if non-schema-based XML data needs to be stored
Implementation Issues:
• Level one: METSbeans, an XMLBeans based API for Java that allows access to single METS elements, attributes and their relationships
• Level two: more complex functions which are based on the METSbeans
METSbeans
• Every type from the schema becomes one class.
• Classes are generated automatically from the XML schema.
• Additional APIs can be generated and integrated for any XML-schema based data format (e.g. MODS, PREMIS etc.).
METSbeans internal architecture:
• For every type in the XML schema, a corresponding Java interface exists.
• Every interface is implemented during the automatic generation process.
• Additional implementations of an interface are possible, which gives high flexibility to access METS data outside a file system.
METSbeans internal architecture:
    <xsd:complexType name="divType">
• interface: DivType
• class: DivTypeImpl
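The interface / implementation split can be sketched in plain Java. This is not the generated METSbeans code: `Div` and `InMemoryDiv` are hypothetical names, and the method names only imitate the accessor naming style that XMLBeans generates (`addNewX`, `getXArray`, `sizeOfXArray`, `removeX`). The point of the sketch is that callers depend on the interface only, so a second implementation (for example one backed by a repository) could be substituted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified analogue of a generated interface such as DivType.
interface Div {
    String getTYPE();
    void setTYPE(String type);
    Div addNewDiv();          // creates, attaches and returns a new child
    Div[] getDivArray();      // snapshot of all children
    int sizeOfDivArray();
    void removeDiv(int i);
}

// One possible implementation that keeps everything in memory.
// A repository-backed class could implement the same interface instead.
class InMemoryDiv implements Div {
    private String type;
    private final List<Div> children = new ArrayList<>();

    public String getTYPE() { return type; }

    public void setTYPE(String type) { this.type = type; }

    public Div addNewDiv() {
        Div child = new InMemoryDiv();
        children.add(child);
        return child;
    }

    public Div[] getDivArray() { return children.toArray(new Div[0]); }

    public int sizeOfDivArray() { return children.size(); }

    public void removeDiv(int i) { children.remove(i); }
}
```

Code written against `Div` does not need to change when the storage behind it changes, which is the flexibility the slide refers to.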
METSbeans internal architecture: XMLBeans has a set of native data types: XmlObject, XmlString, XmlShort, XmlTime etc.
METSbeans internal architecture:
• MetsDocument is the topmost class and instantiates the document. No other object can be created without it.
• An instance can be created by:
   • parsing a file
   • using a factory class to create a new document
METSbeans snippet: MetsDocument
Example, factory class:
    MetsDocument mets = MetsDocument.Factory.newInstance();
Example, parsing a file:
    XmlObject xml;
    try {
        xml = XmlObject.Factory.parse(f);   // f is a java.io.File
    } catch (XmlException | IOException e) {
        e.printStackTrace();
        return false;
    }
    MetsDocument metsDoc = (MetsDocument) xml;
METSbeans DivType: methods for accessing the <mptr> element
    getMptrArray()
    getMptrArray(int i)
    sizeOfMptrArray()
    setMptrArray(Mptr[] mptrArray)
    setMptrArray(int i, Mptr mptr)
    insertNewMptr(int i)
    addNewMptr()
    removeMptr(int i)
METSbeans DivType: methods for accessing the <div> element
    getDivArray()
    getDivArray(int i)
    sizeOfDivArray()
    setDivArray(DivType[] divArray)
    setDivArray(int i, DivType div)
    insertNewDiv(int i)
    addNewDiv()
    removeDiv(int i)
METSbeans DivType: very similar methods for handling file pointers (<fptr> elements)
METSbeans DivType: methods to get and set attributes (here the ID attribute)
    getID()
    isSetID()
    setID(String id)
    unsetID()
    xsetID(org.apache.xmlbeans.XmlID id)
    xgetID()
METSbeans snippet: create a new <div> element
    MetsDocument mets = MetsDocument.Factory.newInstance();
    MetsType myMets = mets.addNewMets();
    StructMapType sm = myMets.addNewStructMap();
    // top-level <div> of the structMap
    DivType div = sm.addNewDiv();
    div.setTYPE("Monograph");
    // nested <div> below the top-level one
    DivType firstchild = div.addNewDiv();
    firstchild.setTYPE("TitlePage");
METSbeans snippet: saving a METS document
    // suggest namespace prefixes for the serialized XML
    Map<String, String> suggestedPrefixes = new HashMap<String, String>();
    suggestedPrefixes.put("http://www.loc.gov/METS/", "mets");
    suggestedPrefixes.put("http://www.w3.org/1999/xlink", "xlink");
    XmlOptions opts = new XmlOptions();
    opts.setSaveSuggestedPrefixes(suggestedPrefixes);
    File outputFile = new File(filename);
    mets.save(outputFile, opts);
METSbeans MdSecType
• Represents the METS elements <dmdSec>, <techMD>, <digiprovMD>, <rightsMD> and <sourceMD>, but not <amdSec>.
• May contain an MdRef or an MdWrap object.
METSbeans snippet: create an MdSecType object
    MetsDocument mets = MetsDocument.Factory.newInstance();
    MetsType myMets = mets.addNewMets();
    MdSecType dmdSec = myMets.addNewDmdSec();
    dmdSec.setID("DMDID01");
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xmldata = mdwrap.addNewXmlData();
    xmldata.set(modsObject);   // accepts any XmlObject, e.g. an XmlString
METSbeans snippet: create an MdSecType object
String:
    XmlString xs = XmlString.Factory.newValue("<mydata/>");
    xmldata.set(xs);
Document:
    ModsDocument modsObject = ModsDocument.Factory.newInstance();
    ModsType myMods = modsObject.addNewMods();
    IdentifierType identifier = myMods.addNewIdentifier();
    ....
    xmldata.set(modsObject);
METSbeans: parse METS data
The API provides several parse methods:
    parse(java.lang.String xmlAsString)
    parse(java.io.File file)
    parse(java.net.URL u)
    parse(java.io.InputStream is)
    parse(org.w3c.dom.Node node)
If the parsed data is NOT valid METS, an XmlException is thrown.
METSbeans snippet: parse METS data
    File f = new File(filename);
    XmlObject xml = null;   // stays null if parsing fails
    try {
        xml = XmlObject.Factory.parse(f);
    } catch (XmlException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    MetsDocument metsDoc = (MetsDocument) xml;
METSbeans snippet: get a DivType
    MetsDocument metsDoc = (MetsDocument) xml;
    MetsType mets = metsDoc.getMets();
    StructMapType[] structs = mets.getStructMapArray();
    for (int i = 0; i < structs.length; i++) {
        StructMapType struct = structs[i];
        String structtype = struct.getTYPE();
        // only the logical structMap is of interest here
        if ((structtype != null) && (structtype.equals("LOGICAL"))) {
            DivType div = struct.getDiv();
            String divtype = div.getTYPE();
            return divtype;
        }
    }
METSbeans
• Easy to create and parse valid METS data (much easier than parsing DOM trees)
• Easy to combine with other XML data
• Quite fast compared to DOM
• Drawback: as it is based on XMLBeans, it is only available for Java; the php-java bridge or the Inline::Java module is needed for PHP / Perl
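For comparison with the "much easier than parsing DOM trees" point, here is a sketch of the same kind of lookup (the TYPE of the top-level `<div>` in the LOGICAL `<structMap>`) written against the plain JDK DOM API only. The class name `DomLookup` is made up for this illustration. Namespace handling, casts and the skipping of whitespace text nodes all have to be done by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; uses only the JDK, not METSbeans.
class DomLookup {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Returns the TYPE of the top-level <div> of the LOGICAL <structMap>, or null.
    static String logicalDivType(String metsXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(metsXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        NodeList maps = doc.getElementsByTagNameNS(METS_NS, "structMap");
        for (int i = 0; i < maps.getLength(); i++) {
            Element map = (Element) maps.item(i);
            if (!"LOGICAL".equals(map.getAttribute("TYPE"))) {
                continue;
            }
            // Walk the children manually, skipping text and comment nodes.
            for (Node n = map.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element
                        && METS_NS.equals(n.getNamespaceURI())
                        && "div".equals(n.getLocalName())) {
                    return ((Element) n).getAttribute("TYPE");
                }
            }
        }
        return null;
    }
}
```

Nothing here is checked by the compiler: a misspelled element or attribute name simply yields no match, whereas typed accessors such as `getStructMapArray()` fail at compile time.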
Helper-class Functions: need for additional high-level functions
• Though the METSbeans allow access to every single METS element, it is still a complex task to do simple things, e.g. adding metadata to a <div>.
• A helper class is needed, which sits on top of METSbeans.
Helper-class Functions:
• The following examples come from experience working with METSbeans.
• There is no official implementation; this is just an excerpt of functions which a level 2 API could provide.
Helper-class Functions: create a DMDSec for common METS objects:
    createDMDSec(XmlObject inMetadata, DivType inDiv)
    createDMDSec(XmlObject inMetadata, FileType inFile)
    ...
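What such a helper has to do can be sketched with the JDK DOM API. This is an assumption about the behaviour, not the authors' implementation: the slides give only the signature, so the class `DmdSecHelper`, the extra `id` and `mdType` parameters and the insertion rule are all hypothetical. The sketch creates a `<dmdSec>`, wraps the metadata in `<mdWrap>/<xmlData>`, inserts the section at a schema-conformant position, and links it from the `<div>` through its DMDID attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical level-two helper, sketched on plain DOM instead of METSbeans.
class DmdSecHelper {
    static final String METS_NS = "http://www.loc.gov/METS/";

    // Convenience for callers: a new, empty, namespace-aware document.
    static Document newDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps 'metadata' in a new <dmdSec> and links it to 'div' via DMDID.
    static Element createDmdSec(Element metadata, Element div, String id, String mdType) {
        Document doc = div.getOwnerDocument();
        Element root = doc.getDocumentElement();

        Element dmdSec = doc.createElementNS(METS_NS, "dmdSec");
        dmdSec.setAttribute("ID", id);
        Element mdWrap = doc.createElementNS(METS_NS, "mdWrap");
        mdWrap.setAttribute("MDTYPE", mdType);
        Element xmlData = doc.createElementNS(METS_NS, "xmlData");
        xmlData.appendChild(doc.importNode(metadata, true)); // deep copy into this document
        mdWrap.appendChild(xmlData);
        dmdSec.appendChild(mdWrap);

        // Keep schema order: insert before the first element child that is
        // neither <metsHdr> nor <dmdSec>; append if there is no such child.
        Node ref = root.getFirstChild();
        while (ref != null && !(ref instanceof Element
                && !"metsHdr".equals(ref.getLocalName())
                && !"dmdSec".equals(ref.getLocalName()))) {
            ref = ref.getNextSibling();
        }
        root.insertBefore(dmdSec, ref);

        // DMDID may hold several space-separated IDs, so append instead of overwriting.
        String old = div.getAttribute("DMDID");
        div.setAttribute("DMDID", old.isEmpty() ? id : old + " " + id);
        return dmdSec;
    }
}
```

Even in this reduced form the helper touches three places in the document (the new section, its position, and the reference on the `<div>`), which is the kind of bookkeeping a level-two function is meant to hide.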
Helper-class Functions: create administrative metadata for common METS objects, e.g.:
    createMDSectionInAMDSec(XmlObject inMetadata, String type, DivType inDiv, AmdSecType inAmdSec)
    ...
Helper-class Functions: functions to retrieve specific metadata sections by ID or TYPE:
    getMDSecTypeByID(String inID)
    getMDSecTypeByType(String inType)
    ...
Helper-class Functions: functions to get related files (for a <div> element):
    getAllFilesForDivType(DivType inDiv)
    getAllFilesForFileGroup(FileGrpType inGrp)
    ...
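The lookup behind a function like `getAllFilesForDivType` can be sketched on plain DOM. The slides do not define its exact semantics, so the following rests on stated assumptions: nested `<div>` elements are included, only `<fptr FILEID="...">` references are followed (not `<area>` elements inside an `<fptr>`), and unresolved references are skipped. `FileLookup` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical level-two helper, sketched on plain DOM instead of METSbeans.
class FileLookup {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    // Returns the <file> elements referenced by <fptr> elements below 'div',
    // including those of nested <div> elements, in document order of the <fptr>s.
    static List<Element> getAllFilesForDiv(Element div) {
        Document doc = div.getOwnerDocument();

        // Index every <file> in the document by its ID attribute.
        Map<String, Element> filesById = new HashMap<>();
        NodeList files = doc.getElementsByTagNameNS(METS_NS, "file");
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            filesById.put(file.getAttribute("ID"), file);
        }

        // Resolve each FILEID; references without a matching <file> are skipped.
        List<Element> result = new ArrayList<>();
        NodeList fptrs = div.getElementsByTagNameNS(METS_NS, "fptr");
        for (int i = 0; i < fptrs.getLength(); i++) {
            Element file = filesById.get(((Element) fptrs.item(i)).getAttribute("FILEID"));
            if (file != null) {
                result.add(file);
            }
        }
        return result;
    }
}
```

A METSbeans-based version would follow the same two steps (index the files, resolve the pointers) using the typed accessors instead of element names.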
Extension schema
Integration of extension schemas:
• Export METSbeans objects as a DOM tree.
• Create beans for the extension schemas as well: PREMIS, MODS, MIX beans.
Extension schema
Example: create MODS data
    MdSecType dmdSec = mets.addNewDmdSec();
    dmdSec.setID(dmdid_string);
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xml = mdwrap.addNewXmlData();
    // build the MODS document with the MODS beans
    ModsDocument mods = ModsDocument.Factory.newInstance();
    ModsType myMods = mods.addNewMods();
    // embed the MODS document in <xmlData>
    xml.set(mods);
Extension schema
Example: create <premis:object> data
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xml = mdwrap.addNewXmlData();
    // build the PREMIS object with the PREMIS beans
    ObjectDocument objdoc = ObjectDocument.Factory.newInstance();
    ObjectDocument.Object premis_object = objdoc.addNewObject();
    // embed the PREMIS document in <xmlData>
    xml.set(objdoc);
Extension schema
Example: parse MODS data
    MdSecType dmdSec;
    ....
    MdSecType.MdWrap mdw = dmdSec.getMdWrap();
    MdSecType.MdWrap.XmlData xml_data = mdw.getXmlData();
    // serialize the content of <xmlData> and re-parse it with the MODS beans
    String result = xml_data.xmlText();
    ModsDocument mods = ModsDocument.Factory.parse(result);
Problems?! Quality of the API
• The API depends on the XML schema; the quality of the API depends on the quality of the schema.
• MetsType for <mets>, DivType for <div>, MdSecType for <dmdSec>, ...
• But there is no type for the METS header <metsHdr>, as it is defined inline.
Problems?! Integration of extension schemas
• Problematic if an extension schema does not have a top-level element; parsing in particular is difficult:
    String result = xml_data.xmlText();
    ModsDocument mods = ModsDocument.Factory.parse(result);
• result must always contain a valid XML document!
• Example of a format without a top-level element: simple Dublin Core
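The underlying constraint is general XML well-formedness and can be shown with the JDK parser alone: a string with several sibling elements and no common root, as simple Dublin Core typically provides, is not an XML document, while the same content inside a container element is. The sketch below makes no claim about what `xmlText()` returns in XMLBeans; `FragmentCheck` and the `<wrapper>` element are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; uses only the JDK.
class FragmentCheck {

    // True if 'text' parses as a complete, well-formed XML document.
    static boolean isXmlDocument(String text) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // Handler that does not print; fatal errors still raise SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. more than one root element
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Puts sibling elements under a single (made-up) container element.
    static String wrap(String fragment) {
        return "<wrapper>" + fragment + "</wrapper>";
    }

    public static void main(String[] args) {
        String dc = "<dc:title xmlns:dc=\"http://purl.org/dc/elements/1.1/\">A title</dc:title>"
                + "<dc:creator xmlns:dc=\"http://purl.org/dc/elements/1.1/\">A creator</dc:creator>";
        System.out.println("fragment parses: " + isXmlDocument(dc));
        System.out.println("wrapped parses:  " + isXmlDocument(wrap(dc)));
    }
}
```

A wrapper element is only one possible workaround; it changes the data, so whether it is acceptable depends on the profile in use.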
How to continue: work with METSbeans
• Everybody can generate METSbeans themselves (see Apache XMLBeans).
• Downloadable from the GDZ website.
• We will provide a primer as a (non-exhaustive) documentation for METSbeans.
How to continue Identify necessary functions for helper-class Over time we will identify additional methods which might be useful and should be integrated in the "helper-class".
Application layer can be built on top of METSbeans
• Profile-specific implementations can be built on top of METSbeans and provide an API to the underlying document / content model.
Application layer can be built on top of METSbeans [Diagram: layers from top to bottom: applications; API for content model; helper class; METS API; XML serialization]