METS Java Toolkit
DLF Spring Forum, May 10-12, 2002, Chicago, IL
Stephen L. Abrams
Harvard University Library
stephen_abrams@harvard.edu

Why Do We Need a Toolkit?
• Automation for an archiving project with multiple content providers.
  • METS used in a hierarchical SIP (Submission Information Package)
  • Client-side tools to produce syntactically valid SIPs
• Use of METS to encapsulate complex objects with multiple content streams.
  • Page turner, currently based on MOA2

Functional Requirements
• Java API to provide support for generic METS.
• Procedural support for:
  • Construction of in-memory representation
  • Validation
  • Marshalling/unmarshalling to/from instance documents
• Usable as a basis for application-specific tools.
  • Sub-class for specific functionality or restrictions

JAXB
• API based on Sun’s JAXB specification, but not on the JAXB tools.

Toolkit API
• Each schema element corresponds to a class.
    Mets mets = new Mets();
• Accessor/mutator methods for each attribute.
    mets.setID(id);
    String id = mets.getID();
• Accessor/mutator methods for content model.
    List content = mets.getContent();
    content.add(child);

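The pattern in the bullets above can be sketched in a self-contained form. This is a minimal illustration, not the toolkit's source: `SketchElement`, `SketchMets`, `SketchMetsHdr`, and `ToolkitApiSketch` are hypothetical stand-ins, and only the ID attribute is modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: one instance per element in the document.
class SketchElement {
    private String id;                                       // the ID attribute
    private final List<Object> content = new ArrayList<>();  // children, in document order

    public void setID(String id) { this.id = id; }           // mutator
    public String getID() { return id; }                     // accessor

    // Returns the live content list, so callers add children directly to it.
    public List<Object> getContent() { return content; }
}

// Hypothetical classes standing in for the <mets> and <metsHdr> elements.
class SketchMets extends SketchElement { }
class SketchMetsHdr extends SketchElement { }

class ToolkitApiSketch {
    // Builds a <mets> element with the given ID and one <metsHdr> child.
    static SketchMets build(String id) {
        SketchMets mets = new SketchMets();
        mets.setID(id);
        mets.getContent().add(new SketchMetsHdr());
        return mets;
    }

    public static void main(String[] args) {
        SketchMets mets = build("1234");
        System.out.println("ID=" + mets.getID()
                + ", children=" + mets.getContent().size());
    }
}
```

Returning the live list from `getContent()` mirrors the `content.add(child)` usage on the slide; a real implementation might instead return a checked or read-only view.
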
Toolkit API UML

Why Do We Need a New API?
• Why not use DOM?
  • Unnatural unit of granularity: elements and attributes are both nodes in the DOM tree
• Why not JDOM?
  • The toolkit provides explicit support for validation
  • JAXB compiler could (potentially) be used to support METS upgrades.

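To make the granularity point concrete, the sketch below reads one attribute through the standard JDK DOM API (`javax.xml.parsers`, `org.w3c.dom`). The attribute comes back as a generic `Node` that has to be looked up by name and unwrapped, where the toolkit API exposes a typed accessor such as `mets.getID()`. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomGranularitySketch {
    // Parses the document and returns the root element's ID attribute as a DOM Node.
    static Node idNode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            // Attributes live in a NamedNodeMap of Nodes, just like any other tree member.
            return root.getAttributes().getNamedItem("ID");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Node attr = idNode("<mets ID=\"1234\"/>");
        System.out.println("is attribute node: "
                + (attr.getNodeType() == Node.ATTRIBUTE_NODE));
        System.out.println("value: " + attr.getNodeValue());
    }
}
```
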
Procedural Construction
• The initial current element is <mets>
• For each child element in the current element’s content model:
  • Instantiate an appropriate element object
  • Set its attributes
  • Define its content model (recursively, treating it as the current element)
  • Add it to the content model of its parent

Procedural Construction (Ex.)
    Mets mets = new Mets();
    mets.setID("1234");
    MetsHdr metsHdr = new MetsHdr();
    metsHdr.setCREATEDATE(new Date());
    Agent agent = new Agent();
    agent.setROLE(Role.CREATOR);
    Name name = new Name();
    name.getContent().add(new PCData("S. Abrams"));
    agent.getContent().add(name);
    metsHdr.getContent().add(agent);
    mets.getContent().add(metsHdr);
    ...

Validation
• Global
  • ID uniqueness
  • IDREF-to-ID consistency
• Local
  • Existence of required attributes and content model elements
    Mets mets = new Mets();
    ...
    mets.validate();

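The two global checks can be sketched independently of the toolkit. The code below is an assumption about how such a check might work, not the toolkit's `validate()` implementation: it takes the ID values and IDREF values already collected from a document and reports duplicates and dangling references. `GlobalValidationSketch` and `check` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GlobalValidationSketch {
    // Returns a list of problems; an empty list means both global checks passed.
    static List<String> check(List<String> ids, List<String> idrefs) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();

        // ID uniqueness: Set.add returns false when the value is already present.
        for (String id : ids) {
            if (!seen.add(id)) {
                problems.add("duplicate ID: " + id);
            }
        }
        // IDREF-to-ID consistency: every reference must name a declared ID.
        for (String ref : idrefs) {
            if (!seen.contains(ref)) {
                problems.add("dangling IDREF: " + ref);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // ID and IDREF values taken from the Marshal.java example in this deck.
        List<String> ids = Arrays.asList("xyz-123", "t-1234", "r-5678", "s-9012",
                "d-3456", "a1b2c3", "A125", "123-45-6789", "killerapp");
        List<String> idrefs = Arrays.asList("t-1234", "s-9012", "a1b2c3", "A125", "s-9012");
        System.out.println("problems: " + check(ids, idrefs));
    }
}
```

This check only confirms that a referenced ID exists somewhere in the document; it does not verify that the target is an element of the expected kind.
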
Marshalling
• Serializing in-memory representation to an output stream.
    Mets mets = new Mets();
    ...
    FileOutputStream out = new FileOutputStream("mets.xml");
    mets.validate();
    mets.marshal(out);

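What serializing to an output stream involves can be sketched with a small stand-in tree. This is not the toolkit's marshaller: `MarshalNode` and `MarshalSketch` are hypothetical, the output is always UTF-8, and no XML declaration or namespace handling is attempted. The sketch escapes markup characters in text and attribute values, which any marshaller producing well-formed XML needs to do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory element: a name, ordered attributes, ordered content.
class MarshalNode {
    final String name;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Object> content = new ArrayList<>();   // MarshalNode or String

    MarshalNode(String name) { this.name = name; }

    // Serializes this element and its descendants to the stream as UTF-8.
    void marshal(OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        write(w);
        w.flush();
    }

    private void write(Writer w) throws IOException {
        w.write("<" + name);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            w.write(" " + a.getKey() + "=\"" + escape(a.getValue()) + "\"");
        }
        if (content.isEmpty()) {
            w.write("/>");
            return;
        }
        w.write(">");
        for (Object child : content) {
            if (child instanceof MarshalNode) {
                ((MarshalNode) child).write(w);        // recurse into child elements
            } else {
                w.write(escape(child.toString()));     // character data
            }
        }
        w.write("</" + name + ">");
    }

    // Escapes the characters that are not allowed literally in text or attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

class MarshalSketch {
    // Convenience wrapper: marshal to memory and return the result as a String.
    static String toXml(MarshalNode root) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            root.marshal(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MarshalNode agent = new MarshalNode("agent");
        agent.attributes.put("ROLE", "CREATOR");
        MarshalNode name = new MarshalNode("name");
        name.content.add("S. Abrams");
        agent.content.add(name);
        System.out.println(toXml(agent));
    }
}
```
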
Unmarshalling
• Parsing an instance document and creating an in-memory representation.
• Implicit local validation during parsing; global validation must be explicit.
• Internal parsing with James Clark’s XP.
    FileInputStream in = new FileInputStream("mets.xml");
    Mets mets = Mets.unmarshal(in);
    mets.validate();
    ...

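The tree-building half of unmarshalling can be sketched with an event-driven parser. The toolkit uses XP internally; this sketch substitutes the JDK's built-in SAX parser (`javax.xml.parsers.SAXParserFactory`) so that it runs without extra libraries, and it performs no local validation. `ParsedElement` and `UnmarshalSketch` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical in-memory element produced by the parse.
class ParsedElement {
    final String name;
    final List<ParsedElement> children = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    ParsedElement(String name) { this.name = name; }
}

class UnmarshalSketch {
    // Parses the stream and returns the root of the resulting element tree.
    static ParsedElement unmarshal(InputStream in) {
        final Deque<ParsedElement> open = new ArrayDeque<>();  // currently open elements
        final ParsedElement[] root = new ParsedElement[1];

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                ParsedElement e = new ParsedElement(qName);
                if (open.isEmpty()) {
                    root[0] = e;                       // first element is the root
                } else {
                    open.peek().children.add(e);       // attach to the enclosing element
                }
                open.push(e);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().text.append(ch, start, length);
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        String xml = "<metsHdr><agent><name>S. Abrams</name></agent></metsHdr>";
        ParsedElement hdr = unmarshal(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        ParsedElement name = hdr.children.get(0).children.get(0);
        System.out.println(hdr.name + " / " + name.name + " = " + name.text);
    }
}
```
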
Extension Schemas
• Toolkit could be extended to include explicit support for additional schemas.
• Generic namespace-aware Any class:
    Any any = new Any("elem");
    any.setAttribute("attr", value);
    String attr = any.getAttribute("attr");
    any.getContent().add(child);

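A generic element class of this kind might look roughly like the following. It is a sketch under assumptions, not the toolkit's `Any`: `AnySketch` is a hypothetical name, the namespace is represented only by an optional prefix, and attribute values are plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic element for content from schemas the toolkit does not model.
class AnySketch {
    private final String prefix;       // namespace prefix, or null if unprefixed
    private final String localName;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<Object> content = new ArrayList<>();

    AnySketch(String localName) {
        this(null, localName);
    }

    AnySketch(String prefix, String localName) {
        this.prefix = prefix;
        this.localName = localName;
    }

    // Qualified name as it would appear in the instance document.
    String getQName() {
        return prefix == null ? localName : prefix + ":" + localName;
    }

    void setAttribute(String name, String value) { attributes.put(name, value); }

    // Returns null when the attribute has not been set.
    String getAttribute(String name) { return attributes.get(name); }

    List<Object> getContent() { return content; }

    public static void main(String[] args) {
        AnySketch any = new AnySketch("my", "techMD");
        any.setAttribute("ID", "AB123");
        any.getContent().add("...technical MD...");
        System.out.println(any.getQName() + " ID=" + any.getAttribute("ID"));
    }
}
```
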
Additional Work
• To be done any day now…
  • Support for <area>, <par>, and <seq>
  • Strict validation of sequence ordering
  • Marshalling in non-UTF-8 encodings
  • Base64 encoding/decoding methods for binData and FContent
  • Support for entity references
  • Diagnostic error messages

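For the Base64 item, a minimal sketch of what such helper methods could do is shown below. It relies on `java.util.Base64`, which exists only in Java 8 and later, so it is not something the toolkit itself could use on JDK 1.3.1; `Base64Sketch` and its method names are hypothetical. The sample value is the FContent payload from the Marshal.java example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Sketch {
    // Encodes raw bytes as the text that would be placed inside <binData> or <FContent>.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes element text back to the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String encoded = encode("1-2-3".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);   // prints MS0yLTM=
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8));
    }
}
```
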
Distribution
• HUL’s intent is to make the toolkit freely available under an Open Source license.
• Minimal support (if any).
• Community process for maintenance?
• Does an appropriate organizational home exist?

Implementation
• METS schema, Version 1.0 (zeta)
• JAXB specification, Version 0.21 <http://java.sun.com/xml/jaxb>
• XP, Version 0.5 <http://jclark.com/xml/xp>
• Java J2SE and JDK 1.3.1
• Solaris 2.7
• Home page: <http://hul.harvard.edu/mets>

Marshal.java

    import java.util.*;
    import org.mets.xml.bind.*;
    import org.mets.xml.mets.*;

    public class Marshal {
        public static void main (String [] args) {
            // Root element
            Mets mets = new Mets ();
            mets.setOBJID ("1234-5678(2002)9:1<>1.0.CO;9-X");
            mets.setLABEL ("METS Java toolkit");
            mets.setTYPE ("Article");

            // Header: creation date, status, agent, and alternate record IDs
            MetsHdr metsHdr = new MetsHdr ();
            metsHdr.setCREATEDATE (new Date ());
            metsHdr.setRECORDSTATUS ("DRAFT");
            Agent agent = new Agent ();
            agent.setROLE (Role.CREATOR);
            Name name = new Name ();
            name.getContent ().add (new PCData ("S. L. Abrams"));
            agent.getContent ().add (name);
            Note note = new Note ();
            note.getContent ().add (new PCData ("HUL/OIS"));
            agent.getContent ().add (note);
            note = new Note ();
            note.getContent ().add (new PCData ("Special order, 2002/02/25"));
            agent.getContent ().add (note);
            metsHdr.getContent ().add (agent);
            AltRecordID doi = new AltRecordID ();
            doi.setTYPE ("DOI");
            doi.getContent ().add (new PCData ("10.1234/56789"));
            AltRecordID nrs = new AltRecordID ();
            nrs.setTYPE ("NRS");
            nrs.getContent ().add (new PCData ("nrs:hul.ois:10203"));
            metsHdr.getContent ().add (doi);
            metsHdr.getContent ().add (nrs);
            mets.getContent ().add (metsHdr);

            // Descriptive metadata: one reference and one wrapped record
            DmdSec dmdSec = new DmdSec ();
            dmdSec.setID ("xyz-123");
            MdRef mdRef = new MdRef ();
            mdRef.setLOCTYPE (Loctype.DOI);
            mdRef.setMDTYPE (Mdtype.DC);
            mdRef.setMIMETYPE ("text/xml");
            ...

Marshal.java (cont.)

            ...
            mdRef.setXlinkHref ("10.9876/54321");
            dmdSec.getContent ().add (mdRef);
            MdWrap mdWrap = new MdWrap ();
            mdWrap.setMDTYPE (Mdtype.MARC);
            BinData binData = new BinData ();
            binData.getContent ().add (new PCData ("AbC…Yz0123456789"));
            mdWrap.getContent ().add (binData);
            dmdSec.getContent ().add (mdWrap);
            mets.getContent ().add (dmdSec);

            // Administrative metadata: technical MD wrapped as extension XML
            AmdSec amdSec = new AmdSec ();
            TechMD techMD = new TechMD ();
            techMD.setID ("t-1234");
            mdWrap = new MdWrap ();
            mdWrap.setMDTYPE (Mdtype.OTHER);
            mdWrap.setOTHERMDTYPE ("MyTechMD");
            XmlData xmlData = new XmlData ();
            Any any = new Any ("my", "techMD");
            any.getAttributes ().add (new Attribute ("ID", "AB123"));
            any.getAttributes ().add (new Attribute ("my", "type", "TIFF"));
            any.getContent ().add (new PCData ("...technical MD..."));
            xmlData.getContent ().add (any);
            mdWrap.getContent ().add (xmlData);
            techMD.getContent ().add (mdWrap);
            amdSec.getContent ().add (techMD);

            // Rights MD: three extension elements in different namespaces
            RightsMD rightsMD = new RightsMD ();
            rightsMD.setID ("r-5678");
            mdWrap = new MdWrap ();
            mdWrap.setMDTYPE (Mdtype.OTHER);
            mdWrap.setOTHERMDTYPE ("MyRightsMD");
            xmlData = new XmlData ();
            any = new Any ("my", "rightsMD");
            any.getContent ().add (new PCData ("...rights MD..."));
            xmlData.getContent ().add (any);
            any = new Any ("your", "rightsMD");
            any.getContent ().add (new PCData ("...rights MD..."));
            xmlData.getContent ().add (any);
            any = new Any ("their", "rightsMD");
            any.getContent ().add (new PCData ("...rights MD..."));
            ...

Marshal.java (cont.)

            ...
            xmlData.getContent ().add (any);
            mdWrap.getContent ().add (xmlData);
            rightsMD.getContent ().add (mdWrap);
            amdSec.getContent ().add (rightsMD);

            // Source MD
            SourceMD sourceMD = new SourceMD ();
            sourceMD.setID ("s-9012");
            mdWrap = new MdWrap ();
            mdWrap.setMDTYPE (Mdtype.OTHER);
            mdWrap.setOTHERMDTYPE ("MySourceMD");
            xmlData = new XmlData ();
            any = new Any ("my", "sourceMD");
            any.getAttributes ().add (new Attribute ("aat", "type", new Integer (178684)));
            any.getContent ().add (new PCData ("...source MD..."));
            xmlData.getContent ().add (any);
            mdWrap.getContent ().add (xmlData);
            sourceMD.getContent ().add (mdWrap);
            amdSec.getContent ().add (sourceMD);

            // Digital provenance MD
            DigiprovMD digiprovMD = new DigiprovMD ();
            digiprovMD.setID ("d-3456");
            mdWrap = new MdWrap ();
            mdWrap.setMDTYPE (Mdtype.OTHER);
            mdWrap.setOTHERMDTYPE ("MyDigiprovMD");
            xmlData = new XmlData ();
            any = new Any ("my", "digiprovMD");
            any.getContent ().add (new PCData ("...provenance MD..."));
            xmlData.getContent ().add (any);
            mdWrap.getContent ().add (xmlData);
            digiprovMD.getContent ().add (mdWrap);
            amdSec.getContent ().add (digiprovMD);
            mets.getContent ().add (amdSec);

            // File section: one file with a locator and embedded content
            FileSec fileSec = new FileSec ();
            FileGrp fileGrp = new FileGrp ();
            fileGrp.getADMID ().add ("t-1234");
            fileGrp.getADMID ().add ("s-9012");
            File file = new File ();
            file.setID ("a1b2c3");
            FLocat flocat = new FLocat ();
            flocat.setLOCTYPE (Loctype.URN);
            flocat.setXlinkHref ("urn:nid:nss");
            file.getContent ().add (flocat);
            FContent fcontent = new FContent ();
            ...

Marshal.java (cont.)

            ...
            fcontent.getContent ().add (new PCData ("MS0yLTM="));
            file.getContent ().add (fcontent);
            fileGrp.getContent ().add (file);
            fileSec.getContent ().add (fileGrp);
            mets.getContent ().add (fileSec);

            // Structural map: chapter > section > sub-section, plus a METS pointer
            StructMap structMap = new StructMap ();
            structMap.setID ("A125");
            structMap.setLABEL ("Individual volumes");
            Div div = new Div ();
            div.setORDER (25);
            div.setORDERLABEL ("xxv");
            div.setTYPE ("Chapter");
            Div sec = new Div ();
            sec.setTYPE ("Section");
            Div sub = new Div ();
            sub.setTYPE ("Sub-section");
            Fptr fptr = new Fptr ();
            fptr.setFILEID ("a1b2c3");
            sub.getContent ().add (fptr);
            sec.getContent ().add (sub);
            div.getContent ().add (sec);
            sec = new Div ();
            sec.setTYPE ("Section");
            Mptr mptr = new Mptr ();
            mptr.setID ("123-45-6789");
            mptr.setLOCTYPE (Loctype.OTHER);
            mptr.setOTHERLOCTYPE ("filepath");
            mptr.setXlinkHref ("dir/file.xml");
            sec.getContent ().add (mptr);
            div.getContent ().add (sec);
            structMap.getContent ().add (div);
            mets.getContent ().add (structMap);

            // Behavior section
            BehaviorSec behavior = new BehaviorSec ();
            behavior.setID ("killerapp");
            behavior.getSTRUCTID ().add ("A125");
            behavior.getSTRUCTID ().add ("s-9012");
            Mechanism mechanism = new Mechanism ();
            mechanism.setLOCTYPE (Loctype.URL);
            mechanism.setXlinkHref ("http://host/path");
            behavior.getContent ().add (mechanism);
            mets.getContent ().add (behavior);

            // Validate the whole document, then serialize it to standard output
            mets.validate ();
            mets.marshal (System.out);
        }
    }

marshal.xml

    <mets xmlns="http://www.loc.gov/METS/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd"
          OBJID="1234-5678(2002)9:1<>1.0.CO;9-X"
          LABEL="METS Java toolkit" TYPE="Article">
      <metsHdr CREATEDATE="2002-03-15T161023" RECORDSTATUS="DRAFT">
        <agent ROLE="CREATOR">
          <name>S. L. Abrams</name>
          <note>HUL/OIS</note>
          <note>Special order, 2002/02/25</note>
        </agent>
        <altRecordID TYPE="DOI">10.1234/56789</altRecordID>
        <altRecordID TYPE="NRS">nrs:hul.ois:10203</altRecordID>
      </metsHdr>
      <dmdSec ID="xyz-123">
        <mdRef LOCTYPE="DOI" xlink:type="simple" xlink:href="10.9876/54321"
               MDTYPE="DC" MIMETYPE="text/xml"/>
        <mdWrap MDTYPE="MARC">
          <binData>AbCdEfGhIjKlMnOpQrStUvWxYz0123456789</binData>
        </mdWrap>
      </dmdSec>
      <amdSec>
        <techMD ID="t-1234">
          <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyTechMD">
            <xmlData>
              <my:techMD ID="AB123" my:type="TIFF">...technical MD...</my:techMD>
            </xmlData>
          </mdWrap>
        </techMD>
        <rightsMD ID="r-5678">
          <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyRightsMD">
            <xmlData>
              <my:rightsMD>...rights MD...</my:rightsMD>
              <your:rightsMD>...rights MD...</your:rightsMD>
              <their:rightsMD>...rights MD...</their:rightsMD>
            </xmlData>
          </mdWrap>
        </rightsMD>
        ...

marshal.xml (cont.)

        ...
        <sourceMD ID="s-9012">
          <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MySourceMD">
            <xmlData>
              <my:sourceMD aat:type="178684">...source MD...</my:sourceMD>
            </xmlData>
          </mdWrap>
        </sourceMD>
        <digiprovMD ID="d-3456">
          <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyDigiprovMD">
            <xmlData>
              <my:digiprovMD>...provenance MD...</my:digiprovMD>
            </xmlData>
          </mdWrap>
        </digiprovMD>
      </amdSec>
      <fileSec>
        <fileGrp ADMID="t-1234 s-9012">
          <file ID="a1b2c3">
            <FLocat LOCTYPE="URN" xlink:type="simple" xlink:href="urn:nid:nss"/>
            <FContent>MS0yLTM=</FContent>
          </file>
        </fileGrp>
      </fileSec>
      <structMap ID="A125" LABEL="Individual volumes">
        <div ORDER="25" ORDERLABEL="xxv" TYPE="Chapter">
          <div TYPE="Section">
            <div TYPE="Sub-section">
              <fptr FILEID="a1b2c3"/>
            </div>
          </div>
          <div TYPE="Section">
            <mptr ID="123-45-6789" LOCTYPE="OTHER" OTHERLOCTYPE="filepath"
                  xlink:type="simple" xlink:href="dir/file.xml"/>
          </div>
        </div>
      </structMap>
      <behaviorSec ID="killerapp" STRUCTID="A125 s-9012">
        <mechanism LOCTYPE="URL" xlink:type="simple" xlink:href="http://host/path"/>
      </behaviorSec>
    </mets>