XML Parsing Using Java APIs
AIP Independence project, Fall 2010
XML overview
• XML (eXtensible Markup Language) is a markup language specification created by the W3C
• Syntactically similar to HTML but far more general: the tag names are not fixed in advance
• Data is wrapped in arbitrary, author-defined tags
• e.g. <recordCreationDate encoding="w3cdtf">2010-10-06</recordCreationDate>
• The tags allowed in a given kind of document can be defined in XML Schema documents (.xsd)
JAXP
Java API for XML Processing, the standard XML library bundled with Java
It provides two main parsing interfaces: SAX and DOM
SAX
• SAX (Simple API for XML) reads a document serially, analogous to a file stream
• Faster and uses less memory than DOM
• Doesn’t store the XML file in memory
• The user is responsible for keeping track of needed data
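The last bullet is the part that tends to surprise people, so here is a minimal sketch of what "keeping track of needed data" can look like. It assumes we only want the text of the recordCreationDate element from the overview; the class name SaxDateExtractor, the helper extractDate, and the surrounding recordInfo wrapper in the sample string are illustrative choices, not part of the original slides.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: pulls one value out of an XML string with SAX.
class SaxDateExtractor {

    // The handler holds the only state we keep; SAX itself remembers nothing.
    static class DateHandler extends DefaultHandler {
        private boolean inDate = false; // true while inside <recordCreationDate>
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("recordCreationDate".equals(qName)) {
                inDate = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append instead of assigning.
            if (inDate) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("recordCreationDate".equals(qName)) {
                inDate = false;
            }
        }

        String result() {
            return text.toString().trim();
        }
    }

    // Parses the XML string once, front to back, and returns the date text.
    static String extractDate(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DateHandler handler = new DateHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.result();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, built around the element shown in the overview.
        String xml = "<recordInfo><recordCreationDate encoding=\"w3cdtf\">"
                   + "2010-10-06</recordCreationDate></recordInfo>";
        System.out.println(extractDate(xml)); // prints 2010-10-06
    }
}
```

The boolean flag is the whole trick: the parser only fires events, and the handler decides which ones matter.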
DOM
• DOM (Document Object Model)
• Builds an in-memory tree representation of the whole XML document
• Provides non-sequential access, so data can be read and modified in any order
• Slower and takes more memory than SAX
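For comparison, a small DOM sketch under the same assumptions: the sample string and the names DomDateExample, parse, and dateElement are made up for illustration. Because the whole tree is in memory, the element can be looked up directly and then changed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: reads and modifies one element through DOM.
class DomDateExample {

    // Builds the full in-memory tree for the given XML string.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first <recordCreationDate> element, or null if there is none.
    static Element dateElement(Document doc) {
        return (Element) doc.getElementsByTagName("recordCreationDate").item(0);
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, built around the element shown in the overview.
        String xml = "<recordInfo><recordCreationDate encoding=\"w3cdtf\">"
                   + "2010-10-06</recordCreationDate></recordInfo>";
        Document doc = parse(xml);
        Element date = dateElement(doc);

        // Non-sequential access: jump straight to the attribute and the text.
        System.out.println(date.getAttribute("encoding")); // prints w3cdtf
        System.out.println(date.getTextContent());         // prints 2010-10-06

        // The tree is mutable, so the value can be changed in place.
        date.setTextContent("2010-10-07");
        System.out.println(date.getTextContent());         // prints 2010-10-07
    }
}
```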
A related API: JAXB
• Java Architecture for XML Binding
• A separate and somewhat more sophisticated approach
• Using the schema document, XML elements are bound to Java classes, so a document is handled as ordinary Java objects
• Allows intuitive coding, but is also memory-intensive
A simple example
This program uses SAX to print the provided sample MODS document
It doesn’t apply any formatting or try to figure out how to use the information yet, but this should be possible using the MODS specification
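The original program is not reproduced in this transcript. As a rough stand-in, the sketch below shows one way a SAX handler could dump every element, attribute, and text node without interpreting them. The class name SaxPrinter, the dump helper, the "start:/text:/end:" output labels, and the tiny inline fragment are all assumptions; the real sample MODS file would be passed in instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the slide's program: lists SAX events as text.
class SaxPrinter {

    static class PrintHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            out.append("start: ").append(qName);
            // Attributes are only available here, at the start of the element.
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
            }
            out.append('\n');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Skip whitespace-only chunks such as indentation between tags.
            String s = new String(ch, start, length).trim();
            if (!s.isEmpty()) {
                out.append("text: ").append(s).append('\n');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("end: ").append(qName).append('\n');
        }
    }

    // Parses the XML string and returns one line per SAX event.
    static String dump(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PrintHandler handler = new PrintHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed fragment; not the actual sample MODS document.
        String xml = "<recordInfo><recordCreationDate encoding=\"w3cdtf\">"
                   + "2010-10-06</recordCreationDate></recordInfo>";
        System.out.print(dump(xml));
    }
}
```

Turning this raw event listing into something useful would mean mapping element names to their meaning in the MODS specification, which the handler above deliberately does not attempt.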