XML Parsers Overview • Types of parsers • Using XML parsers • SAX • DOM • DOM versus SAX • Products • Conclusion
Types of Parsers There are several different ways to categorise parsers: • Validating versus non-validating parsers • Parsers that support the Document Object Model (DOM) • Parsers that support the Simple API for XML (SAX) • Parsers written in a particular language (Java, C++, Perl, etc.)
Non-validating Parsers • Speed and efficiency • It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD. • If you only want to find tags and extract information, use a non-validating parser
Using XML Parsers • Three basic steps to use an XML parser • Create a parser object • Pass your XML document to the parser • Process the results • Generally, writing out XML is outside the scope of parsers (though some may implement proprietary mechanisms)
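The three steps can be sketched in a few lines. This is a minimal illustration only, assuming Python's standard-library `xml.etree.ElementTree.XMLParser` (which is neither SAX nor DOM, but exposes an explicit parser object); the sample document and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the slides.
XML_TEXT = (
    "<catalog>"
    "<book id='b1'>XML Basics</book>"
    "<book id='b2'>SAX and DOM</book>"
    "</catalog>"
)

# Step 1: create a parser object.
parser = ET.XMLParser()

# Step 2: pass the XML document to the parser.
parser.feed(XML_TEXT)
root = parser.close()  # returns the root element once parsing has finished

# Step 3: process the results.
titles = [book.text for book in root.findall("book")]
print(titles)
```

Other parsers bundle steps 1 and 2 into a single call, but the same three phases are still there.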
Parsing XML Two established APIs: • SAX (Simple API for XML) • Defines handlers whose methods are called as the XML is parsed • DOM (Document Object Model) • Defines a logical tree representing the parsed XML
Parsing XML: DOM • Document Object Model • standard API for accessing and creating XML data • tree-based • programming language independent • developed by W3C • whole document is read into memory • read and write
Creating a DOM Tree API Application • A DOM implementation will have a method to pass an XML file to a factory object, which returns a Document object that represents the whole document (its document element is the root element) • After this, you may use the standard DOM interfaces to interact with the XML structure
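As a rough sketch of this flow, the example below assumes Python's standard-library `xml.dom.minidom`, where `parseString` plays the role of the factory call. The sample document is hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, not taken from the slides.
XML_TEXT = "<catalog><book>XML Basics</book><book>SAX and DOM</book></catalog>"

# The parse call acts as the factory: XML goes in, a Document comes out.
document = minidom.parseString(XML_TEXT)

# The Document node owns the tree; documentElement is the root element.
root = document.documentElement
print(root.tagName)

# From here on, only standard DOM navigation is used.
first_book = root.firstChild
print(first_book.tagName, first_book.firstChild.data)
print(len(root.childNodes))
```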
Parsing XML: DOM [Figure: an XML file and the DOM tree it is parsed into]
DOM Interfaces • The DOM defines several interfaces • Node The base data type of the DOM • Element Represents an element • Attr Represents an attribute of an element • Text The content of an element or attribute • Document Represents the entire XML document. A Document object is often referred to as a DOM tree
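To see these interfaces side by side, here is a small sketch assuming Python's `xml.dom.minidom`; each object below corresponds to one of the interfaces listed above, and `nodeType` is the standard DOM way to tell them apart. The sample document is hypothetical.

```python
from xml.dom import Node, minidom

# Hypothetical sample document, not taken from the slides.
XML_TEXT = '<book id="b1">XML Basics</book>'

document = minidom.parseString(XML_TEXT)  # Document
element = document.documentElement        # Element
attr = element.getAttributeNode("id")     # Attr
text = element.firstChild                 # Text

# Every object is a Node; nodeType identifies the concrete interface.
for node in (document, element, attr, text):
    print(type(node).__name__, node.nodeType)

is_all_nodes = all(isinstance(n, minidom.Node) for n in (document, element, attr, text))
```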
DOM Level • DOM Level 1 - basic functionality for document navigation and manipulation. • DOM Level 2 - includes a style sheet object model - defines an event model and provides support for XML namespaces. • DOM Level 3 - finalised later than Levels 1 and 2 (its Core, Load and Save, and Validation modules became W3C Recommendations in 2004) - addresses document loading and saving - content model (DTDs and schemas) with document validation support.
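The namespace support added in DOM Level 2 can be illustrated with a short sketch, assuming Python's `xml.dom.minidom`, which provides the Level 2 method `getElementsByTagNameNS`. The sample document is hypothetical; the Dublin Core namespace URI is used purely for illustration.

```python
from xml.dom import minidom

# Dublin Core namespace URI, used here only as an illustrative namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample document, not taken from the slides.
XML_TEXT = (
    '<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>XML Basics</dc:title>"
    "<title>Not in the namespace</title>"
    "</catalog>"
)

document = minidom.parseString(XML_TEXT)

# Namespace-aware lookup: matches on (namespace URI, local name).
namespaced = document.getElementsByTagNameNS(DC_NS, "title")

# Plain lookup: matches on the literal tag name only.
by_name = document.getElementsByTagName("title")

print([n.firstChild.data for n in namespaced])
print([n.firstChild.data for n in by_name])
```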
Parsing XML: SAX • Simple API for XML • API for accessing XML data • event based • programming language independent • application has to store fragments into memory • read only
Parsing XML: SAX • SAX is an interface to the XML parser based on streaming and call-backs • In Java you extend the HandlerBase class (SAX 1; replaced by DefaultHandler in SAX 2) and override the call-backs you need: • startDocument, endDocument • startElement, endElement • characters • warning, error, fatalError
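The call-back style can be sketched as follows. This example assumes Python's standard-library `xml.sax`, where the document call-backs live on `ContentHandler` and the error call-backs on `ErrorHandler` (rather than on a single Java-style handler class). The class names `EventLogger` and `StrictErrors` and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class EventLogger(ContentHandler):
    """Records every call-back the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Text may arrive in several chunks, so each chunk is logged separately.
        self.events.append(("characters", content))


class StrictErrors(ErrorHandler):
    """Collects warnings and aborts on errors and fatal errors."""

    def __init__(self):
        self.warnings = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        raise exception

    def fatalError(self, exception):
        raise exception


# Hypothetical sample document, not taken from the slides.
XML_BYTES = b"<catalog><book>XML Basics</book></catalog>"

logger = EventLogger()
xml.sax.parseString(XML_BYTES, logger, StrictErrors())
for event in logger.events:
    print(event)
```

Note that the handler never sees a tree: it only receives a stream of events and must keep whatever state it needs itself.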
Parsing XML: SAX [Figure: an XML file and the sequence of SAX calls it produces]
SAX versus DOM DOM: • read and write • need to move back and forth in data • document is human created SAX: • read only • huge data or streams • data is machine generated
DOM pro and contra PRO • The file is parsed only once. • High navigation abilities : this is the aim of the DOM design. CONTRA • More memory needed since the XML tree is in memory.
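A short sketch of the "parse once, navigate freely" advantage, again assuming Python's `xml.dom.minidom`. It also shows the write side of the DOM. The sample document and the added element are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, not taken from the slides.
XML_TEXT = (
    "<catalog>"
    "<book id='b1'>XML Basics</book>"
    "<book id='b2'>SAX and DOM</book>"
    "</catalog>"
)

# The file is parsed only once; the whole tree now lives in memory.
document = minidom.parseString(XML_TEXT)

# Several independent lookups reuse the same in-memory tree.
books = document.getElementsByTagName("book")
ids = [book.getAttribute("id") for book in books]
last_title = books[-1].firstChild.data
parent_name = books[0].parentNode.tagName

# The tree is writable: add a new element and serialise the result.
new_book = document.createElement("book")
new_book.setAttribute("id", "b3")
new_book.appendChild(document.createTextNode("Parser Products"))
document.documentElement.appendChild(new_book)
serialised = document.documentElement.toxml()
print(serialised)
```

The price is the one named in the contra list: every node of the document is held in memory for as long as the tree is alive.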
SAX pro and contra PRO • Low memory needs since the XML file is never entirely in memory • Can deal with XML streams CONTRA • The file has to be parsed from the start to reach any node. Thus, fetching the 10 nodes of a catalog one at a time means parsing the same file 10 times. • Poor navigation abilities: there is no easy way to get the children of a given node or the list of "B" nodes
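One common way to soften the re-parsing cost is to have the handler store the fragments of interest during a single pass. The sketch below assumes Python's `xml.sax`; the `BookCollector` class, the `id` attribute and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BookCollector(ContentHandler):
    """Keeps only the text of <book> elements whose id is wanted."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.found = {}
        self._current_id = None  # id of the wanted book being read, if any
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book" and attrs.get("id") in self.wanted_ids:
            self._current_id = attrs.get("id")
            self._buffer = []

    def characters(self, content):
        # Only buffer text while inside a wanted book; everything else is dropped.
        if self._current_id is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book" and self._current_id is not None:
            self.found[self._current_id] = "".join(self._buffer)
            self._current_id = None


# Hypothetical sample document, not taken from the slides.
XML_BYTES = (
    b"<catalog>"
    b"<book id='b1'>XML Basics</book>"
    b"<book id='b2'>SAX and DOM</book>"
    b"<book id='b3'>Parser Products</book>"
    b"</catalog>"
)

collector = BookCollector(["b1", "b3"])
xml.sax.parseString(XML_BYTES, collector)  # a single pass over the document
print(collector.found)
```

Memory use stays proportional to what the application chooses to keep, not to the size of the document, but all bookkeeping (which element is open, what to buffer) is the application's job.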
SAX versus DOM • If your document is very large and you only need a few elements - use SAX • If you need to process many elements and perform operations on XML - use DOM • If you need to access the XML many times - use DOM
Parser Products • Xerces4J / Xerces4C++ (Apache) • James Clark’s XP (Java) • IBM XML4J / XML4C++ • Java Project X (Sun) • Oracle’s XML Parser for Java • MSXML (Microsoft) • Dan Connolly’s XML Parser (Python) • …
Conclusion • The parser is a key building block of every XML application. • When building XML applications, you have to think about how you will handle large chunks of data • Choosing between SAX and DOM is not always trivial
The End Questions? Thank you!