XML Parsers

XML Parsers An XML parser is a software library or package that provides interfaces for client applications to work with an XML document. The XML Parser is designed to read the XML and create a way for programs to use XML.

Types of XML Parsers • These are the two main types of XML Parsers: • DOM • SAX

DOM (Document Object Model) • A DOM document is an object which contains all the information of an XML document. It is composed like a tree structure. The DOM Parser implements a DOM API. This API is very simple to use.

Features of DOM Parser • A DOM Parser creates an internal structure in memory which is a DOM document object and the client applications get information of the original XML document by invoking methods on this document object. • DOM Parser has a tree based structu

Advantages • 1) It supports both read and write operations and the API is very simple to use. • 2) It is preferred when random access to widely separated parts of a document is required.

Disadvantages • 1) It is memory inefficient. (consumes more memory because the whole XML document needs to loaded into memory). • 2) It is comparatively slower than other parsers.

DOM interfaces • The DOM defines several Java interfaces. Here are the most common interfaces − • Node − The base datatype of the DOM. • Element − The vast majority of the objects you'll deal with are Elements. • Attr − Represents an attribute of an element. • Text − The actual content of an Element or Attr. • Document − Represents the entire XML document. A Document object is often referred to as a DOM tree.

Common DOM methods • When you are working with DOM, there are several methods you'll use often − • Document.getDocumentElement() − Returns the root element of the document. • Node.getFirstChild() − Returns the first child of a given Node. • Node.getLastChild() − Returns the last child of a given Node. • Node.getNextSibling() − These methods return the next sibling of a given Node. • Node.getPreviousSibling() − These methods return the previous sibling of a given Node. • Node.getAttribute(attrName) − For a given Node, it returns the attribute with the requested name.

Steps to Using JDOM Following are the steps used while parsing a document using JDOM Parser. • Import XML-related packages. • Create a SAXBuilder. • Create a Document from a file or stream • Extract the root element • Examine attributes • Examine sub-elements

SAX (Simple API for XML) • A SAX Parser implements SAX API. This API is an event based API and less intuitive.

Features of SAX Parser • It does not create any internal structure. • Clients does not know what methods to call, they just overrides the methods of the API and place his own code inside method. • It is an event based parser, it works like an event handler in Java.

Advantages • 1) It is simple and memory efficient. • 2) It is very fast and works for huge documents.

Disadvantages • 1) It is event-based so its API is less intuitive. • 2) Clients never know the full information because the data is broken into pieces.

ContentHandlerInterface • This interface specifies the callback methods that the SAX parser uses to notify an application program of the components of the XML document that it has seen. • void startDocument() − Called at the beginning of a document. • void endDocument() − Called at the end of a document. • void startElement(String uri, String localName, String qName, Attributes atts) − Called at the beginning of an element. • void endElement(String uri, String localName,StringqName) − Called at the end of an element.

ContentHandler Interface • void characters(char[] ch, int start, int length) − Called when character data is encountered. • void ignorableWhitespace( char[] ch, int start, int length) − Called when a DTD is present and ignorable whitespace is encountered. • void processingInstruction(String target, String data) − Called when a processing instruction is recognized. • void setDocumentLocator(Locator locator)) − Provides a Locator that can be used to identify positions in the document.

ContentHandler Interface • void skippedEntity(String name) − Called when an unresolved entity is encountered. • void startPrefixMapping(String prefix, String uri) − Called when a new namespace mapping is defined. • void endPrefixMapping(String prefix) − Called when a namespace definition ends its scope.

Attributes Interface • This interface specifies methods for processing the attributes connected to an element. • intgetLength() − Returns number of attributes. • String getQName(int index) • String getValue(int index) • String getValue(String qname)

XML Parsers

XML Parsers

Presentation Transcript

Parsers and Grammars

Comparing Java XML parsers

XML Parsers

LR PARSERS

Parsers

XML DOM and SAX Parsers

Implementing LR Parsers

Review: LR(k) parsers

Parsers

FACS Week 15 Recursive Descent Parsers, Table Driven Recursive Descent Parsers

Parsers and Grammar

XML Parsers

Recursive Descent Parsers

High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers

Recursive Descent Parsers

MC365 XML Parsers

XML Parsers

Using XML Parsers and Unicode

Parsers and Grammars

Parsers and Grammar

Parsers