
SAX


dorit

Presentation Transcript


  1. SAX

  2. What is SAX • SAX (Simple API for XML) 1.0 was released on May 11, 1998. • SAX is a common, event-based API for parsing XML documents. • It is primarily a Java API, but there are implementations in most languages. • The current version is SAX 2.0.1, and there are versions for several programming language environments other than Java.

  3. How does SAX work • An XML document is seen as a series of “events”. • Unlike DOM, SAX does not store information in an internal tree structure. • SAX is able to parse huge documents (think gigabytes) without having to allocate large amounts of system resources. • If processing is built as a pipeline, a stage doesn’t have to wait for the whole document to be converted to an object; data can move on to the next stage as soon as the preceding callback method returns. • SAX does not allow random access to the file; it proceeds in a single pass, firing events as it goes.
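The single-pass, event-firing behaviour described above can be made visible by recording each callback as it happens. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name EventTrace and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events in the order the parser fires them.
public class EventTrace {

    /** Parses the XML text once and returns the names of the events seen. */
    public static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }
        };
        // No tree is built; the handler is the only place data is kept.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        trace("<a><b/><c>text</c></a>").forEach(System.out::println);
    }
}
```

Nothing from the document survives the parse unless a callback stores it, which is why memory use stays flat for large inputs.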

  4. SAX Structure(1/4)

  5. SAX Structure(2/4) • SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory. • SAXParser: The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object. • XMLReader: The SAXParser wraps an XMLReader. Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.
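The factory, parser, and reader chain described above might be wired together as follows. This is a sketch under the assumption that the default JAXP implementation is in use; ReaderSetup and countElements are illustrative names, not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: SAXParserFactory -> SAXParser -> XMLReader.
public class ReaderSetup {

    /** Counts the elements in the XML text using the wrapped XMLReader directly. */
    public static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);           // configure before creating the parser
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();  // the object that talks to the handlers

        int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```

Going through getXMLReader() is only needed when handlers or features have to be set individually; otherwise SAXParser.parse() with a DefaultHandler is enough.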

  6. SAX Structure(3/4) • DefaultHandler: Not shown in the diagram, DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces with (mostly empty) default methods, so you can override only the ones you are interested in. • ContentHandler: Methods such as startDocument, endDocument, startElement, and endElement are invoked when the parser recognizes the start or end of the document or of an element. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters text in an XML element or an inline processing instruction, respectively. • EntityResolver: The resolveEntity method is invoked when the parser must locate data identified by a URI.
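One practical detail of the characters callback is that a parser may deliver the text of a single element in several chunks, so handlers usually accumulate text between startElement and endElement. The sketch below illustrates that pattern; the element name "title" and the class TitleCollector are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <title> element.
public class TitleCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> titles = new ArrayList<>();
    private boolean inTitle = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) {
            inTitle = true;
            text.setLength(0);          // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element; append rather than assign.
        if (inTitle) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(text.toString());
            inTitle = false;
        }
    }

    /** Parses the XML text and returns the text content of each <title>. */
    public static List<String> titles(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }
}
```

Only the overridden callbacks need to be written; everything else falls back to the DefaultHandler defaults.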

  7. SAX Structure(4/4) • ErrorHandler: Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser. • DTDHandler: Defines methods you will rarely, if ever, need to use. It is used when processing a DTD to recognize and act on declarations for an unparsed entity.
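Supplying your own error handler, as recommended above, could look like the sketch below. It assumes a validating parser from the JDK and a document that carries its DTD inline; CollectingErrorHandler and check are illustrative names. Validation errors are recorded so parsing can continue, while fatal errors are recorded and rethrown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler: keeps a list of everything the parser complained about.
public class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable (for example a validation error): record it and keep parsing.
        problems.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Not recoverable (the document is not well-formed): record and stop.
        problems.add("fatal: " + e.getMessage());
        throw e;
    }

    /** Parses with validation switched on and returns the recorded problems. */
    public static List<String> check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);                 // DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        }
        return handler.problems;
    }
}
```

Whether an error should stop processing is an application decision; this sketch continues after validation errors only to show that the choice exists.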

  8. SAX Event • startDocument • endDocument • startElement • endElement • characters

  9. Pull Parsing Versus Push Parsing • Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset--that is, the client only gets (pulls) XML data when it explicitly asks for it. • Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset--that is, the parser sends the data whether or not the client is ready to use it at that time.
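For contrast with SAX's push model, here is a small pull-parsing sketch. It uses StAX (javax.xml.stream), which is one pull API available in the JDK; the slide does not name a specific library, so that choice, along with the class name PullDemo, is an assumption made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example of pull parsing: the application drives the loop.
public class PullDemo {

    /** Returns the element names in document order, pulling one event at a time. */
    public static List<String> elementNames(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The client asks for the next event only when it is ready for it.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c>t</c></a>"));
    }
}
```

With SAX the parser owns the loop and calls into the handler; here the loop lives in application code, so it can pause or stop early without any special signalling.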

  10. XML Parser API Feature Summary

  11. XML Parser and APIs supporting SAX • Xerces: a family of software packages for parsing and manipulating XML, part of the Apache XML project. • MSXML: Microsoft XML Core Services (MSXML) is a set of services that allow applications written in JScript, VBScript, and Microsoft Visual Studio 6.0 to build XML-based applications. • Crimson XML • JAXP: the Java API for XML Processing, one of the Java XML programming APIs. It provides the capability of validating and parsing XML documents.

  12. SAX Example

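  13. import java.io.FileReader;
      import org.xml.sax.Attributes;
      import org.xml.sax.InputSource;
      import org.xml.sax.XMLReader;
      import org.xml.sax.helpers.DefaultHandler;
      import org.xml.sax.helpers.XMLReaderFactory;

      public class MySAXApp extends DefaultHandler {

          // Usage: java MySAXApp <file.xml>
          public static void main(String[] args) throws Exception {
              // XMLReaderFactory is deprecated in newer JDKs; SAXParserFactory is the usual alternative.
              XMLReader xr = XMLReaderFactory.createXMLReader();
              MySAXApp handler = new MySAXApp();
              xr.setContentHandler(handler);   // receives document events
              xr.setErrorHandler(handler);     // receives warnings and errors
              try (FileReader r = new FileReader(args[0])) {
                  xr.parse(new InputSource(r));
              }
          }

          ////////////////////////////////////////////////////////////////////
          // Event handlers (continued on the next slide).
          ////////////////////////////////////////////////////////////////////

  14.     @Override
          public void startDocument() {
              // TODO: add customized code here
          }

          @Override
          public void endDocument() {
              // TODO: add customized code here
          }

          @Override
          public void startElement(String uri, String name, String qName, Attributes atts) {
              // TODO: add customized code here
          }

          @Override
          public void endElement(String uri, String name, String qName) {
              // TODO: add customized code here
          }
      }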

  15. Applications of XML Stream Processing • content-based XML routing • selective dissemination of information • continuous queries • processing of scientific data stored in large XML files

  16. Selective Dissemination of Information • The use of selective approaches to dissemination in order to avoid overloading users with unnecessary information. • Applications: • stock and sports tickers • traffic information systems • electronic personalized newspapers • entertainment delivery

  17. Typical SDI Systems • Representation of user profiles • simple keyword matching • “bag of words” Information Retrieval (IR) techniques • Limited ability • Inefficiency of filtering
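To illustrate how a structure-aware profile differs from plain keyword matching, the sketch below matches user profiles written as absolute element paths against a document in one SAX pass. It is a deliberately naive matcher written for this example, not the algorithm of any particular SDI system; the class PathProfileFilter, the path syntax, and the sample element names are all assumptions.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical filter: a profile matches when its path occurs in the document.
public class PathProfileFilter extends DefaultHandler {
    private final Set<String> profiles;                    // e.g. "/news/sports/headline"
    private final Deque<String> path = new ArrayDeque<>(); // currently open elements
    private final Set<String> matched = new TreeSet<>();

    public PathProfileFilter(Set<String> profiles) {
        this.profiles = profiles;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        String current = "/" + String.join("/", path);
        if (profiles.contains(current)) {
            matched.add(current);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    /** Returns the subset of profiles that the document satisfies. */
    public static Set<String> match(String xml, Set<String> profiles) throws Exception {
        PathProfileFilter filter = new PathProfileFilter(profiles);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), filter);
        return filter.matched;
    }
}
```

A keyword matcher would treat "sports" the same wherever it appears; the path stack lets a profile say where in the document structure the match must occur.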

  18. Selective Dissemination of Information

  19. References • M. Altinel and M. J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In VLDB Conf., Sep. 2000. • Y. Diao, P. Fischer, M. Franklin, and R. To. YFilter: Efficient and Scalable Filtering of XML Documents. In Proceedings of the International Conference on Data Engineering, San Jose, California, February 2002.
