
Streaming API for XML (StAX)

Cheng-Chia Chen



  1. Streaming API for XML (StAX), Cheng-Chia Chen

  2. XML API styles
     • Push: SAX, XNI
     • Tree: DOM, JDOM, XOM, ElectricXML, dom4j, Sparta
     • Data binding: Castor, Zeus, JAXB
     • Pull: XMLPULL, StAX, NekoPull
     • Transform: XSLT, TrAX, XQuery

  3. What is pull parsing?
     • SAX: push parsing (event driven)
        • Views an XML document as a sequence of events and calls preconfigured event-handling methods while visiting those events in order.
     • StAX: pull parsing (token based)
        • Behaves like a traditional lexical analyzer in a compiler.
        • Cursor-based.
        • Views an XML document as a sequence of passive tokens (or events); the application decides when to fetch the next token through methods such as next() and nextTag().
     • Traits of pull parsing:
        • Fast, memory efficient, streamable, read-only
        • Suits the needs of JAXB and JAX-RPC, which require more flexible, context-dependent processing.
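To make the "application decides when to get the next token" point concrete, here is a minimal sketch of a pull loop that stops as soon as it has what it needs. The class name PullSketch and the helper firstChildName are made up for illustration; only the javax.xml.stream calls come from StAX.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX.
final class PullSketch {
    /** Returns the local name of the root's first child element, or null if it has none. */
    static String firstChildName(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // the application asks for the root start tag
            while (reader.hasNext()) {
                int event = reader.next(); // ...and pulls each further token itself
                if (event == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName(); // stop early; this code reads no further
                }
                if (event == XMLStreamConstants.END_ELEMENT) {
                    return null; // the root closed without a child element
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstChildName("<a><b/><c/></a>"));
    }
}
```

With a push parser such as SAX the handler would keep receiving callbacks for the rest of the document unless it aborted by throwing; here the loop simply returns.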

  4. Major classes and interfaces in StAX
     • XMLStreamReader:
        • an interface that represents the parser
        • cursor-based; event information is stored in the parser
     • XMLEventReader:
        • an interface that represents the parser
        • event-based; event information is stored in the returned event object
     • XMLInputFactory:
        • the factory class for instantiating an XMLStreamReader or an XMLEventReader
     • XMLStreamException:
        • the generic exception class for everything other than an IOException that might go wrong when parsing an XML document
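The two reader styles can be compared side by side. The sketch below counts start tags once with the cursor API and once with the event API; the class name ReaderStyles and its two methods are illustrative names, not part of StAX.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of StAX.
final class ReaderStyles {
    /** Cursor style: the event details live in the reader itself. */
    static int countWithCursor(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    /** Event style: each call returns a self-contained XMLEvent object. */
    static int countWithEvents(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                count++;
            }
        }
        reader.close();
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a><b/><c/></a>";
        System.out.println(countWithCursor(xml) + " " + countWithEvents(xml));
    }
}
```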

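  5. Well-formedness checking (full source)

        import static java.lang.System.out; // assumption: the slide's "out" is System.out
        import java.io.IOException;
        import java.io.InputStream;
        import javax.xml.stream.XMLInputFactory;
        import javax.xml.stream.XMLStreamConstants;
        import javax.xml.stream.XMLStreamException;
        import javax.xml.stream.XMLStreamReader;

        try {
            InputStream in = …
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader parser = factory.createXMLStreamReader(in);
            while (true) {
                int event = parser.next(); // move the cursor to the next XML token
                if (event == XMLStreamConstants.END_DOCUMENT) {
                    break;
                }
            }
            parser.close(); // close once, after the loop
            // If we get here there were no exceptions
            out.println("The input is well-formed");
        } catch (XMLStreamException ex) {
            out.println("The input is not well-formed");
        } catch (IOException ex) {
            out.println("IO error");
        }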

  6. Interface XMLStreamConstants
     • Defines 15 event codes that XMLStreamReader.next() uses to tell you what kind of event the parser encountered:
        • START_DOCUMENT, END_DOCUMENT
        • START_ELEMENT, END_ELEMENT
        • ATTRIBUTE, CHARACTERS
        • CDATA, SPACE // ignorable whitespace
        • NAMESPACE, PROCESSING_INSTRUCTION
        • COMMENT
        • ENTITY_REFERENCE
        • NOTATION_DECLARATION
        • ENTITY_DECLARATION
        • DTD
     • Depending on the current event, different methods are available on the XMLStreamReader for fetching additional information about it (state-based).
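As a sketch of how these integer codes are typically consumed, the following maps a handful of them to readable names and traces a small document. EventNames, name and trace are hypothetical names; IS_COALESCING is switched on only so that adjacent text is reported as a single CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX.
final class EventNames {
    /** Maps some of the XMLStreamConstants codes to their names. */
    static String name(int code) {
        switch (code) {
            case XMLStreamConstants.START_ELEMENT: return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT: return "END_ELEMENT";
            case XMLStreamConstants.CHARACTERS: return "CHARACTERS";
            case XMLStreamConstants.COMMENT: return "COMMENT";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.END_DOCUMENT: return "END_DOCUMENT";
            default: return "OTHER(" + code + ")";
        }
    }

    /** Lists the names of all events that next() reports for the given document. */
    static List<String> trace(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            names.add(name(reader.next()));
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(trace("<a>hi<!--c--></a>"));
    }
}
```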

  7. Possible parsing events
     • For a well-formed XML document, only the following events can be generated by XMLStreamReader#next():
        • START_DOCUMENT [XML declaration]
        • DTD [NOTATION_DECLARATION, ENTITY_DECLARATION]
        • END_DOCUMENT
        • START_ELEMENT [ATTRIBUTE, NAMESPACE]
        • END_ELEMENT
        • CHARACTERS, CDATA, SPACE // ignorable whitespace
        • PROCESSING_INSTRUCTION
        • COMMENT
        • ENTITY_REFERENCE

  8. XML event hierarchy
     • javax.xml.stream.XMLStreamConstants is extended by javax.xml.stream.events.XMLEvent, which has these subinterfaces:
        • StartDocument
        • DTD, NotationDeclaration, EntityDeclaration
        • StartElement
        • Attribute, Namespace
        • Characters, Comment
        • EntityReference, ProcessingInstruction
        • EndElement
        • EndDocument

  9. XMLStreamReader
     • Event content queries
     • For the element name
        • getName(): QName
        • getLocalName(): String
        • getNamespaceURI(): String
     • For declared namespaces
        • getNamespaceCount(): int
        • getNamespaceURI(int)
        • getNamespacePrefix(int)
     • For in-scope namespaces
        • getNamespaceURI(String prefix)
        • getNamespaceContext(): NamespaceContext
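The difference between declared and in-scope namespaces is easy to miss, so here is a small sketch of the name and namespace queries. NameQueries and its methods are illustrative names only; the sample assumes the factory's default namespace-aware mode.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX.
final class NameQueries {
    private static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    /** Describes the root as its QName followed by the namespaces declared on it. */
    static String describeRoot(String xml) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        try {
            reader.nextTag(); // position the cursor on the root start tag
            StringBuilder text = new StringBuilder(reader.getName().toString());
            for (int i = 0; i < reader.getNamespaceCount(); i++) {
                text.append(' ').append(reader.getNamespacePrefix(i))
                    .append('=').append(reader.getNamespaceURI(i));
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    /** At the first child: the number of declarations on it, then the in-scope URI of the prefix. */
    static String inScopeAtFirstChild(String xml, String prefix) throws XMLStreamException {
        XMLStreamReader reader = open(xml);
        try {
            reader.nextTag(); // root start tag
            reader.nextTag(); // first child start tag (assumes one follows directly)
            return reader.getNamespaceCount() + ":" + reader.getNamespaceURI(prefix);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describeRoot("<p:a xmlns:p=\"urn:x\"/>"));
        System.out.println(inScopeAtFirstChild("<p:a xmlns:p=\"urn:x\"><p:b/></p:a>", "p"));
    }
}
```

The child element declares nothing itself, so its declaration count is zero, yet the prefix still resolves because the parent's declaration is in scope.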

  10. XMLStreamReader (continued)
     • For attached attributes
        • getAttributeCount(): int
        • getAttributeName(int): QName
        • getAttributePrefix(int): String
        • getAttributeNamespace(int): String
        • getAttributeLocalName(int): String
        • getAttributeType(int): String
        • getAttributeValue(int): String
        • getAttributeValue(String namespaceURI, String localName): String
     • For text or character data
        • hasText(): boolean; getText(): String
        • getTextCharacters(): char[] // read-only, valid until the next call to next()
        • getTextStart(): int; getTextLength(): int
        • getTextCharacters(int sourceStart, char[] target, int targetStart, int length) // sourceStart = 0 means copying starts at getTextStart()
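A short sketch of the attribute and text queries follows. AttributeQueries, rootAttributes and rootText are hypothetical names; a TreeMap is used only to make the printed order independent of the parser, and IS_COALESCING is enabled so that the text arrives as one event.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX.
final class AttributeQueries {
    /** Collects the root element's attributes as local name to value, sorted by name. */
    static Map<String, String> rootAttributes(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // attribute queries are valid on START_ELEMENT
            Map<String, String> attributes = new TreeMap<>();
            for (int i = 0; i < reader.getAttributeCount(); i++) {
                attributes.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
            }
            return attributes;
        } finally {
            reader.close();
        }
    }

    /** Returns the text that directly follows the root start tag, or "" if there is none. */
    static String rootText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag();
            reader.next(); // move to whatever follows the start tag
            return reader.hasText() ? reader.getText() : "";
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<book id=\"42\" lang=\"en\">StAX</book>";
        System.out.println(rootAttributes(xml) + " " + rootText(xml));
    }
}
```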

  11. Interface javax.xml.stream.events.XMLEvent
     • XMLEvents are value objects that are used to communicate the XML 1.0 InfoSet to the application.
     • Events may be cached and referenced after the parse has completed.
     • Methods
        • int getEventType();
        • Location getLocation();
        • QName getSchemaType(); // optional
        • void writeAsEncodedUnicode(Writer writer) // writes this XMLEvent to the writer as Unicode characters
        • Xxxx asXxxx() // Xxxx: StartElement, EndElement or Characters
        • boolean isXxxx() // Xxxx: all events except Comment, DTD, NotationDeclaration and EntityDeclaration
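Because events are value objects, they can be collected first and inspected after the reader is closed. The sketch below illustrates this; CachedEvents, readAll and startElementNames are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of StAX.
final class CachedEvents {
    /** Reads the whole document and returns its events; the reader is closed before returning. */
    static List<XMLEvent> readAll(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<XMLEvent> events = new ArrayList<>();
        while (reader.hasNext()) {
            events.add(reader.nextEvent());
        }
        reader.close();
        return events;
    }

    /** Inspects cached events using the isXxxx()/asXxxx() pairs. */
    static List<String> startElementNames(List<XMLEvent> events) {
        List<String> names = new ArrayList<>();
        for (XMLEvent event : events) {
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(startElementNames(readAll("<a><b>t</b></a>")));
    }
}
```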

  12. StartDocument, EndDocument and Namespace
     • StartDocument
        • public String getSystemId(); // default is ""
        • public String getCharacterEncodingScheme(); // default is "UTF-8"
        • public boolean encodingSet(); // true if the encoding attribute was set
        • public boolean isStandalone(); // default is false
        • public boolean standaloneSet(); // true if the standalone attribute was set
        • public String getVersion(); // "1.0" or "1.1"
     • EndDocument extends XMLEvent { } // no special methods
     • Namespace
        • String getPrefix();
        • String getNamespaceURI();
        • boolean isDefaultNamespaceDeclaration();

  13. StartElement and EndElement
     • StartElement
        • QName getName();
        • Iterator getAttributes(); // of javax.xml.stream.events.Attribute
        • Iterator getNamespaces(); // of javax.xml.stream.events.Namespace: namespaces declared or undeclared in this start tag
        • Attribute getAttributeByName(QName name);
        • String getNamespaceURI(String prefix); // looks up the namespace URI bound to the given prefix
        • NamespaceContext getNamespaceContext(); // contains all namespaces in scope
     • EndElement
        • public QName getName();
        • public Iterator getNamespaces(); // namespaces going out of scope
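As an illustration of the StartElement queries, this sketch looks up one attribute of the root element by QName. StartElementQueries and rootAttribute are hypothetical names, and the attribute is assumed to be unprefixed (no namespace).

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of StAX.
final class StartElementQueries {
    /** Returns the value of the named attribute on the root element, or null if absent. */
    static String rootAttribute(String xml, String attributeName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    // QName(localPart) means "no namespace", matching unprefixed attributes
                    Attribute attribute = event.asStartElement()
                            .getAttributeByName(new QName(attributeName));
                    return attribute == null ? null : attribute.getValue();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(rootAttribute("<item id=\"7\"/>", "id"));
    }
}
```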

  14. The NamespaceContext class
     • package javax.xml.namespace;
     • NamespaceContext
        • String getNamespaceURI(String prefix);
        • String getPrefix(String namespaceURI);
        • Iterator getPrefixes(String namespaceURI);

  15. javax.xml.namespace.QName

        public class QName { // immutable objects
            public QName([String namespaceURI,] String localPart [, String prefix]);
            // default URI:    javax.xml.XMLConstants.NULL_NS_URI = ""
            // default prefix: javax.xml.XMLConstants.DEFAULT_NS_PREFIX = ""
            public String getLocalPart();
            public String getPrefix();
            public String getNamespaceURI();
            public int hashCode();
            public boolean equals(Object object); // true iff same URI and same local part
            public String toString(); // = "{" + namespaceURI + "}" + localPart
            public static QName valueOf(String qNameAsString); // inverse of toString()
        }
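The equality and string rules for QName can be checked directly. In this sketch QNameSketch and roundTrip are illustrative names; everything else is the standard javax.xml.namespace.QName API.

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

// Hypothetical helper class for illustration.
final class QNameSketch {
    /** Round-trips a QName through toString() and valueOf(); the prefix is not preserved. */
    static QName roundTrip(QName name) {
        return QName.valueOf(name.toString());
    }

    public static void main(String[] args) {
        QName a = new QName("urn:x", "item", "p");
        QName b = new QName("urn:x", "item", "q");
        System.out.println(a.equals(b)); // true: equals() ignores the prefix
        System.out.println(a);           // {urn:x}item
        System.out.println(roundTrip(a).getPrefix().isEmpty()); // true: toString() drops the prefix
        // A QName built from a local part alone has the empty namespace URI.
        System.out.println(new QName("item").getNamespaceURI().equals(XMLConstants.NULL_NS_URI)); // true
    }
}
```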

  16. Location
     • package javax.xml.stream;
     • Location
        • int getLineNumber();
        • int getColumnNumber();
        • int getCharacterOffset();
        • String getPublicId();
        • String getSystemId();
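One common use of Location is reporting where parsing failed. The sketch below takes the Location from a thrown XMLStreamException; LocationSketch and errorLine are made-up names, and the returned value 0 for "location unknown" is a convention chosen for this example only.

```java
import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of StAX.
final class LocationSketch {
    /**
     * Returns -1 if the document is well-formed, otherwise the line number
     * reported for the error (0 if the exception carries no location).
     */
    static int errorLine(String xml) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            return -1;
        } catch (XMLStreamException ex) {
            Location where = ex.getLocation();
            return where == null ? 0 : where.getLineNumber();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(errorLine("<a>\n<b>\n</a>"));
    }
}
```

The exact line and column a parser reports for a given error depend on the implementation, so the sample prints the value rather than asserting a specific number.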

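  17. XMLEventReader

        package javax.xml.stream;
        public interface XMLEventReader extends Iterator {
            Object next();      // inherited from Iterator
            boolean hasNext();
            void remove();      // inherited from Iterator
            public XMLEvent nextEvent() throws XMLStreamException; // typed variant of next()
            public XMLEvent peek() throws XMLStreamException;
            public String getElementText() throws XMLStreamException;
            public XMLEvent nextTag() throws XMLStreamException; // skips whitespace characters
            public Object getProperty(String name);
        }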
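A typical use of XMLEventReader is to scan for a start tag and then read its text with getElementText(). The sketch below assumes the target element contains only text; EventReaderSketch and textOf are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of StAX.
final class EventReaderSketch {
    /** Returns the text of the first element with the given local name, or null if none exists. */
    static String textOf(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(localName)) {
                    // Valid right after a START_ELEMENT; reads up to the matching end tag.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(textOf("<book><title>StAX</title><year>2004</year></book>", "year"));
    }
}
```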

  18. Attribute and Characters
     • Attribute
        • QName getName();
        • String getValue();
        • String getDTDType(); // default is "CDATA"
        • boolean isSpecified(); // false means the value was defaulted from the DTD
     • Characters
        • String getData();
        • boolean isWhiteSpace();
        • boolean isCData(); // is a CDATA section; false if coalesced with other text
        • boolean isIgnorableWhiteSpace(); // isWhiteSpace() && child of an element-only element

  19. Comment and ProcessingInstruction
     • Comment
        • getText(): String
     • ProcessingInstruction
        • getTarget(): String
        • getData(): String

  20. DTD and NotationDeclaration
     • DTD
        • String getDocumentTypeDeclaration(); // as a string
        • Object getProcessedDTD(); // a representation of the DTD
        • List getNotations(); // of NotationDeclaration
        • List getEntities(); // unparsed entities
     • NotationDeclaration
        • getName(): String
        • getPublicId(): String
        • getSystemId(): String

  21. EntityDeclaration and EntityReference
     • (general unparsed?) EntityDeclaration
        • String getPublicId();
        • String getSystemId();
        • String getName();
        • public String getNotationName();
        • public String getReplacementText();
     • (unexpanded general) EntityReference
        • EntityDeclaration getDeclaration() // returns the declaration of this entity
        • String getName() // the name of the entity
        • This event is reported only if isReplacingEntityReferences is set to false.
