
XML


erol

Presentation Transcript


  1. XML

  2. Today we will • Learn what XML is and where it comes from • Learn how to parse and create XML documents • See the difference between the SAX and DOM (Document Object Model) APIs • Learn about XML validation through DTDs (Document Type Definitions) and XML Schema • Learn how to use XSLT style sheets for transforming XML into a presentation form such as HTML

  3. XML history • XML has its roots in SGML (Standard Generalized Markup Language), which became an international standard in 1986. • SGML is a complex tagging language. HTML was inspired by SGML, and with the addition of the <A> tag the hyperlink was born. • HTML was easy and flexible for sharing structured information with embedded hyperlinks, but HTML was targeted at human interpretation. • During 1999 the process of merging HTML and XML began. • XHTML 1.0, a reformulation of HTML 4 in XML, became a W3C recommendation in 2000. • An XHTML document is also an XML document. • XML is targeted towards machine interpretation.

  4. XML and stylesheets • XML is a markup language • Uses tags to identify the components of the document • Does not imply how the components should be presented. • Is all about data structure. • Presentation details are left for the stylesheets to define. • A markup language states which parts of the text are 1st level headings, and the stylesheet defines how these headings should look.

  5. HTML Example

     <html>
       <body>
         <h1>Heading Text Goes Here</h1>
         <p>This is a paragraph with some <b>boldfaced</b> text
         as well as some text that forms a list
         <ul>
           <li>First list item
           <li>Second item
         </ul>
       </body>
     </html>

     • The presentation of HTML can be changed by stylesheets. • In HTML all tags have a predefined “meaning”. You cannot define your own tags.

  6. XML • Can be used to mark up just about any information. • It is called Extensible Markup Language. • The plain-text structure makes it “portable”, since it may be edited in any simple text editor. • Standards committees try to define suites of XML tags/structures that fit the needs of particular business branches. • Just as Java promises portable programs, XML promises portable data structures.

  7. XML structure • An XML element is the combination of an opening tag, a closing tag, and all the data in between. • <tag> some text in between</tag> • An element with no content may be collapsed into a single tag, written • <tag/> • Data appearing within an element may contain other tags. • Proper nesting: A nested element must be closed before its containing element is closed.
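The proper-nesting rule can be checked mechanically, since any XML parser rejects a document that breaks it. The following is a minimal sketch using the JAXP DOM parser; the class name `WellFormedCheck` and the helper `isWellFormed` are made up for this illustration and are not part of the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    /** Returns true if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Rethrow parse problems instead of letting the parser print them.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        }
    }

    public static void main(String[] args) throws Exception {
        // Properly nested: <inner> is closed before <tag>.
        System.out.println(isWellFormed("<tag><inner>text</inner></tag>"));
        // Improperly nested: <tag> is closed while <inner> is still open.
        System.out.println(isWellFormed("<tag><inner>text</tag></inner>"));
        // Collapsed empty element.
        System.out.println(isWellFormed("<tag/>"));
    }
}
```

Only the second document is rejected, because its closing tags are in the wrong order.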

  8. Using attributes or nested elements • Any attribute may be included in the opening tag of an XML element. • <tag1 height="12.1" length="7"> some text </tag1> • These may also be represented as nested elements. • <tag1><height>12.1</height><length>7</length> some text </tag1>
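Which of the two forms is chosen changes how an application reads the value. As a rough sketch with the DOM API, assuming the small documents below (the class and method names are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class AttributeVsElement {

    /** Parses the text and returns the root element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Reads height when it is stored as an attribute of the root element. */
    static String heightFromAttribute(String xml) throws Exception {
        return parseRoot(xml).getAttribute("height");
    }

    /** Reads height when it is stored as a nested element. */
    static String heightFromChild(String xml) throws Exception {
        return parseRoot(xml).getElementsByTagName("height").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(heightFromAttribute(
                "<tag1 height=\"12.1\" length=\"7\"> some text </tag1>"));
        System.out.println(heightFromChild(
                "<tag1><height>12.1</height><length>7</length> some text </tag1>"));
    }
}
```

Both calls return the string 12.1. Attributes are simple name/value pairs, while nested elements can themselves carry attributes and further structure.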

  9. Namespaces • With many user-defined element and attribute names there is a considerable risk of name conflicts. • This is handled through namespaces, • as in C++ and other languages. • A namespace is typically declared in the root element of the document:

     <tagRoot xmlns="http://www.defaulttags.com/tags"
              xmlns:xyz="http://www.xyztags.com/tags">
       <xyz:tag1>some text here</xyz:tag1>
       <tag1>some other text here</tag1>
     </tagRoot>

     • Above we have specified the default namespace and the xyz namespace.
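To illustrate how an application tells the two tag1 elements apart, here is a small sketch using a namespace-aware DOM parser. The document is the one from the slide; the class and method names are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class NamespaceDemo {
    static final String DEFAULT_NS = "http://www.defaulttags.com/tags";
    static final String XYZ_NS = "http://www.xyztags.com/tags";

    static final String XML =
          "<tagRoot xmlns=\"" + DEFAULT_NS + "\" xmlns:xyz=\"" + XYZ_NS + "\">"
        + "<xyz:tag1>some text here</xyz:tag1>"
        + "<tag1>some other text here</tag1>"
        + "</tagRoot>";

    /** Returns the text of the first tag1 element in the given namespace. */
    static String firstTag1Text(String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList matches = doc.getElementsByTagNameNS(namespaceUri, "tag1");
        return matches.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstTag1Text(XYZ_NS));     // the xyz:tag1 element
        System.out.println(firstTag1Text(DEFAULT_NS)); // the unprefixed tag1 element
    }
}
```

The lookup is by namespace URI, not by prefix: the prefix xyz is only a local abbreviation inside the document.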

  10. Well formed and Valid? • An XML document must always be well-formed, • but a well-formed document is not necessarily a valid one. • There may be additional rules that require tags to be used in a predefined order. • Some tags or attributes may be mandatory. • A well-formed but invalid XML document is like a syntactically correct (compilable) program that behaves incorrectly when run.

  11. XML Example (complete file)

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="dinosaurs.xsl"?>
      <!DOCTYPE DinoList SYSTEM "dinosaurs.dtd">
      <DinoList>
        <Dinosaur period="Late Cretaceous">
          <Name>Tyrannosaurus Rex</Name>
          <Group>Carnosaur</Group>
          <Range>
            <Region>Europe</Region>
            <Region>North America</Region>
          </Range>
          <PhysicalAttr>
            <Length unit="feet">39</Length>
            <Weight unit="tons">6</Weight>
          </PhysicalAttr>
        </Dinosaur>
        <Dinosaur period="Late Jurassic">
          <Name>Stegosaurus</Name>
          <Group>Stegosaur</Group>
          <Range>
            <Region>Europe</Region>
            <Region>Asia</Region>
            <Region>North America</Region>
          </Range>
          <PhysicalAttr>
            <Length unit="metres">9</Length>
            <Weight unit="kgs">3100</Weight>
          </PhysicalAttr>
        </Dinosaur>
      </DinoList>

  12. XML Example continued • The top row identifies the XML version. • DinoList is the top-level element. • The top-level element is called the “root” element. There must be exactly one root element in a document. • The sub-elements are quite straightforward to understand. • Notice the tree structure of elements and sub-elements.

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="dinosaurs.xsl"?>
      <!DOCTYPE DinoList SYSTEM "dinosaurs.dtd">
      <DinoList>
        <Dinosaur period="Late Cretaceous">
          <Name>Tyrannosaurus Rex</Name>
          <Group>Carnosaur</Group>
          <Range>
            <Region>Europe</Region>
            <Region>North America</Region>
          </Range>
          <PhysicalAttr>
            <Length unit="feet">39</Length>
            <Weight unit="tons">6</Weight>
          </PhysicalAttr>
      Etc...

  13. JAXP, SAX and DOM • JAXP (Java API for XML Processing) • is the official API for XML processing from Sun. • Contains SAX, DOM and XML Schema support. • javax.xml.*, org.w3c.dom.*, org.xml.sax.* • Both SAX and DOM are language-independent APIs for processing XML documents. • Many hope that JDOM (org.jdom.*) will be part of JAXP in the future.

  14. SAX • … is an event-based API for XML processing. • An XML tree is not viewed as a data structure, but as a stream of events generated by the parser. • … reports parsing events (such as the start and end of elements) directly to the application through callbacks. • The application implements handlers to deal with the different events, much like handling events in a graphical user interface. • … is efficient when we are only interested in a subset of the entire XML source. • When you only need one pass over the XML source and do not need to build a complete in-memory tree representation, SAX may be the preferred choice. • If only a fraction of the document needs to be processed, or if the document is very large compared to internal memory, SAX is more efficient than DOM.
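The callback style described above can be sketched as follows. This is a minimal example, not taken from the slides: the handler class `SaxNameCollector` is hypothetical, and it assumes a dinosaur document shaped like the earlier XML example. It collects the text of every Name element in a single pass without building a tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler class, for illustration only.
class SaxNameCollector extends DefaultHandler {
    private final List<String> names = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <Name> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Name")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text run, so append.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Name")) {
            names.add(text.toString().trim());
            text = null;
        }
    }

    /** Parses the XML text and returns the content of every Name element. */
    static List<String> collectNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxNameCollector handler = new SaxNameCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DinoList>"
                + "<Dinosaur><Name>Tyrannosaurus Rex</Name><Group>Carnosaur</Group></Dinosaur>"
                + "<Dinosaur><Name>Stegosaurus</Name><Group>Stegosaur</Group></Dinosaur>"
                + "</DinoList>";
        System.out.println(collectNames(xml));
    }
}
```

The handler keeps only the state it needs (one buffer and one list), which is why memory use stays small however large the document is.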

  15. DOM • … is a tree-based API. • A DOM parser automatically maps an XML document into an internal tree structure, which allows an application to navigate that tree with random access. • It therefore often consumes more resources than SAX. • It is more convenient when performance is not an issue.

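  16. A DOM parser

      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.w3c.dom.Document;
      import org.w3c.dom.Element;
      import org.w3c.dom.Node;
      import org.w3c.dom.NodeList;
      import org.w3c.dom.Text;
      import org.xml.sax.InputSource;

      public class DOMParser {
          public static void main(String[] args) throws Exception {
              // Build the complete in-memory tree for the document.
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              DocumentBuilder parser = factory.newDocumentBuilder();
              Document document = parser.parse(new InputSource("dinosaurs.xml"));

              Element dinoList = document.getDocumentElement();
              NodeList dinosaurs = dinoList.getElementsByTagName("Dinosaur");
              Element currElement = null;
              String groupName = null;

              // Look for the dinosaur named "Dilophosaurus" and remember its group.
              for (int i = 0; i < dinosaurs.getLength(); i++) {
                  currElement = (Element) dinosaurs.item(i);
                  String nameValue = getSimpleElementText(currElement, "Name");
                  if (nameValue.equals("Dilophosaurus")) {
                      groupName = getSimpleElementText(currElement, "Group");
                  }
              }
              // groupName stays null if no such dinosaur is in the file.
              System.out.println("Dilophosaurus group: " + groupName);
          }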

  17. A DOM parser

          /**
           * Method to return the first element of a specified
           * name from the given element
           */
          public static Element getFirstElement(Element element, String name) {
              NodeList nl = element.getElementsByTagName(name);
              if (nl.getLength() < 1) {
                  throw new RuntimeException("Element: " + element + " does not contain: " + name);
              }
              return (Element) nl.item(0);
          }

          /**
           * Method to return the text contained within the
           * element with the given name found within the
           * specified element
           */
          public static String getSimpleElementText(Element node, String name) {
              Element nameEl = getFirstElement(node, name);
              Node textNode = nameEl.getFirstChild();
              if (textNode instanceof Text) {
                  return textNode.getNodeValue();
              } else {
                  throw new RuntimeException("No text in " + name);
              }
          }
      }

      This code shows how to scan through a DOM data structure to find specific information.

  18. JDOM vs DOM • DOM is a technology independent of programming language. • It doesn’t utilize the Java Collections framework. • JDOM is a free, third-party, Java-specific document model (an alternative to DOM rather than an implementation of it) that utilizes the Collections framework.
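The practical effect of DOM ignoring the Collections framework is that a NodeList cannot be used in a for-each loop or passed to Collections utilities. A common workaround, sketched here with hypothetical names, is to copy the nodes into a java.util.List first:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical adapter class, for illustration only.
class NodeListAdapter {

    /** Copies all descendant elements with the given tag name into a List. */
    static List<Element> descendants(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        List<Element> result = new ArrayList<>();
        // NodeList only offers index-based access: getLength() and item(i).
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add((Element) nodes.item(i));
        }
        return result;
    }

    /** Returns the text of every Region element in the given XML text. */
    static List<String> regionNames(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> names = new ArrayList<>();
        for (Element region : descendants(root, "Region")) { // for-each now works
            names.add(region.getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(regionNames(
                "<Range><Region>Europe</Region><Region>Asia</Region></Range>"));
    }
}
```

Libraries such as JDOM remove the need for this kind of adapter by returning java.util.List directly.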

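  19. Generating XML content with DOM

      import java.io.FileOutputStream;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.transform.Result;
      import javax.xml.transform.Source;
      import javax.xml.transform.Transformer;
      import javax.xml.transform.TransformerFactory;
      import javax.xml.transform.dom.DOMSource;
      import javax.xml.transform.stream.StreamResult;
      import org.w3c.dom.DOMImplementation;
      import org.w3c.dom.Document;
      import org.w3c.dom.Element;
      import org.w3c.dom.Text;

      public class DOMPrinter {
          public static void main(String[] args) throws Exception {
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              DocumentBuilder builder = factory.newDocumentBuilder();
              DOMImplementation domImpl = builder.getDOMImplementation();

              // Create an empty document whose root element is <tagRoot>.
              Document document = domImpl.createDocument(null, "tagRoot", null);
              Element root = document.getDocumentElement();
              root.setAttribute("testAttr", "testValue");

              // <tag1>sample text</tag1> as a child of the root.
              Element tag1Element = document.createElement("tag1");
              Text tag1Text = document.createTextNode("sample text");
              tag1Element.appendChild(tag1Text);
              root.appendChild(tag1Element);

              Element tag2Element = document.createElement("tag2");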

  20. Generating XML content with DOM

              // <tag2>more text</tag2> is nested inside <tag1>.
              Text tag2Text = document.createTextNode("more text");
              tag2Element.appendChild(tag2Text);
              tag1Element.appendChild(tag2Element);

              // An empty <tag3/> as a second child of the root.
              Element tag3Element = document.createElement("tag3");
              root.appendChild(tag3Element);

              // An identity transformer serializes the DOM tree to a file.
              TransformerFactory tf = TransformerFactory.newInstance();
              Transformer transformer = tf.newTransformer();
              Source source = new DOMSource(document);
              // try-with-resources closes the file even if the transform fails.
              try (FileOutputStream fos = new FileOutputStream("tags.xml")) {
                  Result output = new StreamResult(fos);
                  transformer.transform(source, output);
              }
          }
      }

  21. Output (shown here with line breaks and indentation added for readability):

      <?xml version="1.0" encoding="UTF-8"?>
      <tagRoot testAttr="testValue">
        <tag1>sample text
          <tag2>more text</tag2>
        </tag1>
        <tag3/>
      </tagRoot>
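A transformer created with newTransformer() does not add line breaks on its own; asking for them is done through an output property. The sketch below is a simplified variant of the printer, with hypothetical class and method names, that writes to a string and lets the caller switch indentation on or off. The exact whitespace produced with indentation enabled depends on the XSLT implementation in use.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, for illustration only.
class IndentedOutput {

    /** Builds a small document and serializes it, optionally indented. */
    static String render(boolean indent) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("tagRoot");
        root.setAttribute("testAttr", "testValue");
        document.appendChild(root);

        Element tag1 = document.createElement("tag1");
        tag1.appendChild(document.createTextNode("sample text"));
        root.appendChild(tag1);
        root.appendChild(document.createElement("tag3"));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Standard JAXP output property; "no" is the default for XML output.
        transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(false));
        System.out.println(render(true));
    }
}
```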

  22. Validating • There are two ways to describe what a valid document looks like: • Document Type Definition (DTD) • XML Schema • DTD or XML Schema? • DTD is well established, and simple. • XML Schema is more elaborate and may have a bright future.

  23. Example: A DTD for the dinosaurs

      <?xml version='1.0' encoding="UTF-8"?>
      <!ELEMENT DinoList (Dinosaur+)>
      <!ELEMENT Dinosaur (Name,Group,Range,PhysicalAttr)>
      <!ATTLIST Dinosaur period CDATA #IMPLIED>
      <!ELEMENT Group (#PCDATA)>
      <!ELEMENT Height (#PCDATA)>
      <!ATTLIST Height unit CDATA #IMPLIED>
      <!ELEMENT Length (#PCDATA)>
      <!ATTLIST Length unit CDATA #IMPLIED>
      <!ELEMENT Name (#PCDATA)>
      <!ELEMENT PhysicalAttr (Height?,Length?,Weight?)>
      <!ELEMENT Range (Region+)>
      <!ELEMENT Region (#PCDATA)>
      <!ELEMENT Weight (#PCDATA)>
      <!ATTLIST Weight unit CDATA #IMPLIED>

  24. RegExp repetition • “*”, “+”, “?” • “*” = 0 or more • “+” = 1 or more • “?” = 0 or 1

  25. Comments on the DTD dinosaur example • A <Dinosaur> element has exactly one <Name>, <Group>, <Range> and <PhysicalAttr>, IN THAT ORDER. • CDATA – Character data • Often in attributes • PCDATA – Parsed character data • Often inside elements. (Text that the parser scans for markup; an element declared as (#PCDATA) holds text only.) • To apply a DTD to an XML file we must modify the XML file header. • Insert <!DOCTYPE DinoList SYSTEM "dinosaurs.dtd"> • Identifies DinoList as the root element.

  26. Validation: finally, activate validation

      public static void main(String[] args) throws Exception {
          DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
          // Ask for a parser that checks the document against its DTD.
          factory.setValidating(true);
          DocumentBuilder parser = factory.newDocumentBuilder();
          // ParserErrorHandler is an application-supplied implementation of
          // org.xml.sax.ErrorHandler; it is not shown in these slides.
          parser.setErrorHandler(new ParserErrorHandler());
          Document document = parser.parse(new InputSource("dinosaurs.xml"));

          Element dinoList = document.getDocumentElement();
          NodeList dinosaurs = dinoList.getElementsByTagName("Dinosaur");
          Element currElement = null;
          String groupName = null;
          for (int i = 0; i < dinosaurs.getLength(); i++) {
              currElement = (Element) dinosaurs.item(i);
              String nameValue = getSimpleElementText(currElement, "Name");
              if (nameValue.equals("Dilophosaurus")) {
                  groupName = getSimpleElementText(currElement, "Group");
              }
          }
          System.out.println("Dilophosaurus group: " + groupName);
      }
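The validating parser above registers a ParserErrorHandler, a class the slides do not show. One possible shape for it is sketched below; this is a guess at what such a handler might look like, not the original. To keep the example self-contained it uses a cut-down DTD embedded in the document itself (only Name and Group, an assumption made for brevity) instead of the external dinosaurs.dtd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, for illustration only.
class DtdValidationDemo {

    /** Hypothetical handler: records validation problems instead of printing them. */
    static class ParserErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        @Override public void warning(SAXParseException e) {
            problems.add("warning: " + e.getMessage());
        }
        @Override public void error(SAXParseException e) {
            // Validity violations are reported here; parsing continues.
            problems.add("error: " + e.getMessage());
        }
        @Override public void fatalError(SAXParseException e) throws SAXParseException {
            throw e; // the document is not even well-formed
        }
    }

    // Simplified DTD as an internal subset (an assumption for this sketch).
    static final String DOCTYPE =
          "<!DOCTYPE DinoList ["
        + "<!ELEMENT DinoList (Dinosaur+)>"
        + "<!ELEMENT Dinosaur (Name,Group)>"
        + "<!ELEMENT Name (#PCDATA)>"
        + "<!ELEMENT Group (#PCDATA)>"
        + "]>";

    /** Validates the body against the DTD and returns the number of problems found. */
    static int countProblems(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder parser = factory.newDocumentBuilder();
        ParserErrorHandler handler = new ParserErrorHandler();
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return handler.problems.size();
    }

    public static void main(String[] args) throws Exception {
        // Valid: Name comes before Group, as the DTD requires.
        System.out.println(countProblems(
            "<DinoList><Dinosaur><Name>Stegosaurus</Name><Group>Stegosaur</Group></Dinosaur></DinoList>"));
        // Well-formed but invalid: the children are in the wrong order.
        System.out.println(countProblems(
            "<DinoList><Dinosaur><Group>Stegosaur</Group><Name>Stegosaurus</Name></Dinosaur></DinoList>"));
    }
}
```

Without an error handler, validity errors are not thrown as exceptions, so a validating parse of an invalid document would still return a DOM tree. Recording or rethrowing them in the handler is what makes validation useful.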

  27. XML Schema • *.xsd • An alternative to DTDs • Is itself an XML document • It covers nearly all the structural capabilities of DTDs (entity declarations are an exception), so that existing DTDs can largely be converted to XML Schema. • XML Schemas have additional capabilities compared to DTDs, such as data types and namespace support.

  28. XML Schema example

      <?xml version="1.0" encoding="UTF-8"?>
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:element name="DinoList">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element maxOccurs="unbounded" minOccurs="1" ref="Dinosaur"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Dinosaur">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Name"/>
              <xsd:element ref="Group"/>
              <xsd:element ref="Range"/>
              <xsd:element ref="PhysicalAttr"/>
            </xsd:sequence>
            <xsd:attribute name="period" type="xsd:string" use="optional"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Group" type="xsd:string"/>
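Validation against an XML Schema can be done from Java with the javax.xml.validation package. The sketch below is an illustration under assumptions: the schema is a shortened version of the one above (only Name and Group, declared inline as a string), and the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class, for illustration only.
class SchemaValidationDemo {

    // Shortened schema (an assumption for this sketch): a Dinosaur has a Name and a Group.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"DinoList\"><xsd:complexType><xsd:sequence>"
        + "<xsd:element maxOccurs=\"unbounded\" minOccurs=\"1\" ref=\"Dinosaur\"/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name=\"Dinosaur\"><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref=\"Name\"/><xsd:element ref=\"Group\"/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name=\"period\" type=\"xsd:string\" use=\"optional\"/>"
        + "</xsd:complexType></xsd:element>"
        + "<xsd:element name=\"Name\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"Group\" type=\"xsd:string\"/>"
        + "</xsd:schema>";

    /** Returns true if the XML text is valid according to the schema above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no error handler set, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
            "<DinoList><Dinosaur period=\"Late Jurassic\"><Name>Stegosaurus</Name>"
            + "<Group>Stegosaur</Group></Dinosaur></DinoList>"));
        // minOccurs="1" on Dinosaur makes an empty list invalid.
        System.out.println(isValid("<DinoList/>"));
    }
}
```

Note how minOccurs="1" and maxOccurs="unbounded" play the role of the "+" operator in the DTD.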

  29. Transforming XML into other forms • Common target: HTML documents. • By using different stylesheets (XSL) we can present the same data to a web browser, a mobile phone and a PDA.

  30. Example

      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
        <xsl:output method="html"/>

        <xsl:template match="/">
          <html><head><title>Dinosaurs!</title></head>
          <body><h1>Dinosaurs!</h1>
            <xsl:apply-templates select="DinoList/Dinosaur"/>
          </body></html>
        </xsl:template>

        <xsl:template match="Dinosaur">
          <h2><xsl:value-of select="Name"/></h2>
          <table border="1" width="400" cellpadding="5">
            <tr>
              <th>Period</th>
              <td><xsl:value-of select="@period"/></td>
            </tr>
            <tr>
              <th>Group</th>
              <td><xsl:value-of select="Group"/></td>
            </tr>
            <xsl:apply-templates select="Range"/>
            <xsl:apply-templates select="PhysicalAttr"/>
          </table>
        </xsl:template>

        <xsl:template match="Range">
          <tr>
            <th>Range</th>
            <td>
              <ul>
                <xsl:for-each select="Region">
                  <li><xsl:value-of select="."/></li>
                </xsl:for-each>
              </ul>
            </td>
          </tr>
        </xsl:template>

        <xsl:template match="PhysicalAttr">
          <xsl:if test="Height">
            <tr>
              <th>Height</th>
              <td>
                <xsl:value-of select="Height"/>
                <xsl:text disable-output-escaping="yes"> &amp;nbsp; </xsl:text>
                <xsl:value-of select="Height/@unit"/>
              </td>
            </tr>
          </xsl:if>
      etc

  31. Stylesheet transforms… • Tree structured starting with the root node. • See Figure with guiding comments.
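A stylesheet like the dinosaur example can also be applied from Java through the JAXP transformation API, the same API used earlier to write a DOM tree to a file. The following is a minimal sketch; the stylesheet is a much-reduced version of the example (it only emits a heading per dinosaur), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, for illustration only.
class XsltDemo {

    // Reduced stylesheet (an assumption for this sketch): one <h2> per dinosaur.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/\">"
        + "<html><body><xsl:apply-templates select=\"DinoList/Dinosaur\"/></body></html>"
        + "</xsl:template>"
        + "<xsl:template match=\"Dinosaur\">"
        + "<h2><xsl:value-of select=\"Name\"/></h2>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML text and returns the resulting HTML. */
    static String toHtml(String xml) throws Exception {
        // Passing a Source to newTransformer compiles that stylesheet.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(
            "<DinoList>"
            + "<Dinosaur><Name>Tyrannosaurus Rex</Name></Dinosaur>"
            + "<Dinosaur><Name>Stegosaurus</Name></Dinosaur>"
            + "</DinoList>"));
    }
}
```

Processing starts at the template matching the root node "/", and each apply-templates call hands the selected nodes on to the template that matches them, which is the tree-structured traversal described above.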
