Learn how to work with XML documents programmatically using .NET Framework classes for navigating, accessing, and reading XML content. Understand the push and pull models, XmlReader, XmlDocument, XmlElement, and more. Practice iterating through XML documents with XmlTextReader.
XML for .NET: Session 1 • Introduction to XML • Introduction to XSLT • Programmatically Reading XML Documents • Introduction to XPath
XML Documents Can be Read Programmatically • The .NET Framework includes many classes to aid in programmatically iterating through and navigating XML documents. • These classes are found in the System.Xml namespace. The various classes in the System.Xml namespace are highlighted in Chapter 6 of the text, XML and ASP.NET (starting on page 261).
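Accessing XML Content • XML documents can be accessed in one of two ways, which these slides call the pull model and the push model. • The pull model loads the entire XML document into memory, and then works with the document once it has been completely loaded. • The push model reads the document a small piece at a time, as needed. • Note on terminology: Microsoft’s documentation usually describes XmlReader as a forward-only “pull” parser (in contrast to SAX, a “push” parser) and treats the DOM as an in-memory tree model; the labels used here follow these slides.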
How to use the Two Methods • The .NET Framework provides developers both methods: • Pull Method – use the DOM classes in the .NET Framework. • Push Method – use the XmlReader and XmlWriter classes.
Using the Pull Method • The System.Xml namespace contains a number of classes to work with XML documents in the DOM paradigm: • XmlDocument – represents an XML document. • XmlElement – represents an individual element in the DOM • XmlAttribute – represents an attribute. • XmlText – represents text content.
Using the Push Method • The XmlReader reads one node at a time from a specified XML source. The XmlReader can only read in a FORWARD direction. • The XmlReader class is abstract, so it cannot be instantiated directly; one of its derived classes must be used instead: • XmlNodeReader – reads one node at a time from an XML DOM. • XmlTextReader – reads one node at a time from an XML source, such as a file with XML content. • XmlValidatingReader – a reader that performs DTD or schema validation (more on this next week!)
Iterating through an XML Document using XmlTextReader • To iterate through the contents of an XML document with the XmlTextReader we need to: • Specify the XML document to iterate through when creating the XmlTextReader. • Call the Read() method, which reads in the next Node. • Access the properties of the XmlTextReader to determine the name, value, and other information about the read Node.
Iterating through an XML Document using XmlTextReader • We can programmatically read through the contents of an XML file like so:
    // create an XmlTextReader to read the specified XML file
    XmlTextReader reader = new XmlTextReader(filepath);
    // now, display the information of each node in the TextBox
    while (reader.Read())
    {
        // access the properties of the XmlTextReader class...
        // like reader.Name, reader.NodeType, reader.Value, etc.
    }
    // close the XmlTextReader
    reader.Close();
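The read loop above can be tried end to end with a small console program. This is only a sketch: the class name ReaderWalk, the helper DescribeNodes, and the inline sample XML are assumptions made for illustration, and the reader is fed from a StringReader instead of a file path so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the course demos.
public static class ReaderWalk
{
    // Returns one "NodeType|Name|Value" line for each node Read() stops on.
    public static List<string> DescribeNodes(string xml)
    {
        var lines = new List<string>();
        // XmlTextReader accepts any TextReader, not only a file path.
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                lines.Add(reader.NodeType + "|" + reader.Name + "|" + reader.Value);
            }
        }
        finally
        {
            // Close the reader even if parsing throws.
            reader.Close();
        }
        return lines;
    }

    public static void Main()
    {
        string xml =
            "<books><book price='34.95'><title>Animal Farm</title></book></books>";
        foreach (string line in DescribeNodes(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

Each start tag, piece of text, and end tag is reported as its own node, while the price attribute never appears as a separate line.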
What is a Node? • Recall that the XmlReader classes read XML nodes. What constitutes a node? Can you identify the nodes in the following XML fragment?
    <?xml version="1.0" encoding="utf-8" ?>
    <books>
      <book price="34.95">
        <title>Animal Farm</title>
        <authors>
          <author>Orwell</author>
        </authors>
      </book>
    </books>
What is a Node? • In the fragment above, the whitespace between elements (if present) is also considered a node! (You can, however, set the XmlTextReader’s WhitespaceHandling property to specify whether the reader should report whitespace nodes.)
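The effect of whitespace nodes can be seen by counting how many nodes Read() reports under different WhitespaceHandling settings. The sketch below assumes a hypothetical class WhitespaceDemo and a small indented sample string; only XmlTextReader, its WhitespaceHandling property, and the WhitespaceHandling enumeration come from System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class WhitespaceDemo
{
    // Counts the nodes Read() reports under the given whitespace setting.
    public static int CountNodes(string xml, WhitespaceHandling handling)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        int count = 0;
        try
        {
            while (reader.Read())
            {
                count++;
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    public static void Main()
    {
        // The line breaks and indentation between tags are whitespace nodes.
        string xml =
            "<books>\n  <book>\n    <title>Animal Farm</title>\n  </book>\n</books>";
        Console.WriteLine("All:  " + CountNodes(xml, WhitespaceHandling.All));
        Console.WriteLine("None: " + CountNodes(xml, WhitespaceHandling.None));
    }
}
```

With WhitespaceHandling.None the count drops by one for every run of whitespace between tags, which is usually what you want when only elements and text matter.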
What is a Node? • Notice that the attributes of an element (such as price on <book> in the fragment above) are not reported as separate nodes by Read()...
Creating a Program to View the Content Read by an XmlTextReader • We can create a program that allows the user to select an XML file; then, the contents of the XML file are read by an XmlTextReader, with each read node’s name, type, and value displayed.(Run demo!)
Reading the Attributes • As we saw in the demo, the attributes are not read as a separate node. • We can determine whether or not a given node has attributes by the HasAttributes property. • In order to programmatically access the attributes of a node, we must use the MoveToNextAttribute() method of the XmlTextReader.
Reading the Attributes
    // C#
    while (reader.Read())
    {
        if (reader.HasAttributes)
        {
            while (reader.MoveToNextAttribute())
            {
                // Access the attribute name/value via
                // reader.Name/reader.Value
            }
        }
    }

    ' VB.NET
    While reader.Read()
        If reader.HasAttributes Then
            While reader.MoveToNextAttribute()
                ' Access the attribute name/value via
                ' reader.Name/reader.Value
            End While
        End If
    End While
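A complete version of the attribute loop might look like the sketch below. The class name AttributeDemo, the helper ReadAttributes, and the sample XML are assumptions for illustration. Two details go beyond the slide: the loop checks NodeType so that only elements are considered (the XML declaration also reports version and encoding as attributes), and MoveToElement() is called afterwards to return the reader to the owning element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class AttributeDemo
{
    // Returns "element/@attribute=value" for every attribute on every element.
    public static List<string> ReadAttributes(string xml)
    {
        var found = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.HasAttributes)
                {
                    string element = reader.Name;
                    while (reader.MoveToNextAttribute())
                    {
                        // While positioned on an attribute, Name and Value
                        // describe the attribute rather than the element.
                        found.Add(element + "/@" + reader.Name + "=" + reader.Value);
                    }
                    // Step back to the element that owns the attributes.
                    reader.MoveToElement();
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return found;
    }

    public static void Main()
    {
        string xml =
            "<books><book price='34.95'><title>TYASP 3.0</title></book>" +
            "<book price='29.95'><title>ASP.NET Tips</title></book></books>";
        foreach (string line in ReadAttributes(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```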
The XmlTextReader Properties and Methods • The properties and methods of the XmlTextReader are listed starting on pg. 272 of the text. • Some of the more germane methods include: • ReadInnerXml() – returns a string with the markup of the current node’s content (child nodes, text content, etc.), excluding the node’s own tags. • ReadOuterXml() – returns a string containing the node’s own markup along with the markup of its content.
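The difference between the two methods is easiest to see side by side. The following sketch is not the course demo; the class InnerOuterDemo and its ReadMarkup helper are hypothetical names, and the sample XML is a shortened version of the books fragment.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class InnerOuterDemo
{
    // Advances to the first element named elementName and returns its inner
    // or outer XML. Returns null if no such element is found.
    public static string ReadMarkup(string xml, string elementName, bool outer)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    return outer ? reader.ReadOuterXml() : reader.ReadInnerXml();
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string xml =
            "<books><book price='34.95'><title>Animal Farm</title></book></books>";
        Console.WriteLine("Inner: " + ReadMarkup(xml, "book", false));
        Console.WriteLine("Outer: " + ReadMarkup(xml, "book", true));
    }
}
```

The inner result contains only the title element, while the outer result also includes the book start tag (with its attribute) and end tag.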
The XmlTextReader Properties and Methods • Run ReadInnerOutterXml-ForXmlTextReader demo… • When reading an XML document, the XmlTextReader class will throw an XmlException if there was an error in parsing the XML. • An error can occur if the XML, for example, is malformed. (That is, it is not well-formed.)
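Catching the exception might look like the sketch below. The class WellFormedCheck and its TryParse helper are hypothetical and are not the course's XmlException demo; XmlException and its LineNumber and LinePosition properties are part of System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class WellFormedCheck
{
    // Reads the whole document. Returns null if it is well-formed,
    // otherwise a description of the first parse error.
    public static string TryParse(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                // Nothing to do: we only care whether parsing succeeds.
            }
            return null;
        }
        catch (XmlException ex)
        {
            return "line " + ex.LineNumber + ", position " + ex.LinePosition +
                   ": " + ex.Message;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // The <b> element is never closed, so this is not well-formed.
        Console.WriteLine(TryParse("<a><b></a>") ?? "well-formed");
        Console.WriteLine(TryParse("<a><b/></a>") ?? "well-formed");
    }
}
```

Because the reader is forward-only, the error surfaces only when Read() reaches the malformed part; nodes before that point have already been returned.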
The XmlTextReader Properties and Methods • Run the XmlException demo • We will examine the XmlNodeReader and XmlValidatingReader – the other two XmlReader classes – later in this course.
Using the DOM to Iterate through an XML Document • In contrast to the Push method (XmlReader/XmlWriter), the .NET Framework offers a Pull method. • Recall that the Pull method reads the entire XML document into memory and then works with it from there. • For this model, XML documents are represented in the Document Object Model (DOM).
What is the DOM? • DOM stands for Document Object Model, and it’s a model that can be used to describe an XML document. • The DOM expresses the XML document as a hierarchy of nodes, where each element can have zero or more child elements. • The text content of an element is expressed as child nodes as well. An element’s attributes are also nodes, but they are reached through the element’s Attributes collection rather than its ChildNodes.
Example XML File
    <?xml version="1.0" encoding="UTF-8" ?>
    <books>
      <book price="34.95">
        <title>TYASP 3.0</title>
        <authors>
          <author>Mitchell</author>
        </authors>
      </book>
      <book price="29.95">
        <title>ASP.NET Tips</title>
        <authors>
          <author>Mitchell</author>
          <author>Walther</author>
          <author>Seven</author>
        </authors>
      </book>
    </books>
The DOM Classes - XmlNode • There are a number of classes in the System.Xml namespace that represent the DOM. • Each “box” in the DOM model is represented in the .NET Framework by the XmlNode class. • This means that elements, attributes, and text values are all represented by the XmlNode class. The XmlNode class is discussed on pg. 287
Extending the XmlNode Class • There are a number of classes that are derived from the XmlNode class: • XmlAttribute • XmlElement • XmlDocument • And so on…
The XmlNode Properties • The XmlNode class has many properties, the most germane ones being: • Name – the name of the node. For elements and attributes, the name is the name of the element or attribute. For text content, the name is #text. • Value – the value of the node. For elements, Value is null (Nothing in VB.NET). For attributes, it’s the value of the attribute; for text nodes, it’s the text in the node. • NodeType – indicates the type of the node (element, text, attribute, etc.)
More XmlNode Properties • InnerXml – the string content of the XML markup of the node’s children. • OuterXml – the string content of the XML markup of the node itself and its children. • InnerText – the string content of the value of the node and all its children nodes. • HasChildNodes – a Boolean, indicating if the node has any children.
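These properties can be inspected on a small document. In the sketch below, the class NodePropertiesDemo and its LoadFirstBook helper are hypothetical names; the document is loaded from an inline string with XmlDocument.LoadXml() so that no file is needed.

```csharp
using System;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class NodePropertiesDemo
{
    // Loads a one-book document and returns its <book> element.
    public static XmlNode LoadFirstBook()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<books><book price='34.95'><title>Animal Farm</title>" +
            "<authors><author>Orwell</author></authors></book></books>");
        return doc.DocumentElement.FirstChild;
    }

    public static void Main()
    {
        XmlNode book = LoadFirstBook();
        Console.WriteLine("Name:          " + book.Name);
        Console.WriteLine("NodeType:      " + book.NodeType);
        Console.WriteLine("Value is null: " + (book.Value == null));
        Console.WriteLine("HasChildNodes: " + book.HasChildNodes);
        Console.WriteLine("InnerText:     " + book.InnerText);
        Console.WriteLine("InnerXml:      " + book.InnerXml);
        Console.WriteLine("OuterXml:      " + book.OuterXml);

        // The text inside <title> is a separate child node named #text.
        XmlNode titleText = book.FirstChild.FirstChild;
        Console.WriteLine("Text node:     " + titleText.Name + " = " + titleText.Value);

        // Attributes are reached through the Attributes collection.
        Console.WriteLine("price:         " + book.Attributes["price"].Value);
    }
}
```

Note that InnerText concatenates the text of all descendants, so for the book element it runs the title and the author name together.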
The XmlNodeList Class • The XmlNodeList class represents an arbitrary collection of XmlNodes. • For example, the XmlNode class has a ChildNodes property, which returns an XmlNodeList instance. This instance is a collection of nodes representing the DOM element’s children.
Loading an XML Document into a DOM Representation • The XmlDocument’s Load() method has four variations: • Load(Stream) • Load(string) • Load(TextReader) • Load(XmlReader), which accepts any XmlReader, such as an XmlTextReader • In the Load(string) variation, the input string is a file path (or URL) to the XML file to load into the DOM representation.
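Three of the Load() variations can be exercised without touching the file system, as sketched below. The class LoadDemo and its method names are assumptions for illustration; Load(string) is left out because it needs a real file path or URL.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class LoadDemo
{
    const string Xml =
        "<books><book price='34.95'><title>Animal Farm</title></book></books>";

    public static string RootViaTextReader()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(new StringReader(Xml));            // Load(TextReader)
        return doc.DocumentElement.Name;
    }

    public static string RootViaStream()
    {
        XmlDocument doc = new XmlDocument();
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(Xml)))
        {
            doc.Load(stream);                       // Load(Stream)
        }
        return doc.DocumentElement.Name;
    }

    public static string RootViaXmlReader()
    {
        XmlDocument doc = new XmlDocument();
        XmlTextReader reader = new XmlTextReader(new StringReader(Xml));
        try
        {
            doc.Load(reader);                       // Load(XmlReader)
        }
        finally
        {
            reader.Close();
        }
        return doc.DocumentElement.Name;
    }

    public static void Main()
    {
        Console.WriteLine(RootViaTextReader());
        Console.WriteLine(RootViaStream());
        Console.WriteLine(RootViaXmlReader());
    }
}
```

All three calls build the same in-memory tree; only the source of the characters differs.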
The XmlDocument Properties • The XmlDocument is derived from the XmlNode class, meaning it has all of the properties and methods available to the XmlNode class. • Once an XML file has been loaded into an XmlDocument instance, we can access the root element through the DocumentElement property.
The XmlElement and XmlAttribute Classes • The XmlElement and XmlAttribute classes are also derived from the XmlNode class. • They represent, respectively, an element and an attribute.
Example • The following loads an XML document and displays the name of the root element.
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim rootElementName As String
    rootElementName = xmlDoc.DocumentElement.Name
Example • Iterating through the root element’s children:
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim n As XmlNode
    For Each n In xmlDoc.DocumentElement.ChildNodes
        ' Display the name of the node using n.Name
    Next
An Example of Iterating through an XML Document • Let’s create an application that displays an XML document in a TreeView control. • Each node in the TreeView represents a Node in the DOM
An Example of Iterating through an XML Document • We can recursively iterate through the DOM, ensuring that we’ll visit each node. (Explain recursion?) • Examine application code... • Questions on the program?
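The recursive idea can be sketched without a TreeView: a method handles one node and then calls itself for each child, so every node in the tree is eventually visited. The class DomWalk and its Walk method are hypothetical stand-ins for the application code, writing indented lines instead of TreeView nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class; the course application fills a TreeView instead.
public static class DomWalk
{
    // Records this node, then recurses into each of its children.
    public static void Walk(XmlNode node, int depth, List<string> output)
    {
        output.Add(new string(' ', depth * 2) + node.NodeType + ": " + node.Name);
        foreach (XmlNode child in node.ChildNodes)
        {
            Walk(child, depth + 1, output);
        }
        // Leaf nodes have an empty ChildNodes list, which ends the recursion.
    }

    public static List<string> Describe(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        var output = new List<string>();
        Walk(doc.DocumentElement, 0, output);
        return output;
    }

    public static void Main()
    {
        string xml =
            "<books><book price='34.95'><title>Animal Farm</title></book></books>";
        foreach (string line in Describe(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

Attributes do not show up in this walk because they are not part of ChildNodes; a fuller version could also loop over node.Attributes for element nodes.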
Navigating through an XML Document • So far, all we have seen is how to iterate through an XML document, one node at a time. • With the pull method (DOM), however, we can navigate through the document as well. • For example, we might want to access just the elements in the document that have a certain name. (Such as elements with the name <author>.)
Accessing Elements with a Certain Name • The XmlDocument class contains a GetElementsByTagName() method, which returns an XmlNodeList containing the elements that have the specified tag name.
    Dim xmlDoc As New XmlDocument()
    xmlDoc.Load(filepath)
    Dim n As XmlNode
    For Each n In xmlDoc.GetElementsByTagName("author")
        ' Display n.Value
    Next
What would be the output of the above code?
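One way to explore the question is to run the equivalent C# against the example books document. Since Value is null for element nodes, displaying n.Value shows nothing useful; the sketch below reads InnerText instead to get each author's name. The class TagNameDemo and its AuthorNames helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class TagNameDemo
{
    public const string BooksXml =
        "<books>" +
        "<book price='34.95'><title>TYASP 3.0</title>" +
        "<authors><author>Mitchell</author></authors></book>" +
        "<book price='29.95'><title>ASP.NET Tips</title>" +
        "<authors><author>Mitchell</author><author>Walther</author>" +
        "<author>Seven</author></authors></book>" +
        "</books>";

    public static List<string> AuthorNames(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        var names = new List<string>();
        foreach (XmlNode n in doc.GetElementsByTagName("author"))
        {
            // n.Value is null here because n is an element; the name is
            // stored in the element's child #text node, which InnerText reads.
            names.Add(n.InnerText);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in AuthorNames(BooksXml))
        {
            Console.WriteLine(name);
        }
    }
}
```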
Navigating through an XML Document • However, what if we want to access nodes based on more complex criteria, such as: “Access all <book> elements with a price attribute value less than 30,” or, “Access the name of the authors who have written more than one book.” • To accomplish this we need something more powerful – enter XPath!
A Quick Examination of XPath • XPath is used to select particular parts of an XML document. • XPath is named XPath because its syntax is similar to the syntax for a file path. For example, in our books XML document, we could use the following XPath expression to access all of the author elements: /books/book/authors/author
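In the DOM classes, an XPath expression can be evaluated with the SelectNodes() method of XmlNode. The sketch below runs the expression above against the example books document; the class XPathBooksDemo and its Select helper are hypothetical names. The second query uses an XPath predicate (the part in square brackets), which goes beyond what these slides have introduced and is shown only as a preview of the "price less than 30" criterion.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class XPathBooksDemo
{
    const string BooksXml =
        "<books>" +
        "<book price='34.95'><title>TYASP 3.0</title>" +
        "<authors><author>Mitchell</author></authors></book>" +
        "<book price='29.95'><title>ASP.NET Tips</title>" +
        "<authors><author>Mitchell</author><author>Walther</author>" +
        "<author>Seven</author></authors></book>" +
        "</books>";

    // Returns the InnerText of every node matched by the XPath expression.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(BooksXml);
        var results = new List<string>();
        foreach (XmlNode n in doc.SelectNodes(xpath))
        {
            results.Add(n.InnerText);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string author in Select("/books/book/authors/author"))
        {
            Console.WriteLine("author: " + author);
        }
        // Predicate preview: titles of books whose price attribute is below 30.
        foreach (string title in Select("/books/book[@price < 30]/title"))
        {
            Console.WriteLine("under 30: " + title);
        }
    }
}
```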
Why We Might Want to Access Certain XML Document Portions • When using XSLT to display an XML file, typically we want to display only a subset of the XML document. For example, we might want to display a listing of flights, displaying the date, the departure city and the destination city. • When working with XML data, we might want to retrieve only a certain subset of the data. • We might want to access data that meets a certain set of criteria. All of these tasks can be accomplished with XPath
XPath Components – Steps • To access the root element of the XML document, we use the following syntax: /RootElementName • Then, to access immediate descendants (children) of a given element, we use /, followed by the name of the child element. • The / operator is referred to as the step operator.
XPath Components – Steps • The step operator has parallels to the \ operator in file paths. With file systems (which can be modeled as XML documents), you navigate the directory structure by using \. • For example, the file path C:\Games\Quake\SavedGames takes you to the SavedGames directory, inside Quake, inside Games, on drive C.
The file system can be represented as an XML document…
    <?xml version="1.0" encoding="UTF-8" ?>
    <filesystem>
      <drive letter="C">
        <folder name="Program Files" />
        <folder name="Games">
          <folder name="Quake">
            <folder name="SavedGames" />
            <file>Quake.exe</file>
            <file>README.txt</file>
          </folder>
        </folder>
        <folder name="Windows">
          <file>README.txt</file>
        </folder>
      </drive>
      <drive letter="D">
        <folder name="Backup">
          <file>2003-06-01.bak</file>
          <file>2003-06-07.bak</file>
        </folder>
      </drive>
    </filesystem>
XPath Components - Steps • Using XPath we can access the root element using: /filesystem
XPath Components - Steps • To access all of the <drive> elements, we’d use: /filesystem/drive
XPath Components - Steps • To access all of the <folder> elements that are children of <drive> elements, we’d use: /filesystem/drive/folder
XPath Components - Steps • What about /filesystem/drive/folder/folder/folder
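The step expressions from the last few slides can be checked against the file system document with SelectNodes(). This is a sketch: the class StepDemo and its Select helper are hypothetical names, and each matched element is identified by its name or letter attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class StepDemo
{
    const string FileSystemXml =
        "<filesystem>" +
        "<drive letter='C'>" +
        "<folder name='Program Files'/>" +
        "<folder name='Games'><folder name='Quake'><folder name='SavedGames'/>" +
        "<file>Quake.exe</file><file>README.txt</file></folder></folder>" +
        "<folder name='Windows'><file>README.txt</file></folder>" +
        "</drive>" +
        "<drive letter='D'><folder name='Backup'>" +
        "<file>2003-06-01.bak</file><file>2003-06-07.bak</file></folder></drive>" +
        "</filesystem>";

    // Returns the name (folders) or letter (drives) of each matched element.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(FileSystemXml);
        var results = new List<string>();
        foreach (XmlNode n in doc.SelectNodes(xpath))
        {
            XmlAttribute id = n.Attributes["name"] ?? n.Attributes["letter"];
            results.Add(id != null ? id.Value : n.Name);
        }
        return results;
    }

    public static void Main()
    {
        string[] queries =
        {
            "/filesystem/drive",
            "/filesystem/drive/folder",
            "/filesystem/drive/folder/folder/folder"
        };
        foreach (string xpath in queries)
        {
            Console.WriteLine(xpath + " -> " + string.Join(", ", Select(xpath).ToArray()));
        }
    }
}
```

Each additional /folder step moves exactly one level deeper, so the last query matches only folders nested three levels below a drive.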
Descendant Steps • Using elementName/elementName2, we get all of the elements that are children of elementName that have the name elementName2. • But what if we want all elements that are descendants of elementName, regardless of whether the element is a child, grandchild, great-grandchild, etc.? • Here, we use the // operator.
Descendant Steps • As we saw earlier, /filesystem/drive/folder will return the folders that are immediate children of a <drive> element (Program Files, Games, Windows, and Backup). • If we want to get all folders, regardless of their depth in the hierarchy, we can use: /filesystem/drive//folder
Descendant Steps - Example • What will /filesystem//file return?
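The // operator can be tried against the same file system document. In the sketch below, the class DescendantDemo and its Select helper are hypothetical names; folders are identified by their name attribute and files by their text content.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class for illustration only.
public static class DescendantDemo
{
    const string FileSystemXml =
        "<filesystem>" +
        "<drive letter='C'>" +
        "<folder name='Program Files'/>" +
        "<folder name='Games'><folder name='Quake'><folder name='SavedGames'/>" +
        "<file>Quake.exe</file><file>README.txt</file></folder></folder>" +
        "<folder name='Windows'><file>README.txt</file></folder>" +
        "</drive>" +
        "<drive letter='D'><folder name='Backup'>" +
        "<file>2003-06-01.bak</file><file>2003-06-07.bak</file></folder></drive>" +
        "</filesystem>";

    // Returns the name attribute of matched folders, or the text of matched files.
    public static List<string> Select(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(FileSystemXml);
        var results = new List<string>();
        foreach (XmlNode n in doc.SelectNodes(xpath))
        {
            XmlAttribute name = n.Attributes["name"];
            results.Add(name != null ? name.Value : n.InnerText);
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine("Folders at any depth:");
        foreach (string folder in Select("/filesystem/drive//folder"))
        {
            Console.WriteLine("  " + folder);
        }
        Console.WriteLine("Files at any depth:");
        foreach (string file in Select("/filesystem//file"))
        {
            Console.WriteLine("  " + file);
        }
    }
}
```

Unlike the single / step, // matches at every depth, so nested folders such as Quake and SavedGames are included, and every <file> element in the document is returned regardless of which folder contains it.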