1 / 22

CSCI/CMPE 4341 Topic: Programming in Python Chapter 9: Python XML Processing

CSCI/CMPE 4341 Topic: Programming in Python Chapter 9: Python XML Processing. Xiang Lian The University of Texas – Pan American Edinburg, TX 78539 lianx@utpa.edu. Objectives. In this chapter, you will: Understand XML Become familiar with the types of markup languages created with XML

petec
Download Presentation

CSCI/CMPE 4341 Topic: Programming in Python Chapter 9: Python XML Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSCI/CMPE 4341 Topic: Programming in PythonChapter 9: Python XML Processing Xiang Lian The University of Texas – Pan American Edinburg, TX 78539 lianx@utpa.edu

  2. Objectives • In this chapter, you will: • Understand XML • Become familiar with the types of markup languages created with XML • Learn to create XML markup programmatically • Use the Document Object Model (DOM) to manipulate XML documents • Explore ElementTree package to retrieve data from XML documents

  3. Introduction • XML developed by World Wide Consortium’s (W3C’s) XML Working Group (1996) • XML portable, widely supported, open technology for describing data • XML quickly becoming standard for data exchange between applications

  4. XML Documents • XML documents end with .xml extension • XML marks up data using tags, which are names enclosed in angle brackets • <tag> elements </tag> • Elements: individual units of markup (i.e., everything included between a start tag and its corresponding end tag) • Nested elements form hierarchies • Root element contains all other document elements

  5. XML comments delimited by <!– and --> Optional XML declaration includes version information parameter Root element contains all other document elements End tag has format </start tag name> <?xml version = "1.0"?> <!-- Fig. 15.1: article.xml --> <!-- Article structured with XML. --> <article> <title>Simple XML</title> <date>December 21, 2001</date> <author> <firstName>John</firstName> <lastName>Doe</lastName> </author> <summary>XML is pretty easy.</summary> <content>In this chapter, we present a wide variety of examples that use XML. </content> </article> article.xml

  6. XML Document • View XML documents • Any text editor • Internet Explorer, Notepad, Visual Studio, etc. Minus sign

  7. Root element letter Child element contact Attribute (name-value pair) Empty elements do not contain character data 1 <?xml version = "1.0"?> 2 3 <!-- Fig. 15.3: letter.xml --> 4 <!-- Business letter formatted with XML. --> 5 6<letter> 7 <contact type = "from"> 8 <name>Jane Doe</name> 9 <address1>Box 12345</address1> 10 <address2>15 Any Ave.</address2> 11 <city>Othertown</city> 12 <state>Otherstate</state> 13 <zip>67890</zip> 14 <phone>555-4321</phone> 15 <flag gender = "F" /> 16 </contact> 17 18 <contact type = "to"> 19 <name>John Doe</name> 20 <address1>123 Main St.</address1> 21 <address2></address2> 22 <city>Anytown</city> 23 <state>Anystate</state> 24 <zip>12345</zip> 25 <phone>555-1234</phone> 26 <flag gender = "M" /> 27 </contact> 28 29 <salutation>Dear Sir:</salutation> 30 letter.xml

  8. 31 <paragraph>It is our privilege to inform you about our new 32 database managed with <technology>XML</technology>. This 33 new system allows you to reduce the load on 34 your inventory list server by having the client machine 35 perform the work of sorting and filtering the data. 36 </paragraph> 37 38 <paragraph>Please visit our Web site for availability 39 and pricing. 40 </paragraph> 41 42 <closing>Sincerely</closing> 43 44 <signature>Ms. Doe</signature> 45 </letter> letter.xml

  9. XML Namespaces • Provided for unique identification of XML elements • Namespace prefixes identify namespace to which an element belongs <Xiang:CSCI/CMPE4341> Topic: Programming in Python </Xiang:CSCI/CMPE4341>

  10. Attribute xmlns creates namespace prefix Namespace prefix bound to an URI Uses prefix text to describe element file 1 <?xml version = "1.0"?> 2 3 <!-- Fig. 15.4: namespace.xml --> 4 <!-- Demonstrating namespaces. --> 5 6<text:directory xmlns:text = "http://www.deitel.com/ns/python1e" 7 xmlns:image = "http://www.deitel.com/images/ns/120101"> 8 9 <text:file filename = "book.xml"> 10 <text:description>A book list</text:description> 11 </text:file> 12 13 <image:file filename = "funny.jpg"> 14 <image:description>A funny picture</image:description> 15 <image:size width = "200" height = "100" /> 16 </image:file> 17 18 </text:directory> namespace.xml

  11. Creates default namespace by binding URI to attribute xmlns without prefix Element without prefix defined in default namespace 1 <?xml version = "1.0"?> 2 3 <!-- Fig. 15.5: defaultnamespace.xml --> 4 <!-- Using default namespaces. --> 5 6<directory xmlns = "http://www.deitel.com/ns/python1e" 7 xmlns:image = "http://www.deitel.com/images/ns/120101"> 8 9<file filename = "book.xml"> 10 <description>A book list</description> 11 </file> 12 13 <image:file filename = "funny.jpg"> 14 <image:description>A funny picture</image:description> 15 <image:size width = "200"height = "100" /> 16 </image:file> 17 18 </directory> defaultnamespace.xml

  12. Document Object Model (DOM) • DOM parserretrieves data from XML document • Hierarchical tree structure called a DOM tree • Each component of an XML document represented as a tree node • Parent nodes contain child nodes • Sibling nodes have same parent • Single root (or document) node contains all other document nodes

  13. article contents title author date firstName summary lastName Example of Document Object Model (DOM)

  14. Processing XML in Python • Python packages for XML support • 4DOM and xml.sax • Generating XML dynamically similar to generating HTML • Python scripts can use print statements or XSLT to output XML

  15. O'Black, John Green, Sue Red, Bob Blue, Mary White, Mike Brown, Jane Gray, Bill names.txt Fig. 16.1 Text file names.txt used in Fig. 16.2.

  16. Print XML declaration Open text file if it exists Print root element List of special characters and their entity references Replace special characters with entity references #!c:\Python\python.exe # Fig. 16.2: fig16_02.py # Marking up a text file's data as XML. import sys # write XML declaration and processing instruction print ("""<?xml version = "1.0"?>""") # open data file try: file = open( "names.txt", "r" ) except IOError: sys.exit( "Error opening file" ) print ("<contacts>") # write root element # list of tuples: ( special character, entity reference ) replaceList = [ ( "&", "&amp;" ), ( "<", "&lt;" ), ( ">", "&gt;" ), ( '"', "&quot;" ), ( "'", "&apos;" ) ] # replace special characters with entity references for currentLine in file.readlines(): for oldValue, newValue in replaceList: currentLine = currentLine.replace( oldValue, newValue ) fig16_02.py

  17. Extract first and last name Remove carriage return Print contact element Print root’s closing tag # extract lastname and firstname last, first = currentLine.split( ", " ) first = first.strip() # remove carriage return # write contact element print (""" <contact> <LastName>%s</LastName> <FirstName>%s</FirstName> </contact>""" % ( last, first )) file.close() print ("</contacts>") fig16_02.py

  18. XML Processing Packages • Third-party package 4DOM, included with package PyXML, complies with W3C’s DOM Recommendation • xml.sax, included with Python, contains classes and functions for SAX-based parsing • 4XSLT, located in package 4Suite, contains an XSLT processor for transforming XML documents into other text-based formats • importxml.etree.ElementTree

  19. XML document used by fig16_04.py <?xml version = "1.0"?> <!-- Fig. 16.5: article2.xml --> <!-- Article formatted with XML --> <article> <title>Simple XML</title> <date>December 19, 2001</date> <author> <firstName>Jane</firstName> <lastName>Doe</lastName> </author> <summary>XML is easy.</summary> <content>Once you have mastered XHTML, XML is learned easily. Remember that XML is not for displaying information but for managing information. </content> </article> article2.xml

  20. # Fig. 16.4: fig16_04.py # Using 4DOM to traverse an XML Document. import sys import xml.etree.ElementTree as etree # open XML file try: tree = etree.parse("article2.xml") except IOError: sys.exit( "Error opening file" ) # get root element rootElement = tree.getroot() print ("Here is the root element of the document: %s" % \ rootElement.tag) # traverse all child nodes of root element print ("The following are its child elements:" ) for node in rootElement: print (node) fig16_04.py

  21. # get first child node of root element child = rootElement[0] print ("\nThe first child of root element is:", child.tag) print ("whose next sibling is:" ) # get next sibling of first child sibling = rootElement[1] print (sibling.tag) print ('Value of "%s" is:' % sibling.tag, end="") print (sibling.text) fig16_04.py

More Related