XML
CIT 383: Administrative Scripting

Topics
• What is XML?
• XML Structure
• REXML

eXtensible Markup Language
Extensible descriptive markup language framework
• Began as a subset of Standard Generalized Markup Language (SGML).
• Designed to ensure that data remains available after the programs that originally created or read it become obsolete or unusable.

    <?xml version="1.0" encoding="UTF-8"?>
    <inventory>
      <book isbn="0976694042">
        <author>Chris Pine</author>
        <title>Learn to Program</title>
      </book>
    </inventory>

Descriptive vs Presentational
Presentational languages describe how documents should look.
• <b>text</b> turns on boldface for text.
• What if you want to change book titles from bold to italics? A search-and-replace won't work if items other than books are bold.
Descriptive languages focus on meaning.
• <title>xml and you</title>
• Stylesheets describe how to present logical items.
• Can also be used purely for data storage and interchange.
• Also known as logical or structural markup languages.

XML-based Languages
Ant, Atom, CML, MathML, MML, MusicXML, ODF, OPML, RDF, SAML, SOAP, SVG, VoiceXML, WML, XHTML, XUL

Evolution of XML
1986  SGML standard published as ISO 8879
1987  Unicode proposal published
1991  First volume of Unicode standard
1996  XML work started
1998  XML 1.0 released as a W3C standard
2001  XML Schema language
2004  XML 1.1 released (not widely used)
2007  Unicode 5.0 published

XML Tree Structure

    <todo>
      <title>Monday's List</title>
      <item>Study for midterm</item>
      <item priority="10">Scripting Class</item>
      <item>Bathe cat</item>
    </todo>
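
The tree shape of a document like this can be made visible by walking it recursively. The following is a minimal sketch using REXML, assuming the priority is written as an attribute on the item; the constant TODO_XML and the helper outline are names invented for this illustration.

```ruby
require 'rexml/document'

# The to-do list, with the priority expressed as an attribute (an assumption).
TODO_XML = <<~XML
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item priority="10">Scripting Class</item>
    <item>Bathe cat</item>
  </todo>
XML

# Return one line per element, indented two spaces per level of depth.
def outline(element, depth = 0)
  lines = ["#{'  ' * depth}#{element.name}"]
  element.elements.each { |child| lines.concat(outline(child, depth + 1)) }
  lines
end

doc = REXML::Document.new(TODO_XML)
puts outline(doc.root)
```

The root element todo appears at depth 0, and its title and item children appear one level below it.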

Elements and Attributes
An element consists of tags and contents.
• <title>Learn to Program</title>
• Begin and end tags are mandatory; an empty element may use the self-closing form: <isbn number="0976694042" />
Attributes
• number="0976694042"
• Elements may have zero or more attributes.
• Attribute values must always be quoted.
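
As a rough illustration of the difference between element content and attributes, the sketch below reads both from a small book record with REXML. The constant BOOK_XML and the helper describe_book are hypothetical names used only for this example.

```ruby
require 'rexml/document'

# A single book element with one attribute and one child element.
BOOK_XML = '<book isbn="0976694042"><title>Learn to Program</title></book>'

# Collect the element name, an attribute value, and a child's text content.
def describe_book(xml)
  book = REXML::Document.new(xml).root
  {
    element: book.name,                    # name of the root element
    isbn: book.attributes['isbn'],         # attribute value, without quotes
    title: book.elements['title'].text     # text inside the <title> child
  }
end

p describe_book(BOOK_XML)
```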

Text
The XML declaration specifies the character encoding.
    <?xml version="1.0" encoding="UTF-8"?>
Encodings
• Unicode: universal character set, with encodings such as UTF-8 and UTF-32
• ISO-8859: family of 8-bit encodings; 8859-1 covers Western European languages
Entities
• &#nnnn; encodes the specified Unicode character.
• &name; is a named character entity, such as
    &lt; is <
    &gt; is >
    &amp; is &
• Named entities for currency symbols, fractions, Greek letters, math symbols, etc. must be declared in a DTD, as XHTML does.
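
A parser decodes entity references when text is read and re-encodes special characters when text is written. The sketch below shows both directions with REXML; decoded_text and escaped_element are helper names made up for this example.

```ruby
require 'rexml/document'

# Parse a document and return the root's text with entities decoded.
def decoded_text(xml)
  REXML::Document.new(xml).root.text
end

# Build an element from a raw string and return its serialized form,
# in which special characters are written as entity references.
def escaped_element(name, raw_text)
  element = REXML::Element.new(name)
  element.text = raw_text
  element.to_s
end

puts decoded_text('<t>1 &lt; 2 &amp; 3 &#169;</t>')
puts escaped_element('note', 'AT&T <rocks>')
```

Here &#169; is the numeric character reference for the copyright sign, so no DTD is needed to use it.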

XML Syntax Rules
• There is one and only one root element.
• Every begin tag must be matched by an end tag.
• XML tags must be properly nested.
• XML tags are case sensitive.
• All attribute values must be quoted.
• Whitespace within element content is part of the text.
• Newlines are always stored as LF.
• HTML-style comments: <!-- comment -->
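
Some of these rules can be checked by handing a document to a parser and seeing whether it raises an error. The following is a sketch only: well_formed? is a hypothetical helper, and how strictly REXML enforces each individual rule may vary between versions.

```ruby
require 'rexml/document'

# Return true if REXML parses the string without a parse error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

samples = {
  'properly nested' => '<a><b></b></a>',
  'badly nested'    => '<a><b></a></b>',
  'case mismatch'   => '<a></A>'
}

samples.each do |label, xml|
  puts "#{label}: #{well_formed?(xml)}"
end
```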

Correctness
Well-formed
• Conforms to XML syntax rules.
• A conforming parser will not parse documents that are not well-formed.
Valid
• Conforms to the semantic rules defined in a
    • Document Type Definition (DTD), or
    • XML Schema.
• A validating parser rejects invalid documents.

XML Schema Languages
Document Type Definitions
• Inherited from SGML.
• Does not support all XML features.
XML Schema
• Most commonly used.
• Schemas are XML documents.
• Also known as WXS or XSD.
RELAX NG
• REgular LAnguage for XML Next Generation
• Has XML and non-XML forms.

Example XML Schema:

    <?xml version="1.0" encoding="utf-8" ?>
    <xs:schema elementFormDefault="qualified"
               xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Address">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Recipient" type="xs:string" />
            <xs:element name="House" type="xs:string" />
            <xs:element name="Street" type="xs:string" />
            <xs:element name="Town" type="xs:string" />
            <xs:element minOccurs="0" name="County" type="xs:string" />
            <xs:element name="PostCode" type="xs:string" />
            <xs:element name="Country">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="FR" />
                  <xs:enumeration value="DE" />
                  <xs:enumeration value="ES" />
                  <xs:enumeration value="UK" />
                  <xs:enumeration value="US" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Ruby XML Parsers
REXML: Ruby Electric XML
• Included with the standard Ruby distribution.
• Slow on large documents.
libxml-ruby
• Ruby bindings for the GNOME libxml2 XML toolkit.
• Very fast (30X as fast as REXML).
Hpricot
• Parses XML as well as HTML.
• Fast (3-4X as fast as REXML).
• Does not check for well-formedness or validity.

Types of Parsing
Tree Parsing (DOM-like)
• Good for small documents.
• Loads entire document into memory.
• Simple API.
Stream Parsing (SAX-like)
• Good for large documents.
• User defines callback methods and passes them to the API.
• Parser runs the callback methods on pattern match.

Tree Parsing
Loads entire XML doc into memory.

    require 'rexml/document'
    include REXML

    # Parse the whole file into an in-memory tree.
    input = File.new('data.xml')
    doc = Document.new(input)
    root = doc.root

Search document as a tree using XPath

    # Print the title attribute of every section under the ch root element.
    doc.elements.each("ch/section") do |e|
      puts e.attributes["title"]
    end
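
The same tree search can be tried without a data.xml file by parsing a string instead. This is a minimal sketch: the sample document in CHAPTER_XML and the helper section_titles are assumptions made for illustration, not part of the original slides.

```ruby
require 'rexml/document'

# Hypothetical sample data standing in for data.xml.
CHAPTER_XML = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="XML Structure"/>
    <section title="REXML"/>
  </ch>
XML

# Parse the whole document into a tree, then collect each section's title.
def section_titles(xml)
  doc = REXML::Document.new(xml)
  titles = []
  doc.elements.each('ch/section') { |e| titles << e.attributes['title'] }
  titles
end

puts section_titles(CHAPTER_XML)
```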

Stream Parsing
Load REXML.

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML

Define listener class.

    class MyListener
      include REXML::StreamListener

      # Called for each start tag with the tag name and its attributes.
      def tag_start(*args)
        puts "start: #{args.map { |x| x.inspect }.join(',')}"
      end
    end

Invoke parser

    listen = MyListener.new
    source = File.new('data.xml')
    Document.parse_stream(source, listen)
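
A listener usually does more than print: it accumulates whatever the script needs while the parser streams past. The sketch below collects section titles from a string rather than a file; the class name TitleCollector and the sample data in STREAM_XML are hypothetical.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample data; a real script would pass an open file instead.
STREAM_XML = '<ch><section title="Intro"/><section title="Syntax"/></ch>'

# Listener that remembers the title attribute of every section start tag.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
  end

  # name is the tag name; attrs maps attribute names to values.
  def tag_start(name, attrs)
    @titles << attrs['title'] if name == 'section'
  end
end

# Run the stream parser over the sample and return the collected titles.
def collect_titles(xml)
  listener = TitleCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles
end

puts collect_titles(STREAM_XML)
```

Because the document is never built as a tree, memory use stays roughly constant no matter how large the input is.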

XPath Searches
doc.search("p")
    Find all paragraph tags in document.
doc.search("/html/body//p")
    Find all paragraph tags within the body tag.
doc.search("//a[@src]")
    Find all anchor tags with a src attribute.
doc.search("//a[@src='google.com']")
    Find all a tags with a src attribute of google.com.
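
The same XPath expressions can be tried with REXML, which exposes them through REXML::XPath rather than a search method. This is a sketch under assumptions: the sample page in PAGE_XML and the helper count_matches are invented for the example.

```ruby
require 'rexml/document'

# Hypothetical well-formed page used only to exercise the expressions.
PAGE_XML = <<~XML
  <html>
    <body>
      <p>one</p>
      <div>
        <p>two</p>
        <a src="google.com">first</a>
        <a src="example.com">second</a>
        <a>no attribute</a>
      </div>
    </body>
  </html>
XML

# Count the nodes in the document that match an XPath expression.
def count_matches(doc, xpath)
  REXML::XPath.match(doc, xpath).size
end

doc = REXML::Document.new(PAGE_XML)
['//p', '/html/body//p', '//a[@src]', "//a[@src='google.com']"].each do |xpath|
  puts "#{xpath}: #{count_matches(doc, xpath)}"
end
```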