Applied Component-Based Software Engineering: XML Basics • CSE 668 / ECE 668 • Prof. Roger Crawfis
XML Quiz • What does XML stand for? • Is XML a language? • What is HTML? What is XHTML? What is HTTPS? • Is HTML a language?
XML Quiz • What does XML stand for? • eXtensible Markup Language • Is XML a language? • No! • What is HTML? What is XHTML? What is HTTPS? • XHTML is HTML written as well-formed XML • Is HTML a language? • Yes!
XML Motivation • Data interchange is critical in today’s networked world • Examples: • Banking: funds transfer • Order processing (especially inter-company orders) • Scientific data • Chemistry: ChemML, … • Genetics: BSML (Bio-Sequence Markup Language), … • Paper flow of information between organizations is being replaced by electronic flow of information • Each application area has its own set of standards for representing information • Plain text with line headers indicating the meaning of fields • XML has become the basis for all new generation data interchange formats
Semi-structured Data • Nodes = objects. • Labels on arcs (attributes, relationships). • Atomic values at leaf nodes (nodes with no arcs out). • Flexibility: no restriction on: • Labels out of a node. • Number of successors with a given label.
Example: Data Graph • [Figure: data graph. Arcs from the root are labeled beer, beer, and bar. Other arc labels: name, manf, prize, year, award, servedAt, addr. Leaf values: Bud, M’lob, A.B., 1995, Gold, Joe’s, Maple.] • Callouts: the beer object for Bud; the bar object for Joe’s Bar; notice a new kind of data.
XML Standardization • World Wide Web Consortium (W3C): http://www.w3.org • More resources at http://www.xml.com • Java-XML (and web services) info at http://java.sun.com/javaee/technologies • .NET-XML (via web services) info at http://www.microsoft.com/net/TechnicalResources
XML Uses • Example: Ajax. Low-volume browser-server communication in XML supports more interactive Web pages. • Example: Web services. SOAP uses XML to marshal and unmarshal data, and service descriptions are written in XML.
XML Uses • Example: Data exchange formats. (Applications must agree on a common meaning for tags.) Older data exchange formats have been redesigned as XML applications, e.g., HL7 in health informatics and FIX in the financial industry. Even proprietary formats like MS Word now have open XML versions. • Example: Software development configuration files, e.g., in W3C, Apache, Java EE, and .NET frameworks. (All this may be geek paradise, but it’s awfully verbose and the scarcity of visual editors is puzzling.)
Why People Like XML Can get data from all sorts of sources • Allows us to touch data we don’t own! • Can integrate various data sources as if they were databases (almost) • We can publish some of the data in our databases on the Web conveniently
Well-Formed and Valid XML • Well-formed XML allows you to invent your own tags. • Similar to labels in semi-structured data. • Valid XML conforms to either a: • DTD (Document Type Definition), a grammar for tags. • XSD (XML Schema Definition), a grammar for tags written in XML itself.
Well-Formed XML • A legal XML document – fully parsable by an XML parser • All open-tags have matching close-tags • Attributes (which are unordered) only appear once in an element • There’s a single root element
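The well-formedness rules above can be checked mechanically by handing the text to any XML parser. Below is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide (all samples are made up for illustration).
samples = {
    "matched tags, single root": "<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>",
    "missing close-tag": "<BARS><BAR><NAME>Joe's Bar</BAR></BARS>",
    "duplicate attribute": '<SELLS price="2.50" price="3.00"/>',
    "two root elements": "<BAR/><BAR/>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample should be accepted; the other three each break one of the rules listed on the slide.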
Well-Formed XML • Start the document with a declaration, surrounded by <?xml … ?>. • Normal declaration is: <?xml version="1.0" standalone="yes" ?> • standalone="yes" means the document does not rely on an external DTD. • Balance of document is a root tag surrounding nested tags.
Tags • Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO> . • Tags may be nested arbitrarily. • XML tags are case sensitive.
Example: Well-Formed XML
<?xml version="1.0" standalone="yes" ?>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>
Callouts: a NAME subobject (<NAME>Joe’s Bar</NAME>); a BEER subobject (each <BEER> … </BEER>).
XML and Semi-structured Data • Well-Formed XML with nested tags is exactly the same idea as trees of semi-structured data. • Graphs are possible through indirection.
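Example • The <BARS> XML document is a tree: [Figure: BARS at the root with BAR children. The first BAR has a NAME (Joe’s Bar) and two BEER children, each with a NAME and a PRICE (Bud, 2.50; Miller, 3.00); the remaining BARs are elided (. . .).]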
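The tree view of the BARS document can be reproduced by walking a parsed document. This is a small sketch, assuming Python's xml.etree.ElementTree; the helper name outline is hypothetical, and the sample contains only the one bar that the slides show in full.

```python
import xml.etree.ElementTree as ET

# Only the fully shown bar from the slides; the elided bars are left out.
DOC = """<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def outline(elem, depth=0):
    """Return one line per element, indented by depth, with any leaf text."""
    text = (elem.text or "").strip()
    label = elem.tag + (f": {text}" if text else "")
    lines = ["  " * depth + label]
    for child in elem:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Element names become interior nodes and the text values end up at the leaves, which is the same shape as the semi-structured data trees described earlier.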
XML as a Data Model • The XML data model, as used by XPath, has 7 types of nodes: • Document (root) • Element • Attribute • Processing instruction • Text (content) • Namespace • Comment • The full XML data model includes this, plus order info and a few other things
XML Anatomy
<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
Callouts: Processing instr. (the <?xml … ?> line), open-tag, element, attribute, close-tag.
A Visualization of XML Data • [Figure: tree of the dblp document. The root has a p-i node (?xml) and the dblp element; dblp contains the mastersthesis and article elements, each with mdate and key attributes plus child elements (author, title, year, school; editor, title, journal, volume, year, ee, ee) whose leaves are text nodes. Legend: root, p-i, element, attribute, text.]
Empty Elements • We can do all the work of an element in its attributes. • Like BEER in previous example. • Another example: SELLS elements could have attribute price rather than a value that is a price. • Example use: <SELLS theBeer="Bud" price="2.50"/> • Note the exception to the “matching tags” rule: an empty element closes itself with />.
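XML Namespaces • Namespaces allow us to specify a context for different tags • Two parts: • Binding of namespace to URI • Qualified names
<tag xmlns="http://www.fictitious.com/mypath" xmlns:myns="http://www.fictitious.com/mypath">
  <thistag>is in namespace myns (via the default namespace)</thistag>
  <myns:thistag>is the same</myns:thistag>
  <otherns:thistag>is a different tag</otherns:thistag>
</tag>
• An unprefixed tag belongs to a namespace only through a default xmlns binding; otherns must also be bound to a URI somewhere for the document to parse.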
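A namespace-aware parser replaces each prefix with the URI it is bound to, so two tags are "the same" exactly when their URI and local name agree. The sketch below assumes Python's xml.etree.ElementTree. The default xmlns binding and the URI bound to otherns are assumptions added so that the sample parses; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Assumption: the default namespace and `myns` share one URI, and `otherns`
# is bound to a second, made-up URI.
DOC = """<tag xmlns="http://www.fictitious.com/mypath"
     xmlns:myns="http://www.fictitious.com/mypath"
     xmlns:otherns="http://www.fictitious.com/otherpath">
  <thistag>default namespace</thistag>
  <myns:thistag>same namespace via prefix</myns:thistag>
  <otherns:thistag>different namespace</otherns:thistag>
</tag>"""

root = ET.fromstring(DOC)

# ElementTree reports each tag in "{uri}localname" form, with prefixes resolved.
names = [child.tag for child in root]
for name in names:
    print(name)

same_tag = names[0] == names[1]        # same URI, same local name
different_tag = names[0] != names[2]   # same local name, different URI
```

The prefix itself carries no meaning; only the URI it resolves to matters when names are compared.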
XML Attributes • An (opening) tag may contain attributes. These are typically used to describe the content of an element.
<entry>
  <word language="en"> cheese </word>
  <word language="fr"> fromage </word>
  <word language="ro"> branza </word>
  <meaning> A food made … </meaning>
</entry>
XML Attributes • Another common use for attributes is to express dimension or type.
<picture>
  <height dim="cm"> 2400 </height>
  <width dim="in"> 96 </width>
  <data encoding="gif" compression="zip"> M05-.+C$@02!G96YE<FEC ... </data>
</picture>
When to use attributes
<person ssno="123 45 6789">
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>
<person>
  <ssno>123 45 6789</ssno>
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>
• The choice between representing data as attributes or as elements is sometimes unclear; it is often a matter of taste.
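The two encodings carry the same information but are read through different calls. This is a small sketch, assuming Python's xml.etree.ElementTree; the elided parts of the slide's example are simply left out, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same person, once with ssno as an attribute and once as a child element.
as_attribute = ET.fromstring(
    '<person ssno="123 45 6789"><name>F. MacNiel</name></person>'
)
as_element = ET.fromstring(
    "<person><ssno>123 45 6789</ssno><name>F. MacNiel</name></person>"
)

# Attributes are looked up by name on the element itself ...
ssno_from_attribute = as_attribute.get("ssno")
# ... while child elements are found by tag and then read as text.
ssno_from_element = as_element.findtext("ssno")

print(ssno_from_attribute, ssno_from_element)
```

One practical difference: an attribute can appear at most once per element and holds only text, whereas a child element can repeat and can contain further structure.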
Defining the structure of an XML file • We can check if an XML file is well-formed • by looking at it, maybe • By loading it into a browser • If well-formed, it will be displayed • However, how can we check that the well-formed file contains the correct elements in the correct quantities? • We need to write a specification for the XML file
XML Needs Help It’s too unconstrained for many cases! • How will we know when we’re getting garbage? • How will we query? • How will we understand what we got? We also need: • Some idea of the structure • Presentation, in some cases – CSS, XSL • Some way of interpreting the tags
Defining the structure of an XML file • There are 2 main alternatives • Document Type Definitions • Original and simple • XML Schema • More versatile and complex • We will look at both • Concentrating on XML Schema • XML documents are not required to have an associated schema
Document Type Definition (DTD) • The type of an XML document can be specified using a DTD • DTD constrains structure of XML data • What elements can occur • What attributes can/must an element have • What sub-elements can/must occur inside each element, and how many times. • DTD does not constrain data types • All values represented as strings in XML • DTD syntax • <!ELEMENT element (subelements-specification) > • <!ATTLIST element (attributes) >
Example: An Address Book
<person ssn="4444">
  <name> Homer Simpson </name>
  <tel> 2543 </tel>
  <tel> 2544 </tel>
  <email> homer@math.springfield.edu </email>
</person>
Callouts: an attribute (ssn); exactly one name; up to 4 telnos; at least one email; one or more persons.
Example: The Address Book2
<person>
  <name> MacNiel, John </name>
  <greet> Dr. John MacNiel </greet>
  <addr> 1234 Huron Street </addr>
  <addr> Rome, OH 98765 </addr>
  <tel> (321) 786 2543 </tel>
  <fax> (321) 786 2543 </fax>
  <tel> (321) 786 2543 </tel>
  <email> jm@abc.com </email>
</person>
Callouts: Exactly one name • At most one greeting • As many address lines as needed (in order) • Mixed telephones and faxes • At least one
DTD - Specifying the Structure • In a DTD, we can specify the permitted content for each element, using regular expressions • For a person element, the regular expression is • name, title?, tel*, email+
What’s in a person Element? • This means • name = there must be a name element • title? = there is an optional title element (i.e., 0 or 1 title elements) • name, title? = the name element is followed by an optional title element • tel* = there are 0 or more tel elements • email+ = there are 1 or more email elements
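The content model name, title?, tel*, email+ behaves like an ordinary regular expression over the sequence of child tags. The sketch below makes that concrete, assuming Python's re and xml.etree.ElementTree. It is not a DTD validator: the name PERSON_MODEL, the helper matches_person_model, and the trick of joining tag names with trailing spaces are illustrative choices.

```python
import re
import xml.etree.ElementTree as ET

# Content model  name, title?, tel*, email+  rewritten as a regex over
# child tag names, each followed by a single space.
PERSON_MODEL = re.compile(r"name (title )?(tel )*(email )+")


def matches_person_model(person_xml: str) -> bool:
    """Check the order and count of a person's children against the model."""
    person = ET.fromstring(person_xml)
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None


ok = matches_person_model(
    "<person><name>Homer Simpson</name><tel>2543</tel><tel>2544</tel>"
    "<email>homer@math.springfield.edu</email></person>"
)
no_email = matches_person_model(
    "<person><name>Homer Simpson</name><tel>2543</tel></person>"
)
wrong_order = matches_person_model(
    "<person><email>homer@math.springfield.edu</email>"
    "<name>Homer Simpson</name></person>"
)
print(ok, no_email, wrong_order)
```

The first person satisfies the model; the second has no email (email+ needs at least one), and the third puts email before name, which violates the sequence.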
DTD For the Address Book2
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, title?, tel*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ATTLIST person ssn CDATA #REQUIRED>
]>
• The content models in parentheses are regular expressions • PCDATA means parsed character data
Attributes in a DTD • XML elements can have attributes. • General syntax for DTD: <!ATTLIST element-name attribute-name1 type1 default-value1 … attribute-nameN typeN default-valueN> • Example: <!ATTLIST person ssn CDATA #REQUIRED> • CDATA means character data • Default value could be #REQUIRED or #IMPLIED (meaning optional)
Example: DTD
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
• A BARS object has zero or more BARs nested within. • A BAR has one NAME and one or more BEER subobjects. • A BEER has a NAME and a PRICE. • NAME and PRICE are text.
Use of DTDs • Set standalone="no". • Either: • Include the DTD as a preamble of the XML document, or • Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.
Use of DTDs
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>
Callouts: the DTD (the <!DOCTYPE … ]> block); the document (the <BARS> element).
Use of DTDs • Assume the BARS DTD is in file bar.dtd.
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>
Callout: the DOCTYPE line gets the DTD from the file bar.dtd.
Valid Documents • A document with a DTD is valid if it conforms to the DTD, i.e., • the document conforms to the regular-expression grammar, • types of attributes are correct, and • constraints on references are satisfied
DTDs Problems • DTDs are rather weak specifications by DB and programming-language standards • Some limitations: • Only one base type – PCDATA • Also no constraints, e.g., range of values, frequency of occurrence • Not easily parsed (since they are not XML) • Not easy to express that element a has exactly the children c, d, e in any order
DTDs Problems • Difficult to specify unordered sets of subelements • Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved) • (A | B)* allows specification of an unordered set, but • Cannot ensure that each of A and B occurs only once • Many other more complex problems.
XML Schema • DTDs are now being superseded by XML schemas. • They provide the following features • XML syntax • So can be parsed, validated with standard XML tools • Data types other than #PCDATA • There are built-in types such as integer, float, boolean, string and many others • Greater control over permitted constructs • Can specify maximum and minimum occurrences • Can use regular expressions to set patterns to be matched • Support for modularity and inheritance
Schema types • There are some basic built-in types such as xs:string, xs:decimal, xs:integer, xs:ID • Each element is composed of either simple types or complex types. A complex type is often a sequence of elements • The content of the type can be declared as shown in the following example. A type can also be declared, named and referred to. • Notice the use of minOccurs and maxOccurs. Their default is 1.
Simple Schema Example
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          (details of the person element are on the next slide)
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Callouts: standard stuff (the opening lines); namespace (xmlns:xs); top-level element (people).
Namespace declaration • So at the start of a document we must specify what namespaces we are using. • In the schema example, we are using the XML Schema namespace with the xs prefix • We declare this namespace in an attribute in the top-level element: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> • We then use the xs prefix in all the XML Schema elements, e.g., complexType, sequence, element, etc.
Schema Example Continued • Details of the person element
<xs:element name="person" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="tel" type="xs:string"/>
      <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="sssNo" type="xs:integer" use="required"/>
  </xs:complexType>
</xs:element>
• A person is a complex type which is a sequence of elements and an attribute • Callout: each <xs:element … /> line is an empty element.
Restrictions on elements • You can also restrict the data values • a range • <xs:minInclusive value="0"/> <xs:maxInclusive value="120"/> • an enumerated list • <xs:enumeration value="Audi"/> <xs:enumeration value="Golf"/> <xs:enumeration value="BMW"/> • a pattern • <xs:pattern value="([a-z])*"/> • Means 0 or more lowercase alphabetic chars
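To make the three kinds of restriction concrete, here is a sketch of what each facet accepts, written as plain Python checks. It only mimics the facets and is not an XML Schema validator; the function names and the set CARS are illustrative. Since a schema pattern must match the whole value, the sketch uses re.fullmatch.

```python
import re


def in_range(value: str, low: int = 0, high: int = 120) -> bool:
    """Mimic xs:minInclusive / xs:maxInclusive on an integer value."""
    try:
        number = int(value)
    except ValueError:
        return False
    return low <= number <= high


# Mimic the xs:enumeration list from the slide.
CARS = {"Audi", "Golf", "BMW"}


def in_enumeration(value: str) -> bool:
    return value in CARS


# Mimic xs:pattern value="([a-z])*": zero or more lowercase letters,
# matched against the entire value.
LOWERCASE = re.compile(r"([a-z])*")


def matches_pattern(value: str) -> bool:
    return LOWERCASE.fullmatch(value) is not None


print(in_range("42"), in_range("121"))
print(in_enumeration("Golf"), in_enumeration("Volvo"))
print(matches_pattern("abc"), matches_pattern(""), matches_pattern("Abc"))
```

Both bounds are inclusive, and the empty string satisfies the pattern because * allows zero repetitions.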
Declaring your own types • Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute • A named type is declared
<xs:simpleType name="ssstype">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
  </xs:restriction>
</xs:simpleType>
• And used as the attribute type
<xs:attribute name="sssNo" type="ssstype" use="required"/>