Learn about XML, DTD, XSchema, XPath, XInclude, XSLT, XLink. Understand W3C recommendations and elements in XML documents.
XML Language Family: Detailed Examples • Most information contained in these slides comes from: http://www.w3.org/ • These slides are intended to be used as a tutorial on XML and related technologies • Slide author: Jürgen Mangler etm@wkv.at • This section contains examples on: • XML, DTD (Document Type Definition) • XSchema • XPath, XPointer • XInclude • XSLT • XLink
The W3C is the "World Wide Web Consortium", a voluntary association of companies and non-profit organizations. Membership is very expensive but confers voting rights. The decisions of the W3C are guided by the Advisory Committee, led by Tim Berners-Lee. The XML recommendation was written by the W3C's XML Working Group (WG), which has since divided into a number of subgroups. • The stages in the life of a W3C Recommendation (REC) • Working Draft (maximum gap target: 3 months) • Last Call (public comment invited; W3C must respond) • Candidate Recommendation (design is stable; implementation feedback invited) • Proposed Recommendation (Advisory Committee review)
An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it. The document type declaration, which references or contains the document type definition (DTD), must appear before the first element in the document. The name following the word DOCTYPE in the document type declaration must match the name of the root element. tutorial.dtd: <!ELEMENT tutorial (#PCDATA)> tutorial.xml: <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE tutorial SYSTEM "tutorial.dtd"><tutorial>This is an XML document</tutorial>
tutorial.dtd: <!ELEMENT XXX (AAA , BBB)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)> An element type has element content if elements of that type contain only child elements (no character data), optionally separated by white space. tutorial.xml: <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <AAA>Start</AAA> <BBB>End</BBB></XXX> tutorial.xml (with errors, BBB missing): <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <AAA>Start</AAA></XXX>
The root element XXX can contain zero or more elements AAA followed by precisely one element BBB. Element BBB must always be present. If an element name in the DTD is followed by the star [*], this element can occur zero, one, or several times. tutorial.dtd: <!ELEMENT XXX (AAA* , BBB)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)> tutorial.xml: <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <AAA>Start</AAA> <AAA>Again</AAA> <BBB>End</BBB></XXX>
tutorial.dtd: <!ELEMENT XXX (AAA+ , BBB)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)> If an element name in the DTD is followed by the plus [+], this element can occur once or several times. tutorial.xml: <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <AAA>Start</AAA> <AAA>Again</AAA> <BBB>End</BBB></XXX> tutorial.xml (with errors, AAA must occur at least once): <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <BBB>End</BBB></XXX>
tutorial.dtd: <!ELEMENT XXX (AAA? , BBB)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)> If an element name in the DTD is followed by the question mark [?], this element can occur at most once (zero times or once). tutorial.xml: <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE XXX SYSTEM "tutorial.dtd"><XXX> <BBB>End</BBB></XXX> This example uses a combination of [ + * ?] <!ELEMENT XXX (AAA? , BBB+)><!ELEMENT AAA (CCC? , DDD*)><!ELEMENT BBB (CCC , DDD)><!ELEMENT CCC (#PCDATA)><!ELEMENT DDD (#PCDATA)> What could a valid document look like?
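One possible answer to the question on the combined [ + * ? ] DTD is sketched below. It assumes the five declarations are saved as tutorial.dtd (the slide does not name the file), and the element text is invented for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE XXX SYSTEM "tutorial.dtd">
<XXX>
  <!-- AAA? : optional, at most one AAA -->
  <AAA>
    <CCC>optional</CCC>   <!-- CCC? : zero or one -->
    <DDD>first</DDD>      <!-- DDD* : zero or more -->
    <DDD>second</DDD>
  </AAA>
  <!-- BBB+ : at least one BBB; each holds exactly one CCC, then one DDD -->
  <BBB>
    <CCC>one</CCC>
    <DDD>two</DDD>
  </BBB>
  <BBB>
    <CCC>three</CCC>
    <DDD>four</DDD>
  </BBB>
</XXX>
```

The shortest valid document under the same DTD omits AAA entirely and contains a single BBB with one CCC and one DDD.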
The root element XXX must contain either one element AAA or one element BBB: With the character [ | ] you can select one from several elements. test.dtd: <!ELEMENT XXX (AAA | BBB)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)> test.xml: <!DOCTYPE XXX SYSTEM "test.dtd"><XXX> <BBB>Valid</BBB></XXX> test.xml: <!DOCTYPE XXX SYSTEM "test.dtd"><XXX> <AAA>Also Valid</AAA></XXX> Text can be interspersed with elements. <!ELEMENT XXX (AAA+ , BBB+)><!ELEMENT AAA (BBB | CCC )><!ELEMENT BBB (#PCDATA | CCC )*><!ELEMENT CCC (#PCDATA)>
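A document for the mixed-content DTD at the end of this slide might look like the sketch below. It assumes those four declarations are stored as test.dtd (the slide does not say so), and the text content is made up.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE XXX SYSTEM "test.dtd">
<XXX>
  <!-- AAA+ : each AAA holds exactly one BBB or one CCC -->
  <AAA><CCC>a single CCC</CCC></AAA>
  <AAA><BBB>a single BBB</BBB></AAA>
  <!-- BBB+ : BBB is mixed content, text and CCC in any order and number -->
  <BBB>Text before, <CCC>an element in between</CCC>, and text after.</BBB>
  <BBB/>
</XXX>
```

The last BBB is empty, which (#PCDATA | CCC)* permits because the star also allows zero occurrences.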
Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags. The declaration starts with !ATTLIST followed by the name of the element (myElement) to which the attributes belong, followed by the definitions of the individual attributes (myAttributeA, myAttributeB). <!ELEMENT myElement (#PCDATA)><!ATTLIST myElement myAttributeA CDATA #REQUIRED myAttributeB CDATA #IMPLIED> <!DOCTYPE myElement SYSTEM "tutorial.dtd"> <myElement myAttributeA="kasperl" myAttributeB="petzi42"> Text </myElement>
An attribute of type CDATA may contain arbitrary character data, provided it conforms to the well-formedness constraints. Type NMTOKEN can contain only letters, digits, period [ . ], hyphen [ - ], underscore [ _ ] and colon [ : ]. NMTOKENS can contain the same characters as NMTOKEN plus white space, which separates the individual tokens. White space consists of one or more space characters, carriage returns, line feeds, or tabs. <!ELEMENT taskgroup (#PCDATA)><!ATTLIST taskgroup group CDATA #IMPLIED purpose NMTOKEN #REQUIRED names NMTOKENS #REQUIRED> <!DOCTYPE taskgroup SYSTEM "tutorial.dtd"><taskgroup group="#RT1" purpose="realisation::T1" names="Joe Max Eddie"/>
The value of an attribute of type ID must be an XML name: it may contain only characters permitted for NMTOKEN and must start with a letter, an underscore or a colon. No element type may have more than one ID attribute specified. The value of an ID attribute must be unique among all values of all ID attributes in the document. <!ELEMENT XXX (AAA+ , BBB+)> <!ELEMENT AAA (#PCDATA)> <!ELEMENT BBB (#PCDATA)> <!ATTLIST AAA id ID #REQUIRED> <!ATTLIST BBB code ID #IMPLIED list NMTOKEN #IMPLIED> <XXX> <AAA id="a1"/> <AAA id="a2"/> <AAA id="a3"/> <BBB code="QWQ-123-14-6" list="14:5"/></XXX>
The value of an attribute of type IDREF has to match the value of some ID attribute in the document. The value of an attribute of type IDREFS can contain several such references, separated by white space. <!ELEMENT XXX (AAA+ , CCC+)><!ELEMENT AAA (#PCDATA)><!ELEMENT CCC (#PCDATA)><!ATTLIST AAA mark ID #REQUIRED><!ATTLIST CCC ref IDREFS #REQUIRED> <XXX> <AAA mark="a1"/> <AAA mark="a2"/> <AAA mark="a3"/> <CCC ref="a3" /> <CCC ref="a1 a2" /></XXX>
<!ELEMENT XXX (AAA+, BBB+)> <!ELEMENT AAA (#PCDATA)> <!ELEMENT BBB (#PCDATA)> <!ATTLIST AAA true ( yes | no ) #REQUIRED> <!ATTLIST BBB month (1|2|3|4|5|6|7|8|9|10|11|12) #IMPLIED> Permitted attribute values can be enumerated in the DTD. Instead of #REQUIRED or #IMPLIED, a default value can be declared; it applies when the attribute is not specified. <!ELEMENT XXX (AAA+, BBB+)><!ELEMENT AAA (#PCDATA)><!ELEMENT BBB (#PCDATA)><!ATTLIST AAA true ( yes | no ) "yes"><!ATTLIST BBB month NMTOKEN "1"> #REQUIRED: the attribute must be set • #IMPLIED: the attribute may be set; no default value is provided
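To illustrate the default values, the sketch below shows a document for the second DTD on this slide. It assumes that DTD is saved as tutorial.dtd (not stated on the slide); the element text is invented. A parser that does not read the external DTD may not supply the defaults.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE XXX SYSTEM "tutorial.dtd">
<XXX>
  <!-- "true" is not specified, so the declared default "yes" applies -->
  <AAA>first</AAA>
  <!-- an explicit value overrides the default -->
  <AAA true="no">second</AAA>
  <!-- "month" is not specified, so the declared default "1" applies -->
  <BBB>third</BBB>
  <BBB month="7">fourth</BBB>
</XXX>
```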
<!ELEMENT XXX (AAA+)><!ELEMENT AAA EMPTY><!ATTLIST AAA true ( yes | no ) "yes"> An element can be declared as EMPTY. In that case it may carry attributes but must not have any content. <XXX> <AAA true="yes"/> <AAA true="no"></AAA></XXX> <XXX> <AAA true="yes"/> <AAA true="no"></AAA> <AAA> </AAA> <AAA>Hello!</AAA></XXX> Are there errors in this example? Where are they?
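The second document on this slide can be annotated as follows. This is a sketch of the expected answer, assuming the DTD is saved as tutorial.dtd (the slide does not name the file). The document is well-formed, but a validating parser should reject the last two AAA elements.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE XXX SYSTEM "tutorial.dtd">
<XXX>
  <AAA true="yes"/>        <!-- valid: empty-element tag -->
  <AAA true="no"></AAA>    <!-- valid: start-tag directly followed by end-tag -->
  <AAA> </AAA>             <!-- invalid: white space counts as content -->
  <AAA>Hello!</AAA>        <!-- invalid: character data in an EMPTY element -->
</XXX>
```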
The purpose of XML Schema is to provide a standard mechanism to describe and check the datatype of the content of an element. XML examples: <myElement type="integer">12</myElement> correct <myElement type="integer">eT</myElement> also correct. A plain XML parser cannot check the content of an element against a datatype. This is where XML Schema comes in: <name xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> Jürgen Mangler </name>
If we use the attribute "noNamespaceSchemaLocation", we tell the schema processor where to find the schema for elements that are in no namespace (the null namespace). Valid document: <name xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> Jürgen Mangler </name> correct_0.xsd: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="name" type="xsd:string"/> </xsd:schema>
If we want the root element to be named "AAA", from null namespace, containing text and an element "BBB", we will need to set the attribute "mixed" to "true" - to allow mixed content. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="AAA"> <xsd:complexType mixed="true"> <xsd:sequence minOccurs="1"> <xsd:element name="BBB" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> xxx yyy <BBB>ZZZ</BBB>aaa </AAA>
We want the root element to be named "AAA", from null namespace, containing one "BBB" and one "CCC" element. Their order is not important. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="AAA"> <xsd:complexType mixed="false"> <xsd:all minOccurs="1" maxOccurs="1"> <xsd:element name="BBB" type="xsd:string"/> <xsd:element name="CCC" type="xsd:string"/> </xsd:all> </xsd:complexType> </xsd:element> </xsd:schema> <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <CCC/> <BBB/> </AAA>
We want the root element to be named "AAA", from null namespace, containing a mixture of any number (even zero), of "BBB" and "CCC" elements. We need to use the 'trick' below - we use a "sequence" element with "minOccurs" attribute set to 0 and "maxOccurs" set to "unbounded". The attribute "minOccurs" of the "element" elements has to be 0 too. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="AAA"> <xsd:complexType mixed="false"> <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:element name="BBB" type="xsd:string" minOccurs="0"/> <xsd:element name="CCC" type="xsd:string" minOccurs="0"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> Give a valid document!
A valid document for the schema on the previous slide (any number of "BBB" and "CCC" elements, mixed in any order): <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <BBB>111</BBB> <CCC>YYY</CCC> <BBB>222</BBB> <BBB>333</BBB> <CCC>ZZZ</CCC> </AAA>
We want the root element to be named "AAA", from null namespace, containing either one "BBB" or one "CCC" element (but not both). For this we use the "choice" element. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="AAA"> <xsd:complexType mixed="false"> <xsd:choice minOccurs="1" maxOccurs="1"> <xsd:element name="BBB" type="xsd:string"/> <xsd:element name="CCC" type="xsd:string"/> </xsd:choice> </xsd:complexType> </xsd:element> </xsd:schema> <AAA xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <CCC>aaa</CCC></AAA> Other valid solutions?
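The other kind of valid document for the "choice" schema uses "BBB" instead of "CCC". A minimal sketch, with the text content made up:

```xml
<?xml version="1.0" encoding="utf-8"?>
<AAA xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns=""
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- exactly one of the two alternatives may appear -->
  <BBB>bbb</BBB>
</AAA>
```

A document with both "BBB" and "CCC", with two "BBB" elements, or with an empty "AAA" would not be valid, because the choice must occur exactly once.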
In XML Schema, the datatype is referenced by the QName. The namespace must be mapped to the prefix. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="root" type="xsd:integer"/> </xsd:schema> <root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 25 </root>
Restricting a simpleType is relatively easy. Here we require the value of the element "root" to be an integer less than 25. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="root"> <xsd:simpleType> <xsd:restriction base="xsd:integer"> <xsd:maxExclusive value="25"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:schema> <root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 25 </root> Valid? Use <xsd:minInclusive value="0"/> to force the value to be >= 0. You can also combine min/max in <xsd:restriction>!
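Combining both facets in one restriction could look like the sketch below. It reuses the facet values from this slide (0 and 25); nothing else is assumed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="root">
    <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
        <!-- lower bound: 0 itself is allowed -->
        <xsd:minInclusive value="0"/>
        <!-- upper bound: 25 itself is NOT allowed -->
        <xsd:maxExclusive value="25"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

With this schema the values 0 through 24 are accepted. The document shown on the slide, with the content 25, is rejected by either version of the schema, because maxExclusive excludes the bound itself.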
If we want the element "root" to be either a string "N/A" or a string "#REF!", we will use <xsd:enumeration>. Because xsd:string preserves white space, the content must match an enumerated value exactly. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="root"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="N/A"/> <xsd:enumeration value="#REF!"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:schema> <root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">N/A</root> Other solutions?
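The only other document that satisfies this enumeration uses the second listed value. A short sketch:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- the content is written without surrounding white space,
     since the base type xsd:string does not trim it -->
<root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns=""
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">#REF!</root>
```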
If we want the element "root" to be either an integer or a string "N/A", we will make a union from an "integer" type and "string" type. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name="root"> <xsd:simpleType> <xsd:union> <xsd:simpleType> <xsd:restriction base="xsd:integer"/> </xsd:simpleType> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="N/A"/> </xsd:restriction> </xsd:simpleType> </xsd:union> </xsd:simpleType> </xsd:element> </xsd:schema>
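A document for the union schema might look like the sketch below. It assumes the schema is stored as correct_0.xsd, as on the other slides; this slide does not name the file.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- matches the second member type: the enumerated string "N/A" -->
<root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns=""
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">N/A</root>
<!-- Replacing the content with 42 or -7 would match the first member
     type (xsd:integer). Content such as n/a or 4.2 matches neither
     member and would be rejected. -->
```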
Below we define a group of common attributes, which will be reused. The root element is named "root", it must contain the "aaa" element, and this element must have attributes "x" and "y". <xsd:element name="root"> <xsd:complexType> <xsd:sequence> <xsd:element name="aaa" minOccurs="1" maxOccurs="1"> <xsd:complexType> <xsd:attributeGroup ref="myAttrs"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:attributeGroup name="myAttrs"> <xsd:attribute name="x" type="xsd:integer" use="required"/> <xsd:attribute name="y" type="xsd:integer" use="required"/> </xsd:attributeGroup> Give a valid document!
A valid document for the schema on the previous slide: the root element "root" contains one "aaa" element carrying the required attributes "x" and "y". <root xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <aaa x="1" y="2"/> </root>
The element "A" has to contain a string which is exactly three characters long. We define a custom string type named "myString" and require the element "A" to be of that type. Leading or trailing white space would count toward the length. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:element name="A" type="myString"/> <xsd:simpleType name="myString"> <xsd:restriction base="xsd:string"> <xsd:length value="3"/> </xsd:restriction> </xsd:simpleType> </xsd:schema> <A xsi:noNamespaceSchemaLocation="correct_0.xsd" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >abc</A>