Learn how XML Schema simplifies data validation, defines data types, and checks that XML documents are valid, not merely well-formed. Discover the advantages over DTDs and the extensibility of XML Schema.
XML Schema Languages 2/2 Dongwon Lee, Ph.D. The Pennsylvania State University IST 516 / Fall 2011 http://www.practicingsafetechs.com/TechsV1/XMLSchemas/
XML Schema
• New XML schema language from W3C
• Successor of DTD
• Unlike DTD, XML Schema is in XML syntax
• http://www.w3.org/XML/Schema
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="items" type="Items"/>
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>
XML Schema vs. DTD: What’s New
• XML Schemas are extensible to future additions
  • XML Schema V 1.0, 1.1, …
• XML Schemas are richer and more powerful than DTDs
• XML Schemas are written in XML
  • No <!ELEMENT …> or <!ATTLIST …> notation
• XML Schemas support data types
• XML Schemas support namespaces
New: Data Types
• XML Schema supports data types, which makes it easier to:
  • Describe allowable document content
  • Validate the correctness of data
  • Work with data from a database
  • Define data facets (restrictions on data)
  • Define data patterns (data formats)
  • Convert data between different data types
• Eg, <date type="date">2001-09-11</date>
  • Ensures a mutual understanding of the content
  • The XML data type "date" requires the format "YYYY-MM-DD"
New: in XML Notation • XML Schema uses XML notation • <> and </> • XML Schema file itself IS an XML file, too • No need to learn a new language • No need to use new tools • Use an XML editor to edit XML Schema files • Use XML parser to parse XML Schema files • Manipulate an XML Schema using DOM • Transform an XML Schema with XSLT
New: Extensibility
• XML Schema is extensible because XML is extensible
• XML Schema lets you:
  • Reuse your schema in other schemas
  • Create your own data types derived from the standard types (inheritance)
  • Reference multiple schemas in the same document
Well-Formed: Not Enough
• Well-Formed: a document conforms to XML syntax rules such as:
  • Begin with an XML declaration (optional in XML 1.0, but recommended)
  • One unique root
  • Case-sensitive
  • Matching Start / End tags
  • Properly nested
• Well-formed documents can still contain semantic errors or inconsistencies
• Need VALID documents according to a schema
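As a small illustration of the gap between well-formed and valid, the document below obeys every syntax rule listed above, yet it would presumably be rejected by a schema that types age as xs:integer and dateborn as xs:date. The element names are borrowed from later slides; the person wrapper and the pairing with such a schema are assumptions made only for this sketch.

```xml
<?xml version="1.0"?>
<!-- Well-formed: one root, matching tags, proper nesting. -->
<person>
  <!-- Not an integer, so a schema typing age as xs:integer would reject it. -->
  <age>two</age>
  <!-- Not in YYYY-MM-DD form, so a schema typing dateborn as xs:date would reject it. -->
  <dateborn>March 27, 2009</dateborn>
</person>
```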
Main Features • XML Schema defines elements • Simple elements: • contains only “text” • No sub-elements or attributes • “text” can be of different types • Types from XML schema built-in • Eg, boolean, string, date • User-defined types • Can add restrictions (facets) to a data type to limit its content
Simple Element
• <xs:element name="xxx" type="yyy"/>
  • "xxx": the name of the element
  • "yyy": the data type of the element
• Common built-in types in XML Schema:
  • xs:string
  • xs:decimal
  • xs:integer
  • xs:boolean
  • xs:date
  • xs:time
• Namespace as in: xmlns:xs="http://www.w3.org/2001/XMLSchema"
Simple Element
• Some simple XML elements:
    <lastname>Lee</lastname>
    <age>2</age>
    <dateborn>2009-03-27</dateborn>
• Corresponding simple element definitions:
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="dateborn" type="xs:date"/>
Simple Element
• Simple elements may have a default value OR a fixed value specified
• Default value is automatically assigned to the element when no other value is specified
    <xs:element name="color" type="xs:string" default="red"/>
• Fixed value is also automatically assigned to the element, and one cannot specify another value
    <xs:element name="nationality" type="xs:string" fixed="USA"/>
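A sketch of how instance fragments might behave under the color and nationality declarations above. The samples wrapper is hypothetical and exists only to keep the fragment well-formed. The comments reflect the usual XML Schema rule that an element default or fixed value is applied when the element is present but empty.

```xml
<samples>
  <!-- Empty element: the validator treats the value as "red" (the default). -->
  <color/>
  <!-- An explicit value overrides the default. -->
  <color>blue</color>
  <!-- Empty element: the validator treats the value as "USA" (the fixed value). -->
  <nationality/>
  <!-- Allowed, because it matches the fixed value exactly. -->
  <nationality>USA</nationality>
  <!-- Any other content, such as Canada, would fail validation. -->
</samples>
```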
<xs:attribute> • The syntax for defining an attribute is: <xs:attribute name="xxx" type="yyy"/> • Where xxx is the name of the attribute and yyy specifies the data type of the attribute. • Simple elements cannot have attributes since they are SIMPLE
<xs:attribute>
• An XML element with an attribute:
    <lastname lang="EN">Smith</lastname>
• Corresponding attribute definition:
    <xs:attribute name="lang" type="xs:string"/>
• Attributes can have default or fixed values. If the attribute is required, add use="required"
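To make the three options concrete, here is a minimal sketch that attaches attributes to the lastname element. Because an element with attributes is no longer simple, the sketch wraps xs:string in a complex type with simple content. The script and id attributes are hypothetical names added for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="lastname">
    <xs:complexType>
      <xs:simpleContent>
        <!-- Text content stays xs:string; attributes are added on top. -->
        <xs:extension base="xs:string">
          <!-- Default: treated as "EN" when the attribute is omitted. -->
          <xs:attribute name="lang" type="xs:string" default="EN"/>
          <!-- Fixed: if present, the value must be exactly "Latin". -->
          <xs:attribute name="script" type="xs:string" fixed="Latin"/>
          <!-- Required: the attribute must appear on every lastname. -->
          <xs:attribute name="id" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that use="required" belongs on attribute declarations inside a type; it is not allowed on a top-level xs:attribute.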
Conforming to Types • When an XML element or attribute has a data type defined, it puts restrictions on the element's or attribute's content • If an XML element is of type "xs:date" and contains a string like "Hello World", the element will not validate • With XML Schemas, you can also add your own restrictions to your XML elements and attributes
Constraining User-Defined Types
• Defines an element called "age" with a restriction
• The value of age cannot be lower than 0 or greater than 120
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="120"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Constraining User-Defined Types
• Defines an element called "car" with a restriction
• The only acceptable values are: Audi, Golf, BMW:
    <xs:element name="car" type="carType"/>
    <xs:simpleType name="carType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="Audi"/>
        <xs:enumeration value="Golf"/>
        <xs:enumeration value="BMW"/>
      </xs:restriction>
    </xs:simpleType>
• Note: In this case the type "carType" can be used by other elements because it is not a part of the "car" element.
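Ranges and enumerations are two facets; the "data patterns" mentioned earlier are expressed with a third, xs:pattern. A minimal sketch, assuming a hypothetical zip element that must hold exactly five digits:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: a five-digit code such as 16802. -->
  <xs:element name="zip">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- The regular expression must match the whole value. -->
        <xs:pattern value="[0-9]{5}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```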
Complex Element • What is a Complex Element? • A complex element is an XML element that contains other elements and/or attributes • There are four kinds of complex elements: • Empty elements • Elements that contain only other elements • Elements that contain only text • Elements that contain both other elements and text • Note: Each of these elements may contain attributes as well!
Complex Element: Type 1 • A complex XML element, "product", which has an empty content model: <product pid="1345"/>
Complex Element: Type 2
• A complex XML element, "employee", which contains only other elements:
    <employee>
      <firstname>John</firstname>
      <lastname>Smith</lastname>
    </employee>
Complex Element: Type 3 • A complex XML element, "food", which contains only text: <food type="dessert">Ice cream</food>
Complex Element: Type 4
• A complex XML element, "description", which contains both elements and text:
    <description>
      It happened on <date lang="norwegian">03.03.99</date> ....
    </description>
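The slides that follow define the Type 2 case. As a rough sketch, the other three kinds could be declared along these lines. The chosen attribute types (xs:positiveInteger for pid, xs:string elsewhere) are assumptions, since the slides show only instance fragments.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Type 1: empty content, attribute only. -->
  <xs:element name="product">
    <xs:complexType>
      <xs:attribute name="pid" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>

  <!-- Type 3: text only, plus an attribute (simple content, extended). -->
  <xs:element name="food">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="type" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <!-- Type 4: mixed="true" lets text appear between child elements. -->
  <xs:element name="description">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="date">
          <xs:complexType>
            <xs:simpleContent>
              <!-- xs:string, because 03.03.99 is not a valid xs:date. -->
              <xs:extension base="xs:string">
                <xs:attribute name="lang" type="xs:string"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```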
Eg, Define a Complex Element
• Type 2: element with only sub-elements
    <employee>
      <firstname>John</firstname>
      <lastname>Smith</lastname>
    </employee>
Eg, Define a Complex Element
• Method 1: no re-use foreseen
    <xs:element name="employee">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstname" type="xs:string"/>
          <xs:element name="lastname" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
Eg, Define a Complex Element
• Method 2: can reuse "myInfo" type
• The element refers to the type by name; the named type is declared separately, at the top level of the schema
    <xs:element name="employee" type="myInfo"/>
    <xs:complexType name="myInfo">
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
Eg, Define a Complex Element
• Method 2: 3 elements can reuse "myInfo" type
    <xs:element name="employee" type="myInfo"/>
    <xs:element name="student" type="myInfo"/>
    <xs:element name="member" type="myInfo"/>
    <xs:complexType name="myInfo">
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
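A named type can also serve as the base of a new type, which is the inheritance mentioned under Extensibility. The sketch below extends myInfo; the derived type name myFullInfo and the address and city elements are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type, as defined on the slide above. -->
  <xs:complexType name="myInfo">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Derived type: everything in myInfo, followed by two more elements. -->
  <xs:complexType name="myFullInfo">
    <xs:complexContent>
      <xs:extension base="myInfo">
        <xs:sequence>
          <xs:element name="address" type="xs:string"/>
          <xs:element name="city" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="employee" type="myFullInfo"/>
</xs:schema>
```

With this schema, an employee would contain firstname, lastname, address, and city, in that order.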
Indicators
• Order
  • <xs:all>: child elements appear in any order, each zero times or once
  • <xs:choice>: either A or B occurs
  • <xs:sequence>: child elements appear in a specific order
• Occurrence
  • maxOccurs
  • minOccurs
• Group
  • group name
  • attributeGroup name
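A compact sketch that exercises choice, the occurrence indicators, and both group indicators in one schema. All names here (nameGroup, idAttrs, person, phone) are hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Group indicator: a reusable, named set of elements. -->
  <xs:group name="nameGroup">
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- attributeGroup indicator: a reusable, named set of attributes. -->
  <xs:attributeGroup name="idAttrs">
    <xs:attribute name="id" type="xs:string" use="required"/>
    <xs:attribute name="lang" type="xs:string"/>
  </xs:attributeGroup>

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="nameGroup"/>
        <!-- Order indicator: exactly one of employee or member. -->
        <xs:choice>
          <xs:element name="employee" type="xs:string"/>
          <xs:element name="member" type="xs:string"/>
        </xs:choice>
        <!-- Occurrence indicators: between zero and three phone elements. -->
        <xs:element name="phone" type="xs:string" minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
      <xs:attributeGroup ref="idAttrs"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```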
Eg, <xs:all>
• Variant 1: <firstname> and <lastname> can appear in ANY order but MUST appear ONCE
• Variant 2: <firstname> and <lastname> can appear in ANY order and can appear ZERO or ONCE
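The two xs:all variants described here might be written as follows. This is a sketch: the enclosing element names person and contact are assumptions, and the only difference between the two declarations is minOccurs="0".

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Any order, each child exactly once (minOccurs and maxOccurs default to 1). -->
  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- Any order, each child zero times or once. -->
  <xs:element name="contact">
    <xs:complexType>
      <xs:all>
        <xs:element name="firstname" type="xs:string" minOccurs="0"/>
        <xs:element name="lastname" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```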
<xs:sequence>: family.xml
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <persons xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="family.xsd">
      <person>
        <full_name>Hege Refsnes</full_name>
        <child_name>Cecilie</child_name>
      </person>
      <person>
        <full_name>Tove Refsnes</full_name>
        <child_name>Hege</child_name>
        <child_name>Stale</child_name>
        <child_name>Jim</child_name>
        <child_name>Borge</child_name>
      </person>
      <person>
        <full_name>Stale Refsnes</full_name>
      </person>
    </persons>
<xs:sequence>: family.xsd
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified">
      <xs:element name="persons">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="person" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="full_name" type="xs:string"/>
                  <xs:element name="child_name" type="xs:string"
                              minOccurs="0" maxOccurs="5"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
DTD vs. XML Schema
• DTD:
    <!ELEMENT e1 ((e2,e3?)+|e4)>
• XML Schema:
    <element name="e1">
      <complexType>
        <choice>
          <sequence maxOccurs="unbounded">
            <element ref="e2"/>
            <element ref="e3" minOccurs="0"/>
          </sequence>
          <element ref="e4"/>
        </choice>
      </complexType>
    </element>
note.dtd
    <!ELEMENT note (to, from, heading, body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
note.xsd
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
<schema> element
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      . . .
    </xs:schema>
• <schema> element is the root element of every XML Schema
<schema> element: xmlns:xs
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      . . .
    </xs:schema>
• Some built-in elements & data types in this schema file come from the http://www.w3.org/2001/XMLSchema namespace, defined by W3C folks
• They are to be prefixed with "xs:"
  • Eg, <xs:schema>
<schema> element: targetNamespace
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      . . .
    </xs:schema>
• targetNamespace indicates that the elements being defined by this schema (eg, note, to, from, heading, body) are BOUND to this SYMBOLIC namespace
  • http://pike.psu.edu
• Such a URL need not correspond to an actual, resolvable URL
<schema> element: xmlns
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      . . .
    </xs:schema>
• xmlns (with no prefix) sets the default namespace
• Unqualified names (ie, w/o prefix) are assumed to come from this default namespace: http://pike.psu.edu
<schema> element: elementFormDefault
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://pike.psu.edu"
               xmlns="http://pike.psu.edu"
               elementFormDefault="qualified">
      . . .
    </xs:schema>
• By default, locally-declared elements do not need to be qualified
• To change this: elementFormDefault="qualified"
• Now, even locally-declared elements must be namespace-qualified in the XML file (with a prefix, or through a default namespace)
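To see what the setting changes, here is a sketch of an instance as it would have to look if note.xsd declared elementFormDefault="unqualified" (the default) instead. Only the globally declared root element is in the target namespace; the locally declared children belong to no namespace and so carry no prefix. The prefix n is an arbitrary choice for this illustration.

```xml
<?xml version="1.0"?>
<!-- Assumes a variant of note.xsd with elementFormDefault="unqualified". -->
<n:note xmlns:n="http://pike.psu.edu">
  <!-- Local elements: no namespace, so no prefix and no default namespace. -->
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</n:note>
```

With elementFormDefault="qualified", as in the slides, every element is in http://pike.psu.edu, which is why the note.xml shown next can simply declare that URI as its default namespace.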
note.xml with Reference to XML Schema
    <?xml version="1.0"?>
    <note xmlns="http://pike.psu.edu"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://pike.psu.edu note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
note.xml with Reference to XML Schema: xmlns
    <?xml version="1.0"?>
    <note xmlns="http://pike.psu.edu"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://pike.psu.edu note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• xmlns="http://pike.psu.edu" sets the default namespace for the "note.xml" file
• Tells the schema validator that all the elements used in the "note.xml" file are declared in this namespace
note.xml with Reference to XML Schema: xmlns:xsi
    <?xml version="1.0"?>
    <note xmlns="http://pike.psu.edu"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://pike.psu.edu note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• xmlns:xsi makes the XML Schema Instance namespace available
• Once it is available, one can use the xsi:schemaLocation attribute on the next line
note.xml with Reference to XML Schema: xsi:schemaLocation
    <?xml version="1.0"?>
    <note xmlns="http://pike.psu.edu"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://pike.psu.edu note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• schemaLocation needs two inputs, with a space between them as the delimiter
  • First value: the namespace to use
  • Second value: the location of the XML schema to use for that namespace. Eg,
    • Relative: note.xsd
    • Absolute: http://pike.psu.edu/foo/bar/note.xsd
note.xml with Reference to XML Schema (alternative)
    <?xml version="1.0"?>
    <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• noNamespaceSchemaLocation requires just one input: the location of the XML schema to use for elements that are in no namespace. Eg,
  • note.xsd
  • http://pike.psu.edu/foo/bar/note.xsd
• Because it applies to elements without a namespace, the XML file declares no default namespace and the schema declares no targetNamespace
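noNamespaceSchemaLocation is meant for schemas that declare no targetNamespace. A sketch of what such a variant of note.xsd might look like, assuming the same four child elements as before:

```xml
<?xml version="1.0"?>
<!-- No targetNamespace and no default xmlns: the declared elements belong to no namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance validated against this variant would use plain, unprefixed element names and would not declare a default namespace on its root.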
Eg: Multiple References in xsi:schemaLocation
• schemaLocation can take multiple PAIRS of inputs (namespace, location)
    <?xml version="1.0"?>
    <webster
        xmlns:A="http://www.webster.com/author"
        xmlns:B="http://www.webster.com/book"
        xmlns="http://pike.psu.edu"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation=
          "http://www.webster.com/author author.xsd
           http://www.webster.com/book book.xsd">
      <A:author><A:title>Associate Prof</A:title></A:author>
      <B:book><B:title>Gone with the wind</B:title></B:book>
    </webster>
Common Errors
• schemaLocation in an XML file requires TWO inputs and a delimiter (whitespace) in between
  • xsi:schemaLocation="http://pike.psu.edu note.xsd" …
• The namespace used in the XML file and the targetNamespace in the XSD file must be EXACTLY identical
  • The following minor discrepancy of the "/" at the end could trigger an error
  • In XML file: xmlns="http://pike.psu.edu/"
  • In XSD file: targetNamespace="http://pike.psu.edu"
Schema Validation http://www.w3.org/2001/03/webdata/xsv
Lab #1 (DUE: Sep. 18 11:55PM) • https://online.ist.psu.edu/ist516/labs • Tasks • Given XML files, infer DTD and XML Schema • Validate them using W3C’s schema validator • Accessible from the Web • Turn-In • DTD and XML Schema files • Screenshots showing validation succeeded