
XML & XML Schema

XML & XML Schema. Semantic Web - Fall 2005, Computer Engineering Department, Sharif University of Technology. Outline: Markup Languages; SGML, HTML, XML; XML Building Blocks; XML Applications; Namespaces; XML Schema. SGML (ISO 8879): Standard Generalized Markup Language.

Presentation Transcript


  1. XML & XML Schema Semantic Web - Fall 2005 Computer Engineering Department Sharif University of Technology

  2. Outline • Markup Languages • SGML, HTML, XML • XML Building Blocks • XML Applications • Namespaces • XML Schema Semantic web - Computer Engineering Dept. - Fall 2005

  3. SGML (ISO 8879) • Standard Generalized Markup Language • The international standard for describing the structure and content of text documents • Interchangeable: device-independent, system-independent • Tags are not predefined • Uses a DTD to validate the structure of the document • Large, powerful, and very complex • Heavily used in industry and commerce for over a decade

  4. HTML (RFC 1866) • HyperText Markup Language • A small SGML application used on the Web (a DTD and a set of processing conventions) • Uses only a predefined set of tags

  5. What is XML? • eXtensible Markup Language • A simplified version of SGML • Maintains the most useful parts of SGML • Designed so that SGML can be delivered over the Web • More flexible and adaptable than HTML • XHTML: a reformulation of HTML 4 in XML 1.0

  6. HTML vs. XML

  7. HTML vs. XML (2) • HTML is for humans • HTML describes web pages • You don’t want to see error messages about the web pages you visit • Browsers ignore and/or correct as many HTML errors as they can, so HTML is often sloppy • XML is for computers • XML describes data • The rules are strict and errors are not allowed • In this way, XML is like a programming language • Current versions of most browsers can display XML • However, browser support of XML is spotty at best

  8. XML-related technologies • DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and their attributes for particular purposes • XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another • SAX (Simple API for XML) is an event-based API for reading XML documents

  9. XML Building blocks - Elements • Delimited by angle brackets • Identify the nature of the content they surround • General format: <element> … </element> • Empty element: <empty-Element /> • XML elements have relationships • Elements are related as parents and children • Elements have content • Elements can have different content types: • element, mixed, simple, empty
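
The parent/child relationships and content types described on this slide can be inspected with Python's standard `xml.etree.ElementTree` module. This is only a sketch: the `library`, `book`, `title`, and `out-of-print` element names are invented for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are assumptions.
doc = "<library><book><title>XML Basics</title><out-of-print /></book></library>"

root = ET.fromstring(doc)
book = root[0]                # first child element of <library>
title, flag = list(book)      # the two child elements of <book>

print(root.tag)                        # name of the root element
print([child.tag for child in book])   # children of <book>, in document order
print(title.text)                      # simple (text-only) content
print(flag.text, len(flag))            # empty element: no text, no children
```

Here `book` has element content, `title` has simple content, and `out-of-print` is empty.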

  10. XML Building blocks - Attributes • Name-value pairs that occur inside start-tags after the element name, like: <element attribute="value" /> • Provide additional information about elements that often is not part of the data • Attributes and elements are somewhat interchangeable • Should I use an element or an attribute? • Example using just elements: • <name> <first>David</first> <last>Matuszek</last></name> • Example using attributes: • <name first="David" last="Matuszek"></name> • Rule of thumb: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements
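
To see how the two styles differ for a consumer, the sketch below reads both versions of the `name` example with Python's standard `xml.etree.ElementTree`. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')

# Child elements are reached by navigating the tree...
full_from_elements = (
    f"{as_elements.findtext('first')} {as_elements.findtext('last')}"
)
# ...while attributes are looked up by name on the element itself.
full_from_attributes = (
    f"{as_attributes.get('first')} {as_attributes.get('last')}"
)

print(full_from_elements)
print(full_from_attributes)
```

Both forms carry the same information; the element form can later grow nested structure, while the attribute form cannot.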

  11. XML Building blocks - Entities • Five special characters must be written as entities: • &amp; for & (almost always necessary) • &lt; for < (almost always necessary) • &gt; for > (not usually necessary) • &quot; for " (necessary inside double quotes) • &apos; for ' (necessary inside single quotes) • These entities can be used even in places where they are not absolutely required • These are the only predefined entities in XML
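
In practice these entities are rarely typed by hand. As a minimal sketch, Python's standard `xml.sax.saxutils` helpers can produce them, and a parser turns them back into the original characters. The `msg` element and `note` attribute are made-up names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'AT&T says "1 < 2"'

# escape() replaces &, < and > so the text is safe as element content.
content = escape(raw)
# quoteattr() also chooses a quote character and returns the quoted value.
attr = quoteattr(raw)

elem = ET.fromstring(f"<msg note={attr}>{content}</msg>")
print(content)
print(elem.text == raw, elem.get("note") == raw)

# All five predefined entities are resolved by the parser.
five = ET.fromstring("<a>&amp;&lt;&gt;&quot;&apos;</a>").text
print(five)
```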

  12. XML Building blocks - Declaration • The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> • The XML declaration is optional in XML 1.0 but recommended (so include it!) • If present, the XML declaration must be first--not even whitespace should precede it • Note that the brackets are <? and ?> • version is required in the declaration; "1.0" is the usual value (XML 1.1 exists but is rarely used) • encoding can be "UTF-8" (a Unicode encoding that is a superset of ASCII), "UTF-16", or something else; if it is omitted, the parser assumes UTF-8 or UTF-16 • standalone="yes" says the document does not depend on external markup declarations such as an external DTD

  13. XML Building blocks - Processing instructions • PIs (Processing Instructions) may occur anywhere in the XML document (but usually first) • A PI is a command to the program processing the XML document to handle it in a certain way • XML documents are typically processed by more than one program • Programs that do not recognize a given PI should just ignore it • General format of a PI: <?target instructions?> • Example: <?xml-stylesheet type="text/css" href="mySheet.css"?>

  14. XML Building blocks - Comments • <!-- This is a comment in both HTML and XML --> • Comments can be put almost anywhere in an XML document (but not inside a tag or before the XML declaration) • Comments are useful for: • Explaining the structure of an XML document • Commenting out parts of the XML during development and testing • The character sequence -- cannot occur inside a comment • Comments are not displayed by browsers, but can be seen by anyone who looks at the source code

  15. CDATA • By default, all text inside an XML document is parsed • You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]> • Any characters, even & and <, can occur inside a CDATA section • Whitespace inside a CDATA section is (usually) preserved • The only real restriction is that the character sequence ]]> cannot occur inside a CDATA section • CDATA is useful when your text has many characters that would otherwise need escaping (for example, if your XML document contains some HTML text)
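
A CDATA section is only a different way of writing the same character data. The sketch below, using Python's standard `xml.etree.ElementTree`, parses an entity-escaped version and a CDATA version of the same HTML fragment; the `snippet` element name is an assumption.

```python
import xml.etree.ElementTree as ET

# The same HTML fragment, written once with entities and once as CDATA.
escaped = ET.fromstring("<snippet>&lt;b&gt;AT&amp;T&lt;/b&gt;</snippet>")
cdata = ET.fromstring("<snippet><![CDATA[<b>AT&T</b>]]></snippet>")

print(escaped.text)
print(cdata.text)
print(escaped.text == cdata.text)
```

After parsing, the application cannot tell which notation the author used.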

  16. XML Syntax • All XML elements must have a closing tag • XML tags are case sensitive • All XML elements must be properly nested • All XML documents must have a single root element • Attribute values must always be quoted • With XML, white space in content is preserved • With XML, a line break is always passed to the application as LF • Comments in XML: <!-- This is a comment -->

  17. Well-formed XML • Every element must have both a start tag and an end tag, e.g. <name> ... </name> • But empty elements can be abbreviated: <break /> • XML tags are case sensitive • XML names may not begin with the letters xml, in any combination of cases (such names are reserved) • Elements must be properly nested, e.g. not <b><i>bold and italic</b></i> • Every XML document must have one and only one root element • The values of attributes must be enclosed in single or double quotes, e.g. <time unit="days"> • Character data cannot contain a literal < or &; write &lt; and &amp; instead
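
Several of these rules can be checked by handing small fragments to a parser. The following sketch relies on the fact that Python's standard `xml.etree.ElementTree` raises `ParseError` for input that is not well-formed; `is_well_formed` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "matched tags": "<name>David</name>",
    "abbreviated empty element": "<break />",
    "bad nesting": "<b><i>bold and italic</b></i>",
    "case mismatch": "<Name>David</name>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<time unit=days></time>",
    "literal < in text": "<a>1 < 2</a>",
}

for label, fragment in checks.items():
    print(f"{label}: {is_well_formed(fragment)}")
```

Only the first two fragments are accepted; each of the others violates one rule from the slide.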

  18. Displaying XML • XML documents do not carry information about how to display the data • We can add display information to XML with • CSS (Cascading Style Sheets) • XSL (eXtensible Stylesheet Language) --- preferred

  19. XML Applications (1): Separate data • XML can separate data from HTML • Store data in separate XML files • Use HTML for layout and display • Use Data Islands • Data Islands can be bound to HTML elements • Benefit: changes in the underlying data will not require any changes to your HTML

  20. XML Applications (2): Exchange data • XML is used to exchange data • Text format • Software-independent, hardware-independent • Exchange data between incompatible systems, provided that they agree on the same tag definitions • Can be read by many different types of applications • Benefits: • Reduces the complexity of interpreting data • Makes it easier to expand and upgrade a system

  21. XML Applications (3): Store data • XML can be used to store data • Plain text file • Store data in files or databases • Applications can be written to store and retrieve information from the store • Other clients and applications can access your XML files as data sources • Benefit: accessible to more applications

  22. XML Applications (4): Create new languages • XML can be used to create new languages, e.g.: • WML (Wireless Markup Language), used to mark up Internet applications for handheld devices like mobile phones (WAP) • MusicXML, used for publishing musical scores

  23. Names in XML • Names (as used for tags and attributes) must begin with a letter or underscore, and can consist of: • Letters, both Latin (English) and non-Latin • Digits, both Latin and non-Latin • . (dot) • - (hyphen) • _ (underscore) • : (colon), which should be used only for namespaces • Combining characters and extenders (not used in English)

  24. Namespaces • Namespaces are a simple mechanism for creating globally unique names for the elements and attributes of your markup language. • Benefits: • Disambiguates identical names used in different markup languages. • Allows different markup languages to be mixed together without ambiguity. • Namespaces are implemented by giving each qualified name two parts, an (optional) prefix and a local part: <xsd:integer>

  25. Namespaces and URIs • A namespace is defined as a unique string • To guarantee uniqueness, typically a URI (Uniform Resource Identifier) is used, because the author “owns” the domain • It doesn't have to be a “real” URI; it just has to be a unique string • Example: http://ce.sharif.edu/sw • There are two ways to use namespaces: • Declare a default namespace • Associate a prefix with a namespace, then use the prefix in the XML to refer to the namespace

  26. Namespace syntax • In any start tag you can use the reserved attribute name xmlns: • <book xmlns="http://ce.sharif.edu/sw"> • This namespace will be used as the default for all elements up to the corresponding end tag • You can override it with a specific prefix • You can use almost this same form to declare a prefix: • <book xmlns:dave="http://ce.sharif.edu/sw"> • Use this prefix on every tag and attribute you want to use from this namespace, including end tags--it is not a default prefix • <dave:chapter dave:number="1">To Begin</dave:chapter> • You can use the prefix in the start tag in which it is defined: • <dave:book xmlns:dave="http://ce.sharif.edu/sw">
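
A namespace-aware parser treats the default form and the prefixed form as the same name. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports names as `{namespace}local`. The `chapter` content comes from the slide; the `sw` lookup prefix is an arbitrary choice made here.

```python
import xml.etree.ElementTree as ET

NS = "http://ce.sharif.edu/sw"

default_form = ET.fromstring(
    f'<book xmlns="{NS}"><chapter number="1">To Begin</chapter></book>'
)
prefixed_form = ET.fromstring(
    f'<dave:book xmlns:dave="{NS}">'
    '<dave:chapter dave:number="1">To Begin</dave:chapter>'
    "</dave:book>"
)

# Both root elements resolve to the same namespace-qualified name.
print(default_form.tag)
print(prefixed_form.tag)

# The prefix used for lookup is local to this code; only the URI matters.
chapter = prefixed_form.find("sw:chapter", {"sw": NS})
print(chapter.get(f"{{{NS}}}number"))

# A default namespace applies to elements, not to unprefixed attributes.
print(default_form[0].attrib)
```

The prefix itself (`dave`) never reaches the application; it is only shorthand for the URI.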

  27. Review of XML rules • Start with <?xml version="1.0"?> • XML is case sensitive • You must have exactly one root element that encloses all the rest of the XML • Every element must have a closing tag • Elements must be properly nested • Attribute values must be enclosed in double or single quotation marks • There are only five pre-declared entities

  28. XML as a tree • An XML document represents a hierarchy; a hierarchy is a tree • [Figure: a tree rooted at novel, with foreword and chapter (number="1") nodes and three paragraph leaves containing “This is the great American novel.”, “It was a dark and stormy night.”, and “Suddenly, a shot rang out!”]
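
The tree view can be made concrete by walking a parsed document. In this sketch the exact nesting (one paragraph under `foreword`, two under `chapter`) is an assumption, since the figure itself is not part of the transcript; `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed nesting: the slide's figure is not fully recoverable.
doc = """<novel>
  <foreword><paragraph>This is the great American novel.</paragraph></foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>"""

def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one parent-to-child step in the tree.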

  29. Extended document standards • You can define your own XML tag sets, but here are some already available: • XHTML: HTML redefined in XML • SMIL: Synchronized Multimedia Integration Language • MathML: Mathematical Markup Language • SVG: Scalable Vector Graphics • DrawML: Drawing MetaLanguage • ICE: Information and Content Exchange • ebXML: Electronic Business with XML • cXML: Commerce XML • CBL: Common Business Library

  30. XML Schema

  31. XML Validation • "Well-formed" XML document • Correct XML syntax • "Valid" XML document • Well-formed • Conforms to the rules of a DTD (or schema) • XML DTD • Defines the legal building blocks of an XML document • Can be inline in the XML or an external reference • XML Schema • An XML-based alternative to DTD, more powerful • Supports namespaces and data types

  32. An example XML document with a DTD: <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </note>

  33. XML Schemas • “Schema” is a general term • DTDs are a form of XML schemas • When we say “XML Schemas,” we usually mean the W3C XML Schema Language • This is also known as “XML Schema Definition” language, or XSD.

  34. XSD vs. DTD • DTDs provide a very weak specification language • You can’t put any restrictions on text content • You have very little control over mixed content (text plus elements) • You have little control over ordering of elements • DTDs are written in a strange (non-XML) format • You need separate parsers for DTDs and XML • The XML Schema Definition language solves these problems • XSD gives you much more control over structure and content • XSD is written in XML

  35. Referring to a schema • To refer to a DTD in an XML document, the reference goes before the root element: • <?xml version="1.0"?> <!DOCTYPE rootElement SYSTEM "url"> <rootElement> ... </rootElement> • To refer to an XML Schema in an XML document, the reference goes in the root element: • <?xml version="1.0"?> <rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="url.xsd"> ... </rootElement> • The xmlns:xsi declaration (the XML Schema Instance reference) is required • xsi:noNamespaceSchemaLocation is where your XML Schema definition can be found

  36. The XSD document • Since the XSD is itself written in XML, it can be confusing which document we are talking about • The file extension is .xsd • The root element is <schema> • The XSD starts like this: • <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  37. <schema> • The <schema> element may have attributes: • xmlns:xs="http://www.w3.org/2001/XMLSchema" • This is necessary to specify where all our XSD tags are defined • elementFormDefault="qualified" • This means that all elements declared in the schema, including local ones, must be namespace-qualified in instance documents • It is highly desirable to qualify all elements, or problems will arise when another schema is added

  38. “Simple” and “complex” elements • A “simple” element is one that contains text and nothing else • A simple element cannot have attributes • A simple element cannot contain other elements • A simple element cannot be empty • However, the text can be of many different types, and may have various restrictions applied to it • If an element isn’t simple, it’s “complex” • A complex element may have attributes • A complex element may be empty, or it may contain text, other elements, or both text and other elements

  39. Defining a simple element • A simple element is defined as <xs:element name="name" type="type" /> where: • name is the name of the element • the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time • Other attributes a simple element may have: • default="default value": used if no other value is specified • fixed="value": no other value may be specified

  40. Defining an attribute • Attributes themselves are always declared as simple types • An attribute is defined as <xs:attribute name="name" type="type" /> where: • name and type are the same as for xs:element • Other attributes an attribute declaration may have: • default="default value": used if no other value is specified • fixed="value": no other value may be specified • use="optional": the attribute is not required (default) • use="required": the attribute must be present

  41. Restrictions, or “facets” • The general form for putting a restriction on a text value is: • <xs:element name="name"> (or xs:attribute) <xs:simpleType> <xs:restriction base="type"> ... the restrictions ... </xs:restriction> </xs:simpleType> </xs:element> • For example: • <xs:element name="age"> <xs:simpleType> <xs:restriction base="xs:integer"> <xs:minInclusive value="0"/> <xs:maxInclusive value="140"/> </xs:restriction> </xs:simpleType> </xs:element>

  42. Restrictions on numbers • minInclusive -- number must be ≥ the given value • minExclusive -- number must be > the given value • maxInclusive -- number must be ≤ the given value • maxExclusive -- number must be < the given value • totalDigits -- number must have no more than value digits in total • fractionDigits -- number must have no more than value digits after the decimal point
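
The four bound facets amount to simple comparisons. The sketch below is plain Python, not an XSD validator: `within_bounds` and its parameter names are hypothetical, chosen to mirror the facet names, and the 0 to 140 range is the age example from the restrictions slide.

```python
from decimal import Decimal

def within_bounds(text, min_inclusive=None, min_exclusive=None,
                  max_inclusive=None, max_exclusive=None):
    """Hypothetical helper: apply XSD-style bound facets to a numeric string."""
    value = Decimal(text)
    if min_inclusive is not None and value < Decimal(min_inclusive):
        return False
    if min_exclusive is not None and value <= Decimal(min_exclusive):
        return False
    if max_inclusive is not None and value > Decimal(max_inclusive):
        return False
    if max_exclusive is not None and value >= Decimal(max_exclusive):
        return False
    return True

# The age example: minInclusive=0, maxInclusive=140.
for age in ["0", "35", "140", "141", "-1"]:
    print(age, within_bounds(age, min_inclusive="0", max_inclusive="140"))
```

The inclusive facets accept the boundary value itself, while the exclusive ones reject it.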

  43. Restrictions on strings • length -- the string must contain exactly value characters • minLength -- the string must contain at least value characters • maxLength -- the string must contain no more than value characters • pattern -- the value is a regular expression that the string must match • whiteSpace -- not really a “restriction”--tells what to do with whitespace • value="preserve": keep all whitespace • value="replace": change all whitespace characters to spaces • value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
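
The three whiteSpace settings are easiest to compare side by side. This is a plain-Python sketch of the behavior described above, not a call into any schema library; `apply_whitespace` is a hypothetical name.

```python
import re

def apply_whitespace(value: str, mode: str) -> str:
    """Hypothetical helper mimicking the XSD whiteSpace facet."""
    if mode == "preserve":
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Then squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode}")

sample = "  a\t b\n"
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_whitespace(sample, mode)))
```

Note that "collapse" builds on "replace": whitespace characters are first turned into spaces and only then squeezed and trimmed.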

  44. Enumeration • An enumeration restricts the value to be one of a fixed set of values • Example: • <xs:element name="season"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Spring"/> <xs:enumeration value="Summer"/> <xs:enumeration value="Autumn"/> <xs:enumeration value="Fall"/> <xs:enumeration value="Winter"/> </xs:restriction> </xs:simpleType></xs:element>

  45. Complex elements • A complex element is defined as <xs:element name="name"> <xs:complexType> ... information about the complex type ... </xs:complexType> </xs:element> • Example: <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element> • <xs:sequence> says that elements must occur in this order • Remember that attributes are always simple types

  46. Declaration and use • So far we’ve been talking about how to declare types, not how to use them • To use a named type we have declared, use its name as the value of type="..." • Examples: • <xs:element name="student" type="person"/> • <xs:element name="professor" type="person"/> • Scope is important: you cannot use a type if it is local to some other type

  47. xs:sequence • We’ve already seen an example of a complex type whose elements must occur in a specific order: • <xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element>

  48. xs:all • xs:all allows elements to appear in any order • <xs:element name="person"> <xs:complexType> <xs:all> <xs:element name="firstName" type="xs:string" /> <xs:element name="lastName" type="xs:string" /> </xs:all> </xs:complexType> </xs:element> • Despite the name, the members of an xs:all group can occur once or not at all • You can use minOccurs="0" to specify that an element is optional (default value is 1) • In this context, maxOccurs is always 1
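
The difference between xs:sequence and xs:all shows up when the children of `person` arrive in a different order. The sketch below imitates the two rules in plain Python over a parsed document; it is not schema validation, both helper functions are hypothetical, and it assumes the default minOccurs of 1 for every member.

```python
import xml.etree.ElementTree as ET

EXPECTED = ["firstName", "lastName"]

def matches_sequence(elem) -> bool:
    """xs:sequence-like rule: children must appear in the declared order."""
    return [child.tag for child in elem] == EXPECTED

def matches_all(elem) -> bool:
    """xs:all-like rule: each child exactly once, in any order."""
    return sorted(child.tag for child in elem) == sorted(EXPECTED)

in_order = ET.fromstring(
    "<person><firstName>David</firstName><lastName>Matuszek</lastName></person>"
)
swapped = ET.fromstring(
    "<person><lastName>Matuszek</lastName><firstName>David</firstName></person>"
)
incomplete = ET.fromstring("<person><firstName>David</firstName></person>")

for label, doc in [("in order", in_order), ("swapped", swapped),
                   ("incomplete", incomplete)]:
    print(label, matches_sequence(doc), matches_all(doc))
```

With minOccurs="0" on a member, the xs:all-like rule would also have to accept documents where that member is missing; the sketch leaves that case out.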

  49. Empty elements • Empty elements are (ridiculously) complex • <xs:complexType name="counter"> <xs:complexContent> <xs:restriction base="xs:anyType"> <xs:attribute name="count" type="xs:integer"/> </xs:restriction> </xs:complexContent> </xs:complexType>

  50. Mixed elements • Mixed elements may contain both text and elements • We add mixed="true" to the xs:complexType element • The text itself is not mentioned in the element, and may go anywhere (it is basically ignored) • <xs:complexType name="paragraph" mixed="true"> <xs:sequence> <xs:element name="someName" type="xs:anyType"/> </xs:sequence> </xs:complexType>
