Good Evening! Dr. Tappert and all our wonderful classmates!
XML: George Mathew, Elaine Li, Lisa Jordan, Ping Gallivan
Overview • A history lesson • The Web and the birth of XML • when, why, and who • What does XML give us? • Examples, illustrations, and applications • The future
In The Beginning... • ...was the birth of the Web (Tim Berners-Lee, 1989-1990) • [Diagram: a Web server alongside FTP, news, and email on the Internet communication protocols, with databases and other software behind it. Labels: URLs (location, e.g. http://www.foo.org/boo.html), HTML (data/display), HTTP (transfer). Sample page text: "Hello There. Here's a zippy HTML page, with lots of Colors and Links ...!!! Fun, Eh?"]
The Birth of XML... • ...happened in 1996, when a group of experts assembled to try to find a way out of the problem. • First draft came out in late 1996; the final version of the XML 1.0 specification came out in February 1998. Core Principles • Simple, but not as simple as HTML, in particular with stricter formal syntax • Extensible • Distributed environment-friendly
What is XML? • XML stands for Extensible Markup Language • XML is a markup language much like HTML • XML was designed to describe data • XML tags are not predefined in XML. You must define your own tags. Opening tags: <tagname> Closing tags: </tagname> • XML is self-describing. • XML uses a DTD (Document Type Definition) or a Schema to formally describe the data.
XML vs HTML • XML is not a replacement for HTML. • XML and HTML were designed with different goals. • XML was designed to describe data and to focus on what data is. • HTML was designed to display data and to focus on how data looks. • HTML is about displaying information; XML is about describing information.
Different XML varieties • VoiceXML: Voice applications • WML: Wireless Markup Language • MathML: Mathematics domain • XForms: Forms to be filled out • XML Query: Asking information of databases • XML Schema: Defining XML "dialects" • XSLT: Extensible Stylesheet Language Transformations • XPath: Addressing parts of a document, from the "root" down the document "tree"
Why XML is so important… • Many standards bodies are working on or have completed work on XML dialects for different industries. • The academic community is working to standardize XML dialects for Math, Physics, Chemistry, Engineering, and Social Sciences as well.
Defining Specific Language Dialects… • Two ways of doing so: • XML Document Type Definition (DTD) -- Part of core XML spec. • XML Schema -- New XML specification (2001), stronger constraints on XML documents.
Defining Specific Language Dialects • Adding dialect specifications implies two classes of XML data: • Well-formed: an XML document that is syntactically correct • Valid: an XML document that is both well-formed and consistent with a specific DTD (or Schema) • Most current dialects are defined using DTDs. • Schemas are often used for type validation.
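The well-formed half of that distinction can be checked with nothing more than a parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree; that parser is non-validating, so it can report syntax errors but says nothing about conformance to a DTD or Schema. The helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, i.e. it is syntactically correct."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<Book><Title>My Life and Times</Title></Book>"
# Same content, but the two closing tags are in the wrong order.
bad = "<Book><Title>My Life and Times</Book></Title>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Checking validity would additionally need a validating parser and the DTD or Schema itself, which this sketch does not attempt.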
XML (and related) Specifications • [Diagram: specifications arranged by area (Core, APIs, Style, Protocols, Web Services, Application areas, Data/presentation) and marked by status (W3C rec, W3C draft, 'Open' std, industry std)] • Specifications shown: XML 1.0, XFragment, XML names, RDF, Canonical, XPath, MathML, XSLT, SMIL 1 & 2, XPointer, XML base, JDOM, VoiceXML, JAXP, XLink, Infoset, XSL, SVG, DOM 1, XHTML events, XML signature, XHTML 1.0, DOM 2, XML query, DOM 3, XHTML basic, XForms, XML schema, SAX 1, SAX 2, UDDI, Modularized XHTML, RSS, SOAP, IFX, TEI, Biztalk, IMS, XML-RPC, CSS 1, HEML, Docbook, ebXML, XMI, CSS 2, WDDX, CellML, XUL, WSDL, CSS 3, Jabber, ... 100's more ....
Classes of use for XML…. For machine-machine communication • Financial information exchange • FpML, FinXML, OFX/IFX, FixML, GOLD, XBRL, SwiftML • Directory services metadata • dirXML, DSML (Directory Services Markup Language), …. • Other business transactions • FRML (first retail markup language), …. ebXML (generic business) • News, data syndication (exchanging data between machines) • XMLnews, ICE, NewsML, RSS, WDDX
Classes of use for XML Manage the connection between machines • Control of machine-machine applications • XML-RPC, SOAP • Brokering of Web "Services" • Biztalk, UDDI, ebXML
XML Messaging + Processing • [Diagram: a Factory and three Suppliers connected through a SOAP interface. Labels: "Place order (XML/edi) using SOAP" and "Response (XML/edi) using SOAP".]
XML Software… • XML parser -- Reads in XML data, checks for syntactic (and possibly DTD/Schema) constraints, and makes data available to an application. There are three 'generic' parser APIs: • SAX: Simple API for XML (event-based) • DOM: Document Object Model (object/tree based) • JDOM: Java Document Object Model (object/tree based)
XML Processing: SAX… A) SAX: Simple API for XML • An event-based interface • Parser reports events whenever it sees a tag/attribute/text node/other • Programmer attaches “event handlers” to handle the event
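As an illustration of the event-handler style, here is a small sketch using Python's standard xml.sax module. The handler class TitleCollector and the second book in the sample data are invented for this example; only the first title comes from the slides.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Event handler that collects the text of every <Title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser each time it sees an opening tag.
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one run of text in several chunks.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser each time it sees a closing tag.
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


document = b"""<BookStore>
  <Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>
  <Book><Title>A Second Book</Title><Author>Someone Else</Author></Book>
</BookStore>"""

handler = TitleCollector()
xml.sax.parseString(document, handler)
print(handler.titles)  # ['My Life and Times', 'A Second Book']
```

Note that the handler has to keep its own state (the _in_title flag and the text buffer); the parser never builds a tree for it.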
XML Processing: SAX • Advantages • Simple to use • Very fast (not doing very much before you get the tags and data) • Low memory use (doesn’t read an XML document entirely into memory) • Disadvantages • Not doing very much for you -- you have to do everything yourself • Not useful if you have to dynamically modify the document once it’s in memory (since you’ll have to do all the work to put it in memory yourself!)
XML Processing: DOM… B) DOM: Document Object Model • An object-oriented interface • Parser generates an in-memory tree corresponding to the document • DOM interface defines methods for accessing and modifying the tree
XML Processing: DOM Advantages • Very useful for dynamic modification of, and access to, the tree • Useful for querying (i.e., looking for data) that depends on the tree structure [e.g., element.childNode("2").getAttributeValue("boobie")] • Same interface for many programming languages (C++, Java, ...) Disadvantages • Can be slow (needs to produce the tree), and can take up lots of memory • DOM programming interface is a bit awkward, not terribly object oriented
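A rough sketch of the tree-based style, using Python's standard xml.dom.minidom (one DOM implementation among many). The id attributes and the second book are hypothetical sample data, added only so there is something to query and modify.

```python
from xml.dom import minidom

document = """<BookStore>
  <Book id="b1"><Title>My Life and Times</Title></Book>
  <Book id="b2"><Title>A Second Book</Title></Book>
</BookStore>"""

# The whole document is parsed into an in-memory tree first.
dom = minidom.parseString(document)

# Query that depends on tree structure: the second <Book> in document order.
books = dom.getElementsByTagName("Book")
second = books[1]
print(second.getAttribute("id"))  # b2

# Modify the tree in place, then serialize the changed subtree.
second.setAttribute("id", "b2-revised")
title = second.getElementsByTagName("Title")[0]
title.firstChild.data = "A Revised Title"
print(second.toxml())  # <Book id="b2-revised"><Title>A Revised Title</Title></Book>
```

The cost is the one the slide names: the full tree is built before any query can run.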
XML Processing: JDOM… C) JDOM: Java Document Object Model • A Java-specific object-oriented interface • Parser generates an in-memory tree corresponding to the document • JDOM interface has methods for accessing and modifying the tree
XML Processing: JDOM • Advantages • Very useful for dynamic modification of the tree • Useful for querying (i.e., looking for data) that depends on the tree structure • Much nicer object-oriented programming interface than DOM • Disadvantages • Can be slow (make that tree...), and can take up lots of memory • New, and not entirely cooked (but close) • Only works with Java, and not (yet) part of the core Java standard
What are XML Schemas? • Data Model • With XML Schemas you specify how your XML data will be organized, and the datatypes of your data. That is, with XML Schemas you model how your data is to be represented in an instance document. • A Contract • Organizations agree to structure their XML documents in conformance with an XML Schema. Thus, the XML Schema acts as a contract between the organizations. • A rich source of metadata • An XML Schema document contains lots of data about the data in the XML instance documents, such as the datatype of the data, the data's range of values, how the data is related to another piece of data (parent/child, sibling relationship), i.e., XML Schemas contain metadata
Elements, Attributes, and Types • The basic building blocks of XML Schemas are elements and attributes. • Data types define the valid content that elements and attributes contain. • When you create XML schemas, you define the individual elements and attributes and assign valid types to them. • Elements describe data, whereas attributes are like properties of an element.
Limitations of DTDs • A DTD is not itself in XML format, which means more work for parsers • Does not express data types (weak data typing) • No namespace support • Document can override external DTD • No DOM support • XML Schema is intended to resolve these issues, but DTDs are going to be around for a while
Let's see an example • Convert the BookStore.dtd (next page) to the XML Schema syntax • For this first example we will make a straight, one-to-one conversion, i.e., Title, Author, Date, ISBN, and Publisher will hold strings, just as they do in the DTD • We will gradually modify the XML Schema to use stronger types
BookStore.dtd

<!ELEMENT BookStore (Book)+>
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
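One way to read these declarations: each content model is a regular expression over the names of an element's children. The sketch below spells that out by hand in Python; it is an illustration of the idea, not how a validating parser is implemented, and it ignores details such as stray text between child elements. The names CONTENT_MODELS, PCDATA_ONLY, children_match, and conforms are made up here.

```python
import re
import xml.etree.ElementTree as ET

# Each element-content model from the DTD, rewritten as a regular
# expression over space-terminated child element names.
CONTENT_MODELS = {
    "BookStore": re.compile(r"(Book )+"),                        # (Book)+
    "Book": re.compile(r"Title Author Date ISBN Publisher "),    # fixed sequence
}
# Elements declared as (#PCDATA): text only, no child elements.
PCDATA_ONLY = {"Title", "Author", "Date", "ISBN", "Publisher"}


def children_match(element):
    """Check one element's child sequence against its content model."""
    if element.tag in PCDATA_ONLY:
        return len(element) == 0
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # element is not declared at all
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


def conforms(root):
    """Check every element in the tree, starting at the root."""
    return all(children_match(el) for el in root.iter())


ok = ET.fromstring(
    "<BookStore><Book><Title>t</Title><Author>a</Author><Date>d</Date>"
    "<ISBN>i</ISBN><Publisher>p</Publisher></Book></BookStore>"
)
missing_author = ET.fromstring(
    "<BookStore><Book><Title>t</Title><Date>d</Date>"
    "<ISBN>i</ISBN><Publisher>p</Publisher></Book></BookStore>"
)
print(conforms(ok))              # True
print(conforms(missing_author))  # False
```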
[Diagram: the DTD vocabulary (ELEMENT, ATTLIST, #PCDATA, ID, CDATA, NMTOKEN, ENTITY) next to the new vocabulary being defined (BookStore, Book, Title, Author, Date, ISBN, Publisher)] This is the vocabulary that DTDs provide to define your new vocabulary.
[Diagram: the XML Schema vocabulary in the http://www.w3.org/2001/XMLSchema namespace (schema, element, complexType, sequence, string, boolean, integer) next to the new vocabulary in the http://www.books.org namespace, the targetNamespace (BookStore, Book, Title, Author, Date, ISBN, Publisher)] This is the vocabulary that XML Schemas provide to define your new vocabulary. One difference between XML Schemas and DTDs is that the XML Schema vocabulary is associated with a name (namespace). Likewise, the new vocabulary that you define must be associated with a name (namespace). With DTDs neither set of vocabulary is associated with a name (namespace) [because DTDs pre-dated namespaces].
BookStore.xsd (see example01)

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
    <xsd:element name="BookStore">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Date" type="xsd:string"/>
    <xsd:element name="ISBN" type="xsd:string"/>
    <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>

xsd = XML Schema Definition (explanations on succeeding pages)
BookStore.xsd compared with BookStore.dtd: each declaration in the DTD has a counterpart in the schema shown above.
• <!ELEMENT BookStore (Book)+> corresponds to <xsd:element name="BookStore">, whose sequence holds <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
• <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)> corresponds to <xsd:element name="Book">, whose sequence references Title, Author, Date, ISBN, and Publisher, each with minOccurs="1" maxOccurs="1"
• <!ELEMENT Title (#PCDATA)> corresponds to <xsd:element name="Title" type="xsd:string"/>, and likewise for Author, Date, ISBN, and Publisher
All XML Schemas have "schema" as the root element. In BookStore.xsd:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
    ...
</xsd:schema>
In BookStore.xsd, xmlns:xsd="http://www.w3.org/2001/XMLSchema": the elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace.
In BookStore.xsd, targetNamespace="http://www.books.org": says that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in this namespace.
Referencing a schema in an XML instance document

<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
    <Book>
        <Title>My Life and Times</Title>
        <Author>Paul McCartney</Author>
        <Date>July, 1998</Date>
        <ISBN>94303-12021-43892</ISBN>
        <Publisher>McMillin Publishing</Publisher>
    </Book>
    ...
</BookStore>

1. First, using a default namespace declaration (xmlns="http://www.books.org"), tell the schema-validator that all of the elements used in this instance document come from the http://www.books.org namespace.
2. Second, with xsi:schemaLocation tell the schema-validator that the http://www.books.org namespace is defined by BookStore.xsd (i.e., schemaLocation contains a pair of values).
3. Third, with xmlns:xsi tell the schema-validator that the schemaLocation attribute we are using is the one in the XMLSchema-instance namespace.
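To make the three declarations concrete, here is a small sketch that reads them back out of an instance document with Python's xml.etree.ElementTree. It only inspects the attributes; it does not fetch BookStore.xsd or validate anything. The instance is shortened to a single Title for brevity.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

instance = """<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book><Title>My Life and Times</Title></Book>
</BookStore>"""

root = ET.fromstring(instance)

# The default namespace applies to every unprefixed element;
# ElementTree reports names as {namespace-uri}local-name.
print(root.tag)  # {http://www.books.org}BookStore

# schemaLocation is a whitespace-separated list of
# (namespace, schema document) pairs.
tokens = root.get(f"{{{XSI}}}schemaLocation").split()
pairs = dict(zip(tokens[0::2], tokens[1::2]))
print(pairs)  # {'http://www.books.org': 'BookStore.xsd'}
```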
Note multiple levels of checking • BookStore.xml against BookStore.xsd: validate that the XML document conforms to the rules described in BookStore.xsd • BookStore.xsd against XMLSchema.xsd (schema-for-schemas): validate that BookStore.xsd is a valid schema document, i.e., it conforms to the rules described in the schema-for-schemas
<xsd:complexType> or <xsd:simpleType>? • When do you use the complexType element and when do you use the simpleType element? • Use the complexType element when you want to define child elements and/or attributes of an element • Use the simpleType element when you want to create a new type that is a refinement of a built-in type (string, date, gYear, etc)
Built-in Datatypes
Primitive datatypes (atomic, built-in)
• string: "Hello World"
• boolean: {true, false}
• decimal: 7.08
• float: 12.56E3, 12, 12560, 0, -0, INF, -INF, NaN
• double: 12.56E3, 12, 12560, 0, -0, INF, -INF, NaN
• duration: P1Y2M3DT10H30M12.3S
• dateTime: format: CCYY-MM-DDThh:mm:ss
• time: format: hh:mm:ss.sss
• date: format: CCYY-MM-DD
• gYearMonth: format: CCYY-MM
• gYear: format: CCYY
• gMonthDay: format: --MM-DD
Note: 'T' is the date/time separator; INF = infinity; NaN = not-a-number
Built-in Datatypes (cont.)
Derived types (each a subtype of a primitive datatype)
• negativeInteger: negative infinity to -1
• long: -9223372036854775808 to 9223372036854775807
• int: -2147483648 to 2147483647
• short: -32768 to 32767
• byte: -128 to 127
• nonNegativeInteger: 0 to infinity
• unsignedLong: 0 to 18446744073709551615
• unsignedInt: 0 to 4294967295
• unsignedShort: 0 to 65535
• unsignedByte: 0 to 255
• positiveInteger: 1 to infinity
Note: the following types can only be used with attributes (which we will discuss later): ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, and ENTITIES.
Do Lab 3
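The bounded integer types line up with the usual 64-, 32-, 16-, and 8-bit two's-complement widths. That correspondence is an observation added here, not something the slide states, but it gives a quick way to sanity-check the limits. A short Python sketch, with helper names invented for the purpose:

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Smallest and largest value of an unsigned integer of this width."""
    return 0, 2 ** bits - 1


# Bit widths assumed for the bounded XML Schema integer types.
WIDTHS = {"long": 64, "int": 32, "short": 16, "byte": 8}

for name, bits in WIDTHS.items():
    low, high = signed_range(bits)
    print(f"{name}: {low} to {high}")
    low, high = unsigned_range(bits)
    print(f"unsigned{name.capitalize()}: {low} to {high}")
```

For example, the byte row comes out as -128 to 127 and unsignedByte as 0 to 255.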
Creating your own Datatypes • A new datatype can be defined from an existing datatype (called the "base" type) by specifying values for one or more of the optional facets for the base type. • Example. The string primitive datatype has six optional facets: • length • minLength • maxLength • pattern • enumeration • whiteSpace (legal values: preserve, replace, collapse)
Example of Creating a New Datatype by Specifying Facet Values

<xsd:simpleType name="TelephoneNumber">
    <xsd:restriction base="xsd:string">
        <xsd:length value="8"/>
        <xsd:pattern value="\d{3}-\d{4}"/>
    </xsd:restriction>
</xsd:simpleType>

1. The simpleType element creates a new datatype called 'TelephoneNumber'.
2. The restriction base says that elements of this type can hold string values,
3. but the length facet says the string must be exactly 8 characters long, and
4. the pattern facet says the string must follow the pattern ddd-dddd, where 'd' represents a 'digit'. (Obviously, in this example the regular expression makes the length facet redundant.)
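The effect of the two facets can be imitated outside a schema validator. The sketch below is an analogy in Python rather than real schema validation: it applies the same length and the same pattern to a candidate string. XML Schema patterns must match the whole value, which is why fullmatch is used. The function name is_telephone_number is made up.

```python
import re

LENGTH = 8                              # <xsd:length value="8"/>
PATTERN = re.compile(r"\d{3}-\d{4}")    # <xsd:pattern value="\d{3}-\d{4}"/>


def is_telephone_number(value):
    """Apply both facets: exact length, and the pattern over the whole string."""
    return len(value) == LENGTH and PATTERN.fullmatch(value) is not None


print(is_telephone_number("555-1234"))   # True
print(is_telephone_number("5551234"))    # False: no hyphen, only 7 characters
print(is_telephone_number("555-12345"))  # False: 9 characters
```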
Another Example

<xsd:simpleType name="US-Flag-Colors">
    <xsd:restriction base="xsd:string">
        <xsd:enumeration value="red"/>
        <xsd:enumeration value="white"/>
        <xsd:enumeration value="blue"/>
    </xsd:restriction>
</xsd:simpleType>

This creates a new type called US-Flag-Colors. An element declared to be of this type must have either the value red, or white, or blue.
Annotating Schemas • The <annotation> element is used for documenting the schema, both for humans and for programs. • Use <documentation> for providing a comment to humans • Use <appinfo> for providing a comment to programs • The content is any well-formed XML • Note that annotations have no effect on schema validation

<xsd:annotation>
    <xsd:documentation>
        The following constraint is not expressible with XML Schema:
        The value of element A should be greater than the value of element B.
        So, we need to use a separate tool (e.g., Schematron) to check this constraint.
        We will express this constraint in the appinfo section (below).
    </xsd:documentation>
    <xsd:appinfo>
        <assert test="A > B">A should be greater than B</assert>
    </xsd:appinfo>
</xsd:annotation>
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
    <xsd:element name="BookStore">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Book" maxOccurs="unbounded">
                    <xsd:complexType>
                        <xsd:sequence>
                            <xsd:element name="Title" type="xsd:string"/>
                            <xsd:element name="Author" type="xsd:string"/>
                            <xsd:element name="Date" type="xsd:string"/>
                            <xsd:element name="ISBN" type="xsd:string"/>
                            <xsd:element name="Publisher" type="xsd:string"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Can put annotations only at certain locations: at the top level of the schema, and as the first child of a component such as xsd:element, xsd:complexType, or xsd:sequence.
What is VoiceXML? VoiceXML is an XML-based language for creating voice-user interfaces, particularly for the telephone. It uses speech recognition and touchtone (DTMF keypad) for input, and pre-recorded audio and text-to-speech synthesis (TTS) for output.
History of VXML • VoiceXML has its roots in a research project called PhoneWeb at AT&T Bell Laboratories. • Motorola embraced the markup approach as a way to provide mobile users with up-to-the-minute information and interactions. • In October 1998, the World Wide Web Consortium (W3C) sponsored a workshop on Voice Browsers. A number of leading companies, including AT&T, IBM, Lucent, Microsoft, Motorola, and Sun, participated.
Scope of VXML • VoiceXML can be used: • as a way to voice-enable a Web site, or • as an open-architecture solution for building next-generation interactive voice response telephone services. • One popular type of application is the voice portal, a telephone service where callers dial a phone number to retrieve information.
Future of VoiceXML • Voice-enabled web is in demand • Reduces costs • Opens up new opportunities for business
XML Advantages….. • Self-describing, i.e., its tags are meaningful. • Extensible in many ways: extensible tag library, extensible document structure, and extensible document elements. • Distributable because its elements can be distributed entities in the web.
XML Advantages… • User-friendly because it is human-readable. • Web-friendly because its components can be drawn from various sources on the web. For example, the data can come from one source, the stylesheet from a second source, and the DTD from a third. • Very suitable for manipulation by object-oriented languages because each XML element can be treated as an object.