Introduction to XML: A Short Online Tutorial

Learn the basics of XML, its history, syntax, elements, attributes, and more with this short online tutorial.

ayoho

Presentation Transcript


  1. XML Technical Workshop
     Bill Cafiero, 972-231-2180, jcaf@airmail.net
     A short online XML tutorial may be found at www.gegxs.com.

  2. Where Does XML Fit?
     The Internet creates a need for platform-independent technology:
     • Internet: the platform
     • HTML: presentation
     • XML: data
     • Java: processing

  3. eXtensible Markup Language
     • XML is designed to transfer structured text and data among systems in multiple organizations.
     • XML and HTML both evolved from SGML.
     • XML focuses on document data content; HTML focuses on document display.
     • These markup languages use tags, such as a start tag “<tag>” and an end tag “</tag>”, to mark up text and data, providing information about the information.

  4. HTML
     • HTML: HyperText Markup Language
     • Non-proprietary document formatting standard
     • Displayable on any web browser
     • HTML: When my <B>dog</B> jumped over the <U>lazy fox</U>, I didn't know <I>what</I> to do!
     • Result: When my dog jumped over the lazy fox, I didn't know what to do! (“dog” in bold, “lazy fox” underlined, “what” in italics)
     • HTML: <FONT SIZE=7 COLOR=Red>My Red Text!</FONT>
     • Result: My Red Text! (in large red type)

  5. History of XML
     • XML: eXtensible Markup Language
     • Conceived in 1996 by a team chaired by Jon Bosak.
     • Became a W3C (World Wide Web Consortium) Recommendation in February 1998.
     • Derived from its parent, SGML (Standard Generalized Markup Language).
     • HTML (HyperText Markup Language) is an earlier cousin.
     • This is what XML looks like:
         <garage_sale>
           <date>February 27, 1999</date>
           <time>7:30 am</time>
           <place>249 Cedar Elm Road</place>
           <notes>Lots of high quality junk for sale</notes>
         </garage_sale>
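To make the sample concrete, here is a minimal sketch (not part of the original workshop) that reads the garage-sale record with Python's standard-library ElementTree module. The helper name `sale_fields` is an illustrative choice, and the sketch assumes each child element holds only text.

```python
import xml.etree.ElementTree as ET

GARAGE_SALE = """<garage_sale>
  <date>February 27, 1999</date>
  <time>7:30 am</time>
  <place>249 Cedar Elm Road</place>
  <notes>Lots of high quality junk for sale</notes>
</garage_sale>"""

def sale_fields(xml_text):
    """Return a dict mapping each child element's tag to its text content."""
    root = ET.fromstring(xml_text)  # root is the <garage_sale> element
    return {child.tag: child.text for child in root}

fields = sale_fields(GARAGE_SALE)
print(fields["place"])  # 249 Cedar Elm Road
```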

  6. An XML Tutorial

  7. This is an XML “Instance” document
       <?xml version="1.0"?>
       <!DOCTYPE Book SYSTEM "Book.dtd">
       <Book isbn="1111">
         <title>The Catcher in the Rye</title>
         <author>J.D. Salinger</author>
         <year>1951</year>
         <price>11.95</price>
       </Book>
     This sample contains a “prolog”, a reference to the accompanying “rules” for the document (the DTD), element and attribute names, and “content”. These are all identified by “markup syntax”.

  8. Tags, Elements and Attributes
     • Tags are labels that tell an application or other agent to do something to whatever is enclosed in the tags.
     • <title> is a “start” tag; </title> is the closing or “end” tag.
     • An Element refers to the tags plus the content (the stuff between the tags), e.g.
         <title>The Catcher in the Rye</title>
     • The outermost element in the hierarchy is called the “Root Element” (Book in our example).
     • A start tag (or an empty-element tag) can carry Attributes, written as name/value pairs: <tag attribute="value"/>
       e.g. <Book isbn="1111">
     • Note: XML is case sensitive, so tags like <Book>, <book>, and <BOOK> are all different.
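The points about the root element, attributes and case sensitivity can be checked with Python's ElementTree module. This is an illustrative sketch using a shortened copy of the Book document; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

BOOK = """<Book isbn="1111">
  <title>The Catcher in the Rye</title>
  <author>J.D. Salinger</author>
</Book>"""

root = ET.fromstring(BOOK)

print(root.tag)                 # Book  (the root element)
print(root.get("isbn"))         # 1111  (an attribute value)
print(root.find("title").text)  # The Catcher in the Rye

# Names are case sensitive: "Title" does not match <title>.
print(root.find("Title"))       # None

# A start tag and end tag that differ only in case do not match,
# so the parser rejects the document as not well-formed.
try:
    ET.fromstring("<Book></book>")
    case_mismatch_rejected = False
except ET.ParseError as err:
    case_mismatch_rejected = True
    print("not well-formed:", err)
```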

  9. The XML Character Set
     • Unicode is the character set for XML.
     • Universal Multiple-Octet Coded Character Set (UCS), ISO/IEC 10646-1:1993
     • Originally designed as a 16-bit encoding (code points now extend beyond 16 bits) for the world's principal languages, including ancient languages
     • A description of the whole set is available from http://www.unicode.org

  10. XML Characters
     Defined by Unicode (ISO 10646); supported by the Windows NT, Windows 95/98 and Java platforms.
     [Figure: map of the 16-bit (double-byte) code range 0000 to FFFF, showing regions for general scripts, symbols, CJK miscellaneous, CJK ideographs, Hangul, surrogates, private use and compatibility characters.]

  11. Names
     • Names begin with a letter, an underscore (“_”) or a colon (“:”) and continue with letters, digits, hyphens, underscores, colons or periods.
     • Spaces are not allowed in names!
     • Names beginning with the string “XML” (in any combination of upper and lower case) are reserved.
     • Examples: <data_element> <Order-Date> <Shipping_Address>
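A rough way to test these naming rules is a regular expression. This sketch is an ASCII-only simplification (the XML specification also permits many non-ASCII letters), and the function names `is_xml_name` and `is_reserved` are hypothetical.

```python
import re

# ASCII-only approximation of the XML Name rule:
# first character: letter, "_" or ":"; then letters, digits, "_", ":", "." or "-".
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_xml_name(candidate):
    """Return True if candidate matches the simplified XML Name rule."""
    return NAME_RE.fullmatch(candidate) is not None

def is_reserved(candidate):
    """Names starting with 'xml' in any mix of case are reserved."""
    return candidate.lower().startswith("xml")

for name in ["data_element", "Order-Date", "Shipping_Address",
             "2nd_item", "-start", "first name"]:
    print(name, is_xml_name(name))
```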

  12. Elements
     There are two kinds of elements: those that have content and those that don't (empty elements).
       <title>This is the title</title>
       <empty_element attr="attribute-value"/>
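An empty element can also be written as a start tag immediately followed by its end tag; both spellings parse to the same result. A small sketch with Python's ElementTree module, using only the standard library:

```python
import xml.etree.ElementTree as ET

# Empty-element tag versus start tag + end tag with nothing in between.
short_form = ET.fromstring('<empty_element attr="attribute-value"/>')
long_form = ET.fromstring('<empty_element attr="attribute-value"></empty_element>')

for elem in (short_form, long_form):
    # Same tag and attributes, no child elements, and no text content.
    print(elem.tag, elem.attrib, len(elem), repr(elem.text or ""))
```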

  13. Attributes
     Attributes are a way of attaching characteristics or properties to elements of a document. Attributes have names and values.
       <person height="165cm">Bill Smith</person>
       <person height="165cm" weight="165lb">John Doe</person>
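A parser exposes attributes as name/value pairs. In this sketch, which uses Python's ElementTree module, the two person elements are wrapped in a hypothetical <people> root so the fragment forms a single well-formed document.

```python
import xml.etree.ElementTree as ET

PEOPLE = """<people>
  <person height="165cm">Bill Smith</person>
  <person height="165cm" weight="165lb">John Doe</person>
</people>"""

rows = []
for person in ET.fromstring(PEOPLE).findall("person"):
    # .attrib is a dict of the element's attributes; .get() returns
    # the supplied default when an attribute is absent.
    rows.append((person.text, dict(person.attrib), person.get("weight", "unknown")))
    print(*rows[-1])
```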

  14. XML Prolog
     The Prolog is made up of an XML declaration and a document type declaration (both optional). We will look at the DOCTYPE declaration in more detail later.
         <?xml version="1.0" encoding="UTF-8" ?>
         <!DOCTYPE docbook SYSTEM "http://www.davenport.org/docbook">
     • The XML declaration, including its encoding declaration, must come first: it always precedes the rest of the XML content.
     • It uses processing-instruction syntax (<? ... ?>), so there is no closing tag.
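ElementTree discards the prolog, but Python's lower-level xml.parsers.expat module reports it through callbacks. This sketch (the function name `read_prolog` and the empty <docbook/> root are hypothetical) collects the version, encoding and DOCTYPE details; no handler for external entities is installed, so the external DTD is not fetched.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE docbook SYSTEM "http://www.davenport.org/docbook">
<docbook/>"""

def read_prolog(text):
    """Return the XML declaration and DOCTYPE details as a dict."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # Called once for the <?xml ... ?> declaration.
        info["version"] = version
        info["encoding"] = encoding

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called at the start of the <!DOCTYPE ...> declaration.
        info["doctype"] = name
        info["system_id"] = system_id

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return info

print(read_prolog(DOC))
```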

  15. Comments
     Adding Comments to XML:
       <?xml version="1.0"?>
       <!-- There is no other version yet -->
       <!-- Now on to the Doctype -->
       <!DOCTYPE sample...
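By default ElementTree drops comments while parsing, so this sketch uses Python's expat interface to collect them. The document is a hypothetical variant of the one above (the slide's DOCTYPE is left out), and `collect_comments` is an illustrative name.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!-- There is no other version yet -->
<sample>
  <!-- Comments may also appear inside elements -->
  <item>1</item>
</sample>"""

def collect_comments(text):
    """Return the text of every comment, in document order."""
    comments = []
    parser = xml.parsers.expat.ParserCreate()
    # CommentHandler receives the text between "<!--" and "-->".
    parser.CommentHandler = comments.append
    parser.Parse(text, True)
    return comments

print(collect_comments(DOC))
```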

  16. Hierarchy and Navigation

  17. Structure
     • Structure in XML documents resembles storage containers.
     • Each storage container fits inside a larger one, which fits inside another, and so on.
     • The storage containers make up the physical structure, and the way they fit inside one another makes up the logical structure of the document.

  18. An Order in XML
       <?xml version="1.0"?>
       <!DOCTYPE Order SYSTEM "Orders.dtd">
       <Order>
         <Order_Number>1001</Order_Number>
         <Order_Date>04/24/00</Order_Date>
         <Customer>Bill's Supply Company</Customer>
         <Detail>
           <Line_Number>1</Line_Number>
           <Item>A-123</Item>
           <Quantity>10</Quantity>
           <Price>1.50</Price>
         </Detail>
         <Detail>
           <Line_Number>2</Line_Number>
           <Item>B-987</Item>
           <Quantity>20</Quantity>
           <Price>2.00</Price>
         </Detail>
         <Shipment>
           <Shipment_Number>1</Shipment_Number>
           <Ship_Date>3/15/00</Ship_Date>
           <Shipment_Detail>
             <Line_Number>1</Line_Number>
             <Quantity>10</Quantity>
           </Shipment_Detail>
           <Shipment_Detail>
             <Line_Number>2</Line_Number>
             <Quantity>15</Quantity>
           </Shipment_Detail>
         </Shipment>
       </Order>
     Element hierarchy:
       Order
         Order_Number
         Order_Date
         Customer
         Detail (one per line item)
           Line_Number
           Item
           Quantity
           Price
         Shipment
           Shipment_Number
           Ship_Date
           Shipment_Detail (one per shipped line)
             Line_Number
             Quantity
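Navigating the hierarchy is what makes the order useful to a program. The following sketch uses Python's ElementTree on an abridged copy of the order (Order_Date, Customer and Ship_Date omitted) to total the order and work out the quantity still to ship on each line. These calculations are illustrative, not part of the original example; prices are handled with Decimal to avoid binary floating-point rounding.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER = """<Order>
  <Order_Number>1001</Order_Number>
  <Detail><Line_Number>1</Line_Number><Item>A-123</Item>
          <Quantity>10</Quantity><Price>1.50</Price></Detail>
  <Detail><Line_Number>2</Line_Number><Item>B-987</Item>
          <Quantity>20</Quantity><Price>2.00</Price></Detail>
  <Shipment>
    <Shipment_Number>1</Shipment_Number>
    <Shipment_Detail><Line_Number>1</Line_Number><Quantity>10</Quantity></Shipment_Detail>
    <Shipment_Detail><Line_Number>2</Line_Number><Quantity>15</Quantity></Shipment_Detail>
  </Shipment>
</Order>"""

root = ET.fromstring(ORDER)

# "Detail" is a path relative to <Order>: only direct children match.
details = root.findall("Detail")
ordered = {d.findtext("Line_Number"): int(d.findtext("Quantity")) for d in details}
total = sum(int(d.findtext("Quantity")) * Decimal(d.findtext("Price")) for d in details)

# ".//" searches all descendants, so this reaches inside <Shipment>.
shipped = {}
for sd in root.findall(".//Shipment_Detail"):
    line = sd.findtext("Line_Number")
    shipped[line] = shipped.get(line, 0) + int(sd.findtext("Quantity"))

outstanding = {line: qty - shipped.get(line, 0) for line, qty in ordered.items()}
print(total)        # 55.00
print(outstanding)  # {'1': 0, '2': 5}
```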

  19. Structure: Well Formed
     • Well-formed documents are tightly constructed, with no “loose ends”.
     • Well-formed documents use complete storage containers.
     • There are no missing end tags in well-formed XML documents.
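A non-validating parser is enough to check well-formedness. This sketch wraps Python's ElementTree parser in a hypothetical `is_well_formed` helper; any parse error, such as a missing or misplaced end tag, marks the document as not well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<Detail><Item>A-123</Item></Detail>"))  # True
print(is_well_formed("<Detail><Item>A-123</Detail>"))         # False: <Item> never closed
print(is_well_formed("<Detail><Item>A-123</Detail></Item>"))  # False: overlapping tags
```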

  20. Document Type Definitions (DTDs)

  21. Document Type Definitions
     • Because we can create our own tags and document structures using XML, we need a mechanism for defining the tags and what the valid structure is.
     • A DTD is where we declare our specific element tags.
     • The DTD is where we declare the attributes of each tag.
     • The DTD specifies the “occurrence indicators” for the child elements:
         ?       zero or one
         *       zero or more
         +       one or more
         (none)  exactly one
     • Example:
         <!ELEMENT Book (title, author+, year?, price)>
         <!ELEMENT title (#PCDATA)>
         <!ELEMENT author (#PCDATA)>
         <!ELEMENT year (#PCDATA)>
         <!ELEMENT price (#PCDATA)>
         <!ATTLIST Book
                   isbn CDATA #REQUIRED>
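Python's standard-library parsers do not validate against DTDs, but the idea behind a content model can be sketched by hand: the child element names form a string, and the model becomes a regular expression. This is an illustration only (the helper `matches_book_model` is hypothetical); it checks the order and count of child elements, not their text content or attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (title, author+, year?, price): each child name is
# followed by a space; "," becomes concatenation, "+" and "?" stay quantifiers.
BOOK_MODEL = re.compile(r"(title )(author )+(year )?(price )")

def matches_book_model(xml_text):
    """Return True if the root's children follow (title, author+, year?, price)."""
    root = ET.fromstring(xml_text)
    children = "".join(child.tag + " " for child in root)
    return BOOK_MODEL.fullmatch(children) is not None

ok = """<Book isbn="1111"><title>T</title><author>A</author>
<author>B</author><price>9.99</price></Book>"""
no_author = """<Book isbn="1111"><title>T</title><price>9.99</price></Book>"""

print(matches_book_model(ok))         # True: year is optional, author repeats
print(matches_book_model(no_author))  # False: author+ needs at least one author
```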

  22. DTD Syntax
     Declarative markup consists of:
     • Markup open delimiter: <!
     • A keyword
     • Declaration information
     • Markup close delimiter: >
         <!KEYWORD declaration_information>

  23. Document Type Declaration
     • Must appear before the root element; only the XML declaration, comments, processing instructions and white space may precede it
     • Links the document to its markup declarations
     • Can be an external reference
     • Not required for a well-formed XML document that is not being validated
     • The name must match the name of the root element
         <!DOCTYPE example SYSTEM "greeting.dtd"
         [
           ……
         ]>

  24. Internal or External DTDs
       <?xml version="1.0"?>
       <!DOCTYPE label [
         <!ELEMENT label (name, street, city, state, country, code)>
         <!ELEMENT name (#PCDATA)>
         <!ELEMENT street (#PCDATA)>
         <!ELEMENT city (#PCDATA)>
         <!ELEMENT state (#PCDATA)>
         <!ELEMENT country (#PCDATA)>
         <!ELEMENT code (#PCDATA)>
       ]>
       <label>
         <name>Rock N. Robyn</name>
         <street>Jay Bird Street</street>
         <city>Baltimore</city>
         <state>MD</state>
         <country>USA</country>
         <code>43214</code>
       </label>
     Here the DTD is part of the same data file as the XML data (an internal DTD).
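Although expat does not validate, it does read the internal DTD subset and reports each declaration to a callback. This sketch (the function name `declared_elements` is hypothetical) lists the element types declared in the label document and confirms that an internal subset is present.

```python
import xml.parsers.expat

LABEL = """<?xml version="1.0"?>
<!DOCTYPE label [
  <!ELEMENT label (name, street, city, state, country, code)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>"""

def declared_elements(text):
    """Report whether a DOCTYPE has an internal subset and which elements it declares."""
    found = {"internal_subset": False, "elements": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["internal_subset"] = bool(has_internal_subset)

    def on_element_decl(name, model):
        # model is expat's tuple description of the content model;
        # only the element name is kept here.
        found["elements"].append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(text, True)
    return found

print(declared_elements(LABEL))
```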

  25. DTD in an External File
       <?xml version="1.0"?>
       <!DOCTYPE label SYSTEM "label.dtd">
       <label>
         <name>Rock N. Robyn</name>
         <street>Jay Bird Street</street>
         <city>Baltimore</city>
         <state>MD</state>
         <country>USA</country>
         <code>43214</code>
       </label>
     Here the DTD is stored in a local file.

  26. DTD on the Web
       <?xml version="1.0"?>
       <!DOCTYPE label SYSTEM "http://www.myserver.com/label.dtd">
       <label>
         <name>Rock N. Robyn</name>
         <street>Jay Bird Street</street>
         <city>Baltimore</city>
         <state>MD</state>
         <country>USA</country>
         <code>43214</code>
       </label>
     Here the DTD is on a remote web server.

  27. XML Elements
     • Elements are the “building blocks” within the logical structure of a document, like paragraph, title, section...
     • Each element type is declared only once, under a unique name; the length of names is not restricted.
     • The first character of a name must be a letter, “_” or “:” (a digit is not allowed as the first character).
     • Declaration keywords (such as ELEMENT and ATTLIST) must be all uppercase; names may be mixed case and are case sensitive.

  28. Element Declarations
         <!ELEMENT sender (msgType, msgDetails)>
     • ELEMENT is the keyword.
     • “sender” is the element name.
     • (msgType, msgDetails) is the content model.
     Element names specify the name of the declared element; element names are sometimes called “generic identifiers” (GI).

  29. XML Connectors
     • |    OR: one of the listed alternatives appears
     • ,    THEN: the items appear in the sequence listed
     • ( )  GROUP: groups items together
     Connectors provide the rules for the sequence, or order, in which the items in the content model may appear.

  30. Sequence Connector
     Elements separated by the sequence connector (,) must appear in the order they are listed.
         <!ELEMENT chapter (title, paragraph)>
     A chapter consists of a title followed by a paragraph.

  31. OR Connector
     The OR connector (|) means only one of these elements may appear.
         <!ELEMENT Item (Product | Service)>
     An Item consists of a Product, OR an Item consists of a Service, but not both!

  32. XML Occurrence Indicators
     • ?       ZERO or ONE (optional)
     • *       ZERO or MORE (optional, repeatable)
     • +       ONE or MORE (required, repeatable)
     • (none)  ONE ONLY (required)
     Occurrence indicators provide the rules that show how many times items in the content model may appear.

  33. Required and Repeatable
     The + indicator means the element appears 1 or more times.
         <!ELEMENT chapter (title, para+)>
     A chapter could consist of one of these:
     • A title followed by a paragraph
     • A title followed by two paragraphs
     • A title followed by three paragraphs
     • A title followed by thirty-seven paragraphs
     • Many other options
     NOTE: You would not be allowed to have a chapter with only a title.

  34. Optional
     The optional occurrence indicator (?) means the element may appear 0 or 1 times.
         <!ELEMENT chapter (title?, para)>
     A chapter could consist of one of these:
     • A title followed by a paragraph
     • A paragraph
     NOTE: You could not have a chapter with 2 titles followed by a paragraph.

  35. Nested Model Groups
     Model groups can be nested inside one another.
         <!ELEMENT chapter (title?, (para+ | illus))>
     A chapter could consist of one of these:
     • A title followed by at least one paragraph
     • A title followed by an illustration
     • An illustration
     • A paragraph
     • Many paragraphs
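The same hand-translation idea works for nested groups. In this sketch, which is an illustration rather than a real DTD validator, `chapter_ok` (a hypothetical helper) tests lists of child element names against (title?, (para+ | illus)).

```python
import re

# (title?, (para+ | illus)) translated by hand: "?" and "+" become regex
# quantifiers, and "|" becomes alternation inside a group.
CHAPTER_MODEL = re.compile(r"(title )?((para )+|(illus ))")

def chapter_ok(children):
    """Check a list of child element names against the chapter model."""
    return CHAPTER_MODEL.fullmatch("".join(name + " " for name in children)) is not None

cases = [
    ["title", "para"],
    ["title", "illus"],
    ["illus"],
    ["para", "para", "para"],
    ["title"],                   # needs para+ or illus after the title
    ["title", "para", "illus"],  # para+ and illus are alternatives, not both
]
for children in cases:
    print(children, chapter_ok(children))
```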

  36. Attributes with a Choice of Values
     Attributes describe special conditions associated with individual elements; they are often used as the “adjectives” of XML.
         <!ELEMENT person (#PCDATA)>
         <!ATTLIST person email CDATA #REQUIRED>
     • ATTLIST is the keyword.
     • person is the element name.
     • email is the attribute name.
     • CDATA is the attribute type.
     • #REQUIRED is the default/requirement.

  37. Attribute Defaults
     An attribute may have a default value specified in the DTD.
         <!ATTLIST shirt size (small|medium|large) "medium">
         <!ATTLIST shoes size CDATA "13">
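Default values show up in the parsed data when the DTD is available to the parser. This sketch places the two attribute declarations in a hypothetical internal DTD subset (the wardrobe element and `attributes_by_element` are illustrative names) and uses Python's expat module, which by default reports defaulted attributes alongside those written in the document.

```python
import xml.parsers.expat

WARDROBE = """<?xml version="1.0"?>
<!DOCTYPE wardrobe [
  <!ELEMENT wardrobe (shirt, shoes)>
  <!ELEMENT shirt EMPTY>
  <!ELEMENT shoes EMPTY>
  <!ATTLIST shirt size (small|medium|large) "medium">
  <!ATTLIST shoes size CDATA "13">
]>
<wardrobe>
  <shirt/>
  <shoes size="11"/>
</wardrobe>"""

def attributes_by_element(text):
    """Map each element name to the attributes the parser reports for it."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # specified_attributes is False by default, so attributes filled in
    # from the DTD's defaults are reported too.

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return seen

# shirt gets the default "medium"; shoes keeps its explicit "11".
print(attributes_by_element(WARDROBE))
```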

  38. An XML Document and its DTD
     Book.dtd:
         <!ELEMENT Book (title, author+, year?, price)>
         <!ELEMENT title (#PCDATA)>
         <!ELEMENT author (#PCDATA)>
         <!ELEMENT year (#PCDATA)>
         <!ELEMENT price (#PCDATA)>
         <!ATTLIST Book isbn CDATA #REQUIRED>
     Book.xml:
         <?xml version="1.0"?>
         <!DOCTYPE Book SYSTEM "Book.dtd">
         <Book isbn="1111">
           <title>The Catcher in the Rye</title>
           <author>J.D. Salinger</author>
           <year>1951</year>
           <price>11.95</price>
         </Book>

  39. Good Style for DTDs
     • Nesting: elements may contain more than one other element: Buyer (Company, Contact)
     • An element whose model contains only a single child element should normally make that child repeatable: Street (Line+)
     • Data modelling and logical naming of elements ensure accurate representation of the relationships between components.
     • Keep the role of the DTD simple; don't overload it.

  40. Namespaces

  41. Why Namespaces?
     • The appeal of XML lies in the ability to invent tags that convey meaningful information. For example, XML allows you to represent information about a book as:
         <BOOK>
           <TITLE>A Suitable Boy</TITLE>
           <PRICE currency="US Dollar">22.95</PRICE>
         </BOOK>
     • Similarly, you can represent information about an author as:
         <AUTHOR>
           <TITLE>Mr</TITLE>
           <NAME>Vikram Seth</NAME>
         </AUTHOR>
     • This example illustrates a problem. While a human reader can distinguish between the different interpretations of the "TITLE" element, a computer program does not have the context to tell them apart.

  42. Namespaces
     • Namespaces solve this problem by associating a vocabulary (or namespace) with a tag name. For example, the titles can be written as:
         <BookInfo:TITLE>A Suitable Boy</BookInfo:TITLE>
         <AuthorInfo:TITLE>Mr.</AuthorInfo:TITLE>
     • The name preceding the colon, the prefix, refers to a namespace, which is identified by a Uniform Resource Identifier (URI). The URI ensures global uniqueness when merging XML sources, while the associated prefix, a short name that substitutes for the namespace, need only be unique in the tightly scoped context of the document. With this scheme there are no conflicts in tags and attributes: two tags are the same only if they are from the same namespace and have the same local name. This allows a document to contain both book and author information without confusion about whether the "TITLE" element refers to the book or the author.
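A namespace-aware parser replaces each prefix with the URI it is bound to. A prefix must be declared with an xmlns:prefix attribute on the element or one of its ancestors; in this sketch the Catalog root and both namespace URIs are hypothetical placeholders, and Python's ElementTree module does the parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique URIs would do.
DOC = """<Catalog xmlns:BookInfo="http://example.com/ns/book"
         xmlns:AuthorInfo="http://example.com/ns/author">
  <BookInfo:TITLE>A Suitable Boy</BookInfo:TITLE>
  <AuthorInfo:TITLE>Mr.</AuthorInfo:TITLE>
</Catalog>"""

root = ET.fromstring(DOC)
# ElementTree reports names as "{uri}local", so the two TITLE elements
# have different expanded names even though their local names match.
tags = [child.tag for child in root]
for child in root:
    print(child.tag, "->", child.text)
```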

  43. Namespaces - Examples
     • An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. This example shows both an element (publisher) and an attribute (category) qualified by the prefix “pubspace”:
         <books xmlns:pubspace="http://www.foo.com/bar">
           <book pubspace:category="research">Numerical Analysis of Partial Differential Equations</book>
           <pubspace:publisher>Addison Wesley</pubspace:publisher>
         </books>
     • An attribute named "xmlns", or whose name begins with "xmlns:", is XML's keyword for a namespace declaration.
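In Python's ElementTree, namespaced names are written in "{URI}local" form, or looked up through a prefix mapping passed to find(). In this sketch the query prefix pub is a hypothetical choice that deliberately differs from the document's pubspace: only the URI has to match.

```python
import xml.etree.ElementTree as ET

DOC = """<books xmlns:pubspace="http://www.foo.com/bar">
  <book pubspace:category="research">Numerical Analysis of Partial Differential Equations</book>
  <pubspace:publisher>Addison Wesley</pubspace:publisher>
</books>"""

NS = {"pub": "http://www.foo.com/bar"}  # query prefix chosen freely

root = ET.fromstring(DOC)
book = root.find("book")  # unprefixed element, no default namespace
category = book.get("{http://www.foo.com/bar}category")  # qualified attribute
publisher = root.find("pub:publisher", NS).text

print(category)   # research
print(publisher)  # Addison Wesley
```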

  44. What XML namespaces are Not • Two things that XML namespaces are not have caused a lot of confusion, so we'll mention them here: • XML namespaces are not a technology for joining XML documents that use different DTDs. Although they might be used in such a technology, they don't provide it themselves. • The URIs used as XML namespace names do not point to schemas, information about the namespace, or anything else -- they're just identifiers. URIs were used simply because they're a well-known system for creating unique identifiers. Don't even think about trying to resolve these URIs.

  45. XML Schemas

  46. Why Schemas? • Although the DTD may have been powerful enough in many instances, it is inadequate to meet the needs of many applications that have been envisaged to use XML. • The DTD does not support data types beyond character data, which is a severe limitation for describing standards and exposing database schemas. • The DTD is not integrated with new XML technologies like Namespaces, so it is not possible to import constructs from external schemas to enable code reuse. • Applications simply need a more flexible mechanism to specify constraints on document structure than a context-free grammar.

  47. Additional Features of XML Schema
     • One of the main weaknesses of DTDs was their lack of support for data types beyond character strings. For example,
         <year>A few years ago</year>
       is valid under the previous DTD.
     • XML Schema supports additional data types, including:
       string, boolean, float, double, decimal, integer, nonNegativeInteger, positiveInteger, nonPositiveInteger, negativeInteger, dateTime, date, time, duration, hexBinary, base64Binary, anyURI, language

  48. User Defined Data Types
     Further constraints can be placed on the range of possible data values by creating new data types derived from built-in data types. For example, if our book list covered Twentieth Century literature, in XML Schema we can limit the values of the year element to be between 1900 and 1999 (here the xs prefix is bound to http://www.w3.org/2001/XMLSchema):
         <xs:simpleType name="YearType">
           <xs:restriction base="xs:positiveInteger">
             <xs:minInclusive value="1900"/>
             <xs:maxInclusive value="1999"/>
           </xs:restriction>
         </xs:simpleType>
         <xs:element name="year" type="YearType"/>
     Note that the Schema itself is written in XML!
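The Python standard library has no XML Schema validator, so this sketch simply hand-codes the YearType rule to show the kind of check a validator would perform for you. `is_year_type` is a hypothetical helper; it accepts an optional leading "+" followed by ASCII digits, which approximates the positiveInteger lexical form.

```python
import re
import xml.etree.ElementTree as ET

def is_year_type(text):
    """Hypothetical check mirroring YearType: an integer from 1900 to 1999."""
    value = (text or "").strip()
    if re.fullmatch(r"\+?[0-9]+", value) is None:
        return False  # not an integer at all, e.g. "A few years ago"
    return 1900 <= int(value) <= 1999

for doc in ["<year>1951</year>", "<year>A few years ago</year>", "<year>2005</year>"]:
    year_text = ET.fromstring(doc).text
    print(repr(year_text), is_year_type(year_text))
```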

  49. Save effort by using XML Schemas
     A program contains code to actually do the work and code to check the structure and content of the data. In a typical program, up to 60% of the code is devoted to checking the data!

  50. Save effort using XML Schemas (cont.)
     If your data is structured as XML and there is a schema, you can hand the structure and content checks off to a schema validator. Thus, much of your data-checking code, which can be up to 60% of a program, goes away. Big $$ savings!
