Extensible Markup Language (XML): MSI 602, Spring 2003
Databases: Types • Data consists of facts and figures • A database is a set of related data • Kinds of databases • Unstructured: the meaning of the data is interpreted by the user • Semi-structured: the structure of the data is wrapped around the data itself • Structured: the structure of the data is fixed in advance, and data is added to that fixed structure
XML: Definition and Example • XML is a text-based markup language that is fast becoming a standard for data interchange • An open standard from the W3C • A direct descendant of SGML • Example: product inventory data
<Product>
  <Name>Refrigerator</Name>
  <ModelNumber>R3456d2h</ModelNumber>
  <Manufacturer>General Electric</Manufacturer>
  <Price>1290.00</Price>
  <Quantity>1200</Quantity>
</Product>
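To make the interchange idea concrete, here is a minimal sketch of a receiving program reading the product record, assuming Java 8 or later and the JAXP DOM parser that ships with the JDK. The class and method names (`ProductReader`, `field`) are illustrative only, not part of any standard API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProductReader {

    // Parses an XML string into a DOM tree; parse failures surface as unchecked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns the text of the first element with this tag name, or null if there is none.
    public static String field(String xml, String tag) {
        NodeList matches = parse(xml).getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String product = "<Product>"
                + "<Name>Refrigerator</Name>"
                + "<ModelNumber>R3456d2h</ModelNumber>"
                + "<Manufacturer>General Electric</Manufacturer>"
                + "<Price>1290.00</Price>"
                + "<Quantity>1200</Quantity>"
                + "</Product>";
        // XML carries only text: the receiving program decides that Price is a number.
        double price = Double.parseDouble(field(product, "Price"));
        int quantity = Integer.parseInt(field(product, "Quantity"));
        System.out.println(field(product, "Name") + ": " + quantity + " units at $" + price);
    }
}
```

Re-parsing on every `field` call keeps the sketch short; a real program would parse once and query the resulting `Document`.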
XML: Data Interchange • XML's key role is data interchange • Two business partners who want to exchange customer data can • Agree on a set of tags • Exchange data without having to change their internal databases • Other business partners can participate by using the same tag set • New tags can be added to extend the functionality • The key to successful data interchange is building consensus on, and standardizing, tag sets
XML: Universal Data • TCP/IP: universal networking • HTML: universal rendering • Java: universal code • XML: universal data • Numerous standards bodies have been set up to standardize tags in different domains, e.g. • ebXML • XBRL • MML • CML
HTML vs. XML: Comparison • Both are markup languages • HTML has a fixed set of tags • XML lets users define tags based on their requirements • Usage • HTML tags specify how to display data • XML tags specify the semantics of the data • Tag interpretation • HTML specifies what each tag and attribute means • XML tags delimit data and leave interpretation to the parsing application • Well-formedness • HTML is very tolerant of rule violations (improper nesting, unmatched tags) • XML strictly enforces the rules of well-formedness: a parser must reject a document that violates them
XML Structure • Prolog • Tells the parser what it is parsing • Contains processing instructions for the processor • Body • Tags: entities • Attributes: properties of entities • Comments: statements for clarification in the document • Example (the first line is the prolog; the contact element is the body)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<contact>
  <name>
    <first_name>Sanjay</first_name>
    <last_name>Goel</last_name>
  </name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
XML Prolog • Syntax: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> • Contains the declaration that identifies a document as XML • version • Version of the XML specification the document conforms to • Required • encoding • Identifies the character encoding used for the data • Default: UTF-8, a variable-length encoding of Unicode • standalone • Tells whether the document depends on external markup declarations (standalone="yes" means no external declarations affect it) • The prolog may also contain a document type declaration with entity definitions and tag specifications
XML Syntax: Elements & Attributes • Uses less-than and greater-than characters (<…>) as delimiters • Every opening tag must have an accompanying closing tag • <FirstName>Sanjay</FirstName> • Element names cannot contain spaces (<FirstName>, not <First Name>) • Empty tags do not require a closing tag • An empty tag has a forward slash before the greater-than sign, e.g. <Name/> • Tags can have attributes, whose values must be enclosed in quotes (double or single) • <name first="Sanjay" last="Goel"/> • Elements must be properly nested • Tags cannot overlap: <b><i>text</b></i> is illegal • Each document must have exactly one root element • Element and attribute names are case-sensitive (<Name> and <name> are different elements)
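These syntax rules are exactly what an XML parser checks before it hands any data to an application. The following is a minimal sketch assuming the JDK's built-in JAXP parser; `WellFormedCheck` and `isWellFormed` are hypothetical names. A silent error handler is installed so that rejected samples do not print parser messages.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Silent handler: a fatal (well-formedness) error aborts the parse instead of printing.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) { }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // True if the string parses as a well-formed XML document.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<name first=\"Sanjay\" last=\"Goel\"/>", // empty tag with quoted attributes
            "<name first='Sanjay'/>",                 // single quotes are also allowed
            "<First Name>Sanjay</First Name>",        // space in an element name
            "<b><i>text</b></i>",                     // interleaved (overlapping) tags
            "<a/><b/>",                               // two root elements
            "<Name>Sanjay</name>"                     // names are case-sensitive
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
    }
}
```

Only the first two samples pass; every other line breaks one of the rules listed on the slide.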
Tree Structure: Elements • XML documents have a tree structure containing multiple levels of nested tags • The root element is a single XML element that encloses all of the other elements and data in the document • All other elements are descendants of the root element • Example
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<contact>  <!-- root element -->
  <name>  <!-- child element -->
    <first_name>Sanjay</first_name>
    <last_name>Goel</last_name>
  </name>
  <address>  <!-- child element -->
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
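The tree can be made visible by walking the DOM that a parser builds, starting at the root element and recursing into child elements. This sketch again assumes the JDK-bundled JAXP parser; `TreeWalker` and `outline` are made-up names, and the element names use underscores because XML names cannot contain spaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TreeWalker {

    // Depth-first walk: one line per element, indented two spaces per level of nesting.
    static void walk(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getTagName()).append('\n');
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip text such as "Sanjay"
                walk((Element) child, depth + 1, out);
            }
        }
    }

    // Parses the document and returns an outline of its element tree, starting at the root.
    public static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String contact = "<contact>"
                + "<name><first_name>Sanjay</first_name><last_name>Goel</last_name></name>"
                + "<address><street>56 Della Street</street><city>Phoenix</city>"
                + "<state>AZ</state><zip>15784</zip></address>"
                + "</contact>";
        System.out.print(outline(contact));
    }
}
```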
Attributes: Definition and Example • Attributes are properties associated with an element • Each attribute is a name-value pair • No element may contain two attributes with the same name • Names and values are strings • Example (the name element carries two attributes; address contains nested elements)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<contact>
  <name first="Sanjay" last="Goel"></name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>
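On the application side, attributes are read from the element that carries them. A small sketch, assuming the JDK's JAXP DOM parser; `AttributeReader` and its methods are hypothetical. Note that DOM's `getAttribute` returns an empty string for a missing attribute, so the sketch checks `hasAttribute` first, and a duplicate attribute name makes the parser reject the document outright.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeReader {

    // Parses the document and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Value of the named attribute on the root element, or null if the attribute is absent.
    public static String attribute(String xml, String name) {
        Element element = root(xml);
        // getAttribute alone would return "" for a missing attribute.
        return element.hasAttribute(name) ? element.getAttribute(name) : null;
    }

    // Number of attributes on the root element.
    public static int attributeCount(String xml) {
        return root(xml).getAttributes().getLength();
    }

    public static void main(String[] args) {
        String name = "<name first=\"Sanjay\" last=\"Goel\"></name>";
        System.out.println(attribute(name, "first") + " " + attribute(name, "last"));
        System.out.println(attributeCount(name) + " attributes");
        try {
            attribute("<name first=\"Sanjay\" first=\"Goel\"/>", "first");
        } catch (IllegalArgumentException e) {
            // The parser also prints the fatal error on stderr via its default error handler.
            System.out.println("rejected: duplicate attribute name");
        }
    }
}
```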
Elements vs. Attributes: Comparison • Data should be stored in elements • Information about the data (metadata) should be stored in attributes • When in doubt, use elements • Rules of thumb • Elements should hold information that someone may want to read • Attributes are appropriate for information about the document that has nothing to do with its content, e.g. URLs, units, references, and IDs • Keep in mind that what is metadata to you may be data to someone else
Comments: Basics • XML comments begin with "<!--" and end with "-->" • Text between these delimiters is not treated as markup or character data • <!-- This is a list of names of people --> • Comments must not come before the XML declaration • Comments cannot be placed inside a tag • Comments may be used to hide tags, e.g. <Name> <first>Sanjay</first> <!-- <last>Goel</last> --> </Name> (the last element is ignored) • The string "--" may not occur inside a comment except as part of its opening and closing delimiters • <!-- the Red door -- that is the second --> is illegal
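A parser treats commented-out markup as comment text, not as elements, and rejects a stray "--". The sketch below, assuming the JDK's JAXP DOM parser (`CommentDemo` and its methods are illustrative names), shows both behaviors. DOM keeps comments as Comment nodes unless `setIgnoringComments(true)` is called on the factory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CommentDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Counts real elements; markup inside a comment is not parsed as elements.
    public static int countElements(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    // Trimmed text of the comments directly under the root element.
    public static List<String> comments(String xml) {
        List<String> result = new ArrayList<>();
        for (Node n = parse(xml).getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                result.add(n.getNodeValue().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Name><first>Sanjay</first><!-- <last>Goel</last> --></Name>";
        System.out.println(countElements(xml, "first")); // 1
        System.out.println(countElements(xml, "last"));  // 0
        System.out.println(comments(xml));               // [<last>Goel</last>]
        try {
            countElements("<a><!-- the Red door -- that is the second --></a>", "a");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: \"--\" inside a comment");
        }
    }
}
```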
Namespaces: Basics • XML documents come from different sources • Combining elements from different sources can result in name conflicts • Namespaces allow the processor to tell such elements apart • Namespaces • Declared within an element's start tag using an xmlns attribute • Identified by a URI, which keeps namespace names globally unique (the URI does not have to point to an actual resource) • e.g. <Collection xmlns:book="http://www.mjyOnline.com/books" xmlns:cd="http://www.mjyOnline.com/cds"> • Here book and cd are prefixes that stand in for the full namespace names • A default namespace, declared with a plain xmlns="URI" attribute, applies to unprefixed elements within its scope • It does not have a prefix associated with it
Namespaces: Example • Books and CDs are tracked in different files; if the files are combined, their elements (ITEM, TITLE, PRICE) will conflict
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION>
  <ITEM>
    <TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE>
    <COMPOSER>Mozart</COMPOSER>
    <PRICE>$16.49</PRICE>
  </ITEM>
  <ITEM>
    <TITLE>Violin Concerto in D</TITLE>
    <COMPOSER>Beethoven</COMPOSER>
    <PRICE>$14.95</PRICE>
  </ITEM>
</COLLECTION>

<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION>
  <ITEM Status="in">
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <PRICE>$5.49</PRICE>
  </ITEM>
  <ITEM Status="in">
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    <PRICE>$10.95</PRICE>
  </ITEM>
  <ITEM Status="out">
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
    <PRICE>$7.75</PRICE>
  </ITEM>
  <ITEM Status="out">
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
    <PRICE>$2.95</PRICE>
  </ITEM>
</COLLECTION>
Namespaces: Example • The combined collection, with the book and cd prefixes keeping the two sets of elements apart
<?xml version="1.0"?>
<!-- File Name: Collection.xml -->
<COLLECTION xmlns:book="http://www.mjyOnline.com/books"
            xmlns:cd="http://www.mjyOnline.com/cds">
  <book:ITEM Status="in">
    <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    <book:AUTHOR>Mark Twain</book:AUTHOR>
    <book:PRICE>$5.49</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
    <cd:COMPOSER>Beethoven</cd:COMPOSER>
    <cd:PRICE>$14.95</cd:PRICE>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>Leaves of Grass</book:TITLE>
    <book:AUTHOR>Walt Whitman</book:AUTHOR>
    <book:PRICE>$7.75</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
    <cd:COMPOSER>Mozart</cd:COMPOSER>
    <cd:PRICE>$16.49</cd:PRICE>
  </cd:ITEM>
  <book:ITEM Status="out">
    <book:TITLE>The Legend of Sleepy Hollow</book:TITLE>
    <book:AUTHOR>Washington Irving</book:AUTHOR>
    <book:PRICE>$2.95</book:PRICE>
  </book:ITEM>
  <book:ITEM Status="in">
    <book:TITLE>The Marble Faun</book:TITLE>
    <book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR>
    <book:PRICE>$10.95</book:PRICE>
  </book:ITEM>
</COLLECTION>
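A namespace-aware parser identifies each element by its namespace URI plus local name, so book items and CD items can be selected separately even though both are called ITEM. A minimal sketch, assuming the JDK's JAXP parser (which is not namespace-aware unless asked); `NamespaceDemo` and `count` are illustrative names, and the URIs are the ones used in the collection example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    public static final String BOOKS = "http://www.mjyOnline.com/books";
    public static final String CDS = "http://www.mjyOnline.com/cds";

    // Counts elements by (namespace URI, local name); the prefix used in the file is irrelevant.
    public static int count(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String collection = "<COLLECTION xmlns:book=\"" + BOOKS + "\" xmlns:cd=\"" + CDS + "\">"
                + "<book:ITEM Status=\"in\"><book:TITLE>The Marble Faun</book:TITLE></book:ITEM>"
                + "<cd:ITEM><cd:TITLE>Violin Concerto in D</cd:TITLE></cd:ITEM>"
                + "<book:ITEM Status=\"out\"><book:TITLE>Leaves of Grass</book:TITLE></book:ITEM>"
                + "</COLLECTION>";
        System.out.println("books: " + count(collection, BOOKS, "ITEM"));
        System.out.println("cds: " + count(collection, CDS, "ITEM"));
    }
}
```

Because matching is by URI, a document that binds the books URI to a different prefix (say `x:ITEM`) is counted the same way.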
Displaying XML: Style Sheets • A style sheet is a file that contains instructions for rendering the individual elements of an XML document • Two kinds of style sheets exist • Cascading Style Sheets (CSS) • Extensible Stylesheet Language Transformations (XSLT) • For comprehensive information on style sheets, see • http://www.w3schools.com/css/default.asp
Cascading Style Sheets: Example
<?xml version="1.0"?>
<!-- File Name: Inventory01.xml -->
<?xml-stylesheet type="text/css" href="Inventory01.css"?>
<INVENTORY>
  <BOOK>
    <TITLE>The Adventures of Huckleberry Finn</TITLE>
    <AUTHOR>Mark Twain</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>298</PAGES>
    <PRICE>$5.49</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR>Walt Whitman</AUTHOR>
    <BINDING>hardcover</BINDING>
    <PAGES>462</PAGES>
    <PRICE>$7.75</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PAGES>98</PAGES>
    <PRICE>$2.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>The Marble Faun</TITLE>
    <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    <BINDING>trade paperback</BINDING>
    <PAGES>473</PAGES>
    <PRICE>$10.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Moby-Dick</TITLE>
    <AUTHOR>Herman Melville</AUTHOR>
    <BINDING>hardcover</BINDING>
    <PAGES>724</PAGES>
    <PRICE>$9.95</PRICE>
  </BOOK>
</INVENTORY>
Cascading Style Sheets: Example
/* File Name: Inventory02.css */
BOOK {display:block; margin-top:12pt; font-size:10pt}
TITLE {display:block; font-size:12pt; font-weight:bold; font-style:italic}
AUTHOR {display:block; margin-left:15pt; font-weight:bold}
BINDING {display:block; margin-left:15pt}
PAGES {display:none}
PRICE {display:block; margin-left:15pt}
Formal Languages/Grammars: Basics • A formal language is a set of strings • It is characterized by a set of rules that determine which strings are part of the language and which are not • For a programming language, programs that the compiler accepts are grammatically correct (others are not) • In a natural language like English, correct sentences follow the rules of English grammar • More precisely, a grammar defines four things • A vocabulary from which the strings are constructed (terminal symbols) • A vocabulary used to formulate the grammar rules (nonterminal symbols) • Grammar rules (productions), each of which has a left-hand side and a right-hand side • A designated start symbol
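As an illustration (a standard textbook toy grammar, not taken from the course), the four parts can be written as a tuple. Here the terminals are a single pair of tags, and the productions generate exactly the properly nested sequences of that pair:

```latex
\begin{aligned}
G &= (N, \Sigma, P, S), \quad N = \{S\}, \quad \Sigma = \{\texttt{<a>},\ \texttt{</a>}\} \\
P &= \{\, S \to \texttt{<a>}\; S\; \texttt{</a>},\ \ S \to \varepsilon \,\} \\
S &\Rightarrow \texttt{<a>}\,S\,\texttt{</a>} \Rightarrow \texttt{<a><a>}\,S\,\texttt{</a></a>} \Rightarrow \texttt{<a><a></a></a>} \\
L(G) &= \{\, \texttt{<a>}^n\,\texttt{</a>}^n \mid n \ge 0 \,\}
\end{aligned}
```

A string such as `<a></a></a>` has no derivation from S, so it is not in the language, in the same way that an XML parser rejects unmatched tags.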
Valid XML Documents: Basics • An XML document is valid if it conforms to the grammar of its language • Validity is different from well-formedness: every valid document must be well-formed, but a well-formed document need not be valid • Two ways to specify the grammar of the language • Document Type Definition (DTD) • XML Schema • Why bother with the language grammar? • It provides the blueprint of the language • It ensures that the data is interchangeable • It catches errors before they reach custom software that expects a particular document content and structure • The validity of a document is checked by a validating parser (validator)
Document Type Declaration: Basics • A document type declaration is a block of XML markup added to the prolog of the document • It must follow the XML declaration • It must appear before the root element, outside any other markup • It defines the content and structure of the language • Without a document type declaration or schema, a document is checked only for well-formedness, not for validity • The form of a document type declaration is: • <!DOCTYPE Name DTD> • DTD is the document type definition • Name specifies the name of the document (root) element
Document Type Definitions: Basics • A document type definition (DTD) consists of a series of markup declarations enclosed in square brackets <?xml version="1.0" standalone="yes"?> <!DOCTYPE GREETING [ <!ELEMENT GREETING (#PCDATA)> ]> <GREETING> Hello XML! </GREETING> • A DTD can also be stored in a separate file (an external DTD) and referenced from the document with a SYSTEM identifier
Document Type Definitions: Syntax • Element type declaration • Syntax: <!ELEMENT Name contentspec> • Name is the name of the element • contentspec is the content specification • DTD keywords such as ELEMENT, EMPTY, and ANY are case-sensitive and written in uppercase • Example: • <!ELEMENT Title (#PCDATA)> • The content specification can have four types of values • EMPTY content: the element must not have content <!ELEMENT Image EMPTY> • ANY content: the element can contain any character data and any declared elements <!ELEMENT misc ANY> • Element content: child elements but no character data <!DOCTYPE BOOK [ <!ELEMENT BOOK (TITLE, AUTHOR)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT AUTHOR (#PCDATA)> ]> • Mixed content: character data and child elements interspersed
Element Content Specification: Types • A content specification indicates the allowed child elements and their order • An element with element content cannot contain character data (whitespace between child elements is allowed) • Types of content specifications • Sequence: the element must contain the listed child elements in exactly the given order • Example <!DOCTYPE MOUNTAIN [ <!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT HEIGHT (#PCDATA)> <!ELEMENT STATE (#PCDATA)> ]> • Valid XML <MOUNTAIN> <NAME>Wheeler</NAME> <HEIGHT>13161</HEIGHT> <STATE>New Mexico</STATE> </MOUNTAIN>
Element Content Specification: Types • Types of content specifications • Choice: the element must contain exactly one of a series of child elements • The alternatives are separated by a | sign • Example <!DOCTYPE FILM [ <!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)> <!ELEMENT STAR (#PCDATA)> <!ELEMENT NARRATOR (#PCDATA)> <!ELEMENT INSTRUCTOR (#PCDATA)> ]> • Valid XML <FILM> <STAR>ROBERT REDFORD</STAR> </FILM> • Invalid XML (two children where only one is allowed) <FILM> <NARRATOR>Sir Gregory Parsloe</NARRATOR> <INSTRUCTOR>Galahad Threepwood</INSTRUCTOR> </FILM>
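Validity can be checked programmatically by turning on DTD validation in the parser. The sketch below assumes the JDK's JAXP parser and the FILM choice DTD from this slide; `DtdValidator` and `isValid` are hypothetical names. A validating parser reports validity errors through `error()` and well-formedness errors through `fatalError()`, so the handler treats both as failures.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidator {

    // Internal DTD subset for the FILM choice example.
    public static final String FILM_DTD = "<!DOCTYPE FILM ["
            + "<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>"
            + "<!ELEMENT STAR (#PCDATA)>"
            + "<!ELEMENT NARRATOR (#PCDATA)>"
            + "<!ELEMENT INSTRUCTOR (#PCDATA)>"
            + "]>";

    // Validity errors arrive via error(), well-formedness errors via fatalError().
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = FILM_DTD + "<FILM><STAR>ROBERT REDFORD</STAR></FILM>";
        String invalid = FILM_DTD
                + "<FILM><NARRATOR>Sir Gregory Parsloe</NARRATOR>"
                + "<INSTRUCTOR>Galahad Threepwood</INSTRUCTOR></FILM>";
        System.out.println(isValid(valid));   // true
        System.out.println(isValid(invalid)); // false: the choice allows only one child
    }
}
```

The same approach works for the sequence, occurrence-indicator, and mixed-content examples; only the DTD string changes. A well-formed document with no DTD at all also fails here, which illustrates that well-formed does not imply valid.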
Element Content Specification: Number of Elements • Specifying the number of elements allowed • ? zero or one • + one or more • * zero or more • No indicator: exactly one • Example <!DOCTYPE MOUNTAIN [ <!ELEMENT MOUNTAIN (NAME+, HEIGHT?, STATE)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT HEIGHT (#PCDATA)> <!ELEMENT STATE (#PCDATA)> ]> • Valid XML (two names, no height) <MOUNTAIN> <NAME>Pueblo Peak</NAME> <NAME>Taos Mountain</NAME> <STATE>New Mexico</STATE> </MOUNTAIN>
Element Content Specification: Modification • An occurrence indicator can also modify a whole group of elements • Example <!DOCTYPE FILM [ <!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)+> <!ELEMENT STAR (#PCDATA)> <!ELEMENT NARRATOR (#PCDATA)> <!ELEMENT INSTRUCTOR (#PCDATA)> ]> • Valid XML (one or more children, each chosen from the group, in any order) <FILM> <NARRATOR>Sir Gregory Parsloe</NARRATOR> <STAR>ROBERT REDFORD</STAR> <NARRATOR>PLUG BASHMAN</NARRATOR> </FILM>
Element Content Specification: Nesting • Groups can be nested within a content specification • Example <!DOCTYPE FILM [ <!ELEMENT FILM (TITLE, CLASS, (STAR | NARRATOR | INSTRUCTOR)+)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT CLASS (#PCDATA)> <!ELEMENT STAR (#PCDATA)> <!ELEMENT NARRATOR (#PCDATA)> <!ELEMENT INSTRUCTOR (#PCDATA)> ]> • Valid XML <FILM> <TITLE>The Net</TITLE> <CLASS>Action</CLASS> <STAR>Sandra Bullock</STAR> </FILM>
Element Content Specification: Mixed Content Model • The mixed content model allows an element to contain • Character data • Child elements in any position and with any frequency (zero or more repetitions) • Child elements interspersed with the data • Character data only • Example <!ELEMENT TITLE (#PCDATA)> • Character data and elements • Example: <!ELEMENT TITLE (#PCDATA | SUBTITLE)*> <!ELEMENT SUBTITLE (#PCDATA)> • #PCDATA must come first in the group, and the group must end with * • Valid XML <TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE> <TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>
Attribute Specification: Basics • Every attribute used in a valid document must be declared in an attribute-list declaration, which • defines the name of each attribute • defines the data type of each attribute • specifies whether each attribute is required or not • Syntax: <!ATTLIST Name AttDefs> • Name is the name of the element • AttDefs is a series of one or more attribute definitions • Attribute definition syntax: Name AttType DefaultDecl • Name is the attribute name • AttType is the type of the attribute (CDATA, a tokenized type such as ID, IDREF, or NMTOKEN, or an enumerated list) • DefaultDecl specifies whether the attribute is required (#REQUIRED), optional (#IMPLIED), fixed (#FIXED "value"), or has a default value • Example: <!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))> <!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
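The effect of DefaultDecl can be observed in a parser: a default value is filled in when the attribute is omitted, and a missing #REQUIRED attribute makes the document invalid. This is a sketch assuming the JDK's JAXP parser with DTD validation on; `AttlistDemo` and `parseValid` are hypothetical names, and the declarations for TITLE, STAR, NARRATOR, and INSTRUCTOR are assumptions added so that the slide's FILM example can be validated.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class AttlistDemo {

    // FILM declarations from the slide, plus assumed declarations for the child elements.
    public static final String FILM_DTD = "<!DOCTYPE FILM ["
            + "<!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))>"
            + "<!ATTLIST FILM Class CDATA \"fictional\" Year CDATA #REQUIRED>"
            + "<!ELEMENT TITLE (#PCDATA)>"
            + "<!ELEMENT STAR (#PCDATA)>"
            + "<!ELEMENT NARRATOR (#PCDATA)>"
            + "<!ELEMENT INSTRUCTOR (#PCDATA)>"
            + "]>";

    // Any validity or well-formedness error aborts the parse.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Parses with DTD validation and returns the root element; invalid input throws.
    public static Element parseValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(STRICT);
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid document: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Element film = parseValid(FILM_DTD
                + "<FILM Year=\"1995\"><TITLE>The Net</TITLE><STAR>Sandra Bullock</STAR></FILM>");
        // Class is not written in the document; the parser supplies the DTD default.
        System.out.println(film.getAttribute("Class") + " " + film.getAttribute("Year"));
        try {
            parseValid(FILM_DTD + "<FILM><TITLE>The Net</TITLE><STAR>Sandra Bullock</STAR></FILM>");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: Year is #REQUIRED");
        }
    }
}
```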
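Entity Specification: Types • There are two kinds of entity references in XML documents • Character entities, referred to by Unicode code point (e.g. &#233; or &#xE9; for é) • Named entities, referred to by name (e.g. the predefined &amp;, &lt;, &gt;, &quot;, and &apos;, or entities declared with <!ENTITY> in a DTD)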
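A parser replaces both kinds of references before the application sees the text. A minimal sketch, assuming the JDK's JAXP DOM parser (which expands entity references by default); `EntityDemo` and `text` are illustrative names, and the entity name `ge` in the last sample is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityDemo {

    // Returns the root element's text after the parser has replaced all references.
    public static String text(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Character entities: decimal and hexadecimal code points.
        System.out.println(text("<p>Caf&#233; cr&#xE8;me</p>"));
        // Predefined named entities for characters that would otherwise be markup.
        System.out.println(text("<p>&lt;Price&gt; &amp; &quot;Quantity&quot;</p>"));
        // A named entity declared in an internal DTD (hypothetical entity "ge").
        System.out.println(text("<!DOCTYPE p [<!ENTITY ge \"General Electric\">]><p>Made by &ge;</p>"));
    }
}
```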
XML Parsing: Definition and Types • An XML parser is a program that reads an XML document and makes its contents available for processing • There are two standard types of XML parser APIs • Document Object Model (DOM), which makes the document available as a tree • Simple API for XML (SAX), which reports an event for each tag and each block of text • XML parsers are available from many vendors • Each vendor conforms to the standardized XML interfaces • A widely used parser is Apache Xerces • Sun's API for XML parsing is JAXP (it defines the basic classes and interfaces that a Java XML parser should support) • DOM parsers are often built on top of a SAX parser
SAX Parser: Basics • As the parser scans the document, it sends notifications of events, for instance • Element start • Element end • A character sequence between two tags is found • SAX provides standard names for the callback functions triggered by these events (listed here as in the SAX 1.0 DocumentHandler interface) • void characters(char[] ch, int start, int length): notification of character data • void startDocument(): notification of the start of the document • void endDocument(): notification of the end of the document • void startElement(String name, AttributeList atts): notification of the start of an element • void endElement(String name): notification of the end of an element • void processingInstruction(String target, String data): notification of a processing instruction
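The callbacks listed above belong to the SAX 1.0 `DocumentHandler` interface, which SAX 2.0 deprecated. The sketch below uses SAX 2.0's `DefaultHandler` as shipped with the JDK, where `startElement` and `endElement` take extra namespace parameters. It simply logs the events the parser fires. `SaxEventLog` and `log` are illustrative names, and this is not the Professional JSP example referenced in these notes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog extends DefaultHandler {

    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start " + qName); // qName is the tag as written in the document
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end " + qName);
    }

    // characters() may deliver one run of text in several chunks, so buffer it.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Records buffered text (ignoring whitespace-only runs) and clears the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            events.add("text " + s);
        }
        text.setLength(0);
    }

    public static List<String> log(String xml) {
        try {
            SaxEventLog handler = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        log("<name><first>Sanjay</first><last>Goel</last></name>").forEach(System.out::println);
    }
}
```

Unlike DOM, nothing here keeps the document in memory; the handler decides what to retain as events stream past.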
SAX Parser: Example • From Professional JSP, page 658
XSLT Processor: Definition and Uses • XSLT is a language for transforming the structure of XML documents • Any tree-transforming language needs a way to refer to paths in the tree • XPath is the sublanguage used within XSLT to describe tree paths • There are two scenarios for using XSLT • A browser contains an XSLT processor and uses it to render XML documents • XSLT is used to change the structure of an existing XML document • To run XSLT, the following components are required • Java 1.4 SDK • James Clark's xt (xt.jar)
XSLT Processor: Basics • An XSLT style sheet is itself an XML document • It consists of two parts • A standard XML declaration, followed by the xsl:stylesheet root element with its namespace declarations • Top-level elements that set up the general framework for the output, e.g. variables or parameters imported from the command line • Processing works as follows • A current list of nodes from the source document is created by matching a pattern • Output for the current node is generated by instantiating the template whose pattern matches it • During the transformation, new nodes can be added to the list • Processing begins with a list containing the entire document (the root node) • The transformation ends when the node list is empty
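The processing model can be tried without xt: since Java 1.4 the JDK has included an XSLT processor behind the standard `javax.xml.transform` (TrAX) API. This is a minimal sketch under that assumption; `XsltDemo` and its style sheet are illustrative. Processing starts at the root node, the first template selects the BOOK nodes, and the BOOK template is instantiated once for each of them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {

    // A minimal style sheet: itself an XML document in the XSLT namespace.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            // Processing starts at the root node "/"; select the BOOK nodes to process next.
            + "<xsl:template match=\"/\"><xsl:apply-templates select=\"INVENTORY/BOOK\"/></xsl:template>"
            // Instantiated once for each BOOK node in the current node list.
            + "<xsl:template match=\"BOOK\">"
            + "<xsl:value-of select=\"TITLE\"/> by <xsl:value-of select=\"AUTHOR\"/>"
            + "<xsl:text>&#10;</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the style sheet to the given XML and returns the text output.
    public static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String inventory = "<INVENTORY>"
                + "<BOOK><TITLE>The Adventures of Huckleberry Finn</TITLE><AUTHOR>Mark Twain</AUTHOR></BOOK>"
                + "<BOOK><TITLE>Moby-Dick</TITLE><AUTHOR>Herman Melville</AUTHOR></BOOK>"
                + "</INVENTORY>";
        System.out.print(transform(inventory));
    }
}
```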
XSLT Processor: Example
Web Services: Definition • Web services are software programs that use XML to exchange information with other software programs via common Internet protocols • Web services communicate over the network to provide specific methods that other applications can invoke • Thus applications residing on different computers can work together by invoking methods on each other • HTTP is the key protocol used for web services • Characteristics • Programmable • Encapsulate a task • XML-based data exchange allows programs on heterogeneous platforms to communicate (SOAP) • Self-describing (WSDL) • Discoverable (UDDI)
Web Services: SOAP • SOAP: Simple Object Access Protocol • Enables data transfer between systems distributed over a network • A SOAP message sent to a web service invokes a method provided by the service • The web service may return the result in another SOAP message • SOAP is defined by standardized XML schemas • Defines a format for transmitting XML messages over a network • Includes data types and message structure • Layered over an Internet protocol such as HTTP, so it can be used to transfer data across the Web and other networks • HTTP lets messages pass through firewalls, since firewalls usually accept HTTP traffic
Web Services: SOAP • A SOAP message consists of three parts • Envelope • Header • Body • The envelope wraps the entire message and contains the header and body • The header (optional) carries information such as security and routing data • The body contains the application-specific data being transferred • An alternative to SOAP is XML-RPC • SOAP has become the de facto standard due to its simplicity, extensibility, and interoperability
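The envelope/body structure can be shown with plain DOM code. This sketch assumes the SOAP 1.1 envelope namespace and uses only the JDK's JAXP parser, since the `javax.xml.soap` (SAAJ) API is no longer bundled with the JDK from Java 11 onward. The operation `GetPrice` and the namespace `urn:example:inventory` are hypothetical, and a real client would send the message in an HTTP POST rather than just build and inspect it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SoapEnvelopeDemo {

    // SOAP 1.1 envelope namespace.
    public static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Wraps an application payload in Envelope/Body; the optional Header is omitted.
    public static String envelope(String payload) {
        return "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">"
                + "<soap:Body>" + payload + "</soap:Body>"
                + "</soap:Envelope>";
    }

    // Local name of the first element inside the Body, i.e. the method being invoked.
    public static String operation(String message) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(message)));
            Node body = doc.getElementsByTagNameNS(SOAP_ENV, "Body").item(0);
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return n.getLocalName();
                }
            }
            return null; // empty Body
        } catch (Exception e) {
            throw new IllegalArgumentException("not a SOAP envelope", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical request: GetPrice and urn:example:inventory are made-up names.
        String request = envelope("<m:GetPrice xmlns:m=\"urn:example:inventory\">"
                + "<m:Title>Moby-Dick</m:Title></m:GetPrice>");
        System.out.println(request);
        System.out.println("invokes: " + operation(request));
    }
}
```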
Web Services: WSDL • WSDL: Web Services Description Language • Describes a web service • Instructions for its use • Capabilities of the service • Information on how to connect to the service and communicate with it • The syntax is fairly complex • WSDL documents are normally generated by automated tools • Understanding the precise WSDL syntax is not essential when developing web services
Web Services: UDDI • UDDI: Universal Description, Discovery and Integration • Allows developers and businesses to publish and locate web services on a network through registries • Registries can be private or public • The structure is similar to a phone book • White pages contain contact information and textual descriptions • Yellow pages provide classification information about companies and details of their electronic capabilities • Green pages list technical data about services and business processes