500 likes | 631 Views
Lecture 12 Introduction to XML. Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU. What is XML?. e X tensible M arkup L anguage A markup language much like HTML Main difference between XML and HTML XML is not a replacement for HTML (by XHTML).
E N D
Lecture 12Introduction to XML Presented By Dr. ShazzadHosain Asst. Prof. EECS, NSU
What is XML? eXtensibleMarkupLanguage A markup language much like HTML Main difference between XML and HTML XML is not a replacement for HTML (by XHTML). XML was designed to describe data and to focus on whatdata is. HTML was designed to displaydata and to focus onhow data looks. HTML is about displaying information, while XML is about describinginformation. 2
XML is a Meta-Language A language used to define other languages XML itself has no tags Cannot be used directly to markup documents It is used to define a set of tags Eg, XHTML What is a language? A set of tags Vocabulary: elements and attributes Grammar: where a tag can appear and how they can appear 3
Universal language Application X Database Repository Configuration Documents • XML is a “use everywhere” data specification XML XML XML XML 4
XML Example <bank> <account> <acct-num> A-101 </acct-num> <branch> Downtown </branch> <balance> 500 </balance> </account> <depositor> <acct-num> A-101 </acct-num> <cust-name> Johnson </cust-name> </depositor> </bank> human and machine readable self documenting based on nested tags Unlike in HTML, you can define your own tags XML doc bank account acct-num branch balance depositor acct-num cust-name start tag end tag 5
Benefits of XML Open W3C standard Representation of data across heterogeneous environments Cross platform Allows for high degree of interoperability Strict rules Syntax Structure Case sensitive 6
XML Syntax An XML document are composed of An XML declaration: <?xml version = “1.0” encoding="ISO-8859-1"?> Optional, but should be used to identify the XML version, namespace, and encoding schema to which the doc conforms XML markup, six kinds of Elements (and attributes) Entity references Comments: <!-- what ever you want to say --> Processing instructions: <?instruction options?> Marked sections Document type declarations Content 7
Elements and Attributes Elements (or Tags): <account> </ account> Primary building blocks of an XML document All tags must be closed and nested properly (as in XHTML) Empty tags ok (e.g., < account />) Attributes: < acct-type = “Savings”> Additional information about an element Attribute value can be required, optionalor fixed, and can have a default value An attribute can often be replaced by nested elements All values must be quoted (even for numbers, as in XHTML) Elements and attributes can be named in almost any way you like Should be descriptive and not confusing (human-readable!) No length limit, case sensitive and ok to use the underscore (_) No names are reserved for XML (namespaces can solve naming conflicts) Must not start with a number or punctuation character or XML or Xml Cannot contain spaces, the colon (:), space, greater-than (>) , or less-than (<). Avoid using the hyphen (-) and period (.) 8
Structure of XML Data Tag: label in angle brackets for a section of data Element: section of data beginning with start tag <tagname> and ending with matching end tag </tagname><tagname> <sub-ele-1> </sub-ele-1> <sub-ele-2> </sub-ele-2> </tagname> Can invent our own tags.<ca-super-tag> </ca-super-tag> Every document must have a single top-level element 9
Element Relationship <bank> <account> <acct-num> A-101 </acct-num> <branch> Downtown </branch> <balance> 500 </balance> </account> <depositor> <acct-num> A-101 </acct-num> <cust-name> Johnson </cust-name> </depositor> </bank> XML doc bank account acct-num branch balance depositor acct-num cust-name start tag end tag bankis the root element. Account, depositor are child elements of bank. Acct-num, branch, balanceare siblings(or sister elements) because they have the same parent. 10
Structure of XML Data Elements must be properly nested. <bank> <customer> <customer-name> Hayes </customer-name> <customer-street> Main </customer-street> <customer-city> Harrison </customer-city> <account> <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> </account> <account> <account-number> A-205 </account-number> <branch-name> Oxford street </branch-name> <balance> 329 </balance> </account> </customer> </bank> 11
Improper Nested Elements <bank> <customer> <customer-name> Hayes </customer-name> <customer-street> Main </customer-street> <customer-city> Harrison </customer-city> <account> <account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance> <account> <account-number> A-205 </account-number> <branch-name> Oxford street </branch-name> <balance> 329 </balance> </account> </bank2> 12
Attributes Elements can have attributes <account acct-type = “checking” > <account-number> A-102 </account-number> <branch-name> Nikaton </branch-name> <balance> 400 </balance> </account> Attributes are specified by name=value pairs inside the starting tag of an element An element may have several attributes, but each attribute name can only occur once <account acct-type = “checking”monthly-fee= “5”> 13
More on XML Syntax Elements without subelements can be abbreviated by ending the start tag with a /> and deleting the end tag <account number=“A-101”branch=“Nikaton”balance=“200” /> To store string data that may contain tags, without the tags being interpreted as sub elements, use CDATA as below <![CDATA[<account>]]> Here, <account> is treated as just strings bad style! should use subelements here. But we needed an example 14
Well-formed Documents A document which conforms to the XML syntax is called well-formed. all tag properly nested single top level element ... How to check this? load the XML-file in your web browser. Use NetBeans many more tools ... 15
Is a Wellformed Document Valid? An example of a document that is well-formed but not valid based upon the XHTML grammar. <body><p>Example of Well-formed HTML</p><head><title>Example</title></head><zorko>What is this?</zorko></body> Why? 16
Namespaces XML is often use for data exchange between organizations. Same tag name may have different meaning in different organizations, causing confusion on exchanged documents. Avoid using long unique names all over document by using defining a short namespace prefix within a document. FB:branch is just an abbreviation to make the file more readable. <bank Xmlns:FB=‘http://www.FirstBank.com’> … <FB:branch> <FB:branchname> Downtown </FB:branchname> <FB:branchcity> Brooklyn </FB:branchcity> </FB:branch>… </bank> The full (qualified) name of the branch element is: http://www.firstbank.com:branch 17
Why do we need namespaces? It is conceivable that several XML languages may want to use the same name for an element or attribute, e.g. <html><table/></html> <furniture><table/></furniture> The meaning of <table> is different in all each context Therefore we need some way of distinguishing between the two Namespaces guarantee unique naming of XML elements and attributes 18
Example <html> <h1>Order</h1> <furniture> <table>Large Oak</table> <table> <tr> <td>2.00 m</td> <td>1.45 m</td> </tr> <table> <furniture> </html> But which table is which???? 19
XML Namespace <hx:html xmlns:hx='http://www.w3.org/TR/REC-html40'> <hx:h1>Order</hx:h1> <fx:furniture xmlns:fx='http://www.furniture.com'> <fx:table>Large Oak</fx:table> <hx:table> <hx:tr> <hx:td>2.00 m</td> <hx:td>1.45 m</td> </hx:tr> <hx:table> <fx:furniture> </hx:html> Prefixes hx is the prefix bound to the html namespace fx is the prefix bound to the furnitureML namespace Local names html, table, tr, td furniture, table Qualified names qName = prefix:localName hx:html, hx:table, hx:tr, hx:td fx:furniture, fx:table 20
Scope of Namespace A namespace comes into scope on the element it is declared on It remains in scope until that element closes It is in scope for all of children of that element Unless the prefix associated with that namespace is (temporarily) overridden by another namespace An overridden namespace comes back into scope… when the one ‘over-ridding’ it goes out of scope 21
Tools of the Trade There are numerous “tools” that can be used to build an XML language – some relatively simple, some much more complex. They include: DTD (Document Type Definition) RELAX TREX RELAX NG XML Schema Schmatron 22
Document Type Definition (DTD) The Document Type Definition, a direct descendant of SGML. DTDs are what are used to describe HTML and XHTML in addition to other markup languages. DTDs possess their own special notation – which is where we will start. In general we can say that there are two main types of DTDs: Internal External Internal DTDs reside within the XML instance file whereas External DTDs reside in an separate DTD document. Later we will find out how to merge the two to formulate a mixed DTD. 23
Internal DTDs An internal DTD is embedded within the XML instance document using the following syntax. <!DOCTYPE root-element [ <!-- DTD declarations --> ] > 24
An Example <?xml version=“1.0”?> <!DOCTYPE body [ <!ELEMENT body (#PCDATA)> <!ATTLIST body color CDATA #IMPLIED> ] > <body color=“blue”>Content Goes Here</body> 25
External DTDs External DTDs come in two forms: LOCAL and PUBLIC Regardless of whether you are using a local or public DTD, to link an external DTD to a document, you must include a DOCTYPE declaration within your XML document just as you should with HTML or XHTML document. 26
Specifying a LOCAL DTD To specify a DTD on a LOCAL, non-public server, you would use the following format for including the DOCTYPE declaration: <!DOCTYPE root-element SYSTEM “url-of-dtd”> 27
An Example <?xml version=“1.0”?> <!DOCTYPE address-book SYSTEM “/httpd/dtds/catalog.dtd”> <address-book> <contact> <last-name>Smith</last-name> <first-name>Joe</first-name> <phone type=“home”>408-555-5555</phone> <email>Joe.Smith@samplemail.com</email> </contact> </address-book> 28
Specifying a PUBLIC DTD To specify a DTD on a PUBLIC server that is widely known and advertised, you would use the following format within the DOCTYPE declaration: <!DOCTYPE root-element PUBLIC “public-identifier” • “url-of-dtd”> 29
An Example <?xml version=“1.0”?> <!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.0 Strict//EN” “http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd”> <html> <head> <title>Sample Document</title> </head> <body>Content Goes Here</body> <html> 30
XML Document Schema • When we build a database, we first design the database schema. • Database schemas constrain what information can be stored in the database. • By requiring that the data conforms to the schema, we give data a meaning. • XML documents are not required to have a schema. • In practice, they are very important. • e.g. when defining a new file format based on XML • or for data exchange 31
XML Schema Mechanisms • Document Type Definition (DTD) • Widely used • XML Schema • Newer, increasing use • A document that confirms with its DTD or XML schema is called valid. • There exist tools to check whether a XML-file is valid. • www.xmlvalidation.com • NetBeans • many others • A validator checks also whether the XML-file is well-formed. 32
Document Type Definition (DTD) • The type of an XML document can be specified using a DTD • DTD constraints structure of XML data • What elements can occur • What attributes can/must an element have • What subelements can/must occur inside each element, and how many times. • DTD does not constrain data types • All values represented as strings in XML • DTD syntax • <!ELEMENT element (subelements-specification)> • <!ATTLIST element (attributes) > 33
Element Specification in DTD • Subelements can be specified as • names of elements, or • #PCDATA (parsed character data), i.e., character strings • EMPTY (no subelements) or ANY (anything can be a subelement) • Example <! ELEMENT depositor (customer-name account-number)> <! ELEMENT customer-name (#PCDATA)> <! ELEMENT account-number (#PCDATA)> • Subelement specification may have regular expressions <!ELEMENT bank ( ( account | customer | depositor)+)> • Notation: • “|” - alternatives • “+” - 1 or more occurrences • “*” - 0 or more occurrences 34
Bank DTD <!ELEMENT bank ( ( account | customer | depositor)+)> <!ELEMENT account (branch-name, balance)> <!ELEMENT branch-name (#PCDATA)> <!ELEMENT balance (#PCDATA)> <!ELEMENT customer (customer-name, customer-street, customer-city)><!ELEMENT customer-name (#PCDATA)> <!ELEMENT customer-street (#PCDATA)> <!ELEMENT customer-city (#PCDATA)> <!ELEMENT depositor (customer-name, account-number)> 35
Note.xml with DTD included <?xmlversion="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to, from, heading, body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to> Tove </to> <from> Jani </from> <heading> Reminder </heading> <body>Don’t forget me this weekend!</body> </note> DTD 36
Attribute specification : for each attribute Name Type of attribute CDATA ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs) more on this later Whether mandatory (#REQUIRED) has a default value (value), or neither (#IMPLIED) Examples <!ATTLIST account acct-type CDATA “checking”> <!ATTLIST customer customer-id ID # REQUIRED accounts IDREFS # REQUIRED > Attribute Specification in DTD 37
Bank DTD with Attributes <!ELEMENT bank2 ( ( account | customer )+)><!ELEMENT account (branch, balance)><!ATTLIST account account-number ID #REQUIRED owners IDREFS #REQUIRED><!ELEMENT customer (customer-name, customer-street, customer-city)><!ATTLIST customer customer-id ID #REQUIRED accounts IDREFS #REQUIRED><!ELEMENT branch (#PCDATA)><!ELEMENT balance (#PCDATA)><!ELEMENT customer-name (#PCDATA)><!ELEMENT customer-street (#PCDATA)><!ELEMENT customer-city (#PCDATA)> 38
XML data with ID and IDREF attributes <bank2><accountaccount-number="A-401"owners="C100 C102"><branch> Downtown </branch><balance> 500 </balance></account><accountaccount-number="A-402"owners="C102"><branch> Downtown </branch><balance> 8000 </balance></account><customercustomer-id="C100"accounts="A-401"><customer-name> Joe </customer-name><customer-street> Monroe </customer-street><customer-city> Madison </customer-city></customer><customercustomer-id="C102"accounts="A-401 A-402"><customer-name> Mary </customer-name><customer-street> Erin </customer-street><customer-city> Newark </customer-city></customer></bank2> 39
What is JDOM? JDOM is, quite simply, a Java representation of an XML document. JDOM provides a way to represent that document for easy and efficient reading, manipulation, and writing. It has a lightweight and fast, and optimized API, for the Java programmer. 41
Download the JDOM Library JDOM official site at http://www.jdom.org/downloads/source.html 42
Example XML File <?xml version="1.0"?> <company> <staff> <firstname>yong</firstname> <lastname>mookkim</lastname><nickname>mkyong</nickname><salary>100000</salary> </staff> <staff> <firstname>low</firstname> <lastname>yin fong</lastname> <nickname>fongfong</nickname> <salary>200000</salary> </staff> </company>
RSS • RSS is a simple XML vocabulary for use in news feeds • RSS stands for Really Simple Syndication, among other things • an example is BBC News (view the source to see the XML, since a stylesheet has been applied) • you can subscribe to news feeds using a news reader • or they can be used by developers to create other pages, such as news plotted on a map of the UK: http://www.benedictoneill.com/content/newsmap/
RSS Example <rss> <channel> <title> ... </title> ... <item> <title> ... </title> <description> ... </description> <link> ... </link> <pubDate> ... </pubDate> </item> ... <item> <title> ... </title> <description> ... </description> <link> ... </link> <pubDate> ... </pubDate> </item> <channel> </rss> • a fragment of old news has been stored in rss-fragment.xml • the basic structure is given below