Understanding XML An Introduction to XML Sandeep Bhattaram
Summary of Introduction * • HTML was designed to ‘Display’ and format data based on ‘Syntax’ • XML is designed to ‘Describe’ and structure data based on ‘Semantics’
Types of ‘Data’ * • Structured Data • Semi-Structured Data • Unstructured Data
Semi-Structured Data - An Example • Schema information is mixed with data objects and values • No predefined schema for the data to conform to.
Unstructured Data • No structure for the data to conform to. • Example : HTML <table> <TR> <TD> XML Class </TD> <TD> very interesting</TD> </TR> </table>
Basics of XML * • XML stands for eXtensible Markup Language • XML is a markup language • Users define their own tags in XML • XML is Self-Descriptive
Basic example of XML <note> <to>Everyone</to> <from>Sandeep</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> • XML doesn’t “DO” anything • With XML your data is stored outside your HTML
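A minimal sketch of how a program might read the note above. Since XML itself does nothing, some application code has to pull the data out. Python's standard-library ElementTree parser is used here as an assumption; the slides do not name a parser, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The note from the slide, held as a plain string.
NOTE_XML = """<note>
  <to>Everyone</to>
  <from>Sandeep</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Parse the text into a tree; `note` is the root element.
note = ET.fromstring(NOTE_XML)

# Map each child tag to its text content, in document order.
fields = {child.tag: child.text for child in note}

print(note.tag)
for tag, text in fields.items():
    print(tag, "=", text)
```

The tags carry the meaning of each value, which is what the slide calls self-descriptive.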
Basics Contd.. • Exchange of data between incompatible systems via XML • XML can be used to Store Data Definition: XML is a cross-platform, software and hardware independent tool for describing and transmitting information.
XML Syntax * • The syntax rules of XML are very simple, self-describing & very strict. <?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Everyone</to> <from>Sandeep</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
XML Syntax Rules • XML Data Model has TWO structuring concepts • Elements • Attributes
XML Syntax Rules Contd... • All XML elements must have a closing tag • XML tags are case sensitive • All XML elements must be properly nested… NOT -> <B> <I> abc </B> </I>
XML Syntax Rules Contd... • All XML documents must have a root element – Tree Model <root> <child> <subchild>.....</subchild> </child> </root>
XML Syntax Rules Contd... • Attribute values must always be quoted <?xml version="1.0" encoding="ISO-8859-1"?> <note date="12/11/2002"> <to>Everyone</to> <from>Sandeep</from> </note> • Comments in XML <!-- This is a comment -->
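The syntax rules above can be checked mechanically. The sketch below assumes Python's standard-library ElementTree parser, which rejects any document that is not well formed; the helper name is_well_formed and the sample strings are illustrative, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per syntax rule from the slides.
samples = {
    "properly nested": "<b><i>abc</i></b>",
    "improperly nested": "<b><i>abc</b></i>",
    "missing closing tag": "<note><to>Everyone</note>",
    "case mismatch": "<Note>hi</note>",
    "unquoted attribute": "<note date=12/11/2002></note>",
    "quoted attribute": '<note date="12/11/2002"></note>',
    "two root elements": "<note></note><date></date>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the properly nested sample and the quoted-attribute sample are accepted; every other sample breaks one of the listed rules.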
XML Elements * • Elements classified w.r.t Contents • element content • mixed content • simple content • empty content
XML Elements Contd... • Element – content example <book> <title>My First XML</title> <prod id="33-657" media="paper"></prod> <chapter>Introduction to XML <para>What is HTML</para> <para>What is XML</para> </chapter> </book>
XML Elements Contd... • XML Elements are Extensible <note> <date>2004-04-08</date> <to>Everyone</to> <from>Sandeep</from> <body>Don't forget me this weekend!</body> </note> • Will the application crash? No: code that reads only <to>, <from> and <body> still finds them after <date> is added.
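A small sketch of why extending a document need not break existing code, assuming the new <date> element is added inside <note> and the application looks elements up by name. The function summarize is a hypothetical stand-in for "the application".

```python
import xml.etree.ElementTree as ET


def summarize(note_xml: str) -> dict:
    """Extract only the elements the (hypothetical) application knows about."""
    note = ET.fromstring(note_xml)
    return {tag: note.findtext(tag) for tag in ("to", "from", "body")}


original = (
    "<note><to>Everyone</to><from>Sandeep</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

# Same note with an extra <date> child that the application never asks for.
extended = (
    "<note><date>2004-04-08</date><to>Everyone</to><from>Sandeep</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

print(summarize(original))
print(summarize(extended))
```

Both calls return the same dictionary, because lookups by tag name simply skip elements the code does not know about.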
XML Element Naming Rules XML elements must follow these basic naming rules: • Names can contain letters, numbers, and other characters • Names must not start with a number or punctuation character • Names must not start with the letters xml (or XML or Xml ..) • Names cannot contain spaces
XML Attributes * • <img src="computer.gif"> - HTML • <person sex="female"> - XML • So is this right?? <note day="12" month="11" year="2002" to="Everyone" from="Sandeep" heading="Reminder" body="Don't forget me this weekend!"> </note>
XML Attributes Contd... • NO ! • Use Child Elements: • <date>12/11/2002</date> • <date> <day>12</day> <month>11</month> <year>2002</year> </date> Not <date day="12" month="11"...></date>
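To make the advice concrete, the sketch below reads the same date from an attribute and from child elements, again assuming Python's standard-library ElementTree. The two sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

# Date packed into a single attribute value.
as_attribute = ET.fromstring('<note date="12/11/2002"/>')

# Date expressed as structured child elements.
as_elements = ET.fromstring(
    "<note><date><day>12</day><month>11</month><year>2002</year></date></note>"
)

# The attribute is one opaque string; the program must know the
# "day/month/year" convention and split it by hand.
day_a, month_a, year_a = as_attribute.get("date").split("/")

# With child elements each part is addressable by name.
day_e = as_elements.findtext("date/day")
month_e = as_elements.findtext("date/month")
year_e = as_elements.findtext("date/year")

print(day_a, month_a, year_a)
print(day_e, month_e, year_e)
```

The element form carries its own structure, while the attribute form relies on a convention that lives only in the program code.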
XML Attributes Contd... Problems using attributes • attributes cannot contain multiple values • attributes are not easily expandable • attributes cannot describe structures • attributes are more difficult to manipulate by program code • attribute values are not easy to test against a Document Type Definition (DTD) - which is used to define the legal elements of an XML document When do we use attributes ?
XML Documents * • Basic object in XML • Types: • Data-Centric • Document-Centric • Hybrid XML Documents • Document Declaration <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
XML Document Type Definition • Well-Formed XML Document = obeys the XML syntax rules • Valid XML Document = Well-Formed XML Document + conforms to DTD/XSD rules. Definition: A DTD defines the legal elements of an XML document.
XML DTD Contd... • Inline DTD Document <!DOCTYPE root-element [element-declarations]> • <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Everyone</to> <from>Sandeep</from> <heading>Reminder</heading> <body>Don't forget me this weekend</body> </note>
XML DTD Contd... • External DTD <!DOCTYPE root-element SYSTEM "filename"> • <!DOCTYPE note SYSTEM "note.dtd"> <note>... </note> • <!DOCTYPE note SYSTEM "http://www.uark.edu/dtd/note.dtd"> <note> … </note>
XML DTD Contd... • Building blocks of XML DTD • Elements • Tags • Attributes • Entities - &lt; (<), &gt; (>), &amp; (&), &quot; ("), &apos; (') • PCDATA – Parsed Character DATA • CDATA – Character Data (not parsed)
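A short sketch of the five predefined entities and of CDATA in practice, assuming Python's standard library (xml.sax.saxutils.escape and ElementTree). The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if a < b & b > c"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw_text)
print(escaped)

# The parser turns the entity references back into characters (parsed data).
parsed = ET.fromstring("<msg>" + escaped + "</msg>")
print(parsed.text)

# Quote characters are only escaped on request, e.g. for attribute values.
attr_value = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(attr_value)

# Inside a CDATA section the parser does not interpret markup characters.
cdata = ET.fromstring("<msg><![CDATA[a < b & c]]></msg>")
print(cdata.text)
```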
XML DTD Contd... • DTD Element Declarations (w.r.t content) • Empty - <!ELEMENT element-name EMPTY> • Character - <!ELEMENT element-name (#PCDATA)> • Any - <!ELEMENT element-name ANY> • Children – <!ELEMENT element-name (child-element-name,child-element-name,.....)> • Occurrence operators: + (required, multi-valued), * (optional, multi-valued), | (or), ? (optional, single-valued), no operator (required, single-valued)
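The occurrence operators behave much like regular-expression quantifiers. The sketch below is not a DTD validator (Python's standard-library parser does not validate against DTDs); it hand-translates one hypothetical content model, (to,from,heading?,body+), into a regular expression over the child tag names to show the idea. The names NOTE_MODEL, child_sequence and conforms are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the hypothetical model (to,from,heading?,body+):
# each child tag is followed by a space, "?" means optional, "+" one or more.
NOTE_MODEL = re.compile(r"to from (heading )?(body )+")


def child_sequence(element) -> str:
    """Return the child tag names in document order, each followed by a space."""
    return "".join(child.tag + " " for child in element)


def conforms(xml_text: str) -> bool:
    """Check the root element's children against NOTE_MODEL."""
    root = ET.fromstring(xml_text)
    return NOTE_MODEL.fullmatch(child_sequence(root)) is not None


print(conforms("<note><to/><from/><heading/><body/></note>"))
print(conforms("<note><to/><from/><body/><body/></note>"))
print(conforms("<note><from/><to/><body/></note>"))
print(conforms("<note><to/><from/></note>"))
```

The first two documents satisfy the model; the third has the children in the wrong order and the fourth lacks the required body.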
XML DTD Contd... • XML DTD Attribute Declaration <!ATTLIST element-name attribute-name attribute-type default-value> • Attribute Types • Default Types: value, #REQUIRED, #IMPLIED, #FIXED "value"
XML DTD Contd... Limitations of DTD: • Data types in DTD are not very general • DTD needs specialized processors • Unordered elements are not permitted
XML SCHEMA * • XML Schema is used to structure the XML Document into ‘legal’ blocks. • Advantages: • Supports data types • Written in XML • Facilitates secure data communication.
XML Schema – Key Points XML Schema defines • Elements • Attributes • What are the data types of elements and attributes • Number of Children, Copies for an element • Which elements are children, or have text or are empty • Order of Children
XML Schema – Key Points contd XML Schema supports • Namespaces df/f:note, df/f:note xmlns:df/f = "www….." • Data Types • Extensible to future additions XML Schema is a W3C Recommendation now!
XML Schema - Simple Elements * • A simple element is an XML element that can contain only text • <xs:element name="xxx" type="yyy"/> • Example - <lastname>Refsnes</lastname> <xs:element name="lastname" type="xs:string"/> • XML Schema Datatypes: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time • <xs:element …… default="aaa"/> • <xs:element …… fixed="bbb"/>
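To illustrate what a declared data type buys you, the sketch below checks whether a text value has the lexical form of a few of the listed types. It is a rough approximation under stated assumptions, not a schema processor: only four types are covered, xs:date and xs:time are omitted, whitespace handling is ignored, and the names LEXICAL_FORMS and has_lexical_form are hypothetical.

```python
import re

# Approximate lexical forms for a few built-in XML Schema datatypes.
LEXICAL_FORMS = {
    "xs:string": re.compile(r".*", re.DOTALL),
    "xs:integer": re.compile(r"[+-]?[0-9]+"),
    "xs:decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    "xs:boolean": re.compile(r"true|false|1|0"),
}


def has_lexical_form(value: str, xs_type: str) -> bool:
    """Return True if the whole value matches the form of the given type."""
    return LEXICAL_FORMS[xs_type].fullmatch(value) is not None


print(has_lexical_form("Refsnes", "xs:string"))
print(has_lexical_form("42", "xs:integer"))
print(has_lexical_form("4.2", "xs:integer"))
print(has_lexical_form("4.2", "xs:decimal"))
print(has_lexical_form("yes", "xs:boolean"))
```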
XML Schema – Attributes * • All attributes are declared as simple types. • Only complex elements can have attributes! • <xs:attribute name="xxx" type="yyy"/> • Example - <lastname lang="EN">Smith</lastname> <xs:attribute name="lang" type="xs:string"/> • fixed = "", default = "" • use = "optional/required"
XML Schema – Facets * • Facets are restrictions applied on Elements and Attributes • Facets and the constraints used: • Single value: minInclusive, maxInclusive, etc. • Series of values: enumeration • White space: whiteSpace • Length: length, minLength, maxLength, etc. • Restrictions on Datatypes
XML Schema – Facets on Series of values • Pattern constraint – [],[][]..,([][]..)*, ([][]..)+, ([]|[]|..), [] {} • Example: <xs:element name="letter"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[a-z]"/> <!-- or [a-zA-Z] --> </xs:restriction> </xs:simpleType> </xs:element>
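A pattern facet must match the entire value, not just part of it. The sketch below mimics that with Python's re.fullmatch. This is an approximation: XML Schema has its own regular-expression dialect, which agrees with Python's for simple character classes like the ones here. The helper name satisfies_pattern is illustrative.

```python
import re


def satisfies_pattern(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the pattern, as a pattern facet requires."""
    return re.fullmatch(pattern, value) is not None


# The "letter" element from the slide allows exactly one lowercase letter.
print(satisfies_pattern("q", "[a-z]"))
print(satisfies_pattern("Q", "[a-z]"))
print(satisfies_pattern("ab", "[a-z]"))

# The alternative pattern also admits uppercase letters.
print(satisfies_pattern("Q", "[a-zA-Z]"))
```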
XML Schema – Facets on White space characters • <xs:element name="address"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:whiteSpace value="preserve"/> </xs:restriction> </xs:simpleType> </xs:element> • <xs:whiteSpace value="replace"/> • <xs:whiteSpace value="collapse"/>
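The three whiteSpace values differ only in how the value is normalized before further checks. A minimal sketch of that normalization, following the XML Schema definitions of preserve, replace and collapse; the function name apply_white_space is hypothetical.

```python
import re


def apply_white_space(value: str, mode: str) -> str:
    """Normalize a value according to the whiteSpace facet."""
    if mode == "preserve":
        # Leave the value untouched.
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = value.translate({ord(ch): " " for ch in "\t\n\r"})
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # After replacing, squeeze runs of spaces and trim both ends.
        return re.sub(" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace mode: " + mode)


address = "  12\tMain\n St  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_white_space(address, mode)))
```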
XML Schema – Facets on Length • Constraints = length, minLength, maxLength • Example: <xs:element name="password"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:length value="8"/> </xs:restriction> </xs:simpleType> </xs:element>
XML Schema –Complex Elements* • A complex element is an XML element that contains other elements and/or attributes. • There are four kinds of complex elements: • empty elements • elements that contain only other elements • elements that contain only text • elements that contain both other elements and text
XML Schema – Complex Elements example – Elements only • EXAMPLE: <employee> <firstname> .. </firstname> <lastname>..</lastname> </employee> <xs:element name="employee" type="personinfo"/> <xs:complexType name="personinfo"> <xs:sequence> <xs:element name="firstname" type="xs:string"/> <xs:element name="lastname" type="xs:string"/> </xs:sequence> </xs:complexType>
XML Schema – Complex Elements example – Empty elements only • EXAMPLE: <product prodid="1345" /> <xs:element name="product" type="prodtype"/> <xs:complexType name="prodtype"> <xs:attribute name="prodid" type="xs:positiveInteger"/> </xs:complexType> • Similarly for Text Only elements and Mixed elements
XML Schema – Indicators * • Indicators control how elements are used in documents. • Order Indicators: all, choice, sequence • Occurrence Indicators: maxOccurs, minOccurs • Group Indicators: group name, attributeGroup name • See Text Book Example.