850 likes | 1.06k Views
XML. Internet Engineering Spring 2014 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology. Questions. Q6) How to define the data that is transferred between web server and client? Q6.1) Which technology? Q6.2) Is data correctly encoded?
E N D
XML Internet Engineering Spring 2014 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology
Questions • Q6) How to define the data that is transferred between web server and client? • Q6.1) Which technology? • Q6.2) Is data correctly encoded? • Q6.3) How to access the data in web pages? • Q6.4) How to present the data?
Homework • Homework 3 • Will be announced
Outline • Introduction • Namespaces • Validation • Presentation • XML Processing (using JavaScript) • Conclusion
Outline • Introduction • Namespaces • Validation • Presentation • XML Processing (using JavaScript) • Conclusion
Introduction • HTML + CSS + JavaScript Dynamic Web pages • Web server is not involved after page is loaded • JavaScript reacts to user events • However, most web applications needs data from server after the page is loaded • e.g., new emails data in Gmail • A mechanism to communication: AJAX • A common (standard) format to exchange data • In most applications, the data is structured
Introduction (cont’d) • In general (not only in web) to store or transport data, we need a common format, to specify the stucture of data; e.g., • Documents: PDF, HTML, DOCx, PPTx, ... • Objects: Java Object Serialization/Deserialization • How to define the data structure? • Binary format (similar to binary files) • Difficult to develop & debug, machine depended, … • Text format (similar to text files) • Human readable, machine independent & easier
Introduction (cont’d) • Example: Data structure of a class • Course name, teacher, # of students, each student information IE Bakhshi 48 Ali Hassani 1111 Babak Hosseini 2222 …. Student num: 48 Name: IE Teacher: Bakhshi Ali Hassani 1111 BabakHosseini 2222 …. class Course{ string name; string teacher; integer num; Array st of Students; } c = new Course(); c.name = IE; c.teacher = Bakhshi; c.num = 48 st[1] = new Student(); st[1].name=Ali; st[2].fam=Hassani ….
Introduction (cont’d) • W3C approach • XML: eXtensibleMarkup Language • A meta-markup language to describe data structure • In each application, a markup language (set of tags & attributes) are defined using XML <course> <title> IE </title> <num> 48 </num> <teacher> Bakhshi </teacher> <students> <student><name>Ali</name> <fam>Hassani</fam> <id> 1111 </id></student> … </students> </course>
Introduction (cont’d) • Standard Generalized Markup Language (SGML) • Expensive, complex to implement • XML: a subset of SGML • Goals: simplicity, generality, and usability • Simplifies SGML by: • leaving out many syntactical options and variants • SGML ~ 600pp, XML ~ 30pp • XML = SGML {complexity, document perspective} + {simplicity, data exchange perspective}
Why to Study XML: Benefits • Simplify data sharing & transport • XML is text based and platform independent • Extensive tools to process XML • To validate, to present, to search, … • In web application, data separation from HTML • E.g., table structure by HTML, table data by XML • Extensible for different applications • A powerful tool to model/describe complex data • E.g., MS Office!!!
XML Document Elements • Markup • Elements • Tag + Content • Attributes • Comments • Processing instructions • Content • Parsed Character Data • Unparsed Character Data (CDATA)
XML Elements • XML element structure • Tag + content <tagname attribute=“value”> Content </tagname> • No predefined tag • If content is not CDATA, is parsed by parser • A value for this element • Child elements of this element
XML Elements’ Attributes • Tags (elements) are customize by attribute • No predefined attributes <os install="factory">Windows</os> <os install="user">Linux</os> • Attribute vs. Tags (elements) • Attributes can be replaced by elements • Attribute cannot be repeated for an element • Attribute cannot have children • Attributes mainly used for metadata, e.g., ID, class
Processing Instructions • Processing instructions pass information (instruction) to the application that process the XML file • They are not a part of user data <?Target String ?> • Common usage <?xml-stylesheet href="URL" type="text/xsl"?> • XML Declaration is a special PI <?xml version="1.0" encoding="UTF-16"?> • XML Declaration is always first line in file
Basic XML Document Structure <?xml version="1.0" encoding="UTF-16"?> <root-tag> <inner-tags> Data </inner-tags> <!-- Comment --> </root-tag>
Example <?xml version="1.1" encoding="UTF-8" ?> <notebook> <name>ThinkPad</name> <model>T500</model> <spec> <hardware> <RAM>4GB</RAM> </hardware> <software> <OS>Linux, FC20 </OS> </software> </spec> </notebook>
Example (CDATA) <?xml version="1.1" encoding="UTF-8" ?> <operator> <mathematic> + - * / % </mathematic> <comparison> <![CDATA[ < <= == >= > != ]]> </comparison> </operator>
XML vs. HTML • Tags • HTML: Predefined fixed tags • XML: No predefined (meta-language) • User defined tags & attributes • Purpose • HTML: Information display • XML: Data structure & transfer • Rules’ strictness • HTML (not XHTLM): loose • XML: strong/strict rule checking
XML in General Application • XML by itself does not do anything • XML just describes the structure of the data • Other applications parse XML and use it • A similar approach is used for formats (event user-defined format); so, what is the advantages of XML?!!! • XML is standard • Available XML tools & technologies XML document XML processor (aka. XML Parser) Apache Xerces SAX, DOM application
XML Technology Components • Data structure (tree) representation • XML document (a text file) • Validation & Conformance • Document Type Definition (DTD) or XML Schema • Element access & addressing • XPath, DOM • Display and transformation • XSLT or CSS • Programming, Database, Query, …
Outline • Introduction • Namespaces • Validation • Presentation • XML Processing (using JavaScript) • Conclusion
Namespaces • In XML, element names are defined by developers • Results in a conflict when trying to mix XML documents from different XML applications • XML file 1 <table> <tr> <td>Apples</td> <td>Bananas</td> </tr> </table> • XML file 2 <table> <name>Dinner Table</name> <width>80</width> <length>120</length> </table>
Namespaces • Name conflicts in XML can easily be avoided by using a qualified names according to a prefix • Prefix is the namespaces • Qualified name is the prefixed name • Step 1: Namespace declaration • Defines a label (prefix) for the namespace and associates it to the namespace identifier • URI/URL is used to be universally unique • Step 2: Qualified name • namespace prefix: local name
Namespaces <?xml version="1.0"?> <ceit:course xmlns:ceit="http://ceit.aut.ac.ir"> <ceit:department> <ceit:name> Computer Engineering & Information Technology </ceit:name> </ceit:department> <ceit:name> Internet Engineering </ceit:name> </ceit:course> Actual name of this tag of parser: http://ceit.aut.ac.ir:department
Default Namespaces <alltables> <table xmlns="http://www.w3.org/TR/html4/"> <tr> <td>Apples</td> <td>Bananas</td> </tr> </table> <table xmlns="http://www.dinnertable.com"> <name>Dinner Table</name> <width>80</width> <length>120</length> </table> </alltables>
Outline • Introduction • Namespaces • Validation • Presentation • XML Processing (using JavaScript) • Conclusion
Valid XML • XML is used to describe a structured data • The description must be correct • A valid XML file • Correctness • Syntax • Syntax error parser fails to parse the file • Syntax rules: e.g., all XML tags must be closed • Symantec (structure) • Application specific rules, e.g. student must have ID • Error Application failure
XML Syntax Rules (Well-Formed) • Start-tag and End-tag, or self-closing tag • Tags can’t overlap • XML documents can have only one root element • XML naming conventions • Names can start with letters or the dash (-) character • After the first character, numbers, hyphens, and periods are allowed • Names can’t start with “xml”, in uppercase or lowercase • There can’t be a space after the opening < character • XML is case sensitive • Value of attributes must be quoted • White-spaces are preserved • &, <, > are represented by & < >
How to Validate XML? • 1) Application specific programs need to check structure of XML document • Different applications different programs • Change in data structure code modification • 2) General XML parser + reference document • Reference document • Tag names, attributes, tree structure, tag relations, … • Different reference documents • DTD, XML Schema, RELAX NG
XML Validation (cont’d) • Document Type Definition (DTD) or XML Schema • A language to define document type • The rules of the structure of XML • Internal or External parser interface parser XML-based application XML data DTD / Schema
DTD • DTD is a set of structural rules called declarations, specify • A set of elements and attributes that can be in XML • Where these elements and attributes may appear • <!keyword …> • ELEMENT: to define tags • For leaf nodes: Character pattern • For internal nodes: List of children • ATTLIST: to define tag attributes • Includes: name of the element, the attribute’s name, its type, and a default option
ELEMENT Declaration • General form of internal nodes • <!ELEMENTelement_name(list of children)> • To control the number of times a child may appear • + : One or more • * : Zero or more • ? : Zero or one • General form of leaf nodes • <!ELEMENTelement_name(#type)> • Where, types • PCDATA: Most commonly used, the content will be parsed, • i.e. < > & is not allowed • ANY: Any character can be used • EMPTY: No content
ATTLIST Declaration • <!ATTLISTelement_nameattribute_nameattribute_typedefault_option> • element_name: The name of the corresponding element • attribute_name: The name of attribute • attribute_type: Commonly CDATA is used • default_option: • A value: The default value of the attribute • #REQUIRED: The attribute is mandatory
Example: Internal DTD <?xml version="1.0"?> <!DOCTYPE name [ <!ELEMENT name (first, middle, last)> <!ATTLIST namenickename (#CDATA)> <!ELEMENT first (#PCDATA)> <!ELEMENT middle (#PCDATA)> <!ELEMENT last (#PCDATA)> ]> <name nickname="Jo"> <first>John</first> <middle>Johansen</middle> <last>Smith</last> </name>
External DTD • System <!DOCTYPE root_nameSYSTEM "URL" > • Public <!DOCTYPE root_name PUBLIC "-//name//DTD Name//EN" "URL"> • Common format is FPI(Defined in the document ISO 9070) • Example <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Example: External DTD sample.dtd <!ELEMENT note (to+,from,heading*,main)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT main (#PCDATA)> -------------------------------------------------------------------------- external-dtd.xml <?xml version="1.0" ?> <!DOCTYPE note SYSTEM "sample.dtd" > <note> <to>Ali</to> <to>Hassan</to> <from>Babak</from> <main>This is message</main> </note>
XML Schema • XML Schema describes the structure of an XML file • Also referred to as XML Schema Definition (XSD) • XML Schemas benefits (DTD disadvantages) • Created using basic XML syntax (DTD has its own syntax) • Validate text element content based on built-in and user-defined data types (DTD does not fully support data type) • Similar to OOP • Schema is a class & XML files are instances • Schema specifies • Elements and attributes, where and how often • Data type of every element and attribute
Schema (cont’d) • XML schema is itself a XML-based language • Has its own predefined tags & namespace xmlns:xs="http://www.w3.org/2001/XMLSchema" • Two categories of data types • Simple: Cannot have attribute or nested elements • Primitive: string, Boolean, integer, float, ... • Derived: byte, long, unsignedInt, … • User defined: restriction of base types • Complex: Can have attribute or/and nested elements
XML Schema (cont’d) • Simple element declaration <xs:element name="a name" type="a type" /> • Complex element declaration <xs:element name="a name"> <xs:complexType> <xs:sequence> or <xs:all> <xs:elementname minOccures="…" maxOccures="…"/> </xs:sequence> or </xs:all> </xs:complexType> </xs:element>
XML Schema Example: note.xsd <?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="note"> <xs:complexType> <xs:sequence> <xs:element name="to" type="xs:string"/> <xs:element name="from" type="xs:string"/> <xs:element name="date" type="xs:date"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
XML Schema Example: note.xml <?xml version="1.0"?> <note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="note.xsd"> <to>Ali</to> <from>Reza</from> <date>1391/1/1 </date> </note>
XML Validation Tools • Online validators • validator.w3.org • www.xmlvalidation.com • XML tools & commands • xmllint commands in Linux • xmllintxmlfile--valid --dtdvalidDTD • xmllintxmlfile--schema schema • XML libraries • LibXML2 for C • Java & C# XML libraries
Outline • Introduction • Namespaces • Validation • Presentation • XML Processing (using JavaScript) • Conclusion
XML Presentation • By default, browsers parses & displays XML files • Tree structure of XML • Syntax checking Well-formed XML • Other presentations of XML • 1) Browsers support CSS for XML files • CSS is used to format the representation of XML • 2) Transform to HTML + CSS using XSLT • A powerful tool to separate data from HTML • 3) Use JavaScript to generate HTML for XML • Parse the XML and create HTML elements
XML & CSS • Attach styling instructions directly to XML <?xml-stylesheet href="URL" type="text/css" ?> • Can style but not rearrange elements • Block or inline style • Bold, italic, underline, font, color, etc. • … Tag_name {color:red; font-weight:bold; font-family:serif;}
CSS Example <?xml version="1.0" encoding="UTF-8"?> <programming> Good programming books: <C> <book> <title>The C Programming Language </title> <author>Ritchie</author> </book> </C> <Java> <book> <title>Thinking in Java </title> <author>Eckel</author> </book> </Java> </programming> ========================================== *{display: block;} programming{font-family: Arial; font-size:20pt;} C{color: blue;} Java{color: green;} author{ font-style:italic;} <?xml-stylesheet type="text/css" href="book.css" ?>
XML & CSS • CSS is mainly designed to format HTML presentation • It does not work well in XML • XML does not have any predefined tags/attributes • No ID, No Class • CSS for XML uses tag names • The same style for all tags with the same name • CSS can only present the data in XML • Cannot process & transform the data into other format • e.g., presenting data as a table
XSL • XSL stands for eXtensible Stylesheet Language, and is a style sheet language for XML documents • Started with XSL & leads to XSLT, XPath, and XSL-FO • Xpath • A language for navigating XML documents • XSLT (XSL Transform) • Transforms XML into other formats, like HTML • XSL-FO (XSL Formatting Objects) • Not discussed here!
XPath • XPath is a language for addressing different parts of XML document • XPath is a syntax for defining parts of an XML • XPath uses path expressions to navigate in XML documents