Extensible Markup Language (XML)
What is a Markup Language?
• A syntax and procedure for embedding tags in text documents; the tags control formatting when the documents are viewed by a special application, e.g. <b> Hi </b>
• A set of codes or tags that surrounds content and tells a person or program what that content is (its structure) and/or what it should look like (its format). Markup tags have a distinct syntax that sets them apart from the content they surround.
History of Markup Languages
• 1967: GenCode
• 1970s, 1980s: TeX
• 1980s: Scribe
• Early 80s: SGML
• 1991: HTML
• Late 90s: XML
Motivation for XML
• XML is an attempt to package up the important virtues and most-used features of SGML in a compact, easily-implemented package that is optimized for delivery on the WWW. (Bray)
• XML started as a simplified subset of the Standard Generalized Markup Language (SGML) and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML.
• Data storage in an organized way.
• Fast and easy exchange of data.
XML Syntax
• XML declaration (header): <?xml version="1.0" encoding="utf-8"?>
• Comments: <!-- This is a comment -->
• Nesting: elements must be properly nested, e.g. <a> <b> </b> </a> <!-- OK -->
• Empty elements: <a value="123"></a> is equivalent to <a value="123"/>
• Exactly one root element!
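The nesting and single-root rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is illustrative and does not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document.

    Hypothetical helper: it only checks syntax, not validity against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested elements under a single root.
nested_ok = is_well_formed("<a><b></b></a>")
# Both spellings of an empty element.
empty_long = is_well_formed('<a value="123"></a>')
empty_short = is_well_formed('<a value="123"/>')
# Overlapping tags break the nesting rule.
overlapping = is_well_formed("<a><b></a></b>")
# Two top-level elements break the single-root rule.
two_roots = is_well_formed("<a/><b/>")

print(nested_ok, empty_long, empty_short, overlapping, two_roots)
```

A parser that rejects the last two inputs is doing exactly the well-formedness check; whether the tags make sense for a given application is a separate question.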
Some Standards that use XML
• SVG
• MathML
• HL7 V. 3.0 and Medical Markup Language (MML)
XML Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="book.xsl"?>
<dblp>
  <book mdate="2004-03-08" key="books/acm/Kim95">
    <editor>Won Kim</editor>
    <title>Modern Database Systems: The Object Model, Interoperability, and Beyond.</title>
    <booktitle>Modern Database Systems</booktitle>
    <publisher>ACM Press and Addison-Wesley</publisher>
    <year>1995</year>
    <isbn>0-201-59098-0</isbn>
    <url>db/books/collections/kim95.html</url>
  </book>
  <book mdate="2002-01-03" key="books/aw/AbiteboulHV95">
    <author>Serge Abiteboul</author>
    <author>Richard Hull</author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases.</title>
    <publisher>Addison-Wesley</publisher>
    <year>1995</year>
    <isbn>0-201-53771-0</isbn>
    <url>db/books/dbtext/abiteboul95.html</url>
  </book>
</dblp>
Defining an XML Language
i.e. which <tags> may appear, and in which order
• Not strictly necessary
  - You can parse/produce XML without formally defining the structure of the language
  - A document that merely obeys the XML syntax rules is called well-formed
• DTDs (“Document Type Definitions”)
  - Simple, limited
  - A well-formed document that also conforms to its DTD is called valid
• XML Schema
  - (too?) complex and expressive (includes inheritance, restricted datatypes, ranges)
• Data binding
  - Define (e.g.) Java classes + a mapping from object tree to XML document
  - JAXB, Castor, JSX, NeuroML
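The data-binding idea can be illustrated without any of the Java tools named above. This is a rough sketch in Python, assuming a hand-written mapping; the Book class and its from_element/to_element methods are hypothetical and only mimic what a binding framework would generate.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical class bound to a <book> element (illustration only)."""
    key: str
    title: str
    year: int

    @classmethod
    def from_element(cls, elem: ET.Element) -> "Book":
        # XML text is always a string, so the year is converted explicitly.
        return cls(
            key=elem.get("key"),
            title=elem.findtext("title"),
            year=int(elem.findtext("year")),
        )

    def to_element(self) -> ET.Element:
        # Map the object back to an element tree.
        elem = ET.Element("book", {"key": self.key})
        ET.SubElement(elem, "title").text = self.title
        ET.SubElement(elem, "year").text = str(self.year)
        return elem


source = ET.fromstring(
    '<book key="books/aw/AbiteboulHV95">'
    "<title>Foundations of Databases.</title><year>1995</year></book>"
)
book = Book.from_element(source)
round_trip = ET.tostring(book.to_element(), encoding="unicode")
print(book)
print(round_trip)
```

A real binding tool derives this mapping from a schema or from annotations instead of requiring it to be written by hand.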
DTD
<!ELEMENT dblp (article|inproceedings|proceedings|book|incollection|
                phdthesis|mastersthesis|www)*>
<!ENTITY % field "author|editor|title|booktitle|pages|year|address|journal|volume|number|month|url|ee|cdrom|cite|publisher|note|crossref|isbn|series|school|chapter">
…………………..
<!ELEMENT book (%field;)*>
<!ATTLIST book key CDATA #REQUIRED
               mdate CDATA #IMPLIED
……………………….
<!ELEMENT author (#PCDATA)>
<!ELEMENT editor (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ENTITY % titlecontents "#PCDATA|sub|sup|i|tt|ref">
<!ELEMENT title (%titlecontents;)*>
<!ELEMENT booktitle (#PCDATA)>
Databases Overview
• Data: relational data, XML
• Storage: secondary storage, main memory
• RDBMSs such as MySQL, MS SQL Server, Oracle, DB2
• Tools such as SAX, DOM, LINQ
• DBMSs such as Sedna
Tools to read/write an XML file
• SAX
• DOM
• LINQ
Why do we need such tools?
• To check that the file is well-formed or valid.
• To read the document in terms of its elements and attributes.
SAX (Simple API for XML)
• Event-based.
• You provide startElement(), characters() and endElement() methods, which the parser calls as it reads the document.
• You have to keep track of where you are in the tree/document yourself.
• Fast, but a bit painful to code.
• Mainly adopted in Java.
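The event-based style looks roughly like this. The sketch uses the SAX port in Python's standard library (xml.sax) rather than the Java API; the TitleCollector class and the shortened sample document are assumptions made for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical handler)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False  # our own record of where we are in the document
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


doc = b"""<dblp>
  <book key="books/acm/Kim95"><title>Modern Database Systems</title></book>
  <book key="books/aw/AbiteboulHV95"><title>Foundations of Databases.</title></book>
</dblp>"""

handler = TitleCollector()
xml.sax.parseString(doc, handler)
print(handler.titles)
```

The _in_title flag is the "keeping track of where you are" that the slide mentions: the parser holds no tree, so the handler carries all state itself.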
DOM (Document Object Model)
• You get a tree of objects of type “Element” and “Attribute” + methods to navigate the tree.
• Contents are all strings, so you have to do data conversion yourself to obtain ints, floats or your own object types.
• A W3C standard with implementations in most languages, including MS .NET.
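A small sketch of the tree-based style, assuming Python's standard xml.dom.minidom implementation and a shortened version of the dblp sample; the variable names are illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<dblp><book key="books/aw/AbiteboulHV95">'
    "<title>Foundations of Databases.</title><year>1995</year>"
    "</book></dblp>"
)

# Navigate the in-memory tree instead of reacting to parser events.
book = doc.getElementsByTagName("book")[0]
key = book.getAttribute("key")

# Element content arrives as a string; converting it is up to the caller.
year_node = book.getElementsByTagName("year")[0]
year_text = year_node.firstChild.data
year = int(year_text)

print(key, type(year_text).__name__, year)
```

Unlike SAX, the whole document is held in memory, which makes random access easy at the cost of memory on large files.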
LINQ (Language Integrated Query)
• A Microsoft technology.
• Introduced in November 2007 as part of .NET Framework 3.5.
• An efficient library for querying relational databases, XML files or even in-memory arrays.
• Queries are written in an SQL-like syntax.
Querying Data
[Diagram labels: Data, Translator, Relational Data, XML, Secondary Storage, Main Memory, Structured Query Language (SQL), XQuery]
XQuery
• An XML query language
• W3C Recommendation since 23 January 2007
• e.g. /dblp/book[author="John Smith"]
• A query returns XML elements
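Python's standard library has no XQuery engine, but the path expression in the example falls within the limited XPath subset that xml.etree.ElementTree supports, so its effect can be sketched. The sample data is a shortened version of the dblp example, and an author from that data is used in place of "John Smith".

```python
import xml.etree.ElementTree as ET

dblp = ET.fromstring("""
<dblp>
  <book key="books/acm/Kim95">
    <editor>Won Kim</editor>
    <title>Modern Database Systems</title>
  </book>
  <book key="books/aw/AbiteboulHV95">
    <author>Serge Abiteboul</author>
    <author>Richard Hull</author>
    <title>Foundations of Databases.</title>
  </book>
</dblp>
""")

# Equivalent of /dblp/book[author="Richard Hull"], evaluated relative to the
# root element: every <book> that has an <author> child with that exact text.
matches = dblp.findall("book[author='Richard Hull']")

# The result is a list of elements, not plain strings.
keys = [book.get("key") for book in matches]
print(keys)
```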
Query Data (Once Again)
[Diagram labels: Data, Translator, Relational Data, XML, Secondary Storage, Main Memory, Structured Query Language (SQL), Translator, XQuery]
Related Technologies
• XPath
• XSLT
XPath
• A way to refer to a specific subset of the elements/attributes in a document
• A method for navigating within a file rather than a full query language
• "/dblp" -- the root element
• "/dblp/book[1]" -- the first book element
• "//book" -- all <book> elements
• "//@title" -- all title= attributes
• Used in XSLT for pattern matching
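The navigation paths above can be tried out with a short sketch. It assumes Python's xml.etree.ElementTree, which implements only a subset of XPath: paths are written relative to the root element, and attributes cannot be selected directly, so the last step selects the elements that carry the attribute. Because the dblp sample has no title= attribute, the key= attribute stands in for it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp>"
    '<book key="books/acm/Kim95"><title>Modern Database Systems</title></book>'
    '<book key="books/aw/AbiteboulHV95"><title>Foundations of Databases.</title></book>'
    "</dblp>"
)

# "/dblp": the parsed root element itself.
root_tag = root.tag

# "/dblp/book[1]": positions are 1-based, as in XPath.
first_book = root.find("book[1]")

# "//book": all <book> elements at any depth below the root.
all_books = root.findall(".//book")

# Approximation of "//@key": find elements that have the attribute, then read it.
keys = [elem.get("key") for elem in root.findall(".//*[@key]")]

print(root_tag, first_book.get("key"), len(all_books), keys)
```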
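XSLT (Extensible Stylesheet Language Transformations)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Currently Incollection</h2>
        <table border="1">
          <tr bgcolor="#9acd32">
            <th align="left">Title</th>
            <th align="left">Book Title</th>
            <th align="left">Author</th>
            <th align="left">Pages</th>
            <th align="left">Year</th>
          </tr>
          <xsl:for-each select="dblp/incollection">
            <tr>
              <!-- The row body was cut off on the slide; one cell per column header is assumed. -->
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="booktitle"/></td>
              <td><xsl:value-of select="author"/></td>
              <td><xsl:value-of select="pages"/></td>
              <td><xsl:value-of select="year"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>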
XSLT (continued)
• Add this line to the original XML file, after the XML declaration and before the root element:
  <?xml-stylesheet type="text/xsl" href="book.xsl"?>