XML-An Introduction • The eXtensible Markup Language (XML) was developed by the World Wide Web Consortium (W3C) starting in 1996 to address limitations of HTML; XML 1.0 became a W3C Recommendation in 1998 • XML is a language similar to HTML, but more extensible • It supports user-defined tags that allow both data and metadata (i.e. data about data) to be stored in a single document • At the same time, presentation aspects remain decoupled from data representation
A Brief History • HTML and XML are like children of the same parent, the Standard Generalized Markup Language (SGML) • SGML became an ISO standard in 1986 • SGML originated at IBM, which wanted a means of publishing document content in different ways • The result of the standards process was a rich document markup language that allows authors to separate logical content from its presentation • SGML markup is a series of commands understood by another program
Why Another Markup Language? • The question to be asked is: what is wrong with SGML or HTML? • SGML is very large, powerful and complex • SGML has been used in industry, for commercial purposes, for over a decade • SGML is too complex to program for in a Web environment
Why Another Markup Language? …contd • HTML can be thought of as a small application of SGML used on the Web • HTML defines a very simple class of report-style documents, with section headings, paragraphs, lists, tables, illustrations, etc. • It was the first computer language that could be understood and used by the masses. It gave the Web to the common person • HTML is static: its tag set is fixed, so one can do only limited things with it
XML XML allows users to: • Bring multiple files together to form compound documents • Identify where illustrations are to be incorporated into text files, and the format used to encode each illustration • Provide processing control information to supporting programs, such as document validators and browsers • Add editorial comments to a file
XML Components • XML is based on the concept of documents composed of a series of entities • Each entity can contain one or more logical elements • Each of these elements can have certain attributes (properties) that describe the way in which it is to be processed
XML – Few Important Points • Tag names are case sensitive • Every opening tag must have a corresponding closing tag • A nested tag pair cannot overlap another tag • Attribute values must appear within quotes • Every document must have a root element
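The rules on this slide can be checked mechanically. The sketch below is only an illustration, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample strings are made up for this example. Each sample breaks exactly one rule from the list.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: one valid document, then one violation per rule.
samples = {
    "valid": '<note id="1"><to>Ann</to></note>',
    "case mismatch": "<Note></note>",           # tag names are case sensitive
    "missing close": "<note><to>Ann</note>",    # <to> is never closed
    "overlap": "<a><b></a></b>",                # nested pairs must not overlap
    "unquoted attribute": "<note id=1></note>", # attribute values need quotes
    "two roots": "<a></a><b></b>",              # exactly one root element
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; the parser rejects the rest before any application code sees the data.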
XML Editor • XML documents are plain text documents • Any simple text editor can be used as an XML editor • E.g., Windows users can use Notepad or WordPad • Microsoft XML editor: Microsoft XML Notepad • Java-based XML editors
XML Document • <exampledoc> - the root element of the document • <eq> - a question and its associated answers • <Question> - a question • <A> - the first possible answer to a question • <B> - the second possible answer to a question • <C> - the third possible answer to a question
XML Document (contd)
<?xml version="1.0"?>
<exampledoc>
  <eq answer="a">
    <Question>
      In 1994, a man had an accident while robbing a pizza restaurant in Akron, Ohio, that resulted in his arrest. What happened to him?
    </Question>
XML Document (contd)
    <A>he slipped on a patch of grease on the floor and knocked himself out.</A>
    <B>he backed into a police car while attempting to drive off.</B>
    <C>he choked on a breadstick that he had grabbed as he was running out.</C>
  </eq>
</exampledoc>
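To show how a program might read this quiz document, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `correct_answer` is hypothetical, and the sketch assumes that the `answer` attribute names the correct answer element. Because tag names are case sensitive and the attribute value is lowercase while the elements are uppercase, the value is upper-cased before the lookup.

```python
import xml.etree.ElementTree as ET

QUIZ = """<?xml version="1.0"?>
<exampledoc>
  <eq answer="a">
    <Question>
      In 1994, a man had an accident while robbing a pizza restaurant
      in Akron, Ohio, that resulted in his arrest. What happened to him?
    </Question>
    <A>he slipped on a patch of grease on the floor and knocked himself out.</A>
    <B>he backed into a police car while attempting to drive off.</B>
    <C>he choked on a breadstick that he had grabbed as he was running out.</C>
  </eq>
</exampledoc>"""

def correct_answer(eq: ET.Element) -> str:
    """Return the text of the answer element named by the 'answer' attribute."""
    key = eq.get("answer")                 # "a" in the sample document
    return eq.find(key.upper()).text.strip()

root = ET.fromstring(QUIZ)
for eq in root.findall("eq"):
    # Collapse the line breaks inside the question text before printing.
    print(" ".join(eq.findtext("Question").split()))
    print("Answer:", correct_answer(eq))
```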
Viewing XML Document • A style sheet is the best way to view an XML document • A style sheet is a series of formatting descriptions that determine how elements are displayed on a web page • In simple English, a style sheet controls how the content of a web page looks in a web browser
A CSS for Example doc XML Document
CSS for eq tag:
eq {
  display: block;
  width: 750px;
  padding: 10px;
  margin-bottom: 10px;
  border: 4px double black;
  background-color: silver;
}
Style Sheets (contd) • In the absence of a style sheet, Internet Explorer or any other browser just displays the XML code • To attach the style sheet to the document, add the following line just after the XML declaration: <?xml-stylesheet type="text/css" href="exampledoc.css"?>
Is XML a Database? • XML and its surrounding technologies constitute a "database" in the looser sense of the term, i.e. a database management system (DBMS) • XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.), and programming interfaces (SAX, DOM, JDOM)
Is XML a Database? Contd.. • But it lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, and queries across multiple documents • XML documents can serve as a database in environments with small amounts of data, few users, and modest performance requirements • They fail in environments with many users, strict data integrity requirements, and the need for good performance
XML And Databases • XML's proliferation raises a question: how is data transferred in XML documents to be read, stored and queried? • In other words, how do DBMSs handle XML documents? • There are two ways to look at XML documents: data-centric and document-centric
Data Centric Documents • Data-Centric documents use XML as a data transport • Such documents usually are found in business-to-business applications • Examples: Buyer-supplier trading automation, Sales orders, Flight Schedules, Scientific data • Data-centric documents have a regular structure • Data originates both in the database (in which case we want to expose it as XML) and outside the database (in which case we want to store it in a database)
Example - Data Centric Document
<Employees>
  <Employee JobCode="A1">
    <Dept No="1"/>
    <EmpNo>1234</EmpNo>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <HireDate>1998-02-11</HireDate>
  </Employee>
  <Employee JobCode="B3">
    <Dept No="2"/>
    <EmpNo>5678</EmpNo>
    <FirstName>Joy</FirstName>
    <LastName>Black</LastName>
    <HireDate>1998-03-09</HireDate>
  </Employee>
</Employees>
Data Centric Documents • Managing data-centric documents requires both data extraction and data formatting services • Data extraction: receive XML documents from a network and extract structured data from them, to be stored in a DBMS • To support data extraction, a mapping must be defined between XML documents and the DBMS data model • The extracted data is stored in tables that follow a predefined schema (which is why this is called a structured representation) • The structure of the original XML document is not maintained in this case
Example: Data Extraction
<clients>
  <row>
    <number> 7369 </number>
    <firstname> Paul </firstname>
    <lastname> Smith </lastname>
  </row>
  <row>
    <number> 7000 </number>
    <firstname> Steve </firstname>
    <lastname> Adam </lastname>
  </row>
</clients>
Extracted table:
Number   First Name   Last Name
7369     Paul         Smith
7000     Steve        Adam
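The extraction step can be sketched in a few lines. This is an illustration only, not how any particular DBMS does it: it assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules, an in-memory database, and a hypothetical helper named `extract_clients`. The mapping between the XML and the table is hard-coded in the function.

```python
import sqlite3
import xml.etree.ElementTree as ET

CLIENTS_XML = """<clients>
  <row>
    <number> 7369 </number>
    <firstname> Paul </firstname>
    <lastname> Smith </lastname>
  </row>
  <row>
    <number> 7000 </number>
    <firstname> Steve </firstname>
    <lastname> Adam </lastname>
  </row>
</clients>"""

def extract_clients(xml_text: str, conn: sqlite3.Connection) -> int:
    """Map each <row> element to one row of the clients table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients "
        "(number INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
    )
    rows = [
        (
            int(row.findtext("number")),      # int() ignores the padding spaces
            row.findtext("firstname").strip(),
            row.findtext("lastname").strip(),
        )
        for row in ET.fromstring(xml_text).findall("row")
    ]
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = extract_clients(CLIENTS_XML, conn)
stored = conn.execute(
    "SELECT number, firstname, lastname FROM clients ORDER BY number"
).fetchall()
print(inserted, stored)
```

Once the rows are in the table, the nesting and element order of the original document are gone, which is the loss of document structure the preceding slide mentions.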
Data Centric Documents • Data formatting: XML encoding software takes the result of a query expressed in a DBMS query language and encodes the resulting data in an XML document to be transferred over the network • Data formatting is implemented as the reverse of data extraction • After a set of tuples is selected from the database with a database query, data formatting services transform it into an XML document
Data Formatting - Data centric document
Query against table Clients:
SELECT FirstName, LastName FROM Clients WHERE Number = 7369
Resulting XML document:
<clients>
  <row>
    <firstname> Paul </firstname>
    <lastname> Smith </lastname>
  </row>
</clients>
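The reverse direction can be sketched the same way. The example below is an assumption-laden illustration using Python's standard `sqlite3` and `xml.etree.ElementTree`; the helper `format_as_xml` and the lowercase table and column names are chosen for this sketch. Each result tuple becomes a `<row>` element and each selected column becomes a child element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Set up a small in-memory clients table to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (number INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [(7369, "Paul", "Smith"), (7000, "Steve", "Adam")],
)

def format_as_xml(conn, sql, params=(), root_tag="clients"):
    """Run a query and encode its result set as an XML string."""
    cursor = conn.execute(sql, params)
    columns = [desc[0] for desc in cursor.description]  # selected column names
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_out = format_as_xml(
    conn,
    "SELECT firstname, lastname FROM clients WHERE number = ?",
    (7369,),
)
print(xml_out)
```

The element names come straight from the query's column list, so changing the SELECT clause changes the shape of the document without touching the formatting code.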
Document Centric Documents • In this view, XML documents are application-relevant objects, i.e. new data objects to be stored and managed by a DBMS • The meaning of the XML document depends on the document as a whole. • Structure is more irregular, and data are heterogeneous • Examples: books, email, advertisements • Unlike data-centric documents, they usually do not originate in the database
Example - Document Centric Documents
<Product>
  <Intro>
    The <ProductName>Turkey Wrench</ProductName> from <Developer>Full Fabrication Labs, Inc.</Developer> is <Summary>like a monkey wrench, but not as big.</Summary>
  </Intro>
  <Description>
    <Para>The turkey wrench, which comes in <i>both right- and left-handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Ready-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
    <Para>You can:</Para>
    <list>
      <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
    </list>
  </Description>
</Product>
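What makes this example document-centric is its mixed content: text and elements are interleaved, and the sentence only makes sense when read as a whole. A small sketch with Python's standard `xml.etree.ElementTree` illustrates the point. It uses only the `<Intro>` fragment and assumes the product-name element is spelled `ProductName`, since XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

intro = ET.fromstring(
    "<Intro>The <ProductName>Turkey Wrench</ProductName> from "
    "<Developer>Full Fabrication Labs, Inc.</Developer> is "
    "<Summary>like a monkey wrench, but not as big.</Summary></Intro>"
)

# Reading the document as a whole: concatenate all text in document order.
sentence = "".join(intro.itertext())

# Reading it as data: only the marked-up fragments survive,
# and the connecting words ("The", "from", "is") are lost.
fields = {child.tag: child.text for child in intro}

print(sentence)
print(fields)
```

A table-style extraction would keep `fields` and drop the surrounding prose, which is why such documents are usually stored whole.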
Document Centric Documents • This type of document requires a DBMS enhanced with new data types for representing XML data • It also requires new capabilities for querying and managing the documents • Two kinds of representation have been devised: • Unstructured representation • Hybrid representation
Document Centric Documents (Unstructured) Unstructured representation: the XML document is stored whole, either as • A single data field inside the DBMS, managed by the DBMS, or • A single data field outside the DBMS but linked to it, in which case the operating system manages it • For unstructured XML documents, DBMSs extend their query languages with XML-based selection conditions
Example - Unstructured
XML document:
<clients>
  <row>
    <number> 7369 </number>
    <firstname> Paul </firstname>
    <lastname> Smith </lastname>
  </row>
  <row>
    <number> 7000 </number>
    <firstname> Steve </firstname>
    <lastname> Adam </lastname>
  </row>
</clients>
Stored in the DBMS as a single field:
Id   XML Document
10   <clients> <row> <number> 7369 </number> <firstname> Paul </firstname> <lastname> Smith </lastname> </row> <row> <number> 7000 </number> <firstname> Steve </firstname> <lastname> Adam </lastname> </row> </clients>
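A rough sketch of the unstructured representation follows: the whole document goes into one text column, and an XML-based selection condition is evaluated over the stored text. This assumes Python's standard `sqlite3` and `xml.etree.ElementTree`; the table `xml_docs` and the helper `ids_matching` are hypothetical, and the filtering happens in application code here, whereas an XML-enabled DBMS would evaluate such a condition inside its query language.

```python
import sqlite3
import xml.etree.ElementTree as ET

CLIENTS_XML = (
    "<clients>"
    "<row><number>7369</number><firstname>Paul</firstname><lastname>Smith</lastname></row>"
    "<row><number>7000</number><firstname>Steve</firstname><lastname>Adam</lastname></row>"
    "</clients>"
)

conn = sqlite3.connect(":memory:")
# The document is kept intact in a single column, keyed by an id.
conn.execute("CREATE TABLE xml_docs (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO xml_docs VALUES (?, ?)", (10, CLIENTS_XML))

def ids_matching(conn, path):
    """Return ids of stored documents in which the ElementTree path matches."""
    hits = []
    for doc_id, text in conn.execute("SELECT id, doc FROM xml_docs ORDER BY id"):
        if ET.fromstring(text).find(path) is not None:
            hits.append(doc_id)
    return hits

smith_docs = ids_matching(conn, "row[lastname='Smith']")
jones_docs = ids_matching(conn, "row[lastname='Jones']")
print(smith_docs, jones_docs)
```

Every query re-parses each stored document, which hints at why this representation preserves structure well but is slow to query without DBMS support.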
Document Centric Documents --Hybrid Hybrid Representation: • A combination of the structured and unstructured types • Useful when mixing types, for example structured information about a book together with unstructured information consisting of the contents or chapters of the book
Commercial Support In Databases Oracle 8i • Has extended architecture with tools to manage XML documents • Supports structured, unstructured and hybrid representation of XML documents • XML-SQL utility supports data extraction and data formatting for data-centric documents • Document-centric data stored using CLOB (character large object)
Commercial Support In Databases IBM DB2 • The XML Extender provides features to store and manage XML documents • Handles structured, unstructured as well as hybrid types • Data centric documents stored in a set of relational tables containing data extracted from XML documents • The Extender supports storage and access methods to compose an XML document from existing data or decompose data from an XML document
Commercial Support In Databases IBM DB2 (contd) • Document-centric documents are stored as XMLClob, XMLVarChar or XMLFile Microsoft SQL Server • Data-centric: the OpenXML function extracts data from an XML document and stores it in a relational database • Extending the Select-From-Where statement with the FOR XML clause provides XML formatting of query results • Permits construction of XDR schemas: schemas that generate views of the database in XML format, which can be queried with XPath
Data-centric and Document-centric • In practice, the distinction between data-centric and document-centric documents is not always clear. • For example, a data-centric document, such as an invoice, might contain irregularly structured data, such as a part description. • An otherwise document-centric document, such as a user's manual, might contain regularly structured data (often metadata), such as an author's name and a revision date.
Document Schema, Database Schema • A schema is a set of rules that defines the structure of a document or database • A database schema describes the overall structure of the database • A document schema describes the exact elements and attributes available within a given markup language, along with the associations between attributes and elements and the relationships between elements • The schema allows XML documents to be validated for accuracy
Document schemas • There are two different approaches for creating schemas for XML documents: Document Type Definition (DTD) and XML Schema Definition (XSD) • A DTD describes vital information about the structure of an XML document, i.e. it lists element types, attributes and their relationships to each other • It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together
Limitations of DTD • Non-XML syntax • No data-type facility • Employs a closed data model, which does not allow much flexibility to extend markup languages
XML Schema • XSDs not only define XML structures but also provide data type capabilities to XML • Coded in XML tags • Supports integrity constraints such as primary and foreign keys • Represents an open-ended data model that allows custom markup languages to be extended and complex relationships between elements to be established
Mapping Document Schemas to Database Schema • Two mappings are commonly used: table-based mapping and object-relational mapping • Data transfer software is built on top of the mapping and either: • Uses an XML query language (such as XPath, XQuery, or a proprietary language), or • Simply transfers data according to the mapping (the XML equivalent of SELECT * FROM Table)
Table Based mapping • Used by many of the middleware products that transfer data between an XML document and a relational database • It models XML documents as a single table or set of tables. That is, the structure of an XML document must be as follows:
<database>
   <table>
      <row>
         <column1>...</column1>
         <column2>...</column2>
         ...
      </row>
      <row>
         ...
      </row>
      ...
   </table>
   <table>
      ...
   </table>
   ...
</database>
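Because the document shape is fixed, a table-based loader needs no per-document mapping: element names at the second level are read as table names and the children of each `<row>` as columns. The sketch below illustrates this under stated assumptions: Python's standard `sqlite3` and `xml.etree.ElementTree`, made-up sample tables `clients` and `invoices`, and a hypothetical function `load_table_based`. Names are interpolated into SQL, so it is only suitable for trusted input.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<database>
  <clients>
    <row><number>7369</number><firstname>Paul</firstname></row>
    <row><number>7000</number><firstname>Steve</firstname></row>
  </clients>
  <invoices>
    <row><id>1</id><client>7369</client></row>
  </invoices>
</database>"""

def load_table_based(xml_text: str, conn: sqlite3.Connection) -> None:
    """Load <database>/<table>/<row>/<column> documents into same-named tables."""
    for table in ET.fromstring(xml_text):
        for row in table.findall("row"):
            columns = [col.tag for col in row]
            values = [col.text for col in row]
            col_list = ", ".join(columns)
            # Table and column names come from the document: trusted input only.
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table.tag} ({col_list})")
            marks = ", ".join("?" for _ in columns)
            conn.execute(
                f"INSERT INTO {table.tag} ({col_list}) VALUES ({marks})", values
            )

conn = sqlite3.connect(":memory:")
load_table_based(DOC, conn)
paul = conn.execute("SELECT firstname FROM clients WHERE number = '7369'").fetchall()
invoice_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(paul, invoice_count)
```

All values arrive as text because the document carries no type information; a real transfer tool would take types from the database schema.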
Table based mapping • Advantages: simplicity, because it matches the structure of tables and result sets in relational databases; mainly useful for transferring data between databases • Disadvantages: it applies to only a limited set of XML documents; it doesn't exploit the ability of XML to represent hierarchies of data; it doesn't preserve physical structure or document information such as the DTD
Object-relational mapping • The object-relational mapping is used by all XML-enabled relational databases and some middleware products • It models the data in the XML document as a tree of objects that are specific to the data in the document • Object-relational mapping is done in two steps: • A document schema (DTD) is mapped to an object schema • The object schema is mapped to a database schema
Object-relational mapping Contd.. • In this model, element types with attributes are generally modeled as classes • The model is then mapped to relational databases using traditional object-relational mapping techniques • That is, classes are mapped to tables, scalar properties are mapped to columns, and object-valued properties are mapped to primary key / foreign key pairs
Object-relational mapping - contd For example, consider the following XML document:
<SalesOrder>
   <Number>1234</Number>
   <Customer>ABC Industries</Customer>
   <Date>29.10.00</Date>
   <Item Number="1">
      <Part>123</Part>
      <Quantity>12</Quantity>
      <Price>10.95</Price>
   </Item>
   <Item Number="2">
      <Part>456</Part>
      <Quantity>600</Quantity>
      <Price>3.99</Price>
   </Item>
</SalesOrder>
Object-relational mapping Contd.. Which maps to the following objects:
object SalesOrder {
   number = 1234;
   customer = "ABC Industries";
   date = 29.10.00;
   items = {ptrs to Item objects};
}
object Item {
   number = 1;
   part = "123";
   quantity = 12;
   price = 10.95;
}
object Item {
   number = 2;
   part = "456";
   quantity = 600;
   price = 3.99;
}
Object-relational mapping Contd.. and then to rows in the following tables:
SaleOrders
----------
Number   Customer         Date
------   --------------   --------
1234     ABC Industries   29.10.00
...      ...              ...
...      ...              ...

Items
-----
SONumber   Item   Part   Quantity   Price
--------   ----   ----   --------   -----
1234       1      123    12         10.95
1234       2      456    600        3.99
...        ...    ...    ...        ...
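The two mapping steps can be sketched end to end. This is a minimal illustration, not the method of any particular product: it assumes Python's standard `dataclasses`, `sqlite3` and `xml.etree.ElementTree`, and the class and function names (`SalesOrder`, `Item`, `order_from_xml`, `save_order`) are chosen for the example. Step one builds objects from the document; step two writes classes to tables, scalar properties to columns, and the order-to-items relationship as a primary key / foreign key pair.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

ORDER_XML = """<SalesOrder>
  <Number>1234</Number><Customer>ABC Industries</Customer><Date>29.10.00</Date>
  <Item Number="1"><Part>123</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>456</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</SalesOrder>"""

@dataclass
class Item:
    number: int
    part: str
    quantity: int
    price: float

@dataclass
class SalesOrder:
    number: int
    customer: str
    date: str
    items: list = field(default_factory=list)

def order_from_xml(xml_text: str) -> SalesOrder:
    """Step 1: map the XML document to a tree of objects."""
    root = ET.fromstring(xml_text)
    order = SalesOrder(int(root.findtext("Number")),
                       root.findtext("Customer"), root.findtext("Date"))
    for el in root.findall("Item"):
        order.items.append(Item(int(el.get("Number")), el.findtext("Part"),
                                int(el.findtext("Quantity")),
                                float(el.findtext("Price"))))
    return order

def save_order(order: SalesOrder, conn: sqlite3.Connection) -> None:
    """Step 2: map the objects to rows; SONumber is the foreign key."""
    conn.execute("CREATE TABLE IF NOT EXISTS SaleOrders "
                 "(Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS Items "
                 "(SONumber INTEGER REFERENCES SaleOrders(Number), "
                 "Item INTEGER, Part TEXT, Quantity INTEGER, Price REAL)")
    conn.execute("INSERT INTO SaleOrders VALUES (?, ?, ?)",
                 (order.number, order.customer, order.date))
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
                     [(order.number, i.number, i.part, i.quantity, i.price)
                      for i in order.items])

conn = sqlite3.connect(":memory:")
order = order_from_xml(ORDER_XML)
save_order(order, conn)
item_rows = conn.execute("SELECT * FROM Items ORDER BY Item").fetchall()
print(item_rows)
```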
Query Languages • Use XSLT, or integrate a limited number of transformations into the mappings • Long term: implement query languages that return XML • Almost all XML query languages (including XQuery 1.0) are read-only, so different means are needed to insert, update, and delete data • In the long term, XQuery is expected to add these capabilities
Template-Based Query Languages • Most of these languages rely on SELECT statements embedded in templates
<?xml version="1.0"?>
<FlightInfo>
   <Introduction>The following flights have available seats:</Introduction>
   <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
   <Flight>
      <Airline>$Airline</Airline>
      <FltNumber>$FltNumber</FltNumber>