XML-An Introduction

blythe

Presentation Transcript


  1. XML-An Introduction • The eXtensible Markup Language (XML) was developed by the World Wide Web Consortium (W3C), starting in 1996 and published as a W3C Recommendation in 1998, to address limitations of HTML • XML is a language similar to HTML, but more extensible • It supports user-defined tags that allow both data and metadata (i.e. data about data) to be stored in a single document • At the same time, presentation aspects remain decoupled from data representation

  2. A Brief History • HTML and XML are like children of the same parent, the Standard Generalized Markup Language (SGML). • SGML became an ISO standard in 1986. • SGML originated at IBM, which wanted a means of publishing document content in different ways. • The result of the standards process was a rich document markup language that allows authors to separate logical content from its presentation. • SGML markup is a series of commands understood by another program.

  3. Why Another Markup Language? • The question to be asked is: what is wrong with SGML or HTML? • SGML is very large, powerful and complex. • SGML has been used in industry, for commercial purposes, for over a decade. • SGML is too complex to program for in a Web environment.

  4. Why Another Markup Language? …contd • HTML can be thought of as a small application of SGML used on the web • HTML defines a very simple class of report-style documents, with section headings, paragraphs, lists, tables, illustrations, etc. • It was the first computer language that could be understood and used by the masses. It gave the Web to the common person. • HTML is said to be static; one can do only limited things with it

  5. Advantages of XML over HTML

  6. XML • XML allows users to: • Bring multiple files together to form compound documents • Identify where illustrations are to be incorporated into text files, and the format used to encode each illustration • Provide processing control information to supporting programs, such as document validators and browsers • Add editorial comments to a file

  7. XML Components • XML is based on the concept of documents composed of a series of entities • Each entity can contain one or more logical elements • Each of these elements can have certain attributes (properties) that describe the way in which it is to be processed

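  8. XML – A Few Important Points • Tag names are case sensitive • Every opening tag must have a corresponding closing tag (or be written as an empty-element tag such as <br/>) • Nested tag pairs must not overlap: an element opened inside another must be closed before its parent is closed • Attribute values must appear within quotes • Every document must have exactly one root element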
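
The rules above can be checked mechanically with any XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; only the first one obeys all of them.
samples = {
    "well-formed": '<note lang="en"><to>Ann</to></note>',
    "case mismatch": "<Note><to>Ann</to></note>",
    "missing closing tag": "<note><to>Ann</note>",
    "overlapping tags": "<note><b><i>hi</b></i></note>",
    "unquoted attribute": "<note lang=en><to>Ann</to></note>",
    "two root elements": "<note/><note/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

A parser rejects a document that breaks any one of these rules, which is why an XML-aware editor or a browser is a quick way to catch such mistakes.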

  9. XML Editor • XML documents are raw text documents • Any simple text editor can be used as an XML editor • For example, Windows users can use Notepad or WordPad • Microsoft's XML editor is XML Notepad • Java-based XML editors are also available

  10. XML Document • <exampledoc> - the root element of the document. • <eq> - a question and its associated answers. • <Question> - a question. • <A> - the first possible answer to a question. • <B> - the second possible answer to a question. • <C> - the third possible answer to a question.

  11. XML Document (contd) • <?xml version="1.0"?> • <exampledoc> • <eq answer="a"> • <Question> • In 1994, a man had an accident while robbing a pizza restaurant in Akron, Ohio, that resulted in his arrest. What happened to him? • </Question>

  12. XML Document (contd) • <A>he slipped on a patch of grease on the floor and knocked himself out.</A> • <B>he backed into a police car while attempting to drive off.</B> • <C>he choked on a breadstick that he had grabbed as he was running out.</C> • </eq> • </exampledoc>

  13. Viewing XML Document • A style sheet is the best way to view an XML document. • A style sheet is a series of formatting descriptions that determine how elements are displayed on a web page. • In simple English, a style sheet controls how a web page's content looks in a web browser.

  14. A CSS for the exampledoc XML Document • CSS for the eq tag: • eq { • display: block; • width: 750px; • padding: 10px; • margin-bottom: 10px; • border: 4px double black; • background-color: silver; • }

  15. Style Sheets (contd) • In the absence of a style sheet, Internet Explorer or any other browser just displays the XML code • To attach the style sheet to the document, add the following line of code just after the XML declaration for the document: <?xml-stylesheet type="text/css" href="exampledoc.css"?>

  16. Is XML a Database? • XML and its surrounding technologies constitute a "database" in the looser sense of the term, i.e. a database management system (DBMS). • XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.), programming interfaces (SAX, DOM, JDOM), etc.

  17. Is XML a Database? Contd.. • But it lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, queries across multiple documents • XML documents can serve as a database in environments with small amounts of data, few users, and modest performance requirements. • The approach fails in an environment with many users, strict data integrity requirements, and the need for good performance.

  18. XML And Databases • XML's proliferation raises a question: how is data transferred by XML documents to be read, stored and queried? • In other words, how do DBMSs handle XML documents? • There are two ways to look at XML documents: data-centric and document-centric.

  19. Data Centric Documents • Data-Centric documents use XML as a data transport • Such documents usually are found in business-to-business applications • Examples: Buyer-supplier trading automation, Sales orders, Flight Schedules, Scientific data • Data-centric documents have a regular structure • Data originates both in the database (in which case we want to expose it as XML) and outside the database (in which case we want to store it in a database)

  20. Example - Data Centric Document
      <Employees>
        <Employee JobCode="A1">
          <Dept No="1"/>
          <EmpNo>1234</EmpNo>
          <FirstName>John</FirstName>
          <LastName>Doe</LastName>
          <HireDate>1998-02-11</HireDate>
        </Employee>
        <Employee JobCode="B3">
          <Dept No="2"/>
          <EmpNo>5678</EmpNo>
          <FirstName>Joy</FirstName>
          <LastName>Black</LastName>
          <HireDate>1998-03-09</HireDate>
        </Employee>
      </Employees>

  21. Data Centric Documents • To manage data-centric documents, both data extraction and data formatting services are needed • Data extraction: receive XML documents from a network and extract structured data from them, to be stored in a DBMS • To support data extraction, a mapping must be defined between XML documents and the DBMS data model • The extracted data is stored in a table and follows a predefined schema (which is why this is called a structured representation) • The original XML document's structure is not maintained in this case

  22. Example: Data Extraction
      <clients>
        <row>
          <number> 7369 </number>
          <firstname> Paul </firstname>
          <lastname> Smith </lastname>
        </row>
        <row>
          <number> 7000 </number>
          <firstname> Steve </firstname>
          <lastname> Adam </lastname>
        </row>
      </clients>
      is extracted into the table:
      Number  First Name  Last Name
      ------  ----------  ---------
      7369    Paul        Smith
      7000    Steve       Adam
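
The extraction step in this example can be sketched in a few lines. This is only an illustration, assuming Python's standard `xml.etree.ElementTree` and `sqlite3` modules in place of a real XML-enabled DBMS; the function name `extract_clients` and the column types are assumptions, not part of the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

CLIENTS_XML = """
<clients>
  <row>
    <number> 7369 </number>
    <firstname> Paul </firstname>
    <lastname> Smith </lastname>
  </row>
  <row>
    <number> 7000 </number>
    <firstname> Steve </firstname>
    <lastname> Adam </lastname>
  </row>
</clients>
"""


def extract_clients(xml_text, conn):
    """Map each <row> element to one row of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients "
        "(number INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (
            int(row.findtext("number")),       # int() ignores the padding
            row.findtext("firstname").strip(),
            row.findtext("lastname").strip(),
        )
        for row in root.findall("row")
    ]
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = extract_clients(CLIENTS_XML, conn)
table = conn.execute(
    "SELECT number, firstname, lastname FROM clients ORDER BY number"
).fetchall()
print(table)
```

Once the rows are inserted, the nesting and element order of the original document are gone, which is the loss of structure noted for the structured representation.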

  23. Data Centric Documents • Data formatting: XML encoding software takes the result of a query expressed in a DBMS query language and encodes the resulting data in an XML document to be transferred over the network. • Data formatting is, in effect, the reverse of data extraction • After a set of tuples is selected from the database with a database query, data formatting services transform it into an XML document

  24. Data Formatting - Data centric document
      Query against table Clients:
      SELECT FirstName, LastName FROM Clients WHERE number = 7369
      Resulting XML document:
      <clients>
        <row>
          <firstname> Paul </firstname>
          <lastname> Smith </lastname>
        </row>
      </clients>
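
Data formatting runs in the opposite direction: a query result is wrapped in elements. The sketch below assumes Python's `sqlite3` and `xml.etree.ElementTree`; the helper `format_as_xml` and its convention of lower-casing column names into element names are hypothetical choices made to match the slide's output.

```python
import sqlite3
import xml.etree.ElementTree as ET


def format_as_xml(cursor, root_tag="clients", row_tag="row"):
    """Encode every tuple of a query result as one <row> element."""
    root = ET.Element(root_tag)
    # cursor.description holds one entry per selected column; item 0 is its name.
    columns = [description[0].lower() for description in cursor.description]
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (number INTEGER, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?)",
    [(7369, "Paul", "Smith"), (7000, "Steve", "Adam")],
)

cursor = conn.execute(
    "SELECT FirstName, LastName FROM Clients WHERE number = ?", (7369,)
)
xml_text = format_as_xml(cursor)
print(xml_text)
```

Building the document through an element API rather than string concatenation means special characters in the data are escaped automatically.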

  25. Document Centric Documents • In this view, XML documents are application-relevant objects, i.e. new data objects to be stored and managed by a DBMS • The meaning of the XML document depends on the document as a whole. • Structure is more irregular, and data are heterogeneous • Examples: books, email, advertisements • Unlike data-centric documents, they usually do not originate in the database

  26. Document Centric Documents • Document centric documents are application-relevant objects • The meaning of the XML document depends on the document as a whole. • Structure is more irregular, and data are heterogeneous • Unlike data-centric documents, they usually do not originate in the database

  27. Example - Document Centric Documents
      <Product>
        <Intro>The <ProductName>Turkey Wrench</ProductName> from <Developer>Full Fabrication Labs, Inc.</Developer> is <Summary>like a monkey wrench, but not as big.</Summary></Intro>
        <Description>
          <Para>The turkey wrench, which comes in <i>both right- and left-handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Ready-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials.</Para>
          <Para>You can:</Para>
          <list>
            <Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
          </list>
        </Description>
      </Product>

  28. Document Centric Documents • This type of document requires a DBMS enhanced with new data types for representing XML data • It also requires new capabilities for querying and managing the documents • Two representations have been devised: • Unstructured representation • Hybrid representation

  29. Document Centric Documents (Unstructured) Unstructured representation: the whole document is stored as either • A single data field inside the DBMS, managed by the DBMS • A single data field outside the DBMS but linked to it, in which case the operating system manages it • For unstructured XML documents, DBMSs extend their query languages with XML-based selection conditions

  30. Example - Unstructured
      The XML document
      <clients>
        <row>
          <number> 7369 </number>
          <firstname> Paul </firstname>
          <lastname> Smith </lastname>
        </row>
        <row>
          <number> 7000 </number>
          <firstname> Steve </firstname>
          <lastname> Adam </lastname>
        </row>
      </clients>
      is stored whole in a single field of one table row:
      Id  XML Document
      --  ------------
      10  <clients> <row> <number> 7369 </number> <firstname> Paul </firstname> <lastname> Smith </lastname> </row> <row> <number> 7000 </number> <firstname> Steve </firstname> <lastname> Adam </lastname> </row> </clients>
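
A rough sketch of the unstructured representation follows, assuming SQLite's TEXT column as a stand-in for a CLOB. SQLite has no XML-aware query operators, so the "XML-based selection condition" is emulated in application code with ElementTree's limited XPath support; the table name `documents` and the helper `ids_matching` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = (
    "<clients>"
    "<row><number>7369</number><firstname>Paul</firstname><lastname>Smith</lastname></row>"
    "<row><number>7000</number><firstname>Steve</firstname><lastname>Adam</lastname></row>"
    "</clients>"
)

conn = sqlite3.connect(":memory:")
# The whole document lives in one field; the DBMS knows nothing of its structure.
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, xml_document TEXT)")
conn.execute("INSERT INTO documents VALUES (?, ?)", (10, DOC))


def ids_matching(conn, path):
    """Return ids of stored documents containing an element that matches `path`."""
    matches = []
    for doc_id, text in conn.execute("SELECT id, xml_document FROM documents"):
        if ET.fromstring(text).find(path) is not None:
            matches.append(doc_id)
    return matches


stored = conn.execute("SELECT xml_document FROM documents WHERE id = 10").fetchone()[0]
hits = ids_matching(conn, "row[lastname='Smith']")
misses = ids_matching(conn, "row[lastname='Jones']")
print(hits, misses)
```

The document comes back byte for byte as it was stored, but every selection has to parse each candidate document, which is the cost a DBMS with native XML conditions is meant to avoid.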

  31. Document Centric Documents --Hybrid Hybrid representation: • A combination of the structured and unstructured types. • Useful when mixing types, such as structured information about a book together with unstructured information consisting of the contents or chapters of the book.

  32. Example -- Hybrid

  33. Commercial Support In Databases: Oracle 8i • Has an extended architecture with tools to manage XML documents • Supports structured, unstructured and hybrid representations of XML documents • The XML-SQL utility supports data extraction and data formatting for data-centric documents • Document-centric data is stored using CLOB (character large object)

  34. Commercial Support In Databases: IBM DB2 • The XML Extender provides features to store and manage XML documents • Handles structured, unstructured as well as hybrid types • Data-centric documents are stored in a set of relational tables containing data extracted from XML documents • The Extender supports storage and access methods to compose an XML document from existing data or decompose data from an XML document

  35. Commercial Support In Databases • IBM DB2 (contd): document-centric documents are stored as XMLClob, XMLVarChar or XMLFile • Microsoft SQL Server • Data-centric: the OpenXML function extracts data from an XML document and stores it in a relational database • Extending the Select-From-Where statement with the FOR XML clause formats query results as XML • Permits construction of XDR schemas: schemas that generate views of the database in XML format, which can be queried with XPath.

  36. Data-centric and Document-centric • In practice, the distinction between data-centric and document-centric documents is not always clear. • For example, a data-centric document, such as an invoice, might contain irregularly structured data, such as a part description. • An otherwise document-centric document, such as a user's manual, might contain regularly structured data (often metadata), such as an author's name and a revision date.

  37. Document Schema, Database Schema • A schema is a set of rules that defines the structure of a document or database • A database schema describes the overall structure of the database. • A document schema describes the exact elements and attributes available within a given markup language, along with the associations between attributes and elements and the relationships between elements • The schema allows XML documents to be validated for accuracy

  38. Document schemas • There are two different approaches for creating schemas for XML documents: • Document Type Definition (DTD) • XML Schema Definition (XSD) • A DTD describes vital information about the structure of an XML document, i.e. it lists element types, attributes and their relationships to each other • It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together

  39. Limitations of DTD • Non-XML syntax • No data-type facility • Employs a closed data model, which does not allow much flexibility to extend markup languages

  40. XML Schema • XSDs are significant not only in defining XML structures but also in providing data type capabilities to XML • Coded in XML tags • Supports integrity constraints such as primary and foreign keys • Represents an open-ended data model that allows custom markup languages to be extended and complex relationships between elements to be established

  41. Mapping Document Schemas to Database Schema • Two mappings are commonly used: table-based mapping and object-relational mapping • The data transfer software is built on top of this mapping. • To move data, either use an XML query language (such as XPath, XQuery, or a proprietary language) • or simply transfer data according to the mapping (the XML equivalent of SELECT * FROM Table).

  42. Table Based mapping • Used by many of the middleware products that transfer data between an XML document and a relational database • It models XML documents as a single table or set of tables. That is, the structure of an XML document must be as follows:
      <database>
        <table>
          <row>
            <column1>...</column1>
            <column2>...</column2>
            ...
          </row>
          <row>
            ...
          </row>
          ...
        </table>
        <table>
          ...
        </table>
      </database>
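
Because the document shape is fixed, a table-based loader needs no knowledge of the data itself. The sketch below assumes one convention the slide leaves open, namely that each child of the root element is a table whose tag is the table name; the sample data and the function `load_tables` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed convention: root = database, child = table, grandchild = row,
# great-grandchild = column. Table and column names are made up.
DATABASE_XML = """
<database>
  <Clients>
    <row><number>7369</number><firstname>Paul</firstname></row>
    <row><number>7000</number><firstname>Steve</firstname></row>
  </Clients>
  <Orders>
    <row><id>1</id><client>7369</client></row>
  </Orders>
</database>
"""


def load_tables(xml_text):
    """Return {table name: [row dict, ...]} for a table-shaped document."""
    tables = {}
    for table in ET.fromstring(xml_text):
        rows = tables.setdefault(table.tag, [])
        for row in table:
            rows.append({column.tag: column.text for column in row})
    return tables


tables = load_tables(DATABASE_XML)
for name, rows in tables.items():
    print(name, rows)
```

The loader walks exactly three levels below the root, so any document with deeper nesting or mixed content falls outside what this mapping can represent.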

  43. Table based mapping • Advantages: • It is simple, because it matches the structure of tables and result sets in relational databases • It is mainly useful for transferring data between databases • Disadvantages: • It applies to only a limited set of XML documents • It doesn't exploit the ability of XML to represent hierarchies of data • It doesn't preserve physical structure, i.e. the DTD

  44. Object-relational mapping • The object-relational mapping is used by all XML-enabled relational databases and some middleware products. • It models the data in the XML document as a tree of objects that are specific to the data in the document. • Object-relational mapping is done in two steps: • A document schema (DTD) is mapped to an object schema • The object schema is mapped to a database schema

  45. Object-relational mapping Contd.. • In this model, element types with attributes are generally modeled as classes. • The model is then mapped to relational databases using traditional object-relational mapping techniques • i.e. classes are mapped to tables, scalar properties are mapped to columns, and object-valued properties are mapped to primary key / foreign key pairs

  46. Object-relational mapping - contd For example, consider the following XML document:
      <SalesOrder>
        <Number>1234</Number>
        <Customer>ABC Industries</Customer>
        <Date>29.10.00</Date>
        <Item Number="1">
          <Part>123</Part>
          <Quantity>12</Quantity>
          <Price>10.95</Price>
        </Item>
        <Item Number="2">
          <Part>456</Part>
          <Quantity>600</Quantity>
          <Price>3.99</Price>
        </Item>
      </SalesOrder>

  47. Object-relational mapping Contd.. Which maps to the following objects:
      Object SalesOrder {
        number = 1234;
        Customer = "ABC Industries";
        orderdate = 29.10.00;
        Items = {ptrs to Item objects};
      }
      Object Item {
        Number = 1;
        Part = "123";
        Quantity = 12;
        Price = 10.95;
      }
      Object Item {
        Number = 2;
        Part = "456";
        Quantity = 600;
        Price = 3.99;
      }

  48. Object-relational mapping Contd.. and then to rows in the following tables:
      SalesOrders
      -----------
      Number  Customer        Date
      ------  --------------  --------
      1234    ABC Industries  29.10.00
      ...     ...             ...

      Items
      -----
      SONumber  Item  Part  Quantity  Price
      --------  ----  ----  --------  -----
      1234      1     123   12        10.95
      1234      2     456   600       3.99
      ...       ...   ...   ...       ...
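
The two mapping steps can be traced end to end in a short sketch. It assumes Python dataclasses for the object schema and SQLite for the relational side; the class, table and function names are illustrative, and a real XML-enabled database would generate this mapping from the DTD rather than have it hand-written.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

SALES_ORDER_XML = """
<SalesOrder>
  <Number>1234</Number>
  <Customer>ABC Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1"><Part>123</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>456</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</SalesOrder>
"""


@dataclass
class Item:
    number: int
    part: str
    quantity: int
    price: float


@dataclass
class SalesOrder:
    number: int
    customer: str
    date: str
    items: list = field(default_factory=list)  # object-valued property


def parse_order(xml_text):
    """Step 1: XML document -> tree of objects."""
    root = ET.fromstring(xml_text)
    order = SalesOrder(
        int(root.findtext("Number")),
        root.findtext("Customer"),
        root.findtext("Date"),
    )
    for element in root.findall("Item"):
        order.items.append(Item(
            int(element.get("Number")),
            element.findtext("Part"),
            int(element.findtext("Quantity")),
            float(element.findtext("Price")),
        ))
    return order


def store_order(order, conn):
    """Step 2: classes -> tables, scalar properties -> columns, items -> foreign key."""
    conn.execute("CREATE TABLE SalesOrders (Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)")
    conn.execute(
        "CREATE TABLE Items (SONumber INTEGER REFERENCES SalesOrders(Number), "
        "Item INTEGER, Part TEXT, Quantity INTEGER, Price REAL)"
    )
    conn.execute("INSERT INTO SalesOrders VALUES (?, ?, ?)",
                 (order.number, order.customer, order.date))
    conn.executemany(
        "INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
        [(order.number, i.number, i.part, i.quantity, i.price) for i in order.items],
    )


conn = sqlite3.connect(":memory:")
order = parse_order(SALES_ORDER_XML)
store_order(order, conn)
order_rows = conn.execute("SELECT * FROM SalesOrders").fetchall()
item_rows = conn.execute("SELECT * FROM Items ORDER BY Item").fetchall()
print(order_rows)
print(item_rows)
```

The `SONumber` column carries the order's primary key into each item row, which is how the parent-child nesting of the XML survives in flat tables.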

  49. Query Languages • Use XSLT, or integrate a limited number of transformations into the mappings • Long term: implementation of query languages that return XML • Almost all XML query languages (including XQuery 1.0) are read-only, so different means are needed to insert, update, and delete data • In the long term, XQuery will add these capabilities

  50. Template-Based Query Languages • Most of these languages rely on SELECT statements embedded in templates
      <?xml version="1.0"?>
      <FlightInfo>
        <Introduction>The following flights have available seats:</Introduction>
        <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
        <Flight>
          <Airline>$Airline</Airline>
          <FltNumber>$FltNumber</FltNumber>
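
The template on this slide is cut off, so the following is only a guess at how such a template might be processed: run the embedded SELECT, then repeat a row fragment once per result tuple, replacing each `$Column` placeholder with that column's value. It uses Python's `string.Template`; the shortened row template, the sample flight data and the function `render` are all assumptions.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

QUERY = "SELECT Airline, FltNumber, Depart, Arrive FROM Flights"

# Assumed row fragment: only the two placeholders visible on the slide.
ROW_TEMPLATE = Template(
    "<Flight><Airline>$Airline</Airline><FltNumber>$FltNumber</FltNumber></Flight>"
)


def render(conn):
    """Run the embedded query and expand the row template once per tuple."""
    cursor = conn.execute(QUERY)
    columns = [description[0] for description in cursor.description]
    parts = ["<FlightInfo>"]
    for record in cursor:
        # Escape values so characters such as & or < cannot break the output.
        values = {c: escape(str(v)) for c, v in zip(columns, record)}
        parts.append(ROW_TEMPLATE.substitute(values))
    parts.append("</FlightInfo>")
    return "".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
# Made-up sample row for illustration.
conn.execute("INSERT INTO Flights VALUES ('Example Air', 101, '09:00', '11:30')")
rendered = render(conn)
print(rendered)
```

No mapping between document and database schema is declared here; the template itself fixes the shape of the output, which is what distinguishes this approach from the model-based mappings.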
