Enterprise Database Systems: XML (eXtensible Markup Language)

This educational collaboration explores structured, semistructured, and unstructured data handling using XML documents. Learn about database systems, data models, and web languages in depth.

rbenson

Presentation Transcript


  1. Enterprise Database Systems: XML (eXtensible Markup Language) • Technological Educational Institution of Larissa in collaboration with Staffordshire University • Larissa 2006 • Dr. Georgia Garani, garani@teilar.gr • Dr. Theodoros Mitakos, teo_ms@yahoo.com

  2. Agenda • Structured, semistructured, unstructured data • XML Data Model • XML Documents, DTD, XML SCHEMA • XML and Databases

  3. Introduction • Internet architectures (two-tier, three-tier) • Monolithic: presentation logic, business logic and data processing together • Two-tier: the client handles presentation logic and business logic; the server handles data processing • Three-tier: a thin client handles presentation logic, an application server handles business logic, and a data server handles data processing

  4. Hyperlink documents - Web languages - Tag languages • HTML (HyperText Markup Language) • Formatting and structuring web documents • XML • Structuring and exchanging data over the web (structure and meaning) • Formatting aspects are defined separately by XSL (Extensible Stylesheet Language)

  5. Structured data • Data that have a strict format, e.g. data stored in a relational database table (all records in a table have the same format). • We design the schema, and the DBMS checks that all data follow the structures and constraints specified in the schema.
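
To make the idea of a DBMS-enforced schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and its columns are hypothetical, modeled on the project data used later in these slides. SQLite enforces NOT NULL and key constraints but is lenient about column types, so the example relies only on a NOT NULL constraint.

```python
import sqlite3

# In-memory database; the CREATE TABLE statement is the schema the DBMS enforces.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE project (
        number   INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        location TEXT NOT NULL
    )
    """
)

# A record that follows the schema is accepted.
conn.execute("INSERT INTO project VALUES (1, 'Product x', 'bellaire')")

# A record that violates a constraint (no name) is rejected by the DBMS itself.
try:
    conn.execute("INSERT INTO project (number, location) VALUES (2, 'bellaire')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT number, name, location FROM project").fetchall()
print(rejected, rows)  # True [(1, 'Product x', 'bellaire')]
```

Every stored row therefore has the same shape, which is what makes the data "structured".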

  6. Semistructured data • In some applications, data is collected before it is known how it will be stored and managed. This data may have a structure, but not all the information collected will have an identical structure. E.g. some attributes may be shared among the various entities, but other attributes may exist only in a few entities. Moreover, additional attributes can be introduced in some of the newer data items at any time, and there is no predefined schema. This type of data is known as semistructured data.

  7. Difference between structured and semistructured data • In semistructured data, the schema information is mixed in with the data values, since each data object can have different attributes that are not known in advance. This type of data is called self-describing data.

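  8. Semistructured data as a directed graph [Figure: a labeled directed graph rooted at projects, with project edges; each project node has name, number, location and worker edges (Product x, 1, bellaire), and each worker node has ssn, name and hours edges (123, john, 30.5 and 567, mary, 25)]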

  9. The schema information in the semistructured model is intermixed with the objects and their data values in the same data structure. • In the semistructured model there is no requirement for a predefined schema to which the data objects must conform
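
The directed graph of slide 8 can be sketched in Python as nested lists of labeled edges, so that the labels (the schema information) are stored alongside the values, as slide 9 describes. The first project reproduces the data of slide 8; the second project and its budget edge are hypothetical, added only to show a data object with different attributes. The `follow` helper is likewise an illustrative name, not a standard API.

```python
# Each node is either a leaf value or a list of (label, child) edges,
# so the edge labels (the "schema") travel together with the data values.
projects = [
    ("project", [
        ("name", "Product x"),
        ("number", 1),
        ("location", "bellaire"),
        ("worker", [("ssn", 123), ("name", "john"), ("hours", 30.5)]),
        ("worker", [("ssn", 567), ("name", "mary"), ("hours", 25)]),
    ]),
    # Hypothetical second project: no location edge and an extra "budget"
    # edge, which needs no schema change in a semistructured model.
    ("project", [("name", "Product y"), ("number", 2), ("budget", 5000)]),
]


def follow(node, path):
    """Return every value reached from `node` by following the edge labels in `path`."""
    if not path:
        return [node]
    if not isinstance(node, list):  # a leaf value has no outgoing edges
        return []
    results = []
    for label, child in node:
        if label == path[0]:
            results.extend(follow(child, path[1:]))
    return results


print(follow(projects, ["project", "worker", "name"]))  # ['john', 'mary']
print(follow(projects, ["project", "location"]))        # ['bellaire']
print(follow(projects, ["project", "budget"]))          # [5000]
```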

  10. Unstructured data • In this category of data there is a very limited indication of the type of the data. • E.g. a text document that contains information embedded within it.

  11. <body lang=EN-US style='tab-interval:36.0pt'>
        <tr style='mso-yfti-irow:3;mso-yfti-lastrow:yes'>
          <td width=197 valign=top style='width:147.6pt;border:solid windowtext 1.0pt;
          border-top:none;mso-border-top-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt;
          padding:0cm 5.4pt 0cm 5.4pt'>
            <p class=MsoNormal>3</p>
          </td>
          <td width=197 valign=top style='width:147.6pt;border-top:none;border-left:
          none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt;
          mso-border-top-alt:solid windowtext .5pt;mso-border-left-alt:solid windowtext .5pt;
          mso-border-alt:solid windowtext .5pt;padding:0cm 5.4pt 0cm 5.4pt'>
            <p class=MsoNormal>Kate</p>
          </td>
        </tr>
        </table>
        <p class=MsoNormal><o:p>&nbsp;</o:p></p>
        <p class=MsoNormal><o:p>&nbsp;</o:p></p>
        <p class=MsoNormal><b><span style='color:red'>SEMESTER A<o:p></o:p></span></b></p>
        </div>
      </body>
      </html>

  12. HTML • Web pages written in HTML are considered unstructured data. • Text that appears between angle brackets, <…>, is an HTML tag. • A tag with a slash, </…>, is an end tag, which ends the effect of a matching start tag. • The tags mark up the document in order to instruct an HTML processor how to display the text between a start tag and a matching end tag. • HTML has a very large number of predefined tags, but HTML documents are very difficult for computer programs to interpret automatically because they do not include schema information about the type of data in the documents.

  13. Example tags • <html> … </html> • <head> … </head> • <body> … </body> • Attributes describe additional properties of the tag:
      <tr>
        <td width="100%" height="28" colspan="3">
          <p align="center">&nbsp;</td>
      </tr>
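
To illustrate why slide 12 calls HTML unstructured, the sketch below runs Python's standard html.parser over a simplified version of the table row from slide 11. The parser recovers the tags (table, tr, td, p), their layout attributes, and the text 3 and Kate, but nothing in the markup says what 3 or Kate mean. The TagCollector class is a hypothetical helper.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag (with its attributes) and the non-blank text between tags."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# Simplified from the Word-generated HTML on slide 11 (styling attributes dropped).
snippet = """
<table><tr>
  <td width=197 valign=top><p class=MsoNormal>3</p></td>
  <td width=197 valign=top><p class=MsoNormal>Kate</p></td>
</tr></table>
"""

parser = TagCollector()
parser.feed(snippet)
print([tag for tag, _ in parser.tags])  # ['table', 'tr', 'td', 'p', 'td', 'p']
print(parser.text)                      # ['3', 'Kate']
```

The only "structure" a program can extract is layout; whether 3 is an identifier and Kate a name must be guessed.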

  14. Example
      <projects>
        <project>
          <Name>Product x</Name>
          <Number>1</Number>
          <Location>bellaire</Location>
          <DeptNo>5</DeptNo>
          <Worker>
            <SSN>123</SSN>
            <Name>john</Name>
            <hours>30.5</hours>
          </Worker>
          <Worker>
            <SSN>567</SSN>
            <Name>mary</Name>
            <hours>25</hours>
          </Worker>
        </project>
        <project> . . . </project>
        . . .
      </projects>

  15. XML tree data model • Elements and attributes • As in HTML, elements are identified in a document by their start tag and end tag. Tag names are enclosed in angle brackets, <…>, and end tags are identified by a slash, </…>. Complex elements are constructed hierarchically from other elements, whereas simple elements contain data values. A major difference between XML and HTML is that XML tag names are defined to describe the meaning of the data elements in the document rather than how the text is to be displayed. • An XML document can be represented as a tree structure. • XML attributes are used to describe properties and characteristics of the elements within which they appear.
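
The tree view can be sketched with Python's xml.etree.ElementTree: the projects document of slide 14 (first project only, with each end tag in the same case as its start tag) is parsed into a tree, printed with complex elements as branches and simple elements as leaves, and then queried by tag name. The `show` helper is an illustrative name.

```python
import xml.etree.ElementTree as ET

doc = """
<projects>
  <project>
    <Name>Product x</Name>
    <Number>1</Number>
    <Location>bellaire</Location>
    <DeptNo>5</DeptNo>
    <Worker><SSN>123</SSN><Name>john</Name><hours>30.5</hours></Worker>
    <Worker><SSN>567</SSN><Name>mary</Name><hours>25</hours></Worker>
  </project>
</projects>
"""

root = ET.fromstring(doc)


def show(element, depth=0):
    """Print the tree: complex elements as branches, simple elements with their value."""
    indent = "  " * depth
    if len(element):  # has child elements -> complex element
        print(indent + element.tag)
        for child in element:
            show(child, depth + 1)
    else:  # no children -> simple element holding a data value
        print(f"{indent}{element.tag} = {(element.text or '').strip()}")


show(root)
names = [w.findtext("Name") for w in root.iter("Worker")]
print(names)  # ['john', 'mary']
```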

  16. Types of XML documents • Data-centric XML documents: these documents have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange or display them over the web. • Document-centric XML documents: these are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents. • Hybrid XML documents: these may have parts that contain structured data and other parts that are predominantly textual or unstructured.

  17. An XML DTD
      <!DOCTYPE projects [
        <!ELEMENT projects (project+)>
        <!ELEMENT project (Name, Number, Location, DeptNo?, Workers)>
        <!ELEMENT Name (#PCDATA)>
        <!ELEMENT Number (#PCDATA)>
        <!ELEMENT Location (#PCDATA)>
        <!ELEMENT DeptNo (#PCDATA)>
        <!ELEMENT Workers (Worker*)>
        <!ELEMENT Worker (SSN, Name, hours)>
        <!ELEMENT SSN (#PCDATA)>
        <!ELEMENT hours (#PCDATA)>
      ]>
      (Name is declared only once: a DTD may not declare the same element type twice, so this single declaration serves both project and Worker.)

  18. DTD • If an XML document conforms to a predefined XML schema or DTD then the document can be considered as structured data • XML documents that do not conform to any schema are considered as semistructured data. These are called schemaless XML documents.

  19. Well-formed XML documents • A well-formed document must be syntactically correct: it must follow the syntactic guidelines of the tree model. • There must be a single root element, and every element must include a matching pair of start and end tags within the start and end tags of the parent element. • A standard set of API functions called DOM (Document Object Model) allows programs to manipulate the resulting tree representation corresponding to a well-formed XML document. The whole document must be parsed beforehand when using DOM. Another API, SAX (Simple API for XML), allows XML documents to be processed on the fly by notifying the processing program whenever a start or end tag is encountered.
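
The sketch below, using only Python's standard library, shows the three ideas on this slide: a well-formedness check (the second string is rejected because XML tag names are case-sensitive, so <Deptno> is not closed by </DeptNo>), DOM-style access through xml.dom.minidom, and SAX-style callbacks through xml.sax. The helper and class names are illustrative.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

good = "<projects><project><Name>Product x</Name><DeptNo>5</DeptNo></project></projects>"
bad = "<projects><project><Name>Product x</Name><Deptno>5</DeptNo></project></projects>"


def is_well_formed(text):
    """Return True if the text parses as XML (one root, properly nested, matching tags)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# DOM: the whole document is parsed into an in-memory tree first, then navigated.
dom = xml.dom.minidom.parseString(good)
dept = dom.getElementsByTagName("DeptNo")[0].firstChild.data


# SAX: the parser calls back as it meets each start or end tag; no tree is built.
class TagLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TagLogger()
xml.sax.parseString(good.encode("utf-8"), handler)

print(is_well_formed(good), is_well_formed(bad))  # True False
print(dept)                                       # 5
print(handler.events[:3])  # [('start', 'projects'), ('start', 'project'), ('start', 'Name')]
```

DOM suits random access to a document that fits in memory; SAX suits large documents read once from start to end.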

  20. Notation • A * following the element name means that the element can be repeated zero or more times in the document. • A + following the element name means that the element can be repeated one or more times in the document. • A ? following the element name means that the element can appear zero or one times (it is optional). • An element appearing without any of the preceding three symbols must appear exactly once in the document. • The type of the element is specified via parentheses following the element. If the parentheses include names of other elements, these latter elements are the children of the element in the tree structure. If the parentheses include the keyword #PCDATA or one of the other data types available in XML DTD, the element is a leaf node. • A bar symbol (e1 | e2) specifies that either e1 or e2 can appear in the document.
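
Because *, +, ? and the bar symbol behave like regular-expression operators, a content model can be checked against an element's list of children by translating it into a regular expression. The following is a simplified sketch, not a DTD validator: the function names are hypothetical, and it ignores #PCDATA, mixed content, EMPTY and ANY.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def content_model_regex(model):
    """Translate a DTD element-content model such as "(Name, Number, DeptNo?)"
    into a regular expression over a sequence of child names written as <Name>."""
    compact = re.sub(r"\s+", "", model)
    # Wrap each element name so that it matches exactly one child tag.
    wrapped = NAME.sub(lambda m: "(?:<" + m.group(0) + ">)", compact)
    # ',' (sequence) becomes plain concatenation; '|', '*', '+', '?' and
    # parentheses already mean the same thing in a regular expression.
    return re.compile(wrapped.replace(",", ""))


def children_match(model, children):
    """Check whether a list of child element names conforms to the content model."""
    sequence = "".join("<" + name + ">" for name in children)
    return content_model_regex(model).fullmatch(sequence) is not None


project_model = "(Name, Number, Location, DeptNo?, Workers)"
print(children_match(project_model, ["Name", "Number", "Location", "Workers"]))  # True
print(children_match(project_model, ["Name", "Number", "Workers"]))              # False
print(children_match("(project+)", []))                                          # False
```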

  21. DTD limitations • The data types available in a DTD are not very general. • A DTD has its own special syntax and thus requires specialized processors. • Elements are always forced to follow the ordering specified in the DTD, so unordered elements are not permitted.

  22. XML Schema • The XML schema language is a standard for specifying the structure of XML documents. It uses the same syntax rules as regular XML documents, so the same processors can be used on both. • XML instance document (or simply XML document): a document that contains the data. • XML schema document: a document that specifies the structure of XML instance documents.

  23. Definitions • Schema descriptions and XML namespaces: it is necessary to identify the specific set of XML schema language elements being used by referring to a web location (a URI), e.g. http://www.w3.org/2001/XMLSchema. This definition is called an XML namespace. • Annotations, documentation and language used: the elements xsd:annotation and xsd:documentation are used for providing comments and other descriptions in the XML schema document. The xml:lang attribute specifies the language being used.
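
Since a schema is itself an XML document, an ordinary XML parser can read it. The sketch below parses a small, hypothetical schema for a project element with Python's xml.etree.ElementTree, using the XMLSchema namespace from this slide, and reads its xsd:documentation text language (xml:lang) and declared elements. Note that minOccurs="0" plays the role of the DTD's ? indicator.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

# A hypothetical, minimal schema for a <project> element.
schema_text = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">Projects and their workers</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="project">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="Number" type="xsd:integer"/>
        <xsd:element name="Location" type="xsd:string"/>
        <xsd:element name="DeptNo" type="xsd:integer" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

# The same XML parser that reads instance documents reads the schema document.
schema = ET.fromstring(schema_text)
ns = {"xsd": XSD_NS}

doc_el = schema.find("xsd:annotation/xsd:documentation", ns)
language = doc_el.get("{%s}lang" % XML_NS)
declared = [(e.get("name"), e.get("type")) for e in schema.iter("{%s}element" % XSD_NS)]

print(schema.tag)    # {http://www.w3.org/2001/XMLSchema}schema
print(language)      # en
print(declared[:2])  # [('project', None), ('Name', 'xsd:string')]
```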

  24. Storing XML documents • Using a DBMS to store the documents as text: a relational or object DBMS can store whole XML documents as text fields within the DBMS records or objects. This approach can be used if the DBMS has a special module for document processing, and would work for storing schemaless and document-centric XML documents. • Using a DBMS to store the document contents as data elements: this approach would work for storing a collection of documents that follow a specific XML DTD or XML schema. Because all the documents have the same structure, one can design a relational or object database to store the leaf-level data elements within the XML documents. • Designing a specialized system for storing native XML data: a new type of database system based on the tree model could be designed and implemented. • Creating or publishing customized XML documents from preexisting relational databases: because enormous amounts of data are already stored in relational databases, parts of this data may need to be formatted as documents for exchanging or displaying over the web. A separate middleware software layer handles the conversions needed between the XML documents and the relational database.
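
The second approach (storing the contents as data elements) can be sketched as follows: a document that follows the projects structure of slide 14 is parsed with xml.etree.ElementTree, and its leaf-level elements are inserted into relational tables with sqlite3. The two-table design is a hypothetical choice, not one prescribed by the slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A document assumed to follow the projects structure of slide 14.
doc = """
<projects>
  <project>
    <Name>Product x</Name><Number>1</Number><Location>bellaire</Location><DeptNo>5</DeptNo>
    <Worker><SSN>123</SSN><Name>john</Name><hours>30.5</hours></Worker>
    <Worker><SSN>567</SSN><Name>mary</Name><hours>25</hours></Worker>
  </project>
</projects>
"""

conn = sqlite3.connect(":memory:")
# Hypothetical relational design: one table for projects, one for workers.
conn.executescript(
    """
    CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT,
                          location TEXT, deptno INTEGER);
    CREATE TABLE worker (ssn INTEGER, name TEXT, hours REAL,
                         project INTEGER REFERENCES project(number));
    """
)

# "Shred" the document: each leaf-level element becomes a column value.
for p in ET.fromstring(doc).findall("project"):
    number = int(p.findtext("Number"))
    dept = p.findtext("DeptNo")  # optional element, so it may be missing
    conn.execute(
        "INSERT INTO project VALUES (?, ?, ?, ?)",
        (number, p.findtext("Name"), p.findtext("Location"),
         int(dept) if dept is not None else None),
    )
    for w in p.findall("Worker"):
        conn.execute(
            "INSERT INTO worker VALUES (?, ?, ?, ?)",
            (int(w.findtext("SSN")), w.findtext("Name"),
             float(w.findtext("hours")), number),
        )

rows = conn.execute("SELECT name, hours FROM worker ORDER BY ssn").fetchall()
print(rows)  # [('john', 30.5), ('mary', 25.0)]
```

This only works because every document has the same structure; a schemaless document would not map onto fixed columns.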

  25. Extracting XML documents from databases • Create the appropriate XML hierarchy and the corresponding XML schema document. • Create the correct SQL query to extract the desired information for the XML document. • Once the query is executed, restructure its result from the flat relational form into the XML tree structure. • The query can be customized to select either a single object or multiple objects into the document.
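
The steps above can be sketched in Python with sqlite3 and xml.etree.ElementTree. The relational tables and their contents are hypothetical (they mirror the projects example), step 1 is assumed to be already done (the target hierarchy is projects/project/Worker), and the grouping loop shows one simple way of turning flat join rows back into a tree.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational data mirroring the projects example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project (number INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE worker (project INTEGER, ssn INTEGER, name TEXT, hours REAL);
    INSERT INTO project VALUES (1, 'Product x', 'bellaire');
    INSERT INTO worker VALUES (1, 123, 'john', 30.5), (1, 567, 'mary', 25);
    """
)

# Step 2: one SQL query returns the flat rows needed for the document.
rows = conn.execute(
    """
    SELECT p.number, p.name, p.location, w.ssn, w.name, w.hours
    FROM project AS p JOIN worker AS w ON w.project = p.number
    WHERE p.number = ?
    ORDER BY w.ssn
    """,
    (1,),  # a single object; drop the WHERE clause to export every project
).fetchall()

# Step 3: restructure the flat rows into the tree projects/project/Worker.
root = ET.Element("projects")
by_number = {}
for number, pname, location, ssn, wname, hours in rows:
    if number not in by_number:  # first row for this project: create its element
        project = ET.SubElement(root, "project")
        ET.SubElement(project, "Name").text = pname
        ET.SubElement(project, "Number").text = str(number)
        ET.SubElement(project, "Location").text = location
        by_number[number] = project
    worker = ET.SubElement(by_number[number], "Worker")
    ET.SubElement(worker, "SSN").text = str(ssn)
    ET.SubElement(worker, "Name").text = wname
    ET.SubElement(worker, "hours").text = str(hours)

print(ET.tostring(root, encoding="unicode"))
```

Because the query uses an inner join, a project with no workers would be left out; an outer join would be needed to keep it.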

  26. XML QUERYING - XPATH • An XPath expression returns a collection of element nodes that satisfy certain patterns specified in the expression. The names in the XPath expression are node names in the XML document tree, either tag (element) names or attribute names (attribute names are prefixed with @), possibly with additional qualifier conditions that further restrict the nodes that satisfy the pattern. Two main separators are used when specifying a path: single slash (/) and double slash (//). • A single slash before a tag specifies that the tag must appear as a direct child of the previous (parent) tag. • A double slash (//) specifies that the tag can appear as a descendant of the previous tag at any level.

  27. Examples • /company • /company/department • //employee[employeeSalary gt 1000]/employeeName • /company/employee[employeeSalary gt 1000]/employeeName • /company/project/projectworker[hours ge 20.0]
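
Python's xml.etree.ElementTree supports only a subset of XPath: it has descendant search (.//) and simple predicates, but no numeric comparisons such as gt. The sketch below evaluates two of the examples above against a hypothetical company document, applying the salary condition in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical company document shaped to fit the path expressions above.
company = ET.fromstring(
    """
    <company>
      <department><name>Research</name></department>
      <employee>
        <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
        <employeeSalary>1500</employeeSalary>
      </employee>
      <employee>
        <employeeName><firstName>Mary</firstName><lastName>Jones</lastName></employeeName>
        <employeeSalary>900</employeeSalary>
      </employee>
    </company>
    """
)

# /company/department : the parsed root is <company>, so the remaining step is "department".
departments = company.findall("department")

# //employee[employeeSalary gt 1000]/employeeName
# ".//" gives the descendant axis; the numeric predicate is evaluated in Python.
well_paid = [
    e.find("employeeName")
    for e in company.findall(".//employee")
    if float(e.findtext("employeeSalary")) > 1000
]

print(len(departments))                              # 1
print([n.findtext("firstName") for n in well_paid])  # ['John']
```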

  28. XML QUERYING - XQuery • XQuery permits the specification of more general queries on one or more XML documents. The typical form of a query in XQuery is known as a FLWR expression, named after its four main clauses: • FOR <variable bindings to individual nodes (elements)> • LET <variable bindings to collections of nodes (elements)> • WHERE <qualifier conditions> • RETURN <query result specification>

  29. Examples
      • for $x in doc("www.company.com/info.xml")//employee[employeeSalary gt 1000]/employeeName
        return <res> {$x/firstName, $x/lastName} </res>
      • for $x in doc("www.company.com/info.xml")/company/employee
        where $x/employeeSalary gt 1000
        return <res> {$x/employeeName/firstName, $x/employeeName/lastName} </res>
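
To show what the second FLWR query computes, here is a rough Python equivalent over a small in-memory document that stands in for doc("www.company.com/info.xml"); the document contents are hypothetical. Each comment maps a line to the corresponding FOR, WHERE or RETURN clause.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for doc("www.company.com/info.xml").
info = ET.fromstring(
    "<company>"
    "<employee><employeeName><firstName>John</firstName><lastName>Smith</lastName>"
    "</employeeName><employeeSalary>1500</employeeSalary></employee>"
    "<employee><employeeName><firstName>Mary</firstName><lastName>Jones</lastName>"
    "</employeeName><employeeSalary>900</employeeSalary></employee>"
    "</company>"
)

results = []
for x in info.findall("employee"):                     # for $x in .../company/employee
    if float(x.findtext("employeeSalary")) > 1000:     # where $x/employeeSalary gt 1000
        res = ET.Element("res")                        # return <res> { ... } </res>
        res.append(copy.deepcopy(x.find("employeeName/firstName")))
        res.append(copy.deepcopy(x.find("employeeName/lastName")))
        results.append(res)

print([ET.tostring(r, encoding="unicode") for r in results])
# ['<res><firstName>John</firstName><lastName>Smith</lastName></res>']
```

The copies keep the result elements independent of the source tree, much as XQuery element construction copies the selected nodes.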

  30. Example - DTD
      <?xml encoding="UTF-8"?>
      <!ELEMENT catalog (product)*>
      <!ELEMENT product (item)*>
      <!ATTLIST product
                product_id   CDATA #REQUIRED
                product_desc CDATA #REQUIRED>
      <!ELEMENT item (item_no, price, size)*>
      <!ATTLIST item gender CDATA #REQUIRED>
      <!ELEMENT item_no (#PCDATA)>
      <!ELEMENT price (#PCDATA)>
      <!ELEMENT size (#PCDATA)>

  31. Example XSL - XML
      XSL:
      <rule>
        <target-element type="item_no"/>
        <!-- you write here the formatting information for the item_no element -->
      </rule>
      XML:
      <?xml version="1.0"?>
      <?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
      <!DOCTYPE catalog SYSTEM "catalog.dtd">
      <catalog>
        <product product_id="soap" product_desc="Soap for men">
          <item gender="Men's">
            <item_no>SO1111</item_no>
            <price>2.99</price>
            <size>20</size>
          </item>
        </product>
      </catalog>

  32. Executing queries • URL syntax: http://iiserver/virtualroot{?sql=string|?template=XMLtemplate}[{&param=value}...] • Example: http://ntb11901/sample?sql=SELECT+'<ROOT>';SELECT+emp_no+emp_lname+FROM+employee+FOR+XML+RAW;SELECT+'</ROOT>' • Result (because there is no comma between emp_no and emp_lname, emp_lname acts as a column alias for emp_no, so the rows contain employee numbers):
      <ROOT>
        <row emp_lname="2581"/>
        <row emp_lname="9031"/>
        <row emp_lname="1010"/>
        <row emp_lname="3421"/>
        <row emp_lname="4380"/>
      </ROOT>

  33. Template file http://localhost/sample/queries/simpleselect.xml
      <ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql">
        <sql:query>
          SELECT emp_no, emp_lname
          FROM employee WHERE emp_no = 28559
          FOR XML AUTO
        </sql:query>
      </ROOT>
      Result:
      <ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql">
        <EMPLOYEE EMP_NO="23456" EMP_LNAME="KATE"/>
      </ROOT>
