Understanding XML Databases: Structured, Semistructured, & Unstructured Data Models

Explore the distinctions between structured and semistructured data in XML databases. Learn about hierarchical data models, XML documents, DTD, XML schemas, XML querying, and more.

braunb

Presentation Transcript


  1. Chapter 26 XML and Internet Databases

  2. Outline • Structured, Semistructured, & Unstructured Data • XML Hierarchical Data Model • XML Document, DTD, & XML Schema • XML Documents & Databases • XML Querying

  3. Structured vs. Semistructured Data • Structured data: e.g., information stored in relational databases; all records have the same format, as defined in the relational schema • Semistructured data: may have a certain structure, but not all the information collected has identical structure
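To make the contrast concrete, here is a minimal Python sketch. It assumes structured records are tuples that share one fixed schema, while semistructured records are dictionaries that carry their own labels; the helper names and sample values are hypothetical.

```python
# Structured: every record follows one fixed schema (like a relational table).
EMPLOYEE_SCHEMA = ("ssn", "name", "salary")
structured_rows = [
    ("123456789", "John Smith", 30000),
    ("333445555", "Franklin Wong", 40000),
]

# Semistructured: each record carries its own labels, and records may differ
# in which labels they have and in how a value is structured (hypothetical data).
semistructured_records = [
    {"ssn": "123456789", "name": "John Smith", "salary": 30000},
    {"ssn": "333445555", "name": {"first": "Franklin", "last": "Wong"}},
    {"ssn": "999887777", "name": "Alicia Zelaya", "phones": ["555-1234", "555-9876"]},
]


def labels_per_record(records):
    """Return the set of attribute labels used by each record."""
    return [frozenset(r) for r in records]


def has_uniform_structure(records):
    """True if every record uses exactly the same set of labels."""
    return len(set(labels_per_record(records))) <= 1


# Every structured row has exactly as many values as the schema has columns.
assert all(len(row) == len(EMPLOYEE_SCHEMA) for row in structured_rows)
print(has_uniform_structure(semistructured_records))  # False
```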

  4. FIGURE 26.1 Representing semistructured data as a graph.

  5. FIGURE 26.2 Part of an HTML document representing unstructured data (cf. the company database schema).

  6. XML Hierarchical (Tree) Data Model • Problem with HTML documents: they are difficult for programs to interpret automatically because they include no schema information about the type of data in the documents, which makes them inappropriate as intermediate Web documents exchanged among computer sites • Solution: XML documents, whose two main structuring concepts are elements and attributes • Unlike HTML, in XML tag names are defined to describe the meaning of the data elements rather than how the text is to be displayed
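A minimal sketch with Python's standard `xml.etree.ElementTree` module showing the two structuring concepts: child elements whose tag names describe the data, and an attribute. The `status` attribute and the `<Location>` element are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names describe what the data means, not how to display it.
project = ET.Element("project", attrib={"status": "active"})  # hypothetical attribute
ET.SubElement(project, "Name").text = "ProductX"
ET.SubElement(project, "Number").text = "1"
ET.SubElement(project, "Location").text = "Bellaire"  # hypothetical element

xml_text = ET.tostring(project, encoding="unicode")
print(xml_text)
# <project status="active"><Name>ProductX</Name><Number>1</Number><Location>Bellaire</Location></project>

# A program can locate data by its meaning instead of by its layout.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("Location"))  # Bellaire
print(parsed.get("status"))  # active
```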

  7. FIGURE 26.3 A complex XML element called <projects>. Correction: <project>. • standalone="yes": a schemaless document • Complex elements: <projects>, <project>, <Worker> • Simple elements: <Name>, <Number>, <SSN>, …
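The complex/simple distinction can be checked mechanically. This Python sketch assumes a made-up document shaped like the <projects> example (the `<Hours>` element and all values are hypothetical) and treats an element as complex if it has child elements and simple otherwise.

```python
import xml.etree.ElementTree as ET

# Hypothetical schemaless document; element values are made up.
DOC = """<?xml version="1.0" standalone="yes"?>
<projects>
  <project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker>
      <SSN>123456789</SSN>
      <Hours>32.5</Hours>
    </Worker>
  </project>
</projects>"""


def classify(root):
    """Split element tags into complex (has child elements) and simple (text only)."""
    complex_tags, simple_tags = set(), set()
    for elem in root.iter():
        if len(elem):  # at least one child element
            complex_tags.add(elem.tag)
        else:
            simple_tags.add(elem.tag)
    return complex_tags, simple_tags


complex_tags, simple_tags = classify(ET.fromstring(DOC))
print(sorted(complex_tags))  # ['Worker', 'project', 'projects']
print(sorted(simple_tags))   # ['Hours', 'Name', 'Number', 'SSN']
```

In XML Schema terms, an element that carries attributes is also of complex type even without children; the sketch ignores attributes for simplicity.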

  8. XML Documents, DTD, and XML Schema • A well-formed XML document is one that satisfies a few conditions: • It starts with an XML declaration (version, …) • It follows the tree model: a single root element, and each element has a matching pair of start and end tags that lie within the start and end tags of its parent element • It is syntactically correct
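A quick way to test well-formedness is to hand the text to a non-validating parser, sketched below with Python's `ElementTree`, whose underlying expat parser rejects documents that break the tree rules. The parser does not require the XML declaration, so the first condition above is not enforced here; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<?xml version="1.0"?><projects><project><Name>ProductX</Name></project></projects>'
two_roots = "<project/><project/>"                        # violates the single-root rule
bad_nesting = "<project><Name>ProductX</project></Name>"  # tags overlap instead of nesting
print(is_well_formed(good), is_well_formed(two_roots), is_well_formed(bad_nesting))
# True False False
```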

  9. XML Documents, DTD, and XML Schema • A valid XML document is well formed and, in addition, the element names used in its start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file. • Figure 26.4: a sample XML DTD called projects • Occurrence indicators: * zero or more, + one or more, ? zero or one; no indicator means exactly once • Data type: (#PCDATA) means parsed character data
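The occurrence indicators behave like regular-expression quantifiers over an element's sequence of children. The Python sketch below relies on that analogy; the `PROJECT_MODEL` content model is hypothetical (it is not the DTD of Figure 26.4), and only simple sequences are handled, not choices (`|`) or nested groups.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <project>, written as (child, indicator) pairs,
# mirroring a DTD declaration such as (Name, Number, Location?, Worker*).
# '' = exactly once, '?' = zero or one, '*' = zero or more, '+' = one or more.
PROJECT_MODEL = [("Name", ""), ("Number", ""), ("Location", "?"), ("Worker", "*")]


def model_to_regex(model):
    """Compile a sequence content model into a regex over '<tag>' tokens."""
    parts = ["(?:<%s>)%s" % (re.escape(tag), ind) for tag, ind in model]
    return re.compile("".join(parts))


def children_match(elem, model):
    """Check that elem's sequence of child elements fits the content model."""
    tokens = "".join("<%s>" % child.tag for child in elem)
    return model_to_regex(model).fullmatch(tokens) is not None


ok = ET.fromstring("<project><Name>X</Name><Number>1</Number><Worker/><Worker/></project>")
missing = ET.fromstring("<project><Name>X</Name><Worker/></project>")  # no <Number>
print(children_match(ok, PROJECT_MODEL), children_match(missing, PROJECT_MODEL))
# True False
```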

  10. FIGURE 26.4 An XML DTD file called projects • To use the DTD file: • Store the DTD file in the same file system as the XML document • Declare it at the start of the XML document: <?xml version="1.0" standalone="no"?> <!DOCTYPE projects SYSTEM "proj.dtd">
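Python's standard-library XML parsers are non-validating, but they still record the DOCTYPE declaration. This sketch, assuming a small made-up <projects> document, reads the declared root name and system identifier with `xml.dom.minidom`; it neither fetches nor validates against `proj.dtd`.

```python
from xml.dom import minidom

# Hypothetical document that refers to an external DTD file.
DOC = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE projects SYSTEM "proj.dtd">
<projects><project><Name>ProductX</Name></project></projects>"""

doc = minidom.parseString(DOC)
# The DOCTYPE is recorded, but proj.dtd is not loaded or checked.
print(doc.doctype.name, doc.doctype.systemId)  # projects proj.dtd
# The name in the DOCTYPE should match the root element's tag.
print(doc.doctype.name == doc.documentElement.tagName)  # True
```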

  11. DTD Limitations • Data types in DTDs are not very general • DTDs have their own special syntax and thus require specialized processors • All DTD elements are always forced to follow the ordering specified for the document, so unordered elements are not permitted • Solution: XML Schema

  12. FIGURE 26.5 An XML schema file called company • Schema namespace • The root element company, which is also an unnamed complex element • “Department”, “Employee”, etc. must be named types. • The selector “employeeDependent” is an attribute of “Employee”, of type “Dependent”. • The field “dependentName” in “Dependent” must be unique.

  13. FIGURE 26.5 (continued) An XML schema file called company • <xsd:unique …> specifies a uniqueness (key) constraint on a non-primary-key element. • <xsd:key> specifies a primary key. • <xsd:keyref> specifies a foreign key; its <xsd:selector> refers to the referencing element type, and its <xsd:field> refers to the referencing attribute.
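The semantics of these identity constraints can be emulated over a parsed document, which is a useful way to see what a schema processor checks. The Python sketch below is only an illustration, not a schema validator: the document fragment and element names (`departmentNumber`, `employeeDepartmentNumber`, and so on) are hypothetical, loosely modeled on the COMPANY example, and selectors are limited to `ElementTree`'s path subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical COMPANY-like fragment; employee 333445555 refers to a missing department.
DOC = """<company>
  <department><departmentName>Research</departmentName><departmentNumber>5</departmentNumber></department>
  <department><departmentName>Headquarters</departmentName><departmentNumber>1</departmentNumber></department>
  <employee><employeeSSN>123456789</employeeSSN><employeeDepartmentNumber>5</employeeDepartmentNumber></employee>
  <employee><employeeSSN>333445555</employeeSSN><employeeDepartmentNumber>7</employeeDepartmentNumber></employee>
</company>"""


def field_values(root, selector, field):
    """Collect the field value (or None) of every element matched by the selector path."""
    return [elem.findtext(field) for elem in root.findall(selector)]


def check_unique(root, selector, field):
    """xsd:unique-style check: values that are present must not repeat."""
    values = [v for v in field_values(root, selector, field) if v is not None]
    return len(values) == len(set(values))


def check_key(root, selector, field):
    """xsd:key-style check: every selected element has the field, and values are unique."""
    values = field_values(root, selector, field)
    return None not in values and len(values) == len(set(values))


def dangling_refs(root, key_selector, key_field, ref_selector, ref_field):
    """xsd:keyref-style check: return referencing values that match no key value."""
    keys = set(field_values(root, key_selector, key_field))
    return [v for v in field_values(root, ref_selector, ref_field)
            if v is not None and v not in keys]


root = ET.fromstring(DOC)
print(check_unique(root, "department", "departmentName"))  # True
print(check_key(root, "employee", "employeeSSN"))          # True
print(dangling_refs(root, "department", "departmentNumber",
                    "employee", "employeeDepartmentNumber"))  # ['7']
```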

  14. FIGURE 26.5 (continued) An XML schema file called company • Exercise: Define the element “projectWorker” in the type “Project” as an embedded sub-element. • Answer:
      <xsd:element name="projectWorker" minOccurs="1" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="SSN" type="xsd:string" />
            <xsd:element name="hours" type="xsd:float" />
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>

  15. FIGURE 26.5 (continued) An XML schema file called company

  16. XML Documents and Databases • Approaches to Storing XML Documents • Extracting XML Documents from Relational Databases • Breaking Cycles to Convert Graphs into Trees • Other Steps for Extracting XML Documents from Databases
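A minimal sketch of extracting an XML document from relational data, assuming a hypothetical two-table subset (`course`, `section`) held in an in-memory SQLite database. COURSE is taken as the root of the hierarchy, so the 1:N relationship to SECTION becomes element nesting; the table columns, values, and the `<root>` wrapper element are illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational subset with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course  (cnumber TEXT PRIMARY KEY, cname TEXT);
CREATE TABLE section (secnumber INTEGER PRIMARY KEY, cnumber TEXT REFERENCES course, year INTEGER);
INSERT INTO course  VALUES ('CS101', 'Intro to CS'), ('CS340', 'Databases');
INSERT INTO section VALUES (1, 'CS101', 2023), (2, 'CS101', 2024), (3, 'CS340', 2024);
""")


def courses_to_xml(conn):
    """Build a hierarchy rooted at COURSE: each <course> nests its <section> elements."""
    root = ET.Element("root")  # hypothetical wrapper element
    for cnumber, cname in conn.execute("SELECT cnumber, cname FROM course ORDER BY cnumber"):
        course = ET.SubElement(root, "course", cnumber=cnumber)
        ET.SubElement(course, "cname").text = cname
        # Follow the foreign key from SECTION back to this COURSE.
        for secnumber, year in conn.execute(
                "SELECT secnumber, year FROM section WHERE cnumber = ? ORDER BY secnumber",
                (cnumber,)):
            section = ET.SubElement(course, "section", secnumber=str(secnumber))
            ET.SubElement(section, "year").text = str(year)
    return root


root = courses_to_xml(conn)
print(ET.tostring(root, encoding="unicode"))
```

Choosing STUDENT or SECTION as the root instead would nest the same rows differently, which is the point of the alternative hierarchical views that follow.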

  17. FIGURE 26.6 An ER schema diagram for a simplified UNIVERSITY database.

  18. FIGURE 26.7 Subset of the UNIVERSITY database schema needed for XML document extraction.

  19. FIGURE 26.8 Hierarchical (tree) view with COURSE as the root.

  20. FIGURE 26.9 XML schema document with COURSE as the root.

  21. FIGURE 26.10 Hierarchical (tree) view with STUDENT as the root.

  22. FIGURE 26.11 XML schema document with STUDENT as the root.

  23. FIGURE 26.12 Hierarchical (tree) view with SECTION as the root.

  24. FIGURE 26.13 Converting a graph with cycles into a hierarchical (tree) structure.
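One simple way to break cycles is to unfold the graph from the chosen root and replicate a node whenever it is reached again along the current path, keeping that copy as a leaf. The Python sketch below assumes the schema graph is given as an adjacency dictionary; the graph itself is hypothetical, and on dense graphs this unfolding can grow exponentially.

```python
# Hypothetical schema graph: SECTION <-> STUDENT and SECTION <-> INSTRUCTOR form cycles.
graph = {
    "COURSE": ["SECTION"],
    "SECTION": ["STUDENT", "INSTRUCTOR"],
    "STUDENT": ["SECTION"],
    "INSTRUCTOR": ["SECTION"],
}


def unfold(graph, node, path=()):
    """Expand the graph into a (name, children) tree rooted at node.

    A node already on the current root-to-node path is emitted as a
    replicated leaf, so each cycle is cut instead of followed forever.
    """
    if node in path:
        return (node, [])  # replicated copy: stop here to break the cycle
    path = path + (node,)
    return (node, [unfold(graph, child, path) for child in graph.get(node, [])])


def render(tree, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    name, children = tree
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = unfold(graph, "COURSE")
print("\n".join(render(tree)))
```

Here SECTION appears three times in the resulting tree: once under COURSE and once as a leaf copy under each of STUDENT and INSTRUCTOR.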

  25. XML Query • XPath: Specifying Path Expressions in XML • XQuery: Specifying Queries in XML

  26. FIGURE 26.14 Some examples of XPath expressions on XML documents that follow the XML schema file COMPANY in Figure 26.5.
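Python's `ElementTree` implements a limited subset of XPath, enough to illustrate path expressions. In this sketch the document and element names are hypothetical (loosely modeled on the COMPANY schema), paths are evaluated relative to the <company> element, and the numeric filter that full XPath would write as a predicate such as `[employeeSalary > 70000]` is applied in Python, because `ElementTree` paths do not support comparisons.

```python
import xml.etree.ElementTree as ET

# Hypothetical COMPANY-like document with made-up values.
DOC = """<company>
  <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
  <employee><employeeName>Wallace</employeeName><employeeSalary>80000</employeeSalary></employee>
  <project>
    <projectName>ProductX</projectName>
    <projectWorker><SSN>123456789</SSN><hours>32.5</hours></projectWorker>
    <projectWorker><SSN>453453453</SSN><hours>10.0</hours></projectWorker>
  </project>
</company>"""

root = ET.fromstring(DOC)  # root is the <company> element

# Like /company/employee/employeeName (paths are relative to <company>).
names = [e.text for e in root.findall("employee/employeeName")]

# Like //projectWorker: './/' searches all descendants.
workers = root.findall(".//projectWorker")

# Numeric comparison done in Python, since ElementTree paths cannot compare values.
high_paid = [e.findtext("employeeName") for e in root.findall("employee")
             if float(e.findtext("employeeSalary")) > 70000]

# Equality predicates on child text are supported: [tag='text'].
x_workers = root.findall("project[projectName='ProductX']/projectWorker")

print(names, len(workers), high_paid, len(x_workers))
# ['Smith', 'Wallace'] 2 ['Wallace'] 2
```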

  27. FIGURE 26.15 Some examples of XQuery queries on XML documents that follow the XML schema file COMPANY in Figure 26.5.
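XQuery builds on XPath with FLWOR expressions (`for`, `let`, `where`, `order by`, `return`). The Python sketch below mirrors each clause of a hypothetical query, shown in the docstring, over a made-up document; it only emulates the evaluation order and is not an XQuery engine. The `flwor` function, the `<result>` wrapper, and the `<proj>` output element are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical COMPANY-like document with made-up values.
DOC = """<company>
  <project><projectName>ProductY</projectName>
    <projectWorker><SSN>111</SSN><hours>25</hours></projectWorker>
    <projectWorker><SSN>222</SSN><hours>5</hours></projectWorker>
  </project>
  <project><projectName>ProductX</projectName>
    <projectWorker><SSN>333</SSN><hours>40</hours></projectWorker>
  </project>
  <project><projectName>ProductZ</projectName>
    <projectWorker><SSN>444</SSN><hours>8</hours></projectWorker>
  </project>
</company>"""


def flwor(root, min_hours):
    """Emulate this hypothetical XQuery (shown with min_hours = 20):

    for $p in doc("company.xml")/company/project
    let $w := $p/projectWorker[hours >= 20]
    where count($w) > 0
    order by $p/projectName
    return <proj name="{$p/projectName}" workers="{count($w)}"/>
    """
    results = []
    for p in root.findall("project"):                 # for $p in /company/project
        w = [pw for pw in p.findall("projectWorker")  # let $w := ...[hours >= N]
             if float(pw.findtext("hours")) >= min_hours]
        if len(w) > 0:                                # where count($w) > 0
            results.append((p.findtext("projectName"), len(w)))
    results.sort(key=lambda r: r[0])                  # order by $p/projectName
    out = ET.Element("result")                        # wrapper for the returned sequence
    for name, count in results:                       # return <proj .../>
        ET.SubElement(out, "proj", name=name, workers=str(count))
    return out


out = flwor(ET.fromstring(DOC), 20)
print(ET.tostring(out, encoding="unicode"))
```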

  28. Summary • XML documents • XML & databases
