Explore the distinctions between structured and semistructured data in XML databases. Learn about hierarchical data models, XML documents, DTD, XML schemas, XML querying, and more.
Chapter 26 XML and Internet Databases
Outline • Structured, Semistructured, & Unstructured Data • XML Hierarchical Data Model • XML Document, DTD, & XML Schema • XML Documents & Databases • XML Querying
Structured vs Semistructured Data • Structured data: e.g., information stored in databases; all records have the same format, as defined in the relational schema. • Semistructured data: may have a certain structure, but not all of the information collected will have an identical structure.
FIGURE 26.2 Part of an HTML document representing unstructured data (cf. the company database schema)
XML Hierarchical (Tree) Data Model • Problems with HTML documents: • They are difficult for programs to interpret automatically because they do not include schema information about the type of data in the documents. • They are inappropriate as intermediate Web documents to be exchanged among various computer sites. • Solution: XML documents • Two main structuring concepts: elements and attributes • In XML, tag names are defined to describe the meaning of the data elements, rather than to describe how the text is to be displayed (as in HTML).
standalone="yes" (schemaless document) • FIGURE 26.3 A complex XML element called <projects>. Correction: <project> • Complex elements: <projects>, <project>, <Worker> • Simple elements: <Name>, <Number>, <SSN>, …
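The split between complex and simple elements can be checked mechanically: an element that contains other elements is complex, and one that holds only text is simple. The sketch below uses Python's standard `xml.etree.ElementTree`; the document is a shortened, made-up stand-in for Figure 26.3 and the `classify` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative document only; the values are invented for this sketch.
DOC = """<?xml version="1.0" standalone="yes"?>
<projects>
  <project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker>
      <SSN>123456789</SSN>
      <hours>32.5</hours>
    </Worker>
  </project>
</projects>"""

def classify(root):
    """Return (complex_tags, simple_tags) as sets of element names."""
    complex_tags, simple_tags = set(), set()
    for elem in root.iter():
        # len(elem) is the number of child elements; nonzero means complex.
        (complex_tags if len(elem) else simple_tags).add(elem.tag)
    return complex_tags, simple_tags

complex_tags, simple_tags = classify(ET.fromstring(DOC))
```

Here `complex_tags` collects the names of elements with children and `simple_tags` the names of leaf elements.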
XML Documents, DTD, and XML Schema • A well-formed XML document is one that follows a few conditions: • It starts with an XML declaration (version, …) • It follows the tree model • It has a single root element • The matching start and end tags of an element both lie within the tags of its parent element • It is syntactically correct
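A quick way to test these conditions is to hand the text to an XML parser and see whether it is accepted. The sketch below assumes Python's standard `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper name. Note that this parser does not insist on the XML declaration, so it checks the structural conditions only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, multiple roots, bad syntax, etc.
        return False
```

For example, `<a><b>1</a></b>` is rejected because the end tags are not properly nested, and `<a/><b/>` is rejected because it has two root elements.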
XML Documents, DTD, and XML Schema • A valid XML document is well formed, and in addition the element names used in the start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file. • Figure 26.4: a sample XML DTD called projects • Notation: * zero or more; + one or more; ? zero or one; otherwise exactly once • (#PCDATA): parsed character data (the data type)
FIGURE 26.4 An XML DTD file called projects • To use the DTD file: • Store the DTD file in the same file system as the XML document • <?xml version="1.0" standalone="no"?> • <!DOCTYPE projects SYSTEM "proj.dtd">
DTD Limitations • Data types in DTD are not very general • DTD has its own special syntax and thus requires specialized processors • All DTD elements are forced to follow the ordering specified in the document, so unordered elements are not permitted • Solution: XML Schema
FIGURE 26.5 An XML schema file called company • Schema namespace • The root element company; also an unnamed complex element • “Department”, “Employee”, etc. must be named types. • The selector “employeeDependent” is an attribute of “Employee”, of type “Dependent”. • The field “dependentName” in “Dependent” must be unique.
FIGURE 26.5 (continued) An XML schema file called company. • <xsd:unique …> specifies a key constraint for a non-primary-key element. • <xsd:key> specifies a primary key. • <xsd:keyref> specifies a foreign key; <xsd:selector> refers to the referencing element type; <xsd:field> refers to the referencing attribute.
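What a key and a keyref assert can be sketched by checking the two conditions by hand on a parsed document. This is only an illustration of the semantics, not a schema validator: the element names below are assumptions patterned on the company schema, the data is made up, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up instance data; element names are assumed for illustration.
DOC = """<company>
  <department><departmentNumber>5</departmentNumber></department>
  <department><departmentNumber>4</departmentNumber></department>
  <employee><employeeSSN>123456789</employeeSSN>
            <employeeDepartmentNumber>5</employeeDepartmentNumber></employee>
  <employee><employeeSSN>999887777</employeeSSN>
            <employeeDepartmentNumber>7</employeeDepartmentNumber></employee>
</company>"""

def field_values(root, selector, field):
    """Collect the text of `field` under each element matched by `selector`."""
    return [elem.findtext(field) for elem in root.findall(selector)]

def key_holds(values):
    # A key requires every value to be present and distinct.
    return None not in values and len(values) == len(set(values))

def dangling_refs(referencing, referenced):
    # A keyref requires each referencing value to match some key value.
    known = set(referenced)
    return [v for v in referencing if v not in known]

root = ET.fromstring(DOC)
dept_keys = field_values(root, "department", "departmentNumber")
emp_refs = field_values(root, "employee", "employeeDepartmentNumber")
```

In this data the department numbers satisfy the key condition, while the second employee's department number 7 has no matching department and would violate the keyref.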
FIGURE 26.5 (continued) An XML schema file called company • Exercise: Define the element “projectWorker” in the type “Project” as an embedded sub-element. • Answer: <xsd:element name="projectWorker" minOccurs="1" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="SSN" type="xsd:string" /> <xsd:element name="hours" type="xsd:float" /> </xsd:sequence> </xsd:complexType> </xsd:element>
XML Documents and Databases • Approaches to Storing XML Documents • Extracting XML Documents from Relational Databases • Breaking Cycles to Convert Graphs into Trees • Other Steps for Extracting XML Documents from Databases
FIGURE 26.6 An ER schema diagram for a simplified UNIVERSITY database.
FIGURE 26.7 Subset of the UNIVERSITY database schema needed for XML document extraction.
FIGURE 26.8 Hierarchical (tree) view with COURSE as the root.
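Extracting a document with COURSE as the root amounts to nesting the related rows under each course. The sketch below assumes flat rows standing in for relational query results; the column names, element names, and data are all hypothetical, and it shows only a COURSE, SECTION, STUDENT nesting rather than the full view in the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat rows, standing in for relational query results.
COURSES = [{"number": "CS101", "name": "Intro to CS"}]
SECTIONS = [{"course": "CS101", "number": "1", "year": "2024"}]
GRADES = [{"course": "CS101", "section": "1", "ssn": "123456789", "grade": "A"}]

def course_tree(courses, sections, grades):
    """Build an element tree with one <course> subtree per course row."""
    root = ET.Element("courses")
    for c in courses:
        c_el = ET.SubElement(root, "course", number=c["number"])
        ET.SubElement(c_el, "name").text = c["name"]
        for s in sections:
            if s["course"] != c["number"]:
                continue
            s_el = ET.SubElement(c_el, "section", number=s["number"])
            ET.SubElement(s_el, "year").text = s["year"]
            for g in grades:
                if (g["course"], g["section"]) == (c["number"], s["number"]):
                    st = ET.SubElement(s_el, "student", ssn=g["ssn"])
                    ET.SubElement(st, "grade").text = g["grade"]
    return root

xml_text = ET.tostring(course_tree(COURSES, SECTIONS, GRADES), encoding="unicode")
```

Choosing STUDENT or SECTION as the root instead would use the same rows but a different nesting order.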
FIGURE 26.10 Hierarchical (tree) view with STUDENT as the root.
FIGURE 26.12 Hierarchical (tree) view with SECTION as the root.
FIGURE 26.13 Converting a graph with cycles into a hierarchical (tree) structure.
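One simple way to turn a graph with cycles into a tree is to unfold it from the chosen root, replicating any node that is reachable along more than one path and dropping edges that lead back to a node already on the current path. The sketch below is an illustration under that assumption and is not necessarily the exact procedure of Figure 26.13; the edge set in `GRAPH` is invented.

```python
def graph_to_tree(graph, root):
    """Unfold `graph` (dict: node -> list of neighbours) into nested tuples.

    Each tree node is (name, [children]). A node reachable along several
    paths is replicated once per path; an edge back to a node already on
    the current path is dropped, which is what breaks the cycle.
    """
    def expand(node, ancestors):
        on_path = ancestors | {node}
        children = [expand(n, on_path)
                    for n in graph.get(node, [])
                    if n not in on_path]
        return (node, children)

    return expand(root, frozenset())

# Assumed edges for illustration; DEPARTMENT -> STUDENT closes a cycle.
GRAPH = {
    "STUDENT": ["SECTION"],
    "SECTION": ["COURSE", "INSTRUCTOR"],
    "COURSE": ["DEPARTMENT"],
    "INSTRUCTOR": ["DEPARTMENT"],
    "DEPARTMENT": ["STUDENT"],
}
tree = graph_to_tree(GRAPH, "STUDENT")
```

In the result, DEPARTMENT appears twice (once under COURSE and once under INSTRUCTOR), and the edge from DEPARTMENT back to STUDENT is gone.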
XML Query • XPath: Specifying Path Expressions in XML • XQuery: Specifying Queries in XML
FIGURE 26.14 Some examples of XPath expressions on XML documents that follow the XML schema file COMPANY in Figure 26.5
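Path expressions of this kind can be tried out with Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath (child steps, `//` descendants, simple equality predicates). The sketch below uses a made-up document with element names assumed from the COMPANY schema; because numeric comparisons are not supported in that subset, the salary filter is written in Python.

```python
import xml.etree.ElementTree as ET

# Made-up data; element names are assumed for illustration.
DOC = """<company>
  <department><departmentName>Research</departmentName></department>
  <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
  <employee><employeeName>Borg</employeeName><employeeSalary>75000</employeeSalary></employee>
</company>"""

company = ET.fromstring(DOC)  # `company` is the root element

# Child step from the root: all <department> children of <company>.
departments = company.findall("./department")

# Descendant step: every <employeeName> at any depth.
names = [e.text for e in company.findall(".//employeeName")]

# Employees earning more than 70000, then their names.
# The comparison is done in Python rather than in a path predicate.
high_paid = [e.findtext("employeeName")
             for e in company.findall("./employee")
             if float(e.findtext("employeeSalary")) > 70000]
```

A full XPath processor would let the salary condition be written as a predicate inside the path itself.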
FIGURE 26.15 Some examples of XQuery queries on XML documents that follow the XML schema file COMPANY in Figure 26.5.
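Python's standard library has no XQuery engine, but the general shape of an XQuery query (iterate, filter, sort, construct a result) can be mimicked to show what each clause contributes. The sketch below is only an analogy: the data is made up, the element names are assumed from the COMPANY schema, and `high_earners` and the `<res>`/`<name>` result elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up data; element names are assumed for illustration.
DOC = """<company>
  <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
  <employee><employeeName>Wong</employeeName><employeeSalary>80000</employeeSalary></employee>
  <employee><employeeName>Borg</employeeName><employeeSalary>75000</employeeSalary></employee>
</company>"""

def high_earners(root, threshold):
    """Return a <res> element listing names of employees above `threshold`."""
    # "for" clause: iterate over employees; "where" clause: filter them.
    selected = [e for e in root.findall(".//employee")
                if float(e.findtext("employeeSalary")) > threshold]
    # "order by" clause: sort the bindings by employee name.
    selected.sort(key=lambda e: e.findtext("employeeName"))
    # "return" clause: construct new result elements.
    result = ET.Element("res")
    for e in selected:
        ET.SubElement(result, "name").text = e.findtext("employeeName")
    return result

res = high_earners(ET.fromstring(DOC), 70000)
```

The point of the analogy is that XQuery, unlike a bare path expression, can restructure what it selects into new elements.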
Summary • XML documents • XML & databases