280 likes | 422 Views
Management of XML Documents in Object-Relational Databases. Thomas Kudrass Matthias Conrad HTWK Leipzig. EDBT-Workshop XML-Based Data Management Prague, 24 March 2002. Overview. Motivation Object-Relational Database Concepts Parsing XML Documents XML-to-ORDB Mapping Meta-Data
E N D
Management of XML Documents in Object-Relational Databases Thomas Kudrass Matthias Conrad HTWK Leipzig EDBT-WorkshopXML-Based Data ManagementPrague, 24 March 2002
Overview • Motivation • Object-Relational Database Concepts • Parsing XML Documents • XML-to-ORDB Mapping • Meta-Data • Special Issues • Conclusions
Motivation • Storing of XML documents in DBMS • Use existing database technology • Dealing with complex objects: • XML documents = complex objects • avoid any decomposition • object-relational database technology good choice to represent complex objects
User-Defined Types in ORDB • Complex Data Types • Object Type • Collection Type • Object References • Object Views
Example: Object Types CREATE TYPE Type_Professor AS OBJECT ( PName VARCHAR(80), Subject VARCHAR(120) ); object-valued object table attribute CREATE TYPE Type_Course AS OBJECT ( CREATE TABLE TabProfessor OF Name VARCHAR(100), Type_Professor; Professor Type_Professor );
Example: Collection Types CREATE TYPE Type_Professor AS OBJECT ( PName VARCHAR(80), Subject VARCHAR(120) ); ArrayNested Table CREATE TYPE TypeVa_ Professor AS CREATE TYPE Type_TabProfessor AS VARRAY(5) OF Type_Professor; TABLE OF Type_Professor; CREATE TABLE TabDept ( DName VARCHAR(80), ProfessorType_TabProfessor ) NESTED TABLE Professor STORE AS TabProfessor_List;
Example: Object References CREATE TYPE Type_Professor AS OBJECT ( PName VARCHAR(80), Dept VARCHAR(120) ); CREATE TABLE TabProfessor OF Type_Professor; CREATE TYPE Type_Course ASOBJECT ( Name VARCHAR(200), Prof_Ref REF Type_Professor ); CREATE TABLE TabCourse OF Type_Course; Reference to objects of object table TabProfessor
---------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------- Parsing DTD and XML XML Document DTD Well-Formedness Validity Check Syntax Check XML V2 Parser DTD Parser XML DOM Tree DTD DOM Tree Schema Definition XML2 Oracle JDBC / ODBC DBMS Oracle
1 <!ELEMENT University (StudyCourse,Student*)> 2 <!ELEMENT Student (LName,FName,Course*)> 3 <!ATTLIST Student StudNr CDATA #REQUIRED> 4 <!ELEMENT Course (Name,Professor*,CreditPts?)> 5 <!ELEMENT Professor (PName,Subject+,Dept)> 6 <!ENTITY cs “Computer Science“> 7 <!ELEMENT LName (#PCDATA)> 8 <!ELEMENT FName (#PCDATA)> 9 <!ELEMENT Name (#PCDATA)> 10 <!ELEMENT CreditPts (#PCDATA)> 11 <!ELEMENT PName (#PCDATA)> 12 <!ELEMENT Subject (#PCDATA)> 13 <!ELEMENT Dept (#PCDATA)> 14 <!ELEMENT StudyCourse (#PCDATA)>
DTDClassesTables <!ELEMENT A (B,C)> CLASS A { CREATE TABLE A ( <!ELEMENT C (D)> STRING b; a_pk INTEGER NOT NULL, <!ELEMENT D (#PCDATA)> C c; b VARCHAR(30) NOT NULL); <!ELEMENT B (#PCDATA)> CLASS C { CREATE TABLE C ( STRING d;} c_pk INTEGER NOT NULL, a_fk INTEGER NOT NULL, d VARCHAR(10) NOT NULL); Object–Based–Mapping • Modification of the Mapping Algorithm [Bourret] • No class definitions • Use objects of the DTD tree
1 <!ELEMENT University (StudyCourse,Student*)> 2 <!ELEMENT Student (LName,FName,Course*)> 3 <!ATTLIST Student StudNr CDATA #REQUIRED> 4 <!ELEMENT Course (Name,Professor*,CreditPts?)> 5 <!ELEMENT Professor (PName,Subject+,Dept)> 6 <!ENTITY cs “Computer Science“> 7 <!ELEMENT LName (#PCDATA)> 8 <!ELEMENT FName (#PCDATA)> 9 <!ELEMENT Name (#PCDATA)> 10 <!ELEMENT CreditPts (#PCDATA)> 11 <!ELEMENT PName (#PCDATA)> 12 <!ELEMENT Subject (#PCDATA)> 13 <!ELEMENT Dept (#PCDATA)> 14 <!ELEMENT StudyCourse (#PCDATA)> Step 1 • Each Complex Element Table • Each Set-Valued Element Table • Primary Key in each Table 1 <!ELEMENT University (StudyCourse,Student*)>CREATE TABLE TabUniversity ( IDUniversity 2 <!ELEMENT Student (LName,FName,Course*)> CREATE TABLE TabStudent ( IDStudent 4 <!ELEMENT Course (Name,Professor*,CreditPts?)> CREATE TABLE TabCourse ( IDCourse 5 <!ELEMENT Professor (PName,Subject+,Dept)> CREATE TABLE TabProfessor ( IDProfessor CREATE TABLE TabSubject ( IDSubject
1 <!ELEMENT University (StudyCourse,Student*)> 2 <!ELEMENT Student (LName,FName,Course*)> 3 <!ATTLIST Student StudNr CDATA #REQUIRED> 4 <!ELEMENT Course (Name,Professor*,CreditPts?)> 5 <!ELEMENT Professor (PName,Subject+,Dept)> 6 <!ENTITY cs “Computer Science“> 7 <!ELEMENT LName (#PCDATA)> 8 <!ELEMENT FName (#PCDATA)> 9 <!ELEMENT Name (#PCDATA)> 10 <!ELEMENT CreditPts (#PCDATA)> 11 <!ELEMENT PName (#PCDATA)> 12 <!ELEMENT Subject (#PCDATA)> 13 <!ELEMENT Dept (#PCDATA)> 14 <!ELEMENT StudyCourse (#PCDATA)> Step 2 Other Elements & Attributes Table Columns CREATE TABLE TabCourse ( IDCourse, attrName, attrCreditPts, CREATE TABLE TabProfessor ( IDProfessor, attrPName, attrDept, CREATE TABLE TabSubject ( IDSubject, attrSubject, CREATE TABLE TabUniversity ( IDUniversity, attrStudyCourse, CREATE TABLE TabStudent ( IDStudent, attrStudNr, attrLName, attrFName, CREATE TABLE TblMatrikelNr ( IDMatrikelNr, attrMNummer,
Step 3 Relationships between Elements Foreign Keys CREATE TABLE TabUniversity ( IDUniversity INTEGER NOT NULL, attrStudyCourse VARCHAR(4000) NOT NULL, PRIMARY KEY (IDUniversity)); CREATE TABLE TabStudent ( IDStudent INTEGER NOT NULL, IDUniversity INTEGER NOT NULL, attrStudNr VARCHAR(4000) NOT NULL, attrLName VARCHAR(4000) NOT NULL, attrFName VARCHAR(4000) NOT NULL, PRIMARY KEY (IDStudent), CONSTRAINT conMatrikel FOREIGN KEY (IDUniversity) REFERENCES TabUniversity (IDUniversity)); ...
ORDBS Oracle and XML • Basic Idea: • Generate an object-relational schema from the DTD • Natural representation of an XML document by combining user-defined types • Different Mapping Rules: • Simple elements • Complex elements • Set-valued elements • Complex set-valued elements
XML Attributes & Simple Elements • Elements of #PCDATA type and XML attributes Attributes of the object type • Domain of Simple Elements: • No type information in the DTD: • numeric vs. alphanumeric? • length? • Restrictions of the DBMS (e.g. VARCHAR [Oracle] 4000 characters) • Mapping of an XML attribute of a simple element Definition of an object type for both attribute and element
XML Attributes & Simple Elements <!ELEMENT Professor (PName,Subject,Dept)> <!ATTLIST Professor PAddress CDATA #REQUIRED> <!ELEMENT PName (#PCDATA)> <!ELEMENT Subject (#PCDATA)> <!ELEMENT Dept (#PCDATA)> <!ATTLIST Dept DAddressCDATA #REQUIRED> CREATE TABLE TabProfessor OF Type_Professor; CREATE TYPE Type_Professor AS OBJECT ( attr PAddress VARCHAR(4000), attrPName VARCHAR(4000), attrSubject VARCHAR(4000), attrDept Type_Dept); CREATE TYPE Type_DeptAS OBJECT ( attrDept VARCHAR(4000), attrDAddress VARCHAR(4000));
Complex Elements Nesting of elements by composite DB object types CREATE TABLE TabUniversity ( attrStudyCourseVARCHAR(4000), attrStudent Type_Matrikel ); CREATE TYPE Type_Student AS OBJECT ( attrStudNr VARCHAR(4000), attrLName VARCHAR(4000), attrFName VARCHAR(4000), attrCourse Type_Vorlesung ); CREATE TYPE Type_Course AS OBJECT ( attrName VARCHAR(4000), attrProfessor Type_Professor, attrCreditPts VARCHAR(4000)); CREATE TYPE Type_Professor AS OBJECT ( attrPName VARCHAR(4000), attrSubject VARCHAR(4000), attrDeptVARCHAR(4000)); INSERT INTO TabUniversity VALUES ( ‘Computer Science' , Type_Student('23374','Conrad','Matthias', Type_Course(‘Databases II‘, Type_Professor(‘Kudrass‘ , ‘Database Systems‘', ‘Computer Science‘), '4'))); SELECT u.attrStudent.attrLname FROM TabUniversity u WHERE u.attrStudent.attrCourse.attrProfessor.attrPName = ‘Kudrass';
Set-Valued Elements • Multiple Occurrence (in DTD): marked by + or * • DBMS Restrictions • collection type applicable to set-valued elements with text-valued subelements, e.g. ARRAY OF VARCHAR • collection type not applicable to set-valued elements with complex subelements • subelements may be set-valued again • Solutions • use newer DBMS releases (e.g. Oracle 9i) • model relationships with object references
Set-Valued Elements <!ELEMENT University (StudyCourse,Student*)> Reference to University Objects CREATE TYPE Type_Student AS OBJECT ( attrJahrgang VARCHAR(4000) , attrUniversity REF Type_University ); CREATE TABLE TabStudent OF Type_Student; CREATE TYPE Type_University AS OBJECT( attrStudyCourse VARCHAR(4000)); CREATE TABLE TabUniversity OF Type_University; • Set-valued element Student • Modeling in object type Type_Student with a reference • to objects of the table TabUniversity
Set-Valued Elements CREATE TYPE TypeVA_Course AS VARRAY(100) OF Type_Course; CREATE TYPE TypeVA_Professor AS VARRAY(100) OF Type_Professor; CREATE TYPE TypeVA_Subject AS VARRAY(100) OF VARCHAR(4000); CREATE TABLE TabUniversity ( attrStudyCourseVARCHAR(4000), attrStudent Type_Matrikel ); CREATE TYPE Type_Student AS OBJECT ( attrStudNr VARCHAR(4000), attrLName VARCHAR(4000), attrFName VARCHAR(4000), attrCourse Type_Vorlesung ); CREATE TYPE Type_Course AS OBJECT ( attrName VARCHAR(4000), attrProfessor Type_Professor, attrCreditPts VARCHAR(4000)); CREATE TYPE Type_Professor AS OBJECT ( attrPName VARCHAR(4000), attrSubject VARCHAR(4000), attrDeptVARCHAR(4000));
Set-Valued ElementsExample INSERT INTO TabUniversity VALUES ( ‘Computer Science' , TypeVA_Student ( Type_Student('23374','Conrad','Matthias', TypeVA_Course ( Type_Course(‘Databases II‘, TypeVA_Professor ( Type_Professor(‘Kudrass‘ , TypeVA_Subject ( ‘Database Systems,‘Operating Systems‘), ‘Computer Science‘)),‘4‘), Type_Course(‘CAD Intro‘, TypeVA_Professor ( Type_Professor(‘Jaeger‘ , TypeVA_Subject ( ‘CAD‘,‘CAE‘), ‘Computer Science‘)),‘4‘), ...)), Type_Student(‘00011',‘Meier',‘Ralf', … ) … ) ...);
Dealing with Null Values • Restrictions with NOT NULL constraints in object-relational DB schema • NOT NULL constraints in table - not in object type! • NOT NULL constraints not applicable to collection types • Object-valued attributes: • use CHECK constraints for NOT NULL • Loss of DTD semantics DTD in the database
Dealing with CHECK Constraints <!ELEMENT Course (CName, Address?)> <!ELEMENT Addresse (Street, City?)> CREATE TYPE Type_Address AS OBJECT ( attrStreet VARCHAR(4000), attrCity VARCHAR(4000)); CREATE TYPE Type_Course AS OBJECT ( attrName VARCHAR(4000), attrAddress Type_Address); CREATE TABLE TabCourse OF Type_Course ( attrName NOT NULL, CHECK (attrAdresse.attrStrasse IS NOT NULL)); // ORA-02290:Desired error message 1. INSERT INTO TabCourse ( VALUES (‘CAD Intro’,Type_Address (NULL,’Leipzig’); // ORA-02290:Undesired error message 2. INSERT INTO TabCourse ( VALUES ('RN', NULL)
Meta-Data about XML Documents • Unique DocumentID for each Document • Prolog Information • Document Location (URL) • Name Space • Element vs. Attribute
Naming Conventions for DB Objects • Rules: • TabElementname Table Name • Type_ElementnameObject Type Name • TypeVa_Elementname Array Name • No Conflicts with Keywords • Introduction of a Schema ID • Naming Rule: SchemaID + Naming Convention + Name CREATE TYPE DTD01_Type_University CREATE TYPE DTD02_Type_University AS OBJECT ( AS OBJECT ( attrStudyCourse VARCHAR(4000) ); attrRegister VARCHAR(4000) );
Conclusions: Advantages • Non-atomic domains possible • Natural representation of XML Documents • Nesting of any complexity possible • Simple queries by using dot notation • Using object references to represent relationships (OIDs)
Conclusions: Drawbacks • Mapping Deficiencies • Possible restrictions of element types in collections • No adequate mapping of NOT NULL constraints • Loss of Information • Prolog, Comments, Processing Instructions, Prolog • Entity References • Attribute vs. Element ? • Schema Evolution • Modification of DTD Modification of DB • Type Information • Target type: VARCHAR - not sufficient!
Outlook • Graph-based creation of a schema • Source: XML Schema • Use CLOB datatype • Enhance Meta-Schema • Comments, Processing Instructions and their position in document • Entity references and their substitution text