Tamino – a DBMS Designed for XML. Dr. Harald Schoning. Presenter: Wenhui Li, University of Ottawa. Instructed by: Dr. Mengchi Liu, Carleton University.
Abstract • Who? Software AG • What? An XML database management system • When? • 1999: first unveiled • June 2004: Tamino XML Server 4.2 • Why? • Management and transfer of structured and unstructured data • Designed entirely for XML
Industry Background • XML is becoming the prevailing format for data processing on the Internet. • Early goals of Tamino • Easy data exchange • Evolution trend • Storing, managing, publishing and exchanging XML documents • Business modeling
Industry Background (cont'd): XML support in databases • Oracle XML Developer's Kit • SQL Server 2000 • DB2 XML Extender
Limitations of XML support via a traditional RDBMS or ORDBMS • XML is not as rigidly structured as data in an RDB, ORDB or OODB • Storing and querying XML in these systems is possible but not practical
Two modeling approaches • Data-centric documents • Regular structure • Order does not matter • No mixed content • Document-centric documents • Less regular structure • Order is significant • Mixed content
Why not use a relational DB? • XML documents can have schematic information (a DTD), but they are not required to. • Classical database handling, which works on objects of a predefined type, cannot be applied to XML.
Why not use XML itself? • XML is just a markup language; it does not contain processing facilities of its own. • Querying a set of XML documents is outside the scope of the XML recommendation. Hence Tamino!
What does Tamino do? • What is Tamino? (see the first slide) • Stores XML documents, HTML files, GIF images, etc. • Retrieves them in a set-oriented manner, with sophisticated query facilities
The schema of XML documents • XML supports schematic information, but it differs from that of classical databases • DTDs have a couple of deficiencies (e.g. no data types) • A W3C working group is developing an XML schema description language • However, the DTD is the only standard schema format at present
XML schema vs. RDB and OODB schema • In an RDB or OODB, the schema is created before instances can be stored • Instances must conform to the declared schema • In an XML database, each instance declares a schema of its own • For XML documents, grouping objects of homogeneous structure into (pre-defined) tables or classes doesn't work
Queries and indexes over XML schemas • Queries operate on sets • Indexes are defined on the basis of a common schema • For the purpose of querying, arbitrary objects could be grouped into sets • Index definition also requires at least a common subset in the structure
Schema handling in Tamino • Grouping documents by open content model + user-directed document grouping • Documents are grouped into collections • Within a collection, several document types can be declared • For each document type, a common schema is defined (open content model) • Tamino assigns each document one of the document types
Type Assignment • Assignment is based on the root element type • A document must match the schema of the document type it is assigned to, but might have additional elements/attributes • Within a document type, documents might differ considerably • If there is no appropriate document type, the document is stored without any schema checking
Document accepted by Tamino
<City Inhabitants="138000">
  <Name>Darmstadt</Name>
  <Addition>The city of art nouveau</Addition>
  <Monument Height="39m">
    <Name>Langer Ludwig</Name>
    <Location>
      <Name>Luisenplatz</Name>
      <MapIndex>M5</MapIndex>
    </Location>
  </Monument>
</City>
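The type assignment and open content model described on these slides can be sketched roughly as follows. This is only an illustration in Python, not Tamino's implementation: the `DOC_TYPES` registry, its field names, and the choice of which parts of `City` are required are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: root element name -> parts the document type requires.
# (Assumed for illustration; Tamino's real schema format is not shown in the slides.)
DOC_TYPES = {
    "City": {"required_children": {"Name"}, "required_attributes": {"Inhabitants"}},
}

def assign_doc_type(xml_text):
    """Return (doc_type, accepted); doc_type is None if no type matches the root."""
    root = ET.fromstring(xml_text)
    schema = DOC_TYPES.get(root.tag)
    if schema is None:
        # No appropriate document type: stored without any schema checking.
        return None, True
    child_names = {child.tag for child in root}
    has_children = schema["required_children"] <= child_names
    has_attributes = schema["required_attributes"] <= set(root.attrib)
    # Open content model: additional elements/attributes are tolerated.
    return root.tag, has_children and has_attributes

city = ('<City Inhabitants="138000"><Name>Darmstadt</Name>'
        '<Addition>The city of art nouveau</Addition></City>')
memo = '<Memo><Body>no document type declared for this root</Body></Memo>'
incomplete = '<City><Addition>name and inhabitants missing</Addition></City>'

print(assign_doc_type(city))        # extra <Addition> does not cause rejection
print(assign_doc_type(memo))        # unknown root: stored unchecked
print(assign_doc_type(incomplete))  # required parts missing: rejected
```

The sketch checks only presence, not multiplicity or nesting, which a real schema check would also cover.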
When should an element/attribute be modeled? • An index will be defined on this element/attribute • The element/attribute is to be mapped to an external data source or to a server extension • Dedicated access rights will be defined on the element/attribute • The presence / multiplicity of the element is to be enforced • One of the above conditions holds for a child of the element
Indexing in Tamino • Value-based indexes • Well known from traditional database systems • Used to accelerate search • The index must address the data object exactly • Element names need not be unique within a DTD
Example of value-based index • Value-based indexes • Data-centric view
<!ELEMENT City (Name, Inhabitants, Monument+)>
<!ELEMENT Monument (Name, Description)>
<!ELEMENT Inhabitants (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
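Because `Name` occurs under both `City` and `Monument` in this DTD, a value index has to be keyed by more than the element name. A minimal sketch, assuming the full element path is used as the key (the function name and the sample document are made up for illustration):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_value_index(documents):
    """Map (element path, text value) -> set of ids of documents containing it."""
    index = defaultdict(set)

    def walk(element, parent_path, doc_id):
        path = f"{parent_path}/{element.tag}"
        text = (element.text or "").strip()
        if text and len(element) == 0:  # index leaf elements that carry a value
            index[(path, text)].add(doc_id)
        for child in element:
            walk(child, path, doc_id)

    for doc_id, xml_text in documents.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return dict(index)

docs = {
    "d1": "<City><Name>Darmstadt</Name><Inhabitants>138000</Inhabitants>"
          "<Monument><Name>Langer Ludwig</Name>"
          "<Description>Column on Luisenplatz</Description></Monument></City>",
}
value_index = build_value_index(docs)
print(value_index[("/City/Name", "Darmstadt")])
print(value_index[("/City/Monument/Name", "Langer Ludwig")])
```

Keying by path keeps a city name and a monument name apart even though both elements are called `Name`.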
Indexing in Tamino (cont'd) • Text indexing • Document-centric view • The scope can be limited to a specific part of the document • The scope might span element content
Example of text index • Text indexing • Document-centric view
<statement>
  <author>
    <firstname>Harald</firstname>
    <lastname>Schoning</lastname>
  </author>
  <text>X<italic>M</italic>L and X<italic>S</italic>L are <stressed>very</stressed> important</text>
</statement>
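In this example the words "XML" and "XSL" are split by markup, so a text index has to look through the tags inside its scope. A small sketch of that idea using Python's standard library (the helper name `words_in_scope` is hypothetical):

```python
import xml.etree.ElementTree as ET

statement = (
    "<statement><author><firstname>Harald</firstname>"
    "<lastname>Schoning</lastname></author>"
    "<text>X<italic>M</italic>L and X<italic>S</italic>L are "
    "<stressed>very</stressed> important</text></statement>"
)

def words_in_scope(xml_text, scope_tag):
    """Return the words inside the first <scope_tag> descendant, ignoring nested markup."""
    scope = ET.fromstring(xml_text).find(f".//{scope_tag}")
    # itertext() yields all text pieces in document order, across child elements.
    flattened = "".join(scope.itertext())
    return flattened.split()

words = words_in_scope(statement, "text")
print(words)
```

Restricting the scope to `text` keeps the author's name out of the indexed words, while "XML" is indexed as one word despite the `italic` element in the middle.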
Indexing in Tamino (cont'd) • Structural index • Useful if multiplicity permits the omission of elements • Or if no DTD is known • Example • In a database of all European cities • Find all cities that have an element called "beach"
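The "beach" example can be sketched as an index from element names to the documents that contain them. The city documents below are invented, and a real structural index would likely record paths rather than bare names:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_structure_index(documents):
    """Map element name -> set of ids of documents containing such an element."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        for element in ET.fromstring(xml_text).iter():
            index[element.tag].add(doc_id)
    return dict(index)

cities = {
    "nice": "<City><Name>Nice</Name><beach>Promenade des Anglais</beach></City>",
    "darmstadt": "<City><Name>Darmstadt</Name></City>",
}
structure_index = build_structure_index(cities)
print(structure_index.get("beach", set()))
```

With such an index the query is answered without opening the documents that lack the element.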
Querying XML documents • Currently, there is no standardized query language • XPath allows positioning within a single document • XPath fits the needs of retrieval in data-centric environments well • Document-centric environments need a more content-based retrieval facility • Tamino also supports full text search
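XPath positions within one document, so a database has to apply the path across a whole set of documents. A rough sketch of that step, using the limited XPath subset in Python's `xml.etree.ElementTree`; this is not Tamino's query syntax, and the documents are invented:

```python
import xml.etree.ElementTree as ET

def query_collection(documents, path):
    """Apply one path expression to every document; return (doc id, text) pairs."""
    hits = []
    for doc_id, xml_text in documents.items():
        for element in ET.fromstring(xml_text).findall(path):
            hits.append((doc_id, element.text))
    return hits

collection = {
    "darmstadt": '<City><Name>Darmstadt</Name>'
                 '<Monument Height="39m"><Name>Langer Ludwig</Name></Monument></City>',
    "nice": '<City><Name>Nice</Name>'
            '<Monument><Name>Promenade</Name></Monument></City>',
}
# Names of monuments that carry a Height attribute, across the collection.
print(query_collection(collection, "./Monument[@Height]/Name"))
```

The result is a set of matching parts drawn from several documents, which is the set-oriented retrieval the slides describe.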
Expectations for an XML processor • W3C: the XML recommendation specifies the handling of entities, comments and processing instructions. • User: Tamino should leave comments intact, evaluate no processing instructions, and leave entity references unresolved. • User: the output of a Tamino query should match the specification of an XML processor.
Why not leave entities unresolved? • The result may be a set of (parts of) matching documents • The result DTD must then include all the different entity declarations of the original documents • The definition of an entity might differ from document to document • So entities that share a name must be renamed, and the entity references changed accordingly.
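The renaming step can be illustrated with a small sketch. It assumes each result fragment arrives as a pair of its entity declarations and its body text; that representation, the `_2` suffix scheme and the sample entities are assumptions for illustration only.

```python
import re

def merge_entities(fragments):
    """Merge entity declarations of several fragments, renaming conflicting names.

    fragments: list of (declarations, body), where declarations maps an entity
    name to its definition. Returns (merged declarations, rewritten bodies).
    """
    merged = {}    # new entity name -> definition
    assigned = {}  # (original name, definition) -> new entity name
    rewritten = []
    for declarations, body in fragments:
        renames = {}
        for name, definition in declarations.items():
            key = (name, definition)
            if key not in assigned:
                new_name, suffix = name, 1
                while new_name in merged:  # name already taken by another definition
                    suffix += 1
                    new_name = f"{name}_{suffix}"
                merged[new_name] = definition
                assigned[key] = new_name
            renames[name] = assigned[key]
        # Rewrite references; names without a declaration (e.g. amp) stay as they are.
        rewritten.append(re.sub(
            r"&(\w+);",
            lambda match: "&" + renames.get(match.group(1), match.group(1)) + ";",
            body))
    return merged, rewritten

fragments = [
    ({"co": "Software AG"}, "<p>&co;</p>"),
    ({"co": "Carleton University"}, "<p>&co; &amp;</p>"),
]
merged, bodies = merge_entities(fragments)
print(merged)
print(bodies)
```

Identical definitions under the same name are shared, so only genuinely conflicting entities get a new name.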
Problems of external entities • These entities can change without the database system knowing about it • Thus, the values of external entities must not be included in indexes • Example: <!ENTITY mysubject SYSTEM "http://www.softwareag.com/hottopic.xml"> ... <ticker>Today's hot topic: &mysubject;</ticker> • Checking the current contents of the external entity on every query would lead to unacceptable response times.
Relational Databases and XML • Major (object-)relational database systems include some form of XML support • The simplest form is to generate XML documents from existing relational data • But real database handling of XML requires that XML data can be stored and retrieved • Two approaches
XML support approach (1) • The XML document is mapped to relational tables and their columns • Markup is ignored on storage, and reconstructed on retrieval • Advantage of this approach: • The contents of an XML document can be handled with traditional SQL
XML support approach (1), cont'd • Shortcomings: • Sequence information is lost
The order as stored:
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
The order as retrieved:
<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="19" Quantity="8"/>
</Order>
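The loss of sequence can be reproduced with a small sketch using SQLite from Python's standard library. The table layout is an assumption: rows in a table form an unordered set, so the original item order survives only if a position column is stored explicitly.

```python
import sqlite3
import xml.etree.ElementTree as ET

order_xml = """<Order CustomerId="567" Date="12-12-2000">
  <Item ProductID="17" Quantity="2"/>
  <Item ProductID="16" Quantity="9"/>
  <Item ProductID="19" Quantity="8"/>
</Order>"""

def shred(connection, xml_text):
    """Map the Item elements to rows; 'position' is the extra column a plain mapping lacks."""
    connection.execute(
        "CREATE TABLE item (position INTEGER, product_id INTEGER, quantity INTEGER)")
    for position, item in enumerate(ET.fromstring(xml_text).findall("Item")):
        connection.execute(
            "INSERT INTO item VALUES (?, ?, ?)",
            (position, int(item.get("ProductID")), int(item.get("Quantity"))))

def product_ids(connection, order_column):
    """Return product ids sorted by one of the table's columns."""
    query = f"SELECT product_id FROM item ORDER BY {order_column}"
    return [row[0] for row in connection.execute(query)]

connection = sqlite3.connect(":memory:")
shred(connection, order_xml)
by_value = product_ids(connection, "product_id")  # order derived from data values
by_position = product_ids(connection, "position") # document order, stored explicitly
print(by_value, by_position)
```

Sorting by `product_id` gives the reordered result shown on the slide; only the explicit `position` column restores the document order.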
XML support approach (1), cont'd • For data-centric documents the sequence might not matter, but it does for document-centric ones • This approach loses all comments and processing instructions • Mixed content cannot be stored easily in this model
XML support approach(2) • Leaves the XML document intact and stores it in a large text field (“BLOB”) • Or even outside the database • Text search is possible • Can limit a certain text-based condition
XML support approach (2), cont'd • Limitations: • No structure-aware combinations are possible • Value-based search is not supported on these text fields • IBM solution: side tables • But direct manipulation of side tables destroys the consistency of the database • Security can be defined at document level only, not on elements or attributes
Summary • Tamino was designed specifically for XML • Schema handling for XML differs from schema handling in relational databases • External entities cause conceptual problems, e.g. for indexing • Value-based indexes are useful for XML, as are text indexes and structural indexes • Comments and processing instructions should be preserved when documents are stored • The result of a query against an XML database should be XML
Q&A Thanks!