420 likes | 531 Views
Comparing RDMS to XML. Why Store Data in XML? VS RDBMS.
E N D
Why Store Data in XML? VS RDBMS • Today's businesses continually accumulate data, whether it's customer information, production information, or any of thousands of other categories of information. This data needs to be stored some place, and better yet, it needs to be stored in a relational format for easy retrieval and integration with other data. • RDBMS systems such Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and many others are used to perform such tasks • when security, speed, and efficiency pose concerns. • RDBMS systems offer a great solution whenever internal or company-developed applications require access to the data. • The potential problem arises either when one application needs to access data of another application but does not have direct access to that application's database system, or when data need to be ported between different database systems (especially true if the systems are from different vendors). • RDBMSs store their data in binary formats that are not directly accessible by other databases. Any external application wishing to interact with this data needs to utilize a database Application Programming Interface (API) provided by the RDBMS specifically for that product or a generalized interface, such as the MS SQL engine
What we will cover • XML from a Data Practitioner’s standpoint • Relational databases and schemas • A comparison of relational and XML architecture • What each technology is good at – and when to use it. • Relational metadata and XML metadata • How to navigate a relational schema vs. an XML schema • Gathering data (queries) in a database vs. an XML document • Differences in flexibility between the two • Cost considerations • Data integrity considerations • So…what do we do?
XML: A Data Practitioner’s View (1) • XML is a specification for designing tag-based languages. • The specification allows for: • Metadata (XML Schemas) that define • Valid data structures • Defining of user data types • Valid lists of values, ranges, and patterns • Optionality/cardinality • Elements and Attributes • Reusability of data, data types, and schemas • Creation of instance documents based on an XML Schema
XML: A Data Practitioner’s View (2) • The potential exists for: • Industries to agree on an XML-based language for data exchange. • Exchange of XML instance documents between trading partners • An entire industry has grown up around: • Providing XML tools (and repositories) • Ongoing development of standards • XML has gained very wide acceptance!
XML Content • Here is an easy-to-understand sample of XML: <?xml version = "1.0" encoding = "UTF-8"?> <DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation = "PatientSearchResponse.xsd" Source = "CentralPatient" Target = "Store042" MsgTypeCode = "PSRS" MsgTypeDesc = "PatientSearchResponse" FoundFlag = "true"> <PatientList> <PatientIdFields> <FirstName>Maria</FirstName> <LastName>Montes</LastName> <Birthdate>1951-11-05</Birthdate> <Gender>F</Gender> <StreetAddress>1969 Ygnacio Valley Road</StreetAddress> <CityAddress>Walnut Creek</CityAddress> <StateCode>CA</StateCode> <ZipCode>94597</ZipCode> <PatientID>109993345</PatientID> </PatientIdFields> </PatientList></DataTransmission>
The XML Document(1) • An XML document is: • A self-contained set of structured (“tagged”) data • Decoupled from the source systems • Tagged data in the document does not change when the source system data changes • The document represents a snapshot of the source system data at a particular moment in time. • Packaged with the metadata (tags) for a specific business transaction
The XML Document (2) • An XML Document has: • Semantic content • Values (tagged data) • Metadata (the tags) • Order • Hierarchy • Structure
The XML Schema • An XML document is usually (but not always) validated by an XML Schema. • The XML Schema provides the information on whether the XML document “followed the rules” set up in the XML Schema. • An XML Schema is an agreement between the sender and the receiver of a document as to the structure of that document.
Elements vs. Attributes • Elements: • Basic building blocks of XML • Contain content which can be a structure • Attributes • Specify additional information about an element. • Contain only simple type content • Some data could be either an Element or an Attribute (so you need standards on how to decide which to use).
Element & Attribute in XML Schema • Element and Attribute declaration: <xsd:element name = "DataTransmission"> <xsd:complexType> <xsd:sequence> <xsd:element ref = "FirstName" minOccurs = "0"/> <xsd:element ref = "LastName" minOccurs = "0"/> <xsd:element ref = "Phone" minOccurs = "0"/> <xsd:element ref = "Birthdate" minOccurs = "0"/> <xsd:element ref = "Gender" minOccurs = "0"/> <xsd:element ref = "StreetAddress" minOccurs = "0"/> <xsd:element ref = "CityAddress" minOccurs = "0"/> <xsd:element ref = "StateCode" minOccurs = "0"/> <xsd:element ref = "ZipCode" minOccurs = "0"/> <xsd:element ref = "SSN" minOccurs = "0"/> <xsd:element name = "SafetyCapDate" type = "xsd:date"/> </xsd:sequence><xsd:attribute name = "Source" use = "required" type = "xsd:string"/> <xsd:attribute name = "Target" use = "required" type = "xsd:string"/> <xsd:attribute name = "MsgTypeCode" use = "required" type = "MsgTypeCodeType"/> <xsd:attribute name = "MsgTypeDesc" use = "required" type = "xsd:string"/> </xsd:complexType></xsd:element>
Element and Attribute XML Document • Element and Attribute content: <DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation = "PatientSearchRequest.xsd" Source = "Store599" Target = "CentralPatient" MsgTypeCode = "PSRQ" MsgTypeDesc = "PatientSearchRequest"> <FirstName>Maria</FirstName> <LastName>Montes</LastName> <Birthdate>1951-11-05</Birthdate> <Gender>F</Gender> <StreetAddress>1969 Ygnacio Valley Road</StreetAddress> <CityAddress>Walnut Creek</CityAddress> <StateCode>CA</StateCode> <ZipCode>94597</ZipCode> <SSN>561-88-9208</SSN> <SafetyCapDate>2001-05-22</SafetyCapDate> </DataTransmission>
Simple data types in an XML Schema • Comes with “atomic” simple data types • Integer, boolean, date, decimal, string, etc. • You can build user-defined simple data types • Built on the included “atomic” data types • Allows declaration of • valid values, ranges, Patterns, Length, total digits • And more… • Attributes or Elements can be of a simple data type (either atomic or user-defined).
Simple data type examples <xsd:simpleType name = "SevenPlaceInteger"> <xsd:restriction base = "xsd:integer"> builds on atomic simple data type <xsd:totalDigits value = "7"/> </xsd:restriction></xsd:simpleType><xsd:simpleType name = "GenderType"> <xsd:restriction base = "xsd:string"> <xsd:enumeration value = "M"/> <xsd:enumeration value = "F"/> <xsd:length value = "1"/> </xsd:restriction></xsd:simpleType><xsd:simpleType name = "RelationshipCodeType"> <xsd:restriction base = "xsd:string"> <xsd:enumeration value = "self"/> <xsd:enumeration value = "spouse"/> <xsd:enumeration value = "dependent"/> <xsd:enumeration value = "other"/> </xsd:restriction></xsd:simpleType> <xsd:simpleType name = "SevenPlacePositiveInteger"> builds on custom simple data type <xsd:restriction base = "SevenPlaceInteger"> <xsd:minInclusive value = "0"/> </xsd:restriction></xsd:simpleType>
Complex data types in XML Schema • Builds a structure of Elements. • Each subelement is either a simple data type or another structure of Elements. • Only Elements can be of a complex data type. • Can be named and reusable or anonymous and used only by a single Element. • Can be an extension or restriction of another complex type.
Complex data type examples <xsd:complexType name = "AddressType"> declaration of named complex data type <xsd:sequence> <xsd:element ref = "StreetAddress"/> <xsd:element ref = "CityAddress"/> <xsd:element ref = "StateCode"/> </xsd:sequence></xsd:complexType><xsd:element name = "WorkAddress" type = "AddressType"/> association of Element with named complex data type<xsd:complexType name = "AddressWithCountryType"> new complex data type extends existing complex data type <xsd:complexContent> <xsd:extension base = "AddressType"> <xsd:sequence> <xsd:element name = "CountryCode" type = "xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent></xsd:complexType><xsd:element name = "PatientInsurance"> element with anonymous complex data type <xsd:complexType> <xsd:sequence> <xsd:element ref = "Patient"/> <xsd:element ref = "TPMembership" minOccurs = "0" maxOccurs = "unbounded"/> </xsd:sequence> </xsd:complexType></xsd:element>
Using the XML Schema Source database Target database Data Data Extract program Parse program XML Document Network XML Schema XML Schema
Reusing XML Schemas Statecode.xsd • XML Schemas can build on each other to provide reusability. Base Definitions.xsd <xsd:include schemaLocation = "StateCodes.xsd"/> Patient Search Request.xsd Patient Search Response.xsd Patient Update Request.xsd <xsd:include schemaLocation = "BaseDefinitions.xsd"/> <xsd:include schemaLocation = "BaseDefinitions.xsd"/> <xsd:include schemaLocation = "BaseDefinitions.xsd"/>
An XML Schema example <xsd:element name = "Patient"> <xsd:complexType> <xsd:sequence> <xsd:element ref = "PatientID"/> <xsd:element ref = "FirstName"/> <xsd:element ref = "LastName"/> <xsd:element ref = "Birthdate"/> <xsd:element ref = "Gender"/> <xsd:element name = "PrivacyInd" type = "xsd:boolean"/> <xsd:element ref = "StreetAddress"/> <xsd:element ref = "CityAddress"/> <xsd:element ref = "StateCode"/> <xsd:element ref = "ZipCode"/> <xsd:element ref = "EMailAddress" minOccurs = "0"/> <xsd:element ref = "SSN" minOccurs = "0"/> <xsd:element name = "HIPAANotifInd" type = "xsd:boolean"/> <xsd:element name = "SafetyCapInd" type = "xsd:boolean"/> <xsd:element name = "SafetyCapDate" type = "xsd:date"/> <xsd:element ref = "StatusCode"/> <xsd:element ref = "Doctor" minOccurs = "0"/> <xsd:element ref = "Phone" maxOccurs = "unbounded"/> <xsd:element ref = "PatDrugAllergy" minOccurs = "0" maxOccurs = "unbounded"/> <xsd:element ref = "OtherDrugTaken" minOccurs = "0" maxOccurs = "unbounded"/> </xsd:sequence> <xsd:attribute name = "PrivacyInd" type = "xsd:boolean"/> </xsd:complexType></xsd:element>
The Structure of an XML Schema • Elements in an XML Schema are hierarchical. • To expand the hierarchy with this tool (Tibco’s XML Authority), click here.
Expanding an Element This is the result you get – you can now see the elements that make up the structure of the the OtherDrugTaken element.
Managing XML Schemas • Avoid chaos by managing XML metadata across the Enterprise: • Create reusable base definition schemas • Create and document: • Widely used elements (with their attributes) • Complex data types • Simple data types • Keep track of which schemas use other schemas • Keep track of which documents are validated by which schemas (XML repository).
XML Architecture • XML is hierarchical: This hierarchy is useful for starting with a patient and finding all the information about them – such as a list of their prescriptions and when the prescriptions were filled. Patient Insurance Prescription Doctor Drug Fills Doctor Drug Claim
XML Hierarchy Revisited • The Hierarchy can change depending on what the XML document is used for. Prescription Each version of the hierarchy serves a different purpose Drug Fills Doctor Patient This version of the hierarchy is useful for starting with a Prescription and finding all the information about it, including the Patient and Drug. Drug Claim
Database Architecture • Database architecture is relational: • Normalized to eliminate data redundancy • Join on any two columns that have the same data type. • Foreign keys can enforce data integrity
Relational Metadata – the Schema • Relational metadata is stored in the database • Database control tables fully define the structure of the database. • Without the DBMS metadata the contents of the database are worthless. • Completely self-contained (not reusable) • Tables are structured, each column is a “bucket” for a specific kind of data • In most databases, the metadata does not include descriptions, so a Data Dictionary is necessary.
XML Metadata – the Document • Metadata built into the document • Every element has a tag to tell you where the data is stored in the document. • Descriptive tags give structure to the document and tell you what the data means (sort of). • “Sort of” because it only tells the tag name, so this only has meaning to someone who already understands what the element or attribute means. • Document cannot be parsed for storage on its own. What else is needed?…
XML Metadata – the Schema • An XML Schema (or DTD) is needed to: • Provide standardization (basis of agreement) • Allow meaningful parsing and data storage • Specify agreement on document structure • A data dictionary is still necessary to provide definition for Elements and Attributes • Without an XML Schema, a document is essentially only good for transmitting blocks of data for humans to read.
Database Architecture XML Architecture • Data Model • Cardinality • Optionality • Data Type • Describes Relationships • Has Metadata • Generate DDL • XML Schema • Cardinality • Optionality • Data Type • Describes Relationships • Has Metadata • Valid XML • Sequence of Elements Final Result: Database Final Result: Valid XML Document Comparing XML to RDMS Metadata • An XML Schema establishes the valid structure of an XML document, like a database schema establishes the valid structure of a database.
What are relational databases good at? • Data Storage • General purpose data storage and retrieval • Used for many purposes, such as queries and analysis • Generalized view of data for shared use • Ideally shared across business units or the Enterprise • Works well to store the contents of an XML document.
What is XML good at? • Data Exchange • Exchange of data in a document • Usually designed for a specific communication • Works well to move data between databases • Important when source or target database is outside your firewall. • Usually don’t have direct access to such databases. • Works well (with style sheets or XSLT) to display data on the web because browsers inherently are designed to display documents.
RDBMS Store data Query data Mine data Create generalized reports XML Transmit data Exchange data with outside agencies Replace flat files Create specific reports When do I use each technology? • Which one? • Use them together • Each for its own purpose • Build an infrastructure that: • Creates XML document (using XML schemas) from database contents. • Parses XML documents to store their contents in a database
Relationships: RDBMS • RDBMS • Explicit table to table: if not declared by a foreign key, it doesn’t exist.
Relationships: XML <Patient> <PatientID>10988453</PatientID> <FirstName>Maria</FirstName> <LastName>Montes</LastName> <Birthdate>1951-11-05</Birthdate> <Gender>F</Gender> <PrivacyInd>false</PrivacyInd> <StreetAddress>1969 Ygnacio Valley Road</StreetAddress> <CityAddress>Walnut Creek</CityAddress> <StateCode>CA</StateCode> <ZipCode>94597</ZipCode> <HIPAANotifInd>true</HIPAANotifInd> <SafetyCapInd>false</SafetyCapInd> <SafetyCapDate>2000-01-01</SafetyCapDate> <StatusCode>A</StatusCode> <Phone> <PhoneID>45569009</PhoneID> <AreaCode>925</AreaCode> <PhoneNumber>5556964</PhoneNumber> <PhoneTypeCode>RS</PhoneTypeCode> </Phone> <PatDrugAllergy> <PatDrugAllergyID>43325564</PatDrugAllergyID> <InformedDate>2001-01-01</InformedDate> <DrugID>432678945</DrugID> </PatDrugAllergy></Patient> • XML • Implied positioning: • Part-of relationship implied by positioning • Parallel elements have same relationships as sibling elements
Querying: XML vs. RDBMS • Relational • Easy to build queries by: • Navigating the database joins … or … • Creating joins on any pair of tables that share matching columns • XML • Difficult to build queries because: • Must stick to the structure of the document • Pointer-based navigation is restrictive
Navigating in XML for Queries • Must follow the structure of the document. • Example: DOM Node navigation: • getFirstChild • getLastChild • getPreviousSibling • getNextSibling • getParentNode
Structure Flexibility : Relational vs. XML • Relational: Rigid structure • Lots of work to change the structure because of foreign keys, Views, Stored procs, and triggers • Plus programs that assume the database looks a certain way! • XML: Flexible structure • Easy to change the structure of documents to account for new data, data type changes • Everything is text so it is easy to implement • Or is it?
XML Flexibility • Changing the structure of an XML document is easy. • Changing the XML Schema is easy too, but have you considered that you must: • Get agreement from business partners • Change the programs that create documents from the database and parse documents for the database. • Version your XML Schemas if you want to be able to open and parse historical documents (do you keep them?).
Data Integrity: Relational vs. XML • Both provide ways to control data types, ranges, patterns, valid values, and optionality. • But XML does not provide data validation via look-up as relational does with foreign keys. • However, this lack does make the XML Schema a less rigid structure than a database schema.
Cost Considerations • People always ask – which is cheaper to implement? • XML is essentially all text, so it is easy to create, change and manage. • XML doesn’t need the infrastructure that an RDBMS does. • XML tools are cheap ($99!) • But does this cost difference really matter? • The two technologies do different things. • You can’t use one as replacement of the other. • If you need it, you need it. • So, XML is cheaper – and it doesn’t really matter.
So what do we do? • Do you remember the previous slide? • The two technologies do different things • They complement each other: • RDBMS stores data in a generally usable way • XML exchanges data in a specific way is easy to display on the web. • They both present challenges for metadata management. • So…we use them both for the appropriate purpose.