XML Extensible Markup Language
XML • Metalanguage • A language that describes languages • These languages describe formats for data exchange
Example (unstructured data)
Hans Meyer, Lohmannstrasse 23, 06366 Köthen
Dr. Else Müller, Bernburger Strasse 56, 06366 Köthen
Example (the same data marked up with XML tags; Arzt = physician, Strasse = street, Ort = town)
<Patient>
  <Name>Hans Meyer</Name>
  <Strasse>Lohmannstrasse 23</Strasse>
  <Ort>06366 Köthen</Ort>
</Patient>
<Arzt>
  <Name>Dr. Else Müller</Name>
  <Strasse>Bernburger Strasse 56</Strasse>
  <Ort>06366 Köthen</Ort>
</Arzt>
Structure of XML documents • Prolog • Declaration of the document type • DTD (Document Type Definition) • Elements http://www.w3schools.com/xml/default.asp http://de.selfhtml.org/
Document Type Definition (DTD) • It describes the grammar of an XML document • It describes the permitted elements and attributes • their data types and ranges of values • their nesting • An XML document that conforms to a DTD is called valid
Example DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient)>
  <!ELEMENT Patient (#PCDATA)>
]>
<Personen>
  <Patient>Hans Meyer Lohmannstrasse 23 06366 Köthen</Patient>
</Personen>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten1.xml
Structure of XML documents • The DTD describes the characteristics of the elements • Elements are opened by a start tag <Elementname> and closed by an end tag </Elementname> • XML tags are case-sensitive • Elements can contain other elements • #PCDATA (parsed character data): the element consists of a character string whose characters belong to the defined character set
Names of Elements • Names can contain letters, numbers, and other characters • Names must not start with a number or punctuation character • Names must not start with the letters xml (or XML, Xml, etc.) • Names cannot contain spaces
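The naming rules above can be sketched as a small Python check. This is a simplified, hypothetical helper (`is_valid_element_name`): it only accepts ASCII letters, digits, `_`, `-` and `.`, lets a name start with a letter or an underscore (which the XML specification permits), and ignores colons (namespace prefixes) and the many Unicode characters the specification also allows.

```python
import re

# Simplified pattern (assumption: ASCII only). A name starts with a letter
# or underscore and continues with letters, digits, '_', '-' or '.';
# spaces, a leading digit and other leading punctuation therefore fail.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a name against the simplified element naming rules."""
    if _NAME.fullmatch(name) is None:
        return False
    # Names starting with xml, XML, Xml, ... are reserved.
    return not name.lower().startswith("xml")


for candidate in ["Patient", "General_Practitioner", "1Patient", "XmlData", "first name"]:
    print(candidate, is_valid_element_name(candidate))
```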
Sequence of Elements
Subordinate elements are separated by commas in the declaration and enclosed in parentheses. Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient,Arzt)>
  <!ELEMENT Patient (Name,Adresse)>
  <!ELEMENT Arzt (Name, Adresse)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Adresse (#PCDATA)>
]>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten2.xml
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten3.xml
Selection list
Exactly one of the listed elements is selected; the alternatives are separated by |. Example:
<!DOCTYPE Personen [
  <!ELEMENT Personen (Patient|Arzt)>
  <!ELEMENT Patient (Name,Adresse,Diagnose)>
  <!ELEMENT Arzt (Name, Adresse,Fachgebiet)>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten4.xml
Multiple occurrence
* The element can appear zero or more times
+ The element appears at least once and may be repeated
? The element appears zero times or once
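Sequence (,), choice (|) and the occurrence indicators *, + and ? behave like the operators of a regular expression over the child elements. A minimal Python sketch, assuming simple element-only content models (no #PCDATA, no mixed content); `content_model_to_regex` and `children_match` are hypothetical helpers, and the example models are the ones from the quarterly-billing DTD.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a simple DTD content model, e.g. "(Address, Diagnosis*)",
    into a regular expression over space-terminated child element names."""
    if "#PCDATA" in model:
        raise ValueError("only element content is supported in this sketch")
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|*+?]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue  # sequence: plain concatenation
        elif tok in ("|", "*", "+", "?"):
            parts.append(tok)  # choice and occurrence keep their regex meaning
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element
    return re.compile("".join(parts))


def children_match(model: str, children: list[str]) -> bool:
    """True if the sequence of child element names fits the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(Physician, Patient+)", ["Physician", "Patient", "Patient"]))
print(children_match("(General_Practitioner | Dentist)", ["General_Practitioner", "Dentist"]))
```

Because "," and "|" may not be mixed at the same nesting level in a DTD, concatenation and alternation never need extra grouping beyond the parentheses already present in the model.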
Worzyk, FH Anhalt, Datenbanksysteme 2 (Database Systems 2), summer semester 2004
Attributes
<!ATTLIST element-name attribute-name attribute-type default-value>
Types of attributes: CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION, xml:
Default value: value, #REQUIRED, #IMPLIED, #FIXED value
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Patienten5.xml http://www.w3schools.com/xml/xml_attributes.asp
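The default-value rules can be illustrated with a small Python sketch. The representation of an ATTLIST entry as a tuple `(allowed values, keyword, default)` and the function `apply_attlist` are hypothetical simplifications: only CDATA and enumerated types are checked, while ID, IDREF, NMTOKEN and the other types are not handled.

```python
from typing import Optional

# Hypothetical representation of one attribute declaration:
# (allowed values for an enumerated type, or None for CDATA,
#  "#REQUIRED" / "#IMPLIED" / "#FIXED", or None for a plain default,
#  default value or None)
AttDecl = tuple[Optional[tuple[str, ...]], Optional[str], Optional[str]]


def apply_attlist(attrs: dict[str, str], decls: dict[str, AttDecl]) -> dict[str, str]:
    """Check an element's attributes and fill in declared default values."""
    for name in attrs:
        if name not in decls:
            raise ValueError(f"undeclared attribute {name!r}")
    result = dict(attrs)
    for name, (allowed, keyword, default) in decls.items():
        if name not in result:
            if keyword == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if keyword == "#IMPLIED":
                continue  # optional attribute without a default
            result[name] = default  # plain default or #FIXED value
        value = result[name]
        if keyword == "#FIXED" and value != default:
            raise ValueError(f"{name!r} must have the fixed value {default!r}")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name!r}={value!r} is not one of {allowed}")
    return result


# <!ATTLIST Name Salutation (Mr|Ms) "Ms">
name_decls = {"Salutation": (("Mr", "Ms"), None, "Ms")}
print(apply_attlist({}, name_decls))                     # default "Ms" is filled in
print(apply_attlist({"Salutation": "Mr"}, name_decls))   # given value is kept
```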
Comments
Comments are enclosed in <!-- and -->:
<!-- This is a comment -->
Well-formed XML File • The file should start with the XML declaration, which identifies it as an XML document • There is at least one data element • There is exactly one root element, which contains all other data elements • All required attributes are defined • All elements have the correct content • The elements are properly nested
Valid XML File • The file is well-formed • A DTD is assigned to the file • The content of the file conforms to the assigned DTD
Parser
A parser checks whether an XML document is valid:
<html>
<body>
<script type="text/javascript">
// ActiveXObject and the Microsoft.XMLDOM parser exist only in Internet Explorer
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;            // load the file synchronously
xmlDoc.validateOnParse = true;   // validate against the DTD while parsing
xmlDoc.load("Patienten5.xml");
// report the result of parsing and validation
document.write("<br />Error Code: " + xmlDoc.parseError.errorCode);
document.write("<br />Error Reason: " + xmlDoc.parseError.reason);
document.write("<br />Error Line: " + xmlDoc.parseError.line);
</script>
</body>
</html>
http://www.inf.hs-anhalt.de/~Worzyk/Telemedizin/Beispiele/Parser.htm
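Python's standard library has no validating XML parser, so a comparable sketch with `xml.etree.ElementTree` can only check well-formedness: it reports the expat error code, the message and the line, similar to `parseError` in the script above, but it does not validate against a DTD. `check_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str) -> tuple[int, str, int]:
    """Return (error code, reason, line); code 0 means the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, _column = err.position  # 1-based line number of the error
        return err.code, str(err), line
    return 0, "", 0


print(check_well_formed("<Personen><Patient>Hans Meyer</Patient></Personen>"))
print(check_well_formed("<Personen>\n<Patient>Hans Meyer</Personen>"))  # mismatched tag
```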
DTD - Disadvantages • Few data types • The specification is not written in XML syntax • The specification cannot be validated with an XML parser
XML - Schema • An XML Schema: • defines elements that can appear in a document • defines attributes that can appear in a document • defines which elements are child elements • defines the order of child elements • defines the number of child elements • defines whether an element is empty or can include text • defines data types for elements and attributes • defines default and fixed values for elements and attributes http://www.w3schools.com/schema/schema_intro.asp
XML Schema: Advantages over DTD • XML Schemas are extensible to future additions • XML Schemas are richer and more useful than DTDs • XML Schemas are written in XML • XML Schemas support data types • xs:date, xs:dateTime, xs:string • XML Schemas support namespaces • xmlns:xs="http://www.w3.org/2001/XMLSchema"
Dublin Core Standard The Dublin Core Metadata Initiative conference in 1995 in Dublin, Ohio, defined a set of descriptive attributes for categorizing documents on the Internet. 15 core elements are recommended in "Dublin Core Metadata Element Set, Version 1.1" (ISO 15836). http://dublincore.org/documents/dces/
How to create an XML structure • Create a tree-structure of the data • Convert that structure to a DTD • Add data elements • Test
Example: Quarterly billing • One file contains exactly one physician and at least one patient • A physician is either a general practitioner or a dentist • A general practitioner has an address and a profession • A dentist has an address • A patient has an address and zero or more diagnoses • An address consists of name, city, and street • A name has a salutation, Mr. or Ms.
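Example: Quarterly billing (tree structure; + = one or more, * = zero or more, ? = optional, | = choice)
Billing
  Physician
    General Practitioner | Dentist
      General Practitioner: Address, Profession ?
      Dentist: Address
  Patient +
    Address
    Diagnosis *
Address: Name, City, Street
Name: salutation Mr | Ms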
Example - DTD
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Billing [
  <!ELEMENT Billing (Physician, Patient+)>
  <!ELEMENT Physician (General_Practitioner | Dentist)>
  <!ELEMENT General_Practitioner (Address, Profession?)>
  <!ELEMENT Dentist (Address)>
  <!ELEMENT Patient (Address, Diagnosis*)>
  <!ELEMENT Address (Name, City, Street)>
  <!ELEMENT Profession (#PCDATA)>
  <!ELEMENT Diagnosis (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ATTLIST Name Salutation (Mr|Ms) "Ms">
]>
Example - Data
<Billing>
  <Physician>
    <General_Practitioner>
      <Address>
        <Name>Dr. Erpel</Name>
        <City>Entenhausen</City>
        <Street>Am Krankenhaus 1</Street>
      </Address>
      <Profession>Geriatrics</Profession>
    </General_Practitioner>
  </Physician>
  <Patient>
    <Address>
      <Name Salutation="Mr">Daniel</Name>
      <City>Entenhausen</City>
      <Street>Bahnhofstrasse 3a</Street>
    </Address>
    <Diagnosis>Insomnia</Diagnosis>
  </Patient>
  <Patient>
    <Address>
      <Name>Daisy</Name>
      <City>Entenhausen</City>
      <Street>Am Stadtpark</Street>
    </Address>
    <Diagnosis>Sunburn</Diagnosis>
    <Diagnosis>Migraine</Diagnosis>
  </Patient>
</Billing>
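To see how the nesting maps to a program's view of the data, here is a minimal Python sketch that reads the billing data with `xml.etree.ElementTree` and lists each patient with salutation and diagnoses. The function name `patients` is hypothetical; since the embedded string carries no DTD, the ATTLIST default "Ms" is applied by hand.

```python
import xml.etree.ElementTree as ET

# Copy of the billing data above (no DOCTYPE, so no DTD defaults are applied).
BILLING = """<Billing>
  <Physician><General_Practitioner>
    <Address><Name>Dr. Erpel</Name><City>Entenhausen</City><Street>Am Krankenhaus 1</Street></Address>
    <Profession>Geriatrics</Profession>
  </General_Practitioner></Physician>
  <Patient>
    <Address><Name Salutation="Mr">Daniel</Name><City>Entenhausen</City><Street>Bahnhofstrasse 3a</Street></Address>
    <Diagnosis>Insomnia</Diagnosis>
  </Patient>
  <Patient>
    <Address><Name>Daisy</Name><City>Entenhausen</City><Street>Am Stadtpark</Street></Address>
    <Diagnosis>Sunburn</Diagnosis><Diagnosis>Migraine</Diagnosis>
  </Patient>
</Billing>"""


def patients(xml_text: str) -> list[tuple[str, str, list[str]]]:
    """Return (salutation, name, diagnoses) for every Patient element."""
    root = ET.fromstring(xml_text)
    result = []
    for patient in root.findall("Patient"):
        name = patient.find("Address/Name")
        # The string has no DTD, so the ATTLIST default "Ms" is filled in here.
        salutation = name.get("Salutation", "Ms")
        diagnoses = [d.text for d in patient.findall("Diagnosis")]
        result.append((salutation, name.text, diagnoses))
    return result


for salutation, name, diagnoses in patients(BILLING):
    print(salutation, name, diagnoses)
```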
Queries on XML Files • XPath • XQuery
XPath The language XPath is used to address parts of an XML document. It was designed for use in both XSLT and XPointer. XPath models an XML document as a tree of nodes. http://www.informatik.hu-berlin.de/~obecker/obqo/w3c-trans/xpath-de-20010702/
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Queries with XPath
Select all titles: /bookstore/book/title
Select the title of the first book: /bookstore/book[1]/title
Select all the prices: /bookstore/book/price/text()
Select the titles of books with a price above 35: /bookstore/book[price>35]/title
http://www.w3schools.com/xpath/xpath_examples.asp
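Python's `xml.etree.ElementTree` supports a subset of XPath that covers the first three queries; the numeric comparison `price>35` is not supported there, so this sketch does that filter in Python. Paths are relative to the `bookstore` root element, and the document is a shortened copy of the example (authors and years omitted).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore example (authors and years omitted).
BOOKSTORE = """<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# /bookstore/book/title
all_titles = [t.text for t in root.findall("book/title")]
# /bookstore/book[1]/title  (positions start at 1, as in XPath)
first_title = root.find("book[1]/title").text
# /bookstore/book/price/text()
prices = [p.text for p in root.findall("book/price")]
# /bookstore/book[price>35]/title  (comparison done in Python)
expensive = [b.findtext("title") for b in root.findall("book")
             if float(b.findtext("price")) > 35]

print(all_titles, first_title, prices, expensive, sep="\n")
```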
XQuery • Query language for XML data • Uses XPath expressions • Analogous to SQL
XQuery Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
XQuery Example
Query: doc("books.xml")/bib/book[price<50]
Result:
<book year="2000">
  <title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author>
  <publisher>Morgan Kaufmann Publishers</publisher>
  <price>39.95</price>
</book>
FLWOR
For, Let, Where, Order by, Return
for $x in doc("books.xml")/bib/book
where $x/price>50
order by $x/title
return $x/title
Result:
<title>Advanced Programming in the Unix environment</title>
<title>TCP/IP Illustrated</title>
<title>The Technology and Content for Digital TV</title>
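The path query and the FLWOR expression can be mimicked in Python to make the role of each clause explicit: `for` becomes iteration, `where` a filter, `order by` a sort key and `return` the selected value. This is a sketch with plain `xml.etree.ElementTree`, not an XQuery engine, and `books.xml` is replaced by a shortened inline copy (authors, editor and publisher omitted).

```python
import xml.etree.ElementTree as ET

# Shortened copy of books.xml (authors, editor and publisher omitted).
BOOKS = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
  <book year="1999"><title>The Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""

bib = ET.fromstring(BOOKS)

# doc("books.xml")/bib/book[price<50]
cheap = [b for b in bib.findall("book") if float(b.findtext("price")) < 50]

# for $x in doc("books.xml")/bib/book   -> iterate over the book elements
# where $x/price>50                     -> keep only matching books
# order by $x/title                     -> sort by title (code-point order)
# return $x/title                       -> produce the title
titles = [x.findtext("title")
          for x in sorted(bib.findall("book"), key=lambda x: x.findtext("title"))
          if float(x.findtext("price")) > 50]

print([b.get("year") for b in cheap])
print(titles)
```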
XML Documents in Databases XML documents can be • data-centric • text-centric • semi-structured
Alternatives for storing XML documents • Storage as a whole • Storage within the XML structure • Transformation into database structures
Storage of XML documents as a whole The original document is stored in a file system or as a CLOB in a database. • full-text index • structure index
Example
<hotel url="http://www.hotel-huebner.de" id="h0001" erstellt-am="03/02/2003" Autor="Hans Müller">
  <hotelname>Hotel Hübner</hotelname>
  <kategorie>4</kategorie>
  <adresse>
    <plz>18199</plz>
    <ort>Warnemünde</ort>
    <strasse>Seestraße</strasse>
  </adresse>
  <telefon>0381 / 5434-0</telefon>
  <fax>0381 / 5434-444</fax>
  <anreisebeschreibung>Aus Richtung Rostock kommend ... </anreisebeschreibung>
</hotel>
Full-text index: the hotel document above is indexed by the words it contains, regardless of the element in which they occur.
Full-text and structure index: in addition, each word is indexed together with the path of the element that contains it (for example hotel.adresse.ort), so that queries can be restricted to parts of the document structure.
Queries
Full-text index:
hotel AND warnemünde
(hotel OR pension) AND (rostock OR warnemünde)
Full-text and structure index:
hotel.adresse.ort CONTAINS ("warnemünde") AND hotel.freizeitmoeglichkeit CONTAINS ("swimming pool")
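A minimal in-memory sketch of the two index types, assuming a single shortened hotel document, lower-cased word tokens and the dotted path notation used in the queries above. The names `index_document`, `fulltext` and `structure` are hypothetical; phrase queries such as CONTAINS ("swimming pool") are not covered.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the hotel example (attributes and contact data omitted).
HOTEL = """<hotel>
  <hotelname>Hotel Hübner</hotelname>
  <adresse><plz>18199</plz><ort>Warnemünde</ort><strasse>Seestraße</strasse></adresse>
  <anreisebeschreibung>Aus Richtung Rostock kommend ...</anreisebeschreibung>
</hotel>"""

fulltext = defaultdict(set)   # word -> document ids
structure = defaultdict(set)  # (element path, word) -> document ids


def words(text: str) -> list[str]:
    """Lower-cased word tokens (simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())


def index_document(doc_id: str, xml_text: str) -> None:
    """Add the element texts of one document to both indexes."""
    def walk(elem: ET.Element, path: str) -> None:
        path = f"{path}.{elem.tag}" if path else elem.tag
        for word in words(elem.text or ""):
            fulltext[word].add(doc_id)
            structure[(path, word)].add(doc_id)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")


index_document("h0001", HOTEL)

# hotel AND warnemünde
q1 = fulltext["hotel"] & fulltext["warnemünde"]
# (hotel OR pension) AND (rostock OR warnemünde)
q2 = (fulltext["hotel"] | fulltext["pension"]) & (fulltext["rostock"] | fulltext["warnemünde"])
# hotel.adresse.ort CONTAINS ("warnemünde")
q3 = structure[("hotel.adresse.ort", "warnemünde")]
print(q1, q2, q3)
```

AND and OR map to set intersection and union; the structure index only answers the CONTAINS query because the word is stored with its element path.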
Generic storage Storage within the XML structure: all information of the XML document is stored • simple generic storage • Document Object Model
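Simple generic storage can be sketched as a node table: every element becomes one row with a reference to its parent, and attributes go into a second table. The table and column names (`node`, `attr`, `content`) are hypothetical, SQLite from the Python standard library stands in for the database, and text after child elements (tail text in mixed content) is not stored in this simplified version.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_generic(conn: sqlite3.Connection, xml_text: str) -> None:
    """Store each element as a row in `node` and each attribute in `attr`."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
                 "tag TEXT, content TEXT)")
    conn.execute("CREATE TABLE attr (node_id INTEGER, name TEXT, value TEXT)")

    def insert(elem: ET.Element, parent_id) -> None:
        text = (elem.text or "").strip() or None
        cur = conn.execute("INSERT INTO node (parent, tag, content) VALUES (?, ?, ?)",
                           (parent_id, elem.tag, text))
        node_id = cur.lastrowid  # ids follow document order
        conn.executemany("INSERT INTO attr VALUES (?, ?, ?)",
                         [(node_id, k, v) for k, v in elem.attrib.items()])
        for child in elem:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)
    conn.commit()


DATA = ('<Billing><Patient><Address><Name Salutation="Mr">Daniel</Name></Address></Patient>'
        '<Patient><Address><Name>Daisy</Name></Address></Patient></Billing>')
conn = sqlite3.connect(":memory:")
store_generic(conn, DATA)

# Patient names: Name -> parent Address -> parent Patient
PATIENT_NAMES = """
SELECT n.content FROM node n
JOIN node a ON n.parent = a.id
JOIN node p ON a.parent = p.id
WHERE n.tag = 'Name' AND a.tag = 'Address' AND p.tag = 'Patient'
ORDER BY n.id
"""
print([row[0] for row in conn.execute(PATIENT_NAMES)])
```

Every path step in a query becomes a self-join on `node`, which is why generic storage is flexible but tends to produce expensive queries.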
Document Object Model The tree structure is transformed into a class hierarchy • Storage in object-relational or object-oriented databases
Queries • XPath • XQuery • XQL • Query language of Software AG • SQL
Transformation to database structures • A DTD or schema must be available • Automatic or user-driven procedures • Transformation into relational, object-relational, or object-oriented databases
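A user-driven transformation into a relational database can be sketched with the quarterly-billing DTD: each Patient becomes a row, and the repeatable Diagnosis* child becomes a separate table with a foreign key. The mapping, the table names `patient` and `diagnosis`, and the use of SQLite are assumptions for illustration; the Physician part is omitted for brevity, so the sample data is not valid against the full DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping derived from the billing DTD:
#   Patient (Address, Diagnosis*) -> table patient, one row per Patient
#   Diagnosis*                     -> table diagnosis, one row per Diagnosis
SCHEMA = """
CREATE TABLE patient (id INTEGER PRIMARY KEY, salutation TEXT,
                      name TEXT, city TEXT, street TEXT);
CREATE TABLE diagnosis (patient_id INTEGER REFERENCES patient(id),
                        description TEXT);
"""


def load_patients(conn: sqlite3.Connection, xml_text: str) -> None:
    """Create the tables and shred every Patient element into them."""
    conn.executescript(SCHEMA)
    for patient in ET.fromstring(xml_text).findall("Patient"):
        name = patient.find("Address/Name")
        cur = conn.execute(
            "INSERT INTO patient (salutation, name, city, street) VALUES (?, ?, ?, ?)",
            (name.get("Salutation", "Ms"),  # ATTLIST default "Ms" applied by hand
             name.text,
             patient.findtext("Address/City"),
             patient.findtext("Address/Street")))
        conn.executemany(
            "INSERT INTO diagnosis (patient_id, description) VALUES (?, ?)",
            [(cur.lastrowid, d.text) for d in patient.findall("Diagnosis")])
    conn.commit()


DATA = """<Billing>
  <Patient><Address><Name Salutation="Mr">Daniel</Name><City>Entenhausen</City>
    <Street>Bahnhofstrasse 3a</Street></Address><Diagnosis>Insomnia</Diagnosis></Patient>
  <Patient><Address><Name>Daisy</Name><City>Entenhausen</City>
    <Street>Am Stadtpark</Street></Address>
    <Diagnosis>Sunburn</Diagnosis><Diagnosis>Migraine</Diagnosis></Patient>
</Billing>"""

conn = sqlite3.connect(":memory:")
load_patients(conn, DATA)
rows = conn.execute(
    "SELECT p.salutation, p.name, COUNT(d.description) FROM patient p "
    "LEFT JOIN diagnosis d ON d.patient_id = p.id GROUP BY p.id ORDER BY p.id").fetchall()
print(rows)
```

Unlike generic storage, this mapping needs the DTD (or a schema) to decide which elements become tables and which become columns.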