XML Workshop: XML, a standard format for the exchange of electronic data in the pharmaceutical industry? Joerg Dillert, Senior Consultant, March 30th, 2004
The workshop ... • is in "Germisch" (a mix of German and English)!
A few rules • 9:00 to 16:30 • Breaks: 15, 60 and 15 minutes • Mobile phones off or on vibrate! • Toilets • Escape routes • Questions are welcome at any time
1. Introduction What is XML, actually? How did it come about?
Mobile phones, smartphones and PDAs with integrated SyncML support Model Vendor Device type Availability • Alcatel: ot715 • Motorola: A830, A835, V600, E390 • Nokia: 7250, 6800, 3650, 6220, 9210i, 7650 • Samsung: SGH-D700 • Siemens: S55, SL55, M55, SX1 • Sony Ericsson: T68i, T610, P800, Z1010, PEG-NZ90 • PDAs: Sony PEG-NX70V, PEG-T675C, PEG-T625C
We live in the age of buzzwords • B2B, B2M, E2B • DIA, EMEA, FDA • XML, DTD, XSL, SVG • The (computer) industry gives us many new words every week • Take a look at your own workplace: what are your buzzwords? (SOPs, DCFs, …)
Smudo ... do you know this German band? MFG
First documented … SGML, Standard Generalized Markup Language, ISO 8879, since 1986
The SGML family of markup languages: more buzzwords!! • GML: Generalized Markup Language; Goldfarb, Mosher and Lorie, IBM, 1969; IBM Document Composition Facility DCF (Script) • SGML: Standard Generalized Markup Language; content attributes; ISO 8879, first published in 1986 • HTML: HyperText Markup Language; functional attributes: hyperlink, frame; based on hyperdocument standard definitions • CALS: Continuous Acquisition and Life-cycle Support; based on DoD MIL-M-28001B standard definitions • XML: eXtensible Markup Language (Founding father: Dr. Charles F. Goldfarb, IBM)
1986 • SGML developed at the IBM labs in Almaden • Charles Goldfarb • ISO standard • Revised in 1990; the goal was a universally applicable markup language for documents
A brief history of SGML The Evolution of Markup Languages • Plain text • Font attributes: Bold, underline, italics, font size • Document structure attributes: Heading level, index term • Document content attributes: Patient age, dosage unit
1990 • Development of HTML began at the CERN nuclear research centre in Geneva • First draft in 1993, the hour of birth of the Web • Revised in 1995 as HTML version 2.0
1994 • To prevent uncontrolled growth, the World Wide Web Consortium (W3C) was founded • Primary task: further development of HTML
1998 • W3C recognised that the challenges of the future could not be met with HTML • A golden mean was to be found between too much markup (SGML) and too little (HTML) • The outcome of this search: XML • Adopted as an official standard in 1998
2001 • W3C adopts the first version of XSL (Extensible Stylesheet Language) as the most important addition • It provides rules for transforming XML documents and a vocabulary for formatting them • 2002: working draft of XHTML version 2.0, a break with HTML 4.0 and XHTML 1.0, with no backward compatibility
The big advantage of XML • You have flexibility: you can define your own tags • The parser needs only the DTD / schemas to check the correctness of your file • Readable for everyone • Vendor independent (no vendor can impose its own definitions, standards or undocumented formats) • License free • ... all three types are in ASCII format! (American Standard Code for Information Interchange)
XML and data interchange • This kind of data interchange is the standard in other industries and is called B2B • The Germans' favourite leisure object ... • is produced just in time
What we've learned from other industries ... [Diagram: a car producer (System X) and a supplier (System Y) exchange messages through their B2B servers: request, order, delivery proposal, order confirmation, delivery confirmation, invoicing]
XML is ... • The data interchange and document format for now and for the future (E2B, CDISC, CTD) • In practical use in many industries • E.g. the car production industry • EVERY system can communicate with every other system • You need only ONE interface per system
2. XML eXtensible Markup Language
XML • XML - eXtensible Markup Language • It is a subset of SGML • It focuses on content (sometimes also structure) • The XML file contains the DATA • The data is delimited by TAGs • Example: <messagetype>ICSR</messagetype>
... and as XML structure
<ANA106>
  <ScreeningVisit>
    <InclusionCriteria>
      <Inc1>YES</Inc1>
      <Inc2>YES</Inc2>
      <Inc3>YES</Inc3>
      <Inc4>YES</Inc4>
    </InclusionCriteria>
    <ExclusionCriteria>
      <Excl1>NO</Excl1>
      <Excl2>NO</Excl2>
      <Excl3>NO</Excl3>
      <Excl4>NO</Excl4>
    </ExclusionCriteria>
    <DemographicsInvestigator>
      <Sex>Male</Sex>
      <DoB>07/26/1966</DoB>
      <Smoke>Yes</Smoke>
      <InvNo>128</InvNo>
    </DemographicsInvestigator>
    <!-- ... more page sections -->
  </ScreeningVisit>
  <Visit1>
    <!-- ... more blocks -->
  </Visit1>
  <!-- ... more visits -->
</ANA106>
A simple XML structure
<?xml version="1.0" encoding="ISO-8859-1"?>
<DVMDTagung>
  <Workshop>
    <event>
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
  </Workshop>
</DVMDTagung>
Attributes
<?xml version="1.0" encoding="ISO-8859-1"?>
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004">
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
    <event datum="25.06.2004">
      <stadt>Berlin</stadt>
      <ort>PFOffice</ort>
    </event>
  </Workshop>
</DVMDTagung>
Attributes
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- this is how to write a comment -->
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004">
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
    <event datum="25.06.2004">
      <stadt>Berlin</stadt>
      <ort>PFOffice</ort>
    </event>
  </Workshop>
</DVMDTagung>
(see in IE; see in XML Notepad)
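To make the difference between attributes and element content concrete, here is a minimal sketch of how a program might read the workshop document above. It assumes Python's standard `xml.etree.ElementTree` module; the names `DOC` and `list_events` are illustrative and not part of the workshop material.

```python
import xml.etree.ElementTree as ET

# The workshop document from the slide, as bytes so that the declared
# encoding (ISO-8859-1) is honoured by the parser.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<DVMDTagung>
  <Workshop name="XML in der Pharmazeutischen Industrie" Leiter="Joerg Dillert">
    <event datum="31.03.2004">
      <stadt>Ulm</stadt>
      <ort>MedSchule</ort>
    </event>
    <event datum="25.06.2004">
      <stadt>Berlin</stadt>
      <ort>PFOffice</ort>
    </event>
  </Workshop>
</DVMDTagung>"""


def list_events(workshop):
    """Return (datum, stadt, ort) for every <event> child of <Workshop>."""
    return [
        # get() reads an attribute, findtext() reads a child element's text.
        (event.get("datum"), event.findtext("stadt"), event.findtext("ort"))
        for event in workshop.findall("event")
    ]


root = ET.fromstring(DOC)          # root element: <DVMDTagung>
workshop = root.find("Workshop")   # first <Workshop> child
events = list_events(workshop)
```

Attributes such as `datum` are read with `get()`, while child elements such as `stadt` are read as text, which mirrors the two ways the slide stores information.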
Characters • Character set • Characters that may be represented in an XML document • e.g., ASCII character set • Letters of English alphabet • Digits (0-9) • Punctuation characters, such as !, - and ?
Character Set • XML documents may contain • Carriage returns • Line feeds • Unicode characters • Enables computers to process characters for several languages
Characters vs. Markup • XML must differentiate between • Markup text • Enclosed in angle brackets (< and >) • e.g., child elements • Character data • Text between start tag and end tag • e.g., Fig. 5.1, line 7: Welcome to XML!
White Space, Entity References and Built-in Entities • Whitespace characters • Spaces, tabs, line feeds and carriage returns • Significant (preserved by application) • Insignificant (not preserved by application) • Normalization • Whitespace collapsed into single whitespace character • Sometimes whitespace removed entirely
<markup>This   is   character   data</markup>
after normalization, becomes
<markup>This is character data</markup>
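The normalization step described above can be sketched in a few lines. This is only an illustration of collapsing whitespace, not what any particular XML application does; the function name `normalize_whitespace` is an assumption made for this example.

```python
import re

# The four XML whitespace characters: space, tab, carriage return, line feed.
XML_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text: str) -> str:
    """Collapse each run of XML whitespace into one space and trim both ends."""
    return XML_WHITESPACE_RUN.sub(" ", text).strip(" ")


raw = "  This   is\n\tcharacter   data "
normalized = normalize_whitespace(raw)
```

Whether whitespace is significant is decided by the application, so a real program would apply such a function only to the elements where it knows the whitespace carries no meaning.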
White Space, Entity References and Built-in Entities (cont.) • XML-reserved characters • Ampersand (&) • Left-angle bracket (<) • Right-angle bracket (>) • Apostrophe (') • Double quote (") • Entity references • Allow XML-reserved characters to be used in character data • Begin with ampersand (&) and end with semicolon (;) • Prevent the parser from misinterpreting character data as markup
White Space, Entity References and Built-in Entities (cont.) • Built-in entities • Ampersand (&amp;) • Left-angle bracket (&lt;) • Right-angle bracket (&gt;) • Apostrophe (&apos;) • Quotation mark (&quot;) • Mark up characters "<>&" in element message: <message>&lt;&gt;&amp;</message> see in IE
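As a hedged illustration of the built-in entities, the sketch below uses `escape` from Python's standard `xml.sax.saxutils` module to turn reserved characters into entity references, then parses the result to show that the original characters come back. The helper name `to_character_data` and the `EXTRA` table are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the two quote characters are
# passed in explicitly as additional replacements.
EXTRA = {"'": "&apos;", '"': "&quot;"}


def to_character_data(text: str) -> str:
    """Replace XML-reserved characters with the built-in entity references."""
    return escape(text, EXTRA)


escaped = to_character_data("<>&")
message_xml = "<message>" + escaped + "</message>"

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring(message_xml).text
```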
Using Unicode in an XML Document • XML Unicode support • e.g., displays Arabic words • Arabic characters • represented by entity references for Unicode characters
XML document that contains Arabic words <?xml version = "1.0"?> <welcome> <from> دايت َلأند </from> <subject> أهلاً بكم فيِ عالم </subject> </welcome> see in IE
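The slide before this listing says that Arabic characters can be represented by references to Unicode characters. A minimal sketch of that idea follows; the helper `to_char_refs` is hypothetical, handles only the non-ASCII part (it does not escape `<` or `&`), and the sample string is just the first four letters of the `<from>` text.

```python
import xml.etree.ElementTree as ET

# First four letters of the <from> text, written as Unicode escapes.
ARABIC = "\u062f\u0627\u064a\u062a"


def to_char_refs(text: str) -> str:
    """Write every non-ASCII character as a decimal character reference."""
    return "".join(c if ord(c) < 128 else "&#" + str(ord(c)) + ";" for c in text)


refs = to_char_refs(ARABIC)
doc = "<from>" + refs + "</from>"   # pure ASCII, safe in any encoding

# Parsing resolves the references back to the original characters.
parsed = ET.fromstring(doc).text
```

Writing the document this way keeps the file itself in plain ASCII while the parsed content is still full Unicode.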
Markup • XML element markup • Consists of • Start tag • Content • End tag • All elements must have a corresponding end tag • <img src="img.gif"> is correct in HTML, but not in XML • XML requires an end tag or a forward slash (/) for termination • <img src="img.gif"></img> or <img src="img.gif"/> is correct XML syntax
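The end-tag rule can be checked mechanically. The following sketch assumes Python's standard `xml.etree.ElementTree` parser and a hypothetical helper `is_well_formed`; it tests only well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


html_style = '<p><img src="img.gif"></p>'     # <img> is never closed
with_end_tag = '<p><img src="img.gif"></img></p>'
self_closing = '<p><img src="img.gif"/></p>'
```

Only the HTML-style variant is rejected, because its `<img>` start tag has no matching end tag.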
Markup (cont.) • Elements • Define structure • May (or may not) contain content • Child elements, character data, etc. • Attributes • Describe elements • Elements may have associated attributes • Placed within element's start tag • Values are enclosed in quotes • Element car contains attribute doors, which has value "4": <car doors="4"/>
Markup (cont.) • Processing instruction (PI) • Passed to application using XML document • Provides application-specific document information • Delimited by <? and ?>
<?xml version = "1.0"?>
<!-- Fig. 5.5 : usage.xml -->
<!-- Usage of elements and attributes -->
<?xml-stylesheet type = "text/xsl" href = "usage.xsl"?>
<book isbn = "999-99999-9-X">
   <title>Deitel&apos;s XML Primer</title>
   <author>
      <firstName>Paul</firstName>
      <lastName>Deitel</lastName>
   </author>
   <chapters>
      <preface num = "1" pages = "2">Welcome</preface>
      <chapter num = "1" pages = "4">Easy XML</chapter>
      <chapter num = "2" pages = "2">XML Elements?</chapter>
      <appendix num = "1" pages = "9">Entities</appendix>
   </chapters>
   <media type = "CD"/>
</book>
PI discussed later
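To show how a processing instruction reaches the application, here is a small sketch using Python's standard `xml.dom.minidom`. The shortened document `DOC` and the helper `processing_instructions` are assumptions for illustration; the PI target is written as `xml-stylesheet`, the form defined by the W3C.

```python
from xml.dom import Node, minidom

# A shortened version of usage.xml: declaration, one PI, and the root element.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="usage.xsl"?>'
    '<book isbn="999-99999-9-X"/>'
)


def processing_instructions(xml_text: str):
    """Return (target, data) for each top-level processing instruction."""
    document = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


pis = processing_instructions(DOC)
```

The XML declaration on the first line is not reported as a processing instruction; only the stylesheet PI is handed to the application, as a target plus an uninterpreted data string.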
CDATA Sections • CDATA sections • May contain text, reserved characters and whitespace • Reserved characters need not be replaced by entity references • Not processed by XML parser • Commonly used for scripting code (e.g., JavaScript) • Begin with <![CDATA[ • Terminate with ]]> see in IE
<?xml version = "1.0"?>
<!-- Fig. 5.7 : cdata.xml -->
<!-- CDATA section containing C++ code -->
<book title = "C++ How to Program" edition = "3">
   <sample>
      // C++ comment
      if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )
         cerr &lt;&lt; this-&gt;displayError();
   </sample>
   <sample>
      <![CDATA[
      // C++ comment
      if ( this->getX() < 5 && value[ 0 ] != 3 )
         cerr << this->displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>
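The two `<sample>` elements carry the same text, once written with entity references and once inside a CDATA section. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` and the illustrative names `ESCAPED`, `CDATA` and `sample_text`, shows that an application sees identical character data in both cases.

```python
import xml.etree.ElementTree as ET

# The same C++ line, once with entity references and once in a CDATA section.
ESCAPED = "<sample>if ( this-&gt;getX() &lt; 5 &amp;&amp; value[ 0 ] != 3 )</sample>"
CDATA = "<sample><![CDATA[if ( this->getX() < 5 && value[ 0 ] != 3 )]]></sample>"


def sample_text(xml_text: str) -> str:
    """Return the character data of the root element."""
    return ET.fromstring(xml_text).text


from_escaped = sample_text(ESCAPED)
from_cdata = sample_text(CDATA)
```

CDATA is therefore a convenience for the author of the file; it does not change what the receiving program reads.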
XML Namespaces • Naming collisions • Two different elements have same name: <subject>Math</subject> <subject>Thrombosis</subject> • Namespaces • Differentiate elements that have same name: <school:subject>Math</school:subject> <medical:subject>Thrombosis</medical:subject> • school and medical are namespace prefixes • Prepended to element and attribute names • Tied to uniform resource identifier (URI) • Series of characters for differentiating names
XML Namespaces (cont.) • Creating namespaces • Use the xmlns attribute: xmlns:text="urn:deitel:textInfo" xmlns:image="urn:deitel:imageInfo" • Creates two namespace prefixes, text and image • urn:deitel:textInfo is the URI for prefix text • urn:deitel:imageInfo is the URI for prefix image • Default namespaces • Child elements of this namespace do not need a prefix: xmlns="urn:deitel:textInfo"
<?xml version = "1.0"?>
<!-- Fig. 5.8 : namespace.xml -->
<!-- Namespaces -->
<directory xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>
XML namespace - no default
<?xml version = "1.0"?>
<!-- Fig. 5.9 : defaultnamespace.xml -->
<!-- Using Default Namespaces -->
<directory xmlns = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <file filename = "book.xml">
      <description>A book list</description>
   </file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100"/>
   </image:file>
</directory>
XML namespace with default (elements from the non-default namespace still need their prefix)
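A parser resolves prefixes and the default namespace to their URIs. The sketch below, which assumes Python's standard `xml.etree.ElementTree`, reads the default-namespace document; the prefix map `NS` is chosen by the reading program and need not match the prefixes used in the file.

```python
import xml.etree.ElementTree as ET

DOC = """<directory xmlns="urn:deitel:textInfo"
           xmlns:image="urn:deitel:imageInfo">
  <file filename="book.xml">
    <description>A book list</description>
  </file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

# Prefixes used for searching; only the URIs have to match the document.
NS = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

root = ET.fromstring(DOC)
text_file = root.find("text:file", NS)      # unprefixed in the file, default namespace
image_file = root.find("image:file", NS)    # explicitly prefixed in the file
image_width = image_file.find("image:size", NS).get("width")
```

Both `file` elements are found without a naming collision because each is identified by its namespace URI plus local name, not by the name alone.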
DTD / Schema • DTD: Document Type Definition • Sometimes also called the 'Document Type Description' • Today you have Schemas • Schemas are more detailed • Schemas are themselves written in XML • Contains the 'grammar' of an XML document • The parser (a program) checks the correctness of the XML file based on the DTD / Schemas, e.g. • numerics • characters ...
Parsing / Validating [Diagram: the parser checks the XML file against the DTD / schemas and yields a correct (well-formed) file]
DTDs vs. Schemas • Both describe the basic structure of documents of a particular type • This structure can be specified with either DTDs (Document Type Definitions) or XML Schemas • DTDs were inherited from SGML and are part of XML 1.0 • XML Schema is a separate W3C standard