Introduction to XML
What is XML? • Extensible Markup Language • XML 1.0 became a W3C Recommendation in 1998 • Easier-to-use subset of SGML (Standard Generalized Markup Language) • XML is a text-based markup language • Standard for data interchange on the web • Set of rules for designing semantic tags • Meta-markup language used to define other languages • XML 1.0 Specification: http://www.w3.org/TR/REC-xml
HTML and XML • HTML is an application of SGML • XML is a subset of SGML • XHTML is an application of XML
XML File Sample
<?xml version="1.0"?>
<dining-room>
  <manufacturer>The Wood Shop</manufacturer>
  <table type="round" wood="maple">
    <price>$199.99</price>
  </table>
  <chair wood="maple">
    <quantity>6</quantity>
    <price>$39.99</price>
  </chair>
</dining-room>
XML describes Structure and Semantics, Not Formatting
HTML Example
<DL>
  <DT>Mambo
  <DD>by Enrique Garcia
</DL>
<UL>
  <LI>Producer: Enrique Garcia
  <LI>Publisher: Sony Music Entertainment
  <LI>Length: 3:46
  <LI>Written: 1991
  <LI>Artist: Azucar Moreno
</UL>
XML describes Structure and Semantics, Not Formatting (2)
XML Example
<SONG>
  <TITLE>Mambo</TITLE>
  <COMPOSER>Enrique Garcia</COMPOSER>
  <PRODUCER>Enrique Garcia</PRODUCER>
  <PUBLISHER>Sony Music Entertainment</PUBLISHER>
  <LENGTH>3:46</LENGTH>
  <YEAR>1991</YEAR>
  <ARTIST>Azucar Moreno</ARTIST>
</SONG>
What's So Great About XML? • Easy Data Exchange • Proprietary data formats keep multiplying • Conversion programs are needed between applications and between versions • XML stores data and markup as text • Avoids storing simple data in huge files
What's So Great About XML? (2) • Customizing Markup Languages • Banking Industry Technology Secretariat (BITS) • Interactive Financial Exchange (IFX) • Schools Interoperability Framework (SIF) • Common Business Library (CBL) • Electronic Business XML Initiative (ebXML) • The Text Encoding Initiative (TEI)
What's So Great About XML? (3) Self-Describing Data
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
  <GREETING>Hello from XML</GREETING>
  <MESSAGE>Welcome to Programming XML in Java</MESSAGE>
</DOCUMENT>
What's So Great About XML? (4) Structured and Integrated Data
<?xml version="1.0"?>
<SCHOOL>
  <CLASS type="seminar">
    <CLASS_TITLE>XML In The Real World</CLASS_TITLE>
    <CLASS_NUMBER>6.031</CLASS_NUMBER>
    <SUBJECT>XML</SUBJECT>
    <START_DATE>6/1/2002</START_DATE>
    <STUDENTS>
      <STUDENT status="attending">
        <FIRST_NAME>Edward</FIRST_NAME>
        <LAST_NAME>Samson</LAST_NAME>
      </STUDENT>
      <STUDENT status="withdrawn">
        <FIRST_NAME>Ernestine</FIRST_NAME>
        <LAST_NAME>Johnson</LAST_NAME>
      </STUDENT>
    </STUDENTS>
  </CLASS>
</SCHOOL>
Well-Formed XML Documents • Follow the syntax rules set by the W3C in the XML 1.0 Specification (www.w3.org/TR/REC-xml) • Contain one or more elements • Have a single root element that contains all the other elements • Nest each element properly inside its enclosing elements
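The rules above can be checked mechanically. The following is a minimal sketch, assuming the JAXP parser that ships with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration. It treats any parse error reported by a non-validating parser as "not well formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document: a syntax rule was broken.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested elements under a single root.
        System.out.println(isWellFormed("<a><b>text</b></a>"));
        // Overlapping tags: b is closed after its parent a.
        System.out.println(isWellFormed("<a><b>text</a></b>"));
        // Two top-level elements: no single root.
        System.out.println(isWellFormed("<a/><b/>"));
    }
}
```

Only the first document in main satisfies the rules; the other two break the nesting rule and the single-root rule respectively.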
Valid XML Documents • Are associated with a Document Type Definition (DTD) • Comply with that DTD • The DTD plays the same role for an XML document as the IDoc type does for an IDoc
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="first.css"?>
<!DOCTYPE DOCUMENT [
  <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
  <!ELEMENT GREETING (#PCDATA)>
  <!ELEMENT MESSAGE (#PCDATA)>
]>
<DOCUMENT>
  <GREETING>Hello from XML</GREETING>
  <MESSAGE>Welcome to Programming XML in Java</MESSAGE>
</DOCUMENT>
• More on DTDs: http://www.cs.rpi.edu/~puninj/XMLJ/classes/class3/Overview.html
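Validation against a DTD can be sketched in Java as follows, assuming the JDK's built-in JAXP parser with DTD validation switched on. The class DtdValidation and its methods are hypothetical names for illustration, and the DTD is the internal subset from the slide above (without the stylesheet instruction). Validity problems arrive through the error callback, so the sketch collects them in a list instead of stopping at the first one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a document against the DTD in its DOCTYPE.
class DtdValidation {

    // Internal DTD subset copied from the slide.
    static final String DTD =
            "<!DOCTYPE DOCUMENT ["
          + "<!ELEMENT DOCUMENT (GREETING, MESSAGE)>"
          + "<!ELEMENT GREETING (#PCDATA)>"
          + "<!ELEMENT MESSAGE (#PCDATA)>"
          + "]>";

    // Prepends the XML declaration and the DTD to a document body.
    static String withDtd(String body) {
        return "<?xml version=\"1.0\"?>" + DTD + body;
    }

    // Returns the validity errors found; an empty list means the document is valid.
    static List<String> validate(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity violation
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not well formed: stop parsing
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = withDtd("<DOCUMENT><GREETING>Hello from XML</GREETING>"
                + "<MESSAGE>Welcome</MESSAGE></DOCUMENT>");
        String missingMessage = withDtd("<DOCUMENT><GREETING>Hello from XML</GREETING></DOCUMENT>");
        System.out.println(validate(valid));
        System.out.println(validate(missingMessage));
    }
}
```

The second document is well formed but omits MESSAGE, which the content model (GREETING, MESSAGE) requires, so the parser reports it as invalid.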
Related Technologies • Hypertext Markup Language • HTML is the most common output format for XML • Web Browsers: Internet Explorer 5.0, Netscape 6.0 • A different way to design a Web site.
Related Technologies (2) • Cascading Style Sheets • Define formatting properties • Font Size • Font family • Font weight • Paragraph indentation • Paragraph alignment • Multiple style sheets can be applied to a single document • Multiple styles can be applied to a single element.
Related Technologies (3) • The Unicode Character Set • American Standard Code for Information Interchange (ASCII): 7-bit codes 0-127, e.g. 'A' is 65; 8-bit extensions cover 0-255 • XML provides full support for Unicode; the original two-byte range was 0-65,535, and the standard has since grown beyond it (http://www.unicode.org) • XML documents can be written in: • ASCII • UTF-8, a variable-length encoding of Unicode that uses one to four 8-bit bytes per character: <?xml version="1.0" encoding="UTF-8"?> • XML defines character references to encode Unicode characters, for example &#169; (©), &#60; (<), &#960; (π) • Universal Character Set (UCS, ISO/IEC 10646) • Up to 4 bytes per symbol • UCS-2 (2 bytes) and UCS-4 (4 bytes) encodings
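A short sketch of both ideas, assuming the JDK's built-in parser: character references are resolved to the characters they name when the document is parsed, and UTF-8 spends a different number of bytes on different characters. The class CharRefDemo and its methods are hypothetical names; the non-ASCII characters are written as Java \u escapes so the source file itself stays plain ASCII.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo of character references and UTF-8 byte lengths.
class CharRefDemo {

    // Parses the document and returns the text of its root element,
    // with all character references already resolved by the parser.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // &#169; is the copyright sign, &#60; is '<', &#960; is Greek pi.
        System.out.println(rootText("<t>&#169; &#60; &#960;</t>"));
        System.out.println(utf8Length("A"));       // ASCII letter
        System.out.println(utf8Length("\u03C0"));  // Greek pi
        System.out.println(utf8Length("\u20AC"));  // euro sign
    }
}
```

Writing &#60; instead of a literal < is what lets the character appear in element content without being read as the start of a tag.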
How Do I Use XML? • The XML document is parsed • The data is manipulated • APIs are available in Java, C, C++, Perl, and other languages
Simple API for XML - SAX • Event-based framework for parsing XML data • Methods such as startDocument(), endElement() • Set of errors and warnings • http://www.megginson.com/SAX • Several parsers can be plugged into the SAX API
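The event-based style can be sketched with the SAX classes bundled with the JDK. The class SaxDemo and the log format are hypothetical, chosen for illustration; the callbacks startDocument(), startElement(), endElement() and endDocument() are the standard SAX handler methods. The parser calls them in document order, and nothing is kept in memory unless the handler stores it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: records the SAX events fired while parsing a document.
class SaxDemo {

    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                log.add("startDocument");
            }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attributes) {
                log.add("start:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
            @Override public void endDocument() {
                log.add("endDocument");
            }
        };
        try {
            // The factory picks whichever SAX parser is plugged in.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<DOCUMENT><GREETING>Hello from XML</GREETING></DOCUMENT>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```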
Document Object Model - DOM • Manipulation of XML Data • Provides a representation of an XML Document as a tree. • Reads XML Document into memory • http://www.w3.org/DOM
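By contrast with SAX, DOM hands back the whole document as a tree that can be navigated in any order. The sketch below assumes the JDK's built-in DOM parser and reuses the dining-room sample from earlier in the deck; the class DomDemo and the helper childText are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: reads the dining-room sample into a DOM tree and queries it.
class DomDemo {

    static final String SAMPLE =
            "<?xml version=\"1.0\"?>"
          + "<dining-room>"
          + "<manufacturer>The Wood Shop</manufacturer>"
          + "<table type=\"round\" wood=\"maple\"><price>$199.99</price></table>"
          + "<chair wood=\"maple\"><quantity>6</quantity><price>$39.99</price></chair>"
          + "</dining-room>";

    // Reads the whole document into memory as a tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Text of the first descendant of parent with the given tag, or null if none.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Element root = doc.getDocumentElement();
        Element chair = (Element) doc.getElementsByTagName("chair").item(0);

        System.out.println("root element: " + root.getTagName());
        System.out.println("manufacturer: " + childText(root, "manufacturer"));
        System.out.println("chair wood:   " + chair.getAttribute("wood"));
        System.out.println("chair count:  " + childText(chair, "quantity"));
        System.out.println("chair price:  " + childText(chair, "price"));
    }
}
```

Because the tree lives in memory, the chair can be looked up before or after the table; the cost is that the whole document must fit in memory.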
Sun's Java API for XML Parsing - JAXP • Provides cohesiveness to the SAX and DOM APIs • Adds convenience methods for Java developers • http://java.sun.com/xml
Java and XML: A Perfect Match • Java is portable code, XML is portable data • Applications completely portable • Java Virtual Machine (JVM) • Standards-based data layer • Java provides the most robust set of: • APIs - JAXP • Parsers - XP • Processors - Saxon • Publishing Frameworks - Cocoon • Tools for XML - XML Pro
XML Editors • Create XML documents • Text Editors - vi, emacs, notepad • XML Editors • Adobe FrameMaker, www.adobe.com • XML Pro, www.vervet.com • XML Writer, xmlwriter.net • XML Notepad, msdn.microsoft.com/xml/notepad/intro.asp • XMetal from SoftQuad, xmetal.com • XML Spy, www.xmlspy.com
XML Parsers • Read XML Document • Verify that XML is well formed • Verify that XML is valid • expat, parser written in C by James Clark (www.jclark.com) • XML for Java (XML4J), from IBM alphaWorks (www.alphaworks.ibm.com/tech/xml4j) • Lark, written in Java (www.textuality.com/Lark/) • Apache Xerces (www.apache.org) • XP by James Clark (www.jclark.com) • Oracle XML Parser (technet.oracle.com/tech/xml) • Sun Microsystems Project X (java.sun.com/products/xml)
XML Validators • Verify that XML is valid • XML.com's Validator based on Lark (xml.com) • Language Technology Group at the University of Edinburgh's validator based on the RXP Parser (www.ltg.ed.ac.uk/~richard/xml-check.html) • Scholarly Technology Group at Brown University's validator (www.stg.brown.edu/service/xmlvalid/)
XML Browsers • Display the data to the user • Internet Explorer 5 • Displays XML documents directly • Handles XML in scripting languages (JScript, VBScript) • Binds XML to ActiveX Data Objects (ADO) database recordsets • XML is integrated into the Office 2000 suite of applications • Netscape Navigator 6 • Displays XML documents directly • Handles XML in scripting languages (JavaScript 1.5) • Supports the XML-based User Interface Language (XUL), which lets you configure the controls in the browser • Jumbo • Displays XML • Uses CML to draw molecules
XML Resources • XML at W3C (http://www.w3.org/xml/) • XML.com (http://www.xml.com/) • XML.org Registry (http://www.xml.org/) • XML Cover Pages (http://xml.coverpages.org/) • Java and XML (http://java.sun.com/xml/)
XML Applications • Languages based on XML • Chemical Markup Language (CML) • Mathematical Markup Language (MathML) • Channel Definition Format (CDF) • Synchronized Multimedia Integration Language (SMIL) • XHTML • Scalable Vector Graphics (SVG) • MusicML • VoxML
XML and Idoc Mapping • XML DTD -> Idoc type • XML tree structure -> Idoc tree structure • XML parent element -> Idoc parent segment • XML child element -> Idoc child segment or field • XML document -> Idoc document
XML DTD and Idoc Type Mapping Example
<!-- MATMAS01 Material Master -->
<!ELEMENT MATMAS01 (IDOC+) >
<!ELEMENT IDOC (EDI_DC40, E1MARAM+) >
<!-- IDoc Control Record for Interface to External System -->
<!ELEMENT EDI_DC40 (TABNAM, MANDT?, DOCNUM?, DOCREL?, STATUS?, DIRECT, OUTMOD?, EXPRSS?, TEST?, IDOCTYP, CIMTYP?, MESTYP, MESCOD?, MESFCT?, STD?, STDVRS?, STDMES?, SNDPOR, SNDPRT, SNDPFC?, SNDPRN, SNDSAD?, SNDLAD?, RCVPOR, RCVPRT, RCVPFC?, RCVPRN, RCVSAD?, RCVLAD?, CREDAT?, CRETIM?, REFINT?, REFGRP?, REFMES?, ARCKEY?, SERIAL?) >
<!-- Segment E1MARAM : Master material general data (MARA) -->
<!ELEMENT E1MARAM (MSGFN?, MATNR?, ERSDA?, ERNAM?, LAEDA?, AENAM?, PSTAT?, LVORM?, MTART?, MBRSH?, MATKL?, BISMT?, MEINS?, BSTME?, ZEINR?, ZEIAR?, ZEIVR?, ZEIFO?, AESZN?, BLATT?, BLANZ?, FERTH?, FORMT?, GROES?, WRKST?, NORMT?, LABOR?, EKWSL?, BRGEW?, NTGEW?, GEWEI?, VOLUM?, VOLEH?, BEHVO?, RAUBE?, TEMPB?, TRAGR?, STOFF?, SPART?, KUNNR?, WESCH?, BWVOR?, BWSCL?, SAISO?, ETIAR?, ETIFO?, EAN11?, NUMTP?, LAENG?, BREIT?, HOEHE?, MEABM?, PRDHA?, CADKZ?, ERGEW?, ERGEI?, ERVOL?, ERVOE?, GEWTO?, VOLTO?, VABME?, KZKFG?, XCHPF?, VHART?, FUELG?, STFAK?, MAGRV?, BEGRU?, QMPUR?, RBNRM?, MHDRZ?, MHDHB?, MHDLP?, VPSTA?, EXTWG?, MSTAE?, MSTAV?, MSTDE?, MSTDV?, KZUMW?, KOSCH?, NRFHG?, MFRPN?, MFRNR?, BMATN?, MPROF?, PROFL?, IHIVI?, ILOOS?, KZGVH?, XGCHP?, COMPL?, KZEFF?, RDMHD?, IPRKZ?, PRZUS?, MTPOS_MARA?, GEWTO_NEW?, VOLTO_NEW?, WRKST_NEW?, E1MAKTM+, E1MARCM*, E1MARMM*, E1MBEWM*, E1MLGNM*, E1MVKEM*, E1MLANM*, E1MTXHM*) >
<!-- Segment E1MAKTM : Master material short texts (MAKT) -->
<!-- Field MATNR in E1MARAM: Material number -->
<!ELEMENT MATNR (#PCDATA) >
XML and Idoc Document Mapping Example
<?xml version="1.0"?>
<MATMAS01>
  <E1MARAM>
    <E1MAKTM>
      <MSGFN>005</MSGFN>
      <SPRAS>D</SPRAS>
      <MAKTX>Zwischenlage </MAKTX>
    </E1MAKTM>
  </E1MARAM>
</MATMAS01>
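The mapping rules can be illustrated with a small DOM walk over the sample document. This is only a sketch under stated assumptions: the class IdocOutline, the TYPE/SEGMENT/FIELD labels and the classification heuristic are hypothetical. The root element is reported as the IDoc type, any element with child elements is treated as a segment, and any leaf element is treated as a field. It produces a text outline and does not create a real IDoc or call any SAP interface.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: prints an XML document as an outline of segments and fields.
class IdocOutline {

    static final String SAMPLE =
            "<?xml version=\"1.0\"?>"
          + "<MATMAS01><E1MARAM><E1MAKTM>"
          + "<MSGFN>005</MSGFN><SPRAS>D</SPRAS><MAKTX>Zwischenlage </MAKTX>"
          + "</E1MAKTM></E1MARAM></MATMAS01>";

    static List<String> outline(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        List<String> lines = new ArrayList<>();
        lines.add("TYPE " + root.getTagName()); // root element names the type
        for (Element child : childElements(root)) {
            walk(child, 1, lines);
        }
        return lines;
    }

    // Assumption: elements with child elements are segments, leaves are fields.
    private static void walk(Element element, int depth, List<String> out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        List<Element> children = childElements(element);
        if (children.isEmpty()) {
            out.add(indent + "FIELD " + element.getTagName()
                    + "=" + element.getTextContent().trim());
        } else {
            out.add(indent + "SEGMENT " + element.getTagName());
            for (Element child : children) {
                walk(child, depth + 1, out);
            }
        }
    }

    // Child nodes that are elements, skipping text and comments.
    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                result.add((Element) node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : outline(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

For the sample, E1MARAM and E1MAKTM come out as nested segments and MSGFN, SPRAS and MAKTX as fields of the inner segment, which mirrors the parent/child rules listed in the mapping slide.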
Recommended Classes on XML • This presentation is modified from material on XML at http://www.cs.rpi.edu/~puninj/XMLJ/classes.html • PSU web based training (WBT) also provides excellent XML courses: • 13182 XML Technology Overview • basic concepts such as XML, DTD, XSL • 86031 XML Programming Part 1 • similar to the one above, and consistent with Part 2 • 86032 XML Programming Part 2 • advanced topics, including the XML programming interfaces DOM and SAX and the XSLT transformation language • 13182 is good for learning the concepts of XML; 86031/32 are good for learning how to code with XML in Java or C++.
In Class Assignment (10 HW) • Develop an XML DTD for delivery notes that includes: • Date • Vendor (value is a vendor ID) • Quantity • Product (value is a product ID)