Introduction to XML. Cheng-Chia Chen, September 2007.
contents • What is XML? • Where does XML come from? What is its status? • Why do we need XML? • XML vs. other formats • Core XML specifications and APIs • What can we do with XML? • XML sites • A partial list of XML applications and industry initiatives • A sketch of XML documents
What is XML? • The eXtensible Markup Language • a data-structure definition language: lets you define the structure and format of your own data. • a data format (syntax) for representing, storing and transmitting data whose structure is defined in XML. • a text-based markup language that lets you define your own HTML-like markup languages. • became a World Wide Web Consortium (W3C) Recommendation in February 1998. • intended as a new message format for the Internet, to make up for the inadequacies of HTML. • a subset of SGML • has become very popular and is now the dominant format for interchanging information over the Internet.
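The idea of XML • Existing student information (student ID, name, department, year, email): • S9010 張得功 資科系 (Dept. of Computer Science) 三年級 (3rd year) chang10@cs.nccu.edu.tw • S9021 王德財 應數系 (Dept. of Applied Mathematics) 二年級 (2nd year) null (no email) • …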
HTML's concerns • How to present the data:
<TABLE BORDER="1" BGCOLOR="yellow">
  <TR><TH>學號</TH><TH>姓名</TH><TH>科系</TH><TH>年級</TH><TH>電郵</TH></TR>
  <TR><TD>S9010</TD><TD>張得功</TD><TD>資科系</TD><TD>三年級</TD><TD>chang10@cs.nccu.edu.tw</TD></TR>
  <TR><TD>S9021</TD><TD>王德財</TD><TD>應數系</TD><TD>二年級</TD><TD></TD></TR>
</TABLE>
(Column headings: 學號 = student ID, 姓名 = name, 科系 = department, 年級 = year, 電郵 = email.)
XML's concerns • XML also uses markup tags, but they describe the content rather than the presentation of that content. • The same example coded in XML:
<students>
  <student><學號>S9010</學號> <姓名>張得功</姓名> <科系>資科系</科系> <年級>三年級</年級> <電郵>chang10@cs.nccu.edu.tw</電郵></student>
  <student><學號>S9021</學號> <姓名>王德財</姓名> <科系>應數系</科系> <年級>二年級</年級> <電郵/></student>
  …
</students>
Notes: 1. Only the content is encoded in the XML text. 2. Every data item is annotated with tags that indicate its role or function in the message. 3. The element names are the Chinese words for student ID, name, department, year and email.
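To make the point concrete, here is a minimal Python sketch that reads the student list back into records. Python and its standard `xml.etree.ElementTree` module are our choice; the slides do not prescribe them. Because the tags name each field, the program needs no knowledge of the layout. The function name `load_students` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The student list from the slide; element names are kept as in the original
# (學號 = student ID, 姓名 = name, 科系 = department, 年級 = year, 電郵 = email).
STUDENTS_XML = """<students>
  <student><學號> S9010 </學號><姓名>張得功</姓名><科系>資科系</科系>
    <年級>三年級</年級><電郵> chang10@cs.nccu.edu.tw </電郵></student>
  <student><學號> S9021 </學號><姓名>王德財</姓名><科系>應數系</科系>
    <年級>二年級</年級><電郵/></student>
</students>"""


def load_students(xml_text):
    """Return one dict per <student>, keyed by child element name."""
    root = ET.fromstring(xml_text)
    records = []
    for student in root.findall("student"):
        # Strip surrounding whitespace; empty elements such as <電郵/> have
        # text None, which is mapped to "".
        records.append({child.tag: (child.text or "").strip() for child in student})
    return records


students = load_students(STUDENTS_XML)
for s in students:
    print(s["學號"], s["姓名"], s["電郵"] or "(no email)")
```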
Where does XML come from? • A simplified subset of the Standard Generalized Markup Language (SGML), standardized in 1986, which was itself based on the Generalized Markup Language developed at IBM in 1969. • Simplified for more general use on the Web and as a data interchange format: • without losing extensibility, • easier for anyone to write valid XML, • easier to write a parser, • easier for a parser to verify quickly that documents are well-formed and/or valid. • XML 1.0 became a W3C Recommendation in February 1998. • XML 1.1 became a W3C Recommendation in February 2004.
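One practical consequence of XML's simplified syntax is that a standard parser can decide well-formedness mechanically. The sketch below uses Python's standard `xml.etree.ElementTree`, though any conforming XML parser would do. The helper name `is_well_formed` is hypothetical. It checks well-formedness only; validity against a DTD or schema needs a validating parser, which this module does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text is well-formed XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


print(is_well_formed("<student><name>Chang</name></student>"))  # properly nested
print(is_well_formed("<student><name>Chang</student></name>"))  # crossed end tags
print(is_well_formed("<student><name>Chang</name>"))            # missing end tag
```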
What is the status of XML? • A pervasive data format on the Internet as well as in other IT fields. • Embraced by all of the leaders in the computer industry: • IBM, Microsoft, Sun, Oracle, HP, … • Many vertical industries are embracing XML because it makes their domain-specific information available more quickly for internal and external use. • There are many W3C-proposed extensions to XML. • Most of them are themselves written in XML syntax, which minimizes the differences in syntax that must be learned. • See • XML at W3C or • The XML Cover Pages • for the most up-to-date information.
Why do we need XML? Or: what can XML bring us?
XML unifies the syntax of information • Layers of information (data): • bit • byte • character: BCD, EBCDIC, ASCII, Big5, ISO-8859 ==> Unicode • syntax (form): XML • semantics (ontology): Semantic Web • application • Semantic Web: • "an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Tim Berners-Lee et al.)
New requirements in the Internet age • Easy retrieval of information over the net • already realized by current Web/Internet technology: • good browsers, • web servers, • HTTP, DNS, search engines, • HTML, URI, hypertext, MIME • Easy, cheap interoperation of existing software over the Internet • also the long-standing goal of distributed systems/computing: • RPC, RMI, CORBA, ... • a prerequisite for e-commerce • issues: • data transmission ==> solved by the existing Internet infrastructure • data representation ==> ?
Why do we need a unifying data format? • Case: 10 word processors, each of which must be able to process documents generated by any of the others. • 1st approach: • write a converter A --> B for every pair A, B. • #converters = n × (n - 1) = 10 × 9 = 90 (bad!) • 2nd approach: • invent a common format C. • write a pair of converters (A --> C, C --> A) for each word processor. • To process a document generated by A in B, simply: • A --(A-->C)--> C --(C-->B)--> B • required converters: 2 × n = 20 (much better!) • prerequisite: a common format. • This is the role XML plays!
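The hub-and-spoke arrangement can be sketched in Python. The three toy formats (`pipe`, `lines`, `kv`) are hypothetical stand-ins for real word-processor formats. The plain dict serving as the common format C stands in for XML. The point is only that each format registers two converters and every pair is connected through C. The toy formats cannot contain their own separator characters.

```python
from typing import Callable, Dict, Tuple

# The common format C: here simply a dict with "title" and "body" keys.
Common = Dict[str, str]

# One (A -> C, C -> A) pair per format; the format names are hypothetical.
TO_COMMON: Dict[str, Callable[[str], Common]] = {
    "pipe": lambda s: dict(zip(("title", "body"), s.split("|", 1))),
    "lines": lambda s: dict(zip(("title", "body"), s.split("\n", 1))),
    "kv": lambda s: dict(part.split("=", 1) for part in s.split(";")),
}
FROM_COMMON: Dict[str, Callable[[Common], str]] = {
    "pipe": lambda d: f"{d['title']}|{d['body']}",
    "lines": lambda d: f"{d['title']}\n{d['body']}",
    "kv": lambda d: f"title={d['title']};body={d['body']}",
}


def convert(text: str, src: str, dst: str) -> str:
    """A --(A->C)--> C --(C->B)--> B: every pair of formats goes through C."""
    return FROM_COMMON[dst](TO_COMMON[src](text))


def converters_needed(n: int) -> Tuple[int, int]:
    """(direct pairwise converters, converters via a common format)."""
    return n * (n - 1), 2 * n


print(convert("Report|Q3 figures", "pipe", "kv"))
print(converters_needed(10))
```

With 10 formats already supported, adding an eleventh costs two new converters in this scheme instead of twenty.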
Additional benefits of XML (as a common format) • Software for processing XML is free or cheap to obtain: • no need to reinvent the wheel, • developers can focus on value-added software built on top of it. • Tightly coupled distributed systems can be decoupled into loosely coupled ones: • less vendor monopolization of software, • more choice of combinations for buyers, • more chances for small companies to contribute software, • less investment for buyers.
Application type of the current World Wide Web • Three-tier WWW architecture: • Major information flow (for human information retrieval): (human) browser --(HTTP)--> web server --> databases --> result wrapped into HTML or other MIME formats --(HTTP)--> browser --> human • Major interactions and interchanged data formats: • application type: information retrieval • human ---(HTML/MIME)--- machine (browser + web server) • web server ------------ back-end system (databases)
[Figure: three-tier WWW architecture. Client browsers (IE, Firefox) send queries (GET/POST) over HTTP across the Internet to web servers (Apache, IIS). The web servers query back-end databases and file systems and return the results as tables, HTML/text, GIF/JPEG images or video/audio.]
Additional interactions for WWW business applications • New application type: web services • Additional interactions: • back-end business system <---> web server <---> web server <---> back-end business system • Problem: too many data formats exist among these systems, and web servers that understand every kind of data format are hard to implement. • Solution: define one universal data format, or a small set of them (in XML), and require all systems to transmit data in those formats. • But aren't the existing HTML + MIME formats enough? • No! HTML, while amenable to humans via browsers, is hard for machines to understand and extract data from.
Advantages of XML over HTML • In XML you can define your own tags. • XML tags describe the content rather than the presentation of that content: • easier content search (no distracting presentation markup), • easier page development (content separated from view), • easier for devices to render the content according to their environment (single model / multiple views). • Notes for the next figure: • searches can be applied to XML data more easily, and the results can be rendered differently depending on the destination device. • the XML processor can run on the server, the client, or both.
Work done by the XML processor in response to a client request: • collect data from related data sources, • merge the sources into unified content, • render the data according to the client's environment.
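The three steps can be sketched in Python with `xml.etree.ElementTree`. The following are all hypothetical: the two data sources (`REGISTRAR_XML`, `MAIL_XML`), their element names, and the client labels `"browser"` and `"phone"`. A real XML processor might use XSLT for the rendering step instead of hand-written code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Two hypothetical data sources, each delivering an XML fragment.
REGISTRAR_XML = ("<students><student id='S9010'><name>Chang</name></student>"
                 "<student id='S9021'><name>Wang</name></student></students>")
MAIL_XML = "<emails><email id='S9010'>chang10@cs.nccu.edu.tw</email></emails>"


def merge_sources(registrar_xml, mail_xml):
    """Steps 1 and 2: collect both sources and merge them into one content tree."""
    root = ET.fromstring(registrar_xml)
    emails = {e.get("id"): e.text for e in ET.fromstring(mail_xml).iter("email")}
    for student in root.iter("student"):
        # Students without an email record get an email element with empty text.
        ET.SubElement(student, "email").text = emails.get(student.get("id"), "")
    return root


def render(root, client):
    """Step 3: render the same content differently depending on the client."""
    rows = [(s.get("id"), s.findtext("name"), s.findtext("email"))
            for s in root.iter("student")]
    if client == "browser":
        # Desktop browser: an HTML table with every field.
        body = "".join("<tr>" + "".join(f"<td>{escape(v)}</td>" for v in row) + "</tr>"
                       for row in rows)
        return f"<table>{body}</table>"
    # Any other client (e.g. a small-screen phone): one short text line per student.
    return "\n".join(f"{sid} {name}" for sid, name, _ in rows)


content = merge_sources(REGISTRAR_XML, MAIL_XML)
print(render(content, "browser"))
print(render(content, "phone"))
```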
Comparison of XML and Other formats • HTML • discussed • Text-based non-markup formats • .c .cpp .java .ini … • Binary formats • .dll .exe .o .swf • .class .png .jpeg …
Advantages of XML over text formats • Examples: • JavaML vs. Java; CppML vs. C++ • XMI vs. Rational's proprietary format • web.xml, plugin.xml vs. *.ini (for configuration) • build.xml vs. makefile • XQuery's XML format vs. its plain-text format • RELAX NG's XML syntax vs. its plain-text (compact) syntax • Advantages: • the structure is explicitly represented in the XML format. • (free and) standard tools and APIs exist for quick parsing of the XML format => front-end processing avoided/reduced • Disadvantage: too verbose: • for storage and transmission: • can be overcome by compression • for human authoring (not a problem for machine generation): • requires a smarter editor • for human reading/comprehension: • a real problem!
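The claim that verbosity "can be overcome by compression" is easy to check. Here is a minimal sketch with Python's standard `gzip` module on a synthetic, repetitive XML document. The records are generated, not real data. The exact sizes depend on the zlib version, so none are quoted here.

```python
import gzip

# A synthetic, deliberately repetitive XML document (500 generated records).
records = "".join(
    f"<student><id>S{9000 + i}</id><year>{i % 4 + 1}</year></student>" for i in range(500))
xml_bytes = f"<students>{records}</students>".encode("utf-8")

packed = gzip.compress(xml_bytes)
print(len(xml_bytes), "bytes raw ->", len(packed), "bytes gzip-compressed")

# Compression is lossless: decompressing restores the exact original bytes.
restored = gzip.decompress(packed)
```

The repeated tag names that make XML verbose are exactly what a dictionary-based compressor like DEFLATE removes cheaply.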
Advantages of XML over binary formats • Examples: • classML vs. the .class file format • swfml vs. SWF (the Flash file format) • XER vs. BER for ASN.1 • Advantages: • readable; editable • (free and) open software and APIs available • Disadvantage: • takes longer to parse and transmit. • The trend: • one data model / multiple representation formats + • converters among the formats.
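The "one data model / multiple representation formats + converters" trend can be sketched in Python. A single hypothetical `Point` model gets two forms: a compact binary one (via `struct`) and a readable XML one (via `xml.etree.ElementTree`). The 8-byte big-endian layout is an arbitrary choice for illustration, not any standard binary format.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:
    """The single data model; the class and field names are illustrative."""
    x: int
    y: int


# Binary representation: two big-endian signed 32-bit integers (8 bytes).
def to_binary(p: Point) -> bytes:
    return struct.pack(">ii", p.x, p.y)


def from_binary(data: bytes) -> Point:
    x, y = struct.unpack(">ii", data)
    return Point(x, y)


# XML representation: readable and editable, but larger and slower to parse.
def to_xml(p: Point) -> str:
    return ET.tostring(ET.Element("point", x=str(p.x), y=str(p.y)), encoding="unicode")


def from_xml(text: str) -> Point:
    elem = ET.fromstring(text)
    return Point(int(elem.get("x")), int(elem.get("y")))


# A converter between the two formats goes through the shared model.
def binary_to_xml(data: bytes) -> str:
    return to_xml(from_binary(data))


p = Point(3, -7)
print(to_binary(p).hex(), to_xml(p))
```

The binary form is always 8 bytes. The XML form is several times larger but can be read and edited by hand, which is exactly the trade-off listed above.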
Core specifications for XML • XML 1.0 • Namespaces in XML • XML Path Language (XPath) • Extensible Stylesheet Language (XSL) • XSL Transformations (XSLT) • XSL Formatting Objects (XSL-FO) • XML Linking Language (XLink) • XML Pointer Language (XPointer) • XML Schema (and RELAX NG) • XHTML • XML Signature / Canonicalization • XML protocols • XML forms (XForms) • XQuery (a language for querying XML documents)
Core specifications for XML • XML • document type definition (DTD): a mechanism for defining the structure and content of valid XML documents. • a specification of which texts are well-formed XML documents. • XML Namespaces • define a mechanism for avoiding name collisions between elements and/or attributes in documents that use several vocabularies (DTDs). • XLink • defines a mechanism for linking to web resources from an XML document. • XPointer • defines a mechanism for linking into (parts of) an XML document. • XPath • defines a mechanism for referring to parts of an XML document.
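Namespaces and XPath can be tried out with Python's `xml.etree.ElementTree`. The module handles namespaces fully but supports only a limited subset of XPath. In this sketch two vocabularies both define a `title` element. The namespace URIs under `example.org` and the document content are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<catalog xmlns:bk="http://example.org/books"
         xmlns:hr="http://example.org/people">
  <bk:book>
    <bk:title>Introduction to XML</bk:title>
    <hr:author><hr:title>Prof.</hr:title><hr:name>Chen</hr:name></hr:author>
  </bk:book>
</catalog>"""

# Prefix -> namespace URI map used when searching; the search prefixes
# need not match the ones used inside the document.
NS = {"bk": "http://example.org/books", "hr": "http://example.org/people"}

root = ET.fromstring(DOC)

# Same local name "title", different namespaces: no collision.
book_titles = [t.text for t in root.findall("./bk:book/bk:title", NS)]
person_titles = [t.text for t in root.findall(".//hr:title", NS)]
print("book titles:", book_titles)
print("person titles:", person_titles)

# Internally each name is expanded to "{namespace-URI}local-name".
print(root.find("bk:book", NS).tag)
```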
XSL (Extensible Stylesheet Language) • a language for expressing stylesheets. • consists of two parts: • XSLT: a language (in XML format) for describing how to transform an XML document into another XML or non-XML document. • XSL-FO: an XML vocabulary for specifying formatting semantics. • An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.
XML Schema • intended as a replacement for DTDs. • used to define the structure and format of various messages encoded in XML. • a competing alternative: RELAX NG • consists of three documents: • Part 0: Primer • an easy-to-understand introduction • Part 1: Structures • specifies the XML Schema definition language, which offers facilities for describing the structure and constraining the contents of XML documents • Part 2: Datatypes • defines dozens of frequently used built-in datatypes
APIs for XML documents • DOM (Levels 1, 2 & 3): • Document Object Model • tree-based XML API • language-independent • SAX (versions 1 & 2): • Simple API for XML • event-based XML API • JDOM, dom4j, XOM (XML APIs for Java): • DOM-like APIs designed for Java • tree-based • simpler than DOM • easier to use than DOM • suitable for Java only
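The difference between the tree-based and event-based styles shows up even in a tiny task. This sketch collects the `id` attribute of every `student` element. It uses Python's standard-library DOM (`xml.dom.minidom`) and SAX (`xml.sax`) implementations rather than the Java APIs named above. The sample document, the handler class and the function names are hypothetical.

```python
import xml.dom.minidom
import xml.sax

DOC = "<students><student id='S9010'/><student id='S9021'/></students>"


# Tree-based (DOM): the whole document is loaded into memory, then navigated.
def ids_with_dom(text):
    dom = xml.dom.minidom.parseString(text)
    return [el.getAttribute("id") for el in dom.getElementsByTagName("student")]


# Event-based (SAX): the parser calls the handler back as it streams the input.
class StudentIdHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "student":
            self.ids.append(attrs.getValue("id"))


def ids_with_sax(text):
    handler = StudentIdHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.ids


print(ids_with_dom(DOC), ids_with_sax(DOC))
```

DOM builds the whole tree first, which is convenient for random access. SAX keeps only what the handler chooses to store, which suits large documents.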
How can XML be used? XML was designed to store, carry and exchange data, not to display it. As a syntax format: • XML is used to exchange data • With XML, data can be exchanged between incompatible systems. • XML and B2B: with XML, financial information can be exchanged over the Internet. • XML can be used to share data • With XML, plain text files can be used to share data. • XML can be used to store data • With XML, plain text files can be used to store data and objects. As a meta-language (for defining data structures): • XML can be used to create new languages • XML is the basis of WML, SVG, SMIL, GXL, XHTML, CML, ...
XML can make your Data more Useful • With XML, your data is available to more users. • For sensible developers • All sensible developers should have all their future applications exchange data in XML.
What can we do with XML? • XML processing tools: • XML parsers; XML editors; converters between XML and existing formats • XML2HTML; DTD2DCD; DCDeditor • various domain-specific XML rendering tools • e.g. graphical XML --> graphics • DTD managers, schema tools, SOAP processors, web service tools/IDEs/systems • XML-enabled services/applications: • make your application software capable of serving requests from the Internet (without special prerequisites) and of requesting other online Internet services.
What can we do with XML? • XML document design / application development • Design standard XML formats for various domains: • orders, transactions, billing and products in the business domain • mathematical formulas and chemical formulas in science • graph/graphics markup languages; others: ? • academic artifacts: OO design (XMI), graphs (GXL), Petri nets, Java objects (XML encoding), ASTs, ... • requires cooperation between XML experts and domain experts. • XML-ize legacy system data/databases • domains: general enterprises: personnel, inventory, customers, products, product manuals, official documents; hospitals, schools, government agencies (household registration, land administration, taxation, ...): medical records, drugs, courses, household registers, land registers, tax records • Approaches: • convert the old format to a new XML format and, optionally, provide a view in the old format. • let the two formats coexist. • preserve the old format and provide a new XML view.
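The last approach, keeping the legacy format and adding an XML view on top of it, can be sketched as a small converter. The legacy data is assumed to be the whitespace-separated student records from "The idea of XML" slide, with `null` marking a missing email. The English element names in `FIELDS` and the function name `xml_view` are hypothetical, not a standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Legacy records in the whitespace-separated layout of the student-information
# slide: ID, name, department, year, email ("null" = no email).
LEGACY = """S9010 張得功 資科系 三年級 chang10@cs.nccu.edu.tw
S9021 王德財 應數系 二年級 null"""

# Illustrative element names, one per legacy column.
FIELDS = ("id", "name", "department", "year", "email")


def xml_view(legacy_text):
    """Build an XML view of the legacy data; the legacy text itself is left untouched."""
    root = ET.Element("students")
    for line in legacy_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        student = ET.SubElement(root, "student")
        for field, value in zip(FIELDS, line.split()):
            child = ET.SubElement(student, field)
            if value != "null":
                child.text = value  # "null" becomes an empty element
    return ET.tostring(root, encoding="unicode")


print(xml_view(LEGACY))
```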
XML information • Java • Sun's Java site: (http://java.sun.com/) • The Java Tutorial (http://java.sun.com/docs/books/tutorial/) is a good book to begin with. • Information sources for XML: • W3C site: http://www.w3.org/ • SGML/XML home page: http://xml.coverpages.org/ • XML.com: http://www.xml.com/ • XML pages of leading computer companies • Microsoft: http://www.microsoft.com/xml/ • IBM: http://www.ibm.com/developer/xml/ • Sun: http://java.sun.com/xml • …
XML applications • XML as an alternative representation format • SVG (Scalable Vector Graphics): for vector graphics • MathML (Mathematical Markup Language): for mathematical expressions • SMIL (Synchronized Multimedia Integration Language): for synchronized multimedia presentations • RDF (Resource Description Framework): an XML language for describing web resources and their relationships • CML (Chemical Markup Language): for chemical molecules • JCML: an XML format for Java bytecode (object code) • JavaML: for Java programs • CppML: an XML format for C++ • Ant: a replacement for make for Java, with XML build files • OOML: an OO programming language in XML • UIML: User Interface Markup Language • WML (WAP Wireless Markup Language)
A partial list of XML applications and industry initiatives • W3C Specifications Documentation • Text Encoding Initiative (TEI) • XCES: Corpus Encoding Standard for XML • Encoding and Markup for Texts of the Ancient Near East • Electronic Text Corpus of Sumerian Literature (ETCSL) • Perseus Project • Channel Definition Format, CDF (Based on XML) • RDF Rich Site Summary (RSS) • Open Content Syndication (OCS) • Web Modeling Language (WebML) • Portable Site Information (PSI) • XHTML and 'XML-Based' HTML Modules • W3C Document Object Model (DOM), Level 1 Specification • Web Collections using XML • Meta Content Framework Using XML (MCF) • XML-Data • Namespaces in XML • Resource Description Framework (RDF) • Ontology Interchange Language (OIL) • The Australia New Zealand Land Information Council (ANZLIC) - Metadata • Alexandria Digital Library Project • ATLA Serials Project (ATLAS)
XML in law • BiblioML - XML for UNIMARC Bibliographic Records • Medlane XMLMARC Experiment - MARC to XML • e-Government Interoperability Framework (e-GIF) • US Federal CIO Council XML Working Group • XML Metadata Interchange Format (XMI) - Object Management Group (OMG) • OMG Common Warehouse Metadata Interchange (CWMI) Specification • Object Management Group XML/Value RFP • MDC Open Information Model (OIM) • Dublin Core Metadata Initiative (DCMI) • Open Archives Metadata Set (OAMS) • Publishing Requirements for Industry Standard Metadata (PRISM) • Platform for Internet Content Selection (PICS) • XML and Petri Nets • Outline Processor Markup Language (OPML) • ParlML: A Common Vocabulary for Parliamentary Language • Legal XML Working Group • COSCA/NACM JTC XML Court Filing Project • New Mexico District Court XML Interface (XCI)
XML and multimedia • Synchronized Multimedia Integration Language (SMIL) • Multimodal Presentation Markup Language (MPML) • Moving Picture Experts Group: MPEG-7 Standard • DIG35: Metadata Standard for Digital Images • W3C Scalable Vector Graphics (SVG) • Precision Graphics Markup Language (PGML) • Vector Markup Language (VML) • Image Markup Language (IML) • VRML (Virtual Reality Modeling Language) and X3D • Extensible Graph Markup and Modeling Language (XGMML) • Structured Graph Format (SGF) • Graph Exchange Language (GXL) • Petri Net Markup Language (PNML)
XML in chemistry and biochemistry • Georgia State University Electronic Court Filing Project • Web Standards Project (WSP) • Open Software Description Format (OSD) • XLF (Extensible Log Format) Initiative • ALURe (Aggregation and Logging of User Requests) XML Specification • Apache XML Project • WAP Wireless Markup Language Specification • The SyncML Initiative • Materials Property Data Markup Language (MatML) • Measurement Units Markup Language • XML-Based 'eStandard' for the Chemical Industry • Chemical Markup Language • Molecular Dynamics [Markup] Language (MoDL) • StarDOM - Transforming Scientific Data into XML • Bioinformatic Sequence Markup Language (BSML) • BIOpolymer Markup Language (BIOML) • CellML • Gene Expression Markup Language (GEML) • Genome Annotation Markup Elements (GAME)
XML and Finance • Microarray Markup Language (MAML) • XML for Multiple Sequence Alignments (MSAML) • Systems Biology Markup Language (SBML) • OMG Gene Expression RFP • Taxonomic Markup Language • XDELTA: XML Format for Taxonomic Information • Virtual Hyperglossary (VHG) • Weather Observation Definition Format (OMF) • Open Philanthropy Exchange (OPX) • Open Financial Exchange (OFX/OFE) • Interactive Financial Exchange (IFX) • FinXML - 'The Digital Language for Capital Markets' • Investment Research Markup Language (IRML) • Extensible Financial Reporting Markup Language (XFRML) • Extensible Business Reporting Language (XBRL) • XMLPay Specification • Trading Partner Agreement Markup Language (tpaML) • Internet Open Trading Protocol (IOTP) • Financial Products Markup Language (FpML)
XML messaging ( or XML Protocols) • XML Mail Transport Protocol (XMTP) for XML SMTP and MIME Representation • HTML Threading - Use of HTML in Email • XML Messaging (IETF) • Jabber XML Protocol • XML Messaging Specification (XMSG) • M Project: Java XML-Based Messaging System • HTTP Distribution and Replication Protocol (DRP) • Information and Content Exchange (ICE)
FAML DTD for Financial Research Documents • Mortgage Bankers Association of America MISMO Standard • Digital Property Rights Language (DPRL) • Extensible Rights Markup Language (XrML) • Open Digital Rights Language (ODRL) • Research Information Exchange Markup Language (RIXML) • Data Link for Intermediaries Markup Language (daliML) • XML-MP: XML Mortgage Partners Framework • EcoKnowMICS ML • Electronic Book Exchange (EBX) Working Group • FIXML - A Markup Language for the FIX Application Message Layer • Bank Internet Payment System (BIPS) • smartX ['SmartCard'] Markup Language (SML)
Secure XML • XML and Encryption • XML Digital Signature (Signed XML - IETF/W3C) • XML Key Management Specification (XKMS) • Security Services Markup Language (S2ML) • AuthXML Standard for Web Security • Digital Signatures for Internet Open Trading Protocol (IOTP) • XML Encoding of SPKI Certificates • Digital Receipt Infrastructure Initiative • Digest Values for DOM (DOMHASH) • Signed Document Markup Language (SDML)
Real Estate Transaction Markup Language (RETML) • OpenMLS and RELML (Real Estate Listing Markup Language) • Data Consortium (Real Estate Standards) • Comprehensive Real Estate Transaction Markup Language (CRTML) • ACORD - XML for the Insurance Industry • iLingo XML Schemas for Insurance • Customer Profile Exchange (CPEX) Working Group • Customer Support Consortium • XML for the Automotive Industry - SAE J2008 • Spacecraft Markup Language (SML) • XML.ORG - The XML Industry Portal • X-ACT - XML Active Content Technologies Council • Electronic Business XML Initiative (ebXML) • BASDA eBIS-XML • Portal Markup Language (PML) • EDGARspace Portal • DII Common Operating Environment (COE) XML Registry • StarOffice XML File Format • Open eBook Initiative • ONIX International XML DTD • NISO Digital Talking Books (DTB)
OpenMath Standard • OMDoc: A Standard for Mathematical Documents • Mathematical Markup Language • Re-Useable Data Language (RDL) • OpenTag Markup • Metadata - PICS • MIX - Mediation of Information Using XML • CDIF XML-Based Transfer Format • Covad xLink API (XML-Based DSL Provisioning) • WebBroker: Distributed Object Communication on the Web • Web Interface Definition Language (WIDL) • Global Engineering Networking Initiative (GEN) • XML/EDI - Electronic Data Interchange • XML/EDI Repository Working Group
Global Uniform Interoperable Data Exchange (GUIDE) • BizCodes Initiative • Universal Data Element Framework (UDEF) • European XML/EDI Workshop • EEMA EDI/EC Work Group - XML/EDI • ANSI ASC X12/XML and DISA • OpenTravel Alliance (OTA) • Hospitality Industry Technology Integration Standards (HITIS) Project • Open Catalog Protocol (OCP) • eCatalog XML (eCX) • vCard Electronic Business Card • Customer Identity / Name and Address Markup Language (CIML, NAML) • AND Global Address XML Definition • Historical Event Markup and Linking • iCalendar XML DTD
EC FrameWorks • CommerceNet Industry Initiative • eCo Interoperability Framework Specification • BizTalk Framework • eCo Framework Project and Working Group • Commerce XML (cXML) • SMBXML: An Open Standard for Small to Medium Sized Businesses • RosettaNet
XML Encoded Form Values • Capability Card: An Attribute Certificate in XML • Telecommunications Interchange Markup (TIM, TCIF/IPI) • aecXML Working Group - Architecture, Engineering and Construction • Building Construction Extensible Markup Language (bcXML) • MasterBuilder Construction Management and Accounting • Green Building XML (gbXML) • Product Data Markup Language (PDML) • Product Definition Exchange (PDX) • Electronic Component Information Exchange (ECIX) and Pinnacles Component Information Standard (PCIS) • ECIX QuickData Specifications • ECIX Component Information Dictionary Standard (CIDS) • ECIX Timing Diagram Markup Language (TDML) • XML and Electronic Design Automation (EDA) • Encoded Archival Description (EAD) • UML eXchange Format (UXF) • XML Data Binding Specification • Translation Memory eXchange (TMX) • P3P Specification: Platform for Privacy Preferences • Extensible Name Service (XNS) • Dialogue Moves Markup Language (DMML)
Scripting News in XML • InterX.org Initiative • Document Encoding and Structuring Specification for Electronic Recipe Transfer (DESSERT) • NuDoc Technology • Coins: Tightly Coupled JavaBeans and XML Elements • DMTF Common Information Model (CIM) • Universal Plug and Play Forum • XML Transition Network Definition (XTND) • Process Interchange Format XML (PIF-XML) • (XML) Topic Maps • DARPA Agent Mark Up Language (DAML) • Rule Markup Language (RuleML) • Relational-Functional Markup Language (RFML) • Ontology and Conceptual Knowledge Markup Languages • Information Flow Framework Language (IFF) • Simple HTML Ontology Extensions (SHOE) • XOL - XML-Based Ontology Exchange Language • Description Logics Markup Language (DLML) • Case Based Markup Language (CBML) • Artificial Intelligence Markup Language (AIML) • Physics Markup Language (PhysicsML)
Procedural Markup Language (PML) • QAML - The Q&A Markup Language • LACITO Projet Archivage de données linguistiques sonores et textuelles [Linguistic Data Archiving Project] • Geography Markup Language (GML) • LandXML • Navigation Markup Language (NVML) • Extensible Data Format (XDF) • Gemini Observatory Project • NASA Goddard Astronomical Data Center (ADC) 'Scientific Dataset' XML • Extensible Scientific Interchange Language (XSIL) • Object Oriented Data Technology (OODT) and XML • Astronomical Markup Language • Astronomical Instrument Markup Language (AIML) • GedML: [GEDCOM] Genealogical Data in XML • adXML.org: XML for Advertising • Newspaper Association of America (NAA) - Standard for Classified Advertising Data • News Industry Text Format (NITF) • XMLNews: XMLNews-Story and XMLNews-Meta • NewsML and IPTC2000 • News Markup Language (NML) • Notes Flat File Format (NFF)