XML Basics. Chao-Hsien Chu, Ph.D. School of Information Sciences and Technology The Pennsylvania State University. Markup Language. Extensible Meta Language. Storage. Management. Search. Sharing. Retrieval. Interchange. Information Age. Processing. Information. Representation.
The Needs for Information Interchange • Power • Flexibility • Simplicity • Fault tolerance • Scalability • Interoperability • Open standard • Extensible • Character-based • Human-readable Is there such a creation?
Is There Such a Creation? [Table: HTML, SGML, and XML compared on the criteria Power, Flexibility, Simplicity, Fault tolerance, Scalability, Interoperability, Open standard, Extensible, Character-based, and Human-readable; the per-cell X and ? marks are not recoverable.]
Weaknesses of HTML • HTML isn’t extensible – you can’t define custom tags. • HTML is display-centric. • HTML isn’t usually directly reusable. • HTML provides only one view of the data. • HTML has little or no semantic structure. • It keeps getting bigger and slower. • It is not fault tolerant. XML will complement, rather than replace, HTML.
The Buzz Words Around XML • SVG – Scalable Vector Graphics • OFX – Open Financial Exchange • SGML – Standard Generalized Markup Language • DTD – Document Type Definition • DSSSL – Document Style Semantics and Specification Language • CSS – Cascading Style Sheets • XSL – Extensible Stylesheet Language • DOM – Document Object Model ...
Basics of XML: What? Who? When? Where? Why? How?
What is XML? • XML stands for Extensible Markup Language. • Markup is the code, embedded in the document, that stores the information required for electronic processing. • XML is extensible because it predefines no tags but lets users create the tags an application needs. • XML is a metalanguage because it can be used to define other markup languages.
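As a rough illustration of the "no predefined tags" point, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` parser. The `course`, `title`, and `instructor` tags and the `code` attribute are invented for this example; they are not part of any standard vocabulary or of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: XML itself predefines none of these tag names.
doc = """<course code="IST-000">
  <title>XML Basics</title>
  <instructor>Chao-Hsien Chu</instructor>
</course>"""

# A generic parser reads the document without knowing the tags in advance.
root = ET.fromstring(doc)
tags = [child.tag for child in root]

print(root.tag, root.get("code"), tags)
```

The parser needs no prior knowledge of the vocabulary; the application decides what `course` or `instructor` means.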
Family of Markup Languages • GML – Generalized Markup Language • SGML – Standard Generalized Markup Language • HTML – HyperText Markup Language • XML – Extensible Markup Language • XHTML – Extensible HyperText Markup Language • CML – Chemical Markup Language • MathML – Mathematical Markup Language • SVG – Scalable Vector Graphics • SMIL – Synchronized Multimedia Integration Language • HDML – Handheld Device Markup Language • WML – Wireless Markup Language • OEB – Open eBook Structure Specification
Genealogy of Markup Languages: GML (1969, IBM) → SGML (1986, ISO 8879) → HTML (1993, CERN) and XML (1998, W3C); later languages: XHTML, SVG, SMIL, HDML, OEB
Genealogy of Markup Languages [diagram labels: SGML, HTML, XML, XSL]
Advantages of XML • Common language for system-to-system communication • Enables loose connectivity, yet tight integration • Relatively easy to implement conversion from an RDB record to an XML message. • Platform independent • Scalable • XML Signature provides message and party authentication.
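The claim about converting an RDB record to an XML message can be sketched as follows. This is a minimal illustration in Python, not a production mapping: the function name `row_to_xml`, the `product` table, and the sample row are all hypothetical, and the sketch assumes every column name is already a valid XML element name.

```python
import xml.etree.ElementTree as ET

def row_to_xml(table, row):
    """Build one XML message from one database row (column name -> value)."""
    record = ET.Element(table)           # the table name becomes the root tag
    for column, value in row.items():
        field = ET.SubElement(record, column)
        # NULL columns become empty elements; other values are stringified.
        field.text = "" if value is None else str(value)
    return ET.tostring(record, encoding="unicode")

# Hypothetical row, as a database driver might return it.
row = {"id": 7, "name": "Widget", "price": 9.5}
message = row_to_xml("product", row)
print(message)
# <product><id>7</id><name>Widget</name><price>9.5</price></product>
```

Because the serializer escapes special characters in values, the receiving system can parse the message back with any XML parser.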
Traditional vs. Nontraditional Document [diagram: each document type shown in terms of Information, Structure, and Format]
Ways of Displaying XML: XSL, DHTML + CSS, DSSSL, CGI + Script [diagram labels: Information (Document), Structure (DTD), Format]
Write Once, Publish Many: Idea → XML Document → Process → Printout, CD-ROM, Web, WAP, etc.
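One way to picture the publish-many idea is a single XML source fed through separate output processes. The sketch below is a simplified Python illustration under stated assumptions: the `article`, `title`, and `para` tags and the two renderer functions are hypothetical, and a real pipeline would more likely use stylesheets such as XSL or CSS.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source document (hypothetical vocabulary).
source = "<article><title>XML Basics</title><para>Tags describe data.</para></article>"
article = ET.fromstring(source)

def to_html(a):
    """Process 1: render the article for the Web."""
    return "<h1>%s</h1><p>%s</p>" % (
        escape(a.findtext("title")), escape(a.findtext("para")))

def to_text(a):
    """Process 2: render the same article as plain text, e.g. for print."""
    title = a.findtext("title")
    return "\n".join([title, "=" * len(title), a.findtext("para")])

print(to_html(article))
print(to_text(article))
```

The content is written once; each output channel only adds its own formatting step.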
XML for Information Interchange [diagram: XML as the common format among a CAD package, a word processor, statistical processing, and a spreadsheet package]
Demand for Platform-Independent Technology • Platform: Internet • Presentation: XHTML • Data: XML • Processing: Java
Selected XML Applications Middle-Tier Servers: • Personalized Frequent-Flyer Website • Building an Online Auction Website • Anatomy of an Information Server E-Commerce: • Electronic Data Interchange (EDI) • Collaboration in an e-commerce Supply Web
Selected XML Applications Portals: • Enterprise Information Portals (EIP) Syndication: • Information and Content Exchange (ICE) Publishing: • PC World Online Content Management: • Enterprise Data Management
Selected XML Applications Content Acquisition: • Integrating Legacy Data Schema: • Building a Schema for a Product Catalog Stylesheet: • A Stylesheet-Driven Tutorial Generator Navigation – Application Integration: • Application Integration Using Topic Maps
Components of XML Systems • XML Document (contents) • XML DTD (rules) • XML Parser (processor) • XML Application [diagram: the parser checks that the document is well formed (syntax) and validates its structure against the DTD]
Well-Formed Document Well-formed XML documents are those that are syntactically correct. Here are some general guidelines: • Exactly one root element. • Every element must have both a start tag and an end tag, or be written as an empty-element tag. • Tags are case sensitive. • No overlapping tags; elements must nest properly inside each other. • Attribute values must be enclosed in quotes. • An empty element written as a single tag must end with “/>”. • In text, the characters (<) and (&) must always be written as entity references (&lt; and &amp;); (>) and (") may be written as &gt; and &quot;, and a quote must be escaped inside an attribute value delimited by the same quote character.
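The guidelines above can be tried out against a real parser. The following Python sketch uses the standard-library `xml.etree.ElementTree`, which rejects any document that is not well formed by raising `ParseError`; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One well-formed sample and one violation per guideline (hypothetical data).
samples = {
    "ok":        '<note id="1"><to>Ann</to><br/></note>',
    "overlap":   "<b><i>text</b></i>",       # tags overlap instead of nesting
    "case":      "<Note>text</note>",        # start and end tags differ in case
    "unquoted":  "<note id=1>text</note>",   # attribute value not quoted
    "two_roots": "<a/><b/>",                 # more than one root element
    "raw_lt":    "<a>1 < 2</a>",             # "<" not written as &lt;
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, ok)
```

Only the `ok` sample passes; every other sample breaks exactly one of the rules.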
How a Parser Interprets XML: XML Document → Well formed? If no: issue warning/stop processing. If yes → DTD (Document Type Definition) present? If no: further processing. If yes → Validate (optional) → Valid? If yes: further processing. If no: issue warning/stop processing.
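The decision flow on this slide can be sketched in a few lines of Python. Note the assumptions: Python's standard library checks well-formedness but does not validate against a DTD, so the `rules` dictionary and the `is_valid` function below are a simplified, hypothetical stand-in for DTD validation (each element name maps to the set of child names it may contain). The function `interpret` and its return strings are likewise invented for this example.

```python
import xml.etree.ElementTree as ET

def is_valid(root, rules):
    """Stand-in for DTD validation (NOT a real DTD validator): every element
    must be declared in `rules`, and each child must be in its allowed set."""
    for elem in root.iter():
        if elem.tag not in rules:
            return False
        if any(child.tag not in rules[elem.tag] for child in elem):
            return False
    return True

def interpret(text, rules=None):
    """Follow the slide's flow: well formed? -> DTD present? -> valid?"""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "stop: not well formed"
    if rules is None:                      # no DTD supplied: skip validation
        return "process: well formed, no DTD"
    if is_valid(root, rules):
        return "process: valid"
    return "stop: not valid"

# Hypothetical structure rules playing the role of a DTD.
rules = {"note": {"to", "body"}, "to": set(), "body": set()}

print(interpret("<note><to>Ann</to></note>", rules))
print(interpret("<note><cc>Bob</cc></note>", rules))
print(interpret("<note><cc>Bob</cc></note>"))
print(interpret("<note><to>Ann</note>", rules))
```

The third call shows why validation is optional: the same document that fails the rules is still processed when no DTD is supplied, because it is well formed.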
Popular Parsers for XML • MSXML – Microsoft’s IE • Gecko – Netscape • IBM XML Parser for Java (http://alphaworks.ibm.com/tech/xml4j) • DataChannel XJ Parser (http://xdev.datachannel.com) • Sun XML Parser for Java (http://developer.java.sun.com/developer/earlyAccess/xml/index.html)
Thank You! Any Questions?