
Introduction to the eXtensible Markup Language (XML)

Instructor: Joseph DiVerdi, Ph.D., M.B.A.



  1. Introduction to the eXtensible Markup Language (XML) Instructor: Joseph DiVerdi, Ph.D., M.B.A.

  2. Background & Context • HTML follows the rules of formal electronic document-markup design & implementation • Born out of the need to • Assemble text, graphics, & other digital content • For transmission over the Internet • HTML v4.01 standard is defined using • Standard Generalized Markup Language • SGML • Adequate for formalizing HTML • Too complex for extending HTML

  3. Background & Context • eXtensible Markup Language • Based on simpler features of SGML • Kinder, gentler, & more flexible • Well-suited for orderly development of new markup languages • HTML is even being reborn as XHTML

  4. Background & Context • With XML there exists a standardized means for defining markup languages • That are customized for different needs • Rather than relying upon HTML extensions • Mathematicians express mathematical notations • Musicians present musical scores • Physicians exchange medical records • Accountants share financial information • All groups need an acceptable, resilient way to express these different kinds of information, so software can be developed to process & display these diverse data

  5. Background & Context • XML provides a solution • Each content sector • business group, trade association, consortium, etc. • can define a markup language • for information exchange & processing over the Web • Programmers can develop parsers • XML-compliant processes • that read new language definitions & • permit a server to process documents in those languages • permit a client to retrieve & display those documents

  6. Background on SGML • Standard Generalized Markup Language • SGML • International standard (ISO 8879) • Published in 1986 • SGML prescribes a standard format for embedding descriptive markup in a document • SGML also specifies a standard method for describing the structure of a document • More important & crucial to its power

  7. SGML Background • SGML allows an author to set up hierarchical models for each type of document produced • SGML forces each element in the structure • Labeled with descriptive markup such as chapter, title & paragraph • To fit in the logical, predictable structure of the document

  8. SGML Background • SGML supports an unlimited variety of document structures • Users typically design a different document structure for each category of information they produce: • information bulletins • technical manuals • parts catalogs • design specifications • reports • letters & memos

  9. SGML Background (cont'd) • SGML allows authors to create documents that are independent of any specific hardware or software • Since SGML documents conform to an international standard • They are portable • They can be exchanged seamlessly with users who have different systems

  10. How does SGML work? • A document can be broken into three layers: structure, content, & style • SGML separates these three aspects • Deals mainly with the relationship between structure & content

  11. SGML & Structure • File called the DTD (Document Type Definition) • DTD describes the structure of a document • Describes types of information handled & relationships among fields • Like a database schema • DTD provides a framework for the elements • Chapters, chapter headings, sections, and topics • That together constitute a document

  12. SGML & Structure • DTD also specifies rules for the relationships between elements • A chapter heading must be the first element after the start of a chapter • Each list must contain at least two items. • These rules ensure that documents have a consistent, logical structure • A DTD accompanies a document everywhere • A document instance is a document whose content has been tagged in conformance with a particular DTD

  13. SGML & Content • Content is the information itself • Titles, paragraphs, lists, tables, graphics, & audio • The method for identifying the content's position within the DTD structure is called tagging • Creating an SGML document involves inserting tags around content • These tags mark the beginning and end of each part of the structure

  14. SGML & Content • <PAR> indicates the start of a paragraph & </PAR> indicates the end <PAR>Content is the information itself.</PAR> • Elements can be nested in other elements • The paragraph (<PAR>) is an element within the topic (<TOPIC>) <TOPIC> <PAR> Content is the information itself. </PAR> </TOPIC>

  15. SGML & Content • The structure of a particular document is revealed by the nesting of tags: <section> <subhead> Content </subhead> <par> Content is the information itself. </par> </section>
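The nesting on this slide can be made visible with a few lines of code. The sketch below is an illustration only: it relies on the fact that this particular SGML fragment also happens to be well-formed XML, so Python's standard XML parser can read it. The `outline` helper is a hypothetical name, not part of any library.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, reformatted onto several lines.
doc = """
<section>
  <subhead>Content</subhead>
  <par>Content is the information itself.</par>
</section>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing how tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

structure = outline(ET.fromstring(doc))
print("\n".join(structure))
```

The indentation of each printed tag name mirrors its depth in the document: `subhead` and `par` both sit one level inside `section`.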

  16. SGML & Content • Some SGML-based authoring software programs rely on a software module called a parser that verifies that the document follows the rules of the DTD • The parser also verifies that the DTD itself is structurally correct

  17. SGML & Style • SGML itself has nothing to do with setting standards for style • Most systems still rely on proprietary methods :( • Two efforts to develop standards-based style sheets have resulted in the mature OS & the newly released DSSSL • Document Style Semantics & Specification Language • Complex formatting language • Difficult to learn & implement • XSL inherits & simplifies many formatting concepts • eXtensible Stylesheet Language

  18. SGML & HTML • When the creators of the WWW needed a markup language to instruct browsers how to display WWW content they used SGML guidelines to create HTML • Hyper Text Markup Language • HTML was designed specifically for displaying content in a browser • But isn't much good for anything else

  19. Progress Marches On • The WWW has matured & is being used for more than just viewing text and images • More versatile markup languages are needed

  20. Limitations of HTML • HTML was designed so that tags would be used to mark up information according to its meaning • Without regard to how this info would be rendered in a browser • The title, main header, emphasized text, and contact information of the author are placed inside the elements TITLE, H1, EM, & ADDRESS • Remember SGML structure & content

  21. Limitations of HTML • Each browser should decide how to display marked up text because it knows about the user's preferences & environment and can make decisions based on that information • Without this information, the author cannot do this as well • People who are blind • People who run non-graphical browsers • People who have weak eyesight • Need larger font sizes

  22. Limitations of HTML • Using FONT, I, or other elements to control layout optimizes presentation for a limited number of environments & reduces the content's portability • Problems for those readers who operate in a non-standard environment

  23. Limitations of HTML • Browsers have their own elements and attributes whose only purpose is to specify the layout, like FONT, CENTER, BGCOLOR, etc. • Browser vendors have ignored standards, like CSS, that tried to segregate information about layout from the HTML documents • HTML editors produce HTML where the markup is presentational rather than semantic

  24. Limitations of HTML • The result is that many pages on the web now contain tags written for a specific version of a specific browser & a specific screen resolution with default preferences • These pages are often more or less unreadable to those who use anything besides that configuration • HTML has gradually been turned into a presentational language for Netscape & Explorer by the vendors & their users

  25. Limitations of HTML • HTML offers only a limited number of tags for specialized uses • Chemistry • elements for chemical formulas • for measurement data • Airplane manufacturer • engines, parts & models • Stock Broker • opening price, closing price, daily high, etc.

  26. Limitations of HTML • HTML has limited internal structure • It's easy to write valid HTML with semantic nonsense • H2->H1->H3->/H3->/H1->/H2 • Consider the English language equivalent • book title->part title->chapter title • Processing HTML information automatically also becomes difficult or even impossible because it lacks intrinsic structure

  27. Solution: Just Extend HTML • HTML is already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information • HTML is at the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure

  28. Solution: Just Use "Word" • Information on a network which connects many different types of computer has to be usable on all of them • It is also helpful for such information to be in a form that can be reused in many different ways • Minimize wasted time & effort

  29. Solution: Just Use "Word" • Public information cannot afford to be restricted to one make or model or manufacturer, or to cede control of its data format to private hands • Proprietary data formats, no matter how well documented or publicized, are simply not an option • Their control still resides in private hands & • They can be changed or withdrawn • arbitrarily & without notice

  30. Solution: Go Back to SGML • SGML is the international standard for defining this kind of application • Those who need an alternative based on different software for other purposes are entirely free to implement similar services using such a system, especially if they are for private use

  31. XML Defined • XML is a portable, WWW-specific SGML • Powerful enough to describe data • Light enough to travel across the Web • SGML with a reduced feature set • Extensible because it is not a fixed format • Not a single, predefined markup language • It's a meta-language • A language for describing other languages

  32. XML Defined • XML documents can reside on a server & be converted to HTML for viewing by browsers if required • Browsers can be XML compliant and access XML documents directly if required

  33. Role of XML Development • XML removes two constraints which are holding back Web development: • Dependence on a single, inflexible document type (HTML) • The complexity of full SGML, whose syntax allows many powerful but hard-to-program options • XML simplifies the levels of optionality in SGML, and allows the development of user-defined document types on the Web

  34. A Reminder • C, C++, Fortran, Pascal, Basic, Java • programming languages with which calculations, actions, & decisions are specified • SGML, XML, HTML • markup specification languages with which ways of describing information, usually for storage, transmission, or processing by a program, can be designed • Markup Languages don't do anything alone • a program must be run to do something with them

  35. XML Defined (Again) • The main point of XML is that the author, by defining a markup language, can encode the information of documents much more insightfully than is possible with HTML • This means that programs processing these documents can understand them much better and therefore process the information in ways that are impossible with HTML (or ordinary text processor documents)

  36. Example: Recipe Manager • Marked up recipes (for, say, soups, seafood dishes, etc.) according to a definition tailored for recipes • Contain the ingredients, amounts of each, and alternatives for some • A program that, with a list of your fridge contents, goes through the recipes and makes a list of the possible recipes
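A rough sketch of such a program follows. The recipe markup here is invented for illustration (the element names `recipes`, `recipe`, `ingredient` and the attributes `name` and `amount` are assumptions, not a real recipe language), and alternative ingredients are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe markup; all names and amounts are made up.
RECIPES = """
<recipes>
  <recipe name="Tomato soup">
    <ingredient amount="500 g">tomato</ingredient>
    <ingredient amount="1">onion</ingredient>
  </recipe>
  <recipe name="Fish stew">
    <ingredient amount="300 g">cod</ingredient>
    <ingredient amount="1">onion</ingredient>
    <ingredient amount="2">potato</ingredient>
  </recipe>
</recipes>
"""

def possible_recipes(xml_text, fridge):
    """Names of recipes whose ingredients are all present in the fridge."""
    root = ET.fromstring(xml_text)
    names = []
    for recipe in root.findall("recipe"):
        needed = {ing.text.strip() for ing in recipe.findall("ingredient")}
        if needed <= fridge:  # every needed ingredient is available
            names.append(recipe.get("name"))
    return names

fridge = {"tomato", "onion", "potato"}
cookable = possible_recipes(RECIPES, fridge)
print(cookable)
```

Because each ingredient is its own element, the program never has to guess where an ingredient name starts or ends, which is the point of marking recipes up this way.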

  37. Example: Recipe Manager • With nutritional information about the ingredients another program could sort the dishes by the number of calories • Or by how long they'd take to prepare • Or the price of the ingredients • The possibilities are many, because the information is encoded in a way that the computer can more easily "understand"
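Sorting works the same way once the data is marked up. In the sketch below the markup, the `minutes` attribute, and the calorie figures are all invented for illustration; a real system would draw nutritional data from a proper source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; preparation times are made up.
RECIPES = """
<recipes>
  <recipe name="Fish stew" minutes="20">
    <ingredient>cod</ingredient>
    <ingredient>potato</ingredient>
  </recipe>
  <recipe name="Tomato soup" minutes="30">
    <ingredient>tomato</ingredient>
    <ingredient>onion</ingredient>
  </recipe>
</recipes>
"""

# Invented calorie contribution of each ingredient to a dish.
CALORIES = {"cod": 250, "potato": 160, "tomato": 40, "onion": 45}

def total_calories(recipe):
    """Sum the (made-up) calorie values of a recipe's ingredients."""
    return sum(CALORIES[ing.text.strip()] for ing in recipe.findall("ingredient"))

recipes = ET.fromstring(RECIPES).findall("recipe")
by_calories = [r.get("name") for r in sorted(recipes, key=total_calories)]
by_time = [r.get("name") for r in sorted(recipes, key=lambda r: int(r.get("minutes")))]
print(by_calories, by_time)
```

The same parsed recipes feed both orderings; only the sort key changes.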

  38. Example: Tax forms in XML • How to "automate" tax processing systems? • Tax laws are complex • Tax laws change frequently • Tax forms also change frequently • Form user interface code would have to change frequently • Validating and processing applications would have to change frequently

  39. Example: Tax forms in XML • Express the form itself as an XML document • describing all the fields • the text in the form • the relationships between the fields • The user interface code for web submission could then use this information in a Java applet to set up the user interface correctly • The validation application could use it to validate received information

  40. Example: Tax forms in XML • Some of the constraints that can be expressed in an XML document are: • that field X is the sum of fields W, Y and Z • that field X should contain Y percent of the amount in field Z • that the value of field X should be between Y and Z • that fields X and Y should contain the same value • that if the value in field X is Y, then fields W-Z should not be filled in

  41. Example: Tax forms in XML • These should all be easily expressible in XML, and the resulting documents should be simple enough that non-programmers can modify them when needed. • Changes to the forms could then be effected by modifying the XML document, without changing any of the application code
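One way to picture this separation of form rules from application code is sketched below. The form vocabulary (`form`, `field`, `constraint`, and their attributes) is entirely hypothetical, and only two of the constraint kinds from the previous slide (sum and range) are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical form description; element and attribute names are invented.
FORM = """
<form name="example">
  <field id="W" label="Wages"/>
  <field id="Y" label="Interest"/>
  <field id="Z" label="Dividends"/>
  <field id="X" label="Total income"/>
  <constraint type="sum" field="X" of="W Y Z"/>
  <constraint type="range" field="Y" min="0" max="10000"/>
</form>
"""

def check(form_xml, values):
    """Return a list of readable messages for every violated constraint."""
    errors = []
    for c in ET.fromstring(form_xml).findall("constraint"):
        field = c.get("field")
        if c.get("type") == "sum":
            expected = sum(values[f] for f in c.get("of").split())
            if values[field] != expected:
                errors.append(f"{field} must equal the sum of {c.get('of')}")
        elif c.get("type") == "range":
            low, high = float(c.get("min")), float(c.get("max"))
            if not low <= values[field] <= high:
                errors.append(f"{field} must be between {c.get('min')} and {c.get('max')}")
    return errors

good = check(FORM, {"W": 100, "Y": 20, "Z": 5, "X": 125})
bad = check(FORM, {"W": 100, "Y": 20, "Z": 5, "X": 999})
print(good, bad)
```

If the rule for field X changed, only the `constraint` element in the XML would need editing; the `check` function would stay as it is.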

  42. Example: FAQ Maintenance • Using an XML structure an FAQ-maintainer could also be rid of the problems with maintaining the FAQ in HTML, TEXT, and PDF versions • Instead the maintainer can make one or more stylesheets to be run each time the original has been updated to create new versions of the distribution files
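The single-source idea can be sketched as follows. Note the substitution: the slide describes stylesheets, while this illustration uses two small Python functions in their place, and the FAQ markup (`faq`, `entry`, `question`, `answer`) is invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical FAQ markup; the single source that all versions derive from.
FAQ = """
<faq>
  <entry>
    <question>What is XML?</question>
    <answer>A meta-language for defining markup languages.</answer>
  </entry>
  <entry>
    <question>Is HTML going away?</question>
    <answer>No; it is being reborn as XHTML.</answer>
  </entry>
</faq>
"""

def entries(xml_text):
    """Extract (question, answer) pairs from the source document."""
    root = ET.fromstring(xml_text)
    return [(e.findtext("question"), e.findtext("answer")) for e in root.findall("entry")]

def to_text(xml_text):
    """Plain-text distribution version."""
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in entries(xml_text))

def to_html(xml_text):
    """HTML distribution version, as a definition list."""
    items = "".join(f"<dt>{escape(q)}</dt><dd>{escape(a)}</dd>" for q, a in entries(xml_text))
    return f"<dl>{items}</dl>"

text_version = to_text(FAQ)
html_version = to_html(FAQ)
print(text_version)
print(html_version)
```

Rerunning both functions after each edit of the source regenerates every distribution file, so the versions cannot drift apart.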

  43. Example XML File <?xml version="1.0" standalone="yes"?> <!-- file name: inventory.xml --> <INVENTORY> <BOOK> . . </BOOK> <BOOK> . . </BOOK> </INVENTORY>

  44. Example XML File <BOOK> <TITLE>The Legend of Sleepy Hollow</TITLE> <AUTHOR>Washington Irving</AUTHOR> <BINDING>mass market paperback</BINDING> <PRICE>$2.95</PRICE> </BOOK> <BOOK> <TITLE>Leaves of Grass</TITLE> <AUTHOR BORN="1819">Walt Whitman</AUTHOR> <BINDING>hardcover</BINDING> <PRICE>$7.75</PRICE> </BOOK>
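To show what a program can do with this file, here is a minimal sketch that reads the two BOOK elements with Python's standard XML parser. The document text is the one from the slides; the dictionary keys and the price handling are illustrative choices, not part of the original example.

```python
import xml.etree.ElementTree as ET

INVENTORY = """<?xml version="1.0" standalone="yes"?>
<INVENTORY>
  <BOOK>
    <TITLE>The Legend of Sleepy Hollow</TITLE>
    <AUTHOR>Washington Irving</AUTHOR>
    <BINDING>mass market paperback</BINDING>
    <PRICE>$2.95</PRICE>
  </BOOK>
  <BOOK>
    <TITLE>Leaves of Grass</TITLE>
    <AUTHOR BORN="1819">Walt Whitman</AUTHOR>
    <BINDING>hardcover</BINDING>
    <PRICE>$7.75</PRICE>
  </BOOK>
</INVENTORY>
"""

books = []
for book in ET.fromstring(INVENTORY).findall("BOOK"):
    author = book.find("AUTHOR")
    books.append({
        "title": book.findtext("TITLE"),
        "author": author.text,
        "born": author.get("BORN"),  # None when the attribute is absent
        "price": float(book.findtext("PRICE").lstrip("$")),
    })

total = sum(b["price"] for b in books)
for b in books:
    print(b["title"], "by", b["author"], b["born"] or "(birth year not given)")
```

Element content (`TITLE`, `PRICE`) and attributes (`BORN`) are read through different calls, which reflects the two places XML lets an author put data.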

  45. Example XML File w/ CSS <?xml version="1.0" standalone="yes"?> <!-- file name: inventory.xml --> <?xml-stylesheet type="text/css" href="inventory.css"?> <INVENTORY> <BOOK> . . </BOOK> <BOOK> . . </BOOK> </INVENTORY>

  46. Example CSS /* file name: inventory.css */ BOOK { display: block; margin-top: 12pt; font-size: 10pt } TITLE { display: block; font-size: 10pt; font-weight: bold; font-style: italic } AUTHOR { display: block; margin-left: 15pt; font-weight: bold } BINDING { display: block; margin-left: 15pt } PAGES { display: none } PRICE { display: block; margin-left: 15pt }

  47. Example XML File w/DTD <?xml version="1.0" standalone="no"?> <!-- file name: inventory.xml --> <?xml-stylesheet type="text/css" href="inventory.css"?> <!DOCTYPE INVENTORY SYSTEM "inventory.dtd"> <INVENTORY> <BOOK> . . </BOOK> </INVENTORY>

  48. Example DTD <!-- file name: inventory.dtd --> <!ELEMENT INVENTORY (BOOK+)> <!ELEMENT BOOK (TITLE, AUTHOR, BINDING, PAGES, PRICE)> <!ELEMENT TITLE (#PCDATA)> <!ELEMENT AUTHOR (#PCDATA)> <!ELEMENT BINDING (#PCDATA)> <!ELEMENT PAGES (#PCDATA)> <!ELEMENT PRICE (#PCDATA)>
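To make the DTD's rules concrete, the sketch below hand-checks two of them: that INVENTORY holds one or more BOOK elements, and that each BOOK holds TITLE, AUTHOR, BINDING, PAGES, PRICE in that order. This is not a DTD validator (Python's standard library parser does not validate against DTDs); it is a stand-in written for illustration, and the sample documents and page count are made up.

```python
import xml.etree.ElementTree as ET

# Child sequence the DTD requires for BOOK, in order.
BOOK_MODEL = ["TITLE", "AUTHOR", "BINDING", "PAGES", "PRICE"]

def violations(xml_text):
    """Hand-rolled check of two rules from inventory.dtd (not a real validator)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "INVENTORY":
        problems.append("root element must be INVENTORY")
    books = root.findall("BOOK")
    if not books:  # (BOOK+) means one or more
        problems.append("INVENTORY needs at least one BOOK")
    for n, book in enumerate(books, start=1):
        children = [child.tag for child in book]
        if children != BOOK_MODEL:
            problems.append(f"BOOK {n}: expected {BOOK_MODEL}, found {children}")
    return problems

# Illustrative documents; the PAGES value is invented.
complete = (
    "<INVENTORY><BOOK><TITLE>Leaves of Grass</TITLE><AUTHOR>Walt Whitman</AUTHOR>"
    "<BINDING>hardcover</BINDING><PAGES>100</PAGES><PRICE>$7.75</PRICE></BOOK></INVENTORY>"
)
missing_pages = (
    "<INVENTORY><BOOK><TITLE>Leaves of Grass</TITLE><AUTHOR>Walt Whitman</AUTHOR>"
    "<BINDING>hardcover</BINDING><PRICE>$7.75</PRICE></BOOK></INVENTORY>"
)

print(violations(complete))
print(violations(missing_pages))
```

Note that the BOOK elements shown on slide 44 have no PAGES child, so under this content model as written they would be reported, just like `missing_pages` here.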

  49. XML Browser Issues • The XML specification is still relatively new • Much XML is experimental • There won't be just one browser, but many • Because the potential number of different XML applications is not limited, no single browser can be expected to handle 100% of everything

  50. XML Browser Issues • IE5.5 handles XML but currently still renders it via the CSS model even when using an XSL stylesheet • Not all the stylesheet options work • Microsoft was also one of the architects of an invalid hybrid solution in which one could embed fragments of XML in HTML files • Current HTML-only browsers simply ignore element markup which they don't recognize • This has now been superseded by XHTML
