
SGML, HTML, XML: Do We Really Need All That?



  1. SGML, HTML, XML: Do We Really Need All That? ISMT Multimedia Fall 2002 Dr Vojislav B Mišić

  2. Lecture Overview • What is a markup language? • HTML markup: what’s good, what’s wrong • Extensions to HTML (dHTML and style sheets, XML and XSL, …) • XML • Basic elements • Well-formed vs. valid XML • Writing a DTD • Examples of XML

  3. Markup languages • What is markup? • Text (actual contents of the document) • is interspersed with markings • Markup is related to the text • notes on the content • notes on text presentation • but virtually anything can be marked (remember Fermat’s last theorem?) • Markup language allows separation of concerns: content vs. presentation

  4. Standards for markup • SGML (Standard Generalized Markup Language; an ISO standard derived from IBM’s GML) – a standardized way to write other markup languages (actually, a meta-language) • An SGML-based language is specified using a DTD (Document Type Definition) • SGML is not a user-friendly language, hence its use was rather limited, even though software support for it does exist

  5. Other markup languages • TeX (Knuth) is another widely used markup language • Performs extremely well for complex texts with • mathematical formulas and symbols • cross-references • different typefaces • foreign languages

  6. A TeX example \begin{equation}\label{coh1} \Psi (S) = \displaystyle \frac{\displaystyle \sum_{x \in R (S)} \left( \# S_w (x) - 1 \right)} {\displaystyle \sum_{x \in R (S)} \left( \# S - 1 \right)} \end{equation}

  7. HTML • HTML (HyperText Markup Language) is the language of the Internet • Allows platform-independent browsing • Text-only at first, media later • Hyperlinks, limited visual formatting • However, it is far from perfect, and is gradually being replaced (current version: 4.01)

  8. HTML markup • First you write the text, then add appropriate markup tags • Tags can describe logical entities • Headings of different levels: H1, H2, … • Lists and list elements (UL, OL, LI) • But tags can also describe visual effects (display rendering) • Bold and italic text (B, I) • Font and typeface changes

  9. If you make an error… • Anything not recognized as correct HTML is essentially ignored • The browser skips unknown tags and displays the text they contain as plain, unformatted text • In this manner, users are still able to see most of the content, albeit without proper formatting • Your opinion: is this good or bad?
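The tolerant behaviour described above can be imitated with Python’s standard html.parser module. This is only a toy model, not how any real browser is built: the set of "known" tags and the class name TolerantRenderer are made up for the example. Unknown tags are noted and skipped, while the text inside them is kept.

```python
from html.parser import HTMLParser

# Tags this toy renderer "understands"; a made-up set for illustration only.
KNOWN_TAGS = {"b", "i", "p", "h1", "h2"}

class TolerantRenderer(HTMLParser):
    """Collects visible text and records unknown tags instead of failing."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.unknown_tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; unknown ones are noted, then ignored
        if tag not in KNOWN_TAGS:
            self.unknown_tags.append(tag)

    def handle_data(self, data):
        # Text is kept whether or not the enclosing tag was understood
        self.text_parts.append(data)

    def visible_text(self):
        return "".join(self.text_parts)


renderer = TolerantRenderer()
renderer.feed("<p>Hello <blink>tolerant</blink> <b>world</b></p>")
renderer.close()
print(renderer.visible_text())  # Hello tolerant world
print(renderer.unknown_tags)    # ['blink']
```

Nothing is reported to the reader of the page: the unknown tag simply disappears, which is exactly the trade-off the slide asks you to judge.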

  10. HTML editing • HTML source is ASCII and essentially layout independent • Plain text editors can be used • You can put extra white space to your heart’s content, with no effect on what is displayed by the browser • Most browsers allow you to view and save the HTML source of the document displayed – the quickest way to learn HTML • HTML is interpreted – editing changes are displayed (almost) instantly

  11. HTML on the Internet • HTML browsers can display graphics and other media objects • Although HTML by itself provides only the most primitive support for multimedia • Tags can specify target URLs (hyperlinks) • Error tolerance ensures that anyone with a browser (any browser) can access HTML documents • … all of which made HTML the language of choice for hypertext on the Internet

  12. More HTML features • Visual formatting is allowed but not forced • you can specify a typeface, but the browser will substitute another one of its own choice if the one specified is not available • User can easily change the presentation • just resize window and select different fonts/sizes • Browser differences (IE vs. Navigator) – actually, not very important any more

  13. HTML Interactivity • Interactivity at first limited to hyperlinks • Forms introduced later (HTML 2.0) • Form support still limited; most often, client- or server-side scripting is required • Proliferation of scripting languages • CGI scripts • JavaScript and JScript (more details later) • VBScript, ASP • Perl
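On the server side, a CGI-style script first has to decode the submitted form fields. A minimal sketch in Python (just one option alongside the languages listed above), assuming a form with hypothetical fields name and topic, submitted as application/x-www-form-urlencoded data:

```python
from urllib.parse import parse_qs

def handle_form_submission(body: str) -> str:
    """Decode a URL-encoded form body and build a reply.

    The field names 'name' and 'topic' are made up for this example.
    """
    fields = parse_qs(body)  # field name -> list of values; '+' decodes to a space
    name = fields.get("name", ["anonymous"])[0]
    topics = fields.get("topic", [])
    return f"Hello {name}, you picked: {', '.join(topics) or 'nothing'}"


# What a browser sends for one text field and two checked boxes sharing a name
print(handle_form_submission("name=Ada+Lovelace&topic=XML&topic=CSS"))
# Hello Ada Lovelace, you picked: XML, CSS
```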

  14. Is HTML a Good Markup Language? • Logical and visual formatting capabilities together • Some people argue for cleaner separation of logical from visual formatting • Others want more author control • Many extensions (some proprietary) • Changes generally lean towards greater author control over document rendering – more direct formatting instructions included

  15. Dynamic HTML • Commercial term – there is no such thing as a dHTML standard • Combination of HTML with new technologies • Stylesheets add greater author control • Scripting allows improved interactivity, including user input • Even simple animations are possible • As always, not quite compatible extensions by Microsoft and Netscape

  16. HTML styles • In standard HTML, logical markup tags (such as <H1>) have predefined properties for • Typeface • Font size • Mode • Line spacing • Properties cannot be changed, and we cannot define our own tags • The only way is to use a (possibly way too long) sequence of appropriate primitive tags every time – not a very convenient solution

  17. Stylesheets to the rescue • Cascading Style Sheets (CSS): cleaner separation of presentation from content • Style: a named set of properties that define the presentation of a chunk of text (character, paragraph, …) • Styles are found in word-processing software (e.g., WinWord), but also in some markup languages (TeX) • CSS is used with HTML, but it’s not HTML – although browsers know how to handle them together

  18. CSS Syntax • A CSS-compatible stylesheet contains a set of rules, each with a selector (name) and a number of properties with their values • Rules can be • Inline (within an HTML tag, in the document body) • Embedded (in the head of an HTML document) • External, in a separate file which is then linked or imported into an HTML document • Position of the rule defines the scope of its effect on the document
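To make the selector/property/value structure concrete, here is a deliberately naive Python sketch that splits a stylesheet into rules. The function parse_rules is hypothetical and ignores comments, @-rules and braces inside strings; a real CSS parser handles far more.

```python
import re
from typing import Dict, List, Tuple

def parse_rules(css: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a stylesheet into (selector, {property: value}) pairs."""
    rules = []
    # Selector text up to '{', then the declarations up to the closing '}'
    for selector, body in re.findall(r"([^{}]+)\{([^}]*)\}", css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)  # split on the first ':' only
                declarations[prop.strip().lower()] = value.strip()
        rules.append((selector.strip(), declarations))
    return rules


sheet = """
B { font: bold 18pt times, serif; text-decoration: underline; }
p.mini { font-size: 8pt; }
"""
for selector, properties in parse_rules(sheet):
    print(selector, properties)
# B {'font': 'bold 18pt times, serif', 'text-decoration': 'underline'}
# p.mini {'font-size': '8pt'}
```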

  19. CSS Selectors • HTML selectors – the names of HTML tags (e.g., B, P) • Class selectors – can be applied to any HTML tag • ID selectors – apply to the single element on a page that carries that ID (IDs must be unique within a page) • Type of HTML tag defines the scope of CSS properties • Block level (DIV, LI, H1) • Inline (B, FONT, TT) • Replaced tags (IMG)

  20. CSS Properties • Always of the form property:value; • Categories of properties control • Typefaces (fonts, size, mode) • Text (kerning, leading, alignment) • Lists (bullets, indentation) • Colors (borders, text, rules, background) • Margins • Positioning of individual elements

  21. CSS Rule with an HTML selector • Effective redefinition of HTML tags, e.g.: B { font: bold 18pt times, serif; text-decoration: underline; } • Redefines the <B> (boldface) tag throughout the document • Don’t forget to close the brace!

  22. CSS Rule with a class selector • Independent style, applicable to any HTML tag: .extra { font-size: 28pt; } .huge { font-size: 48pt; } • The class must be referenced in the CLASS attribute of the HTML tag: <B class="extra">Extra</B> <B class="huge">HUGE</B>

  23. CSS Rule with a class selector • May be linked to a specific HTML tag: p.mini { font-size: 8pt; } p.big { font-size: 14pt; } • Such a class selector applies to this HTML tag only: <P class="mini">mini</P> <P class="big">BIG</P>

  24. CSS Rule with an ID selector • Another independent style, applicable to any HTML tag: #area1 { position: relative; margin-left: 9em; color: red; } • The ID is specified within the HTML tag: <SPAN ID="area1"> ... </SPAN>

  25. More on CSS selectors • Several CSS selectors may share the same definition, and individual selectors may get additional properties separately • CSS rules can refer to tags nested within other tags, e.g., P B { background: pink; } • redefines the <B> tag only when encountered within the <P> tag
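The three selector kinds and the nesting rule can be sketched as a small matcher. Everything here is a simplifying assumption: Element is a hypothetical stand-in for a node of the page, only tag, .class, tag.class, #id and space-separated (descendant) selectors are supported, and a browser matches against its own document tree with many more selector forms.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Element:
    """Hypothetical stand-in for an HTML element."""
    tag: str
    classes: Set[str] = field(default_factory=set)
    id: Optional[str] = None

SIMPLE = re.compile(r"([A-Za-z][A-Za-z0-9]*)?(?:\.([\w-]+))?(?:#([\w-]+))?")

def matches_simple(selector: str, el: Element) -> bool:
    """Match one simple selector: 'B', '.extra', 'p.mini' or '#area1'."""
    m = SIMPLE.fullmatch(selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    tag, cls, ident = m.groups()
    if tag and tag.lower() != el.tag.lower():  # HTML tag names ignore case
        return False
    if cls and cls not in el.classes:
        return False
    if ident and ident != el.id:
        return False
    return True

def matches(selector: str, path: List[Element]) -> bool:
    """Match a descendant selector such as 'P B'; path runs from the root to the element."""
    *ancestors, last = selector.split()
    if not matches_simple(last, path[-1]):
        return False
    i = len(path) - 2                  # start at the element's parent
    for part in reversed(ancestors):   # each part must match some ancestor further up
        while i >= 0 and not matches_simple(part, path[i]):
            i -= 1
        if i < 0:
            return False
        i -= 1
    return True


body, para, heading = Element("BODY"), Element("P"), Element("H1")
print(matches("P B", [body, para, Element("B")]))              # True
print(matches("P B", [body, heading, Element("B")]))           # False
print(matches(".extra", [body, Element("B", {"extra"})]))      # True
print(matches("p.mini", [body, Element("P", {"mini"})]))       # True
print(matches("#area1", [body, Element("SPAN", id="area1")]))  # True
```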

  26. Adding CSS to your document • Within a STYLE container in the document head: <HEAD><STYLE TYPE="text/css"><!-- CSS rules go here --></STYLE></HEAD> • HTML comment tags hide the CSS rules from non-CSS browsers

  27. Importing CSS into your document • Create a separate file, stylefile.css, then write <HEAD><LINK REL="stylesheet" TYPE="text/css" HREF="stylefile.css"></HEAD> • Several files may be added in this manner

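  28. More on CSS • Comments go between matched pairs of /* and */ and may span several lines (// does not start a comment in CSS) • A stylesheet file may import another stylesheet file with the statement @import url(stylefile.css); – styles from several sources then cascade together (hence the name CSS) • But: when rules conflict, the last rule listed wins (other things, such as selector specificity, being equal)! • Also: beware of browser differences!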
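The "last rule wins" behaviour can be sketched in a few lines of Python. The sketch assumes only plain tag selectors, so all matching rules have equal specificity and source order alone decides; computed_style is a made-up name. Because CSS requires @import statements to come before the other rules of a stylesheet, imported rules sit earlier in this order and lose ties to the importing sheet’s own rules.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, Dict[str, str]]  # (selector, declarations)

def computed_style(tag: str, rules: List[Rule]) -> Dict[str, str]:
    """Apply every rule whose selector is exactly this tag name, in source order."""
    style: Dict[str, str] = {}
    for selector, declarations in rules:
        if selector.lower() == tag.lower():
            style.update(declarations)  # a later rule overwrites any property it repeats
    return style


rules = [
    ("B", {"color": "black", "font-size": "12pt"}),  # e.g., from an imported file
    ("B", {"color": "red"}),                         # listed later in the document
]
print(computed_style("B", rules))  # {'color': 'red', 'font-size': '12pt'}
```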

  29. More CSS capabilities • Font selection • Text control • List properties • Background properties • Absolute and relative positioning (but this is very dangerous!) • Visibility (which probably has little use by itself – but it can be quite useful when changed through appropriate scripts) • Stacking order (z-index: which element is drawn on top)

  30. Document Object Model • DOM describes the structure of an HTML document as a hierarchy • Thus allowing a script written in a suitable language to access and manipulate only a selected element (or elements) within that document • document.images.b1.src="button_on.gif" describes a path from the root or top (which is the document itself) to a particular element – an image – and gives it a new source file • Then, a script can manipulate this element (e.g., hide, show, replace, move, …) in response to certain events
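Python’s xml.dom.minidom implements the W3C DOM core interfaces (getElementsByTagName, getAttribute, setAttribute) that browsers also expose to scripts, so the rollover idea can be sketched outside a browser. The page markup, the second image and the file name button_off.gif are made up; in a browser the same change would be written in JavaScript, as on the slide.

```python
from xml.dom.minidom import parseString

# A tiny XHTML-style page; the image name "b1" mirrors the slide's example.
page = parseString(
    '<html><body>'
    '<img name="b1" src="button_off.gif"/>'
    '<img name="logo" src="logo.gif"/>'
    '</body></html>'
)

def set_image_src(doc, name, new_src):
    """Find the <img> with the given name below the document node and change only it."""
    for img in doc.getElementsByTagName("img"):
        if img.getAttribute("name") == name:
            img.setAttribute("src", new_src)
            return img
    raise KeyError(name)


set_image_src(page, "b1", "button_on.gif")  # what a rollover script does on mouseover
print([img.getAttribute("src") for img in page.getElementsByTagName("img")])
# ['button_on.gif', 'logo.gif']
```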

  31. XML • eXtensible Markup Language: a simplified (easier, more consistent) version of SGML • XML-compliant languages defined with appropriate DTDs • XML parsers signal syntax errors (unlike HTML browsers) – use of authoring tools implied • Current uses (with more to follow) • SMIL for synchronized multimedia • RDF (Resource Description Framework) for exchanging resource descriptions

  32. What is XML? • A method for putting structured data in a text file • Data stored on disk can be in binary or text format • Binary formats are often more concise • Text format allows human inspection • XML is a set of rules/guidelines/conventions for designing text formats for such data, to produce files that are • Easy to generate and read (by a computer) • Unambiguous and platform-independent • Extensible, easy to localize/internationalize

  33. XML looks like HTML but isn't HTML • XML makes use of • tags (words bracketed by '<' and '>') and • attributes (of the form name="value") • HTML specifies what each tag & attribute means (and often how the text between them will look in a browser) • XML uses the tags only to delimit pieces of data – and leaves the interpretation to the application

  34. XML is text, but isn't meant to be read • XML files are text files, but they are not made for human readers • Text format allows experts (such as programmers) to more easily debug applications • Text format allows the use of a simple text editor to fix a broken XML file • Rules for XML files are much stricter than for HTML • Applications are not allowed to try to second-guess the creator of a broken XML file – if the file is broken, just stop and issue an error message
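The "stop and report" rule is what Python’s standard XML parser does. In this sketch (the helper name parse_or_report is made up), a malformed document yields an error message with the position where parsing failed, and nothing is repaired:

```python
import xml.etree.ElementTree as ET

def parse_or_report(xml_text):
    """Return (root, None) for well-formed XML, or (None, message) for broken XML."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up; nothing is repaired
        return None, f"line {line}, column {column}: {err}"


root, error = parse_or_report("<NOTE><TO>Ann</TO></NOTE>")
print(root.tag, error)  # NOTE None

root, error = parse_or_report("<NOTE><TO>Ann</NOTE>")
print(root, error)      # None, then a "mismatched tag" message with its position
```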

  35. XML is verbose, but that is not a problem • XML is a text format and uses tags to delimit the data • Therefore, XML files are nearly always larger than comparable binary formats • But disk space is no longer as expensive as it used to be, and compression/decompression can be fast and reliable • Communication protocols can compress data on the fly, thus saving bandwidth as effectively as a binary format
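How much of the verbosity general-purpose compression removes can be checked with Python’s zlib module. The catalogue below is made-up, deliberately repetitive data (as many XML data files are); the exact sizes depend on the input, so none are quoted here.

```python
import zlib

# Hypothetical, highly repetitive XML: the same tags wrap every record
records = "".join(
    f"<ITEM><NAME>part{i}</NAME><PRICE>{i}.00</PRICE></ITEM>" for i in range(200)
)
xml_bytes = f"<CATALOG>{records}</CATALOG>".encode("utf-8")

compressed = zlib.compress(xml_bytes, 9)
print(len(xml_bytes), len(compressed))  # the repeated tags shrink to a small fraction
assert zlib.decompress(compressed) == xml_bytes  # lossless round trip
```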

  36. XML is … good • XML is license-free • XML is platform-independent • XML is well-supported • Choosing XML is a lot like choosing SQL • you still have to build your own database and your own programs/procedures that manipulate it • but there are many tools available and many people that can help you • XML isn't always the best solution, but it is always worth considering …

  37. XML is a family of technologies • XML: the specification that defines what "tags" and "attributes" are • XLink describes a standard way to add hyperlinks to an XML file • CSS is applicable to XML as it is to HTML • XSL: an advanced language for style sheets (presentation and manipulation) • XSLT: a transformation language • SMIL: Synchronized Multimedia Integration Language • … and others

  38. Well-formed vs. valid XML • Well-formed documents comply with XML well-formedness constraints, which require that • Elements properly nest within each other • Elements use other markup syntax correctly • XML allows you to use elements of your own naming: ESSAY, SECTION, PARAGRAPH, NOTE, IMPORTANT • … unlike HTML, which forces all documents into a fixed document type

  39. Writing XML One, Two • XML Declaration: declares the nature of the XML document to its readers • <?xml version="1.0" standalone="yes"?> • <?xml version="1.0" standalone="no"?> • <?xml version="1.0" encoding="UTF-8" standalone="no"?> • Root element: contains all other elements (i.e., the rest of the document) • Root element is synonymous with your document type • Root element cannot be repeated: there is exactly one per document
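Both rules can be tried with Python’s standard parser. Note that the XML specification fixes the order of the declaration’s pseudo-attributes as version, then encoding, then standalone. The TRIVIA document below is a shortened, made-up variant of the next slide’s example.

```python
import xml.etree.ElementTree as ET

# Pseudo-attributes in the order the XML specification requires
doc = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><TRIVIA><MATH/></TRIVIA>'
root = ET.fromstring(doc)
print(root.tag)  # TRIVIA: the root element names the document type

# A second root element makes the text unparseable as a single document
try:
    ET.fromstring(b"<TRIVIA/><TRIVIA/>")
    second_root_rejected = False
except ET.ParseError as err:
    second_root_rejected = True
    print("rejected:", err)  # the parser complains about junk after the document element
print(second_root_rejected)  # True
```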

  40. An XML example <?xml version="1.0" standalone="yes"?> <TRIVIA><MATH><QUESTION>What is the square root of 25?</QUESTION><ANSWER>5</ANSWER></MATH> <GENERAL><QUESTION>What is the season after Summer?</QUESTION><ANSWER>Fall</ANSWER><ANSWER>Autumn</ANSWER></GENERAL></TRIVIA>
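Reading the TRIVIA document back shows that the tags only delimit the data: the parser knows nothing about quizzes, and deciding that MATH is a category and ANSWER an accepted reply is the application’s own interpretation. A minimal Python sketch with the standard ElementTree module (the variable names are ours):

```python
import xml.etree.ElementTree as ET

TRIVIA_XML = """<?xml version="1.0" standalone="yes"?>
<TRIVIA>
  <MATH><QUESTION>What is the square root of 25?</QUESTION><ANSWER>5</ANSWER></MATH>
  <GENERAL><QUESTION>What is the season after Summer?</QUESTION>
    <ANSWER>Fall</ANSWER><ANSWER>Autumn</ANSWER></GENERAL>
</TRIVIA>"""

root = ET.fromstring(TRIVIA_XML)

# The application decides what the elements mean
quiz = {}
for category in root:  # direct children of the root: MATH, GENERAL
    question = category.findtext("QUESTION")
    answers = [answer.text for answer in category.findall("ANSWER")]
    quiz[question] = (category.tag, answers)

for question, (category, answers) in quiz.items():
    print(f"[{category}] {question} -> {' / '.join(answers)}")
# [MATH] What is the square root of 25? -> 5
# [GENERAL] What is the season after Summer? -> Fall / Autumn
```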

  41. Rules for XML elements • All elements must have opening and closing (start and end) tags <MATH> ... </MATH> • The exception is the empty-element tag, which opens and closes at once: <QUESTION ... /> • Case matters – XML is case-sensitive • Proper tag nesting must be observed • You can add whitespace between elements to your heart’s content – the parser passes it on, and applications usually ignore it
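Each of these rules can be checked against Python’s standard parser. The helper well_formed is a made-up name; the last lines show that whitespace between tags is legal but is still handed to the application as text.

```python
import xml.etree.ElementTree as ET

def well_formed(text: str) -> bool:
    """True if the text parses as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(well_formed("<MATH><QUESTION/></MATH>"))     # True: empty-element tag
print(well_formed("<MATH></math>"))                # False: names are case-sensitive
print(well_formed("<MATH><B><I></B></I></MATH>"))  # False: improper nesting

# Whitespace between tags is allowed, and the parser keeps it as text
root = ET.fromstring("<MATH>\n  <ANSWER>5</ANSWER>\n</MATH>")
print(repr(root.text))  # '\n  '
```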

  42. XML Writing • Describe content with elements of your own naming • Invent a new element each time you introduce content that significantly differs from any previous content • More elements = greater control later, when you use the data • Add attributes to elements • Attributes describe the content or behavior of elements
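The same advice applies when XML is generated by a program rather than typed. A short sketch with Python’s ElementTree that builds part of the HELP document from the next slide, using our own element names plus an attribute, and then uses that attribute to select content:

```python
import xml.etree.ElementTree as ET

help_doc = ET.Element("HELP")  # root element = document type
ET.SubElement(help_doc, "TITLE").text = "XML Help"
query = ET.SubElement(help_doc, "QUERY", area="XML")  # attribute describes the content
ET.SubElement(query, "QUESTION").text = "Where do I start?"
ET.SubElement(query, "ANSWER").text = "Start with your root element."

print(ET.tostring(help_doc, encoding="unicode"))
# <HELP><TITLE>XML Help</TITLE><QUERY area="XML"><QUESTION>Where do I start?</QUESTION><ANSWER>Start with your root element.</ANSWER></QUERY></HELP>

# Named elements and attributes pay off later: select content by them
print([q.findtext("QUESTION") for q in help_doc.findall("QUERY[@area='XML']")])
# ['Where do I start?']
```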

  43. Another Example • <?xml version="1.0" standalone="yes"?><HELP><TITLE>XML Help</TITLE><QUERY area="XML"><QUESTION>Where do I start?</QUESTION><ANSWER>Start with your root element. Break your document down into parts, fill them in, repeat.</ANSWER></QUERY><QUERY area="XML"><QUESTION>Are my element names well chosen?</QUESTION></QUERY></HELP>

  44. XML Writing 4 • Parsing: checking well-formedness • Well-formed: <PRICE>$57.80</PRICE> • Well-formed: <PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET> • No closing tag: <WEATHER>Foggy • Improper closing tag (missing /): <LEVEL>Intermediate<LEVEL> • Misspelled closing tag: <PASSWORD>planetB612</PASSWD> • Missing closing bracket (and the attribute value must be quoted): <DISTANCE TYPE=KM 120</DISTANCE> • Improper nesting: <CAR><engine>engine does not nest properly within CAR</CAR></engine>
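Running these snippets through Python’s standard (non-validating) parser, each treated as a separate document, confirms the verdicts above. A minimal sketch; the helper name check is ours, and the exact error texts come from the parser:

```python
import xml.etree.ElementTree as ET

SNIPPETS = [
    '<PRICE>$57.80</PRICE>',
    '<PET><CAT type="Cornish Rex">Cat nests properly within PET.</CAT></PET>',
    '<WEATHER>Foggy',
    '<LEVEL>Intermediate<LEVEL>',
    '<PASSWORD>planetB612</PASSWD>',
    '<DISTANCE TYPE=KM 120</DISTANCE>',
    '<CAR><engine>engine</CAR></engine>',
]

def check(snippet: str) -> str:
    """Report 'well-formed' or the parser's error message for one snippet."""
    try:
        ET.fromstring(snippet)
        return "well-formed"
    except ET.ParseError as err:
        return f"error: {err}"


for s in SNIPPETS:
    print(check(s), "<-", s)
```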

  45. Valid XML • Valid XML, unlike merely well-formed XML, requires a Document Type Definition (and a valid document must also be well-formed) • DTD: a set of rules that a particular document type must follow • The rules state the name and contents of each element, and the contexts in which a particular element can and must exist • DTD enables communication with databases • Valid XML documents may be accompanied by style sheets for proper presentation

  46. What’s in a DTD • Two essential structures: the element and the attribute • Root element: contains all other elements • Contents of other elements defined recursively starting from the root, until you reach text-level elements, e.g., <!ELEMENT NAME CONTENT> • Elements may have attributes, which are declared in a separate attribute-list declaration, e.g., <!ATTLIST ELEMENT-NAME NAME CDATA #IMPLIED>

  47. Writing a DTD <!ELEMENT novel (preface,chapter+,biography?,criticalessay*)> <!ELEMENT preface (paragraph+)> <!ELEMENT chapter (title,paragraph+,section+)> <!ELEMENT section (title,paragraph+)> <!ELEMENT biography (title,paragraph+)> <!ELEMENT criticalessay (title,section+)> <!ELEMENT paragraph (#PCDATA|keyword)*> <!ELEMENT title (#PCDATA|keyword)*> <!ELEMENT keyword (#PCDATA)>

  48. DTD Declarations (1): Element type declaration • Each element type includes a name, content, and possibly a set of attributes • A document can contain many conforming elements of that type • Sequence: ordered list of components (,) • Choice: alternative components (|) • Components may be optional (?: zero or one) • Components may be required and repeatable (+: one or more) • Components may be optional and repeatable (*: zero or more) • Mixed-content declarations must include #PCDATA (parsed character data, i.e., text) as their first member
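The operators above are those of regular expressions over child element names, which suggests a way to check content models by hand. Python’s standard XML parsers check well-formedness but do not validate against a DTD, so this is a hypothetical sketch, not a real validator: content_model_to_regex and children_conform are made-up names, and mixed content (#PCDATA), EMPTY and ANY are not handled. Each child name becomes the token "name " (with a trailing space), "," becomes concatenation, and |, ?, + and * keep their meaning.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str):
    """Translate an element-only DTD content model into a compiled regex."""
    pieces = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == "(":
            pieces.append("(?:")
        elif token == ")":
            pieces.append(")")
        elif token == ",":
            continue  # sequence = plain concatenation
        elif token in {"|", "?", "*", "+"}:
            pieces.append(token)  # same meaning as in regular expressions
        else:
            pieces.append("(?:" + re.escape(token) + " )")  # one child element
    return re.compile("".join(pieces))

def children_conform(element: ET.Element, model: str) -> bool:
    """Check the sequence of child element names against the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None


NOVEL_MODEL = "(preface,chapter+,biography?,criticalessay*)"
good = ET.fromstring("<novel><preface/><chapter/><chapter/><biography/></novel>")
bad = ET.fromstring("<novel><chapter/><preface/></novel>")
print(children_conform(good, NOVEL_MODEL))  # True
print(children_conform(bad, NOVEL_MODEL))   # False: preface must come first
```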

  49. DTD Declarations (2): Attribute List Declarations • Much more variation here • String type attributes (CDATA): virtually unconstrained text strings • Enumeration attributes: require a list of options to pick from • Attribute defaults: • #REQUIRED, required; • #IMPLIED, optional; • #FIXED "value", a fixed value; • "value", a default but overridable value • Usage: <ELEMENT-NAME NAME="value">

  50. An Attribute List Example <!ELEMENT MEMO (TO,FROM,SUBJECT,BODY,SIGN)><!ATTLIST MEMO importance (HIGH|MEDIUM|LOW) "LOW"><!ELEMENT TO (#PCDATA)><!ELEMENT FROM (#PCDATA)><!ELEMENT SUBJECT (#PCDATA)><!ELEMENT BODY (P+)><!ELEMENT P (#PCDATA)><!ELEMENT SIGN (#PCDATA)><!ATTLIST SIGN signatureFile CDATA #IMPLIED email CDATA #REQUIRED>
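What a validating parser does with these attribute declarations can be approximated by hand. The sketch below is hypothetical: ATTLISTS is a hand-written Python mirror of the MEMO and SIGN declarations above (not parsed from the DTD), #FIXED is not modelled, and the e-mail address is made up. It rejects undeclared attributes, missing #REQUIRED ones and values outside an enumeration, and fills in plain defaults such as importance="LOW".

```python
from typing import Dict

# attribute name -> (enumerated values or "CDATA", default: a value, "#REQUIRED" or "#IMPLIED")
ATTLISTS = {
    "MEMO": {"importance": (("HIGH", "MEDIUM", "LOW"), "LOW")},
    "SIGN": {"signatureFile": ("CDATA", "#IMPLIED"), "email": ("CDATA", "#REQUIRED")},
}

def apply_attlist(element: str, given: Dict[str, str]) -> Dict[str, str]:
    """Check one element's attributes against its ATTLIST and fill in defaults."""
    declared = ATTLISTS.get(element, {})
    for name in given:
        if name not in declared:
            raise ValueError(f"{element}/@{name} is not declared")
    result = dict(given)
    for name, (allowed, default) in declared.items():
        if name in result:
            if allowed != "CDATA" and result[name] not in allowed:
                raise ValueError(f"{element}/@{name}: {result[name]!r} not in {allowed}")
        elif default == "#REQUIRED":
            raise ValueError(f"{element}/@{name} is required")
        elif default != "#IMPLIED":
            result[name] = default  # a plain default is supplied when the attribute is absent
    return result


print(apply_attlist("MEMO", {}))                            # {'importance': 'LOW'}
print(apply_attlist("SIGN", {"email": "ann@example.org"}))  # {'email': 'ann@example.org'}
```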
