Beyond HTML: Extensible Markup Language. Timothy W. Cole Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign American Association of Law Libraries 19 July 2000 t-cole3@uiuc.edu http://dli.grainger.uiuc.edu/Publications/TWCole/AALL_2000/.
Ordered Hierarchy of Content Objects: A Definition of Text in Computer Terms
• Premise: A Text Is the Sum of Its Components
• So a <BOOK> Could Be Defined as Containing: <FRONT_MATTER>, <CHAPTER>s, <BACK_MATTER>
• <FRONT_MATTER> Could Contain: <BOOK_TITLE>, <AUTHOR>s, <PUBLISHER>
• While Each <CHAPTER> Could Contain: <CHAPTER_TITLE>, <SECTION>s
• And Each <SECTION> Could Contain: <SECTION_TITLE>, <PARAGRAPH>s
• The Components Chosen Reflect Anticipated Use
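The <BOOK> model above maps directly onto nested XML elements. The sketch below is a minimal illustration using Python's standard xml.etree.ElementTree module; the element names come from the slide, but the sample titles and text are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Element names follow the slide's <BOOK> model; the text values are placeholders.
BOOK_XML = """
<BOOK>
  <FRONT_MATTER>
    <BOOK_TITLE>Sample Title</BOOK_TITLE>
    <AUTHOR>First Author</AUTHOR>
    <PUBLISHER>Sample Press</PUBLISHER>
  </FRONT_MATTER>
  <CHAPTER>
    <CHAPTER_TITLE>Chapter One</CHAPTER_TITLE>
    <SECTION>
      <SECTION_TITLE>Section 1.1</SECTION_TITLE>
      <PARAGRAPH>First paragraph.</PARAGRAPH>
      <PARAGRAPH>Second paragraph.</PARAGRAPH>
    </SECTION>
  </CHAPTER>
  <BACK_MATTER/>
</BOOK>
"""

def outline(elem, depth=0):
    """Return the ordered hierarchy of content objects as indented tag names."""
    lines = ["  " * depth + elem.tag]
    for child in elem:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines

book = ET.fromstring(BOOK_XML)
tree_outline = outline(book)
print("\n".join(tree_outline))

# Because the text is a tree, any component can be addressed by its path.
paragraphs = book.findall("./CHAPTER/SECTION/PARAGRAPH")
print(len(paragraphs))  # 2
```

The path expression works only because every component sits at exactly one place in the hierarchy, which is also why OHCO cannot express overlapping content objects.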
Ordered Hierarchy of Content Objects (continued)
• OHCO Is a Useful, Albeit Imperfect, Model
  • More Powerful Than the Model of Text as a Stream of Characters & Formatting Instructions
  • Does Not Allow for Overlapping Content Objects
• The OHCO Model Is Inherent in XML and HTML
• XML Is Designed for Descriptive Content Objects, Not Presentational Content Objects
• XML Syntax Is Fixed, but Its Semantics Are Extensible
XML Basics: Markup & Content
• Consider:
    <?xml version='1.0' ?>
    <!-- This is an Example -->
    <author sequence='first'><LName> Col&#232; </LName>,<FName> Tim </FName> </author>
• Would Display As: Colè, Tim
• This Example Illustrates:
  • The XML Declaration (Written in Processing-Instruction Syntax)
  • XML Comments (Ignored by XML Applications)
  • XML Element Markup, Including an Attribute
  • XML Content, Including an Entity
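One way to see how a parser treats each piece of the example is to run it through Python's standard xml.etree.ElementTree module. This sketch assumes the è was written as the numeric reference &#232;; the parser resolves that reference, exposes the attribute, and silently drops the comment.

```python
import xml.etree.ElementTree as ET

# The slide's example, with è written as the numeric character reference &#232;.
SOURCE = """<?xml version='1.0' ?>
<!-- This is an Example -->
<author sequence='first'><LName> Col&#232; </LName>,<FName> Tim </FName> </author>"""

author = ET.fromstring(SOURCE)  # the comment is not kept in the resulting tree

print(author.tag)               # author
print(author.get("sequence"))   # first  (attribute value)
last = author.find("LName").text.strip()   # reference already resolved to "è"
first = author.find("FName").text.strip()
print(f"{last}, {first}")       # Colè, Tim
```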
XML Basics (continued)
• “Well-Formed” XML Rules:
  • XML Element Markup Is Case-Sensitive
  • All XML Tags Must Be Closed
  • Elements Must Nest Hierarchically; No Overlapping Elements
  • All XML Attribute Values Must Be Quoted
• Enforces Stricter Syntax than HTML
• Facilitates Fast, Efficient Parsing
• Extensible Semantics Provide Flexibility
• “Well-Formed” Is More Lightweight Than SGML
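Any conforming XML parser enforces these rules. A minimal Python sketch using the standard library's ElementTree parser, which raises ParseError on malformed input; the sample fragments are invented, one per rule.

```python
import xml.etree.ElementTree as ET

# Each sample (invented for illustration) breaks one well-formedness rule.
SAMPLES = {
    "case mismatch": "<LName>Cole</lname>",
    "unclosed tag": "<author><LName>Cole</author>",
    "overlapping elements": "<b><i>text</b></i>",
    "unquoted attribute": "<author sequence=first/>",
    "well-formed": "<author sequence='first'><LName>Cole</LName></author>",
}

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for label, text in SAMPLES.items():
    print(f"{label:22} {is_well_formed(text)}")
```

Only the last sample parses; the parser rejects the others outright rather than guessing at the author's intent, which is what makes XML parsing fast and predictable compared with HTML.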
Is It Valid or Well-Formed? When Does It Matter?
• All Web Browsers Need Is Well-Formed XML
• XML Authoring Tools Need to Validate
  • Otherwise a Tower of Babel Ensues
• Indexing Agents & Schema-Specific Rendering Agents May Need to Validate
• Illustrations:
  • Malformed XML
  • Well-Formed but Invalid XML
  • Valid XML
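Python's standard library parsers check well-formedness but do not validate against a DTD. To show the three illustrations side by side, this sketch hand-codes one hypothetical content model, roughly what `<!ELEMENT author (LName, FName)>` would declare; it is a stand-in for a real validating parser, not a replacement for one.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, roughly equivalent to <!ELEMENT author (LName, FName)>.
AUTHOR_MODEL = ["LName", "FName"]

def check(text):
    """Classify a document as 'malformed', 'invalid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "malformed"
    # The text is well-formed; now test it against the hand-coded content model.
    if root.tag != "author" or [child.tag for child in root] != AUTHOR_MODEL:
        return "invalid"
    return "valid"

print(check("<author><LName>Cole</FName></author>"))                    # malformed
print(check("<author><FName>Tim</FName><LName>Cole</LName></author>"))  # invalid (order)
print(check("<author><LName>Cole</LName><FName>Tim</FName></author>"))  # valid
```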
Library Uses of XML: Using XML for Primary Sources
• Facilitates Searching
  • Full-Text Searching & Field-Specific Searching
  • More Meaningful Proximity Searching
• Better Retrieval / Browsing
  • Selective Views / Suppression of Personal Data
  • Re-Ordered & Piecemeal Views
• Illustration: Illinois Agronomy Handbook
  • Search
  • Browsing
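Field-specific searching and selective views both follow from having named content objects. The sketch below uses invented correspondence records with hypothetical element names (it is not the Agronomy Handbook markup) to contrast a full-text match with a match restricted to one field, and then suppresses one field before display.

```python
import xml.etree.ElementTree as ET

# Invented sample records; the element names are hypothetical.
DOC = """
<letters>
  <letter id="1">
    <sender>Jane Smith</sender>
    <recipient>John Doe</recipient>
    <body>Soil tests show low nitrogen.</body>
  </letter>
  <letter id="2">
    <sender>John Doe</sender>
    <recipient>Jane Smith</recipient>
    <body>We will apply nitrogen in the spring.</body>
  </letter>
</letters>
"""

root = ET.fromstring(DOC)

def full_text_search(root, word):
    """Return ids of letters containing the word anywhere in their text."""
    return [letter.get("id") for letter in root.findall("letter")
            if word.lower() in "".join(letter.itertext()).lower()]

def field_search(root, field, word):
    """Return ids of letters containing the word inside one named field only."""
    return [letter.get("id") for letter in root.findall("letter")
            if word.lower() in (letter.findtext(field) or "").lower()]

anywhere = full_text_search(root, "Smith")          # ['1', '2']
as_sender = field_search(root, "sender", "Smith")   # ['1']

# Selective view: drop the recipient field (treated here as personal data).
for letter in root.findall("letter"):
    letter.remove(letter.find("recipient"))
view = [child.tag for child in root.find("letter")]  # ['sender', 'body']
print(anywhere, as_sender, view)
```

The loop uses findall, which returns a list, rather than iterating the live tree, so removing children does not disturb the iteration.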
Library Uses of XML: XML for Metadata & Wrapping
• Facilitates Interchange, Normalization, ...
• Simpler than Fixed Fields, Record Headers, Etc.
• XML Implementations of Metadata Standards, e.g., RDF, EAD, DC, FGDC, US-MARC
• Easier Routing / Handling of Specialized Content
• In Combination with Primary Source XML:
  • Automatic Extraction of Metadata From Source
  • Facilitates Authority Control
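Automatic extraction works when the primary source already marks up the fields a metadata record needs. This is a minimal sketch assuming an invented source document and a hypothetical mapping to Dublin Core elements; only the Dublin Core namespace URI is real.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)           # serialize with the familiar dc: prefix

# Invented primary-source document; its element names are hypothetical.
SOURCE = """
<report>
  <title>Soil Fertility Guidelines</title>
  <author>Jane Smith</author>
  <author>John Doe</author>
  <date>2000-01-01</date>
  <body>Text of the report.</body>
</report>
"""

# Hypothetical mapping from source element names to Dublin Core elements.
SOURCE_TO_DC = [("title", "title"), ("author", "creator"), ("date", "date")]

def extract_dc(source_xml):
    """Build a simple Dublin Core record from fields already marked up in the source."""
    src = ET.fromstring(source_xml)
    record = ET.Element("metadata")
    for src_tag, dc_tag in SOURCE_TO_DC:
        for elem in src.findall(src_tag):
            ET.SubElement(record, f"{{{DC_NS}}}{dc_tag}").text = elem.text
    return record

record = extract_dc(SOURCE)
print(ET.tostring(record, encoding="unicode"))
```

Authority control would slot in at the copy step, for example by normalizing each creator name against an authority file before it is written out.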
Library Uses of XML: XML for Document Management
• Smarter Documents
  • XML Namespaces: Integrating Multiple XML Schemas (Including XHTML)
  • Rights Management, Technical Requirements, …
• Facilitates Enhanced Linking Between Documents
  • Creation of Links From Marked-Up Content
  • Easy to Add or Modify Links Over Time
  • XLink & XPointer Promise More Robust Linking
• Metadata File from Illinois DLIB Testbed
  • Schema Integrates RDF, DC, & Project Design
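Namespaces let one document mix vocabularies without name collisions. The sketch below parses a hypothetical document that combines RDF, Dublin Core, XHTML, and a project-specific schema (its layout and the example.org URIs are invented; the RDF, DC, and XHTML namespace URIs are the real ones). It then pulls out a DC field, an RDF attribute, and the links marked up in the XHTML content.

```python
import xml.etree.ElementTree as ET

# Prefix map for queries; the "proj" URI is hypothetical.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "html": "http://www.w3.org/1999/xhtml",
    "proj": "http://example.org/project-schema",
}

DOC = """
<proj:item xmlns:proj="http://example.org/project-schema"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:html="http://www.w3.org/1999/xhtml">
  <rdf:RDF>
    <rdf:Description rdf:about="http://example.org/item/1">
      <dc:title>Sample Article</dc:title>
      <dc:creator>Jane Smith</dc:creator>
    </rdf:Description>
  </rdf:RDF>
  <proj:abstract>
    <html:p>See the <html:a href="http://example.org/full-text">full text</html:a>.</html:p>
  </proj:abstract>
</proj:item>
"""

root = ET.fromstring(DOC)

# The prefix map resolves each qualified name to its namespace URI.
title = root.find(".//dc:title", NS).text
about = root.find(".//rdf:Description", NS).get(f"{{{NS['rdf']}}}about")
links = [a.get("href") for a in root.iterfind(".//html:a", NS)]

print(title)   # Sample Article
print(about)   # http://example.org/item/1
print(links)   # ['http://example.org/full-text']
```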
Components of XML Implementations: DTDs & XML Schemas
• Use Either to:
  • Define Content Models
  • Declare Attributes & Entities
• DTDs Are Inherited from SGML
  • DTDs Themselves Are Not Well-Formed XML
  • Limits on Detail of Content Model Definitions
  • Minimal Data Typing
• XML Schemas Are Well-Formed XML
  • Data Typing & Better Content Models Supported
  • Not Yet in Widespread Use
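The difference in syntax is easy to demonstrate. This sketch writes a hypothetical content model for the earlier <author> example twice, once as a DTD declaration and once as an XML Schema fragment (using the XML Schema 1.0 namespace), and feeds both to an XML parser. Only the schema version is itself an XML document.

```python
import xml.etree.ElementTree as ET

# A DTD declaration for the author example (hypothetical model).
DTD_TEXT = "<!ELEMENT author (LName, FName)>"

# Roughly the same content model written as an XML Schema fragment, with data types.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="LName" type="xs:string"/>
        <xs:element name="FName" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="sequence" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def parses_as_xml(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(DTD_TEXT))  # False: DTD syntax is not XML element syntax
print(parses_as_xml(XSD_TEXT))  # True: a schema is itself an XML document
```

Because the schema is ordinary XML, the same tools used for documents (parsers, transformations, editors) also work on schemas.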
Components of XML Implementations: Encoding & Entities (Using Characters Not on Your Keyboard)
• Computers Use 1s and 0s, but Characters Form the Basis of Human-Readable Texts
• Coded Character Sets (CCS) Assign Integer Values to Characters: ASCII, ISO 8859, Unicode
• Character Encoding Schemes (CES) Map Those Integers to Bytes: 7-bit, 8-bit, UTF-8
• Bytes Are Then Decoded and Rendered as Glyphs by Your Computer, Using a Font Appropriate to the CCS/CES
• An Unavailable Font or a Misunderstood CCS/CES Results in Incorrect Character(s) on Screen
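The CCS and CES layers can be seen separately in a few lines of Python. This sketch takes the è from the earlier example: Unicode assigns it an integer, each encoding scheme turns that integer into different bytes, and decoding with the wrong scheme produces the familiar garbled characters.

```python
# 'è' (e with grave accent), as in the "Colè" example.
ch = "\u00e8"

code_point = ord(ch)              # CCS: Unicode assigns the integer 232 (U+00E8)
utf8 = ch.encode("utf-8")         # CES: UTF-8 maps 232 to two bytes
latin1 = ch.encode("iso-8859-1")  # ISO 8859-1 maps 232 to a single byte

print(code_point)   # 232
print(utf8)         # b'\xc3\xa8'
print(latin1)       # b'\xe8'

# A misunderstood CES: UTF-8 bytes read as ISO 8859-1 give two wrong characters.
garbled = utf8.decode("iso-8859-1")
print(garbled)      # Ã¨
```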
Components of XML Implementations: Encoding & Entities (continued)
• Common Ways to Deal With This Problem:
  • Select a CCS/CES Appropriate to the Language
  • Use the Default CCS/CES, but Override the Default Font
  • Use an XML/HTML Named or Numeric Entity
    • HTML Understands a Non-Extensible Set of Named Entities
    • XML Understands Numeric Entities Corresponding to the Unicode CCS; Apart From Five Predefined Entities (&lt; &gt; &amp; &apos; &quot;), All Named Entities Must Be Declared in a DTD
  • Use Unicode for CCS and UTF-8 for CES (the XML Defaults)
• An Illustration in HTML
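The entity rules can be checked directly with Python's ElementTree parser, which expands entities declared in a document's internal DTD subset. In this sketch a numeric reference works with no declaration, an HTML-style name fails unless declared, and a predefined XML entity always works. The sample documents are invented.

```python
import xml.etree.ElementTree as ET

numeric = "<LName>Col&#232;</LName>"        # numeric reference to Unicode code point 232
undeclared = "<LName>Col&egrave;</LName>"   # HTML-style name, never declared
declared = """<!DOCTYPE LName [
  <!ENTITY egrave "&#232;">
]>
<LName>Col&egrave;</LName>"""
predefined = "<LName>Smith &amp; Jones</LName>"  # one of XML's five predefined entities

def parse_text(doc):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

print(parse_text(numeric))     # Colè
print(parse_text(undeclared))  # None (undefined entity)
print(parse_text(declared))    # Colè
print(parse_text(predefined))  # Smith & Jones
```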
Components of XML Implementations: Presentation - CSS Style Sheets
• XML Content Objects Have No Style
• Use Cascading Style Sheets (CSS); They Work Like CSS for HTML, Except:
  • Must Be Explicit About Everything
  • No Special Treatment of Class & ID Attributes
• Attach CSS to XML Using a Special XML PI
• CSS Does Define Formatting
• CSS DOES NOT Reorganize or Add Content
• Simple XML-CSS Example; The CSS Used
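The style sheet attachment is an ordinary processing instruction in the document prolog, so any DOM-level tool can find it. This sketch uses Python's xml.dom.minidom on a hypothetical document pointing at a hypothetical author.css. It reads the PI's pseudo-attributes by reparsing them as attributes of a dummy element, which is a convenient shortcut rather than a full implementation of the PI syntax.

```python
from xml.dom import minidom

# Hypothetical document and style sheet name.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="author.css"?>
<author sequence="first"><LName>Cole</LName>, <FName>Tim</FName></author>"""

doc = minidom.parseString(DOC)

def stylesheet_pis(doc):
    """Collect (type, href) pairs from xml-stylesheet PIs before the root element."""
    found = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # PI data is plain text; reparse its pseudo-attributes as real attributes.
            pseudo = minidom.parseString(f"<pi {node.data}/>").documentElement
            found.append((pseudo.getAttribute("type"), pseudo.getAttribute("href")))
    return found

print(stylesheet_pis(doc))  # [('text/css', 'author.css')]
```

Because XML elements carry no default presentation, the referenced style sheet would need explicit rules for every element it displays, including whether each one is block or inline.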
Components of XML Implementations: Transformations - XSLT Style Sheets
• Some Characteristics of XSLT Style Sheets:
  • XSLT Files Are Well-Formed XML
  • XSLT Transforms to Another Schema, or to XHTML
  • XSLT Objects Have Implicit Functionality
  • Attach XSLT to a Document Using an XML PI
  • XSLT Can Reorganize & Add Content
• Still Need CSS for Presentation: CSS Style Sheets Work on the Output of XSLT Processing
• Supplement XSLT With Script to Manipulate & Modify Actual Content
• Simple XSLT Example; The XSLT Style Sheet
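Python's standard library has no XSLT processor, so this sketch is only a Python analogue of what a single XSLT template matching <author> might do. It is not XSLT itself. It builds an XHTML-style paragraph (without the XHTML namespace, for brevity), adds a label that is not in the source, and reorders the name parts, which is exactly what CSS alone cannot do.

```python
import xml.etree.ElementTree as ET

SOURCE = "<author sequence='first'><LName>Col&#232;</LName><FName>Tim</FName></author>"

def author_to_xhtml(author):
    """Python analogue of one XSLT template matching <author>."""
    p = ET.Element("p", {"class": "author"})
    label = ET.SubElement(p, "strong")
    label.text = "Author: "  # added content: not present in the source
    # Reorganized content: first name now precedes last name.
    label.tail = f"{author.findtext('FName')} {author.findtext('LName')}"
    return p

result = author_to_xhtml(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

The class attribute on the output paragraph is where a CSS style sheet would then take over presentation.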
The State of the Art in XML Tools
• XML Authoring
  • Add-Ons to Established Word Processors, e.g., WordPerfect 9 / WordPerfect 2000
  • Tools With SGML Roots, e.g., ArborText’s Epic (Formerly Adept) Editor, SoftQuad’s XMetaL Editor
  • New XML Tools, e.g., Vervet Logic’s XMLPro, Extensibility’s XML Authority / XML Turbo
• So Far, Few Authoring Tools Are Customized for Specialized XML Schemas
The State of the Art in XML Tools (continued)
• XML Presentation Tools:
  • Latest Releases of Netscape Navigator/Mozilla and Microsoft’s Internet Explorer Support XML, but Support Is Generic, Partial, & Uneven
  • Plug-Ins and Standalone Viewers Available / In Work for Advanced XML Schemas (CML, MML, VML, …)
• XML Database Integration Tools:
  • Add-Ons to Established DBMSs Available / In Work, e.g., Microsoft SQL Server-XML Technology Preview
  • Illustration; With Query & CSS; XML Source File
• XML Query Language Specification In Work
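The general idea behind database integration is that query results come back as XML rather than as rows. This sketch does not reproduce the SQL Server preview or any other product. It uses Python's built-in sqlite3 module with an invented in-memory table to show one simple row-to-element mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory table with invented rows, standing in for an established DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (last TEXT, first TEXT, sequence TEXT)")
conn.executemany("INSERT INTO authors VALUES (?, ?, ?)",
                 [("Col\u00e8", "Tim", "first"), ("Smith", "Jane", "second")])

def query_to_xml(conn, sql):
    """Wrap each result row in an <author> element; sequence becomes an attribute."""
    root = ET.Element("authors")
    for last, first, seq in conn.execute(sql):
        author = ET.SubElement(root, "author", sequence=seq)
        ET.SubElement(author, "LName").text = last
        ET.SubElement(author, "FName").text = first
    return root

root = query_to_xml(conn, "SELECT last, first, sequence FROM authors ORDER BY sequence")
print(ET.tostring(root, encoding="unicode"))
```

Once the results are XML, the same CSS or XSLT style sheets used for documents can present them.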
Developing XML Applications: The Politics of XML
• Evolution of XML
  • XML 1.0 Formalized as a W3C Recommendation, February 1998
  • Numerous Ancillary Specs Released & In Work: Namespaces, XSLT, XLink/XPointer, XML Signature
  • Numerous Early Implementors (Chemistry, Biology, Multimedia, Metadata)
• Prerequisites for Community Implementations
  • Identify Target(s) of Opportunity
  • Define Horizontal & Vertical Content Objects
  • Consensus Building & Community Buy-In
  • Test Implementations & Tool Building
Developing XML Applications: The Politics of XML (continued)
• Status of XML in the Legal Community
  • LegalXML Has Identified Targets and Begun the Process of Defining Content Objects & Building Consensus
  • Progress in Some Areas, e.g., Court Filing (See Also XML Court Interface)
  • Less Visible Progress in Other Workgroups, e.g., Reference, Public Law, Users
  • The Presence (& Vested Interests) of Extensive Non-XML Legal Automation Systems Already in Place Lessens Motivation
Developing XML Applications: The Politics of XML (continued)
• Status of XML in Publishing & Libraries
  • Extensive XML Work in Metadata, Which Unfortunately Has Led to Competing Standards
  • Many Publishers Have Been Using SGML for a Decade or More, but Only Internally
  • Perceived Tradeoff (Probably Overrated): Publicly Releasing Primary Sources in XML vs. Control of Product & Marketplace
  • Problems with Early SGML Web Experiments
  • No One Wants to Be First, but No One Wants to Be Last Either
Future Directions
• Continued Evolution of Standards and Tools
• Continued Development of Community Implementations in Selected Disciplines
• Increased Use of XML Behind the Scenes
  • Carryover from SGML Trends
  • Integration of XML with Databases
• XML Is Unlikely to Replace HTML or Other Document Formats, but Will Co-Exist with Them
• The Magnitude of XML's Role in Law Libraries Is Uncertain, but It Is Likely to Have at Least Some Role