XML for Libraries. Roy Tennant eScholarship California Digital Library escholarship.cdlib.org. Introduction. Goal: introduce you to XML, explain what it can do in general terms, and highlight particular uses Clarification: you will not learn enough to do it without further study.
Outline • Introduction to XML • Serving XML to the Web • Case Studies • Tips & Advice • Resources
Introduction to XML • Extensible Markup Language • A method of creating and using tags to identify the structure and contents of a document — not how it should be displayed • The tags used can be arbitrary or can come from a specification
What it Looks Like
<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>
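Because the tags describe structure and content rather than display, software can pull individual pieces out of the document by name. The sketch below is one way to do that with Python's standard `xml.etree.ElementTree` module; it uses the sample book with straight quotes and matching tags, and the function name `describe_book` is an illustrative assumption, not something from the slides.

```python
import xml.etree.ElementTree as ET

# The sample book from the slide, with straight quotes around the attribute
# value and matching open/close tags.
BOOK_XML = """<?xml version="1.0"?>
<book>
  <author>
    <lastname>Tennant</lastname>
    <firstname>Roy</firstname>
  </author>
  <title>The Great American Novel</title>
  <chapter number="1">
    <chaptitle>It Was Dark and Stormy</chaptitle>
    <p>“I’m scared,” I said.</p>
  </chapter>
</book>"""


def describe_book(xml_text):
    """Return the author, title, and chapter list found in a <book> document."""
    book = ET.fromstring(xml_text)
    first = book.findtext("author/firstname", default="")
    last = book.findtext("author/lastname", default="")
    chapters = [
        (chapter.get("number"), chapter.findtext("chaptitle", default=""))
        for chapter in book.findall("chapter")
    ]
    return {
        "author": (first + " " + last).strip(),
        "title": book.findtext("title", default=""),
        "chapters": chapters,
    }


if __name__ == "__main__":
    print(describe_book(BOOK_XML))
```

Nothing in the code knows how the book should look on screen; it only relies on the element names, which is the point of structural markup.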
Two Types of XML • Well-Formed • Valid
Well-Formed XML
• Follows general tagging rules:
  • All tags begin and end
    • But can be minimized if empty: <br/> instead of <br></br>
  • All tags are case sensitive
  • All tags must be properly nested:
    • <author> <firstname>Mark</firstname><lastname>Twain</lastname> </author>
  • All attribute values are quoted (with straight quotes):
    • <subject scheme="LCSH">Music</subject>
  • Usually begins with an XML declaration (<?xml version="1.0"?>) and may include a document type declaration
• Software can make sure a document follows these rules
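As a rough illustration of software checking these rules, the sketch below asks Python's standard `xml.etree.ElementTree` parser to read a string and reports whether it parsed. The helper name `is_well_formed` is an assumption made for this example; any conforming XML parser performs the same kind of check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        # Properly nested, every tag closed.
        "<author><firstname>Mark</firstname><lastname>Twain</lastname></author>",
        # Empty element in minimized form.
        "<br/>",
        # Improper nesting: <firstname> is closed after its parent.
        "<author><firstname>Mark</author></firstname>",
        # Case mismatch between the opening and closing tag.
        "<Title>Roughing It</title>",
        # Attribute value without quotes.
        "<subject scheme=LCSH>Music</subject>",
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

A well-formedness check says nothing about which tags are allowed; it only confirms that the general tagging rules were followed.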
Valid XML • Uses only specific tags and rules as codified by one of: • A document type definition (DTD) • A schema definition • Only the tags listed by the schema or DTD can be used • Software can take a DTD or schema and verify that a document adheres to the rules • Editing software can prevent an author from using anything except allowed tags
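To give a feel for what "only the listed tags can be used" means, here is a deliberately simplified stand-in for validation. It is not a DTD or schema validator (Python's standard library does not include one); the `ALLOWED_CHILDREN` table and the `find_violations` function are hypothetical, written only to mimic the kind of rule a DTD for the sample book might state.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a DTD: each element name maps to the
# set of child element names it is allowed to contain.
ALLOWED_CHILDREN = {
    "book": {"author", "title", "chapter"},
    "author": {"lastname", "firstname"},
    "lastname": set(),
    "firstname": set(),
    "title": set(),
    "chapter": {"chaptitle", "p"},
    "chaptitle": set(),
    "p": set(),
}


def find_violations(xml_text, rules=ALLOWED_CHILDREN):
    """Return a list of messages for elements that break the rules table."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    if root.tag not in rules:
        problems.append(f"unknown root element <{root.tag}>")
    for parent in root.iter():
        allowed = rules.get(parent.tag, set())
        for child in parent:
            if child.tag not in rules:
                problems.append(f"<{child.tag}> is not a known element")
            elif child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems


if __name__ == "__main__":
    print(find_violations("<book><title>T</title></book>"))
    print(find_violations("<book><subtitle>T</subtitle></book>"))
```

A real DTD or schema can also express order, repetition, required attributes, and data types, none of which this sketch attempts.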
Ways to Use XML • Behind the scenes as a standard and easily transformed format for information • As a transfer syntax, to exchange information in a machine-parseable form • As a method of delivery direct to the user (not recommended)
Why is XML Important? • It is a standard, easily extensible way to encode loosely-structured as well as highly-structured information • Due to its easy parseability, software can transform it in countless ways, thereby allowing: • Easy migration paths • Alternative displays • On-the-fly response to user needs
XML vs. Databases (a simplistic formula)
• If your information is…
  • Tightly structured
  • Fixed field length
  • Massive numbers of individual items
• You need a database
• If your information is…
  • Loosely structured
  • Variable field length
  • Massive record size
• You need XML
Serving XML to the Web • Directly in native form • Transformed to static HTML • Transformed to HTML dynamically
Transforming XML: XSLT • Extensible Stylesheet Language Transformations (XSLT) • A markup language and programming syntax for processing XML • Is most often used to: • Transform XML to HTML for delivery to standard web clients • Transform XML from one set of XML tags to another • Transform XML into another syntax/system
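The first use, turning XML into HTML, can be pictured with a small sketch. Note the substitution: this is plain Python with the standard `xml.etree.ElementTree` module standing in for an XSLT processor, so it shows the idea of a transformation rather than XSLT syntax. The function name `book_to_html`, the sample data, and the choice of HTML tags are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Illustrative input using the tag names from the sample book.
SAMPLE = (
    "<book>"
    "<author><lastname>Tennant</lastname><firstname>Roy</firstname></author>"
    "<title>The Great American Novel</title>"
    '<chapter number="1">'
    "<chaptitle>It Was Dark and Stormy</chaptitle>"
    "<p>I was scared.</p>"
    "</chapter>"
    "</book>"
)


def book_to_html(xml_text):
    """Map each structural element of a <book> to a display-oriented HTML tag."""
    book = ET.fromstring(xml_text)
    title = book.findtext("title", default="")
    first = book.findtext("author/firstname", default="")
    last = book.findtext("author/lastname", default="")
    lines = [
        "<html><body>",
        "<h1>" + escape(title) + "</h1>",
        '<p class="author">' + escape((first + " " + last).strip()) + "</p>",
    ]
    for chapter in book.findall("chapter"):
        heading = (
            "Chapter " + chapter.get("number", "") + ": "
            + chapter.findtext("chaptitle", default="")
        )
        lines.append("<h2>" + escape(heading) + "</h2>")
        for para in chapter.findall("p"):
            lines.append("<p>" + escape(para.text or "") + "</p>")
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(book_to_html(SAMPLE))
```

An XSLT stylesheet expresses the same mapping declaratively, as templates that match element names, which is why the source XML never has to change when the display does.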
Required Components for Serving XML to the Web • An XML-encoded “document” • An XSLT stylesheet to… • …transform it to HTML or XHTML: • Static • Dynamic • A CSS stylesheet (optional)
XML Web Publishing Software • Required to: • Apply dynamic transformations to XML content • Render HTML dynamically for standard web browsers • A couple of examples, both free: • Cocoon: http://xml.apache.org/cocoon/ • AxKit: http://axkit.org/
Case Study: Publishing Books @ the California Digital Library • Goals: • To create highly usable online versions of books • To create versions that will migrate easily as technology changes • To create an infrastructure that will support dynamic presentations of the same content
Case Study: Publishing Books @ the California Digital Library • Strategy: • Mark up the texts in XML • Serve them dynamically using XML web publishing software (currently Cocoon) • Create different displays for different purposes, and a mechanism for allowing the user to select their preferred view • Find and apply an XML-aware search engine • Create a method by which users can create their own Adobe Acrobat versions
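The "different displays, user-selected view" idea can be sketched as a lookup from a view name to a rendering function applied to the same XML source. This is a toy illustration in plain Python, not how Cocoon is configured; the view names `toc` and `text`, the `VIEWS` table, and the `render` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative two-chapter book using the tag names from the sample.
SAMPLE = (
    "<book>"
    '<chapter number="1"><chaptitle>One</chaptitle><p>First.</p></chapter>'
    '<chapter number="2"><chaptitle>Two</chaptitle><p>Second.</p></chapter>'
    "</book>"
)


def toc_view(book):
    """Table-of-contents display: chapter titles only."""
    return [c.findtext("chaptitle", default="") for c in book.findall("chapter")]


def text_view(book):
    """Reading display: the paragraph text of every chapter, in order."""
    return [p.text or "" for p in book.iter("p")]


# Hypothetical registry of displays the user can choose from.
VIEWS = {"toc": toc_view, "text": text_view}


def render(xml_text, view="text"):
    """Render the same XML source with the requested view (default: text)."""
    book = ET.fromstring(xml_text)
    renderer = VIEWS.get(view, text_view)  # fall back for unknown view names
    return renderer(book)


if __name__ == "__main__":
    print(render(SAMPLE, "toc"))
    print(render(SAMPLE, "text"))
```

Adding a display means adding one function to the table; the marked-up texts themselves stay untouched, which supports the migration goal stated in the case study.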
[Diagram: a web server (Tomcat running Cocoon, or mod_perl running AxKit) receives the request "I want this XML doc…", applies an XSLT stylesheet to the XML doc (information, transformation), and returns a dynamically generated XHTML document with no display markup, which is styled by an HTML stylesheet in CSS (presentation).]
Where are the words found in these books?
Tips and Advice • Begin transitioning to XML now: XHTML and CSS for web files, XML for static documents with long-term worth • Do not rely on browser support of XML • DTDs? We don’t need no stinkin’ DTDs! • Get on the XML4Lib discussion list: http://sunsite.berkeley.edu/XML4Lib/ • Buy my book!
Resources • Web sites • Electronic discussions • Books • Magazines and journals • Individuals