CEAL Preconference Workshop: XML. Wooseob Jeong Assistant Professor School of Information Studies University of Wisconsin – Milwaukee March 2, 2004 San Diego, CA Sponsored by School of Information Studies, University of Wisconsin – Milwaukee University of California – San Diego Library.
Why XML? • Simply because it’s already everywhere. • MS Office • XHTML - WYSIWYG • PDF • RDF - Dublin Core, RSS • MARC in XML • E-books
What is XML? • Extensible Markup Language • XML is a concept, not an application. • Meta Language • Linguistics for individual languages • XHTML is an application of XML. • Brief history of XML • SGML – HTML • Not enough … why?
Learning XML • No technical experience needed. • No HTML experience is needed either. • HTML vs. XHTML (different families) • Again, XML is a concept. • A good starting point for XML • http://www.infomotions.com/musings/getting-started/
XML is simple but very strict. • You can define your own markup set as you like, with minimal requirements. • Every start tag must have a matching end tag. • Tags must nest in a strict hierarchy, without overlapping. • However, once you establish the set, you have to follow it. It’s the law! No exceptions. Otherwise, your document won’t be displayed at all. • “Well-formedness” – the minimum requirement • DTD (Document Type Definition)
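The pairing and hierarchy rules above can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample menu strings are illustrative assumptions, not part of the workshop files.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents for illustration.
good = "<menu><food><name>Bibimbap</name></food></menu>"
unpaired = "<menu><food><name>Bibimbap</food></menu>"            # <name> is never closed
overlapping = "<menu><food><name>Bibimbap</food></name></menu>"  # tags cross instead of nesting

for label, sample in [("good", good), ("unpaired", unpaired), ("overlapping", overlapping)]:
    print(label, is_well_formed(sample))
```

A parser rejects the second and third samples outright, which is why a browser shows an error instead of the document.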
Philosophy of XML • Separation of presentation information from content. No decorative information is allowed in the content. • Presentation should be rendered by methods outside the document, currently either CSS or XSLT. • CSS has been used in HTML as well as in XML. • Ex) http://www.uwm.edu/~dhedberg/MENU.xml • XSLT is more powerful. • Ex) http://web.utk.edu/~rgilmou1/xml4lita/ • More Examples
Markup information • Presentational Markup: Describes Appearance <blockquote> 1234 N. Oakland Ave. Milwaukee, WI 53201 </blockquote> • Semantic Markup: Indicates Meaning <address> <street>1234 N. Oakland Ave.</street> <city>Milwaukee</city> <state>WI</state> <zip>53201</zip> </address>
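One practical payoff of semantic markup is that a program can pick out each piece of the address by name. A short sketch, assuming Python's standard `xml.etree.ElementTree` module and the address from the slide above (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# The semantic-markup address from the slide, laid out one element per line.
address_xml = """<address>
  <street>1234 N. Oakland Ave.</street>
  <city>Milwaukee</city>
  <state>WI</state>
  <zip>53201</zip>
</address>"""

address = ET.fromstring(address_xml)

# Each child element names its own meaning, so a lookup table falls out directly.
fields = {child.tag: child.text for child in address}

print(fields["city"])
print(fields["zip"])
```

The `<blockquote>` version carries the same characters, but nothing in it says which part is the city or the zip code.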
Your First XML Document • Using NotePad, please follow the instructions at • http://supervoca.com/xml/first.htm • The result should look like • http://supervoca.com/xml/first.xml
Restaurant Menu Exercise • Well-formedness • CSS (Cascading Style Sheets) • Simple but not flexible • XSLT (Extensible Stylesheet Language Transformations) • An XSLT stylesheet is itself an XML document. • Complex but really powerful • Online Exercises
Menu CSS Exercise • Use NotePad and type it yourself, please! • Watch out for the “Save As” option. • Modify “menu.xml” with your favorite foods, adding CSS info. • Modify “menu.css” with your preferences. • Comprehensive CSS reference • http://www.w3schools.com/css/default.asp
Menu XSLT Exercise • Modify “menu2.xml” by adding XSLT info. • Modify “menu2.xsl” with your preferences. • It is like a limited programming language. • Selective displays with the same data. • Examples • You may use HTML tags freely, but every attribute value must be quoted. • Watch out for typos!
Unicode in XML • Unicode is the default character set in XML. • What’s Unicode? • http://unicode.org/ • Why is it so important? • Where is ASCII? • Multilingual vs. Multiscript • WordPad or MS Word should be used for Unicode documents. • Save as “Unicode Text”
“united.xml” <?xml version="1.0"?> <?xml-stylesheet type="text/css" href="united.css"?> <united> <English>Eradicate extreme poverty and hunger</English> <Chinese>消灭极端贫穷和饥饿</Chinese> <French>Réduire l'extrême pauvreté et la faim</French> <Russian>Ликвидация крайней нищеты и голода</Russian> </united>
“united.css” English {display: block; color: red} French {display: block; color: blue} Chinese {display: block; color: green} Russian {display: block; color: purple}
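To see why ASCII is not enough for “united.xml” and why the file must be saved in a Unicode encoding, the sketch below encodes the same document both as UTF-8 and as UTF-16 (roughly what the “Unicode Text” save option produces) and parses each. It assumes Python's standard `xml.etree.ElementTree` module; the helper name `fits_in_ascii` is illustrative.

```python
import xml.etree.ElementTree as ET

united_xml = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="united.css"?>
<united>
  <English>Eradicate extreme poverty and hunger</English>
  <Chinese>消灭极端贫穷和饥饿</Chinese>
  <French>Réduire l'extrême pauvreté et la faim</French>
  <Russian>Ликвидация крайней нищеты и голода</Russian>
</united>"""


def fits_in_ascii(text: str) -> bool:
    """Return True only if every character of the text exists in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# The same characters stored in two different Unicode encodings.
utf8_bytes = united_xml.encode("utf-8")
utf16_bytes = united_xml.encode("utf-16")  # starts with a byte order mark

root8 = ET.fromstring(utf8_bytes)
root16 = ET.fromstring(utf16_bytes)

print(fits_in_ascii(united_xml))
print(root8.find("Chinese").text == root16.find("Chinese").text)
print(len(utf8_bytes), len(utf16_bytes))
```

The byte counts differ, yet the parser recovers identical text from both files: the encoding is a storage detail, while the characters are the same Unicode characters.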
More Unicode Exercises • Multilingual/multiscript sources • United Nations • International Bible Society • Since an XSLT file is an XML document, you can use any language or script in your XSLT. • Only Windows 2000 or XP supports Unicode fully. • CSS –“bible.xml” • XSLT –“biblecjk.xml”
SMIL Exercise (1) • Synchronized Multimedia Integration Language • Still an XML application! • Multiple media are played together. • Example: Closed Captioning. • RealText Exercise • Based on Real Player setting
SMIL Exercise (2) • Locate an audio source. • Ex) Voice of America at http://voanews.com • Locate its transcript. • Modify “example.smil” file according to your information. • Modify “example.rt” file with your transcript. • It can be a “Karaoke” application.
SMIL Exercise (3) • Online Exercise • http://supervocab.com/xml • Locate any CJK RealAudio file on the web, and copy the URL to the form. • Ex) http://homepage.third-wave.com/didreat/kor/real.htm • Type the script in CJK. • Choose a character set and a font. • SMIL in RealAudio still supports only local character sets.
Document Type Definition • What is a DTD? • The master plan that dictates all the rules for elements, attributes, and entities. • You may make your own DTD, but once you make it, you must follow its rules. No exceptions! • Why is a DTD important? • Data Exchange
Elements, Attributes, and Entities • Elements • Building blocks of markup (tags) • Attributes • Qualifying Elements (properties) • Entities • Referencing External Content and Saving Typing • Ex) special characters
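The three building blocks can be seen side by side in one small document. This is a sketch with made-up content (the `menu` document, the `spicy` attribute, and the `chef` entity are all hypothetical), parsed with Python's standard `xml.etree.ElementTree` module, which expands entities declared in the document's internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one custom entity, one attribute, and two
# built-in ways of writing special characters.
doc = """<?xml version="1.0"?>
<!DOCTYPE menu [
  <!ENTITY chef "Chef Kim">
]>
<menu>
  <food spicy="yes">
    <name>Kimchi &amp; Rice</name>
    <cook>&chef;</cook>
    <price>&#8361;8000</price>
  </food>
</menu>"""

root = ET.fromstring(doc)
food = root.find("food")

print(food.tag)                 # element: the building block
print(food.get("spicy"))        # attribute: a property qualifying the element
print(food.find("name").text)   # &amp; is a predefined entity for "&"
print(food.find("cook").text)   # &chef; is replaced by its declared text
print(food.find("price").text)  # &#8361; is a character reference (won sign)
```

Declaring `chef` once and writing `&chef;` wherever it is needed is the "saving typing" use of entities; `&amp;` and `&#8361;` are the "special characters" use.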
DTDs • XHTML • TEI (Text Encoding Initiative) • EAD (Encoded Archival Description) • RDF (Resource Description Framework) • Dublin Core; RSS
Validation • To count as a document of a given type, a document must be valid against that type’s DTD. <!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1.1//EN" "http://www.tei-c.org/Lite/DTD/teixlite.dtd"> • Online validation tool • Well-formedness vs. Validation
TEI Letter Transcript Exercise • The purpose of this exercise is to make a valid TEI document transcribing a letter. • Use a remote TEI DTD • TEIXLite DTD • <!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1.1//EN" "http://www.tei-c.org/Lite/DTD/teixlite.dtd"> • Modify “letter.xml” and “letter.css” with your text and preference. • Do a validation test, please.
Dublin Core • Most Frequent Example in RDF • http://dublincore.org/ <?xml version="1.0"?> <!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN" "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/"> <dc:title>Dave Beckett's Home Page</dc:title> <dc:creator>Dave Beckett</dc:creator> <dc:publisher>ILRT, University of Bristol</dc:publisher> <dc:date>2002-07-31</dc:date> </rdf:Description> </rdf:RDF>
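Because the Dublin Core record is ordinary XML, its fields can be read with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` module, with the record from the slide above; the parser does not fetch or apply the external DTD, so this checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

record = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
  "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2002-07-31</dc:date>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI map used for lookups; the URIs come from the record itself.
ns = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(record)
description = root.find("rdf:Description", ns)

# The described resource is named by the namespaced rdf:about attribute.
about = description.get("{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about")
title = description.find("dc:title", ns).text
creator = description.find("dc:creator", ns).text
date = description.find("dc:date", ns).text

print(about)
print(title, "|", creator, "|", date)
```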
RSS (RDF Site Summary) • “Rich Site Summary” • A more practical and active example of RDF • http://supervocab.com/rss • There are many RSS feeds on the web. • http://mtgear.net/index.rdf • http://homepage.mac.com/cyberdog_to_go/iblog/B1549800066/rss.xml • http://blog.isism.net/b2rdf.php
Other important parts in XML (1) • XSL • XSL Transformations • XSL Formatting Objects • Ex) PDF • URI (Uniform Resource Identifiers) • URL (Uniform Resource Locator) • ISBN/ISSN
Other important parts in XML (2) • XLINK • More than what HTML links do • Ex) inbound link information, behavior of links (when, how to activate) • XPointer • More than what HTML anchors do • XPointers refer to particular parts of or locations in XML documents. • Ex) linking to the third sentence of the seventeenth paragraph in a document
Other important parts in XML (3) • Namespace • An XML namespace is a collection of names, identified by a URI reference • Problem: same element names • Ex) title in HTML and title of a book • Schema • Alternative to DTD • Data type
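The "same element names" problem and its namespace fix can be shown in a few lines. In this sketch the catalog document and the book namespace URI `http://example.org/book` are hypothetical; the XHTML namespace URI is the real one. It uses Python's standard `xml.etree.ElementTree` module, which reports each element under its full namespace-qualified name.

```python
import xml.etree.ElementTree as ET

# Two different "title" elements in one document, told apart by namespace.
doc = """<catalog xmlns:h="http://www.w3.org/1999/xhtml"
         xmlns:b="http://example.org/book">
  <h:title>Library Catalog</h:title>
  <b:book>
    <b:title>Learning XML</b:title>
  </b:book>
</catalog>"""

ns = {
    "h": "http://www.w3.org/1999/xhtml",
    "b": "http://example.org/book",  # hypothetical URI for illustration
}

root = ET.fromstring(doc)
page_title = root.find("h:title", ns)
book_title = root.find("b:book/b:title", ns)

# The parser expands each prefix to its URI, so the two names never collide.
print(page_title.tag, "->", page_title.text)
print(book_title.tag, "->", book_title.text)
```

The prefixes `h` and `b` are only shorthand; what identifies each name is the URI it is bound to.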
Popular E-book Formats • Adobe: basically PDF • Microsoft • Palm • Free E-book Projects • http://etext.lib.virginia.edu/ebooks/ebooklist.html • http://www.sois.uwm.edu/xml/ • Authoring tools • No format offers universal CJK support yet.
Conclusion • XML is a concept. • There are many XML applications. • XML should separate presentation information from content. • XML’s default character set is Unicode. • An XML document must at least be “well-formed”. • DTD/Schema is very important for data/information interchange.