Construction and Pedagogical Use of Digital Archives
One: Introduction to XML
<?xml version="1.0"?>
<?xml-stylesheet href="jonson.xsl" type="text/xsl"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "http://www.tei-c.org/Guidelines/DTD/tei2.dtd" [
  <!ENTITY % TEI.XML 'INCLUDE'>
  <!ENTITY % TEI.mixed 'INCLUDE'>
  <!ENTITY % TEI.drama 'INCLUDE'>
  <!ENTITY % TEI.verse 'INCLUDE'>
  <!ENTITY % TEI.prose 'INCLUDE'>
  <!ENTITY % TEI.figures 'INCLUDE'>
  <!ENTITY % TEI.linking 'INCLUDE'>
  <!ENTITY % TEI.transcr 'INCLUDE'>
  <!ENTITY % TEI.analysis 'INCLUDE'>
  <!ENTITY % TEI.textcrit 'INCLUDE'>
  <!ENTITY % ISOlat1 SYSTEM 'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat1.ent'> %ISOlat1;
  <!ENTITY % ISOlat2 SYSTEM 'http://www.tei-c.org/Entity_Sets/Unicode/iso-lat2.ent'> %ISOlat2;
  <!ENTITY % ISOnum SYSTEM 'http://www.tei-c.org/Entity_Sets/Unicode/iso-num.ent'> %ISOnum;
  <!ENTITY % ISOpub SYSTEM 'http://www.tei-c.org/Entity_Sets/Unicode/iso-pub.ent'> %ISOpub;
  <!NOTATION jpg SYSTEM "IMAGES/JPEG">
  <!ENTITY fig130-001-1 SYSTEM "fig130-001-1.jpg" NDATA jpg>
]>
Washington University
25 May 2006
What is XML?
• A meta-language, i.e. a language for creating languages
• A platform- and application-independent protocol for marking up structured documents
• A descriptive mark-up system, i.e. one that describes and categorizes parts of a document
• An extensible strategy that allows designers to scale the scope of mark-up to match the needs of a project
Basic XML Components
• Document, or “instance”
• DTD, defining the grammar and syntax of the tagging scheme
• XSLT files for transforming documents
• Associated external resources such as entities or CSS files
• Associated tools such as an XML parser
• Additional protocols such as XSL-FO, XLinks, XPointers, and Schemas
Basic Rules of XML
• The XML declaration, when present, must be the very first thing in the document
• Every opening <tag> must have an accompanying closing </tag>
• All elements must be nested hierarchically, with no overlapping
• Empty tags must end with />, for example, <tag/>
• The document must contain exactly one root element that completely contains all other elements
• All attribute values must be within quotes
• The characters "<" and "&" are reserved and must be used only to begin tags and entity references respectively
• The only native XML entity references are &amp;, &lt;, &gt;, &apos;, and &quot;
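The rules above can be tried out against a real parser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree as a non-validating parser; the helper name is_well_formed and the snippets are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the standard-library parser accepts `text` as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each snippet exercises one rule from the list (illustrative examples only).
examples = {
    "balanced tags": "<a><b>text</b></a>",
    "missing closing tag": "<a><b>text</a>",
    "overlapping elements": "<a><b>text</a></b>",
    "empty element": "<a><b/></a>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a n=1/>",
    "quoted attribute": '<a n="1"/>',
    "bare ampersand": "<a>fish & chips</a>",
    "escaped ampersand": "<a>fish &amp; chips</a>",
}

results = {label: is_well_formed(snippet) for label, snippet in examples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

A check like this only tests well-formedness. It does not validate a document against a DTD, which would need a validating parser.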
XML Good Practice
• Do not include whitespace in tag names
• Do not include in tag names reserved XML characters or characters that have special meaning in processing languages like Perl
• Do not start a tag name with a number or a punctuation character
• Tag names are case-sensitive: <tag> is different from <TAG>
• Take care when using characters beyond the core 7-bit ASCII set, for example ë, ß, or £
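Several of these points show up as parser errors. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name parses and the element names are hypothetical. The last two checks illustrate the caution about non-ASCII characters: the same Latin-1 bytes are rejected when no encoding is declared (the parser then assumes UTF-8) and accepted once the encoding is declared.

```python
import xml.etree.ElementTree as ET


def parses(doc) -> bool:
    """Return True if `doc` (str or bytes) is accepted by the parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Tag-name pitfalls (hypothetical element names).
name_checks = {
    "plain name": parses("<snack/>"),
    "space in name": parses("<my snack></my snack>"),
    "leading digit": parses("<1snack/>"),
    "leading punctuation": parses("<-snack/>"),
    "case mismatch": parses("<tag></TAG>"),
}

# The character ë stored as a single Latin-1 byte rather than as UTF-8.
latin1_body = "<p>ë</p>".encode("iso-8859-1")
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

undeclared_ok = parses(latin1_body)               # no encoding declared
declared_ok = parses(declaration + latin1_body)   # encoding declared
decoded_text = ET.fromstring(declaration + latin1_body).text

for label, ok in name_checks.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
print("Latin-1 bytes, no declaration:", undeclared_ok)
print("Latin-1 bytes, declared:", declared_ok)
```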
Sample XML Document
<?xml version="1.0"?>
<?xml-stylesheet href="potato.xsl" type="text/xsl"?>
<!DOCTYPE russett SYSTEM "Idaho.dtd" [
  <!ENTITY % spud.xml 'INCLUDE'>
  <!ENTITY chips "Ruffles">
  <!NOTATION jpg SYSTEM "images/spuds">
  <!ENTITY crispy SYSTEM "smashed.jpg" NDATA jpg>
]>
<russett>
  <title>My Favorite Tuber Kings</title>
  <snack>
    <basic n="1" type="breakfast">Hash Browns</basic>
    <basic n="2" type="lunch">&chips;</basic>
    <basic n="3" type="dinner">Mashed (see <pic ref="smashed.jpg"/>)
      <!-- Oops, I just started the Atkins Diet...forget all this -->
    </basic>
  </snack>
</russett>
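To see how a parser reads a document like this one, here is a rough sketch using Python's standard-library xml.etree.ElementTree. It works on a trimmed copy of the sample, with these assumptions: the external DTD reference (Idaho.dtd), the stylesheet instruction, the notation, and the unparsed entity are dropped because those files are not available here, quotation marks are plain ASCII, and pic is written as an empty element so the document is well-formed. Only the internal entity chips is kept, so its expansion can be observed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; see the assumptions stated above.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE russett [\n'
    '  <!ENTITY chips "Ruffles">\n'
    ']>\n'
    '<russett>\n'
    '  <title>My Favorite Tuber Kings</title>\n'
    '  <snack>\n'
    '    <basic n="1" type="breakfast">Hash Browns</basic>\n'
    '    <basic n="2" type="lunch">&chips;</basic>\n'
    '    <basic n="3" type="dinner">Mashed (see <pic ref="smashed.jpg"/>)</basic>\n'
    '  </snack>\n'
    '</russett>\n'
)

root = ET.fromstring(SAMPLE)

# Map each meal type to the text that precedes any child element.
meals = {
    basic.get("type"): (basic.text or "").strip()
    for basic in root.iter("basic")
}

pic = root.find(".//pic")

print("root element:", root.tag)
print("title:", root.findtext("title"))
for meal_type, text in meals.items():
    print(f"{meal_type}: {text}")
print("picture reference:", pic.get("ref"))
```

The entity reference &chips; is replaced by its declared value before the application sees the text, so the lunch entry is read as Ruffles.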