XML technologies for text encoding Tamás Váradi varadi@nytud.hu
Introduction • Processing XML files • CSS – getting the picture right • XPath – finding our way around • XSLT – extracting the right info • Encoding content the right way • Text Encoding Initiative • TEI Lite • Tools
Benefits of XML • makes structure and content clear • encoding is independent of display and device • portable, platform-independent • ideal for data exchange • with a DTD, a document is easy to validate
Limitations of XML • Verbose annotation increases file size (sometimes hugely) • Not a very efficient format for fast access and retrieval
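To give a rough sense of the size overhead, the sketch below serializes the same three hypothetical word/part-of-speech records once as XML and once as tab-separated text, using Python's standard xml.etree.ElementTree. The element names (lexicon, entry, word, pos) are made up for illustration; the actual ratio depends entirely on the data and the tag names chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three (word, part-of-speech) pairs.
records = [("house", "noun"), ("run", "verb"), ("green", "adjective")]

def as_xml(rows):
    """Wrap every field in its own start and end tag."""
    root = ET.Element("lexicon")
    for word, pos in rows:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "word").text = word
        ET.SubElement(entry, "pos").text = pos
    return ET.tostring(root, encoding="unicode")

def as_tsv(rows):
    """Same data, one record per line, fields separated by a tab."""
    return "\n".join(f"{word}\t{pos}" for word, pos in rows)

xml_text = as_xml(records)
tsv_text = as_tsv(records)
print(len(xml_text), len(tsv_text), round(len(xml_text) / len(tsv_text), 1))
```

Most of the extra characters are the repeated tag names, which is exactly what makes the structure explicit.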
Displaying XML files? • Stylesheets • consistent design • easy to change • one stylesheet can serve many XML documents • one document can use different stylesheets
Cascading Stylesheets • Elements are associated with display styles • h1 { font-size: 3em; } (selector: h1, property: font-size, value: 3em) • A stylesheet is a collection of style rules
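The three parts of a rule (selector, property, value) can be made concrete with a toy parser. This is a minimal Python sketch that only handles flat rules of the form selector { property: value; ... }; it ignores comments, @-rules and quoted strings, so it is not a real CSS parser. The selector para is a hypothetical XML element name.

```python
import re

# Toy pattern: a selector (no braces) followed by a {...} block of declarations.
# Not a real CSS parser: comments, @-rules and strings containing ';' are not handled.
RULE = re.compile(r"\s*([^{}]+?)\s*\{([^{}]*)\}")

def parse_stylesheet(css):
    """Return a list of (selector, {property: value}) pairs."""
    rules = []
    for selector, body in RULE.findall(css):
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:  # skip the empty piece after the last ';'
                prop, value = decl.split(":", 1)
                declarations[prop.strip()] = value.strip()
        rules.append((selector, declarations))
    return rules

sheet = "h1 { font-size: 3em; } para { color: navy; font-style: italic; }"
rules = parse_stylesheet(sheet)
for selector, decls in rules:
    print(selector, decls)
```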
Declaring the stylesheet • <?xml-stylesheet type="text/css" href="url-of-stylesheet"?> • Example (the XML declaration must come first): <?xml version="1.0"?> <?xml-stylesheet type="text/css" href="cards.css"?>
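Because the stylesheet declaration is a processing instruction in the document prolog, a program can read it back. A minimal sketch with Python's xml.dom.minidom, which keeps processing instructions as nodes (ElementTree's default parser discards them). Apart from the two declarations taken from the slide, the cards document content is hypothetical.

```python
from xml.dom import minidom

# Hypothetical document; only the prolog comes from the slide.
doc_text = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cards.css"?>
<cards><card>Ace</card></cards>"""

def stylesheet_declarations(xml_text):
    """Return the data of every xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]

found = stylesheet_declarations(doc_text)
print(found)
```

The XML declaration itself is not returned: it is not a processing instruction, even though it looks like one.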
An example • Load the file letter.xml into Internet Explorer • Now load the file letter2.xml • View the source • Open the file letter.css in Notepad • Check that what you see corresponds to what is in the CSS file
Cascading stylesheets • Style properties are inherited down the XML tree • Styles can be applied at three levels: • External stylesheets • Internal style definitions • Inline style settings
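Inheritance can be pictured with a toy model: each element starts from its parent's computed style, and its own rule overrides matching properties. This Python sketch uses hypothetical rules for letter, date and signature elements (not the actual contents of letter.css), and it simplifies real CSS, where only some properties (such as color and font-size) are inherited and others (such as margin or border) are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical style rules, keyed by element name.
RULES = {
    "letter": {"font-family": "serif", "color": "black"},
    "date": {"color": "gray"},
    "signature": {"font-style": "italic"},
}

def computed_styles(element, inherited=None, out=None):
    """Walk the tree in document order; a child's style is its parent's
    computed style, overridden by the child's own rule (toy model: every
    property is treated as inherited)."""
    inherited = inherited or {}
    out = [] if out is None else out
    style = {**inherited, **RULES.get(element.tag, {})}
    out.append((element.tag, style))
    for child in element:
        computed_styles(child, style, out)
    return out

letter = ET.fromstring(
    "<letter><date>1 May</date><body>Hi</body><signature>T.</signature></letter>"
)
styles = dict(computed_styles(letter))
for tag, style in styles.items():
    print(tag, style)
```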
Limitations of CSS • Elements are formatted in their original sequence • No means to reorder elements • No means to select a set of elements
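As an illustration of what CSS cannot do, the sketch below selects only the title elements of a hypothetical catalogue and emits them sorted by year, that is, in a different order from the source. It uses Python's ElementTree as a stand-in and is not XSLT; in XSLT the same job would typically be done with xsl:for-each and xsl:sort.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: books appear in an arbitrary source order.
SOURCE = """<catalogue>
  <book year="1999"><title>Zeta</title><author>Kiss</author></book>
  <book year="1987"><title>Alpha</title><author>Nagy</author></book>
  <book year="2005"><title>Mu</title><author>Szabo</author></book>
</catalogue>"""

def titles_by_year(xml_text):
    """Keep only the titles (selection) and sort them by year (reordering)."""
    root = ET.fromstring(xml_text)
    books = sorted(root.findall("book"), key=lambda b: int(b.get("year")))
    out = ET.Element("titles")
    for book in books:
        ET.SubElement(out, "title", year=book.get("year")).text = book.findtext("title")
    return ET.tostring(out, encoding="unicode")

result = titles_by_year(SOURCE)
print(result)
```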
More advanced techniques • XSL – Extensible Stylesheet Language • XSLT – XSL Transformations • XPath – a standard way to find elements in the XML hierarchy
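A few XPath-style queries, sketched with the limited XPath subset that Python's ElementTree supports through find and findall. The dictionary fragment is hypothetical. ElementTree has no count() function, so the last query is completed in Python; in full XPath 1.0 it could be written as entry[count(sense) > 1]/form.

```python
import xml.etree.ElementTree as ET

# Hypothetical bilingual dictionary fragment.
DICT = """<dictionary>
  <entry lang="hu"><form>ház</form><sense>house</sense></entry>
  <entry lang="en"><form>house</form><sense>ház</sense></entry>
  <entry lang="hu"><form>kert</form><sense>garden</sense><sense>yard</sense></entry>
</dictionary>"""

root = ET.fromstring(DICT)

# Child steps with an attribute predicate: forms of Hungarian entries.
hu_forms = [f.text for f in root.findall("./entry[@lang='hu']/form")]
# Descendant axis (//): every sense anywhere below the root.
all_senses = [s.text for s in root.findall(".//sense")]
# Positional predicate (1-based, as in XPath).
first_entry_form = root.find("./entry[1]/form").text
# No count() in ElementTree, so filter the entries in Python.
multi_sense = [e.findtext("form") for e in root.findall("./entry")
               if len(e.findall("sense")) > 1]

print(hu_forms, all_senses, first_entry_form, multi_sense)
```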
XSLT • See the excellent introduction to XSLT by Sebastian Rahtz available here
Standard annotation of content • XML is an annotation standard • it is not designed for any particular domain • Need for a standard way of encoding typical text genres like books, dictionaries, letters, radio news, etc. • => TEXT ENCODING INITIATIVE (TEI)
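For a sense of what a standard encoding looks like, this sketch builds a minimal TEI document with Python's ElementTree: a teiHeader whose fileDesc contains titleStmt, publicationStmt and sourceDesc, followed by text/body. The title and paragraph contents are hypothetical placeholders; real TEI Lite documents usually carry much richer headers.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

def minimal_tei(title, paragraphs):
    """Build a bare-bones TEI document and return it as a string."""
    def el(parent, tag, text=None):
        # Every element is created in the TEI namespace.
        node = ET.SubElement(parent, f"{{{TEI_NS}}}{tag}")
        if text is not None:
            node.text = text
        return node

    tei = ET.Element(f"{{{TEI_NS}}}TEI")
    file_desc = el(el(tei, "teiHeader"), "fileDesc")
    el(el(file_desc, "titleStmt"), "title", title)
    el(el(file_desc, "publicationStmt"), "p", "Unpublished sample.")  # placeholder
    el(el(file_desc, "sourceDesc"), "p", "Born-digital example.")     # placeholder
    body = el(el(tei, "text"), "body")
    for para in paragraphs:
        el(body, "p", para)
    # Serialize with TEI as the default namespace (no prefixes).
    return ET.tostring(tei, encoding="unicode", default_namespace=TEI_NS)

doc = minimal_tei("Sample letter", ["Dear friend,", "Best wishes."])
print(doc)
```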