
Anyone for pizza?


randyh

Presentation Transcript


  1. Anyone for pizza? Designing a TEI document type declaration

  2. The T E what? • Originally, a research project within the humanities • Sponsored by ALLC, ACH, ACL • Funded 1990-1994 by US NEH, EU LE Programme et al • Major influences • digital libraries and text collections • language corpora • scholarly datasets • Now an international membership consortium incorporated Jan 2001 http://www.tei-c.org

  3. Current TEI activity • Preliminary XML version of DTD now available • Text update in progress • Workgroups under consideration • character set issues • manuscript description • modelling • lexica and termbanks • physical description • resource discovery • Membership will vote by end 2001

  4. Who uses TEI? • digital librarians and archivists • HTI, UVA, CETH, OTA... • Language Engineering projects • EAGLES, BNC, MULTEXT, ECI, Silfide • academic researchers • Women Writers Project, CURIA Project, VWWP, Orlando, Model Editions Partnership, Canterbury Tales Project, Bodleian Library... • http://www.hcu.ox.ac.uk/TEI/Applications/

  5. Goals of the TEI • better interchange and integration of scholarly data • support for all texts, in all languages, from all periods • guidance for the perplexed: what to encode • assistance for the specialist: how to encode any information of interest • Hence a loose framework into which unpredictable extensions can be fitted

  6. Legacy of the TEI • a way of looking at what “text” really is • a codification of current scholarly practice • (crucially) a set of shared assumptions and priorities about the digital agenda: • focus on content and function (rather than presentation) • generic solutions (rather than application-specific ones)

  7. TEI Deliverables • A set of recommendations for text encoding, • covering both generic text structures and some highly specific areas based on (but not limited by) existing practice • A very large collection of element definitions • combined into a very loose document type declaration • A mechanism for creating multiple views (DTDs) of the foregoing • One such view and associated tutorial: TEI Lite; others exist (e.g. CES, BNC)

  8. The TEI modus operandi... • identify significant particularities • independent of notation or realization • avoid controversy, over-delicacy, inadequacy • seek generalizable solutions, acceptable to a consensus

  9. ... and some consequences • focus on content, not presentation • descriptive, not prescriptive • Occam's razor • modular, extensible dtd

  10. Designing a dtd for the TEI • How can a single markup scheme handle a large variety of requirements? • all texts are alike • every text is different • Learn from the database designers • one construct, many views • each view a selection from the whole

  11. How Many dtds? • How many dtds might the TEI require? • one (the Corporate or WKWBFY approach) • none (the Anarchic or NWEUMP approach) • as many as it takes (the Mixed Economy or XML approach) • Or a single main dtd with many faces (the British approach)

  12. The TEI solution: modularization • a (very) large number of element and attribute definitions • organized as tagsets (core, base, additional, or auxiliary) • grouped into classes

  13. How to combine Tag Sets… • all tag sets, all the time (the table d'hôte model) • a few pre-selected combinations (the combination plate model) • in completely unconstrained abandon (the smørgasbord model) • one from column A, two from column B (the Chinese menu model)

  14. The Chicago Pizza Model <!ENTITY % base "(deepDish | thinCrust | stuffed)" > <!ENTITY % topping "( pepperoni | mushrooms | sausage | pepper | anchovies | ...)" > <!ELEMENT pizza - - ( %base;, (tomatoSauce & cheese), %topping;* ) >
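
An XML DTD cannot reproduce slide 14 literally, because XML has neither tag-omission indicators nor the & connector. The following is a minimal sketch of an XML-compatible equivalent, not something taken from the slides: the two spelled-out orderings, the shortened topping list, and the EMPTY leaf declarations are all assumptions made for illustration. It is written as an external DTD file, since XML permits parameter entity references inside declarations only there.

```dtd
<!-- pizza.dtd: hypothetical XML rendering of the Chicago Pizza Model -->
<!ENTITY % base    "(deepDish | thinCrust | stuffed)">
<!ENTITY % topping "(pepperoni | mushrooms | sausage | pepper | anchovies)">

<!-- The SGML group "tomatoSauce & cheese" (either order) is spelled out
     as two alternative sequences; zero or more toppings may follow. -->
<!ELEMENT pizza (%base;,
                 ((tomatoSauce, cheese) | (cheese, tomatoSauce)),
                 %topping;*)>

<!-- Assumed leaf declarations, added only to make the fragment complete -->
<!ELEMENT deepDish    EMPTY>
<!ELEMENT thinCrust   EMPTY>
<!ELEMENT stuffed     EMPTY>
<!ELEMENT tomatoSauce EMPTY>
<!ELEMENT cheese      EMPTY>
<!ELEMENT pepperoni   EMPTY>
<!ELEMENT mushrooms   EMPTY>
<!ELEMENT sausage     EMPTY>
<!ELEMENT pepper      EMPTY>
<!ELEMENT anchovies   EMPTY>
```

Under this sketch, <pizza><thinCrust/><cheese/><tomatoSauce/><mushrooms/></pizza> would be valid, while a pizza with two bases would not.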

  15. To build a view of the TEI dtd, take... • the core tagsets • the base of your choice • the toppings of your choice <!DOCTYPE TEI.2 SYSTEM 'tei2.dtd' [ <!ENTITY % TEI.prose 'INCLUDE' > <!ENTITY % TEI.analysis 'INCLUDE' > ]> <TEI.2>.....</TEI.2>

  16. TEI base tagsets • one only must be selected • defines basic structural components • currently defined: • prose, verse, drama • transcribed speech • dictionaries • terminological databases • mixtures of bases require special treatment

  17. TEI additional tagsets • sets of elements for specialized application areas • can be mixed and matched ad lib • currently provided: • linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism, names and dates; graphs and trees; figures and tables; language corpora....

  18. For an XML DTD • Just add another declaration to the subset • (This is new in TEI P4) <!ENTITY % TEI.XML "INCLUDE">
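
Put together with the recipe from slide 15, a complete XML document prologue might look like the sketch below. It only combines declarations already shown on the slides; the choice of the prose base and the position of TEI.XML within the subset are assumptions made for illustration.

```dtd
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!-- request the XML flavour of the DTD (slide 18) -->
  <!ENTITY % TEI.XML   "INCLUDE">
  <!-- one base tagset, as on slide 15 -->
  <!ENTITY % TEI.prose "INCLUDE">
]>
```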

  19. How does this work? • enables all declarations within the tagset marked section defined in the main TEI dtd • these may include element, attribute, and class definitions <!ENTITY % TEI.tagset "INCLUDE">

  20. How does this work? • Within the main DTD: • the declarations making up each tagset are enclosed by an IGNORE marked section • the declarations for each element are enclosed by an INCLUDE marked section • this can be over-ridden by your declaration within the DTD subset

  21. Customizing the TEI DTD • In DTD subset • selection of tag sets • specification of document entities • in TEI.extensions.ent • renaming of elements • suppression of elements • modification of TEI classes • in TEI.extensions.dtd • definition of new elements

  22. Entity definitions • typically will include entity declarations for embedded graphics etc. • may also invoke special characters etc. <!ENTITY % myStuff SYSTEM "myEnts.dtd"> %myStuff;
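
As an illustration of the first bullet on slide 22, a DTD subset that declares an embedded graphic might be sketched as follows. The notation name, file name, and entity names are hypothetical, and including the figures tagset here is an assumption about where the graphic would be used.

```dtd
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!ENTITY % TEI.prose   "INCLUDE">
  <!ENTITY % TEI.figures "INCLUDE">

  <!-- hypothetical notation for the image format -->
  <!NOTATION jpeg SYSTEM "image/jpeg">

  <!-- hypothetical unparsed entity pointing at an external image file -->
  <!ENTITY frontispiece SYSTEM "frontispiece.jpg" NDATA jpeg>

  <!-- hypothetical general entity for a recurring string -->
  <!ENTITY tei "Text Encoding Initiative">
]>
```

In the document, &tei; then expands to the declared string, and the unparsed entity can be named by any attribute of type ENTITY; as far as we are aware, the entity attribute on <figure> in the figures tagset is intended for this purpose.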

  23. To modify the dtd • Define your modifications in a pair of extension files <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [ <!ENTITY % TEI.prose "INCLUDE" > <!ENTITY % TEI.extensions.ent SYSTEM "myMods.ent" > <!ENTITY % TEI.extensions.dtd SYSTEM "myMods.dtd" > ]><TEI.2>...</TEI.2>

  24. In your extension files you can… • rename elements <!ENTITY % n.p "para" > • undefine elements <!ENTITY % seg "IGNORE"> • The pizzaChef gives you a list of all the elements available from your chosen tagsets, and generates extension files for you

  25. You can also • supply additional (or replacement) declarations • supply entirely new elements and embed them in the architecture <!ENTITY % seg "IGNORE"> <!ELEMENT %n.seg; (#PCDATA)> <!ENTITY % x.phrase 'blort|'> <!ELEMENT blort (#PCDATA)> <!ATTLIST blort %a.global; farble (foo|bar|baz) "baz">

  26. An example • In the DTD subset we write: <!ENTITY % TEI.prose "INCLUDE"> <!ENTITY % biblStruct "IGNORE"> • In the prose tagset it says: <!ENTITY % TEI.prose "IGNORE"> <![ %TEI.prose; [ .... lots of other declarations ... <!ENTITY % biblStruct "INCLUDE"> <![ %biblStruct; [ <!ELEMENT biblStruct .... > <!ATTLIST biblStruct .... > ]]> … yet more declarations … ]]>

  27. Finally, the pizza is cooked • The carthage program removes • parameterization in the DTD • unreferenced or inaccessible elements • The pizzachef website • http://www.tei-c.org/pizza.html • command line equivalent: • http://www.tei-c.org/maketeidtd/

  28. Element Classes • Most TEI elements are assigned to one or more • model classes, identifying their syntactic properties, or • attribute classes, identifying their attributes • This provides a (relatively) simple way of • documenting and understanding the DTD • parameterizing content models • facilitating customization • An alternative way of doing architectural forms

  29. Some TEI model classes divn: structural elements like divisions (<div>,<div1>, <div2>…) divtop: elements which can appear at the start of a divn element (<head>, <epigraph>, <byLine>…) chunk: paragraph-like elements (<sp>, <p>, <lg>…) phrase: elements which appear within chunks (<hi>, <foreign>, <date> …)

  30. Implementation of classes • Each model class is defined as a pair of parameter entities • Reference to class members is always indirect <!ENTITY % x.class ""> <!ENTITY % m.class "%x.class;name1|name2" > <!ELEMENT foo (%m.class;+)>

  31. Class mobility • Each model class is defined as a parameter entity, containing • a reference to an initially null extension class • a list of members • To add a new member to a class, we redefine the extension class: <!ENTITY % x.class "myChunk|myOther|">
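
The mechanism on slides 30 and 31 can be traced by hand. The sketch below uses a hypothetical class called chunk with a simplified member list and a hypothetical element foo; it is not the real TEI declaration. It relies on the rule that the first declaration of an entity is the binding one, and on the extensions file being read before the main DTD.

```dtd
<!-- 1. In the extensions file (read first, so this binding wins).
        Note the trailing bar, which separates the new members
        from the standard ones. -->
<!ENTITY % x.chunk "myChunk | myOther |">

<!-- 2. In the main DTD (read later) -->
<!ENTITY % x.chunk "">   <!-- ignored: x.chunk is already declared -->
<!ENTITY % m.chunk "%x.chunk; p | sp | lg">

<!-- 3. Any content model that refers to the class picks up the change -->
<!ELEMENT foo (%m.chunk;)+>

<!-- Effective content model after expansion:
     (myChunk | myOther | p | sp | lg)+ -->
```

Without the extensions file, x.chunk is the empty string and the content model reduces to (p | sp | lg)+. The new elements myChunk and myOther still need their own element declarations, which would go in the second extensions file.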

  32. TEI attribute classes • global: attributes which are available to every element (n, lang, id, TEIform) • linking: attributes for elements which have linking semantics (targType, targOrder, evaluate)

  33. The TEIFORM attribute • protects applications from the effect of element renaming <titre TEIform="title">...</titre> • protects applications from the effect of syntactic sugar <abc type="xyz"> can be rewritten as <xyz TEIform="abc">
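
The first bullet on slide 33 can be sketched as follows. This is a simplified illustration rather than the actual TEI declarations: it assumes that each element's attribute list carries a TEIform attribute whose default value is the element's original name, and that the name entity follows the n.p pattern shown on slide 24. The content model is a placeholder.

```dtd
<!-- In the extensions file: rename <title> to <titre> -->
<!ENTITY % n.title "titre">

<!-- In the main DTD (simplified, assumed form) -->
<!ENTITY % n.title "title">   <!-- ignored: n.title is already declared -->
<!ELEMENT %n.title; (#PCDATA)>
<!ATTLIST %n.title; TEIform CDATA "title">
```

A document can then contain <titre>...</titre> with no attribute written out. The parser supplies TEIform="title" by default, so an application that keys on TEIform rather than on the element name still recognizes the element as a TEI title.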

  34. TEI Auxiliary DTDs • independent dtds for specialized information: • writing system / character set • feature system (for feature-structure notation) • tag set documentation • independent, free-standing TEI header

  35. What can go wrong? • extensions must use SGML syntax • beware of zombie elements • beware of over zealous pruning • remember that some TEI rules are not enforced (or enforceable) by the DTD • You have to know what's on the menu before you can choose from it

  36. A case study: the Lampeter corpus • Fairly typical requirements for historical language corpora: • light presentational tagging • structural markup for access • detailed information about source text production • small number of tags to ease data capture and validation • Implementation • tagsets: prose base, and tags from four additional sets • some extensions, many exclusions

  37. The Lampeter corpus DTD subset <!DOCTYPE teiCorpus.2 SYSTEM "tei2.dtd"[ <!ENTITY % TEI.prose "INCLUDE"> <!ENTITY % TEI.corpus "INCLUDE"> <!ENTITY % TEI.figures "INCLUDE"> <!ENTITY % TEI.transcr "INCLUDE"> <!ENTITY % TEI.extensions.ent SYSTEM "lampext.ent"> <!ENTITY % TEI.extensions.dtd SYSTEM "lampext.dtd"> ]>

  38. The Lampeter corpus extensions.ent <!ENTITY % analytic 'IGNORE' > <!ENTITY % biblStruct 'IGNORE' > <!-- hic desunt multa --> <!ENTITY % supplied 'IGNORE' > <!ENTITY % x.phrase "it|ro|sc|su|bo|go|"> <!ENTITY % x.biblPart "printer|pubFormat|bookSeller|"> <!ENTITY % x.demographic "socecstatusPat|biogNote|">

  39. The Lampeter corpus extensions.dtd <!ELEMENT it (%phrase.seq;)> <!ELEMENT printer (%phrase.seq;)> <!ATTLIST it %a.global; > <!-- etc. for all other new elements -->

  40. To finish the job • Document your extensions, using the TEI tagset for tagset documentation • Write a manual using the ODD system to generate your DTD fragments http://www.hcu.ox.ac.uk/TEI/Master/Reference

  41. Why bother? • The TEI is a well-known reference point • Using the TEI enables • sharing of data and resources • shared modular software development • lower learning curve and reduced training costs • The TEI is stable, rigorous, and well-documented • The TEI is also flexible, customizable, and extensible in documented ways • The architectural approach offers the best compromise for practical work.
