Anyone for pizza?

Anyone for pizza? Designing a TEI document type declaration

The T E what? • Originally, a research project within the humanities • Sponsored by ALLC, ACH, ACL • Funded 1990-1994 by US NEH, EU LE Programme et al • Major influences • digital libraries and text collections • language corpora • scholarly datasets • Now an international membership consortium incorporated Jan 2001 http://www.tei-c.org

Current TEI activity • Preliminary XML version of DTD now available • Text update in progress • Workgroups under consideration • character set issues • manuscript description • modelling • lexica and termbanks • physical description • resource discovery • Membership will vote by end 2001

Who uses TEI? • digital librarians and archivists • HTI, UVA, CETH, OTA... • Language Engineering projects • EAGLES, BNC, MULTEXT, ECI, Silfide • academic researchers • Women Writers Project, CURIA Project, VWWP, Orlando, Model Editions Partnership, Canterbury Tales Project, Bodleian Library... • http://www.hcu.ox.ac.uk/TEI/Applications/

Goals of the TEI • better interchange and integration of scholarly data • support for all texts, in all languages, from all periods • guidance for the perplexed: what to encode • assistance for the specialist: how to encode any information of interest • Hence a loose framework into which unpredictable extensions can be fitted

Legacy of the TEI • a way of looking at what “text” really is • a codification of current scholarly practice • (crucially) a set of shared assumptions and priorities about the digital agenda: • focus on content and function (rather than presentation) • generic solutions (rather than application-specific ones)

TEI Deliverables • A set of recommendations for text encoding, • covering both generic text structures and some highly specific areas based on (but not limited by) existing practice • A very large collection of element definitions • combined into a very loose document type declaration • A mechanism for creating multiple views (DTDs) of the foregoing • One such view and associated tutorial: TEI Lite; others exist (e.g. CES, BNC)

The TEI modus operandi... • identify significant particularities • independent of notation or realization • avoid controversy, over-delicacy, inadequacy • seek generalizable solutions, acceptable to a consensus

... and some consequences • focus on content, not presentation • descriptive, not prescriptive • Occam's razor • modular, extensible dtd

Designing a dtd for the TEI • How can a single markup scheme handle a large variety of requirements? • all texts are alike • every text is different • Learn from the database designers • one construct, many views • each view a selection from the whole

How Many dtds? • How many dtds might the TEI require? • one (the Corporate or WKWBFY approach) • none (the Anarchic or NWEUMP approach) • as many as it takes (the Mixed Economy or XML approach) • Or a single main dtd with many faces (the British approach)

The TEI solution: modularization • a (very) large number of element and attribute definitions • organized as tagsets (core, base, additional, or auxiliary) • grouped into classes

How to combine Tag Sets… • all tag sets, all the time (the table d'hôte model) • a few pre-selected combinations (the combination plate model) • in completely unconstrained abandon (the smörgåsbord model) • one from column A, two from column B (the Chinese menu model)

To build a view of the TEI dtd, take... • the core tagsets • the base of your choice • the toppings of your choice
    <!DOCTYPE TEI.2 SYSTEM 'tei2.dtd' [
      <!ENTITY % TEI.prose 'INCLUDE' >
      <!ENTITY % TEI.analysis 'INCLUDE' >
    ]>
    <TEI.2>.....</TEI.2>

TEI base tagsets • one only must be selected • defines basic structural components • currently defined: • prose, verse, drama • transcribed speech • dictionaries • terminological databases • mixtures of bases require special treatment

TEI additional tagsets • sets of elements for specialized application areas • can be mixed and matched ad lib • currently provided: • linking and alignment; analysis; feature structures; certainty; physical transcription; textual criticism, names and dates; graphs and trees; figures and tables; language corpora....

For an XML DTD • Just add another declaration to the subset • (This is new in TEI P4)
    <!ENTITY % TEI.XML "INCLUDE">

How does this work? • enables all declarations within the tagset marked section defined in the main TEI dtd • these may include element, attribute, and class definitions
    <!ENTITY % TEI.tagset "INCLUDE">

How does this work? • Within the main DTD: • the declarations making up each tagset are enclosed by an IGNORE marked section • the declarations for each element are enclosed by an INCLUDE marked section • this can be over-ridden by your declaration within the DTD subset

Customizing the TEI DTD • In DTD subset • selection of tag sets • specification of document entities • in TEI.extensions.ent • renaming of elements • suppression of elements • modification of TEI classes • in TEI.extensions.dtd • definition of new elements

Entity definitions • typically will include entity declarations for embedded graphics etc. • may also invoke special characters etc.
    <!ENTITY % myStuff SYSTEM "myEnts.dtd">
    %myStuff;
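As an illustration of the first bullet, here is a minimal, self-contained sketch of an entity declaration for an embedded graphic. It is not taken from the TEI DTD: the doc and figure declarations are simplified stand-ins, the entity attribute is modelled loosely on the TEI figure element, and the names frontispiece, frontis.png and the png notation are all assumptions made for the example.

```xml
<!DOCTYPE doc [
  <!-- Simplified stand-ins for the real element declarations -->
  <!ELEMENT doc (figure)>
  <!ELEMENT figure EMPTY>
  <!ATTLIST figure entity ENTITY #REQUIRED>

  <!-- The notation names the data format of the external file -->
  <!NOTATION png SYSTEM "image/png">

  <!-- An unparsed (NDATA) entity: the graphic itself is never parsed as markup -->
  <!ENTITY frontispiece SYSTEM "frontis.png" NDATA png>
]>
<doc>
  <!-- The attribute value is the entity name, not a file name -->
  <figure entity="frontispiece"/>
</doc>
```

In a real TEI document only the NOTATION and ENTITY declarations would sit in the DTD subset, and only if the DTD in use does not already declare the notation.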

To modify the dtd • Define your modifications in a pair of extension files
    <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
      <!ENTITY % TEI.prose "INCLUDE" >
      <!ENTITY % TEI.extensions.ent SYSTEM "myMods.ent" >
      <!ENTITY % TEI.extensions.dtd SYSTEM "myMods.dtd" >
    ]>
    <TEI.2>...</TEI.2>

In your extension files you can… • rename elements <!ENTITY % n.p "para" > • undefine elements <!ENTITY % seg "IGNORE"> • The pizzaChef gives you a list of all the elements available from your chosen tagsets, and generates extension files for you

You can also • supply additional (or replacement) declarations • supply entirely new elements and embed them in the architecture
    <!ENTITY % seg "IGNORE">
    <!ELEMENT %n.seg; (#PCDATA)>
    <!ENTITY % x.phrase 'blort|'>
    <!ELEMENT blort (#PCDATA)>
    <!ATTLIST blort %a.global; farble (foo|bar|baz) "baz">

An example • In the DTD subset we write:
    <!ENTITY % TEI.prose "INCLUDE">
    <!ENTITY % biblStruct "IGNORE">
• In the prose tagset it says:
    <!ENTITY % TEI.prose "IGNORE">
    <![ %TEI.prose; [
      .... lots of other declarations ...
      <!ENTITY % biblStruct "INCLUDE">
      <![ %biblStruct; [
        <!ELEMENT biblStruct .... >
        <!ATTLIST biblStruct .... >
      ]]>
      … yet more declarations …
    ]]>
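The same mechanism can be tried out on a miniature scale with any validating XML parser. The sketch below is a hypothetical stand-in, not an extract from the TEI DTD: the file name mini.dtd, the tagset entity TEI.demo and the elements text, p and note are all invented for illustration. It relies on one rule: when a parameter entity is declared more than once, the first declaration wins, and the DTD subset in the document is read before the external DTD file.

```xml
<!-- mini.dtd (hypothetical): the tagset is off unless switched on earlier -->
<!ENTITY % TEI.demo "IGNORE">
<![%TEI.demo;[
  <!-- Inside the tagset, each element is on unless switched off earlier -->
  <!ENTITY % p "INCLUDE">
  <![%p;[
    <!ELEMENT p (#PCDATA)>
  ]]>
  <!ENTITY % note "INCLUDE">
  <![%note;[
    <!ELEMENT note (#PCDATA)>
  ]]>
]]>
<!ELEMENT text (p | note)*>
```

```xml
<!DOCTYPE text SYSTEM "mini.dtd" [
  <!-- Read first, so these override the defaults in mini.dtd -->
  <!ENTITY % TEI.demo "INCLUDE">
  <!ENTITY % note "IGNORE">
]>
<text>
  <p>The p element is declared; note has been suppressed.</p>
</text>
```

With this subset a note element in the document would be rejected as undeclared, although the content model of text still mentions it. Removing the second declaration from the subset brings note back.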

Finally, the pizza is cooked • The carthage program removes • parameterization in the DTD • unreferenced or inaccessible elements • The pizzachef website • http://www.tei-c.org/pizza.html • command line equivalent: • http://www.tei-c.org/maketeidtd/

Element Classes • Most TEI elements are assigned to one or more • model classes, identifying their syntactic properties, or • attribute classes, identifying their attributes • This provides a (relatively) simple way of • documenting and understanding the DTD • parameterizing content models • facilitating customization • An alternative way of doing architectural forms

Some TEI model classes • divn: structural elements like divisions (<div>, <div1>, <div2>…) • divtop: elements which can appear at the start of a divn element (<head>, <epigraph>, <byline>…) • chunk: paragraph-like elements (<sp>, <p>, <lg>…) • phrase: elements which appear within chunks (<hi>, <foreign>, <date>…)

Implementation of classes • Each model class is defined as a pair of parameter entities • Reference to class members is always indirect
    <!ENTITY % x.class "">
    <!ENTITY % m.class "%x.class; name1|name2">
    <!ELEMENT foo ((%m.class;)+)>

Class mobility • Each model class is defined as a parameter entity, containing • a reference to an initially null extension class • a list of members • To add a new member to a class, we redefine the extension class (note the trailing bar, which joins the new names to the existing members):
    <!ENTITY % x.class "myChunk|myOther|">
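A self-contained miniature of the class mechanism might look like the following. This is a sketch under stated assumptions rather than the TEI's own declarations: the file name classes.dtd, the class name chunk as used here, and the new element myChunk are hypothetical. The extension entity must end with a vertical bar so that the expanded content model stays a well-formed list of alternatives.

```xml
<!-- classes.dtd (hypothetical): one model class with an empty extension hook -->
<!ENTITY % x.chunk "">
<!ENTITY % m.chunk "%x.chunk; p | lg">

<!-- Class members are only ever referenced through the class entity -->
<!ELEMENT div ((%m.chunk;)+)>
<!ELEMENT p (#PCDATA)>
<!ELEMENT lg (#PCDATA)>
```

```xml
<!DOCTYPE div SYSTEM "classes.dtd" [
  <!-- Declared before classes.dtd is read, so this value of x.chunk is used -->
  <!ENTITY % x.chunk "myChunk |">
  <!ELEMENT myChunk (#PCDATA)>
]>
<div>
  <myChunk>A new member of the chunk class.</myChunk>
  <p>An existing member.</p>
</div>
```

After expansion the content model of div reads ((myChunk | p | lg)+), so the new element is accepted wherever any chunk is allowed, without touching the declaration of div.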

TEI attribute classes • global: attributes which are available to every element (n, lang, id, TEIform) • linking: attributes for elements which have linking semantics (targType, targOrder, evaluate)

The TEIform attribute • protects applications from the effect of element renaming <titre TEIform="title">...</titre> • protects applications from the effect of syntactic sugar <abc type="xyz"> can be rewritten as <xyz TEIform="abc">
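How renaming and TEIform fit together can be sketched in miniature. The fragment below is an illustration, not the TEI's actual declaration of title: the file name rename.dtd is hypothetical and the content model is reduced to #PCDATA. It assumes the pattern of an n.-prefixed parameter entity holding the element's name and a TEIform attribute whose default value is the canonical name.

```xml
<!-- rename.dtd (hypothetical): the element name is itself a parameter entity -->
<!ENTITY % n.title "title">
<!ELEMENT %n.title; (#PCDATA)>

<!-- The default value records the canonical name, whatever the element is called -->
<!ATTLIST %n.title; TEIform CDATA "title">
```

```xml
<!DOCTYPE titre SYSTEM "rename.dtd" [
  <!-- Rename the element; this declaration is read first and so takes effect -->
  <!ENTITY % n.title "titre">
]>
<titre>Anyone for pizza?</titre>
```

A validating parser reports the element as titre with the defaulted attribute TEIform="title", so software that looks at TEIform rather than at the element name keeps working after the renaming.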

TEI Auxiliary DTDs • independent dtds for specialized information: • writing system / character set • feature system (for feature-structure notation) • tag set documentation • independent, free-standing TEI header

What can go wrong? • extensions must use SGML syntax • beware of zombie elements • beware of over-zealous pruning • remember that some TEI rules are not enforced (or enforceable) by the DTD • You have to know what's on the menu before you can choose from it

A case study: the Lampeter corpus • Fairly typical requirements for historical language corpora: • light presentational tagging • structural markup for access • detailed information about source text production • small number of tags to ease data capture and validation • Implementation • tagsets: prose base, and tags from four additional sets • some extensions, many exclusions

The Lampeter corpus DTD subset
    <!DOCTYPE teiCorpus.2 SYSTEM "tei2.dtd" [
      <!ENTITY % TEI.prose "INCLUDE">
      <!ENTITY % TEI.corpus "INCLUDE">
      <!ENTITY % TEI.figures "INCLUDE">
      <!ENTITY % TEI.transcr "INCLUDE">
      <!ENTITY % TEI.extensions.ent SYSTEM "lampext.ent">
      <!ENTITY % TEI.extensions.dtd SYSTEM "lampext.dtd">
    ]>

The Lampeter corpus extensions.ent
    <!ENTITY % analytic 'IGNORE' >
    <!ENTITY % biblStruct 'IGNORE' >
    <!ENTITY % supplied 'IGNORE' >
    <!ENTITY % x.phrase "it|ro|sc|su|bo|go|">
    <!ENTITY % x.biblPart "printer|pubFormat|bookSeller|">
    <!ENTITY % x.demographic "socecstatusPat|biogNote|">

The Lampeter corpus extensions.dtd
    <!ELEMENT it (%phrase.seq;)>
    <!ELEMENT printer (%phrase.seq;)>
    <!ATTLIST it %a.global; >
    <!-- etc. for all other new elements -->

To finish the job • Document your extensions, using the TEI tagset for tagset documentation • Write a manual using the ODD system to generate your DTD fragments http://www.hcu.ox.ac.uk/TEI/Master/Reference

Why bother? • The TEI is a well-known reference point • Using the TEI enables • sharing of data and resources • shared modular software development • lower learning curve and reduced training costs • The TEI is stable, rigorous, and well-documented • The TEI is also flexible, customizable, and extensible in documented ways • The architectural approach offers the best compromise for practical work.
