Beyond TEI-Lite... An overview of various TEI pizze
The TEI base modules • Prose • Verse • Drama • Transcribed speech • Print dictionaries • Terminological databases
The Verse base module • Adds numbered <lg1>, <lg2> etc. to the <lg> in the core (by analogy with <div> and <div1>) • Additional attributes met, real, and rhyme for metrical and rhyme analysis • Metrical notations may be defined by the <metDecl> element in the header
Line groups <lg type="stanza"><lg type="sestet"> <l>In the first year of Freedom's second dawn</l> <l>Died George the Third; although no tyrant, one</l> <l>Who shielded tyrants, till each sense withdrawn</l> <l>Left him nor mental nor external sun:</l> <l>A better farmer ne'er brushed dew from lawn,</l> <l>A worse king never left a realm undone!</l></lg> <lg type="couplet"> <l>He died — but left his subjects still behind, </l> <l>One half as mad — and t'other no less blind. </l> </lg></lg> <lg1 type="stanza"><lg2 type="sestet"> <l>In the first year of Freedom's second dawn</l> <l>Died George the Third; although no tyrant, one</l> <l>Who shielded tyrants, till each sense withdrawn</l> <l>Left him nor mental nor external sun:</l> <l>A better farmer ne'er brushed dew from lawn,</l> <l>A worse king never left a realm undone!</l></lg2> <lg2 type="couplet"> <l>He died — but left his subjects still behind, </l> <l>One half as mad — and t'other no less blind. </l> </lg2></lg1>
Metrical analysis <lg1 met="-+-+-+/"> <l real="-+-+-++"> This morn, thy gallant bark, love,</l> <l>Sail'd on the sunny sea;</l> <!-- … --> </lg1>
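The relationship between met and real can be sketched in a few lines of Python. This is only an illustration, assuming that a line without its own real attribute follows the met pattern declared on the enclosing group; SAMPLE is a well-formed XML rendering of the fragment above, and effective_metre is a hypothetical helper name, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Well-formed XML rendering of the slide's fragment (assumption: the group
# is closed with </lg1>, and the elided lines are simply left out).
SAMPLE = """
<lg1 met="-+-+-+/">
  <l real="-+-+-++">This morn, thy gallant bark, love,</l>
  <l>Sail'd on the sunny sea;</l>
</lg1>
"""

def effective_metre(group):
    """Pair each line with its own `real` value, else the group's `met`."""
    default = group.get("met")
    return [(l.text.strip(), l.get("real", default)) for l in group.iter("l")]

metres = effective_metre(ET.fromstring(SAMPLE))
for text, metre in metres:
    print(f"{metre:10} {text}")
```

The first line reports its actual realization, while the second falls back to the conventional pattern of the group.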
Rhyme <lg rhyme='AB-BBA'> <l>The sunlight on the garden</l> <l>Hardens and grows cold, </l> <l>We cannot cage the minute</l> <l>Within its nets of gold</l> <l>When all is told</l> <l>We cannot beg for pardon. </l> </lg> <lg rhyme='AB-BBA'> <l>The sunlight on the <seg id="A1">garden</seg></l> <l><seg id="A2">Harden</seg>s and grows <seg id=B1>cold, </seg></l> <l>We cannot cage the <seg id="C1">minute</seg></l> <l>Wi<seg id="C2">thin it</seg>s nets of <seg id="B2">gold </seg></l> <l>When all is <seg id="B3">told</seg></l> <l>We cannot beg for <seg id="A3">pardon</seg>.</l> </lg> <linkGrp type=rhyme> <link targets='A1 A2 A3'> <link targets='B1 B2 B3'> <link targets='C1 C2'> </linkGrp>
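One way to see what the <link> elements buy you is to resolve them back to the rhyming words. The following is a minimal sketch using Python's standard xml.etree.ElementTree; SAMPLE is an XML-normalized copy of the second encoding above (attributes quoted, empty elements closed), the wrapping <body> element is an assumption added so the fragment parses, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
<lg rhyme="AB-BBA">
  <l>The sunlight on the <seg id="A1">garden</seg></l>
  <l><seg id="A2">Harden</seg>s and grows <seg id="B1">cold,</seg></l>
  <l>We cannot cage the <seg id="C1">minute</seg></l>
  <l>Wi<seg id="C2">thin it</seg>s nets of <seg id="B2">gold</seg></l>
  <l>When all is <seg id="B3">told</seg></l>
  <l>We cannot beg for <seg id="A3">pardon</seg>.</l>
</lg>
<linkGrp type="rhyme">
  <link targets="A1 A2 A3"/>
  <link targets="B1 B2 B3"/>
  <link targets="C1 C2"/>
</linkGrp>
</body>
"""

root = ET.fromstring(SAMPLE)

# Index every segment by its identifier, trimming stray punctuation.
by_id = {seg.get("id"): seg.text.strip(" ,") for seg in root.iter("seg")}

# Each <link> lists the identifiers of the segments that rhyme together.
rhyme_sets = [[by_id[t] for t in link.get("targets").split()]
              for link in root.iter("link")]

for words in rhyme_sets:
    print(" / ".join(words))
```

Note that the links capture the internal rhymes (Harden, thin it) that the rhyme attribute on <lg> alone cannot express.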
The Drama base module adds... • Specialised elements to front matter: • <set>, <prologue>, <epilogue>, <performance> • <castList>, <castItem>, <role>, <roleDesc>,<actor> • Stage business: <move> • Specialised elements for film or tv scripts • <view>, <camera>, <caption>, <sound>
Cast lists <castList><head>ACTEURS.</head> <castItem><role id="JUP">JUPITER</role> <roleDesc> Arlequin.</roleDesc></castItem> <castItem><role id="MER">MERCURE</role> <roleDesc> Scaramouche.</roleDesc></castItem> <castItem><role>ISABELLE.</role></castItem> <castItem><role>PIERROT.</role></castItem> <castItem><roleDesc>OMBRES qui sortent des Enfers.</roleDesc></castItem> <castItem><roleDesc>QUATRE FURIES.</roleDesc></castItem> <castItem><role>PLUTON.</role></castItem> <castItem><roleDesc>Plusieurs Danseurs & Danseuses.</roleDesc></castItem></castList>
Speeches and stage directions <sp who="JUP"><speaker>JUPITER</speaker> <l>Si ma Maîtresse est infidelle,</l> <l>Je veux en être convaincu,</l> <l>Mercure, ce soir avec elle</l> <l>Tâche de me faire cocu.</l></sp> <stage>Mercure fait plusieurs lazzy, & lui fait entendre que sa Maîtresse est dans l'empire de Pluton. ... Quatre Furies sortent aussi des Enfers, qui dansent. Mercure dit à Pluton.</stage> <sp who="MER"><l>Pluton, faites-nous donc paroître</l> <l>Les habitans de ce séjour:</l> <l> Afin de les mieux reconnoître, </l> <l> Que chacun passe tour à tour. </l></sp>
Scripts, captions, FX... <camera>Zoom in to overlay showing some stock film of hansom cabs galloping past.</camera> <caption>London, 1895.</caption> <caption>The residence of Mr Oscar Wilde.</caption> <sound>Suitably classy music starts.</sound> <view>Mix through to Wilde's drawing room. A crowd of suitably dressed folk are engaged in typically brilliant conversation, laughing affectedly and drinking champagne.</view> <sp who="TJ"><speaker>Prince of Wales</speaker> <p>My congratulations, Wilde. Your latest play is a great success.</p></sp>
Transcribing speech • normalization issues • ease of reading vs accuracy • interpretation vs prosody • analogous to problems of handling digitized images
The Spoken base module • components : <u> <event> <kinesic> <vocal> <pause> <shift> • contextual information in header <settingDesc> <particDesc> • facilities for synchronization and timing
Utterances • Basic unit of discourse, corresponding to speaker turns • Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function • Linked by who attribute to <person> description in the header
Vocals and events • Empty elements are used to mark paralinguistic phenomena <u who="Jan">This is just delicious</u> <event desc='telephone rings'/> <u who="Kim">I'll get it</u> <u who="Tom">I used to <vocal desc="cough"/> smoke a lot</u> <u who="Bob"><vocal desc="sniff"/>He thinks he's tough</u> <vocal who="Ann" desc="snorts"/>
Voice quality and prosody • The <shift> element is used to mark changes in voice quality • Other prosodic features may be marked using specific kinds of <seg> or entity references <u who="LB"> <shift feature="loud" new="f"/>Elizabeth</u> <u who="EB">Yes</u> <u who="LB"><shift/>Come and try this <pause/> <shift feature="loud" new="ff"/>come on</u>
Another example <u who="MAR">you never <pause/> take this cat for show and tell <pause dur='5'/> meow meow</u> <u who="ROS">yeah well I dont want to</u> <event desc='toy cat has bell in tail which continues to make a tinkling sound'/> <vocal who="MAR" desc='meows'/> <u who="ROS">because it is so old</u> <u who="MAR">how <reg orig="bout">about</reg> your cat <pause/> yours is new <kinesic desc='shows Father the cat'/></u> <u who="FAT" trans="pause">that <pause/> darling</u> <u who="MAR"><s>no mine isnt old</s> <s>mine is just um a little dirty</s></u>
Participant Description <person id="P1" sex="F" age='mid'> <birth date='1950-01-12'> <date>12 Jan 1950</date> <name type="place">Shropshire, UK</name> </birth> <firstLang>English</firstLang> <langKnown>French</langKnown> <residence>Long term resident of Hull</residence> <education>University postgraduate</education> <occupation>Unknown</occupation> <socecstatus source="PEP" code="B2"/> </person> <person id="P1" sex="F" age='mid'> <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2 in the PEP classification scheme.</p> </person>
Setting Description <settingDesc> <setting who="P1 P2"><name type="city">Bedford</name> <name type="region">UK: South East</name> <date value="1989">early spring, 1989</date> <locale>rug of a suburban home</locale><activity>playing</activity></setting> <setting who="P3"><name type="city">Bedford</name><name type="region">UK: South East</name><date value="1989">early spring, 1989</date><locale>at the sink</locale> <activity>washing-up</activity></setting> <setting who="P4"><name type="place">London, UK</name> <time>unknown</time><locale>broadcasting studio</locale> <activity>radio performance</activity> </setting></settingDesc> • e.g. from P2
Timing • Pausing • use the <pause> element • Duration • use the dur attribute • Overlap • use the trans attribute
Overlap Have you heard the the election results? its a disaster its a miracle <u id="A1" who="A">Have you heard the</u> <u id="B1" who="B" trans="latching">the election results? </u> <u id="A2" who="A" trans="pause">its a disaster</u> <u id="B2" who="B" trans="overlap">its a miracle </u>
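To make the role of the trans attribute concrete, here is a small Python sketch that renders the utterances above as a reading transcript. It assumes that an utterance with no trans attribute counts as a smooth transition; the display symbols in MARKS and the helper name render are purely illustrative choices, not anything the TEI scheme prescribes, and the wrapping <div> is added only so the fragment parses.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div>
<u id="A1" who="A">Have you heard the</u>
<u id="B1" who="B" trans="latching">the election results?</u>
<u id="A2" who="A" trans="pause">its a disaster</u>
<u id="B2" who="B" trans="overlap">its a miracle</u>
</div>
"""

# Hypothetical display symbols for each kind of transition.
MARKS = {"smooth": " ", "latching": "=", "pause": "(.)", "overlap": "["}

def render(div):
    """Return one display line per utterance, prefixed by its transition mark."""
    lines = []
    for u in div.iter("u"):
        mark = MARKS[u.get("trans", "smooth")]
        lines.append(f"{u.get('who')}: {mark} {u.text.strip()}")
    return lines

lines = render(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

Keeping the transition type in an attribute means the same encoding can drive several different display conventions.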
The Dictionary base tagset • primarily for printed dictionaries, rather than lexica or dictionary production systems • <entry>, <entryFree>, and <superEntry> • <sense> and <hom> • logical structure vs. typographic fidelity
Constituents of a Dictionary Entry • the form group • the grammatical-information group • the definition or translation • etymology • examples • usage information • cross-references to other entries • notes and related entries
Dictionary components (1) • <form> grouping element for one or more of <orth> <pron> <hyph> <syll> <stress> etc. • <gramGrp> groups specialised grammatical tags <gen>, <number> etc • <def> for definition text, <trans> for translation • <etym> for etymology
Dictionary components (2) • examples <eg> • usage note <usg> • label <lbl> • related entries <re> and specialized pointers <oRef>, <pRef> etc
Simple example <entry><form><orth>OATS,</orth> <gram>n. s.</gram> <etym>[aten, Sax.]</etym> <def>A grain, which in England is generally given to horses; but in Scotland supports the people.</def> </form></entry>
The additional modules • Linking, segmentation and alignment • Simple analytic mechanisms • Feature structures • Certainty and responsibility • Transcription of primary sources • Text-critical analysis • Names and dates • Graphs, networks and trees • Tables, formulae and graphics • Language corpora
Linking, segmentation, alignment • Provides generic segmentation elements • Provides extended pointer syntax and linking • <xptr>, <xref>, <link>, <linkGrp> etc. • Extensive set of attributes for linkage, correspondence, synchronization, aggregation, alternation, and interpretation
Generic segmentation elements • <seg> for arbitrary (nesting) segmentation • <s> for end-to-end segmentation • use type attribute to subcategorise • <anchor> for points • Segmentation is the key to successful linking and analysis
Clustering (Difficulty (is being expressed) with ((the method) (to be used))) <s>Difficulty <seg>is being expressed</seg> with <seg><seg>the method</seg> <seg>to be used</seg></seg></s>
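The correspondence between the bracketed notation and the nested <seg> markup can be checked mechanically. The sketch below, using Python's standard xml.etree.ElementTree, walks the element tree and wraps every element's content in parentheses; the function name brackets is a hypothetical helper chosen for this illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = ("<s>Difficulty <seg>is being expressed</seg> with "
          "<seg><seg>the method</seg> <seg>to be used</seg></seg></s>")

def brackets(el):
    """Render an element and its nested segments as a bracketed string."""
    parts = [el.text or ""]
    for child in el:
        parts.append(brackets(child))   # nested segment, recursively bracketed
        parts.append(child.tail or "")  # text following the nested segment
    return "(" + "".join(parts) + ")"

bracketed = brackets(ET.fromstring(SAMPLE))
print(bracketed)
```

Because <seg> elements nest freely, the same recursion works for clusters of any depth.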
discontinuous segments • fundamental problem • join by internal or external links “You put it,” Quill reminded him, “in the safe.” <s id="s1">"You put it,"</s> <s id="s2">Quill reminded him,</s> <s id="s3">"in the safe."</s>
discontinuous segments “You put it,” Quill reminded him, “in the safe.” • can also use the part attribute to indicate that segments are incomplete <s id="s1" next="s3">"You put it,"</s> <s id="s2">Quill reminded him,</s> <s id="s3" prev="s1">"in the safe."</s>
discontinuous segments “You put it,” Quill reminded him, “in the safe.” <s id="s1">“You put it,”</s> <s id="s2">Quill reminded him,</s> <s id="s3">“in the safe.”</s> <join targets="s1 s3" result="s"/>
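A processor can use the <join> element to reassemble the discontinuous utterance. The following Python sketch is one possible reading, assuming the targets are concatenated in the order listed; the wrapping <p> element and the helper name joined_text are assumptions made for the illustration, and straight quotes are used in SAMPLE for simplicity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<p>
<s id="s1">"You put it,"</s>
<s id="s2">Quill reminded him,</s>
<s id="s3">"in the safe."</s>
<join targets="s1 s3" result="s"/>
</p>
"""

root = ET.fromstring(SAMPLE)

# Index every identified element so that join targets can be looked up.
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}

def joined_text(join):
    """Concatenate the text of the targets, in the order they are listed."""
    return " ".join(by_id[t].text.strip() for t in join.get("targets").split())

virtual = [(j.get("result"), joined_text(j)) for j in root.iter("join")]
print(virtual)
```

The result attribute names the kind of virtual element produced, here a single <s> spanning both halves of the quotation.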
Translation pairs <s id="s1" corresp="s2" lang="EN"> For a long time I used to go to bed early</s> <s id="s2" corresp="s1" lang="FR"> Longtemps je me couchais de bonne heure</s> and/or... <correspGrp type="trans"> <link targets="s1 s2"/> </correspGrp>
Synchronization • of whole elements • of points in time <u id="A2" who="A" synch="B2"> its a disaster</u> <u id="B2" who="B">its a miracle</u> <u id="A1" who="A">Have you heard <anchor id="A01"/>the</u> <u id="B1" who="B" start="A01"> <anchor id="B01"/>the election results? yes</u>
Analytic mechanisms • Specific kinds of segment for linguistic analyses • Specialized interpretive pointers (<span> and <spanGrp>) • The ana attribute and its possible targets • <interp> and <interpGrp> • feature systems <fs> and <fsd>
Arbitrary characterizations • The <span> element can be used to point into a stretch of a text and characterize it in any way • Targets must be SGML identifiers <spanGrp resp=LB type="thematic" > <span value="ships" from="P1" to="P2"> <span value="shoes" from="P4" to="P8"> <span value="sealing wax" from="P12" to="P14"> </spanGrp>
More detailed analysis • the ana attribute is of type IDREFS • what does VVD identify? • a prose description • an <interp> element • a feature structure <w ana="VVD">annotated</w>
using interp... <w ana="VVD">annotated</w> <w ana="NN2">corpora</w> <interp id="VVD" type="lexical class" value="verb past tense"/> <interp id="NN2" type="lexical class" value="noun plural"/>
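Resolving the ana pointers is a simple lookup. The Python sketch below, using the standard xml.etree.ElementTree, indexes the <interp> elements by identifier and attaches their values to each word; since ana may hold several identifiers, the value is split on whitespace. The wrapping <body> element and the variable names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
<w ana="VVD">annotated</w>
<w ana="NN2">corpora</w>
<interp id="VVD" type="lexical class" value="verb past tense"/>
<interp id="NN2" type="lexical class" value="noun plural"/>
</body>
"""

root = ET.fromstring(SAMPLE)

# Map each interpretation's identifier to its value.
interps = {i.get("id"): i.get("value") for i in root.iter("interp")}

# ana may list several identifiers, so split it before looking each one up.
tagged = [(w.text, [interps[ref] for ref in w.get("ana").split()])
          for w in root.iter("w")]

for word, values in tagged:
    print(word, values)
```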
Hierarchic bundling of interps • nouns can be common or proper • nouns can be singular or plural <interpGrp value="nominal"> <interpGrp value="common"> <interp value="singular"/> <interp value="plural"/> </interpGrp> </interpGrp>
Feature structures • a feature structure consists of a bundle of features • a feature has a name and a value • values may be binary switches, symbols, strings, or feature structures • bundling may be constrained in various (not necessarily hierarchic) ways
Using a feature structure... <w ana="NN2">corpora</w> <fs id="NN2"> <f name="class"><sym value="noun"/></f> <f name="number"><sym value="plural"/></f> <f name="proper"><minus/></f> </fs> <fs id="NN1"> <f name="class"><sym value="noun"/></f> <f name="number"><sym value="singular"/></f> <f name="proper"><minus/></f> </fs>
...feature definitions may be stored as a feature library... <fLib> <f id="FCN" name="class"> <sym value="noun"></f> <f id="FN1" name="number"> <sym value="singular"></f> <f id="FN2" name="number"> <sym value="plural"></f> <f id="FPM" name="proper"> <minus/></f> ... </fLib>
...and invoked by reference <fLib> <f id=FCN name=class> <sym value=noun> <f id=FN1 name=number> <sym value=singular> <f id=FN2 name=number> <sym value=plural> <f id=FPM name=proper> <minus> ... </fLib> <fs id="NN1" feats="FCN FPM FN1"/> <fs id="NN2" feats="FCN FPM FN2"/>
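Expanding the feats references back into full feature structures might look like the following Python sketch. It works on an XML-normalized copy of the library (quoted attributes, closed empty elements), represents <plus/> and <minus/> as booleans, and handles only the value types shown on these slides; the wrapping <body> element and the helper name value_of are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
<fLib>
  <f id="FCN" name="class"><sym value="noun"/></f>
  <f id="FN1" name="number"><sym value="singular"/></f>
  <f id="FN2" name="number"><sym value="plural"/></f>
  <f id="FPM" name="proper"><minus/></f>
</fLib>
<fs id="NN1" feats="FCN FPM FN1"/>
<fs id="NN2" feats="FCN FPM FN2"/>
</body>
"""

def value_of(f):
    """Return a feature's value: a symbol string, or a bool for plus/minus."""
    child = f[0]
    if child.tag == "sym":
        return child.get("value")
    return {"plus": True, "minus": False}[child.tag]

root = ET.fromstring(SAMPLE)

# The library maps each feature identifier to a (name, value) pair.
library = {f.get("id"): (f.get("name"), value_of(f)) for f in root.iter("f")}

# Each <fs> is expanded by following the identifiers listed in feats.
expanded = {fs.get("id"): dict(library[ref] for ref in fs.get("feats").split())
            for fs in root.iter("fs")}

for tag, features in expanded.items():
    print(tag, features)
```

Storing each feature once and referring to it by identifier keeps the tag definitions compact and guarantees that NN1 and NN2 share exactly the same class and proper values.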
Not covered here • Certainty and responsibility • Names and dates • Graphs, networks and trees • Tables, formulae and graphics • Language corpora
Summary Scholars want a lot! • orthographic transcription • all languages of all types of all times • pointer(s) to digital recording or images • markup of proper nouns, dates, times, etc. • part-of-speech and morphological tagging • syntactic, semantic, stylistic or other analyses • cross references to other material on the topic • editorial commentary and annotation • etc., etc., etc. The TEI scheme is designed to facilitate these and more: but to get the best out of it, you have to know what is there...