More Text Encoding Initiative (TEI) 6/30 XML + XSLT for Libraries
Today • Basic anatomy of TEI • Capturing the structure of source documents • Capturing more than the structure • Building personographies • Using <teiCorpus> • In class: continue Assignment 5: Mark up digital texts in TEI
Basic anatomy of TEI • <TEI> is the root element • <teiHeader> - where the metadata about the digital document you are creating goes • this element is similar to <eadheader> in EAD • <text> - where the transcription of the source document is captured
Required elements of <teiHeader> • <fileDesc> - a wrapper element for capturing these required elements: <titleStmt> - title of your TEI document (not the original document you are transcribing) <publicationStmt> - for publication information about your TEI document <sourceDesc> - for describing the original document you are transcribing
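Putting the required header elements together, a minimal TEI document might look like the sketch below. The title and descriptive text are placeholders invented for illustration, not taken from the sample documents; the namespace is the standard TEI P5 namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- Title of the TEI document itself, not of the source (placeholder text) -->
      <titleStmt>
        <title>A postcard to Hattie: a digital transcription</title>
      </titleStmt>
      <!-- Publication information about the TEI document (placeholder text) -->
      <publicationStmt>
        <p>Unpublished class exercise.</p>
      </publicationStmt>
      <!-- Description of the original document being transcribed (placeholder text) -->
      <sourceDesc>
        <p>Transcribed from a handwritten postcard.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The transcription goes here.</p>
    </body>
  </text>
</TEI>
```

This is the less structured style, with plain <p> tags inside <publicationStmt> and <sourceDesc>; either could be replaced with more detailed elements later.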
<teiHeader> examples • While there are several required elements inside <teiHeader>, the structure of these elements is pretty flexible • A less structured example that uses <p> tags: http://slis.uiowa.edu/~jlee/239/sampledocs/sampleTEIbook.xml • A more structured example that uses more detailed tags such as <msIdentifier>: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
Determining the level of your markup • We will be transforming our TEI documents to web display as HTML. • The more structure you capture in your transcription, the more flexible your display options will be later.
The <text> element • <text> contains a single text of any kind • You decide the scope of the <text> element • A poem? • A play? • An essay? • A collection of essays?
The <div> element • Within <text>, <div> is used to describe some discrete structure of the source document • You decide what <div> should represent: • One poem? One stanza of a poem? • One book? One chapter?
Sample <div> structure • In this example, <div> represents one chapter:
<text>
  <body>
    <div>
      <head type="chapter">Chapter 1</head>
      <p>In this chapter, we will focus on….</p>
    </div>
    <div>
      <head type="chapter">Chapter 2</head>
      <p>In chapter one, you learned….</p>
    </div>
  </body>
</text>
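Because <div> elements can nest, the same approach extends to larger structures. The sketch below assumes a book divided into parts and chapters; the @type and @n values and the text content are illustrative choices, not requirements.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <!-- Outer div: one part of the book (type and n values are illustrative) -->
    <div type="part" n="1">
      <head>Part One</head>
      <!-- Inner divs: the chapters within that part -->
      <div type="chapter" n="1">
        <head>Chapter 1</head>
        <p>Text of the first chapter.</p>
      </div>
      <div type="chapter" n="2">
        <head>Chapter 2</head>
        <p>Text of the second chapter.</p>
      </div>
    </div>
  </body>
</text>
```

Recording @type and @n on each <div> gives a later transformation something to select on, for example to build a table of contents.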
The <group> element • For more complex source documents, use <group> tags to capture a series of <text> elements • For example, encoding a book of poems and using <text> for each poem and <div> to capture stanzas:
<text>
  <front>
    <!-- biographical notice by editor -->
  </front>
  <group>
    <text>
      <!-- first poem -->
    </text>
    <text>
      <!-- second poem -->
    </text>
  </group>
</text>
The <ab> element • The anonymous block element, <ab>, is used to encode a discrete chunk of text • It is generally used to describe paragraph-like elements, like <p> tags in HTML
Encoding line breaks • To retain original breaks in texts: • encode them with line break <lb/> elements within anonymous block <ab> elements <ab>Line one of text <lb/> Line two of text</ab> • encode them with separate <ab> elements <ab>This is the first paragraph…</ab> <ab>This is the second paragraph…</ab>
Capturing images • To include an image of the source document, use the <facsimile> element before the <text> element: <facsimile> <graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/> </facsimile> • The URL must point to a publicly accessible image file
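To show where <facsimile> sits relative to the rest of the document, here is a skeleton with the header and transcription contents omitted. Only the ordering of the three top-level children is the point; the comments stand in for content you would supply.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc and other metadata go here (omitted in this sketch) -->
  </teiHeader>
  <!-- facsimile comes after the header and before the text -->
  <facsimile>
    <graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/>
  </facsimile>
  <text>
    <body>
      <!-- transcription goes here (omitted in this sketch) -->
    </body>
  </text>
</TEI>
```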
Identifying names Use <name>, <orgName>, or <persName> element anywhere within the transcription <div> <p>As I haven't time to write a letter I will just drop you a postal. How is <persName>Hattie</persName>? I have got a cold but that's all. this postal is kinda dirty but I got cause it is just what we will do isn't it. Just wait we'll let them know you're not dead. ha ha</p> <signed>bye. <persName>Golda</persName></signed> </div>
Identifying places • <placeName> for geo-political place names • <placeName>Rochester, NY</placeName> • <placeName><settlement type="city">Rochester</settlement>, <region type="state">New York</region></placeName> • <geogName> for places named in terms of geographic features such as mountains, lakes, or rivers, independently of geo-political units • <geogName type="river">Mississippi River</geogName>
Identifying dates • <date> contains a date in any format • <time> contains a phrase defining a time of day in any format. • the attribute @when normalizes the date or time in a standard form, e.g. yyyy-mm-dd. • <date when="1945-10-24">24 Oct 45</date> • <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date> • <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>
Other elements can record date + time information • Normalized dates and times can be expressed for other elements through attributes • A complete table of "date-able" elements: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.html • For example: <birth when="1981-01-23">January 23, 1981</birth>
Expressing date spans and ambiguous dates • @notBefore specifies the earliest possible date for the event • @notAfter specifies the latest possible date for the event • @from indicates the starting point of the period • @to indicates the ending point of the period • Each of these attributes also has an -iso variant (such as @notBefore-iso) that takes an ISO 8601 value, as in this example: <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
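The two attribute pairs answer different questions: @notBefore and @notAfter bound an uncertain date, while @from and @to mark the known start and end of a period. A minimal sketch of each, using invented dates purely for illustration (they are not facts about any person in the sample documents):

```xml
<person xmlns="http://www.tei-c.org/ns/1.0">
  <persName>Hattie Jacobs</persName>
  <!-- Uncertain date: born some time between 1885 and 1890 (hypothetical values) -->
  <birth notBefore="1885" notAfter="1890">around 1887</birth>
  <!-- Known period: lived there from one date to another (hypothetical values) -->
  <residence from="1907-09-09" to="1910-09-06">Madrid, Iowa</residence>
</person>
```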
Elements applicable to correspondence • <opener> groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter. • <closer> groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter. • <dateline> contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer. • <salute> contains the salutation in the opening/closing of a letter, preface, etc. • <signed> contains the closing signature
Sample use of <opener> and <closer>
<div type="letter" n="14">
  <head>Letter XIV: Miss Clarissa Harlowe to Miss Howe</head>
  <opener>
    <dateline>Thursday evening, March 2.</dateline>
  </opener>
  <p>On Hannah's depositing my long letter ...</p>
  <p>An interruption obliges me to conclude myself in some hurry, as well as fright, what I must ever be,</p>
  <closer>
    <salute>Yours more than my own,</salute>
    <signed>Clarissa Harlowe</signed>
  </closer>
</div>
(Taken from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSOC)
Building a personography • A personography is a list of normalized biographical data about persons tagged in your TEI document • It can be referenced in multiple TEI documents • It can be used to enhance search + browse tools
The <listPerson> element • Personographies are contained within <sourceDesc> in the header • @xml:id is used to uniquely identify a person
<listPerson>
  <person>
    <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName>
    <sex>female</sex>
    <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
  </person>
</listPerson>
Referencing personography data in the transcription • Use @ref to refer to the @xml:id you assigned to that person
<address>
  <addrLine>Miss <persName ref="#HJ">Hattie Jacobs</persName></addrLine>
  <settlement>Madrid</settlement>
  <region>Iowa</region>
</address>
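Seen in one document, the personography entry and the reference to it might fit together as in the sketch below. The title, the <bibl> description, and the publication note are placeholders; the sentence in the body is taken from the postcard sample. The @xml:id is placed on <persName> to match the slide, though placing it on <person> is also common.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Postcard transcription (placeholder title)</title></titleStmt>
      <publicationStmt><p>Unpublished class exercise.</p></publicationStmt>
      <sourceDesc>
        <!-- Placeholder description of the source -->
        <bibl>Handwritten postcard.</bibl>
        <!-- The personography: one entry per person, each with a unique xml:id -->
        <listPerson>
          <person>
            <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName>
            <sex>female</sex>
          </person>
        </listPerson>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- ref="#HJ" points at the element whose xml:id is "HJ" -->
      <p>How is <persName ref="#HJ">Hattie</persName>?</p>
    </body>
  </text>
</TEI>
```

The name in the transcription stays exactly as written ("Hattie"), while the normalized form lives once in the header.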
Other global lists • Similarly, you can use @xml:id to create a global list of other elements • <listPlace> • <listOrg> • <listBibl> • <listEvent>
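A <listPlace> works the same way as a personography. In the sketch below, the identifier "madrid-ia" is a hypothetical value chosen for illustration; the two fragments would sit in different parts of the document, as the comments note.

```xml
<!-- Fragment 1: in the header, a place entry with a unique xml:id -->
<listPlace xmlns="http://www.tei-c.org/ns/1.0">
  <place xml:id="madrid-ia">
    <placeName>
      <settlement type="city">Madrid</settlement>,
      <region type="state">Iowa</region>
    </placeName>
  </place>
</listPlace>

<!-- Fragment 2: in the transcription, a reference back to that entry -->
<placeName xmlns="http://www.tei-c.org/ns/1.0" ref="#madrid-ia">Madrid</placeName>
```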
Using <teiCorpus> • <teiCorpus> can be used as a wrapper root element for multiple <TEI> documents • <teiCorpus> has its own global header for capturing metadata about all of the <TEI> documents it contains • Example – postcards: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
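A skeleton of the corpus structure, with header and text contents omitted, could look like this. The comments describe what each part would hold; note that each <TEI> document inside the corpus still carries its own <teiHeader>.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Corpus-level header: metadata shared by every document in the corpus -->
  <teiHeader>
    <!-- fileDesc for the corpus as a whole (omitted in this sketch) -->
  </teiHeader>
  <!-- First document, for example one postcard -->
  <TEI>
    <teiHeader>
      <!-- metadata specific to this document (omitted) -->
    </teiHeader>
    <text>
      <!-- transcription of this document (omitted) -->
    </text>
  </TEI>
  <!-- Second document -->
  <TEI>
    <teiHeader>
      <!-- metadata specific to this document (omitted) -->
    </teiHeader>
    <text>
      <!-- transcription of this document (omitted) -->
    </text>
  </TEI>
</teiCorpus>
```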
In class • Continue Assignment 5: Mark up digital texts in TEI • If you have finished encoding the basic structure in your TEI documents: • try enhancing your markup with name, date, and place information • try nesting your TEI documents within one <teiCorpus> document • try building a personography