Oxford Text Archive and good practice in the creation of electronic resources http://ota.ahds.ac.uk. Martin Wynne martin.wynne@ota.ahds.ac.uk with a lot of help from Ylva Berglund ylva.berglund@oucs.ox.ac.uk. Case study: Terms of Address in Ben Jonson’s plays. Database of all address terms
Oxford Text Archive and good practice in the creation of electronic resources http://ota.ahds.ac.uk Martin Wynne martin.wynne@ota.ahds.ac.uk with a lot of help from Ylva Berglund ylva.berglund@oucs.ox.ac.uk
Case study: Terms of Address in Ben Jonson’s plays • Database of all address terms • Instances coded for various parameters • addresser/addressee, type of address, reference, etc. • Stored on computer + back-ups on floppy disks • The creator is happy to share the resource…
Problem 1: Hardware/media • Changes in hardware → data on old computer and floppies now largely inaccessible • Storage media vulnerable
Problem 2: Software • Specialist software → data tied into application • Proprietary software → ongoing support for application reliant on commercial interests • Different program versions and platforms → files not compatible
Problem 3: Coding/mark-up • Personal scheme • not comparable to other resources • doesn’t work with generic software • Documentation • resource soon becomes unusable if documentation is inadequate or impossible to find
Problem 4: Sharing • Dissemination • Awareness • Distribution • Sustainability
Solution 1: Hardware/media • Migrate data from old hardware • Don’t keep on only one machine • Don’t store on vulnerable media
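A minimal sketch of the migration advice above, assuming the data has already been read off the old hardware into an ordinary file. The function names `sha256_of` and `migrate`, the file name, and the directory names are hypothetical; the temporary directories only stand in for real storage locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, destinations: List[Path]) -> List[Path]:
    """Copy source into every destination directory and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)
        # A copy counts only if its checksum matches the original.
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Hypothetical layout: temporary directories stand in for real storage.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "old_machine" / "address_terms.txt"
        original.parent.mkdir()
        original.write_text("placeholder data\n", encoding="utf-8")
        made = migrate(original, [root / "server", root / "external_disk"])
        print(len(made), "verified copies")
```

Keeping the checksum alongside each copy also makes it possible to detect later media degradation by recomputing and comparing.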
Solution 2: Software • Avoid specialised software unless possible to migrate data • Consider open source
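One way to keep data from being tied to an application is to export it regularly to a plain, documented format. The sketch below is an assumption-laden illustration, not the case study's actual setup: an in-memory SQLite database stands in for the address-term database, and the table name, column names, and row values are all invented placeholders.

```python
import csv
import io
import sqlite3


def export_table_to_csv(connection, table, out):
    """Write every row of `table` to `out` as CSV, with a header row.

    `table` is interpolated into the query, so it must be a trusted name.
    """
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical stand-in for the address-term database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE address_terms (addresser TEXT, addressee TEXT, term TEXT)"
)
connection.execute(
    "INSERT INTO address_terms VALUES (?, ?, ?)",
    ("speaker A", "speaker B", "sir"),  # placeholder values, not real data
)

buffer = io.StringIO()
export_table_to_csv(connection, "address_terms", buffer)
print(buffer.getvalue())
```

The resulting CSV can be opened by generic tools and migrated without the original application.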
Solution 3: Coding/mark-up • Use standards • Document!
Example: HTML • Displayed: This is bold • Markup: This is <b>bold</b> • Displayed: This is a noun • Markup: This is a <b>noun</b>
Example: COCOA <Q FRENCH> <A SARTRE> <T NAUSEE> <I 9> <L 1> %"$C' EST UN GAR*CON SANS IMPORTANCE COLLECTIVE,% %C' EST TOUT JUSTE UN INDIVIDU."% \L#-\F# \CE[LINE. %\L' E[GLISE\.% <F 00> <P 11> <L 1> %%AVERTISSEMENT DES E[DITEURS%% %$CES CAHIERS ONT E[TE[ TROUVE[S PARMI LES PAPIERS D' \ANTOINE R\OQUENTIN.\% %$NOUS LES PUBLIONS SANS Y RIEN CHANGER.% …
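The angle-bracket references in the COCOA sample can be pulled out with a short script. This is a simplified sketch: it assumes every reference has the form of a single-character category code followed by a value, as in the sample above, and it does not interpret what each code means, since the slide does not say. The name `cocoa_references` is hypothetical.

```python
import re

# One-character category code, whitespace, then the value up to ">".
COCOA_TAG = re.compile(r"<(\w)\s+([^>]*)>")


def cocoa_references(text):
    """Return (category, value) pairs for each COCOA-style reference."""
    return [
        (match.group(1), match.group(2).strip())
        for match in COCOA_TAG.finditer(text)
    ]


sample = "<Q FRENCH> <A SARTRE> <T NAUSEE> <I 9> <L 1>"
for category, value in cocoa_references(sample):
    print(category, value)
```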
Example: XML <s n="97"> <w type="AJ0">Normal</w> <w type="NN1">economy</w> <w type="NN1">return</w> <w type="VBZ">is</w> <w type="NN0">£262</w> <c type="PUN">.</c> </s>
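Because the sentence above is well-formed XML, generic tools can read it without any knowledge of the project that produced it. A minimal sketch using Python's standard library; the function name `tagged_tokens` is hypothetical, and treating every `type` value that starts with `NN` as a noun is an assumption based on the tags visible in the sample.

```python
import xml.etree.ElementTree as ET

SENTENCE = (
    '<s n="97"> <w type="AJ0">Normal</w> <w type="NN1">economy</w> '
    '<w type="NN1">return</w> <w type="VBZ">is</w> '
    '<w type="NN0">£262</w> <c type="PUN">.</c> </s>'
)


def tagged_tokens(xml_text):
    """Return (token, type) pairs for each child element of the sentence."""
    root = ET.fromstring(xml_text)
    return [(child.text, child.get("type")) for child in root]


tokens = tagged_tokens(SENTENCE)
# Assumption: tags beginning with "NN" mark nouns in this scheme.
nouns = [word for word, tag in tokens if tag.startswith("NN")]
print(tokens)
print(nouns)
```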
Text Encoding Initiative (TEI) • Guidelines for the encoding of electronic texts using XML • For interchange • Guidelines freely available • http://www.tei-c.org/
Encoding • Choose language/format • e.g. XML • Choose coding scheme • e.g. TEI
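As an illustration of choosing XML as the format and TEI as the scheme, the sketch below builds a skeleton TEI document with a header and one paragraph of text. It assumes the TEI P5 element layout and namespace; the slides do not say which version of the Guidelines is meant, and the helper names and all title, publication, and source strings are placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title, publication, source, paragraphs):
    """Return a skeleton TEI document as a string."""
    root = ET.Element(q("TEI"))
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    publication_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication_stmt, q("p")).text = publication
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = source
    text = ET.SubElement(root, q("text"))
    body = ET.SubElement(text, q("body"))
    for paragraph in paragraphs:
        ET.SubElement(body, q("p")).text = paragraph
    return ET.tostring(root, encoding="unicode")


document = minimal_tei(
    title="Placeholder title",
    publication="Unpublished working file.",
    source="Placeholder description of the source text.",
    paragraphs=["Placeholder paragraph."],
)
print(document)
```

The header is where the documentation travels with the resource: what the text is, where it came from, and how it may be distributed.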
Markup and encoding options • Word processing files • PDF • Database • HTML • SGML • XML • Plain text • Unicode
Advantages of XML • Standard not only in scholarly text encoding, but also in publishing, on the web, etc. • Growing number of tools • Community of users, support • Re-usable skills, useful to learn • XML resources good for repurposing • XML resources good for preservation • Disadvantages? Cost, complexity
Advantages of TEI • Standard in scholarly text encoding • Community of users, support available • Extensible • TEI resources good for interchange
Disadvantages of TEI • Cost, complexity • Compromises to text integrity • Overlapping hierarchies…
<p> <sp cat="NRS"> One officer said:</sp> <sp cat="DS"> 'This is like an episode from Inspector Morse.</p> <p> "The victim was single but we believe he had several lady friends.</p> <p> "It is possible that it was something in the background of one of those relationships that caused his death. .</p> <p> "We don't think he was linked with any criminals or involved in any secret wrong doing." .</p></sp> <p> <sp cat="N"> Police have not ruled out the possibility of a contract killing by a hitman.</sp></p>
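The overlap problem can be checked mechanically: an element that opens inside one paragraph and closes after another is not well-formed XML, so a standard parser rejects it. A minimal sketch with two invented fragments that mirror the shape of the example above; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# <sp> opens inside the first <p> but closes after the second: overlap.
overlapping = (
    '<text><p><sp cat="DS">First paragraph.</p>'
    "<p>Second paragraph.</p></sp></text>"
)

# The same content with <sp> wrapping whole paragraphs: properly nested.
nested = (
    '<text><sp cat="DS"><p>First paragraph.</p>'
    "<p>Second paragraph.</p></sp></text>"
)

print(is_well_formed(overlapping))
print(is_well_formed(nested))
```

Restructuring so that one hierarchy wraps the other is only one possible workaround, and it is exactly the kind of compromise to the text that the slide lists as a disadvantage.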
Solution 4: Sharing • Inform user/subject communities • Metadata • Consider using archives
Oxford Text Archive • Founded in 1976 • Collect, catalogue, preserve, and distribute high-quality electronic resources • Advise creators and users • Part of the AHDS
AHDS http://www.ahds.ac.uk • Archaeology • History • Literature, Languages and Linguistics • Performing Arts • Visual Arts + • Executive
The OTA today • About 2000 resources, in 25 languages • Mostly primary texts, ‘classics’ • Language corpora • Increasingly, more complex resources, with more intellectual content • http://www.ota.ahds.ac.uk/
The OTA in the future • New webpage • More information • New ways of accessing resources • Workshops and training events • Digitisation workshop • For new projects • Specialised workshops • New resources