Text Encoding Issues

Learn about the BAWE project, text hierarchy, formula encoding, and interactive tagging techniques for structuring and analysing documents efficiently. Explore examples and principles of mark-up for effective text encoding.


  1. Text Encoding Issues: The British Academic Written English (BAWE) project. Corpus Linguistics, University of Birmingham, July 16th, 2005

  2. Assessed student writing • Which theoretical approach has best helped you ‘make sense’ of The Waste Land and why? • Case Study of the white-throated capuchin monkey (Cebus capucinus) • ‘Would you agree that subordination was inscribed into the life of a domestic servant?’ • Explore the significance of the chat show genre as contributor to the project of feminist heterosexual politics • Information Systems Development • Critical Commentary: p180, from "Le jour je m'égarais..." to "le démon de mon coeur". • “The expenditure of National Lottery funds on the arts in Britain cannot be convincingly defended”. Discuss.

  3. Assessed student writing

  4. Text Encoding Issues General issues: • A first stage of BAWE mark-up • Dimensions • Interactive tagging Specific questions: • Text hierarchy • Formulae

  5. A first stage of BAWE mark-up • shift in document format DOC → XML: TEI standard • formatting: preserve information • automatic vs. manual steps of annotation

  6. Dimensions of mark-up • Text hierarchy: front, body, back; sections; paragraphs; “s-units” • Text flow: highlighting; lists; figures; tables; formulae; block quotes

  7. Interactive tagging • Tagging by clicking: • graphical interface • quick tagging • reduce errors • impose coherence

  8. Interactive tagging

  9. What goes into <front> vs. <body>? • Example of two first pages:

  10. Encoding of example pages
      <front>
        <titlePage>
          <docTitle>
            <titlePart type="main">Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</titlePart>
            <titlePart>xxx</titlePart>
          </docTitle>
          <figure id="BAWE_3016a-pic1"/>
        </titlePage>
      </front>
      <body>

      <front>
        <docTitle>
          <titlePart type="main" rend="underline">Discuss the handling of the discourses of religion and the effects of religious and ethical change in the Victorian period</titlePart>
        </docTitle>
      </front>
      <body>
      • Anthropology vs. English Studies assignment
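As a rough illustration of how front matter encoded this way could be read back, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The wrapping `<text>` element, the empty `<body/>`, and the function name `main_title` are assumptions added so the fragment parses on its own; they are not part of the slide.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the first example on the slide. The <text> wrapper
# and the empty <body/> are assumptions added only so the fragment parses.
SAMPLE = """<text>
<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</titlePart>
      <titlePart>xxx</titlePart>
    </docTitle>
    <figure id="BAWE_3016a-pic1"/>
  </titlePage>
</front>
<body/>
</text>"""


def main_title(tei_text: str) -> str:
    """Return the main title from <front> as plain text (hypothetical helper).

    Inline <hi> highlighting is flattened and whitespace is normalised.
    """
    root = ET.fromstring(tei_text)
    front = root.find("front")
    if front is None:
        return ""
    part = front.find(".//titlePart[@type='main']")
    if part is None:
        return ""
    # itertext() yields the element's text including text inside child <hi>.
    return " ".join("".join(part.itertext()).split())


if __name__ == "__main__":
    print(main_title(SAMPLE))
```

Because the lookup searches anywhere below `<front>`, the same helper would also find the title in the second example, where `<docTitle>` sits directly under `<front>` without a `<titlePage>`.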

  11. Formulae • equations (and all kinds of variations of =) • chemical formulae • arithmetic expressions • logical expressions • expressions following some other discipline-specific formalism (e.g. computer code, phonetic transcription etc.) • a part ("term") of any of these (if non-NL)

  12. Insert empty <formula> tag • anything that has been inserted with the MS formula editor (appears as a "field"); • any complex formal expression, i.e. that cannot be represented as a simple sequence of characters (e.g. fraction, square root) 0 I(∆s) = Q • any formal expression separated typographically from running text (new paragraph)
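The three criteria on this slide can be read as a simple "any of these" rule. The following Python sketch expresses that reading; the `Expression` record, its field names, and both function names are hypothetical and only serve the illustration, and in practice the judgement was presumably made by an annotator rather than by code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Expression:
    """Hypothetical description of a formal expression found in a document."""
    from_formula_editor: bool = False  # inserted with the MS formula editor (a "field")
    linear: bool = True                # representable as a simple sequence of characters
    own_paragraph: bool = False        # typographically separated from running text


def needs_empty_formula(expr: Expression) -> bool:
    """Apply the slide's criteria: any one of them triggers an empty tag."""
    return expr.from_formula_editor or not expr.linear or expr.own_paragraph


def empty_formula(formula_id: str) -> str:
    """Serialise an empty <formula> element with the attributes used on the slides."""
    elem = ET.Element("formula", {"notation": "", "id": formula_id})
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    fraction = Expression(linear=False)
    if needs_empty_formula(fraction):
        print(empty_formula("EC0001-form2"))
```

Expressions that fail all three tests (for example a short inline term such as a single variable) would stay in the running text and be marked up with highlighting instead.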

  13. Example
      ... The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as $S_t^n = R_t^n - r_t$. If we manipulate equation 1, the yield spread, $S_t^n$, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows:
      $S_t^n = E_t S_t^{n*}$
      $S_t^{n*} = (1/n)\,[(n-1)\Delta r_{t+1} + (n-2)\Delta r_{t+2} + \dots + \Delta r_{t+(n-1)}]$   [2]

      <p><s>...</s>
      <s>The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> = R<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> – r<hi rend="italic"><hi rend="sub">t</hi></hi>.</s>
      <s>If we manipulate equation 1, the yield spread, S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi>, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows:</s></p>
      <p><formula notation="" id="EC0001-form2"/></p>
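The inline symbols in this example follow a regular pattern: the base letter stays in plain text, and the superscript and subscript are nested inside an italic `<hi>`. A minimal Python sketch of that pattern is given below; the function name `symbol` and its parameters are assumptions for illustration, not a tool from the project.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def symbol(base: str, sup: Optional[str] = None, sub: Optional[str] = None) -> str:
    """Return a base letter followed by nested <hi> markup (hypothetical helper).

    Mirrors the pattern in the example: the base letter is left unmarked,
    and superscript/subscript sit inside an outer <hi rend="italic">.
    """
    if sup is None and sub is None:
        return base
    outer = ET.Element("hi", {"rend": "italic"})
    for rend, value in (("sup", sup), ("sub", sub)):
        if value is not None:
            inner = ET.SubElement(outer, "hi", {"rend": rend})
            inner.text = value
    return base + ET.tostring(outer, encoding="unicode")


if __name__ == "__main__":
    # Rebuild the inline part of "S = R – r" from the example sentence.
    print(symbol("S", "n", "t"), "=", symbol("R", "n", "t"), "–", symbol("r", sub="t"))
```

Only simple, linear terms like these are encoded inline; the displayed equation [2] is replaced by the empty `<formula>` element shown at the end of the example.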

  14. Principles of mark-up • Keep the structure of the document as close to the original as possible • Mark up elements relevant to our research • Should be cost effective

  15. Text Encoding Issues • Signe Oksefjell Ebeling, sebeling@brookes.ac.uk • Alois Heuboeck, a.heuboeck@reading.ac.uk
