Building a corpus to investigate the presentation of speech, thought and writing in Spoken British English
Dan McIntyre, John Heywood, Tony McEnery, Elena Semino and Mick Short
Department of Linguistics and Modern English Language, Lancaster University, UK
PALC 2003
Aims of the project
• To investigate the forms and functions of speech, thought and writing presentation (ST&WP) in spoken data.
• To compare the presentation of ST&WP in a corpus of spoken data with the findings from an equivalent corpus of written texts.
• To further test the model of speech and thought presentation outlined in Leech and Short (1981).
What is speech, thought and writing presentation?
• Prototypically, the presentation in a posterior discourse of what was said, thought or written in a (supposed) anterior discourse.
Speaker’s words
• Direct speech
  [DS] ‘Shut up, you silly old fool,’ [RS] she said.
• Indirect speech
  [RS] She told him [IS] that he should shut up.
• Representation of a speech act
  [RSA] She commanded him.
Reporter’s words
Selecting the corpus data
• 120 transcripts, approximately 260,000 words.
• Texts taken from the British National Corpus (BNC) and the Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.
• CNWRS interview tapes digitised to be time-aligned with text by Softsound Ltd, Cambridge, UK.
• BNC sound files identified where possible.
The ST&WP categories
[Table: main categories]
ST&WP category features
[Table: category features]
Annotating the corpus for ST&WP
• We use the element <sptag> and mark the ST&WP category within the attribute cat.
• Tags designed for concordancing using WordSmith Tools.
• 15 fields to mark ST&WP categories.
• x used as a placeholder for empty positions.
<sptag cat="xxxxxxxxxxxxxxx">
element: sptag; attribute: cat; attribute value: xxxxxxxxxxxxxxx (fields 1 - 15)
Further examples of attribute values:
<sptag cat="FIW">
<sptag cat="xDSxxxh">
<sptag cat="xRSAxxghxxxxp">
Each character position in the attribute value corresponds to one field:
<sptag cat="xDS"> = <sptag one="x" two="D" three="S">
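The positional scheme described on these slides can be illustrated with a small Python sketch. It is not the project's own tooling (the slides mention only WordSmith Tools); the function names `expand_cat` and `filled_fields` are hypothetical. It also assumes that a shortened value such as "xDS" simply omits trailing placeholders, which is inferred from the examples rather than stated.

```python
from typing import Dict

NUM_FIELDS = 15    # the slides state that there are 15 fields
PLACEHOLDER = "x"  # "x" marks an empty position


def expand_cat(cat: str) -> str:
    """Pad a (possibly shortened) cat value with placeholders to 15 fields.

    Assumption: trailing placeholders may be omitted, as in cat="xDS".
    """
    if len(cat) > NUM_FIELDS:
        raise ValueError(f"cat value has more than {NUM_FIELDS} fields: {cat!r}")
    return cat.ljust(NUM_FIELDS, PLACEHOLDER)


def filled_fields(cat: str) -> Dict[int, str]:
    """Map each 1-based field position to its character, skipping empty ones."""
    return {
        position: char
        for position, char in enumerate(expand_cat(cat), start=1)
        if char != PLACEHOLDER
    }


if __name__ == "__main__":
    for value in ("FIW", "xDS", "xDSxxxh", "xRSAxxghxxxxp"):
        print(value, "->", filled_fields(value))
```

Under these assumptions, `filled_fields("xDS")` reports fields two and three as "D" and "S", which mirrors the one/two/three equivalence shown on the slide.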
A sample extract from a marked-up file
<sptag cat="A">Then they went to Hereford and there were Quakers there and </sptag>
<sptag cat="xRIxxxxxxi">he had a hard time of it</sptag>
<sptag cat="xRIxxxxxxi">they didn't like Catholics</sptag>
<sptag cat="A">and I can remember <note desc="S implied">they sent me</note> I was a manageress in the laundry here and <note desc="S implied">they sent me to Kendal</note> when we opened a laundry at Kendal and I was staying at a lodging in Kendal and the man was th they were Quakers and </sptag>
<sptag cat="xRSxx2">I said to the young lady, I said</sptag>
<sptag cat="xDS"> Would you mind if you made my dinner on Friday it doesn't matter if it's only bread and butter, but no meat, because we don't eat meat on a Friday or no bacon just bread anything plain it doesn't matter what it is but no meat</sptag>
<sptag cat="xRS">and the old man says</sptag>
<sptag cat="xDS">I'm sorry for thee</sptag>
<sptag cat="xRT">and I thought </sptag>
<sptag cat="xDT">oh</sptag>
<sptag cat="A">but he was a Quaker. Anyway</sptag>
<sptag cat="xRS">she says</sptag>
<sptag cat="xDS">shut up you silly old fool</sptag>
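As a rough illustration of how markup of this kind could be queried outside a concordancer, the following Python sketch pulls out each tagged stretch and counts the cat values. It uses regular expressions rather than WordSmith Tools, the names `tagged_spans` and `count_categories` are hypothetical, and it assumes that <sptag> elements are never nested inside one another, as is the case in the extract.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches one <sptag cat="..."> ... </sptag> element.
# Assumption: sptag elements are not nested, so a non-greedy match is safe.
SPTAG = re.compile(r'<sptag\s+cat\s*=\s*"([^"]*)"\s*>(.*?)</sptag>', re.DOTALL)


def tagged_spans(markup: str) -> List[Tuple[str, str]]:
    """Return (cat value, tagged text) pairs in document order."""
    return [(cat, text.strip()) for cat, text in SPTAG.findall(markup)]


def count_categories(markup: str) -> Counter:
    """Count how often each cat value occurs in the markup."""
    return Counter(cat for cat, _ in tagged_spans(markup))


# A short stretch taken from the end of the sample extract.
sample = (
    '<sptag cat="A">but he was a Quaker. Anyway</sptag>'
    '<sptag cat="xRS">she says</sptag>'
    '<sptag cat="xDS">shut up you silly old fool</sptag>'
)

if __name__ == "__main__":
    for cat, text in tagged_spans(sample):
        print(f"{cat:<6} {text}")
    print(count_categories(sample))
```

Any <note> elements inside a span are left in the returned text; stripping them would be a separate step.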
Preliminary results: comparative numbers and percentages of speech tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags
• Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RS = 2,774; Ambiguities = 1,149; ST&WP tags = 8,783
• Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588
Preliminary results: comparative numbers and percentages of thought tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags
• Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RT = 1,109; Ambiguities = 488; ST&WP tags = 8,783
• Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588
Preliminary results: comparative numbers and percentages of writing tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags
• Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RW = 145; Ambiguities = 295; ST&WP tags = 8,783
• Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588
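The percentages themselves are not preserved in this transcript, but the arithmetic behind such figures can be sketched in Python from the counts that are given. The slides do not make clear whether the denominator is the total tag count or the ST&WP tag count, so this sketch computes both for the Spoken Corpus; the helper name `share` is hypothetical, and only the RS, RT and RW counts shown on the slides are used.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts for the Spoken Corpus as reported on the slides.
SPOKEN_TOTAL_TAGS = 34_927
SPOKEN_STWP_TAGS = 8_783
spoken_counts = {"RS": 2_774, "RT": 1_109, "RW": 145}

if __name__ == "__main__":
    for tag, n in spoken_counts.items():
        print(
            f"{tag}: {share(n, SPOKEN_TOTAL_TAGS):.2f}% of all tags, "
            f"{share(n, SPOKEN_STWP_TAGS):.2f}% of ST&WP tags"
        )
```

Which of the two denominators corresponds to the figures on the original charts is left open here.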
Where next?
• Further refinement of ST&WP annotation.
• ST&WP and prosodic discontinuities (e.g. voice quality).
• Combination of quantitative and qualitative analyses.
• Comparison of findings from the two corpora.