Investigating speech, thought and writing presentation in a corpus of spoken British English
An AHRB funded project under the supervision of Mick Short, Elena Semino and Tony McEnery
Research Assistants John Heywood and Dan McIntyre
Project outline
• To compare speech, thought and writing presentation in spoken and written English.
• To build a new corpus of 260,000 words of spoken British English to compare with the ST&WP Written English Corpus (1995-99).
• To investigate the presentation of speech, thought and writing in the ST&WP Spoken Corpus by tagging with the Leech and Short (1981) category set.
• To further test and adapt the Leech and Short (1981) model of S&TP.
• The project is funded until February 2003.
Construction of the corpus
• 120 texts - approximately 260,000 words.
• Texts rich in ST&WP taken from the British National Corpus (BNC) and the Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.
• CNWRS interview tapes digitised to be time-aligned with text.
Number and distribution of NWRS files in the corpus
NWRS Archive
• Archives: Family and Social Life Archive; Childhood and Schooling Archive.
• Speakers: Male; Female.
• Date ranges: 1890-1940; 1940-1970.
• Records per cell, in the order listed: 7, 7, 8, 8, 15, 15.
i.e. 60 files with an equal balance of male and female speakers in each age-range
Number and distribution of BNC files in the corpus
BNC spoken data: Spoken Demographic; Spoken Context-Governed

Age range   Male      Female
0-14        5 files   5 files
15-24       5 files   5 files
25-34       5 files   5 files
35-44       5 files   5 files
45-59       5 files   5 files
60+         5 files   5 files

i.e. 60 files with an equal balance of male and female speakers in each age-range
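The file counts on the two distribution slides can be checked against the 120 texts stated under "Construction of the corpus". The following is a minimal sketch of that arithmetic only; the variable names are illustrative, and the NWRS counts are simply listed in slide order without assuming which archive, sex or date range each cell belongs to.

```python
# Cell counts copied from the two distribution slides.
# NWRS: six cells in the order they appear on the slide; the mapping of
# each count to an archive / sex / date range is NOT assumed here.
nwrs_records = [7, 7, 8, 8, 15, 15]

# BNC: 2 sexes x 6 age ranges, 5 files per cell.
bnc_age_ranges = ["0-14", "15-24", "25-34", "35-44", "45-59", "60+"]
bnc_files = {
    (sex, age_range): 5
    for sex in ("male", "female")
    for age_range in bnc_age_ranges
}

nwrs_total = sum(nwrs_records)
bnc_total = sum(bnc_files.values())
corpus_total = nwrs_total + bnc_total

# Per-sex totals for the BNC half, to confirm the male/female balance.
bnc_by_sex = {
    sex: sum(n for (s, _), n in bnc_files.items() if s == sex)
    for sex in ("male", "female")
}

print(nwrs_total, bnc_total, corpus_total)  # 60 60 120
print(bnc_by_sex)
```

Both halves come to 60 files, giving the 120 texts quoted for the corpus as a whole.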
The development of the tag-set
Leech & Short (1981)
The ST&WP Written Project (1995…)
• 3 main genres: Fiction, Biography & Autobiography, and Newspaper Journalism, each divided into Serious/Popular sections.
• embedded, hypothetical, inferred, quote
The development of the tag-set – new tags
The ST&WP Spoken Project (2001)
• BNC spoken demographic data and NWRS oral history interviews
• embedded, negative / absence, hypothetical, inferred, quote, reiterated, interrogative, imperative, uncompleted, 2 / 3 / 4
Issues arising
• Technical issues:
  • Legibility.
  • Comparability between NWRS and BNC data.
• Tagging issues:
  • Comparability between written and spoken corpora.
  • What counts as ST&WP?
  • Functional and formal criteria.
  • Embedding.
  • Repetition (e.g. he said he said well he said).
  • Report of ‘mention’.
  • Reading, hearing, listening and singing dogs!