Investigating speech, thought and writing presentation in a corpus of spoken British English

An AHRB-funded project under the supervision of Mick Short, Elena Semino and Tony McEnery
Research Assistants: John Heywood and Dan McIntyre

Project outline
• To compare speech, thought and writing presentation in spoken and written English.
• To build a new corpus of 260,000 words of spoken British English to compare with the ST&WP Written English Corpus (1995-99).
• To investigate the presentation of speech, thought and writing in the ST&WP Spoken Corpus by tagging with the Leech and Short (1981) category set.
• To further test and adapt the Leech and Short (1981) model of S&TP.
• The project is funded until February 2003.

Construction of the corpus
• 120 texts, approximately 260,000 words.
• Texts rich in ST&WP taken from the British National Corpus (BNC) and the Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.
• CNWRS interview tapes digitised to be time-aligned with text.

Number and distribution of NWRS files in the corpus
NWRS Archive
Family and Social Life Archive, Childhood and Schooling Archive
Male, Female, Male, Female
1890-1940, 1940-1970, 1890-1940, 1940-1970
7 records, 7 records, 8 records, 8 records, 15 records, 15 records
i.e. 60 files with an equal balance of male and female speakers in each age-range

Number and distribution of BNC files in the corpus
BNC spoken data: Spoken Demographic and Spoken Context-Governed

Age range   Male      Female
0-14        5 files   5 files
15-24       5 files   5 files
25-34       5 files   5 files
35-44       5 files   5 files
45-59       5 files   5 files
60+         5 files   5 files

i.e. 60 files with an equal balance of male and female speakers in each age-range
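The arithmetic behind the BNC sample can be checked with a short sketch. It assumes the design is a plain cross of the two sexes with the six age bands, at five files per cell, as the slide lists them. The function names are hypothetical and are not part of the project's tooling.

```python
from itertools import product

# Strata as listed on the slide: two sexes crossed with six age bands.
SEXES = ["Male", "Female"]
AGE_BANDS = ["0-14", "15-24", "25-34", "35-44", "45-59", "60+"]
FILES_PER_CELL = 5  # "5 files" for every sex/age-band cell on the slide


def sampling_grid(sexes, age_bands, files_per_cell):
    """Map each (sex, age band) cell to its file quota (hypothetical helper)."""
    return {(sex, band): files_per_cell for sex, band in product(sexes, age_bands)}


def total_files(grid):
    """Sum the quotas over all cells."""
    return sum(grid.values())


def is_balanced_by_sex(grid):
    """True when every sex contributes the same number of files overall."""
    totals = {}
    for (sex, _band), n in grid.items():
        totals[sex] = totals.get(sex, 0) + n
    return len(set(totals.values())) == 1


grid = sampling_grid(SEXES, AGE_BANDS, FILES_PER_CELL)
print(len(grid), "cells,", total_files(grid), "files")
print("balanced by sex:", is_balanced_by_sex(grid))
```

Under these assumptions the grid has 12 cells and 60 files, which agrees with the total stated on the slide.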

The development of the tag-set
Leech & Short (1981)
The ST&WP Written Project (1995…)
3 main genres: Fiction, Biography & Autobiography, and Newspaper Journalism, each divided into Serious/Popular sections.
embedded, hypothetical, inferred, quote

The development of the tag-set – new tags
The ST&WP Spoken Project (2001)
BNC spoken demographic data and NWRS oral history interviews
embedded, negative / absence, hypothetical, inferred, quote, reiterated, interrogative, imperative, uncompleted, 2 / 3 / 4
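One way to see what the Spoken Project added is to compare the two tag lists directly. The sketch below copies the lists from the two slides and treats "2 / 3 / 4" as a single entry, which is an assumption; the slides do not say whether it is one tag or three. The function name is hypothetical.

```python
# Tag lists copied from the slides; "2 / 3 / 4" is kept as one entry (assumption).
WRITTEN_TAGS = ["embedded", "hypothetical", "inferred", "quote"]
SPOKEN_TAGS = [
    "embedded", "negative / absence", "hypothetical", "inferred", "quote",
    "reiterated", "interrogative", "imperative", "uncompleted", "2 / 3 / 4",
]


def new_tags(old, new):
    """Return the tags in `new` that are absent from `old`, keeping slide order."""
    seen = set(old)
    return [tag for tag in new if tag not in seen]


added = new_tags(WRITTEN_TAGS, SPOKEN_TAGS)
print("carried over:", [t for t in SPOKEN_TAGS if t in WRITTEN_TAGS])
print("new in the spoken project:", added)
print("total spoken tags:", len(SPOKEN_TAGS))
```

Counted this way the spoken list has ten entries, which would be consistent with the "10 category attributes" slide if that slide refers to this list.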

A 15-field tag-set: 5 main categories

A 15-field tag-set: 10 category attributes

Issues arising
• Technical issues:
  • Legibility.
  • Comparability between NWRS and BNC data.
• Tagging issues:
  • Comparability between written and spoken corpora.
  • What counts as ST&WP?
  • Functional and formal criteria.
  • Embedding.
  • Repetition (e.g. he said he said well he said).
  • Report of ‘mention’.
  • Reading, hearing, listening and singing dogs!
