A Corpus Study of Native and Non-native Accented Speech

A Corpus Study of Native and Non-native Accented Speech Chen-huei Wu Department of Linguistics, UIUC

Outline of the presentation • Rationale and research questions • Literature review • Corpus and methodology • Future analysis

Where does the impression of accent come from? • On the production side, second language (L2) learners usually have some difficulty acquiring native-like speech. • On the perception side, it is interesting that humans can recognize accents, and can further recognize accents close to their native languages.

Where does the impression of accent come from? • What are the speech patterns of accented speech? • What influences the perception of accents? • What matters more: mispronunciation of vowels and consonants, grammar errors, word choice, or speech rhythm?

This study • Vowel production in Mandarin as an L2 • Spontaneous speech • Corpus study • Perception of accentedness

Research Questions • What are the speech patterns of vowel production by L1 and L2 speakers in conversational speech? • How is vowel production similar and different between L1 and L2 speakers? • If the vowel production of L2 learners is off-target, does this affect all vowels or only some of them? • How do native listeners perceive L2 accents? • Do native listeners tolerate mispronunciation in some areas but not others?

Goals • To find the hidden reasons behind the impression of accent • To identify critical acoustic variables that affect native listeners’ perceptions of accented speech

Literature Review • Segmental level: phone acquisition (Anderson-Hsieh et al., 1992; Koster and Koet, 1993; Flege, 1995) • Suprasegmental level: stress timing, peak alignment, speech rate, pause frequency and pause duration (Munro, 1995; Trofimovich and Baker, 2006)

This Study • Acoustic analysis and perceptual rating: native speakers of Chinese listen to speech samples and judge the level of accentedness in the speech they hear. • Three speech communities: native speakers of Chinese, heritage speakers, and learners of Chinese

Chinese learners' spontaneous speech corpus • More than 150 hours of speech • More than 100 speakers • 63 hours of turn-marking • 48 hours of transcriptions • 700 disfluency labels (4 hours of speech) • 5200 phone labels (20 minutes of speech)

Chinese learners' spontaneous speech corpus • Two types of classroom activities (Shih, 2006): variety show; debate • Speech styles: casual speech; prepared speech

Speech Material: the variety show, from Fall 2005 to Spring 2008

Procedure • Classroom recording (video + audio) • Annotation of speaker turn • Transcription • Random selection • Extracted audio files • Automated phone segmentation • Manual checking • Data analysis • Perceptual rating • Data analysis • Data analysis

Annotation of speaker turn • The criteria for speaker turn-marking • No long overlapping speech • No long silent pause • No continuous laughter or clapping • However, there may still be some noise in turn-marked utterances

Random Selection • The utterances for each speaker are filtered by the following criteria: • At most 30 seconds long • At least 15 seconds long • Utterances are then randomly selected from the filtered set • The one-minute speech sample for each speaker therefore consists of 2-4 utterances.
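The selection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_utterances`, the fixed random seed, and the stopping rule (keep drawing until roughly 60 seconds are collected) are hypothetical and not taken from the slides, which do not say how the total is capped at one minute.

```python
import random


def select_utterances(durations, target=60.0, min_len=15.0, max_len=30.0, seed=0):
    """Randomly pick 15-30 s utterances until about `target` seconds are collected.

    `durations` is a list of utterance lengths in seconds for one speaker.
    Returns the chosen indices and their total duration.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible (assumption)

    # Filter step from the slide: at least 15 s and at most 30 s long.
    eligible = [(i, d) for i, d in enumerate(durations) if min_len <= d <= max_len]

    # Random selection from the filtered utterances.
    rng.shuffle(eligible)

    chosen, total = [], 0.0
    for idx, dur in eligible:
        if total >= target:  # assumed stopping rule: stop once ~1 minute is reached
            break
        chosen.append(idx)
        total += dur
    return chosen, total


durations = [10, 20, 25, 30, 40, 15, 18, 29, 16]
chosen, total = select_utterances(durations)
print(chosen, total)
```

Because every eligible utterance lasts between 15 and 30 seconds, reaching one minute under this rule always takes between two and four utterances, which matches the 2-4 range given on the slide.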

Automated Phone Segmentation • Jiahong Yuan's aligner • The Penn Phonetics Lab Forced Aligner (Yuan and Liberman, 2008) (http://wms-609.sas.upenn.edu/research/alignment/align.htm) • The toolkit includes • models: the acoustic models, parameter files, and CMU pronunciation dictionary • align.py: a Python script that automates forced alignment

Automated Phone Segmentation • Yuan’s aligner modified by Shih • In addition to the English dictionary, two Chinese dictionaries were added: • Master.big5.darpa: Chinese character, Pinyin, pronunciation, e.g. 包 BAO B AW1 • Pinyindict: Pinyin, pronunciation, e.g. BAO B AW1
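To make the two dictionary layouts concrete, here is a minimal Python sketch that reads entries shaped like the slide's examples. The whitespace-separated layout is inferred only from those two examples, and the helper names `parse_dict_line` and `build_lexicon` are hypothetical; the actual files used with the aligner may be formatted differently.

```python
def parse_dict_line(line, has_character=False):
    """Split one dictionary entry into (character, pinyin, phones).

    With has_character=True the line looks like "包 BAO B AW1";
    otherwise it looks like "BAO B AW1" and character is None.
    """
    fields = line.split()
    if has_character:
        return fields[0], fields[1], fields[2:]
    return None, fields[0], fields[1:]


def build_lexicon(lines, has_character=False):
    """Map each headword to its list of pronunciations (each a list of phones)."""
    lexicon = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        character, pinyin, phones = parse_dict_line(line, has_character)
        key = character if has_character else pinyin
        lexicon.setdefault(key, []).append(phones)
    return lexicon


char_lexicon = build_lexicon(["包 BAO B AW1"], has_character=True)
pinyin_lexicon = build_lexicon(["BAO B AW1"])
print(char_lexicon["包"], pinyin_lexicon["BAO"])
```

Keeping a list of pronunciations per headword leaves room for characters or syllables with more than one reading, although the slides do not say whether the dictionaries contain any.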

Test on laboratory speech

Perceptual rating on accentedness • Untrained and linguistically naïve raters

Perceptual rating on accentedness • Questions • 1. Is the speaker a native speaker or not? Native / Non-native • 2. How accented is the speech? 1 2 3 4 (4: accented; 1: no accent) • 3. How fluent is the speech? 1 2 3 4 (4: fluent; 1: not fluent) • 4. How comprehensible is the speech? 1 2 3 4 (4: comprehensible; 1: not comprehensible)
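One way responses to these four questions might be summarized per speaker is sketched below. The record layout (keys `speaker`, `native`, `accent`, `fluency`, `comprehensibility`) and the function `summarize_ratings` are assumptions made for illustration; the slides do not describe how the ratings are stored or aggregated.

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(responses):
    """Average the 1-4 ratings per speaker and compute the share of 'native' judgments.

    Each response is a dict with the hypothetical keys:
    'speaker', 'native' (bool), 'accent', 'fluency', 'comprehensibility'.
    """
    by_speaker = defaultdict(list)
    for r in responses:
        by_speaker[r["speaker"]].append(r)

    summary = {}
    for speaker, rs in by_speaker.items():
        summary[speaker] = {
            "judged_native": sum(r["native"] for r in rs) / len(rs),
            "accent": mean(r["accent"] for r in rs),
            "fluency": mean(r["fluency"] for r in rs),
            "comprehensibility": mean(r["comprehensibility"] for r in rs),
        }
    return summary


responses = [
    {"speaker": "S1", "native": False, "accent": 4, "fluency": 2, "comprehensibility": 3},
    {"speaker": "S1", "native": True, "accent": 2, "fluency": 4, "comprehensibility": 3},
]
print(summarize_ratings(responses))
```

Simple means are used here only for brevity; with a four-point ordinal scale, medians or rating distributions may be more appropriate.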


Contribution • To provide the missing link between quantitative acoustic analysis and the impression of accent.

Thank you!
