AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY

AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY. R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols. Institute of Phonetic Sciences / ACLC University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands tel: +31 20 5252183; fax: +31 20 5252197 email: Rob.van.Son@hum.uva.nl


Presentation Transcript


  1. AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY
     R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols
     Institute of Phonetic Sciences / ACLC, University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands
     tel: +31 20 5252183; fax: +31 20 5252197; email: Rob.van.Son@hum.uva.nl
     ICSLP2000, Beijing, China, Oct. 20, 2000

  2. INTRODUCTION
     • Speech is "efficient":
       Important components are emphasized
       Less important ones are de-emphasized
     • Two mechanisms:
       1) Prosody: Lexical Stress and Sentence Accent (Prominence)
       2) Predictability: Frequency of Occurrence (tested) and Context (not tested)

  3. MECHANISMS FOR EFFICIENT SPEECH
     Speech emphasis should mirror importance, which largely corresponds to unpredictability
     • Prosodic structure distributes emphasis according to importance (lexical stress, sentence accent / prominence)
     • Speakers can (de-)emphasize according to supposed (un)importance
     • Speech production mechanisms can facilitate redundant speech or hamper unpredictable speech

  4. QUESTIONS
     • Can the distribution of emphasis or reduction be completely explained from Prosody (Lexical Stress and Sentence Accent / Prominence)?
     • If not, can we identify a speech production mechanism that would assist efficiency in speech, e.g. preprogrammed articulation of redundant and/or high-frequency syllable-like segments?

  5. SPEECH MATERIAL (DUTCH)
     • Single Male Speaker: Vowels and Consonants
       Matched Informal and Read speech, 791 matched VCV pairs
     • Polyphone: Vowels only
       273 speakers (out of 5000), telephone speech, 1244 read sentences
       Segmented with a modified HMM recognizer (Xue Wang)
     • Corpora sizes: number of realizations of vowels and consonants

                                             Unstressed            Stressed
       Corpus                        Accent –  Accent +  Accent –  Accent +     Total
       Single Speaker, consonants         550       180       569       283      1582
       Single Speaker, vowels             812       461       528       224      2025
       Polyphone, vowels                 4435      4942      9603      3516     22496

     • Accent: Sentence accent / Prominence
     • Stressed/Unstressed: Lexical stress

  6. METHODS: SPEECH PREPARATION
     • Single speaker corpus
       • All 2 x 791 VCV segments hand-labeled
       • Sentence accent also determined by hand
       • 22 native listeners identified consonants from this corpus
     • Polyphone corpus
       • Automatically labeled using a pronunciation lexicon and a modified HMM recognizer
       • 10 judges marked prominent words (prominence 1-10)
     • Word and syllable -log2(frequencies) for both corpora were determined from Dutch CELEX
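
The -log2(frequency) measure used for words and syllables can be sketched as follows. This is a minimal illustration, assuming the frequency is a relative frequency (occurrence count divided by corpus size); the function name `information_bits` and the counts are hypothetical and are not taken from CELEX.

```python
import math


def information_bits(count: int, total: int) -> float:
    """Return -log2(count / total): rarer items get larger values (in bits)."""
    if count <= 0 or total <= 0 or count > total:
        raise ValueError("need 0 < count <= total")
    return -math.log2(count / total)


# Hypothetical counts for illustration only (not CELEX data).
CORPUS_SIZE = 1_048_576  # 2**20 tokens, chosen so the results are round numbers
frequent_item = information_bits(1024, CORPUS_SIZE)  # 1 in 2**10 tokens
rare_item = information_bits(1, CORPUS_SIZE)         # 1 in 2**20 tokens
print(frequent_item, rare_item)
```

On this scale a high value means a rare, less predictable item, so a positive correlation with duration would mean that rarer items are realized longer.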

  7. METHODS: ANALYSIS
     Single Speaker Corpus: Consonants and Vowels
     • Duration in ms (vowels and consonants)
     • Contrast (vowels only): F1 / F2 distance to (300, 1450) Hz in semitones
     • Spectral Center of Gravity (CoG) (V and C): weighted mean frequency in semitones at the point of maximum energy
     • Log2(Perplexity) from consonant identification: calculated from confusion matrices
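
Two of the measures above can be sketched in a few lines. The slide does not say how the F1 / F2 distance is computed or how perplexity is derived from the confusion matrices, so the code below rests on two assumptions: the contrast is a Euclidean distance on semitone-scaled F1 and F2 axes, and log2(perplexity) is the entropy (in bits) of the listener responses to one stimulus, i.e. one row of a confusion matrix. All function names are hypothetical.

```python
import math

# Reference point (F1, F2) in Hz, as given on the slide.
REF_F1_HZ = 300.0
REF_F2_HZ = 1450.0


def semitones(freq_hz: float, ref_hz: float) -> float:
    """Interval between freq_hz and ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(freq_hz / ref_hz)


def vowel_contrast(f1_hz: float, f2_hz: float) -> float:
    """Distance to (300, 1450) Hz in semitones.

    Assumption: Euclidean distance on semitone-scaled F1 and F2 axes.
    """
    return math.hypot(semitones(f1_hz, REF_F1_HZ), semitones(f2_hz, REF_F2_HZ))


def log2_perplexity(response_counts: list[int]) -> float:
    """Entropy in bits of one confusion-matrix row.

    Perplexity is 2**entropy, so log2(perplexity) equals the entropy.
    Assumption: it is computed per stimulus from the response counts.
    """
    total = sum(response_counts)
    if total <= 0:
        raise ValueError("need at least one response")
    return -sum(
        (c / total) * math.log2(c / total) for c in response_counts if c > 0
    )


# A vowel with F1 one octave above the reference and F2 at the reference.
print(vowel_contrast(600.0, 1450.0))
# 22 hypothetical listeners split evenly between two consonants.
print(log2_perplexity([11, 11, 0]))
```

A consonant that every listener identifies the same way has log2(perplexity) 0; more confusable realizations score higher.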

  8. METHODS: ANALYSIS
     Polyphone Corpus: Vowels only
     • Loudness in sone
     • Spectral Center of Gravity (CoG): weighted mean frequency in semitones, averaged over the segment
     • Prominence (1-10): the number of 'PROMINENT' listener judgements
       0–5 is considered Unaccented
       6–10 is considered Accented
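
The CoG and the accent split above can be illustrated as follows. This is only a sketch under stated assumptions: the weights are taken to be spectral power per frequency bin, the mean is taken over semitone-scaled frequencies, and the reference frequency `ref_hz` is an arbitrary choice; none of these details are given on the slide. The function names and the frame representation are hypothetical.

```python
import math


def cog_semitones(freqs_hz, powers, ref_hz=1.0):
    """Power-weighted mean of bin frequencies, in semitones re ref_hz.

    Assumptions: weights are spectral power and averaging is done on the
    semitone scale; ref_hz is an arbitrary reference.
    """
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weighted = sum(
        p * 12.0 * math.log2(f / ref_hz) for f, p in zip(freqs_hz, powers)
    )
    return weighted / total


def segment_cog(frames, ref_hz=1.0):
    """Mean of the per-frame CoG values over one segment.

    frames is a list of (freqs_hz, powers) pairs, one per analysis frame.
    """
    values = [cog_semitones(f, p, ref_hz) for f, p in frames]
    return sum(values) / len(values)


def is_accented(prominent_votes: int) -> bool:
    """Slide rule: 0-5 'PROMINENT' judgements is unaccented, 6-10 accented."""
    if not 0 <= prominent_votes <= 10:
        raise ValueError("expected a count between 0 and 10")
    return prominent_votes > 5


# Toy two-bin spectrum with equal power at 100 Hz and 400 Hz, re 100 Hz.
print(cog_semitones([100.0, 400.0], [1.0, 1.0], ref_hz=100.0))
print(is_accented(5), is_accented(6))
```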

  9. CONSISTENCY OF MEASUREMENTS
     Correlation coefficients between factors
     [Figure: Single Speaker consonants (n=1582): Duration x CoG, Duration x Px, CoG x Px; Single Speaker vowels (n=2025): Duration x Contrast, Duration x CoG, Contrast x CoG; Polyphone (n=22496): Loudness x CoG. Filled symbols: p <= 0.01]
     • Duration in ms
     • Loudness in sones
     • CoG: Spectral Center of Gravity (semitones)
     • Px: log2(Perplexity), plotted is –R
     • Contrast: F1 / F2 distance to (300, 1450) Hz (semitones)

  10. CONSONANT REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
      Single speaker corpus (n=1582)
      [Figure: correlation coefficients for Duration, CoG, and Perplexity. Filled symbols: p <= 0.01]
      • CoG: Spectral Center of Gravity (semitones)
      • Perplexity: log2(Perplexity), plotted is –R
      • Syllable and word frequencies were correlated (R=0.230, p=0.01)

  11. VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
      Single speaker corpus (n=2025)
      Filled symbols: p <= 0.01
      • Duration in ms
      • Contrast: F1 / F2 distance to (300, 1450) Hz (semitones)
      • CoG: Spectral Center of Gravity (semitones)
      • Syllable and word frequencies were correlated (R=0.280, p<=0.01)

  12. DISCUSSION OF SINGLE SPEAKER DATA
      • There are consistent correlations between frequency of occurrence and "acoustic reduction" (duration, CoG and contrast), but not for consonant identification (perplexity)
      • Correlations for syllable frequencies tend to be larger than those for word frequencies (p <= 0.01)
      • Correlations were found after accounting for Phoneme identity, Lexical Stress and Sentence Accent
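
The slides do not state how phoneme identity, lexical stress and sentence accent were accounted for. One common way to do this, shown here purely as an assumed sketch, is to subtract the mean of each (phoneme, stress, accent) class from both variables and correlate what remains. The function names and the data are hypothetical.

```python
import math
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract from each value the mean of its own group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def within_group_correlation(acoustic, frequency_bits, groups):
    """Correlate two measures after removing group means from both."""
    return pearson_r(
        center_within_groups(acoustic, groups),
        center_within_groups(frequency_bits, groups),
    )


# Hypothetical data: durations (ms), -log2(frequency), and class labels
# of the form (phoneme, lexically stressed, sentence accent).
durations = [50.0, 60.0, 70.0, 100.0, 110.0, 120.0]
freq_bits = [8.0, 10.0, 12.0, 3.0, 5.0, 7.0]
classes = [("t", True, False)] * 3 + [("s", False, False)] * 3
print(within_group_correlation(durations, freq_bits, classes))
```

In this toy data the raw correlation across all six tokens is negative, because the class with longer durations happens to contain more frequent items, while the within-class correlation is positive. That is the kind of confound that accounting for the prosodic factors is meant to remove.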

  13. PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY OF OCCURRENCE (correlation coefficients)
      Polyphone corpus (n=22496)
      [Figure: correlation of prominence with Loudness, CoG, Syllable freq., and Word freq. Filled symbols: p <= 0.01]
      • Loudness (sone)
      • CoG: Spectral Center of Gravity (semitones)
      • Syllable and word frequencies (-log2(freq))

  14. VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
      Polyphone corpus (n=22496)
      Filled symbols: p <= 0.01
      Accent: + Prom > 5, – Prom <= 5
      • Loudness (sone)
      • CoG: Spectral Center of Gravity (semitones)
      • Syllable and word frequencies were correlated (R=0.316, p<=0.01)

  15. DISCUSSION OF POLYPHONE DATA
      • Perceived prominence correlates with "acoustic vowel reduction" (loudness, CoG) and frequency of occurrence (syllable and word)
      • There are small but consistent correlations between "acoustic vowel reduction" and frequency of occurrence
      • Correlations were found after accounting for Vowel identity, Lexical Stress and Prominence

  16. CONCLUSIONS
      • LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE cannot explain all of the "efficiency" of speech: FREQUENCY OF OCCURRENCE and possibly CONTEXT in general are needed for a full account
      • A SYLLABARY which speeds up (and reduces) the articulation of "stored", high-frequency syllables with respect to "computed", rare syllables might explain at least part of our data

  17. SPOKEN LANGUAGE CORPUS
      How Efficient is Speech
      • 8-10 speakers: ~60 minutes of speech each (fixed and variable materials)
        • Informal story telling and retold stories ~15 min
        • Reading continuous texts ~15 min
        • Reading isolated (pseudo-)sentences ~20 min
        • Word lists ~5 min
        • Syllable lists ~5 min

  18. MEASURING SPEECH EFFICIENCY
      • Speaking Style differences (Informal, Retold, Read, Sentences, Lists)
      • Predictability
        • Frequency of Occurrence (words and syllables)
        • In Context (language models)
        • Cloze-tests
        • Shadowing (RT or delay)
      • Acoustic Reduction
        • Segment identification
        • Duration
        • Spectral reduction