
Speech acoustics and phonetics

Speech acoustics and phonetics. Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC). NATO-ASI “Dynamics of Speech Production and Perception”, Il Ciocco, Tuscany, Italy, July 1, 2002.


Presentation Transcript


  1. Speech acoustics and phonetics Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC) NATO-ASI “Dynamics of Speech Production and Perception” Il Ciocco, Tuscany, Italy, July 1, 2002

  2. Overview
     • Dynamics in speech acoustics
     • Contour modeling (mainly formants)
     • Aspects of spectral undershoot
     • Modeling V and C reduction
     • Phonetic knowledge from speech corpora
        • IFA, CGN, TIMIT, found speech
     • Conclusions
     Speech acoustics and phonetics, Il Ciocco

  3. Dynamics in speech acoustics
     • Dynamics is the norm, not stationarity
        • articulatory efficiency
     • Dynamics is everywhere
        • generally no word boundaries in speech
        • deletion of words, syllables, phonemes; insertion
        • within-word and between-word coarticulation/assimilation
        • vowel and consonant reduction
     • Acoustic manifestations
        • segment duration, F0, loudness, spectral quality

  4. Dynamics is the norm
     • The speaker speaks as sloppily as the listeners allow in communication
        • communicative efficiency
     • Articulatory vs. perceptual efficiency
        • do spectral transitions facilitate or hamper perception? → see other presentation
     • Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

  5. Dynamics is everywhere
     • Deletion
        • ‘bread and butter’ /brEmbY3/
        • ‘Amsterdam’ (Du) /Amst@rdAm/ → /Ams@dAm/
        • ‘koninklijke’ (Du) /konIŋkl@k@/ → /kol@k@/
     • Insertion
        • homorganic glide insertion: ‘die een’ (Du) /dij@n/
     • Degemination
        • ‘is zichtbaar’ (Du) /Is zIxtbar/ → /IsIxbar/
     • Reduction, coarticulation, assimilation

  6. Acoustic manifestations
     • pitch, loudness, formant, and component contours
     • contour stylization (e.g., pitch in Praat)
     • contour modeling
        • n-th degree curve fitting (D. van Bergem)
        • Legendre polynomials; 16 points per segment (R. van Son)
     • (phoneme) segmentation
        • by hand (time-consuming; inconsistent)
        • automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; prone to systematic errors)
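The two contour-modeling options above can be illustrated with a short NumPy sketch. It is not the procedure used by van Bergem or van Son: it assumes that the 16 points are equidistant in time and include the segment boundaries, that values between measured frames are linearly interpolated, and that the polynomial is fitted on a time axis normalized to [0, 1]. The function names and the F2 track in the demo are invented for illustration.

```python
import numpy as np


def resample_contour(times, values, n_points=16):
    """Time-normalize a contour to n_points equidistant samples.

    Assumption: samples run from segment onset to offset inclusive, and
    values between measured frames are linearly interpolated.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.linspace(times[0], times[-1], n_points)
    return np.interp(grid, times, values)


def fit_curve(times, values, degree=2):
    """Least-squares polynomial fit on time normalized to [0, 1].

    Coefficients are returned highest degree first, as np.polyfit does.
    """
    t = np.asarray(times, dtype=float)
    t_norm = (t - t[0]) / (t[-1] - t[0])
    return np.polyfit(t_norm, np.asarray(values, dtype=float), degree)


if __name__ == "__main__":
    # Invented F2 track (Hz): one frame every 5 ms over a 60 ms vowel.
    t = np.linspace(0.0, 0.060, 13)
    f2 = 1500.0 + 400.0 * np.sin(np.pi * t / 0.060)
    print(resample_contour(t, f2).round(1))
    print(fit_curve(t, f2, degree=2).round(1))
```

Resampling to a fixed number of points makes segments of different durations directly comparable, at the cost of discarding absolute timing.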

  7. Contour modeling
     • allows modeling of specific phenomena
        • pitch accentuation (vs. vowel onset)
        • reduction, centralization, undershoot
     • allows generation of stimuli for perception experiments
        • phoneme identification in extending context
        • two-alternative forced-choice identification of continua
        • discrimination, reaction time (RT)
     • allows statistics on large speech corpora
        • TIMIT, CGN, IFA corpus, Switchboard
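Generating stimuli from modeled contours can be sketched as a continuum builder: given two endpoint contours (for instance the 16-point F2 tracks of two vowel tokens), it returns equally spaced intermediate contours that could then be handed to a synthesizer. The slide does not say how the continua were made; linear interpolation in Hz, the default number of steps, and the name `make_continuum` are assumptions of this sketch.

```python
import numpy as np


def make_continuum(contour_a, contour_b, n_steps=7):
    """Return n_steps contours linearly interpolated from contour_a to contour_b.

    Row 0 equals contour_a and the last row equals contour_b. Interpolating
    in Hz (rather than, say, Bark) is an assumption of this sketch.
    """
    a = np.asarray(contour_a, dtype=float)
    b = np.asarray(contour_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("endpoint contours must have the same number of points")
    weights = np.linspace(0.0, 1.0, n_steps)[:, None]  # one weight per step
    return (1.0 - weights) * a + weights * b
```

Each row is one stimulus; interpolating in a perceptual scale instead would only require converting the endpoints before, and the result after, the call.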

  8. Static vs. dynamic vowel recognition
     • see Weenink (2001), “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123
     • 438 male speakers, both training and test sentences of TIMIT
     • 35,385 vowel segments, hand-segmented
     • 13 monophthongal vowel categories
     • 1-Bark band-filter analysis (18 bands), intensity normalization
     • 3 frames per segment: central, and 25 ms to the left and right
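The analysis settings listed for Weenink (2001) can be made concrete with a small sketch that computes the three frame times of a segment (its midpoint and 25 ms to either side) and the centre frequencies of 18 filters spaced one Bark apart. The Hz-to-Bark mapping used here, z = 7·asinh(f/650), is one common formula; whether it matches the original analysis is an assumption, as is placing the filter centres at 1 to 18 Bark.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Assumed Hz-to-Bark mapping: z = 7 * asinh(f / 650)."""
    return 7.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 650.0)


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark."""
    return 650.0 * np.sinh(np.asarray(z_bark, dtype=float) / 7.0)


def analysis_frame_times(seg_start, seg_end, offset=0.025):
    """Times (s) of the central frame and of frames `offset` s to its left and right.

    For segments shorter than 2 * offset the side frames fall outside the
    segment; this sketch does not guard against that.
    """
    centre = 0.5 * (seg_start + seg_end)
    return np.array([centre - offset, centre, centre + offset])


# Assumed filter layout: 18 band centres at 1, 2, ..., 18 Bark.
BAND_CENTRES_HZ = bark_to_hz(np.arange(1, 19))
```

Spacing filters evenly on a Bark scale gives narrow bands at low frequencies and wide ones at high frequencies, roughly following auditory frequency resolution.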

  9. Some results
     • Vowel classification (%) with discriminant functions
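Classification "with discriminant functions" can be illustrated by a plain linear discriminant analysis: each class is described by its mean vector, all classes share one pooled covariance matrix, and a token goes to the class with the highest linear score. Equal class priors and the pooled-covariance (linear) variant are assumptions; the slide does not state which discriminant analysis was used. The two-dimensional "F1/F2" data in the check are synthetic and say nothing about the reported results.

```python
import numpy as np


def fit_lda(X, y):
    """Class means and inverse pooled within-class covariance (equal priors assumed)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Subtract each token's own class mean, then pool the scatter over classes.
    centred = X - means[np.searchsorted(classes, y)]
    pooled = centred.T @ centred / (len(X) - len(classes))
    return classes, means, np.linalg.inv(pooled)


def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, means, inv_cov = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # score_k(x) = x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k
    linear = X @ inv_cov @ means.T
    offset = 0.5 * np.einsum("kd,de,ke->k", means, inv_cov, means)
    return classes[np.argmax(linear - offset, axis=1)]


def percent_correct(y_true, y_pred):
    """Classification score in percent."""
    return 100.0 * float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

With real data the feature vectors would be the band-filter outputs of one or more frames per segment, and scoring should be done on held-out test speakers.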

  10. Formant tracks / speaking rate
     • Ph.D. thesis Rob van Son (1993), “Spectro-temporal features of vowel segments”
     • see also Speech Comm. 13, 135-148 (Pols & van Son)
     • 850-word text, read at normal and fast rate
     • hand segmentation of the 7 most frequent vowels + schwa
     • formant tracks
        • via 16 points per segment or 5 Legendre polynomials
     • influence of rate, vowel duration, context, sentence accent
     • evidence for duration-controlled undershoot?
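A formant track can be described by 5 Legendre coefficients (orders 0 to 4) roughly as follows: map the segment's time axis onto [-1, 1], where Legendre polynomials are orthogonal, and fit by least squares with NumPy's `numpy.polynomial.legendre` module. With frames spread evenly over the segment, coefficient 0 is approximately the mean formant frequency in the segment (exactly so for a straight-line track), coefficient 1 the overall slope, and coefficient 2 the curvature. The evenly spaced frames and the function names are assumptions of this sketch, not van Son's implementation.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_coefficients(times, values, n_coef=5):
    """Least-squares fit of Legendre polynomials of order 0 .. n_coef-1.

    The time axis of the segment is mapped onto [-1, 1].
    Coefficients are returned lowest order first.
    """
    t = np.asarray(times, dtype=float)
    x = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0
    return legendre.legfit(x, np.asarray(values, dtype=float), n_coef - 1)


def reconstruct_contour(coefs, n_points=16):
    """Evaluate the fitted contour at n_points equidistant normalized times."""
    return legendre.legval(np.linspace(-1.0, 1.0, n_points), coefs)
```

Compared with 16 sampled points, a few coefficients summarize the track compactly and separate its level from its shape.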

  11. Some results
     • no differences in F1/F2 at the vowel center between normal- and fast-rate speech; only some overall rise in F1 at the fast rate (irrespective of the vowel)
     • same formant-track shape (normalized to 16 points) for normal- and fast-rate speech
     • same results when using the more elaborate Legendre polynomials
     • Conclusion: changes in vowel duration do not change the amount of undershoot → active control of articulation speed

  12. Formant representations
     [Figure: zeroth-order Legendre polynomial coefficients (mean Fi in vowel segment); second-order polynomial coefficients (axes reversed)]

  13. Modeling vowel reduction
     • Ph.D. thesis Dick van Bergem (1995), “Acoustic and lexical vowel reduction”
     • see also Speech Communication 16, 329-358
     • lexical vowel reduction: Fr /betõ/ vs. Du /b@tOn/
     • acoustic vowel reduction: /banan, bAnan, b@nan/
        • as a function of sentence accent, word stress, and word class: can-candy-canteen
     • coarticulatory effects on the schwa
        • C1@C2V- and VC1@C2-type nonsense words
     • perceptual effects (full vowel or schwa, e.g. ‘ananas’)

  14. Some results
     [Figure panels: t-n and w-l contexts]
     The schwa is not just a centralized vowel but is completely assimilated to its phonemic context.

  15. Modeling consonant reduction
     • Sp. Comm. 28 (1999), 125-140 (van Son & Pols)
     • 20 min. of speech, both spontaneous and read
     • 2 × 791 similar VCV segments; hand-segmented
     • 5 aspects of vowel and consonant reduction
        • related to coarticulation: F2 slope differences at CV vs. VC boundaries; F2 locus equations (F2 onset vs. F2 target)
        • related to speaking effort: duration; spectral center of gravity (COG, mean frequency); vowel-consonant sound energy differences
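Two of the five measures listed above can be sketched in NumPy: the F2 locus equation, a straight-line regression of F2 at vowel onset on F2 at the vowel target across tokens, and the spectral center of gravity (COG), the mean frequency of a spectrum weighted by its power. The weighting exponent (power 2), the absence of windowing, and plain least-squares fitting are assumptions of this sketch rather than details taken from van Son & Pols (1999).

```python
import numpy as np


def locus_equation(f2_target, f2_onset):
    """Fit F2_onset = k * F2_target + c across tokens; return slope k and intercept c (Hz)."""
    k, c = np.polyfit(np.asarray(f2_target, dtype=float),
                      np.asarray(f2_onset, dtype=float), 1)
    return float(k), float(c)


def spectral_cog(signal, sample_rate, power=2.0):
    """Spectral center of gravity (Hz): mean frequency weighted by |X(f)| ** power.

    Assumptions: no window, whole segment analysed at once, power = 2.
    """
    x = np.asarray(signal, dtype=float)
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    weights = magnitude ** power
    return float(np.sum(freqs * weights) / np.sum(weights))
```

A locus-equation slope near 1 means F2 onset follows the vowel target closely (strong coarticulation); a slope near 0 means the onset stays near a fixed consonantal locus.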

  16. Some results
     • vowels markedly reduced in spontaneous speech
     • smaller F2-slope differences in spontaneous speech → decrease in articulation speed
     • no systematic effect on F2 locus equations; vowel onsets and targets change in concert → any vowel reduction is mirrored by a comparable change in the consonant
     • spontaneous speech: vowels and consonants shorter; lower COG → decrease in vocal and articulatory effort

  17. Access to large corpora
     • more, and more realistic, data
     • phonetic knowledge via statistical analyses
     • e.g. the highly accessible IFA corpus (free, SQL)
        • see “Structure and access of the open source IFA-corpus”, IFA Proc. 24, 15-26 (van Son & Pols)
        • on-line: http://www.fon.hum.uva.nl/IFAcorpus/
        • 4 male / 4 female speakers, 5.5 hrs of speech
        • from informal to read speech, plus sentences, words, and syllables
        • ~50K words segmented and labeled at phoneme level

  18. Some results
     • speech + annotations + metadata: relational database
     • realization of final n, e.g. Du ‘geven’ /xev@(n)/
     [Figure panel: Read]
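Storing speech, annotations, and metadata in a relational database means that a question such as "how often is word-final /n/ realized in each speaking style?" becomes a single query. The sketch below uses Python's built-in `sqlite3` with a HYPOTHETICAL two-table schema and a handful of invented rows; the table and column names are not those of the IFA corpus.

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; it is not the IFA-corpus layout.
SCHEMA = """
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    speaker TEXT,
    style TEXT                -- speaking style, e.g. 'read' or 'informal'
);
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER REFERENCES recording(id),
    orthography TEXT,
    final_n_realized INTEGER  -- 1 if word-final /n/ was realized, 0 if deleted
);
"""


def final_n_rates(conn):
    """Percentage of '-en' words with a realized final /n/, per speaking style."""
    rows = conn.execute(
        "SELECT r.style, AVG(w.final_n_realized) * 100.0 "
        "FROM word AS w JOIN recording AS r ON w.recording_id = r.id "
        "WHERE w.orthography LIKE '%en' "
        "GROUP BY r.style ORDER BY r.style"
    ).fetchall()
    return dict(rows)


def build_toy_db():
    """In-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO recording VALUES (?, ?, ?)",
        [(1, "S1", "read"), (2, "S1", "informal")],
    )
    conn.executemany(
        "INSERT INTO word (recording_id, orthography, final_n_realized) VALUES (?, ?, ?)",
        [(1, "geven", 1), (1, "lopen", 0), (1, "tuin", 1),
         (2, "geven", 0), (2, "maken", 0)],
    )
    return conn
```

Keeping metadata such as speaking style in its own table lets the same word-level annotations be grouped by speaker, style, or any other attribute through a join.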

  19. Spoken Dutch Corpus (CGN)
     • 10 M words, 1,000 hrs of speech
     • variety of styles, incl. telephone speech
     • adult Dutch and Flemish speakers
     • for linguistic and technological research
     • see various LREC and ICSLP papers (2002)
     • see also http://lands.let.kun.nl/cgn/home.htm
     • fully transcribed: orthography, POS, lemmas
     • partly transcribed: phonemic, prosodic, syntactic

  20. TIMIT
     • popular database in acoustic phonetics and ASR
     • also a telephone version (NTIMIT)
     • hand-segmented and labeled at phoneme level
     • 438 males, 192 females (8 dialect regions)
     • 10 sentences per speaker (2 fixed, 5 phonetically compact, 3 phonetically diverse)
        • sa1: “She had her dark suit in greasy wash water all year”
     • includes separate test data (112 M, 56 F)
     • e.g. Ph.D. thesis X. Wang (1997), “Incorporating knowledge on segmental duration in HMM-based continuous speech recognition”
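Because TIMIT is hand-labeled at the phone level, duration statistics can be computed directly from its label files. The sketch assumes the standard .PHN layout (one "begin-sample end-sample label" line per phone, at a 16 kHz sampling rate); the list of silence labels to skip is a choice of this sketch, and the label text in the check is invented, not copied from a TIMIT file.

```python
SAMPLE_RATE_HZ = 16000  # TIMIT waveforms are sampled at 16 kHz


def read_phn(text):
    """Parse the contents of a TIMIT .PHN file into (label, duration_ms) pairs.

    Each non-empty line holds a begin sample, an end sample and a phone label.
    """
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        begin, end, label = line.split()
        duration_ms = (int(end) - int(begin)) * 1000.0 / SAMPLE_RATE_HZ
        segments.append((label, duration_ms))
    return segments


def mean_duration_ms(segments, exclude=("h#", "pau", "epi")):
    """Mean phone duration, skipping silence/pause labels (assumed exclusion list)."""
    durations = [d for label, d in segments if label not in exclude]
    return sum(durations) / len(durations)
```

Grouping the parsed durations by context (stress, word-final, utterance-final position) would require joining them with word and stress information, which this sketch does not do.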

  21. Useful info: durational variability (adapted from Wang, 1998)
     • overall average: 95 ms
     • normal rate: 95 ms
     • primary stress: 104 ms
     • word-final: 136 ms
     • utterance-final: 186 ms

  22. [Figure: normalized phone duration vs. speaking rate; all 3,696 training sentences (sx + si) of the TIMIT training set]
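The figure's normalization is not defined on the slide, so the sketch below takes one plausible reading, labeled as an assumption: each phone's duration is divided by the mean duration of its phone label over the whole set (1.0 meaning "typical length for this phone"), and speaking rate is the number of phones per second of summed phone duration in each sentence, with pauses assumed already removed. The input format and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def normalized_durations(phones):
    """phones: list of (sentence_id, label, duration_s) tuples.

    Returns each duration divided by the mean duration of its label
    (assumed normalization).
    """
    by_label = defaultdict(list)
    for _, label, dur in phones:
        by_label[label].append(dur)
    label_mean = {label: fmean(durs) for label, durs in by_label.items()}
    return [dur / label_mean[label] for _, label, dur in phones]


def speaking_rate(phones):
    """Phones per second for each sentence (pauses assumed already removed)."""
    count = defaultdict(int)
    total = defaultdict(float)
    for sentence, _, dur in phones:
        count[sentence] += 1
        total[sentence] += dur
    return {s: count[s] / total[s] for s in count}
```

Normalizing per phone label removes intrinsic duration differences (long vowels vs. short stops), so what remains in a plot against speaking rate is mainly the rate and context effect.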

  23. ‘found’ speech
     • DARPA-LVSR community: rather ambitious
     • Broadcast News (BN), Sp. Comm. 37 (2002)
     • for the Proceedings of the DARPA Workshops, see http://www.nist.gov/speech/proc/darpa99/index.htm

  24. Articulatory-acoustic features in ASR
     • M. Wester et al. (2001), “A Dutch treatment of an elitist approach to articulatory-acoustic feature classification”, Proc. Eurospeech-2001, 1729-1732
     • K. Kirchhoff (2000), “Integrating articulatory features into acoustic models for speech recognition”, Phonus 5, 73-86
     • J. Sun & L. Deng (2002), “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, JASA 111 (2), 1086-1101

  25. Conclusions
     • examples of dynamics in speech acoustics
     • going from formal to informal speech:
        • less dynamics, more reduction (articulatorily guided)
        • undershoot vs. speaking style
        • sloppiness or articulatory limits?
     • functionality of dynamics? → other paper
     • systematicity of dynamics?
        • easing ASR, rules for TTS, acquiring knowledge?
