CS 551/651: Structure of Spoken Language
Lecture 7: Syllable Structure, Vowel Neutralization, and Coarticulation
John-Paul Hosom
Fall 2008
NOTE
• There’s a tutorial on the web that allows you to hear the effect of different formant values: http://www.asel.udel.edu/speech/tutorials/synthesis/ceevees.html
• You can enter start time, end time, amplitude, and formant values for the beginning, middle, and end of a “syllable”, then generate a waveform and hear the result.
Syllables
• Words are composed of phonetic clusters: syllables.
• Each syllable has a nucleus; typically the nucleus is a vowel or diphthong, sometimes a syllabic nasal or lateral (“button”, “bottle”) or a retroflex (“bird”).
• The nucleus is a syllabic nasal or lateral only when it follows an alveolar consonant in the previous syllable of the word.
• Syllable boundaries are sometimes ambiguous:
  “tasty”: tas/ty, tast/y, ta/sty
  “bottling”: bott/l/ing, bott/ling
• A syllable can be broken into components:
  syllable contains {onset, rhyme}
  rhyme contains {nucleus, coda}
  onset and coda are consonants; the nucleus is a vowel, syllabic nasal, or syllabic lateral
Syllables
Limitations on consonant clusters: not all CCC combinations are possible in syllable-initial position. Of those that are possible, almost half are very rare:
• /s p y/: possibly only one word in English, “spew”
• /s t y/: only a few English words are (optionally) pronounced with this cluster: “Stewart”, “steward”, “stew”
• /s k l/: very few English words/roots: “sclerosis”
• /s k y/: very few English words: “skew”, “askew”, “obscure”
[Figure: possible syllable-initial consonant clusters; graphic from http://www.arts.uwa.edu.au/LingWWW/LIN101-102]
Syllables
• Sonority corresponds roughly to the degree of constriction along the vocal and/or nasal tract: the less constricted the sound, the more sonorous it is.
• Ordering of sonority, from most to least sonorous: vowels, glides (/w/, /y/), liquids (/l/, /r/), nasals, fricatives, affricates, plosives
• In a binary classification (sonorant/non-sonorant), the sonorants consist of all vowels, glides, liquids, and nasals.
• Fricatives, affricates, and plosives may be clustered into one category, “obstruents,” for purposes of sonority.
• Syllabification can be done according to the “sonority principle”: within a syllable, sonority must rise to the nucleus and then fall.
• There is also the Maximal Onset Principle: “Put a consonant in the onset rather than the coda when possible.”
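The Maximal Onset Principle can be sketched as a small function that splits the consonants between two vowels into a coda and an onset. This is only an illustration: the function name `split_cluster` and the tiny `LEGAL_ONSETS` inventory are hypothetical, and a real inventory of English onsets would be much larger.

```python
def split_cluster(cluster, legal_onsets):
    """Split an intervocalic consonant cluster into (coda, onset).

    The onset is the longest suffix of the cluster found in legal_onsets;
    the remaining consonants are left in the coda of the previous syllable.
    """
    for start in range(len(cluster) + 1):
        onset = tuple(cluster[start:])
        # An empty onset is always acceptable, so the loop always returns.
        if not onset or onset in legal_onsets:
            return list(cluster[:start]), list(onset)


# Hypothetical, deliberately tiny onset inventory (assumption, not from the slides).
LEGAL_ONSETS = {("s",), ("t",), ("s", "t"), ("t", "r"), ("s", "t", "r")}

# "tasty": both consonants go to the onset, giving ta/sty.
coda, onset = split_cluster(["s", "t"], LEGAL_ONSETS)
print(coda, onset)
```

With this inventory, the cluster /k s t r/ would be split as /k/ + /s t r/, because /k s t r/ is not listed as a legal onset but /s t r/ is.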
Syllables
• Because of the rise and fall of sonority in syllables, the following restrictions occur:
  (a) a glide (/w/, /y/) must be immediately adjacent to a vowel,
  (b) /r/ is the next closest consonant to the vowel,
  (c) /l/ is the next closest consonant to the vowel,
  (d) a nasal is the next closest,
  (e) an obstruent is farthest from the vowel (but there may be more than one obstruent in the onset or coda)
• Obstruents in a cluster must have the same voicing.
• In a series of obstruents between two vowels, voicing can change only once, at the syllable boundary.
• English allows up to 3 consonants in syllable-initial position and 4 consonants in syllable-final position.
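Restrictions (a) through (e) can be sketched as a ranking check on a single syllable. The phone sets below are an assumed ARPAbet-style inventory matching the labels used in these slides, and the numeric ranks are arbitrary. The sketch covers only the sonority ordering; it does not model voicing agreement, cluster-length limits, or other English phonotactic restrictions.

```python
# Assumed phone inventory (ARPAbet-style labels); not taken from the slides.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "ow", "er",
          "ay", "ey", "oy", "aw"}
GLIDES = {"w", "y"}
NASALS = {"m", "n", "ng"}
VOWEL_RANK = 5


def sonority(phone):
    """Rank a phone: vowel > glide > /r/ > /l/ > nasal > obstruent."""
    if phone in VOWELS:
        return VOWEL_RANK
    if phone in GLIDES:
        return 4
    if phone == "r":
        return 3
    if phone == "l":
        return 2
    if phone in NASALS:
        return 1
    return 0  # everything else is treated as an obstruent


def rises_toward_vowel(ranks):
    """True if ranks strictly increase; ties are allowed only among obstruents."""
    return all(a < b or a == b == 0 for a, b in zip(ranks, ranks[1:]))


def obeys_sonority(syllable):
    """Check that sonority rises through the onset and falls through the coda."""
    ranks = [sonority(p) for p in syllable]
    if ranks.count(VOWEL_RANK) != 1:
        return False  # this sketch expects exactly one vowel nucleus
    peak = ranks.index(VOWEL_RANK)
    onset = ranks[:peak]
    coda = ranks[peak + 1:]
    # The coda is reversed so that it is also read moving toward the vowel.
    return rises_toward_vowel(onset) and rises_toward_vowel(coda[::-1])


print(obeys_sonority("s t r iy k".split()))  # "streak"
print(obeys_sonority("k aa l r".split()))    # /l/ closer to the vowel than /r/
```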
Syllables
• Examples: sphere /s f iy r/, streak /s t r iy k/, texts /t eh k s t s/, helms /h eh l m z/, but not /t l iy/ or /p w iy/
• The ordering of glides and liquids doesn’t matter for our purposes (applying to syllabification), because glides and liquids cannot occur sequentially within the same syllable in English. (However, two liquids in the same syllable are possible, e.g. “Carl” and “girl”, as long as /r/ is closer to the vowel than /l/.)
• In English, most burst-fricative pairs are represented as distinct phonemes (/ch/, /jh/), although there are some other cases of burst-fricative pairs (e.g. “tsunami,” “bishops”).
• It’s also possible to have two or more adjacent fricatives: “eleven twelfths”
Vowel Neutralization
• When speech is uttered very quickly (or is not well enunciated), the formants tend to shift toward those of a neutral vowel.
[Figures from Daniloff, p. 320, and van Bergem 1993, p. 8]
Vowel Neutralization
• Target undershoot:
[Figure with phone labels /m ih pc ph ih eh/]
Vowel Neutralization
• Target undershoot: /ih/ extracted and concatenated from “mip”
[Figure with phone labels /m ih pc ph ih eh/]
Vowel Neutralization
• However, neutralization is not always so simple; sometimes vowel formants shift away from the neutral position, depending on their context, and vowels tend toward slightly different neutral “targets”.
• Neutralization is to some extent an artifact of averaging over speakers and contexts (van Bergem 1993).
[Figure: vowels from one speaker in different phonetic contexts, and in “reduced” and “isolated” speaking conditions]
Coarticulation
• Coarticulation is the “blending” of adjacent speech sounds, due to gradual movement of the articulators.
• Coarticulation makes automatic speech recognition and text-to-speech synthesis difficult, but humans use coarticulation to conserve effort while speaking and to provide robustness during recognition.
• There is Right-to-Left (RL) or “anticipatory” coarticulation and Left-to-Right (LR) or “carry-over” coarticulation.
• Models of coarticulation and syllabification:
  Locus Theory
  Modified Locus Theory (Klatt)
  Öhman’s Theory
  Kozhevnikov-Chistovich (KC) Theory
  Wickelgren’s Theory, etc.
Coarticulation
RL coarticulation occurs due to high-level planning of phonetic sequences:
“spoon”: [s p uw n]
rounding in isolation: – – + –
rounding in context: + + + +
The effect is more observable if neighboring sounds are not specified with respect to the potentially coarticulated feature; e.g. /s/, /p/, /n/ are not specified with respect to lip rounding. (from Daniloff, pp. 323-324)
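The “spoon” example can be sketched as feature spreading over a phone sequence. This is a toy illustration, not a model from Daniloff: it assumes that segments unspecified for a feature are marked `None`, and that an unspecified segment simply copies the value of its nearest specified neighbor, first from the right (anticipatory) and then from the left (carry-over). The function name `spread_feature` is hypothetical.

```python
def spread_feature(values):
    """Fill unspecified (None) feature values from neighboring segments.

    values: one entry per phone, '+', '-', or None when the phone is not
    specified for the feature.
    """
    result = list(values)
    # Anticipatory (right-to-left) pass: copy from the following segment.
    for i in range(len(result) - 2, -1, -1):
        if result[i] is None:
            result[i] = result[i + 1]
    # Carry-over (left-to-right) pass: fill anything still unspecified.
    for i in range(1, len(result)):
        if result[i] is None:
            result[i] = result[i - 1]
    return result


# "spoon" [s p uw n]: only /uw/ is specified for lip rounding.
print(spread_feature([None, None, "+", None]))
```

For “spoon” this yields rounding on all four segments, matching the “rounding in context” row above.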
Coarticulation: Locus Theory
Locus Theory (Delattre, Liberman, and Cooper, 1955):
“there are, for each consonant, characteristic frequency positions, or loci, at which the formant transitions begin, or to which they may be assumed to point. On this basis, the transitions may be regarded simply as movements of the formants from their respective loci to the frequency levels appropriate for the next phone … The spectrographic patterns …, which produce /d/ before /iy/, /aa/, and /ow/, show how … these transitions seem to be pointing to a [F2] locus in the vicinity of 1800 [Hz].”
Each consonant has “target frequencies” independent of the neighboring vowels. Formants transition from these target frequencies to the vowel target frequencies.
Coarticulation: Locus Theory
Locus Theory:
• Consonants and vowels both have “targets” of articulator positions and therefore formant frequency locations.
• Given sufficient duration of a syllable, all phonemes reach their targets.
• The slope of the formants during a transition from a consonant to a vowel is relatively constant until reaching the target.
• If the syllable duration doesn’t allow enough time for the formants to reach their targets, “target undershoot” occurs and the formants change direction before fully realizing the intended vowel.
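The constant-slope idea and target undershoot can be sketched numerically. This is a simplified illustration, not a formula from the cited papers: it assumes a formant moves linearly from the consonant locus toward the vowel target at a fixed rate and stops when the available time runs out. The function name, the slope, and the vowel target in the usage lines are hypothetical; only the 1800 Hz F2 locus comes from the quotation above.

```python
def formant_after_transition(locus_hz, target_hz, slope_hz_per_ms, duration_ms):
    """Formant value reached after moving from locus toward target.

    The formant changes at a constant rate (slope_hz_per_ms, positive) and
    never moves past the target. If duration_ms is too short, the returned
    value falls short of target_hz: target undershoot.
    """
    distance = target_hz - locus_hz
    travelled = slope_hz_per_ms * duration_ms
    if travelled >= abs(distance):
        return float(target_hz)  # enough time: the target is reached
    direction = 1 if distance > 0 else -1
    return locus_hz + direction * travelled


# Hypothetical F2 target of 2300 Hz, approached from an 1800 Hz locus at 10 Hz/ms.
print(formant_after_transition(1800, 2300, 10, 80))  # long vowel: target reached
print(formant_after_transition(1800, 2300, 10, 30))  # short vowel: undershoot
```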
Coarticulation: Locus Theory
[Figure from Klatt 1987, p. 753]
Coarticulation: Modified Locus Theory
Problems with Locus Theory (as reported by Fant 1973, Klatt 1987):
• A transition may have both rapid and slow components: rapid release of the obstruction via the tongue tip, followed by slow movement of the tongue body.
• The preceding vowel can influence the F2 onset of a CV transition (Öhman, 1966).
• F2 may be insensitive to oral constrictions (obstruents) if the tongue position is toward the front of the mouth (as in /iy/).
Coarticulation: Modified Locus Theory
Modified Locus Theory:
• Klatt hypothesized that the main effects of the vowel on the articulation of consonants are front/back position and lip rounding.
• Vowels are divided into three sets: {+front}, {+round}, {–front, –round} (because there are no rounded front vowels in English, sets 1 and 2 are mutually exclusive):
  {+front} /iy ih eh ae/
  {+round} /uw ao ow er/
  {–front, –round} /uh ah aa aw/
• F_onset was predicted from F_target for these 3 classes (locus theory).
• Achieved 95% intelligibility for CVC nonsense syllables.
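The three vowel sets can be written as a lookup table, with the onset prediction sketched alongside it. The vowel sets are taken from the slide. The linear form used in `predict_onset`, with a locus and a scaling factor `k` chosen per consonant and vowel class, is an assumption made for illustration, and the numbers in the usage line are hypothetical rather than Klatt's values.

```python
# Vowel sets as listed on the slide.
VOWEL_CLASSES = {
    "+front": {"iy", "ih", "eh", "ae"},
    "+round": {"uw", "ao", "ow", "er"},
    "-front,-round": {"uh", "ah", "aa", "aw"},
}


def vowel_class(vowel):
    """Return the name of the set that the vowel belongs to."""
    for name, members in VOWEL_CLASSES.items():
        if vowel in members:
            return name
    raise ValueError(f"vowel not in any class: {vowel}")


def predict_onset(f_target_hz, f_locus_hz, k):
    """Assumed linear form: onset lies between the locus and the vowel target.

    k = 0 puts the onset at the locus; k = 1 puts it at the vowel target.
    """
    return f_locus_hz + k * (f_target_hz - f_locus_hz)


# Hypothetical parameters for one consonant before a {+front} vowel.
print(vowel_class("iy"), predict_onset(2300, 1800, 0.5))
```

In a fuller version, `f_locus_hz` and `k` would be looked up in a table keyed by consonant and vowel class, which is where the three-way split of the vowels would enter.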
Coarticulation: Locus Theory
Modified Locus Theory:
[Figure from Klatt 1987, p. 754. Legend: × = –front, –round; ° = +front; • = +round]
Coarticulation: Öhman’s Theory
Öhman (1965) found that the loci of consonants are NOT independent of neighboring vowels, and that for /g/ more than one locus is required.
Conclusion: consonant “gestures” are superimposed on vowel “gestures” that are present during the consonant; even while the consonant is being uttered in a VCV sequence, both vowels have an effect on C.
Coarticulation: Öhman’s Theory
Öhman (1966) proposed a model of coarticulation based on vocal-tract shape evolving over time. It assumes that vocal-tract shapes can be mapped to formant frequencies. For VCV utterances:
$$s(x,t) = v(x) + k(t)\,[c(x) - v(x)]\,w_c(x)$$
where s(x,t) is the vocal tract shape at position x and time t, v(x) is the vocal tract shape at position x for a given vowel, c(x) is the vocal tract shape of the consonant, k(t) is an interpolation value (from 0 to 1), and w_c(x) describes the degree to which c(x) “resists” coarticulation.
v(x) describes the shape of the vocal tract, which may be a combination of two vowels if V1 ≠ V2. (v(x) will vary over time from V1 to V2.)
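The roles of the terms can be sketched by evaluating the model at a few positions along the tract. The sketch assumes the form s(x,t) = v(x) + k(t) [c(x) − v(x)] w_c(x), which is how the definitions above fit together, with shapes sampled at a handful of positions x. All numeric values are hypothetical.

```python
def vocal_tract_shape(v, c, w_c, k):
    """Evaluate s(x, t) at every sampled position x for one instant t.

    v, c, w_c: equal-length lists holding v(x), c(x), and w_c(x).
    k: the interpolation value k(t) at this instant, between 0 and 1.
    """
    return [vx + k * (cx - vx) * wx for vx, cx, wx in zip(v, c, w_c)]


# Hypothetical shapes at two positions: the consonant fully controls the
# first position (w_c = 1) and leaves the second position to the vowel (w_c = 0).
v = [2.0, 3.0]
c = [0.0, 1.0]
w_c = [1.0, 0.0]

print(vocal_tract_shape(v, c, w_c, k=0.0))  # pure vowel shape
print(vocal_tract_shape(v, c, w_c, k=1.0))  # consonant shape where w_c = 1
```

Where w_c(x) is 0 the vowel shape shows through even at k = 1, which is how the model lets the vowel remain present during the consonant.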
Coarticulation: Kozhevnikov-Chistovich (KC) Theory
(1) Syllabification using the CnV pattern: CV, CCV, CCCV, …
phrase “give true answers”:
g ih | v t r uw | ae | n s er | z
S1 | S2 | S3 | S4 | S5
(2) Measured relative durations of words, “syllables”, and vowels:
relative duration of vowel = Dvow / Dsyll
relative duration of syllable = Dsyll / Dword
relative duration of word = Dword / Dphrase
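The CnV grouping can be sketched as a single pass over the phone string: each unit collects consonants until it reaches a vowel, and any consonants left after the last vowel form a final unit. The vowel set is an assumed ARPAbet-style inventory and the function name `cnv_units` is hypothetical.

```python
# Assumed vowel inventory (ARPAbet-style labels); not taken from the slides.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "ow", "er",
          "ay", "ey", "oy", "aw"}


def cnv_units(phones, vowels=VOWELS):
    """Group phones into CnV units: zero or more consonants, then one vowel."""
    units = []
    current = []
    for phone in phones:
        current.append(phone)
        if phone in vowels:
            # A vowel closes the current unit.
            units.append(current)
            current = []
    if current:
        # Trailing consonants with no following vowel form their own unit.
        units.append(current)
    return units


print(cnv_units("g ih v t r uw ae n s er z".split()))
```

For “give true answers” this gives the five units S1 to S5 shown above.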
Coarticulation: Kozhevnikov-Chistovich (KC) Theory
They measured articulatory effects of the vowel on consonants, and found coarticulation within a syllable but not across syllables:
C1 V1 C2 C3 V2
• Articulatory gestures for the consonant(s) and vowel begin nearly simultaneously with the onset of the initial consonant in the syllable.
• Example: lip rounding in /uw/ begins with /v/ in “give true answers”, but nasalization of /ae/ does not occur.
• Focused only on RL coarticulation, the effect of V on the previous C.
• Assumes that motor programming of speech is discontinuous at the VC boundary.
• Counter-examples showing LR coarticulation: Moll and Daniloff 1971; Kent, Carney, and Severeid 1974; Öhman 1966.
Coarticulation: Wickelgren’s Theory
Speech units are mentally coded as context-sensitive units: in the phonetic string /X Y Z/, Y is encoded as XYZ (Y in the context of a preceding X and a following Z).
“By assuming (context-sensitive) allophones to be the basic unit of articulation, … it is trivial to account for how the ‘same phoneme’ in different phonemic environments can be … different in some respects at all levels of the speech process” (Wickelgren 1969, p. 11)
• However, coarticulation can spread over more than one phone (up to seven phones away).
• Other criticisms: MacNeilage 1970, Whitaker 1970, Halwes and Jenkins 1971; “Allophonic richness may only beget strategic poverty” (Kent and Minifie 1977).
• Nevertheless, Wickelgren’s is the only model currently used in ASR and concatenative text-to-speech (exceptions: Wouters 2001, Wrede 2001).
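Context-sensitive coding of a phone string can be sketched as a sliding window of three phones. The boundary symbol `#` used to pad the ends and the function name `context_sensitive_units` are assumptions for this illustration.

```python
def context_sensitive_units(phones, boundary="#"):
    """Encode each phone Y as a (left, Y, right) triple.

    The string is padded with a boundary symbol so that the first and last
    phones also get a context on both sides.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# "spoon" /s p uw n/
for unit in context_sensitive_units("s p uw n".split()):
    print(unit)
```

Each phone here sees only its immediate neighbors, which is the limitation raised in the criticism above that coarticulation can spread over more than one phone.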
Coarticulation: Gay’s Theory
• Gay, 1977: The syllabic unit of motor organization is the CV unit.
• Based on X-ray motion pictures of VCV utterances.
• Anticipatory tongue movements for V2 in a V1CV2 sequence don’t begin until closure of C has been attained.
• Movement toward V2 occurs during closure of C, having a large effect on the position and shape of the tongue during release of the closure.
• V1 has little effect on the position of the tongue at the moment of closure.
• Supports KC theory; conflicts with Öhman’s findings.
Coarticulation
• Other models: MacNeilage, Henke, Benguerel and Cowan, Moll and Daniloff, Liberman, Tatham, etc.
• Some are “feature based” in that each phonetic segment is assigned distinctive features which can then be modified in regular ways.
• Some are “hierarchical models”, with several levels of organization and complex interaction between levels.
• However, “coarticulatory patterns are not explained adequately by any … theories or models” (Kent and Minifie, 1977).
• Conflicting evidence (Öhman and Kent & Moll vs. KC and Gay).