Parsing acoustic variability as a mechanism for feature abstraction
Jennifer Cole, Bob McMurray, Gary Linebaugh, Cheyenne Munson
University of Illinois / University of Iowa
www.psychology.uiowa.edu/faculty/mcmurray
Phonetic precursors to phonological sound patterns
• Many phonological sound patterns are claimed to have precursors in systematic phonetic variation that arises due to coarticulation:
• Assimilation
 - Vowel harmony from V-to-V coarticulation (Ohala 1994; Beddor et al. 2001)
 - Palatalization from V-to-C coarticulation (Ohala 1994)
 - Nasal place assimilation (-mb, -nd, -ŋg) from C-to-C coarticulation (Browman & Goldstein 1991)
• Epenthesis
 - Epenthetic stops from C-C coarticulation: sen[t]se (Ohala 1998)
• Deletion
 - Consonant cluster simplification via deletion from C-C coarticulation: perfec(t) memory (Browman & Goldstein 1991)
The role of the listener
Phonologization:
• acoustic properties that arise due to coarticulation are interpreted by the listener as primary phonological properties of the target sound;
• generalization over variable acoustic input results in a new constraint on sound patterning.
The role of the listener
• From V-to-V coarticulation, a context vowel colors the target vowel: […ɛɑ…ɑ…], […ɛi…i…]
• Perception may yield vowel assimilation.
• But distinct factors can produce similar variants: […ɛi…i…] vs. […ɛ ŋ…]
[Vowel-space diagram: i, ʌ, ɛ, ɑ]
From perception to phonology
• The problem: the perceptual system is confronted with uncertainty due to variation arising from multiple sources.
• What is the mechanism for mapping from continuous perceptual features to phonological categories?
 - ɛi: mid and high, central and front-peripheral
 - ɛɑ: mid and low, central and back
• Yet patterns of variation must get associated with individual features of the context vowel (e.g., high, front) if coarticulation serves as a precursor to phonological assimilation.
• How do lawful, categorical patterns emerge from ambiguous, variable input? …the lack of invariance problem!
Our claims
What is the mechanism for mapping from continuous perceptual features to phonological categories?
• Variability is retained. Acoustic variability is parsed into components related to the target segment and the local context.
• Feature abstraction through parsing. Acoustic parsing provides a mechanism for the emergence of phonological features from patterned variation in fine phonetic detail.
Variability is retained
• Listeners are sensitive to fine-grained acoustic variation (Goldinger 2000; Hay 2000; Pierrehumbert 2003).
• Variability is retained, not discarded.
• Consistent with exemplar models of the lexicon, phonetic detail is encoded and stored, and can inform subsequent categorization of new sound tokens.
Variability is retained
• Variability is useful for the identification of sounds in contexts of coarticulation.
• The perceptual system uses information about variability to identify a sound and its context, in parallel.
• Variability due to coarticulation is exploited to facilitate perception: listeners benefit from the presence of anticipatory coarticulation in predicting the identity of the upcoming sound (Martin & Bunnell 1982; Fowler 1981, 1984; Gow 2001, 2003; Munson, this conference).
• Variability due to coarticulation is subtracted to identify the "underlying" target sound (Fowler 1984; Beddor et al. 2001, 2002; Gow 2003).
Variability and perceptual facilitation
Perceptual facilitation from V-to-V coarticulation is expected to occur only if:
• the effects of coarticulation are systematic: an influencing vowel conditions a consistent acoustic effect on target vowels;
• the listener can recognize coarticulatory effects on the target vowel;
• the listener can isolate the effects of the context vowel from other sources of variation, and attribute those effects to the context vowel.
Feature abstraction through parsing
More specifically, under coarticulation of vowel height and backness:
• The listener must parse out the portion of the variance in F1 and F2 that is due to coarticulation, and base perception of the target vowel on the residual values.
• Acoustic parsing isolates the effects of the context vowel on F1 and F2.
Feature abstraction through parsing
• The parsed acoustic variance defines features of the context vowel, over which new generalizations can be formed.
[Diagram: coarticulated ɛi parsed as [ɛ] + [i] → [ɛ] + [high]; phonologization]
• Question: Why phonologization? If target and context vowels can both be identified from the fine phonetic detail, what's the force driving phonologization?
Testing the model
• The acoustic parsing model of speech perception requires a robust and systematic pattern of acoustic variation from V-to-V coarticulation.
• This paper presents supporting evidence from an acoustic study of coarticulation.
• We examine a range of V-to-V coarticulatory effects in VCV contexts that cross a word boundary, where coarticulation cannot be attributed to lexicalized phonetic patterns.
Key Questions
• Extent of the phenomenon
 - Does V-to-V coarticulation cross word boundaries?
 - Does V-to-V coarticulation affect both F1 and F2?
 - How strong are V-to-V effects relative to other forms of coarticulation?
• Usefulness of the phenomenon
 - How could V-to-V effects translate to perceptual inferences?
 - Is the information carried by V-to-V coarticulation different when other sources of variation are explained?
Methods
• Target vowels (coarticulation measured): /ɛ/, /ʌ/
• Context vowels (coarticulation induced): /i/, /æ/, /ɑ/
• /u/ excluded from contexts (rounded + fronted)
• Intervening consonant varied in place (labial, coronal, velar) and voicing
• /ɛg/ excluded (tends to be raised)
Methods
• 10 University of Illinois students.
• 48 phrases x 3 repetitions.
• Phrases embedded in neutral carrier sentences:
 - /ɛ/: He said '_______' all the time
 - /ʌ/: I love '_______' as a title
Coding
• F1, F2, F3 measured by LPC (Burg method), converted to Bark for analysis.
• Outliers / misproductions inspected by hand.
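The slides report converting formants to Bark but do not say which transform was used; the sketch below assumes Traunmüller's (1990) approximation, which is one common choice.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to the Bark scale.

    Uses Traunmueller's (1990) approximation; the talk does not specify
    which Bark formula was applied, so this is only one common option.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Example: Bark-convert the measured formants (Hz) of one token.
f1_hz, f2_hz, f3_hz = 550.0, 1800.0, 2500.0
f1_b, f2_b, f3_b = (hz_to_bark(f) for f in (f1_hz, f2_hz, f3_hz))
print(f"F1 = {f1_b:.2f} Bark, F2 = {f2_b:.2f} Bark, F3 = {f3_b:.2f} Bark")
```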
Analysis
Target x Voicing x Context:
               F1        F2
 Voicing       p=.033    p=.001
 Target        p=.005    p=.001
 Context       p=.001    p=.001
 Interactions  n.s.      n.s.

Target x Place x Context:
               F1        F2
 Place         n.s.      p=.001
 Target        p=.01     p=.001
 Context       p=.001    p=.001
 Interactions  some      some

• V-to-V coarticulation crosses word boundaries.
• Clear effects of coarticulatory context on both F1 and F2.
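For concreteness, here is a minimal sketch of how a Target x Voicing x Context analysis of the Bark-scaled formants could be run with statsmodels. The file and column names are hypothetical, and a simple fixed-effects factorial is used even though the actual design is repeated measures over 10 speakers (a mixed model would be more faithful).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical table: one row per token, with factors and Bark-scaled formants.
df = pd.read_csv("vowel_measurements.csv")  # columns: target, voicing, context, F1_bark, F2_bark

for formant in ("F1_bark", "F2_bark"):
    # Fixed-effects factorial ANOVA (ignores the repeated-measures structure).
    model = ols(f"{formant} ~ C(target) * C(voicing) * C(context)", data=df).fit()
    print(formant)
    print(sm.stats.anova_lm(model, typ=2))
```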
Analysis
[Figure: F1 x F2 (Hz) vowel space for male and female speakers, tokens coded by context vowel (i, æ, ɑ, Same)]
• A lot of unexplained variance…
• How does the perceptual system "get to" the V-to-V coarticulation?
• How useful is V-to-V coarticulation?
• Does accounting for other sources of variance in the signal improve the usefulness of V-to-V?
Strategy ? ʌ ɛ ɑ i 1431 hz 1801 hz F2 Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation. ɑ-coarticulated ɛ? or i-coarticulated ʌ?
Strategy ? i ʌ ɛ 1431 hz 1801 hz F2 Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation. A slightlyi-coarticulated ɛ? or A reallyi-coarticulated ʌ?
Strategy ? ʌ ɛ ɑ i 1431 hz 1801 hz F2 Need to systematically account for sources of variance prior to evaluating V-to-V coarticulation. If you knew the category… If ʌ, then expect i If ɛ then expect ɑ ? - ʌ: Positive (more i-like) ? - ɛ: Negative (more ɑ-like) F2? – F2category = coarticulation direction
Strategy
F2(token) - F2(predicted) = coarticulation direction
Strategy:
1) Compute the mean for a source of variance.
2) Subtract that mean from F1/F2.
3) The residual is the coarticulation direction.
4) Repeat for each source of variance (speaker, target vowel, place, voicing).
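A minimal pandas sketch of steps 1-3: subtract each known source of variance in turn, so that the final residual carries the coarticulation direction. The data file and column names are hypothetical.

```python
import pandas as pd

def remove_source(values: pd.Series, source: pd.Series) -> pd.Series:
    """Subtract the mean of `values` within each level of `source`.

    What remains is the variance that this source cannot explain.
    """
    return values - values.groupby(source).transform("mean")

df = pd.read_csv("vowel_measurements.csv")  # hypothetical columns

# Peel off known sources of variance one at a time.
resid = df["F2_bark"]
for source in ("speaker", "target", "place", "voicing"):
    resid = remove_source(resid, df[source])

# The sign of the residual is the coarticulation direction:
# positive = more i-like than the category mean, negative = more ɑ-like.
df["F2_coartic"] = resid
```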
Strategy
Hierarchical regression can do exactly these things.
1) Compute the mean for a source of variance:
 F1_predicted = β1 · target + β0
 If target = 0 for /ʌ/ and 1 for /ɛ/:
 /ʌ/: F1_predicted = β1 · 0 + β0  →  mean of /ʌ/ = β0
 /ɛ/: F1_predicted = β1 · 1 + β0  →  mean of /ɛ/ = β0 + β1
Strategy
Hierarchical regression can do exactly these things.
1) Compute the mean for a source of variance.
2) Subtract that mean from F1/F2.
3) The residual is the coarticulation direction:
 Residual = F1_actual - F1_predicted = F1_actual - (β1 · target + β0)
 /ʌ/: Resid_target = F1_actual - β0
 /ɛ/: Resid_target = F1_actual - (β0 + β1)
Strategy
Hierarchical regression can do exactly these things.
1) Compute the mean for a source of variance.
2) Subtract that mean from F1/F2.
3) The residual is the coarticulation direction.
4) Repeat for each source of variance (speaker, target vowel, place, voicing):
 F1 = β1 · Target + β0
 Resid_target = β2 · Place + β0
 Resid_place = β3 · Voicing + β0
 Resid_voicing = β4 · V-to-V + β0
Strategy
• Construct a hierarchical regression to systematically account for known sources of variance in F1 and F2:
 - Speaker
 - Target vowel
 - Place (intervening C)
 - Voicing (intervening C)
 - Interactions between target, place & voicing
• After partialing out these factors, how much variance does vowel context (V-to-V) account for?
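Below is a minimal sketch of such a hierarchical (sequential) regression: dummy-code each block of predictors, regress, carry the residuals forward, and track the cumulative variance explained. File and column names are assumptions, and interaction terms are omitted for brevity.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("vowel_measurements.csv")  # hypothetical columns

def partial_out(y: pd.Series, factors: list, data: pd.DataFrame) -> pd.Series:
    """Regress y on dummy-coded factors and return the residuals."""
    X = pd.get_dummies(data[factors], drop_first=True).astype(float)
    X = sm.add_constant(X)
    return sm.OLS(y, X).fit().resid

y = df["F1_bark"]
total_var = y.var()
explained = 0.0
# Each block removes one known source of variance; V-to-V ("context") comes last.
for block in (["speaker"], ["target"], ["place"], ["voicing"], ["context"]):
    resid = partial_out(y, block, df)
    explained += (y.var() - resid.var()) / total_var  # variance this block removed
    print(block, f"cumulative R^2 ~ {explained:.3f}")
    y = resid  # carry residuals into the next step
```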
Regression
1) Raw data: [Figure: F1 x F2 (Bark) vowel space, male and female speakers, tokens coded by context vowel (i, æ, ɑ, Same)]
Partialed out, step by step:
2) Subject
3) Target vowel
4) Consonant
5) Interactions
[Figures: F1 and F2 residuals (Bark) after each step, tokens coded by context vowel (i, æ, ɑ, Same)]
Regression: F1
Total R² = .884
Post-hoc analysis: height only.
Regression: F2
Total R² = .940
Post-hoc analysis: height + backness.
Regression Summary
• Progressively accounting for variance is powerful: 88% of F1 variance and 94% of F2 variance explained using only known sources of variance.
• V-to-V coarticulation is readily apparent when other sources of variance are explained.
• The effect of V-to-V coarticulation is similar in size to place/voicing effects.
• How useful would this be?
Predicting Vowel Identity
[Figure: context-vowel categories to be predicted: i, ɑ, æ, Same]
• Multinomial Logistic Regression (MLR)
 - Classification algorithm: predict category membership from multiple variables.
 - Categories do not have to be binary.
 - Assumes an optimal listener; computes % correct.
 - How well could a listener do, under ideal circumstances, with the information provided?
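A minimal scikit-learn sketch of this classification step: predict the context-vowel category from the residual F1/F2 values and report per-category percent correct. The data file, feature names, and category labels are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical table of residualized formants plus the context-vowel label.
df = pd.read_csv("residualized_formants.csv")  # columns: F1_resid, F2_resid, context

X = df[["F1_resid", "F2_resid"]].to_numpy()
y = df["context"].to_numpy()  # e.g. "i", "ae", "a", "same"

# Multinomial logistic regression as an "ideal listener" classifier;
# cross-validated predictions give an honest percent-correct estimate.
clf = LogisticRegression(max_iter=1000)  # lbfgs solver fits a multinomial model
pred = cross_val_predict(clf, X, y, cv=5)

for cat in sorted(set(y)):
    mask = y == cat
    print(cat, f"{100 * (pred[mask] == y[mask]).mean():.1f}% correct")
```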
Predicting Vowel Identity
[Figure: % correct classification of the context vowel (i, ɑ, æ, Same) after partialing out subject, vowel, place, voicing, and interactions]
• The model does quite well at predicting all context vowels except the identity ('Same') condition.
Predicting Vowel Identity
[Figure: F1 x F2 residuals (z-scored) by target-context pair: ɛ-i, ʌ-i, ɛ-æ, ʌ-æ, ɛ-ɑ, ʌ-ɑ]