
Graded Constraints in English Word Forms: Theory and Data

Graded Constraints in English Word Forms: Theory and Data. James L. McClelland and Brent C. Vander Wyk. Graded Constraint Theory. Agrees with “classical OT” that there are violable constraints.


Presentation Transcript


  1. Graded Constraints in English Word Forms: Theory and Data James L. McClelland and Brent C. Vander Wyk

  2. Graded Constraint Theory • Agrees with “classical OT” that there are violable constraints. • Also consistent with natural phonology and the thinking of many others now emphasizing graded constraints (Boersma, Hayes and Wilson, Burzio, Harris, Hammond …) • Does not insist on strict ranking, allowing constraints to combine to determine the overall goodness of forms. • What can and cannot occur is determined by the cumulative weight of the constraints. • Weights can be such that strict ranking effects occur, but there can also be graded effects: • Forms that violate a constraint can just be less common than forms that do not. • Forms that violate different constraints can be about equally good, if the constraints that each violates have about the same weight.

  3. Why Care? • GCT accounts for data showing there are graded patterns in: • language structure • language judgments • the time it takes to speak • Even if you only care about what does and does not occur: • GCT can lead to a simpler theory.

  4. Overview • Unit of analysis, corpus, and evidence of graded constraints. • Tenets of Graded Constraint Theory • Non-parametric and parametric corpus analysis • Rating experiments • Production duration experiment

  5. Unit of Analysis: The Rhyme Type • Rhyme Type = Vowel Type (V or VV) + Following Consonants • Vt: bat, pet, mit • VVt: bait, mite, shout • VVks: hoax, coax

  6. Corpus • Pronunciations of Monomorphemic Monosyllabic entries in the CELEX lemma corpus, when used in stressed contexts: ‘This is the Book’ • Excludes all forms that even hint at morphological complexity, including: • wealth, first • Uses ‘Received Pronunciation’ so no /r/ in coda (‘fahm’ not ‘farm’).

  7. Measure: Observed per-vowel occurrence rate • The number of different lemmas of each rhyme type divided by the number of vowels of the given type (5 for short vowels, 10 for long vowels). • There are 113 words with the ‘Vt’ rhyme type, so its rate is: 113/5 = 22.6 • Analysis focuses on the number of lemmas, regardless of each lemma’s frequency. • Phonologically identical lemmas count twice, e.g. ‘bat’.
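
The rate defined on this slide can be sketched in a few lines of Python. The function name and the lookup table are illustrative assumptions; only the vowel counts (5 short, 10 long) and the 113 'Vt' lemmas come from the slide.

```python
# Number of vowels of each type, as stated on the slide.
VOWELS_PER_TYPE = {"V": 5, "VV": 10}


def per_vowel_rate(lemma_count: int, vowel_type: str) -> float:
    """Lemmas of a rhyme type divided by the number of vowels of that type."""
    return lemma_count / VOWELS_PER_TYPE[vowel_type]


# 'Vt' has 113 lemmas and a short vowel, so the rate is 113 / 5.
vt_rate = per_vowel_rate(113, "V")
print(vt_rate)
```

Dividing by the number of vowels puts short-vowel and long-vowel rhyme types on a common scale, since there are twice as many long vowels available to fill the slot.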

  8. Evidence of Graded Constraints • The data suggest graded constraints against: • Voiced consonants • Long vowels • Non-coronals • [Figure: panels for the codas t/d, k/g, and p/b, each comparing V and VV rhymes]

  9. Effect of coda embellishments (V) • In every case, the embellishment reduces the number of word forms containing the indicated coda. • Some go below threshold.

  10. Effect of coda ‘embellishments’ (cont’d) (VV) • The same thing happens here: more cases fall below threshold because of the added constraint.

  11. Hard and Soft Constraints The Template (fill slots from left to right): • Hard constraints: • Bare short vowels do not occur in stressed monosyllables. • Nasals share place with following C’s. • Obstruents must agree in voicing. • No geminates allowed (e.g., *tt). • Soft constraints: • Added segments of any type X • Long vowels [long]/V • Voiced obstruents [voi]/O • Non-coronal articulations [-cor]/O • Non-alv place for coronal fricatives [-alv]/F[cor]
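
As a rough illustration of how the soft constraints on this slide could be tallied for a given rhyme, here is a minimal Python sketch. The segment inventory, the function name, and the choice to treat the first coda consonant as free (counting only later consonants as "added") are all assumptions made for the example, not details taken from the slides.

```python
# Segment classes (an illustrative, simplified inventory; not from the slides).
STOPS = {"p", "t", "k", "b", "d", "g"}
FRICATIVES = {"f", "v", "s", "z", "θ", "ð", "ʃ", "ʒ"}
OBSTRUENTS = STOPS | FRICATIVES
VOICED = {"b", "d", "g", "v", "z", "ð", "ʒ"}
CORONAL = {"t", "d", "s", "z", "θ", "ð", "ʃ", "ʒ"}
ALVEOLAR_FRICATIVES = {"s", "z"}


def soft_violations(vowel_type, coda):
    """Count soft-constraint violations for a rhyme (vowel type + coda list)."""
    obstruents = [c for c in coda if c in OBSTRUENTS]
    return {
        # Assumption: the first coda consonant is free; each later one is "added".
        "X": max(len(coda) - 1, 0),
        "[long]/V": 1 if vowel_type == "VV" else 0,
        "[voi]/O": sum(c in VOICED for c in obstruents),
        "[-cor]/O": sum(c not in CORONAL for c in obstruents),
        "[-alv]/F[cor]": sum(
            c in FRICATIVES and c in CORONAL and c not in ALVEOLAR_FRICATIVES
            for c in coda
        ),
    }


print(soft_violations("V", ["t"]))        # 'Vt', as in 'bat'
print(soft_violations("VV", ["n", "d"]))  # 'VVnd', as in the non-word 'keend'
```

Under these assumptions 'Vt' violates nothing, which matches the slide-12 summary: short, simple, coronal, and unvoiced.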

  12. Summary of constraints: • Keep it short, simple, coronal, and unvoiced.

  13. Partial Ordering Law • For forms i and i’: if i’ violates a proper superset of the constraints violated by i, then i’ should occur less frequently than i, or neither should occur at all. • Definitions (base form, immediate descendent): an immediate descendent i’ of a base form i violates the same constraints as i plus one additional constraint.
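
The law and the definition above can be checked mechanically. The sketch below is one possible encoding, assuming each form is represented by the set of constraint names it violates; the function names are hypothetical, and the 'Vd' rate used in the example is a made-up number for illustration (only the 'Vt' rate of 22.6 appears in the slides).

```python
def obeys_partial_ordering(viol_i, rate_i, viol_j, rate_j):
    """Return True unless the pair (i, j) contradicts the Partial Ordering Law.

    viol_* are sets of violated constraint names; rate_* are occurrence rates.
    """
    if not viol_i < viol_j:
        # The law only speaks about proper supersets of violations.
        return True
    return rate_j < rate_i or (rate_i == 0 and rate_j == 0)


def is_immediate_descendent(viol_base, viol_desc):
    """True if viol_desc is viol_base plus exactly one additional constraint."""
    return viol_base < viol_desc and len(viol_desc - viol_base) == 1


vt = set()          # 'Vt' violates no soft constraint
vd = {"[voi]/O"}    # 'Vd' adds the voiced-obstruent violation
print(is_immediate_descendent(vt, vd))
# 10.0 is a hypothetical rate for 'Vd', used only to exercise the check.
print(obeys_partial_ordering(vt, 22.6, vd, 10.0))
```

Representing violations as sets ignores how many times a constraint is violated; a multiset would be needed if repeated violations should count separately.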

  14. Partial Ordering Graph for ‘legal’ rhymes containing at least one stop and at most two consonants

  15. Overall Partial Ordering Results • 20 violations out of 363 base/immediate descendent pairs involving at most two coda consonants; only 13 of the violations were statistically significant. • The few three-consonant codas are always rarer than any of the base forms of which they are immediate descendents, e.g. Vkst < Vks, Vkt, Vst. • The constraint against adding segments, ↓X, is very robust, with one violation: Vŋ < Vŋk (sing / sink). • The constraint against voiced obstruents, ↓VO, is also quite robust, with just two violations: Vlt < Vld, VVf < VVv. • Among forms with simple coronal codas, the constraint against long vowels is weak, and is reliably reversed when only the base set of long vowels is considered. This statement covers /t/, /d/, /s/, /z/, /l/, and /n/.

  16. Parametric Analysis • Linear Model with Threshold • Multiplicative Model (= MaxEnt model of G&J; H&W) • Apply to rhymes containing stops in the next set of analyses. Later we will apply the first model also to rhymes containing fricatives.
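
The equations for the two models are not preserved in this transcript, so the following Python sketch rests on assumed functional forms: a linear model that subtracts weighted violations from a base rate and floors the result at zero, and a multiplicative model that scales the base rate by exp(-weight) per violation, which is the usual MaxEnt shape. The function names, base rate, and weights are hypothetical.

```python
import math


def linear_threshold_rate(base, weights, violations):
    """Assumed form: base minus summed weighted violations, floored at zero."""
    penalty = sum(weights[c] * n for c, n in violations.items())
    return max(0.0, base - penalty)


def multiplicative_rate(base, weights, violations):
    """Assumed form: base times exp(-weight) for every violation (MaxEnt-like)."""
    penalty = sum(weights[c] * n for c, n in violations.items())
    return base * math.exp(-penalty)


# Hypothetical parameters, chosen only to show the qualitative difference.
lin_w = {"[voi]/O": 8.0, "[long]/V": 15.0}
mul_w = {"[voi]/O": 0.5, "[long]/V": 1.0}
form = {"[voi]/O": 1, "[long]/V": 1}

print(linear_threshold_rate(20.0, lin_w, form))  # penalties exceed the base rate
print(multiplicative_rate(20.0, mul_w, form))    # shrinks but never reaches zero
```

The practical difference under these assumptions is that the thresholded linear model can predict a rate of exactly zero once the cumulative penalty is large enough, whereas the multiplicative model only approaches zero.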

  17. General and Specific Constraints
      Index  Constraint against…                          Notation
      C1     Added segments of any type                   X
      C1a    Added pre-stop nasals                        N/_S
      C1b    Added pre-stop /l/                           l/_S
      C1c    Added pre-stop /s/                           s/_S
      C1d    Added post-stop /s/                          s/S_
      C1e    Added post-stop /t/                          t/S_
      C1f    Added pre-fricative nasals                   N/_F
      C1g    Added pre-fricative /l/                      l/_F
      C2     Long vowels                                  [long]/V
      C2a    In rimes containing stops                    [long]/V{S}
      C2b    In rimes containing fricatives               [long]/V{F}
      C3     Voiced obstruents                            [voi]/O
      C3a    Voiced stops                                 [voi]/S
      C3b    Voiced fricatives                            [voi]/F
      C4     Non-coronal articulations                    [-cor]/O
      C4a    Labial stops                                 [lab]/S
      C4b    Dorsal stops                                 [dor]/S
      C4c    Labio-dental fricatives                      [lbd]/F
      C5     Non-alveolar place for coronal fricatives    [-alv]/F[cor]
      C5a    Dental place for coronal fricatives          [den]/F[cor]
      C5b    Palatal place for coronal fricatives         [pal]/F[cor]
      Questions: • How best to formulate the specific constraints? • The list above is what we’ve used in the analyses below, but it is somewhat problematic. • Constraints against the first stop or fricative are not made explicit. • Constraints against other coda consonants (and even vowel length) are all context sensitive. • Context sensitivity adds complexity to the theory, but what should we expect?

  18. Questions • Which function provides the best overall fit to the pattern of observed occurrence rates? • Do we need to consider interactions among the constraints to fit the data? • Can we explain what does and what does not occur through the combined effects of the graded constraints?

  19. The Threshold Linear Model Beats the Product Model

  20. [Figure: average per-vowel occurrence rate for short-vowel and long-vowel rhymes, with unvoiced and voiced codas]

  21. Results with Interactions

  22. Even with otherwise simple forms, the preference for coronals does operate among frequent words • [Figure: each line is the average P(coronal) within each frequency range for V + unvoiced stop, V + voiced stop, VV + unvoiced stop, and V + nasal, shown against an indifference line]

  23. Corpus Analysis Summary • Observed per-vowel occurrence rates mostly agree with GCT. • The linear threshold model accounts for actual occurrence rates better than the multiplicative (MaxEnt) model. • Cumulative impact of constraints explains quite well which forms do and do not occur. • The occurrence rate data suggests some constraint interactions, which are mostly of the form: • The constraint favoring coronals is weak at best unless combined with other constraints. • A similar thing occurs with the preference for short vowels.

  24. One Explanation • Over the whole corpus the language tends to minimize the average per-word constraint violation score: • Taken by itself, this score would result in all words being ‘as simple as possible’ (null, perhaps?). • Saturating the simplest distinct forms with infrequent words may be one way of tending to minimize this score, while still keeping different word forms distinct. • A premium placed on distinctness might explain V < VV in forms with simple coronal consonants.

  25. Word Goodness Judgments • Are native English speakers sensitive to the constraints? • Does the GCT do a better job explaining goodness judgments than other variables? • Phonotactic probability • Lexical density • Are the constraint weightings for judgments different from those for occurrence rates?

  26. A perspective on judgments of linguistic forms • Judging forms does not give special access to ‘the grammar’ • Rather it is a task like any other, subject to a range of factors. • Thus we expect the pattern of judgments to be sensitive to task details, and subject to variation in concert with such details. • As a consequence we have now run three versions of this experiment: • Judge goodness • Judge typicality • Judge goodness after repeating the heard form • Also allows assessment of how well participants heard and how well they can produce the forms.

  27. Experiments • Carnegie Mellon undergraduates rated how good various non-word forms would be as words ‘if a new word were needed’. • Experiment 1: Small set of rhymes considered, onset always ‘v’. Vowel either ‘sit’ or ‘tree’. • Experiment 2: Larger set of rhymes, onset could be p, t, or k. Vowel could be ‘sit’ or ‘pet’; ‘tree’, ‘say’, ‘by’, ‘cow’. • Some rhymes are attested in English, others not e.g., ‘veelb’. • Fillers were included in both experiments. • Additional subjects were recruited to rate familiarity of items. Highly familiar items were removed in some analyses or familiarity was used as a regressor. • All data shown here are from Experiment 2. • Very similar results were obtained in a replication of Experiment 2 but with participants rating typicality of the form instead of goodness. • Results from goodness judgments made after pronunciation have been collected but not yet analyzed.

  28. Non-parametric comparisons indicate all constraints are honored

  29. Regression Analysis • Predictors: • Lexical Similarity • Phonotactic Probability • Attestedness • Familiarity of the item (slang, etc) • GCT • Using weights from fit to corpus • Using weights refit to ratings

  30. Expt 2

  31. Do the constraints affect spoken word durations? • Subjects read a written form, e.g. KET, then said: • Say ket again? (as question) • Say ket again. (normal) • Say ket again. (fast) • Say ket again. (slow) • Say ket again. (normal) • The analysis considers the first ‘normal’ response and the ‘fast’ response to each item.

  32. ‘ket’ ‘keend’ ‘kest’

  33. Durations explained by GCT (refit) with Pace Included as a Predictor

  34. Comparison of Parameter Values Across Experiments

  35. A word form that violates several graded constraints: Thanks!

  36. Patient Data • In a previously published study: • Anterior aphasic patients repeated a large number of present and past tense forms as part of a larger experiment. • No effect of inflectional status was found on accuracy, but there was a large overall complexity effect. • Vowel length and coda voicing did not account for any of the variance. • The data are reasonably well fit with a logistic model including constraints against added segments of all types: • The constraint against fricatives is very high. • The constraints against nasals and labial place are very low. • There is a very large penalty for dental place of articulation. • There is also a penalty against a voiced /d/ segment following another voiced obstruent. Such forms never occur in uninflected word forms in English.

  37. [Figure: actual percent correct plotted against predicted percent correct, both axes 0 to 100]
