A Stochastic, Corpus-Based Approach to Mid-Vowel Distribution in Italian
Erica Cei
UCLA, Spring 2012
Presentation based on a Linguistics 199 project done under Prof. Bruce Hayes in Fall 2011
Italian: a 7-vowel system
• Tense vs. lax vowels: minimal pairs
  <pesca> [ˈpes.ka] ‘(s)he fishes’
  <pesca> [ˈpɛs.ka] ‘peach’
  <torre> [ˈtor.re] ‘tower’
  <torre> [ˈtɔr.re] ‘to remove’
• Vowel reduction in unstressed syllables
  <peschina> [pesˈki.na] ‘little peach’
  <torretta> [torˈret.ta] ‘little tower’
Vowel inventory:
  i       u
   e     o
    ɛ   ɔ
      a
Note: This study focuses on the dialect of Tuscan Italian spoken in the province of Pisa (esp. the town of Cascina).
Constructing the Corpus
• Text sources
  • Television subtitles from 2008 (Matthias Buchmeier)
  • Text of a 1923 novel (Italo Svevo, La coscienza di Zeno)
• Consultants
  • Myself (fluent heritage speaker, monolingual in Italian until the age of 3)
  • Parents and friends from the same region
Constructing the Corpus (cont’d.)
• Methodology
  • First rough pass: orthography to IPA (with Excel)
  • Refinement: details not in the orthography added by hand (with a custom program made by Bruce Hayes of UCLA)
    • [s]/[z] distinction, [ts]/[dz] distinction, mid vowel height, stress, transcription of foreign words
  • Tags
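The rough first pass described above can be illustrated with a few context-sensitive spelling rules. This is only a sketch in Python, not the Excel formulas actually used for the corpus: the function name `rough_ipa` and the particular rule set are assumptions, and the rules cover only a handful of well-known Italian spelling conventions. Mid vowels are written as the placeholders E and O, since the orthography does not show their height; stress, [s]/[z], and [ts]/[dz] are likewise left for the hand-correction step.

```python
import re


def rough_ipa(word: str) -> str:
    """Hypothetical first-pass orthography-to-IPA conversion.

    Leaves E/O as placeholders for mid vowels of unknown height and does
    not mark stress, s/z voicing, or ts/dz; those need manual refinement.
    """
    w = word.lower()
    w = re.sub(r"sc(?=[ei])", "ʃ", w)   # <sc> before e/i is [ʃ]
    w = w.replace("gn", "ɲ")            # <gn> is the palatal nasal
    w = w.replace("ch", "k")            # <ch> is always [k]
    w = w.replace("gh", "ɡ")            # <gh> is always hard (IPA ɡ, U+0261)
    w = re.sub(r"c(?=[ei])", "tʃ", w)   # remaining <c> before e/i
    w = re.sub(r"g(?=[ei])", "dʒ", w)   # remaining <g> before e/i
    w = w.replace("qu", "kw")
    w = w.replace("c", "k").replace("g", "ɡ")  # everything else is hard
    # Orthography does not encode mid-vowel height: use placeholders.
    w = w.replace("e", "E").replace("o", "O")
    return w


if __name__ == "__main__":
    for word in ["pesca", "peschina", "logica", "impegna"]:
        print(word, "->", rough_ipa(word))
```

The rules are ordered so that digraphs are consumed before the single letters they contain; using the distinct IPA character ɡ for hard g keeps the later soft-g rule from reapplying to the output of the `gh` rule.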
Questioning the Status of e, ɛ, o, and ɔ
• Is there really a phonemic distinction between tense and lax vowels?
  • In words unknown to me, I guessed the vowel height at better than chance
• Enter the English Phonology Search software (programmed by Bruce Hayes in 2011)
  • Goal: identify the effect of every possible preceding and following environment on mid vowels. Do some environments favor one height over the other?
English Phonology Search is available at linguistics.ucla.edu/people/hayes/EnglishPhonologySearch/index.htm
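The kind of environment search described above can be sketched as a tally of tense vs. lax stressed mid vowels by following segment. The Python below is an illustration only and is not the English Phonology Search software: the function names are hypothetical, it looks only at the immediately following segment, and it assumes transcriptions that mark stress with ˈ and syllable breaks with a period, as in the minimal pairs shown earlier.

```python
from collections import Counter, defaultdict

TENSE = set("eo")
LAX = set("ɛɔ")
OTHER_VOWELS = set("aiu")


def stressed_mid_vowel_context(ipa: str):
    """Return (vowel, following segment) for a stressed mid vowel, else None.

    Assumes the first vowel after the stress mark is the stressed nucleus.
    '#' stands for the end of the word.
    """
    segs = ipa.replace(".", "")
    start = segs.find("ˈ")
    if start == -1:
        return None
    for i in range(start + 1, len(segs)):
        ch = segs[i]
        if ch in TENSE or ch in LAX:
            following = segs[i + 1] if i + 1 < len(segs) else "#"
            return ch, following
        if ch in OTHER_VOWELS:
            return None  # the stressed vowel is not a mid vowel
    return None


def tally_by_following_segment(words):
    """Count tense vs. lax stressed mid vowels for each following segment."""
    tally = defaultdict(Counter)
    for ipa in words:
        context = stressed_mid_vowel_context(ipa)
        if context is None:
            continue
        vowel, following = context
        tally[following]["lax" if vowel in LAX else "tense"] += 1
    return tally


# Transcriptions taken from the minimal-pair slide.
words = ["ˈpes.ka", "ˈpɛs.ka", "ˈtor.re", "ˈtɔr.re", "pesˈki.na"]
tally = tally_by_following_segment(words)

if __name__ == "__main__":
    for following, counts in sorted(tally.items()):
        print(following, dict(counts))
```

With a full corpus, environments whose counts are heavily skewed toward one height are the candidates for the predictors used in the regression.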
Logistic Regression
• Statistical model (fit with R) that separates out ‘overlapping’ factors, i.e. estimates each factor's effect while controlling for the others
• Set up to predict the probability that a given mid vowel is lax
R (a statistics program) is available at http://www.r-project.org/
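To make the idea concrete, here is a minimal logistic regression sketch in Python. The study itself used R; this is a plain gradient-descent version for illustration, not the model that was fit. The two binary predictors (`before_lateral`, `closed_syllable`) and the ten data rows are invented toy values, not corpus data, and the function names are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_lax(weights, bias, features) -> float:
    """Predicted probability that the mid vowel is lax."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for features, label in zip(X, y):
            error = predict_lax(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# INVENTED toy data: columns are [before_lateral, closed_syllable];
# label 1 = lax, 0 = tense. These are not the corpus counts.
X = [[1, 0], [1, 0], [1, 1], [1, 0], [0, 1],
     [0, 1], [0, 0], [0, 0], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

weights, bias = fit_logistic(X, y)

if __name__ == "__main__":
    print("weights:", weights, "bias:", bias)
    print("P(lax | before lateral):", predict_lax(weights, bias, [1, 0]))
    print("P(lax | elsewhere):     ", predict_lax(weights, bias, [0, 0]))
```

A positive weight means the context pushes the prediction toward lax; because all predictors are fit together, each weight reflects a context's contribution with the other contexts held fixed.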
Results • 42 significant factors:
Some Highlights
• No *ɛɲ]σ sequences occur in the corpus
• Mid vowels before laterals tend to be lax
[Chart: Mid vowels before ɲ]σ; legend: Lax, Tense]
Some Highlights (cont’d.)
• Front and back vowels pattern differently
• Almost all significant contexts favored lax vowels
[Charts: Mid vowels before glides; Mid vowels before codas]
Performance of the Model Thin line: prediction of a model that assumes that in every context, tense and lax vowels are equally likely to occur. Scatterplot: bulge above the thin line shows model performs at better than chance.
Performance of the Model (cont’d.) As before, the thin line shows what we would predict if we assumed that mid vowels were equally likely to be tense or lax in any given context. The downward bulge shows that the model performs better than chance.
Performance of the Model (cont’d.)
• On a -1 to 1 scale, the correlation between reality and the model's predictions is 0.577565
• Words with low probability of lax:
  • impegna, sgombra
• Words with medium probability of lax:
  • aziendali, gomiti
• Words with high probability of lax:
  • coniugi, logica
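A correlation of this kind can be computed by pairing each word's observed outcome (1 = lax, 0 = tense) with its predicted probability of lax. The slide does not say which coefficient was used; the Python sketch below assumes Pearson's r, and the six observed/predicted pairs are invented for illustration rather than taken from the model's output.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# INVENTED toy values: observed height (1 = lax) and predicted P(lax).
observed = [0, 0, 1, 1, 1, 0]
predicted = [0.1, 0.4, 0.5, 0.8, 0.9, 0.3]

r = pearson_r(observed, predicted)

if __name__ == "__main__":
    print("correlation:", r)
```

A value of 0 would mean the predictions carry no information about the observed vowel heights, and 1 would mean they track them perfectly.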
Future steps
• Accounting for morphology (in progress)
  • Suffixes with a stressed mid vowel, e.g. –ɛllo
  • May cause some effects to vanish
• Wug testing
  • A wug test has been designed but needs to be run with more subjects
  • Very preliminary results suggest that the model tends to predict responses correctly
Bibliography and Acknowledgements
Ryan, Kevin. Gradient Weight in Phonology. UCLA, 2011. Web. <http://www.linguistics.ucla.edu/general/dissertations/RyanDissertationUCLA2011.pdf>.
Many thanks to Prof. Bruce Hayes