A Stochastic, Corpus-Based Approach to Mid-Vowel Distribution in Italian

Erica Cei, UCLA, Spring 2012. Presentation based on a Linguistics 199 project done under Prof. Bruce Hayes in Fall 2011.

Italian: a 7-vowel system • Vowel inventory: i, u, e, o, ɛ, ɔ, a • Tense vs. lax vowels: minimal pairs <pesca> [ˈpes.ka] ‘(s)he fishes’ vs. <pesca> [ˈpɛs.ka] ‘peach’; <torre> [ˈtor.re] ‘tower’ vs. <torre> [ˈtɔr.re] ‘to remove’ • Vowel reduction in unstressed syllables: <peschina> [pesˈki.na] ‘little peach’, <torretta> [torˈret.ta] ‘little tower’ Note: This study focuses on the dialect of Tuscan Italian spoken in the province of Pisa (esp. the town of Cascina).

Constructing the Corpus • Text sources • Television subtitles from 2008 (Matthias Buchmeier) • Text of a 1923 novel (Italo Svevo, La coscienza di Zeno) • Consultants • Me (fluent heritage speaker, monolingual in Italian to the age of 3) • Parents, friends from the same region

Constructing the Corpus (cont’d.) • Methodology • First rough pass: Orthography to IPA (with Excel) • Refinement: Details not in orthography added in by hand (with a custom program made by Bruce Hayes of UCLA) • [s]/[z] distinction, [ts]/[dz] distinction, mid vowel height, stress, transcription of foreign words • Tags
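The first rough pass can be pictured as a set of ordered spelling-to-sound rewrites. The slides say this step was done in Excel; the sketch below is a Python stand-in, and the rule list and the name `rough_ipa` are assumptions based on general Italian spelling conventions, not the rules actually used in the project. It deliberately leaves unresolved exactly the details the slide lists as hand-added: [s]/[z], [ts]/[dz], mid-vowel height, and stress.

```python
import re

# Hypothetical rule list: (spelling pattern, rough IPA). Order matters:
# longer and more specific spellings are listed before shorter ones.
RULES = [
    (r"gli(?=[aeou])", "ʎ"),   # <gli> before another vowel
    (r"gl(?=i)", "ʎ"),         # <gl> before <i>
    (r"gn", "ɲ"),
    (r"sci(?=[aeou])", "ʃ"),   # <i> is only a spelling device here
    (r"sc(?=[ei])", "ʃ"),
    (r"ch", "k"),
    (r"gh", "g"),
    (r"ci(?=[aeou])", "tʃ"),
    (r"gi(?=[aeou])", "dʒ"),
    (r"c(?=[ei])", "tʃ"),
    (r"g(?=[ei])", "dʒ"),
    (r"qu", "kw"),
    (r"c", "k"),
    (r"z", "ts"),              # placeholder: [ts] vs. [dz] needs hand checking
    (r"h", ""),                # silent <h>
]

# One combined pattern, scanned left to right in a single pass, so the
# output of one rule is never re-read as input to another rule.
_PATTERN = re.compile(
    "|".join(f"(?P<r{i}>{pat})" for i, (pat, _) in enumerate(RULES))
)


def rough_ipa(word):
    """Return a rough, unstressed transcription of an Italian spelling."""
    def replace(match):
        rule_index = int(match.lastgroup[1:])  # group names are r0, r1, ...
        return RULES[rule_index][1]
    return _PATTERN.sub(replace, word.lower())


for spelling in ["pesca", "peschina", "logica", "impegna"]:
    print(spelling, "->", rough_ipa(spelling))
```

Letters with no rule (including <e>, <o>, and <s>) pass through unchanged, which is why a second, manual refinement step is still needed.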

Questioning the Status of e, ɛ, o, and ɔ • …Is there really a phonemic distinction between tense and lax vowels? • In words unknown to me, I guessed the vowel height at better than chance • Enter the English Phonology Search software (programmed by Bruce Hayes in 2011) • Goal: Identify the effect of every possible preceding and following environment for mid vowels; do some favor one height over the other? English Phonology Search is available at linguistics.ucla.edu/people/hayes/EnglishPhonologySearch/index.htm
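To make the goal concrete, here is a minimal sketch of tallying tense versus lax stressed mid vowels by the segment that follows them. It is not the English Phonology Search software. The function names are hypothetical, and the code assumes one character per segment and a stress mark at the start of the stressed syllable, as in the transcriptions on the earlier slide. Multi-character segments such as [tʃ] would need extra handling.

```python
from collections import Counter, defaultdict

VOWELS = set("iueoɛɔa")
MID = set("eoɛɔ")
LAX = set("ɛɔ")


def stressed_vowel_index(ipa):
    """Index of the first vowel after the stress mark, or None."""
    start = ipa.find("ˈ")
    if start == -1:
        return None
    for i in range(start + 1, len(ipa)):
        if ipa[i] in VOWELS:
            return i
    return None


def tally_following(words):
    """Map each following segment to counts of tense and lax mid vowels."""
    counts = defaultdict(Counter)
    for ipa in words:
        ipa = ipa.replace(".", "")  # drop syllable boundaries
        i = stressed_vowel_index(ipa)
        if i is None or ipa[i] not in MID:
            continue  # stressed vowel is not a mid vowel
        following = ipa[i + 1] if i + 1 < len(ipa) else "#"  # "#" = word end
        height = "lax" if ipa[i] in LAX else "tense"
        counts[following][height] += 1
    return counts


# The five transcriptions shown on the vowel-system slide.
sample = ["ˈpes.ka", "ˈpɛs.ka", "ˈtor.re", "ˈtɔr.re", "pesˈki.na"]
for segment, heights in tally_following(sample).items():
    print(segment, dict(heights))
```

The same loop could tally preceding segments by looking at `ipa[i - 1]` instead.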

Logistic Regression • Statistical model, fit with R, that separates out ‘overlapping’ factors (i.e. accounts for interaction effects) • Set up to predict the probability that a mid vowel is lax. R (a statistics program) is available at http://www.r-project.org/
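As a rough illustration of what such a model does, the sketch below fits a logistic regression by plain gradient descent in Python. The study itself used R, and the slides do not say which R function or predictor coding was used, so everything here is an assumption: the function names, the two binary context features, and the twelve toy observations are invented for illustration and are not the study's data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_lax(w, b, x):
    """Predicted probability that a mid vowel in context x is lax."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy data. Each row is [context_a, context_b], two hypothetical
# binary context features; the label is 1 for lax and 0 for tense.
X = [[0, 0]] * 4 + [[1, 0]] * 4 + [[0, 1]] * 4
y = [1, 0, 0, 0] + [1, 1, 1, 0] + [1, 1, 0, 0]

w, b = fit_logistic(X, y)
for context in ([0, 0], [1, 0], [0, 1]):
    print(context, round(predict_lax(w, b, context), 3))
```

Each weight is estimated with the other feature held fixed, which is the sense in which the model teases apart factors that overlap in the corpus.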

Results • 42 significant factors:

Some Highlights (cont’d.) • Front and back vowels pattern differently • Almost all significant contexts favored lax vowels [Figures: mid vowels before glides; mid vowels before codas]

Some Highlights • There are zero sequences *ɛɲ]σ • Mid vowels before laterals tend to be lax [Figure: mid vowels before ɲ]σ, lax vs. tense]

Performance of the Model Thin line: the prediction of a model that assumes that tense and lax vowels are equally likely to occur in every context. Scatterplot: the bulge above the thin line shows that the model performs better than chance.

Performance of the Model (cont’d.) As before, the thin line shows what we would predict if we assumed that mid vowels were equally likely to be tense or lax in any given context. The downward bulge shows that the model performs better than chance.

Performance of the Model (cont’d.) • On a -1 to 1 scale, the correlation between reality and the model’s predictions is 0.577565 • Words with low probability of lax: • impegna, sgombra • Words with medium probability of lax: • aziendali, gomiti • Words with high probability of lax: • coniugi, logica
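A correlation on a -1 to 1 scale is most commonly a Pearson coefficient, though the slides do not say which measure was used or over which units it was computed. Under that assumption, the sketch below shows the calculation between predicted probabilities of lax and observed outcomes (1 for lax, 0 for tense). The four data points are invented for illustration and have nothing to do with the reported value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented example: model probabilities of lax vs. what was observed.
predicted = [0.1, 0.4, 0.6, 0.9]
observed = [0, 0, 1, 1]
print(pearson_r(predicted, observed))
```

A value near 0 would mean the predictions carry no information about vowel height, and a value near 1 would mean they track it closely.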

Future steps • Accounting for morphology (in progress) • Suffixes with a stressed mid vowel, e.g. –ɛllo • May cause some effects to vanish • Wug testing • A wug test has been designed but needs to be run with more subjects • Very, very preliminary results suggest that the model tends to correctly predict responses

Bibliography and Acknowledgements Ryan, Kevin. Gradient Weight in Phonology. UCLA, 2011. Web. <http://www.linguistics.ucla.edu/general/dissertations/RyanDissertationUCLA2011.pdf>. Many thanks to Prof. Bruce Hayes
