Measures of Variation Among English and American Dialects

Robert Shackleton, U.S. Congressional Budget Office

Goals • Compare speech variants used by English and American speakers, using easily accessible data • Use several different quantitative methods to assess variation among speakers • Compare different quantitative methods • Use results to gain some insight into English origins of American speech variants

Data • Nearly all data from Kurath & McDavid’s Pronunciation of English in the Atlantic States; some from Kurath’s Dialect Structure of Southern England • All or nearly all data collected by Guy Lowman • 82 phonemes classified into 285 variants by Kurath and McDavid

Data • Four regions • Southern England (59 informants); settled before 700 • Southeastern Massachusetts (22 informants); settled before 1650 • S.E. Virginia / N.E. North Carolina (EVNC; 31 informants); settled before ~1690 • S.W. Virginia / S. West Virginia (SWVA; 19 informants); settled ~1750-1800 • Informants largely older, rural, from long-settled families • In some cases, more than one variant per informant • Some missing data • Some data arbitrarily attributed to one of two or three possible informants in a given locality

Methods • Shared variants: based on proportion of variants shared between two speakers • Genetic distance: based on relative frequencies of variants, treating variants of a given phoneme as analogous to alleles of a given gene • Linguistic distance: measured as a Euclidean distance between variants in an idealized geometric grid (e.g. ² and e are closer to each other than i and Þ) • Each measure involves arbitrary assumptions • Choice of phonemes to include • Classification of responses into variants • Quantification of distances among variants • Important difference: first two approaches assume that variants are discrete; linguistic approach does not
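
The shared-variants measure can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: it assumes each speaker is stored as a mapping from phoneme to the set of variants recorded, and that the proportion is taken over phonemes attested for both speakers. The phoneme and variant labels (`P1`, `v1`, ...) are hypothetical.

```python
def shared_variant_proportion(a, b):
    """Fraction of commonly attested phonemes on which two speakers
    share at least one variant (an assumed definition, see lead-in)."""
    # Only compare phonemes for which both speakers have data.
    common = [p for p in a if p in b and a[p] and b[p]]
    if not common:
        raise ValueError("no phonemes attested for both speakers")
    # A phoneme counts as shared when the two variant sets overlap.
    shared = sum(1 for p in common if a[p] & b[p])
    return shared / len(common)


# Hypothetical data: speaker_a has two variants for P3; P4 and P5 are
# each missing for one speaker and are therefore skipped.
speaker_a = {"P1": {"v1"}, "P2": {"v3"}, "P3": {"v5", "v6"}, "P4": {"v7"}}
speaker_b = {"P1": {"v1"}, "P2": {"v4"}, "P3": {"v6"}, "P5": {"v9"}}

proportion = shared_variant_proportion(speaker_a, speaker_b)
print(round(proportion, 3))
```

Skipping phonemes with missing data is one of several possible choices; counting them as mismatches would lower every proportion.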

Genetic Approach • Nei's genetic distance D measures how closely related populations of pronunciation patterns are if: • Change is always to a completely new variant • All phonemes have the same rate of change • Population sizes remain constant over time • Occurrence of variant = 1; absence = 0 • Occasionally, frequency of variant in a set of similar words (0 < x < 1) • In some cases, more than one variant per speaker • Each informant represented by a vector of 285 numbers, each between 0 and 1 • In this sample: • D ranges from 0.00 to 1.70 • 50% shared pronunciations => D = 0.7
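
As a rough illustration of the calculation, the sketch below applies Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x J_y)), to two informant vectors. The assumption here is that J_xy, J_x and J_y are plain sums of products over all vector entries; the slides do not spell out the exact normalization used. The toy vectors have 8 entries (4 hypothetical phonemes with 2 variants each) rather than 285.

```python
import math


def nei_distance(x, y):
    """Nei's standard genetic distance between two frequency vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    j_xy = sum(a * b for a, b in zip(x, y))  # cross-products
    j_x = sum(a * a for a in x)              # self-products of x
    j_y = sum(b * b for b in y)              # self-products of y
    if j_x == 0 or j_y == 0:
        raise ValueError("a vector with no variants has no defined distance")
    if j_xy == 0:
        return math.inf                      # nothing in common
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Two hypothetical informants who agree on 2 of 4 phonemes.
x = [1, 0, 1, 0, 1, 0, 1, 0]
y = [1, 0, 1, 0, 0, 1, 0, 1]

d = nei_distance(x, y)
print(round(d, 3))
```

With half of the variants shared, the identity term is 0.5 and D = ln 2, roughly 0.69, in line with the "50% shared pronunciations => D = 0.7" figure on the slide.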

Linguistic Approach • Variants are characterized by a set of numbers representing degrees of height, backing, rounding, rhoticity, length

Linguistic Approach • Difference between variants measured as Euclidean distance • Distance between two speakers LD measured as the average Euclidean distance between their variants • Could also measure the dispersion of distances, etc. • In this sample: • LD ranges from 0.00 to 1.68 • 50% shared variants => LD = 0.70 to 1.16
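
A minimal sketch of LD, assuming each variant is coded as a 5-tuple (height, backing, rounding, rhoticity, length) and each speaker as a mapping from phoneme to one such tuple. The numeric codings below are invented for illustration and are not the values used in the study.

```python
import math


def linguistic_distance(a, b):
    """Average Euclidean distance between two speakers' variants,
    taken over the phonemes recorded for both."""
    common = [p for p in a if p in b]
    if not common:
        raise ValueError("no phonemes recorded for both speakers")
    return sum(math.dist(a[p], b[p]) for p in common) / len(common)


# Hypothetical feature codings: (height, backing, rounding, rhoticity, length)
speaker_a = {"P1": (1.0, 0.0, 0.0, 0.0, 0.0), "P2": (0.5, 1.0, 1.0, 0.0, 0.0)}
speaker_b = {"P1": (0.5, 0.0, 0.0, 0.0, 0.0), "P2": (0.5, 1.0, 0.0, 0.0, 0.0)}

ld = linguistic_distance(speaker_a, speaker_b)
print(ld)
```

Unlike the two discrete measures, two speakers who share no variants at all can still come out close here if their variants sit near each other in the feature grid.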

Cluster Analysis • Methods of grouping informants on the basis of similarity of their speech patterns • Many different approaches • Different measures of similarity: Pearson correlations, Euclidean distances, cosines, genetic or linguistic distances • Different methods of grouping similar observations into clusters: single, average, and complete linkages, various algorithms for estimating phylogenetic relationships • Results highly dependent on approach • English speakers tend to group into five regions (East Midlands, East Anglia, Southeast, Southwest, Devonshire) • North American regions tend to be distinct, and to cluster most closely with Southeast England • EVNC and SWVA consistently cluster together
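
One of the grouping methods named above, average linkage, can be sketched as follows. This is a deliberately simple agglomerative version written for readability, not the software used in the study; the 5-by-5 distance matrix is invented.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage.

    dist is a symmetric matrix (list of lists) of pairwise distances;
    returns a list of clusters, each a sorted list of informant indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(dist))]

    def mean_distance(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical distances among five informants.
dist = [
    [0.0, 0.2, 0.9, 1.0, 1.1],
    [0.2, 0.0, 0.8, 1.0, 1.2],
    [0.9, 0.8, 0.0, 0.3, 0.4],
    [1.0, 1.0, 0.3, 0.0, 0.25],
    [1.1, 1.2, 0.4, 0.25, 0.0],
]

groups = average_linkage(dist, 2)
print(groups)
```

Replacing `mean_distance` with the minimum or maximum pairwise distance gives single or complete linkage, which is one way to see how strongly the grouping depends on the method chosen.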

Results • Distance measures are generally correlated • Nei’s distance and shared variants are very similar, despite nonlinearity • Linguistic distance is least similar—contains different information about similarity of speech forms
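
The remark about nonlinearity can be illustrated with a toy calculation. The slides do not say which correlation statistic was used; the sketch below assumes Pearson's r, and assumes the idealized case in which the genetic distance equals the negative log of the shared proportion (as it does for 0/1 vectors with one variant per phoneme). The four proportions are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical shared-variant proportions for four informant pairs,
# and the distances implied by D = -ln(proportion).
shared = [0.2, 0.4, 0.6, 0.8]
nei_like = [-math.log(s) for s in shared]

r = pearson(shared, nei_like)
print(round(r, 2))
```

The correlation is strongly negative (more sharing, less distance) but falls short of -1 because the relationship is logarithmic rather than linear.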

Shared Variants (figure; regions shown: East Midlands, East Anglia, Southeast, Southwest, Devonshire, Massachusetts, EVNC, SWVA)

Nei’s Genetic Distance (figure; regions shown: East Midlands, East Anglia, Southeast, Southwest, Devonshire, Massachusetts, EVNC, SWVA)

Linguistic Distance (figure; regions shown: East Midlands, East Anglia, Southeast, Southwest, Devonshire, Massachusetts, EVNC, SWVA)

Distribution of Variants • Some variants are widespread; others not • 12% appear in all 8 regions • 29% appear in 7 regions • 42% appear in 6 regions • 59% appear in 5 regions • Even within regions, lots of variation • Informants in a given region typically share 60% to 75% of variants, but range is 33% to 90% • Degree of variation reflected in genetic and linguistic distance measures • Southern England • More diversity than in North America • 91% of variants found somewhere • 23% found in every region • 20% found only in southern England • Shared variants between English informants 22% to 83% • Shared variants between English and American informants 18% to 63%

Distribution of Variants • North American regions • Less diversity than in England—22% of southern English variants absent • 80% of variants found somewhere • 37% found in every region • 9% found only in North America (12% of North American variants) • Nearly half of American “innovations” shared across all N. American regions • Many “innovations” are known to have existed in southern England, but were not recorded • North American distribution of southern English variants • Slightly greater frequency of eastern (esp. southeastern) English variants in American regions • Of 41 variants found in eastern but not in western England, 14 (34%) appear in Massachusetts and in the South, 7 (17%) in Massachusetts but not in the South, 13 (32%) in the South but not in Massachusetts • Of 33 variants found in western but not in eastern England, 5 (15%) appear in Massachusetts and in the South, 2 (6%) in Massachusetts but not in the South, 11 (33%) in the South but not in Massachusetts

Distribution of Variants • Massachusetts • Both more and fewer shared variants with English informants than the South • On average, more shared variants with the South than with English informants • By all measures, MA informants show somewhat greater affinity with eastern English • The South • EVNC and SWVA comparatively homogeneous and similar • Similar intra- and interregional variation • Similar variation with MA and England • Slightly greater affinity with western English than MA • Southern American informants have greatest number of shared variants with Devonshire informants, but lowest linguistic distance with southeastern English informants • Can illustrate differences using average values, or values for “typical” informants who have the greatest average number of shared variants or lowest average distance with all other speakers in region
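
The notion of a "typical" informant under a distance measure corresponds to what is often called a medoid: the member with the lowest average distance to the others. A minimal sketch, assuming a precomputed distance matrix and a list of indices for one region (both invented here):

```python
def typical_informant(dist, members):
    """Index of the member with the lowest mean distance to the
    other members of the same region."""
    if len(members) < 2:
        raise ValueError("need at least two informants in the region")

    def mean_to_others(i):
        others = [j for j in members if j != i]
        return sum(dist[i][j] for j in others) / len(others)

    return min(members, key=mean_to_others)


# Hypothetical distances among five informants; the last three are
# treated as one region.
dist = [
    [0.0, 0.2, 0.9, 1.0, 1.1],
    [0.2, 0.0, 0.8, 1.0, 1.2],
    [0.9, 0.8, 0.0, 0.3, 0.4],
    [1.0, 1.0, 0.3, 0.0, 0.25],
    [1.1, 1.2, 0.4, 0.25, 0.0],
]
region = [2, 3, 4]

typical = typical_informant(dist, region)
print(typical)
```

For the shared-variants measure the same idea applies with `max` in place of `min`, since a higher proportion means greater similarity.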

Regional Comparison: Averages

Regional Comparison: “Typical” Informants

Conclusions • Different measures yield somewhat different, complementary insights into linguistic variation • By all measures, extensive variation in and among regions • Patterns of variation (increasing with population and age of settlement) are reminiscent of the species-area relationship • American settlement resulted in lower variation in American regions, leveling, and somewhat different populations of variants in different regions • Slightly dominant influence from the metropolitan area • Greater eastern influence in the north, western influence in the south • Relatively little innovation • Leveling process analogous to loss of species during reduction in habitat • Results are largely consistent with the historical record of early English immigration to North America (except for absence of East Anglian influence in Massachusetts)
