Estimating Rates Of Lexical Change Andrew Meade a.meade@reading.ac.uk University of Reading
Rates Of Lexical Change • Lexical rate variation • Inferring evolutionary histories • Calculating rates of lexical evolution • Underlying processes
15 lines representing a bull
Phylogenetic Comparative Methods • Popular in biology for 20 years • Used to study ancestral states, correlated evolution and rates of evolution, and to test hypotheses • Traditional statistics assume the data are independent; comparative methods account for the non-independence caused by shared ancestry
The Language ‘gene’ • Swadesh list: Morris Swadesh, 1940 onwards • 200 meanings forming the basic vocabulary • Chosen to be stable, fundamental and resistant to borrowing • 95 Indo-European languages + Hittite and Tocharian
Cognate classes • Words with a common evolutionary ancestry • ‘Fish’ class: English Fish, Danish Fisk, Dutch Visch + 34 other languages • ‘Ryba’ class: Czech Ryba, Russian Ryba, Bulgarian Riba + 23 other languages
IE cognate classes per meaning • Average: 17 • Minimum: 1 (“Who”, “Three”) • Maximum: 35 (“Person”, “Dirty”)
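The per-meaning counts above come from grouping each language's word into cognate classes and counting the distinct classes. A minimal sketch, assuming the data are stored as a dictionary from meaning to {language: class label}. The toy subset reuses the ‘Fish’/‘Ryba’ split and the single “Who” class from the slides; the dictionary layout and the function name `classes_per_meaning` are hypothetical.

```python
from statistics import mean

# Hypothetical toy subset: for each meaning, the cognate-class label
# of the word in each language ("fish" vs "ryba" follows the slide;
# "who" forms a single class across IE).
cognate_data = {
    "fish": {"English": "fish", "Danish": "fish", "Dutch": "fish",
             "Czech": "ryba", "Russian": "ryba", "Bulgarian": "ryba"},
    "who": {"English": "who", "Danish": "who", "Dutch": "who",
            "Czech": "who", "Russian": "who", "Bulgarian": "who"},
}

def classes_per_meaning(data):
    """Return the number of distinct cognate classes for each meaning."""
    return {meaning: len(set(words.values())) for meaning, words in data.items()}

counts = classes_per_meaning(cognate_data)
print(counts)  # {'fish': 2, 'who': 1}
print("average:", mean(counts.values()),
      "min:", min(counts.values()), "max:", max(counts.values()))
```

On the full 200-meaning data set, the same summary would give the average, minimum and maximum reported on the slide.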
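Phylogenetic inference • Each cognate class is coded as 0 (non cognate) or 1 (cognate) in each language • Two-state model with transition rates Q01 (0 → 1) and Q10 (1 → 0) • Figure: tree with coded states at the tips (0 0 0 0 0 0 1 1); time scale 1000 years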
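For the two-state model with rates Q01 and Q10, the probability of ending in each state after a branch of length t has a standard closed form. The sketch below computes it. The rate values are hypothetical and in units of "per 1000 years", and both rates are assumed to be positive. It is an illustration of the model, not the software used for these results.

```python
import math

def transition_probs(q01, q10, t):
    """Closed-form transition matrix P(t) for a two-state Markov chain.

    States: 0 = non cognate, 1 = cognate (the coding on the slide).
    q01 is the rate of 0 -> 1 and q10 the rate of 1 -> 0, both > 0;
    t is the branch length in the same time units as the rates.
    Row i gives P(end in state j | start in state i).
    """
    s = q01 + q10
    decay = math.exp(-s * t)
    p01 = (q01 / s) * (1.0 - decay)
    p10 = (q10 / s) * (1.0 - decay)
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]

# Hypothetical slow vs fast loss rates over one 1000-year branch.
for label, q10 in [("slow", 0.1), ("fast", 2.0)]:
    P = transition_probs(q01=0.05, q10=q10, t=1.0)
    print(label, "P(still cognate after 1000 years) =", round(P[1][1], 3))
```

As t grows, each row approaches the stationary distribution (Q10/(Q01+Q10), Q01/(Q01+Q10)), so a fast-evolving word quickly loses any trace of its starting state.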
MCMC phylogenetic inference • Creates a statistically justified sample of trees • Samples trees in proportion to their probability • Used to correct for the non-independence in the data • Results = Data + Method
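The idea of sampling trees "in proportion to their probability" can be illustrated with a Metropolis sampler over a handful of candidates. This is a toy sketch only. Real phylogenetic MCMC proposes changes to tree topology, branch lengths and rate parameters. Here, the three "trees" are just labels with made-up probabilities, and the function name `metropolis_sample` is hypothetical.

```python
import math
import random
from collections import Counter

def metropolis_sample(log_probs, n_iter, seed=0):
    """Visit states with long-run frequency proportional to exp(log_probs[state]).

    Uses a Metropolis sampler with a symmetric proposal (any state chosen
    uniformly at random), so a move is accepted with probability
    min(1, p(proposed) / p(current)).
    """
    rng = random.Random(seed)
    states = list(log_probs)
    current = states[0]
    counts = Counter()
    for _ in range(n_iter):
        proposed = rng.choice(states)
        log_ratio = log_probs[proposed] - log_probs[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposed
        counts[current] += 1  # count the current state, accepted or not
    return {s: counts[s] / n_iter for s in states}

# Hypothetical unnormalised log-probabilities for three candidate trees.
trees = {"tree_A": math.log(6), "tree_B": math.log(3), "tree_C": math.log(1)}
freqs = metropolis_sample(trees, n_iter=200_000)
print(freqs)  # close to {'tree_A': 0.6, 'tree_B': 0.3, 'tree_C': 0.1}
```

Only ratios of probabilities are needed, which is why MCMC can work with unnormalised tree likelihoods in a space far too large to enumerate.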
Random tree: -58204 log units • Most probable tree: 4.1 x 10^14107 • An infinite number of poor trees
Figure: inferred IE tree with groups labelled Out group, Greek, Indo-Iranian, Slavic, Celtic, Germanic, Romance
Inferring lexical rates • “Name” has 3 cognate classes • Class A: Gypsy (Alav), Persian (Esm) • Class B: Latvian (Vards), Lithuanian (Vardas) • Class C: all the rest, e.g. Hindi (Nam), Greek (Onoma), Italian (Nome) • Estimate the instantaneous transition rate between each pair of classes (A → B, B → A, A → C, C → B, etc.) • Too many parameters, not enough data
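One way to see why a multi-class model has "too many parameters" is to count the rates it needs. This assumes an unrestricted model in which every ordered pair of cognate classes gets its own instantaneous rate, giving k(k − 1) rates for k classes. The function name is hypothetical. The class counts are taken from the slides (3 for “Name”, 17 on average, 35 at most).

```python
def n_rate_parameters(k):
    """Number of free rates in an unrestricted k-state Markov model.

    Every ordered pair of distinct cognate classes (i -> j, i != j)
    has its own instantaneous transition rate: k * (k - 1) in total.
    """
    return k * (k - 1)

# 2 classes, "Name" (3), the IE average (17) and the maximum (35).
for k in (2, 3, 17, 35):
    print(k, "classes ->", n_rate_parameters(k), "rates")
```

With one word per language for each meaning, a 35-class meaning would need over a thousand rates from fewer than 100 word lists. Coding each meaning with 2 classes (as on the following slide) keeps the count at 2.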
Inferring lexical rates • Examples with 2 cognate classes (Class 1, Class 2) • Figure: slow rate vs fast rate
Figure examples: “Salt”, “Red”, “Five”
Mean rates for the 200 words • Mean = 3.05 (1.82) • Median = 2.74 • Min = 0.09 • Max = 9.27 • ~100-fold difference (9.27 / 0.09 ≈ 103) • Slow: ‘two’, ‘who’, ‘one’, ‘night’, ‘to die’ • Fast: ‘dirty’, ‘to turn’, ‘to stab’
Word half-life • The time over which there is a 50% chance of the word being replaced by a non-cognate form • Based on an age of 8000 years for IE
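A half-life can be derived from a replacement rate if replacements are modelled as a Poisson process with constant rate r. Then P(no replacement by time t) = exp(−r·t), and the half-life is ln 2 / r. This sketch assumes that relationship and takes the rates from the IE tree slide at face value as "per 10k years". The slides calibrate the half-lives using an 8000-year age for IE, and the exact calibration is not reproduced here.

```python
import math

def half_life_years(rate_per_10k):
    """Years until a word has a 50% chance of having been replaced.

    Assumes replacement is a Poisson process with a constant rate, so
    P(no replacement by t) = exp(-rate * t) and the half-life is
    ln(2) / rate. The rate is assumed to be per 10,000 years.
    """
    return math.log(2) / rate_per_10k * 10_000

# Rates shown on the IE tree slide (per 10k years).
for word, rate in [("one", 0.43), ("ear", 0.88), ("sand", 4.5)]:
    print(f"{word}: ~{half_life_years(rate):,.0f} years")
```

Under this assumption, a slow word such as “one” has a half-life of well over 10,000 years, while “sand” is replaced roughly ten times faster.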
IE tree showing variation in rates of lexical replacement, per 10k years • “One”: 0.43 • “Ear”: 0.88 • “Sand”: 4.5 • Figure labels: Romance, Germanic, Slavic, Indo-Iranian, Greek
Approximately 100-fold variation in rates of word evolution • Cultural replicators can evolve more slowly than some human genes (e.g., compare 'five' with the lactase gene) • Possibility of deep linguistic reconstructions • What processes explain the variation?
Spoken word frequency • British National Corpus • N = 4840 words • Mean = 194 • Geometric mean = 35.94 • Median = 25
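The gap between the mean (194) and the geometric mean (35.94) or median (25) is the signature of a strongly right-skewed distribution: a few very frequent words pull the arithmetic mean far above the typical word. The sketch uses nine made-up frequencies, not BNC data, to show the same pattern with Python's `statistics` module (`geometric_mean` needs Python 3.8+).

```python
from statistics import mean, median, geometric_mean

# Hypothetical per-million frequencies for nine words: most are rare,
# a few are very common (illustrative values, not BNC data).
freqs = [1, 2, 5, 10, 25, 50, 100, 1000, 5000]

print("mean          ", round(mean(freqs), 2))
print("geometric mean", round(geometric_mean(freqs), 2))
print("median        ", median(freqs))
```

The geometric mean is the mean on a log scale, which is why it is a more representative summary for frequency data like these.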
Distribution of frequency of word use (corpora of 20-100 million words) • Most words are used < 100 times per million words
Correlations between frequencies of word use across languages • r = 0.88, r = 0.87, r = 0.87 • Frequency of use is very stable throughout IE
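The r values are Pearson correlation coefficients between the frequencies of the same meanings in two languages. The slides do not say whether frequencies were log-transformed. This sketch assumes log10 frequencies (a common choice for skewed counts), and the five value pairs and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical log10 frequencies of the same five meanings in two languages.
lang_1 = [3.1, 2.4, 1.9, 1.2, 0.5]
lang_2 = [2.9, 2.6, 1.7, 1.4, 0.3]
print(round(pearson_r(lang_1, lang_2), 2))
```

A value near 1 means meanings that are frequent in one language tend to be frequent in the other, which is what "stable throughout IE" refers to.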
Frequency vs rate of lexical evolution • r = -0.37, r = -0.35, r = -0.41, r = -0.32
Parts of speech • Legend: conjunctions, prepositions, adjectives, verbs, nouns, special adverbs, pronouns, numbers • R² = 0.48, 0.48, 0.48, 0.50 • Numbers, pronouns, special adverbs: stronger selection?
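The slides do not spell out the model behind the R² values. Assuming it is a linear regression, part of speech enters as a categorical predictor. The sketch shows the simplest piece: the R² of a model that predicts each word's rate by the mean rate of its part of speech. That is identical to an ordinary least-squares fit on dummy-coded groups (a one-way ANOVA). The six rates, their labels and the function name are hypothetical, and word frequency is left out.

```python
from collections import defaultdict

def r_squared_by_group(values, groups):
    """R^2 of a model that predicts each value by its group's mean.

    Equals the R^2 of an OLS regression on dummy-coded group
    membership with an intercept (a one-way ANOVA model).
    """
    overall = sum(values) / len(values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_mean = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_total = sum((v - overall) ** 2 for v in values)
    ss_resid = sum((v - group_mean[g]) ** 2 for v, g in zip(values, groups))
    return 1.0 - ss_resid / ss_total

# Hypothetical rates for six words and their parts of speech.
rates = [0.4, 0.6, 2.5, 3.1, 4.0, 5.2]
pos = ["number", "number", "noun", "noun", "verb", "verb"]
print(round(r_squared_by_group(rates, pos), 2))
```

R² is the share of the total variance in rates removed by the model, so a value of about 0.5 corresponds to the "50% of variation" in the summary.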
Summary • Simple model accounts for 50% of variation in rates of evolution across 87 languages representing ~130,000 years of evolution • Spoken word frequency seems to exert a general influence on rates of word evolution • High-frequency words are less likely to be borrowed • Languages evolve initially in less frequently used parts of the vocabulary, retaining mutual intelligibility • Cultural replicators can evolve more slowly than some human genes (e.g., compare 'five' with the lactase gene)
Acknowledgements • Mark Pagel • Quentin Atkinson • Russell Gray • ACET