New approaches to language and prehistory from typology, genetics, and quantitative linguistics

Søren Wichmann, MPI-EVA & Leiden University

Lecture IV: The utility of phylogenetic algorithms and software: some case studies

Case study A: Can the algorithms help us refine lexicostatistics? Let's compare a phylogeny based on traditional methods with one based on lexicostatistics using modern phylogenetic methods. Language family studied: Mixe-Zoquean

The location of Mixe-Zoquean languages

Classification criteria: shared (mostly phonological) innovations (from Wichmann 1995)
1. Not defined
2. /h/ inserted before final consonant
3. Vowel length is lost
4. Word-final vowels lost
5. Palatalizing effect of front vowels
6. Apparently some morphological and lexical innovations (not clear)
7. (Mostly implicit by the language being intermediate in several respects, and also having its own innovations, see 12)
8a. Syllable-initial nasals become prenasalised stops
8b. /t/ and /ts/ merge before /i/
8c. Final devoicing
9a. Development of a quantity distinction in consonants
9b. An analogical extension involving verb classes
12. Umlaut, syncope, anaptyxis, stress changes
22. An /h/ is inserted into verb roots whose final consonant is a stop

A 110-word Swadesh-style list

Encoding of cognation as discrete characters
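The encoding step can be illustrated with a minimal sketch. It assumes that each language has one cognate-class label per meaning and that each (meaning, class) pair becomes one binary presence/absence character. The language names, class labels, and the function name `encode_cognates` are hypothetical and do not come from the Mixe-Zoquean dataset.

```python
from typing import Dict, List, Optional, Tuple

def encode_cognates(
    cognate_classes: Dict[str, List[Optional[str]]]
) -> Tuple[List[Tuple[int, str]], Dict[str, str]]:
    """Recode multistate cognate classes as binary characters.

    cognate_classes maps a language to its cognate-class label for each
    meaning (None = missing data). Returns the column labels and one
    string of 0/1/? per language.
    """
    languages = sorted(cognate_classes)
    n_meanings = len(next(iter(cognate_classes.values())))
    labels: List[Tuple[int, str]] = []
    rows: Dict[str, List[str]] = {lang: [] for lang in languages}
    for m in range(n_meanings):
        # All cognate classes attested for this meaning, ignoring gaps.
        classes = sorted({cognate_classes[lang][m] for lang in languages
                          if cognate_classes[lang][m] is not None})
        for c in classes:
            labels.append((m, c))
            for lang in languages:
                state = cognate_classes[lang][m]
                if state is None:
                    rows[lang].append("?")   # missing stays missing
                else:
                    rows[lang].append("1" if state == c else "0")
    return labels, {lang: "".join(bits) for lang, bits in rows.items()}

# Hypothetical toy data: three languages, three meanings.
toy = {
    "L1": ["A", "A", "B"],
    "L2": ["A", "B", "B"],
    "L3": ["B", "B", None],
}
labels, matrix = encode_cognates(toy)
for lang, row in matrix.items():
    print(lang, row)
```

Each row of the output can be pasted into a character matrix for a phylogenetic program; whether the original study used exactly this binary recoding is not stated on the slide.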

A distance matrix of cognate percentages
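As a rough illustration of how such a matrix could be derived, the sketch below computes the percentage of shared cognates for each language pair and converts it to a distance as 100 minus that percentage. The conversion, the toy data, and the function names are assumptions made for the example; the slide does not say how the matrix was computed.

```python
from typing import Dict, List, Optional, Tuple

def cognate_percentage(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Percentage of meanings for which two languages share a cognate class."""
    shared = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue              # skip meanings missing in either list
        compared += 1
        if x == y:
            shared += 1
    if compared == 0:
        raise ValueError("no comparable meanings")
    return 100.0 * shared / compared

def distance_matrix(
    data: Dict[str, List[Optional[str]]]
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise distances, defined here as 100 minus the cognate percentage."""
    langs = sorted(data)
    matrix = [[100.0 - cognate_percentage(data[p], data[q]) for q in langs]
              for p in langs]
    return langs, matrix

# Hypothetical toy data: cognate-class labels per meaning.
toy = {
    "L1": ["A", "A", "B"],
    "L2": ["A", "B", "B"],
    "L3": ["B", "B", None],
}
langs, dist = distance_matrix(toy)
for name, row in zip(langs, dist):
    print(name, [round(d, 1) for d in row])
```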

Case study B: Sweet dreams and crude reality. Evaluating Dunn et al. (2005) on Austronesian and Papuan

What does it take for a match between two trees to be "close"? A crude test of how well two trees match is the Robinson-Foulds distance, or "symmetric difference". This is a count of the nodes that are in one tree but not in the other. First tree A is compared to tree B, then tree B to tree A, and the result is divided by two (implemented in TreeDist.exe in the PHYLIP package, among others).
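The count described above can be sketched in a few lines. This version assumes trees are written as nested tuples of taxon names and treats them as unrooted, so that each inner node is represented by the split (bipartition) of taxa it induces. The halving follows the description on this slide; other implementations may report the unhalved count. The function names and example trees are hypothetical.

```python
def leaves(tree):
    """Set of taxon names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)           # reference taxon used to normalise each split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side of the split that does not contain the reference taxon.
        side = all_taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found

def rf_distance(tree_a, tree_b, halve=True):
    """Splits in A but not B, plus splits in B but not A, optionally halved."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must have the same taxa")
    s_a, s_b = splits(tree_a), splits(tree_b)
    diff = len(s_a - s_b) + len(s_b - s_a)
    return diff / 2 if halve else diff

# Two hypothetical five-taxon trees that differ in one inner node.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))   # 1.0
```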

The distance between the "traditional" and the "typological" Austronesian trees is 4. Now we may ask: if we generate 10,000 random trees with 16 taxa, how likely are you to draw a random pair from this pool that has 4 or fewer differences? I carried out this test (in collaboration with Mihai Albu, who generated the trees, and Thomas Mailund, who ran the trees through his program, which is similar to TreeDist.exe).
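A test of this kind can be approximated with a small Monte Carlo sketch. It assumes random trees are built by repeatedly joining two randomly chosen subtrees, which is only one of several possible random-tree models; the slide does not say which model was used, so the resulting fraction need not match the reported results. All names and default parameters are hypothetical.

```python
import random

def random_tree(taxa, rng):
    """Random binary tree built by repeatedly joining two random subtrees."""
    nodes = list(taxa)
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        right = nodes.pop(j)      # pop the larger index first
        left = nodes.pop(i)
        nodes.append((left, right))
    return nodes[0]

def splits(tree, all_taxa, ref):
    """Non-trivial bipartitions of a nested-tuple tree, treated as unrooted."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(frozenset(side))
        return clade

    visit(tree)
    return found

def fraction_at_most(threshold, n_taxa=16, n_trees=1000, n_pairs=2000, seed=0):
    """Share of random tree pairs whose halved RF distance is <= threshold."""
    rng = random.Random(seed)
    taxa = [f"t{k:02d}" for k in range(n_taxa)]
    all_taxa, ref = frozenset(taxa), taxa[0]
    pool = [splits(random_tree(taxa, rng), all_taxa, ref) for _ in range(n_trees)]
    hits = 0
    for _ in range(n_pairs):
        s1, s2 = rng.sample(pool, 2)          # two distinct trees from the pool
        if len(s1 ^ s2) / 2 <= threshold:     # symmetric difference, halved
            hits += 1
    return hits / n_pairs

print(fraction_at_most(4))
```

Raising `n_trees` to 10,000 matches the pool size mentioned above, at the cost of a longer run.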

Results:

The conclusion seems to be in favor of Dunn et al., but... the time depth of the Meso-Melanesian subgroup of Austronesian is very shallow, perhaps 1000 years or so (this is to be checked). The time depth of the Papuan group, if it exists at all, could be 10 times as large. How well does a method work at such a time depth if it only barely works at a shallow level?

On a more optimistic note: If the exact same dataset that Dunn et al. used (supplied online along with their paper) is subjected to a Bayesian analysis, the Robinson-Foulds distance is down to 3! (Thanks to Arpiar Saundars for carrying out the analysis)

Figure: traditional tree vs. tree produced by Bayesian analysis of typological data

The probability of a Robinson-Foulds distance of 3 is around 0.01

Intermediate conclusion: Given that a reasonably good tree can be obtained by using typological data, the method could perhaps work. And it could work even better using an adequate algorithm...

But does it actually work?

A little problem not to be overlooked: hm, low bootstrap values...

How low can you go?

Bootstrapping in SplitsTree (10,000 runs)
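The idea behind bootstrapping can be sketched as follows. This is a deliberately simplified stand-in, not what SplitsTree does internally: instead of rebuilding a full tree or network for each replicate, it resamples characters with replacement and records how often two languages come out as each other's strictly nearest neighbours under Hamming distance. The data, the function names, and the nearest-neighbour criterion are all assumptions for illustration.

```python
import random

def hamming(a, b, columns):
    """Number of sampled character columns in which two languages differ."""
    return sum(a[c] != b[c] for c in columns)

def pair_support(matrix, pair, n_replicates=1000, seed=0):
    """Share of bootstrap replicates in which `pair` are mutual nearest neighbours."""
    rng = random.Random(seed)
    x, y = pair
    n_chars = len(matrix[x])
    others = [lang for lang in matrix if lang not in pair]
    hits = 0
    for _ in range(n_replicates):
        # Resample character columns with replacement.
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        d_xy = hamming(matrix[x], matrix[y], cols)
        x_ok = all(d_xy < hamming(matrix[x], matrix[o], cols) for o in others)
        y_ok = all(d_xy < hamming(matrix[y], matrix[o], cols) for o in others)
        if x_ok and y_ok:
            hits += 1
    return hits / n_replicates

# Hypothetical binary typological data with a perfectly clean signal.
toy = {
    "A": "11110000",
    "B": "11110000",
    "C": "00001111",
    "D": "00001111",
}
print(pair_support(toy, ("A", "B")))   # 1.0
print(pair_support(toy, ("A", "C")))   # 0.0
```

With real typological data the values would fall between these extremes, and low values indicate that the grouping depends on which characters happen to be sampled.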

Zooming in on the inner nodes

Bootstrap values of all inner nodes (value, followed by the taxa on one side of the split)
0.221: 1 3 6 7 8 9 13 15
0.274: 1 7 8 9
0.308: 1 3 6 7 8 9 13 14 15
0.362: 1 3 5 6 7 8 9 13 14 15
0.433: 1 3 7 8 9
0.506: 1 3 4 5 6 7 8 9 10 13 14 15
0.524: 1 2 3 4 5 7 8 9 10 11 12 14
0.596: 1 2 3 5 6 7 8 9 11 12 13 14 15
0.661: 1 2 3 4 5 6 8 10 11 12 13 14 15
0.673: 1 7 9
0.701: 1 3 4 5 6 7 8 9 10 12 13 14 15
0.939: 1 2 3 4 5 7 8 9 10 11 12 14 15
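A quick tally of the twelve values listed above shows how many inner nodes fall below a given support level. The 0.90 and 0.50 cut-offs and the helper name `count_below` are chosen here for illustration.

```python
# Bootstrap values of the twelve inner nodes, copied from the list above.
bootstrap_values = [
    0.221, 0.274, 0.308, 0.362, 0.433, 0.506,
    0.524, 0.596, 0.661, 0.673, 0.701, 0.939,
]

def count_below(values, threshold):
    """Number of nodes whose bootstrap support is below the threshold."""
    return sum(v < threshold for v in values)

total = len(bootstrap_values)
print(count_below(bootstrap_values, 0.90), "of", total)   # 11 of 12
print(count_below(bootstrap_values, 0.50), "of", total)   # 5 of 12
```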

What have Dunn et al. accomplished?
• They are the first to have published phylogenetic trees using typological data as input
• They have produced a nice dataset, including new data from fieldwork
BUT
• The comparison between an Austronesian tree based on the comparative method and one based on typological data is not carried out in a rigorous manner
• The algorithm used (Maximum Parsimony) is the worst one available
• The data are organised in binary variables, which is the worst possible way because the chance factor increases as the possible number of values of a feature decreases
• They argue that a fit between the proposed phylogeny and geographical patterns is in favor of the proposed phylogeny being real and not due to diffusion. But diffused items are precisely what is expected to pattern geographically. And actually the fit is poor.
• They ask a program to produce a tree. It obeys. But it also produces bootstrap values where 11 out of 12 inner nodes are below or way below 90%. This is a tree that doesn't want to be a tree. Yet they accept it at face value.
CONCLUSION (1)
• Nothing substantial has been accomplished, neither methodologically nor empirically
CONCLUSION (2)
• Don't believe everything you read in Science and, trust me, don't necessarily trust people who work at Max Planck institutes

Case study C: Let's dream on... Towards a subgrouping of proto-New World

Step 1: Make a selection of languages belonging to the West Coast, as defined by speakers being dependent on the Pacific for subsistence or navigating on it. Assumption: there could be a group within the New World family which is mostly confined to the Pacific Coast. The list: Haida, Squamish, Makah, Quileute, Coos (Hanis), Karok, Wappo, Maricopa, Huave, Quechua, Aymara, Epena Pedee, Awa Pit, Mapudungun, Qawasqar

Step 2: Find out whether there are traits among the American founder traits that are significantly better represented in this group of languages. Result: two traits: fusion of Agent and Patient markers; inflectional synthesis of the verb: 8-9 categories per word.
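One way to check whether a trait is over-represented in a subgroup is a one-sided hypergeometric (Fisher-type) test, sketched below. The slide does not say which significance test was used, so this is an assumed stand-in, and the counts in the example are invented for illustration only (apart from the 15 languages in the Step 1 list). It requires Python 3.8 or later for `math.comb`.

```python
from math import comb

def hypergeom_tail(k, trait_total, group_size, sample_total):
    """P(X >= k): chance that at least k of the group's languages have the
    trait if group membership were unrelated to the trait.

    trait_total  = languages with the trait in the whole sample
    group_size   = languages in the subgroup (e.g. the Pacific Coast list)
    sample_total = all languages in the sample
    """
    upper = min(group_size, trait_total)
    favourable = sum(
        comb(trait_total, i) * comb(sample_total - trait_total, group_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(sample_total, group_size)

# Hypothetical counts: 100 New World languages in the sample, 20 with the
# trait, 15 Pacific Coast languages, 8 of which have the trait.
p_value = hypergeom_tail(k=8, trait_total=20, group_size=15, sample_total=100)
print(p_value)
```

A small value suggests the trait is better represented in the subgroup than chance would predict; with several traits tested, a correction for multiple comparisons would also be needed.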

Step 3: Extend the set of Pacific languages to Pacific-Style languages by the criterion of sharing one of the two "significantly Pacific" features

Step 4: Reduce the set by removing languages that don't share at least 25% of all WALS features that have a significantly Pacific distribution
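Steps 3 and 4 amount to a two-stage filter, which might be sketched as follows. The sketch assumes each language is described by the set of "significantly Pacific" feature values it exhibits, and reads the 25% criterion as a share of that feature list. The feature labels, language names, and the function name are hypothetical.

```python
def pacific_style(languages, seed_features, pacific_features, min_share=0.25):
    """Select languages by the two-stage criterion of Steps 3 and 4.

    languages        maps a language name to the set of Pacific features it has
    seed_features    the two features found in Step 2
    pacific_features all features with a significantly Pacific distribution
    """
    selected = []
    for name, feats in languages.items():
        # Step 3: must share at least one of the seed features.
        if not feats & seed_features:
            continue
        # Step 4: must share at least min_share of all Pacific features.
        share = len(feats & pacific_features) / len(pacific_features)
        if share >= min_share:
            selected.append(name)
    return sorted(selected)

# Hypothetical feature labels and languages.
seed = {"s1", "s2"}
pacific = seed | {"p1", "p2", "p3", "p4", "p5", "p6"}
toy_languages = {
    "W": {"s1", "s2", "p1", "p2"},   # kept: has seed features, share 4/8
    "X": {"s1", "p1"},               # kept: share 2/8, exactly 25%
    "Y": {"s2"},                     # dropped in Step 4: share 1/8
    "Z": {"p1", "p2", "p3"},         # dropped in Step 3: no seed feature
}
print(pacific_style(toy_languages, seed, pacific))   # ['W', 'X']
```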

Step 5: Make a classification of Pacific-Style languages, using many WALS features (here 96 features)

Step 6: Fiddle a bit further, and interesting patterns emerge (in what follows, Haida is excluded)

Conclusion: Knowledge of ancestral states at the root of the tree can significantly improve subgrouping. Such "founder traits" also lend more credibility to a phylogeny. To be able to argue for new genealogical relations using typological data we need either (1) strongly supported roots, involving comparison with languages of the rest of the world, or (2) strong internal statistical support such as high bootstrap values. Preferably we should have both. There is light at the end of the tunnel.

Thanks. Keep in touch: wichmann@eva.mpg.de
