Historical inference from linguistic and genetic data
Potentially “…the best evidence of the derivation of … the human race” (Thomas Jefferson)
BUT
• Inferences are complex
  • methods and results from several disciplines
• Intellectual stakes are high
• Work has often been careless
  • sometimes spectacularly so
  • dangers of overinterpretation and “scientism”
General methodological problems
• Not all graphs are trees
  • “treeness” tests often left out
  • “treeness” hypothesis can often be rejected
• Tree inference may be underdetermined
  • Branching structure
  • Root choice
• Rates of change may not be constant
  • for different markers
  • across time
• Gene trees (and language trees) may not be population trees
• Biology and language are complicated
  • simplifying assumptions are sometimes perniciously mistaken
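The slide does not say which “treeness” test it has in mind. One standard candidate is the four-point condition: if distances are additive along a tree, then for every four items the two largest of the three pairwise sums are equal. The sketch below assumes that test; the function names and both toy distance tables are invented for illustration.

```python
from itertools import combinations

def symmetric(pairs):
    """Build a nested dict d[a][b] from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d

def four_point_gap(d, quartet):
    """Gap between the two largest pairwise sums; 0 for tree-additive distances."""
    i, j, k, l = quartet
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    return sums[2] - sums[1]

def max_four_point_gap(d):
    """Worst violation of the four-point condition over all quartets."""
    return max(four_point_gap(d, q) for q in combinations(sorted(d), 4))

# Hypothetical distances read off a tree ((A:1, B:2):1, (C:1, D:3)).
tree_like = symmetric({("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 5,
                       ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 4})
# Hypothetical ring of four populations, each in contact with two neighbours.
ring_like = symmetric({("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
                       ("A", "D"): 1, ("A", "C"): 2, ("B", "D"): 2})

print(max_four_point_gap(tree_like))  # 0: consistent with a tree
print(max_four_point_gap(ring_like))  # 2: no tree reproduces these distances
```

With real data the gap is never exactly zero, so a threshold or a significance test would be needed; the sketch only shows the quantity being examined.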
Trees vs. Clines (etc.)
• A tree structure represents the results of a sequence of splits in population (or language)
  • no further influences among separate branches
  • if rates of change are constant, distances should be quantized
• Within an interbreeding (intercommunicating) population, distances reflect the amount of gene flow (transmission of linguistic traits)
  • should correlate strongly with accessibility
  • e.g. geographical distance in the simplest case
The… procedures outlined here provide a rigorous method for inferring whether the geographical pattern of variation is consistent with an historical split (fragmentation) or no split (recurrent gene flow) using criteria that are completely explicit. For example, in analyzing the mtDNA of tiger salamanders, a clear split into eastern and western lineages was detected for mtDNA. Using the same explicit criteria, there was no split among any human populations. Quite the contrary, the present analysis documents recurrent and continual genetic interchange among all Old World human populations throughout the entire time period marked by mtDNA. Accordingly, estimating a date for a 'split' of Africans from non-Africans based on evidence from mtDNA is certainly allowed by many computer programs, but the results are meaningless because a date is being assigned to an 'event' that never occurred.
Templeton (1997)
Methods for tree inference (“phylogeny”)
• Two general approaches
  • clustering (easier but cruder)
  • generate and evaluate alternative trees
• Distance-based methods
  • based on matrix of distances/similarities
• Parsimony
  • based on set of partly-shared characters or traits
http://evolution.genetics.washington.edu/phylip/software.html documents 193 different phylogeny packages
Cognate percentages for 8 Vanuatu languages
Toga
64  Mosina
64  58  Peterara
57  51  65  Nduindui
29  28  34  32  Sakao
51  45  55  52  40  Malo
39  39  45  41  43  50  Fortsenal
52  48  57  60  31  48  45  Raga
Data from Guy (1994)
Reconstruction Algorithm (Guy 1994)
“A message is input at the root of a tree-shaped transmission network, whence it is transmitted to the terminal nodes. As they travel, copies of the original message are affected by errors consisting in randomly selected segments of the message being replaced by other segments randomly drawn from a pool of possible segments (the "alphabet" of the message). The problem is: from the garbled versions of the original message collected at the terminal nodes, reconstruct the network and the history of the transmission of the message.”
“Additive-distance” tree with weights on branches rather than on nodes -- doesn’t assume constant rate of change…
Explanatory force of the model
• Set of distances grows as n(n-1)/2 for n languages
• Set of binary-tree branch labels grows as 2n-2
• For 8 languages: we predict 28 numbers (the inter-language cognate proportions) with 14 numbers (the binary tree branch proportions)
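The counts on this slide can be checked with a few lines. The closed forms below are an assumption inferred from the slide's own figures (28 distances and 14 branch labels for 8 languages): n(n-1)/2 unordered pairs, and 2n-2 branches in a rooted binary tree with n leaves. The function names are hypothetical.

```python
def n_pairwise_distances(n):
    """Number of unordered language pairs among n languages."""
    return n * (n - 1) // 2

def n_branches_rooted_binary(n):
    """Number of branches in a rooted binary tree with n leaves (assumed form)."""
    return 2 * n - 2

for n in (4, 8, 16, 32):
    print(n, n_pairwise_distances(n), n_branches_rooted_binary(n))
```

The gap widens quickly: the number of quantities to be predicted grows quadratically while the number of fitted branch values grows only linearly.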
Inferred tree
Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
Mosina/Toga: .77*.83 = .6391 (really 64%)
Peterara/Mosina: .829*.919*.77 = .5866 (really 58%)
Peterara/Toga: .829*.919*.830 = .6323 (really 64%)
from Guy (1994)
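The three worked products above follow one rule: the predicted cognate proportion for a pair is the product of the branch values on the path joining them. A minimal sketch of that rule follows, with the branch values transcribed from the tree as drawn (830 read as .830, and so on). The internal node names `n1` to `n6` and `top` are invented labels, and `top` is only where this drawing happens to join its two halves.

```python
# child -> (parent, branch value on the way up), transcribed from the drawing.
PARENT = {
    "Toga": ("n1", 0.830), "Mosina": ("n1", 0.770),
    "n1": ("n2", 0.919), "Peterara": ("n2", 0.829),
    "n2": ("n3", 0.972),
    "Nduindui": ("n4", 0.795), "Raga": ("n4", 0.755),
    "n4": ("n3", 0.949),
    "n3": ("top", 0.947),
    "Sakao": ("n5", 0.567), "Fortsenal": ("n5", 0.759),
    "n5": ("n6", 0.883), "Malo": ("n6", 0.772),
    "n6": ("top", 0.895),
}

def products_to_ancestors(node):
    """Map each ancestor of node to the product of branch values leading to it."""
    out = {node: 1.0}
    product = 1.0
    while node in PARENT:
        node, value = PARENT[node]
        product *= value
        out[node] = product
    return out

def predicted(a, b):
    """Predicted cognate proportion: product along the path between a and b."""
    up_a, up_b = products_to_ancestors(a), products_to_ancestors(b)
    for ancestor in up_a:          # dicts keep insertion order: nearest first
        if ancestor in up_b:       # first shared ancestor closes the path
            return up_a[ancestor] * up_b[ancestor]
    raise ValueError("nodes are not in the same tree")

for pair in [("Mosina", "Toga"), ("Peterara", "Mosina"), ("Peterara", "Toga")]:
    print(pair, round(predicted(*pair), 4))
```

Running the same function over all 28 pairs gives the full set of predictions that the model is judged on.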
True minus predicted cognate percentages
Toga
 0  Mosina
 1  -1  Peterara
 1  -1   4  Nduindui
-2  -1   0   0  Sakao
 2   0   2   3   1  Malo
-3   0  -1  -2   0  -2  Fortsenal
-1  -1  -1   0   1   1   4  Raga
The model fits very well!
Where’s the root? Isn’t it obvious?
Toga      -830-----:-919-----:-972-----:-947-----:--Protolanguage
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
Oops: other options
protolanguage
Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
And some more…
protolanguage
Toga     -830-:-919-:-972-:-947-:-895-:-883-:-567- Sakao
Mosina   -770-'     |     |           |     `-759- Fortsenal
Peterara -----829---'     |           `---772----- Malo
Nduindui -----795---:-949-'
Raga     -----755---'
In the absence of other constraints, the root can be placed anywhere in the tree without changing the model’s fit!
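The root-placement claim can be checked numerically. In the sketch below the tree is stored as undirected branches, a root node is spliced into each branch in turn, and the pairwise path products are compared with the unrooted ones. Only a corner of the tree is used; `rest` is a hypothetical stand-in for everything beyond the .972 branch, and the 30/70 split of the rooted branch is arbitrary.

```python
from itertools import combinations

# Undirected branches (node, node, value) for one corner of the tree.
EDGES = [
    ("Toga", "n1", 0.830), ("Mosina", "n1", 0.770),
    ("n1", "n2", 0.919), ("Peterara", "n2", 0.829),
    ("n2", "rest", 0.972),
]
LEAVES = ["Toga", "Mosina", "Peterara", "rest"]

def path_product(edges, a, b):
    """Product of branch values along the unique path from a to b."""
    adjacent = {}
    for u, v, w in edges:
        adjacent.setdefault(u, []).append((v, w))
        adjacent.setdefault(v, []).append((u, w))
    stack = [(a, None, 1.0)]
    while stack:
        node, came_from, product = stack.pop()
        if node == b:
            return product
        for neighbour, w in adjacent[node]:
            if neighbour != came_from:
                stack.append((neighbour, node, product * w))
    raise ValueError("no path between nodes")

def with_root_on(edges, index, share=0.3):
    """Splice a root into branch number index, splitting its value in two."""
    u, v, w = edges[index]
    remaining = edges[:index] + edges[index + 1:]
    return remaining + [(u, "root", w ** share), ("root", v, w ** (1 - share))]

def predictions(edges):
    return {pair: path_product(edges, *pair) for pair in combinations(LEAVES, 2)}

unrooted = predictions(EDGES)
for i in range(len(EDGES)):
    rooted = predictions(with_root_on(EDGES, i))
    worst = max(abs(rooted[p] - unrooted[p]) for p in unrooted)
    print("root on branch", i, "largest change in any prediction:", worst < 1e-12)
```

Every placement leaves the predictions unchanged, because a pair's prediction depends only on the product along the path between the two leaves, and splitting one branch in two does not change that product.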
Possible “other constraints”
• Historical evidence
  • about earlier forms
  • about structure of relationships among contemporary forms
  • “outgroup”
• Constraints on rate of change
  • linguistic (or genetic) “clock”
A universal constant for glottochronology?
Thirteen sets of data, presented in partial justification of these assumptions, serve as a basis for calculating a universal constant to express the average rate of retention k of the basic-root morphemes: k = 0.8048 ± 0.0176 per millennium, with a confidence limit of 90%.
Lees (1953)
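To see how such a constant would be used as a “clock”: the slide does not give the dating formula, but the standard glottochronological one is t = ln(c) / (2 ln(k)), where c is the cognate proportion shared by two languages and the factor 2 reflects both lines changing independently since the split. The sketch below assumes that formula; the function name is hypothetical.

```python
import math

LEES_RETENTION = 0.8048  # retention per millennium, as quoted from Lees (1953)

def divergence_time_millennia(shared_fraction, retention=LEES_RETENTION):
    """Time since separation, assuming a constant retention rate on both lines."""
    return math.log(shared_fraction) / (2 * math.log(retention))

# Two languages sharing 64% cognates would be dated to about a millennium ago.
print(round(divergence_time_millennia(0.64), 2))  # 1.03
```

The result is only as good as the constant-rate assumption, which is what the surrounding slides call into question.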
Some more retentive languages (rates per 1000 years)
Bergsland & Vogt (1962)
Some less retentive ones
Bergsland & Vogt estimate of vocabulary retention in East Greenlandic as .722 in 600 years, or .34 per millennium. David Lithgow (pers. comm. circa 1970) has observed a replacement of some 20% of the basic vocabulary in Muyuw (Woodlark Island) in one generation. Raise 0.8 to the 33rd power, and that gives you the retention rate of Muyuw per 1000 years should it continue to evolve at that rate: 0.06%.
Jacques Guy (1994)
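The Muyuw figure is plain compounding: 80% retained per generation, over 33 generations, which corresponds to roughly 30 years per generation across a millennium. A short check follows; the function name is hypothetical.

```python
def compound_retention(rate_per_period, periods):
    """Fraction retained after a number of periods at a constant per-period rate."""
    return rate_per_period ** periods

muyuw = compound_retention(0.8, 33)
print(f"{muyuw:.2%}")  # 0.06%
```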
“Language chains”
A
.77  B
.65  .76  C
Configurations like this are taken as prima facie evidence of “non-treeness”, to be attributed to borrowing/mixing/cline types of situations. But in fact they can also easily be generated by variable rates of change:
A ----------- 90% -----------.
                             |____ protolanguage
B ---- 95% ----.             |
               |---- 90% ----'
C ---- 80% ----'
Note that the required difference in mean rate of change is only (.9-.9*.8)/.9 = .2, or 20%
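The chain table can be reproduced from the tree by multiplying the retention values along each path. The dictionary keys below are hypothetical labels for the four branches in the drawing.

```python
# Retention on each branch of the drawn tree; "BC" is the stem shared by B and C.
retention = {"A": 0.90, "B": 0.95, "C": 0.80, "BC": 0.90}

a_b = retention["A"] * retention["BC"] * retention["B"]
a_c = retention["A"] * retention["BC"] * retention["C"]
b_c = retention["B"] * retention["C"]

print(round(a_b, 2), round(a_c, 2), round(b_c, 2))  # 0.77 0.65 0.76
```

A strict tree with unequal branch rates therefore produces the chain-like pattern A-B-C without any borrowing or mixing.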
Three fascinating “results”
• Mitochondrial Eve
• Mitochondrial Clans
• The three-wave theory: converging linguistic and genetic evidence
Mitochondrial Eve
Cann, Stoneking, and Wilson (1987): mtDNA comparisons of 147 people from Europe, Africa, Asia, Australia, and New Guinea show that all present human mtDNA is descended from a single African woman who lived about 200,000 years ago.
First problem
• Computer program was used to find a tree consistent with the mtDNA data
• But so were many other (unreported) trees!
  • order of answers depended on order of data
  • root could be effectively anywhere in the dataset
  • e.g. Melanesian Eve, Asian Eve, European Eve…
Other problems
• mtDNA may not change at a constant rate
• mtDNA changes may be adaptive
• Gene trees may not be population trees
  • DNA (including mtDNA) can spread by gradual flow or by range expansion
  • spread can be influenced by other factors
Early results: Native Americans come from four genetic lineages, labeled A through D. Amerinds have all four lineages, NaDene only A, and Eskaleuts A and D.
Current results: The four mtDNA lineages divide into nine distinct genetic subtypes. All four lineages are in all three language groups. Many local populations have all four lineages and a number even have all the subtypes. All subtypes can be found in North, Central and South America.
“It isn't realistic to believe that the same lineages ended up in all these populations across two continents by separate migrations.”
http://www.oxfordancestors.com/:
Oxford Ancestors
We put the Genes in Genealogy
Oxford Ancestors is the World's first organization to harness the power and precision of modern DNA-based genetics in the service of genealogy. MatriLine™ interprets your deep maternal ancestry, linking you - if your roots are in Europe - to one of seven women: Ursula, Tara, Helena, Katrine, Velda, Xenia or Jasmine.
And mtDNA inheritance may not even be entirely clonal!
• Mice
  • demonstration of “paternal leakage”
• Hagelberg
  • rare mtDNA mutation in Vanuatu
• Eyre-Walker
  • statistics of mtDNA “homoplasies”
Island evidence
• Erika Hagelberg (Proc. R. Soc. 1999)
• Island of Nguna (Vanuatu, Melanesia)
• 3 main mtDNA population groups
  • as expected for the region
• In all three groups, the same mutation is sometimes found
  • previously known only from one Northern European
• Repeated chance mutation is unlikely
  • local spread by recombination seems more probable
Statistics of mtDNA “homoplasies”
• Mutations that occur in different mtDNA haplogroups around the world
• Assuming purely maternal inheritance, these were thought to represent chance recurrence of mutations in “hypervariable” regions
• Eyre-Walker et al. (Proc. R. Soc. 1999):
  • regions are not statistically more variable than others
  • mutations cluster geographically
• Macaulay (1999) counters
  • much of the result comes from a dataset that may contain errors
  • “no need to panic”
Reaction of another mtDNA aficionado:
…I am reminded of a comment by a bishop’s wife in Victorian England, also concerning human origins: “Let us hope that it isn’t true, and if it is, that it will not become generally known.”