Constructing Linguistic Phylogenetic Tree (语言谱系树的构建) • Chenxi Shao (邵晨曦) • Zihe Li (李子鹤)
Tree representation of evolution • Darwin 1859 On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. • Tree diagram • Shared, derived characteristics
Phylogenetic tree • Zuckerkandl, E. and L. Pauling 1965 Molecules as documents of evolutionary history. J. Theor. Biol. 8: 357-366. • Comparison of DNA sequences
Quantitative representation • Meyers, L. F. and William S-Y. Wang 1963 Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University.
How to construct a linguistic phylogenetic tree? • Wang, William S.-Y. and Zhongwei Shen 1992, four steps: • 1. Selection of characters • 2. Quantization of characters • 3. Calculation of correlation coefficients • 4. Clustering analysis • Steps 1-2: selection and encoding of linguistic information • Steps 3-4: mathematical algorithm
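The four steps can be sketched roughly as follows. This is a minimal illustration, not the procedure of Wang and Shen 1992: the character names, the language labels and all values are made up, Pearson correlation is assumed as the correlation coefficient, and `1 - r` is assumed as the conversion from similarity to distance.

```python
from math import sqrt

# Step 1: selected characters (hypothetical names, for illustration only).
CHARACTERS = ["feature_a", "feature_b", "feature_c", "feature_d"]

# Step 2: quantized characters, one numeric vector per language (made-up values).
ENCODED = {
    "L1": [1, 0, 1, 1],
    "L2": [1, 0, 1, 0],
    "L3": [0, 1, 0, 0],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant vector has no defined correlation")
    return cov / (sd_x * sd_y)

# Step 3: correlation coefficient for every pair of languages.
corr = {(a, b): pearson(ENCODED[a], ENCODED[b])
        for a in ENCODED for b in ENCODED}

# Input for step 4: turn similarity into distance (assumed conversion).
dist = {pair: 1 - r for pair, r in corr.items()}
```

In this toy data L1 correlates more strongly with L2 than with L3, so a clustering routine fed with `dist` would group L1 and L2 first.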
Mathematical algorithm • 1. Maximum parsimony • Example: subgrouping of Bai dialects (Wang 2006) • 2. Neighbor-joining • Example: subgrouping of Yi dialects (Wang Feng 汪锋 2010)
3. Average linkage (UPGMA) • Example: affinity among Chinese dialects (Cheng, C.C. 郑锦全 1988) • 4. Minimum spanning tree (Wrocław taxonomy) • Example: affinity among Chinese dialects (Ma Xiwen 马希文 1989)
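As an illustration of the third algorithm, here is a minimal pure-Python sketch of average-linkage (UPGMA) clustering. The four labels and the distance matrix are invented toy data, not the figures from Cheng 1988, and the nested-tuple output format is just one convenient choice.

```python
def upgma(names, matrix):
    """Average-linkage clustering.

    matrix[i][j] is the distance between names[i] and names[j].
    Returns (tree, heights): a nested-tuple tree and the merge heights.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): matrix[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        # Pick the closest pair (ties broken by cluster id)
        (a, b), d_ab = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        heights.append(d_ab / 2)
        next_id += 1
    (tree, _), = clusters.values()
    return tree, heights

# Toy data: A and B are closest, C joins next, D last.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, heights = upgma(names, matrix)
print(tree)     # ('D', ('C', ('A', 'B')))
print(heights)  # [1.0, 3.0, 5.0]
```

UPGMA assumes a constant rate of change along all branches, which is why it always returns a rooted tree with leaves at equal depth.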
Selection and encoding of linguistic information • 1. Unique, shared innovation characters • Classical version: the position of Armenian (Hübschmann 1875)
Modern version: subgrouping of Bai dialects (Wang 2006) • 19 innovation characters derived from the reconstruction. • Code each innovation as 1 and each retention as 0
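The 1/0 coding can be sketched as below. The dialect names and innovation labels are placeholders, not the 19 characters of Wang 2006. The `to_phylip` helper writes the matrix in the plain PHYLIP discrete-character layout as commonly described (a count line, then a 10-character name followed by the states); treat that layout as an assumption and check it against the PHYLIP documentation before use.

```python
# Hypothetical innovation characters and dialects, for illustration only.
CHARACTERS = ["innov_1", "innov_2", "innov_3"]

INNOVATIONS = {
    "DialectA": {"innov_1", "innov_2"},
    "DialectB": {"innov_1"},
    "DialectC": {"innov_3"},
}

def encode(innovations_by_dialect, characters):
    """Code each innovation as 1 and each retention as 0."""
    return {
        dialect: [1 if ch in innovations else 0 for ch in characters]
        for dialect, innovations in innovations_by_dialect.items()
    }

def to_phylip(matrix):
    """Format a 0/1 matrix in a PHYLIP-style discrete-character layout (assumed)."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix):5d}{n_chars:5d}"]
    for name, row in matrix.items():
        # Names are truncated or padded to exactly 10 characters
        lines.append(f"{name[:10]:<10}" + "".join(str(v) for v in row))
    return "\n".join(lines)

encoded = encode(INNOVATIONS, CHARACTERS)
print(to_phylip(encoded))
```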
Penny in PHYLIP • Rooted tree
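Penny searches for the most parsimonious trees by branch and bound. The sketch below does not reproduce that search; it only shows how a single candidate tree could be scored on 0/1 characters, using Fitch's small-parsimony count as a stand-in for the parsimony criterion. The leaf names and character states are invented.

```python
def fitch_score(tree, states):
    """Minimum number of state changes needed on a given binary tree.

    tree:   nested 2-tuples whose leaves are names, e.g. (("A", "B"), "C")
    states: leaf name -> list of 0/1 character states
    """
    n_chars = len(next(iter(states.values())))
    total = 0
    for k in range(n_chars):
        def visit(node):
            nonlocal total
            if isinstance(node, str):          # leaf: its observed state
                return {states[node][k]}
            left, right = visit(node[0]), visit(node[1])
            common = left & right
            if common:                         # children agree: no change
                return common
            total += 1                         # children disagree: one change
            return left | right
        visit(tree)
    return total

# Toy data: A and B share one set of innovations, C and D another.
states = {
    "A": [1, 1, 0],
    "B": [1, 1, 0],
    "C": [0, 0, 1],
    "D": [0, 0, 1],
}
good = fitch_score((("A", "B"), ("C", "D")), states)
bad = fitch_score((("A", "C"), ("B", "D")), states)
print(good, bad)  # 3 6
```

The tree that groups the dialects sharing innovations needs fewer changes, which is the sense in which it is more parsimonious.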
Problems: • In many cases, deciding what counts as an “innovation” depends on the values already assigned to the reconstructed historical phonemes, so the reasoning runs backwards. • For many sound changes, there is no consensus on how universal they are.
2. Retention rate of Swadesh-100 words (Wang, William S-Y. 1993) • Strict correspondence among languages • For each pair of languages, calculate the proportion of Swadesh-100 words that satisfy the strict correspondence rules • Construct a matrix of affinity/distance between each pair of languages
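A minimal sketch of the matrix construction, under two assumptions that are not in the slides: each word is given a class label per language, with two languages counted as satisfying strict correspondence on an item when the labels match, and distance is taken as one minus the retained proportion. The example uses 5 items instead of 100, and all labels are invented.

```python
from itertools import combinations

def distance_matrix(classes, n_items):
    """classes: language -> list of correspondence-class labels (None = no data).

    Returns a nested dict dist[a][b] = 1 - (share of items in strict
    correspondence between a and b).
    """
    langs = sorted(classes)
    dist = {lang: {lang: 0.0} for lang in langs}
    for a, b in combinations(langs, 2):
        shared = sum(
            1 for i in range(n_items)
            if classes[a][i] is not None and classes[a][i] == classes[b][i]
        )
        dist[a][b] = dist[b][a] = 1 - shared / n_items
    return dist

# Toy data: 5 items instead of the full Swadesh-100 list.
classes = {
    "X": [1, 1, 1, 1, 1],
    "Y": [1, 1, 1, 2, 2],
    "Z": [1, 3, 3, 3, None],
}
dist = distance_matrix(classes, n_items=5)
```

Here X and Y share three of five items and are therefore closer to each other than either is to Z.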
Example: subgrouping of Austro-Yue Languages (Chen and He 2002)
Neighbor in PHYLIP • Unrooted tree
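For reference, the neighbor-joining procedure of Saitou and Nei 1987 can be sketched in a few lines of pure Python. This is an illustrative implementation, not the code of the Neighbor program; the taxa, the distances and the internal node names (`node0`, `node1`, ...) are made up.

```python
def neighbor_joining(names, matrix):
    """Return the edges (node, node, length) of an unrooted NJ tree."""
    nodes = list(names)
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    edges, counter = [], 0
    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all others
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the new internal node to a and b
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(u, a, len_a), (u, b, len_b)]
        # Distances from the new node to every remaining node
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

# Toy additive distances: A and B hang off one internal node, C and D off another.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
edges = neighbor_joining(names, matrix)
lengths = {(u, v): w for u, v, w in edges}
```

Unlike UPGMA, neighbor-joining does not assume a uniform rate of change, and the result carries no root, which matches the unrooted tree shown on the slide.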
Problems: • The Swadesh-100 list is much debated: • its universality • the choice of words • The method assumes a uniform rate of change
Inspiration from biological studies • Phylogenetic classifications of different organisms use different gene segments: • Prokaryotes: 16S rDNA (Wang Hongyuan 王洪媛, Jiang Xiaolu 江晓路 et al. 2004) • Some mammals: 18S rDNA (Liu Chenggang 刘诚刚, Du Zhiheng 杜志恒 et al. 2012) • Some fishes: CRY61 (Sun Ting 孙婷, Liu Wei 刘伟 et al. 2012)
Difference in homonymic relationships • Scope of examination: morphemes that can be securely reconstructed for the proto-language. • Compare the differences in homonymic relationships among those morphemes between each pair of languages.
Legend: L = language; M = morpheme; innov = innovation; Rel = relationship; D = difference in homonymic relationships among morphemes between each pair of languages
The difference in homonymic relationships among morphemes between each pair of languages gives the distance between the two languages • Distance matrix • Principle: difference in homonymic relationships measures unshared innovation.
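One possible reading of the D value is sketched below. The exact formula is not recoverable from the slide text, so this is an assumption: for every pair of reconstructed morphemes, each language is checked for whether the two morphemes are homophonous, and D counts the morpheme pairs on which two languages disagree. The morpheme labels and forms are invented.

```python
from itertools import combinations

def homonymy_relations(forms):
    """forms: morpheme -> form in one language.

    Returns (morpheme, morpheme) -> True if the two are homophonous.
    """
    return {
        (m1, m2): forms[m1] == forms[m2]
        for m1, m2 in combinations(sorted(forms), 2)
    }

def d_value(forms_a, forms_b):
    """Number of morpheme pairs whose homonymy status differs (assumed definition)."""
    shared = set(forms_a) & set(forms_b)
    rel_a = homonymy_relations({m: forms_a[m] for m in shared})
    rel_b = homonymy_relations({m: forms_b[m] for m in shared})
    return sum(rel_a[pair] != rel_b[pair] for pair in rel_a)

# Toy data: L1 and L3 merge M1 and M2; L2 merges M3 and M4 instead.
L1 = {"M1": "pa", "M2": "pa", "M3": "ta", "M4": "ka"}
L2 = {"M1": "ba", "M2": "pa", "M3": "ta", "M4": "ta"}
L3 = {"M1": "fa", "M2": "fa", "M3": "da", "M4": "ga"}

print(d_value(L1, L3))  # 0
print(d_value(L1, L2))  # 2
```

L1 and L3 have different sounds but the same pattern of mergers, so their D value is zero. Only the pattern of homonymy matters, not the specific sound values.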
Merits: • 1. No need for a universal list of core words. • Morphemes that can be reconstructed are basic morphemes (words) in the languages in question. • 2. Structural changes are generalized into differences in homonymic relationships among basic morphemes. • The specific values of sounds are left out of consideration. • 3. The weights of different sound changes are taken into consideration. • Changes involving more morphemes are more important.
Demerit • Borrowings that show regular correspondences among the languages in question cannot be excluded
Phylogenetic trees of Naxi • Characters/Parsimony
How to choose? • The character approach is the best choice when you have confidence in the specific value of each historical phoneme (Indo-European languages) • The Swadesh-100 words approach is the best choice when internal contact can be identified in the group of languages in question (Chinese dialects) • The D-value approach is the best choice when there is no evidence of internal contact (Naxi and many other minority languages)
Main references • Hsieh, H-I. 1973 A new method of dialectal subgrouping. Journal of Chinese Linguistics 1. 64-92 • Hübschmann 1875 “On the position of Armenian in the Sphere of the Indo-European Languages”. In Lehmann ed. A Reader in Nineteenth Century Historical Indo-European Linguistics. Bloomington: Indiana University Press, 1976. • Krishnamurti et al. 1983 Unchanged cognates as a criterion in linguistic subgrouping. Language 59. 544-688 • Meyers, L. F. and William S-Y. Wang 1963 Tree representations in Linguistics. Project on Linguistics Analysis Report 3, Ohio State University. • Saitou, N. and M. Nei 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4. 406-425
Wang, Feng. 2006. Comparison of Languages in Contact: the Distillation Method and the Case of Bai. Taipei: Academia Sinica. • Wang, William S-Y. 1993 Glottochronology, lexicostatistics, and other numerical methods. Collected in 《王士元语言学论文集》 (Collected Linguistics Papers of William S-Y. Wang) • Lu Zhiji (陆致极) 1986 Computer cluster analysis of the degree of internal difference and the subgrouping of Min dialects. Yuyan Yanjiu (语言研究) No. 2 • Ma Xiwen (马希文) 1989 Quantitative methods in comparative dialectology. Zhongguo Yuwen (中国语文) No. 5 • Wang Feng (汪锋) 2010 A study of Bai-Yi related morphemes (白彝关系语素研究). Final report, National Social Science Fund of China project • Wang, William S-Y. (王士元) and Zhongwei Shen (沈钟伟) 1992 A quantitative representation of dialect relationships. Zhongguo Yuwen (中国语文) No. 2 • Cheng, C.C. (郑锦全) 1988 A quantitative study of affinity among Chinese dialects. Zhongguo Yuwen (中国语文) No. 2