Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster

Overview
Project: ASJP (Automated Similarity Judgment Program)
ASJP are:
• Sören Wichmann (Germany; Netherlands)
• Viveka Velupillai (Germany)
• André Müller (Germany)
• Robert Mailhammer (Germany)
• Hagen Jung (Germany)
• Eric Holman (US)
• Anthony Grant (UK)
• Dmitry Egorov (Russia)
• Pamela Brown (US)
• Cecil Brown (US)
• Dik Bakker (UK; Netherlands)
ASJP: Automatic Reconstruction

Overview
Project: ASJP (Automated Similarity Judgment Program)
Overall goal: Automatic reconstruction of language relationships
Basis: Distance matrix between individual languages on the basis of linguistic features
Method: Lexicostatistics: mass comparison of lexical items

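To make the "Basis" and "Method" points concrete, here is a minimal sketch of building a distance matrix from word lists. It rests on assumptions that are not in the slide: the distance measure is not specified at this point, so a length-normalized Levenshtein distance averaged over shared meanings stands in for it. The language names, word forms, and function names (`levenshtein`, `language_distance`, `distance_matrix`) are invented for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def language_distance(words_a: dict, words_b: dict) -> float:
    """Mean normalized edit distance over meanings present in both lists.

    Assumes every word form is a non-empty string.
    """
    shared = [m for m in words_a if m in words_b]
    if not shared:
        raise ValueError("no shared meanings to compare")
    total = 0.0
    for meaning in shared:
        a, b = words_a[meaning], words_b[meaning]
        total += levenshtein(a, b) / max(len(a), len(b))
    return total / len(shared)


def distance_matrix(lexicons: dict) -> dict:
    """Symmetric matrix stored as a dict keyed by (language, language)."""
    names = sorted(lexicons)
    matrix = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = language_distance(lexicons[x], lexicons[y])
        matrix[(x, y)] = matrix[(y, x)] = d
    return matrix


# Invented toy word lists; they are not ASJP data.
TOY = {
    "LangA": {"hand": "hand", "water": "vater", "stone": "sten"},
    "LangB": {"hand": "hant", "water": "vaser", "stone": "stain"},
    "LangC": {"hand": "mano", "water": "akwa", "stone": "pyedra"},
}
MATRIX = distance_matrix(TOY)
```

Meanings missing from either list are simply skipped, which is one possible way to cope with incomplete word lists; a tree-building or clustering step would take `MATRIX` as its input.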
Overview
MAIN GOAL: Reconstruction of Language Relationships
Derived goals (among others):
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Estimate time depths between languages / genera / families
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008; ‘elbow’ phenomenon)
- Experimentally find the best/optimal dating method
- Detect borrowings
Today ...

Overview
1. The basic list of lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On Inheritance vs Borrowing
5. Conclusions

1. The basic list of lexical items

Lexical items
Word list: Swadesh 100 basic meanings
- Word coined in most languages
- Collected in field work lexicon / grammar
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms

Lexical items: further reduction
Early analyses have shown:
- Optimal 40/100 item subset gives same results
• Less work
• Less missing data
• Faster processing; combinatorial explosion: 40 : 100 ~ 3 × 10^7 : 2 × 10^10

Lexical items: stability
Most stable items:
Iteratively throw out the most unstable item in terms of variation within genera (3500-4000 years; Dryer 2001; 2005)
E.g. Germanic, Romance, Slavic, …
Formula: S = (E - U)/(100 - U) (weighted average % matches Eq vs Uneq)

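The stability formula and the iterative elimination can be sketched as follows. This is only an illustration under stated assumptions: E and U are taken to be percentages (0 to 100) of matches for the "Eq" and "Uneq" pairs named on the slide, and the item names, the (E, U) values, and the helper `prune_to_most_stable` are hypothetical. With this formula, S is 1 when E = 100 and 0 when E = U.

```python
def stability(e: float, u: float) -> float:
    """S = (E - U) / (100 - U), with E and U given as percentages."""
    if u >= 100:
        raise ValueError("U must be below 100")
    return (e - u) / (100 - u)


def prune_to_most_stable(items, score, keep=40):
    """Repeatedly drop the lowest-scoring item until `keep` items remain.

    `score(item, remaining)` is re-evaluated in every round, so a score
    that depends on the current item set is allowed (an assumption about
    how the iteration might work, not a statement of the ASJP procedure).
    """
    remaining = list(items)
    while len(remaining) > keep:
        worst = min(remaining, key=lambda it: score(it, remaining))
        remaining.remove(worst)
    return remaining


# Hypothetical (E, U) percentages per item; not real measurements.
TOY_SCORES = {
    "item_a": (60.0, 20.0),
    "item_b": (50.0, 10.0),
    "item_c": (30.0, 20.0),
    "item_d": (25.0, 15.0),
}
KEPT = prune_to_most_stable(
    TOY_SCORES,
    lambda item, _remaining: stability(*TOY_SCORES[item]),
    keep=2,
)
```

For example, `stability(60.0, 20.0)` evaluates to (60 - 20)/(100 - 20) = 0.5, and with the toy values above the two items left in `KEPT` are `item_a` and `item_b`.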
[Figure. Labels: Ethnologue (Goodman-Kruskal); WALS (Pearson). Axis: Stability, from ++ to --]

40 Most Stable

Homophones

Lexical items: transcription
First phase of project (2007):
Problems with full IPA representation of words:
- data entry via keyboard
- simple programming language (Fortran; Pascal)

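One way to see why full IPA is awkward for keyboard entry and for simple string handling is to look at the code points behind a short transcription. The snippet below is a hedged illustration only: the word is an invented form (an aspirated t followed by a nasalized a), not an item from the ASJP data, and `describe` is a hypothetical helper.

```python
import unicodedata

# Invented transcription: "t" + aspiration mark, "a" + combining tilde.
WORD = "t\u02b0a\u0303"


def describe(text: str):
    """List each code point of `text` with its Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


CODE_POINTS = describe(WORD)   # four code points for two perceived segments
IS_ASCII = WORD.isascii()      # False: cannot be typed or stored as plain ASCII
```

Two perceived sound segments occupy four code points here, two of which are not on a standard keyboard, so a program with only fixed-width ASCII characters would need some substitute encoding.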