Advances in Automated Language Classification. ASJP Consortium. Dik Bakker, Lancaster

Presentation Transcript


  1. Advances in Automated Language Classification. ASJP Consortium. Dik Bakker, Lancaster

  2. Overview
       - Project: ASJP (Automated Similarity Judgment Program)
       - Slide footer (repeated on every slide): ASJP: Automatic Reconstruction

  3. Overview
     Project: ASJP are:
       - Sören Wichmann (Germany; Netherlands)
       - Viveka Velupillai (Germany)
       - André Müller (Germany)
       - Robert Mailhammer (Germany)
       - Hagen Jung (Germany)
       - Eric Holman (US)
       - Anthony Grant (UK)
       - Dmitry Egorov (Russia)
       - Pamela Brown (US)
       - Cecil Brown (US)
       - Dik Bakker (UK; Netherlands)

  7. Overview
       - Project: ASJP (Automated Similarity Judgment Program)
       - Overall goal: automatic reconstruction of language relationships
       - Basis: distance matrix between individual languages on the basis of linguistic features
       - Method: lexicostatistics (mass comparison of lexical items)
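Slide 7 gives the pipeline only in outline: compare lexical items across languages and collect the results in a pairwise distance matrix. The Python sketch below illustrates that idea under stated assumptions. It uses normalized Levenshtein (edit) distance averaged over shared meanings as the comparison measure, which this part of the transcript does not specify. The function names and the three-meaning orthographic word lists are illustrative only; they are not ASJP code or ASJP data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def language_distance(words_a: dict, words_b: dict) -> float:
    """Mean length-normalized edit distance over meanings both lists attest.

    Assumption: every attested word is a non-empty string.
    """
    shared = [m for m in words_a if m in words_b]
    if not shared:
        raise ValueError("no shared meanings to compare")
    total = sum(
        levenshtein(words_a[m], words_b[m]) / max(len(words_a[m]), len(words_b[m]))
        for m in shared
    )
    return total / len(shared)


def distance_matrix(wordlists: dict) -> dict:
    """Symmetric language-by-language matrix with a zero diagonal."""
    names = sorted(wordlists)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = language_distance(wordlists[a], wordlists[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Toy input: plain orthographic forms for three meanings (illustration only).
TOY_WORDLISTS = {
    "English": {"hand": "hand", "water": "water", "night": "night"},
    "German": {"hand": "hand", "water": "wasser", "night": "nacht"},
    "Spanish": {"hand": "mano", "water": "agua", "night": "noche"},
}

if __name__ == "__main__":
    for lang, row in distance_matrix(TOY_WORDLISTS).items():
        print(lang, {other: round(d, 3) for other, d in row.items()})
```

On these toy lists English comes out closer to German than to Spanish, which is the kind of signal a tree-building step would then work from.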

  16. Overview
      MAIN GOAL: Reconstruction of language relationships
      Derived goals (among others):
       - Critical assessment and refinement of existing classifications
       - Classify newly described and unclassified languages
       - Estimate time depths between languages / genera / families
       - Search for (ir)regularities in phylogenies
       - Test hypotheses (e.g. Atkinson et al. 2008; ‘elbow’ phenomenon)
       - Experimentally find the best/optimal dating method
       - Detect borrowings
      Today ...

  21. Overview
       1. The basic list of lexical items
       2. Comparing languages
       3. Some results: genetic and areal proximity
       4. On inheritance vs borrowing
       5. Conclusions

  22. 1. The basic list of lexical items

  29. Lexical items
      Word list: Swadesh 100 basic meanings
       - Word coined in most languages
       - Collected in fieldwork lexicon / grammar
       - Inherited rather than borrowed
       - Culturally independent
       - Stable over time
       - Few synonyms

  40. Lexical items: further reduction
      Early analyses have shown:
       - Optimal 40/100 item subset gives the same results
           → Less work
           → Less missing data
       - Faster processing; combinatorial explosion: 40 : 100 ~ 3 * 10^7 : 2 * 10^10

  43. Lexical items: stability
      Most stable items: iteratively throw out the most unstable item in terms of variation within genera (3500-4000 years; Dryer 2001; 2005), e.g. Germanic, Romance, Slavic, …
      Formula: S = (E - U)/(100 - U) (weighted average % matches, Eq vs Uneq)
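The stability formula and the elimination step on slide 43 can be sketched as follows. The sketch assumes E is the weighted average percentage of matches for an item among languages of the same genus and U the corresponding percentage among languages of different genera, which is one reading of the slide's "Eq vs Uneq". The item names and percentages are placeholders invented for illustration, not ASJP results. The sketch also keeps each item's score fixed while items are discarded, a simplification the slides do not confirm.

```python
def stability(e_pct: float, u_pct: float) -> float:
    """S = (E - U) / (100 - U), with E and U given as percentages."""
    if not 0 <= u_pct < 100:
        raise ValueError("U must lie in [0, 100)")
    return (e_pct - u_pct) / (100.0 - u_pct)


def keep_most_stable(match_pcts: dict, keep: int) -> list:
    """Repeatedly discard the least stable item until `keep` items remain.

    `match_pcts` maps an item name to its (E, U) pair. Scores are computed
    once and not re-estimated after each removal (an assumption of this sketch).
    """
    remaining = {item: stability(e, u) for item, (e, u) in match_pcts.items()}
    while len(remaining) > keep:
        worst = min(remaining, key=remaining.get)
        del remaining[worst]
    # Report survivors from most to least stable.
    return sorted(remaining, key=remaining.get, reverse=True)


# Placeholder (E, U) percentages, invented for illustration only.
TOY_ITEMS = {
    "item_a": (60.0, 5.0),
    "item_b": (20.0, 5.0),
    "item_c": (45.0, 5.0),
}

if __name__ == "__main__":
    print(keep_most_stable(TOY_ITEMS, keep=2))  # ['item_a', 'item_c']
```

Subtracting U and dividing by 100 - U rescales the within-genus match rate so that S is 0 when an item matches no more often inside genera than across them, and 1 when it always matches inside genera.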

  44. [Figure; surviving labels: Ethnologue (Goodman-Kruskal), WALS (Pearson); axis: ++ < Stability > --]

  46. 40 Most Stable

  47. Homophones

  50. Lexical items: transcription
      First phase of project (2007)
      Problems with full IPA representation of words:
       - data entry via keyboard
       - simple programming language (Fortran; Pascal)
