
Bridging First Language Acquisition & Historical Dialectology with Digital Humanities

This project aims to merge resources from language acquisition and dialect geography through a demonstrator tool for interdisciplinary research on lexical characteristics of concepts.

dmiddleton

Presentation Transcript


  1. Bridging the Gap between First Language Acquisition and Historical Dialectology with the Help of Digital Humanities Folkert de Vriend & Martin Snijders 18/11/2011

  2. Time and team • Project duration: 1 year (May 2011 - May 2012) • Multidisciplinary team: • Leonie Cornips • Wilbert Heeringa • Marc Kemps-Snijders • Martin Snijders • Student assistants: Anke, Gertruud, Yvonne • Jos Swanenberg • Folkert de Vriend

  3. General • COAVA: COgnition, Acquisition and VAriation Tool • Aims of COAVA: • A) Curation of resources from two separate linguistic subdisciplines: first language acquisition and dialect geography. • B) Development of a demonstrator tool for interdisciplinary research into the lexical characteristics of concepts

  4. A) Curation

  5. Resources in COAVA • Seven corpora from CHILDES • The Netherlands and Flanders • Children (mostly between 2 and 3.5 years old) • Part III of WBD/WLD • (Dutch and Flemish) Brabant and Limburg • Adults

  6. CLARIN-compliance Dialect data and CHILDES data • CMDI-metadata • Persistent identifiers • ISOcat Dialect data • Lexical Markup Framework (LMF)

  7. B) Demonstrator

  8. Lexical characteristics • First language acquisition: • For some concepts the lexical form is typically acquired early (‘dog’, for instance), while for other concepts the lexical form is typically acquired later (‘blue titmouse’, for instance). • Dialect geography: • For some concepts there is a lot of lexical variation, while for other concepts there is very little variation.
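One simple way to make "a lot of" versus "very little" lexical variation concrete is to count the distinct lexical forms attested per concept. The sketch below is only an illustration under assumptions: the function name, the record layout, and the placeholder records are invented here and are not taken from the WBD/WLD data or the COAVA tool.

```python
from collections import defaultdict


def variation_per_concept(records):
    """Map each concept to the number of distinct lexical forms attested for it."""
    forms = defaultdict(set)
    for concept, form in records:
        # Lower-case so that capitalisation differences do not count as variation.
        forms[concept].add(form.lower())
    return {concept: len(variants) for concept, variants in forms.items()}


# Invented placeholder records (concept, lexical form); not real dialect data.
toy_records = [
    ("concept_1", "form_a"),
    ("concept_1", "Form_A"),
    ("concept_2", "form_b"),
    ("concept_2", "form_c"),
    ("concept_2", "form_d"),
]
variation = variation_per_concept(toy_records)
```

Here `concept_1` ends up with one distinct form and `concept_2` with three. A count like this ignores how the forms are spread over locations, so it is a rough first indicator rather than a full dialectometric measure.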

  9. Value of combined interpretation • For researchers in both disciplines these characteristics are interesting for at least two reasons: • Research into the ‘basic level vocabulary’ of a community • Research into the relation between age of acquisition and (dialect) variation

  10. Implementation • A concept taxonomy is constructed. This taxonomy will only contain concepts for which lexical forms can be found in both resources. • Since the Dutch CHILDES data mostly contain data for children aged between 2 and 3.5 years, we focus on lexical forms that are nouns. • To enable linking from this taxonomy to the CHILDES data, these data first need to be lemmatised and tagged for their POS (Lexicon by Gilles).
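The selection rule on this slide (keep only noun concepts for which lexical forms occur in both resources) can be sketched as a set intersection. This is a minimal sketch under strong assumptions: it pretends that a lemma can double as the concept key, that the POS tag for nouns is `"N"`, and that both resources are already available as simple Python collections. The function name and the toy data are hypothetical.

```python
def shared_noun_concepts(childes_tokens, dialect_concepts):
    """Return the concepts attested as nouns in the child data and also present in the dialect data."""
    # Keep only lemmas tagged as nouns ("N" is an assumed tag label).
    childes_nouns = {lemma for lemma, pos in childes_tokens if pos == "N"}
    # Only concepts found in both resources enter the taxonomy.
    return sorted(childes_nouns & set(dialect_concepts))


# Invented toy input: (lemma, POS) pairs after lemmatisation and tagging.
toy_childes = [("dog", "N"), ("run", "V"), ("ball", "N"), ("dog", "N")]
# Invented toy set of concepts covered by the dialect dictionaries.
toy_dialect = {"dog", "blue titmouse"}

taxonomy_concepts = shared_noun_concepts(toy_childes, toy_dialect)
```

With this toy input only `dog` survives, because `ball` is missing from the dialect side, `blue titmouse` is missing from the child data, and `run` is not a noun. In practice the mapping from lemmas to concepts is unlikely to be one-to-one, so a real pipeline would need an explicit lemma-to-concept table.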

  11. Demo

  12. Technology • Client-server application • Search services • Java/Google Web Toolkit • Apache/Tomcat • Solr search server • Open source

  13. Solr • Indices, multi-core • Faceted search • Fast
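To illustrate what a faceted query against one core of a multi-core Solr server can look like, the sketch below only builds the request URL and does not contact a server. The parameters `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters; the base URL, the core name `dialect`, and the field names `concept`, `place` and `lexical_form` are assumptions made for this example and are not known to match the COAVA indices.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, core, query, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical server, core and field names, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr",
    "dialect",
    "concept:dog",
    ["place", "lexical_form"],
)
```

Each core has its own `select` endpoint, so the same helper could address a second core (for example one holding the child language data) by changing only the `core` argument.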

  14. Demo

  15. Thank you
