
Andy C. Chin, The Hong Kong Institute of Education, andychin@ied.edu.hk

PNC2013, Kyoto University, December 10-11, 2013. New Language Resources for Cantonese Linguistics Research: A Linguistic Corpus of Mid-20th Century Hong Kong Cantonese. Andy C. Chin, The Hong Kong Institute of Education, andychin@ied.edu.hk.



Presentation Transcript


  1. PNC2013, Kyoto University, December 10-11, 2013 • New Language Resources for Cantonese Linguistics Research: A Linguistic Corpus of Mid-20th Century Hong Kong Cantonese • Andy C. Chin, The Hong Kong Institute of Education, andychin@ied.edu.hk

  2. Outline • Why “Cantonese”? • Research on early Cantonese (19th - mid-20th C) – Diachronic development • The corpus • Source of data • Demonstration of search engine

  3. Cantonese in Hong Kong

  4. Cantonese • One of the dialects of the Chinese language family • Despite being a dialect, Cantonese serves as a lingua franca in Hong Kong, Macau and most parts of Guangdong Province, China

  5. Use of Cantonese

  6. “Cantonese” in early Hong Kong • A fishing village • Population in 1851: ~33,000 • Four major ethnic groups: • Guangfu 廣府 (本地) • Danjia 蛋家 (seafaring people) • Hakka 客家 • Min 閩語 (鶴佬/潮州) • Their languages are mutually unintelligible

  7. Given the long history of Cantonese in HK • We are interested in understanding its development in the past 200 years • Are there any differences between early Cantonese and modern Cantonese? • How can we capture these differences?

  8. Diachronic studies of Cantonese • Two approaches • Apparent time approach • Real time approach

  9. Apparent time approach • Age-stratified variation in a linguistic form is often indicative of a change in progress • 75 vs. 50 vs. 25 y/o → changes over 50 years • What about the language of 200 years ago? • Language change: can we assume that a speaker still speaks the language of his or her time? • If two speakers show no difference with respect to a linguistic feature, does it mean that there has been no change?

  10. Real time approach • Samples the population over an extended period of time – longitudinal study • Collects data produced in the period concerned

  11. Limitations on Research in Cantonese • Cantonese is a vernacular language • Spoken data is needed • Are there any records of early 19th-century Cantonese? - spoken data vs. written records

  12. With these early materials, • We are able to reconstruct the early stage of the Cantonese language (about 200 years ago) • Some of the linguistic features are very different from those in modern Cantonese

  13. Previous research on Cantonese • Neutral Qs • Directional complements • Aspect markers • Demonstratives • Phonology • Verb complement • … • Comparative construction • Lexicon (sociolinguistics) • Dative verb GIVE • Sentence-final particles • Grammar of the late Qing period • …

  14. Furthermore, • Some linguistic changes took place or were completed around the mid-20th century • Dative marker: 過 → 畀 (送本書過/畀佢) • Neutral Q: 你去睇戲唔呀 → 你去唔去睇戲呀 • … • New and old features might co-exist in the mid-20th C

  15. Timeline • Morrison (1828) → Chao (1947): ~120 years • Chao (1947) → 2013: ~66 years

  16. Existing Cantonese corpora • The Hong Kong Cantonese Child Language Corpus • The Hong Kong Bilingual Child Language Corpus • Hong Kong Cantonese Corpus • The Hong Kong Cantonese Adult Language Corpus • 19th Century Cantonese Corpus

  17. Source of corpus data • Real time vs. apparent time • Naturally occurring data • HK Cantonese movies (粵語長片)

  18. http://corpus.ied.edu.hk/hkcc/

  19. HK Movie Industry in mid-20th C.

      Year          No. of Cantonese movies   No. of PTH movies
      1952 - 1955   627                       222
      1956 - 1960   963                       314
      1961 - 1965   928                       206
      1966 - 1970   361                       286
      Total         2879                      1028

      Source of data: Chung (2004:177)
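The figures on this slide can be checked with a few lines of Python. This is only a sketch: it reads the run-together entry for 1956 - 1960 as 963 Cantonese and 314 PTH movies, which is the only split consistent with the stated totals, and the per-period share is a derived quantity that the slide itself does not report. The names `movies`, `totals` and `cantonese_share` are illustrative.

```python
# Movie counts per period as (Cantonese, PTH), from the slide (Chung 2004:177).
# Assumption: "963314" is read as 963 and 314.
movies = {
    "1952-1955": (627, 222),
    "1956-1960": (963, 314),
    "1961-1965": (928, 206),
    "1966-1970": (361, 286),
}


def totals(table):
    """Return the column sums (Cantonese, PTH)."""
    cantonese = sum(c for c, _ in table.values())
    pth = sum(p for _, p in table.values())
    return cantonese, pth


def cantonese_share(table):
    """Return the proportion of Cantonese movies in each period."""
    return {period: c / (c + p) for period, (c, p) in table.items()}


cantonese_total, pth_total = totals(movies)
shares = cantonese_share(movies)

print("Totals:", cantonese_total, pth_total)
for period, share in shares.items():
    print(f"{period}: {share:.1%} Cantonese")
```

The column sums reproduce the totals on the slide (2879 and 1028), and the derived shares suggest that Cantonese productions made up a clearly smaller part of the output in 1966 - 1970 than in the earlier periods.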

  20. About the corpus • 21 movies have been transcribed with Chinese characters: ~200k characters • Word segmentation • search engine (14 movies, since Apr 2012) • http://corpus.ied.edu.hk/hkcc/ • 350+ registered users

  21. Search criteria • Characters or words (segmented units) • Cantonese pronunciation • Movie names • Names of speakers • Gender of speakers • …
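To make the search criteria concrete, here is a minimal sketch of how such filters could be combined over segmented utterances. It is not the corpus's actual search engine or data format: the `Utterance` record, the `search` function, the speaker labels (`S1`, `S2`), the gender codes (`X`, `Y`), the word segmentation and the tiny Jyutping lookup are all assumptions made for illustration. The three sentences are the examples quoted elsewhere in this presentation from 契爺艷史 (1952).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    movie: str
    speaker: str   # placeholder label, not a name taken from the corpus
    gender: str    # placeholder code, not real annotation
    words: tuple   # segmented units; this segmentation is illustrative

    @property
    def text(self):
        return "".join(self.words)


# Tiny illustrative pronunciation lookup (Jyutping); a real system
# would need a full lexicon.
PRONUNCIATION = {"冇": "mou5", "畀": "bei2", "過": "gwo3"}


def search(utterances, chars=None, word=None, pron=None,
           movie=None, speaker=None, gender=None):
    """Return the utterances that satisfy every criterion that is given."""
    hits = []
    for u in utterances:
        if chars is not None and chars not in u.text:
            continue
        if word is not None and word not in u.words:
            continue
        if pron is not None and not any(
                PRONUNCIATION.get(w) == pron for w in u.words):
            continue
        if movie is not None and u.movie != movie:
            continue
        if speaker is not None and u.speaker != speaker:
            continue
        if gender is not None and u.gender != gender:
            continue
        hits.append(u)
    return hits


MOVIE = "契爺艷史"
sample = [
    Utterance(MOVIE, "S1", "X", ("你", "位", "千金", "有", "讀書", "冇", "呀")),
    Utterance(MOVIE, "S2", "Y", ("重", "要", "畀", "錢", "過", "人")),
    Utterance(MOVIE, "S1", "X",
              ("咪", "可以", "快啲", "還清", "啲", "債", "畀", "人")),
]

for hit in search(sample, word="畀"):
    print(hit.speaker, hit.text)
```

Searching by segmented word rather than by raw character string matters for a form like 過, which would otherwise also match inside longer words; that is one reason word segmentation precedes the search engine in the workflow.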

  22. 契爺艷史 (1952) • Yes-No question • VP-Neg: 你位千金有讀書冇呀? • V-Neg-VO: 呢道係咪有位黃小姐? • Dative marker • 重要畀錢過人? • 咪可以快啲還清啲債畀人?
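The two yes-no question forms can be told apart roughly by surface pattern. The sketch below is a crude heuristic written for the examples in this presentation, not the annotation scheme used in the corpus: it assumes that a VP-Neg question ends in a negator (唔, 冇 or 未) optionally followed by a particle, and that a V-Neg-VO question contains either a repeated character around 唔 or the fused form 係咪. The pattern names and the `classify` function are hypothetical, and real data would need a proper parser.

```python
import re

# V-Neg-V(O): the same character on both sides of 唔, or the fused form 係咪.
V_NEG_V = re.compile(r"(.)唔\1|係咪")

# VP-Neg: a negator at the end, optionally followed by a particle
# and a question mark (ASCII or full-width).
VP_NEG = re.compile(r"[唔冇未][呀啊]?[?？]?$")


def classify(sentence):
    """Label a sentence as 'V-Neg-VO', 'VP-Neg' or 'other'."""
    s = sentence.strip()
    if V_NEG_V.search(s):
        return "V-Neg-VO"
    if VP_NEG.search(s):
        return "VP-Neg"
    return "other"


examples = [
    "你去睇戲唔呀",
    "你去唔去睇戲呀",
    "你位千金有讀書冇呀？",
    "呢道係咪有位黃小姐?",
    "重要畀錢過人?",
]

for sentence in examples:
    print(classify(sentence), sentence)
```

Counting the two labels per movie or per year would give a first, rough picture of how the older VP-Neg form and the newer V-Neg-VO form co-exist in the mid-20th century data.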

  23. Some challenges • Quality of speech • Overlap of speech • Representations of colloquial vocabulary • Parts-of-speech: How many types? • Discourse features • …

  24. Acknowledgments • ECS research grants, RGC: • Linguistic Analysis of Mid-20th Century Hong Kong Cantonese by Constructing an Annotated Spoken Corpus (2013/2015) • HKIEd Internal Research Grants: • RG41/2010-2011: Spoken Corpus Construction and Linguistic Analysis of Mid-20th Century Cantonese • RG62/12-13R: A Preliminary Linguistic Analysis of Mid-20th Century Cantonese from a Corpus-based Approach

  25. Demonstration • http://corpus.ied.edu.hk/hkcc/
