
Corpora in Linguistic Research

Corpora in Linguistic Research. Nanjing University, Li Changsheng. Tel: 025-8443-6787. Email: csli@jlonline.com. Order of Presentation. I. Corpus Research versus Linguistic Research II. Influential Corpora III. Corpus Analysis IV. More on Statistical Analysis V. Q and maybe A (anytime during presentation).


Presentation Transcript


  1. Corpora in Linguistic Research • Nanjing University • Li Changsheng • Tel: 025-8443-6787 • Email: csli@jlonline.com

  2. Order of Presentation • I. Corpus Research versus Linguistic Research • II. Influential Corpora • III. Corpus Analysis • IV. More on Statistical Analysis • V. Q and maybe A (anytime during presentation)

  3. I. Corpus Research versus Linguistic Research • Corpus Research=Linguistic Research • Language (features) • Learner language (features)

  4. I. Corpus Research versus Linguistic Research • Corpus Research≠Linguistic Research • (Large,) representative authentic data

  5. II. Influential Corpora • Native-speaker corpora • Learner corpora

  6. Native-speaker Corpora

  7. Collins Corpus/Bank of English • A 2.5-billion word analytical database of English. • Contains written material from websites, newspapers, magazines and books published around the world, and spoken material from radio, TV and everyday conversations.  • New data is fed into the corpus every month, to help the Collins dictionary editors identify new words and meanings from the moment they are first used. • Bank of English: part of the Collins Corpus. • Contains 650 million words from a carefully chosen selection of sources, to give a balanced and accurate reflection of English as it is used every day.

  8. British National Corpus • Contains approximately 100 million words of written texts (90%) and transcripts of speech (10%) in modern British English. • Can be accessed online remotely using the BNC Online service.

  9. American National Corpus • Contains 11.5 million words of written and spoken American English data (8.3 million words of writing and 3.2 million words of speech).

  10. Longman/Lancaster Corpus • Contains about 30 million words of published English. • British data takes up 50% and American data 40% while the other 10% represents other varieties such as Australian, African and Irish English.

  11. Learner Corpora

  12. International Corpus of Learner English • Contains argumentative essays written by advanced learners of English, i.e. university students of English as a foreign language (EFL) in their 3rd or 4th year of study. • Contains over 2.5 million words in the form of 3,640 texts ranging between 500-1,000 words in length written by EFL learners from 11 mother tongue backgrounds, namely, Bulgarian, Czech, Dutch, Finnish, French, German, Italian, Polish, Russian, Spanish, and Swedish.

  13. CLEC • Contains one million words from writing produced by Chinese learners of English from five proficiency levels: middle school students, junior and senior non-English majors, and junior and senior English majors. • Annotated with learner errors using an annotation scheme which consists of 61 error types clustered in 11 categories.

  14. SWECCL • Contains about 2 million words in total of spoken and written English produced by Chinese university English majors.

  15. LSECCL • Year 1 • Recording 1 • Task 1 - Reading aloud • Task 2 - Monologue - The Most Unforgettable Birthday • Task 3 - Dialogue - Holiday plan • Recording 2 • Task 1 - Retelling • Task 2 - Monologue - Whether it is appropriate for college students to rent apartments outside the campus and live there • Task 3 - Dialogue - Whether exams should be abolished

  16. LSECCL • Year 2 • Recording 1 • Task 1 - Reading aloud • Task 2 - Monologue - Describe one of the persons you admire most • Task 3 - Dialogue - What gift to buy for a friend - Lily • Recording 2 • Task 1 - Retelling • Task 2 - Monologue - Make critical comments on the use of electronic dictionaries among college students • Task 3 - Dialogue - Whether it is a good practice or not to keep one’s own computer in dorm

  17. LSECCL • Year 3 • Recording 1 • Task 1 - Reading aloud • Task 2 - Monologue - Describe one of your experiences when you had a great ambition to do something • Task 3 - Dialogue - Talk about ways of relaxation after a month-long preparation for an exam • Recording 2 • Task 1 - Retelling • Task 2 - Monologue - Do you think it is appropriate for college students to get married • Task 3 - Dialogue - Talk about the necessity of having certificates

  18. LSECCL • Year 4 • Recording 1 • Task 1 - Reading aloud • Task 2 - Monologue - The Most Unforgettable Birthday • Task 3 - Dialogue - Holiday plan • Recording 2 • Task 1 - Retelling • Task 2 - Monologue - Whether it is appropriate for college students to rent apartments outside the campus and live there • Task 3 - Dialogue - Whether exams should be abolished

  19. III. Corpus Analysis • (Tagging corpus data) • Calculating frequencies and frequency differences • Frequencies of occurrence • Frequencies of co-occurrence • Frequency differences across registers/corpora/periods of time • (Transferring frequencies) • Statistical analysis

  20. Lexis • College English Curriculum Requirements (2007) reference word list

  21. Lexis • headwords

  22. Lexis • meanings: deal (Biber et al., 1998)

  23. Lexis • synonyms: utterly, perfectly

  24. Lexis • synonyms: big, large, great (Biber et al., 1998)

  25. Lexis • collocations: system

  26. Lexis • chunks (Qi, 2006) • Step 1: Run WordList • Step 2: Select the corpus • Step 3: Make an index • Step 4: Click Compute, then Clusters
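
The four steps above describe a menu-driven workflow in a concordancing tool. As a rough illustration of what a cluster count amounts to, here is a minimal Python sketch that counts recurring word sequences. It is not the tool's own algorithm: the function name `count_clusters`, the tokenisation rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def count_clusters(text, size=3, min_freq=2):
    """Count recurring word sequences (clusters) of `size` words.

    Hypothetical helper: a simple n-gram count, not a reimplementation
    of any particular concordancer.
    """
    # Lowercase and keep runs of letters/apostrophes as tokens (an assumption).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of `size` tokens across the whole text.
    # Note: windows are allowed to cross sentence boundaries.
    windows = zip(*(tokens[i:] for i in range(size)))
    counts = Counter(" ".join(window) for window in windows)
    # Keep only clusters that occur at least `min_freq` times.
    return {cluster: n for cluster, n in counts.items() if n >= min_freq}


# Invented sample text, for illustration only.
sample = "I think that it is good. I think that it is bad."
clusters = count_clusters(sample, size=3)
print(clusters)
```

Raising `min_freq` or changing `size` mirrors the usual cluster settings of minimum frequency and cluster length.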

  27. Grammar • that-clause, to-clause (Biber et al., 1998) • <V* that <CST> • to <TO> * <V?I> / to <TO> * <R* * <V?I> / to <TO> * <R* R <* * <V?I>

  28. Grammar • syntactic co-occurrences of try (McEnery and Wilson, 2001)

  29. Learner Language • Frequency differences across corpora • Frequency differences across periods of time

  30. Across Corpora ICLE L1 (NNS-NNS) SWECCL L1 (NNS-NS) BNC

  31. Corpus Analysis

  32. Tagging Corpus Data • CLAWS • book → book_NN1 • Batch text replacement tool (超级批量文本替换) • book_NN1 → book <NN1>
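
The slide converts tagger output of the form book_NN1 into the form book <NN1> with a batch replacement tool. The same conversion can be sketched with a regular expression. This is only an illustration: the function name `to_angle_tags` is hypothetical, and the pattern assumes that tags consist of uppercase letters and digits, which will not cover every tag a tagger can emit (punctuation tags, hyphenated ambiguity tags).

```python
import re

# word_TAG, where TAG is assumed to be uppercase letters and digits.
TOKEN = re.compile(r"(\S+)_([A-Z][A-Z0-9]*)\b")


def to_angle_tags(text):
    """Rewrite every word_TAG token as 'word <TAG>'."""
    return TOKEN.sub(r"\1 <\2>", text)


# Invented example sentence with C5-style tags.
converted = to_angle_tags("the_AT0 book_NN1 is_VBZ read_VVN")
print(converted)
```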

  33. Calculating Frequencies and Frequency Differences • passive voice (be done) (Li, 2007a) • * <VB* * <V?N>
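
The search pattern on this slide, * <VB* * <V?N>, reads as "any word tagged as a form of be, followed by any word tagged as a past participle". A minimal sketch of the same search as a Python regular expression follows. It assumes the "word <TAG>" format and single spaces between tokens; the function name and the sample sentence are invented. Like the slide's pattern, it only finds adjacent pairs, so "was often written" would be missed.

```python
import re

# \S+ <VB...> : any word whose tag starts with VB (a form of "be")
# \S+ <V?N>   : any word whose tag is V + one character + N (past participle)
PASSIVE = re.compile(r"\S+ <VB\w*> \S+ <V\wN>")


def find_passives(tagged_text):
    """Return all adjacent be + past participle sequences."""
    return PASSIVE.findall(tagged_text)


# Invented tagged sample, for illustration only.
tagged = ("the <AT0> book <NN1> was <VBD> written <VVN> "
          "and <CJC> it <PNP> is <VBZ> read <VVN>")
hits = find_passives(tagged)
print(len(hits), hits)
```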

  34. Statistical Analysis • Differences • Two or three corpora • 1. chi-square • Under Analyze, choose Descriptive Statistics, then Crosstabs. Move one variable into the Row(s) box and the other into the Column(s) box. Click Statistics, and check off Chi-square. Click Cells, and check off Expected. • 2. one-way chi-square • Under Analyze, choose Nonparametric Tests, then Chi-Square. Move the variable into the Test Variable List box. Click OK.
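
To make the two menu procedures above more concrete, the sketch below computes the same kinds of statistic by hand: a Pearson chi-square with expected counts for a contingency table, and a one-way chi-square against equal expected frequencies. It is an illustration under stated assumptions, not a replacement for the statistics package: all function names are hypothetical, the counts are invented, no continuity correction is applied, and the p-value helper is valid only for one degree of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count of a cell = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def one_way_chi_square(observed):
    """One-way chi-square assuming equal expected frequencies."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(stat / 2))


# Invented counts: rows = two corpora,
# columns = target structure vs. all other words.
observed = [[120, 9880], [80, 9920]]
stat, df, expected = chi_square(observed)
print(round(stat, 3), df, expected, p_value_df1(stat))
```

When raw frequencies come from corpora of different sizes, the second column (all other words) is what carries the corpus size into the test.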

  35. Another Example • AWL (Li, 2007a) • +matchlist

  36. Across Periods of Time LSECCL Grades (Year 1-Year 2-Year 3-Year 4)

  37. Li (2007b) Title • 1)    Key terms • 3)    Noun phrase • 4)    Word limit (<20) • 5)    Capitalization

  38. Abstract • Summary

  39. Acknowledgments • Specific

  40. Introduction • Motivation for the study, theoretical and practical significance of the study, overall structure

  41. Literature Review • Key terms • Theoretical issues • Empirical studies • Unresolved issues

  42. Literature Review • Bibliographies/Indices/Databases (ERIC, NJU, Google Scholar, corpus4u) • Papers (Chen, 2004) • Journals (Applied Linguistics, Language Learning) • Books (FLTRP)

  43. Research Questions LSECCL Grades (Year 1-Year 2-Year 3-Year 4)

  44. Corpus Analysis

  45. Tagging Corpus Data • Microsoft Word • I think → I think <sv> <ip> <cm> <0>

  46. Calculating Frequencies and Frequency Differences • <sv>/<ap>/<dn> • <cm>

  47. Transferring Frequencies • Microsoft Excel • =COUNTIF(N1:N5000,"D:\YEAR1\1-2-B02B.TXT")
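
The spreadsheet formula above counts how many cells in N1:N5000 contain a given file name. Assuming that column N holds the source file of each concordance hit (an assumption, since the slide does not say so), the same tally for every file at once can be sketched in Python. The function name and the second file name in the sample are hypothetical.

```python
from collections import Counter


def hits_per_file(filenames):
    """Count how many concordance hits come from each source file."""
    # Uppercase the names because COUNTIF matches case-insensitively.
    return Counter(name.strip().upper() for name in filenames)


# Stand-in for column N; the B03B file name is invented for illustration.
column_n = [
    r"D:\YEAR1\1-2-B02B.TXT",
    r"D:\YEAR1\1-2-B02B.TXT",
    r"D:\YEAR1\1-2-B03B.TXT",
]
counts = hits_per_file(column_n)
print(counts[r"D:\YEAR1\1-2-B02B.TXT"])
```

A single pass produces the per-file frequencies that would otherwise need one COUNTIF formula per file.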

  48. Statistical Analysis • Changes in frequency differences • Data from three or more points in time • Wilcoxon • Under Analyze, choose Nonparametric Tests, then 2 Related Samples. Move the variables into the Test Pair(s) List box.
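
The Wilcoxon procedure named above compares two related samples, so with three or more points in time it would presumably be run pair by pair (for example Year 1 against Year 2); that reading is an assumption. The sketch below shows how the signed-rank statistic is built. The function name and the per-student frequencies are invented, and the z value uses the plain normal approximation without tie or continuity corrections, so it can differ slightly from what a statistics package reports.

```python
import math


def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, z) for two paired lists of scores."""
    # Differences for each pair; zero differences are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences; ties share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation (no tie correction).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    return w_plus, w_minus, z


# Invented frequencies of one feature for six students in two years.
year1 = [5, 7, 3, 9, 6, 4]
year2 = [8, 9, 4, 9, 10, 3]
w_plus, w_minus, z = wilcoxon_signed_rank(year1, year2)
print(w_plus, w_minus, round(z, 3))
```

With samples this small the normal approximation is rough; the exact distribution of the statistic is the safer reference.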

  49. Results and Discussion • Answers to the research questions, and reasons for the answers

  50. Conclusion • Summary of the findings, theoretical and practical implications of the findings, and limitations of the study
