
IA901 2012 Session Four


kedem

Presentation Transcript


  1. IA901 2012 Session Four

    Lab Session: Corpora
    What is a corpus?
    What do corpora tell us about the English language?
    Corpus-driven language description
    Practical application of corpora in the classroom
  2. A link to last week…

  3. HONIED or HONEYED? ENJOY → ENJOYED PLAY → PLAYED WORRY → WORRIED HURRY → HURRIED MONEY → MONIED / MONEYED?
  4. “Honeyed” is almost 40 times as common (online) as “honied”
  5. Also in relation to last week’s session, I found that:
    “mother-in-laws” is almost 50% more common than “mothers-in-law”
    “tablespoonfuls” is 12 times more common than “tablespoonsful”
    “passersby” is almost 17 times more common than “passerbys”
    “gin and tonics” is over 60 times more common than “gins and tonic”
    “works of art” is over 250 times more common than “work of arts”
  6. What is a corpus?

  7. What can it tell us?

  8. Where do you think this word list comes from?
  9. And this?
  10. created using wordle.net
  11. So… is my IA902 corpus a “principled collection of texts available for qualitative and quantitative analysis”? (Biber, Conrad, Reppen, 1998)
  12. A history of corpora

  13. 1700s: Dr Johnson wrote the first comprehensive dictionary of English, compiled by manually collating samples of language from 1560-1660.
  14. 1960s: Brown Corpus of Standard American English, the first of the modern, computer-readable, general corpora
    1980s: John Sinclair & colleagues: Collins Birmingham University International Language Database (COBUILD)
    1987: Collins COBUILD English Dictionary
    1990: Willis: the Lexical Syllabus
    2007: Cambridge International Corpus => 1 billion words
  15. ANC: American National Corpus
    BASE: British Academic Spoken English
    BNC: British National Corpus
    BoE: Bank of English
    BROWN: Brown University
    CIC: Cambridge International Corpus
    CANCODE: Cambridge & Nottingham Corpus of Discourse in English
    COBUILD: Collins Birmingham University International Language Database
    MICASE: Michigan Corpus of Academic Spoken English
  16. Corpora not limited to general or native-speaker data:
    Business and Academic corpora
    The International Corpus of Learner English
    VOICE (The Vienna-Oxford International Corpus of English) is a collection of English as a Lingua Franca
    Corpus development with the idea of SUEs (Successful Users of English) as a model
    How big does a corpus need to be?
  17. What do corpora tell us?

    Frequency of individual words
    Frequency of “chunks”
  18. Frequency of individual words

  19. Rank  Word       Freq  %
      1     I          13    9.77
      2     YESTERDAY  9     6.77
      3     TO         8     6.02
      4     MM         6     4.51
      5     NOW        5     3.76
      6     OH         4     3.01
      7     SHE        4     3.01
      8     A          3     2.26
      9     AWAY       3     2.26
      10    BELIEVE    3     2.26
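A word list like the one on this slide can be produced with a few lines of code. The sketch below is only an illustration: the tokeniser (lower-casing and keeping letters and apostrophes) and the sample sentence are assumptions, not the tool or text used for the slide. The percentage column is each word's count divided by the total number of tokens; the slide's figures are consistent with a text of 133 tokens (13 / 133 ≈ 9.77%), although that total is inferred rather than stated.

```python
import re
from collections import Counter


def frequency_list(text, top_n=10):
    """Return (rank, word, count, percent) rows for the top_n most frequent words."""
    # Assumed tokeniser: lower-case the text, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(top_n), start=1):
        # Percent = share of all tokens in the text, rounded to two places.
        rows.append((rank, word.upper(), count, round(100 * count / total, 2)))
    return rows


# Invented sample text, for illustration only.
sample = "I believe I said yesterday that I would go away now"
for row in frequency_list(sample, top_n=3):
    print(row)
```

Real corpus tools also make decisions this sketch ignores, such as how to treat contractions, hyphenated words and fillers like “mm”.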
  20. Within a bigger corpus (say, 5 million words), which words would you expect to occur most frequently? Write down 10 words that you’d expect to be in the top 50. What differences would you expect to find between lists of the most frequent words in corpora of WRITTEN and SPOKEN English?
  21. From O’Keeffe et al (2007)
  22. From O’Keeffe et al (2007)
  23. O’Keeffe et al (2007) divide the 2000 most frequently occurring words in the CIC and CANCODE corpora into 4 sub-lists: A = 1-500 B = 501-1000 C = 1001-1500 D = 1501-2000. Can you identify the most frequently-occurring word in each set below?
  24. O’Keeffe et al (2007) divide the 2000 most frequently occurring words in the CIC and CANCODE corpora into 4 sublists: A = 1-500 B = 501-1000 C = 1001-1500 D = 1501-2000. Can you identify the most frequently-occurring word in each set below?
  25. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  26. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  27. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  28. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  29. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  30. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  31. “The broad categories of a basic vocabulary” (O’Keeffe et al, 2007)
  32. Three Relevant Word Lists?
    The General Service List (Michael West, 1953)
    The Academic Word List (Averil Coxhead, 2000)
    The Academic Keyword List (Magali Paquot, 2010)
  33. Good news for the beginner? Bad news for the advanced-level student? From O’Keeffe et al (2007)
  34. Frequency of “chunks”

    Collocation
    Strings of words
    Colligation
  35. Definitions Biber et al (2002): Collocation : “a combination of lexical words which frequently co-occur in texts” Lexical Bundle : “a sequence of words which is used repeatedly in texts”
  36. Alternatives: Collocation: “just the way we say it”?
    “the occurrence of two or more words within a short space of each other in a text” (Sinclair, 1991)
    “the relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey, 1991)
    “a psychological association between words (rather than lemmas) up to four words apart =…evidenced by their occurrence together in corpora more often than is explicable in terms of random distribution” (Hoey, 2005)
    “the lexical company that words keep” (Hoey, 2011)
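The “short space” and “up to four words apart” in these definitions can be read as a window around a node word. The following is a minimal sketch under that reading: the function name `collocates`, the default span of 4 and the sample sentences are all assumptions made for illustration. It only counts raw co-occurrence; deciding whether a pairing is “greater than random probability” would need a statistical measure on top of these counts, which is not attempted here.

```python
import re
from collections import Counter


def collocates(text, node, span=4):
    """Count the words found within `span` tokens either side of `node`."""
    # Assumed tokeniser: lower-case the text, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Tokens to the left and right of the node, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts


# Invented sample text, for illustration only.
sample = "It was a bit of a mess. She felt a bit tired. That is a bit of a problem."
print(collocates(sample, "bit", span=1).most_common())
```

Note that the window here runs across sentence boundaries, which a more careful tool would probably prevent.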
  37. Collocations Dictionaries
  38. username: mholloway, password: ia902
  39. What words collocate with both STUDY and RESEARCH?
  40. What words collocate with both STUDY and RESEARCH?
  41. “Chunks” : how long? How significant?
  42. Put the following items in order of the frequency with which they are used in spoken English:
    a bit of
    and things like that
    regularly
    since
    this that and the other
    twice
    From O’Keeffe et al (2007)
  43. From O’Keeffe et al (2007): a couple of, possible, at the moment, alone, all the time, fun, in terms of, something like that, expensive, you know what I mean, stairs, at the same time, nowhere
  44. Commonly-occurring six-word chunks:
    Do you know _______ _______ _______?
    At the end _______ _______ _______
    And all the rest _______ _______
    And all that sort _______ _______
    I don’t know _______ _______ _______
    Answers:
    Do you know what I mean?
    At the end of the day
    And all the rest of it
    And all that sort of thing
    I don’t know what it is
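Recurring chunks of a fixed length can be found by counting every run of n consecutive words in a text. This is a rough sketch of that idea, not the procedure used by O’Keeffe et al; the function name `chunks`, the tokeniser and the sample text are assumptions for illustration.

```python
import re
from collections import Counter


def chunks(text, n):
    """Count every sequence of n consecutive words in the text."""
    # Assumed tokeniser: lower-case the text, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text; punctuation is ignored,
    # so chunks may span sentence boundaries.
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample text, for illustration only.
sample = "At the end of the day it works. At the end of the day we left."
print(chunks(sample, 6).most_common(1))
print(chunks(sample, 2).most_common(3))
```

Changing `n` gives two-word, three-word or longer chunks from the same text, which makes it easy to compare how quickly frequency falls as chunks get longer.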
  45. From O’Keeffe et al (2007)
  46. “a bit” is the 24th most common two-word chunk in CANCODE, but… what does “a bit” mean? Does it have any meaning by itself? How meaningful is “a bit” as a quantifier? What about its “hedging” function? It also belongs to several “frames”, e.g. “it was a bit of a” + mess / problem / performance / hassle / nuisance / bargain
  47. COLLIGATION: Where lexis meets grammar? Data on language usage tells us that:
    “a bit” is more likely than “the bit”
    “a bit” is likely to be followed by “of” + NP
    “a bit” is more likely to be used in an object position than a subject position
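The first two observations on this slide can be checked, at the level of word forms, by counting what comes immediately before and after a word or phrase. The sketch below assumes a simple tokeniser and an invented sample text, and the function name `neighbours` is hypothetical. It cannot identify noun phrases or subject and object positions; those claims would need a grammatically tagged corpus.

```python
import re
from collections import Counter


def neighbours(text, phrase):
    """Count the words immediately before and after each occurrence of `phrase`."""
    # Assumed tokeniser: lower-case the text, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    before, after = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            if i > 0:
                before[tokens[i - 1]] += 1  # word directly to the left
            if i + n < len(tokens):
                after[tokens[i + n]] += 1  # word directly to the right
    return before, after


# Invented sample text, for illustration only.
sample = "It was a bit of a mess. The bit I liked was short. We had a bit of a problem."
before, after = neighbours(sample, "bit")
print(before.most_common())  # is "a" or "the" more common before "bit"?
print(neighbours(sample, "a bit")[1].most_common())  # what follows "a bit"?
```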
  48. From the Compleat Lexical Tutor:
  49. Entitle – Active or Passive?
  50. ILLUSTRATE and DRAW
    Among the many differences you may have found between these two words, did you discover anything about COLLIGATION?
    DRAW is a more frequent item than ILLUSTRATE.
    Both verbs are frequently preceded by “to”.
    Relatively speaking, ILLUSTRATE occurs significantly more frequently with “to” than DRAW does.
    ILLUSTRATE is frequently used in an INFINITIVE CLAUSE.
    To illustrate this, we can compare concordance lists of each word using any of the websites linked to on the IA902 blog.
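For readers who want to try the comparison offline, a concordance (key word in context) and the share of occurrences preceded by “to” can be sketched as follows. This is not how the websites on the IA902 blog work internally; the function names, the tokeniser and the sample sentences are assumptions for illustration, and the sketch matches exact word forms only, so “illustrates” or “drew” would have to be searched separately.

```python
import re


def tokenise(text):
    # Assumed tokeniser: lower-case the text, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def concordance(text, node, width=3):
    """Return (left context, node, right context) for each occurrence of `node`."""
    tokens = tokenise(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def share_preceded_by(text, node, previous="to"):
    """Proportion of occurrences of `node` directly preceded by `previous`."""
    tokens = tokenise(text)
    hits = [i for i, token in enumerate(tokens) if token == node]
    if not hits:
        return 0.0
    preceded = sum(1 for i in hits if i > 0 and tokens[i - 1] == previous)
    return preceded / len(hits)


# Invented sample text, for illustration only.
sample = ("To illustrate this we draw a chart. "
          "They draw maps to illustrate the point. You draw well.")
for line in concordance(sample, "illustrate", width=2):
    print(line)
print(share_preceded_by(sample, "illustrate"), share_preceded_by(sample, "draw"))
```

Comparing the two proportions on a real corpus is one way to test the slide's claim that ILLUSTRATE occurs relatively more often with “to” than DRAW does.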
  51. Widening context / Narrowing meaning

    Written and spoken contexts
    Semantic association
    Semantic prosody
  52. Differences in spoken and written English:
    - data on spoken English reflects an orientation to the “speaker-listener world in conversation” (I, you)
    - spoken discourse markers (well, right)
    - high-frequency items that are arguably not words at all (yeah, oh, er)
  53. What functions do ABSOLUTELY and DEFINITELY have in spoken English?
  54. What would you expect to be the most common uses of the words LIKE and MEAN?
  55. Collocates for LIKE & MEAN (BNC Written & Spoken + Brown) MEAN I=611 you=86 not=38 the=29 would=27 to=13 we=11 Didn’t=10 may=10 not=14 something=13 is=12 much=11 the=11 you=11 feel=10 will=9 a=8 could=8 that=8 Don’t=7 it=7 can=6 necessarily=6 (mm=2) LIKE would=35 look=27 was=25 I=20 looked=18 looks=17 and=15 more=15 just=14
  56. Semantic association
    Semantic prosody
  57. Collocations: inner ear, glue ear; a clip round the ear; she whispered in his ear; ear, nose, and throat doctor; hear a voice in your ear Semantic association: parts of the body Semantic prosody???
  58. What’s the difference between SKINNY and SLIM? Slim : elegant, graceful Skinny: sick, shy?
  59. Differences between HANDSOME and PRETTY
  60. Differences between HANDSOME and PRETTY
  61. Differences between HANDSOME and PRETTY
  62. How would you explain the difference between CAUSE and PROVIDE?
  63. How would you explain the difference between CAUSE and PROVIDE?
  64. Materials

  65. Corpus-informed publications for students
  66. Corpus-informed publications for students
  67. Corpus-informed publications for students
  68. Corpus-informed publications for students
  69. Corpus-informed publications for students
  70. For teachers: Corpus-informed or “impulse-based”?
  71. Activities

  72. From Cobb (1997)
  73. Discussion

  74. Disadvantages?
    - overly reliant on technology?
    - does navigation of corpora also require an element of “instinct”?
    - the dangers of becoming “corpus-bound”
  75. From O’Keeffe et al (2007)
  76. For further exploration

  77. - what do corpora tell us about existing theories of language? (see Hoey, 2005)
    - how can YOU use corpora in your teaching?
    - what use can you make of corpora in your research?