
Concordances, collocations and connotation


smithp

Presentation Transcript


  1. Concordances, collocations and connotation Barnbrook G (1996) Language and Computers. Edinburgh: EUP. Chapters 3,4,5 Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins. Chapters 1,2,4

  2. Lexical information in corpora • Start looking at the kind of information (about individual words) that can be got from corpora • Simple frequency information • Distribution information • Collocation (co-occurrence information) • Connotation (semantic prosody) • Introduce basic ideas • Future topics • Statistics • Case studies

  3. Frequency information • Most banal information: counting how many times a word (“type”) appears in a text • Most frequent words will be function words, so often f counts exclude words listed in a “stop list” • Should you count words or lemmas? • Should you distinguish alternate meanings of ambiguous word forms (if you can)?
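
A minimal sketch of frequency counting with a stop list, to make the idea concrete. The function names, the tiny stop list and the sample sentence are all invented for illustration; the code counts word forms, not lemmas, and does not separate the senses of ambiguous forms.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "on"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=frozenset()):
    """Count how often each word form occurs, skipping stop-listed words."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

sample = "The cat sat on the mat and the dog sat on the cat."
raw = word_frequencies(sample)
filtered = word_frequencies(sample, STOP_WORDS)

print(raw.most_common(3))       # function words dominate the raw counts
print(filtered.most_common(3))  # content words surface once stop words go
```

Counting lemmas instead would need a lemmatiser before the counting step; that choice is left open here, as it is on the slide.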

  4. Frequency information • Frequency information on its own is not particularly interesting • Quite useful to compare f of related words • eg alternative readings of a given word form (already seen in probability calculations in tagging) • or comparing near synonyms, especially if we can take context into account (see later) • f of a given word in a given context can be indicative, eg pronouns more frequent as subject or 1st word of sentence

  5. Types and tokens • Remember distinction between “tokens” (words) and “types” (different words) • Type count gives a measure of how many DIFFERENT words are used • Type-token ratio gives a measure of “vocabulary richness” • If vocabulary is very varied, TTR will be higher • TTR is very sensitive to overall text length, so it is not meaningful to compare TTRs for texts of different lengths • Standardized TTR is the average of the TTR for each sequence of n words (typical default n=1000) in a text or corpus
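
The two measures can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are invented, chunks are consecutive and non-overlapping, and a trailing chunk shorter than n is simply dropped.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, n=1000):
    """Mean TTR over consecutive, non-overlapping chunks of n tokens.

    Assumption: a trailing chunk shorter than n is dropped, so a text
    shorter than n tokens yields 0.0.
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not chunks:
        return 0.0
    return sum(type_token_ratio(chunk) for chunk in chunks) / len(chunks)

# Invented toy text; n=6 only so that the example fits on a slide.
tokens = "the cat sat on the mat the dog sat on the log".split()
print(type_token_ratio(tokens))       # 7 types over 12 tokens
print(standardized_ttr(tokens, n=6))  # mean of the two 6-token chunks
```

Here each 6-token chunk has 5 types, so the standardized figure (5/6) is higher than the whole-text TTR (7/12), which shows how TTR falls as a text gets longer.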

  6. Vocabulary growth curve • Plotting types against tokens for a given text shows us how the vocabulary (the number of types) grows as the text gets longer • Typically, the curve starts steeply and then flattens sooner or later, reflecting the homogeneity (or otherwise) of the text • Figure: VGC for Macbeth in Basic English; source: http://web.missouri.edu/~youmansc/vmp/help/Youmans-TypeToken.pdf
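
The points of such a curve can be computed with a running set of types. A minimal sketch; the function name, the `step` parameter and the toy text are assumptions made for illustration.

```python
def vocabulary_growth(tokens, step=1):
    """Return (tokens_seen, types_seen) pairs sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Invented toy text.
tokens = "the cat sat on the mat the dog sat on the log".split()
curve = vocabulary_growth(tokens, step=3)
print(curve)  # [(3, 3), (6, 5), (9, 6), (12, 7)]
```

Even in this tiny text the gain per three tokens drops from 3 to 2 to 1, which is the flattening the slide describes. The pairs can be passed to any plotting tool.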

  7. Vocabulary growth curve • Comparative VGC for four texts: (a) Longfellow, (b) Hemingway, (c) Basic English (Macbeth), (d) Bible (Genesis 2) • Simple measure used in some literary studies

  8. Vocabulary in context • “Concordance”, also known as KWIC list (key word in context) • Allows us to see the (immediate) environment in which a word appears • Listings can be customised to show what you want more clearly, eg • sorted according to next or previous word • showing more or less context
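
A concordancer of this kind can be sketched in a few lines. The function names, the `width` and `sort_by` parameters and the sample sentence are invented for illustration; real concordancers offer many more options.

```python
def kwic(tokens, keyword, width=3):
    """Return (left, node, right) context triples for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits

def format_kwic(hits, sort_by="next"):
    """Align the keyword in a column, sorted by next or previous word."""
    if sort_by == "next":
        ordered = sorted(hits, key=lambda h: [w.lower() for w in h[2]])
    else:
        # Previous word first, then the word before it, and so on.
        ordered = sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])
    pad = max((len(" ".join(left)) for left, _, _ in ordered), default=0)
    return [
        " ".join(left).rjust(pad) + "  " + node + "  " + " ".join(right)
        for left, node, right in ordered
    ]

# Invented sample sentence.
tokens = "it was sheer luck that the sheer drop did not end in sheer panic".split()
for line in format_kwic(kwic(tokens, "sheer", width=2), sort_by="next"):
    print(line)
```

Changing `width` shows more or less context, and `sort_by="previous"` sorts on the word to the left of the keyword instead.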

  9. Source: Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins

  10. CIWK search • inverted KWIC • specify the context and look to see what words occur in it
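
One way to read "specify the context" is as a frame with a single open slot: a fixed word to the left and a fixed word to the right. That reading, the function name `ciwk` and the toy data are assumptions; the slide does not say how the context is specified.

```python
from collections import Counter

def ciwk(tokens, left, right):
    """Count the words that occur between the context words left and right."""
    fillers = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i - 1] == left and tokens[i + 1] == right:
            fillers[tokens[i]] += 1
    return fillers

# Invented toy data.
tokens = ("a waste of time and a lot of time but a waste of money "
          "and a lot of effort").split()
print(ciwk(tokens, "a", "of"))    # what fills the slot in "a ___ of"?
print(ciwk(tokens, "of", "and"))  # what fills the slot in "of ___ and"?
```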

  11. Collocation • Term coined by J R Firth (1957) to characterise (part of) his theory of meaning • “You shall know a word by the company it keeps” • “The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991) • “The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991; emphasis added)

  12. Collocation, text type and style • Distinguish between general and more usual collocations vs technical and more personal ones • eg in a general corpus time collocates with save, spend, waste, fritter away, … • but in a corpus of sports reports time collocates with half, full, extra, injury, first, second, third, …

  13. Collocation and idiom • Listing collocations will often reveal idioms and cliches • Important to think of collocation as extending beyond neighbouring words (which can be captured by simple concordances)

  14. Collecting collocations • If we are to look beyond neighbouring words, what constraints might we impose? • Collocation means co-occurrence within some defined context • possibly a “window” of n words to left and/or right • if corpus is tagged/parsed, we can look at collocations within structures • or we can define the window in terms of constituents rather than words
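
The word-window option can be sketched as below. The function name, the `left` and `right` parameters and the toy text are invented for illustration; windows defined over tagged or parsed structures would need an annotated corpus and are not shown.

```python
from collections import Counter

def window_collocates(tokens, node, left=2, right=2):
    """Count words found within left/right tokens of each hit of node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        context = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(context)
    return counts

# Invented toy text.
tokens = "we spend time and waste time but extra time is injury time".split()
narrow = window_collocates(tokens, "time", left=1, right=0)
wide = window_collocates(tokens, "time", left=2, right=2)
print(narrow)  # only the immediately preceding word
print(wide)    # two words either side
```

A wider window picks up more candidate collocates but also more noise, such as `and` and `but` here, which is one reason raw counts need further filtering.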

  15. Measuring significance • The significance of any co-occurrences needs to be established • Raw co-occurrence frequency counts mean nothing • Need to be compared to something else • Need to compare a given co-occurrence with random chance, or with some other co-occurrence • More detail next time
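
As a rough illustration of "comparing with random chance", the sketch below sets the observed count of an adjacent word pair against the count expected if the two words were independent. The independence baseline f(w1) × f(w2) / N is an assumption chosen for simplicity, not a measure taken from these slides, and the function name and toy text are invented.

```python
from collections import Counter

def observed_vs_expected(tokens, w1, w2):
    """Compare how often w1 is directly followed by w2 with a chance baseline.

    Assumption: chance is modelled as independence, so the expected count
    is f(w1) * f(w2) / N, where N is the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    expected = unigrams[w1] * unigrams[w2] / n
    return observed, expected

# Invented toy text; far too small for the figures to mean anything.
tokens = ("sheer luck and sheer luck and the luck of the draw "
          "and sheer force").split()
obs, exp = observed_vs_expected(tokens, "sheer", "luck")
print(obs, round(exp, 3))
```

An observed count well above the expected one suggests the pair is worth a closer look; whether the difference is significant is a separate statistical question.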

  16. Collocation and synonymy • Collocation is good evidence in discussing (near) synonymy • Lots of studies take near synonyms and look to see if the nature of their relationship can be characterised by their distribution • In other words: what words does each of the synonym set collocate with? • Especially useful for language learners
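
The question of what each near synonym collocates with can be sketched by collecting collocates for both words and splitting them into shared and word-specific sets. The function names are invented, only the word immediately to the right is considered, and the toy data is made up, not drawn from Partington's corpus.

```python
from collections import Counter

def right_collocates(tokens, node):
    """Count the word immediately following each occurrence of node."""
    return Counter(nxt for tok, nxt in zip(tokens, tokens[1:]) if tok == node)

def compare_collocates(tokens, word_a, word_b):
    """Split collocates into shared ones and ones specific to each word."""
    a = set(right_collocates(tokens, word_a))
    b = set(right_collocates(tokens, word_b))
    return {"shared": a & b, word_a: a - b, word_b: b - a}

# Invented toy data, not taken from any real corpus.
tokens = "sheer weight sheer force sheer joy pure gold pure chance pure joy".split()
result = compare_collocates(tokens, "sheer", "pure")
print(result)
```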

  17. Example of sheer and synonyms • (from Partington book) • three senses (LDOCE) • pure, ‘nothing but’, eg sheer luck • steep, sheer drop • thin, sheer stockings • (Cobuild) use sheer to emphasize completeness of state • 92 occurrences of sheer (in meaning 1) in his corpus

  18. collocations of sheer • expression of magnitude of weight or volume to right (20%) • volume, weight, numbers, mass, scale, quantity, size • almost always with article the • expression of force, strength or energy (22%) • energy, exertion, force, muscle, strength, power, pressure, fury, pace, intensity • usually with the, or a preposition but no article • expression of persistence (14%) • persistence, irreversibility, obstinacy, indomitability, insistence, reliability, integrity, hard work • left context: through, because of, out of, expressing causation, but not the

  19. collocations of sheer • nouns expressing strong emotion (11%) • fun, joy, panic, inspiration, enjoyment, terror • nouns expressing extreme personal qualities (11%) • beauty, glamour, brutality, thuggery, madness, folly • nouns expressing extreme ability or lack of same (8%) • expertise, competence, virtuosity, gamesmanship

  20. Synonyms of sheer - pure • LDOCE definitions, 5 meanings of which two overlap: • not mixed with anything • complete, thorough • Corpus has 135 examples • Larger variety of syntactic environments (sheer was always modifying a noun) including predicative, which sheer does not occur in • *? The drop was sheer • * His fury was sheer

  21. Synonyms of sheer - pure • Religious-moral context; sense of unmixed • doctrine, faith, goodness; chemicals, gold • But, many examples where it has an emphasizing function, like sheer • accident, chance, comedy, guesswork, honesty, idiocy, malice, nostalgia, pleasure, selfishness, talent, theatre, vulnerability, whim, wickedness • often with proper nouns (unlike sheer) • No examples of pure collocating with items expressing magnitude, force or persistence • Some overlap with sheer • personal qualities, emotion (though generally less extreme ones) • Only few examples of pure in prepositional phrase expressing causation; causes can be sheer, but states are pure

  22. Other synonyms of sheer • Partington does similar analysis of complete and absolute • Shows that each of the “synonyms” has more typical uses and patterns, though there is some overlap • But there is also clear evidence of complementary usage

  23. Connotation and semantic prosody • Collocation can also be used to illustrate connotation • “secondary implications of a word” (Lyons 1977) • Three distinct uses of the term • marker of a particular speech variety (eg lovely) • cultural implications (words used to describe women show what society thinks of them) • marker of speaker's evaluation (firm ~ stubborn) • “Semantic prosody” (Sinclair 1987) • use of a certain word spreads its connotation over the whole utterance

  24. Some examples • object of commit is often something bad (foul, deception, offence) • if something is described as rife, it is not good (crime, disease, mistakes), and describing it as rife expresses a negative connotation (speculation is rife) • both the above exemplify “unfavourable prosody”, but other prosodies are possible • good example: claim vs admit responsibility for an atrocity

  25. More power to your elbow • Examples given in last few slides were largely subjective • More interesting if we can back up observations with calculations of statistical significance • Next time we will look at some simple statistical measures
