Concordances, collocations and connotation • Barnbrook G (1996) Language and Computers. Edinburgh: EUP. Chapters 3, 4, 5 • Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins. Chapters 1, 2, 4
Lexical information in corpora • Start looking at the kind of information (about individual words) that can be got from corpora • Simple frequency information • Distribution information • Collocation (co-occurrence information) • Connotation (semantic prosody) • Introduce basic ideas • Future topics • Statistics • Case studies
Frequency information • Most banal information: counting how many times a word (“type”) appears in a text • Most frequent words will be function words, so frequency (f) counts often exclude words listed in a “stop list” • Should you count words or lemmas? • Should you distinguish alternate meanings of ambiguous word forms (if you can)?
Frequency information • Frequency information on its own is not particularly interesting • Quite useful to compare f of related words • eg alternative readings of a given word form (already seen in probability calculations in tagging) • or comparing near synonyms, especially if we can take context into account (see later) • f of a given word in a given context can be indicative, eg pronouns more frequent as subject or 1st word of sentence
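A minimal sketch of a frequency count with an optional stop list, as described on the two slides above. The tokenizer is deliberately crude, it counts word forms rather than lemmas, and the stop list is a tiny hypothetical one for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration; real stop lists are much longer.
STOP_WORDS = frozenset({"the", "a", "of", "and", "to", "in", "on"})

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=frozenset()):
    """Count occurrences of each word form, skipping anything in stop_words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)

sample = "The cat sat on the mat and the dog sat on the cat"
freqs = word_frequencies(sample, STOP_WORDS)
for word, f in freqs.most_common():
    print(word, f)
```

Counting lemmas instead of word forms would need a lemmatiser in place of `tokenize`; that choice is left open here, as it is on the slide.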
Types and tokens • Remember distinction between “tokens” (words) and “types” (different words) • Type count gives a measure of how many DIFFERENT words are used • Type-token ratio gives a measure of “vocabulary richness” • If vocabulary is very varied, TTR will be higher • TTR is very sensitive to overall text length, so it is not meaningful to compare TTRs for texts of different lengths • Standardized TTR is the average of the TTR for each sequence of n words (typical default n=1000) in a text or corpus
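The two measures on this slide can be sketched as follows. This is one possible reading of "standardized TTR": consecutive non-overlapping chunks of n tokens, with a trailing partial chunk dropped. Falling back to the plain TTR for texts shorter than n is an assumption of this sketch, not something the slide specifies.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, n=1000):
    """Average the TTR over consecutive, non-overlapping chunks of n tokens.

    A trailing chunk shorter than n is ignored. If the text has fewer than
    n tokens, fall back to the plain TTR (an assumption of this sketch).
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)

tokens = "a b c d a a a a".split()
print(ttr(tokens))                    # 4 types / 8 tokens
print(standardized_ttr(tokens, n=4))  # mean of 4/4 and 1/4
```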
Vocabulary growth curve • Plotting types against tokens for a given text shows us how the vocabulary (number of types) grows as the text gets longer • Typically, the curve starts steeply and then flattens sooner or later, reflecting homogeneity (or otherwise) of the text • Figure: VGC for Macbeth in Basic English • source: http://web.missouri.edu/~youmansc/vmp/help/Youmans-TypeToken.pdf
Vocabulary growth curve • Comparative VGC for four texts • Simple measure used in some literary studies • Figure: curves for (a) Longfellow (b) Hemingway (c) Basic English (Macbeth) (d) Bible (Genesis 2)
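The points of a vocabulary growth curve can be computed in a single pass over the tokens. A minimal sketch: the `step` parameter (how often a point is recorded) is an assumption added for convenience, and plotting is left to whatever tool is at hand.

```python
def vocabulary_growth(tokens, step=1):
    """Return (tokens_so_far, types_so_far) pairs, one every `step` tokens."""
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

tokens = "the cat sat on the mat the cat".split()
for n_tokens, n_types in vocabulary_growth(tokens):
    print(n_tokens, n_types)
```

A repetitive text adds few new types per token, so its curve flattens early; a lexically varied text keeps climbing for longer.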
Vocabulary in context • “Concordance”, also known as KWIC list (key word in context) • Allows us to see the (immediate) environment in which a word appears • Listings can be customised to show what you want more clearly, eg • sorted according to next or previous word • showing more or less context
Source: Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins
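A KWIC listing of the kind described above can be sketched in a few lines. The function and parameter names (`kwic`, `width`, `sort_by`) are invented for this illustration, and context is measured in words rather than characters.

```python
def kwic(tokens, keyword, width=4, sort_by=None):
    """Return (left_context, keyword, right_context) for each hit.

    width   : number of context words shown on each side
    sort_by : None (text order), "next" or "previous" (sort on the word
              immediately after / before the keyword)
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, token, right))
    if sort_by == "next":
        lines.sort(key=lambda line: line[2][0].lower() if line[2] else "")
    elif sort_by == "previous":
        lines.sort(key=lambda line: line[0][-1].lower() if line[0] else "")
    return lines

tokens = "by sheer luck he won through sheer force of will".split()
for left, kw, right in kwic(tokens, "sheer", width=2, sort_by="next"):
    # Right-align the left context so the keywords line up in a column.
    print(f"{' '.join(left):>20}  {kw}  {' '.join(right)}")
```

Sorting on the next or previous word groups similar environments together, which is what makes recurring patterns easy to spot by eye.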
CIWK search • inverted KWIC • specify the context and look to see what words occur in it
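A minimal sketch of the inverted search: fix the context and count what fills the slot. Here the context is simplified to at most one word on each side, which is an assumption of this sketch; the slide does not say how the context is specified.

```python
from collections import Counter

def ciwk(tokens, left=None, right=None):
    """Count the words that occur between `left` and `right`.

    Either side may be None, meaning that side is unconstrained.
    """
    fillers = Counter()
    for i, token in enumerate(tokens):
        left_ok = left is None or (i > 0 and tokens[i - 1] == left)
        right_ok = right is None or (i + 1 < len(tokens) and tokens[i + 1] == right)
        if left_ok and right_ok:
            fillers[token] += 1
    return fillers

tokens = ("the sheer size of it and the pure joy of it "
          "and the sheer weight of it").split()
print(ciwk(tokens, left="the"))                # what follows "the"?
print(ciwk(tokens, left="sheer", right="of"))  # what fills "sheer _ of"?
```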
Collocation • Term coined by J R Firth (1957) to characterise (part of) his theory of meaning • “You shall know a word by the company it keeps” • “The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991) • “The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991; emphasis added)
Collocation, text type and style • Distinguish between general and more usual collocations vs technical and more personal ones • eg in a general corpus time collocates with save, spend, waste, fritter away, … • but in a corpus of sports reports time collocates with half, full, extra, injury, first, second, third, …
Collocation and idiom • Listing collocations will often reveal idioms and cliches • Important to think of collocation as extending beyond neighbouring words (which can be captured by simple concordances)
Collecting collocations • If we are to look beyond neighbouring words, what constraints might we impose? • Collocation means co-occurrence within some defined context • possibly a “window” of n words to left and/or right • if corpus is tagged/parsed, we can look at collocations within structures • or we can define the window in terms of constituents rather than words
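The simplest of the options above, a window of n words to the left and/or right of a node word, can be sketched as follows. The names `collocates`, `left` and `right` are illustrative, and the window here ignores sentence and constituent boundaries.

```python
from collections import Counter

def collocates(tokens, node, left=4, right=4):
    """Count the words within `left` words before and `right` words after
    each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts

tokens = "we waste time and they spend time and save time".split()
print(collocates(tokens, "time", left=1, right=0))  # immediate left neighbours
print(collocates(tokens, "time", left=1, right=1))  # one word either side
```

With a tagged or parsed corpus the same idea applies, but the window would be defined over structures or constituents rather than raw word positions.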
Measuring significance • The significance of any co-occurrences needs to be established • Raw co-occurrence frequency counts mean nothing • Need to be compared to something else • Need to compare a given co-occurrence with random chance, or with some other co-occurrence • More detail next time
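As a rough illustration of "comparing with random chance": one common approximation estimates how often a collocate would fall inside the window around the node if words were scattered independently, and sets the observed count against that. The formula is an assumption of this sketch rather than something taken from the slides, and the counts in the example are invented purely to show the arithmetic.

```python
def expected_cooccurrence(f_node, f_collocate, corpus_size, span):
    """Approximate chance co-occurrence count.

    f_node      : frequency of the node word in the corpus
    f_collocate : frequency of the collocate in the corpus
    corpus_size : total number of tokens in the corpus
    span        : total window size in words (left + right)
    Assumes words are distributed independently of one another.
    """
    return f_node * f_collocate * span / corpus_size

def observed_expected_ratio(observed, f_node, f_collocate, corpus_size, span):
    """How many times more often the pair co-occurs than chance predicts."""
    return observed / expected_cooccurrence(f_node, f_collocate, corpus_size, span)

# Invented counts for illustration only, not taken from any real corpus.
expected = expected_cooccurrence(f_node=100, f_collocate=50,
                                 corpus_size=1_000_000, span=8)
ratio = observed_expected_ratio(10, 100, 50, 1_000_000, 8)
print(expected, ratio)
```

A ratio well above 1 suggests the pair co-occurs more than chance would predict, but a ratio on its own is not a significance test.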
Collocation and synonymy • Collocation is good evidence in discussing (near) synonymy • Lots of studies take near synonyms and look to see if the nature of their relationship can be characterised by their distribution • In other words: what words does each of the synonym set collocate with? • Especially useful for language learners
Example of sheer and synonyms • (from Partington book) • three senses (LDOCE) • pure, ‘nothing but’, eg sheer luck • steep, sheer drop • thin, sheer stockings • (Cobuild) use sheer to emphasize completeness of state • 92 occurrences of sheer (in meaning 1) in his corpus
collocations of sheer • expression of magnitude of weight or volume to right (20%) • volume, weight, numbers, mass, scale, quantity, size • almost always with article the • expression of force, strength or energy (22%) • energy, exertion, force, muscle, strength, power, pressure, fury, pace, intensity • usually with the, or a preposition but no article • expression of persistence (14%) • persistence, irreversibility, obstinacy, indomitability, insistence, reliability, integrity, hard work • left context: through, because of, out of, expressing causation, but not the
collocations of sheer • nouns expressing strong emotion (11%) • fun, joy, panic, inspiration, enjoyment, terror • nouns expressing extreme personal qualities (11%) • beauty, glamour, brutality, thuggery, madness, folly • nouns expressing extreme ability or lack of same (8%) • expertise, competence, virtuosity, gamesmanship
Synonyms of sheer - pure • LDOCE definitions, 5 meanings of which two overlap: • not mixed with anything • complete, thorough • Corpus has 135 examples • Larger variety of syntactic environments (sheer was always modifying a noun) including predicative, which sheer does not occur in • *? The drop was sheer • * His fury was sheer
Synonyms of sheer - pure • Religious-moral context; sense of unmixed • doctrine, faith, goodness; chemicals, gold • But, many examples where it has an emphasizing function, like sheer • accident, chance, comedy, guesswork, honesty, idiocy, malice, nostalgia, pleasure, selfishness, talent, theatre, vulnerability, whim, wickedness • often with proper nouns (unlike sheer) • No examples of pure collocating with items expressing magnitude, force or persistence • Some overlap with sheer • personal qualities, emotion (though generally less extreme ones) • Only few examples of pure in prepositional phrase expressing causation; causes can be sheer, but states are pure
Other synonyms of sheer • Partington does similar analysis of complete and absolute • Shows that each of the “synonyms” has more typical uses and patterns, though there is some overlap • But there is also clear evidence of complementary usage
Connotation and semantic prosody • Collocation can also be used to illustrate connotation • “secondary implications of a word” (Lyons 1977) • Three distinct uses of the term • marker of a particular speech variety (eg lovely) • cultural implications (words used to describe women show what society thinks of them) • marker of speaker's evaluation (firm ~ stubborn) • “Semantic prosody” (Sinclair 1987) • use of a certain word spreads its connotation over the whole utterance
Some examples • object of commit is often something bad (foul, deception, offence) • if something is described as rife, it is not good (crime, disease, mistakes), and describing it as rife expresses a negative connotation (speculation is rife) • both the above exemplify “unfavourable prosody”, but other prosodies are possible • good example: claim vs admit responsibility for an atrocity
More power to your elbow • Examples given in last few slides were largely subjective • More interesting if we can back up observations with calculations of statistical significance • Next time we will look at some simple statistical measures