How People use words to make meanings__How to compute the meaning of natural language utterances Patrick Hanks Professor in Lexicography University of Wolverhampton ***
Goals of the tutorial • To explore the relationship between meaning and phraseology. • To explore the relationship between conventional uses of words and creative uses such as freshly coined metaphors. • To discover factors that contribute to the dynamic power of natural language, including anomalous arguments, ellipsis, and other “exploitations” of normal usage.
Procedure • We shall focus on verbs. • We shall not invent examples. • Instead, we shall analyse data. • When tempted to invent an example, we try to remember to check for a comparable one in a corpus. • We shall look at large numbers of actual uses of a verb, using concordances to a very large corpus. • We shall ask questions such as: • What patterns of normal use of this verb can we detect? • What is the nature of a “pattern”? • Does each pattern have a different meaning? • What is the nature of lexical ambiguity, and why has it been so troublesome for NLP?
Patterns in Corpora • When you first open a concordance, very often some patterns of use leap out at you. • Collocations make patterns: one word goes with another • To see how words make meanings, we need to analyse collocations • The more you look, the more patterns you see. • BUT THEN • When you try to formalize the patterns, you start to see more and more exceptions. • The boundaries of ‘correct usage’ are fuzzy and there are many outlying cases.
Analysis of Meaning in Language • Analysis based on predicate logic is doomed to failure: • Words are NOT building blocks in a ‘Lego set’ • A word does NOT denote ‘all and only’ members of a set • Word meaning is NOT determined by necessary and sufficient conditions for set membership • But precise meanings for natural language terms can be stipulated (using the loose, vague terminology of a natural language) • For analysing the lexicon of a natural language, a prototype-based approach is necessary: • mapping prototypical interpretations (‘implicatures’) onto prototypical phraseology • computing similarity to a prototype
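The "computing similarity to a prototype" step can be sketched very simply. This is a minimal illustration, not part of CPA itself: it scores an observed set of collocates against a prototypical lexical set by Jaccard overlap, and the word lists are invented for the example rather than drawn from any corpus.

```python
# Hypothetical sketch: instead of testing strict ("all and only") set
# membership, score a candidate collocate set against a prototype.

def similarity_to_prototype(observed, prototype):
    """Jaccard similarity between two sets of collocates (0.0-1.0)."""
    observed, prototype = set(observed), set(prototype)
    if not observed and not prototype:
        return 0.0
    return len(observed & prototype) / len(observed | prototype)

# An invented prototypical [[Event]] set for 'attend':
event_prototype = {"meeting", "wedding", "funeral", "conference", "lecture"}

print(similarity_to_prototype({"meeting", "wedding"}, event_prototype))  # → 0.4
```

A graded score like this fits the prototype view: a use is judged more or less close to the norm, rather than simply in or out of a fixed set.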
The linguistic ‘double-helix’ hypothesis • A language is a system of rule-governed behaviour. • Not one, but TWO (interlinked) sets of rules: • Rules governing the normal uses of words to make meanings • Rules governing the exploitation of norms
Lexicon and prototypes • Each word is typically used in one or more patterns of usage (valency + collocations) • Each pattern is associated with a meaning: • a meaning is a set of prototypical beliefs • In CPA, meanings are expressed as ‘anchored implicatures’. • Few patterns are associated with more than one meaning. • Corpus data enables us to discover the patterns that are associated with each word.
What is a pattern? (1) • The verb is the pivot of the clause. • A pattern is a statement of the clause structure (valency) associated with a meaning of a verb • together with the typical semantic values of each argument. • arguments of verbs are populated by lexical sets of collocates • Different semantic values of arguments activate different meanings of the verb.
What is a pattern? (2) • [[Human]] fire [[Firearm]] • [[Human]] fire [[Projectile]] • [[Human 1]] fire [[Human 2]] • [[Anything]] fire [[Human]] {with enthusiasm} • [[Human]] fire [NO OBJ] • Etc.
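The idea that different semantic values of arguments activate different meanings can be sketched as a lookup: map each argument to its semantic type, then match the (subject type, object type) pair against the patterns for 'fire'. The type lexicon and the glosses below are invented for illustration; in CPA both would come from corpus analysis and the pattern dictionary.

```python
# Minimal sketch of pattern matching for 'fire'. All word-to-type
# assignments and glosses here are hypothetical examples.

TYPE_LEXICON = {
    "soldier": "Human", "police": "Human", "boss": "Human",
    "gun": "Firearm", "rifle": "Firearm",
    "bullet": "Projectile", "missile": "Projectile",
    "employee": "Human", "manager": "Human",
}

# (subject type, object type) -> meaning gloss
FIRE_PATTERNS = {
    ("Human", "Firearm"): "discharge a weapon",
    ("Human", "Projectile"): "shoot a projectile",
    ("Human", "Human"): "dismiss from employment",
}

def meaning_of_fire(subject, obj):
    key = (TYPE_LEXICON.get(subject), TYPE_LEXICON.get(obj))
    return FIRE_PATTERNS.get(key, "no pattern matched")

print(meaning_of_fire("soldier", "rifle"))   # → discharge a weapon
print(meaning_of_fire("boss", "employee"))   # → dismiss from employment
```

Even this toy version shows why the approach disambiguates: the verb's meaning falls out of the semantic types filling its argument slots, not from the verb in isolation.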
Semantic Types and Ontology • Items in double square brackets are semantic types. • Semantic types are being gathered together into a shallow ontology. • (This is work in progress in the current CPA project) • Each type in the ontology will (eventually) be populated with a set of lexical items on the basis of what’s in the corpus under each relevant pattern.
Shimmering lexical sets • Lexical sets are not stable – not ‘all and only’. • Example: • [[Human]] attend [[Event]] • [[Event]] = meeting, wedding, funeral, etc. • But not thunderstorm, suicide.
Meanings and boundaries • Boundaries of all linguistic and lexical categories are fuzzy. • There are many borderline cases. • Instead of fussing about boundaries, we should focus on identifying prototypes. • Then we can decide what goes with what. • Many decisions will be obvious. • Some decisions – especially about boundary cases – will be arbitrary.
Computing meaning • Each user of a language has a “corpus” of uses stored inside his or her head • These are traces of utterances that the person has seen, heard, or uttered • Each person’s mental corpus of English (etc.) is different • What all these “mental corpora” have in common is patterns • By analysing a huge corpus of texts computationally, we can create a pattern dictionary that represents patterns shared by many (most?) users of the language • for use by computers as well as by people. • In a pattern dictionary, each pattern is associated with a meaning (or a translation, or some other interpretation or fact)
Now comes the long, slow, hard bit • We haven’t built the inventory (the pattern dictionary) yet. It is work in progress. • We have to compile an inventory of normal patterns: • a Pattern Dictionary of English Verbs (PDEV)
What are the components of a normal context? – verbs The apparatus for corpus pattern analysis of verbs: • Valencies (NOT “NP VP” BUT “SPOCA”). • Semantic values for the lexical sets in each valency slot: [[Event]], [[Phys Obj]], [[Human]], [[Institution]], [[Location]], etc. • Lexical sets can be populated by cluster analysis of corpora. • Subvalency items (quantifiers, determiners, etc.): • ‘Something took place’ vs. • ‘Something took its place’.
Clause Roles • S – Subject • P – Predicator (the verb + auxiliaries, etc.) • O – Object (one, two, or none) • C – Complement (co-referential with S or O) • I am happy | a lexicographer: C(S) • Being here makes me happy | a tourist: C(O) • A – Adverbial • I came to Leiden | on Wednesday • They treated me well | respectfully |with respect
Implicatures: taking stereotypes seriously When a pilot files {a flight plan}, he or she informs ground control of the intended route and obtains permission to begin flying. …If someone files {a lawsuit}, they activate a procedure asking a court for justice. When a group of people file {into a room or other place}, they walk in one behind the other. (There are 14 such patterns for file, verb.)
The CPA method • Create a sample concordance for each word • 300-500 examples • from a ‘balanced’ corpus (i.e. general language) [We use the British National Corpus, 100 million words, and the Associated Press Newswire for 1991-3, 150 million words] • Classify every line in the sample, on the basis of its context. • Take further samples if necessary to establish that a particular phraseology is conventional • Check results against corpus-based dictionaries. • Use introspection to interpret data, but not to create data.
In CPA, every line in the sample must be classified The classes are: • Norms (normal uses in normal contexts) • Exploitations (e.g. ad-hoc metaphors) • Alternations • e.g. [[Doctor]] treat [[Patient]] <> [[Medicine]] treat [[Patient]] • Names (Midnight Storm: name of a horse, not a storm) • Mentions (to mention a word or phrase is not to use it) • Errors • Unassignables
Sample from a concordance (unsorted)
incessant noise and bustle had abated. It seemed everyone was up
after dawn the storm suddenly abated. Ruth was there waiting when
Thankfully, the storm had abated, at least for the moment, and
storm outside was beginning to abate, but the sky was still ominous
Fortunately, much of the fuss has abated, but not before hundreds of
, after the shock had begun to abate, the vision of Benedict's
been arrested and street violence abated, the ruling party stopped
he declared the recession to be abating, only hours before the
‘soft landing’ in which inflation abates but growth continues moderate
the threshold. The fearful noise abated in its intensity, trailed
ability. However, when the threat abated in 1989 with a ceasefire in
bag to the ocean. The storm was abating rapidly, the evening sky
ferocity of sectarian politics abated somewhat between 1931 and
storm. By dawn the weather had abated though the sea was still angry
the dispute showed no sign of abating yesterday. Crews in
Sorted (1): [[Event = Storm]] abate [NO OBJ]
dry kit and go again. The storm abates a bit, and there is no problem in
ling. Thankfully, the storm had abated, at least for the moment, and the sting
his time until the storm abated but also endangering his life, Ge
storm outside was beginning to abate, but the sky was still ominously o
bag to the ocean. The storm was abating rapidly, the evening sky clearin
after dawn the storm suddenly abated. Ruth was there waiting when the h
t he wait until the rain storm abated. She had her way and Corbett went
storm. By dawn the weather had abated though the sea was still angry, i
lcolm White, and the gales had abated: Yachting World had performed the
he rain, which gave no sign of abating, knowing her options were limite
n became a downpour that never abated all day. My only protection was
ned away, the roar of the wind abating as he drew the hatch closed behi
Sorted (2): [[Event = Problem]] abate [NO OBJ]
‘soft landing’ in which inflation abates but growth continues modera
Fortunately, much of the fuss has abated, but not before hundreds of
the threshold. The fearful noise abated in its intensity, trailed
incessant noise and bustle had abated. It seemed everyone was up
ability. However, when the threat abated in 1989 with a ceasefire in
the Intifada shows little sign of abating. It is a cliche to say that h
he declared the recession to be abating, only hours before the pub
he ferocity of sectarian politics abated somewhat between 1931 and 1
been arrested and street violence abated, the ruling party stopped b
the dispute showed no sign of abating yesterday. Crews in
Part of the lexical set [[Event = Problem]] as subject of ‘abate’ From BNC: {fuss, problem, tensions, fighting, price war, hysterical media clap-trap, disruption, slump, inflation, recession, the Mozart frenzy, working-class militancy, hostility, intimidation, ferocity of sectarian politics, diplomatic isolation, dispute, …} From AP: {threat, crisis, fighting, hijackings, protests, tensions, anti-Japan fervor, violence, bloodshed, problem, crime, guerrilla attacks, turmoil, shelling, shooting, artillery duels, fire-code violations, unrest, inflationary pressures, layoffs, bloodletting, revolution, murder of foreigners, public furor, eruptions, bad publicity, outbreak, jeering, criticism, infighting, risk, crisis, …} (All these are kinds of problem.)
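The sorting step shown in the two slides above can be sketched as a small classifier: assign each subject noun of 'abate' to a lexical set. The sets below are a small subset of the BNC/AP lists on this slide; the function itself is an illustration, not the CPA tooling.

```python
# Sketch of sorting concordance lines for 'abate' by the lexical set
# of the subject. Set contents are a subset of the lists above.

STORM_SET = {"storm", "rain", "gale", "wind", "weather", "downpour"}
PROBLEM_SET = {"inflation", "fuss", "noise", "threat", "recession",
               "violence", "dispute", "fighting", "tensions"}

def classify_subject(noun):
    if noun in STORM_SET:
        return "[[Event = Storm]]"
    if noun in PROBLEM_SET:
        return "[[Event = Problem]]"
    return "unclassified"

for subj in ["storm", "inflation", "suicide"]:
    print(subj, "->", classify_subject(subj))
```

Note that "unclassified" is exactly where the shimmering-set problem bites: thunderstorms abate but do not get attended, so each verb's slot needs its own, corpus-derived population rather than a fixed ontology lookup.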
Part of the lexical set [[Emotion = Negative]] as subject of ‘abate’ From BNC: {anxiety, fear, emotion, rage, anger, fury, pain, agony, feelings, …} From AP: {rage, anger, panic, animosity, concern, …}
A domain-specific norm: [[Person | Action]] abate [[Nuisance]] (DOMAIN: Law. REGISTER: Jargon)
o undertake further measures to abate the odour, and in Attorney Ge
us methods were contemplated to abate the odour from a maggot farm
s specified are insufficient to abate the odour then in any further
as the inspector is striving to abate the odour, no action will be
t practicable means be taken to abate any existing odour nuisance,
ll equipment to prevent, and or abate odour pollution would probabl
rmation alleging the failure to abate a statutory nuisance without
t I would urge you at least to abate the nuisance of bugles forthw
way that the nuisance could be abated, but the decision is the dec
otherwise the nuisance is to be abated. They have full jurisdiction
ion, or the local authority may abate the nuisance and do whatever
A more complicated verb: ‘take’ • 61 phrasal verb patterns, e.g. [[Person]] take [[Garment]] off [[Plane]] take off [[Human Group]] take [[Business]] over • 105 light verb uses (with specific objects), e.g. [[Event]] take place [[Person]] take {photograph | photo | snaps | picture} [[Person]] take {the plunge} • 18 ‘heavy verb’ uses, e.g. [[Person]] take [[PhysObj]] [Adv[Direction]] • 13 adverbial patterns, e.g. [[Person]] take [[TopType]] seriously [[Human Group]] take [[Child]] {into care} • TOTAL: 204 patterns, and growing (but slowly)
Finer distinctions: ‘take + place’ Presence or absence of a determiner can determine the meaning of the verb. • [[Event]] take {place}: A meeting took place. • [[Person 1]] take {[[Person 2]]’s place}: • George took Bill’s place. • [[Person]] take {[REFLDET] place}: Wilkinson took his place among the greats of the game. • [[Person=Competitor]] take {[ORDINAL] place}: The Germans took first place.
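The determiner distinctions above can be sketched as a decision over the words between 'take' and 'place'. This is a deliberately naive, hypothetical illustration: real clauses would need parsing, and the determiner lists here are invented for the example.

```python
# Sketch of choosing a 'take ... place' pattern from the determiner.
# Token handling is naive; a real system would parse the clause.

REFLEXIVE_DETS = {"his", "her", "its", "their", "my", "your", "our"}
ORDINALS = {"first", "second", "third", "last"}

def take_place_pattern(tokens):
    """tokens: the words between 'take' and 'place', e.g. [] or ['his']."""
    if not tokens:
        return "[[Event]] take {place}"
    det = tokens[-1]
    if det in REFLEXIVE_DETS:
        return "[[Person]] take {[REFLDET] place}"
    if det in ORDINALS:
        return "[[Person=Competitor]] take {[ORDINAL] place}"
    if det.endswith("'s"):
        return "[[Person 1]] take {[[Person 2]]'s place}"
    return "unmatched"

print(take_place_pattern([]))          # → [[Event]] take {place}
print(take_place_pattern(["first"]))   # → [[Person=Competitor]] take {[ORDINAL] place}
```

The point the sketch makes concrete: a subvalency item as small as a determiner is enough to switch the verb between four distinct meanings.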
Semantic type vs. contextual role • Mr Woods sentenced Bailey to seven years | life imprisonment PATTERN: [[Human 1]] sentence [[Human 2]] {to [[Time Period | Punishment]]} • Semantic type: [[Human]] • Contextual roles: [[Human 1 = Judge]], [[Human 2 = Convicted Criminal]], seven years [[Time Period = Punishment in jail]] • Semantic type is an intrinsic semantic property of a lexical item. • Contextual role is extrinsic; the meaning is imposed (activated, selected) by the context in which the word is used.
Exploitations • People don’t just say the same thing, using the same words repeatedly. • They also exploit normal usage in order to say new things, or in order to say old things in new and interesting ways. • Exploitations include: 1) anomalous arguments, 2) metaphor and other figures of speech, 3) ellipsis, 4) word creation. • Exploitations are a form of linguistic creativity.
Exploitation type 1: Anomalous arguments • A rally driver urging his car through the forest • Normally, you urge someone (e.g. a politician) to do something OR you urge a horse (not a car) in a certain direction
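One way to make the anomalous-argument idea operational is to flag any object that falls outside the pattern's expected lexical set. A minimal sketch, with an invented expected set for 'urge [[Animal]] in a direction':

```python
# Hypothetical sketch: an object outside the expected lexical set for
# 'urge [[Animal]] [Direction]' is flagged as an exploitation.

URGE_DIRECTION_OBJECTS = {"horse", "pony", "mule", "donkey"}

def classify_urge_object(obj):
    if obj in URGE_DIRECTION_OBJECTS:
        return "norm"
    return "exploitation (anomalous argument)"

print(classify_urge_object("horse"))  # → norm
print(classify_urge_object("car"))    # → exploitation (anomalous argument)
```

The rally-driver example is interpretable precisely because the hearer recognises both the norm and the deviation: the car is being treated as if it were a horse.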
Exploitation type 2: Metaphor • The uninspiring, fragment-bearing wind that blows through much of Eliot's poetry
Norms and exploitations • We need a theory (and a procedure) that distinguishes the normal, conventional, idiomatic phraseology of each word from exploitations of those phraseological norms.
The biggest challenges currently facing CPA • Finding quick, efficient, reliable, automatic or semi-automatic ways of populating the lexical sets. • Distinguishing patterns from “the general mush of goings-on”. • Predicting patterns by computational analysis of data, for verbs not yet analysed. • Dealing with freshly created metaphors and other exploitations.
How is CPA different from FrameNet? CPA: • finds syntagmatic criteria (patterns) for distinguishing different meanings of polysemous words, in a “semantically shallow” way; • proceeds word by word; • When a verb has been analysed, the patterns are ready for use. FrameNet: • proceeds frame by frame, not word by word; • analyses situations in terms of frame elements; • does not explicitly study meaning differences of polysemous words; • does not analyse corpus data systematically, but goes fishing in corpora for examples in support of hypotheses; • has no established inventory of frames; • has no criteria for completeness of a lexical entry.
Goals of CPA • To create an inventory of semantically motivated syntagmatic patterns, so as to reduce the ‘lexical entropy’ of each word. • A benchmark for assigning meaning to patterns in previously unseen, untagged text • To develop procedures for populating lexical sets by computational cluster analysis of text corpora. • To collect evidence for the principles that govern the exploitations of norms.