Elliptical Arguments Patrick Hanks Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic ***
Outline of the talk • A task for e-lexicographers: identifying syntagmatic patterns (or constructions) in corpora, and establishing what they mean. • Patterns can include quite a lot of variation. • We also need a lexically driven theory of language, accounting for the rules that govern the normal and abnormal uses of words. • Abnormal uses cause problems for lexical analysts. • This presentation will discuss two such problems. • Conclude with an on-line demo.
Corpus Pattern Analysis (CPA) • The lexicographical task is to establish how words are used, not just what they mean • Such an investigation must be based on corpus analysis, not guesswork and imagination • Invented examples have a tendency to distort. • BUT authenticity alone is not enough • Bizarre authentic examples also distort, e.g.: • “I hazarded various Stuartesque destinations like Florida, Bali, Crete and Western Turkey.” – J. Barnes • “Always vacuum your moose from the snout up.” – Massachusetts Journal of Taxidermy, 1986
The need for patterns • We need to establish, through painstaking corpus analysis, the patterns of usage that are associated with each word. • And we need a reliable theoretical base: • Some mixture of components such as • Herbst et al. 2004. Valency Dictionary of English. • Fillmore et al.: FrameNet • Miller, Fellbaum: WordNet • Pustejovsky. 1995: The Generative Lexicon. • Different patterns of usage around a lexeme activate different meanings. • We need to distinguish patterns from abnormal, innovative linguistic behaviour.
Empirical recognition of patterns • When you first open a concordance, patterns leap out at you. • Collocations make patterns: one word goes with another. • To see how words make meanings, we need to analyse collocations. • The more you look, the more patterns you see. BUT • When you try to formalize the patterns, you start to see more and more exceptions. • The boundaries are fuzzy and there are many outlying cases. • Speakers and writers exploit the norms of language.
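As a rough illustration of "one word goes with another", the sketch below counts the words that occur near a node word in a handful of concordance lines. The three lines are the *fired* examples quoted later in this talk; the window size, the stopword list, and the function name `collocates` are assumptions made for the example, not part of CPA.

```python
from collections import Counter

# Concordance lines taken from the "Ellipsis" slide of this talk.
CONCORDANCE = [
    "the police fired into the crowd",
    "the police fired rubber bullets",
    "he gave the order and they fired",
]

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"the", "into", "and", "they", "he"}


def collocates(lines, node, window=3):
    """Count non-stopword tokens within `window` tokens of `node`."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] not in STOPWORDS:
                    counts[tokens[j]] += 1
    return counts


fired_collocates = collocates(CONCORDANCE, "fired")
print(fired_collocates.most_common())
```

Even on three lines the recurring subject (*police*) stands out from the one-off collocates, which is the sense in which patterns "leap out"; the exceptions only become visible with far more data than a toy list like this.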
The linguistic ‘double-helix’ hypothesis • A language is a system of rule-governed behaviour. • Not one, but TWO (interlinked) sets of rules: • Rules governing the normal uses of words to make meanings • Rules governing the exploitation of norms
What is a pattern? • The verb is the pivot of the clause. • A pattern is a statement of the clause structure (valency) associated with a meaning of a verb, • together with typical semantic values of each argument, realized by salient collocates • Different semantic values of arguments activate different meanings of each verb.
Patterns are contrastive • fire, verb • [[Human]] fire [[Firearm]] (at [[Phys Obj = Target]]) • [[Human]] fire [[Projectile]] (from [[Firearm]]) (at [[Phys Obj = Target]]) • [[Human 1]] fire [[Human 2]] • [[Anything]] fire [[Human]] {with enthusiasm} • [[Human]] fire [NO OBJ] .... • Etc.
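The contrast between these patterns can be sketched as a lookup: the semantic type of the direct object selects the pattern, and the pattern carries the meaning. The code below is a minimal sketch, not the PDEV data model. The mini-lexicon `SEMANTIC_TYPE`, the glosses, and the function `matching_patterns` are all hypothetical, and only three of the patterns listed above are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    subject: str  # semantic type of the subject
    verb: str
    obj: str      # semantic type of the direct object
    gloss: str    # informal meaning label (wording is an assumption)


# Hypothetical mini-lexicon assigning one semantic type per noun.
SEMANTIC_TYPE = {
    "officer": "Human",
    "manager": "Human",
    "employee": "Human",
    "pistol": "Firearm",
    "bullet": "Projectile",
}

FIRE_PATTERNS = [
    Pattern("Human", "fire", "Firearm", "discharge a weapon"),
    Pattern("Human", "fire", "Projectile", "shoot a projectile"),
    Pattern("Human", "fire", "Human", "dismiss from employment"),
]


def matching_patterns(subject, verb, obj, patterns):
    """Return the patterns whose semantic types fit the given arguments."""
    subj_type = SEMANTIC_TYPE.get(subject)
    obj_type = SEMANTIC_TYPE.get(obj)
    if subj_type is None or obj_type is None:
        return []  # untyped collocate: no norm can be assigned
    return [
        p for p in patterns
        if (p.verb, p.subject, p.obj) == (verb, subj_type, obj_type)
    ]


for obj in ("pistol", "bullet", "employee"):
    hits = matching_patterns("manager", "fire", obj, FIRE_PATTERNS)
    print(obj, "->", [p.gloss for p in hits])
```

A one-type-per-noun lexicon is a deliberate simplification: real collocates are often ambiguous between types, and optional arguments such as (at [[Phys Obj = Target]]) are ignored here.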
Semantic Types and Ontology • Items in double square brackets are semantic types. • Semantic types are being gathered together into a shallow ontology. • (This is work in progress in the current CPA project) • Preliminary outline in Pustejovsky, Rumshisky, and Hanks 2004 • Each type in the ontology will (eventually) be populated with a set of lexical items on the basis of what’s in the corpus under each relevant pattern.
Exploitations • People exploit the rules of normal usage for various purposes: • For economy and speed: • Conversation is quick • Listeners (and readers) get bored easily • Words that are ‘obvious’ can sometimes be omitted • To say new things (reporting discoveries, registering patents, ...) • To say old things in new ways • For rhetoric, humour, poetry, politics …
Anomalous collocates exploit norms • “… a brick arrived through my living room window.” —(BNC) M. Grist, 1993. Life at the tip. • Normally, people (travellers) and vehicles arrive – not bricks. • “Whatever the intention, rehabilitation does punish people; in particular, it allows people to be put into institutions where they would rather not be.” —(BNC) Bob Roshier, 1989. Controlling Crime. • Normally, people punish people – not procedures such as rehabilitation.
The null object alternation • Earlier in this talk, I said: • “Invented examples have a tendency to distort; bizarre authentic examples also distort.” • Someone might ask, “distort what?” • But when I said this, I assumed you knew what such examples distort – common knowledge between us – so I didn’t need to say it. • Omitting – eliding – ‘unnecessary’ words is a very common pattern of linguistic behavior.
Ellipsis • Absence of an expected collocate is a type of exploitation. • The police fired [[]] into the crowd. • The police fired rubber bullets [[]]. • He gave the order and they fired [[]] [[]]. • The valency pattern of this sense of fire, v., requires SUBJECT, OBJECT, and ADVERBIAL: • [[Human]] fire [[Projectile]] [Adv[Direction]] • Correct description of valency requires syntactic analysis and semantic typing of arguments.
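The three examples above can be read as one valency pattern with different slots left unrealized. The following sketch makes that explicit by comparing the slots the pattern requires with the slots each clause actually fills. The clause analyses in `CLAUSES` were assigned by hand for this illustration, and the names `REQUIRED_SLOTS` and `elided_slots` are assumptions, not part of any CPA tool.

```python
# Slots required by the pattern [[Human]] fire [[Projectile]] [Adv[Direction]].
REQUIRED_SLOTS = ("subject", "object", "adverbial")

# Hand-made analyses of the three example clauses (an assumption of this sketch).
CLAUSES = {
    "The police fired into the crowd.": {
        "subject": "The police",
        "adverbial": "into the crowd",
    },
    "The police fired rubber bullets.": {
        "subject": "The police",
        "object": "rubber bullets",
    },
    "He gave the order and they fired.": {
        "subject": "they",
    },
}


def elided_slots(realized):
    """Return the required slots that the clause leaves unrealized."""
    return [slot for slot in REQUIRED_SLOTS if not realized.get(slot)]


for text, realized in CLAUSES.items():
    print(f"{text:40} elided: {elided_slots(realized)}")
```

The sketch presupposes the hard part, namely that the syntactic analysis and the choice of pattern have already been made; it only shows how an elided argument can be recorded as a gap relative to the norm rather than as a separate pattern.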
Ellipsis and ambiguity Corpus example: Later that morning he changed. • What is the meaning of change here? a? At breakfast he was still wearing a black tie and crumpled dinner jacket from the night before. Later that morning he changed. b? At breakfast he greeted us with a cheerful grin and seemed not to have a care in the world. Later that morning he changed. c? He got on at Köln thinking that it was a through train to Berlin, but the ticket inspector told him that it would terminate at Hannover. Later that morning he changed.
Only primary norms are exploited by elision (?) • Many small farmers, unable to cultivate successfully, turned to the sale or renting of land. • BUT NOT: *He had many friends in America but in England he was unable to cultivate successfully. • We punish too much—and … we imprison too much. • BUT NOT: • He offered one to the Englishman, who declined. • “Whatever is reported as having been declined has already been named, mentioned, or indicated with sufficient clarity; so that the reader, arriving at the word declined, need be in no doubt about what would be a suitable object or infinitive clause.” –Sinclair (1991)
Types and Qualia in CPA • The apparatus needed for analysing nouns is different from that needed for verbs • Plug and socket • Verbs need event typing and argument structure • Nouns need analysis of their qualia structure [Pustejovsky’s term]: • What sort of thing is it? • What’s it for? • What properties does it have? AND their semantic prosody: is it good or bad? (and if so, for whom?) AND their verb preferences
Each argument of each verb is a complex lcp (lexical conceptual paradigm) • [[Event | Human]] calm [[Animate]] • calm a hysterical patient • calm the horses • But can you *calm a cockroach? • Not part of the lcp for “calm [[Animate]]” – not a norm • calm [POSDET] {nerves | anxiety} [= properties of [[Animate]]] • calm a riot [= behaviour of [[Animate]]] • calm the market [[= Location = Activity in Location = Human Group Acting in Location]]
Semantic types and semantic roles • sentence, v. • PATTERN: [[Human 1 = Judge]] sentence [[Human 2 = Convicted Criminal]] to [[{Time Period | Event} = Punishment]] • IMPLICATURE: [[Human 1]] • SECONDARY IMPLICATURE: [[Time Period]] is a jail sentence • EXAMPLE: Mr Woods sentenced Bailey to 7 years. Note that the implicature is “anchored” to the pattern.
ON-LINE DEMO (?) • http://nlp.fi.muni.cz/projects/cpa • Choose Web Access • Log-in: guest • Password: guest
Shimmering lexical sets • Lexical sets are not stable – not “all and only”. • Example from Hanks and Jezek (2008): • [[Human]] attend [[Event]] • [[Event]] = meeting, wedding, funeral, etc. • But not all events: not thunderstorm, suicide. • And not only events: attend school, attend a clinic • Contrast with another pattern for attend: – [[Human 1]] attend [[Human 2 = High Status]]
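The "not all and not only" point can be stated as two set differences. The sketch below uses only the nouns named on this slide; treating them as two flat Python sets (`EVENT_NOUNS` and `ATTEND_OBJECTS`, both hypothetical names) is an assumption made for illustration.

```python
# Nouns the slide treats as members of the semantic type [[Event]].
EVENT_NOUNS = {"meeting", "wedding", "funeral", "thunderstorm", "suicide"}

# Nouns the slide gives as normal direct objects of "attend".
ATTEND_OBJECTS = {"meeting", "wedding", "funeral", "school", "clinic"}

# "Not all events": events that are not normally attended.
not_all = EVENT_NOUNS - ATTEND_OBJECTS

# "Not only events": attended things that are not events.
not_only = ATTEND_OBJECTS - EVENT_NOUNS

# The stable core where the type and the lexical set coincide.
core = EVENT_NOUNS & ATTEND_OBJECTS

print("not all: ", sorted(not_all))
print("not only:", sorted(not_only))
print("core:    ", sorted(core))
```

Both differences are non-empty, so the semantic type works as shorthand for the lexical set but cannot define it.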
Meanings and boundaries • Boundaries of all linguistic and lexical categories are fuzzy. • There are many borderline cases. • Instead of fussing about boundaries, we should focus on identifying prototypes. • Then we can decide what goes with what. • Many decisions will be obvious. • Some decisions – especially about boundary cases – will be arbitrary.
The Idiom Principle (Sinclair) • In word use, there is tension between the “terminological tendency” and the “phraseological tendency”: • The terminological tendency: the tendency for words to have meaning in isolation • The phraseological tendency: the tendency for the meaning of a word to be activated by the context in which it is used.
Current work in progress • Hanks (forthcoming): Lexical Analysis: Norms and Exploitations. MIT Press • A corpus-driven, lexically based theory of meaning in language • Linked to PDEV (A Pattern Dictionary of English Verbs) by CPA (Corpus Pattern Analysis) • A basic infrastructure resource • 468 verbs analyzed and released, freely available • http://nlp.fi.muni.cz/projects/cpa • Experiments with automating the analytical procedure and applying the results for NLP (IR, MT, …) and language teaching (lexical syllabus design) • Building a shallow ontology is in progress
Semantic Frames: FrameNet • “Word Meanings must be described in relation to semantic frames—schematic representations of the conceptual structures and patterns of beliefs, practices, institutions, images, etc., that provide a foundation for meaningful interaction in a given speech community.” —Fillmore et al. in International Journal of Lexicography 16 (3): p. 235
FrameNet and Valency • “Syntactic valence information is usually specified in terms of the phrase type of the possible complements, and in terms of the grammatical functions … expressed in terms of subcategorization frames.” – ibid, p. 236 • SOME PROBLEMS WITH THIS: • Aiming at all possible complementation frames of a verb may be too ambitious • Better to aim at all normal complementation frames • In a slot-and-filler grammatical model (Halliday), not a generative model • “Subcategorization” carries theoretical assumptions that may be incompatible with empirical data analysis
A methodological problem? • “look at examples of one particular word, [How many? How chosen?] • for each frame element that occurs with that word, look for other words with similar meanings that also take that kind of complement, • notice which complement types cluster together with groups of meaning-sharing words, • given two types of complement that both occur with the target word, if one complement regularly occurs with one group of related words, and the other with a different group …, this is strong evidence for a sense distinction (based on a frame distinction).” – Atkins et al. in IJL 16 (3): p. 255 • QUESTION: Does (should?) FrameNet proceed frame by frame? Or verb by verb? Or both at the same time?
Thanks • The late John Sinclair & colleagues (Cobuild project) • Bob Taylor, Marie-Claire van Leunen & the late Digital Equipment Corporation Systems Research Center in Palo Alto (Hector project) • James Pustejovsky, Anna Rumshisky, & Brandeis U. • Masaryk U., Brno & Karel Pala, Pavel Rychly, and Adam Rambousek • Institute of Formal and Applied Linguistics, Charles U., Prague, & Jan Hajic, Martin Holub • Various Czech agencies for funding • You, for listening