Lexical and semantic selection
Options for grammar engineers and what they might mean linguistically
Outline and acknowledgements • Selection in constraint-based approaches • types of selection and overview of methods used in LKB/ERG • denotation • The collocation problem • collocation in general • corpus data on magnitude adjectives • possible accounts • Conclusions • Acknowledgements: LinGO/DELPH-IN, especially Dan Flickinger, also Generative Lexicon 2005
1(i): Types of grammatical selection • syntactic: e.g., preposition among selects for an NP (like other prepositions) • lexical: e.g., spend selects for PP headed by on • Kim spent the money on a car • semantic: e.g., temporal at selects for times of day (and meals) • at 3am • at three thirty five and ten seconds precisely
Lexical selection • lexical selection requires a method of specifying a lexeme • in the ERG, this is via the PRED value, e.g., spend (spend the money on Kim):
spend_v2 := v_np_prep_trans_le &
 [ STEM < "spend" >,
   SYNSEM [ LKEYS [ --OCOMPKEY _on_p_rel,
                    KEYREL.PRED "_spend_v_rel" ]]].
Lexical selection • ERG relies on convention that different lexemes have different relations • `lexical’ selection is actually semantic (cf. Wechsler) • no true synonyms assumption, or assume that grammar makes distinctions that are more fine-grained than real-world denotation justifies. • near-synonymy would have to be recorded elsewhere: ERG does (some) morphology, syntax and compositional semantics • alternatives? • orthography: but ambiguity or non-monotonic semantics • lexical identifier: requires new feature • PFORM: requires features, values
Semantic selection • Requires a method of specifying a semantically-defined phrase • In ERG, done by specifying a higher node in the hierarchy of relations:
at_temp := p_temp_le &
 [ STEM < "at" >,
   SYNSEM [ LKEYS [ --COMPKEY hour_or_time_rel,
                    KEYREL.PRED _at_p_temp_rel ]]].
Semantic selection • Semantic selection allows for indefinitely large set of alternative phrases • compositionally constructed time expressions • productive with respect to new words, but exceptions allowable • approach wouldn’t be falsified if e.g., *at tiffin • ERG lexical selection is a special case of ERG semantic selection! • could assume featural encoding of semantic properties (alternatively or in addition to hierarchy) • TFS semantic selection is relatively limited practically (see later) • also idiom mechanism in ERG
1(ii): Denotation, grammar engineering perspective • Denotation is truth-conditional, logically formalisable (in principle), refers to `real world’ (extension) • Not necessarily decomposable • Naive physics, biology, etc • Must interface with non-linguistic components • Minimising lexical complexity in broad-coverage grammars is practically necessary • Plausible input to generator: • reasonable to expect real world constraints to be obeyed (except in context) • the goat read the book • Potential disambiguation is not a sufficient condition for lexical encoding • The vet treated the rabbit and the guinea pig with dietary Vitamin C deficiency
Denotation, continued • Assume linkage to domain, richer knowledge representation language available • TFS language for syntax etc, not intended for general inference • Talmy example: the baguette lay across the road • across - Figure’s length > Ground’s width • identifying F and G and location for comparison in grammar? • coding average length of all nouns? • allowing for massive baguettes and tiny roads?
But ... • Trend in KR is towards description logics rather than richer languages. • Need to think about the denotation to justify grammaticization (or otherwise) • if temporal in/on/at have same denotation, selectional account is required for different distribution • unreasonable to expect lexical choice for in/on/at in input to generator • Linguistic criteria: denotation versus grammaticization? • effect found cross-linguistically? • predictable on basis of world knowledge? • closed class vs open class • Practical considerations about interfacing go along with linguistic criteria • non-linguists expect some information about word meaning! • allow generalisation over e.g., in/on/at in generator input, while keeping possibility of distinction
2(i) Collocation: assumptions • Significant co-occurrences of words in syntactically interesting relationships • `syntactically interesting’: for examples in this talk, attributive adjectives and the nouns they immediately precede • `significant’: statistically significant (but on what assumptions about baseline?) • Compositional, no idiosyncratic syntax etc (as opposed to multiword expression) • About language rather than the real world
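As a concrete illustration of the `significant’ question, here is a minimal Python sketch that scores an adjective-noun pair by pointwise mutual information against a simple independence baseline; other association measures (log-likelihood ratio, t-score) would serve equally well, and the counts below are invented, not real BNC figures.

# Score an (adjective, noun) pair against an independence baseline.
# Counts are illustrative only.
from math import log2

def pmi(pair_count, adj_count, noun_count, total_pairs):
    """Pointwise mutual information: how much more often the pair occurs
    than expected if adjective and noun were chosen independently."""
    p_pair = pair_count / total_pairs
    p_adj = adj_count / total_pairs
    p_noun = noun_count / total_pairs
    return log2(p_pair / (p_adj * p_noun))

total = 1_000_000                     # attributive adjective-noun pair tokens
print(pmi(120, 5_000, 800, total))    # a 'heavy rain'-like pair: strongly positive
print(pmi(1, 5_000, 2_000, total))    # a 'heavy esteem'-like pair: negative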
Collocation versus denotation • Whether an unusually frequent word pair is a collocation or not depends on assumptions about denotation: fix denotation to investigate collocation • Empirically: investigations using WordNet synsets (Pearce, 2001) • Anti-collocation: words that might be expected to go together and tend not to • e.g., flawless behaviour (Cruse, 1986); big rain (unless explained by denotation) • e.g., buy house is predictable on basis of denotation, shake fist is not
2(ii): Distribution of `magnitude’ adjectives • some very frequent adjectives have magnitude-related meanings (e.g., heavy, high, big, large) • basic meaning with simple concrete entities • extended meaning with abstract nouns, non-concrete physical entities (high taxation, heavy rain) • extended uses more common than basic • not all magnitude adjectives – e.g. tall • nouns tend to occur with a limited subset of these extended adjectives • some apparent semantic groupings of nouns which go with particular adjectives, but not easily specified
Distribution • Investigated the distribution of heavy, high, big, large, strong, great, major with the most common co-occurring nouns in the BNC • Nouns tend to occur with up to three of these adjectives with high frequency and low or zero frequency with the rest • My intuitive grammaticality judgments correlate but allow for some unseen combinations and disallow a few observed but very infrequent ones • big, major and great are grammatical with many nouns (but not frequent with most), strong and heavy are ungrammatical with most nouns, high and large intermediate
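The tallying behind such a study can be sketched as below, assuming (adjective, noun) pairs have already been extracted from attributive positions in a parsed corpus; get_adj_noun_pairs is a hypothetical corpus reader, not something from the original investigation.

# Count how often each of the seven adjectives immediately precedes each noun,
# keeping only high-frequency pairs.
from collections import Counter, defaultdict

ADJECTIVES = {"heavy", "high", "big", "large", "strong", "great", "major"}

def adjective_profiles(pairs, min_freq=10):
    """pairs: iterable of (adjective, noun) tokens from attributive positions."""
    counts = Counter((a, n) for a, n in pairs if a in ADJECTIVES)
    by_noun = defaultdict(dict)
    for (adj, noun), c in counts.items():
        if c >= min_freq:
            by_noun[noun][adj] = c
    return by_noun

# profiles = adjective_profiles(get_adj_noun_pairs("bnc"))  # hypothetical corpus reader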
heavy: groupings?
magnitude: dew, rainstorm, downpour, rain, rainfall, snowfall, fall, snow, shower; frost, spindrift; clouds, mist, fog; flow, flooding, bleeding, period, traffic; demands, reliance, workload, responsibility, emphasis, dependence; irony, sarcasm, criticism; infestation, soiling; loss, price, cost, expenditure, taxation, fine, penalty, damages, investment; punishment, sentence; fire, bombardment, casualties, defeat, fighting; burden, load, weight, pressure; crop; advertising; use, drinking
magnitude of verb: drinker, smoker
magnitude related? odour, perfume, scent, smell, whiff; lunch; sea, surf, swell
high: groupings?
magnitude: esteem, status, regard, reputation, standing, calibre, value, priority; grade, quality, level; proportion, degree, incidence, frequency, number, prevalence, percentage; volume, speed, voltage, pressure, concentration, density, performance, temperature, energy, resolution, dose, wind; risk, cost, price, rate, inflation, tax, taxation, mortality, turnover, wage, income, productivity, unemployment, demand
magnitude of verb: earner
heavy and high • 50 nouns in BNC with the extended magnitude use of heavy with frequency 10 or more • 160 such nouns with high • Only 9 such nouns with both adjectives: price, pressure, investment, demand, rainfall, cost, costs, concentration, taxation
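The overlap figure is then just an intersection over the profiles sketched earlier (reusing the adjective_profiles output from the fragment above, where min_freq already enforces the frequency-10 threshold):

heavy_nouns = {n for n, adjs in profiles.items() if "heavy" in adjs}
high_nouns = {n for n, adjs in profiles.items() if "high" in adjs}
shared = heavy_nouns & high_nouns   # on the real BNC data, the nine nouns listed above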
2(iii): Possible empirical accounts of distribution • Difference in denotation between `extended’ uses of adjectives • Grammaticized selectional restrictions/preferences • Lexical selection • stipulate Magn function with nouns (Meaning-Text Theory) • Semi-productivity / collocation • plus semantic back-off
1 - Denotation account of distribution • Denotation of adjective simply prevents it being possible with the noun. Implies that heavy and high have different denotations:
heavy'(x) => MF(x) > norm(MF, type(x), c) & (precipitation(x) or cost(x) or flow(x) or consumption(x) ...)
(where rain(x) -> precipitation(x) and so on)
• But: messy disjunction or multiple senses, open-ended, unlikely to be tractable. • e.g., heavy shower only for rain sense, not bathroom sense • Not falsifiable, but no motivation other than distribution. • Dictionary definitions can be seen as doing this (informally), but none account for observed distribution. • Input to generator?
2 - Selectional restrictions and distribution • Assume the adjectives have the same denotation • Distribution via features in the lexicon • e.g., literal high selects for [ANIMATE false ] • cf., approach used in the ERG for in/on/at in temporal expressions • grammaticized, so doesn’t need to be determined by denotation (though assume consistency) • could utilise qualia structure • Problem: can’t find a reasonable set of cross-cutting features! • Stipulative approach possible, but unattractive.
3 - Lexical selection • MTT approach • noun specifies its Magn adjective • in Mel’čuk and Polguère (1987), Magn is a function, but could modify to make it a set, or vary meanings • could also make adjective specify set of nouns, though not directly in LKB logic • stipulative: if we’re going to do this, why not use a corpus directly?
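For comparison with the collocational option below, a toy version of the stipulative approach, with Magn modified to be set-valued as just suggested; the entries are merely illustrative, not taken from any MTT lexicon.

# Set-valued Magn lexical function keyed on the noun (entries illustrative).
MAGN = {
    "rain": {"heavy"},
    "esteem": {"high"},
    "cost": {"heavy", "high"},   # some nouns take more than one magnitude adjective
}

def magn(noun, default="big"):
    """Return the stipulated magnitude adjectives for a noun, defaulting to 'big'."""
    return MAGN.get(noun, {default})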
4 - Collocational account of distribution • all the adjectives share a denotation corresponding to magnitude, distribution differences due to collocation, soft rather than hard constraints • linguistically: • adjective-noun combination is semi-productive • denotation and syntax allow heavy esteem etc, but speakers are sensitive to frequencies, prefer more frequent phrases with same meaning • cf. morphology and sense extension: Briscoe and Copestake (1999). Blocking (but weaker than with morphology) • anti-collocations as reflection of semi-productivity
Collocational account of distribution • computationally, fits with some current practice: • filter adjective-noun realisations according to n-grams (statistical generation – e.g., Langkilde and Knight, recent experiments with ERG) • use of co-occurrences in WSD • back-off techniques • requires an approach to clustering
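A minimal sketch of the n-gram filtering idea: rank alternative realisations of the same MRS with a bigram language model and let the generator prefer the top-scoring string; bigram_logprob is a stand-in for whatever model is actually plugged in.

# Rank candidate realisations (token lists with the same meaning) by bigram score.
def rank_realisations(candidates, bigram_logprob):
    """e.g. candidates = [["high", "taxation"], ["heavy", "taxation"], ["big", "taxation"]]"""
    def score(tokens):
        return sum(bigram_logprob(w1, w2) for w1, w2 in zip(tokens, tokens[1:]))
    return sorted(candidates, key=score, reverse=True)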
semantic spaces • acquired from corpora • generally, collect vectors of words which co-occur with the target • best known is LSA: often used in psycholinguistics • more sophisticated models incorporate syntactic relationships • currently sexy, but severe limitations!
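For concreteness, a minimal count-based semantic space (not LSA itself): each target word is a vector of context-word counts within a fixed window, and similarity is cosine; real models add dimensionality reduction and, in the more sophisticated versions, syntactic relations.

# Build co-occurrence vectors from tokenised sentences and compare them by cosine.
from collections import Counter, defaultdict
from math import sqrt

def build_vectors(sentences, window=3):
    vectors = defaultdict(Counter)
    for sent in sentences:                      # each sentence: a list of tokens
        for i, w in enumerate(sent):
            for c in sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]:
                vectors[w][c] += 1
    return vectors

def cosine(v1, v2):
    dot = sum(v1[k] * v2[k] for k in v1 if k in v2)
    norm = sqrt(sum(x * x for x in v1.values())) * sqrt(sum(x * x for x in v2.values()))
    return dot / norm if norm else 0.0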
Back-off and analogy • back-off: decision for infrequent noun with no corpus evidence for specific magnitude adjective • should be partly based on productivity of adjective: number of nouns it occurs with • default to big • back-off also sensitive to word clusters • e.g., heavy spindrift because spindrift is semantically similar to snow • semantic space models: i.e., group according to distribution with other words • hence, adjective has some correlation with semantics of the noun
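A sketch of the back-off step itself, reusing build_vectors/cosine from the previous fragment and an adjective record for seen nouns (e.g. the output of adjective_profiles earlier); the k-nearest-neighbour vote and the default to big are one possible realisation, not the only one.

# Choose a magnitude adjective for an unseen noun by voting over the adjectives
# of its nearest neighbours in the semantic space, falling back to 'big'.
from collections import Counter

def backoff_adjective(noun, vectors, seen_adjectives, k=5, default="big"):
    if noun not in vectors:
        return default
    neighbours = sorted(
        (n for n in seen_adjectives if n in vectors and n != noun),
        key=lambda n: cosine(vectors[noun], vectors[n]),
        reverse=True,
    )[:k]
    votes = Counter(adj for n in neighbours for adj in seen_adjectives[n])
    return votes.most_common(1)[0][0] if votes else default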
Metaphor • Different metaphors for different nouns (cf. Lakoff et al.) • `high’ nouns measured with an upright scale: e.g., temperature: temperature is rising • `heavy’ nouns metaphorically like a burden: e.g., workload: her workload is weighing on her • Doesn’t lead to an empirical account of distribution, since we can’t predict classes. Assumption of literal denotation followed by coercion is implausible. • But: extended metaphor idea is consistent with idea that clusters for back-off are based on semantic space
Collocation and linguistic theory • Collocation plus semantic space clusters may account for some of the `messy’ bits, at least for some speakers. • in/on transport: in the car, on the bus • Talmy: presence of walkway, `ragged lower end of hierarchy’ • but trains without walkway, caravans with walkway? • in/on choice perhaps collocational, not real exception to language-independent schema elements • Potential to simplify linguistic theories considerably. • Success of n-grams, LSA models of priming. • Practically testable: assume same denotation of heavy/high or in/on, see if we can account for distribution in corpus. • Alternative for temporal in/on/at? • Experiments with machine learning of temporal in/on/at (Mei Lin, MPhil thesis, 2004): very successful at predicting distribution, but used lots of Treebank-derived features.
Summary • Selection in ERG • Other aspects of ERG selection not described here: multiword expressions and idioms • Collocational models as adjunct to TFS encoding • Role of denotation is crucial • Practical considerations about grammar usability
Final remarks • Grammar usability: • A good broad-coverage grammar should have an account of denotation of closed-class words at least, but probably not within TFS encoding. • Can we use semantic web languages for non-domain-specific encoding? • Collocational techniques require much further investigation • Can semantic space models be related to denotation (e.g., somehow excluding collocational component)?
Idioms
Idiom entry:
stand+guard := v_nbar_idiom &
 [ SYNSEM.LOCAL.CONT.RELS <! [ PRED "_stand_v_i_rel" ],
                             [ PRED "_guard_n_i_rel" ] !> ].
Idiomatic lexical entries:
guard_n1_i := n_intr_nospr_le &
 [ STEM < "guard" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_guard_n_i_rel" ]].
stand_v1_i := v_np_non_trans_idiom_le &
 [ STEM < "stand" >,
   SYNSEM [ LKEYS.KEYREL.PRED "_stand_v_i_rel" ]].
Idioms in ERG/LKB • Account based on Wasow et al (1982), Nunberg et al (1994). • Idiom entry specifies a set of coindexed MRS relations (coindexation specified by idiom type, e.g., v_nbar_idiom) • Relations may correspond to idiomatic lexical entries (but may be literal uses: e.g., cat out of the bag – literal out of the). • Idiom is recognised if some phrase matches the idiom entry. • Allows for modification: e.g., stand watchful guard
Messy examples • among: requires group or plural or ? • among the family (BNC) • among the chaos (BNC) • between: requires plural denoting two objects, but not group (?) • fudge sandwiched between sponge (BNC) • between each tendon (BNC) • ? the actor threw a dart between the couple • * the actor threw a dart between the audience (even if only two people in the audience)