Adam Kilgarriff doesn’t believe in word senses…*
(“I don’t believe in Word Senses”, A. Kilgarriff, in Computers and the Humanities 31 (2), pp. 91-113, 1997)
A summary by Peter Clark, Boeing Research
* and Sue Atkins, the source of the original quote.
Word Sense Disambiguation
The basic idea: “Many words have more than one meaning. When a person understands a sentence with an ambiguous word in it, that understanding is built on the basis of just one of the meanings. So, as some part of the human language understanding process, the appropriate meaning has been chosen from the range of possibilities.”
Thesis
• The early work: toy examples
• More recently: statistical WSD
• Given the context of a word, identify the sense
• Approaches:
  • use sense-labeled training data
  • use user guidance (Yarowsky):
    1. build a concordance for a word
    2. user selects discriminating “seed” surrounding words (1 per sense)
    3. classify the word occurrences, using the seeds, as to the word sense
    4. find other words correlated with the word senses → extra seeds
    5. goto 3
  • choose seeds automatically (clustering + thresholding)
Thesis (cont)
For example (two seeds), concordance lines for “crane”:
  Treadmills attached to cranes were used to lift heavy
  For supplying power to cranes, hoists, and lifts
  above this height, a tower crane is often used. This
  elaborate courtship rituals cranes build a nest of vegetation
  are most closely related to cranes and rails. They ran
  low trees. At least five crane species are in danger of
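
The seed-based procedure from the Thesis slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Yarowsky's actual algorithm: the seed words ("lift", "nest"), the stopword list, and the rule for promoting extra seeds (any word seen with only one sense) are hypothetical simplifications. The concordance lines are the six "crane" fragments from the example.

```python
import re
from collections import Counter

# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"a", "and", "are", "at", "for", "in", "is", "of",
             "this", "they", "to", "were"}
TARGET = {"crane", "cranes"}  # the word being disambiguated

# Step 1: the concordance (the six fragments from the slide).
LINES = [
    "Treadmills attached to cranes were used to lift heavy",
    "For supplying power to cranes, hoists, and lifts",
    "above this height, a tower crane is often used. This",
    "elaborate courtship rituals cranes build a nest of vegetation",
    "are most closely related to cranes and rails. They ran",
    "low trees. At least five crane species are in danger of",
]

def context(line):
    """Content words surrounding the target word in one concordance line."""
    words = re.findall(r"[a-z]+", line.lower())
    return {w for w in words if w not in STOPWORDS and w not in TARGET}

def bootstrap(lines, seeds, max_rounds=10):
    """Label lines from seed words, then grow the seed sets (steps 3 to 5)."""
    contexts = [context(line) for line in lines]
    seeds = {sense: set(words) for sense, words in seeds.items()}
    labels = [None] * len(lines)
    for _ in range(max_rounds):
        # Step 3: a line gets a sense only if seeds of exactly one sense match.
        new_labels = []
        for ctx in contexts:
            hits = [sense for sense, words in seeds.items() if ctx & words]
            new_labels.append(hits[0] if len(hits) == 1 else None)
        if new_labels == labels:
            break  # nothing changed, so stop iterating
        labels = new_labels
        # Step 4: words seen with only one sense become extra seeds
        # (a crude stand-in for a real correlation measure).
        seen = {sense: Counter() for sense in seeds}
        for ctx, label in zip(contexts, labels):
            if label is not None:
                seen[label].update(ctx)
        for sense in seeds:
            elsewhere = set()
            for other in seeds:
                if other != sense:
                    elsewhere |= set(seen[other])
            seeds[sense] |= {w for w in seen[sense] if w not in elsewhere}
    return labels, seeds

# Step 2: one hand-picked seed per sense (assumed choices).
labels, seeds = bootstrap(LINES, {"machine": {"lift"}, "bird": {"nest"}})
print(labels)
```

On this tiny concordance only three of the six lines end up labeled. The others share no content words with any seed, which suggests why the method depends on a much larger concordance in practice.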
Antithesis
There is a computationally relevant/useful/interesting set of word senses in the language, approximating to those stated in a dictionary.
What is a Word Sense?
Exercise: how many word senses?
• Have you put the money in the bank?
• The rabbit climbed up the bank.
How many word senses?
• Cut the rope.
• Cut down your daily fat intake.
• The car cut to the left at the intersection.
• Cut along the dotted line.
• The coach cut two players from the team.
• We cut through the neighbor’s yard to get home.
• The boat cut the water.
• Cut the engine.
• This cuts into my earnings.
• This soap cuts grease well.
Find senses using ambiguity tests?
• The “word-sensers”: use linguistic tests to find senses
• e.g., the “crossed readings” test:
  • “Mary arrived with a pike and so did Agnes.”
  • “Tom bought some beans, and so did Harry.”
• But:
  • can’t always construct a plausible test sentence: are “John ate the apple” and “John ate” two senses? “Mary ate, and John, the apple.”?
  • need to do interpretation to decide on acceptability!
  • anomaly may not be lexical (word sense), but could be due to parsing or pragmatic interpretation!
Fillmore similarly seemed to be trying this…
• For each lexicographically relevant unit we want straightforward ways of asking:
  • What does it mean?
  • In what contexts is it used?
  • What other words belong to those contexts?
  • What are its combinatorial properties?
  • What words are derivationally related to it?
• Different meanings in different contexts → different senses
• Different complementation patterns → different senses
• Different nominalizations → different senses
So how do dictionary authors do it?
• Lexicographers: “lexicologists with a deadline” (and page limit)
• Lexicography is a highly pragmatic enterprise
  • working for a human audience
  • describes the unknown with the known
  • affects what “acceptable” genus and differentiae are
• they often hedge on where the sense boundaries are
Cŭt n. 1. Act of cutting; stroke or blow with knife etc.; ~ and thrust, use of both edge and point of sword, (fig.) lively interchange of argument. 2. Act of utterance that wounds the feelings; (Cricket, Tennis, etc.) stroke made by cutting. 3. … (Concise Oxford Dict.)
Alternative: Corpus-Based Methods
• Data-driven, rather than theory-driven
• Cluster lines in a concordance for a word together
• Each cluster will be a word sense
• No reason to expect clear boundaries
• Kilgarriff:
  • Pattern of usage isn’t simply statistical co-occurrence
  • Rather, it involves a complex interplay between different knowledge sources
  • For example…
e.g., “Handbag”
• 715 examples in BNC of plain uses of “handbag”
  • put in, take out, look for, lose, steal, find etc.
• But a couple of dozen stretch this
  • unique object
    • “an inimitable rendering of the handbag speech in…”
  • metonymy
    • “She moved from handbags through gifts to the flower shop”
  • metaphor
    • “with bats hanging in the trees like handbags”
  • Mrs Thatcher
    • “A mad cow with a handbag”
    • “sent out Mrs Thatcher with a fully-loaded handbag”
e.g., “Handbag” (cont)
• handbag as weapon
  • “Meg swung her handbag”
  • “determined women armed with heavy handbags”
• discos (“dance round your handbag”)
  • “Tensions mounted between regulars and the handbag brigade”
Predictability
• The non-standard usages are predictable:
  • not that you can use any word the way you like;
  • rather, licence for usage comes from various sources
  • standard usage + linguistic + world knowledge + collocation
• e.g., “handbags at ten paces” is okay and amusing, while “briefcases at ten paces” and “shoulder-bags at ten paces” do not carry the weapon connotation and are not so understandable
• thus could argue that handbag-as-weapon should be a word sense
Word Sense Hierarchies (Autohyponymy)
Perhaps we have autohyponymy here?
  handbag_1 (purse-like thing)
    handbag_2 (handbag-as-weapon)
Happens all the time in dictionaries…
  knife_1 (bladed object)
    knife_3 (weapon)
    knife_2 (cutlery)
Word Sense Hierarchies (Autohyponymy)
Interesting side note… This ambiguity never occurs in usage:
  sanction_1 (control of some kind)
    sanction_2 (imposed punishment)
    sanction_3 (official endorsement)
There are occasions where a “lowest common denominator” will be the appropriate reading → make this a new word sense
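
One way to picture these hierarchies is as a child-to-parent mapping, where a “lowest common denominator” reading is the most specific sense that covers two candidate senses. The sketch below is only illustrative: it assumes the parent/child layout read off the handbag and knife diagrams, and the function names are hypothetical.

```python
# Child sense -> more general parent sense (assumed from the slide diagrams).
PARENT = {
    "handbag_2": "handbag_1",  # handbag-as-weapon under purse-like thing
    "knife_2": "knife_1",      # cutlery under bladed object
    "knife_3": "knife_1",      # weapon under bladed object
}

def ancestors(sense):
    """Return the sense followed by its increasingly general parents."""
    chain = [sense]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def common_reading(a, b):
    """Most specific sense covering both a and b, or None if unrelated."""
    covers_b = set(ancestors(b))
    for sense in ancestors(a):
        if sense in covers_b:
            return sense
    return None

print(common_reading("knife_2", "knife_3"))
print(common_reading("handbag_2", "handbag_1"))
```

When a reader cannot, or need not, choose between cutlery and weapon, the covering sense knife_1 is the reading that applies, which is the situation in which a dictionary ends up listing the general sense alongside its own special cases.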
So What?
[Paraphrased] “A task-independent set of word senses is not a coherent concept. Word senses are simply undefined unless there is some underlying rationale for clustering, some context which classifies some distinctions as worth making and others as not worth making. Homonyms like ‘pike’ are a limiting case: in almost any situation it is worth making the fish/weapon distinction.”
So What? (cont)
• Objective notion of “word sense” is not well-defined:
  • linguistic tests require human interpretation
  • clustering methods depend on the corpus, and on some user-defined notion of similarity and sufficient distinctiveness.
• Alternative: basic unit ≠ word sense. Rather:
  • basic unit = occurrences of the word in context
  • word sense = clusters of those units
• Word senses are
  • defined relative to a set of interests
  • “abstractions over clusters of word usages”
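
To make the dependence on a user-defined similarity and distinctiveness threshold concrete, here is a minimal sketch that treats occurrences as the basic units and senses as clusters of them. The five contexts for “bank” are invented for illustration. Jaccard overlap and single-link clustering are assumed choices, not ones prescribed by Kilgarriff.

```python
from itertools import combinations

# Hypothetical occurrences of "bank", each reduced to a set of context words.
CONTEXTS = [
    {"money", "deposit", "account"},
    {"money", "loan", "account"},
    {"river", "grass", "steep"},
    {"river", "fishing", "steep"},
    {"money", "river", "flood"},   # a borderline usage
]

def jaccard(a, b):
    """Similarity of two contexts: shared words over all words."""
    return len(a & b) / len(a | b)

def cluster(contexts, threshold):
    """Single-link clustering: link two occurrences when their similarity
    reaches the threshold; clusters are the connected components."""
    parent = list(range(len(contexts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(contexts)), 2):
        if jaccard(contexts[i], contexts[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(contexts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# The same occurrences yield different "senses" at different thresholds.
for threshold in (0.15, 0.4, 0.6):
    print(threshold, cluster(CONTEXTS, threshold))
```

With the loosest threshold the borderline occurrence links everything into one cluster, the middle setting gives two clusters plus a leftover, and the strictest gives five singletons. None of these is the “right” sense inventory without a task to say which distinctions matter.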