Between Corpus and Dictionary • Adam Kilgarriff • Lexical Computing Ltd • Lexicography MasterClass Ltd • Universities of Leeds, Sussex
What is a word sense? Kilgarriff, Global WordNet
Preliminaries • What is language? • What is meaning?
What is language? • In our heads • In texts and sound signals • Both
Methodology • Study language in our heads • Introspection • Semantic analysis • Experiments with human subjects • “rationalist” (Leibniz, Chomsky) • Problems: coverage, arbitrariness
Methodology • Study text • “empiricist” (Locke, Hume) • Physics: forces, matter • Chemistry: chemicals, bonds • Language: text, speech signals
It goes against the grain • What is important about a sentence? • its meaning • Corpus methodology: • Throw away individual sentence meaning • Find patterns
Empiricist linguistics • A new way to find out about language • 15 years of rapid ascent • Computers • Corpora • bigger and bigger data sets available • Language technology tools • lemmatizers, POS-taggers, parsers, machine learning for pattern finding
Rationalists vs empiricists in the age of the web • semantic web vs Google?
What are you? • Temperament • Complementary/alternatives • Barbu and Poesio, Keller and Lapata: comparisons, evaluations • (AK: current research project)
What is meaning? • Fregean • Gricean
Gottlob Frege (1848-1925) • Founder of modern logic • Truth values • The sentence “grass is green” is true if and only if grass is green (Tarski) • Meanings of words, phrases are such that: • Put them together in a sentence • State basic facts • Sentence computes to ‘true’ if sentence is true, ‘false’ if it is false
Gottlob Frege (1848-1925) • Formal semantics • Sparkling analyses for quantifiers, connectives • Montague semantics • Foundations for maths, databases, ontologies …
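The truth-conditional picture on the two Frege slides can be illustrated with a toy model. This is only a sketch under stated assumptions: the entities and facts in `model` are invented, the helper names `every` and `some` are hypothetical, and treating "grass is green" as a universal statement is a simplification of how generic sentences actually behave.

```python
# Toy model: each predicate denotes the set of entities it is true of.
# The entities and facts are invented for illustration.
model = {
    "grass": {"lawn", "meadow"},
    "green": {"lawn", "meadow", "frog"},
    "animal": {"frog"},
}


def every(restrictor, scope):
    """'every R is S' is true iff the R-set is a subset of the S-set."""
    return model[restrictor] <= model[scope]


def some(restrictor, scope):
    """'some R is S' is true iff the two sets share at least one entity."""
    return bool(model[restrictor] & model[scope])


grass_is_green = every("grass", "green")
some_animal_is_green = some("animal", "green")
every_green_thing_is_grass = every("green", "grass")

# Connectives are ordinary operations on truth values.
conjunction = grass_is_green and not every_green_thing_is_grass

print(grass_is_green, some_animal_is_green, every_green_thing_is_grass, conjunction)
```

The point of the sketch is that everything depends on the word meanings (here, the sets in `model`) being well defined in advance, which is where the word-sense question comes in.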
H. P. Grice (1913-1988) An agent means something by an utterance if and only if they intended the utterance to produce some effect in an audience by means of the recognition of this intention. Dictionary of Philosophy of Mind, http://philosophy.uwaterloo.ca
Meaning is something you do • Basis of meaning is • Meaning event • Speaker’s intention • Speaker’s expectation of the hearer’s interpretation • (messy, hard)
Strawson commentary (1970s) For the sake of a label, we might call it the conflict between the theorists of communication-intention and the theorists of formal semantics. […] A struggle on what seems to be such a central issue in philosophy should have something of a Homeric quality; and a Homeric struggle calls for gods and heroes. I can at least, though tentatively, name some living captains and benevolent shades: on the one side, say, Grice, Austin, and the later Wittgenstein; on the other, Chomsky, Frege, and the earlier Wittgenstein.
Battle of the two Adams?
Relevance to word senses • Fregean • Supports reasoning • Builds on well-defined word-meanings • Identifying word meanings: can’t help • Fall back on Grice
Fauconnier and Turner • “linguistic expressions prompt for meanings rather than express meanings” (AK chapter, Agirre and Edmonds WSD book)
Preliminaries over • What is a word sense
The lexicographers • They create them • Methods • Introspection • Other dictionaries • Corpus • Atkins, Hanks, Krishnamurthy
What is a word sense (1) • SFIP • Sufficiently frequent insufficiently predictable • (a glass of) whisky • x (a glass of) tequila
What is a word sense (2) homonymy analogy polysemy rules collocation
What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers
What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers • Makes sense of • Overlapping senses • Different dictionaries, different senses • Lumping and splitting
I don’t believe in word senses • Believe in: • resurrection ghost witch vampire god miracle fairy • Philosophy: • Ontological commitment • (same meaning different register) • “good entities to build belief systems on”
But I’m an NLP person • Automatic clustering? • Inspiration: • Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999 • You can get semantic sense from corpora+stats
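The "corpora+stats" idea can be sketched as distributional similarity: words that share grammatical contexts tend to be semantically close. The example below is a minimal illustration, not the method of any of the cited papers. The context sets are invented, and plain Jaccard overlap is an assumption chosen for brevity; the cited work uses more refined, frequency-weighted measures.

```python
# Each word is described by the (relation, collocate) contexts it occurs in.
# The contexts are invented for illustration.
contexts = {
    "whisky": {("object_of", "drink"), ("object_of", "pour"), ("modifier", "malt")},
    "tequila": {("object_of", "drink"), ("object_of", "pour"), ("modifier", "cheap")},
    "book": {("object_of", "read"), ("object_of", "write"), ("modifier", "cheap")},
}


def jaccard(word_a, word_b):
    """Share of contexts the two words have in common (0.0 to 1.0)."""
    a, b = contexts[word_a], contexts[word_b]
    return len(a & b) / len(a | b)


for pair in [("whisky", "tequila"), ("tequila", "book"), ("whisky", "book")]:
    print(pair, round(jaccard(*pair), 2))
```

On this toy data "whisky" and "tequila" come out closest because they share two of their four distinct contexts, while "whisky" and "book" share none.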
First attempt • Longman 1994 • Abject failure • No grammar • Corpus too small and noisy • Naïve clustering • Useless programmer
Collocations • Easy • Most words don’t go with most other words • Then build on what we can do well • (metaphor, analogy, homonymy, rules: all much harder)
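The claim that most words don't go with most other words can be made concrete by counting which verb and object pairs are attested at all, and scoring the ones that are. The slides do not name a statistic, so the logDice score used here, 14 + log2(2·f_xy / (f_x + f_y)), is an assumption, and the pair data is invented.

```python
import math
from collections import Counter

# Toy (verb, object) pairs standing in for parsed corpus lines (invented data).
pairs = [
    ("drink", "whisky"), ("drink", "whisky"), ("drink", "tea"),
    ("drink", "tea"), ("drink", "tea"), ("pour", "whisky"),
    ("pour", "tea"), ("read", "book"), ("read", "book"),
    ("write", "book"), ("bake", "bread"), ("eat", "bread"),
]

pair_freq = Counter(pairs)
verb_freq = Counter(verb for verb, _ in pairs)
noun_freq = Counter(noun for _, noun in pairs)


def log_dice(verb, noun):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    f_xy = pair_freq[(verb, noun)]
    if f_xy == 0:
        return float("-inf")  # the pair never co-occurs
    return 14 + math.log2(2 * f_xy / (verb_freq[verb] + noun_freq[noun]))


# How many of the possible verb/object combinations are attested at all?
attested = len(pair_freq)
possible = len(verb_freq) * len(noun_freq)
print(f"{attested} of {possible} possible pairs are attested")

for verb, noun in [("read", "book"), ("pour", "tea"), ("read", "whisky")]:
    print(verb, noun, round(log_dice(verb, noun), 2))
```

Even in this tiny sample only a third of the possible combinations occur, and the score separates a strong pairing ("read" + "book") from a weaker one ("pour" + "tea").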
The Sketch Engine • 2003: programmer problem solved • Corpora • More available • Build big clean ones from web • Grammar • POS-taggers/lemmatisers available • Shallow regexp grammars if no full parser • Stats: progress (Lin, Curran, Evert …)
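A shallow regexp grammar of the kind mentioned on this slide can be sketched as a regular expression over POS-tagged text. This is an illustration only and not the Sketch Engine's own grammar formalism: the `word/TAG` input format, the Penn Treebank style tags, and the pattern and function names are all assumptions.

```python
import re

# Verb followed by an optional run of determiners/adjectives and a noun.
VERB_OBJECT = re.compile(
    r"(\w+)/VB[A-Z]*\s+"       # a verb token with any VB* tag
    r"(?:\w+/(?:DT|JJ)\s+)*"   # optional determiners and adjectives
    r"(\w+)/NN[A-Z]*"          # the noun, with any NN* tag
)


def verb_object_pairs(tagged_text):
    """Return (verb, object) pairs found by the shallow pattern."""
    return VERB_OBJECT.findall(tagged_text)


tagged = "she/PRP drank/VBD a/DT large/JJ whisky/NN and/CC read/VBD the/DT book/NN"
print(verb_object_pairs(tagged))
```

A pattern like this misses anything a full parser would catch across longer distances, which is the trade-off the slide accepts when no parser is available. Running it on lemmatised text would also merge "drank" and "drink" into one verb.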
demo
Clustering • Word sketch • Collocates organised by grammar • Dictionary • Collocates (and other things) organised by meaning • How to re-organise • Three phases
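"Collocates organised by grammar" can be pictured as grouping a headword's collocates by grammatical relation and ranking them within each group. The sketch below ranks by raw frequency for simplicity, which is an assumption (a real word sketch would use an association score). The triples, relation names, and the `word_sketch` function are hypothetical.

```python
from collections import Counter, defaultdict

# (headword, grammatical relation, collocate) triples, e.g. from a shallow grammar.
# The data is invented for illustration.
triples = [
    ("bank", "modifier", "central"), ("bank", "modifier", "central"),
    ("bank", "modifier", "central"), ("bank", "modifier", "investment"),
    ("bank", "modifier", "investment"), ("bank", "modifier", "river"),
    ("bank", "object_of", "rob"), ("bank", "object_of", "rob"),
    ("bank", "object_of", "erode"), ("book", "object_of", "read"),
]


def word_sketch(headword, triples, top_n=3):
    """Group the headword's collocates by relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for head, relation, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {
        relation: [collocate for collocate, _ in counts.most_common(top_n)]
        for relation, counts in by_relation.items()
    }


print(word_sketch("bank", triples))
```

Note that each relation mixes collocates from different meanings ("central" and "river" both modify "bank"). Re-organising them by meaning is the step a dictionary entry needs.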
Semi-automatic dictionary drafting (SADD) • Automatic clustering of collocates • Propose senses • Iterate: • Lexicographer input • Confirm/reject/edit sense inventory • Assigns collocates / corpus lines to senses • WSD • Uses seeds to build full WSD for word • Find more collocates for each sense • XML dictionary entry • Load into dictionary-editing tool
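The WSD step (seeds in, more collocates and labelled corpus lines out) can be sketched as a simple bootstrapping loop. This is an assumption about how seeds might be used, not the SADD implementation: the corpus lines, sense names, seed collocates, and the `bootstrap` function are all invented for illustration.

```python
# Invented corpus lines for the headword "bank", reduced to sets of context words.
corpus_lines = [
    {"river", "erode", "muddy"},
    {"muddy", "grassy", "steep"},
    {"central", "interest", "loan"},
    {"loan", "mortgage", "customer"},
    {"steep", "fishing"},
    {"mortgage", "account"},
]

# Seed collocates a lexicographer might confirm for each sense (hypothetical).
seeds = {"riverside": {"river"}, "finance": {"central"}}


def bootstrap(lines, seeds, rounds=3):
    """Label lines by collocate overlap, then grow each sense's collocate set."""
    collocates = {sense: set(words) for sense, words in seeds.items()}
    labels = {}
    for _ in range(rounds):
        for index, line in enumerate(lines):
            scores = {sense: len(line & words) for sense, words in collocates.items()}
            best = max(scores, key=scores.get)
            top = scores[best]
            # Label a line only when exactly one sense has the top non-zero score.
            if top > 0 and list(scores.values()).count(top) == 1:
                labels[index] = best
        # Every word of a labelled line becomes a collocate of that sense.
        for index, sense in labels.items():
            collocates[sense] |= lines[index]
    return labels, collocates


labels, collocates = bootstrap(corpus_lines, seeds)
print(labels)
print(sorted(collocates["finance"]))
```

Starting from one seed per sense, the loop labels all six toy lines within three rounds. On real data, adding every context word as a collocate would be far too greedy, so a score threshold and the lexicographer's confirm/reject step would be needed to keep errors from spreading.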
Atkins method for bilingual lexicography • Analyse source language • From corpus • List all expressions that might possibly have a non-predictable translation • Very fine grained • Lots of collocations • target-language-neutral; re-usable • Translate • Edit to finalise dictionary
New English-Irish Dictionary • Irish: • Gaelic language, some native speakers, culturally important for Ireland • Project • To replace dictionary from 1950s • Government-funded project • Lexicography MasterClass (Atkins, Rundell, Kilgarriff) designed project in 2003
English analysis for NEID • New project, 1st Feb 2008 to late 2010 • Contractor: Lexicography MasterClass • 12 lexicographers • Plan • Test SADD • If viable, use it on industrial scale
demo2 • http://corpora.fi.muni.cz/sadd/
Thank you • Sketch Engine: • http://www.sketchengine.co.uk • Lexicom workshop • Pre-Euralex, 10-15 July, Barcelona • http://www.iula.upf.edu/agenda/lexicom • Pre-CICLING, Mexico, Feb 2009