Between Corpus and Dictionary

Presentation Transcript


  1. Between Corpus and Dictionary Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds, Sussex

  2. What is a word sense? Kilgarriff, Global WordNet

  3. Preliminaries • What is language? • What is meaning?

  4. What is language?

  5. What is language? • In our heads

  6. What is language? • In our heads • In texts and sound signals

  7. What is language? • In our heads • In texts and sound signals • Both

  8. Methodology • Study language in our heads • Introspection • Semantic analysis • Experiments with human subjects • “rationalist” (Leibniz, Chomsky) • Problems: coverage, arbitrariness

  9. Methodology • Study text • “empiricist” (Locke, Hume) • Physics: forces, matter • Chemistry: chemicals, bonds • Language: text, speech signals

  10. It goes against the grain • What is important about a sentence? • its meaning • Corpus methodology: • Throw away individual sentence meaning • Find patterns

  11. Empiricist linguistics • A new way to find out about language • 15 years of rapid ascent • Computers • Corpora • bigger and bigger data sets available • Language technology tools • lemmatizers, POS-taggers, parsers, machine learning for pattern finding

  12. Rationalists vs empiricists in the age of the web • semantic web vs Google?

  13. What are you? • Temperament • Complementary/alternatives • Barbu and Poesio, Keller and Lapata: comparisons, evaluations • (AK: current research project)

  14. What is meaning? • Fregean • Gricean

  15. Gottlob Frege (1848-1925) • Founder of modern logic • Truth values • The sentence “grass is green” is true if and only if grass is green (Tarski) • Meanings of words and phrases are such that: • Put them together in a sentence • State basic facts • Sentence computes to ‘true’ if sentence is true, ‘false’ if it is false

  16. Gottlob Frege (1848-1925) • Formal semantics • Sparkling analyses for quantifiers, connectives • Montague semantics • Foundations for maths, databases, ontologies …
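
To make the Fregean picture concrete, here is a toy Python sketch (not from the talk) of model-theoretic evaluation: each word denotes a set of entities in an invented model, quantifiers such as "every", "some" and "no" are treated as relations between two sets, and a sentence simply computes to True or False. The model, the entity names and all function names are hypothetical illustrations.

```python
# Toy model: every predicate denotes the set of entities it is true of.
# The entities (g1, leaf1, ...) are invented for illustration.
MODEL = {
    "grass": {"g1", "g2"},
    "green": {"g1", "g2", "leaf1"},
    "leaf": {"leaf1", "leaf2"},
}


def every(restrictor, scope):
    """'Every A is B' is true iff the A-set is a subset of the B-set."""
    return restrictor <= scope


def some(restrictor, scope):
    """'Some A is B' is true iff the two sets share at least one entity."""
    return bool(restrictor & scope)


def no(restrictor, scope):
    """'No A is B' is true iff the two sets are disjoint."""
    return not (restrictor & scope)


def evaluate(quantifier, noun, predicate, model=MODEL):
    """Compose word meanings: the sentence denotes True or False in the model."""
    return quantifier(model[noun], model[predicate])


print(evaluate(every, "grass", "green"))  # True
print(evaluate(every, "leaf", "green"))   # False: leaf2 is not green
print(evaluate(some, "leaf", "green"))    # True: leaf1 is green
print(evaluate(no, "grass", "leaf"))      # True: no grass entity is a leaf
```

The point of the sketch is the Fregean division of labour: once word meanings are fixed, truth falls out mechanically. It says nothing about how those word meanings were identified in the first place.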

  17. H. P. Grice (1913-1988) An agent means something by an utterance if and only if they intended the utterance to produce some effect in an audience by means of the recognition of this intention. Dictionary of Philosophy of Mind, http://philosophy.uwaterloo.ca

  18. Meaning is something you do • Basis of meaning is • Meaning event • Speaker’s intention • Speaker’s expectation of interpretation of hearer • (messy, hard)

  19. Strawson commentary (1970s) For the sake of a label, we might call it the conflict between the theorists of communication-intention and the theorists of formal semantics. […] A struggle on what seems to be such a central issue in philosophy should have something of a Homeric quality; and a Homeric struggle calls for gods and heroes. I can at least, though tentatively, name some living captains and benevolent shades: on the one side, say, Grice, Austin, and the later Wittgenstein; on the other, Chomsky, Frege, and the earlier Wittgenstein.

  20. Battle of the two Adams?

  21. Relevance to word senses • Fregean • Supports reasoning • Builds on well-defined word-meanings • Identifying word meanings: can’t help • Fall back on Grice

  22. Fauconnier and Turner • “linguistic expressions prompt for meanings rather than express meanings” (AK chapter, Agirre and Edmonds WSD book)

  23. Preliminaries over • What is a word sense?

  24. The lexicographers • They create them • Methods • Introspection • Other dictionaries • Corpus • Atkins, Hanks, Krishnamurthy

  25. What is a word sense (1) • SFIP • Sufficiently frequent insufficiently predictable • (a glass of) whisky • x (a glass of) tequila

  26. What is a word sense (2) • homonymy • analogy • polysemy • rules • collocation

  27. What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers

  28. What is a word sense (3)

  29. What is a word sense (3)

  30. What is a word sense (3)

  31. What is a word sense (3)

  32. What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers • Makes sense of: • Overlapping senses • Different dictionaries, different senses • Lumping and splitting

  33. I don’t believe in word senses • Believe in: • resurrection ghost witch vampire god miracle fairy • Philosophy: • Ontological commitment • (same meaning different register) • “good entities to build belief systems on”

  34. But I’m an NLP person • Automatic clustering? • Inspiration: • Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999 • You can get semantic sense from corpora+stats
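
The claim that "you can get semantic sense from corpora+stats" can be illustrated with a minimal distributional sketch: represent each word by counts of the words around it, then compare words by cosine similarity. This is a simplified stand-in, not the specific measures of Hindle, Schütze, Grefenstette or Lin; the window size, the toy sentences and the function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Count the words within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented toy corpus, already tokenised.
corpus = [s.split() for s in [
    "i drank a glass of whisky",
    "i drank a glass of tequila",
    "she poured whisky into a glass",
    "she poured tequila into a glass",
    "he read a long book",
]]
whisky, tequila, book = (context_vector(w, corpus) for w in ("whisky", "tequila", "book"))
print(round(cosine(whisky, tequila), 3))  # 1.0: identical contexts
print(round(cosine(whisky, book), 3))     # 0.289: only "a" is shared
```

Words that keep similar company score close to 1.0; on real corpora the same idea needs far more data, grammatical context rather than a raw window, and a better weighting than plain counts.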

  35. First attempt • Longman 1994 • Abject failure • No grammar • Corpus too small and noisy • Naïve clustering • Useless programmer

  36. Collocations • Easy • Most words don’t go with most other words • Then build on what we can do well • (metaphor, analogy, homonymy, rules: all much harder)
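
Collocation finding rests on an association score that rewards word pairs which co-occur more often than their individual frequencies would suggest. The slide names no particular statistic; the sketch below assumes logDice, 14 + log2(2·f(x,y) / (f(x) + f(y))), and all frequency counts are invented.

```python
from math import log2


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)).

    The maximum, 14, is reached only when x and y always occur together;
    a very frequent collocate is penalised through the f_y term.
    """
    if f_xy == 0:
        return float("-inf")
    return 14 + log2(2 * f_xy / (f_x + f_y))


# Invented counts: headword "drink" (f_x) with candidate collocates (f_xy, f_y).
F_DRINK = 1000
candidates = {"water": (120, 5000), "whisky": (40, 300), "table": (2, 4000)}

ranked = sorted(
    candidates,
    key=lambda w: log_dice(candidates[w][0], F_DRINK, candidates[w][1]),
    reverse=True,
)
for word in ranked:
    f_xy, f_y = candidates[word]
    print(f"{word:8} {log_dice(f_xy, F_DRINK, f_y):6.2f}")
```

With these made-up numbers "water" co-occurs with "drink" three times as often as "whisky", yet "whisky" ranks first because "water" is so frequent overall; "table" falls far below both.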

  37. The Sketch Engine • 2003: programmer problem solved • Corpora • More available • Build big clean ones from web • Grammar • POS-taggers/lemmatisers available • Shallow regexp grammars if no full parser • Stats: progress (Lin, Curran, Evert …)
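
A "shallow regexp grammar" can be sketched as a regular expression over part-of-speech tags rather than over words. In this hypothetical example each Penn-style tag is mapped to a one-letter code, so that match offsets line up with token positions, and the pattern "verb, optional determiner, any adjectives, noun" is read as a verb-object relation. The tag mapping, the pattern and the function name are illustrative assumptions, not the Sketch Engine's own grammar formalism.

```python
import re

# Hypothetical mapping from a few Penn Treebank tags to one-letter codes;
# any other tag becomes "x". One character per token keeps regex match
# offsets aligned with token indices.
TAG_CODE = {
    "VB": "V", "VBD": "V", "VBZ": "V", "VBP": "V",
    "DT": "D", "JJ": "J", "NN": "N", "NNS": "N",
}

# Shallow "object" rule: verb, optional determiner, any adjectives, then a noun.
OBJECT_RULE = re.compile(r"VD?J*N")


def verb_object_pairs(tagged):
    """Return (verb, object-noun) pairs from a list of (word, tag) tuples."""
    codes = "".join(TAG_CODE.get(tag, "x") for _, tag in tagged)
    return [
        (tagged[m.start()][0], tagged[m.end() - 1][0])
        for m in OBJECT_RULE.finditer(codes)
    ]


sentence = [("she", "PRP"), ("drank", "VBD"), ("a", "DT"), ("large", "JJ"),
            ("whisky", "NN"), ("and", "CC"), ("read", "VBD"), ("the", "DT"),
            ("paper", "NN")]
print(verb_object_pairs(sentence))  # [('drank', 'whisky'), ('read', 'paper')]
```

A rule this shallow takes the first noun after the verb, so it would pick the wrong head in noun compounds such as "read the newspaper article"; pairs extracted this way over lemmatised text are what the association statistics then score.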

  38. demo

  39. Clustering • Word sketch • Collocates organised by grammar • Dictionary • Collocates (and other things) organised by meaning • How to re-organise • Three phases

  40. Semi-automatic dictionary drafting (SADD) • Automatic clustering of collocates • Propose senses • Iterate: • Lexicographer input • Confirm/reject/edit sense inventory • Assigns collocates / corpus lines to senses • WSD • Uses seeds to build full WSD for word • Find more collocates for each sense • XML dictionary entry • Load into dictionary-editing tool
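
The first phases above can be sketched in miniature. Here each collocate of a headword is represented by the set of other words in the corpus lines where it appears; collocates with overlapping contexts are merged by single-link clustering into proposed senses, and each corpus line is then assigned to the sense whose collocates (used as seeds) it contains most often. Everything is a hypothetical simplification: the toy concordance for "bank", the Jaccard measure, the 0.3 threshold and all function names are assumptions, and this is not SADD's actual algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets: 0.0 when disjoint, 1.0 when identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_sets(lines, headword, collocates):
    """For each collocate, the other words in the corpus lines where it occurs."""
    return {
        c: {w for toks in lines if c in toks for w in toks} - {c, headword}
        for c in collocates
    }


def cluster_collocates(lines, headword, collocates, threshold=0.3):
    """Single-link clustering of collocates into proposed senses (union-find)."""
    ctx = context_sets(lines, headword, collocates)
    parent = {c: c for c in collocates}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for a, b in combinations(collocates, 2):
        if jaccard(ctx[a], ctx[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in collocates:
        groups.setdefault(find(c), set()).add(c)
    return sorted(groups.values(), key=sorted)


def assign_lines(lines, senses):
    """Seed-based WSD: label each line with the sense whose collocates it contains most."""
    labels = []
    for toks in lines:
        scores = [len(sense & set(toks)) for sense in senses]
        best = max(scores, default=0)
        # None when no seed collocate occurs or when two senses tie
        labels.append(scores.index(best) if best and scores.count(best) == 1 else None)
    return labels


# Hypothetical, already lemmatised and stopword-filtered concordance lines for "bank".
lines = [
    ["river", "bank", "mud", "flood"],
    ["steep", "bank", "mud", "water"],
    ["river", "bank", "water"],
    ["bank", "interest", "rate", "loan"],
    ["bank", "account", "interest"],
    ["bank", "loan", "account", "rate"],
]
senses = cluster_collocates(lines, "bank", ["river", "steep", "interest", "account", "loan"])
print(senses)                       # two proposed senses: finance and riverside
print(assign_lines(lines, senses))  # [1, 1, 1, 0, 0, 0]
```

In a SADD-style workflow the lexicographer would confirm, reject or edit the proposed clusters before the assignment step; lines labelled None (no seed collocate, or a tie) are the ones a fuller WSD step would still have to resolve.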

  41. Atkins method for bilingual lexicography • Analyse source language • From corpus • List all expressions that might possibly have a non-predictable translation • Very fine grained • Lots of collocations • target-language-neutral; re-usable • Translate • Edit to finalise dictionary

  42. New English-Irish Dictionary • Irish: • Gaelic language, some native speakers, culturally important for Ireland • Project • To replace dictionary from 1950s • Government-funded project • Lexicography MasterClass (Atkins Rundell Kilgarriff) designed project in 2003

  43. English analysis for NEID • New project, 1 Feb 2008 to late 2010 • Contractor: Lexicography MasterClass • 12 lexicographers • Plan • Test SADD • If viable, use it on an industrial scale

  44. demo2 • http://corpora.fi.muni.cz/sadd/

  45. Thank you • Sketch Engine: • http://www.sketchengine.co.uk • Lexicom workshop • Pre-Euralex, 10-15 July, Barcelona • http://www.iula.upf.edu/agenda/lexicom • Pre-CICLING, Mexico, Feb 2009
