
Lexical Relations and WordNet




  1. Lexical Relations and WordNet Ray Larson & Warren Sack University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval Lecture author: Warren Sack IS202: Information Organization and Retrieval

  2. Last Time • What is Cognitive Science? • What is Artificial Intelligence? • Knowledge Representation • Languages and Programming Paradigms • Representing Common Sense • Common Sense Interfaces • Story Understanding, Story Generation, and Common Sense

  3. Cognitive Science • 10/30/01 – AI, knowledge representation and common sense • 11/01/01 – Computational Linguistics, Cognitive Psychology and Lexical Knowledge • 11/06/01 – AI and information extraction • 11/08/01 – Linguistics, Philosophy, Psychology, categories, and cognition

  4. Today • Lexical relations • Linguistics • Two approaches to semantics: • Compositional • Relational • Psycholinguistics • WordNet • Description • Structure • Applications

  5. Levels of Linguistic Analysis • Sentences • Phonological/Morphological analysis • Syntactic analysis • Semantic analysis • More than one sentence • Pragmatic analysis

  6. Phonology/Morphology • Phonology: The study of the systems of sounds which are manifested in natural languages; the significant contrasts between sounds that are relevant to meaning. • E.g., consonants, vowels, stress, intonation, etc. • Morphology: the forms of words • E.g., word=watched; morphs=watch+ed; morphemes=watch+past

  7. Syntax The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language. These rules codify permissible combinations of classes of word forms.

  8. Semantics • Semantics is the study of linguistic meaning. • Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics): • (1) compositional • (2) relational • Other approaches…

  9. Pragmatics • Deixis • E.g., “I’ll be back in an hour” depends upon the time of the utterance. • Conversational implicature • A: “Can you tell me the time?” • B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.] • Presupposition • “Are you still such a bad driver?” • Speech acts • Constatives vs. performatives • e.g., “I second the motion.” • Conversational Structure • E.g., turn-taking rules

  10. Lexical Semantics: Compositional Approach • Compositional lexical semantics, introduced by Katz & Fodor (1963), analyzes the meaning of a word in much the same way a sentence is analyzed into semantic components. The semantic components of a word are not themselves considered to be words, but are abstract elements (semantic atoms) postulated in order to describe word meanings (semantic molecules) and to explain the semantic relations between words. For example, the representation of bachelor might be ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED. The representation of man might be ANIMATE and HUMAN and MALE and ADULT; because all the semantic components of man are included in the semantic components of bachelor, it can be inferred that bachelor → man. In addition, there are implicational rules between semantic components, e.g. HUMAN → ANIMATE, which also look very much like meaning postulates. • George Miller, “On Knowing a Word,” 1999
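
The inclusion argument on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component sets for man and bachelor come from the slide, while the entry for woman (with a FEMALE component) and the function name `entails` are hypothetical additions for contrast.

```python
# Semantic components per word. "man" and "bachelor" follow the slide;
# "woman" is a hypothetical entry added only to show a failing case.
COMPONENTS = {
    "man": {"ANIMATE", "HUMAN", "MALE", "ADULT"},
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER_MARRIED"},
    "woman": {"ANIMATE", "HUMAN", "FEMALE", "ADULT"},
}


def entails(word, other):
    """Return True if every component of `other` is also a component of `word`."""
    return COMPONENTS[other] <= COMPONENTS[word]


print(entails("bachelor", "man"))    # True: man's components are a subset
print(entails("man", "bachelor"))    # False: NEVER_MARRIED is missing from man
print(entails("bachelor", "woman"))  # False: FEMALE is missing from bachelor
```

Under this reading, bachelor → man holds exactly when the component set of man is a subset of the component set of bachelor.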

  11. Lexical Semantics: Relational Approach • Relational lexical semantics was first introduced by Carnap (1956) in the form of meaning postulates, where each postulate stated a semantic relation between words. A meaning postulate might look something like dog → animal (if x is a dog then x is an animal) or, adding logical constants, bachelor → man and never married [if x is a bachelor then x is a man and not(x has married)] or tall → not short [if x is tall then not(x is short)]. The meaning of a word was given, roughly, by the set of all meaning postulates in which it occurs. • George Miller, “On Knowing a Word,” 1999
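
One way to picture meaning postulates is as a table of implications that can be chained. The sketch below is an assumption-laden toy, not Carnap's formalism: the first three postulates paraphrase the slide, the postulate man → human is a hypothetical addition to show chaining, and `consequences` is an illustrative name.

```python
# Each key implies every predicate in its value set.
POSTULATES = {
    "dog": {"animal"},
    "bachelor": {"man", "not(has married)"},
    "tall": {"not(short)"},
    "man": {"human"},  # hypothetical postulate, not on the slide
}


def consequences(word):
    """Collect everything that follows from `word` by chaining postulates."""
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        for implied in POSTULATES.get(current, ()):
            if implied not in seen:
                seen.add(implied)
                stack.append(implied)
    return seen


print(sorted(consequences("bachelor")))
# ['human', 'man', 'not(has married)']
```

Here the meaning of a word is approximated by the set of predicates reachable from it, which mirrors the slide's remark that a word's meaning is given, roughly, by the postulates in which it occurs.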

  12. Psycholinguistics • The introduction of Noam Chomsky’s theory of syntax to psychologists: • Miller, G.A., Galanter, E., Pribram, K.H. (1960) Plans and the Structure of Behavior. • Some areas of psycholinguistics: • Children’s acquisition of language • First and second language learning • Artificial intelligence? (see Lyons, 1981)

  13. WordNet • Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University • Can be downloaded for free: www.cogsci.princeton.edu/~wn/ • In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation. (Miller, 1998, chapter 1)

  14. Presuppositions of WordNet project • Separability hypothesis: The lexical component of language can be separated and studied in its own right. • Patterning hypothesis: People have knowledge of the systematic patterns and relations between word meanings. • Comprehensiveness hypothesis: Computational linguistics programs need a store of lexical knowledge that is as extensive as that which people have.

  15. WordNet structure • Synsets versus Words
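
The distinction between synsets and words can be sketched as two views of the same data: a synset groups word forms that share a sense, and a word form may belong to several synsets. The code below is a toy under stated assumptions: the synset identifiers, the dictionary layout, and the shortened glosses are illustrative and do not reflect WordNet's actual file format.

```python
from collections import defaultdict

# Toy synsets: an id mapped to its word forms and a gloss (illustrative data).
SYNSETS = {
    "s1": {"words": ["act", "human_action", "human_activity"],
           "gloss": "something that people do or cause to happen"},
    "s2": {"words": ["act", "enactment"],
           "gloss": "a legal document"},
    "s3": {"words": ["group", "grouping"],
           "gloss": "any number of entities considered as a unit"},
}


def build_index(synsets):
    """Invert the synset table: map each word form to the synsets it occurs in."""
    index = defaultdict(list)
    for synset_id, synset in synsets.items():
        for word in synset["words"]:
            index[word].append(synset_id)
    return dict(index)


INDEX = build_index(SYNSETS)
print(INDEX["act"])              # ['s1', 's2']: one word form, two senses
print(len(INDEX), len(SYNSETS))  # 6 unique word forms, 3 synsets
```

Counting the keys of the inverted index gives the number of unique strings, while counting the entries of the synset table gives the number of synsets, so the two totals need not agree.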

  16. WordNet: Size
      POS         Unique Strings   Synsets
      Noun                107930     74488
      Verb                 10806     12754
      Adjective            21365     18523
      Adverb                4583      3612
      Totals              144684    109377

  17. Structure of WordNet

  18. Structure of WordNet

  19. Structure of WordNet

  20. Unique Beginners • { entity, something, (anything having existence (living or nonliving)) } • { psychological_feature, (a feature of the mental life of a living organism) } • { abstraction, (a general concept formed by extracting common features from specific examples) } • { state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") } • { event, (something that happens at a given place and time) } • { act, human_action, human_activity, (something that people do or cause to happen) } • { group, grouping, (any number of entities (members) considered as a unit) } • { possession, (anything owned or possessed) } • { phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }

  21. Roget’s “Unique Beginners” The ontology of Roget’s is headed by six Classes. The first three Classes cover the external world: Abstract Relations deals with such ideas as number, order and time; Space is concerned with movement, shapes and sizes, while Matter covers the physical world and humankind’s perception of it by means of five senses. The remaining Classes deal with the internal world of human beings: the mind (Intellect), the will (Volition), the heart and soul (Emotion, Religion and Morality). There is a logical progression from abstract concepts, through the material universe, to mankind itself, culminating in what Roget saw as mankind’s highest achievements: morality and religion (Kirkpatrick, 1998). Class Four, Intellect, is divided into Formation of ideas and Communication of ideas, and Class Five, Volition, into Individual volition and Social volition. In practice, therefore, the Thesaurus is headed by eight Classes. A path in Roget’s ontology always begins with one of the Classes. It branches to one of the 39 Sections and then to one of the 990 Heads. Each Head is divided into paragraphs grouped by parts of speech: nouns, adjectives, verbs and adverbs. From Mario Jarmasz, Stan Szpakowicz, “Roget’s Thesaurus as an Electronic Lexical Knowledge Base,” 2000.

  22. WordNet Browsers • http://www.cogsci.princeton.edu/cgi-bin/webwn • http://bogart.sip.ucm.es/~jorge/browser.htm • http://www.visualthesaurus.com/

  23. Other WordNets http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm • Dutch • Spanish • Italian • German • French • Czech • Estonian

  24. Forthcoming WordNets http://www.hum.uva.nl/~ewn/gwa/wordnet_table.htm • Bengali • Bulgarian • Danish • Greek • Hebrew • Hindi • Kannada • Latvian • Moldavian • Romanian • Russian • Slovenian • Swedish • Tamil • Thai • Turkish • Yugoslavian • Norwegian • Icelandic

  25. Psycholinguistic evidence for WordNet’s structure • Bever and Rosenbaum, 1970: • A pistol is more dangerous than a rifle. • * A pistol is more dangerous than a gun. • * A gun is more dangerous than a pistol. • Resnik, 1993 • The direct object of the verb drink can be any hyponym of the noun beverage. • Collins and Quillian, 1969 • The time required to verify the statement “A robin is a bird” is shorter than the time required to verify the statement “A robin is an animal.”
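
Two of the findings above rest on the hypernym hierarchy, and a toy version makes the idea concrete. Everything in this sketch is an assumption for illustration: the five-entry hierarchy is far shallower than WordNet's, each word has a single hypernym, and the function names are hypothetical.

```python
# Toy hierarchy: each word points to its single immediate hypernym.
HYPERNYM = {
    "robin": "bird",
    "chicken": "bird",
    "bird": "animal",
    "beer": "beverage",
    "tea": "beverage",
}


def hypernym_chain(word):
    """Return the list of ancestors of `word`, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def links_between(word, ancestor):
    """Number of hypernym links from `word` up to `ancestor`, or None."""
    chain = hypernym_chain(word)
    return chain.index(ancestor) + 1 if ancestor in chain else None


def can_be_object_of_drink(noun):
    """Selectional restriction in the spirit of Resnik: a hyponym of beverage."""
    return "beverage" in hypernym_chain(noun)


print(links_between("robin", "bird"))    # 1
print(links_between("robin", "animal"))  # 2
print(can_be_object_of_drink("tea"))     # True
print(can_be_object_of_drink("robin"))   # False
```

In this toy, "A robin is a bird" needs one link and "A robin is an animal" needs two, which is the kind of distance that the verification-time result is taken to reflect.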

  26. Psycholinguistic evidence against WordNet’s structure • Smith and Medin, 1981 • The time required to verify that a chicken is a bird is significantly longer than the time required to verify that a robin is a bird, even though chicken and robin stand in the same taxonomic relation to bird. • Rosch, 1973 • Ratings of “typicality” have little to do with frequency or familiarity. • Lakoff, 1987 • Concepts are represented, not by a list of distinguishing features, but by the focal instances (or prototypes) that are the best examples of the prototype.

  27. WordNet Applications • Using WordNet as a data structure. Many languages used by computational linguists and natural language processing researchers now have WordNet packages. E.g., for Perl • Lingua::Wordnet, and • Lingua::Wordnet::Analysis by Dan Brian, http://search.cpan.org/search?dist=Lingua-Wordnet

  28. WordNet Applications • Information Retrieval: Voorhees, 1998 • Query expansion via synsets • “sense-based” rather than “stem-based” vectors • Unfortunately, in both cases, the inability to automatically resolve word senses prevented any improvement from being made.
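
Query expansion via synsets, and the sense-resolution problem noted above, can be illustrated with a small sketch. It is not Voorhees's system: the sense table, its two senses of bank, and the function `expand_query` are hypothetical, and the expansion simply adds every synonym from every sense.

```python
# Hypothetical sense inventory: each term maps to a list of synsets.
SENSES = {
    "bank": [["bank", "depository"], ["bank", "riverbank"]],
    "loan": [["loan"]],
}


def expand_query(terms, senses=SENSES):
    """Add the synonyms of every sense of every query term, without duplicates."""
    expanded = []
    for term in terms:
        candidates = [term]
        for synset in senses.get(term, []):
            candidates.extend(synset)
        for word in candidates:
            if word not in expanded:
                expanded.append(word)
    return expanded


print(expand_query(["bank", "loan"]))
# ['bank', 'depository', 'riverbank', 'loan']
```

Because nothing selects the financial sense of bank, the unrelated term riverbank enters the query, which shows in miniature why unresolved word senses can cancel the benefit of expansion.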

  29. WordNet Applications • Textual Cohesion and the correction of Malapropisms: Hirst and St-Onge, 1998 • Malapropism = the confounding of an intended word with another word of similar sound or similar spelling that has a quite different meaning; e.g., “Super bowl → Superb owl”

  30. WordNet Applications • Temporal Indexing through lexical chaining: Al-Halimi and Kazman, 1998 • Indexing transcripts of conference meetings by topic.

  31. WordNet Applications • Conversation themes in Usenet: Sack, 2000

  32. Next Time • Information Extraction, Artificial Intelligence, and “Story Understanding” Revisited
