Analysing Word Concreteness and Abstractness in Dictionary Definitions

Graham Clark, Stevan Harnad, Les Carr
Intelligence, Agents, Multimedia Group
Department of Electronics and Computer Science
University of Southampton

1. Introduction

The Symbol Grounding Problem (Harnad 1990, Harnad 2002) indicates that vocabulary must be grounded in the real, physical world in order for the words to have meaning in one's mind. But once some words have been grounded in this way, how can they develop into a full vocabulary? Dictionaries that define all their words using a controlled vocabulary (every word used in a definition comes from a specified subset of the dictionary) could give some idea of how new words can be effectively grounded using a small set of pre-grounded terms.

In this investigation, two corpora have been used: the Longman Dictionary of Contemporary English (LDOCE; Longman 1997) and the Cambridge International Dictionary of English (CIDE; Cambridge 1995). A Web-based survey was conducted in order to categorise the words in the two controlled vocabularies as “concrete” or “abstract”. Concrete words are those which refer to things that can be seen, felt or touched, for example “tree”, “bird” or “flower”. Abstract words are those which refer to things and properties of things that are more general or conceptual, such as “goodness”, “truth” or “abstractness”.

2. Survey and Definition Recursion

In the survey, participants were given twenty pairs of words and asked to rate which one was the more abstract by moving the slider towards it, as shown in Figure 1 below. Each word was scored against several others, and from these results each controlled vocabulary was split into concrete and abstract subsets.

In order to effectively ground words whose referents have not been experienced in the physical world, one can use combinations of pre-grounded words. For example, if “horse”, “stripes” and the relevant logical operators are already known, “zebra” can be acquired: “a horse with stripes”. Definitions can be recursed through in a tree structure, showing the words that can be effectively grounded with a given starting set. The much-simplified diagram of a “recursion tree” below (Figure 2) shows a path through dictionary definitions, starting at the word “chair”.

Fig 1: Example screenshot of concreteness/abstractness survey
Fig 2: Example definition recursion tree starting at “chair”

3. Parts of Speech

Figures 3-6 below show the part-of-speech “make-up” of the concrete and abstract words from the controlled vocabularies of both corpora. The majority of concrete words are nouns; these can easily be physically pointed out to someone, and hence grounded in the real world. Abstract words cover a much wider range of parts of speech, so more would have to be “effectively grounded” through internal processes, perhaps similar to the definition recursion described previously.

Fig 3: CIDE Concrete Words
Fig 4: CIDE Abstract Words
Fig 5: LDOCE Concrete Words
Fig 6: LDOCE Abstract Words

4. Concreteness and Abstractness in Recursion

Five concrete and five abstract words were taken from each dictionary, and recursive definition trees were built. Figures 7-10 show that many more abstract words than concrete words are used in definitions.

Fig 7: Proportion of concrete and abstract words in CIDE and LDOCE, respectively
Fig 8: Mean number of words per tree level for CIDE, starting with concrete words
Fig 9: Mean number of words per tree level for CIDE, starting with abstract words
Fig 10: Mean number of words per tree level for LDOCE, starting with concrete words
Fig 11: Mean number of words per tree level for LDOCE, starting with abstract words
Fig 11: Definition Length Frequency Distribution for CIDE
Fig 12: Definition Length Frequency Distribution for LDOCE
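As a rough illustration of how a recursive definition tree might be built and tallied, the sketch below expands a starting word level by level and counts the concrete, abstract and unknown words at each level. Everything in it is an assumption made for illustration: the toy dictionary, the concreteness labels, the function names and the choice to expand each word only once (to avoid cycles) are hypothetical and are not taken from CIDE, LDOCE or the survey.

```python
from collections import Counter

# Hypothetical toy dictionary: headword -> words used in its definition.
# Invented for illustration; not real CIDE or LDOCE entries.
TOY_DEFINITIONS = {
    "chair": ["furniture", "seat", "person"],
    "furniture": ["object", "room"],
    "seat": ["object", "sit"],
    "person": ["human"],
    "sit": ["rest", "body"],
}

# Hypothetical labels standing in for the survey results.
CONCRETE = {"chair", "furniture", "seat", "person", "object", "room", "body"}
ABSTRACT = {"sit", "rest"}


def classify(word):
    """Label a word as concrete, abstract, or unknown (not in either set)."""
    if word in CONCRETE:
        return "concrete"
    if word in ABSTRACT:
        return "abstract"
    return "unknown"


def recursion_levels(start, definitions, max_depth=5):
    """Expand definitions breadth-first and return the words at each level.

    Assumption: a word is expanded only the first time it is seen,
    so circular definitions cannot loop forever.
    """
    levels = [[start]]
    seen = {start}
    while len(levels) <= max_depth:
        next_level = []
        for word in levels[-1]:
            for used in definitions.get(word, []):
                if used not in seen:
                    seen.add(used)
                    next_level.append(used)
        if not next_level:
            break
        levels.append(next_level)
    return levels


def level_counts(levels):
    """Count concrete, abstract and unknown words at each tree level."""
    return [Counter(classify(word) for word in level) for level in levels]


levels = recursion_levels("chair", TOY_DEFINITIONS)
for depth, counts in enumerate(level_counts(levels)):
    print(depth, dict(counts))
```

Averaging such per-level counts over several starting words would give one point per level and category, in the spirit of the graphs described here; any stemming or scaling step would need to be added separately.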

Each point on the graphs represents the mean number of abstract, concrete or unknown words at each level of the tree. Unknown words are those which are not present in the controlled vocabulary, or which do not exactly match a headword. All words in the corpora were stemmed; this greatly reduced the count of unknown words. The mean number of words at each tree level has been scaled to take into account the smaller proportion of concrete words to abstract.

5. Definition Length

The number of words in a definition (the definition length) is an indication of how many terms must be pre-grounded in order for it to be understood. Figures 11 and 12 show frequency distribution graphs of the definition length for CIDE and LDOCE, respectively. The frequencies have been scaled to take into account the smaller proportion of concrete words to abstract.

6. References

Cambridge (1995). Cambridge International Dictionary of English, CIDE+ edition (electronic version), Cambridge University Press.
Harnad (1990). The symbol grounding problem. Physica D, 42, 335-346.
Harnad (2002). Symbol grounding and the origin of language. In Scheutz, M. (Ed.) Computationalism: New Directions. MIT Press, 143-158.
Longman (1997). Longman Dictionary of Contemporary English (LDOCE), 3rd edition (electronic version), Addison Wesley Longman.
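
The definition-length analysis of Section 5 can be sketched as follows. This is a minimal illustration only: the poster does not state how the frequencies were scaled, so the sketch assumes that each count is divided by the size of its group (concrete or abstract), and the toy definitions, labels and function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: headword -> words in its definition.
TOY_DEFINITIONS = {
    "chair": ["furniture", "seat", "person"],
    "zebra": ["horse", "stripes"],
    "truth": ["fact", "real", "state"],
    "goodness": ["quality", "good", "state"],
    "idea": ["thought", "plan"],
    "rest": ["stop", "work", "state"],
}

# Hypothetical concreteness labels standing in for the survey results.
TOY_LABELS = {
    "chair": "concrete",
    "zebra": "concrete",
    "truth": "abstract",
    "goodness": "abstract",
    "idea": "abstract",
    "rest": "abstract",
}


def scaled_length_distribution(definitions, labels):
    """Return group -> {definition length: fraction of that group's headwords}.

    Assumption: "scaled" means dividing by the group size, so that the
    smaller concrete group can be compared with the larger abstract one.
    """
    counts = defaultdict(Counter)
    for headword, words in definitions.items():
        group = labels.get(headword, "unknown")
        counts[group][len(words)] += 1

    scaled = {}
    for group, by_length in counts.items():
        group_size = sum(by_length.values())
        scaled[group] = {
            length: n / group_size for length, n in sorted(by_length.items())
        }
    return scaled


distribution = scaled_length_distribution(TOY_DEFINITIONS, TOY_LABELS)
for group, by_length in distribution.items():
    print(group, by_length)
```

Dividing by group size keeps the two curves on a comparable scale even when one group has far more headwords than the other; a different scaling would change the heights but not the shape of each distribution.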