The role of the Computer in Learner Lexicography • early 1970s • ALD3 1974: the dictionary was turned into a printed book • the computer was not used for data-gathering or editorial work • 1990s: large-scale corpora and machine-readable dictionaries • The British National Corpus: 117m. words • COBUILD (Collins-Birmingham University International Language Database): Bank of English, 320m. words
From machine-readable to corpus-based dictionaries • ALD3 1974: a machine-readable dictionary (MRD) • LDOCE 1978: information categories were ‘flagged’ so that the dictionary was a lexical database from which information could be extracted • COBUILD 1 1987: the computer was used for data-gathering, entry-preparation and compilation (selection of senses and examples)
Computer corpora • Computers speeded up the process of gathering large bodies of authentic examples • Computer corpora are collections of texts stored in machine-readable form • texts can be captured from electronic sources (newspapers, documents), scanned by an optical reading machine or keyed into a PC
The COBUILD corpus • John Sinclair at Birmingham University • The corpus was based on carefully measured samples of a range of varieties and discourse types (balanced corpus) • These texts would be relevant to international users • Spoken and written, non-technical, current, standard British English • Funded by Collins publishers • Collins Cobuild English Dictionary
The balanced corpus • contains a measured sample of a range of varieties and discourse types • coverage of standard, core vocabulary • representation of subject areas of current interest • no systematic coverage of scientific and technical varieties
concordancing • data should be quickly retrievable • KWIC Key word in context • concordance software/tool TextSTAT • concordancer, a concordance, a concordance string, the node, the node word, a concordance (on-screen) display • ‘raw’ data: spelling forms • lemmatizer: a tool designed to gather together inflected forms of the same lexeme
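The KWIC idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not how TextSTAT or any other concordancer is implemented: the function name `kwic`, the `width` parameter and the sample sentences are assumptions made for the example. It searches for raw spelling forms only, so it does not gather inflected forms.

```python
import re

def kwic(text, node, width=30):
    """Return one KWIC line per occurrence of the node word in text."""
    lines = []
    # Match the node as a whole word, ignoring case (raw spelling form only).
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, invented for this illustration.
sample = ("It rained hard all night. She is a hard worker. "
          "They worked hard to finish the dictionary.")

for line in kwic(sample, "hard", width=20):
    print(line)
```

Each printed line is one concordance string, with the node word in the centre and a fixed span of context on either side.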
tools • lemmatizer (give, gives, gave, giving) • grammatical tagging program: each item is given a grammatical category label so that it becomes possible to extract data according to grammatical classes (hard as an adj. or hard as an adv.) • a ‘parsed’ corpus is syntactically annotated (only parsed sub-corpora or ‘treebanks’ are available because of the complexity of parsing a text)
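As a rough illustration of the first two tools, the sketch below groups inflected forms under one lemma and extracts a word by grammatical class. It assumes a tiny hand-written lookup table and hand-assigned tags (`ADJ`, `ADV`, etc.); real lemmatizers and taggers rely on full lexicons or trained models, and all names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup table, written by hand for this example only.
LEMMA_TABLE = {
    "give": "give", "gives": "give", "gave": "give",
    "giving": "give", "given": "give",
}

def lemmatize(token):
    """Map a spelling form to its lemma; unknown forms are returned lowercased."""
    return LEMMA_TABLE.get(token.lower(), token.lower())

def group_by_lemma(tokens):
    """Gather the forms found in tokens under their lemma."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[lemmatize(tok)].append(tok)
    return dict(groups)

def positions(tagged, word, tag):
    """Return the positions where word occurs with the given tag."""
    return [i for i, (w, t) in enumerate(tagged)
            if w.lower() == word and t == tag]

tokens = "She gave him what he gives everyone , giving freely".split()
groups = group_by_lemma(tokens)
print(groups["give"])

# Hand-tagged toy data; the tag labels are an assumption, not a real tagset.
tagged = [("a", "DET"), ("hard", "ADJ"), ("task", "NOUN"),
          ("she", "PRON"), ("works", "VERB"), ("hard", "ADV")]
print(positions(tagged, "hard", "ADJ"), positions(tagged, "hard", "ADV"))
```

Once items carry a category label, the two uses of hard can be retrieved separately, which a search on raw spelling forms cannot do.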
the lexicographic workstation • resources available to the lexicographer for dictionary-making: • a lexical database (LDB) with structured and formalized information at entry level and between entries (cross-references) • a concordanced corpus • archives • pre-existing dictionaries in machine-readable form
Impact of corpus linguistics on EFL lexicography • huge impact • importance of frequency of occurrence in a corpus for inclusion or non-inclusion in a learner’s dictionary • as a consequence, priority to core vocabulary and heavy-duty words (cf. Palmer and Hornby in 1930s)
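The frequency criterion can be illustrated with a minimal sketch: count word forms in a corpus and keep those above a threshold as candidates for inclusion. The function names, the `min_count` threshold and the sample text are assumptions for the example; they do not describe the procedure of any actual dictionary project.

```python
import re
from collections import Counter

def frequency_list(text):
    """Count lowercased word forms in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def headword_candidates(text, min_count):
    """Return, in alphabetical order, the forms occurring at least min_count times."""
    freq = frequency_list(text)
    return sorted(w for w, c in freq.items() if c >= min_count)

# Hypothetical miniature 'corpus', invented for this illustration.
sample = "the rain fell and the rain stopped and the sun came out"

print(frequency_list(sample).most_common(3))
print(headword_candidates(sample, min_count=2))
```

In practice the counts would be taken over lemmas rather than raw forms, and over a corpus of many millions of words.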
J. Sinclair Corpus, Concordance and Collocation 1991 • importance of large-scale corpora for the retrieval of linguistic information • context as a chief determinant of meaning • open choice vs idiom principle • illustrative examples
the open-choice principle and the idiom principle • How does meaning arise from text? • Open choice principle: the combination of words in text is only governed by grammaticalness • Idiom principle: the combination of words is determined by the existence of semi-preconstructed phrases that constitute single choices (I see) • Palmer had come to the same conclusion 60 years before
Authentic and made-up examples • Hornby strongly supported invented examples because they can be better shaped to meet learners’ needs • Sinclair strongly supported authentic examples as they better illustrate usage and guide composition (encoding) – they both explain the meaning and serve as models for speaking and writing • Authentic examples must be adapted anyway and adjusted to the physical limits of the dictionary
Problems with authentic examples • they often reveal their full meaning only with reference to a wider context • they may contain words that are difficult to understand or more difficult than the item being defined • e.g. “The children hadn’t been well, cooped up in a London flat she had procured at short notice”
Usability of invented/edited examples • Palmer (GEW) had introduced the ‘listing’ of alternative words or phrases (e.g. a historic spot/event/speech) • Hornby had used ‘simplification’ in ISED: the reduction of a predicate or phrase pattern to a structural minimum (e.g. to repay kindness, to shiver with cold) • a full sentence example may provide superfluous details in the name of authenticity
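To show how much a ‘listing’ example compresses, the sketch below expands a slash-separated pattern into the full phrases it stands for. The function name `expand_listing` and the convention that alternatives are single words joined by "/" are assumptions made for this illustration, not a description of Palmer's own notation.

```python
from itertools import product

def expand_listing(pattern):
    """Expand slash-separated alternatives into every full phrase."""
    # Each whitespace-separated slot holds one word or several alternatives.
    slots = [slot.split("/") for slot in pattern.split()]
    return [" ".join(choice) for choice in product(*slots)]

for phrase in expand_listing("a historic spot/event/speech"):
    print(phrase)
```

One listed example thus stands in for three separate phrases, which matters given the physical limits of a printed dictionary.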
Activities • composed • allowed • dark (attributive/predicative) • company • book/reserve • twitter • remind/remember • Do you mind…? • adjective patterns • leave n. (collocations) • rain (collocations) • approximately, about, roughly • research n. v. • nearly/almost • even though/even if • whole • look/see • false friends (eventually)
Notes on the final paper • NB (for students who did not attend the course) • The activities suggested in the previous slide are just ideas for the final paper. A skeleton sample paper is presented in the file composed_activity (“The syntactic pattern of the lexical item ‘composed’”) • A corpus search can be done using the British National Corpus (or, alternatively, COCA or the TIME corpus), which can be accessed at http://corpus.byu.edu/bnc/