Atkins and Rundell, The Oxford Guide to Practical Lexicography (2008) • Part I: Pre-lexicography • Dictionary types and dictionary users • The birth of a dictionary • Types of dictionary • Types of dictionary users • Tailoring the entry to the user who needs it
The birth of a dictionary (p. 18) • hugely expensive to produce from scratch • pre-lexicography involves decisions taken by the senior editor and the publisher • see Table 2.2 (major publisher’s project), p. 19 • academic projects: Dizionario di Anglicismi in italiano, Dictionary of Bioethics, Dictionary of Rum
Marketing research (p. 30) • www.macmillandictionaries.com • free material for teachers to use in the classroom • ‘Word of the Week’ e-zine • online questionnaires • monitoring log files to check what people have looked up • research on dictionary use
Lexicographic evidence (p. 46) • introspection: based on our mental lexicon, necessarily partial, subjective • objective evidence: observing language in use • Rationalism (Chomsky): describe linguistic ‘competence’ (p. 49) • Empiricism (corpus linguists): describe linguistic ‘performance’, or typical, frequent and well-dispersed patterns of language
OED and the collection of citations • www.sba.unito.it blogosphere, stakeholder, hub, spoke • recruiting and training readers • collect slips with citations • storing the data in a computer database
The central role of a corpus (p. 53) • objective evidence of language is a fundamental prerequisite for a reliable dictionary • "A corpus is a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research" (Sinclair 2005: 16) • COBUILD: Spoken and written, non-technical, current, standard British English
Descriptivism vs prescriptivism • Samuel Johnson (1755): ‘to preserve the purity… of our English idiom’ • only ‘writers of the first reputation’ would be Johnson’s data • today a dictionary must provide a genuine snapshot of a language (cf. “dizionario dell’uso”, ‘usage dictionary’) • a dictionary is a bridge between norm (the received rules) and usage (the realization of the rules in authentic language use)
Does a corpus favour high-quality language? • the lexicographer is a historian, not a critic • the BNC is the best pre-web corpus of English • the BNC is a ‘gold standard’ for corpus linguists • 90% written language, 10% spoken • 75% informative, 25% imaginative • spoken: 42% demographic, 58% context-governed • British English only • 100 million words
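The percentages on this slide can be turned into approximate word counts. The sketch below is only illustrative arithmetic: it assumes the 75%/25% informative/imaginative split applies to the written component (the slide does not say so explicitly), and the variable names are made up for this example.

```python
# Approximate BNC component sizes derived from the slide's percentages.
# Integer arithmetic is used so the results are exact word counts.
BNC_WORDS = 100_000_000

written = BNC_WORDS * 90 // 100          # 90% written
spoken = BNC_WORDS * 10 // 100           # 10% spoken

# Assumption: the informative/imaginative split describes the written part.
informative = written * 75 // 100
imaginative = written * 25 // 100

# The spoken part is split into demographic and context-governed samples.
demographic = spoken * 42 // 100
context_governed = spoken * 58 // 100

for label, count in [
    ("written", written),
    ("spoken", spoken),
    ("informative (written)", informative),
    ("imaginative (written)", imaginative),
    ("demographic (spoken)", demographic),
    ("context-governed (spoken)", context_governed),
]:
    print(f"{label:28s} {count:>12,d} words")
```

On these assumptions the spoken component is about 10 million words, of which roughly 4.2 million come from the demographic sample.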
Zipf’s Law (1930s) • formulated by George Kingsley Zipf, a Harvard linguist who studied word frequency • “the frequency with which a word appears in a collection of texts is inversely proportional to its ranking in a frequency table” • e.g. in the BNC, ‘was’ ranks 10th with 923,957 hits and ‘at’ ranks 20th with 478,177 • the 100 most frequent words in English make up 45% of the BNC’s 100 million words
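The two BNC figures quoted on this slide can be checked against the inverse-proportionality claim. The sketch below is a minimal illustration, not a fitted model: it uses only the two data points from the slide, and the function names are hypothetical. If frequency is inversely proportional to rank, then rank × frequency should be roughly constant, and doubling the rank should roughly halve the frequency.

```python
# Checking Zipf's Law on the two BNC data points quoted in the slide.
observations = {
    "was": (10, 923_957),   # (rank, hits in the BNC)
    "at": (20, 478_177),
}

def zipf_constant(rank, frequency):
    """Under Zipf's Law, rank * frequency is approximately constant."""
    return rank * frequency

def predicted_frequency(known_rank, known_freq, target_rank):
    """Predict a frequency at target_rank from one known (rank, frequency) pair."""
    return known_freq * known_rank / target_rank

for word, (rank, freq) in observations.items():
    print(f"{word:4s} rank={rank:2d} freq={freq:,d} rank*freq={zipf_constant(rank, freq):,d}")

# Predict the frequency at rank 20 from the rank-10 word and compare.
predicted_at = predicted_frequency(10, 923_957, 20)
observed_at = observations["at"][1]
relative_error = abs(predicted_at - observed_at) / observed_at
print(f"predicted at rank 20: {predicted_at:,.1f}; observed: {observed_at:,d}")
print(f"relative error: {relative_error:.1%}")
```

The two products differ by only a few percent, which is the sense in which the law holds here: it describes an approximate trend, not an exact relationship.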
Corpus size • you need a very large corpus to obtain information about rare words • the more data we have, the more information we have • the larger the corpus, the better the lexical profile
Corpus design: how large? • Brown Corpus AmE (1960s) 1 million words • LOB BrE (1960s) • Bank of English (Cobuild) 20 million words • BNC 100 million words • ukWaC 2 billion words (2,000 million words) • Il Giornale del Turismo 150,000 words
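To see why rare items call for large corpora, one can scale a known hit count to corpora of other sizes. The sketch below takes the figure of 12 BNC hits for ‘to break someone’s service’ quoted elsewhere in this deck and assumes, naively, that the expression has the same relative frequency in every corpus. Real corpora differ in composition, so these are rough expectations only; the function name is hypothetical.

```python
# Rough expected hit counts for a rare expression in corpora of different sizes.
# Assumption: the relative frequency observed in the BNC carries over unchanged.
BNC_SIZE = 100_000_000

corpus_sizes = {
    "Il Giornale del Turismo": 150_000,
    "Brown Corpus": 1_000_000,
    "Bank of English": 20_000_000,
    "BNC": 100_000_000,
    "ukWaC": 2_000_000_000,
}

def expected_hits(hits_in_bnc, corpus_size):
    """Scale a BNC hit count linearly to a corpus of another size."""
    return hits_in_bnc * corpus_size / BNC_SIZE

expected = {name: expected_hits(12, size) for name, size in corpus_sizes.items()}

for name, hits in expected.items():
    print(f"{name:26s} {hits:10.3f} expected hits")
```

On this naive scaling, a 1-million-word corpus would be expected to contain the expression less than once, while a 2-billion-word corpus would yield a couple of hundred examples to study.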
Corpus representativeness • A balanced corpus is the ideal objective for lexicographic work • A balanced corpus reflects the diversity of the target language and contains texts that cover the full repertoire of ways in which people use the language • Spoken data: • demographic approach (gender, social class, age, religion, etc.) • context-governed (conversational, educational, business, political, leisure, etc.)
Corpus representativeness • A right- or left-sorted corpus of 100 million words clearly shows most of the normal patterns of usage for all words except the very rare • ‘to break someone’s service’ (12 hits in the BNC) • ‘mucosa’ and ‘unfortunate’ have the same number of hits in the BNC • a case of skewing: a feature is over- or under-represented (the larger the corpus
Parallel corpora • Translation corpus, e.g. the EU documents • Parallel corpus, e.g. ICE (International Corpus of English): 15 corpora of varieties of English (New Zealand, Indian, Jamaican, etc.)
Collection of corpus data • style (e.g. journalistic) • medium: written, spoken • a corpus consisting of a single type of text will reflect only the stylistic and subject-matter features of that particular genre • web: ukWaC corpus, 2 billion words (www.sketchengine.co.uk)