Corpus Linguistics
What is corpus linguistics? • Method / theory in linguistics • Analysis of collections of texts (corpora) • Verifying, strengthening, or weakening hypotheses through statistics -> quantitative textual analysis
A little history • 1897: Käding "manually" analyzed a corpus of 11 million words • 1967: Henry Kučera and Nelson Francis analyzed the Brown Corpus of American English (computer-assisted) • 1970-90: The number of computer-assisted linguistic analyses doubled every five years • E.g. 2007: latest update of the BNC, a 100-million-word corpus
…Corpus Analysis? • Tools and concordance programs (e.g. WordSmith, Xaira) • Concordance: a word in its immediate co-text • Different types of corpora: general (e.g. BNC) or specific (selected by criteria such as topic, genre, author…)
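To make the idea of a concordance concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only, not how WordSmith or any other tool is implemented; the function names `tokenize` and `concordance`, the simple tokenization rule, and the sample text are all assumptions made for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, window=3):
    """Return a (left co-text, node, right co-text) tuple for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Hypothetical two-sentence "corpus" used only for demonstration.
sample = "Language does not reflect reality. We construct reality through language."
tokens = tokenize(sample)

# Print each hit with the node word aligned in a central column.
for left, node, right in concordance(tokens, "reality", window=2):
    print(f"{left:>20}  [{node}]  {right}")
```

Each output line shows the search word with a fixed number of words to its left and right, which is the "immediate co-text" a concordance program displays.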
Who uses it and where? • Compilation of dictionaries • Linguists, language teachers, etc. -> Descriptions of language and its variations (e.g. sociolects, dialects…) • Discourse analysis, i.e. linguistic phenomena in various discourse types (e.g. newspapers, novels, etc.)
Corpus Analysis and CDA • Approach -> Language does not reflect reality; we construct reality through language • Language plays an important role in the construction and maintenance of ideologies: it neutralizes and naturalizes inequalities of power and money • Corpus analysis is used as a method to discover certain societal patterns and ideologies (within this approach)
WordSmith • is a collection of corpus linguistics tools for looking for patterns in a language/discourse/text(s) • Mike Scott, University of Liverpool, www.lexically.net • Wordlist: orders the words in a corpus alphabetically or by frequency, with other statistical information • Concord: concordances, collocates • Keywords: compares the frequency of words within two corpora
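The Wordlist and Keywords ideas can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not WordSmith's actual implementation: the helper names `wordlist`, `log_likelihood`, and `keywords` are hypothetical, the two mini-corpora are invented, and the two-term log-likelihood score is just one statistic commonly used for keyness (the slides do not say which statistic is applied).

```python
import math
import re
from collections import Counter

def wordlist(text):
    """Count word frequencies in a text (simple lowercase tokenization)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def log_likelihood(a, b, n1, n2):
    """Two-term log-likelihood for a word seen a times in n1 tokens and b times in n2 tokens."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected count in the study corpus
    e2 = n2 * (a + b) / (n1 + n2)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference):
    """Rank words that are relatively more frequent in `study` than in `reference`."""
    n1, n2 = sum(study.values()), sum(reference.values())
    scores = {}
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / n1 > b / n2:  # keep positive keywords only
            scores[word] = log_likelihood(a, b, n1, n2)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented mini-corpora, for demonstration only.
study = wordlist("power and ideology shape discourse and power shapes language")
reference = wordlist("the cat sat on the mat and the dog sat on the rug")

print(sorted(study))            # alphabetical word list
print(study.most_common(3))     # frequency-ordered word list
print(keywords(study, reference)[:3])
```

In this toy comparison, "power" ranks highest because it occurs twice in the study text and never in the reference text; with real corpora the reference would normally be much larger than the study corpus.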