Corpus analysis (2) Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com
Outline of the session
• Lecture
  • Keyword
  • Reference corpus
  • Key keyword
• Lab
  • WST keyword
  • AntConc keyword
  • Wmatrix keyword / key concept
What is a keyword?
• Keywords are words whose frequency is exceptionally high (positive keywords) or exceptionally low (negative keywords) in comparison with a reference corpus
• "Keywords" usually refers to positive keywords
• But negative keywords are equally interesting (see Xiao and McEnery 2005)
  • In WordSmith they appear at the very end of the keyword listing, in a different colour
  • They are omitted automatically from a keywords database and a keyword plot
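One way to make "exceptionally high or low" concrete is a log-likelihood keyness score, a statistic commonly offered by keyword tools. The formula below is a sketch under assumed notation that is not in the original slides: a and b are the frequencies of a word in the target and reference corpus, and N_1 and N_2 are the sizes of the two corpora in tokens.

```latex
E_1 = N_1\,\frac{a+b}{N_1+N_2}, \qquad
E_2 = N_2\,\frac{a+b}{N_1+N_2}, \qquad
LL = 2\left(a\ln\frac{a}{E_1} + b\ln\frac{b}{E_2}\right)
```

A term with a zero frequency is taken to contribute 0. Under this sketch, a word with a high LL is a positive keyword when a/N_1 > b/N_2 and a negative keyword when a/N_1 < b/N_2.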
Why keyword analysis?
• It indicates the ‘aboutness’ (Scott 1999) of a particular text or corpus
  • Content analysis, discourse analysis
• It also reveals the salient features which are functionally related to a particular genre (Xiao and McEnery 2005)
  • Genre analysis, stylistic analysis
How to do keyword analysis
• Make a wordlist of the target corpus
• Locate or make a wordlist of a reference corpus
  • Scott (2005) “In search of a bad reference corpus”
  • http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf
  • The reference corpus is usually larger than the target corpus
  • The appropriateness of a reference corpus depends on your research questions!
• Compare the frequency of each item in the two wordlists to extract keywords (done automatically by the tool)
• Analyse and interpret the keywords (this part is up to you!)
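The steps above can be sketched in Python. This is only an illustration under stated assumptions, not how WordSmith, AntConc or Wmatrix are implemented: the tokeniser is deliberately naive, keyness is measured with a two-cell log-likelihood statistic, and the threshold of 3.84 (the 5% critical value of chi-square with one degree of freedom) is an assumed default. The function names `wordlist`, `log_likelihood` and `keywords` are hypothetical.

```python
import math
import re
from collections import Counter


def wordlist(text):
    """Build a lower-cased frequency list (naive tokeniser, an assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def log_likelihood(a, b, n1, n2):
    """Keyness of a word occurring a times in n1 tokens and b times in n2 tokens."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected frequency in the target corpus
    e2 = n2 * (a + b) / (n1 + n2)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def keywords(target, reference, min_ll=3.84):
    """Compare two Counter wordlists and return (word, signed keyness) pairs.

    Positive scores mark positive keywords, negative scores mark negative
    keywords; the list is sorted so that negative keywords come last.
    """
    n1, n2 = sum(target.values()), sum(reference.values())
    results = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]  # Counter returns 0 if missing
        ll = log_likelihood(a, b, n1, n2)
        if ll >= min_ll:
            sign = 1 if a / n1 > b / n2 else -1
            results.append((word, sign * ll))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

To try it, build `target = wordlist(open("blair.txt").read())` and a reference wordlist in the same way (the file name is a placeholder), then call `keywords(target, reference)`. Interpreting the resulting list remains the analyst's job.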
Keywords in the Blair text
• Target corpus: just one text
  • ‘Why Blair is so determined not to run into sands’ (The Times, 16th November 2005)
  • http://www.timesonline.co.uk/tol/news/politics/article590683.ece
  • Local copy available
• Reference corpus
  • The 100-million-word BNC
• Tool
  • WST Keyword
Wordlists of the Blair text and the BNC
• BNC list: www.lexically.net/downloads/version4/BNC_World.zip
Keyword extraction in progress
• Warning: extraction can take some time if you have loaded two large wordlists
Keyword: Plot view
The company of “public”
• Which word most frequently keeps company with (co-occurs with) the keyword “public”?
Key clusters
Key keywords
• A key keyword is one which is "key" in more than one of a number of related texts
• The more texts it is "key" in, the more "key key" it is
• This avoids extracting keywords which are unusually frequent in only a small number of files
• Key keywords can be created automatically and are as simple to extract as keywords
• N.B. Negative keywords are omitted automatically from a key keyword list
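The definition of a key keyword can be sketched as a simple count over per-text keyword lists. This is a minimal illustration, not WordSmith's implementation: the function name `key_keywords`, the input layout (a dict from text name to that text's positive keywords) and the `min_texts` parameter are all assumptions made for the example.

```python
from collections import Counter


def key_keywords(keywords_per_text, min_texts=2):
    """Return (word, number of texts it is key in) for words key in >= min_texts texts.

    keywords_per_text maps a text name to the positive keywords already
    extracted for that text (negative keywords are assumed to be left out).
    """
    counts = Counter()
    for text_keywords in keywords_per_text.values():
        counts.update(set(text_keywords))  # count each word once per text
    selected = [(word, n) for word, n in counts.items() if n >= min_texts]
    # Most "key key" first; ties are broken alphabetically for a stable order
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))
```

With the default `min_texts=2`, a word counts as a key keyword only if it is key in more than one text, which mirrors the definition above; raising the threshold keeps only the most widely shared keywords.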
Key keywords
• An "associate" is a keyword that appears in the same text
• Key coverage of the corpus
Keyword in AntConc
• Target corpus
• Reference corpus
Keyword in AntConc
• Blair text against "Hard Cash"
Wmatrix: Keywords and key concepts
• POS and semantic tagging in session 4
• Keyword / key concept analysis of the Labour and Libdem manifestos
  • Labour
    • http://ucrel.lancs.ac.uk/wmatrix/tutorial/labour%20manifesto%202005.pdf
  • Libdem
    • http://ucrel.lancs.ac.uk/wmatrix/tutorial/libdem%20manifesto%202005.pdf
  • Saved as plain text files (local copies available)
• Log in with your account
  • http://ucrel.lancs.ac.uk/wmatrix2.html
“My folders”
• Upload and tag the Libdem text …and click on “My folders”
• Warning: your folder view may look different!
Open the Labour folder and select libdem in the “keyword compared to” dropdown box