Clearing the undergrowth and marking out trails…

Clearing the undergrowth and marking out trails… Mike Scott, School of English, University of Liverpool. …challenges in investigating Keyness. Keyness in Text Conference, Certosa di Pontignano, Siena, 9:00-10:00, 29 June 2007

Presentation Transcript


  1. Clearing the undergrowth and marking out trails… Mike Scott, School of English, University of Liverpool …challenges in investigating Keyness. Keyness in Text Conference, Certosa di Pontignano, Siena, 9:00-10:00, 29 June 2007. This presentation is at www.lexically.net/downloads/corpus_linguistics Keyness

  2. Purpose • To explore the notion of keyness • and its implications in corpus-based study • with reference to WordSmith

  3. Overview Keyness, as a new territory, looks promising and has attracted colonists and prospectors. It generally appears to give robust indications of the text’s aboutness together with indicators of style.

  4. the text’s aboutness

  5. colonists …

  6. and prospectors

  7. Issues • the issue of text section v. text v. corpus v. sub-corpus • statistical questions: what exactly can be claimed? • how to choose a reference corpus • handling related forms such as antonyms

  8. Machine and Human KWs • Rigotti and Rocci (2002) warn that machine identification of key words omits all interpretation of the writer’s intentions, cannot get at cultural implications and does not spot the congruity of the meanings of each section with the next.

  9. metaphors • “In our view, a natural language text, slippery and vague as it may be, is not a stone soup where words float free, tied only to their multiple associations within a Foucaultian discourse” (Rigotti and Rocci, 2002)

  10. Of course it doesn’t actually understand…

  11. … or know what is “correct”

  12. … it can only look at what is found in text … or context … whether marked up or not … <intro>Once upon a time …</intro>

  13. Context?

  14.

  15. If so • what is the status of the “key words” one may identify and what is to be done with them?

  16. Issues • the issue of text section v. text v. corpus v. sub-corpus • statistical questions: what exactly can be claimed? • how to choose a reference corpus • handling related forms such as antonyms • what is the status of the “key words” one may identify and what is to be done with them?

  17. text section v. text v. corpus v. sub-corpus • text section: levels 1-5 • text: level 6 • corpus: levels 7 & 8

  18. But these are often not clearly differentiated • “text”, level 6: with or without mark-up, images, sounds? • what do we mean by section, chapter (4) and other non-linguistically defined categories (Roustier: “passage”)? • is text itself mutating?

  19. Internet text

  20. Wikipedia homepage (part)

  21. Wikipedia homepage (part)

  22. Wikipedia article (3 parts of same article)

  23. Wikipedia discussion • from the “History of the stall” article • latest contributor, “Talk” section

  24. statistical issues • p value is a well-established standard, relying on the notion of chance, random effects • but • if you run lots of comparisons some will spuriously (by chance) appear significant • if we’re operating at the level of word or cluster, text itself doesn’t consist of randomly ordered words
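
The point about running lots of comparisons can be illustrated with a small simulation. This is a minimal sketch, not part of WordSmith: it assumes that when no real effect exists each comparison's p value is uniformly distributed between 0 and 1, and counts how many of 10,000 such null comparisons would still pass a 0.05 threshold. The function names are hypothetical, and the Bonferroni threshold (alpha divided by the number of tests) is shown only as one common, conservative correction. Note that the simulation itself assumes independent, random comparisons, which is exactly what the last bullet questions for words in text.

```python
import random

def simulate_null_pvalues(n_tests: int, seed: int = 1) -> list[float]:
    """Draw p values for n_tests comparisons in which no real effect exists.

    Assumption: under the null hypothesis a well-calibrated test gives
    p values uniformly distributed on [0, 1), so random() stands in for them.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_tests)]

def count_significant(p_values: list[float], alpha: float) -> int:
    """Count comparisons whose p value falls below the threshold alpha."""
    return sum(1 for p in p_values if p < alpha)

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold keeping the chance of any false positive near alpha."""
    return alpha / n_tests

if __name__ == "__main__":
    n = 10_000
    pvals = simulate_null_pvalues(n)
    # Expect roughly n * 0.05 = 500 "significant" results, all of them spurious.
    print("spurious at 0.05:", count_significant(pvals, 0.05))
    # With the much stricter per-test threshold almost none survive.
    print("spurious after Bonferroni:", count_significant(pvals, bonferroni_alpha(0.05, n)))
```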

  25. Implication • there is no statistical defence of the whole set of KWs • but only of each one • comparing KW p values is not advisable

  26. Why? Take a matrix text describing a series of troubles affecting a set of crops in a certain place. Weevils and chickpeas will be much rarer words (though not necessarily rarer entities in this particular place), so they will float to the top of the KW list.
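
The weevils-and-chickpeas effect shows up directly in a keyness score. The sketch below uses the log-likelihood formula commonly used for keyword comparison (expected frequencies taken from the pooled rate across text and reference corpus, then LL = 2 Σ O ln(O/E)); it is an independent illustration, not WordSmith's code, and the function name `keyness` is hypothetical. It returns a signed score: positive when the word is relatively more frequent in the text (positively key), negative otherwise. All frequencies and corpus sizes in the usage part are invented.

```python
import math

def keyness(freq_text: int, size_text: int, freq_ref: int, size_ref: int) -> float:
    """Signed log-likelihood keyness of one word (text vs reference corpus).

    Expected frequencies assume the word is spread over both corpora in
    proportion to their sizes. Positive result: relatively more frequent
    in the text; negative result: relatively rarer in the text.
    """
    total_freq = freq_text + freq_ref
    total_size = size_text + size_ref
    expected_text = size_text * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    def term(observed: int, expected: float) -> float:
        # By convention a zero observation contributes nothing (0 * ln 0 = 0).
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    ll = 2.0 * (term(freq_text, expected_text) + term(freq_ref, expected_ref))
    more_frequent_in_text = freq_text / size_text > freq_ref / size_ref
    return ll if more_frequent_in_text else -ll

if __name__ == "__main__":
    # Invented figures: a 2,000-word text and a 1,000,000-word reference corpus.
    for word, in_text, in_ref in [("weevils", 8, 4), ("crops", 20, 1000), ("said", 2, 3000)]:
        print(word, round(keyness(in_text, 2000, in_ref, 1_000_000), 1))
```

In this invented example "weevils" (about 1,000 times its reference rate) outscores "crops" (10 times its reference rate) even though "crops" occurs more often in the text, which is one reason why ranking or comparing keyness scores says little about relative importance.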

  27. choosing a reference corpus • using a mixed bag RC, the larger the RC the better but a moderate sized RC may suffice. • the keyword procedure is fairly robust. • KWs identified even by an obviously absurd RC can be plausible indicators of aboutness, which reinforces the conclusion that keyword analysis is robust. • genre-specific RCs identify rather different KWs • the aboutness of a text may not be one thing but numerous different ones. Scott (forthcoming)

  28. related forms • WordSmith can be asked to treat members of the same lemma as related • and can handle clusters (Biber: lexical bundles) • but otherwise ignores relations such as • synonymy • antonymy • collocation
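
Treating members of a lemma as related can be pictured as summing the frequencies of the forms listed under one headword. In WordSmith this is driven by lemma lists and user choices; the sketch below is only a hypothetical illustration with a hand-made table (`LEMMAS` and `lemmatised_counts` are invented for the example). Like the procedure described on the slide, it merges only the listed forms and knows nothing about synonymy, antonymy or collocation.

```python
from collections import Counter

# Hypothetical lemma table: headword -> word forms to be treated as related.
LEMMAS = {
    "play": {"play", "plays", "played", "playing"},
    "mother": {"mother", "mothers", "mother's"},
}

def lemmatised_counts(tokens: list[str], lemmas: dict[str, set[str]]) -> Counter:
    """Count lower-cased tokens, merging listed forms under their headword.

    Forms not listed under any headword are counted as themselves, so
    unlisted relatives (e.g. "players") stay separate.
    """
    form_to_head = {form: head for head, forms in lemmas.items() for form in forms}
    return Counter(form_to_head.get(tok.lower(), tok.lower()) for tok in tokens)

if __name__ == "__main__":
    print(lemmatised_counts(["The", "players", "played", "a", "play"], LEMMAS))
```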

  29. status of the KW • not intrinsic to the word/cluster but context-bound • a pointer to specific textual aboutness • and/or style • statistically arrived at but not established • sometimes pointing to a pattern

  30. status of the set of KWs • indicative of the more general aboutness of the source text(s) • and/or style • but (as a set) not statistically proven

  31. Shakespeare’s KWs

  32. KWs of Hamlet • Characters: FORTINBRAS, GERTRUDE, GUILDENSTERN, HAMLET, HAMLET'S, HORATIO, LAERTES, OPHELIA, PYRRHUS, ROSENCRANTZ • Places: DENMARK, NORWAY • Pronouns: I, IT, T, THEE, THOU • Themes, events: MADNESS, PLAY, PLAYERS • Other (“unexpected”): E'EN, LORD, MOST, MOTHER, PHRASE, VERY

  33. Most of these are obvious & probably uninteresting… • if you know the play you already know • it concerns Hamlet and some other characters • it’s set in Denmark • Ophelia goes mad.

  34. … but some are puzzling • Why are IT, LORD and MOST positively key in Hamlet… • if they are negatively key in the other plays? • For which characters are they most key? • Where are they found, and how are these KWs dispersed throughout the play?

  35. IT in Hamlet (1) • In the plays 0.95% (about 1 word in 100), but • in Hamlet’s speeches 1.48%: an increase of over 50% in this one character’s speeches… • in Horatio’s speeches 2.33%: nearly 250% of the average in this one character’s speeches.
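
The comparisons on this slide are ratios of relative frequencies. Worked out from the figures quoted on the slide alone:

```latex
\begin{aligned}
\text{Hamlet:}\quad  & \frac{1.48\%}{0.95\%} \approx 1.56 && \text{(about 56\% above the rate in the plays)}\\
\text{Horatio:}\quad & \frac{2.33\%}{0.95\%} \approx 2.45 && \text{(about 245\% of the rate in the plays)}\\
\text{overall:}\quad & \frac{1}{0.0095} \approx 105       && \text{(roughly 1 word in 100)}
\end{aligned}
```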

  36. IT in Hamlet (2) • In Hamlet’s speeches, distributed evenly: • In Horatio’s speeches:
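
Dispersion of the kind pictured on this slide can be approximated by dividing a character's speeches into equal-sized segments and counting the hits in each one. This is a hypothetical sketch of the idea, not WordSmith's own dispersion plot; the function name and the default of 8 segments are assumptions.

```python
def dispersion(tokens: list[str], target: str, n_segments: int = 8) -> list[int]:
    """Count occurrences of target in each of n_segments equal slices of tokens.

    An even spread gives similar counts in every segment; a clustered word
    gives a few high counts and many zeros.
    """
    counts = [0] * n_segments
    if not tokens:
        return counts
    target = target.lower()
    for position, token in enumerate(tokens):
        if token.lower() == target:
            # Map the token position onto a segment index in 0..n_segments-1.
            segment = position * n_segments // len(tokens)
            counts[segment] += 1
    return counts

if __name__ == "__main__":
    even = ["it", "x", "x", "x"] * 4
    clustered = ["it"] * 4 + ["x"] * 12
    print(dispersion(even, "it", 4), dispersion(clustered, "it", 4))
```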

  37. DO in Othello • Nearly twice as frequent as in the other plays • Characteristic of Iago (nearly twice as often) and Desdemona (more than 3 times as often) • DOST characteristic of Othello (more than 6 times as frequent)

  38. Iago: commanding

  39. Desdemona: conditional

  40. Othello’s DOST: questioning – suspicion

  41. Keyword Clusters • Text-initial sections of • “Hard News” (Guardian 1998-2004) • studying Hoey’s Lexical Priming theory

  42. Research Questions Using the hard news corpus, • How many 3-5 word clusters are found to be key in TISC sections? • How many are positively and how many are negatively key? • What recurrent patterns can be found in the two types of key cluster?
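
Counting 3-5 word clusters amounts to collecting every contiguous n-gram of those lengths. A minimal sketch, assuming simple whitespace tokenisation, lower-casing and clusters that do not cross sentence boundaries (all assumptions; WordSmith's own tokenisation and cluster settings differ in detail). The function name is hypothetical. Note that a 4-word cluster contains two 3-word clusters, so shorter clusters recur inside longer ones.

```python
from collections import Counter

def clusters(sentences: list[str], min_n: int = 3, max_n: int = 5) -> Counter:
    """Count every contiguous word sequence of length min_n..max_n.

    Each sentence is lower-cased and split on whitespace; clusters are
    never allowed to run across a sentence boundary.
    """
    counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(min_n, max_n + 1):
            for start in range(len(words) - n + 1):
                counts[" ".join(words[start:start + n])] += 1
    return counts

if __name__ == "__main__":
    sample = ["It was announced last night", "It was claimed last night", "It was announced today"]
    for cluster, freq in clusters(sample).most_common(3):
        print(freq, cluster)
```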

  43. RQs 1 & 2: Numbers of KW clusters • using the log likelihood statistic, a p value of 0.0000001 and a minimum frequency of 3: • 8,132 key clusters altogether (in 3.2 million words of text) • of which 7,631 were positively key • and 501 negatively key • though there is repetition, as these are 3-5 word n-grams
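
A p value threshold is usually applied to a log-likelihood score by comparing it with the corresponding critical value of the chi-squared distribution with one degree of freedom. The sketch below finds that value for p = 0.0000001 by bisection, using the fact that for one degree of freedom the upper-tail probability of x is erfc(√(x/2)), then keeps clusters meeting both the frequency and the significance thresholds. The function names and the candidate tuple format are hypothetical; the LL scores are assumed to have been computed already, and which frequency the minimum of 3 applies to is an assumption, since the slide does not say.

```python
import math

def chi2_1df_upper_tail(x: float) -> float:
    """P(X > x) for a chi-squared variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def critical_value(p: float, lo: float = 0.0, hi: float = 200.0) -> float:
    """Statistic at which the upper-tail probability falls to p, by bisection.

    The tail probability decreases as x grows, so the interval [lo, hi]
    is halved until it closes around the crossing point.
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_1df_upper_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return hi

def key_clusters(candidates, p: float = 0.0000001, min_freq: int = 3):
    """Keep (cluster, freq, ll) tuples passing both thresholds.

    freq: the frequency checked against min_freq (assumption, see lead-in).
    ll: signed log-likelihood; the sign (positive or negative keyness)
    is ignored when testing significance.
    """
    threshold = critical_value(p)
    return [
        (cluster, freq, ll)
        for cluster, freq, ll in candidates
        if freq >= min_freq and abs(ll) >= threshold
    ]

if __name__ == "__main__":
    print(round(critical_value(0.0000001), 2))
```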

  44. RQ 1: Numbers of KW clusters • Is 8 thousand a large number of distinct key text-initial clusters? • In the same amount of text there are 84 thousand 3-5 word clusters of frequency at least 5 altogether… • about one in 10 is associated with text-initial position at the .0000001 level of significance

  45. RQ 1, continued • … is 1 in 10 a large proportion to be key? • In the case of SISC (sentences from single-sentence paragraphs), we get • 507 thousand clusters, of which • 2,192 are key (1,747 positively and 445 negatively) • which is about 1 in 230
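
The "one in N" figures on the last two slides are simple ratios of all clusters to key clusters. Reproduced from the rounded totals quoted there (84 thousand and 507 thousand); the helper name is hypothetical:

```python
def one_in_n(total_clusters: int, key_clusters: int) -> float:
    """How many clusters there are for each key cluster."""
    return total_clusters / key_clusters

tisc = one_in_n(84_000, 8_132)    # text-initial sections
sisc = one_in_n(507_000, 2_192)   # sentences from single-sentence paragraphs
print(f"TISC: about 1 in {tisc:.0f}")
print(f"SISC: about 1 in {sisc:.0f}")
```

On these figures, key clusters are over twenty times more common among text-initial clusters than among single-sentence-paragraph clusters.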

  46. IT + reporting verb – positively key: • IT WAS ANNOUNCED LAST NIGHT • IT WAS CLAIMED LAST NIGHT • IT WAS CONFIRMED LAST NIGHT • IT IS REVEALED TODAY

  47. IT otherwise negatively key: • IT IS A • IT IS ABOUT • IT IS EXPECTED • IT IS GOING • IT IS ONLY • IT IS POSSIBLE • IT SEEMS TO

  48. Conclusions • keyness is a pointer • to importance • which can be • sub-textual • textual • intertextual

  49. References
    • Berber Sardinha, T., 1999. ‘Using Key Words in Text Analysis: practical aspects’. DIRECT Papers 42, LAEL, Catholic University of São Paulo.
    • Berber Sardinha, T., 2004. Lingüística de Corpus. Barueri: Manole.
    • Culpeper, J., 2002. ‘Computers, language and characterisation: An Analysis of six characters in Romeo and Juliet’. In U. Melander-Marttala, C. Östman & M. Kytö (eds.), Conversation in Life and in Literature: Papers from the ASLA Symposium, Association Suédoise de Linguistique Appliquée (ASLA), 15. Uppsala: Universitetstryckeriet, pp. 11-30.
    • Kemppanen, H., 2004. ‘Keywords and Ideology in Translated History Texts: A Corpus-based Analysis’. Across Languages and Cultures 5 (1), pp. 89-106.
    • Rigotti, E. & Rocci, A., 2002. ‘From Argument Analysis to Cultural Keywords (and back again)’. In F. H. van Eemeren et al. (eds.), Proceedings of the 5th Conference of the International Society for the Study of Argumentation. Amsterdam: SicSat, pp. 903-908. http://www.ils.com.unisi.ch/articoli-rigotti-rocci-keywords-published.pdf (accessed May 2007).
    • Scott, M., 1996 (new versions 1997, 1999, 2004). WordSmith Tools. Oxford: Oxford University Press.
    • Scott, M., 1997a. ‘PC Analysis of Key Words -- and Key Key Words’. System 25 (1), pp. 1-13.
    • Scott, M., 1997b. ‘The Right Word in the Right Place: Key Word Associates in Two Languages’. AAA - Arbeiten aus Anglistik und Amerikanistik 22 (2), pp. 239-252.
    • Scott, M., 2000a. ‘Focusing on the Text and Its Key Words’. In L. Burnard & T. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective, Volume 2. Frankfurt: Peter Lang, pp. 103-122.
    • Scott, M., 2000b. ‘Reverberations of an Echo’. In B. Lewandowska-Tomaszczyk & P. J. Melia (eds.), PALC’99: Practical Applications in Language Corpora. Lodz Studies in Language, Volume 1. Frankfurt: Peter Lang, pp. 49-68.
    • Scott, M., 2001. ‘Mapping Key Words to Problem and Solution’. In M. Scott & G. Thompson (eds.), Patterns of Text: in honour of Michael Hoey. Amsterdam: Benjamins, pp. 109-127.
    • Scott, M., 2002. ‘Picturing the key words of a very large corpus and their lexical upshots – or getting at the Guardian’s view of the world’. In B. Kettemann & G. Marko (eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi, pp. 43-50 and CD-ROM within the cover of the book.
    • Scott, M., 2006. ‘The Importance of Key Words for LSP’. In E. Arnó Macià, A. Soler Cervera & C. Rueda Ramos (eds.), Information Technology in Languages for Specific Purposes: issues and prospects. New York: Springer, pp. 231-243.
    • Scott, M., forthcoming. ‘In Search of a Bad Reference Corpus’. AHRC Methods Network.
    • Scott, M. & Tribble, C., 2006. Textual Patterns: keyword and corpus analysis in language education. Amsterdam: Benjamins.
    • Seale, C., Charteris-Black, J. & Ziebland, S., 2006. ‘Gender, cancer experience and internet use: a comparative keyword analysis of interviews and online cancer support groups’. Social Science and Medicine 62 (10), pp. 2577-2590.
    • Tribble, C., 1999. ‘Genres, keywords, teaching: towards a pedagogic account of the language of project proposals’. In L. Burnard & A. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching and Language Corpora (Lodz Studies in Language). Hamburg: Peter Lang.
