This presentation discusses network patterns of keyness in texts and how they may relate to a user's needs and interests. It covers key words, aboutness and the distribution patterns of key words, and asks whether key-word patternings are fractal. It also considers machine-identified keyness as a basis for corpus-driven research and the dispersion of global and local key words within a text. Finally, it examines key key words and their associates within a corpus, with association strength measured by a collocation statistic (MI3).
Networks of Key Words
Mike Scott, Aston University
INWWCT, Trondheim, October 3rd, 2011
Abstract: The notion of keyness is important for document retrieval, for language learning and for the study of the nature of text. Keyness, a textual rather than a linguistic quality, may be shared by certain words and phrases in one text, but its patterning is further distributed, across text sets of various dimensions, in associates (Scott, 1997) and in clustering. This presentation considers the network patterns of keyness, which can be investigated using quite simple software procedures, and the extent to which these patternings may relate to a user's needs and interests.
Scott, M., 1997, "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.
Key words (KWs): Issues • Keyness • Aboutness • Distribution patterns of KWs • … in texts and across corpora
Fractal: A fractal is "a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole,"[1] a property called self-similarity (Wikipedia). [1] Mandelbrot, B.B. (1982). The Fractal Geometry of Nature. W.H. Freeman and Company.
Keyness: aboutness • importance • a textual category
Aboutness: what the text is about • what the message is • what it all means • (picture from mindreadersdictionary.com)
importance centrality
PC identification of KWs: simple verbatim repetition • no allowance for anaphora, synonymy, antonymy, etc. • simple frequency threshold • one word, or more than one?
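The identification procedure summarised above (verbatim word forms, a simple frequency threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not WordSmith's implementation: it assumes keyness is computed by comparing a text with a reference corpus supplied as word counts, and it uses Dunning-style log-likelihood as the keyness statistic. The function names `tokenize`, `log_likelihood` and `key_words`, the `min_freq=3` cut-off and the 6.63 critical value (p < 0.01, 1 d.f.) are hypothetical, illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Verbatim word forms only: lower-cased runs of letters and apostrophes.
    # No lemmatisation, anaphora or synonym handling (the slide's caveats).
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen a times in a text of c tokens
    and b times in a reference corpus of d tokens (assumed statistic)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_words(text_tokens, ref_counts, ref_total, min_freq=3, threshold=6.63):
    """Return (word, LL) pairs for positively key words, strongest first.
    min_freq and threshold are hypothetical defaults."""
    counts = Counter(text_tokens)
    total = len(text_tokens)
    result = []
    for word, a in counts.items():
        if a < min_freq:  # simple frequency threshold
            continue
        b = ref_counts.get(word, 0)
        # Keep only words relatively more frequent in the text than in the reference.
        if a / total <= b / ref_total:
            continue
        ll = log_likelihood(a, b, total, ref_total)
        if ll >= threshold:
            result.append((word, ll))
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

Because matching is purely verbatim, *change* and *changes* are scored as separate words, which reflects the slide's point that PC identification rests on simple repetition of identical forms.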
Corpus-based or corpus-driven? Machine-identified keyness is ideal for corpus-driven research • The researcher lets the PC suggest areas needing further chasing up • See recent work by McEnery, Baker, etc.
Middling burstiness: verbs such as appears, begins, puts, observes, replies, continues, says, considers, etc.
Key Key Words: A "key key-word" is one which is "key" in more than one of a number of related texts. • The more texts it is "key" in, the more "key key" it is.
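The key-key-word definition amounts to a tally over per-text key-word lists, which can be sketched as below. The sketch assumes each text's key words have already been identified by whatever keyness procedure is in use and are passed in as one collection per text; `key_key_words` and its `min_texts` parameter are hypothetical names.

```python
from collections import Counter


def key_key_words(kw_sets, min_texts=2):
    """Return (word, n_texts) pairs for words key in at least min_texts texts.

    kw_sets: one collection of key words per text (hypothetical input format).
    Higher n_texts means a more "key key" word.
    """
    counts = Counter()
    for kws in kw_sets:
        counts.update(set(kws))  # count each text at most once per word
    selected = [(w, n) for w, n in counts.items() if n >= min_texts]
    # Most "key key" first; ties broken alphabetically for stable output.
    return sorted(selected, key=lambda pair: (-pair[1], pair[0]))
```

For example, with key-word sets `{"climate", "carbon", "emissions"}`, `{"climate", "carbon", "flood"}` and `{"climate", "ice"}`, *climate* is key in three texts and *carbon* in two, so both qualify and *climate* ranks first.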
Associates: An "associate" of key-word X is another key-word (Y) which co-occurs with X in a number of texts. • (It may or may not co-occur in proximity to key-word X.) • Association strength is measured using a standard collocation statistic, here MI3.
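A text-level reading of the associate definition can be sketched as follows. The slide names MI3 but not how the counts are fed into it, so this sketch assumes: the observed count O is the number of texts in which both X and Y are key; the expected count is E = f(X) * f(Y) / N, where f is the number of texts a word is key in and N is the number of texts; and MI3 = log2(O^3 / E). The `associates` function and its `min_cooccur` threshold are hypothetical. Proximity plays no part, consistent with the note that Y need not occur near X.

```python
import math
from collections import Counter


def associates(kw_sets, target, min_cooccur=2):
    """Rank associates of `target` by MI3 over text-level co-occurrence.

    kw_sets: one collection of key words per text (hypothetical input format).
    Returns (associate, mi3) pairs, strongest first.
    """
    n_texts = len(kw_sets)
    sets = [set(k) for k in kw_sets]
    # f(w): number of texts in which each word is key.
    text_freq = Counter()
    for s in sets:
        text_freq.update(s)
    # O: number of texts in which the target and each other word are both key.
    cooccur = Counter()
    for s in sets:
        if target in s:
            cooccur.update(s - {target})
    result = []
    for other, observed in cooccur.items():
        if observed < min_cooccur:  # require co-occurrence in several texts
            continue
        expected = text_freq[target] * text_freq[other] / n_texts
        mi3 = math.log2(observed ** 3 / expected)
        result.append((other, mi3))
    return sorted(result, key=lambda pair: (-pair[1], pair[0]))
```

Cubing the observed count damps plain MI's tendency to favour rare pairs, so key words that co-occur with X across many texts rank above those that co-occur only once or twice.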
Climate change: LexisNexis database • 9,444 stories • UK press • 2010 • "climate change"
Conclusions: KW patterns within individual texts • within the corpus or sub-corpus • but early days, lots of questions: • are any KW patternings fractal? • do specialised corpora have specialised KKWs?