The Armchair and the Machine Corpus-Assisted Discourse Studies Alan Partington Lorient 14/09/07
Corpus-Assisted Discourse Studies (CADS) • What does CADS do? • Examples (politics & media) & • Types of research questions / methodologies • Teaching material?
“two types of linguist” the Armchair linguist … “sits in a deep soft comfortable armchair, with his eyes closed and his hands clasped behind his head. Once in a while he opens his eyes, sits up abruptly shouting, “Wow, what a neat fact!”, grabs his pencil, and writes something down. Then he paces around for a few hours in the excitement of having come still closer to knowing what language is really like.” Introspection
“two types of linguist” the Corpus linguist … “has all the primary facts that he needs, in the form of approximately one zillion running words, and he sees his job as that of deriving secondary facts from his primary facts. At the moment he is busy determining the relative frequencies of the eleven parts of speech as the first word of a sentence” Data observation
“two types of linguist” however “These two don’t speak to each other very often, but when they do the corpus linguist says to the armchair linguist, ‘Why should I think that what you tell me is true?’, and the armchair linguist says to the corpus linguist, ‘Why should I think that what you tell me is interesting?’” (Fillmore)
Four stages of science • respect for authority (generally Scripture and Aristotle) • rationalist introspection (Descartes: cogito ergo sum - I introspect therefore I am) • “observationism” and distrust of theory (Bacon: ‘The intellect, left to itself, ought always to be suspected’) • the mutually reinforcing hermeneutic interaction of theory and observation
Psycho- & Socio- …corpus linguists have so far contributed little to answering classic questions of cognitive and social theory; they have hardly considered the relevance of corpus evidence to questions about the mental lexicon and the construction of the social world (though one of Halliday’s central topics) (Stubbs 2006: 15)
Speculation Stubbs 2006: …could be related …may be reducible… may also be internally related … seems to show … might also provide … show how we could do real ‘ordinary language philosophy’ …
Interdependence: technology & theory of machine and mind New instruments lead to New ways of observing lead to New ways of thinking
New instruments = grinding of lenses (Galileo, Spinoza) lead to New ways of observing = astronomy lead to New ways of thinking = model of universe
New instruments = radio transmitter, receiver lead to New ways of observing = radio-telescopy lead to New ways of thinking = theory of creation
New instruments = corpora lead to New ways of observing = inductive data-driven lead to New ways of thinking = lexical grammar
What does CADS do? It investigates (and compares) discourse types (DTs), looking for ‘non-obvious’ meanings, so as to “not get caught in using corpora just to tell you more about what you know already” (Sinclair 2004: 183)
It combines Corpus Linguistics: data crunching, a statistical OVERVIEW (very quickly), a “quantitative” approach (“general” language dictionaries, grammars); and Discourse Analysis: DETAILED analysis, even of single texts, a “qualitative” approach
“Traditional” Corpus Linguistics vs CADS
Traditional Corpus Linguistics: • Very large ‘general’ (heterogeneric) corpora: BNC, BoE CADS: • Compile your own ‘specialized’ corpus/corpora • Comparison: what are the particular features of a discourse type, DT(a)? Compare DT(a) – DT(b) – DT(n) Compare DT(a) – BNC / BoE
Traditional CL: Corpus: “Black box” – Keep out!
CADS: Make friends with our corpus Detailed knowledge of DT: • Frequency information > Concordancing • Reading / watching / listening to corpus-held DT tokens • Intuitions • “External” data (esp. in politics and media): interviews with protagonists; official documents
Beginnings Hardt-Mautner (1995) Stubbs (1996; 2001) Teubert, Mahlberg ITALY: Newspool: Partington, Morley & Haarman (eds) 2004 CorDis: Morley & Bayley (eds) forthcoming Intune
FRANCE “I’ve been doing CADS for years and never knew it” (Geoffrey Williams, Siena 2006)
What’s been done? Berlusconi’s election speeches (Garzone & Santulli 2004) Word lists (WordSmith): Italia; stato; libertà Concordanced
What’s been done? Lo stato when it is run by the Left: autoritario, burocratico, invasivo, moloch, padrone, stato-partito (authoritarian, bureaucratic, invasive, moloch, bossy, a party-state)
What’s been done? Lo stato when treated to the Forza Italia cure becomes: amico, civile, di diritto, liberale, moderno (friend, civilised, lawful, liberal, modern)
What’s been done? Libertà is the third most frequent noun; but it is rarely attached to an individual in the co-text. Whose liberty?
Research question type 1 How does P achieve G with language? What does this tell us about P? Comparative: how do P1 and P2 differ?
September 11th C2001: Sept 11-18 2001, 150,000 words (Times, Independent, Telegraph, Guardian) C2002: Sept 11-18 2002, 150,000 words (Times, Independent, Telegraph, Guardian) WordSmith Keywords
September 11th world (468 - 136): • an attack on the whole civilised world • convinced the world is its enemy • the world will never be the same global dimension, attack on the international community, not just USA
September 11th war (351 - 60) • a totally new kind of war, acts of war, the first war of the 21st century, (or simply) this war Reaction must be: declare war on terrorism, launch an international war
September 11th enemy (106 - 20) • ghostlike global enemy, shadowy enemy, not a clearly defined enemy, absence of a tangible enemy Collocates: semantic preference for the unknown
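The frequency pairs on these slides can be turned into a keyness score. The sketch below uses the two-corpus log-likelihood statistic, one commonly used keyness measure; the slides do not say which statistic or settings were used in WordSmith, so this is an illustration rather than a reproduction. It assumes that each pair, e.g. (468 - 136), gives the raw frequency in C2001 followed by the raw frequency in C2002, and that each corpus holds 150,000 words as stated. The function name `log_likelihood` is hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for a single item.

    freq_a, freq_b: raw frequencies of the item in corpus A and corpus B.
    size_a, size_b: total running words in corpus A and corpus B.
    """
    total = freq_a + freq_b
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


CORPUS_SIZE = 150_000  # words per corpus, as stated on the slide

# ASSUMPTION: each pair is (frequency in C2001, frequency in C2002).
counts = {"world": (468, 136), "war": (351, 60), "enemy": (106, 20)}

for word, (f2001, f2002) in counts.items():
    g2 = log_likelihood(f2001, CORPUS_SIZE, f2002, CORPUS_SIZE)
    print(f"{word:6s} {f2001:4d} {f2002:4d}  G2 = {g2:.1f}")
```

A score above 3.84 corresponds to p < 0.05 at one degree of freedom. The score only says that a word is unusually frequent in one corpus; what it is doing there still has to be read off the concordance lines.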
September 11th in- and un- words: inconceivability: • what was once thought inconceivable • an unimaginable tragedy • the unthinkable has happened inexpressibility: • unspeakable horror of today’s inhuman terrorist attacks, unspeakable sadness • untold hundreds ... of dead and injured
September 11th • incalculable, unfathomable • incredible, incredulity • unbearable, intolerable • “…surpassing the collective ability to understand and feel” (Blair)
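A prefix search is one way to collect candidate in- and un- words before reading them in context. The sketch below is a minimal illustration, not the procedure used for these slides: the function name `prefix_candidates`, the regular expression and the minimum stem length are all assumptions, and the stand-in corpus is made up of phrases quoted elsewhere in this presentation.

```python
import re

# Phrases quoted on the slides, used here as a tiny stand-in corpus.
TEXT = """
unspeakable horror of today's inhuman terrorist attacks
untold hundreds of dead and injured
an attack on the international community
surpassing the collective ability to understand and feel
"""


def prefix_candidates(text, prefixes=("in", "un"), min_stem=3):
    """Return the sorted word types that begin with one of the prefixes.

    min_stem is the minimum number of letters after the prefix
    (an arbitrary illustrative threshold).
    """
    pattern = re.compile(r"(?:%s)[a-z]{%d,}" % ("|".join(prefixes), min_stem))
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted({t for t in tokens if pattern.fullmatch(t)})


candidates = prefix_candidates(TEXT)
print(candidates)
# ['inhuman', 'injured', 'international', 'understand', 'unspeakable', 'untold']
```

The list mixes genuine negative-prefix items (inhuman, unspeakable, untold) with false hits (injured, international, understand), so a search like this can only propose candidates; sorting them into sets such as inconceivability and inexpressibility remains a job for the analyst.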
TYPICAL CADS METHODOLOGY • Step 1: Design, unearth, stumble upon research question • Step 2: Choose, edit or compile an appropriate corpus • Step 3: Choose, edit or compile an appropriate reference corpus / corpora
TYPICAL CADS METHODOLOGY • Step 4: Run a Keywords comparison of the corpora • Step 5: Determine the existence of sets of key items (by eye and brain) • Step 6: Concordance interesting key items (varying quantities of co-text: sentence, ‘chunk’)
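Step 6 can be sketched as a simple keyword-in-context (KWIC) routine. This is a minimal illustration under stated assumptions, not how WordSmith builds its concordances: the function name `kwic` and the `width` parameter (number of words of co-text on each side) are hypothetical, tokenisation is a bare regular expression, and the sample lines are the "world" phrases quoted on an earlier slide.

```python
import re

# Phrases quoted on the "world" slide, one text unit per entry.
LINES = [
    "an attack on the whole civilised world",
    "convinced the world is its enemy",
    "the world will never be the same",
]


def kwic(lines, node, width=3):
    """Return (left, node, right) tuples for each hit of `node`.

    width: number of words of co-text kept on each side; co-text never
    crosses the boundary of a line.
    """
    hits = []
    for line in lines:
        tokens = re.findall(r"\w+", line.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


for left, node, right in kwic(LINES, "world", width=3):
    # Right-align the left co-text so the node words line up in a column.
    print(f"{left:>25}  [{node}]  {right}")
```

Changing `width` is a rough stand-in for the "varying quantities of co-text" mentioned in Step 6; a sentence- or chunk-based view would need the corpus to be segmented into those units first.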