English Corpus Linguistics
Introducing the Diachronic Corpus of Present-Day Spoken English (DCPSE)
Sean Wallis, UCL
Barber (1964): changes in English grammar a. A tendency to regularize irregular morphology (e.g. dreamt – dreamed); b. A revival of the “mandative” subjunctive, probably inspired by formal US usage (we demand that she take part in the meeting); c. Elimination of shall as a future marker in the first person; d. Development of new, auxiliary-like uses of certain lexical verbs (e.g. get, want – cf., e.g., The way you look, you wanna / want to see a doctor soon); e. Extension of the progressive to new constructions, e.g. modal, present perfect and past perfect passive progressive (the road would not be being built / has not been being built / had not been being built before the general elections); f. Increase in the number and types of multi-word verbs (phrasal verbs, have/take/give a ride, etc.); g. Placement of frequency adverbs before auxiliary verbs (even if no emphasis is intended – I never have said so); h. Do-support for have (have you any money? and no, I haven’t any money – do you have / have you got any money? and no, I don’t have any money / haven’t got any money)…
The Diachronic Corpus of Present-day Spoken English (DCPSE) • Orthographically transcribed spoken BrE • Fully parsed • every ‘sentence’ has a tree diagram • searchable with ICECUP and FTFs • 400,000+ words each from • London-Lund Corpus (aka the ‘Survey Corpus’) • ICE-GB • Balanced by text category • Not evenly distributed by year • LLC: samples from 1958-1977 • ICE-GB: 1990-1992
Tree diagrams A tree diagram for the sentence We’re getting there.
Barber on shall and will • [T]he distinctions formerly made between shall and will are being lost, and will is coming increasingly to be used instead of shall. One reason for this is that in speech we very often say neither [will] nor [shall], but just [’ll]: I’ll see you to-morrow, we’ll meet you at the station, John’ll get it for you. We cannot use this weak form in all positions (not at the end of a phrase, for example), but we use it very often; and, whatever its historical origin may have been (probably from will), we now use it indiscriminately as a weak form for either shall or will; and very often the speaker could not tell you which he had intended. There is thus often a doubt in a speaker’s mind whether will or shall is the appropriate form; and, in this doubt, it is will that is spreading at the expense of shall, presumably because will is used more frequently than shall anyway, and so is likely to be the winner in a levelling process. So people nowadays commonly say or write I will be there, we will all die one day, and so on, when they intend to express simple futurity and not volition. (Barber 1964: 134)
Denison on shall and will • During the latter part of our period [1776-present day] ... in the first person shall has increasingly been replaced by will even where there is no element of volition in the meaning. (Denison 1998: 167)
The use of shall and will in written British and American English from the 1960s and 1990s

BrE      LOB     FLOB    LL     diff %
will     2,798   2,723   1.2    -2.7%
shall    355     200     44.3   -43.7%

AmE      Brown   Frown   LL     diff %
will     2,702   2,402   17.3   -11.1%
shall    267     150     33.1   -43.8%

• Figures are frequencies normalised per million words • The log-likelihood (LL) test is performed against the number of words
From: Mair and Leech (2006: 327)
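The diff % column can be checked directly from the two frequency columns. The sketch below assumes diff % is the change from the earlier corpus to the later one, expressed as a percentage of the earlier frequency; the function and variable names are illustrative, not taken from Mair and Leech. The LL column is not recomputed, because that would need the raw corpus sizes, which the slide does not give.

```python
# Per-million-word frequencies copied from the table (Mair and Leech 2006: 327).
# Each value is (1960s corpus, 1990s corpus).
FREQ_PMW = {
    ("BrE", "will"): (2798, 2723),   # LOB, FLOB
    ("BrE", "shall"): (355, 200),
    ("AmE", "will"): (2702, 2402),   # Brown, Frown
    ("AmE", "shall"): (267, 150),
}


def percent_difference(earlier, later):
    """Change from earlier to later as a percentage of the earlier value."""
    return 100.0 * (later - earlier) / earlier


for (variety, word), (earlier, later) in FREQ_PMW.items():
    print(f"{variety} {word}: {percent_difference(earlier, later):+.1f}%")
```

Note that this measures change in frequency per million words, not change in the rate at which speakers choose shall when they could have chosen will.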
Mair and Leech’s data • Simply counts tagged lexical tokens • Will = auxiliary verb, includes ’ll • Shall = auxiliary verb • Includes negative forms • Does not distinguish by grammatical position or context • Does not ask whether the choice is available, e.g. limit to first person use • Does not consider subclasses separately • Negative cases: will not/won’t vs. shall not/shan’t? • Do interrogative cases behave differently? • Is written data only • Can we do better than this?
An FTF for first person declarative shall • This FTF is limited to first person cases • The FTF requires that the NP is realised by the pronoun I or we. • Interrogative cases have a different structure • We can subtract negative (shall not) cases to exclude them.
Shall vs. will • Does the proportion of cases of shall out of {shall, will} change over time? • χ² for first person subject; shall vs. will

          shall   will   Total   χ²(shall)   χ²(will)
LLC       110     78     188     1.32        1.45
ICE-GB    40      58     98      2.53        2.79
TOTAL     150     136    286     3.85        4.24

Summary: d% = -30.24% ±20.84%; φ = 0.17; χ² = 8.09

• d% = percentage difference (a 30% fall in shall between LLC and ICE-GB) • φ = an estimate of the size of the overall effect (a bit like d%) • χ² = 2×2 chi-square test: is this change statistically significant? • χ²(shall) = 2×1 goodness of fit test: does shall behave differently to average?
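The figures on this slide can be reproduced from the four observed counts. The sketch below is a plain chi-square calculation without continuity correction. It assumes that the effect size is the standard 2×2 measure sqrt(chi-square / N) and that d% is the relative change in p(shall) from LLC to ICE-GB. The function name is illustrative, and the interval quoted around d% is not recomputed because the slide does not state how it was derived.

```python
from math import sqrt


def chi_square_cells(table):
    """Return each cell's contribution (O - E)^2 / E for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(obs - r * c / n) ** 2 / (r * c / n) for obs, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


# Rows: LLC, ICE-GB. Columns: shall, will (first person declarative cases).
observed = [[110, 78], [40, 58]]
cells = chi_square_cells(observed)

chi2_shall = sum(row[0] for row in cells)  # column total for shall
chi2_will = sum(row[1] for row in cells)   # column total for will
chi2 = chi2_shall + chi2_will              # 2x2 chi-square

n = sum(sum(row) for row in observed)
phi = sqrt(chi2 / n)                       # effect size (assumed definition)

p_llc = observed[0][0] / sum(observed[0])  # p(shall) in LLC
p_ice = observed[1][0] / sum(observed[1])  # p(shall) in ICE-GB
d_percent = 100.0 * (p_ice - p_llc) / p_llc

print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, d% = {d_percent:.2f}%")
```

With one degree of freedom the 5% critical value of chi-square is 3.841, so a total above that value indicates a significant change.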
Shall vs. will/’ll • Does the proportion of cases of shall out of {shall, will, ’ll} change over time? • χ² for first person subject; shall vs. will vs. ’ll

          shall   will   ’ll    Total   χ²(shall)   χ²(will)   χ²(’ll)
LLC       104     69     371    544     9.98        0.13       2.33
ICE-GB    36      52     365    453     11.98       0.16       2.80
TOTAL     140     121    736    997     21.96       0.30       5.13

• χ²(shall) = 2×1 goodness of fit test: does shall behave differently to average?
Focusing on choice • We focused on the choice of shall vs. will • Mair and Leech simply said that total cases of shall fell • But this might have happened for other reasons • For example there may have been more opportunities to use shall in the LLC data • Examining choice is a more precise way of conducting experiments than counting frequencies • It allows us to consider what variables (time, genre, other choices) affect the probability of shall being chosen • Probability is a simple fraction from 0 to 1. • p(shall) = F(shall) / (F(shall) + F(will) + …)
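As a minimal sketch of the probability above, the code below computes p(shall) against two different sets of alternatives, using the first person counts from the shall vs. will/’ll slide. The helper names are illustrative. It shows how the estimate depends on which alternatives are counted in the denominator.

```python
def proportion(counts, target):
    """p(target) = F(target) / sum of F over all alternatives in counts."""
    return counts[target] / sum(counts.values())


def restrict(counts, keep):
    """Keep only the named alternatives, i.e. change the baseline."""
    return {form: freq for form, freq in counts.items() if form in keep}


# First person counts copied from the shall vs. will/'ll slide.
llc = {"shall": 104, "will": 69, "'ll": 371}
ice_gb = {"shall": 36, "will": 52, "'ll": 365}

for name, counts in (("LLC", llc), ("ICE-GB", ice_gb)):
    narrow = proportion(restrict(counts, {"shall", "will"}), "shall")
    wide = proportion(counts, "shall")
    print(f"{name}: p(shall | shall, will) = {narrow:.3f}; "
          f"p(shall | shall, will, 'll) = {wide:.3f}")
```

Both baselines show a fall in p(shall), but the size of the probabilities differs a great deal, which is why the choice of baseline has to be stated with any result.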
Confidence intervals • Probability p(shall): 0 = no cases are of type shall 1 = all cases are of type shall • Our sample is a tiny subset of possible sentences from the same period • So we cannot say a particular observation is certain • Instead we try to estimate our confidence in an observation using error bars or confidence intervals • The more data we have supporting an observation p, the smaller the confidence interval around it • We set a confidence level, typically of 95% • we are 95% sure that the true value is within the interval
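The slide does not say which interval formula is used. As one possibility, the sketch below computes a Wilson score interval, a common choice for a proportion because it stays within 0 and 1. The function name and the example counts (110 cases of shall out of 188, taken from the shall vs. will slide) are used here only for illustration.

```python
from math import sqrt

# Two-tailed critical value of the standard Normal distribution for a 95% level.
Z_95 = 1.959964


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval (lower, upper) for the proportion successes / n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# p(shall) in LLC: 110 cases of shall out of 188 first person cases.
low, high = wilson_interval(110, 188)
print(f"p = {110 / 188:.3f}, 95% interval = ({low:.3f}, {high:.3f})")

# Ten times as much data with the same proportion gives a narrower interval.
low_big, high_big = wilson_interval(1100, 1880)
print(f"with 10x data: ({low_big:.3f}, {high_big:.3f})")
```

The second call illustrates the point made on the slide: the more data supporting an observation, the smaller the interval around it.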
Modal meaning • Remember Barber and Denison. Not all cases of shall or will mean the same thing • Root (futurity): • I’ve got some at home so I shall take it home. [DI-A18 #30] • I will answer you in a minute. [DI-B30 #293] • Epistemic (volition): • So I shall have roughly from the twenty-ninth of June to the eighth of July on which I can spend the whole of that time on those two papers. [DL-B01 #62] • It’s certainly my long term hope that I will have some kind of companion... [DI-B53 #0257] • We should examine these choices separately • Unfortunately this means classifying cases manually
Modal meaning: statistics

                 Root   %       Epistemic   %           Unclear   %      Total
shall   LLC      33     30.84   72          67.29       2         1.87   107
        ICE-GB   22     59.46   14          37.84 sig   1         2.70   37
will    LLC      44     55.70   28          35.44       7         8.86   79
        ICE-GB   37     66.07   14          25.00       5         8.93   56
Total            136            128 sig                 15               279

• Root shall / will is stable: results are not significant • Epistemic shall / will falls (d% = -30% ±27%) • The fall in shall is not explained by the sharp fall in Epistemic modals overall, from 100 (72+28) to 28 (14+14) • This is evidence that the shift in use in C20 is concentrated within Epistemic meanings, from shall to will. • Barber and Denison: earlier shift was in Root (future) meaning.
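The two significance claims can be checked by running a separate 2×2 test for each meaning. The sketch below assumes the test is an uncorrected 2×2 chi-square of corpus (LLC, ICE-GB) against form (shall, will), evaluated at the 5% level. The slide does not state this explicitly, and the names used are illustrative. Unclear cases are left out.

```python
# Chi-square critical value for 1 degree of freedom at the 5% error level.
CRITICAL_05_DF1 = 3.841


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# (LLC shall, LLC will, ICE-GB shall, ICE-GB will), copied from the table.
COUNTS = {
    "Root": (33, 44, 22, 37),
    "Epistemic": (72, 28, 14, 14),
}

for meaning, (a, b, c, d) in COUNTS.items():
    chi2 = chi_square_2x2(a, b, c, d)
    p_llc = a / (a + b)   # p(shall) among shall/will cases in LLC
    p_ice = c / (c + d)   # p(shall) among shall/will cases in ICE-GB
    d_percent = 100.0 * (p_ice - p_llc) / p_llc
    verdict = "significant" if chi2 > CRITICAL_05_DF1 else "not significant"
    print(f"{meaning}: chi2 = {chi2:.2f} ({verdict}), d% = {d_percent:.1f}%")
```

Under these assumptions the Root table falls below the critical value and the Epistemic table exceeds it, in line with the bullets above.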
Modal meaning: statistics (continued) • Shall is losing its particular Epistemic meaning as a result • In the LLC data two thirds (67%) of shall uses were Epistemic. • This fell to 38% (just over one third) in ICE-GB.
Conclusions • DCPSE is • orthographically transcribed spoken English • mostly spontaneous • fully parsed and checked by linguists, uses phrase structure grammar based on Quirk et al. • searchable with ICECUP and FTFs • Even lexical studies benefit from parsing • allows us to focus on when a choice occurs • You can use DCPSE to carry out many different experiments on real English • we looked at change over (recent) time • we might also look at how decisions interact
Conclusions • Designing a Corpus Linguistic experiment means thinking carefully about your hypothesis and then attempting to test it against the corpus • We examined the shift from shall to will • We limited it to first person, declarative, positive cases • Changing baselines (including ’ll) may lead to different conclusions • Many corpus studies only consider word baselines (or pmw) • But it is often better to consider proportions of types of clause or phrase, or list specific alternative choices • Alternation (choice) studies aim to hold meaning constant so the speaker/writer is free to choose between both cases: • We focused further by subdividing data by modal meaning
Suggested further reading • On shall vs. will and the progressive: • Aarts, B., Close, J. and Wallis, S.A. (forthcoming) Choices over time: methodological issues in investigating current change. In: B. Aarts et al. The changing Verb Phrase. Cambridge: CUP. • www.ucl.ac.uk/english-usage/projects/verb-phrase/book/aartsclosewallis.pdf • Barber, C. (1964) Linguistic Change in Present-Day English. Edinburgh and London: Oliver and Boyd. • Denison, D. (1998) Syntax. In: S. Romaine (ed.) The Cambridge History of the English Language. IV: 1776-1997. Cambridge: Cambridge University Press. 92-329. • Mair, C. and Leech, G. (2006) Current changes in English syntax. In: B. Aarts and A. McMahon (eds.) The Handbook of English Linguistics. Malden MA: Blackwell Publishers. 318-342. • On statistical tests, confidence intervals and other methods: • Wallis, S.A. (2010) z-squared: the origin and use of χ². Survey of English Usage, UCL. • www.ucl.ac.uk/english-usage/statspapers/z-squared.pdf