«A chi-square test showed that ...» – or did it really?

Bård Uri Jensen
http://privat.hihm.no/buj/
bard.jensen@hihm.no

Allowing [statistical software] to do our thinking is a sure recipe for disaster. (Good & Hardin, 2012, p. xi)

«Simple» statistical tests
• chi-square (χ²) test
• t-test

Statistical hypothesis testing
• Formulate a hypothesis
  • E.g. In Norwegian L2, Vietnamese have more TENSE errors than Somalis.
• Formulate a null hypothesis
  • Vietnamese and Somalis have the same rate of TENSE errors.
• «Disprove» the null hypothesis = demonstrate its unlikelihood
  • E.g. less than a 5% chance of obtaining data at least this extreme if the null hypothesis were true
  • = «Significance»
• We choose α according to what we consider an acceptable risk of false conclusions
  • Often 5% in linguistic research
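The decision rule on this slide can be written compactly. The symbols T (the test statistic), t_obs (its observed value) and H_0 (the null hypothesis) are notation assumed here, not taken from the slides; this is a sketch of the usual one-sided formulation.

```latex
p = \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right),
\qquad \text{reject } H_0 \text{ if } p < \alpha
```

Note that p is computed under the assumption that H_0 holds; it is not the probability that H_0 is true.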

Conditions of use
• Independent observations
  • chi-square test
  • t-test
• Parametric assumptions
  • t-test
• The dangers of repeated testing
  • any test
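Why repeated testing is dangerous can be seen from a short calculation. As a sketch, assume m independent tests, each run at level α, with every null hypothesis true (m and the independence of the tests are assumptions made for illustration). The chance of at least one false «significant» result is then:

```latex
\mathrm{FWER} = 1 - (1 - \alpha)^{m},
\qquad \text{e.g. } \alpha = 0.05,\; m = 20:\quad 1 - 0.95^{20} \approx 0.64
```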

A simple example from ornithology

A simple example from corpus linguistics

A simple example from corpus linguistics
• The observations should be independent.
  • An important condition of use for
    • chi-squared test
    • t-test
• The observations should be of different individuals.
«Chi-square is a much-abused test in second language research studies, and often one of its assumptions (that of independence of data) is violated as a matter of course.» Larson-Hall (2010, p. 206)
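What goes wrong when observations are not independent can be illustrated with a small simulation. The sketch below is entirely hypothetical: two groups of speakers have the same underlying error rate (the null hypothesis is true), but each speaker has an individual rate drawn from a Beta(2, 8) distribution and contributes several tokens. All tokens are then pooled into a 2x2 table and tested with a chi-square test, as if every token were an independent observation. The group sizes, token counts and the Beta parameters are assumptions chosen for illustration only.

```python
import math
import random


def chi2_p_2x2(a, b, c, d):
    """p-value of Pearson's chi-square test (df = 1, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # For df = 1 the chi-square tail probability equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def pooled_counts(rng, speakers, tokens_per_speaker):
    """Pool all tokens of one group; each speaker has an individual error rate."""
    errors = 0
    for _ in range(speakers):
        rate = rng.betavariate(2, 8)  # hypothetical speaker-level rate, mean 0.2
        errors += sum(rng.random() < rate for _ in range(tokens_per_speaker))
    return errors, speakers * tokens_per_speaker - errors


def false_positive_rate(speakers, tokens_per_speaker, runs=500, seed=1):
    """Share of simulated studies in which the pooled test gives p < 0.05
    although both groups come from the same population."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        e1, ok1 = pooled_counts(rng, speakers, tokens_per_speaker)
        e2, ok2 = pooled_counts(rng, speakers, tokens_per_speaker)
        if chi2_p_2x2(e1, ok1, e2, ok2) < 0.05:
            rejections += 1
    return rejections / runs


# Same number of tokens per group (480), but a different number of individuals.
clustered = false_positive_rate(speakers=24, tokens_per_speaker=20)
independent = false_positive_rate(speakers=480, tokens_per_speaker=1)
print(f"24 speakers x 20 tokens: {clustered:.3f}")
print(f"480 speakers x 1 token:  {independent:.3f}")
```

With one token per speaker the rejection rate should stay near the nominal 5%; with 20 tokens per speaker the pooled test should reject the true null hypothesis far more often.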

Example 1: Chi-squared test, non-independent observations
• Blom & Paradis 2013
  • Journal of Speech, Language, and Hearing Research
  • On past tense production in L2 children with language impairment
• 48 children with English as L2
• Overregularization of past tense
• Hypothesis: Less common in verb stems ending in /d/ or /t/
• χ²(1) = 3.45, p (one-sided) = 0.032
• Problem: n = 85 + 140 observations, N = 48 children
• Observations are not independent, so the result is invalid.
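The numbers on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the p-value from the reported statistic, using the fact that a chi-square tail probability with one degree of freedom equals erfc(sqrt(x / 2)), and compares the number of observations with the number of children. Halving the two-sided value to get the one-sided one is an assumption about how the reported p was obtained.

```python
import math

stat = 3.45                 # reported chi-square statistic, df = 1
observations = 85 + 140     # n, as given on the slide
children = 48               # N, as given on the slide

two_sided = math.erfc(math.sqrt(stat / 2))
one_sided = two_sided / 2   # assumed relation to the reported one-sided p

print(f"two-sided p = {two_sided:.3f}")
print(f"one-sided p = {one_sided:.3f}")
print(f"observations per child = {observations / children:.1f}")
```

The one-sided value agrees with the reported 0.032, while the two-sided value lies above 0.05. On average each child contributes more than four observations, which is exactly the independence problem named on the slide.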

Example 1: Chi-squared test, non-independent observations
• Solution A:
  • Pick just one observation from each author/speaker
  • “To exclude the author as one more relevant factor, the database was cleaned so that there is only one example for each verb from any single author.” Sokolova 2012, p. 94

Example 1: Chi-squared test, non-independent observations
• Solution A:
  • Pick just one observation from each author/speaker
  • Sokolova 2012
• Solution B:
  • Calculate average values for each informant
  • Use the average values as independent observations
  • Test significance with an appropriate test, e.g. t-test or U-test
  • Gujord 2013
• Both these solutions might require a larger corpus!
• «Solution» C:
  • Alter the research question
  • Danckaert 2011
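Solution B can be sketched in code as follows. The token format (informant, group, is_error) and both function names are hypothetical. Instead of the t-test or U-test named on the slide, this sketch uses a simple permutation test on the difference in group means, only because it can be written with the standard library; it is a stand-in, not the method used in the cited work.

```python
import random


def per_informant_rates(tokens):
    """Collapse tokens (informant, group, is_error) to one error rate per
    informant, returned as {group: [rate, rate, ...]}."""
    counts = {}
    for informant, group, is_error in tokens:
        entry = counts.setdefault((informant, group), [0, 0])
        entry[0] += int(is_error)
        entry[1] += 1
    rates = {}
    for (_, group), (errors, total) in counts.items():
        rates.setdefault(group, []).append(errors / total)
    return rates


def permutation_p(x, y, runs=5000, seed=0):
    """Two-sided permutation test on the difference in means of x and y."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(runs):
        rng.shuffle(pooled)
        left, right = pooled[:len(x)], pooled[len(x):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (runs + 1)


# Hypothetical tokens: each informant appears several times,
# but ends up as a single observation in the test.
tokens = [
    ("v1", "Vietnamese", True), ("v1", "Vietnamese", False),
    ("v2", "Vietnamese", True), ("v2", "Vietnamese", True),
    ("s1", "Somali", False), ("s1", "Somali", False),
    ("s2", "Somali", True), ("s2", "Somali", False),
]
rates = per_informant_rates(tokens)
print(rates)
print(permutation_p(rates["Vietnamese"], rates["Somali"], runs=500))
```

The unit of analysis is now the informant, so the sample size in the test is the number of informants, not the number of tokens. This is why the slide warns that a larger corpus might be needed.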

Example 1: Chi-squared test, non-independent observations
• Solution B:

Example 2: T-test, non-independent observations
• Klavan 2012
  • PhD thesis from Tartu University
  • Investigation of adposition ‘peal’ and adessive case
• 450 observations of each, from 2 corpora
• t = 8.02, p < 0.001
• Conclusion: adessive phrases are longer than ‘peal’-phrases
• Problem: Observations are not independent.
• The conclusion is invalid.

Example 3: T-test, non-normal populations
• Hunter (2011, p. 48)
  • PhD thesis from Birmingham University
  • On grammaticality judgements by L2 students
• Conclusion:
  • the accuracy (max. = 1) for the teacher group (M = .98, SD = .14) was significantly higher than the student group (M = .64, SD = .49), t(1) = 4.9, p < .001.
• Problem:
  • Mean = 0.98, Maximum value = 1
  • Standard deviation = 0.14
  • The distribution cannot possibly be normal.
• The result is invalid.
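The claim that the distribution cannot be normal can be made concrete. The sketch below asks how much of a normal distribution with the reported mean and standard deviation would lie above the maximum possible score. It also compares the reported standard deviations with sqrt(M(1 - M)), the value that items scored 0 or 1 would give; that comparison is an observation from the arithmetic, not something stated on the slide.

```python
import math
from statistics import NormalDist

teacher_mean, teacher_sd = 0.98, 0.14   # reported values
student_mean, student_sd = 0.64, 0.49   # reported values
max_score = 1.0

# Share of a normal distribution N(0.98, 0.14) that exceeds the maximum score.
share_above_max = 1 - NormalDist(teacher_mean, teacher_sd).cdf(max_score)

# Standard deviation implied by 0/1 scoring with the same mean.
bernoulli_sd_teacher = math.sqrt(teacher_mean * (1 - teacher_mean))
bernoulli_sd_student = math.sqrt(student_mean * (1 - student_mean))

print(f"normal mass above the maximum: {share_above_max:.2f}")
print(f"teacher SD: reported {teacher_sd}, 0/1 scoring gives {bernoulli_sd_teacher:.2f}")
print(f"student SD: reported {student_sd}, 0/1 scoring gives {bernoulli_sd_student:.2f}")
```

A normal distribution with these parameters would place a large share of the scores above 1, which is impossible. The reported standard deviations are also close to what 0/1-scored responses would produce, which is consistent with strongly non-normal data.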

Example 4: Repeated testing
• Leedham 2011
  • PhD thesis, The Open University
  • Features in the writing of Chinese students in UK universities
• Conclusion:
  • There are differences in frequencies of certain phrases between 3rd-year students and younger students
• Problem:
  • Repeated testing without adjusting the probability values
• Some of the results are not valid.
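One common way of adjusting probability values for repeated testing is the Holm-Bonferroni procedure. The slide does not say which adjustment should have been used, so the choice of method here is an assumption, and the p-values in the example are invented for illustration.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three tests on the same data set.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

In this invented example all three raw values are below 0.05, but only the first remains below 0.05 after adjustment.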

Moral
There are no simple tests.
• You should understand the conditions of the test.
• You should take the conditions into account.
• You should document properly
  • how you perform the test,
  • what numbers you put into it,
  • how the conditions are met.
«A chi-square test showed that the difference is significant.»

Is it really that important?
• «[C]ompared to other social sciences (e.g., psychology, communication, sociology, anthropology, …) or branches of linguistics (e.g., psycholinguistics, phonetics, sociolinguistics…), most of corpus linguistics has paradoxically only begun to develop this methodological awareness.» Gries (forthcoming, p. 1)

Is it really that important?
• «It has become increasingly apparent over a period of several years that psychologists, taken in the aggregate, employ the chi-square test incorrectly.» Lewis and Burke (1949)

Whose responsibility is it?

«Corpus linguistics needs to ‘catch up’ [...]» Gries (forthcoming, p. 1)

References (http://privat.hihm.no/buj)
Boneau, A. C. (1960). The effects of violations of assumptions underlying the t test. Psychological Bulletin, 57(1), 49-64.
Good, P. I., & Hardin, J. W. (2012). Common errors in statistics (and how to avoid them). Hoboken: John Wiley.
Gries, S. (forthcoming). Quantitative designs and statistical techniques. http://www.linguistics.ucsb.edu/faculty/stgries/research/InProgr_STG_QuantDesAndMethCorpLing_CUPHb.pdf
Larson-Hall, J. (2010). A Guide to Doing Statistics in Second Language Research Using SPSS. New York: Routledge.
Lewis, D., & Burke, C. J. (1949). The use and misuse of the chi-square test. Psychological Bulletin, 46(6), 433-489.
Blom & Paradis (2013). Past Tense Production by English Second Language Learners With and Without Language Impairment. Journal of Speech, Language, and Hearing Research, 56, 281-294.
Danckaert, L. (2011). On the left periphery of Latin embedded clauses. Ph.D. thesis. University of Gent.
Gujord, A. H. (2013). Grammatical encoding of past time in L2 Norwegian: The roles of L1 influence and verb semantics. Ph.D. thesis. University of Bergen.
Hunter, J. D. (2011). A multi-method investigation of the effectiveness and utility of delayed corrective feedback in second-language oral production. Ph.D. thesis. University of Birmingham.
Klavan, J. (2012). Evidence in linguistics: corpus-linguistic and experimental methods for studying grammatical synonymy. Ph.D. thesis. University of Tartu.
Leedham, M. (2011). A corpus-driven study of features of Chinese students’ undergraduate writing in UK universities. Ph.D. thesis. The Open University.
Sokolova, S. (2012). Asymmetries in Linguistic Construal: Russian Prefixes and the Locative Alternation. Ph.D. thesis. University of Tromsø.
