Application of Inferential Techniques in Statistics Education Research Literature

Application of Inferential Techniques in Statistics Education Research Literature Liu Wenqi, Yap Von Bing Department of Statistics and Applied Probability National University of Singapore

Introduction • Hypothesis tests have come under intense scrutiny. • Halsey, Curran-Everett, Vowler, Drummond (2015) The Fickle P value Generates Irreproducible Results • Amrhein, Greenland, McShane (2019) Scientists Rise up Against Statistical Significance • Hypothesis testing is a major component of statistical inference. The other is estimation. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Underlying Assumptions • For each test, we teach the assumptions, with varying emphasis on the fundamental Random-sampling: data are generated by a random process specified partially, up to some parameters. • If the random-sampling assumption fails, the inference result is hard to interpret, i.e., may be worthless. This is pure logic, like applying theorems. • Great challenge in practice. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Random Sampling Assumption • Crucial part of one-sample, two-sample, ANOVA procedures. It is explicitly used to derive distributions of test statistics. • For experiments, it manifests as randomised assignment of enrolled subjects. For observational studies, as random selection of subjects from a well-defined population. • For regression models, it might look unimportant. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Random Sampling in Regression (1) • Population size N, individual i has two numerical attributes xi and yi. • Suppose the association between x and y is linear. By least-square, there are constants c and m such that for each i, yi = c + mxi + di where the deviations satisfy two conditions. • We regard c and m as population parameters, to be estimated from a sample. A convenience sample can cause wrong inference. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Random Sampling in Regression (2) • Berk & Freedman (2001) Statistical Assumptions as Empirical Commitments “…regression models are commonly used to analyze convenience samples; as we show later, such analyses are often predicated on random sampling from imaginary populations.” • Implications on inference are similar. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

The Reproducibility Crisis • Lots of attack on P value particularly and hypothesis testing in general. • Some suggest to use confidence intervals instead. • Little attention paid to the random-sampling assumption, which justifies conventional testing and estimation procedures. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Lessons from Epidemiology • Usual estimation of risks from cohort studies and odds ratios from case-control studies assumes random samples from well-defined populations. Multiple and logistic regression tacitly rely on random sampling. • One success is smoking, though it can be argued numerous concordant studies counted more than statistical tests. • Less successful: Hormone Replacement Therapy • Before inference: Snow on Cholera A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Education Research • Data collection as challenging as epidemiology, if not more. Randomised trial is usually impossible. • All observational data collection process involves some random elements. Almost all the time, the random process is not well understood. • What do we do? A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Goal & Method • To find out how random sampling is perceived in statistics education research articles that apply inference. • Scanned three volumes of articles published in 2017 • Journal of Statistical Education, 25 & 26 • Statistical Education Research Journal, 16 • Series of screening: Analyse data? Use inference? Discuss sampling assumption? A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Summary As far as we could tell, in none of the 18papers is the random sampling assumption satisfied. Samples were not randomly chosen from well-defined populations. Experimental subjects were not randomly assigned to groups. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Shinaberger JSE 2017 Components of a Flipped Classroom Influencing Student Success in an Undergraduate Business Statistics Course “The study participants were primarily undergraduate business majors at Coastal Carolina University, a public, 4-year, independent university.” “A linear mixed model was used to fit the data using SPSS MIXED version 2.1” “One limitation of the study is that it relies on a sample of convenience.” A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Hedges SERJ 2017 Statistics Student Performance and Anxiety: Comparisons in Course Delivery and Student Characteristics “Students self-selected their sections and although there were enrollment caps for the face-to-face sections on the room size, none of the sections were full.” Techniques: Chi-square test, Fisher exact test, ANOVA, MANOVA “As with all observational studies, lurking variables may be present that could explain differences in course performance and anxiety between different types of students.” A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Beckman, Delmas & Garfield SERJ 2017 Cognitive Transfer Outcomes for a Simulation-based Introductory Statistics Curriculum “Students were not randomly assigned to the different curriculum groups and the different courses were delivered to entire sections of students. Therefore, statistical analyses were based on a linear mixed effects (LME) model where students were clustered within class section.” A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Beckman, Delmas & Garfield (continued) “Lastly, instructors were not randomly selected for participation in the study, nor were students randomly assigned to each curriculum. Even if researchers originally involved in the CATALST Project attempted to represent a meaningful diversity of instructors and institutions, without random selection there is risk that an important subset of the desired population has not been adequately represented. Furthermore, without randomly assigning students to each curriculum we cannot confidently rule out that any comparison of outcomes between curricula would be meaningfully influenced by confounding variables.” A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Trial of Dr Spock 1969 • The jury was drawn from a panel of 350 persons, of whom only 102 were women, although a majority of the eligible jurors in the district were female. (from Freedman, Pisani, Purves) • What null hypothesis is appropriate here? A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

What Can We Do? • Exploratory Data Analysis & visualisation. • Explain practical significance of findings. • If using inference, set up the model explicitly, listing all substantial assumptions, and evaluate their plausibility. • Replicate study. • Refer to Freedman (2009) Statistical Models and Causal Inference: A Dialogue with the Social Sciences • Teach students carefully. Refer to Freedman, Pisani and Purves (2007) Statistics 4e. A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

FPP p562 Nowadays, tests of significance are extremely popular. One reason is that the tests are part of an impressive and well-developed mathematical theory. Another reason is that many investigators just cannot be bothered to set up chance models. The language of testing makes it easy to bypass the model, and talk about “statistically significant” results. This sounds so impressive, and there is so much mathematical machinery clanking around in the background, that tests seem truly scientific---even when they are complete nonsense. St Exupéry understood this kind of problem very well. When a mystery is too overpowering, one dare not disobey. The Little Prince A Survey on the Application of Inferential Techniques in Statistics Education Research Literature

Application of Inferential Techniques in Statistics Education Research Literature