Learn how Student's t-distribution was derived in 1908 and explore Fisher's n-D interpretation, essential ideas, and implications on independence. Discover the beauty of data visualization in statistical analysis.
MV Stats News
Vol. 1, Number 4, October 20, 2011
Bringing multivariate data analysis and data visualization to your breakfast table
Today’s topic: Early geometric proof rediscovered
Fisher uses beautiful n-D interpretation to derive distribution of s and independence from mean
Michael Friendly, Staff Reporter
Filed: 1/6/2020 12:10 AM
Student (1908): the t distribution
• In 1908, W. S. Gosset (“Student”) published “The probable error of a mean” (Biometrika, 6, 1-25)
• Established the t-test for small samples
• Gave a derivation of the sampling distribution of z = (M − μ)/s, the forerunner of t
• Proof required showing: (1) the sampling distribution of s², and (2) that the mean and s² are independent
Student (1908): the t distribution
• What Student did:
  • Computed the first 4 moments of the distribution of s²
  • Showed that these agreed with those of χ²
  • “Hence it is probable that the curve found represents the theoretical distribution of s²; so that although we have no actual proof we shall assume it to do so in what follows.”
• Independence: showed only that the mean and s² were uncorrelated (not strictly independent)
• Fisher (1939): “This was the most striking gap in his argument” (Why: independence depends on the joint distribution)
• Independence requires showing Pr(A, B) = Pr(A) × Pr(B)
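Student's moment comparison can be imitated by simulation. The sketch below is not Student's calculation (he worked the moments out algebraically). It assumes standard normal data, a sample size of n = 4, and Student's divisor-n definition of s², and it checks only the first two moments of n·s²/σ² against those of χ² with n − 1 degrees of freedom (mean n − 1, variance 2(n − 1)). The names `simulate_normal_samples` and `pearson_r` are illustrative helpers, not from the source.

```python
import random
import statistics


def simulate_normal_samples(n=4, reps=20000, seed=1908):
    """Draw `reps` samples of size n from N(0, 1); return the means and n*s^2 values."""
    rng = random.Random(seed)
    means, scaled_s2 = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(x) / n
        s2 = sum((xi - m) ** 2 for xi in x) / n  # divisor n (assumed, as in Student's s)
        means.append(m)
        scaled_s2.append(n * s2)  # n * s^2 / sigma^2 with sigma = 1
    return means, scaled_s2


def pearson_r(a, b):
    """Product-moment correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


n = 4
means, scaled_s2 = simulate_normal_samples(n=n)

sim_mean = statistics.fmean(scaled_s2)      # chi-square(n-1) mean is n - 1
sim_var = statistics.pvariance(scaled_s2)   # chi-square(n-1) variance is 2(n - 1)
r_mean_s2 = pearson_r(means, scaled_s2)     # what Student showed: uncorrelated
r_sq_s2 = pearson_r([m * m for m in means], scaled_s2)  # one further check

print(f"mean of n*s^2:     {sim_mean:.3f}  (theory {n - 1})")
print(f"variance of n*s^2: {sim_var:.3f}  (theory {2 * (n - 1)})")
print(f"corr(M, s^2):      {r_mean_s2:+.4f}")
print(f"corr(M^2, s^2):    {r_sq_s2:+.4f}")
```

A near-zero correlation between M and s² only reproduces what Student showed. The second correlation, between M² and s², is one extra symptom of independence, but no finite list of correlations amounts to a proof, which is the gap Fisher pointed out.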
Fisher (1915): Geometric proof
Student → Stratton → Fisher → Pearson
• In 1912, Fisher wrote to Student, suggesting a geometric proof of the independence of the mean, M, and s²
• “... the form establishes itself instantly, when the distribution of the sample is viewed geometrically.”
• Essential idea: in observation space (ℝⁿ), the 1-D line of the mean is orthogonal to the (n−1)-D space of the deviations (xi − M)
• Pearson initially refused to publish in Biometrika: “I do not follow Mr. Fisher’s proof, and it is not the kind which appeals to me.”
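The orthogonality behind the essential idea can be checked numerically. The sketch below treats a sample as a vector in n-dimensional space and splits it into a part along the line of the mean and a part made of deviations. The sample values are arbitrary simulated data, and the helper names `split_sample` and `dot` are illustrative, not from the source.

```python
import math
import random


def split_sample(x):
    """Split x into its projection on the equiangular line and the deviations."""
    n = len(x)
    m = sum(x) / n
    mean_vector = [m] * n              # the point (M, M, ..., M) on the 1-D line
    deviations = [xi - m for xi in x]  # lies in the (n-1)-D space where entries sum to 0
    return mean_vector, deviations


def dot(u, v):
    """Ordinary inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


rng = random.Random(1915)
x = [rng.gauss(10.0, 2.0) for _ in range(5)]  # arbitrary illustrative sample
mean_vector, deviations = split_sample(x)

inner = dot(mean_vector, deviations)  # zero up to rounding: the parts are orthogonal
total_ss = dot(x, x)
parts_ss = dot(mean_vector, mean_vector) + dot(deviations, deviations)

print(f"inner product of the two parts: {inner:.2e}")
print(f"sum of squares of x:            {total_ss:.6f}")
print(f"n*M^2 + sum of (xi - M)^2:      {parts_ss:.6f}")
```

Because the two parts are orthogonal, Pythagoras gives Σxi² = n·M² + Σ(xi − M)², so the squared length of the sample vector separates into a piece governed by M and a piece governed by s.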
Fisher (1915): these quantities have “an exceedingly beautiful interpretation in generalised space”
Note: Independence is established if one can show that the joint density of M and s factors into a function of M alone times a function of s alone
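A sketch of how the factorization comes out of the geometry, written in modern notation rather than Fisher's own. It assumes n independent observations from a normal distribution with mean μ and variance σ², and the divisor-n definition of s²; the symbols μ and σ are introduced here for illustration.

```latex
\begin{align*}
f(x_1,\dots,x_n) &\propto \exp\!\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Big),\\
\sum_{i=1}^{n}(x_i-\mu)^2 &= n(M-\mu)^2 + n s^2,
  \qquad s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-M)^2,\\
dx_1\cdots dx_n &\propto s^{\,n-2}\,ds\,dM
  \quad\text{(after integrating over directions)},\\
f(M,s)\,dM\,ds &\propto
  \underbrace{e^{-n(M-\mu)^2/(2\sigma^2)}\,dM}_{\text{involves } M \text{ only}}
  \;\times\;
  \underbrace{s^{\,n-2}\,e^{-n s^2/(2\sigma^2)}\,ds}_{\text{involves } s \text{ only}}.
\end{align*}
```

The second line is Pythagoras applied to the orthogonal split of the sample vector. The third line uses the fact that all samples with a given M and s lie on a sphere of radius √n·s inside the (n−1)-D space of deviations, whose surface area is proportional to s^(n−2). The product form in the last line is the factorization that independence requires.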
From: J. A. Hanley et al. (2008), “Student’s z, t, and s: What if Gosset had R?”, The American Statistician, 62(1), 64-69.
Not enough for Pearson?
• Corresponding proof for the correlation coefficient:
• n pairs regarded as the coordinates of a point in 2n-D space
• sample means, variances and covariance have “a beautiful interpretation”
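One standard way to see the geometry of the correlation coefficient is to centre the x values and the y values separately and treat each as a vector in n-D space: the sample correlation r is then the cosine of the angle between them. The sketch below illustrates that view with made-up data. It is a related simplification, not a reconstruction of the 2n-D argument mentioned above, and the helper names are illustrative.

```python
import math


def center(v):
    """Subtract the mean from every entry of v."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot_uv = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot_uv / (norm_u * norm_v)


def correlation_as_cosine(x, y):
    """Sample correlation computed as the cosine between centred vectors."""
    return cosine(center(x), center(y))


# Made-up illustrative data, not from the source.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = correlation_as_cosine(x, y)
angle_degrees = math.degrees(math.acos(r))
r_line = correlation_as_cosine(x, [2.0 * xi + 1.0 for xi in x])  # exact straight line

print(f"r = {r:.3f}, angle between centred vectors = {angle_degrees:.1f} degrees")
print(f"r for points on a straight line = {r_line:.3f}")
```

In this picture the lengths of the centred vectors carry the two variances, and their inner product carries the covariance.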