380 likes | 572 Views
Advanced Statistical Methods: Beyond Linear Regression. John R. Stevens Utah State University Notes 2. Statistical Methods I Mathematics Educators Workshop 28 March 2009. 1. http://www.stat.usu.edu/~jrstevens/pcmi. Obs Flight Temp Damage 1 STS1 66 NO 2 STS9 70 NO 3 STS51B 75 NO
E N D
Advanced Statistical Methods:Beyond Linear Regression John R. Stevens Utah State University Notes 2. Statistical Methods I Mathematics Educators Workshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
Obs Flight Temp Damage 1 STS1 66 NO 2 STS9 70 NO 3 STS51B 75 NO 4 STS2 70 YES 5 STS41B 57 YES 6 STS51G 70 NO 7 STS3 69 NO 8 STS41C 63 YES 9 STS51F 81 NO 10 STS4 80 11 STS41D 70 YES 12 STS51I 76 NO 13 STS5 68 NO 14 STS41G 78 NO 15 STS51J 79 NO 16 STS6 67 NO 17 STS51A 67 NO 18 STS61A 75 YES 19 STS7 72 NO 20 STS51C 53 YES 21 STS61B 76 NO 22 STS8 73 NO 23 STS51D 67 NO 24 STS61C 58 YES What would your students know to do with these data?
Two Sample t-test data: Temp by Damage t = 3.1032, df = 21, p-value = 0.005383 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 2.774344 14.047085 sample estimates: mean in group NO mean in group YES 72.12500 63.71429
Does the t-test make sense here? • Traditional: • Treatment Group mean vs. Control Group mean • What is the response variable? • Temperature? [Quantitative, Continuous] • Damage? [Qualitative]
Traditional Statistical Model 1 • Linear Regression: predict continuous response from [quantitative] predictors • Y=weight, X=height • Y=income, X=education level • Y=first-semester GPA, X=parent’s income • Y=temperature, X=damage (0=no, 1=yes) • Can also “control for” other [possibly categorical] factors (“covariates”): • Sex • Major • State of Origin • Number of Siblings
Traditional Statistical Model 2 • Logistic Regression: predict binary response from [quantitative] predictors • Y=‘graduate within 5 years’=0 vs. Y=‘not’=1 • X=first-semester GPA • Y=0 (no damage) vs. Y=1 (damage) • X=temperature • Y=0 (survive) vs. Y=1 (death) • X=dosage (dose-response model) • Can also “control” for other factors, or “covariates” • Race, Sex • Genotype • p = P(Y=1 | relevant factors) = prob. that Y=1, given state of relevant factors
Traditional Dose-Response Model • p = Probability of “death” at dose d: • Look at what affects the shape of the curve, LD50 (lethal dose for 50% efficacy), etc.
“Fitting” the Dose-Response Model • Why “logistic” regression? • β0 = place-holder constant • β1 = effect of “dosage” d • To estimate parameters: • Newton-Raphson iterative process to “maximize the likelihood” of the model • Compare Y=0 (no damage) with Y=1 (damage) groups
Likelihood Function (to be maximized) likelihood for obs. i multiply probabilities (independence)
Estimation by IRLS • Iteratively Reweighted Least Squares • equivalent: Newton-Raphson algorithm for iteratively solving “score” equations
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 15.0429 7.3786 2.039 0.0415 * Temp -0.2322 0.1082 -2.145 0.0320 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What if the data were even “better”? • Complete separation of points • What should happen toour “slope” estimate?
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 928.9 913821.4 0.001 1 Temp -14.4 14106.7 -0.001 1
Failure? • Shape of likelihood function • Large Standard Errors • Solution only in 2006 • Rather than maximizing likelihood, consider a penalty:
Model fitted by Penalized ML Confidence intervals and p-values by Profile Likelihood coef se(coef) Chisq p (Intercept) 30.4129282 16.5145441 11.35235 0.0007535240 Temp -0.4832632 0.2528934 13.06178 0.0003013835
Dose-response model • Recall simple model: • pij = Pr(Y=1 | dosage level j and genotype level i) • But – when is genotype (covariate Gi) observed?
Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.657e+01 8.901e+04 -2.98e-04 1 dose -7.541e-26 1.596e+07 -4.72e-33 1 G1+ -3.386e-28 1.064e+05 -3.18e-33 1 G2B -1.344e-14 1.092e+05 -1.23e-19 1 G2H -3.349e-28 1.095e+05 -3.06e-33 1 dose:G1+ 7.541e-26 1.596e+07 4.72e-33 1 dose:G2B 3.984e-12 3.075e+07 1.30e-19 1 dose:G2H 7.754e-26 2.760e+07 2.81e-33 1 G1+:G2B 1.344e-14 1.465e+05 9.17e-20 1 G1+:G2H 3.395e-28 1.327e+05 2.56e-33 1 dose:G1+:G2B -3.984e-12 3.098e+07 -1.29e-19 1 dose:G1+:G2H -7.756e-26 2.763e+07 -2.81e-33 1 Before we “fix” this, first a little detour …
A Multivariate Gaussian Mixture Component j isMVN(μj,Σj) with proportion πj
A Possible Work-Around • Keys here: • the true group memberships Δ are unknown (latent) • statisticians specialize in unknown quantities
A reasonable approach • 1. Randomly assign group memberships Δ, and estimate group means μj , covariance matrices Σj , and mixing proportions πj • 2. Given those values, calculate (for each obs.) ξj = E[Δj|θ] = P(obs. in group j) • 3. Update estimates for μj , Σj , and πj , weighting each observation by these ξ: • 4. Repeat steps 2 and 3 to convergence
The EM (Baum-Welch) Algorithm- maximization made easier with Zm = latent (unobserved) data; T = (Z,Zm) = complete data • Start with initial guesses for parameters • Expectation: At the kth iteration, compute • Maximization: Obtain estimateby maximizing over • Iterate steps 2 and 3 to convergence ($?)
Beetle Data – Notation • Observed values • Unobserved (latent) values • If Nij had been observed: • How Nij can be [latently] considered:
Likelihood Function • Parameters θ=(p,P) and complete data T=(n,N) • After simplification: • Mechanism of missing data suggests EM algorithm
Missing at Random (MAR) • Necessary assumption for usual EM applications • Covariate x is MAR if probability of observing x does not depend on x or any other unobserved covariate, but may depend on response and other observed covariates (Ibrahim 1990) • Here – genotype is observed only for survivors, and for all subjects at zero dosage
Initialization Step • Two classes of marginal information here • For all dosage levels j – observe • At zero dosage level – observe for genotype i • Allows estimate of Pi • Consider marginal distn. of missing categorical covariate (genotype) • Using zero dosage level: • This is the key – the marginal distribution of the missing categorical covariate
Expectation Step • Dropping “constants” and : • Need to evaluate: (*)
Expectation Step • Bayes Formula: • Multinomial (*)
Expectation Step • For : • Not needed for maximization – only affects EM convergence rate • Direct calculation from multinomial distn. is “possible” – but computationally prohibitive • Need to employ some approximation strategy • Second-order Taylor series about , using Binet’s formula (*)
Expectation Step • Consider Binet’s formula (like Stirling’s): • Have: • Use a second-order Taylor series approximation taken about • as a function of : (*)
Maximization Step • Portion of related to : • Portion of related to : by Lagrange multipliers by Newton-Raphson iterations, with some parameterization (*)
EM Results test statistic for H0: no dosage effect separation of points …
Topics Used Here • Calculus • Differentiation & Integration (including vector differentiation) • Lagrange Multipliers • Taylor Series Expansions • Linear Algebra • Determinants & Eigenvalues • Inverting [computationally/nearly singular] Matrices • Positive Definiteness • Probability • Distributions: Multivariate Normal, Binomial, Multinomial • Bayes Formula • Statistics • Logistic Regression • Separation of Points • [Penalized] Likelihood Maximization • EM Algorithm • Biology – a little time and communication