1 / 32

Stats 330: Lecture 32

Stats 330: Lecture 32. Course Summary. The Exam!. 25 multiple choice questions, similar to term test (60% for 330, 50% for 762) 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions

talbot
Download Presentation

Stats 330: Lecture 32

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Stats 330: Lecture 32 Course Summary

  2. The Exam! • 25 multiple choice questions, similar to term test (60% for 330, 50% for 762) • 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions • (STATS 762) You have to do a compulsory extra question • Held on am of Wed 31th October 2012

  3. Help Schedule • I will be available in Rm 265 from 10:30 am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting • Tutors: as per David Smiths’s email

  4. STATS 330: Course Summary The course was about • Graphics for data analysis • Regression models for data analysis

  5. Graphics Important ideas: • Visualizing multivariate data • Pairs plots • 3d plots • Coplots • Trellis plots • Same scales • Plots in rows and columns • Diagnostic plots for model criticism

  6. Regression models We studied 3 types of regression: • “Ordinary” (normal, least squares) regression for continuous responses • Logistic regression for binomial responses • Poisson regression for count responses (log-linear models)

  7. Normal regression • Response is assumed to be N(m,s2) • Mean is a linear function of the covariates • m = b0 + b1x1 + . . . + bkxk • Covariates can be eithercontinuous or categorical • Observations independent, same variance

  8. Logistic regression • Response ( s “successes” out of n) is assumed to be Binomial Bin(n,p) • Logit of Probability log(p/(1-p))is a linear function of the covariates • log(p/(1-p)) = b0 + b1x1 + . . . + bkxk • Covariates can be eithercontinuous or categorical • Observations independent

  9. Poisson regression • Response is assumed to be Poisson(m) • Log of mean log(m) is a linear function of the covariates (log-linear models) • log(m) = b0 + b1x1 + . . . + bkxk • (Or, equivalently • m = exp(b0 + b1x1 + . . . + bkxk) • Covariates can be eithercontinuous or categorical • Observations independent

  10. Interpretation of b -coefficients • For continuous covariates: • In normal regression, b is the increase in mean response associated with a unit increase in x • In logistic regression, b is the increase in log odds associated with a unit increase in x • In Poisson regression, b is the increase in log mean associated with a unit increase in x • In logistic regression, if x is increased by 1, the odds are increased by a factor of exp(b) • In Poisson regression, if x is increased by 1, the mean is increased by a factor of exp(b)

  11. Interpretation of b -coefficients • For categorical covariates (main effects only): • In normal regression, b is the increase in mean response relative to the baseline • In logistic regression, b is the increase in log odds relative to the baseline • In logistic regression, if we change from baseline to some level, the odds are increased by a factor of exp(parameter for that level) relative to the baseline • In Poisson regression, if we change from baseline to some level, the mean is increased by a factor of exp(parameter for that level) relative to the baseline

  12. Measures of Fit • R2 (for normal regression) • Residual Deviance (for Logistic and Poisson regression) • But not for ungrouped data in logistic, or Poisson with very small means (cell counts)

  13. Prediction • For normal regression, • Predict response at covariates x1, . . . ,xk • Estimate mean response at covariates x1, . . . ,xk • For logistic regression, • estimate log-odds at covariates x1, . . . ,xk • Estimate probability of “success” at covariates x1, . . . ,xk • For Poisson regression, • Estimate mean at covariates x1, . . . ,xk

  14. Inference • Summary table • Estimates of regression coefs • Standard errors • Test stats for coef = 0 • R2 etc (normal regression) • F-test for null model • Null and residual deviances (logistic/Poisson)

  15. Testing model vs sub-model • Use and interpretation of both forms of anova • Comparing model with a sub-model • Adding successive terms to a model

  16. Topics specific to normal regression • Collinearity • VIF’s • Correlation • Added variable plots • Model selection • Stepwise procedures: FS, BE, stepwise • All possible regressions approach • AIC, BIC, CP, adjusted R2, CV

  17. Factors (categorical explanatory variables) • Factors • Baselines • Levels • Factor level combinations • Interactions • Dummy variables • Know how to express interactions in terms of means, means in terms of interactions • Know how to interpret zero interactions

  18. Fitting and Choosing models • Fit a separate plane (mean if no continuous covariates) to each combination of factor levels • Search for a simpler submodel (with some interactions zero) using stepwise and anova

  19. Diagnostics • For non-planar data • Plot res/fitted, res/x’s, partial residual plots, gam plots, box-cox plot • Transform either x’s or response, fit polynomial terms • For unequal variance • Plot res/ fitted, look for funnel effect • Weighted least squares • Transform response

  20. Diagnostics (2) • For outliers and high-leverage points • Hat matrix diagonals • Standardised residuals, • Leave-one-out diagnostics • Independent observations • Acf plots • Residual/previous residual • Time series plot of residuals • Durbin-Watson test

  21. Diagnostics (3) • Normality • Normal plot • Weisberg-Bingham test • Box Cox (select power)

  22. Specifics for Logistic Regression Log-likelihood is

  23. Deviance Deviance = 2(log LMAX - log LMOD) • log LMAX: replace p’s with frequencies si/ni • log LMOD: replace p’s with estimated p’s from logistic model i.e. Can’t use as goodness of fit measure in ungrouped case

  24. Odds and log-odds • Probabilities p: Pr(Y=1) • Odds p /(1- p) • Log-odds: log p/(1- p)

  25. Residuals • Pearson • Deviance

  26. Topics specific to Poisson regression • Offsets • Interpretation of regression coefficients • (same as for odds in logistic regression) • Correspondence between Poisson regression (Log-linear models) and the multinomial model for contingency tables • The “Poisson trick”

  27. Contingency tables • Cells 1,2,…, m • Cell probabilities p1, . . . , pm • Counts y1, . . . , ym • Log-likelihood is

  28. Contingency tables (2) • A “model” for the table is anything that specifies the form of the probabilities, possibly up to k unknown parameters • Test if the model is OK by • Calculate Deviance = 2(log LMAX - log LMOD) log LMAX: replace p’s with table frequencies log LMOD: replace p’s with estimated p’s from the model • Model OK if deviance is small,(p-value > 0.05) • Degrees of freedom m - 1 - k • k = number of parameters in the model

  29. Independence models • Correspond to interactions being zero • Fit a “saturated” model using Poisson regression • Use anova, stepwise to see which interactions are zero • Identify the appropriate model • Models can be represented by graphs

  30. Odds Ratios • Definition and interpretation • Connection to independence • Connection with interactions • Relationship between conditional OR’s and interactions • Homogeneous association model

  31. Association graphs • Each node is a factor • Factors joined by lines if an interaction between them • Interpretation in terms of conditional independence • Interpretation in terms of collapsibility

  32. Contingency tables: final topics • Association reversal • Simpson’s paradox • When can you collapse • Product multinomial • comparing populations • populations the same if certain interactions are zero • Goodness of fit to a distribution • Special case of 1-dimensional table

More Related