1. HSRP 734: Advanced Statistical Methods, June 12, 2008
2. General Considerations for Multivariable Analyses
3. An Effective Modeling Cycle
4. Overview Model building: principles apply beyond logistic regression
Model diagnostics: specific to logistic regression
5. Model Building
6. Model selection “Proper model selection rejects a model that is far from reality and attempts to identify a model in which the error of approximation and the error due to random fluctuations are well balanced.”
- Shibata, 1989
7. Model building Models are just that: approximating models of a truth
How best to quantify approximation?
Depends upon study goals (prediction, explanatory, exploratory)
8. Principle of Parsimony “Everything should be made as simple as possible, but no simpler.” – Albert Einstein
Choose a model with “the smallest # of parameters for adequate representation of the data.” – Box & Jenkins
9. Principle of Parsimony Bias vs. Variance trade-off as # of variables/parameters increases
Collect sample to learn about population (make inference)
Models are just that: approximating models of a truth
Balance errors of underfitting and overfitting
10. Why include multiple predictors in a model? Interaction (effect modification)
Confounding
Increase precision (reduce unexplained variance)
Method of adjustment
Exploratory for unknown correlates
11. Interpreting Coefficients When you have more than one variable in the model, the interpretation is different
Continuous: “β1: For a unit change in X, there is a β1 change in Y, adjusting for the other variables in the model.”
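In the logistic setting specifically, β1 is a change in the log odds, so the adjusted effect is usually reported as an odds ratio; a minimal worked form with two predictors:

\[ \operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

so \( e^{\beta_1} \) is the adjusted odds ratio for a one-unit increase in X1, holding X2 fixed.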
12. Relationship between Variables
13. Interaction vs. Confounding Confounding is a BIAS we want to REMOVE
Interaction is a PROPERTY we want to UNDERSTAND
Confounding
Apparent relationship of X (exposure of interest) with Y is distorted due to the relationship of Z (confounder) with X (and Y)
Interaction
Relationship between X and Y differs by the level of Z (when X and Z interact)
14. Model building Science vs. Art
Different philosophies
Some agreement on what is worse
Not many agree on a best approach
15. Model building: Two approaches Data-based approach
Non-data-based approach
16. How do you decide what predictor variables to include?
17. Selecting Predictor Variables
18. Rule of Model Parsimony
19. Variable Selection
20. Data-based: Using p-values Popular
(Remember Johnny from Cobra Kai?)
Selection methods:
Forward, Backwards, Stepwise
Bivariate screening, then multivariable on those initially significant
21. Automatic Selection
22. Forward Selection
23. Backwards Elimination
24. Stepwise Selection
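A minimal SAS sketch of the three automatic methods in PROC LOGISTIC; the dataset chd_study, its variables, and the 0.10 thresholds are hypothetical and illustrative only:

  /* Forward: variables enter one at a time while p < SLENTRY */
  proc logistic data=chd_study descending;
    model chd = age chol sbp smoke / selection=forward slentry=0.10;
  run;

  /* Backward: start from the full model, remove terms while p > SLSTAY */
  proc logistic data=chd_study descending;
    model chd = age chol sbp smoke / selection=backward slstay=0.10;
  run;

  /* Stepwise: forward entry with a backward re-check at each step */
  proc logistic data=chd_study descending;
    model chd = age chol sbp smoke / selection=stepwise slentry=0.10 slstay=0.10;
  run;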
25. Criticisms of P-value based Model Building Automates the process rather than incorporating substantive thinking about the problem
Multiple comparisons issue
If multicollinearity is present, selection among collinear variables is made arbitrarily
β’s and SEβ’s are biased (Harrell Jr., 2001)
Test statistics don’t have the right distribution (Grambsch & O’Brien, 1991)
26. Selection methods using p-values If using these methods, some preference is given to backward elimination
There is some evidence it performs better than forward selection (Mantel, 1970)
At least the initial full model is accurate
27. Non P-value based Methods
28. Theoretical Considerations
29. Prior Literature Considerations
30. Information Criteria: AIC, BIC
31. Data-based: Using AIC AIC is an unbiased estimator of the theoretical distance from a model to the unknown true mechanism that actually generated the data
32. Data-based: Using AIC How is this so???
If you are really curious…
33. A Gross Simplification of AIC
34. Data-based: Using AIC Useful for selecting the best model out of a candidate model set (not great if all candidates are poor)
A single AIC value is not important in itself; what matters is its size relative to the other models’ AICs
Models need not be nested but must be fit to the same sample (Burnham & Anderson, 2002)
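For reference, the usual forms of the two criteria, with k the number of estimated parameters and n the sample size (smaller is better):

\[ \mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln n \]

Because only relative size matters, candidates are often ranked by \( \Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min} \). PROC LOGISTIC prints AIC and SC (the BIC) in its “Model Fit Statistics” table.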
35. Treatment Effect Approach
36. Model Building for Treatment Effect Goal If we omit important confounders or interactions, we can obscure the picture of the outcome-exposure relationship
37. Still will consider Parsimony If we include many covariates that are neither confounders nor part of interactions, some may only add “noise” to the model
That added noise can likewise obscure the picture of the outcome-exposure relationship
38. Data-based: Prediction goal When parsimony matters: find the most accurate model that is also the most parsimonious (smallest # of predictors)
When it doesn’t matter: pure accuracy is the goal, at any cost
Example: quality control
Plausible, but not the typical situation
39. Best Predictive Model Approach
40. Book on Model building Chapters 6, 7
The book basically takes the approach of trying to establish the outcome-exposure relationship accurately
41. Book recommendations Multistage strategy:
Determine variables under study from research literature and/or that are clinically or biologically meaningful
Assess interaction prior to confounding
Assess for confounding
Additional considerations for precision
42. Book recommendations Use backward elimination of modeling terms
Retain lower-order terms if higher-order terms are significant (see the sketch below):
Keep both variables if their 2-way interaction is significant
Keep lower-power terms if the highest power is significant
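A sketch of such a hierarchically well-formulated model in SAS (names hypothetical): if age*smoke or the quadratic age*age is retained, the lower-order terms stay in regardless of their own p-values.

  proc logistic data=chd_study descending;
    /* keep age and smoke while age*smoke is in the model;
       keep age while age*age is in the model */
    model chd = age smoke age*smoke age*age;
  run;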
43. Model building We will focus on the treatment effect goal
We will consider the book’s guidelines
44. Note about Model Building Differences between “Best” model and nearest competitors may be small
Ordering among “Very Good” models may not be robust to independent challenges with new data
45. Note about Model Building Be careful not to overstate importance of variables included in “Best” model
Remember that “Best” model odds ratios & p-values tend to be biased away from the null
Cross-validation approaches allow estimation of prediction errors associated with variable selection and also provide comparisons between sets of best models
46. SAS Lab: ICW
47. Model Diagnostics
48. After selecting a model Want to check model fit and diagnostics to ensure adequacy
Could be worried about:
Influential data points
Correlated predictor variables
Omitted variables or the wrong functional form
Overall model fit and predictive value
49. Problems to check for Convergence problems
Model goodness-of-fit
Functional form
(confounding, interaction, higher-order terms for continuous predictors)
Multicollinearity
Outlier effects
50. Convergence problems SAS usually converges, but sometimes you will get a message:
“There is possibly a quasicomplete separation in the sample points. The ML estimate may not exist. Validity of the model fit is questionable.”
51. Convergence problems Quasi-complete separation = occurs whenever there is complete separation except for a single value of the predictor
Complete separation = some linear combination of the predictors perfectly predicts the outcome
Problem is they’re too good!
Example: CHD=1 whenever Gender=Male
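A toy illustration of the slide’s example, with made-up data: every male has CHD = 1, so the gender effect separates the events, and PROC LOGISTIC should write the quasi-complete separation note to the log.

  /* Hypothetical data: CHD = 1 whenever gender = M; females are mixed */
  data sep_demo;
    input gender $ chd @@;
    datalines;
  M 1 M 1 M 1 M 1 F 1 F 0 F 0 F 0
  ;
  run;

  proc logistic data=sep_demo descending;
    class gender (param=ref ref='F');
    model chd = gender;  /* the ML estimate for gender does not exist */
  run;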
52. Quasi-complete separation Typically easy to diagnose. Why?
SAS prints a log warning.
SE’s are gigantic, OR’s or CI’s are extreme.
What to do about it?
53. Quasi-complete separation Options:
If continuous, create groups
If multi-group categorical, collapse groups
If dichotomous, group another way if possible
Drop variable
Drop cases from analyses
54. Diagnostics Model fit:
Hosmer & Lemeshow goodness of fit
c statistic (area under ROC curve)
Generalized R-square
Residual analyses:
Examine for outliers in X space (hii’s)
Examine for odd combinations of Y, X
Examine for influential points on β’s (on all coefficients or on specific ones)
55. Hosmer-Lemeshow Goodness-of-fit LACKFIT option in LOGISTIC
Generate predicted probabilities from the fitted model
Group the observations into g intervals (usually 10) based on the size of the predicted probabilities and compare observed to expected frequencies
Calculate a chi-square statistic with df = # of intervals - 2
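In SAS this is a single option on the MODEL statement; a minimal sketch with hypothetical names:

  /* LACKFIT prints the Hosmer-Lemeshow partition (usually 10 groups)
     and the chi-square test with df = # of groups - 2 */
  proc logistic data=chd_study descending;
    model chd = age chol sbp / lackfit;
  run;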
56. Considerations for H-L GOF test Is a conservative test
Low power to detect specific types of lack of fit (e.g., nonlinearity in a predictor variable)
Highly dependent on how the observations are grouped
Use caution in concluding the model is adequate when the p-value is large
57. Generalized R-square
58. Area under ROC curve: c statistic The Receiver Operating Characteristic (ROC) curve is a plot of the proportion of correctly predicted events (Sensitivity) against 1-proportion of correctly predicted non-events (1-Specificity)
The sharper the initial rise of the ROC curve, the better the model predicts
59. Area under ROC curve: c statistic The c statistic is the area under the ROC curve and quantifies predictive ability; it equals the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event, so c = 0.5 is chance-level prediction
Examples for c (Ashton, 1995):
Good = 0.831
Bad = 0.493
60. [ROC curve figure: c = 0.696]
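PROC LOGISTIC prints c in its “Association of Predicted Probabilities and Observed Responses” table; a sketch (hypothetical names; the ROC plot assumes SAS 9.2 or later with ODS graphics):

  ods graphics on;
  proc logistic data=chd_study descending plots(only)=roc;
    /* OUTROC= also saves the (sensitivity, 1-specificity) coordinates */
    model chd = age chol sbp / outroc=roc_pts;
  run;
  ods graphics off;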
61. Multicollinearity Diagnosing multicollinearity is similar to what was done for regression
This is because it is a problem of the predictor variables
One approach: just use the VIFs from an analogous linear regression model
Better approach: weight by the predicted probabilities in an initial step (sketched below)
62. Multicollinearity
A VIF > 7 warrants attention
A VIF > 10 indicates multicollinearity
What do you do if you have it?
Combine variables in an index
Consider data reduction (e.g., PCA)
Drop variables
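A sketch of both approaches described above (names hypothetical); the weighted second run uses the information weight p(1 - p) evaluated at the predicted probabilities, one common choice:

  /* Crude approach: VIFs from an analogous linear regression */
  proc reg data=chd_study;
    model chd = age chol sbp / vif;
  run;

  /* Better approach: get predicted probabilities first... */
  proc logistic data=chd_study descending;
    model chd = age chol sbp;
    output out=pred p=phat;
  run;

  data pred;
    set pred;
    w = phat*(1 - phat);  /* logistic information weight */
  run;

  /* ...then a weighted linear regression for the VIFs */
  proc reg data=pred;
    weight w;
    model chd = age chol sbp / vif;
  run;
  quit;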
63. hii: extreme points in X space hii’s are the leverage values, the diagonal elements of the hat matrix
Observations with unusual combinations of predictor values show up as large hii’s
64. Deviance residuals: obs not explained well by model Deviance residuals can identify cases that are not explained well by the model
The sum of the squared deviance residuals is the deviance, D = -2lnL
Why not plot di vs. hii? (see the formula below and the plotting sketch after the C-bar slide)
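For a binary outcome the deviance residual has a standard closed form, stated here for reference:

\[ d_i = \operatorname{sign}(y_i - \hat{p}_i)\,\sqrt{-2\big[\,y_i\ln\hat{p}_i + (1-y_i)\ln(1-\hat{p}_i)\,\big]}, \qquad \sum_i d_i^2 = D = -2\ln L \]

A SAS sketch of the d_i versus h_ii plot follows the C-bar slide below.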
65. DFBETAs: influential points on β’s Measures how much each regression coefficient changes with the ith case deleted
The actual change is divided by the SEβ
If one case changes βk substantially, then that observation is highly influential
66. C-bar: confidence interval displacement Measure of the overall change in all the coefficients with the ith case deleted
Similar to Cook’s distance in linear regression
If one case changes the β’s substantially, then that observation is highly influential
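A minimal SAS sketch that saves all of the diagnostics above and draws the d_i versus h_ii plot (names hypothetical; the INFLUENCE option on the MODEL statement prints the same quantities, and SGPLOT assumes SAS 9.2 or later):

  proc logistic data=chd_study descending;
    model chd = age chol sbp;
    /* leverage, deviance residuals, DFBETAs for every coefficient,
       and the C-bar confidence interval displacement */
    output out=diag h=lev resdev=devres dfbetas=_all_ cbar=cbar;
  run;

  /* cases in the upper-right are both unusual in X and poorly fit */
  proc sgplot data=diag;
    scatter x=lev y=devres;
  run;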
67. SAS Lab: ICW
68. Looking ahead Extensions & Advanced methods
Review with Q&A
Exam 1: June 26th