HSRP 734: Advanced Statistical Methods June 12, 2008

General Considerations for Multivariable Analyses • Variable Selection • Residuals • Influence diagnostics • Multicollinearity

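An Effective Modeling Cycle • (1) Preliminary Analysis • (2) Candidate Model Selection • (3) Assumption Validation • (4) Collinearity and Influential Observation Detection • (5) Model Revision • (6) Prediction Testing • A Yes/No decision point in the diagram routes between (5) Model Revision and (6) Prediction Testing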

Overview • Model building: applies outside of Logistic regression • Model diagnostics: specific to Logistic regression

Model Building

Model selection • “Proper model selection rejects a model that is far from reality and attempts to identify a model in which the error of approximation and the error due to random fluctuations are well balanced.” - Shibata, 1989

Model building • Models are just that: approximating models of a truth • How best to quantify approximation? • Depends upon study goals (prediction, explanatory, exploratory)

Principle of Parsimony • “Everything should be made as simple as possible, but no simpler.” – Albert Einstein • Choose a model with “the smallest # of parameters for adequate representation of the data.” – Box & Jenkins

Principle of Parsimony • Bias vs. Variance trade-off as # of variables/parameters increases • Collect sample to learn about population (make inference) • Models are just that: approximating models of a truth • Balance errors of underfitting and overfitting

Why include multiple predictors in a model? • Interaction (effect modification) • Confounding • Increase precision (reduce unexplained variance) • Method of adjustment • Exploratory for unknown correlates

Interpreting Coefficients • When you have more than 1 variable in the model the interpretation is different • Continuous: “β1: For a unit change in X, there is a β1 change in Y, adjusting for the other variables in the model.”
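To make the "adjusting for the other variables" wording concrete, here is a short worked derivation. It assumes a linear model with one additional covariate Z; the symbols β0, β2, and Z are introduced here for illustration only.

```latex
\begin{align*}
E[Y \mid X = x, Z = z] &= \beta_0 + \beta_1 x + \beta_2 z \\
E[Y \mid X = x + 1, Z = z] - E[Y \mid X = x, Z = z]
  &= \bigl(\beta_0 + \beta_1 (x + 1) + \beta_2 z\bigr)
   - \bigl(\beta_0 + \beta_1 x + \beta_2 z\bigr) \\
  &= \beta_1
\end{align*}
```

Z is held at the same value z in both terms, which is what "adjusting for" means here. In a logistic model the same algebra applies on the log-odds scale, so exp(β1) is an adjusted odds ratio.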

Relationship between Variables • Exposure (X), Disease (Y), Third Variable (Z) • Two main complications: • Confounding: bias • Interaction (Effect Modification): useful information

Interaction vs. Confounding • Confounding is a BIAS we want to REMOVE • Interaction is a PROPERTY we want to UNDERSTAND • Confounding • Apparent relationship of X (exposure of interest) with Y is distorted due to the relationship of Z (confounder) with X (and Y) • Interaction • Relationship between X and Y differs by the level of Z (when X and Z interact)
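The distinction can be illustrated with stratified 2x2 tables. The sketch below uses made-up counts (not from the course data) and the hypothetical helper names `odds_ratio` and `pooled`. It compares the crude odds ratio, which ignores Z, with the odds ratios within each level of Z.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def pooled(tables):
    """Collapse stratum-specific tables into one crude table (ignores Z)."""
    return tuple(sum(cells) for cells in zip(*tables))


# Made-up counts, chosen only to illustrate the two patterns.
# Each tuple is (a, b, c, d) within one level of the third variable Z.
confounding_strata = [(5, 95, 20, 380), (80, 320, 20, 80)]
interaction_strata = [(10, 90, 10, 90), (40, 60, 20, 80)]

# Confounding: the stratum-specific ORs agree with each other
# but differ from the crude OR.
crude_or = odds_ratio(*pooled(confounding_strata))
stratum_ors = [odds_ratio(*t) for t in confounding_strata]
print("Confounding: crude OR =", round(crude_or, 2), "stratum ORs =", stratum_ors)

# Interaction: the stratum-specific ORs differ from each other.
interaction_ors = [round(odds_ratio(*t), 2) for t in interaction_strata]
print("Interaction: stratum ORs =", interaction_ors)
```

In the first set of tables the exposure has no association with disease within either level of Z, yet the crude odds ratio is well above 1. That is the distortion confounding produces. In the second set the odds ratio itself changes with Z, which is a property to report rather than a bias to remove.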

Model building • Science vs. Art • Different philosophies • Some agreement on what is worse • Not many agree on a best approach

Model building: Two approaches • Data-based approach • Non-data based

How do you decide what predictor variables to include? Well, what is the goal?

Selecting Predictor Variables • Different goals of analyses: • Estimate and test a treatment group effect • Explore which of a set of predictors are associated with an outcome • Maximize the variation explained/Best predict an outcome

Rule of Model Parsimony • Include just enough variables and no more. • Use a smaller number of variables if they accomplish the same goal (about the same c statistic or precision in treatment effect)

Variable Selection • Mechanics: • Automatic selection based on p-values • Select based on AIC or BIC • Select based on predictive ability • Select based on theoretical or prior literature considerations • Select based on changes in treatment group effects (confounding, interaction, precision)

Data-based: Using p-values • Popular (Remember Johnny from Cobra Kai?) • Selection methods: Forward, Backwards, Stepwise • Bivariate screening, then multivariable on those initially significant

Automatic Selection • Select predictor variables based on p-value cutoff rules • Cutoff rules aren’t necessarily considered at p<0.05 • Three types: Forward, Backwards, Stepwise

Forward Selection • Start off with no predictors in the model • First, add in the most significant variable with p < pcutoff • Next, add in the most significant variable with p < pcutoff, given a model with the 1st variable in the model • Stop when no additional variables have p < pcutoff
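The forward selection steps can be sketched as a short loop. This is a minimal sketch, not SAS's implementation. It assumes a hypothetical callback `p_value(others, var)` that refits the model and returns the p-value of `var` when the variables in `others` are also in the model. The toy callback and its numbers are made up.

```python
def forward_select(candidates, p_value, p_cutoff=0.05):
    """Add the most significant remaining variable until none has p < p_cutoff."""
    selected = []
    remaining = list(candidates)
    while remaining:
        current = frozenset(selected)
        # p-value of each remaining variable, given the variables already selected
        best_p, best_var = min((p_value(current, v), v) for v in remaining)
        if best_p >= p_cutoff:
            break  # no remaining variable meets the entry cutoff
        selected.append(best_var)
        remaining.remove(best_var)
    return selected


def toy_p_value(others, var):
    """Made-up p-values: bmi loses significance once age is in the model."""
    base = {"age": 0.001, "bmi": 0.03, "sex": 0.20}
    if var == "bmi" and "age" in others:
        return 0.40
    return base[var]


print(forward_select(["age", "bmi", "sex"], toy_p_value))
```

In the toy example bmi looks significant on its own but is never added, because its p-value is re-evaluated after age enters.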

Backwards Elimination • Start off with all the predictors in the model • First, remove the least significant variable with p > pcutoff • Next, remove the least significant variable with p > pcutoff, given a reduced model with the 1st variable out of the model • Stop when no additional variables have p > pcutoff
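Backwards elimination follows the same pattern in reverse. The sketch below again assumes a hypothetical callback `p_value(others, var)` that returns the p-value of `var` in a model that also contains the variables in `others`. The toy p-values are made up.

```python
def backward_eliminate(candidates, p_value, p_cutoff=0.05):
    """Drop the least significant variable until none has p > p_cutoff."""
    model = list(candidates)
    while model:
        # p-value of each variable, given all the other variables still in the model
        worst_p, worst_var = max(
            (p_value(frozenset(model) - {v}, v), v) for v in model
        )
        if worst_p <= p_cutoff:
            break  # every remaining variable meets the cutoff
        model.remove(worst_var)
    return model


def toy_p_value(others, var):
    """Made-up p-values: sex becomes significant once bmi is dropped."""
    base = {"age": 0.001, "bmi": 0.30, "sex": 0.20}
    if var == "sex" and "bmi" not in others:
        return 0.04
    return base[var]


print(backward_eliminate(["age", "bmi", "sex"], toy_p_value))
```

Only one variable is removed per pass, so the remaining p-values are recomputed from the reduced model before the next decision.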

Stepwise Selection • Start with no predictors in the model • 1st step: Proceed with Forward Selection • 2nd step: Add in the most significant variable with p<pFcutoff or remove the least significant variable in the model with p>pBcutoff • Continue until there are no more predictors with p<pFcutoff or p>pBcutoff
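A rough sketch of the stepwise procedure alternates an entry check with a removal check. The default cutoffs (`p_enter=0.05`, `p_remove=0.10`) and the `max_steps` guard against cycling are assumptions added here, not values from the slides. `p_value(others, var)` is a hypothetical callback returning the p-value of `var` in a model that also contains `others`.

```python
def stepwise_select(candidates, p_value, p_enter=0.05, p_remove=0.10, max_steps=100):
    """Alternate forward entry and backward removal until nothing changes."""
    candidates = list(candidates)
    model = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the most significant outside variable if p < p_enter
        outside = [v for v in candidates if v not in model]
        if outside:
            best_p, best_var = min((p_value(frozenset(model), v), v) for v in outside)
            if best_p < p_enter:
                model.append(best_var)
                changed = True
        # Backward step: remove the least significant variable if p > p_remove
        if model:
            worst_p, worst_var = max(
                (p_value(frozenset(model) - {v}, v), v) for v in model
            )
            if worst_p > p_remove:
                model.remove(worst_var)
                changed = True
        if not changed:
            break
    return model


def toy_p_value(others, var):
    """Made-up p-values: x1 stops being significant once x2 is in the model."""
    if var == "x1":
        return 0.30 if "x2" in others else 0.01
    if var == "x2":
        return 0.02
    return 0.50  # x3 is never significant


print(stepwise_select(["x1", "x2", "x3"], toy_p_value))
```

In the toy run x1 enters first, x2 enters next, and x1 is then removed because its p-value rises above the removal cutoff. Forward selection alone cannot do this.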

Criticisms of P-value based Model Building • Does not incorporate thinking into the problem/automates • Multiple comparisons issue • If multicollinearity is present, selection is made arbitrarily • β’s, SEβ’s are biased (Harrell Jr., 2001) • Test statistics don’t have right distribution (Grambsch, O’Brien, 1991)

Selection methods using p-values • If using these methods there is some preference given to Backwards elimination selection • Some evidence of performing better than Forward selection (Mantel, 1970) • At least initial full model is accurate

Non P-value based Methods • Theoretical Considerations • Prior Literature Considerations • Information Criteria: AIC, BIC

Theoretical Considerations • Adjust for theoretically associated predictors, regardless of p-value • One line of logic is, why would you want to examine the association of P with outcome, without adjusting for T?

Prior Literature Considerations • Adjust for predictors in prior literature, regardless of p-value • One line of logic is, why would you want to examine the association of P with outcome, without adjusting for L? • Example: Outcome=Survival, P=New treatment, L=Patient age

Information Criteria: AIC, BIC • Use non p-value based criteria that maximize the relative fit of competing models • Complex theoretical motivation for IC • Can be used for complicated modeling: non-nested models, functional form of same predictors, etc.

Data-based: Using AIC • AIC is unbiased estimator of the theoretical distance of a model to an unknown true mechanism (that actually generated the data) • How is this so??? • If you are really curious…

A Gross Simplification of AIC

Data-based: Using AIC • Useful for selecting best model out of candidate model set (not great if all are poor) • The size of 1 AIC value is not important but rather relative size to other AIC’s • Models need not be nested but have same sample size (Burnham & Anderson, 2002)
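Because only relative AIC values matter, candidate models are usually compared by their difference from the best (smallest) AIC. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of estimated parameters and n is the sample size. The log-likelihoods and parameter counts are made-up numbers for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def delta_aic(models):
    """Difference of each model's AIC from the smallest AIC in the set."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Made-up (log-likelihood, number of parameters) pairs,
# assumed to be fit to the same observations.
candidates = {"A": (-100.0, 3), "B": (-99.5, 5), "C": (-104.0, 2)}
deltas = delta_aic(candidates)
print(deltas)
```

Model B has the highest likelihood, but its two extra parameters cost more than the fit they buy, so model A has the smallest AIC. A delta of 0 marks the best model in the set. It does not say whether that model is good in absolute terms.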

Treatment Effect Approach • Adjust for any and all confounders and effect-modifiers (interactions), regardless of p-values • From theoretical & prior literature • Goal is to get most accurate and precise estimate of treatment effect

Model Building for Treatment Effect Goal • If we don’t include confounders or interactions that were important then that could obscure picture of outcome-exposure relationship

Still will consider Parsimony • If we include many covariates (not confounders or interactions) perhaps some will only add “noise” to model • Noise added could obscure picture of outcome-exposure relationship

Data-based: Prediction goal • When parsimony matters: find the most accurate model that is also the most parsimonious (smallest # of predictors) • When parsimony doesn’t matter: pure accuracy is the goal, at any cost • Example: Quality control • Plausible but not typical

Best Predictive Model Approach • Adjust for any predictors that non-trivially increase c statistic (trivial is subject specific) • P-values are not considered • Goal is to maximize predictive ability of model • Future prediction is utmost; “Manage what you measure”
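For a binary outcome, the c statistic is the proportion of (event, non-event) pairs in which the event has the higher predicted probability, with ties counted as one half. A minimal sketch of that definition follows. The function name `c_statistic` and the example data are made up for illustration.

```python
def c_statistic(outcomes, predicted):
    """Concordance (c) statistic for binary outcomes coded 0/1."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0   # pair ranked correctly
            elif p_event == p_non:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Made-up outcomes and predicted probabilities
y = [1, 1, 0, 0, 0]
p_hat = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(c_statistic(y, p_hat), 3))
```

Comparing the c statistic with and without a candidate predictor is one way to judge whether the increase is non-trivial. The pairwise loop is quadratic in the sample size, so it suits small examples rather than large datasets.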

Book on Model building • Chapters 6, 7 • Basically takes the approach of trying to accurately establish the outcome-exposure relationship

Book recommendations • Multistage strategy: • Determine variables under study from research literature and/or that are clinically or biologically meaningful • Assess interaction prior to confounding • Assess for confounding • Additional considerations for precision

Book recommendations • Use backwards elimination of modeling terms • Retain lower-order terms if higher-order terms are significant: • Keep both variables if their 2-way interaction is significant • Keep lower power terms if the highest power is significant

Model building • We will focus on treatment effect goal • Will consider book guidelines

Note about Model Building • Differences between “Best” model and nearest competitors may be small • Ordering among “Very Good” models may not be robust to independent challenges with new data

Note about Model Building • Be careful not to overstate importance of variables included in “Best” model • Remember that “Best” model odds ratios & p-values tend to be biased away from the null • Cross-validation approaches allow estimation of prediction errors associated with variable selection and also provide comparisons between sets of best models
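The cross-validation point can be sketched as follows. Whatever selection procedure is used has to be rerun inside each fold, so that the held-out error reflects the selection step too. `build_model` and `loss` are hypothetical callables supplied by the user. The interleaved fold assignment (no shuffling) is a simplification, and the toy data are made up.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validated_error(data, k, build_model, loss):
    """Average held-out loss over k folds.

    build_model(train_rows) must perform variable selection AND fitting,
    so both are repeated from scratch inside every fold.
    """
    n = len(data)
    total = 0.0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train = [data[i] for i in range(n) if i not in held_out]
        model = build_model(train)
        total += sum(loss(model, data[i]) for i in test_idx)
    return total / n


# Toy example: the "model" is just the training mean, scored by squared error.
outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = cross_validated_error(
    outcomes,
    k=3,
    build_model=lambda train: sum(train) / len(train),
    loss=lambda mean, y: (y - mean) ** 2,
)
print(cv_error)
```

Selecting variables once on the full data and cross-validating only the final fit would understate the prediction error.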

SAS Lab: ICW

Model Diagnostics

After selecting a model • Want to check modeling fit and diagnostics to ensure adequacy • Could be worried about: • Influential data points • Correlated predictor variables • Leaving out variables or using wrong form • Overall model fit and prediction value

Problems to check for • Convergence problems • Model goodness-of-fit • Functional form (confounding, interaction, higher order for continuous) • Multicollinearity • Outlier effects

Convergence problems • SAS usually converges but sometimes will get a message: “There is possibly a quasicomplete separation in the sample points. The ML estimate may not exist. Validity of the model fit is questionable.”
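One quick way to investigate this message is to check whether a single predictor splits the outcomes perfectly or almost perfectly. The sketch below covers only the one-predictor case. Separation can also arise from a combination of predictors, which this check will not detect. The function name `separation_status` is made up for illustration.

```python
def separation_status(x, y):
    """Classify one numeric predictor x against a 0/1 outcome y.

    Returns "complete", "quasi-complete", or "none".
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("need both outcome levels")
    # Complete: every value in one outcome group lies strictly below the other group
    for low, high in ((x0, x1), (x1, x0)):
        if max(low) < min(high):
            return "complete"
    # Quasi-complete: the groups overlap only at a single shared boundary value
    for low, high in ((x0, x1), (x1, x0)):
        if max(low) == min(high) and min(low) < max(high):
            return "quasi-complete"
    return "none"


print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))
```

When a predictor is flagged, the maximum likelihood estimate of its coefficient may not exist, which is why the fitted values reported after the warning should not be trusted.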
