Choosing the “best” model

Choosing the “best” model (Session 08)

Learning Objectives
At the end of this session, you will be able to
• use a simple descriptive approach to select the most appropriate subset of explanatory variables
• apply methods of variable selection (based on statistical tests) in a meaningful way to get the “best” model
• appreciate the effect on t-probabilities when x’s are added to or dropped from a model
• understand the dangers of using automatic selection procedures

Example of choosing “best” set of x’s
Consider (fictitious) data from a retrospective study of patients surviving less than 4 months after being diagnosed as having acute leukaemia.
Objective: To identify factors affecting survival time.
Variables were:
y = survival time (days) after diagnosis
x1 = no. of chemotherapy sessions
x2 = total volume of blood transfused
x3 = no. of days of hospital care
x4 = age of patient (years)

Start with a matrix plot

Summary statistics for all regressions
How many possible regression models exist?
Example with x1 and x3 to show summaries:
---------+---------------------------------------
  Source |       SS   df        MS     F   Prob>F
---------+---------------------------------------
   Model | 1488.691    2   744.346  6.07   0.0188
Residual | 1227.072   10   122.707
---------+---------------------------------------
   Total | 2715.763   12   226.314
---------+---------------------------------------
No. of parameters fitted (p) = 3
R2p = 1488.69 / 2715.76 = 0.5482
Adjusted R2p = 1 – 122.71 / 226.31 = 0.4578
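The summaries on this slide can be reproduced from the sums of squares alone. The sketch below uses only the figures in the ANOVA table above (n = 13 follows from the total df of 12); the variable names are illustrative. It also enumerates the subsets of the four x's, counting the intercept-only model as one of them.

```python
from itertools import combinations

# Figures copied from the ANOVA table for the model with x1 and x3.
ss_model, ss_resid, ss_total = 1488.691, 1227.072, 2715.763
n = 13   # total df (12) + 1
p = 3    # parameters fitted: constant, x1, x3

ms_model = ss_model / (p - 1)
ms_resid = ss_resid / (n - p)    # residual mean square
ms_total = ss_total / (n - 1)

f_stat = ms_model / ms_resid
r2 = ss_model / ss_total
adj_r2 = 1 - ms_resid / ms_total

print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")

# Every subset of the candidate x's defines one possible regression model.
candidates = ["x1", "x2", "x3", "x4"]
subsets = [c for k in range(len(candidates) + 1)
           for c in combinations(candidates, k)]
n_models = len(subsets)  # includes the empty subset (constant only)
print("possible models:", n_models)
```

Adjusted R2 is written here as a ratio of mean squares, which is why it moves in the opposite direction to the residual mean square when the set of x's changes.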

Descriptive approach (all regressions)

A descriptive approach… continued
Plot R2 versus the no. of parameters (p) in the model.
Which model would you select on the basis of these results?

A descriptive approach… continued
Alternatively, plot the residual mean square against p. A small residual mean square is good!
Which model would you select on the basis of the residual mean square?
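A minimal sketch of the "all regressions" summaries follows. The slides do not list the leukaemia data, so the data here are synthetic and generated purely for illustration (an assumption); the helper names `ols_rss` and `all_regressions` are hypothetical. Least squares is done with plain normal equations to keep the example free of external packages.

```python
import random
from itertools import combinations

def ols_rss(rows, y):
    """Residual SS of y regressed on the given columns plus a constant."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan.
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[j][k] / A[j][j] for j in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y))

def all_regressions(data, y, names):
    """Fit every subset of the x's; return p, R2, adjusted R2, residual MS."""
    n = len(y)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    out = []
    for size in range(len(names) + 1):
        for subset in combinations(range(len(names)), size):
            rss = ols_rss([[row[j] for j in subset] for row in data], y)
            p = size + 1                      # x's plus the constant
            rms = rss / (n - p)
            out.append({"vars": tuple(names[j] for j in subset), "p": p,
                        "r2": 1 - rss / tss,
                        "adj_r2": 1 - rms / (tss / (n - 1)),
                        "rms": rms})
    return out

# Synthetic illustration only: 13 cases, 4 x's, y driven by x1 and x4.
random.seed(8)
data = [[random.gauss(10, 3), random.gauss(20, 5),
         random.gauss(30, 6), random.gauss(50, 12)] for _ in range(13)]
y = [100 + 1.5 * r[0] - 0.6 * r[3] + random.gauss(0, 3) for r in data]

results = all_regressions(data, y, ["x1", "x2", "x3", "x4"])
for m in sorted(results, key=lambda m: m["rms"])[:5]:
    print(m["p"], m["vars"], round(m["r2"], 3), round(m["rms"], 2))
```

R2 can never fall when an x is added, so it always favours the largest model; the residual mean square and adjusted R2 penalise extra parameters, which is why the slides plot them against p.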

An inferential approach…
• Use a sequential procedure to select variables that contribute most, and significantly, to the regression model.
• Three popular methods exist:
   • Forward selection
   • Backward elimination
   • Stepwise regression

Forward selection …
Select the “best” single variable (see slide 6).
Ask, “Is it contributing significantly?”
Answer: Yes (see below)
-----------------------------------------
     y |    Coef.  Std. Err.      t   P>|t|
-------+---------------------------------
    x4 |  -.73816     .1546   -4.77   0.001
const. |   117.57    5.2622   22.34   0.000
-----------------------------------------
Now consider 2-variable models with x4.

Two-variable models with x4
-----------------------------------------
     y |    Coef.   Std.Err.      t   P>|t|
-------+---------------------------------
    x4 |  -.61395    .04864  -12.62   0.000
    x1 |   1.4400    .13842   10.40   0.000
const. |   103.10    2.1240   48.54   0.000
-----------------------------------------
    x4 |  -.45694    .69595   -0.66   0.526
    x2 |   .31090    .74861    0.42   0.687
const. |   94.160    56.627    1.66   0.127
-----------------------------------------
    x4 |  -.72460    .07233  -10.02   0.000
    x3 |  -1.1999    .18902   -6.35   0.000
const. |   131.28    3.2748   40.09   0.000
-----------------------------------------

Three-variable models with x4, x1
-----------------------------------------
     y |    Coef.   Std.Err.      t   P>|t|
-------+---------------------------------
    x4 |  -.23654    .17329   -1.37   0.205
    x1 |   1.4519    .11700   12.41   0.000
    x2 |   .41611    .18561    2.24   0.052
const. |   71.648    14.142    5.07   0.001
-----------------------------------------
    x4 |  -.64280    .04454  -14.43   0.000
    x1 |   1.0519    .22368    4.70   0.001
    x3 |  -.41004    .19923   -2.06   0.070
const. |   111.68    4.5625   24.48   0.000
-----------------------------------------
The model with x1, x2 and x4 would be selected, despite x4 now being non-significant!
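The entry decisions in the forward steps can be replayed from the t and p values printed in the tables above. This is only a sketch: the slides do not state the significance level used for entry, so both thresholds tried below are assumptions, and `pick_entry` is a hypothetical helper.

```python
def pick_entry(candidates, alpha_to_enter):
    """Return the candidate with the largest |t| if its p-value is below
    the entry threshold; otherwise return None (selection stops)."""
    name = max(candidates, key=lambda v: abs(candidates[v][0]))
    return name if candidates[name][1] < alpha_to_enter else None

# (t, p) for the variable added to x4, from the two-variable tables.
step2 = {"x1": (10.40, 0.000), "x2": (0.42, 0.687), "x3": (-6.35, 0.000)}
# (t, p) for the variable added to x4 and x1, from the three-variable tables.
step3 = {"x2": (2.24, 0.052), "x3": (-2.06, 0.070)}

for alpha in (0.05, 0.10):   # assumed entry thresholds, not from the slides
    print(f"alpha = {alpha}: step 2 adds {pick_entry(step2, alpha)}, "
          f"step 3 adds {pick_entry(step3, alpha)}")
```

Under these figures x2 enters at the third step only if the entry threshold is looser than 0.052, so the selected model appears to depend on a choice that is easy to overlook when the procedure is run automatically.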

Backward elimination gives x1, x2
---------------------------------------
   y |    Coef.   Std.Err.      t   P>|t|
-----+---------------------------------
  x1 |   1.5511    .74477    2.08   0.071
  x2 |   .51017     .7238    0.70   0.501
  x3 |   .10191     .7547    0.14   0.896
  x4 |  -.14406     .7091   -0.20   0.844
---------------------------------------
  x1 |   1.4519    .11700   12.41   0.000
  x2 |   .41611    .18561    2.24   0.052
  x4 |  -.23654    .17329   -1.37   0.205
---------------------------------------
  x1 |   1.4683    .12130   12.10   0.000
  x2 |   .66225    .04585   14.44   0.000
---------------------------------------
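To see how the t-statistic for a single variable shifts as other x's are added or dropped, the rows for x4 can be collected from the tables on the preceding slides. The sketch below recomputes t = Coef. / Std.Err. from the printed (rounded) figures, so small differences from the reported t are expected.

```python
# (x's in model, x4 coefficient, x4 std. error, reported t), from the slides.
x4_fits = [
    ("x4",             -0.73816, 0.15460,  -4.77),
    ("x4, x1",         -0.61395, 0.04864, -12.62),
    ("x4, x2",         -0.45694, 0.69595,  -0.66),
    ("x4, x3",         -0.72460, 0.07233, -10.02),
    ("x4, x1, x2",     -0.23654, 0.17329,  -1.37),
    ("x4, x1, x3",     -0.64280, 0.04454, -14.43),
    ("x1, x2, x3, x4", -0.14406, 0.70910,  -0.20),
]

# Recompute each t-statistic as coefficient divided by standard error.
t_values = {model: coef / se for model, coef, se, _ in x4_fits}

for model, coef, se, reported in x4_fits:
    print(f"{model:<15} se = {se:.5f}  t = {t_values[model]:7.2f}"
          f"  (reported {reported})")
```

The standard error of the x4 coefficient is much larger in every model that also contains x2, which is what pushes its t-statistic towards zero there.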

Stepwise selection procedure…
This is similar to forward selection, but at each stage of the process, all x’s in the model are re-assessed to check if those that entered the model at an earlier stage still remain “important”.
Note: Software packages allow automatic use of one of these methods, with pre-specified p-values for selection and deletion of variables. They are usually available only with quantitative x’s.
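The re-assessment step that distinguishes stepwise from forward selection can be sketched with p-values already printed on the earlier slides. The removal threshold of 0.10 is an assumption (the slides give none), and `reassess` is a hypothetical helper, not a routine from any package.

```python
ALPHA_TO_REMOVE = 0.10  # assumed deletion threshold, not from the slides

def reassess(p_values, alpha_to_remove):
    """Return the variable with the largest p-value if it exceeds the
    removal threshold; otherwise return None (all variables stay)."""
    worst = max(p_values, key=p_values.get)
    return worst if p_values[worst] > alpha_to_remove else None

# p-values from the slides for the model containing x4, x1 and x2.
after_x2_enters = {"x4": 0.205, "x1": 0.000, "x2": 0.052}
dropped = reassess(after_x2_enters, ALPHA_TO_REMOVE)
print("dropped after re-assessment:", dropped)

# p-values from the slides for the model containing only x1 and x2.
remaining = {"x1": 0.000, "x2": 0.000}
print("further removal:", reassess(remaining, ALPHA_TO_REMOVE))
```

A full stepwise run would then test whether any excluded x should re-enter; the slides do not show the fits needed for that check, so it is left out here.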

Discussion… in small groups
• Look back at the results. What do you observe with the forward and backward procedures? Do they give the same results?
• Did the selection using the forward procedure seem sensible, given that the p-value for x4 was 0.205?
• Can you work out what model would result from a stepwise selection procedure?
• Is it a good idea to use the automatic selection procedures available in software packages? If not, why not?

Discussion continued…
Suppose a medical researcher told you that a model without x2 was not meaningful. How would you proceed with your model selection?
What other latent (lurking) variables, measurable or non-measurable, might affect y?
What further steps would you undertake before accepting the final model?

Practical work follows to ensure learning objectives are achieved…
