Explore the concept of indicator variables in linear regression analysis through practical examples, understanding how qualitative variables impact models’ interpretation. Learn the differences in slope and how to test hypotheses in regression equations.
Chapter 8 Indicator Variables
Linear Regression Analysis 5E Montgomery, Peck & Vining
8.1 The General Concept of Indicator Variables
• Qualitative variables (also known as categorical variables) do not have a natural scale of measurement.
• Indicator variables (also known as dummy variables) assign numerical values, typically 0 and 1, to the levels of a qualitative variable.
8.1 The General Concept of Indicator Variables: Example
• Relate the effective life of a cutting tool ($y$) used on a lathe to the lathe speed in revolutions per minute ($x_1$) and the type of cutting tool used.
• Tool type is qualitative and can be represented as: $x_2 = 0$ if the observation is from tool type A, $x_2 = 1$ if the observation is from tool type B.
• If a first-order model is appropriate: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
8.1 The General Concept of Indicator Variables: Example
• For tool type A this model becomes: $y = \beta_0 + \beta_1 x_1 + \varepsilon$
• For tool type B this model becomes: $y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$
• Changing from A to B shifts the intercept by $\beta_2$, the coefficient of the tool type indicator. The slope $\beta_1$ is the same for both tool types.
• We assume that the error variance is equal for all levels of the qualitative variable.
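The parallel-lines model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are hypothetical and noise-free (not the textbook's tool life data), and `ols` is a small helper written for this sketch that solves the normal equations.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated from: life = 40 - 0.03 * speed + 15 * x2
speeds = [500, 600, 700, 800, 900, 1000]
X, y = [], []
for tool in ("A", "B"):
    for speed in speeds:
        x2 = 1 if tool == "B" else 0      # indicator: 0 = type A, 1 = type B
        X.append([1.0, speed, x2])        # columns: intercept, x1, x2
        y.append(40 - 0.03 * speed + 15 * x2)

b0, b1, b2 = ols(X, y)
print(f"type A: intercept {b0:.3f}, slope {b1:.4f}")
print(f"type B: intercept {b0 + b2:.3f}, slope {b1:.4f}")
```

Both tool types share the slope estimate `b1`; only the intercept differs, by `b2`.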
8.1 The General Concept of Indicator Variables
• A qualitative variable with $a$ levels requires $a - 1$ indicator variables. For example, with three tool types A, B, and C, two indicator variables ($x_2$ and $x_3$) are needed, for instance $(x_2, x_3) = (0, 0)$ for type A, $(1, 0)$ for type B, and $(0, 1)$ for type C.
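As a rough sketch of the $a - 1$ rule, the hypothetical helper below (`indicator_columns` is a name invented for this illustration) codes a qualitative variable against a chosen baseline level. The baseline gets all zeros and every other level gets a single 1.

```python
def indicator_columns(values, baseline):
    """Code a qualitative variable with a levels as a - 1 indicator columns."""
    # One column per non-baseline level, in sorted order.
    levels = [lv for lv in sorted(set(values)) if lv != baseline]
    coded = [[1 if v == lv else 0 for lv in levels] for v in values]
    return levels, coded


levels, coded = indicator_columns(["A", "B", "C", "B", "A"], baseline="A")
for row in coded:
    print(row)
```

With three tool types and A as the baseline, two columns result: one for B and one for C.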
Example 8.1 Tool Life Data
• The model to be fit is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where $x_2 = 0$ indicates tool type A and $x_2 = 1$ indicates tool type B.
• The least squares fit is [equation missing in source]
Example 8.1 Tool Life Data
• Residual analysis: see the plot of residuals versus fitted values in Fig. 8.3 (slide 8) and the normal probability plot of the residuals.
[Figure: normal probability plot of residuals]
8.1 The General Concept of Indicator Variables
• Two separate models could have been fit to the data.
• However, the single-model approach is preferred because the analyst has only one final equation to work with instead of two, a much simpler practical result.
• Furthermore, since both straight lines are assumed to have the same slope, it makes sense to combine the data from both tool types to produce a single estimate of this common parameter.
8.1 The General Concept of Indicator Variables: Difference in Slope
• If we expect the slopes to differ, we can model this by including an interaction term between the variables.
• Consider the tool life data again, and suppose we believe the two tool types may have different slopes. The model that accounts for the change in slope is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$ (8.4)
8.1 The General Concept of Indicator Variables: Difference in Slope
• If tool type A is used: $y = \beta_0 + \beta_1 x_1 + \varepsilon$
• If tool type B is used: $y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \varepsilon$
• Thus, both the intercept and the slope have shifted.
• $\beta_2$: change in the intercept caused by changing from type A to type B
• $\beta_3$: change in the slope caused by changing from type A to type B
8.1 The General Concept of Indicator Variables
• What we really have are two regression equations, one for tool type A and one for tool type B. To test whether these two equations are the same, we can use the extra-sum-of-squares method and test $H_0: \beta_2 = \beta_3 = 0$ vs. $H_1: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$
• The test statistic would be $F_0 = \dfrac{SS_R(\beta_2, \beta_3 \mid \beta_1, \beta_0)/2}{MS_{Res}}$
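The extra-sum-of-squares comparison can be sketched as follows. This is only an illustration: the data are hypothetical (not the textbook's tool life data), `ols` and `sse` are helpers written for this sketch, and the extra sum of squares is computed as the drop in residual sum of squares when the tool type indicator and its interaction with speed are added to the speed-only model.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):                      # Gauss-Jordan, partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data: six speeds for each tool type.
speed = [500, 600, 700, 800, 900, 1000] * 2
tool = [0] * 6 + [1] * 6                    # 0 = type A, 1 = type B
life = [25.1, 21.9, 19.2, 15.8, 13.1, 9.9,
        37.8, 33.7, 30.9, 26.8, 23.9, 19.6]
n = len(life)

reduced = [[1.0, x1] for x1 in speed]                          # beta0, beta1
full = [[1.0, x1, x2, x1 * x2] for x1, x2 in zip(speed, tool)]  # adds beta2, beta3

sse_reduced = sse(reduced, life)
sse_full = sse(full, life)

# F0 = [SS_R(beta2, beta3 | beta1, beta0) / 2] / MS_Res of the full model
f0 = ((sse_reduced - sse_full) / 2) / (sse_full / (n - 4))
print(f"F0 = {f0:.1f} on 2 and {n - 4} degrees of freedom")
```

A large value of `f0` relative to the F distribution with 2 and n - 4 degrees of freedom suggests that the two tool types do not share a single regression line.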
Example 8.2 The Tool Life Data
Example 8.3 An Indicator Variable with More Than Two Levels
An electric utility is investigating the effect of the size of a single-family house and the type of air conditioning used in the house on the total electricity consumption during warm-weather months.
Example 8.3 An Indicator Variable with More Than Two Levels
• In this problem it would seem unrealistic to assume that the slope of the regression function relating mean electricity consumption to the size of the house does not depend on the type of air conditioning system.
• For example, we would expect the mean electricity consumption to increase with the size of the house, but the rate of increase should be different for a central air conditioning system than for window units, because central air conditioning should be more efficient than window units for larger houses.
Example 8.3 An Indicator Variable with More Than Two Levels
• There should be an interaction between the size of the house and the type of air conditioning system.
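One way to picture such an interaction model is through its design-matrix rows. The sketch below is an assumption-laden illustration: the source names only central air conditioning and window units, so the other two labels in `AC_TYPES`, the choice of baseline, and the helper `design_row` are all hypothetical.

```python
# Four air conditioning types; "none" and "heat pump" are assumed labels.
AC_TYPES = ["none", "window", "heat pump", "central"]


def design_row(size, ac_type, baseline="none"):
    """Row [1, x1, x2, x3, x4, x1*x2, x1*x3, x1*x4] for one house.

    x1 is the house size; x2..x4 are indicators for the non-baseline
    air conditioning types; the last three entries are the interactions.
    """
    if ac_type not in AC_TYPES:
        raise ValueError(f"unknown air conditioning type: {ac_type!r}")
    indicators = [1 if ac_type == t else 0 for t in AC_TYPES if t != baseline]
    return [1, size] + indicators + [size * d for d in indicators]


print(design_row(1500, "none"))
print(design_row(2000, "central"))
```

For the baseline type every indicator and interaction entry is zero, so that type keeps the base intercept and slope. Each other type adds its own intercept shift and its own slope shift.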
Example 8.3 An Indicator Variable with More Than Two Levels
• The four regression models corresponding to the four types of air conditioning systems are as follows: [equations missing in source]
Example 8.4 More than Two Indicator Variables
• Suppose that in Example 8.1 a second qualitative factor, the type of cutting oil used, must be considered.
• Assuming that this factor has two levels, we may define a second indicator variable, $x_3$, as follows: [definition missing in source]
Example 8.5 Comparing Regression Models
8.2 Some Comments on Indicator Variables
• Avoid imposing a specific metric on the levels of the qualitative variable (avoid allocated codes). That is, it is dangerous to assign values such as 1, 2, 3, and 4 to the levels. Why? Such codes force a fixed ordering and equal spacing between the levels, which the data may not support.
• Indicator variables can be substituted for quantitative regressors.
• Useful if accurate data cannot be readily obtained.
• Group the data into classes or intervals and then assign indicator variables.
• Drawback: information is lost by not using the actual data. Quantitative information is often more useful than qualitative.
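Grouping a quantitative regressor into classes and assigning indicator variables can be sketched as below. The cut points and the helper name `class_indicators` are hypothetical, chosen only for illustration. The lowest class serves as the baseline.

```python
import bisect


def class_indicators(value, cut_points):
    """Indicators for the class a value falls in; lowest class is baseline.

    cut_points must be sorted. A value equal to a cut point is placed
    in the class above it.
    """
    k = bisect.bisect_right(cut_points, value)   # class index 0..len(cut_points)
    return [1 if k == j else 0 for j in range(1, len(cut_points) + 1)]


cuts = [20000, 40000]            # hypothetical class boundaries
for value in (15000, 25000, 50000):
    print(value, class_indicators(value, cuts))
```

Two cut points give three classes and therefore two indicators. The exact value within a class is discarded, which is the loss of information noted above.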
8.3 Regression Approach to Analysis of Variance
• The analysis of variance is a technique frequently used to analyze data from planned or designed experiments.
• Essentially, any analysis-of-variance problem can be treated as a regression problem in which all of the regressors are indicator variables.
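8.3 Regression Approach to Analysis of Variance
In the fixed-effects (model I) case, the analysis of variance is used to test the hypothesis that all $k$ population means are equal, $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, or equivalently, in terms of the treatment effects $\tau_i = \mu_i - \mu$, $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$ vs. $H_1: \tau_i \neq 0$ for at least one $i$.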
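The equivalence between one-way analysis of variance and regression on indicator variables can be checked numerically. The sketch below uses hypothetical data and an assumed coding in which the first treatment is the baseline and each remaining treatment gets its own 0/1 indicator. The textbook's coding may differ, but the F statistic does not depend on which level is taken as the baseline.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):                      # Gauss-Jordan, partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical observations for k = 3 treatments.
groups = [[9.0, 11.0, 10.0], [14.0, 15.0, 13.0], [19.0, 21.0, 20.0]]
k = len(groups)
N = sum(len(g) for g in groups)
y = [v for g in groups for v in g]
grand = sum(y) / N

# Classical one-way ANOVA F statistic.
ss_treat = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
f_anova = (ss_treat / (k - 1)) / (ss_error / (N - k))

# Regression on k - 1 indicators (treatment 1 is the baseline).
X = [[1.0] + [1.0 if i == j else 0.0 for j in range(1, k)]
     for i, g in enumerate(groups) for _ in g]
b = ols(X, y)
sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
          for row, yi in zip(X, y))
sst = sum((yi - grand) ** 2 for yi in y)
f_reg = ((sst - sse) / (k - 1)) / (sse / (N - k))

print(f_anova, f_reg)
```

In this coding the intercept estimates the baseline treatment mean and each indicator coefficient estimates the difference between a treatment mean and the baseline mean. Testing that all indicator coefficients are zero is therefore the same as testing equal treatment means.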