This chapter introduces the concept of simple regression analysis, including estimation methods and assumptions. It provides examples of simple and multiple regression models and discusses the objectives of studying these relationships.
Chapter 3 Simple Regression
What is in this Chapter? • This chapter starts with a linear regression model with one explanatory variable, and states the assumptions of this basic model • It then discusses two methods of estimation: the method of moments (MM) and the method of least squares (LS). • The method of maximum likelihood (ML) is discussed in the appendix
3.1 Introduction Example 1: Simple Regression y = sales x = advertising expenditures Here we try to determine the relationship between sales and advertising expenditures.
3.1 Introduction Example 2: Multiple Regression y = consumption expenditures of a family x1 = family income x2 = financial assets of the family x3 = family size
3.1 Introduction • There are several objectives in studying these relationships. • They can be used to: 1. Analyze the effects of policies that involve changing the individual x's. In Example 1 this involves analyzing the effect of changing advertising expenditures on sales 2. Forecast the value of y for a given set of x's. 3. Examine whether any of the x's have a significant effect on y.
3.1 Introduction • Given the way we have set up the problem until now, the variable y and the x variables are not on the same footing • Implicitly we have assumed that the x's are variables that influence y or are variables that we can control or change and y is the effect variable. • There are several alternative terms used in the literature for y and x1, x2,..., xk. • These are shown in Table 3.1.
3.1 Introduction Table 3.1 Classification of variables in regression analysis
y | x1, x2, ..., xk
• Predictand | Predictors
• Regressand | Regressors
• Explained variable | Explanatory variables
• Dependent variable | Independent variables
• Effect variable | Causal variables
• Endogenous variable | Exogenous variables
• Target variable | Control variables
3.2 Specification of the Relationships • As mentioned in Section 3.1, we will discuss the case of one explained (dependent) variable, which we denote by y, and one explanatory (independent) variable, which we denote by x. • The relationship between y and x is denoted by $y = f(x)$ (3.1), where f(x) is a function of x.
3.2 Specification of the Relationships • Going back to equation (3.1), we will assume that the function f(x) is linear in x, that is, $f(x) = \alpha + \beta x$. • And we will assume that this relationship is a stochastic relationship, that is, $y = \alpha + \beta x + u$ (3.2), where $u$, which is called an error or disturbance, has a known probability distribution (i.e., is a random variable).
3.2 Specification of the Relationships • In equation (3.2), $\alpha + \beta x$ is the deterministic component of y and u is the stochastic or random component. • $\alpha$ and $\beta$ are called regression coefficients or regression parameters; we estimate them from the data on y and x.
3.2 Specification of the Relationships • Why should we add an error term u ? • What are the sources of the error term u in equation (3.2)? • There are three main sources:
3.2 Specification of the Relationships 1. Unpredictable element of randomness in human responses. • For example, if y = consumption expenditure of a household and x = disposable income of the household, there is an unpredictable element of randomness in each household's consumption. The household does not behave like a machine.
3.2 Specification of the Relationships 2. Effect of a large number of omitted variables. • Again in our example, x is not the only variable influencing y. The family size, tastes of the family, spending habits, and so on, affect the variable y. • The error u is a catchall for the effects of all these variables, some of which may not even be quantifiable, and some of which may not even be identifiable.
3.2 Specification of the Relationships 3. Measurement error in y. • In our example this refers to measurement error in the household consumption. That is, we cannot measure it accurately.
3.2 Specification of the Relationships • If we have n observations on y and x, we can write equation (3.2) as $y_i = \alpha + \beta x_i + u_i$, $i = 1, 2, \ldots, n$ (3.3). • Our objective is to get estimates of the unknown parameters $\alpha$ and $\beta$ in equation (3.3) given the n observations on y and x.
3.2 Specification of the Relationships • To do this we have to make some assumptions about the error terms $u_i$. The assumptions we make are: 1. Zero mean: $E(u_i) = 0$ for all i. 2. Common variance: $\operatorname{var}(u_i) = \sigma^2$ for all i. 3. Independence: $u_i$ and $u_j$ are independent for all $i \neq j$.
3.2 Specification of the Relationships 4. Independence of $x_j$: $u_i$ and $x_j$ are independent for all i and j. This assumption automatically follows if the $x_j$ are considered nonrandom variables. With reference to Figure 3.1, what this says is that the distribution of u does not depend on the value of x. 5. Normality: the $u_i$ are normally distributed for all i. In conjunction with assumptions 1, 2, and 3 this implies that the $u_i$ are independently and normally distributed with mean zero and a common variance $\sigma^2$. We write this as $u_i \sim IN(0, \sigma^2)$.
3.2 Specification of the Relationships • These are the assumptions with which we start. We will, however, relax some of these assumptions in later chapters. • Assumption 2 is relaxed in Chapter 5. • Assumption 3 is relaxed in Chapter 6. • Assumption 4 is relaxed in Chapter 9.
3.2 Specification of the Relationships • We will discuss three methods for estimating the parameters $\alpha$ and $\beta$: • 1. The method of moments (MM). • 2. The method of least squares (LS). • 3. The method of maximum likelihood (ML).
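3.3 The Method of Moments • The assumptions we have made about the error term u imply that $E(u) = 0$ and $\operatorname{cov}(x, u) = 0$. • In the method of moments, we replace these conditions by their sample counterparts. • Let $\hat{\alpha}$ and $\hat{\beta}$ be the estimators for $\alpha$ and $\beta$, respectively. The sample counterpart of $u_i$ is the estimated error $\hat{u}_i$ (which is also called the residual), defined as $\hat{u}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$.
3.3 The Method of Moments • The two equations to determine $\hat{\alpha}$ and $\hat{\beta}$ are obtained by replacing population assumptions by their sample counterparts: $E(u) = 0$ is replaced by $\frac{1}{n} \sum \hat{u}_i = 0$, or $\sum \hat{u}_i = 0$, and $\operatorname{cov}(x, u) = 0$ is replaced by $\frac{1}{n} \sum x_i \hat{u}_i = 0$, or $\sum x_i \hat{u}_i = 0$.
3.3 The Method of Moments • In these and the following equations, $\sum$ denotes $\sum_{i=1}^{n}$. Thus we get the two equations $\sum (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$ and $\sum x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$. • These equations can be written as (noting that $\sum \hat{\alpha} = n\hat{\alpha}$) $\sum y_i = n\hat{\alpha} + \hat{\beta} \sum x_i$ and $\sum x_i y_i = \hat{\alpha} \sum x_i + \hat{\beta} \sum x_i^2$.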
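The two moment equations are linear in the two unknowns, so they can be solved directly. The following is a minimal sketch in plain Python, not part of the original slides; the function name `method_of_moments` and the data values are hypothetical and chosen only for illustration.

```python
def method_of_moments(x, y):
    """Solve the two sample moment conditions for (alpha_hat, beta_hat).

    The conditions are
        sum(y)   = n * alpha_hat + beta_hat * sum(x)
        sum(x*y) = alpha_hat * sum(x) + beta_hat * sum(x*x)
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Determinant of the 2x2 system; it is zero when all x values are equal.
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("x must take at least two distinct values")

    beta_hat = (n * sum_xy - sum_x * sum_y) / det
    alpha_hat = (sum_y - beta_hat * sum_x) / n
    return alpha_hat, beta_hat


# Hypothetical illustrative data (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = method_of_moments(x, y)
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]

# By construction the residuals satisfy both sample moment conditions.
moment_1 = sum(residuals)
moment_2 = sum(xi * ui for xi, ui in zip(x, residuals))
```

Up to floating-point rounding, `moment_1` and `moment_2` are zero, which is exactly what the sample counterparts of the two population conditions require.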
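3.4 The Method of Least Squares • The method of least squares requires that we should choose $\hat{\alpha}$ and $\hat{\beta}$ as estimates of $\alpha$ and $\beta$, respectively, so that $Q = \sum (y_i - \hat{\alpha} - \hat{\beta} x_i)^2$ is a minimum. • Q is also the sum of squares of the (within-sample) prediction errors when we predict $y_i$ given $x_i$ and the estimated regression equation. • We will show in the appendix to this chapter that the least squares estimators have desirable optimal properties.
3.4 The Method of Least Squares • Setting the partial derivatives of Q equal to zero, we get $\frac{\partial Q}{\partial \hat{\alpha}} = 0 \Rightarrow \sum 2(y_i - \hat{\alpha} - \hat{\beta} x_i)(-1) = 0$, or $\sum y_i = n\hat{\alpha} + \hat{\beta} \sum x_i$, or $\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}$ (3.6), and $\frac{\partial Q}{\partial \hat{\beta}} = 0 \Rightarrow \sum 2(y_i - \hat{\alpha} - \hat{\beta} x_i)(-x_i) = 0$, or $\sum x_i y_i = \hat{\alpha} \sum x_i + \hat{\beta} \sum x_i^2$ (3.7).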
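The two first-order conditions have a closed-form solution: the slope is the sum of cross-products of deviations from the means divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below, in plain Python, illustrates this on hypothetical data; the names `least_squares` and `sum_sq_errors` are made up for the example and are not from the chapter.

```python
def least_squares(x, y):
    """Return (alpha_hat, beta_hat) minimizing sum((y - a - b*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must take at least two distinct values")
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar  # line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat


def sum_sq_errors(x, y, a, b):
    """Q: sum of squared within-sample prediction errors for the line a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

alpha_hat, beta_hat = least_squares(x, y)
q_min = sum_sq_errors(x, y, alpha_hat, beta_hat)

# Any other line should give a larger Q; this checks one arbitrary alternative.
q_other = sum_sq_errors(x, y, alpha_hat + 0.1, beta_hat - 0.05)
```

For this data the estimates are 2.2 for the intercept and 0.6 for the slope, and moving away from them increases Q. The estimates coincide with those from the method of moments, because both methods lead to the same pair of equations.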
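3.4 The Method of Least Squares • Let us define $S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y}$, and $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2$. • Solving the first-order conditions for a minimum of Q then gives $\hat{\beta} = S_{xy}/S_{xx}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.
3.4 The Method of Least Squares • The residual sum of squares (to be denoted by RSS) is given by $RSS = \sum (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = \sum [(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})]^2 = S_{yy} + \hat{\beta}^2 S_{xx} - 2\hat{\beta} S_{xy}$.
3.4 The Method of Least Squares • But $\hat{\beta} = S_{xy}/S_{xx}$. Hence we have $RSS = S_{yy} - S_{xy}^2/S_{xx} = S_{yy} - \hat{\beta} S_{xy}$. • $S_{yy}$ is usually denoted by TSS (total sum of squares) and $\hat{\beta} S_{xy}$ is usually denoted by ESS (explained sum of squares). • Thus TSS (total) = ESS (explained) + RSS (residual).
3.4 The Method of Least Squares • The proportion of the total sum of squares explained is denoted by $r_{xy}^2$, where $r_{xy}$ is called the correlation coefficient. • Thus $r_{xy}^2 = ESS/TSS$ and $1 - r_{xy}^2 = RSS/TSS$. If $r_{xy}^2$ is high (close to 1), then x is a good "explanatory" variable for y. • The term $r_{xy}^2$ is called the coefficient of determination and must fall between zero and 1 for any given regression.
3.4 The Method of Least Squares • If $r_{xy}^2$ is close to zero, the variable x explains very little of the variation in y. If $r_{xy}^2$ is close to 1, the variable x explains most of the variation in y. • The coefficient of determination is given by $r_{xy}^2 = \frac{ESS}{TSS} = \frac{\hat{\beta} S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$.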
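The decomposition of the total sum of squares into explained and residual parts can be checked numerically. The following plain-Python sketch uses hypothetical data and a made-up helper name, `sums_of_squares`; it computes the residual sum of squares directly from the residuals, so the identity TSS = ESS + RSS is verified rather than assumed.

```python
def sums_of_squares(x, y):
    """Return (tss, ess, rss) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    tss = s_yy                     # total sum of squares
    ess = beta_hat * s_xy          # explained sum of squares
    # Residual sum of squares computed directly from the residuals.
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    return tss, ess, rss


# Hypothetical illustrative data (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

tss, ess, rss = sums_of_squares(x, y)
r_squared = ess / tss              # coefficient of determination
```

Here TSS is 6, ESS is 3.6, and RSS is 2.4, so x explains 60 percent of the variation in y in this small example.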
3.9 Alternative Functional Forms for Regression Equations • For instance, for the data points depicted in Figure 3.7(a), where y is increasing more slowly than x, a possible functional form is $y = \alpha + \beta \log x$. • This is called a semi-log form, since it involves the logarithm of only one of the two variables x and y.
3.9 Alternative Functional Forms for Regression Equations • In this case, if we redefine a variable $X = \log x$, the equation becomes $y = \alpha + \beta X$. • Thus we have a linear regression model with the explained variable y and the explanatory variable $X = \log x$.
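As a rough illustration of the semi-log form, the sketch below transforms x to its logarithm and then runs an ordinary simple regression on the transformed variable. The data are hypothetical and generated without noise from an intercept of 1 and a slope of 2, so the fit should recover those values; `fit_line` is a made-up helper name.

```python
import math


def fit_line(X, y):
    """Least squares intercept and slope for y regressed on X."""
    n = len(X)
    X_bar = sum(X) / n
    y_bar = sum(y) / n
    s_xx = sum((Xi - X_bar) ** 2 for Xi in X)
    s_xy = sum((Xi - X_bar) * (yi - y_bar) for Xi, yi in zip(X, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * X_bar
    return alpha_hat, beta_hat


# Hypothetical noise-free data following y = 1 + 2 * log(x).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.0 + 2.0 * math.log(xi) for xi in x]

# Redefine the explanatory variable as X = log x; the model is linear in X.
X = [math.log(xi) for xi in x]
alpha_hat, beta_hat = fit_line(X, y)
```

The regression itself is unchanged; only the explanatory variable has been redefined.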
3.9 Alternative Functional Forms for Regression Equations • For the data points depicted in Figure 3.7(b), where y is increasing faster than x, a possible functional form is $y = A e^{\beta x}$. In this case we take logs of both sides and get another kind of semi-log specification: $\log y = \log A + \beta x$. • If we define $Y = \log y$ and $\alpha = \log A$, we have $Y = \alpha + \beta x$, which is in the form of a linear regression equation.
3.9 Alternative Functional Forms for Regression Equations • An alternative model one can use is $y = A x^{\beta}$. • In this case, taking logs of both sides, we get $\log y = \log A + \beta \log x$. • In this case $\beta$ can be interpreted as an elasticity. Hence this form is popular in econometric work. This is called a double-log specification since it involves logarithms of both x and y. • Now define $Y = \log y$, $X = \log x$, and $\alpha = \log A$. We have $Y = \alpha + \beta X$, which is in the form of a linear regression equation. An illustrative example is given at the end of this section.
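The double-log specification can be sketched the same way: take logs of both variables, fit a straight line, and read the slope as the elasticity. The example below is illustrative only. It assumes the multiplicative form y = A x^β, uses hypothetical noise-free data with A = 3 and β = 0.5, and `fit_line` is a made-up helper name.

```python
import math


def fit_line(X, Y):
    """Least squares intercept and slope for Y regressed on X."""
    n = len(X)
    X_bar = sum(X) / n
    Y_bar = sum(Y) / n
    s_xx = sum((Xi - X_bar) ** 2 for Xi in X)
    s_xy = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y))
    beta_hat = s_xy / s_xx
    alpha_hat = Y_bar - beta_hat * X_bar
    return alpha_hat, beta_hat


# Hypothetical noise-free data following y = 3 * x**0.5.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Double-log transformation: Y = log y, X = log x.
X = [math.log(xi) for xi in x]
Y = [math.log(yi) for yi in y]

alpha_hat, elasticity_hat = fit_line(X, Y)
A_hat = math.exp(alpha_hat)  # undo alpha = log A
```

The slope `elasticity_hat` is the estimated elasticity of y with respect to x, and exponentiating the intercept recovers the scale factor A.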
3.9 Alternative Functional Forms for Regression Equations • Some other functional forms that are useful when the data points are as shown in Figure 3.8 are $y = \alpha + \frac{\beta}{x}$ or $y = \alpha + \frac{\beta}{\sqrt{x}}$. • In the first case we define $X = 1/x$ and in the second case we define $X = 1/\sqrt{x}$. In both cases the equation is linear in the variables after the transformation.
3.9 Alternative Functional Forms for Regression Equations • Some other nonlinearities can be handled by what is known as "search procedures." • For instance, suppose that we have the regression equation $y = \alpha + \frac{\beta}{x + \gamma} + u$. • The estimates of $\alpha$, $\beta$, and $\gamma$ are obtained by minimizing $\sum \left( y_i - \alpha - \frac{\beta}{x_i + \gamma} \right)^2$.
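One simple way to carry out a search procedure is a grid search: fix the nonlinear parameter at each value on a grid, estimate the remaining two parameters by ordinary least squares, and keep the grid value with the smallest residual sum of squares. The sketch below assumes the regression equation has the form y = α + β/(x + γ) + u; the grid, the data, and the names `fit_line` and `search_gamma` are hypothetical choices for illustration, not the chapter's own procedure.

```python
def fit_line(X, y):
    """Least squares fit of y on X; returns (alpha_hat, beta_hat, rss)."""
    n = len(X)
    X_bar = sum(X) / n
    y_bar = sum(y) / n
    s_xx = sum((Xi - X_bar) ** 2 for Xi in X)
    s_xy = sum((Xi - X_bar) * (yi - y_bar) for Xi, yi in zip(X, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * X_bar
    rss = sum((yi - alpha_hat - beta_hat * Xi) ** 2 for Xi, yi in zip(X, y))
    return alpha_hat, beta_hat, rss


def search_gamma(x, y, gamma_grid):
    """Grid search over gamma; for each gamma the model is linear in 1/(x + gamma)."""
    best = None
    for gamma in gamma_grid:
        X = [1.0 / (xi + gamma) for xi in x]
        alpha_hat, beta_hat, rss = fit_line(X, y)
        if best is None or rss < best[3]:
            best = (alpha_hat, beta_hat, gamma, rss)
    return best


# Hypothetical noise-free data generated with alpha = 1, beta = 2, gamma = 0.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0 + 2.0 / (xi + 0.5) for xi in x]

# Hypothetical grid: gamma = 0.0, 0.1, ..., 2.0 (x + gamma stays positive).
gamma_grid = [i / 10 for i in range(0, 21)]

alpha_hat, beta_hat, gamma_hat, rss = search_gamma(x, y, gamma_grid)
```

A grid search only finds the best value among the grid points, so in practice the grid would be refined around the minimum, or a numerical optimizer would be used instead.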