Lecture # 8

Dummy Variables (Lecture #8)

Nature of “dummy” variables:
• (1) Variables that take only the values “1” and “0”.
• (2) Variables that usually indicate a dichotomy: “presence” or “absence”, “yes” or “no”, etc.
• (3) Variables that indicate a “quality” or attribute, such as “male” or “female”, “black” or “white”, “urban” or “non-urban”, “before” or “after”, “North” or “South”, “East” or “West”, etc.

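=====================================================================
obs   Male (dummy)   Female (dummy)   Salary (K)   Years of teaching
=====================================================================
 1        1              0              23              1
 2        0              1              19.5            1
 3        1              0              24              2
 4        0              1              21              2
 5        1              0              25              3
 6        0              1              22              3
 7        1              0              26.5            4
 8        0              1              23.1            4
 9        0              1              25              5
10        1              0              28              5
11        1              0              29.5            6
12        0              1              26              6
13        0              1              27.5            7
14        1              0              31.5            7
15        0              1              29              6
16        1              0              22              5
17        0              1              19              2
18        1              0              18              2
19        0              1              21.7            5
20        0              1              18.5            2
21        1              0              21              4
22        1              0              20.5            4
23        0              1              17              1
24        0              1              17.5            1
25        1              0              21.2            5
=====================================================================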

Male sample (Gujarati 1995, Tables 15.1 & 15.5): separate regression on the male observations.

Female sample (Gujarati 1995, Tables 15.1 & 15.5): separate regression on the female observations.

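[Figure: salary (Y) against years of teaching (X, 0 to 8), showing the male and female observations with a fitted linear trend line for each group.]
Fitted lines: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$ (male) and $\hat{Y} = \hat\beta'_1 + \hat\beta'_2 X$ (female).
Two separate models:
$Y_m = \beta_1 + \beta_2 X_m + u_m$ (male)
$Y_f = \beta'_1 + \beta'_2 X_f + u_f$ (female)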
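The following sketch makes the two separate models concrete. It is written in plain Python (standard library only; no econometrics package is assumed) and fits each model by ordinary least squares on the 25 observations in the salary table above. It also fits the pooled whole-sample line. The helper names `simple_ols` and `fit` are illustrative. The estimates should agree, to about three decimals, with the lines reported later in this lecture: 18.689 + 1.373X (male), 16.255 + 1.677X (female) and 17.095 + 1.608X (whole sample).

```python
# Salary data from the table above (Gujarati 1995, Tables 15.1 & 15.5):
# each tuple is (male_dummy, salary_in_K, years_of_teaching).
DATA = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]


def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit(rows):
    """Regress salary on years of teaching for the given observations."""
    return simple_ols([r[2] for r in rows], [r[1] for r in rows])


male = fit([r for r in DATA if r[0] == 1])    # Y_m = b1 + b2 X_m
female = fit([r for r in DATA if r[0] == 0])  # Y_f = b1' + b2' X_f
whole = fit(DATA)                             # pooled sample, ignoring sex
for name, (b1, b2) in [("male", male), ("female", female), ("whole", whole)]:
    print(f"{name:6s}: Y = {b1:.3f} + {b2:.3f} X")
```

The slide figures appear to be truncated rather than rounded. For that reason, the printed third decimal can differ by one unit (for example 1.374 vs 1.373).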

If male and female teachers share the same slope but have different intercepts:
1st model: $Y_i = \beta_1 + \alpha^*_1 D_i + \beta_2 X_i + u_i$
If they differ in both slope and intercept:
2nd model: $Y_i = \beta_1 + \alpha^*_1 D_i + \beta_2 X_i + \beta^*_2 D_i X_i + u_i$
where $Y_i$ = annual salary (each observation), $X_i$ = years of teaching experience, and $D_i$ = 1 if male, 0 otherwise (female, the control category).

[Figure: the same scatter of salary (Y) against years of teaching (X), now with fitted lines for the male sample, the female sample and the whole (pooled) sample.]
Fitted lines: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$ (male), $\hat{Y} = \hat\beta'_1 + \hat\beta'_2 X$ (female), $\hat{Y} = \hat\beta''_1 + \hat\beta''_2 X$ (whole sample).

$D_1 + D_2 = 1$, so $D_1 = 1 - D_2$. Each dummy on its own separates the two categories. Their sum, however, equals 1 for every observation, so it cannot tell male from female; it only reproduces the constant term.

Caution in the use of dummy variables (the dummy variable trap)
Suppose we introduce two dummy variables in one model to identify the two categories of a single qualitative variable:
$Y_i = \beta_1 + \alpha^*_1 D_{1i} + \alpha^{**}_1 D_{2i} + \beta_2 X_i + u_i$
where $D_{1i}$ = 1 if female, 0 otherwise, and $D_{2i}$ = 1 if male, 0 otherwise.
Then $D_1 + D_2 = 1$ (equivalently $D_1 = 1 - D_2$, or $D_2 = 1 - D_1$) for every observation. As a result, $D_1$, $D_2$ and the intercept's column of ones are perfectly collinear. The model cannot be estimated because of this perfect collinearity.
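The trap can be checked numerically. The sketch below uses a small hypothetical six-teacher sample, not the lecture's data. It builds the design matrix with a constant, both dummies and X, then computes the exact rank using Python's `fractions.Fraction`. Because D1 + D2 reproduces the constant column, the rank is 3 for 4 columns. X'X is therefore singular, and the OLS normal equations have no unique solution. Dropping one dummy restores full column rank.

```python
from fractions import Fraction


def column_rank(matrix):
    """Exact rank of a matrix (list of rows), by Gauss-Jordan elimination over rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank, n_rows, n_cols = 0, len(m), len(m[0])
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical six-teacher sample: (female dummy, years of teaching).
sample = [(1, 1), (0, 2), (1, 3), (0, 4), (1, 5), (0, 6)]
trap = [[1, f, 1 - f, x] for f, x in sample]  # constant, D1 (female), D2 (male), X
safe = [[1, 1 - f, x] for f, x in sample]     # constant, D2 (male), X; female = control

print(column_rank(trap), "of", len(trap[0]))  # rank 3 of 4 columns: X'X is singular
print(column_rank(safe), "of", len(safe[0]))  # rank 3 of 3 columns: estimable
```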

General rule: to avoid perfect multicollinearity, a qualitative variable with m categories should be represented by only m − 1 dummy variables.
Example: one qualitative variable (age) divided into m categories at cut-offs such as 10, 20, 30, 40, … needs only the dummies $D_1, D_2, \dots, D_{m-1}$.
Using two dummy variables for the two categories of one qualitative variable, in a model that already has an intercept, falls into the trap of perfect multicollinearity.
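Here is a minimal sketch of the m − 1 rule, assuming categories arrive as plain strings. The hypothetical helper `make_dummies` creates one 0/1 column per category except the control (omitted) one. The alphabetical ordering of the dummy columns is an arbitrary choice made here, not part of the rule.

```python
def make_dummies(values, base):
    """Encode a qualitative variable with m categories as m - 1 dummy columns.

    `base` is the control (omitted) category: its rows are all zeros.
    Returns (dummy_names, rows), one list of m - 1 zeros/ones per observation.
    """
    if base not in values:
        raise ValueError(f"control category {base!r} does not occur in the data")
    names = sorted(set(values) - {base})  # the m - 1 non-control categories
    rows = [[1 if v == name else 0 for name in names] for v in values]
    return names, rows


education = ["less than HS", "HS", "college", "HS", "less than HS"]
names, rows = make_dummies(education, base="less than HS")
print(names)  # ['HS', 'college']
print(rows)   # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```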

When a category is assigned the value zero, it is called the control category (or omitted group).
Now consider different intercepts for the two groups:
Model: $Y_i = \beta_1 + \alpha^*_1 D_{2i} + \beta_2 X_i + u_i$, where $D_{2i}$ = 1 if male, 0 otherwise (i.e. female).
Estimated relation for each group:
Male ($D_{2i} = 1$): $\hat{Y}_i = (\hat\beta_1 + \hat\alpha^*_1) + \hat\beta_2 X_i$
Female ($D_{2i} = 0$): $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$

To test whether the relationship differs between the two categories, compare
$\hat{Y}_i = (\hat\beta_1 + \hat\alpha^*_1 D) + \hat\beta_2 X_i$ with $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$
and check the t-statistic of $\hat\alpha^*_1$. If it is significant, the two categories have different intercepts. Because both lines share the same $\hat\beta_2$, the two categories have the same slope: X affects Y in the same way in both groups.

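Hypotheses for the intercept difference: $H_0: \alpha^*_1 = 0$ against $H_1: \alpha^*_1 > 0$ (one-sided) or $H_1: \alpha^*_1 \neq 0$ (two-sided).
- Two-sided test: reject $H_0$ if $|t^*| > t_c(\alpha/2, n-k)$.
- One-sided test: reject $H_0$ if $t^* > t_c(\alpha, n-k)$.
When the slopes may also differ, estimate
$\hat{Y} = \hat\beta_1 + \hat\alpha^*_1 D_i + \hat\beta_2 X_i + \hat\beta^*_2 D_i X_i$
The t-test on $\hat\alpha^*_1$ tests the difference in intercepts, and the t-test on $\hat\beta^*_2$ tests the difference in slopes between the two categories. In each case, check the t-statistic and compare the estimated $t^*$ with the critical $t_c(\alpha/2, n-k)$.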

Separate regressions for female and male: the two results differ in both slope and intercept. But are the differences statistically significant? The two separate regressions cannot answer this on their own; we need a formal test statistic such as F*.

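Setting up two dummies for the example (Tables 15.1 + 15.5): D2: male = 1, others = 0; D1: female = 1, others = 0.
With D2 (female as control): $\hat{Y}_i = (\hat\beta_1 + \hat\alpha^*_1 D_2) + \hat\beta_2 X_i = (16.656 + 1.2810\,D_2) + 1.561 X$
With D1 (male as control): $\hat{Y}_i = (\hat\beta'_1 + \hat\alpha'_1 D_1) + \hat\beta_2 X_i = (17.937 - 1.2810\,D_1) + 1.561 X$
Both codings give a female intercept of 16.656 and a male intercept of 17.937. Only the choice of control category changes.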
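The common-slope dummy regression can be reproduced with a short standard-library sketch. It assumes the salary table above and solves the normal equations directly. `inverse` and `ols` are illustrative helpers, not a library API. The sketch also reports the usual OLS t-statistic b / se(b) with s² = RSS/(n − k); this is the t* to compare with tc(α/2, n − k). With D2 (male = 1) the coefficients should come out as about 16.656 + 1.281 D2 + 1.561 X. With D1 (female = 1) they should be about 17.937 − 1.281 D1 + 1.561 X.

```python
import math

# Salary data (Gujarati 1995, Tables 15.1 & 15.5): (male_dummy, salary_in_K, years_of_teaching).
DATA = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]


def inverse(a):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(rows, y):
    """OLS coefficients and t-statistics for y = X b + u (X given as a list of rows)."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = inverse(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # estimated error variance, n - k df
    t = [b[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, t


y = [salary for _, salary, _ in DATA]
with_d2 = [[1.0, male, years] for male, _, years in DATA]      # D2 = 1 if male
with_d1 = [[1.0, 1 - male, years] for male, _, years in DATA]  # D1 = 1 if female
b_d2, t_d2 = ols(with_d2, y)
b_d1, t_d1 = ols(with_d1, y)
print(f"D2 model: Y = {b_d2[0]:.3f} + {b_d2[1]:.3f} D2 + {b_d2[2]:.3f} X, t(D2) = {t_d2[1]:.2f}")
print(f"D1 model: Y = {b_d1[0]:.3f} + {b_d1[1]:.3f} D1 + {b_d1[2]:.3f} X, t(D1) = {t_d1[1]:.2f}")
```

Switching the control group flips the sign of the dummy coefficient and of its t-statistic, but leaves the common slope unchanged.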

Whole sample: $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i = 17.095 + 1.608 X_i$

If D1: female = 1 (male as control):
Male: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i = 18.689 + 1.373 X_m$
Female: $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_1) + (\hat\beta_2 + \hat\beta'_2 D_1) X_i = 16.255 + 1.677 X_f$

If D2: male = 1 (female as control):
Female: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i = 16.255 + 1.677 X$
Male: $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_2) + (\hat\beta_2 + \hat\beta'_2 D_2) X = 18.689 + 1.373 X$
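The same results can be obtained in one regression with an intercept dummy and a slope (interaction) dummy. This sketch again assumes the salary table above and uses illustrative helpers `solve` and `ols`. Here D1 = 1 for females, so the male line is read off directly; the female line is the male line shifted by the D1 and D1·X coefficients. Because this model lets both intercept and slope differ, it reproduces the two separate regressions (18.689 + 1.373X and 16.255 + 1.677X) to about three decimals.

```python
# Salary data (Gujarati 1995, Tables 15.1 & 15.5): (male_dummy, salary_in_K, years_of_teaching).
DATA = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]


def solve(a, b):
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


y = [salary for _, salary, _ in DATA]
# Regressors: constant, D1 (1 if female), X, D1 * X; male is the control group.
rows = [[1.0, 1 - male, years, (1 - male) * years] for male, _, years in DATA]
b1, a1, b2, b2_shift = ols(rows, y)
print(f"male:   Y = {b1:.3f} + {b2:.3f} X")
print(f"female: Y = {b1 + a1:.3f} + {b2 + b2_shift:.3f} X")
```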

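One qualitative variable with more than two categories:
$\text{Health care}\ (Y) = \beta_1 + \alpha'_1 D_2 + \alpha''_1 D_3 + \beta_2\,\text{Income}\ (X) + u$
where $D_2$ = 1 if high-school education, 0 otherwise; $D_3$ = 1 if college education, 0 otherwise. Less than high school is the control category.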

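[Figure: health-care spending (Y) against income (X), with three parallel lines whose intercepts differ by $\hat\alpha'_1$ and $\hat\alpha''_1$ from $\hat\beta_1$.]
College education ($D_3 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\alpha''_1 D_3) + \hat\beta_2 X$
High-school education ($D_2 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_2) + \hat\beta_2 X$
Less than high-school education: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$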

D2 = 1 if high school, 0 otherwise; D3 = 1 if college, 0 otherwise
=========================================
obs     Y      X     D2    D3
=========================================
 1     6.0    40     0     1
 2     3.9    31     1     0
 3     1.8    18     0     0
 4     1.9    19     0     0
 5     7.2    47     0     1
 6     3.3    27     1     0
 7     3.1    26     1     0
 8     1.7    17     0     0
 9     6.4    43     0     1
10     7.9    49     0     1
11     1.5    15     0     0
12     3.1    25     1     0
13     3.6    29     1     0
14     2.0    20     0     0
15     6.2    41     0     1
=========================================

Measuring the estimated results for the different groups:
Less than high school: $\hat{Y}_i = -1.2859 + 0.1722 X_i$
High school: $\hat{Y}_i = (-1.2859 - 0.068) + 0.1722 X_i = -1.3539 + 0.1722 X_i$ if the t-value of D2 is statistically significant; otherwise $\hat{Y}_i = -1.2859 + 0.1722 X_i$.
College: $\hat{Y}_i = (-1.2859 + 0.447) + 0.1722 X_i = -0.8389 + 0.1722 X_i$ if the t-value of D3 is statistically significant; otherwise $\hat{Y}_i = -1.2859 + 0.1722 X_i$.
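The sketch below re-estimates the health-care model from the 15 observations in the table above. It uses the same plain normal-equations approach; the helpers are illustrative, not a library API. It should return an intercept of about −1.2859, a D2 coefficient of about −0.068, a D3 coefficient of about 0.447 and an income slope of about 0.1722. It then builds the three group lines from these estimates.

```python
# Health-care data from the table above: (Y, X, D2, D3).
DATA = [
    (6.0, 40, 0, 1), (3.9, 31, 1, 0), (1.8, 18, 0, 0), (1.9, 19, 0, 0), (7.2, 47, 0, 1),
    (3.3, 27, 1, 0), (3.1, 26, 1, 0), (1.7, 17, 0, 0), (6.4, 43, 0, 1), (7.9, 49, 0, 1),
    (1.5, 15, 0, 0), (3.1, 25, 1, 0), (3.6, 29, 1, 0), (2.0, 20, 0, 0), (6.2, 41, 0, 1),
]


def solve(a, b):
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


y = [yi for yi, _, _, _ in DATA]
rows = [[1.0, d2, d3, x] for _, x, d2, d3 in DATA]  # constant, D2, D3, income
b1, a2, a3, b2 = ols(rows, y)
print(f"less than high school: Y = {b1:.4f} + {b2:.4f} X")
print(f"high school:           Y = {b1 + a2:.4f} + {b2:.4f} X")
print(f"college:               Y = {b1 + a3:.4f} + {b2:.4f} X")
```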

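One qualitative variable with many categories:
Example: an estimated model of medical-care expenditure for three age groups
$Y_i = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \beta_2 X_i + u_i$ (t-values reported under each coefficient)
where $D_1$ = 1 if 25 < age < 55, 0 otherwise; $D_2$ = 1 if age > 55, 0 otherwise. Age below 25 is the control category ($D_1 = D_2 = 0$). Because of this third category, $D_1 + D_2 \neq 1$ for every observation, so the two dummies do not fall into the dummy variable trap.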

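Qualitative variable with many categories (cont.)
The estimated models are:
Age below 25: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
25 < age < 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_1) + \hat\beta_2 X$
Age > 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha''_1 D_2) + \hat\beta_2 X$
Hypotheses: $H_0: \alpha'_1 = 0,\ \alpha''_1 = 0$; $H_1: \alpha'_1 \neq 0,\ \alpha''_1 \neq 0$. Compare each t-statistic ($t_1^*$, $t_2^*$) with $t_c(\alpha/2, n-k)$.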

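[Figure (scatter diagram): Y against X with three parallel fitted lines, one per age group, with intercepts $\hat\beta_1 + \hat\alpha''_1$, $\hat\beta_1 + \hat\alpha'_1$ and $\hat\beta_1$.]
Age > 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha''_1) + \hat\beta_2 X$
25 < age < 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1) + \hat\beta_2 X$
Age < 25: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$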

One qualitative variable with many categories:
Example: an estimated model of medical-care expenditure for four age groups
$Y = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \alpha'''_1 D_3 + \beta_2 X + u$
where $D_1$ = 1 if age > 55, 0 otherwise; $D_2$ = 1 if 35 < age ≤ 55, 0 otherwise; $D_3$ = 1 if 15 < age ≤ 35, 0 otherwise.

Qualitative variable with many categories (cont.)
The estimated models are:
Age ≤ 15: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
15 < age ≤ 35: $\hat{Y} = (\hat\beta_1 + \hat\alpha'''_1 D_3) + \hat\beta_2 X$
35 < age ≤ 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha''_1 D_2) + \hat\beta_2 X$
Age > 55: $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_1) + \hat\beta_2 X$
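Building these dummies from raw ages is mechanical; the only subtle point is where the boundaries fall. The sketch below assumes the cut-offs exactly as written above (15, 35 and 55, each included in the lower group). It also checks that no age ever switches on more than one dummy. The function name `age_dummies` is hypothetical.

```python
def age_dummies(age):
    """Return (D1, D2, D3) for the four age groups; age <= 15 is the control group."""
    d1 = 1 if age > 55 else 0        # D1: age > 55
    d2 = 1 if 35 < age <= 55 else 0  # D2: 35 < age <= 55
    d3 = 1 if 15 < age <= 35 else 0  # D3: 15 < age <= 35
    return d1, d2, d3


for age in (10, 15, 16, 35, 36, 55, 56, 70):
    print(age, age_dummies(age))
```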

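Two qualitative variables
$\text{Salary}\ (Y) = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \beta_2 X + u$
or, allowing the slopes to differ,
$Y = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \beta_2 X + \beta'_2 D_1 X + \beta''_2 D_2 X + u'$
where $D_1$ = 1 if male, 0 otherwise (sex), and $D_2$ = 1 if white, 0 otherwise (race).
(1) Mean salary for a non-white female teacher ($D_1 = 0$, $D_2 = 0$): $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
(2) Mean salary for a non-white male teacher ($D_1 = 1$, $D_2 = 0$): $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_1) + (\hat\beta_2 + \hat\beta'_2 D_1) X$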

(3) Mean salary for a white female teacher ($D_1 = 0$, $D_2 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\alpha''_1 D_2) + \hat\beta_2 X + \hat\beta''_2 D_2 X$
(4) Mean salary for a white male teacher ($D_1 = 1$, $D_2 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\alpha'_1 D_1 + \hat\alpha''_1 D_2) + (\hat\beta_2 + \hat\beta'_2 D_1 + \hat\beta''_2 D_2) X$
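The four group lines are just sums of coefficients. The sketch below shows this bookkeeping for each (D1, D2) combination. It uses made-up coefficient values (not estimates from any data) and hypothetical key names.

```python
def group_line(coef, d1, d2):
    """Intercept and slope of the salary line for sex dummy d1 (male) and race dummy d2 (white)."""
    intercept = coef["beta1"] + coef["alpha1_p"] * d1 + coef["alpha1_pp"] * d2
    slope = coef["beta2"] + coef["beta2_p"] * d1 + coef["beta2_pp"] * d2
    return intercept, slope


# Hypothetical coefficient values, chosen only to show the arithmetic.
coef = {"beta1": 10.0, "alpha1_p": 2.0, "alpha1_pp": 1.0,
        "beta2": 0.5, "beta2_p": 0.25, "beta2_pp": 0.125}
groups = {
    "non-white female": (0, 0),
    "non-white male": (1, 0),
    "white female": (0, 1),
    "white male": (1, 1),
}
for name, (d1, d2) in groups.items():
    b0, b1 = group_line(coef, d1, d2)
    print(f"{name:17s}: Y = {b0} + {b1} X")
```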

Different types of dummy regression
All four cases refer to the same model, $Y = \beta_1 + \beta_2 X + \alpha'_1 D + \beta'_2 D X + u$, where D = 1 if 1970–1981, 0 otherwise (1982–1995):
1. Identical regressions: $H_0: \alpha'_1 = 0$ and $\beta'_2 = 0$ (same intercept and same slope)
2. Parallel regressions: $H_0: \beta'_2 = 0$ (same slope, different intercepts)
3. Concurrent regressions: $H_0: \alpha'_1 = 0$ (same intercept, different slopes)
4. Dissimilar regressions: $\alpha'_1 \neq 0$ and $\beta'_2 \neq 0$ (different intercepts and different slopes)

[Figure: two panels of Y against X. Identical regressions: $A_1 = B_1$, $A_2 = B_2$ (one common line). Parallel regressions: $A_1 \neq B_1$, $A_2 = B_2$.]
where (1970–1981): $Y_t = A_1 + A_2 X_t + u_{1t}$ and (1982–1995): $Y_t = B_1 + B_2 X_t + u_{2t}$

[Figure: two panels of Y against X. Concurrent regressions: $A_1 = B_1$, $A_2 \neq B_2$ (same intercept, different slopes). Dissimilar regressions: $A_1 \neq B_1$, $A_2 \neq B_2$.]

Interactive effects between two qualitative variables
$\text{Spending}\ (Y) = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \beta_2\,\text{income}\ (X) + u$
where $D_1$ = 1 if female, 0 otherwise (sex), and $D_2$ = 1 if college graduate, 0 otherwise (education).
With an interaction effect:
$\text{Spending}\ (Y) = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \alpha'''_1 D_1 D_2 + \beta_2\,\text{income}\ (X) + u$
$\alpha'_1$ = differential effect of being female; $\alpha''_1$ = differential effect of being a college graduate; $\alpha'''_1$ = additional differential effect of being a female college graduate (beyond $\alpha'_1 + \alpha''_1$).

Example: how can we test the hypothesis that gasoline spending differs between new and used cars? Assume that at the start (zero miles) there is no difference between a used car and a new car, i.e. both share the same intercept.
[Figure: gas spending (Y) against miles driven (X). Used cars (o) lie on the steeper line $\hat{Y} = \hat\beta_1 + (\hat\beta_2 + \hat\beta'_2) X$ and new cars (*) on $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$, both starting from the same intercept.]
Concurrent model (also called the covariance or slope-shift model)

Let $\beta^*_2 = \beta_2 + \beta'_2 D$, where D = 1 if used car, 0 otherwise. Now in one model:
$Y_i = \beta_1 + (\beta_2 + \beta'_2 D) X_i + u_i = \beta_1 + \beta_2 X_i + \beta'_2 D X_i + u_i = \beta_1 + \beta_2 X_i + \beta'_2 Z_i + u_i$
where $Z_i = D X_i$ is a multiplicative dummy variable.
The estimated relations are:
New car (D = 0): $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$
Used car (D = 1): $\hat{Y}_i = \hat\beta_1 + (\hat\beta_2 + \hat\beta'_2) X_i$, or equivalently $\hat{Y}_i = \hat\beta_1 + \hat\beta^*_2 X_i$
If $\beta'_2 \neq 0$, the slopes for new and used cars differ.
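To see that the multiplicative dummy Z = D·X recovers a slope difference, the sketch below fits Y = β1 + β2X + β'2Z to hypothetical, noise-free data. In these data both car types share the intercept 2; new cars have slope 0.5 and used cars slope 0.8. The numbers are invented for illustration. With real data the estimates would carry sampling error, and β'2 would be judged with a t-test. The helpers `solve` and `ols` are illustrative.

```python
def solve(a, b):
    """Solve a x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data: both car types start from the same intercept (2);
# new cars (D = 0) follow Y = 2 + 0.5 X and used cars (D = 1) follow Y = 2 + 0.8 X.
miles = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = [(x, 0, 2 + 0.5 * x) for x in miles] + [(x, 1, 2 + 0.8 * x) for x in miles]

rows = [[1.0, x, d * x] for x, d, _ in obs]  # constant, X, Z = D * X
y = [yi for _, _, yi in obs]
b1, b2, b2_shift = ols(rows, y)
print(f"new car:  Y = {b1:.3f} + {b2:.3f} X")
print(f"used car: Y = {b1:.3f} + {b2 + b2_shift:.3f} X  (slope shift = {b2_shift:.3f})")
```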

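Test whether $\beta'_2 = 0$ or not:
(i) Compare two separate models: (a) $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$ and (b) $\hat{Y} = \hat\beta_1 + \hat\beta^*_2 X$
(ii) Use a t-test on $\hat\beta'_2$ in $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i + \hat\beta'_2 Z_i$: $H_0: \beta'_2 = 0$ against $H_1: \beta'_2 > 0$. Compare $t^*$ with $t_c(\alpha, N-3)$. If $t^* > t_c(\alpha, N-3)$, reject $H_0$: the slope for used cars differs from (is steeper than) the slope for new cars.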

Estimate $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i + \hat\beta'_2 Z_i$ and check the t-value of $\hat\beta'_2$.

Example: estimating seasonal effects
Basic model: $E = \beta_1 + \beta_2 T + u$, where E = electricity consumption and T = temperature.
To capture the effect of seasonal factors:
$E = \beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \alpha'''_1 D_3 + \beta_2 T + u$
where $D_1$ = 1 if winter, 0 otherwise; $D_2$ = 1 if spring, 0 otherwise; $D_3$ = 1 if summer, 0 otherwise. Fall is the control season.

Measuring the basic differences among the four seasonal results:
Fall: $\hat{E} = \hat\beta_1 + \hat\beta_2 T$
Winter: $\hat{E} = (\hat\beta_1 + \hat\alpha'_1) + \hat\beta_2 T$
Spring: $\hat{E} = (\hat\beta_1 + \hat\alpha''_1) + \hat\beta_2 T$
Summer: $\hat{E} = (\hat\beta_1 + \hat\alpha'''_1) + \hat\beta_2 T$
[Figure: E against T showing four parallel lines (summer, spring, winter, fall from top to bottom), shifted by $\hat\alpha'''_1$, $\hat\alpha''_1$ and $\hat\alpha'_1$ above the fall line with intercept $\hat\beta_1$.]

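Estimating seasonal effects (cont.): shifts in both intercept and slope
Also allow the slope to differ across seasons. Let $\beta^*_2 = \beta_2 + \beta'_2 D_1 + \beta''_2 D_2 + \beta'''_2 D_3$. The full general specification is then
$E = [\beta_1 + \alpha'_1 D_1 + \alpha''_1 D_2 + \alpha'''_1 D_3] + \beta_2 T + \beta'_2 Z_1 + \beta''_2 Z_2 + \beta'''_2 Z_3 + u$
with $Z_1 = D_1 T$, $Z_2 = D_2 T$, $Z_3 = D_3 T$.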

Measuring the four seasonal results:
Fall: $\hat{E} = \hat\beta_1 + \hat\beta_2 T$
Winter: $\hat{E} = (\hat\beta_1 + \hat\alpha'_1) + (\hat\beta_2 + \hat\beta'_2) T$
Spring: $\hat{E} = (\hat\beta_1 + \hat\alpha''_1) + (\hat\beta_2 + \hat\beta''_2) T$
Summer: $\hat{E} = (\hat\beta_1 + \hat\alpha'''_1) + (\hat\beta_2 + \hat\beta'''_2) T$
[Figure: E against T with four lines of different intercepts and slopes (summer, spring, winter, fall). The intercept shifts $\hat\alpha'''_1$, $\hat\alpha''_1$ and $\hat\alpha'_1$ are measured from the fall intercept $\hat\beta_1$.]

Quarterly effects work the same way as seasonal effects:
$D_1$ = 1 if 1st quarter, 0 otherwise; $D_2$ = 1 if 2nd quarter, 0 otherwise; $D_3$ = 1 if 3rd quarter, 0 otherwise.
The control quarter is the 4th quarter.

1. Set the seasonal dummy $D_1$ = 1 if the observation falls in the 1st quarter, 0 otherwise (and likewise $D_2$ and $D_3$ for the 2nd and 3rd quarters).

What does the quarterly dummy variable look like?
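One way to see what quarterly dummies look like is to generate them. This hypothetical helper assumes a series that starts in a given year and quarter (1990 Q1 here, chosen arbitrarily). It produces D1, D2 and D3 with the 4th quarter as the control, matching the definitions above.

```python
def quarterly_dummies(start_year, start_quarter, n):
    """Rows (year, quarter, D1, D2, D3) for n consecutive quarters; Q4 is the control."""
    if start_quarter not in (1, 2, 3, 4):
        raise ValueError("start_quarter must be 1, 2, 3 or 4")
    rows = []
    year, quarter = start_year, start_quarter
    for _ in range(n):
        rows.append((year, quarter, int(quarter == 1), int(quarter == 2), int(quarter == 3)))
        quarter += 1
        if quarter == 5:  # roll over into the next year
            year, quarter = year + 1, 1
    return rows


table = quarterly_dummies(1990, 1, 6)
print("year  Q  D1 D2 D3")
for year, quarter, d1, d2, d3 in table:
    print(f"{year}  {quarter}  {d1}  {d2}  {d3}")
```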

(2) Structural test based on dummy variables
Basic model: $Y_t = \beta_1 + \beta_2 X_t + u_t$, with the sample 1960–1989 split at 1974.
Define a dummy variable: D = 1 for the period 1974 onward, 0 otherwise.
To test whether the structures of the two periods differ, the specification must assume that
$\beta^*_1 = \beta_1 + \alpha'_1 D$ and $\beta^*_2 = \beta_2 + \beta'_2 D$
Dummy regression: $Y_t = \beta_1 + \alpha'_1 D + \beta_2 X_t + \beta'_2 D X_t + u_t$

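The Chow test on the unemployment rate–capacity utilization rate relationship
==========================================================================
Sample    Dependent var.  Constant   CAP_t     R̄²      F       RSS             n
==========================================================================
1960–89   unempl_t        30.0       -0.293    0.761   93.6    17.15 (RSS_R)   30
                          (12.1)     (9.7)
1960–73   unempl_t        19.64      -0.175    0.59    19.7    4.69 (RSS_1)    14
                          (5.9)      (4.4)
1974–89   unempl_t        30.63      -0.296    0.871   102.1   3.29 (RSS_2)    16
                          (13.1)     (10.1)
==========================================================================
Note: t-values are in parentheses.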

Restricted F-test procedure: $H_0$: no structural change; $H_1$: structural change.
For the unrestricted model: $RSS_U = RSS_1 + RSS_2 = 4.69 + 3.29 = 7.98$
$F^* = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(T - 2k)} = \dfrac{(17.15 - 7.98)/2}{7.98/(30 - 4)} = 14.9$
Critical values: $F_c(0.01;\ 2, 26) = 5.53$ and $F_c(0.05;\ 2, 26) = 3.37$.
Since $F^* > F_c$, reject $H_0$.
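The F* arithmetic above is easy to check in code. This is a minimal sketch that assumes the RSS values from the Chow test table and k = 2 parameters per regime; the function name `chow_f` is illustrative.

```python
def chow_f(rss_r, rss_1, rss_2, k, t):
    """Chow F* = [(RSS_R - RSS_U) / k] / [RSS_U / (T - 2k)], where RSS_U = RSS_1 + RSS_2."""
    rss_u = rss_1 + rss_2
    return ((rss_r - rss_u) / k) / (rss_u / (t - 2 * k))


f_star = chow_f(rss_r=17.15, rss_1=4.69, rss_2=3.29, k=2, t=30)
F_CRIT_1PCT = 5.53  # F_c(0.01; 2, 26), as quoted above
print(f"F* = {f_star:.2f}")  # F* = 14.94
print("reject H0 (structural change)" if f_star > F_CRIT_1PCT else "do not reject H0")
```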

Using the dummy variable to identify the structural change
The unemployment rate–capacity utilization rate relationship, sample 1960–1989, with $D_t$ = 1 from 1974 to 1989 and 0 prior to 1974:
$\widehat{\text{unempl}} = 19.6 + 11.0\,D_t - 0.175\,\text{CAP}_t - 0.121\,(D_t \cdot \text{CAP}_t)$
t-values: (6.7) (2.7) (5.0) (2.5)
R̄² = 0.88, SEE = 0.554, F = 72.2, n = 30
Estimated relation for 1960–1973: $\widehat{\text{unempl}} = 19.6 - 0.175\,\text{CAP}$
Estimated relation for 1974–1989: $\widehat{\text{unempl}} = (19.6 + 11.0) - (0.175 + 0.121)\,\text{CAP} = 30.6 - 0.296\,\text{CAP}$
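The period-specific lines follow from the dummy coefficients by simple addition. Below is a small sketch using the estimates above; the helper name `period_lines` is illustrative. The 1974–1989 line it produces should match the separate 1974–89 regression in the Chow test table (30.63 − 0.296 CAP) up to rounding.

```python
def period_lines(const, d_shift, slope, slope_shift):
    """Turn Y = const + d_shift*D + slope*X + slope_shift*(D*X) into the two period lines."""
    before = (const, slope)                         # D = 0: before 1974
    after = (const + d_shift, slope + slope_shift)  # D = 1: 1974 onward
    return before, after


before, after = period_lines(19.6, 11.0, -0.175, -0.121)
print(f"1960-1973: unempl = {before[0]:.1f} {before[1]:+.3f} CAP")
print(f"1974-1989: unempl = {after[0]:.1f} {after[1]:+.3f} CAP")
```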
