Lecture #7

Lecture #7: Studenmund (2006), Chapter 7
Objective: Applications of Dummy Independent Variables

Qualitative information
  Gender: male and female
  Regional: HK Island, Kowloon and NT
  Zone: East, South, West, North, Center
  Time/period: peace and war, before and after crisis
  Age: young, middle, elderly
  Education: post-graduate, college, high school, elementary
  Others

Example: gender issue (does salary discrimination exist?)
obs   Dummy   Dummy    Salary(K)   Years of
      Male    Female               teaching
 1    1       0        23          1
 2    0       1        19.5        1
 3    1       0        24          2
 4    0       1        21          2
 5    1       0        25          3
 6    0       1        22          3
 7    1       0        26.5        4
 8    0       1        23.1        4
 9    0       1        25          5
10    1       0        28          5
11    1       0        29.5        6
12    0       1        26          6
13    0       1        27.5        7
14    1       0        31.5        7
15    0       1        29          6
16    1       0        22          5
17    0       1        19          2
18    1       0        18          2
19    0       1        21.7        5
20    0       1        18.5        2
21    1       0        21          4
22    1       0        20.5        4
23    0       1        17          1
24    0       1        17.5        1
25    1       0        21.2        5

Male sample (Gujarati 1995, Tables 15.1 and 15.5): separate sample of males, total number of observations: 12

Female sample (Gujarati 1995, Tables 15.1 and 15.5): separate sample of females, total number of observations: 13

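[Figure: salary (Y) against years of teaching (X, 0 to 8), showing the male and female observations with a fitted linear trend line for each group.]
Fitted lines: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$ (male), $\hat{Y} = \hat\beta^*_0 + \hat\beta_2 X$ (female)
Two separate models:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (male)
$Y_j = \beta^*_0 + \beta_2 X_j + \varepsilon_j$ (female)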

Assuming $\beta_1 = \beta_2$: same slope but different constant between Y and X.
1st model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \varepsilon_i$
Assuming $\beta_1 \neq \beta_2$: different slope and different constant between Y and X.
2nd model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \beta'_1 D_i X_i + \varepsilon_i$
where $Y_i$ = annual salary, $X_i$ = years of teaching experience, $D_i$ = 1 if male, = 0 otherwise (female, the control category)

[Figure: the same scatter plot of salary (Y) against years of teaching (X), now with three fitted lines: male, female, and the whole (pooled) sample.]
Two separate models:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (male)
$Y_j = \beta^*_0 + \beta_2 X_j + \varepsilon_j$ (female)

$D_1 + D_2 = 1$, i.e. $D_1 = 1 - D_2$
Each dummy on its own identifies the two categories, but the sum of the two dummies is always 1 and so cannot tell male from female.

Suppose we introduce two dummy variables in one model to identify the two categories of one qualitative variable:
$Y_i = \beta_0 + \beta'_0 D_{1i} + \beta''_0 D_{2i} + \beta_1 X_i + \varepsilon_i$
where $D_{1i}$ = 1 if male, = 0 otherwise, and $D_{2i}$ = 1 if female, = 0 otherwise.
Then $D_1 = 1 - D_2$, or $D_2 = 1 - D_1$, or $D_1 + D_2 = 1$ (perfect collinearity).
This model cannot be estimated because of the perfect collinearity between $D_1$ and $D_2$: their sum equals the constant term (the dummy variable trap).
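The trap can be seen directly in the rank of the regressor matrix. The sketch below is a minimal NumPy illustration on a hypothetical six-observation sample (the values and variable names are assumptions for this example, not data from the lecture): with an intercept and both dummies, the matrix has four columns but only rank three.

```python
import numpy as np

# Hypothetical mini-sample (illustrative values only): 1 = male, 0 = female.
male = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
female = 1.0 - male                      # D2 = 1 - D1
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
const = np.ones_like(x)

# Intercept plus BOTH dummies: the two dummy columns sum to the constant column.
X_trap = np.column_stack([const, male, female, x])
# Intercept plus ONE dummy: female is the omitted (control) category.
X_ok = np.column_stack([const, male, x])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns:", X_trap.shape[1], "rank:", rank_trap)   # rank is below the column count
print("columns:", X_ok.shape[1], "rank:", rank_ok)       # full column rank
```

Because X'X is singular in the first case, the OLS normal equations have no unique solution; dropping one dummy restores full column rank.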

General rule (to avoid perfect multicollinearity): if a qualitative variable has m categories, introduce only m-1 dummy variables in a model that contains an intercept.
[Diagram: one qualitative variable (age) divided into m categories, coded by the dummies D1, D2, D3, D4, D5, ..., Dm-1.]
Using two dummy variables to identify the two categories of one qualitative variable in the same model falls into the trap of perfect multicollinearity.
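The m-1 rule can be sketched as a small helper that builds one 0/1 column per category except the chosen control category. The function name `make_dummies`, its arguments and the age labels are hypothetical, introduced only for this illustration.

```python
import numpy as np

def make_dummies(labels, control):
    """Return (names, matrix): one 0/1 column per category, omitting `control`."""
    categories = sorted(set(labels))
    kept = [c for c in categories if c != control]
    matrix = np.array([[1 if lab == c else 0 for c in kept] for lab in labels])
    return kept, matrix

# Hypothetical sample with m = 3 age categories; "young" is the control group.
age_group = ["young", "middle", "elder", "middle", "young"]
names, D = make_dummies(age_group, control="young")
print(names)   # the m - 1 = 2 categories that receive a dummy
print(D)       # rows of zeros mark the control category
```

Observations in the control category are identified by all dummies being zero, so no information is lost by omitting its column.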

When a category is assigned the value of zero, this category is called the control category (or omitted group).
Now consider different intercepts for the two groups.
Model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \varepsilon_i$, with $D_i$ = 1 if male, = 0 otherwise (i.e. female)
Estimated results for the two groups:
Male ($D_i = 1$): $\hat{Y}_i = (\hat\beta_0 + \hat\beta'_0) + \hat\beta_1 X_i$
Female ($D_i = 0$): $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$

To test whether the relationship differs between the two categories, compare:
$\hat{Y}_i = (\hat\beta_0 + \hat\beta'_0 D) + \hat\beta_1 X_i$
$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$
Check the t-value: if the t-statistic on $\hat\beta'_0$ is significant, the constant terms are different.
The same $\hat\beta_1$ in both equations means that X has the same relationship (slope) with Y in the two categories.

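Model: $\hat{Y}_i = \hat\beta_0 + \hat\beta'_0 D_i + \hat\beta_1 X_i + \hat\beta'_1 D_i X_i$
The $\hat\beta'_0 D_i$ term tests the difference in intercept between the two categories; the $\hat\beta'_1 D_i X_i$ term tests the difference in slope. In each case, check the t-statistic.
The appropriate test for the intercept is the t-test on $\hat\beta'_0$:
$H_0: \beta'_0 = 0$
$H_1: \beta'_0 > 0$ (one-sided) or $H_1: \beta'_0 \neq 0$ (two-sided)
Compare $t^*$ with the critical value $t^c$ with $N-K$ degrees of freedom (at level $\alpha/2$ for the two-sided test): if $t^* > t^c$, reject $H_0: \beta'_0 = 0$.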

Separate examples for female and male: the two regressions give different slopes and intercepts. But are they really statistically different? We cannot answer this from the two separate regression results alone; it requires an F-test (F*).

Set two different dummies for the example (the shifted intercepts apply if the dummy is significant):
D1: Female = 1, others = 0
$\hat{Y}_i = (\hat\beta_0 + \hat\beta'_0 D) + \hat\beta_1 X_i = (17.937 - 1.2810) + 1.561 X = 16.656 + 1.561 X$
D2: Male = 1, others = 0
$\hat{Y}_i = (\hat\beta_0 + \hat\beta'_0 D) + \hat\beta_1 X_i = (16.656 + 1.2810) + 1.561 X = 17.937 + 1.561 X$
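These estimates can be reproduced from the 25 observations of the salary example. The sketch below is a minimal NumPy illustration, not the software used for the slides; the helper name `ols` and the array names are assumptions made for this example. It fits the intercept-dummy model twice, once with the male dummy and once with the female dummy, and also reports conventional OLS t-values.

```python
import numpy as np

# Salary (thousands), years of teaching and male dummy, observations 1 to 25.
salary = np.array([23, 19.5, 24, 21, 25, 22, 26.5, 23.1, 25, 28, 29.5, 26, 27.5,
                   31.5, 29, 22, 19, 18, 21.7, 18.5, 21, 20.5, 17, 17.5, 21.2])
years = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7,
                  7, 6, 5, 2, 2, 5, 2, 4, 4, 1, 1, 5], dtype=float)
male = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,
                 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1], dtype=float)

def ols(y, *columns):
    """OLS with an intercept; returns (coefficients, t-values)."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return coef, coef / np.sqrt(np.diag(cov))

# Male = 1 (female is the control group): intercept, dummy shift, slope.
b_male, t_male = ols(salary, male, years)
# Female = 1 (male is the control group): the shift changes sign.
b_female, t_female = ols(salary, 1.0 - male, years)

print(np.round(b_male, 3), np.round(t_male, 2))
print(np.round(b_female, 3), np.round(t_female, 2))
```

Switching the control group changes only the reported intercept and the sign of the dummy coefficient; the slope and the fitted values are identical in both parameterizations.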

Whole sample: $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i = 17.095 + 1.608 X$

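D1: Female = 1
Male ($D = 0$, so the $\hat\beta'_0$ and $\hat\beta'_1$ terms drop out): $\hat{Y} = \hat\beta_0 + \hat\beta_1 X = 18.689 + 1.373 X$
Female ($D = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0 D) + (\hat\beta_1 + \hat\beta'_1 D) X = 16.255 + 1.677 X$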

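D2: Male = 1
Female ($D = 0$, so the $\hat\beta'_0$ and $\hat\beta'_1$ terms drop out): $\hat{Y} = \hat\beta_0 + \hat\beta_1 X = 16.255 + 1.677 X$
Male ($D = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0 D) + (\hat\beta_1 + \hat\beta'_1 D) X = 18.689 + 1.373 X$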
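With both an intercept dummy and a slope dummy, the single regression reproduces the two separate group regressions. The following is a minimal NumPy sketch on the salary data of this example; the array and variable names are assumptions for illustration.

```python
import numpy as np

# Salary (thousands), years of teaching and male dummy, observations 1 to 25.
salary = np.array([23, 19.5, 24, 21, 25, 22, 26.5, 23.1, 25, 28, 29.5, 26, 27.5,
                   31.5, 29, 22, 19, 18, 21.7, 18.5, 21, 20.5, 17, 17.5, 21.2])
years = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7,
                  7, 6, 5, 2, 2, 5, 2, 4, 4, 1, 1, 5], dtype=float)
male = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,
                 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1], dtype=float)

# Columns: constant, D, X, D*X  (D = 1 if male, female is the control group).
X = np.column_stack([np.ones_like(years), male, years, male * years])
b0, b0_shift, b1, b1_shift = np.linalg.lstsq(X, salary, rcond=None)[0]

female_line = (b0, b1)                        # D = 0
male_line = (b0 + b0_shift, b1 + b1_shift)    # D = 1
print("female: %.3f + %.3f X" % female_line)
print("male:   %.3f + %.3f X" % male_line)
```

The point estimates match the separate regressions, but the pooled form also allows t-tests on the two shift coefficients, which the separate regressions do not provide.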

One qualitative variable with more than two categories
Health care (Y): $Y = \beta_0 + \beta'_0 D_2 + \beta''_0 D_3 + \beta_1 X + \varepsilon$, where X = income
$D_2$ = 1 if high school education, = 0 otherwise
$D_3$ = 1 if college education, = 0 otherwise

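[Figure: health care (Y) against income (X), three parallel lines with intercepts $\hat\beta_0$, $\hat\beta_0 + \hat\beta'_0$ and $\hat\beta_0 + \hat\beta''_0$.]
College education ($D_3 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta''_0) + \hat\beta_1 X$
High school education ($D_2 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0) + \hat\beta_1 X$
Less than high school education: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$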

D2 = 1 if high school, = 0 otherwise
D3 = 1 if college, = 0 otherwise
=======================================================
obs   Y           X           D2          D3
=======================================================
 1    6.000000    40.00000    0.000000    1.000000
 2    3.900000    31.00000    1.000000    0.000000
 3    1.800000    18.00000    0.000000    0.000000
 4    1.900000    19.00000    0.000000    0.000000
 5    7.200000    47.00000    0.000000    1.000000
 6    3.300000    27.00000    1.000000    0.000000
 7    3.100000    26.00000    1.000000    0.000000
 8    1.700000    17.00000    0.000000    0.000000
 9    6.400000    43.00000    0.000000    1.000000
10    7.900000    49.00000    0.000000    1.000000
11    1.500000    15.00000    0.000000    0.000000
12    3.100000    25.00000    1.000000    0.000000
13    3.600000    29.00000    1.000000    0.000000
14    2.000000    20.00000    0.000000    0.000000
15    6.200000    41.00000    0.000000    1.000000
=======================================================

Less than high school: $\hat{Y}_i = -1.2859 + 0.1722 X_i$
High school: $\hat{Y}_i = (-1.2859 - 0.068) + 0.1722 X_i$
  $= -1.3539 + 0.1722 X_i$ when the t-value of D2 is statistically significant
  $= -1.2859 + 0.1722 X_i$ when the t-value is not statistically significant
College: $\hat{Y}_i = (-1.2859 + 0.447) + 0.1722 X_i$
  $= -0.8389 + 0.1722 X_i$ when the t-value of D3 is statistically significant
  $= -1.2859 + 0.1722 X_i$ when the t-value is not statistically significant
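The coefficients of the health care example can be checked against the 15 observations listed for it. This is a minimal NumPy sketch, assuming the columns Y, X, D2 and D3 are as in that data listing; the variable names are illustrative.

```python
import numpy as np

# Health care spending (Y), income (X), and the two education dummies.
y = np.array([6.0, 3.9, 1.8, 1.9, 7.2, 3.3, 3.1, 1.7, 6.4, 7.9, 1.5, 3.1, 3.6, 2.0, 6.2])
x = np.array([40, 31, 18, 19, 47, 27, 26, 17, 43, 49, 15, 25, 29, 20, 41], dtype=float)
d2 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0], dtype=float)  # high school
d3 = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1], dtype=float)  # college

# Columns: constant, D2, D3, X. "Less than high school" is the control group.
X = np.column_stack([np.ones_like(x), d2, d3, x])
b0, b_hs, b_col, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

print("less than high school: %.4f + %.4f X" % (b0, b1))
print("high school:           %.4f + %.4f X" % (b0 + b_hs, b1))
print("college:               %.4f + %.4f X" % (b0 + b_col, b1))
```

Whether the shifted intercepts should actually be used depends on the t-values of D2 and D3, which this sketch does not compute.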

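One qualitative variable with many categories
Example: an estimated model of medical care expenditure for three age groups
$Y_i = \beta_0 + \beta'_0 A_1 + \beta''_0 A_2 + \beta_1 X_i + \varepsilon_i$ (with t-values reported under the dummy coefficients)
where $A_1$ = 1 if 25 < age < 55, = 0 otherwise, and $A_2$ = 1 if age > 55, = 0 otherwise
[Diagram: age axis with cut points at 25 and 55; $A_1 = 1$ between 25 and 55, $A_2 = 1$ above 55.]
$A_1 + A_2 \neq 1$ for the group aged below 25 (both dummies are 0), so the model avoids the dummy variable trap.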

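One qualitative variable with many categories (cont.)
The estimated models are then:
age below 25: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
25 < age < 55 ($A_1 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0) + \hat\beta_1 X$
age > 55 ($A_2 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta''_0) + \hat\beta_1 X$
$H_0: \beta'_0 = 0,\ \beta''_0 = 0$
$H_1: \beta'_0 \neq 0,\ \beta''_0 \neq 0$
Compare $t_1^*$ and $t_2^*$ with the critical value $t^c$ with $n-k$ degrees of freedom.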

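In a scatter diagram:
[Figure: Y against X, three parallel lines with intercepts $\hat\beta_0$, $\hat\beta_0 + \hat\beta'_0$ and $\hat\beta_0 + \hat\beta''_0$.]
age > 55: $\hat{Y} = (\hat\beta_0 + \hat\beta''_0) + \hat\beta_1 X$
25 < age < 55: $\hat{Y} = (\hat\beta_0 + \hat\beta'_0) + \hat\beta_1 X$
age < 25: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$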

One qualitative variable with many categories
Example: an estimated model of medical care expenditure for four age groups
$Y = \beta_0 + \beta'_0 A_1 + \beta''_0 A_2 + \beta'''_0 A_3 + \beta_1 X + \varepsilon$
where $A_1$ = 1 if age > 55, = 0 otherwise
$A_2$ = 1 if 35 < age $\leq$ 55, = 0 otherwise
$A_3$ = 1 if 15 < age $\leq$ 35, = 0 otherwise

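One qualitative variable with many categories (cont.)
The estimated models are:
age $\leq$ 15: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
15 < age $\leq$ 35 ($A_3 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'''_0) + \hat\beta_1 X$
35 < age $\leq$ 55 ($A_2 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta''_0) + \hat\beta_1 X$
age > 55 ($A_1 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0) + \hat\beta_1 X$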

Two qualitative variables
Salary (Y): $Y = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1 X + \varepsilon$
or $Y = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1 X + \beta'_1 D_1 X + \beta''_1 D_2 X + \varepsilon'$
$D_1$ = 1 if male, = 0 otherwise (sex)
$D_2$ = 1 if white, = 0 otherwise (race)
(1) Mean salary for a "black" female teacher ($D_1 = 0$, $D_2 = 0$): $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
(2) Mean salary for a "black" male teacher ($D_1 = 1$, $D_2 = 0$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0) + (\hat\beta_1 + \hat\beta'_1) X$

(3) Mean salary for a "white" female teacher ($D_1 = 0$, $D_2 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta''_0) + (\hat\beta_1 + \hat\beta''_1) X$
(4) Mean salary for a "white" male teacher ($D_1 = 1$, $D_2 = 1$): $\hat{Y} = (\hat\beta_0 + \hat\beta'_0 + \hat\beta''_0) + (\hat\beta_1 + \hat\beta'_1 + \hat\beta''_1) X$

Different types of dummy regression
Model in all four cases: $Y = \beta_0 + \beta_1 X + \beta'_0 D + \beta'_1 D X$, with D = 1 if 1946-1954, = 0 otherwise (1955-1963)
1. Identical regression: $H_0: \beta'_0 = 0$ and $\beta'_1 = 0$
2. Parallel regression: $H_0: \beta'_1 = 0$
3. Concurrent regression: $H_0: \beta'_0 = 0$
4. Dissimilar regression: $\beta'_0 \neq 0$ and $\beta'_1 \neq 0$ (both restrictions are rejected)

[Figure: two panels of Y against X. Identical regressions: $A_0 = B_0$, $A_1 = B_1$. Parallel regressions: $A_0 \neq B_0$, $A_1 = B_1$.]
Reconstruction (46-54): $Y_t = A_0 + A_1 X_t + \varepsilon_{1t}$
Post-reconstruction (55-63): $Y_t = B_0 + B_1 X_t + \varepsilon_{2t}$

[Figure: two panels of Y against X. Concurrent regressions: $A_0 = B_0$, $A_1 \neq B_1$. Dissimilar regressions: $A_0 \neq B_0$, $A_1 \neq B_1$.]

Interactive effects between two qualitative variables
Spending (Y), income (X): $Y = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1 X + \varepsilon$
$D_1$ = 1 if female, = 0 otherwise (sex)
$D_2$ = 1 if college graduate, = 0 otherwise (education)
With the interaction effect: $Y = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_1 D_2 + \beta_1 X + \varepsilon$
$\beta'_0$ = differential effect of being female
$\beta''_0$ = differential effect of being a college graduate
$\beta'''_0$ = differential effect of being a female college graduate
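Substituting the four combinations of $D_1$ and $D_2$ into the interaction model gives the group-specific intercepts. This is a worked restatement of the model on this slide, writing the coefficients as $\beta_0$, $\beta'_0$, $\beta''_0$, $\beta'''_0$ and $\beta_1$ (a notational assumption), not additional material from the lecture:

```latex
\begin{aligned}
D_1 = 0,\ D_2 = 0 \ (\text{male, non-graduate}) &: \quad E(Y) = \beta_0 + \beta_1 X \\
D_1 = 1,\ D_2 = 0 \ (\text{female, non-graduate}) &: \quad E(Y) = (\beta_0 + \beta'_0) + \beta_1 X \\
D_1 = 0,\ D_2 = 1 \ (\text{male, graduate}) &: \quad E(Y) = (\beta_0 + \beta''_0) + \beta_1 X \\
D_1 = 1,\ D_2 = 1 \ (\text{female, graduate}) &: \quad E(Y) = (\beta_0 + \beta'_0 + \beta''_0 + \beta'''_0) + \beta_1 X
\end{aligned}
```

The interaction coefficient $\beta'''_0$ enters only the last line, so it measures how far the intercept for female graduates departs from the sum of the two separate shifts.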

Concurrent model (or covariance, or slope-shift model)
Example: how can we test the hypothesis that gasoline spending differs between a new car and a used car? Assume that at zero miles there is no difference between a used car and a new car.
[Figure: gas spending (Y) against miles run (X); two lines from the common intercept $\hat\beta_0$, the used-car line being steeper.]
Used car: $\hat{Y} = \hat\beta_0 + (\hat\beta_1 + \hat\beta'_1) X$
New car: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$

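Let $\beta^*_1 = \beta_1 + \beta'_1 D$, where D = 1 if used car, = 0 otherwise.
Now in one model:
$Y_i = \beta_0 + (\beta_1 + \beta'_1 D) X_i + \varepsilon_i$
$\;\;\; = \beta_0 + \beta_1 X_i + \beta'_1 D X_i + \varepsilon_i$ (multiplicative dummy variable)
$\;\;\; = \beta_0 + \beta_1 X_i + \beta'_1 Z_i + \varepsilon_i$, with $Z_i = D X_i$
The estimated relations are:
new car ($D = 0$): $\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$
used car ($D = 1$): $\hat{Y}_i = \hat\beta_0 + (\hat\beta_1 + \hat\beta'_1) X_i$, or $\hat{Y}_i = \hat\beta_0 + \hat\beta^*_1 X_i$
If $\hat\beta'_1 \neq 0$ (significantly), the estimated slopes for the two types of car are different.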
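The multiplicative dummy can be sketched numerically. The data below are hypothetical and noiseless (intercept 5, new-car slope 2, used-car slope 3, all chosen only for illustration), so the regression on the constant, X and Z = D*X recovers the slope shift exactly.

```python
import numpy as np

# Hypothetical, noiseless illustration (not data from the lecture).
miles = np.array([0.0, 10.0, 20.0, 30.0, 0.0, 10.0, 20.0, 30.0])
used = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])    # D = 1 if used car
spending = 5.0 + 2.0 * miles + 1.0 * used * miles            # slope shift of 1 for used cars

z = used * miles                                             # multiplicative dummy Z = D * X
X = np.column_stack([np.ones_like(miles), miles, z])
b0, b1, b1_shift = np.linalg.lstsq(X, spending, rcond=None)[0]

print("new car:  %.2f + %.2f X" % (b0, b1))
print("used car: %.2f + %.2f X" % (b0, b1 + b1_shift))
```

There is no separate intercept dummy here, which is exactly the concurrent-model assumption that both lines start from the same point.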

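Test whether $\hat\beta'_1 = 0$ or not:
(i) Compare two separate models:
  (a) $\hat{Y} = \hat\beta_0 + \hat\beta_1 X$
  (b) $\hat{Y} = \hat\beta_0 + \hat\beta^*_1 X$
(ii) Use the t-test on $\hat\beta'_1$ in $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta'_1 Z_i$:
  $H_0: \beta'_1 = 0$
  $H_1: \beta'_1 > 0$
  Compare $t^*$ with the critical value $t^c$ with $N-3$ degrees of freedom. If $t^* > t^c$, reject $H_0$ (i.e. $\beta'_1 \neq 0$).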

[Regression output] $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_i + \hat\beta'_1 Z_i$: check the t-value on $\hat\beta'_1$.

Example: estimating seasonal effects (shifts in both intercept and slope)
$E = \beta_0 + \beta_1 T + \varepsilon$, where E = electricity consumption and T = temperature
To capture the effect of seasonal factors:
$E = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3 + \beta_1 T + \varepsilon$
where $D_1$ = 1 if winter, 0 otherwise
$D_2$ = 1 if spring, 0 otherwise
$D_3$ = 1 if summer, 0 otherwise
Quarters Q1 to Q4: spring, summer, fall, winter. Fall is the control group.

The estimated models:
Fall: $\hat{E} = \hat\beta_0 + \hat\beta_1 T$
Winter: $\hat{E} = (\hat\beta_0 + \hat\beta'_0) + (\hat\beta_1 + \hat\beta'_1) T$
Spring: $\hat{E} = (\hat\beta_0 + \hat\beta''_0) + (\hat\beta_1 + \hat\beta''_1) T$
Summer: $\hat{E} = (\hat\beta_0 + \hat\beta'''_0) + (\hat\beta_1 + \hat\beta'''_1) T$
[Figure: E against T, four lines (fall, winter, spring, summer) with intercepts $\hat\beta_0$, $\hat\beta_0 + \hat\beta'_0$, $\hat\beta_0 + \hat\beta''_0$ and $\hat\beta_0 + \hat\beta'''_0$ and different slopes.]

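Estimating seasonal effects (cont.)
Also consider the slope in different seasons.
Let the intercept be $\beta^*_0 = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3$.
Thus, the full general specification is
$E = [\beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3] + \beta_1 T + \beta'_1 D_1 T + \beta''_1 D_2 T + \beta'''_1 D_3 T + \varepsilon$
with $Z_1 = D_1 T$, $Z_2 = D_2 T$, $Z_3 = D_3 T$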

$D_1$ = 1 for the 1st quarter, = 0 otherwise
$D_2$ = 1 for the 2nd quarter, = 0 otherwise
$D_3$ = 1 for the 3rd quarter, = 0 otherwise
The quarterly effect is the same as the seasonal effect. The control quarter is the 4th quarter.

1. Set the seasonal dummy = 1 if the observation falls in the 1st quarter, = 0 otherwise

What does the quarterly dummy variable look like?
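One way to picture the quarterly dummies is to build them for a short series. The sketch below assumes two hypothetical years of quarterly observations and uses the 4th quarter as the control quarter; the variable names are illustrative.

```python
import numpy as np

# Quarter index for two hypothetical years of quarterly data: 1, 2, 3, 4, 1, 2, 3, 4.
quarters = np.tile([1, 2, 3, 4], 2)

# One 0/1 column for each of quarters 1 to 3; the 4th quarter is the control quarter.
D = np.column_stack([(quarters == q).astype(int) for q in (1, 2, 3)])

for q, row in zip(quarters, D):
    print("Q%d  D1=%d  D2=%d  D3=%d" % (q, row[0], row[1], row[2]))
```

Each row has at most one dummy equal to 1, and the 4th-quarter rows are all zeros, which is how the control quarter is identified.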

(2) Structural test based on dummy variables
[Timeline: 1960 to 1989, with a break at 1974.]
Basic model: $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$
Define a dummy variable: D = 1 for the period 1974 onward, = 0 otherwise
To test whether the structures of the two periods are different, the specification must assume that
$\beta^*_0 = \beta_0 + \beta'_0 D$
$\beta^*_1 = \beta_1 + \beta'_1 D$
Dummy regression: $Y_t = \beta_0 + \beta'_0 D + \beta_1 X_t + \beta'_1 D X_t + \varepsilon_t$

The Chow test on the unemployment rate - capacity utilization rate relation
Sample    Dependent var.   Constant        CAPt             $\bar{R}^2$   F       RSS              n
60 - 89   unempl_t         30.0  (12.1)    -0.293 (9.7)     0.761         93.6    17.15 (RSS_R)    30
60 - 73   unempl_t         19.64 (5.9)     -0.175 (4.4)     0.59          19.7    4.69  (RSS_1)    14
74 - 89   unempl_t         30.63 (13.1)    -0.296 (10.1)    0.871         102.1   3.29  (RSS_2)    16
Note: t-values are in parentheses.

Restriction F-test procedure: $H_0$: no structural change; $H_1$: structural change
For the unrestricted model: $RSS_U = RSS_1 + RSS_2 = 4.69 + 3.29 = 7.98$
$F^* = \dfrac{(RSS_R - RSS_U)/(k+1)}{RSS_U/(N-2k-2)} = \dfrac{(17.15 - 7.98)/2}{7.98/(30-4)} = 14.9$
Critical values: $F^c_{0.01,\,2,\,26} = 5.53$ and $F^c_{0.05,\,2,\,26} = 3.37$ (degrees of freedom $k+1 = 2$ and $N-2k-2 = 26$)
$F^* > F^c$ ==> reject $H_0$
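The arithmetic of the Chow statistic can be checked in a few lines. The sketch below uses only the RSS values and sample size reported for this example, with k = 1 slope regressor; the variable names are illustrative, and the critical values are the ones quoted on the slide rather than computed.

```python
# Residual sums of squares reported for the three regressions.
rss_r = 17.15            # restricted: pooled sample 1960-89
rss_1 = 4.69             # sub-sample 1960-73
rss_2 = 3.29             # sub-sample 1974-89
n, k = 30, 1             # observations and number of slope regressors

rss_u = rss_1 + rss_2    # unrestricted RSS
df_num = k + 1           # restrictions: intercept and slope equal across periods
df_den = n - 2 * k - 2   # degrees of freedom of the unrestricted model

f_star = ((rss_r - rss_u) / df_num) / (rss_u / df_den)

f_crit_01, f_crit_05 = 5.53, 3.37   # critical values quoted on the slide for F(2, 26)
print("F* = %.2f, reject H0 at 1%%: %s" % (f_star, f_star > f_crit_01))
```

Since F* exceeds even the 1% critical value, the hypothesis of no structural change between the two periods is rejected.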

Using the dummy variable to identify the structural change
The unemployment rate - capacity utilization rate relation, sample: 1960 - 1989
$D_t$ = 1 from 1974 onward, = 0 prior to 1974
$\widehat{unempl}_t = 19.6 + 11.0\,D_t - 0.175\,CAP_t - 0.121\,(D_t \cdot CAP_t)$
t-values: (6.7) (2.7) (5.0) (2.5)
$\bar{R}^2 = 0.88$, SEE = 0.554, F = 72.2, n = 30
Estimated relation for 1960-1973: $\widehat{unempl} = 19.6 - 0.175\,CAP$
Estimated relation for 1974-1989: $\widehat{unempl} = (19.6 + 11.0) - (0.175 + 0.121)\,CAP = 30.6 - 0.296\,CAP$

Observed data
$U_t = \beta_0 + \beta_1 CAP_t + \beta'_0 D_t + \beta_2 D_t \cdot CAP_t$, with $D_t$ = 1 if $t \geq 74$, = 0 otherwise
Year   Ut     CAPt   Dt   Dt*CAPt
60     4.20   5.70   0    0
61     ...    ...    0    0
...
73     ...    ...    0    0
74     ...    ...    1    = CAPt
...
89     ...    ...    1    = CAPt
(The only other values legible on the slide are 10.5 and 11.2, each shown twice.)
