Lecture #7 Studenmund(2006) Chapter 7 Objective: Applications of Dummy Independent Variables
Qualitative information
Gender: male and female
Region: HK Island, Kowloon & NT
Zone: East, South, West, North, Center
Time/period: peace and war, before & after crisis
Age: young, middle-aged, elderly
Education: post-graduate, college, high school, elementary
Others
Example: gender issue, does salary discrimination exist?

obs   Dummy Male   Dummy Female   Salary (K)   Years of teaching
  1       1             0            23.0             1
  2       0             1            19.5             1
  3       1             0            24.0             2
  4       0             1            21.0             2
  5       1             0            25.0             3
  6       0             1            22.0             3
  7       1             0            26.5             4
  8       0             1            23.1             4
  9       0             1            25.0             5
 10       1             0            28.0             5
 11       1             0            29.5             6
 12       0             1            26.0             6
 13       0             1            27.5             7
 14       1             0            31.5             7
 15       0             1            29.0             6
 16       1             0            22.0             5
 17       0             1            19.0             2
 18       1             0            18.0             2
 19       0             1            21.7             5
 20       0             1            18.5             2
 21       1             0            21.0             4
 22       1             0            20.5             4
 23       0             1            17.0             1
 24       0             1            17.5             1
 25       1             0            21.2             5
Male sample (Gujarati, 1995, Tables 15.1 & 15.5): separate regression on the male subsample. Total number of observations: 12.
Female sample (Gujarati, 1995, Tables 15.1 & 15.5): separate regression on the female subsample. Total number of observations: 13.
[Figure: salary (Y) against years of teaching (X), with separate fitted lines $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ (male) and $\hat{Y} = \hat{\beta}^*_0 + \hat{\beta}_2 X$ (female).]
Two separate models:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (male)
$Y_j = \beta^*_0 + \beta_2 X_j + \varepsilon_j$ (female)
Assuming $\beta_1 = \beta_2$: same slope but different constant between Y and X.
1st model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \varepsilon_i$
Assuming $\beta_1 \neq \beta_2$: different slope and different constant between Y and X.
2nd model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \beta'_1 D_i X_i + \varepsilon_i$
$Y_i$ = annual salary
$X_i$ = years of teaching experience
$D_i$ = 1 if male, = 0 otherwise (female, the control category)
[Figure: the same scatter of salary (Y) against years of teaching (X), now with three fitted lines: male, female, and the whole sample.]
Two separate models:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (male)
$Y_j = \beta^*_0 + \beta_2 X_j + \varepsilon_j$ (female)
$D_1 + D_2 = 1$, i.e. $D_1 = 1 - D_2$.
Each dummy on its own identifies the two categories, but the sum of the two dummies is always 1, so the sum cannot tell which observation is male and which is female.
Suppose we introduce two dummy variables in one model to identify the two categories of one qualitative variable:
$Y_i = \beta_0 + \beta'_0 D_{1i} + \beta''_0 D_{2i} + \beta_1 X_i + \varepsilon_i$
where $D_{1i}$ = 1 if male, = 0 otherwise
and $D_{2i}$ = 1 if female, = 0 otherwise.
Then $D_1 = 1 - D_2$, or $D_2 = 1 - D_1$, or $D_1 + D_2 = 1$ (perfect collinearity).
This model cannot be estimated because of the perfect collinearity between $D_1$, $D_2$ and the constant term (the dummy variable trap).
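A minimal sketch of the trap, assuming a made-up six-row sample (the values of `d_male` and `x` are illustrative, not from the lecture). With a constant and both dummies the design matrix has four columns but only rank three, so $X'X$ cannot be inverted; dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical toy sample: alternating male/female, illustrative X values
d_male = np.array([1, 0, 1, 0, 1, 0], dtype=float)
d_female = 1.0 - d_male                      # D2 = 1 - D1
x = np.array([1, 1, 2, 2, 3, 3], dtype=float)
const = np.ones_like(x)

# Constant + both dummies + X: const = d_male + d_female, so one column is redundant
trap = np.column_stack([const, d_male, d_female, x])
# Constant + one dummy + X: the m - 1 rule
ok = np.column_stack([const, d_male, x])

rank_trap = np.linalg.matrix_rank(trap)      # fewer than the 4 columns
rank_ok = np.linalg.matrix_rank(ok)          # equals the 3 columns
print(trap.shape[1], rank_trap, ok.shape[1], rank_ok)
```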
General rule to avoid perfect multicollinearity:
(1) If a qualitative variable has m categories, introduce only m-1 dummy variables.
Example: one qualitative variable, age, divided into m categories (cut points 10, 20, 30, 40, ...) needs the dummies $D_1, D_2, D_3, D_4, D_5, \ldots, D_{m-1}$.
Using two dummy variables to identify the two categories of one qualitative variable in a single model falls into the trap of perfect multicollinearity.
(2) When a category is assigned the value of zero, this category is called the control category (or omitted group).
Now consider different intercepts for two groups:
Model: $Y_i = \beta_0 + \beta'_0 D_i + \beta_1 X_i + \varepsilon_i$
$D_i$ = 1 if male, = 0 otherwise (i.e. female)
Estimated result for the two groups:
Male ($D_i = 1$): $\hat{Y}_i = (\hat{\beta}_0 + \hat{\beta}'_0 D_i) + \hat{\beta}_1 X_i$
Female ($D_i = 0$): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
To test whether the relationship differs between the two categories, compare
$\hat{Y}_i = (\hat{\beta}_0 + \hat{\beta}'_0 D) + \hat{\beta}_1 X_i$
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
and check the t-value of $\hat{\beta}'_0$.
If the t-statistic on $\hat{\beta}'_0$ is significant, the constant terms are different.
The same $\hat{\beta}_1$ in both equations means the two categories have the same relationship (slope) between X and Y.
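$\hat{Y} = \hat{\beta}_0 + \hat{\beta}'_0 D + \hat{\beta}_1 X_i + \hat{\beta}'_1 D X$
The term $\hat{\beta}'_0 D$ tests the difference of intercept between the two categories; the term $\hat{\beta}'_1 D X$ tests the difference of slope. Check the t-statistic of each.
The appropriate test for the intercept is the t-test on $\hat{\beta}'_0$:
$H_0: \beta'_0 = 0$
$H_1: \beta'_0 > 0$ or $H_1: \beta'_0 \neq 0$
Compare t* with the critical value $t^c$ at N-K degrees of freedom.
If t* > $t^c$ ==> reject $H_0: \beta'_0 = 0$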
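The t-test on the intercept dummy can be sketched with NumPy on the lecture's 25-observation salary table. Variable names are illustrative, the standard errors use the textbook OLS formula $s^2 (X'X)^{-1}$, and the two-sided 5% critical value for 22 degrees of freedom (about 2.074) is taken from a standard t table.

```python
import numpy as np

# (male dummy, salary in thousands, years of teaching) from the salary table
rows = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]
male, salary, years = (np.array(col, dtype=float) for col in zip(*rows))

# Model: salary = b0 + b0' * male + b1 * years + error
X = np.column_stack([np.ones_like(years), male, years])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

resid = salary - X @ coef
dof = len(salary) - X.shape[1]            # N - K = 25 - 3
sigma2 = resid @ resid / dof              # estimated error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se                       # t* for each coefficient

t_crit = 2.074                            # two-sided 5% value, 22 d.f. (t table)
reject_h0 = abs(t_stats[1]) > t_crit      # H0: b0' = 0
print(coef, t_stats, reject_h0)
```

On this data the t-ratio of the male dummy stays below the critical value, so the intercept shift is not statistically significant at the 5% level.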
Separate examples for female and male: the two regressions give different slopes and intercepts. But are they really statistically different? The two separate regression results alone cannot answer this; it requires an F-test (F*).
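Set two different dummies for the example:
D1: Female = 1, others = 0
$\hat{Y}_i = (\hat{\beta}_0 + \hat{\beta}'_0 D) + \hat{\beta}_1 X_i = (17.937 - 1.2810 D) + 1.561 X$; female intercept = 16.656 if the dummy were significant.
D2: Male = 1, others = 0
$\hat{Y}_i = (\hat{\beta}_0 + \hat{\beta}'_0 D) + \hat{\beta}_1 X_i = (16.656 + 1.2810 D) + 1.561 X$; male intercept = 17.937 if the dummy were significant.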
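A short sketch, assuming the 25-row salary table from the lecture and an illustrative helper named `fit`, of how the choice of control category flips the sign of the dummy coefficient but leaves the slope and the implied group intercepts unchanged.

```python
import numpy as np

# (male dummy, salary in thousands, years of teaching) from the salary table
rows = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]
male, salary, years = (np.array(col, dtype=float) for col in zip(*rows))


def fit(dummy):
    """OLS of salary on a constant, one dummy, and years of teaching."""
    X = np.column_stack([np.ones_like(years), dummy, years])
    coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return coef  # [intercept of control group, intercept shift, slope]


b_male = fit(male)            # D = 1 if male; female is the control group
b_female = fit(1.0 - male)    # D = 1 if female; male is the control group

print(np.round(b_male, 3))    # intercept 16.656, shift +1.281, slope 1.561
print(np.round(b_female, 3))  # intercept 17.937, shift -1.281, slope 1.561
```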
Whole sample: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = 17.095 + 1.608 X$
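D1: Female = 1
Male (D = 0): $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_i = 18.689 + 1.373 X$
Female (D = 1): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 D) + (\hat{\beta}_1 + \hat{\beta}'_1 D) X = 16.255 + 1.677 X$
If $\hat{\beta}'_0 = 0$ and $\hat{\beta}'_1 = 0$, the female equation reduces to $18.689 + 1.373 X$.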
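D2: Male = 1
Female (D = 0): $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_i = 16.255 + 1.677 X$
Male (D = 1): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 D) + (\hat{\beta}_1 + \hat{\beta}'_1 D) X = 18.689 + 1.373 X$
If $\hat{\beta}'_0 = 0$ and $\hat{\beta}'_1 = 0$, the male equation reduces to $16.255 + 1.677 X$.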
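The model with both an intercept dummy and a slope dummy reproduces the two separate regressions. The sketch below, with illustrative variable names, fits the interaction model on the lecture's salary table and compares it with subsample fits from `np.polyfit`.

```python
import numpy as np

# (male dummy, salary in thousands, years of teaching) from the salary table
rows = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]
male, salary, years = (np.array(col, dtype=float) for col in zip(*rows))

# salary = b0 + b0' * D + b1 * X + b1' * (D * X) + error, with D = 1 if male
X = np.column_stack([np.ones_like(years), male, years, male * years])
(b0, b0_shift, b1, b1_shift), *_ = np.linalg.lstsq(X, salary, rcond=None)

female_line = (b0, b1)                        # D = 0
male_line = (b0 + b0_shift, b1 + b1_shift)    # D = 1

# Separate subsample regressions; polyfit returns [slope, intercept]
m_slope, m_const = np.polyfit(years[male == 1], salary[male == 1], 1)
f_slope, f_const = np.polyfit(years[male == 0], salary[male == 0], 1)

print(np.round(female_line, 3), np.round(male_line, 3))
```

The dummy regression has the advantage that the t-statistics on the two shift coefficients test the intercept and slope differences directly.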
One qualitative variable with more than two categories:
Health care (Y) $= \beta_0 + \beta'_0 D_2 + \beta''_0 D_3 + \beta_1$ Income (X) $+ \varepsilon$
$D_2$ = 1 if high school education, = 0 otherwise
$D_3$ = 1 if college education, = 0 otherwise
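[Figure: health care spending against income, three parallel fitted lines with intercepts $\hat{\beta}_0$, $\hat{\beta}_0 + \hat{\beta}'_0$ and $\hat{\beta}_0 + \hat{\beta}''_0$.]
College education ($D_3 = 1$): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}''_0 D_3) + \hat{\beta}_1 X$
High school education ($D_2 = 1$): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 D_2) + \hat{\beta}_1 X$
Less than high school education: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$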
D2 = 1 if high school, = 0 otherwise
D3 = 1 if college, = 0 otherwise
=========================================
obs     Y       X      D2     D3
=========================================
  1    6.0     40      0      1
  2    3.9     31      1      0
  3    1.8     18      0      0
  4    1.9     19      0      0
  5    7.2     47      0      1
  6    3.3     27      1      0
  7    3.1     26      1      0
  8    1.7     17      0      0
  9    6.4     43      0      1
 10    7.9     49      0      1
 11    1.5     15      0      0
 12    3.1     25      1      0
 13    3.6     29      1      0
 14    2.0     20      0      0
 15    6.2     41      0      1
=========================================
Less than high school: $\hat{Y}_i = -1.2859 + 0.1722 X_i$
High school: $\hat{Y}_i = (-1.2859 - 0.068) + 0.1722 X_i$
  $= -1.3539 + 0.1722 X_i$ when the t-value of D2 is statistically significant
  $= -1.2859 + 0.1722 X_i$ when the t-value is not statistically significant
College: $\hat{Y}_i = (-1.2859 + 0.447) + 0.1722 X_i$
  $= -0.8389 + 0.1722 X_i$ when the t-value of D3 is statistically significant
  $= -1.2859 + 0.1722 X_i$ when the t-value is not statistically significant
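These estimates can be reproduced from the 15-observation health care table with an ordinary least-squares fit. The sketch below uses NumPy; the variable names are illustrative.

```python
import numpy as np

# (Y health care, X income, D2 high school, D3 college) from the data table
rows = [
    (6.0, 40, 0, 1), (3.9, 31, 1, 0), (1.8, 18, 0, 0), (1.9, 19, 0, 0),
    (7.2, 47, 0, 1), (3.3, 27, 1, 0), (3.1, 26, 1, 0), (1.7, 17, 0, 0),
    (6.4, 43, 0, 1), (7.9, 49, 0, 1), (1.5, 15, 0, 0), (3.1, 25, 1, 0),
    (3.6, 29, 1, 0), (2.0, 20, 0, 0), (6.2, 41, 0, 1),
]
y, x, d2, d3 = (np.array(col, dtype=float) for col in zip(*rows))

# Three categories, so only two dummies; "less than high school" is the control
X = np.column_stack([np.ones_like(x), d2, d3, x])
(b0, shift_hs, shift_col, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

intercepts = {
    "less than high school": b0,
    "high school": b0 + shift_hs,
    "college": b0 + shift_col,
}
for group, const in intercepts.items():
    print(f"{group}: Y = {const:.4f} + {b1:.4f} X")
```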
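One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for three age groups
$Y_i = \beta_0 + \beta'_0 A_1 + \beta''_0 A_2 + \beta_1 X_i + \varepsilon_i$ (t-values reported under the dummy coefficients)
where $A_1$ = 1 if 25 < age < 55, = 0 otherwise
and $A_2$ = 1 if age > 55, = 0 otherwise
so $A_1 + A_2 \neq 1$.
[Age axis: below 25 is the control group; $A_1 = 1$ between 25 and 55; $A_2 = 1$ above 55.]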
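Qualitative variable with many categories (cont.):
The estimated models are then:
age below 25: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
25 < age < 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 A_1) + \hat{\beta}_1 X$
age > 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}''_0 A_2) + \hat{\beta}_1 X$
$H_0: \beta'_0 = 0,\ \beta''_0 = 0$
$H_1: \beta'_0 \neq 0,\ \beta''_0 \neq 0$
Compare $t_1^*$ and $t_2^*$ with the critical value $t^c$ at n-k degrees of freedom.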
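In a scatter diagram:
[Figure: Y against X, three parallel fitted lines with intercepts $\hat{\beta}_0$, $\hat{\beta}_0 + \hat{\beta}'_0$ and $\hat{\beta}_0 + \hat{\beta}''_0$.]
age > 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}''_0) + \hat{\beta}_1 X$
25 < age < 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0) + \hat{\beta}_1 X$
age < 25: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$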
One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for four age groups
$Y = \beta_0 + \beta'_0 A_1 + \beta''_0 A_2 + \beta'''_0 A_3 + \beta_1 X + \varepsilon$
where $A_1$ = 1 if age > 55, = 0 otherwise
$A_2$ = 1 if 35 < age $\leq$ 55, = 0 otherwise
$A_3$ = 1 if 15 < age $\leq$ 35, = 0 otherwise
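Qualitative variable with many categories (cont.):
The estimated models are:
age $\leq$ 15: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
15 < age $\leq$ 35: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'''_0 A_3) + \hat{\beta}_1 X$
35 < age $\leq$ 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}''_0 A_2) + \hat{\beta}_1 X$
age > 55: $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 A_1) + \hat{\beta}_1 X$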
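Two qualitative variables:
Salary (Y) $= \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1 X + \varepsilon$
or $Y = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1 X + \beta'_1 D_1 X + \beta''_1 D_2 X + \varepsilon'$
$D_1$ = 1 if male, = 0 otherwise (sex)
$D_2$ = 1 if white, = 0 otherwise (race)
(1) Mean salary for a "black" female teacher ($D_1 = 0$, $D_2 = 0$): $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
(2) Mean salary for a "black" male teacher ($D_1 = 1$, $D_2 = 0$): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 D_1) + (\hat{\beta}_1 + \hat{\beta}'_1 D_1) X$
(3) Mean salary for a "white" female teacher ($D_1 = 0$, $D_2 = 1$): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}''_0 D_2) + (\hat{\beta}_1 + \hat{\beta}''_1 D_2) X$
(4) Mean salary for a "white" male teacher ($D_1 = 1$, $D_2 = 1$): $\hat{Y} = (\hat{\beta}_0 + \hat{\beta}'_0 D_1 + \hat{\beta}''_0 D_2) + (\hat{\beta}_1 + \hat{\beta}'_1 D_1 + \hat{\beta}''_1 D_2) X$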
Different types of dummy regression, all based on $Y = \beta_0 + \beta_1 X + \beta'_0 D + \beta'_1 D X$ with D = 1 if 1946-1954, = 0 otherwise (1955-1963):
1. Identical regressions: $H_0: \beta'_0 = 0$ and $\beta'_1 = 0$
2. Parallel regressions: $H_0: \beta'_1 = 0$
3. Concurrent regressions: $H_0: \beta'_0 = 0$
4. Dissimilar regressions: $\beta'_0 \neq 0$ and $\beta'_1 \neq 0$
Reconstruction (46-54): $Y_t = A_0 + A_1 X_t + \varepsilon_{1t}$
Post-reconstruction (55-63): $Y_t = B_0 + B_1 X_t + \varepsilon_{2t}$
[Figure, four panels of Y against X:]
Identical regressions: $A_0 = B_0$, $A_1 = B_1$
Parallel regressions: $A_0 \neq B_0$, $A_1 = B_1$
Concurrent regressions: $A_0 = B_0$, $A_1 \neq B_1$
Dissimilar regressions: $A_0 \neq B_0$, $A_1 \neq B_1$
Interactive effects between two qualitative variables:
Spending (Y) $= \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta_1$ income (X) $+ \varepsilon$
$D_1$ = 1 if female, = 0 otherwise (sex)
$D_2$ = 1 if college graduate, = 0 otherwise (education)
With the interaction effect:
Spending (Y) $= \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_1 D_2 + \beta_1$ income (X) $+ \varepsilon$
$\beta'_0$ = differential effect of being female
$\beta''_0$ = differential effect of being a college graduate
$\beta'''_0$ = differential effect of being a female college graduate
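As a worked reading of the interaction model, the four group intercepts follow from substituting the dummy values. This sketch assumes the "otherwise" categories are male and non-graduate, and that the error term has zero mean.

```latex
\begin{aligned}
\text{male, non-graduate } (D_1 = 0,\ D_2 = 0):\quad
  & E(Y) = \beta_0 + \beta_1 X \\
\text{female, non-graduate } (D_1 = 1,\ D_2 = 0):\quad
  & E(Y) = (\beta_0 + \beta'_0) + \beta_1 X \\
\text{male, graduate } (D_1 = 0,\ D_2 = 1):\quad
  & E(Y) = (\beta_0 + \beta''_0) + \beta_1 X \\
\text{female, graduate } (D_1 = 1,\ D_2 = 1):\quad
  & E(Y) = (\beta_0 + \beta'_0 + \beta''_0 + \beta'''_0) + \beta_1 X
\end{aligned}
```

Without the interaction term the last intercept would be forced to equal $\beta_0 + \beta'_0 + \beta''_0$, so $\beta'''_0$ measures how far the combined effect departs from the sum of the two separate effects.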
Concurrent model (or covariance, or slope-shift model):
Example: how can we test the hypothesis that gasoline spending differs between a new car and a used car? Assume that at zero miles there is no difference between a used car and a new car (same intercept).
[Figure: gas spending (Y) against miles run (X), two fitted lines from the common intercept $\hat{\beta}_0$.]
Used car: $\hat{Y} = \hat{\beta}_0 + (\hat{\beta}_1 + \hat{\beta}'_1) X$
New car: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
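Let $\beta^*_1 = \beta_1 + \beta'_1 D$, where D = 1 if used car, = 0 otherwise.
Now in one model:
$Y_i = \beta_0 + (\beta_1 + \beta'_1 D) X_i + \varepsilon_i$
$= \beta_0 + \beta_1 X_i + \beta'_1 D X_i + \varepsilon_i$
$= \beta_0 + \beta_1 X_i + \beta'_1 Z_i + \varepsilon_i$, where $Z_i = D X_i$ is the multiplicative dummy variable.
The estimated relations are:
new car: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
used car: $\hat{Y}_i = \hat{\beta}_0 + (\hat{\beta}_1 + \hat{\beta}'_1 D) X_i$ where D = 1, or $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}^*_1 X_i$
If $\hat{\beta}'_1 \neq 0$, the estimated slopes for the two kinds of car are different.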
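Test whether $\hat{\beta}'_1 = 0$ or not:
(i) Compare two separate models:
  (a) $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
  (b) $\hat{Y} = \hat{\beta}_0 + \hat{\beta}^*_1 X$
(ii) Use the t-test on $\hat{\beta}'_1$ in $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}'_1 Z$:
$H_0: \beta'_1 = 0$
$H_1: \beta'_1 > 0$
Compare t* with the critical value $t^c$ at N-3 degrees of freedom.
If t* > $t^c$ (i.e. $\beta'_1 \neq 0$) => reject $H_0$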
[Regression output for $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}'_1 Z_i$.] Check the t-value of $\hat{\beta}'_1$.
Example: estimating seasonal effects
$E = \beta_0 + \beta_1 T + \varepsilon$
E: electricity consumption
T: temperature
To capture the effect of seasonal factors:
$E = \beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3 + \beta_1 T + \varepsilon$
where $D_1$ = 1 if winter, 0 otherwise
$D_2$ = 1 if spring, 0 otherwise
$D_3$ = 1 if summer, 0 otherwise
Fall is the control group.
Quarters: Q1 spring, Q2 summer, Q3 fall, Q4 winter.
Shifts in both intercept and slope
The estimated models:
Fall: $\hat{E} = \hat{\beta}_0 + \hat{\beta}_1 T$
Winter: $\hat{E} = (\hat{\beta}_0 + \hat{\beta}'_0) + (\hat{\beta}_1 + \hat{\beta}'_1) T$
Spring: $\hat{E} = (\hat{\beta}_0 + \hat{\beta}''_0) + (\hat{\beta}_1 + \hat{\beta}''_1) T$
Summer: $\hat{E} = (\hat{\beta}_0 + \hat{\beta}'''_0) + (\hat{\beta}_1 + \hat{\beta}'''_1) T$
[Figure: E against T, four fitted lines (fall, winter, spring, summer) with different intercepts and slopes.]
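Estimating seasonal effects (cont.):
Also consider the slope in different seasons.
Let the intercept be $\beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3$.
Thus the full general specification is
$E = [\beta_0 + \beta'_0 D_1 + \beta''_0 D_2 + \beta'''_0 D_3] + \beta_1 T + \beta'_1 D_1 T + \beta''_1 D_2 T + \beta'''_1 D_3 T + \varepsilon$
where $Z_1 = D_1 T$, $Z_2 = D_2 T$, $Z_3 = D_3 T$.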
$D_1$ = 1 if 1st quarter, = 0 otherwise
$D_2$ = 1 if 2nd quarter, = 0 otherwise
$D_3$ = 1 if 3rd quarter, = 0 otherwise
The quarterly effect is the same as the seasonal effect.
The control quarter is the 4th quarter.
1. Set the seasonal dummy = 1 if the observation falls in the 1st quarter, = 0 otherwise
What does the quarterly dummy variable look like?
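A small sketch of the answer, assuming two hypothetical years of quarterly observations. Each row of `D` holds $(D_1, D_2, D_3)$; the 4th quarter is the control quarter and gets all zeros.

```python
import numpy as np

# Quarter index for two hypothetical years of quarterly data: 1, 2, 3, 4, 1, 2, 3, 4
quarters = np.tile([1, 2, 3, 4], 2)

# One 0/1 column per non-control quarter (m = 4 categories, m - 1 = 3 dummies)
D = np.column_stack([(quarters == q).astype(int) for q in (1, 2, 3)])

for q, row in zip(quarters, D):
    print(f"Q{q}: D1={row[0]} D2={row[1]} D3={row[2]}")
```

Because no row contains more than a single 1 and the fourth-quarter rows are all zero, the three columns never sum to the constant term, which keeps the model out of the dummy variable trap.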
(2) Structural test based on dummy variables
Basic model (sample 1960-1989, possible break at 1974): $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$
Define a dummy variable: D = 1 for the period 1974 onward, = 0 otherwise
To test whether the structures of the two periods are different, the specification assumes that
$\beta^*_0 = \beta_0 + \beta'_0 D$
$\beta^*_1 = \beta_1 + \beta'_1 D$
Dummy regression: $Y_t = \beta_0 + \beta'_0 D + \beta_1 X_t + \beta'_1 D X_t + \varepsilon_t$
The Chow test on the unemployment rate and capacity utilization rate

Sample    Dependent var.   Constant        CAP_t           adj. R2   F       RSS             n
60 - 89   unempl_t         30.0  (12.1)    -0.293 (9.7)    0.761     93.6    17.15 (RSS_R)   30
60 - 73   unempl_t         19.64 (5.9)     -0.175 (4.4)    0.59      19.7     4.69 (RSS_1)   14
74 - 89   unempl_t         30.63 (13.1)    -0.296 (10.1)   0.871     102.1    3.29 (RSS_2)   16

Note: t-values are in parentheses.
For the unrestricted model: $RSS_U = RSS_1 + RSS_2 = 4.69 + 3.29 = 7.98$
$F^* = \dfrac{(RSS_R - RSS_U)/(k+1)}{RSS_U/(N - 2k - 2)} = \dfrac{(17.15 - 7.98)/2}{7.98/(30 - 4)} = 14.9$
Critical values with (k+1, N-2k-2) = (2, 26) degrees of freedom: $F^c_{0.01;\,2,\,26} = 5.53$ and $F^c_{0.05;\,2,\,26} = 3.37$
Restriction F-test procedure:
$H_0$: no structural change
$H_1$: structural change
F* > $F^c$ ==> reject $H_0$
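The arithmetic of the Chow test can be checked in a few lines. The RSS values and critical values are taken from the lecture; the variable names are illustrative.

```python
# Residual sums of squares from the three regressions in the Chow-test table
rss_r = 17.15            # restricted: whole sample 1960-89
rss_1 = 4.69             # subsample 1960-73
rss_2 = 3.29             # subsample 1974-89
n, k = 30, 1             # observations and number of slope regressors

rss_u = rss_1 + rss_2    # unrestricted RSS
df_num = k + 1           # restrictions: intercept and slope are equal across periods
df_den = n - 2 * k - 2   # degrees of freedom of the unrestricted model

f_star = ((rss_r - rss_u) / df_num) / (rss_u / df_den)

f_crit_01 = 5.53         # F(2, 26) at 1%, as quoted in the lecture
f_crit_05 = 3.37         # F(2, 26) at 5%, as quoted in the lecture
print(f"F* = {f_star:.1f}, reject H0 at 1%: {f_star > f_crit_01}")
```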
Using the dummy variable to identify the structural change
The unemployment rate and capacity utilization rate, sample: 1960 - 1989
$D_t$ = 1 for 1974 to 1989, = 0 prior to 1974
$\widehat{unempl} = 19.6 + 11.0\,D_t - 0.175\,CAP_t - 0.121\,(D_t \cdot CAP_t)$
t-values: (6.7) (2.7) (5.0) (2.5)
adj. R2 = 0.88, SEE = 0.554, F = 72.2, n = 30
The estimated equation for 1960-1973: $\widehat{unempl} = 19.6 - 0.175\,CAP$
The estimated equation for 1974-1989: $\widehat{unempl} = (19.6 + 11.0) - (0.175 + 0.121)\,CAP = 30.6 - 0.296\,CAP$
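As a quick check of how the single dummy regression yields both period equations, the sketch below plugs the reported coefficients into a helper named `fitted` (a hypothetical name). The capacity utilization value of 80 is an assumed illustration, not a data point from the lecture.

```python
# Coefficients reported for the dummy regression on the 1960-1989 sample
b0, b0_shift = 19.6, 11.0        # intercept and its shift for D_t = 1
b1, b1_shift = -0.175, -0.121    # slope on CAP_t and its shift for D_t = 1


def fitted(cap, d):
    """Fitted unemployment rate for capacity utilization `cap` and dummy `d` (0 or 1)."""
    return (b0 + b0_shift * d) + (b1 + b1_shift * d) * cap


# Implied equations for the two periods
early = (b0, b1)                          # D_t = 0
late = (b0 + b0_shift, b1 + b1_shift)     # D_t = 1
print(f"D=0: unempl = {early[0]:.1f} {early[1]:+.3f} CAP")
print(f"D=1: unempl = {late[0]:.1f} {late[1]:+.3f} CAP")

cap_example = 80.0                        # assumed value, for illustration only
print(fitted(cap_example, 0), fitted(cap_example, 1))
```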
Observed data (most values are elided in the source):

Year   U_t    CAP_t   D_t   D_t*CAP_t
60     4.20   5.70    0     0
61     ...    ...     0     0
...    ...    ...     0     0
73     ...    ...     0     0
74     ...    ...     1     ...
...    ...    ...     1     ...
89     ...    ...     1     ...

Other legible entries: 10.5, 10.5, 11.2, 11.2.
D = 1 if t $\geq$ 74, = 0 otherwise
$U_t = \beta_0 + \beta_1 CAP_t + \beta'_0 D_t + \beta_2 D_t \cdot CAP_t$