Bootstrap Event Study Tests
Peter Westfall, ISQS Dept.
Joint work with Scott Hein, Finance
Event (Outlier) Detection
• Main Idea: y0 is an "outlier" if it is unusual with respect to "typical circumstances".
• Definitions:
  • Critical value: the threshold c that y0 must exceed (in absolute value) to be called an outlier
  • α level: the probability that Y0 exceeds c under typical circumstances
  • p-value: the probability that Y0 is at least as extreme as the observed value y0 under typical circumstances
Case 1: Normal distribution, known mean (μ), known variance (σ²)
Let Z = (Y0 – μ)/σ. Y0 is associated with an "event" if |Z| is large. Critical and p-values come from the standard normal (Z) distribution.
Example: y0 = –7.13, μ = –0.15, σ = 1.0 ⇒ Z = –6.98. α = .05 critical value: Zα/2 = 1.96. p-value = 2P(Z < –6.98) = 3E-12.
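As a quick check of the Case 1 arithmetic, the Z statistic and its two-sided normal p-value can be computed with a short SAS DATA step (an illustrative sketch, separate from the %bootevnt macro shown later; the values are those of the example above):

/* Case 1 sketch: known mu and sigma */
data case1;
   y0 = -7.13;  mu = -0.15;  sigma = 1.0;
   z    = (y0 - mu) / sigma;          /* standardized event statistic        */
   crit = probit(1 - 0.05/2);         /* two-sided alpha = .05 cutoff (1.96) */
   pval = 2 * probnorm(-abs(z));      /* two-sided p-value from the Z distn  */
   put z= crit= pval=;
run;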
Case 2: Normal distribution, unknown μ, known σ²
Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then Z = (Y0 – Ȳ)/(σ√(1 + 1/n)) has a standard normal distribution, so critical and p-values are again taken from the Z distribution.
Case 3: Normal distribution, unknown μ, unknown σ²
Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then T = (Y0 – Ȳ)/(s√(1 + 1/n)) has a t distribution with n – 1 degrees of freedom, so critical and p-values come from the t(n–1) distribution.
Example: n = 87, y0 = –7.13, ȳ = –0.14, s = 1.013 ⇒ T = –6.86. α = .05 critical value: t(86, α/2) = 1.99. p-value = 2P(t(86) < –6.86) = 1E-9.
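The Case 3 numbers can be reproduced the same way (again an illustrative sketch, not part of the macro):

/* Case 3 sketch: mu and sigma estimated from the n = 87 typical observations */
data case3;
   n = 87;  y0 = -7.13;  ybar = -0.14;  s = 1.013;
   t    = (y0 - ybar) / (s * sqrt(1 + 1/n));   /* studentized event statistic         */
   crit = tinv(1 - 0.05/2, n - 1);             /* t(86) two-sided cutoff (about 1.99) */
   pval = 2 * probt(-abs(t), n - 1);           /* two-sided p-value from t(n-1)       */
   put t= crit= pval=;
run;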
Notes
• The method is essentially asking, "how far into the tail of the typical distribution is y0?" (Estimating the mean gives only a minor correction, the factor (1 + 1/n) in the variance formula; estimating the variance gives another minor correction, t(n–1) rather than Z critical and p-values.)
• The central limit theorem does not apply, since we are concerned with the distribution of Y0, not the distribution of Ȳ.
Case 1A: Known Distribution
• Exact critical values for Z are
  • cL = {α/2 quantile of the distribution of Z}
  • cU = {1 – α/2 quantile of the distribution of Z}
• Exact p-value:
  • p-value = 2 min{ P(Z ≤ z), P(Z ≥ z) }
A Simulation-Based Approach
• Simulate many (thousands of) Z values at random from the pdf.
• Critical values:
  • cL is the 100(α/2) percentile of the simulated data
  • cU is the 100(1 – α/2) percentile of the simulated data
• P-value:
  • pL = proportion of simulated Z's that are smaller than z
  • pU = proportion of simulated Z's that are larger than z
  • p-value = 2 min(pL, pU)
A SAS sketch of these calculations follows below.
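A minimal sketch of the simulation-based critical values and p-value. The standard normal pdf and the observed statistic z0 = –6.98 are assumptions taken purely for illustration:

%let z0 = -6.98;                            /* observed statistic (illustrative)   */

data sim;                                   /* simulate many Z's from the pdf      */
   call streaminit(182161);
   do b = 1 to 10000;
      z = rand('normal');
      output;
   end;
   keep z;
run;

proc univariate data=sim noprint;           /* percentile-based critical values    */
   var z;
   output out=crit pctlpts=2.5 97.5 pctlpre=c_;
run;

data pval;                                  /* two-sided simulation p-value        */
   set sim end=last;
   nl + (z <= &z0);                         /* count of simulated Z's <= z0        */
   nu + (z >= &z0);                         /* count of simulated Z's >= z0        */
   if last then do;
      pl = nl / _n_;
      pu = nu / _n_;
      pvalue = 2 * min(pl, pu);
      output;
   end;
   keep pl pu pvalue;
run;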
Case 1B: Unknown Distribution
• Let Y1,…,Yn denote an i.i.d. sample under typical circumstances (excluding Y0). Then the empirical distribution approximates the true distribution if n is large (Glivenko-Cantelli theorem).
• Thus, approximate critical and p-values can be obtained from the empirical distribution.
• This is the essential nature of the "bootstrap."
Case 1B.i: Simulation-Based Approach with known μ, σ
Simulate thousands of values of Z = (Y0 – μ)/σ as follows:
1. Select a value Y01 at random from the observed data Y1,…,Yn; let Z1 = (Y01 – μ)/σ.
2. Select a value Y02 at random from the observed data Y1,…,Yn; let Z2 = (Y02 – μ)/σ.
…
B. Select a value Y0B at random from the observed data Y1,…,Yn; let ZB = (Y0B – μ)/σ.
Use the simulated data Z1,…,ZB to determine critical and p-values, as in the simulation-based approach above; a SAS sketch follows.
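A sketch of this resampling in a SAS DATA step, assuming the typical-period observations are stored in a data set named TYPICAL with variable Y (the data set name, variable name, and macro values below are illustrative, not part of the original macro):

%let mu    = -0.15;
%let sigma = 1.0;
%let nboot = 10000;

data bootz;
   call streaminit(182161);
   do b = 1 to &nboot;
      i = ceil(rand('uniform') * nobs);   /* random index 1..n, drawn with replacement */
      set typical point=i nobs=nobs;      /* read the i-th typical observation Y0b     */
      z = (y - &mu) / &sigma;             /* Zb = (Y0b - mu) / sigma                   */
      output;
   end;
   stop;                                  /* required when reading with POINT=         */
   keep b z;
run;

The simulated Z1,…,ZB in BOOTZ can then be fed to the same percentile and p-value calculations shown earlier.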
Case 1B.ii: Unknown μ, σ
• Use the statistic T = (Y0 – Ȳ)/(s√(1 + 1/n)).
• The distribution of the statistic depends on the randomness inherent in Y0, Ȳ, and s, so each bootstrap replicate must reproduce all three sources of variation; see the sketch below.
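One natural way to carry this out, sketched in SAS/IML. The data set TYPICAL with variable Y and the event value y0 = –7.13 are assumptions for illustration, not the authors' code; on each replicate both the "event" value and the n typical values are resampled so that Ȳ and s are re-estimated:

proc iml;
   use typical;  read all var {y};  close typical;      /* typical-period observations */
   n  = nrow(y);
   y0 = -7.13;                                          /* observed event value        */
   t0 = (y0 - mean(y)) / (std(y) * sqrt(1 + 1/n));      /* observed statistic          */

   nboot = 10000;
   call randseed(182161);
   tstar = j(nboot, 1, .);
   do b = 1 to nboot;
      u = j(n + 1, 1, .);
      call randgen(u, "Uniform");
      idx    = ceil(n * u);                             /* n+1 indices, with replacement */
      y0star = y[idx[1]];                               /* bootstrap "event" value       */
      ystar  = y[idx[2:(n+1)]];                         /* bootstrap typical sample      */
      tstar[b] = (y0star - mean(ystar)) / (std(ystar) * sqrt(1 + 1/n));
   end;
   pvalue = 2 * min( mean(tstar <= t0), mean(tstar >= t0) );
   print t0 pvalue;
quit;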
Extension: Multivariate Market Model
The MVRM models may be expressed as Ri = Xβi + Dγi + εi, for i = 1,…,g (firms or portfolios).
Observations within a row of ε = [ε1 | … | εg] are correlated; this is called "cross-sectional" correlation. Observations on ε = [ε1 | … | εg] between rows 1,…,n are assumed to be independent in the classical MVRM model.
Null hypothesis: H0: [γ1 | … | γg] = [0 | … | 0]
This multivariate test is computed easily and automatically using standard statistical software packages, using exact (under normality) F-tests. The test is based on Wilks' Lambda likelihood ratio criterion.
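For example, with the SINKEY data and the variable names used in the macro invocation later in these slides, the normality-based multivariate test of H0 can be obtained from the MTEST statement of PROC REG (a sketch of the standard-software route, not the bootstrap):

proc reg data=sinkey;
   model pr1 pr2 pr3 pr4 = ds m1 m2 m3 dsm d2 d3 d4 d5 d6 d1;
   EventTest: mtest d1;   /* Wilks' lambda F-test that the d1 (event) coefficients are zero for all firms */
run;
quit;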
Hein, Westfall, Zhang Bootstrap Method
1. Fit the MVRM model. Obtain the F-statistic for testing H0 using the traditional method (assuming normality). Also obtain the ((n+1) × g) sample residual matrix e = [e1 | … | eg].
2. Exclude the row corresponding to the event from e, leaving the (n × g) matrix e–.
3. Sample (n+1) row vectors, one at a time and with replacement, from e–. This gives a ((n+1) × g) matrix [R1* | … | Rg*].
4. Fit the model Ri* = Xβi + Dγi + εi, i = 1,…,g, and obtain the test statistic F* using the same technique used to obtain the F-statistic from the original sample.
5. Repeat steps 3 and 4 NBOOT times. The bootstrap p-value of the test is the proportion of the NBOOT samples yielding an F*-statistic greater than or equal to the original F-statistic from step 1.
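Step 3 is what preserves the cross-sectional correlation, because whole rows of residuals are resampled together. A minimal SAS/IML sketch of that step, assuming (for illustration only) that the reduced residuals have been saved in a data set named RESIDM; the %bootevnt macro handles all of this internally:

proc iml;
   use residm;  read all var _num_ into eminus;  close residm;   /* n x g reduced residual matrix */
   n = nrow(eminus);
   call randseed(182161);
   u = j(n + 1, 1, .);
   call randgen(u, "Uniform");
   idx   = ceil(n * u);            /* (n+1) row indices, drawn with replacement     */
   rstar = eminus[idx, ];          /* (n+1) x g bootstrap matrix [R1* | ... | Rg*]  */
   /* Step 4 refits the MVRM on rstar with the same [X D] design and recomputes the */
   /* Wilks' lambda F-statistic; repeating steps 3-4 NBOOT times gives the p-value. */
quit;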
Alternative Method (Kramer, 2001)
The test statistic is Z = Σ ti / (g^(1/2) st), where ti is the t-statistic from the univariate dummy-variable regression model for firm i, and st is the sample standard deviation of the g t-statistics.
Algorithm: (i) create a pseudo-population of t-statistics ti* = ti – t̄, reflecting the null-hypothesis case where the true mean of the t-statistics is zero; (ii) sample g values with replacement from the pseudo-population and compute Z* from these pseudo-values; (iii) repeat (ii) NBOOT times, obtaining Z1*, …, ZNBOOT*. The p-value for the test is then 2 min(pU, pL), where pL is the proportion of the NBOOT bootstrap samples yielding Zi* ≤ Z, and pU is the proportion of the NBOOT samples yielding Zi* ≥ Z.
Assumption: the t-statistics are cross-sectionally independent.
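A sketch of Kramer's resampling scheme in SAS/IML. The vector of firm-level t-statistics below is hypothetical, included only so the sketch runs; in practice the ti come from the g univariate event-dummy regressions. With only a few firms, some resamples contain g copies of the same value and so have zero variance; those draws are skipped here (the %bootevnt output shown later reports their percentage).

proc iml;
   t = {-2.1, -1.4, -0.6, -1.9};              /* hypothetical firm-level t-statistics    */
   g = nrow(t);
   z = sum(t) / (sqrt(g) * std(t));           /* Kramer's Z                              */
   tnull = t - mean(t);                       /* (i) pseudo-population with mean zero    */

   nboot = 20000;
   call randseed(182161);
   zstar = j(nboot, 1, .);
   do b = 1 to nboot;
      u = j(g, 1, .);
      call randgen(u, "Uniform");
      tb = tnull[ceil(g * u)];                /* (ii) resample g values with replacement */
      sb = std(tb);
      if sb > 0 then zstar[b] = sum(tb) / (sqrt(g) * sb);   /* skip zero-variance draws  */
   end;

   ok = loc(zstar ^= .);                      /* usable bootstrap draws                  */
   pL = sum(zstar[ok] <= z) / ncol(ok);
   pU = sum(zstar[ok] >= z) / ncol(ok);
   pvalue = 2 * min(pL, pU);
   print z pvalue;
quit;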
Modified Kramer Method
• Model-based bootstrap Kramer: bootstrap Kramer's Z = Σ ti / (g^(1/2) st), but by resampling MVRM residual vectors as in HWZ.
• Model-based sum t: bootstrap St = Σ ti by resampling MVRM residual vectors as in HWZ.
Table 1. Simulated Type I error rates as a function of cross-sectional correlation.
/*--------------------------------------------------------------*/
/* Name:    bootevnt                                             */
/* Title:   Macro to calculate bootstrap p-values for event      */
/*          studies                                              */
/* Author:  Peter H. Westfall, westfall@ttu.edu                  */
/* Release: SAS Version 6.12 or higher, requires SAS/IML         */
/*--------------------------------------------------------------*/
/* Inputs:                                                       */
/*                                                               */
/* DATASET   = Data set to be analyzed (required)                */
/*                                                               */
/* YVARS     = List of y variables used in the multivariate      */
/*             regression model, separated by blanks (required)  */
/*                                                               */
/* XVARS     = List of x variables used in the multivariate      */
/*             regression model, separated by blanks (required)  */
/*                                                               */
/* EVENT     = Name of dummy variable indicating the event       */
/*             observation (e.g., day). This is required.        */
/*                                                               */
/* EXCLUDE   = Name of dummy variable indicating days that       */
/*             should be excluded from the resampling. If there  */
/*             are multiple event days in the model, then all    */
/*             those days should be excluded because the         */
/*             residuals are mathematically zero. If there are   */
/*             not multiple event days, then the EXCLUDE         */
/*             variable should be identical to the EVENT         */
/*             variable.                                         */
/*                                                               */
/* NBOOT     = Number of bootstrap samples. This input is        */
/*             required. Pick a number as large as possible      */
/*             subject to time constraints. Start with 100       */
/*             and work your way up, noting the accuracy as      */
/*             given by the confidence interval in the output.   */
/*                                                               */
/* MODELBOOT = 1 for requesting model-based bootstrap tests,     */
/*             = 0 to exclude them.                              */
/*                                                               */
/* NPBOOT    = 1 to request Kramer's nonparametric bootstrap     */
/*             tests, = 0 to exclude them.                       */
/*                                                               */
/* SEED      = Seed value for random numbers (0 default)         */
/*                                                               */
/*--------------------------------------------------------------*/
/* Output: This macro computes normality-assuming exact p-       */
/* values and bootstrap approximate p-values that do not         */
/* require the normality assumption. A 95% confidence interval   */
/* for the true bootstrap p-value (which itself is approximate   */
/* because it uses the empirical, not the true, residual         */
/* distribution) also is given.                                  */
/*--------------------------------------------------------------*/
Invocation of Macro

libname fin "c:\research\coba";

data sinkey;
   set fin.sinkey;
run;

%bootevnt(dataset=sinkey, yvars=pr1 pr2 pr3 pr4,
          xvars=ds m1 m2 m3 dsm d2 d3 d4 d5 d6,
          event=d1, exclude=exclude, nboot=1000,
          modelboot=1, npboot=1, seed=182161);
Normality-Assuming Tests for Event

        TSQ           F    NDF    DDF        PVAL
  15.025505   3.6957895      4    183   0.0064153

Model-based bootstrap Binder p-value, using NBOOT = 20000 samples,
with 95% confidence limits on the true bootstrap p-value

     BOOTP         LCL         UCL
   0.01115   0.0096947   0.0126053
Model-based bootstrap Kramer p-value, using 20000 samples,
with 95% confidence limits on the true bootstrap p-value

    BOOTKP        LCLK        UCLK
    0.0609   0.0561373   0.0656627

Model-based bootstrap Sum t p-value, using NBOOT = 20000 samples,
with 95% confidence limits on the true bootstrap p-value

 BOOTTSUMP     LCLSUMT     UCLSUMT
    0.0001   -0.000096    0.000296
1.55% of the bootstrap samples had 0 variance

Nonparametric bootstrap Kramer p-value, using NBOOT = 20000 samples,
with 95% confidence limits on the true bootstrap p-value

   BOOTTNP       LCLNP       UCLNP
    0.1404   0.1333184   0.1452147
Robustness of Bootstrap to Serial Correlation
• Recall that the method is essentially a comparison of Y0 to the distribution of Y1,…,Yn.
• If the empirical distribution of Y1,…,Yn converges to F, then the unconditional null probability of an "event" also converges to α = F(cα/2) + (1 – F(c1–α/2)).
• Such convergence occurs for typical stationary time series processes.
Conclusions
• We use t, not Z, even when n is large. Why? Because t is generally more accurate.
• We should use bootstrap tests instead of traditional tests for precisely the same reason.
• We must account for cross-sectional correlation in the analysis.
• The recommended method is our bootstrap with a modification of Kramer's Z (the model-based sum t method).
• Software is available from westfall@ba.ttu.edu.