Journal of Corporate Finance 45 (2017) 333–341
Misspecification in event studies☆

Joseph M. Marks a, Jim Musumeci b,⁎

a Department of Finance, AAC 273, Bentley University, Waltham, MA 02452, United States
b Department of Finance, MOR 107, Bentley University, Waltham, MA 02452, United States

☆ The authors thank Charlie Hadlock, Mingfei Li, Yun Ling, Richard Sansing, and participants in the Bentley University seminar series for their helpful comments. The usual disclaimer applies.
⁎ Corresponding author. E-mail addresses: [email protected] (J.M. Marks), [email protected] (J. Musumeci).
Article history: Received 31 December 2016; received in revised form 2 May 2017; accepted 8 May 2017; available online 9 May 2017

JEL classification: G14, C10, C15
Abstract

We examine the statistical error and efficiency associated with two commonly used event-study techniques when applied to samples of various sizes. Previous research has established that the frequently used Patell (1976) test is not well specified when the event itself creates additional return variance. We find that even under ideal conditions, when the event creates no additional variance, the Patell test rejects a true null hypothesis substantially more often than the stated significance level. In contrast, the alternate test of Boehmer et al. (1991) performs well in samples of all sizes and under all conditions we consider.

© 2017 Elsevier B.V. All rights reserved.
Keywords: Event study; Standardized abnormal return; Misspecification; Simulation; Patell test; BMP test
1. Introduction

Since the introduction of event-study methods in accounting (Ball and Brown, 1968) and financial research (Fama et al., 1969), they have been the predominant method of determining whether an event is associated with a change in firm value. Typically, a researcher collects a sample of firms experiencing the event and tests whether their returns on the event day are significantly different from what would be expected absent the event. The required statistical tests usually rely on the Central Limit Theorem (CLT),1 of which there are several versions. The most common version (Feller, 1968, for example) asserts that if $\{X_i\}_{i=1}^N$ is a random sample from a population with finite mean and variance, then the sampling distribution of the mean of $\{X_i\}$ approaches a normal distribution as N → ∞. The theorem is silent concerning the rate of convergence, although it is known that convergence is faster when the skewness and kurtosis of the underlying distribution are near their values for a normal distribution. Daily stock returns, however, are known to be both skewed and fat-tailed (e.g., Mandelbrot (1963) and Fama (1965)), and this remains true of abnormal returns calculated relative to a benchmark return (Brown and Warner, 1985).

1 One method that does not rely on the CLT is Corrado's (1989) non-parametric test.
Indeed, Mandelbrot and Fama speculated that the population of stock returns might not even have a finite variance, although subsequent research favored the conjecture that stock returns are drawn from a finite mixture of normal distributions (e.g., Campbell et al. (1997), pp. 18–19; Harris (1986); or Kon (1984)). A population variance that is not finite would imply we cannot rely on the CLT at all, but even the assumption that daily stock returns are drawn from a mixture distribution with fat tails means that we cannot assume "small" samples necessarily have the properties required to apply standard parametric tests. The size required for a sample of daily stock returns to be sufficiently large for an event-study test to be well specified is primarily an empirical matter.

Brown and Warner (1985) and Boehmer et al. (1991) examined this issue, but both used only 250 simulated portfolios of 50 events each. The use of only 250 simulated portfolios leads to confidence intervals that are quite wide, making it difficult to detect misspecification of a test. For example, at a significance level of α = 5% and with only 250 simulations, the 95% confidence interval suggested by the binomial approximation of the standard deviation, $\sqrt{npq}/n$, is $0.05 \pm 1.96\sqrt{250(0.05)(0.95)}/250$, or (2.3%, 7.7%). Given that the minimum increment to the rejection frequency is 1/250 = 0.4%, a method would have to reject a true null 60% more often than it should for us to conclude that it is misspecified. The small number of simulations was a constraint imposed by the computing power of that era, but it is not a significant limitation today.

We reexamine this issue using 10,000 simulated event portfolios. This provides tighter confidence intervals and thus more powerful tests for misspecification. We find that even if the event does not alter the dispersion of standardized abnormal returns (the abnormal return normalized by the standard deviation of the estimation-period residuals), the default method in Eventus is misspecified for samples from 100 to as large as 5000 firms. Like Boehmer et al. (1991) and Harrington and Shrider (2007), we find this misspecification is markedly larger if the event causes an increase in variance, which Harrington and Shrider show it necessarily does. The method of Boehmer et al. (1991), hereafter referred to as the BMP test, is based on a cross-sectional test of standardized abnormal returns and is well specified for all portfolio sizes we considered, in both the absence and presence of event-induced variance. Furthermore, while the default test statistic in Eventus is the Patell test, the BMP test statistic can just as easily be chosen by selecting the STDCSECT option. For researchers who access Eventus through Wharton Research Data Services, step 5 of the query form provides the ability to select different test statistics, including both Patell and BMP.2

Finally, because the large number of simulations might lead us to conclude that a test is statistically misspecified even though it produces good practical results, we apply Bayes' Theorem to compare the probability the null is false given that either test rejects it. We find that, compared with the BMP test, the Patell test provides lower confidence that the null is false given that its t-statistic is significant.
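To illustrate the precision gained by moving from 250 to 10,000 simulated portfolios, the following Python sketch (our own illustration; the function name and defaults are ours, not the authors') computes the binomial-approximation confidence interval for the observed rejection rate of a well-specified test.

```python
import math

def rejection_rate_ci(p=0.05, n_portfolios=250, z=1.96):
    """95% CI for the observed rejection frequency of a well-specified
    test with true rejection probability p, using the binomial (normal)
    approximation with standard error sqrt(p*q/n)."""
    se = math.sqrt(p * (1 - p) / n_portfolios)
    return p - z * se, p + z * se

# With 250 portfolios the interval is wide: roughly (2.3%, 7.7%).
print(rejection_rate_ci(n_portfolios=250))
# With 10,000 portfolios it tightens to about (4.6%, 5.4%).
print(rejection_rate_ci(n_portfolios=10_000))
```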
2. Description of data

Provided we had sufficient data during a 120-day estimation period as described below, we used the set of all daily CRSP return observations from 1926 to 2015 and total returns on the CRSP equally weighted index as inputs for the market model to find benchmark returns,

$E(R_{i,E}) = \hat{\alpha}_i + \hat{\beta}_i R_{M,E}$    (1)
where i denotes the firm, E the event day, M the CRSP equally weighted index, and $\hat{\alpha}_i$ and $\hat{\beta}_i$ the ordinary least squares (OLS) estimates from a 120-day estimation window ending two days prior to the event day.3 In an effort to ensure meaningful estimates of $\hat{\alpha}_i$ and $\hat{\beta}_i$, we required a minimum of 100 observations in the 120-day estimation period preceding the event. This left us with 68,934,304 values of $E(R_{i,E})$ across 24,021 securities (PERMNOs).

Next, we formed a simulated effect of an event by adding to the actual return, $R_{i,E}$, a variable $\Delta_{i,E}$ with mean Δ (0 or 0.25%) and a variance of θ (0 or 1) times the variance of the market-model residuals during the stock's estimation period. For example, Δ = 0 indicates there is no mean effect of the event, and θ = 1 means that the effect of the event has a variance equal to the estimation-period variance of the residual. We add this simulated effect of the event to the actual return on the event day and then subtract the benchmark expected return to obtain a simulated abnormal return:

$AR_{i,E} = \left[R_{i,E} + \Delta_{i,E}\right] - \left[\hat{\alpha}_i + \hat{\beta}_i R_{M,E}\right]$    (2)

We then calculate the standardized abnormal return,

$SAR_{i,E} = \dfrac{AR_{i,E}}{\sigma_i \sqrt{1 + \dfrac{1}{T} + \dfrac{\left(R_{M,E} - \bar{R}_M\right)^2}{\sum_t \left(R_{M,t} - \bar{R}_M\right)^2}}}$    (3)

as per Patell (1976), with $\sigma_i$ the standard deviation of the estimation-period residuals and the remainder of the denominator an adjustment for the fact that the benchmark return is an out-of-sample prediction.
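As a concrete reading of Eqs. (1)–(3), the Python sketch below computes a market-model standardized abnormal return. It is our own minimal illustration, not the authors' Matlab implementation; the function and variable names are ours.

```python
import numpy as np

def standardized_abnormal_return(r_est, rm_est, r_event, rm_event):
    """SAR per Eqs. (1)-(3): fit the market model by OLS over the
    estimation window, then scale the event-day abnormal return by the
    estimation-period residual s.d. with the out-of-sample adjustment."""
    T = len(r_est)
    beta, alpha = np.polyfit(rm_est, r_est, 1)     # Eq. (1) OLS estimates
    resid = r_est - (alpha + beta * rm_est)
    sigma_i = resid.std(ddof=2)                    # T - 2 degrees of freedom
    ar = r_event - (alpha + beta * rm_event)       # Eq. (2); any simulated
                                                   # effect is already in r_event
    adj = 1 + 1/T + (rm_event - rm_est.mean())**2 \
          / ((rm_est - rm_est.mean())**2).sum()    # out-of-sample adjustment
    return ar / (sigma_i * np.sqrt(adj))           # Eq. (3)
```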
When Δ = θ = 0, we would expect the average abnormal return and the average standardized abnormal return to have a mean of zero, and the latter to have a standard deviation of one.

2 For researchers who do not use Eventus, a simple Matlab function that conducts event studies and calculates the Patell and BMP test statistics has been written by the authors and is available for public download at http://atc3.bentley.edu/faculty/jmarks/event_study.zip. This function can be used directly or as a guide for writing a similar function in another language.
3 With this timing, the expected return computed in Eq. (1) is based on data independent of the actual event return (i.e., the event return and the last return in the estimation window do not share a price). The results we present are unaffected by ending the estimation window on the day prior to the event date.

3. Simulations

In this section we sample without replacement from the population of 68,934,304 expected daily returns to form 10,000 randomly selected portfolios, each containing 5000 hypothetical events. For each portfolio of 5000, we then choose nested subsets of sizes 2500, 1000, 500, 250, and 100.4 Then, for each event day and each combination of Δ and θ, we calculate the abnormal return as the difference between the simulated return ($R_{i,E} + \Delta_{i,E}$) and the market-model prediction ($\hat{\alpha}_i + \hat{\beta}_i R_{M,E}$). Finally, we use the estimation-period standard error to calculate the Patell (1976) test statistic (which is the default in Eventus (e.g., see Cowan (2007)) and is commonly used)5 and the BMP test statistic.

Both tests normalize the event-day abnormal return in Eq. (2) by the standard deviation of the estimation-period residuals (after adjusting for the fact that the benchmark return is an out-of-sample prediction) to obtain the standardized abnormal return (SAR) of Eq. (3). The Patell test assumes the resulting SARs are drawn from a t-distribution with degrees of freedom equal to the number of days in the estimation period minus two, while the BMP test uses the cross-sectional standard deviation of the SARs to calculate its t-statistic. Thus, the Patell test implicitly assumes that the event itself creates no additional variance, while the BMP test allows the event to create a shock whose variance is proportional to the variance of the residuals from the estimation period.

Research cited in the introduction has found that individual stock returns are not normally distributed, but instead are skewed to the right and have fat tails. The CLT guarantees only that, if the underlying population has a finite variance, the sampling distribution of the mean will eventually converge to a normal distribution as sample size increases. Given this and Kothari and Warner's (2007) observation that "for small samples…one cannot rely on asymptotic results for the central limit theorem," one could reasonably expect improved results for larger sample sizes. In the case of Δ = θ = 0, this would mean the frequency of rejections converges to the assumed significance level as sample size increases.

3.1. Tests of specification with no variance increase (Δ = 0, θ = 0)

Individual stock returns and residuals from the market model are not normally distributed, and therefore test results might be poor when the number of events studied is small. However, the CLT suggests that the performance of the test statistics should improve as portfolio size increases. Panel A of Table 1 reports that while the Patell test indeed works poorly for small sample sizes, its misspecification actually increases as sample size grows from 100 to 500, and then drops off only slightly after that. It rejects the null hypothesis significantly more often than it should for every portfolio size up to 5000 at significance levels of α = 5% and α = 1%. In particular, when α = 5% the lowest rejection rate associated with the Patell test is 7.04% (N = 100), over 40% more often than expected.
The binomial approximation estimates the standard deviation of the rejection rate to be $\sqrt{10{,}000(0.95)(0.05)}/10{,}000 \approx 0.218\%$. Thus the observed 7.04% rejection rate is about 9.36 standard deviations above the 5% theoretical rejection rate. The Patell test's worst performance occurs with a sample size of 500, when we rejected the null hypothesis 7.75% of the time,6 about 12.62 standard deviations above the expected 5% rate. To put this 7.75% rejection rate in context, using the Patell test is similar to rejecting the null hypothesis for a well-specified test at a significance level of α = 5% not at a t-value of 1.96, but instead at 1.77. The third row of Panel A shows similar results when α = 1%. When N = 100, the rejection rate is over twice its theoretical rate at 2.15% (t = 11.56). It peaks when N = 500 (rejection rate = 2.21%, or 12.16 standard deviations above the 1% theoretical value), then falls to 1.86%, or 8.64 standard deviations above the theoretical value, when N = 5000.

In contrast, for the BMP test reported in Panel B, there is no evidence of misspecification for any sample size or significance level. In all cases, the observed frequency of rejection associated with the BMP test is within a 95% confidence interval based on the binomial approximation.

To test whether outliers are causing the Patell test's misspecification, we winsorized the event-period residuals, $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$, at 0.5% and 99.5% for the entire set of stock returns before forming portfolios or simulating an effect of the event. If outliers are the source of the excessive rejection rates documented in Table 1, then winsorizing should greatly mitigate the problem. Table 2 shows that the problem, while not eliminated, is substantially alleviated. For example, instead of rejecting the null hypothesis that the Patell test is well specified at α = 5% with t-values ranging from 9.36 (when N = 100) to 12.62 (when N = 500) as in Table 1, the rejection rates for the winsorized returns range from only 5.52% (t = 2.39) when N = 5000 to 5.91% (t = 4.18) when N = 1000.
4 This range is a bit wider than, but roughly comparable to, the number of events in several recent event studies. Specifically, we examined four event studies and found the actual sizes ranged from 364 events in Borokhovich et al. (2014) to 1132 in Chen et al. (2014). 5 For example, David and Ginglinger (2016) and Ang and Ismail (2015) specify use of the Patell test, and given it is the default in Eventus, we infer that expressions such as “standard event-study methodology” or “conventional event-study methodology” as in Andres and Hofbaur (2017), Dutordoir et al. (2016), and Chen et al. (2014) indicate use of the Patell test as well. 6 This is roughly consistent with the 7.6% rejection rate found by Boehmer et al., but as discussed earlier, because they used only 250 simulated portfolios, their 7.6% was not sufficiently large to reject the null hypothesis that the test was well specified. Brown and Warner (1985) used only one-tailed tests, so their results are not directly comparable.
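The two test statistics can be written compactly. The sketch below is our Python rendering of one common formulation, not the authors' code: the Patell statistic sums the SARs and scales by the standard deviation of that sum under the t-distribution assumption, while the BMP statistic is an ordinary cross-sectional t-test on the SARs. The 120-day default matches the paper's estimation window; the function names are ours.

```python
import numpy as np

def patell_z(sars, est_days=120):
    """Patell (1976): each SAR is assumed t-distributed with
    (est_days - 2) df, whose variance is (est_days - 2)/(est_days - 4);
    sum the SARs and scale by the s.d. of that sum."""
    n = len(sars)
    var_sar = (est_days - 2) / (est_days - 4)
    return sars.sum() / np.sqrt(n * var_sar)

def bmp_t(sars):
    """Boehmer et al. (1991): a cross-sectional t-test on the SARs,
    so event-induced variance appears in the denominator rather than
    inflating the statistic."""
    n = len(sars)
    return sars.mean() / (sars.std(ddof=1) / np.sqrt(n))
```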
Table 1
Frequency of rejection of null hypothesis SAR = 0 with no simulated abnormal return (Δ = 0) and no event-induced variance (θ = 0).

Panel A: Patell test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           7.04%     7.45%     7.75%     7.70%      7.69%      7.50%
t-statistic (difference from 5%)     9.360     11.241    12.618    12.388     12.343     11.471
|t| > 2.57                           2.15%     2.19%     2.21%     2.16%      2.14%      1.86%
t-statistic (difference from 1%)     11.558    11.960    12.161    11.658     11.457     8.643

Panel B: BMP test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           5.08%     5.07%     5.17%     5.05%      5.12%      4.78%
t-statistic (difference from 5%)     0.367     0.321     0.780     0.229      0.551      −1.009
|t| > 2.57                           1.06%     1.07%     1.02%     0.97%      0.94%      0.90%
t-statistic (difference from 1%)     0.603     0.704     0.201     −0.302     −0.603     −1.005
Rejection rates absent any actual abnormal return or variance increases. Even under these most favorable conditions, the Patell test rejects substantially more often than it should.
We emphasize that we are not suggesting returns be winsorized as a solution to the misspecification problem; instead, we employ winsorizing here (under the condition of simulated abnormal returns) only for statistical diagnostic purposes. The winsorizing process used here would not even be possible in an actual event study, because we cannot directly observe $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$, the residual return absent the event; we can only observe the abnormal return including the event's effect, $[R_{i,E} + \Delta_{i,E}] - [\hat{\alpha}_i + \hat{\beta}_i R_{M,E}]$. Thus in practice any winsorizing would also necessarily alter the effect of the event itself.
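For readers replicating this diagnostic, the clipping step described above might look like the following sketch (a hypothetical helper of ours, not part of the authors' code); it winsorizes the event-period residuals at the 0.5th and 99.5th percentiles before any simulated event effect is added.

```python
import numpy as np

def winsorize(residuals, lower_pct=0.5, upper_pct=99.5):
    """Clip event-period residuals at the given percentiles, computed
    over the entire set of returns before portfolios are formed or an
    event effect is simulated (a diagnostic only; see text)."""
    lo, hi = np.percentile(residuals, [lower_pct, upper_pct])
    return np.clip(residuals, lo, hi)
```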
3.2. Test power with no variance increase (Δ = 0.25%, θ = 0)

Next we examine test power for an abnormal return uniformly equal to 0.25% (Δ = 0.25% and θ = 0). The results are reported in Table 3. At first glance it appears that the Patell test is slightly more powerful than the BMP test; for example, it rejects the null $\frac{73.51\%}{68.69\%} - 1 = 7.02\%$ more frequently when α = 5% and the sample size is 500. To see whether the Patell test is indeed more powerful, or whether its higher rejection rates merely reflect its tendency to reject more frequently as documented in Table 1, we again winsorize the event-period residuals, $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$, at 0.5% and 99.5%. These results are shown in Table 4 and demonstrate that the Patell test's advantage in power falls substantially. For example, when α = 5% and the sample size is 500, winsorizing causes the Patell test to have fewer rejections (72.34% as opposed to 73.51% in the unwinsorized sample). For reasons discussed in Appendix A, the BMP test has more rejections (70.93% compared with 68.69% in the unwinsorized sample). In this winsorized case, the Patell test rejects only $\frac{72.34\%}{70.93\%} - 1 \approx 2\%$ more frequently than the BMP test. This suggests the Patell test's apparently greater test power is largely driven by outliers that cause it to reject more often, whether the null is true or false. This raises the issue of confidence that the null is false given that either test rejects it, which is examined in greater detail in Section 4.
Table 2
Rejection rates of a true null hypothesis (Δ = 0) with no event-induced variance (θ = 0), winsorized values of $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$.

Panel A: Patell test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           5.61%     5.86%     5.90%     5.91%      5.80%      5.52%
t-statistic (difference from 5%)     2.799     3.946     4.129     4.175      3.671      2.386
|t| > 2.57                           1.49%     1.31%     1.41%     1.36%      1.28%      1.09%
t-statistic (difference from 1%)     4.925     3.116     4.121     3.618      2.814      0.905

Panel B: BMP test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           5.26%     5.15%     5.22%     5.05%      5.08%      4.86%
t-statistic (difference from 5%)     1.193     0.688     1.009     0.229      0.367      −0.642
|t| > 2.57                           1.10%     1.08%     1.06%     1.05%      0.94%      0.86%
t-statistic (difference from 1%)     1.005     0.804     0.603     0.503      −0.603     −1.407

Rejection rates when the event-period residuals, $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$, are winsorized before simulating an abnormal return. While the Patell test still rejects more often than it should, the rates are substantially lower than in Table 1, suggesting it is primarily event-period outliers that cause the Patell test to be misspecified.
Table 3
Test power absent event-induced variance (Δ = 0.25%, θ = 0).

Panel A: Patell test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    23.85%    46.21%    73.51%    94.59%     99.97%     100.00%
|t| > 2.57    10.25%    25.86%    51.85%    85.13%     99.93%     100.00%

Panel B: BMP test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    19.84%    41.17%    68.69%    92.64%     99.81%     99.99%
|t| > 2.57    6.72%     19.94%    44.34%    80.55%     99.43%     99.96%
Absent any event-induced variance and given a simulated abnormal performance of 0.25%, the Patell test rejects the null hypothesis more often than does the BMP test. Table 4 suggests the reason is not greater test power, but rather outliers that inflate the Patell test's rejection rates even absent an event.
3.3. Specification and test power with a variance increase (Δ = 0 and Δ = 0.25%; θ = 1)

The results from Table 3, Panels A and B, indicate that even under simple conditions (i.e., when the effect of the event is identical for all firms), the Patell test is misspecified. As noted above, the Patell test implicitly assumes that there is no variance shift, while Harrington and Shrider (2007) show that cross-sectional variation in the event return necessarily leads to a variance increase. We now introduce variance in the simulated effect of the event while continuing to allow the mean effect to assume a value of either Δ = 0 or Δ = 0.25%. Because the results in this section are similar to those of Boehmer et al. (1991) and Harrington and Shrider (2007), we summarize them very briefly.

When the mean effect of the event is zero but there is event-induced variance equal to that of the estimation-period residuals, Table 5 indicates that the Patell test is severely misspecified for all sample sizes considered. For example, using a portfolio size of N = 500 and assuming a significance level of 5%, the Patell test rejects a true null hypothesis of zero abnormal return in 18.49% of the portfolios. To put this 18.49% rejection rate in context, it would be analogous to using 1.33 instead of 1.96 as the critical value for rejecting the null hypothesis in a well-specified test at a significance level of α = 5%. When α = 1%, the misspecification is even worse, with the Patell test rejecting the null hypothesis approximately 8% of the time across sample sizes.

Table 6 reports the results when Δ = 0.25% and θ = 1. It again appears that the Patell test is somewhat more powerful than the BMP test. However, as was the case in Tables 1 and 3, the Patell test seems more powerful only because its misspecification leads it to reject more often, both when the null is true and when it is false. In the next section we explore the tradeoffs between misspecification and test power.

4. Does the misspecification make a substantive practical difference?

How important is the degree of misspecification found in the previous section? Statistically speaking, the Patell test is severely misspecified, but what is the practical consequence of this observation? This is an important question because, by virtue of being the default event-study method in Eventus, the Patell test is used frequently in research. If its misspecification is not consequential, then George Box's observation (Box and Draper (1987), p. 74), "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful?", is applicable. For example, if a test purports to reject a true null 5% of the time but actually rejects it 5.01% of the time, it is very unlikely that this minute degree of misspecification would deter anyone from using it if it is more powerful than alternative tests. The question is not whether a test is misspecified, but rather how misspecified it must be for us to avoid using it.
Table 4
Test power absent event-induced variance (Δ = 0.25%, θ = 0), winsorized values of $R_{i,E} - (\hat{\alpha}_i + \hat{\beta}_i R_{M,E})$.

Panel A: Patell test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    21.86%    43.98%    72.34%    94.58%     99.98%     100.00%
|t| > 2.57    8.74%     23.15%    49.36%    84.61%     99.93%     100.00%

Panel B: BMP test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    20.66%    42.46%    70.93%    93.99%     99.88%     99.99%
|t| > 2.57    7.17%     20.97%    46.91%    83.01%     99.60%     99.99%
When event-period residuals are winsorized before a simulated abnormal performance of 0.25% is added, the Patell test rejects the null only very slightly more often than does the BMP test. This suggests that the Patell test appears to be more powerful not because it is better at detecting abnormal performance, but because outliers cause it to reject more frequently whether the null is true or false.
Table 5
Test specification with event-induced variance (Δ = 0%, θ = 1).

Panel A: Patell test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           18.34%    18.47%    18.49%    18.06%     18.78%     19.34%
t-statistic (difference from 5%)     61.208    61.805    61.896    59.923     63.227     65.796
|t| > 2.57                           7.74%     8.56%     8.26%     8.04%      8.38%      8.26%
t-statistic (difference from 1%)     67.740    75.981    72.966    70.755     74.172     72.966

Panel B: BMP test

                                     N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96                           4.78%     5.57%     5.06%     4.84%      4.98%      4.87%
t-statistic (difference from 5%)     −1.009    2.615     0.275     −0.734     −0.092     −0.596
|t| > 2.57                           1.06%     1.24%     1.01%     1.07%      0.97%      1.04%
t-statistic (difference from 1%)     0.603     2.412     0.101     0.704      −0.302     0.402

When the null hypothesis is true but the event itself creates additional variance, the Patell test is significantly misspecified, while the BMP test rejects at the appropriate levels.

Table 6
Test power with event-induced variance (Δ = 0.25%, θ = 1).

Panel A: Patell test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    31.33%    47.38%    67.99%    88.96%     99.75%     100.00%
|t| > 2.57    18.08%    31.42%    52.00%    79.04%     99.04%     100.00%

Panel B: BMP test

              N = 100   N = 250   N = 500   N = 1000   N = 2500   N = 5000
|t| > 1.96    13.61%    24.08%    44.17%    72.60%     98.06%     99.99%
|t| > 2.57    4.08%     9.62%     22.92%    48.74%     92.11%     99.84%
When the null hypothesis is false and the event creates additional variance, the Patell test rejects more frequently than does the BMP test. Once again, however, this does not imply it is a more powerful test; rather, it simply rejects more frequently whether the null is true or false.
The answer to this question is necessarily subjective and relies in part on the relative costs of Type I and Type II error.7 However, we can clarify one aspect of this issue by answering the question "How confident are we that the null is really false, conditional on our having rejected it?" for any tests being considered. This requires only a straightforward application of Bayes' Theorem:

$p(\text{null false} \mid \text{null rejected}) = \dfrac{p(\text{null rejected} \mid \text{null false})\, p(\text{null false})}{p(\text{null rejected} \mid \text{null false})\, p(\text{null false}) + p(\text{null rejected} \mid \text{null true})\, p(\text{null true})}$
Suppose that 80% of our null hypotheses are in fact true, and assume that when the null is false, Δ = 0.25%. Using a rejection rate of 7.75% for the Patell test when Δ = 0 and θ = 0 (Panel A of Table 1, N = 500) and a rejection rate of 73.51% when Δ = 0.25% and θ = 0 (Panel A of Table 3, N = 500) gives us

Patell's $p(\text{null false} \mid \text{null rejected}) = \dfrac{0.7351(0.2)}{0.7351(0.2) + 0.0775(0.8)} = 70.34\%$
In contrast, given the BMP test's 5.17% false rejection rate and its 68.69% test power, the same statistic is

BMP's $p(\text{null false} \mid \text{null rejected}) = \dfrac{0.6869(0.2)}{0.6869(0.2) + 0.0517(0.8)} = 76.86\%$
Thus, absent any event-induced variance and when N = 500 firms, switching from the Patell test to the BMP test produces a $\frac{0.7686}{0.7034} - 1 = 9.27\%$ increase in the probability the null is truly false given that the test rejects it.8
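The calculation is easy to reproduce. The following Python sketch (our own illustration, using the rejection rates from Tables 1 and 3) implements the Bayes' Theorem comparison; the function name and parameterization are ours.

```python
def p_false_given_rejected(power, size, prior_false=0.2):
    """P(null false | null rejected) via Bayes' Theorem, where `size` is
    the test's rejection rate under a true null and `power` its rejection
    rate under the assumed alternative (Delta = 0.25%)."""
    prior_true = 1 - prior_false
    return power * prior_false / (power * prior_false + size * prior_true)

# N = 500, no event-induced variance (Tables 1 and 3):
print(p_false_given_rejected(power=0.7351, size=0.0775))  # Patell: ~0.7034
print(p_false_given_rejected(power=0.6869, size=0.0517))  # BMP:   ~0.7686
```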
7 For example, if the test is to diagnose cancer, we might well prefer a somewhat misspecified test with high power to a perfectly specified test with low power, especially if the test statistic is a 0–1 variable. See Winkler (1972) for a discussion of such issues. 8 Bayes' Theorem can also be used to illustrate the cost of data mining. Suppose, for example, that a test is well specified at α = 5% and has test power of 50%. Suppose also that when tests are carefully and thoughtfully designed, the null will still be true 80% of the time, but if a researcher adopts an approach of “let's try this and see if we get significant results”, then the null will be true 90% of the time. An application of Bayes' Theorem based on these parameters gives us a 71.4% probability that the null is really false when the test rejects a carefully designed hypothesis, but only a 52.6% probability in the second case.
The BMP test's advantage increases substantially when event-induced variance is present. Using the values from Tables 5 and 6, these two statistics become

Patell's $p(\text{null false} \mid \text{null rejected}) = \dfrac{0.6799(0.2)}{0.6799(0.2) + 0.1849(0.8)} = 47.90\%$
and

BMP's $p(\text{null false} \mid \text{null rejected}) = \dfrac{0.4417(0.2)}{0.4417(0.2) + 0.0506(0.8)} = 68.58\%$
In this case, switching from the Patell test to the BMP test produces a $\frac{0.6858}{0.4790} - 1 = 43.17\%$ increase in the probability the null is truly false given that the test rejects it. Table 7 compares the two methods' probabilities that the null is false given that it is rejected at α = 5% for various proportions of true and false nulls being tested.

5. Conclusions

Researchers who conduct event studies typically rely on the Central Limit Theorem to determine appropriate rejection regions for statistical tests. However, the fact that individual stock returns are characterized by skewness and kurtosis suggests convergence of the mean's sampling distribution to a normal distribution might be substantially slower than generally assumed. Our results suggest that even in the absence of event-induced variance, the Patell test is misspecified for samples of up to 5000 firms. In contrast, the BMP test is well specified and only slightly less powerful. Consistent with previous research, we document that the misspecification of the Patell test increases dramatically in the presence of event-induced variance, while the BMP test remains well specified.

Finally, we quantify the practical impact of this misspecification by documenting substantial differences in the probability the null is truly false given that a test rejects it. By this measure, the degree of confidence a researcher can have that a documented abnormal event return is truly significant is invariably higher when using the BMP test. These findings are important because, as the default test statistic in Eventus, the Patell test is often reported. Fortunately, users of Eventus can just as easily conduct the BMP test by selecting STDCSECT as the desired test statistic. For researchers who do not use Eventus, a Matlab function that conducts event studies of the type studied in this article is available at the URL in footnote 2.

Appendix A. The effect of outliers on t-statistics

This appendix examines the effect one or more outliers will have on a t-test that a sample is drawn from a population with some hypothesized mean value. We begin with the case of a single outlier and then proceed to multiple outliers.9 Any set of observations $\{w_i\}_{i=1}^N$ will have a t-statistic equal to

$t = \dfrac{\bar{w} - \text{hypothesized value under the null}}{s/\sqrt{N-1}} = \dfrac{\bar{w} - \text{hypothesized value under the null}}{\sqrt{\sum_i (w_i - \bar{w})^2 / [N(N-1)]}}$    (A1)
We model a single outlier by assuming the observation $w_N$ is replaced with $w_N + \Delta$, and to avoid confusion we refer to this distribution with an outlier as $\{x_i\}_{i=1}^N$, where $x_i = w_i$ for $i < N$ and $x_i = w_i + \Delta$ for $i = N$. The t-statistic for $\{x_i\}_{i=1}^N$ can be written as

$t\left(\{x_i\}_{i=1}^N\right) = \dfrac{\bar{x} - \text{hypothesized value under the null}}{\sqrt{\left(\sum_{i=1}^N x_i^2 - N\bar{x}^2\right) / [N(N-1)]}}$    (A2)
Expressing this in terms of the original sample $\{w_i\}_{i=1}^N$ and the outlier's parameter Δ gives us

$t\left(\{x_i\}_{i=1}^N\right) = t\left(\Delta; \{w_i\}_{i=1}^N\right) = \dfrac{\bar{w} + \Delta/N - \text{hypothesized value under the null}}{\sqrt{\left[\sum_{i=1}^{N-1} w_i^2 + (w_N + \Delta)^2 - N(\bar{w} + \Delta/N)^2\right] / [N(N-1)]}}$    (A3)
We emphasize that we are not asserting that it is necessarily appropriate to use a standard t-test in the presence of such an outlier; we are merely determining what the t-statistic, as typically calculated, will be in such a case.

9 Note that we are using standard set notation for the observations j and k, which are necessarily different whenever j ≠ k, but not for the values of the jth and kth observations. In other words, $x_1$ and $x_2$ are necessarily different observations and so appear in $\{x_i\}_{i=1}^N$ exactly once each, but they may have the same value; i.e., $x_1$ may be 100, and so might $x_2$.
Table 7
Probability null is false given null is rejected.

Panel A: α = 5%, N = 500, θ = 0, Δ = 0 or 0.25%

Population % of null hypotheses that are false:
          10%       15%       20%       25%       30%       35%       40%
Patell    51.31%    62.60%    70.34%    75.97%    80.26%    83.63%    86.35%
BMP       59.62%    70.10%    76.86%    81.58%    85.06%    87.74%    89.86%

Panel B: α = 5%, N = 500, θ = 1, Δ = 0 or 0.25%

Population % of null hypotheses that are false:
          10%       15%       20%       25%       30%       35%       40%
Patell    29.01%    39.35%    47.90%    55.07%    61.18%    66.44%    71.03%
BMP       49.24%    60.64%    68.58%    74.42%    78.91%    82.46%    85.34%
This table uses Bayes' Theorem to estimate the probability the null is truly false conditional on either test's rejecting it. Panel A is based on the rejection rates from Tables 1 and 3, while Panel B is based on rejection rates from Tables 5 and 6.
We determine the effect of the outlying parameter Δ by observing what happens if we take the limit of Eq. (A3) as Δ → ∞:

$\lim_{\Delta \to \infty} t\left(\Delta; \{x_i\}_{i=1}^N\right) = \dfrac{\Delta/N}{\sqrt{\left[\Delta^2 - N(\Delta/N)^2\right] / [N(N-1)]}} = \dfrac{\Delta/N}{\sqrt{\Delta^2 (N-1)/N / [N(N-1)]}} = \dfrac{\Delta/N}{\Delta/N} = 1$    (A4)
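The limit in Eq. (A4) is easy to verify numerically. The short Python sketch below is our own illustration, not from the paper; it plants a single, increasingly large outlier in a fixed sample and shows the one-sample t-statistic being driven toward 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
w = rng.normal(size=500)              # baseline sample, true mean zero

for delta in (10, 100, 1_000, 10_000):
    x = w.copy()
    x[-1] += delta                    # a single outlier of size delta
    t_stat, _ = stats.ttest_1samp(x, 0.0)
    print(delta, round(t_stat, 3))    # t drifts toward 1 as delta grows
```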
This is well below the standard benchmarks corresponding to α = 1%, α = 5%, or even α = 10%. Thus the common fear that a single extreme outlier (a large value of Δ) would lead to rejections of a true null too frequently is unfounded. Large deviations Δ actually drive the t-statistic towards one, which is much smaller than the common benchmarks for statistical significance. While the preceding discussion has focused on the fear of rejecting true null hypotheses too frequently, nowhere did we actually use the premise that the null hypothesis was true. What we have, then, is a proof that the t-statistic used to test any null hypothesis, true or false, will be driven towards 1 by a positive outlier.10 While the concern of excessive Type I error is unwarranted, it is true that a single outlier will cause a loss of test power, i.e., lead to greater Type II error. On the other hand, the presence of an extreme outlier increases the likelihood that the underlying population is characterized by heavy skewness or fat tails, either of which makes application of the CLT problematic unless the sample size is extremely large.

What if there is more than one outlier? What if, for example, there are j observations that depend on Δ but move with Δ at different rates $k_m$? Now Eq. (A4) becomes

$\lim_{\Delta \to \infty} t\left(\Delta; \{k_m\}_{m=1}^j, \{x_i\}_{i=1}^N\right) = \dfrac{\sum_{m=1}^j k_m \Delta / N}{\sqrt{\left[\sum k_m^2 \Delta^2 - N\left(\sum k_m \Delta / N\right)^2\right] / [N(N-1)]}} = \dfrac{\sum_{m=1}^j k_m \Delta / N}{(\Delta/N)\sqrt{\sum k_m^2 - \sum_{m \neq n} k_m k_n / (N-1)}} = \dfrac{\sum_{m=1}^j k_m}{\sqrt{\sum k_m^2 - \sum_{m \neq n} k_m k_n / (N-1)}}$

If the sample size N is large, then the second term under the radical is negligible and we have

$\lim_{\Delta \to \infty} t \approx \dfrac{\sum_{m=1}^j k_m}{\sqrt{\sum k_m^2}}$    (A5)
This term is maximized when all the values of $k_m$ are equal to each other, in which case four ($t = 4/\sqrt{4} = 2$) is the minimum number of outliers required for the t-statistic to exceed 1.96, and seven ($t = 7/\sqrt{7} \approx 2.65$) is the minimum number required for t to exceed 2.57.
10 Similarly, a single negative outlier will drive the t-statistic towards −1.
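The $\sqrt{j}$ bound implied by Eq. (A5) can be checked the same way. In this hypothetical sketch of ours, j equal outliers drive the t-statistic toward $j/\sqrt{j} = \sqrt{j}$, so four are needed to exceed 1.96 and seven to exceed 2.57.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
w = rng.normal(size=500)              # baseline sample, true mean zero

# Eq. (A5): with j equal-rate outliers, t -> j/sqrt(j) = sqrt(j).
for j in (1, 4, 7):
    x = w.copy()
    x[-j:] += 1e6                     # j identical, huge outliers
    t_stat, _ = stats.ttest_1samp(x, 0.0)
    print(j, round(t_stat, 2))        # approximately 1.0, 2.0, 2.6
```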
References

Andres, C., Hofbaur, U., 2017. Do what you did four quarters ago: trends and implications of quarterly dividends. J. Corp. Finan. 43, 139–158.
Ang, J., Ismail, A., 2015. What premiums do target shareholders expect? Explaining negative returns upon offer announcements. J. Corp. Finan. 30, 245–256.
Ball, R., Brown, P., 1968. An empirical evaluation of accounting income numbers. J. Account. Res. 6 (2), 159–178.
Boehmer, E., Musumeci, J., Poulsen, A., 1991. Event study methodology under conditions of event-induced variance. J. Financ. Econ. 30, 253–272.
Borokhovich, K., Boulton, T., Brunarski, K., Harman, Y., 2014. The incentives of grey directors: evidence from unexpected executive and board chair turnover. J. Corp. Finan. 28, 102–115.
Box, G., Draper, N., 1987. Empirical Model-Building and Response Surfaces. John Wiley & Sons.
Brown, S.J., Warner, J.B., 1985. Using daily stock returns: the case of event studies. J. Financ. Econ. 14, 3–31.
Campbell, J., Lo, A., MacKinlay, A., 1997. The Econometrics of Financial Markets. Princeton University Press.
Chen, G., Kang, J.K., Kim, J.M., Na, H.S., 2014. Sources of value gains in minority equity investments by private equity funds: evidence from block share acquisitions. J. Corp. Finan. 29, 449–474.
Corrado, C., 1989. A nonparametric test for abnormal security price performance. J. Financ. Econ. 23, 385–395.
Cowan, A.R., 2007. Eventus User's Guide: Software Version 8.0, Standard Edition 2.1. Cowan Research L.C.
David, T., Ginglinger, E., 2016. When cutting dividends is not bad news: the case of optional stock dividends. J. Corp. Finan. 40, 174–191.
Dutordoir, M., Li, H., Liu, F.H., Verwijmeren, P., 2016. Convertible bond announcement effects: why is Japan different? J. Corp. Finan. 37, 76–92.
Fama, E., 1965. The behavior of stock-market prices. J. Bus. 38 (1), 34–105.
Fama, E., Fisher, L., Jensen, M., Roll, R., 1969. The adjustment of stock prices to new information. Int. Econ. Rev. 10, 1–21.
Feller, W., 1968. An Introduction to Probability Theory and Its Applications, vol. 1. John Wiley and Sons, New York.
Harrington, S., Shrider, D., 2007. All events induce variance: analyzing abnormal returns when effects vary across firms. J. Financ. Quant. Anal. 42 (1), 229–256.
Harris, L., 1986. Cross-sectional tests of the mixture of distributions hypothesis. J. Financ. Quant. Anal. 21 (1), 39–46.
Kon, S., 1984. Models of stock returns—a comparison. J. Financ. 39 (1), 147–165.
Kothari, S.P., Warner, J., 2007. The econometrics of event studies. In: Eckbo, B.E. (Ed.), Handbook of Corporate Finance, vol. 1. Elsevier.
Mandelbrot, B., 1963. The variation of certain speculative prices. J. Bus. 36 (4), 394–419.
Patell, J., 1976. Corporate forecasts of earnings per share and stock price behavior: empirical tests. J. Account. Res. 14 (2), 246–276.
Winkler, R., 1972. An Introduction to Bayesian Inference and Decision. Holt McDougal.