Journal of Financial Economics 28 (1990) 7-38. North-Holland
Posterior, predictive, and utility-based approaches to testing the arbitrage pricing theory

Robert McCulloch and Peter E. Rossi*

University of Chicago, Chicago, IL 60637, USA

Received August 1989, final version received September 1990
To provide a framework for judging the economic significance of departures from the arbitrage pricing theory, we adopt a utility-based metric based on optimal portfolio choices. This measure is examined using both predictive and posterior analysis. Our predictive analysis shows very large and economically significant departures from the model restrictions. However, the high level of parameter uncertainty suggests that we cannot conclusively either affirm or reject the APT. Our conclusions differ markedly from other studies which employ traditional significance-testing procedures and, in many instances, fail to reject the APT restrictions.
1. Introduction
The introduction of the arbitrage pricing theory by Ross (1976) has inspired a large theoretical literature, extending Ross's analysis, and a large empirical literature, attempting to test various empirical specifications of the theory. We examine a version of the APT and develop new methods for testing the restrictions implied by the theory. Our goal is to provide methods which extract more information from the sample data than the traditional significance-testing approach. In addition, our methods are designed to assess departures from the equilibrium pricing restrictions in terms of direct utility gains rather than mere statistical significance.

In the competitive equilibrium version of the APT advanced by Connor (1984) and Connor and Korajczyk (1986), excess returns on a collection of portfolios or individual assets are related to measured factors in a multivariate regression model without intercepts. In much of the finance literature, a likelihood-ratio approach to testing restrictions in multivariate systems has been employed. Huberman and Kandel (1985) and Gibbons, Ross, and Shanken (1989) derive the likelihood-ratio test statistics for problems similar to the sharp null we entertain. Connor and Korajczyk (1988a) use a modified likelihood-ratio test statistic for the same restriction we consider.

An alternative to the classical significance-testing approach would be to compute the Bayesian posterior odds or posterior probabilities on the null and alternative hypotheses. Shanken (1987) computes odds ratios for the one-factor case. McCulloch and Rossi (1989b) compute posterior odds ratios for the same APT pricing restrictions considered here using a fully informative natural conjugate prior. Harvey and Zhou (1990) extend Jeffreys' Cauchy/diffuse prior to the multivariate setting to test the same restrictions as Shanken (1987). However, without further analysis and interpretation, the Bayesian odds ratio does not shed light on the substantive significance of departures from the null hypothesis. For example, it is possible that the posterior probability of the null hypothesis may be very close to zero (resulting in a small odds ratio) even though the departures from the null hypothesis are of little economic import.

*We are grateful to Nai-fu Chen, Eugene Fama, Wayne Ferson, Kenneth French, John Geweke, Michael Gibbons, Ravi Jagannathan, Bruce Lehmann, Craig MacKinlay, Dale Poirier, Daniel Siegel, Robert Stambaugh, Arnold Zellner, and participants at workshops at the University of Chicago, Duke University, Northwestern University, University of Pennsylvania, University of Toronto, the Research Triangle Econometrics workshop, and the University of Western Ontario for many useful comments. We are especially grateful to Robert Korajczyk and Jay Shanken (the referee) for numerous suggestions for improvement of the paper. Phil Braun constructed the weekly returns database and Dhruv Ratra collected the Treasury bill quotes. Support from the Graduate School of Business, University of Chicago, and the IBM Corporation is gratefully acknowledged.

0304-405X/90/$03.50 © 1990 Elsevier Science Publishers B.V. (North-Holland)
Moreover, without fairly elaborate prior sensitivity analysis of the sort performed by McCulloch and Rossi (1989b), it is difficult to interpret even the statistical content of odds-ratio results.

The practical consequences of departures from the null hypotheses are of utmost importance in the testing of hypotheses. The traditional significance-testing approach attempts to assess the statistical significance of deviations from the null without addressing the issue of practical consequences. Gibbons, Ross, and Shanken (1987) interpret the likelihood-ratio statistic in terms of measures of relative portfolio efficiency in order to evaluate the economic significance of departures from the null hypotheses. Shanken (1987) relates the Bayes factor to measures of relative efficiency as well. To assess the practical consequences of deviations from null restrictions, it is necessary to consider the effects of those deviations on the decisions of investors. In this paper, we explore new utility-based measures of the extent of departures from the null hypothesis.

Both the likelihood-ratio and odds-ratio approaches are designed to investigate the sharp null hypothesis that the APT restrictions hold exactly. In many versions of the APT, the linear pricing restrictions hold only approximately. With a finite collection of assets, one should reject the APT restrictions with probability approaching one in large samples. In addition, the estimated factors can be viewed as only imperfect proxies for the underlying market factors. Shanken (1987b) analyzes the effect of imperfect proxies on classical tests of significance, and Shanken (1987a) considers a composite null in the odds-ratio context.

We use the Bayesian estimation approach, in which the posterior distribution of functions of model parameters is examined to determine how much mass is placed close to the restrictions. The estimation approach is distinguished from the odds-ratio procedures by the fact that the prior in the estimation approach puts zero probability on any exact set of parameter restrictions.

No one testing procedure is sufficient, in and of itself, to address an important hypothesis. We propose three basic methods of assessing departures from the null hypothesis. The first two methods are variants of the estimation approach, while the third is a new approach based on predictive comparisons. In section 4, the posterior distribution of the model parameters is computed and explored by a variety of graphical and analytical methods. We try to determine how much of the mass of the posterior probability distribution is close to the restrictions. In section 5, the posterior distribution of a utility-based metric is introduced to assess the economic significance of departures from the null model. Finally, in section 6, predictive distributions derived from the restricted and unrestricted models are compared on the basis of mean-variance efficient sets. Certainty-equivalent calculations are used to further summarize differences between the mean-variance frontiers associated with the restricted and unrestricted models.

Our predictive analysis shows large and economically significant departures from the model restrictions.
However, the extremely high level of parameter uncertainty suggests that we cannot either conclusively affirm or reject the APT even with long periods of weekly returns data. Our conclusions differ markedly from other studies which employ traditional significance-testing procedures and which conclude, in many instances, that it is not possible to reject the APT restrictions. The failure to reject the APT restrictions using significance-testing procedures is often interpreted as evidence in favor of the model. The utility-based method advanced here provides new measures of the economic significance of departures from model restrictions as well as a useful measure of the quantity of sample information. In addition, we show that there is no size effect, i.e., the size effect reverses and vanishes from period to period, and that the covariance structure of stock returns is dominated by, at most, three pervasive factors.
2. Theoretical and econometric specification of the APT hypothesis

To clarify which one of the many empirical and theoretical variants of the APT is used in this paper, this section reviews the theoretical foundations of the competitive equilibrium version of the APT and develops the statistical specification.
2.1. A review of the APT

The fundamental idea behind the APT model is that payoffs to capital assets follow a factor structure in which a lower-dimensional set of fundamental factors is common to all assets. The original work on the APT [Ross (1976) and Huberman (1982)] uses a no-arbitrage axiom to deduce the relationship between expected returns and the matrix of factor sensitivities (denoted by B). Under conditions of no arbitrage, asset prices (expected returns) are approximately linear in the B matrix. Chamberlain and Rothschild (1983) consider an explicitly infinite-dimensional economy, provide bounds on deviations from linearity, and establish the relationship between mean-variance efficiency and exact factor pricing.

We use Connor's (1984) competitive equilibrium version of the APT. Connor considers an infinite-dimensional economy and computes the competitive equilibrium prices assuming that each investor is risk-averse and equally informed. Exact linear factor pricing holds in this economy under basic assumptions which allow investors to diversify away idiosyncratic risk.

2.2. Econometric specification

The basic APT model assumes that the returns on a vector of assets can be written as

r_t = E[r_t] + B f_t + ε_t,    E[ε_t | f_t] = 0,    E[f_t] = 0,    E[ε_t ε_t'] = V,

where r_t is an N × 1 vector of returns, B is an N × k matrix of factor loadings, and f_t is a k × 1 vector of factors. In order for the competitive equilibrium to hold, some restrictions are required on the sequences of B and V matrices formed from increasing numbers of assets [see Connor and Korajczyk (1986) for details]. The competitive equilibrium version of the APT implies

E[r_t] = r_ft ι + B γ_t,

where r_ft is the risk-free rate of return and ι' = (1, ..., 1). Combining the returns-generating model with the APT pricing equation, we can express excess returns as linear in the B matrix,

r_t − r_ft ι = B(γ_t + f_t) + ε_t.
We note that the risk premia, γ_t, can be time-varying. To write this in standard multivariate regression model notation, define

r̃_t = r_t − r_ft ι,

and form the N × T, k × T, and N × T arrays

R = [r̃_1 ... r̃_T],    F = [(γ_1 + f_1) ... (γ_T + f_T)],    E = [ε_1 ... ε_T].

The model now takes the form

R = B F + E.
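As an illustration of this restricted regression, the following Python sketch (not from the paper; all dimensions and parameter values are invented) simulates a system that satisfies R = BF + E exactly and recovers B by no-intercept multivariate least squares:

```python
import numpy as np

# Illustrative sketch: simulate a 10-asset, 3-factor system with zero
# intercepts and recover B by no-intercept least squares.
rng = np.random.default_rng(0)
N, k, T = 10, 3, 260                         # assets, factors, weekly obs.

B_true = rng.normal(1.0, 0.3, size=(N, k))   # factor loadings
F = rng.normal(0.0, 0.02, size=(k, T))       # factor returns (incl. premia)
E = rng.normal(0.0, 0.01, size=(N, T))       # idiosyncratic noise
R = B_true @ F + E                           # excess returns, N x T

# No-intercept multivariate least squares: B_hat = R F' (F F')^{-1}
B_hat = R @ F.T @ np.linalg.inv(F @ F.T)
print(np.abs(B_hat - B_true).max())          # small when T is large
```

The estimator is just equation-by-equation OLS through the origin, stacked across the N assets.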
If F is known (i.e., the number and composition of the factors and the risk premia are known), then the APT returns model takes the form of a restricted (no-intercept) multivariate regression model.

The state of empirical work on the APT has been summarized by Huberman (1987) as follows: '... results support the APT except when it is tested against the alternative that small firms have higher mean returns than large firms even after differences in the factor loadings are accounted for.' Huberman cites studies which find a small-firm anomaly and others which fail to find the anomaly. Recent studies by Lehmann and Modest (1988) and Connor and Korajczyk (1988a) report evidence of small-firm mispricing.

In principle, the APT pricing restriction could be applied to any group of assets or portfolios. To focus on the small-firm anomaly, and to allow the use of the multivariate normality assumption in the statistical model, we construct ten size-ranked portfolios rather than using individual stocks. Portfolios of over 200 stocks are much more likely to have normally distributed returns because of the central limit theorem effect. For the sample of excess returns on the decile portfolios, a multivariate regression model is postulated,

R = αι' + B F + E.

R is a 10 × T matrix of excess returns on the decile portfolios. F is a k × T matrix of estimated factors extracted from the moment matrix of all listed
stocks. The APT pricing restrictions can be stated as a sharp null hypothesis:

H_0: α = 0    versus    H_1: α ≠ 0.

3. Construction of portfolios and extraction of factors
To test the zero-intercept restriction, we must construct size-based portfolios and extract factors from information on all listed stocks. A variety of sampling intervals are employed in the empirical literature on testing of the APT; Roll and Ross (1980) and Lehmann and Modest (1988) use daily data to estimate the factor loadings and weekly data to test implications of the theory, while Connor and Korajczyk (1988a) use monthly data.

To exercise more control over the strategy used in the formation of the size-based portfolios, we constructed a weekly returns database from the Center for Research in Security Prices (CRSP) daily master files for all NYSE/AMEX stocks. The daily CRSP master file is used to compute simple returns from Wednesday to Wednesday (if Wednesday is not a trading day, Thursday prices are used). The sampling interval of one week represents a compromise between the daily and monthly intervals. We hope that the bid-ask bounce and asynchronous-trading biases will not appreciably affect the weekly returns and that some information which is obscured in the monthly returns will show up in the weekly data. To ensure the most accurate possible measure of market capitalization, we adjusted the CRSP outstanding-shares data using the split data reported by CRSP. Our weekly database includes price, dividends, market capitalization, and simple return for each of 1304 weeks between January 1, 1963 and December 31, 1987 and for all listed NYSE/AMEX stocks.

In order to compute returns in excess of the risk-free rate, weekly returns on Treasury bills are computed from quotations of T-bills in the Wall Street Journal. We use data compiled by Ferson, Kandel, and Stambaugh (1987) and extend the data from 1982 to 1987. The risk-free return is defined as the return on a T-bill maturing on the next Thursday whose price is observed on Wednesday. While this instrument is nominally riskless, there is clearly some inflation risk.

3.1. Portfolio-formation strategy

The strategy for formation of the size-based portfolios must be feasible and involve manageable transactions costs. Every four weeks, firms are sorted by market value and placed into the decile portfolios. Portfolio returns are computed by value-weighting the member firms' returns. Firms whose market value drops to zero automatically drop out of the portfolio. New firms are only allowed to enter the portfolios every four weeks. We ruled out weekly
Table 1
Size-ranked portfolio-composition statistics and unconditional moments for all NYSE and AMEX stocks in the period 1963-87.

Portfolios are formed as a value-weighted average of weekly returns on size-ranked stocks. Every four weeks the composition of each size decile is altered to reflect changes in market value. New listings are allowed into the portfolios only every four weeks. The weekly returns are constructed from the CRSP daily master file prices on all NYSE/AMEX stocks. All returns are computed in excess of the nominally riskless return on Treasury bills and annualized.

Period and decile   Average number   Average market value   Excess return   Standard deviation
                    of firms         per firm ($ millions)  (annualized)    (annualized)

1963-67
  Decile 1            208                3.2                   0.28            0.99
  Decile 5            209               30.0                   0.19            0.83
  Decile 10           210            1,700.0                   0.076           0.61
1968-72
  Decile 1            232                6.0                  -0.0015          1.8
  Decile 5            234               47.0                  -0.016           1.4
  Decile 10           235            2,000.0                   0.032           0.93
1973-77
  Decile 1            256                5.1                   0.035           1.8
  Decile 5            260               38.0                   0.017           1.5
  Decile 10           261            2,000.0                  -0.058           1.3
1978-82
  Decile 1            231                7.4                   0.16            1.3
  Decile 5            235               85.0                   0.10            1.3
  Decile 10           236            2,900.0                   0.019           1.2
1983-87
  Decile 1            223                9.0                   0.011           1.4
  Decile 5            235              110.0                   0.0016          1.3
  Decile 10           236            5,100.0                   0.055           1.2
1963-87
  Decile 1            230                5.2                   0.098           1.5
  Decile 5            233               56.0                   0.053           1.3
  Decile 10           234            2,800.0                   0.025           1.1
adjustment of the portfolio composition, since it would involve weekly portfolio rebalancing and, possibly, a high level of transactions costs.

Table 1 presents information on the composition of the size-based portfolios as well as some summary statistics. The firms in the smallest decile have an average market capitalization of $5 million, while the firms in the largest decile average $2.8 billion. All returns are annualized throughout the paper.

3.2. Extraction of factors

A variety of techniques have been employed to identify and extract factors from the covariance matrix of asset returns. Maximum-likelihood estimation of factor loadings is advocated by Roll and Ross (1980), Chen (1983), and
Lehmann and Modest (1988). Chamberlain and Rothschild (1983) prove that, as the number of assets grows, principal-component analysis becomes equivalent to factor analysis. Connor and Korajczyk (1986) make the important observation that, if only the time series of returns on the factors is required, then the eigenvectors of the T × T matrix R'R will consistently estimate the time series of realized factor returns.

Lehmann and Modest (1988) and Connor and Korajczyk (1988a) use information only from continuously-listed firms to extract factors. Observations from continuously-listed firms make up between 60 and 70 percent of the valid observations in any week during the 5-year sample periods. However, only 473 firms are continuously listed over the 25-year period from 1963-1987. It is possible that the factors which generate the covariance structure for continuously-listed firms are only a subset of the factors that drive all listed firms.

The maximum possible amount of information should be used in extracting the factors so that the extraction error is minimized. Therefore, we adopt a method of factor extraction advocated by Connor and Korajczyk (1988b) which uses all valid observations in forming the T × T cross-product matrix. We use the factors extracted with the Connor and Korajczyk (1988b) procedure in our statistical analysis without accounting for error in extraction of the factors. Since the extraction error declines with the number of firms used, we cannot assume that the factor error is negligible without using a very large number of firms. We employ, on average, over 2100 firms in the extraction process, over 1.5 times the number used by Connor and Korajczyk (1988a) and approximately 3 times the number used by Lehmann and Modest (1988). Specifically, our factors are based on returns from, on average, 2111, 2365, 2612, 2366, 2285, and 2348 firms in the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87, respectively.
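The asymptotic principal-components idea behind this procedure can be sketched as follows. This is a simplified, fully-balanced version with simulated data; all sizes and parameter values are illustrative, and the actual procedure also accommodates missing observations:

```python
import numpy as np

# Sketch of factor extraction from the T x T cross-product matrix:
# as the number of firms M grows, the top eigenvectors of R'R
# consistently estimate the time series of realized factor returns.
rng = np.random.default_rng(2)
M, T, k = 2000, 260, 1                           # firms, weeks, factors
f = rng.normal(0.0, 0.02, size=(k, T))           # true factor returns
beta = rng.normal(1.0, 0.3, size=(M, k))         # firm loadings
R = beta @ f + rng.normal(0.0, 0.02, size=(M, T))  # M x T excess returns

omega = R.T @ R / M                              # T x T cross-product matrix
eigval, eigvec = np.linalg.eigh(omega)           # eigenvalues ascending
f_hat = eigvec[:, -k:].T                         # top-k eigenvectors, k x T

# The estimate recovers f only up to sign and scale, so compare by
# correlation; it approaches 1 as M grows.
corr = np.corrcoef(f_hat[0], f[0])[0, 1]
print(abs(corr))
```

Working with the T × T matrix rather than the M × M covariance matrix is what makes the method practical when thousands of firms are used.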
The simulation studies and analytical results of Connor and Korajczyk (1988a) confirm that factor-extraction error is very small and has almost no effect on the intercept estimates, even in smaller samples than those we consider. McCulloch and Rossi (1989a) perform other checks on the magnitude of the factor-extraction error and also conclude that there is very little error.

The Connor and Korajczyk (1988b) method of factor extraction presupposes that the number of factors is known a priori. The eigenvalues of the return moment matrix provide an informal way to assess the number of factors. It is still possible that a factor pricing model might hold in which the factors account for a tiny fraction of the total return variability. However, such a factor model would be very difficult to detect in the sample data. We computed the largest twenty eigenvalues for each of the subperiods. The eigenvalues drop dramatically from the first to the second eigenvalue. For example, in the 1963-1967 period the top ten eigenvalues are: 0.074, 0.013, 0.011, 0.010, 0.0091, 0.0090, 0.0082, 0.0081, 0.0079, 0.0076. Other than between the first and second eigenvalues, there are no clear breakpoints. On the basis of this informal analysis, we limited attention to models with one, three, and five factors. We note that the first extracted factor has a correlation of at least 0.97 with the equal-weighted market. As we will see below, adding more than three factors does not change the performance of the asset pricing model or change the mean-variance set attainable from the set of factor portfolios.

3.3. Distributional assumptions

In the basic linear factor model introduced above, three key statistical assumptions are used: (1) the residuals from the model are jointly normally distributed, (2) the residuals are serially uncorrelated, and (3) the covariance structure of the residuals is constant over the sample period. Analysis of scatterplots and normal quantile-quantile plots of three-factor model residuals strongly supports the normality assumption.

As we might expect, there are large differences between the autocorrelation estimates for the portfolio return series and the factor-model residuals. The small-decile portfolio returns are highly autocorrelated, averaging 0.40 for the first-order autocorrelation. The autocorrelation declines monotonically from the first through the tenth decile. However, the residuals from a three-factor model exhibit very little autocorrelation even in the smallest decile (all autocorrelations of three-factor model residuals are less than 0.19 in magnitude, with an average magnitude of less than 0.1). Thus, even though the marginal distribution of the decile portfolio returns is fat-tailed and there is a high level of autocorrelation, the conditional distribution of the portfolios given the extracted factors can very reasonably be assumed to be i.i.d. normal.

4. The estimation approach: Direct examination of the posterior distribution
In this section, we examine the posterior distribution of the model parameters to see how much of the mass of the posterior distribution of α is piled close to zero.

4.1. Posterior distribution of the intercepts

As discussed in the introduction, our goal is to test the hypothesis

H_0: α = 0    versus the alternative    H_1: α ≠ 0

in the multivariate regression model, R = αι' + BF + E. One natural way of checking whether the data support this hypothesis is to examine the posterior distribution of the α vector to determine how much of its mass is close to zero. In the Bayesian approach, the posterior distribution of the model parameters is determined by combining the likelihood function for the model with a prior using Bayes' theorem. We use the standard diffuse prior for the multivariate regression model [see Zellner (1971) for details]:

p(α, B, Σ) ∝ |Σ|^−(N+1)/2.
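Sampling from the joint posterior implied by this diffuse prior can be sketched as follows, using the standard multivariate-regression results: Σ is drawn from an inverse Wishart and the coefficient matrix from a matrix normal given Σ. The data here are simulated, all settings are illustrative, and the exact degrees-of-freedom convention is an assumption of this sketch:

```python
import numpy as np
from scipy.stats import invwishart

# Illustrative posterior simulation for R = C X + E with diffuse prior,
# where the first column of C is the intercept vector alpha.
rng = np.random.default_rng(3)
N, k, T = 10, 3, 260
F = rng.normal(0.0, 0.02, size=(k, T))
X = np.vstack([np.ones(T), F])                 # q x T design, q = k + 1
C_true = rng.normal(0.0, 0.1, size=(N, k + 1))
R = C_true @ X + rng.normal(0.0, 0.01, size=(N, T))

q = k + 1
XXi = np.linalg.inv(X @ X.T)
C_hat = R @ X.T @ XXi                          # least-squares coefficients
S = (R - C_hat @ X) @ (R - C_hat @ X).T        # residual cross-product

draws = []
for _ in range(200):
    # Sigma | data ~ inverse Wishart; C | Sigma, data ~ matrix normal
    Sigma = invwishart(df=T - q, scale=S).rvs(random_state=rng)
    Z = rng.normal(size=(N, q))
    C = C_hat + np.linalg.cholesky(Sigma) @ Z @ np.linalg.cholesky(XXi).T
    draws.append(C[:, 0])                      # keep the intercept vector
alpha_draws = np.array(draws)                  # 200 x N posterior draws
print(alpha_draws.mean(axis=0).round(3))
```

Functions of the parameters (such as the certainty-equivalent measure of section 5) can then be evaluated draw by draw to obtain their posterior distributions.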
The marginal posterior distribution of α is determined by integrating out B and Σ⁻¹. The posterior distribution of the intercepts can easily be derived using known results for the multivariate regression model [see Zellner (1971) for details]. The vector of intercepts has a multivariate-t form with T − k + 1 − 10 degrees of freedom.

The marginal distributions of the intercepts corresponding to the 1st (smallest), 5th, and 10th deciles are plotted in fig. 1. We use the boxplot as a graphical device to summarize information in each of the marginal Student-t densities. The center line in the boxplot is the median and the height of the box is the interquartile range. The 'whiskers' extending above and below the box delineate a 99 percent probability interval. The dots represent draws outside the 99 percent interval and serve to indicate the extreme tail behavior of the distributions. The top panel presents the distributions for one- and three-factor models in the 1963-67 period, and the bottom panel shows the distributions for the 1983-87 period. A full set of plots for all five subperiods and all deciles can be found in McCulloch and Rossi (1989a).

The boxplots for the 1963-67 period (top panel) display a small-firm effect in which the mass of the posterior distribution of the intercept for the smallest decile is in the positive range, while more than 75 percent of the mass of the distribution for the largest decile is in the negative range. However, in the 1983-87 period (bottom panel), the size effect is diminished (note the change in scale) and reversed, with the largest decile having an intercept with posterior mass concentrated on positive values. In both periods, some of the deciles have α distributions which are centered far from zero, suggesting that the APT restrictions do not hold.

In all periods, there is a U-shaped pattern of posterior dispersion. The smallest and largest firms have high posterior dispersion while the middle
Fig. 1. Marginal posterior distribution of regression intercepts, 1963-67 (N = 260) and 1983-87 (N = 260). The APT pricing restrictions imply that the intercepts in the multivariate regression model, r = α + Bf + ε, ε ~ MVN(0, Σ), are zero. Posterior distributions massed tightly around zero would be evidence consistent with the APT pricing restrictions. Boxplots display the marginal posterior distribution of the intercepts from the multivariate regression model. Each element of the α vector has a posterior distribution in the Student-t form. A diffuse prior is used. The center line on the boxplot is the median; the height of the box is the interquartile range. The 'whiskers' extending out from the box represent a 99 percent probability interval and the dots show tail observations. Panel A shows the distribution of the intercepts for the 1963-67 subperiod and decile 1, 5, and 10 portfolios, panel B for the 1983-87 period. The left side of each panel shows intercepts from a one-factor model and the right side from the three-factor model for decile 1, 5, and 10 portfolios. All returns are annualized excess returns over the nominally riskless Treasury bill rate.
deciles have much lower dispersion. This indicates that the amount of sample information about the small- and large-firm portfolios is less than the information pertaining to the middle-sized portfolios. By examining the posterior distribution of the intercepts, we are able to gain insights into the nature of the violation of APT restrictions. Mispricing is not confined to the smallest firms but also seems to be present in the large-firm portfolios. In fact, the ‘small-firm’ effect is illusory - present in
some periods, reversed in others, and absent from one period [this is the 1973-77 period; see McCulloch and Rossi (1989a) for details]. While exploration of the posterior distribution of α is very informative, it would be useful to summarize the posterior distribution with a scalar-valued measure which has economic significance.

5. Estimation approach: Posterior distribution of certainty equivalents
To develop an economic measure of how close α is to zero, we consider an investor with an explicit utility function. Given the model parameters (α, B, Σ), the investor can calculate the joint distribution of the returns vector and choose a utility-maximizing portfolio. We compare this maximum utility with that obtained when the parameters equal (0, B, Σ). The utility levels are compared by using the certainty-equivalent rate of return. Thus, we adopt a view consistent with the theoretical assumptions underlying the APT, i.e., that all economic agents actually know the parameters of the returns process (it is only the econometrician viewing the data who must make inferences regarding the parameters of the returns model). We examine the posterior distribution of the differences in certainty equivalents in order to gauge both the size of the departures from the theoretical restrictions and the level of parameter uncertainty.

5.1. The utility function

We choose a negative exponential utility function and compute expected utility assuming that end-of-period wealth is normally distributed. Some might argue that a power utility function would be more reasonable. Unfortunately, expected power utility does not always exist when the support of the distribution of wealth includes zero. Since returns are normally distributed, end-of-period wealth is also normally distributed and expected utility does not exist. Recognizing that wealth cannot be negative, we might have employed truncated normal distributions, but we decided to opt for the mathematical convenience of the exponential utility function.

The negative exponential utility function over end-of-period wealth, x, takes the form u(x) = −exp(c_a x), with c_a negative. If x is normally distributed, expected utility is

EU(μ_x, σ_x) = −exp(c_a μ_x + c_a² σ_x²/2).

The investor's problem is to maximize expected utility by choosing optimal portfolio weights. The set of assets used in forming portfolios consists of the risk-free asset and the ten decile portfolios. We do not consider other
portfolios or assets because we wish to focus attention on the mispricing of the set of size-based portfolios.

Let r denote the 10 × 1 random vector of excess returns on the set of risky assets available to the investor, and let r_p denote the excess return on a portfolio formed by taking a linear combination of the r vector. The investor's end-of-period wealth is x = (1 + r_f + r_p)W, where r_f is the risk-free rate, r_p is the random portfolio return with E(r_p) = μ and var(r_p) = σ², and W is initial wealth. Expected utility can then be expressed in terms of the portfolio mean and standard deviation as

EU(μ, σ) = −exp(c_a W(1 + r_f + μ) + c_a² W² σ²/2).

Expected utility is maximized subject to the constraint that the mean and standard deviation of the portfolio with excess return r_p lie on the efficient set formed by the line passing through the origin and tangent to the mean-variance efficient set formed by portfolios of the risky assets alone. This problem can easily be solved by first calculating the slope of this tangent line and then maximizing utility along the line. If the excess return vector r has moments E(r) = γ and cov(r) = V, then the (μ, σ) pairs on the efficient frontier composed from risky assets satisfy

σ² = (a − 2bμ + cμ²)/d,

where

a = γ'V⁻¹γ,    b = γ'V⁻¹ι,    c = ι'V⁻¹ι,    d = ac − b².

ι is the one vector [see Roll (1977) for a summary of efficient-set mathematics]. The mean excess return of the tangency portfolio, denoted μ_t, is given by a/b. The variance of the tangency portfolio, σ_t², is (a − 2bμ_t + cμ_t²)/d. Therefore, the slope of the tangency line is μ_t/σ_t (denoted S). The expected utility problem of the investor now becomes

max_(μ,σ) EU(μ, σ) = −exp(c_a W(1 + r_f + μ) + c_a² W² σ²/2),

subject to μ = σS. The solution to this problem, (μ*, σ*), is given by μ* = Sσ* and σ* = −S/(c_a W). The certainty-equivalent rate of return, ce, is the certain rate of return to which the investor with the optimal portfolio is indifferent, i.e., EU(ce, 0) = EU(μ*, σ*):

ce = r_f + μ* + c_a W σ*²/2.
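The closed-form solution above can be illustrated numerically. In the Python sketch below, the moments γ and V for three hypothetical risky assets and the remaining parameter settings are invented purely for illustration:

```python
import numpy as np

# Numerical sketch of the efficient-set constants, tangency portfolio,
# optimal exposure, and certainty equivalent. All values are made up.
gamma = np.array([0.08, 0.06, 0.10])          # mean excess returns
V = np.array([[0.04, 0.01, 0.02],
              [0.01, 0.03, 0.01],
              [0.02, 0.01, 0.06]])            # return covariance matrix
rf, W, ca = 0.05, 50_000.0, -8.0 / 50_000.0   # risk-free rate, wealth, c_a

iota = np.ones(3)
Vi = np.linalg.inv(V)
a = gamma @ Vi @ gamma                        # efficient-set constants
b = gamma @ Vi @ iota
c = iota @ Vi @ iota
d = a * c - b ** 2
mu_t = a / b                                  # tangency mean excess return
sig_t = np.sqrt((a - 2 * b * mu_t + c * mu_t ** 2) / d)
S = mu_t / sig_t                              # slope of the tangency line

sig_star = -S / (ca * W)                      # optimal risk exposure
mu_star = S * sig_star
ce = rf + mu_star + ca * W * sig_star ** 2 / 2
print(round(ce, 4))
```

With c_a W = −8 the solution collapses to ce = r_f + S²/16, which provides a convenient internal check on the algebra.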
To calculate certainty equivalents, we must choose reasonable values for the initial wealth W and c,. W is set to $50,000 for the 1978-82 period. For other periods we adjust the wealth figure using the GNP deflator from the Citibase tape, obtaining $23,033, $26,538, $34,335, and $73,317 for the periods 1963-67, 1968-72, 1973-77, and 1983-87, respectively. c, is set equal to - 2/W, - 8/W, and - 15/W to capture a range of risk-aversion behavior. The meaning of the choice c, can be illustrated by calculating the p such that the investor with utility function U(n) = - exp(c,x) is indifferent between the lottery which gives him $60,000 with probability p and $40,000 with probability (1 -p) and the lottery which gives him $50,000 for sure. The value of p which solves pU(60000) + (1 -p)lJ(40000) = U(50000) is 0.95 for c, = - 15/W, 0.83 for c, = -8/W, and 0.6 for c, = -2/W. The probabilities 0.95, 0.83, and 0.6 are associated with a reasonable range of relative risk-aversion values. The choice c, = - 8/W makes the relative risk aversion 8 at W which is consistent with cross-sectional studies of asset holdings reported in Friend and Blume (1977). It should be noted that the static portfolio problem considered here is very different from the dynamic consumption-investment problem. Values of the relative risk-aversion parameter which may be reasonable for the static problem are not necessarily consistent with reasonable solutions to the dynamic problem. 5.2. Return moments and risk-free rate With exponential utility and the assumption of normality, we need only the mean and variance-covariance matrix of the excess return vector along with the risk-free rate to completely specify the investor’s problem. Our model specifies that the distribution of the excess return vector r given the factor vector f is rla,B,Z-MVN(a+Bf,Z). If E(f by
If E(f) = η and var(f) = Φ, then the unconditional moments of r are given by

μ = E(r) = α + Bη,
V = var(r) = Σ + BΦB′.
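The indifference probabilities used above to calibrate c_p (0.95, 0.83, and 0.6) can be checked directly. This is a minimal sketch, not code from the paper; the function name is ours.

```python
import math

def indifference_prob(c_p, low=40_000, high=60_000, sure=50_000):
    # Solve p*U(high) + (1 - p)*U(low) = U(sure) for p,
    # with exponential utility U(x) = -exp(c_p * x).
    U = lambda x: -math.exp(c_p * x)
    return (U(sure) - U(low)) / (U(high) - U(low))

W = 50_000.0
probs = {k: indifference_prob(-k / W) for k in (15, 8, 2)}
# rounds to 0.95, 0.83, and 0.60, matching the text
```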
The sample moments of the estimated factors for the relevant period are used for η and Φ. In general the choice of the unconditioning moments η and Φ can affect the resulting maximized utility. We have found, however, that our results are not affected by changing these values. Certainly if the restriction α = 0 holds, our measure should be small for any reasonable unconditioning strategy.
The risk-free rate used in the utility maximization problems is the sample average over the period under study.

5.3. The certainty-equivalent measure

Given the parameters (α, B, Σ), the factor moments (η, Φ), the risk-free rate r_f, and the utility parameters (c_p, W), we can then compute the certainty equivalents with and without α in the model. The certainty equivalent, which is denoted by ce(α, B, Σ), is a measure of the utility derived from the optimal portfolio given α, B, and Σ. We write the certainty equivalent as ce(α, B, Σ) to emphasize the dependence of the certainty equivalent on the parameters of the conditional distribution of r given f. Our measure of how far α is away from zero is then

ce(α, B, Σ) − ce(0, B, Σ).
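As a sketch of how this difference might be computed (our own illustration with made-up numbers, not the paper's code): with exponential utility and normal returns the optimal portfolio lies on the line through the risk-free rate with slope s, the maximal Sharpe ratio, so ce reduces to r_f − s²/(2 c_p W).

```python
import numpy as np

def ce(alpha, B, Sigma, eta, Phi, rf, c_p, W):
    # Certainty equivalent ce(alpha, B, Sigma) for an exponential-utility
    # investor, using unconditional moments mu = alpha + B eta and
    # V = Sigma + B Phi B' and the maximal Sharpe ratio s.
    mu = alpha + B @ eta
    V = Sigma + B @ Phi @ B.T
    s = np.sqrt(mu @ np.linalg.solve(V, mu))  # maximal Sharpe ratio
    sigma_star = -s / (c_p * W)               # optimal risk (c_p < 0)
    mu_star = s * sigma_star
    return rf + mu_star + c_p * W * sigma_star**2 / 2

# hypothetical two-asset, one-factor example (all numbers illustrative)
alpha = np.array([0.002, -0.001])
B = np.array([[1.0], [0.8]])
Sigma = np.diag([0.0004, 0.0003])
eta, Phi = np.array([0.001]), np.array([[0.0002]])
rf, c_p, W = 0.001, -8.0 / 50_000, 50_000.0

diff = ce(alpha, B, Sigma, eta, Phi, rf, c_p, W) \
     - ce(np.zeros(2), B, Sigma, eta, Phi, rf, c_p, W)
```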
We annualize the difference in certainty equivalents. For example, if the difference is 0.001 percent, then we could conclude that for all practical purposes the restriction α = 0 holds. On the other hand, if the difference is 10 percent, then clearly imposing the restriction is an action of serious consequence.

In the previous section, we considered the marginal posterior distribution of α given the data, model, and noninformative prior. In this section, we compute the marginal posterior distribution of the certainty-equivalent difference measure. The certainty-equivalent measure is a nonlinear function of the model parameters, (α, B, Σ). We simulate the posterior distribution of the difference in certainty equivalents by drawing repeatedly from the joint posterior distribution of (α, B, Σ). We follow the strategy first outlined by Geweke (1988) for drawing from the joint posterior.

With the general diffuse prior, it is possible to draw a value of (α, B, Σ) for which there is no solution to the utility-maximization problem. If the mean excess return of the global minimum-variance portfolio (b/c) is less than zero, then there does not exist a solution to the utility-maximization problem. To insure the solution exists, we modify the diffuse prior to limit its support to the region of the parameter space for which b/c > 0,

p(α, B, Σ) ∝ |Σ|^−(N+1)/2 I(α, B, Σ; η, Φ),

where

I(α, B, Σ; η, Φ) = 1 if b/c > 0,
                 = 0 otherwise.
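The indicator can be implemented by checking the sign of b/c, the expected excess return of the global minimum-variance portfolio, with b = μ′V⁻¹ι and c = ι′V⁻¹ι for ι a vector of ones. A hypothetical sketch (names and structure are ours):

```python
import numpy as np

def gmv_mean(mu, V):
    # b/c: expected excess return of the global minimum-variance portfolio,
    # with b = mu' V^{-1} iota and c = iota' V^{-1} iota.
    iota = np.ones_like(mu)
    w = np.linalg.solve(V, iota)
    return (mu @ w) / (iota @ w)

def indicator(alpha, B, Sigma, eta, Phi):
    # I(alpha, B, Sigma; eta, Phi) = 1 if b/c > 0, else 0.
    mu = alpha + B @ eta
    V = Sigma + B @ Phi @ B.T
    return 1 if gmv_mean(mu, V) > 0 else 0

# In the posterior simulation, a draw of (alpha, B, Sigma) from the
# diffuse-prior posterior is simply discarded when the indicator is 0.
```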
This prior depends on the data only through the sample moments of f (η, the vector of means, and Φ, the covariance matrix). Since the regression model
conditions on F, this does not present any problems for implementing a Bayesian posterior analysis. We can implement this prior in the simulation strategy simply by calculating b/c for each draw from the posterior of (α, B, Σ) based on the diffuse prior and throwing out the draw if b/c ≤ 0. We note that only in the period 1973-77 are an appreciable number of draws thrown out.

5.4. Interpreting the posterior distribution of the difference in ce's

In the standard parameterization of linear statistical models, the location and scale are independent of each other. However, the difference in certainty equivalents is a nonlinear function of the fundamental regression-model parameters, α, B, and Σ. A consequence of this nonlinearity is that the location and scale of the posterior distribution are linked. In the case of the certainty-equivalent measure, an increase in the dispersion of the distribution will be accompanied by a corresponding shift in the median. Most analyses of the distribution of nonlinear functions of random variables are conducted by resorting to asymptotic normal approximations in which the location and scale are independent. Since we provide a method for computing the exact posterior distribution, some care must be exercised in interpreting the posterior distribution.

In order to illustrate the behavior of the posterior distribution of the certainty-equivalent measure, four illustrative sample data sets, comprised of a matrix of portfolio returns (R) and corresponding factor values (F), are constructed. The four cases are: (i) a small amount of sample information from a population model in which α ≠ 0, (ii) a small amount of sample information from a population model in which α = 0, (iii) a large amount of sample information from a model with α ≠ 0, and (iv) a large amount of sample information from a model with α = 0. The simulations are calibrated with data from the 1963-67 sample period with one factor.
We use the maximum-likelihood estimates α̂, B̂, and Σ̂ and the realized independent-variable values to generate simulated data for each of the four cases. In the model, R = αι′ + BF + E, we set α = α̂ or 0, B = B̂, and draw an error vector from the MVN(0, Σ̂) distribution. For the cases in which α ≠ 0, the vector of estimates, α̂, was used. The cases with a 'small' amount of sample information correspond to five years of weekly simulated data (T = 260). The cases with a large amount of sample information were produced by duplicating the F values for the 1963-67 period 15 times, which corresponds to 75 years of weekly data (T = 3900). Using the generated data for each of the four cases, we compute the posterior distribution of the difference in certainty equivalents.

[Fig. 2 appears here: four panels — small N (260), α ≠ 0; small N (260), α = 0; large N (3900), α ≠ 0; large N (3900), α = 0.]

Fig. 2. Bayesian assessment of APT pricing restrictions, four illustrative examples.

Simulated data is used to explore the effect of the quantity of sample information on a utility-based measure of the departures from APT pricing restrictions. The measure of violation of the model restrictions is defined by the utility-maximizing portfolio-choice problem, solved for factor models with and without the APT pricing restrictions. The utility levels for the restricted and unrestricted solutions are compared by computing the certainty-equivalent return. To assess the quantity of sample information available to discriminate between the restricted and unrestricted models, the posterior distribution of the differences is computed by simulation methods. Data is simulated for four basic examples: 1) small sample, APT restrictions do not hold (α ≠ 0, N = 260 weeks), 2) small sample, APT restrictions hold (α = 0), 3) large sample, APT restrictions do not hold (N = 3,900 weeks or 75 years), and 4) large sample, APT restrictions hold. We can only conclude that the sample information supports the APT in the case in which the posterior distribution of differences in certainty equivalents is massed close to zero. If the posterior distribution is highly diffuse, then we must conclude that the sample information is insufficient to form any inference about the APT restrictions. The certainty-equivalent measure, ce(α, B, Σ), is associated with the parameters (α, B, Σ) from the multivariate regression model for the vector of excess returns on the portfolios under investigation, r = α + Bf + ε, ε ~ MVN(0, Σ). Certainty equivalents are calculated from the negative exponential utility function, U(x) = −exp(c_p x), with ce = r_f + μ* + c_p W σ*² / 2, where (μ*, σ*) solve the utility maximization over all sets of risky assets consisting of the size-based portfolios with return vector r, and the risk-free asset. E(r) = α + Bη, var(r) = Σ + BΦB′. Φ is the covariance matrix of the extracted factors, f. 250 draws are made from the posterior distribution of (α, B, Σ) using the method of Geweke (1988). A diffuse prior with support on the parameter values which insure that the global minimum-variance portfolio has expected return above the risk-free rate is used.

Fig. 2 presents the posterior distributions for the four illustrative cases. For the case of a small amount of sample information, the posterior distributions with α = 0 and α ≠ 0 are very similar and exhibit very high dispersion. The posterior distribution in both cases puts mass on values of the certainty-equivalent differences very close to zero as well as values of over 40 percent per year. It is important to realize that with a high degree of parameter uncertainty (T = 260 with 65 or more parameters) the certainty-equivalent measure displays a high level of dispersion and a large median value due to the nonlinearity. In the case of large T, the posterior distribution for α ≠ 0 is tight around a median value of over 10 percent per year. For α = 0 and T large, the posterior distribution puts most of its mass very close to zero. It is important to note that even with large T a dependence between the median and dispersion of the posterior distribution is still found. The case of T large and α = 0 has a posterior distribution which is roughly symmetric but not centered over zero. The difference in certainty equivalents is an example of a random variable with unbounded support (the difference can be negative and frequently takes on negative values in the simulations) and a roughly symmetric distribution whose location and scale are positively linked.

To summarize, a simple inspection of the posterior distribution of the difference in certainty equivalents provides information about both the economic significance of departures from the model and a measure of the quantity of sample information relevant to assessing these departures. If the posterior distribution is tight around zero, we can conclude that the data support the APT restrictions. If the posterior distribution is tight around an economically meaningful nonzero value, then we may conclude that the evidence is against the APT. If the posterior distribution is very diffuse, then the only way to interpret this is that the sample evidence is inconclusive.

5.6. The results
Fig. 3 presents boxplots of the simulated posterior distribution for c_p = −8/W. There are 12 boxplots, one pair for each of the six periods. The left boxplot in each pair corresponds to the one-factor model and the right to the three-factor model. For example, the first two boxplots are based on draws from the posterior distribution of the certainty-equivalent measure using the data from the 1963-67 five-year period with one and then three factors. The results are based on 500 draws. All of the return values have been annualized for ease of interpretation. The boxplots indicate clearly: (i) For the five-year periods there is considerable uncertainty; the 99 percent probability intervals cover the range of certainty-equivalent differences between 0 and over 50 percent. This enormous range indicates that the sample data is not very informative. (ii) Adding factors to the one-factor model does not shift the mass of the posterior distribution much toward zero. (iii) The certainty-equivalent difference based on the entire sample, 1963-87, has a much tighter posterior and has mass closer to 0.
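Medians and equal-tailed probability intervals of the kind reported below in tables 2-4 can be read directly off the simulated draws with quantiles: a (1 − γ) interval puts γ/2 probability mass in each tail. A minimal sketch (function name and draws are ours, not the paper's):

```python
import numpy as np

def summarize(draws, levels=(0.75, 0.90)):
    # Median and equal-tailed Bayesian probability intervals
    # computed from posterior draws of the ce difference.
    draws = np.asarray(draws, dtype=float)
    out = {"median": float(np.median(draws))}
    for lv in levels:
        tail = (1.0 - lv) / 2.0
        lo, hi = np.quantile(draws, [tail, 1.0 - tail])
        out[f"{int(lv * 100)}%"] = (float(lo), float(hi))
    return out
```

Because the posterior of the difference is right-skewed, the resulting intervals are not symmetric about the median, exactly as noted in the table footnotes.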
[Fig. 3 appears here: boxplots for the one- and three-factor models in each of the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87.]

Fig. 3. Bayesian assessment of APT pricing restrictions, posterior distribution of differences in certainty equivalents.
The utility-maximizing portfolio-choice problem is solved for factor models with and without the APT pricing restrictions. The utility levels for the restricted and unrestricted solutions are compared by computing the certainty-equivalent return. To assess the quantity of sample information available to discriminate between the restricted and unrestricted models, the posterior distribution of the differences is computed by simulation methods. This figure presents boxplots of the posterior distribution of the difference in certainty equivalents. The center line on the boxplot is the median; the height of the box is the interquartile range. If the posterior distribution of the difference is tightly massed around zero, we may conclude that the evidence in the data favors the APT restrictions. However, if the posterior distribution is highly diffuse (high variance regardless of the location), then we must conclude that the data are not informative with respect to economically meaningful departures from the APT hypothesis. The certainty-equivalent measure, ce(α, B, Σ), is associated with the parameters (α, B, Σ) from the multivariate regression model for the vector of excess returns on the portfolios under investigation, r = α + Bf + ε, ε ~ MVN(0, Σ). Certainty equivalents are calculated from the negative exponential utility function, U(x) = −exp(c_p x), with ce = r_f + μ* + c_p W σ*² / 2, where (μ*, σ*) solve the utility maximization over all sets of risky assets consisting of the size-based portfolios with return vector r, and the risk-free asset. E(r) = α + Bη, var(r) = Σ + BΦB′. Φ is the covariance matrix of the extracted factors, f. 250 draws are made from the posterior distribution of (α, B, Σ) using the method of Geweke (1988). A diffuse prior with support on the parameter values which insure that the global minimum-variance portfolio has expected return above the risk-free rate is used.
Table 2
Bayesian assessment of APT pricing restrictions, utility-curvature parameter c_p = −8.0/W,ᵃ for all NYSE and AMEX stocks in the period 1963-87.

The utility-maximizing portfolio-choice problem is solved for factor models with and without the APT pricing restrictions. The utility levels for the restricted and unrestricted solutions are compared by computing the certainty-equivalent return. To assess the quantity of sample information available to discriminate between the restricted and unrestricted models, the posterior distribution of the differences is computed by simulation methods. If the posterior distribution of the difference is tightly massed around zero, we may conclude that the evidence in the data favors the APT restrictions. However, if the posterior distribution is highly diffuse (high variance regardless of the location), then we must conclude that the data are not informative with respect to economically meaningful departures from the APT hypothesis. This table reports the posterior distribution of the differences in certainty equivalents, ce(α, B, Σ) − ce(0, B, Σ). The certainty-equivalent measure, ce(α, B, Σ), is associated with the parameters (α, B, Σ) from the multivariate regression model for the vector of excess returns on the portfolios under investigation, r = α + Bf + ε, ε ~ MVN(0, Σ). Certainty equivalents are calculated from the negative exponential utility function, U(x) = −exp(c_p x), with ce = r_f + μ* + c_p W σ*² / 2, where (μ*, σ*) solve the utility maximization over all sets of risky assets consisting of the size-based portfolios with return vector r, and the risk-free asset. E(r) = α + Bη, var(r) = Σ + BΦB′. Φ is the covariance matrix of the extracted factors, f. 250 draws are made from the posterior distribution of (α, B, Σ) using the method of Geweke (1988). A diffuse prior with support on the parameter values which insure that the global minimum-variance portfolio has expected return above the risk-free rate is used.
Period and factorsᵇ    Median    75% probability intervalᶜ    90% probability intervalᶜ
1963-67
  1 factor             0.20      (0.11, 0.31)                 (0.082, 0.38)
  3 factors            0.17      (0.098, 0.28)                (0.063, 0.31)
  5 factors            0.18      (0.095, 0.27)                (0.063, 0.31)
1968-72
  1 factor             0.27      (0.18, 0.43)                 (0.15, 0.51)
  3 factors            0.23      (0.12, 0.37)                 (0.091, 0.41)
  5 factors            0.24      (0.14, 0.38)                 (0.093, 0.44)
1973-77
  1 factor             0.13      (0.079, 0.23)                (0.057, 0.28)
  3 factors            0.11      (0.051, 0.20)                (0.042, 0.23)
  5 factors            0.12      (0.054, 0.19)                (0.041, 0.21)
1978-82
  1 factor             0.23      (0.12, 0.37)                 (0.085, 0.44)
  3 factors            0.22      (0.11, 0.32)                 (0.086, 0.37)
  5 factors            0.18      (0.10, 0.31)                 (0.077, 0.36)
1983-87
  1 factor             0.20      (0.13, 0.30)                 (0.085, 0.38)
  3 factors            0.17      (0.096, 0.31)                (0.063, 0.34)
  5 factors            0.18      (0.090, 0.28)                (0.062, 0.32)
1963-87
  1 factor             0.040     (0.018, 0.065)               (0.014, 0.076)
  3 factors            0.037     (0.019, 0.061)               (0.012, 0.070)
  5 factors            0.039     (0.021, 0.064)               (0.016, 0.072)
ᵃThe level of wealth of the representative investor, W, is $23,003, $26,538, $34,335, $50,000, $73,319, and $50,000 for the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87.
ᵇFactors are extracted using the principal-components method of Connor and Korajczyk (1986).
ᶜA (1 − α)% Bayesian probability interval is that interval which puts α/2 probability mass in the tails of the posterior distribution. Note that these intervals are not symmetric about the median due to the skewness in the posterior distribution.
Table 3
Bayesian assessment of APT pricing restrictions, utility-curvature parameter c_p = −2.0/W,ᵃ for all NYSE and AMEX stocks in the period 1963-87.

The utility-maximizing portfolio-choice problem is solved for factor models with and without the APT pricing restrictions. The utility levels for the restricted and unrestricted solutions are compared by computing the certainty-equivalent return. To assess the quantity of sample information available to discriminate between the restricted and unrestricted models, the posterior distribution of the differences is computed by simulation methods. If the posterior distribution of the difference is tightly massed around zero, we may conclude that the evidence in the data favors the APT restrictions. However, if the posterior distribution is highly diffuse (high variance regardless of the location), then we must conclude that the data are not informative with respect to economically meaningful departures from the APT hypothesis. This table reports the posterior distribution of the differences in certainty equivalents, ce(α, B, Σ) − ce(0, B, Σ). The certainty-equivalent measure, ce(α, B, Σ), is associated with the parameters (α, B, Σ) from the multivariate regression model for the vector of excess returns on the portfolios under investigation, r = α + Bf + ε, ε ~ MVN(0, Σ). Certainty equivalents are calculated from the negative exponential utility function, U(x) = −exp(c_p x), with ce = r_f + μ* + c_p W σ*² / 2, where (μ*, σ*) solve the utility maximization over all sets of risky assets consisting of the size-based portfolios with return vector r, and the risk-free asset. E(r) = α + Bη, var(r) = Σ + BΦB′. Φ is the covariance matrix of the extracted factors, f. 250 draws are made from the posterior distribution of (α, B, Σ) using the method of Geweke (1988). A diffuse prior with support on the parameter values which insure that the global minimum-variance portfolio has expected return above the risk-free rate is used.
Period and factorsᵇ    Median    75% probability intervalᶜ    90% probability intervalᶜ
1963-67
  1 factor             0.84      (0.40, 1.3)                  (0.21, 1.6)
  3 factors            0.75      (0.43, 1.1)                  (0.31, 1.3)
  5 factors            0.68      (0.36, 1.1)                  (0.23, 1.3)
1968-72
  1 factor             1.2       (0.76, 1.8)                  (0.60, 2.0)
  3 factors            0.94      (0.51, 1.5)                  (0.36, 1.8)
  5 factors            0.95      (0.47, 1.4)                  (0.39, 1.7)
1973-77
  1 factor             0.54      (0.27, 0.86)                 (0.20, 1.0)
  3 factors            0.48      (0.25, 0.79)                 (0.18, 0.84)
  5 factors            0.48      (0.20, 0.78)                 (0.14, 0.88)
1978-82
  1 factor             0.90      (0.50, 1.3)                  (0.37, 1.5)
  3 factors            0.78      (0.50, 1.2)                  (0.38, 1.4)
  5 factors            0.65      (0.39, 1.1)                  (0.28, 1.4)
1983-87
  1 factor             0.81      (0.50, 1.3)                  (0.35, 1.5)
  3 factors            0.72      (0.35, 1.1)                  (0.26, 1.3)
  5 factors            0.67      (0.34, 1.1)                  (0.25, 1.3)
1963-87
  1 factor             0.16      (0.088, 0.27)                (0.056, 0.32)
  3 factors            0.15      (0.071, 0.25)                (0.054, 0.29)
  5 factors            0.14      (0.073, 0.24)                (0.045, 0.30)
ᵃThe level of wealth of the representative investor, W, is $23,003, $26,538, $34,335, $50,000, $73,319, and $50,000 for the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87.
ᵇFactors are extracted using the principal-components method of Connor and Korajczyk (1986).
ᶜA (1 − α)% Bayesian probability interval is that interval which puts α/2 probability mass in the tails of the posterior distribution. Note that these intervals are not symmetric about the median due to the skewness in the posterior distribution.
Table 4
Bayesian assessment of APT pricing restrictions, utility-curvature parameter c_p = −15.0/W,ᵃ for all NYSE and AMEX stocks in the period 1963-87.

The utility-maximizing portfolio-choice problem is solved for factor models with and without the APT pricing restrictions. The utility levels for the restricted and unrestricted solutions are compared by computing the certainty-equivalent return. To assess the quantity of sample information available to discriminate between the restricted and unrestricted models, the posterior distribution of the differences is computed by simulation methods. If the posterior distribution of the difference is tightly massed around zero, we may conclude that the evidence in the data favors the APT restrictions. However, if the posterior distribution is highly diffuse (high variance regardless of the location), then we must conclude that the data are not informative with respect to economically meaningful departures from the APT hypothesis. This table reports the posterior distribution of the differences in certainty equivalents, ce(α, B, Σ) − ce(0, B, Σ). The certainty-equivalent measure, ce(α, B, Σ), is associated with the parameters (α, B, Σ) from the multivariate regression model for the vector of excess returns on the portfolios under investigation, r = α + Bf + ε, ε ~ MVN(0, Σ). Certainty equivalents are calculated from the negative exponential utility function, U(x) = −exp(c_p x), with ce = r_f + μ* + c_p W σ*² / 2, where (μ*, σ*) solve the utility maximization over all sets of risky assets consisting of the size-based portfolios with return vector r, and the risk-free asset. E(r) = α + Bη, var(r) = Σ + BΦB′. Φ is the covariance matrix of the extracted factors, f. 250 draws are made from the posterior distribution of (α, B, Σ) using the method of Geweke (1988). A diffuse prior with support on the parameter values which insure that the global minimum-variance portfolio has expected return above the risk-free rate is used.
Period and factorsᵇ    Median    75% probability intervalᶜ    90% probability intervalᶜ
1963-67
  1 factor             0.11      (0.060, 0.17)                (0.040, 0.23)
  3 factors            0.091     (0.055, 0.16)                (0.040, 0.18)
  5 factors            0.096     (0.050, 0.15)                (0.034, 0.18)
1968-72
  1 factor             0.16      (0.098, 0.24)                (0.074, 0.28)
  3 factors            0.13      (0.066, 0.19)                (0.038, 0.22)
  5 factors            0.13      (0.065, 0.19)                (0.045, 0.22)
1973-77
  1 factor             0.072     (0.034, 0.13)                (0.025, 0.17)
  3 factors            0.061     (0.033, 0.10)                (0.026, 0.14)
  5 factors            0.058     (0.030, 0.10)                (0.023, 0.11)
1978-82
  1 factor             0.12      (0.07, 0.18)                 (0.057, 0.22)
  3 factors            0.11      (0.061, 0.16)                (0.046, 0.20)
  5 factors            0.10      (0.053, 0.16)                (0.034, 0.19)
1983-87
  1 factor             0.11      (0.063, 0.15)                (0.043, 0.18)
  3 factors            0.089     (0.043, 0.15)                (0.032, 0.17)
  5 factors            0.084     (0.046, 0.15)                (0.029, 0.18)
1963-87
  1 factor             0.021     (0.011, 0.035)               (0.0071, 0.040)
  3 factors            0.021     (0.0095, 0.031)              (0.0067, 0.039)
  5 factors            0.020     (0.011, 0.036)               (0.0067, 0.042)
ᵃThe level of wealth of the representative investor, W, is $23,003, $26,538, $34,335, $50,000, $73,319, and $50,000 for the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87.
ᵇFactors are extracted using the principal-components method of Connor and Korajczyk (1986).
ᶜA (1 − α)% Bayesian probability interval is that interval which puts α/2 probability mass in the tails of the posterior distribution. Note that these intervals are not symmetric about the median due to the skewness in the posterior distribution.
Table 2 presents posterior probability intervals for the c_p = −8/W case. For example, in the period 1963-67 for the one-factor model, the certainty-equivalent difference is between 0.11 and 0.31 with probability 0.75 and between 0.082 and 0.38 with probability 0.9. Tables 3 and 4 display the differences in certainty equivalents for c_p = −2/W and −15/W, respectively. The effect of changing the risk-aversion parameter is simply to change the overall level of the differences in certainty equivalents without changing the patterns of these differences. For low levels of risk aversion, the difference in certainty equivalents is magnified since the investors choose very risky optimal portfolio positions.

Examination of the posterior distributions reveals that there is a high degree of parameter uncertainty. This paucity of sample information makes it difficult to either affirm or reject the APT restrictions even with long periods of weekly data. We can conclude that there is some evidence against the APT in the 1968-72 period, but that the evidence in all other periods is neither in favor of nor against the model restrictions. Furthermore, addition of more than one factor does not alter these conclusions in favor of or against the APT.

5.7. Comments on alternative approaches

In classical tests of APT restrictions [cf. Connor and Korajczyk (1988a) and Lehmann and Modest (1988)], the likelihood-ratio tests fail in many instances to reject the null hypothesis of APT restrictions. This, of course, does not mean that the sample evidence favors the APT restrictions; it could be that there is insufficient sample information. An analysis of the power function is required to assess the quantity of sample information. In this case, the power function is easily computable; the problem is that the power against likely alternatives must be assessed. Some evaluate the power function using the unrestricted MLEs and use this as a gauge of power in the meaningful range.
Analysis of the power function does not explicitly address the problem of economic significance. An alternative to the likelihood-ratio test is to construct confidence intervals for the relevant model parameters. The confidence interval gives a measure of the quantity of sample information. Shanken (1990) emphasizes the advantages of reporting confidence intervals for various asset-pricing parameters rather than simply reporting p-values for tests of significance. In our case, it would be difficult to construct confidence intervals for the difference in certainty equivalents without resorting to asymptotic methods or a parametric bootstrap approach. There is no natural pivot for the distribution of certainty-equivalent differences. In the Bayesian odds-ratio approach to testing APT restrictions, it is hard to interpret the results of odds-ratio calculations without a careful prior
sensitivity analysis. As pointed out by McCulloch and Rossi (1989b), the prior sensitivity analysis provides a measure of the extent of parameter uncertainty.

6. Testing restrictions by predictive comparisons
In the previous section, we compared the distribution of the vector of returns r given (α, B, Σ) and given (0, B, Σ), using a certainty-equivalent measure to characterize the restricted and unrestricted returns distributions. This procedure is designed to determine how the model restrictions affect the optimal portfolio choice of investors who know the true parameters. It is usually assumed in derivations of the APT restrictions that all investors know the true model parameters. This does not rule out the possibility that a group of investors who control a very small fraction of total wealth have less than perfect knowledge of the model parameters. In this section, we explore how the investor with imperfect knowledge of the model parameters would be affected by imposing the APT pricing restrictions. The rational investor would compute the predictive distribution of portfolio returns and base his optimal portfolio choice on this predictive distribution. We compute these predictive distributions and compare the decisions of an investor who is uncertain about α with an investor who is willing to assume that α = 0. If α is indeed zero and we have enough data, then the actions of the two investors will be the same because the data will convince the first investor to act as if α = 0.

6.1. Bayesian predictive distributions

Solution of the full decision-theoretic approach to portfolio choice requires that we average over the possible values for the parameters, weighting the possible values according to the posterior distribution. The decision-maker combines his theoretical knowledge of the form of the model and the APT restrictions with the information in the data about the model parameters to make portfolio choices. We are attempting to assess the effect of imposing the model restrictions on the optimal portfolio choices of the Bayesian decision-maker. Let p(α, B, Σ | D) be the posterior obtained from the data D, the unrestricted model, and the diffuse prior.
The Bayesian predictive distribution for the vector of excess returns from the unrestricted model, r, is then

p(r | f, D) = ∫ p(r | f, α, B, Σ) p(α, B, Σ | D) dα dB dΣ.

The predictive distribution for the vector of excess returns from the restricted
model, r₀, is then

p(r₀ | f, D) = ∫ p(r₀ | f, B, Σ) p(B, Σ | D) dB dΣ,

where p(B, Σ | D) is the posterior under the restriction α = 0.
After integrating out the parameters, we then must uncondition on f to obtain the unconditional predictive distribution of excess returns. We present the results for a general regression model R = B*F* + E. The specific results are then obtained by adding an intercept column to F* and a row of intercept parameters to B*. Let R be an N × T matrix of T observations of N-vectors and F* be a j × T matrix of T observations of j regressors (one of which could be the intercept term). Using results from Geisser (1965) and Zellner (1971), we can show that the predictive density of r = B*f* + ε is a multivariate Student-t distribution with T − j + 1 − N degrees of freedom, mean vector E(r | f) = B̂*f*, and covariance matrix

cov(r | f) = (1 + q(f)) S / (T − j − 1 − N),   where q(f) = f′(F*F*′)⁻¹f.

For example, if α is in our model with k factors, then the number of degrees of freedom for r is T − (k + 1) + 1 − N. In the restricted model with α = 0, we have T − k + 1 − N degrees of freedom.

Because the random vectors r and r₀ are high-dimensional, it is not obvious how to compare their distributions. A natural method of comparison in this context is to compute the mean-variance efficient frontiers based on the predictive distributions. Two distributions of excess return vectors are considered to be similar if the corresponding efficient frontiers have the same practical implications. Parameter uncertainty is incorporated directly into the frontiers based on the predictive distribution of excess returns. As pointed out by Bawa, Brown, and Klein (1979), frontiers based on estimated parameters do not account for the uncertainty in the parameter estimation. We note also that only the first two moments of the predictive distribution are required for calculation of the efficient frontiers.

To calculate the unconditional moments of the predictive distributions, we let E(f*) = η and cov(f*) = Φ as introduced in section 5. Then

E(r) = E[E(r | f)] = E(B̂*f*) = B̂*η

and

cov(r) = E[cov(r | f)] + cov(E(r | f)) = (1 + E(q(f))) S / (T − j − 1 − N) + B̂*ΦB̂*′.

Recall that the B* matrix includes the intercept vector, which may be set to zero. Finally, note that

E(q(f)) = trace((F*F*′)⁻¹Φ) + η′(F*F*′)⁻¹η.

Using these unconditional moments, we can construct efficient frontiers using the standard efficient-set mathematics.
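The unconditional predictive moments above can be sketched in code. This is our own illustration, not the paper's program; B̂* is the least-squares coefficient matrix and S the residual cross-product matrix.

```python
import numpy as np

def predictive_moments(R, Fstar, eta, Phi):
    # Unconditional mean and covariance of the Student-t predictive
    # distribution for r = B* f* + e.  R is N x T, Fstar is j x T.
    N, T = R.shape
    j = Fstar.shape[0]
    G = Fstar @ Fstar.T                      # F* F*'
    Ginv = np.linalg.inv(G)
    Bhat = R @ Fstar.T @ Ginv                # least-squares coefficients B-hat*
    resid = R - Bhat @ Fstar
    S = resid @ resid.T                      # residual cross-product matrix
    Eq = np.trace(Ginv @ Phi) + eta @ Ginv @ eta     # E(q(f))
    mean = Bhat @ eta                        # E(r) = B-hat* eta
    cov = (1.0 + Eq) * S / (T - j - 1 - N) + Bhat @ Phi @ Bhat.T
    return mean, cov
```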
[Fig. 4 appears here: panels A and B, expected return per year against standard deviation per year.]

Fig. 4. Efficient frontiers based on the predictive distribution for all NYSE and AMEX stocks in the period 1963-67 (N = 260).
Efficient frontiers based on the unconditional distribution of the vector of portfolio returns calculated from the multivariate regression model, r = α + Bf + ε, ε ~ MVN(0, Σ). The frontier marked with the boxes corresponds to the unrestricted model, the other frontier to the restricted (α = 0) model. If the APT restrictions hold and there is a large quantity of sample information, the restricted and unrestricted frontiers should overlap. Panel A shows frontiers based on a one-factor model and panel B gives the frontiers from a three-factor model. All returns are annualized excess returns over the nominally riskless Treasury bill rate.
6.2. Efficient frontiers
Fig. 4 presents the efficient frontiers based on the predictive distributions for the five-year period 1963-67 and fig. 5 presents these results for the whole sample period, 1963-87. The left-hand panels of figs. 4 and 5 present the restricted and unrestricted frontiers for the one-factor model and the right-hand panels give the frontiers corresponding to a three-factor model. In both panels, the frontiers marked with boxes are derived from the unconstrained model. All of the frontiers are plotted on the same scale and are expressed in terms of annualized returns. To keep the graphs from becoming too cluttered, we omit the line through the origin tangent to the efficient set, which represents the attainable set of mean-standard deviation combinations when the risk-free asset is included.

In all cases there appear to be large differences between the constrained and unconstrained frontiers, indicating that the data do not support the hypothesis α = 0. Adding factors to the one-factor model does not appreciably alter the efficient frontiers except for the period 1968-72 [see McCulloch and Rossi (1989a) for details]. The frontiers from the five-factor model are virtually indistinguishable from those of the three-factor model in all periods.
Fig. 5. Efficient frontiers based on the predictive distribution for NYSE and AMEX stocks in the period 1963-87 (N = 1304). Efficient frontiers based on the unconditional distribution of the vector of portfolio returns calculated from the multivariate regression model, r = α + Bf + ε, ε ~ MVN(0, Σ). The frontier marked with boxes corresponds to the unrestricted model, the other frontier to the restricted (α = 0) model. If the APT restrictions hold and there is a large quantity of sample information, the restricted and unrestricted frontiers should overlap. Panel A shows frontiers based on a one-factor model and panel B gives the frontiers from a three-factor model. All returns are annualized excess returns over the nominally riskless Treasury bill rate.
The shape of the frontiers changes dramatically from period to period, supporting the contention of many analysts that the model is not stationary over the entire 25-year period.

6.3. Certainty-equivalent comparison of frontiers

While the plots of the previous section provide a great deal of information, it would be useful to have a summary measure of the difference between frontiers. We can use the certainty equivalents developed in section 5 to assess the discrepancies between frontiers. Recall that, given an excess-return vector r with mean E(r) = γ and var(r) = V, we can compute the optimal expected-utility-maximizing portfolio and the associated certainty equivalent. The formulas for γ and V in both the unrestricted and restricted cases are given in the section on Bayesian predictive distributions above. As in section 5, W is wealth and c is the parameter of the exponential utility function.

Table 5 gives the differences in certainty equivalents between the frontiers corresponding to the unrestricted and restricted Bayesian predictive distributions. The three cases c = -2/W, c = -8/W, and c = -15/W are considered for one-, three-, and five-factor models. For example, in
Table 5

Predictive assessment of the APT pricing restrictions for all NYSE and AMEX stocks in the period 1963-87.

Bayesian predictive distributions of the returns on size-ranked portfolios for restricted and unrestricted factor models are compared by using certainty-equivalent measures. If the data are supportive of the APT restrictions, then the restricted and unrestricted predictive distributions will produce similar utility-maximization solutions and the difference in certainty equivalents will be small. Bayesian predictive distributions take into account uncertainty in the estimation of the mean and variance of returns. The unconditional Bayesian predictive distribution of the vector of excess returns on the size-based portfolios is calculated from the multivariate regression model, r = α + Bf + ε, ε ~ MVN(0, Σ), where r is the vector of excess returns on ten size-ranked portfolios observed weekly. A Jeffreys/Geisser diffuse prior is used. The certainty equivalents are based on the negative exponential utility function, U(x) = -exp(cx).
                                  Certainty-equivalent difference
Period and factors^a       c = -2/W^b     c = -8/W^b     c = -15/W^b

1963-67
  1 factor                    0.37           0.09           0.049
  3 factors                   0.31           0.078          0.041
  5 factors                   0.31           0.079          0.042
1968-72
  1 factor                    0.72           0.18           0.096
  3 factors                   0.56           0.14           0.075
  5 factors                   0.56           0.14           0.075
1973-77
  1 factor                    -^c            -^c            -^c
  3 factors                   -^c            -^c            -^c
  5 factors                   -^c            -^c            -^c
1978-82
  1 factor                    0.49           0.12           0.065
  3 factors                   0.44           0.11           0.059
  5 factors                   0.36           0.090          0.048
1983-87
  1 factor                    0.37           0.091          0.049
  3 factors                   0.27           0.067          0.036
  5 factors                   0.27           0.066          0.035
1963-87
  1 factor                    0.080          0.020          0.011
  3 factors                   0.079          0.020          0.011
  5 factors                   0.077          0.019          0.010

^a Factors are extracted using the principal components method of Connor and Korajczyk (1986).
^b W is $23,003, $26,538, $34,335, $50,000, $73,319, and $50,000 for the periods 1963-67, 1968-72, 1973-77, 1978-82, 1983-87, and 1963-87, respectively.
^c The expected return on the global minimum-variance portfolio is below the risk-free rate. We are unable to compute certainty equivalents here.
period 1963-67 with five factors and c = -8/W, the difference in certainty equivalents is 7.9 percent per year.

It is also of interest to compare the frontier pictures and the corresponding certainty equivalents for the two periods 1963-67 and 1963-87. The 1963-67 frontiers seem to be much closer together than the frontiers for 1963-87, yet the certainty equivalents tell exactly the opposite story. The reason is that the relatively favorable risk-return tradeoff in 1963-67 causes the optimal portfolio tangency point to have a very high mean, which would fall outside the plot frame where the frontiers have diverged. This is reflected in the very high certainty equivalents. Conversely, the 1963-87 frontiers offer a less favorable risk-return tradeoff than in the 1963-67 period; the optimal portfolio has a much lower expected return, at a point where the frontiers are closer together.

This example illustrates the important point that the results of the test are inevitably dependent on the measure used to assess the difference between the value of α indicated by the data and the hypothesized value. The displayed frontiers allow an informal assessment of the difference. We may be able to conclude that the frontiers are so similar or so far apart that the same conclusion will be reached for a variety of utility choices, but, in general, different conclusions will follow from different utility choices.

There is an important sense in which the predictive comparisons in this section should be regarded as point estimates of what the data convey, while the posterior-estimation approach characterizes the precision of sample information. If p(r|α, B, Σ) is the unconditional distribution of excess returns given the model parameters, the Bayesian predictive distribution is computed by averaging this distribution with respect to the posterior distribution of model parameters,
p(r|D) = E[p(r|α, B, Σ)].
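This averaging can be approximated by Monte Carlo: draw parameters from the posterior and average the conditional densities. A schematic sketch follows; the "posterior draws" below are random placeholders rather than draws from the paper's actual posterior, and `mvn_pdf` is our own small helper:

```python
import numpy as np

rng = np.random.default_rng(1)

def mvn_pdf(x, mean, cov):
    """Multivariate normal density (small helper, no SciPy needed)."""
    k = len(mean)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    logdet = np.linalg.slogdet(cov)[1]
    return np.exp(-0.5 * (k * np.log(2 * np.pi) + logdet + quad))

# Stand-in "posterior draws" of (alpha, B, Sigma); in the paper these
# would come from the posterior under the Jeffreys/Geisser prior.
draws = [(rng.normal(0.0, 0.01, 2),              # alpha
          rng.normal(1.0, 0.1, (2, 1)),          # B
          np.diag(rng.uniform(0.01, 0.02, 2)))   # Sigma
         for _ in range(500)]

f = np.array([0.05])          # conditioning factor value (hypothetical)
r = np.array([0.04, 0.06])    # evaluation point (hypothetical)

# p(r | D) ~ average of p(r | alpha, B, Sigma) over posterior draws
p_pred = np.mean([mvn_pdf(r, a + B @ f, S) for a, B, S in draws])
```

The same draws could be reused to average any other conditional quantity, which is how posterior and predictive summaries share one simulation.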
Since parameter uncertainty is incorporated directly into the predictive distribution, it is difficult to study its separate influence on the certainty-equivalent calculations. In addition, the diffuse prior employed forces the predictive distribution to be centered at the least-squares fitted or predicted value; parameter uncertainty is reflected only in the higher moments of the predictive distribution. We conclude, on the basis of the predictive frontiers, that the evidence in the data is unfavorable to the APT pricing restrictions. In light of the high parameter uncertainty discussed in section 5, however, we should not regard the predictive evidence as strong evidence against the APT.
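As a concrete illustration of the certainty-equivalent calculation used throughout this section: for U(x) = -exp(cx) with c < 0 and normal excess returns, expected utility is maximized at dollar positions w* = -(1/c)V^(-1)γ, and the resulting certainty equivalent is -γ'V^(-1)γ/(2c). A minimal sketch, with hypothetical moment inputs rather than the paper's predictive moments:

```python
import numpy as np

def ce_optimal(mu, V, c):
    # For U(x) = -exp(c x), c < 0, and normal excess returns with mean mu
    # and covariance V, the optimal dollar positions are w* = -(1/c) V^-1 mu
    # and the certainty equivalent of the optimal portfolio is
    # -mu' V^-1 mu / (2c).
    return -(mu @ np.linalg.solve(V, mu)) / (2.0 * c)

# Hypothetical unrestricted vs. restricted (alpha = 0) predictive moments:
mu_u = np.array([0.06, 0.09])
mu_r = np.array([0.05, 0.08])
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])
W = 50_000.0                   # a wealth level like those in table 5

# Difference in certainty equivalents at c = -8/W:
ce_diff = ce_optimal(mu_u, V, -8.0 / W) - ce_optimal(mu_r, V, -8.0 / W)
```

Because the optimal certainty equivalent scales as 1/|c|, the less risk-averse cases produce larger certainty-equivalent differences, which is the pattern visible across the columns of table 5.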
Furthermore, we conclude that adding more than one factor to the model does not reduce the observed mispricing.

7. Conclusions

This paper develops new methods for determining the nature and extent of departures from a version of the arbitrage pricing theory. Our methods provide a simple way of assessing the strength of the evidence in the data in support of the pricing restrictions. Perhaps most importantly, we provide methods for assessing the economic, as opposed to statistical, significance of the departures from the model restrictions.

7.1. Methodological conclusions

If a hypothesis is to be considered with care, the investigator must consult more than just one scalar summary of the data. We have provided both an estimation and a predictive approach to evaluating a hypothesis. The estimation approach is most useful in gauging the level of parameter uncertainty, while the predictive approach fits more naturally into the decision-theoretic problem faced by economic agents.

As is widely known, tests of hypotheses cannot be conducted on statistical grounds alone. An assessment of the substantive, i.e., economic, implications of hypotheses is crucial to any successful investigation of the data. In the case of the arbitrage pricing theory, we have shown how to make assessments of the degree to which the data support the theory using simple utility-based methods. In addition, our methods allow us to gauge easily the quantity of sample information relevant to the asset-pricing hypothesis.

While we have confined our discussion in the paper to the Connor (1984) equilibrium version of the APT, it is clear that the methods outlined here have much broader applicability. As Shanken (1982, 1985) has noted, the assumption in the Connor (1984) model that the market portfolio is well-diversified or spanned by the factors is the same condition behind tests of the multi-beta capital-asset pricing model.
Thus, we can use the methods outlined here to test the restrictions implied by the CAPM and other variants of the APT. In a general sense, the idea of using a utility-based metric to compare restricted and unrestricted models can be applied to virtually any asset-pricing theory.

7.2. Substantive conclusions

The estimation approach used in section 5 shows that there are deciles with intercepts different from zero with high posterior probability, and that large as well as small deciles have nonzero intercepts. By looking at features
of the joint distribution of α, we obtain information not available in any one-dimensional summary.

In order to provide an economic basis for judging the significance of departures from model assumptions, we adopt a utility-based metric. The optimal portfolio choices made by an investor with and without the intercept restriction are compared by calculating certainty equivalents. The posterior distribution of the difference between the certainty equivalents from the restricted and unrestricted models is simulated. The dispersion of this posterior distribution provides a direct measure of the effect of parameter uncertainty on the assessment of the pricing-model restrictions. Simulation of the posterior distribution of the difference in certainty equivalents confirms that there is considerable parameter uncertainty.

The predictive comparisons of section 6 are an effective way to gauge the practical importance of model restrictions. The basic idea of comparing predictive distributions with and without model restrictions can be used in any context. In this particular application, the use of efficient frontiers and utility comparisons provides a natural means of comparison which makes the method particularly fruitful.

Examination of the variation in predictive frontiers and certainty equivalents from subperiod to subperiod can shed new light on the nature of the nonstationarity in the returns distribution. The shifts in predictive frontiers suggest that there is considerable time variation in the mean and covariance structure of the portfolio returns. Moreover, this time variation has economically important consequences.

In summary, the predictive analysis shows very large and economically significant departures from the model restrictions. However, the extremely high level of parameter uncertainty suggests that we cannot conclusively either affirm or reject the APT even with long periods of weekly returns data.
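The posterior-simulation step described above can be sketched schematically. In the sketch below the "posterior draws" are placeholders for draws from the actual posterior of the model parameters, and all numbers are hypothetical; the certainty-equivalent formula is the standard exponential-utility result:

```python
import numpy as np

rng = np.random.default_rng(2)
W = 50_000.0
c = -8.0 / W

def ce_opt(mu, V, c):
    # Certainty equivalent of the optimal exponential-utility portfolio:
    # -mu' V^-1 mu / (2c) for U(x) = -exp(c x), c < 0.
    return -(mu @ np.linalg.solve(V, mu)) / (2.0 * c)

# Placeholder "posterior draws" of the unrestricted and restricted
# moments; in the paper these come from the posterior of (alpha, B, Sigma).
ce_diffs = np.empty(1000)
for i in range(1000):
    mu_u = rng.normal([0.06, 0.09], 0.02)    # unrestricted mean draw
    mu_r = rng.normal([0.05, 0.08], 0.02)    # restricted (alpha = 0) mean draw
    V = np.diag(rng.uniform(0.03, 0.05, size=2))
    ce_diffs[i] = ce_opt(mu_u, V, c) - ce_opt(mu_r, V, c)

# The dispersion of ce_diffs is the direct measure of parameter
# uncertainty described in the text.
spread = ce_diffs.std()
```

A histogram or quantile summary of ce_diffs then shows at a glance whether the certainty-equivalent difference is pinned down or swamped by parameter uncertainty.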
Furthermore, the addition of more than one factor does not affect our measures of the economic significance of departures from the APT hypothesis. Our conclusions differ markedly from other studies which employ traditional significance-testing procedures and which conclude, in many instances, that it is not possible to reject the APT restrictions. The utility-based method advanced here provides new measures of the economic significance of departures from model restrictions as well as a useful measure of the quantity of sample information.

References

Bawa, Vijay, Stephen Brown, and Roger Klein, 1978, Estimation risk and optimal portfolio choice (North-Holland, Amsterdam).
Chamberlain, Gary and Michael Rothschild, 1983, Arbitrage, factor structure, and mean-variance analysis on large asset markets, Econometrica 51, 1281-1304.
Chen, Nai-fu, 1983, Some empirical tests of the theory of arbitrage pricing, Journal of Finance 38, 1393-1414.
Connor, Greg, 1984, A unified beta theory, Journal of Economic Theory 34, 13-31.
Connor, Greg and Robert Korajczyk, 1986, Performance measurement with the arbitrage pricing theory: A new framework for analysis, Journal of Financial Economics 15, 373-394.
Connor, Greg and Robert Korajczyk, 1988a, Risk and return in an equilibrium APT: Application of a new test methodology, Journal of Financial Economics 21, 255-289.
Connor, Greg and Robert Korajczyk, 1988b, Estimating economic factors with missing observations, Working paper (Kellogg School, Northwestern University, Evanston, IL).
Ferson, Wayne, Shmuel Kandel, and Robert Stambaugh, 1987, Tests of asset pricing with time-varying expected risk premiums and market betas, Journal of Finance 42, 201-220.
Friend, Irwin and Michael Blume, 1977, The demand for risky assets, in: Irwin Friend and John Bicksler, eds., Risk and return in finance, Vol. I (Ballinger, Cambridge, MA) 101-141.
Geisser, Seymour, 1965, Bayesian estimation in multivariate analysis, Annals of Mathematical Statistics 36, 150-159.
Geweke, John, 1988, Antithetic acceleration of Monte Carlo integration in Bayesian inference, Journal of Econometrics 38, 73-91.
Gibbons, Michael, Stephen Ross, and Jay Shanken, 1989, A test of the efficiency of a given portfolio, Econometrica 57, 1121-1152.
Harvey, Campbell and Guofu Zhou, 1990, Bayesian inference in asset pricing tests, Journal of Financial Economics 26, 221-254.
Huberman, Gur, 1982, A simple approach to arbitrage pricing, Journal of Economic Theory 28, 183-191.
Huberman, Gur, 1987, A review of the APT, in: John Eatwell, Murray Milgate, and Peter Newman, eds., The New Palgrave: A dictionary of economics (Macmillan Press, London) 106-110.
Huberman, Gur and Shmuel Kandel, 1985, Likelihood ratio tests of asset pricing and mutual fund separation, Working paper (University of Chicago, Chicago, IL).
Lehmann, Bruce and David Modest, 1988, The empirical foundations of the arbitrage pricing theory, Journal of Financial Economics 21, 213-254.
McCulloch, Robert and Peter E. Rossi, 1989a, Posterior, predictive, and utility-based approaches to testing the arbitrage pricing theory, Working paper 269 (Center for Research in Securities Prices, University of Chicago, Chicago, IL).
McCulloch, Robert and Peter E. Rossi, 1989b, A Bayesian approach to testing the arbitrage pricing theory, Journal of Econometrics 49, 141-168.
Roll, Richard, 1977, A critique of the asset pricing theory's tests - Part 1: On past and potential testability of the theory, Journal of Financial Economics 4, 129-176.
Roll, Richard and Stephen Ross, 1980, An empirical investigation of the arbitrage pricing theory, Journal of Finance 35, 1073-1103.
Ross, Stephen, 1976, The arbitrage theory of capital asset pricing, Journal of Economic Theory 13, 341-360.
Shanken, Jay, 1982, The arbitrage pricing theory: Is it testable?, Journal of Finance 37, 1129-1140.
Shanken, Jay, 1985, Multi-beta CAPM or equilibrium APT?: A reply, Journal of Finance 40, 1189-1196.
Shanken, Jay, 1987a, A Bayesian approach to testing portfolio efficiency, Journal of Financial Economics 19, 217-244.
Shanken, Jay, 1987b, Multivariate proxies and asset pricing relations: Living with the Roll critique, Journal of Financial Economics 18, 91-110.
Shanken, Jay, 1990, Intertemporal asset pricing: An empirical investigation, Journal of Econometrics 45, 99-121.
Zellner, Arnold, 1971, An introduction to Bayesian inference in econometrics (Wiley, New York, NY).