Bayesian portfolio selection using a multifactor model


International Journal of Forecasting 25 (2009) 550–566 www.elsevier.com/locate/ijforecast

Tomohiro Ando

Graduate School of Business Administration, Keio University, 2-1-1 Hiyoshi-Honcho, Kohoku-ku, 223-8523 Yokohama-shi, Kanagawa, Japan

Abstract

This article develops a new portfolio selection method using Bayesian theory. The proposed method accounts for the uncertainties in estimation parameters and the model specification itself, both of which are ignored by the standard mean-variance method. The critical issue in constructing an appropriate predictive distribution for asset returns is evaluating the goodness of individual factors and models. This problem is investigated from a statistical point of view; we propose using the Bayesian predictive information criterion. Two Bayesian methods and the standard mean-variance method are compared through Monte Carlo simulations and in a real financial data set. The Bayesian methods perform very well compared to the standard mean-variance method.

© 2009 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Bayesian methods; Decision making; Finance; Model selection

1. Introduction

A portfolio allocates wealth among various available assets. The standard mean-variance method of portfolio selection, pioneered by Markowitz (1952), has long attracted the attention of financial economics researchers and practitioners. In this context, an investor simply allocates wealth among $m$ assets with weights $w = (w_1, \ldots, w_m)'$ over a one-period investment horizon. The optimal portfolio $w$ is determined by solving the following problem:

\[
\text{maximize } \; w'\mu - \frac{\gamma}{2}\, w'\Sigma w, \quad \text{s.t. } \; w'\mathbf{1} = 1,
\]


where µ and Σ are respectively the mean vector and covariance matrix of asset returns, γ is the investor’s risk-aversion parameter, and 1 is a column vector of ones. The investor can estimate µ and Σ using the maximum likelihood method, Bayesian methods, the generalized method of moments (Hansen, 1982) or other approaches. Unfortunately, it has been pointed out that a portfolio so constructed tends to consist of extreme positions that change dramatically over time (Greyserman, Jones, & Strawderman, 2006). As the mean-variance portfolio selection method ignores the estimation risk, the Bayesian method has lately received considerable attention. This approach addresses several challenges in solving the portfolio choice problem: parameter uncertainties in a given

© 2009 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. 0169-2070/$ - see front matter. doi:10.1016/j.ijforecast.2009.01.005


model, and uncertainties inherent in the choice of model itself.

Parameter uncertainty has been dealt with by Bawa, Brown, and Klein (1979), Brown (1976), Dumas and Jacquillat (1990), Frost and Savarino (1986), Jobson and Korkie (1980) and Jorion (1986), among others. All of these authors work with some variation of the empirical Bayes approach. Using dynamic Bayesian forecasting models, Aguilar and West (2000), Carvalho and West (2007), Putnam and Quintana (1994), Quintana (1992) and Quintana and Putnam (1996) implemented the portfolio selection problem on this basis. Zellner and Chetty (1965) structured the portfolio problem and indicated how to solve it using Bayesian methods. They explained how to get around the "nuisance parameter" problem by using the predictive density to compute the expected utility. The utility is then maximized to obtain the optimal portfolio.

The problem of model uncertainty has been dealt with by Nakatsuma (2006), Pastor (2000) and Pastor and Stambaugh (2000) in terms of Bayesian model averaging. This method takes into account uncertainties that arise from the model specification itself.

Nowadays, finance researchers solve the portfolio choice problem with models far more sophisticated than the single-period mean-variance framework of Markowitz. They may work with more sophisticated return processes, for example, or incorporate learning into portfolio choices. Xia (2001) investigated the impact of parameter uncertainty and estimation risk on the dynamic asset allocations of long-horizon investors. Polson and Tew (2000) also discussed some aspects of the Bayesian portfolio selection problem.

In Bayesian portfolio selection, we may have to optimize several parameters within the asset return prediction model. In the empirical Bayes framework, we first select a set of adequate factors within the predictive distribution; these parameters are then employed to optimize the portfolio selection, given investor preferences.
In other words, we construct a multifactor model by finding those factors that contribute most significantly to the asset return prediction. When we implement the Bayesian model averaging method, we also need to select a set of useful models to be incorporated into the asset return prediction. The so-called Occam’s window approach

(Madigan & Raftery, 1994) averages over a set of good models, but the proper size of the window is still unclear. Taken together, these issues are regarded as aspects of the statistical model selection problem. To evaluate the goodness of asset return models, we propose the Bayesian predictive information criterion (Ando, 2007, 2008). This criterion gives useful advice regarding the factors and the models to which we should pay the most attention.

One of this article's contributions to the finance literature is the introduction of a new Bayesian portfolio selection method. With regard to the econometrics literature, this article proposes the Bayesian predictive information criterion as a means of choosing among competing asset pricing models. Bayesian portfolio selection problems are also investigated from a forecasting point of view.

This article is organized as follows. In Section 2, we briefly review the classical, empirical Bayes and Bayesian model averaging methods for portfolio selection. Section 3 describes the Bayesian inference for multifactor models. In Section 4, we present the Bayesian predictive information criterion. Sections 5 and 6 conduct Monte Carlo simulations and analyze real data to investigate the performance of the proposed method. Our conclusions are given in Section 7.

2. Preliminaries: The empirical Bayes and Bayesian model averaging methods for portfolio selection

It is useful to begin with a brief summary of classical portfolio selection approaches. Consider an investor who allocates their wealth among $m$ assets with weights $w = (w_1, \ldots, w_m)'$ over a one-period investment horizon. Let $W_t$ denote the investor's wealth at time $t$. Their wealth one period later is then

\[
W_{t+1} = W_t\,(1 + w' r_{t+1}), \quad w'\mathbf{1} = 1,
\]

where $r_{t+1} = (r_{1,t+1}, \ldots, r_{m,t+1})'$ is an $m$-dimensional vector of random variables that stand for the individual asset returns. The investor determines the allocation weights $w$ by maximizing the expected utility of the next-period portfolio:

\[
E_t[u(W_{t+1})] = \int u(W_{t+1})\, g(r_{t+1})\, dr_{t+1},
\]


where the expectation $E_t[\cdot]$ is based on conditional information known at time $t$, and $u(\cdot)$ is the investor's utility function. A common choice is the exponential utility function, $u(W) = -\exp\{-\gamma W\}$, where $\gamma$ is the investor's absolute risk-aversion parameter. The probability density function of $r_{t+1}$, $g(r_{t+1})$, is unknown. We instead use the parametric model $f(r_{t+1}|\theta)$, which is conditional on the parameter vector $\theta$. This is an approximation of the true model $g(r_{t+1})$. In general, for dependent observations, the parametric model has a form like $f(r_{t+1}|\theta, r_t, r_{t-1}, \ldots)$.

It is assumed that the asset return $r_{t+1}$ follows a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ (i.e., $f(r_{t+1}|\theta) = N(\mu, \Sigma)$, $\theta = (\mu', \mathrm{vech}(\Sigma)')'$). Maximization of the expected utility problem then produces the well-known portfolio selection model of Markowitz (1952).

Since the true parameters of the future returns distribution are not known, reasonable estimates of these parameters are needed. Generally, the investor estimates $\mu$ and $\Sigma$ by implementing the maximum likelihood method on a historical information sequence ending at time $t$. Denoting these estimates by $\hat{\mu}$ and $\hat{\Sigma}$, we see that the actual problem that the investor solves is

\[
\text{maximize } \; w'\hat{\mu} - \frac{\gamma}{2}\, w'\hat{\Sigma} w, \quad \text{s.t. } \; w'\mathbf{1} = 1.
\]

It is well known that the solution of Markowitz (1952) based on $\hat{\mu}$ and $\hat{\Sigma}$ suffers from an estimation risk. The empirical Bayes method provides a valuable solution: it integrates the estimation risk into the mean-variance analysis. Several researchers have employed empirical Bayes methods to estimate the probability density of $r_{t+1}$. An extensive survey of early work is given by Bawa et al. (1979). In the empirical Bayes framework, the investor determines $w$ by maximizing the expected utility of the next-period portfolio:

\[
\begin{aligned}
U_{Bayes}(W_{t+1}) &= \int\!\!\int u(W_{t+1})\, f(r_{t+1}|\theta)\, \pi(\theta|D)\, dr_{t+1}\, d\theta \\
&= \int u(W_{t+1}) \left[ \int f(r_{t+1}|\theta)\, \pi(\theta|D)\, d\theta \right] dr_{t+1} \\
&= \int u(W_{t+1})\, q(r_{t+1}|D)\, dr_{t+1},
\end{aligned}
\]

where $\pi(\theta|D)$ is the posterior density of $\theta$ given $D$, and $D$ is the information sequence used to calculate estimators for $\mu$ and $\Sigma$. Integrating $f(r_{t+1}|\theta)$ over the parameter space $\theta \sim \pi(\theta|D)$ results in the predictive distribution for future asset returns $q$:

\[
q(r_{t+1}|D) = \int f(r_{t+1}|\theta)\, \pi(\theta|D)\, d\theta.
\]

In the empirical Bayes framework, the expected utility is thus maximized using the predictive distribution of the asset returns. The predictive distribution estimation problem will be dealt with in Section 3.

Although this method takes into account the uncertainty of model parameters, the uncertainty due to the model specification itself remains to be considered. The Bayesian model averaging method was introduced to address this issue. Consider a universe of $J$ models $M_1, \ldots, M_J$. Let $f(r_{t+1}|\theta_j, M_j)$, $\pi(\theta_j|D, M_j)$ and $\pi(M_j|D)$ denote the probability density function of $r_{t+1}$, the posterior density of $\theta_j$, and the posterior probability of model $M_j$, respectively. In the Bayesian model averaging framework, the investor determines $w$ by maximizing the expected utility of the next-period portfolio:

\[
\begin{aligned}
U_{BMA}(W_{t+1}) &= \int\!\!\int \sum_{j=1}^{J} \left\{ u(W_{t+1})\, f(r_{t+1}|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, \pi(M_j|D) \right\} dr_{t+1}\, d\theta_j \\
&= \int u(W_{t+1}) \left\{ \sum_{j=1}^{J} \pi(M_j|D) \int f(r_{t+1}|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, d\theta_j \right\} dr_{t+1} \\
&= \int u(W_{t+1}) \left\{ \sum_{j=1}^{J} \pi(M_j|D)\, q(r_{t+1}|D, M_j) \right\} dr_{t+1},
\end{aligned}
\]

where $q(r_{t+1}|D, M_j)$ is the predictive distribution of model $M_j$. As well as maximizing $U_{Bayes}(W_{t+1})$, the problem now involves constructing the predictive distribution in $U_{BMA}(W_{t+1})$. In the next section, procedures for constructing the Bayesian predictive distributions are illustrated.
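For readers who want to experiment, the budget-constrained mean-variance problem of this section has a closed-form solution obtained from its Lagrangian. The sketch below is illustrative only: the function name `mean_variance_weights` and the numerical inputs are made up, short sales are allowed, and the same function could be fed either plug-in estimates or moments of a Bayesian predictive distribution.

```python
import numpy as np

def mean_variance_weights(mu, sigma, gamma):
    """Maximize w'mu - (gamma/2) w'Sigma w subject to w'1 = 1 (short sales allowed)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ones = np.ones_like(mu)
    sinv_mu = np.linalg.solve(sigma, mu)      # Sigma^{-1} mu
    sinv_one = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
    # Lagrange multiplier of the budget constraint w'1 = 1
    lam = (ones @ sinv_mu - gamma) / (ones @ sinv_one)
    # First-order condition: mu - gamma * Sigma w - lam * 1 = 0
    return (sinv_mu - lam * sinv_one) / gamma

# Illustrative inputs (hypothetical, not taken from the paper)
mu = np.array([0.010, 0.012, 0.008])
sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.050, 0.005],
                  [0.004, 0.005, 0.030]])
w = mean_variance_weights(mu, sigma, gamma=5.0)
```

Solving linear systems with `np.linalg.solve` avoids forming an explicit inverse of the covariance matrix, which tends to be numerically safer when assets are highly correlated.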


3. Bayesian inference for multifactor models

3.1. Empirical Bayes approach

Suppose that the independent observations $D = \{r_1, \ldots, r_n\}$ are generated from an unknown true distribution $G(r)$ having the probability density $g(r)$. In practical situations, it is difficult to obtain precise information on the true distribution from a finite number of observations. We therefore use a parametric model as an approximation. One of the most famous models used in finance to determine rates of return is the Capital Asset Pricing Model (Sharpe, 1964). However, it is recognized that this model cannot capture the returns of many investment opportunities. As an alternative, the multifactor model (e.g. Fama and French (1993)) plays an important role in empirical financial research. Suppose that asset returns follow the multifactor model:

\[
r_t = \alpha + \Gamma' f_t + \varepsilon_t, \quad t = 1, \ldots, n, \tag{1}
\]

where $f_t = (f_{1t}, \ldots, f_{pt})'$ is a $p$-dimensional vector of factors, and $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{mt})'$ is an $m$-dimensional noise vector (normal, with mean 0 and covariance matrix $\Sigma$). The vector $\alpha = (\alpha_1, \ldots, \alpha_m)'$ and matrix $\Gamma = (\beta_1, \ldots, \beta_m)$ consist of unknown parameters, and $\beta_j = (\beta_{j1}, \ldots, \beta_{jp})'$ is the $p$-dimensional vector of factor loadings. The model can be expressed in matrix form:

\[
R = XB + E,
\]

where $R = (r_1, \ldots, r_n)'$, $X = (1_n, F)$, $F = (f_1, \ldots, f_n)'$, $B = (\alpha, \Gamma')'$, and $E = (\varepsilon_1, \ldots, \varepsilon_n)'$ with $\varepsilon_t \sim N(0, \Sigma)$. The likelihood function is then

\[
\begin{aligned}
L(D|\theta) &= \prod_{t=1}^{n} f(r_t|f_t, B, \Sigma) \\
&= \prod_{t=1}^{n} \{\det(2\pi\Sigma)\}^{-1/2} \exp\left\{ -\frac{1}{2} (r_t - \alpha - \Gamma' f_t)' \Sigma^{-1} (r_t - \alpha - \Gamma' f_t) \right\} \\
&= (2\pi)^{-nm/2} |\Sigma|^{-n/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left\{ \Sigma^{-1} (R - XB)'(R - XB) \right\} \right],
\end{aligned}
\]

where $\theta = (\mathrm{vec}(B)', \mathrm{vech}(\Sigma)')'$ contains all of the unknown model parameters. (The vectorization operator vec converts the matrix $B$ into a column vector by stacking its columns on top of one another; the half-vectorization operator vech does the same with only the lower triangular part of $\Sigma$.)

Assuming $\pi(\theta) = \pi(B|\Sigma)\pi(\Sigma)$, we use a matricvariate normal model and an inverted Wishart model for the prior distributions:

\[
\pi(B|B_0, \Sigma, A) \propto |\Sigma|^{-\frac{p+1}{2}} |A|^{-\frac{m}{2}} \exp\left[ -\frac{1}{2} \mathrm{tr}\left\{ \Sigma^{-1} (B - B_0)' A^{-1} (B - B_0) \right\} \right],
\]

\[
\pi(\Sigma|\Lambda_0, \nu_0) = \frac{|\Lambda_0|^{\frac{\nu_0}{2}}}{2^{\frac{m\nu_0}{2}}\, \Gamma_m\!\left(\frac{\nu_0}{2}\right)} |\Sigma|^{-\frac{\nu_0 + m + 1}{2}} \exp\left[ -\frac{1}{2} \mathrm{tr}\left\{ \Lambda_0 \Sigma^{-1} \right\} \right],
\]

with $\nu_0 \geq m$, $|\Sigma| > 0$. Here $\Lambda_0$, $A$ and $B_0$ are $m \times m$, $(p+1) \times (p+1)$ and $(p+1) \times m$ matrices, respectively. The matrix $B_0$ specifies the prior mean of $B$, and the matrix $A$ adjusts the strength of prior information. Based on some broader economic perspective, one might adjust $A$ to specify that a prior is particularly informative by putting small values into $A$. On the other hand, one could also weaken or entirely remove the influence of prior information. In this paper, the strength of prior information is set to a rather low level.

The posterior density of $\Sigma$, $\pi(\Sigma|D)$, is the inverted Wishart distribution $W(\Sigma|S + \Lambda_0, n + \nu_0)$. That of $B$ given $\Sigma$, $\pi(B|\Sigma, D)$, is the matricvariate normal distribution $N(B|\bar{B}, \Sigma, X'X + A^{-1})$ (Rossi, Allenby, & McCulloch, 2005). The posterior means of $\Sigma$ and of $B$ given $\Sigma$ are $(S + \Lambda_0)/(\nu_0 + n - m - 1)$ and $\bar{B}$ respectively. $S$ and $\bar{B}$ are defined below:

\[
S = (R - X\bar{B})'(R - X\bar{B}) + (\bar{B} - B_0)' A^{-1} (\bar{B} - B_0),
\]

\[
\bar{B} = (X'X + A^{-1})^{-1} \left( X'X\hat{B} + A^{-1}B_0 \right) := (\bar{\alpha}, \bar{\Gamma}')'.
\]

Finally, $\hat{B} = (X'X)^{-1}X'R$ is the maximum likelihood estimate. It is well known that the predictive distribution is the multivariate t-distribution $Mt(\bar{B}' f^*_{t+1}, \Sigma^*, \nu^*)$, with $f^*_{t+1} = (1, f'_{t+1})'$:


\[
\begin{aligned}
f(r_{t+1}|f_{t+1}, D) &= \int f(r_{t+1}|f_{t+1}, B, \Sigma)\, \pi(B|\Sigma, D)\, \pi(\Sigma|D)\, dB\, d\Sigma \\
&= \frac{\Gamma\!\left(\frac{\nu^* + m}{2}\right)}{\Gamma\!\left(\frac{\nu^*}{2}\right) \pi^{\frac{m}{2}}} |\Sigma^*|^{-\frac{1}{2}} \left\{ 1 + (r_{t+1} - \bar{\alpha} - \bar{\Gamma}' f_{t+1})' \Sigma^{*-1} (r_{t+1} - \bar{\alpha} - \bar{\Gamma}' f_{t+1}) \right\}^{-\frac{\nu^* + m}{2}},
\end{aligned}
\]

where $\nu^* = n + \nu_0 - m + 1$ and $\Sigma^* = \Sigma^*_{f_{t+1}}$ is given by

\[
\Sigma^*_{f_{t+1}} = \frac{1 + f^{*\prime}_{t+1} (X'X + A^{-1})^{-1} f^*_{t+1}}{n + \nu_0 - m + 1}\, (S + \Lambda_0).
\]

The problem that remains is deciding which combination of factors the investor should focus on. This issue will be investigated from the Bayesian predictive point of view in Section 4.

3.2. Remarks on the empirical Bayes method

Remark 1. Although the assumed independence of observations and unimodality in Eq. (1) cannot plausibly capture the inherent dynamics of asset returns, this setting is very common in finance (e.g. Pastor (2000)). As was shown in the previous section, these assumptions give an analytical, conjugate result. More complicated models can be used if warranted, but the analytical solution might disappear. In this case extensive Monte Carlo computations might be needed. A researcher must therefore consider the trade-off between complexity (realism) and ease of computation.

If one wants to add some dynamic structure to the mean asset returns, one of the simplest ways is to use the maximum likelihood estimate $\hat{B}$ for the prior mean of $B$ in $\pi(B|B_0, \Sigma, A)$. (That is, the estimate $\hat{B}$ constructed at time $t$ is used as a highly likely proxy for $B$ at time $t + 1$.) Our degree of confidence in this choice is reflected by the matrix $A$ in $\pi(B|B_0, \Sigma, A)$. We can also incorporate dynamics into the asset return covariance matrix. For example, $\hat{\Sigma} = (R - X\bar{B})'(R - X\bar{B})/n$ might be one candidate for $\Lambda_0$. In Section 6, we use both ideas for real data analysis. The class of matrix normal dynamic linear models (Quintana & West, 1987) provides a fully conjugate framework for multivariate time series analysis and dynamic regression. One can also use a time-varying model for the asset return covariance matrix (Aguilar & West, 2000; Carvalho & West, 2007).

Note that the prior density setting basically depends on the prior beliefs of decision-makers. If the decision-maker prefers to keep the model as simple as possible, we simply make $B_0$ and $\Lambda_0$ constant parameters.

Remark 2. One can construct an optimal portfolio based on the empirical Bayes estimator

\[
\hat{\mu}_{Bayes}|f_{t+1} = \bar{\alpha} + \bar{\Gamma}' f_{t+1}, \qquad
\hat{\Sigma}_{Bayes}|f_{t+1} = \frac{\nu^*}{\nu^* - 2}\, \Sigma^*_{f_{t+1}}.
\]
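The conjugate results of Section 3.1 and the conditional estimator of Remark 2 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `empirical_bayes_predictive`, the simulated data, and the prior settings in the example are hypothetical, and the identity $X'X\hat{B} = X'R$ is used so that $\hat{B}$ never has to be formed explicitly.

```python
import numpy as np

def empirical_bayes_predictive(R, F, f_next, B0, A, Lambda0, nu0):
    """Posterior quantities and predictive moments of r_{t+1} given f_{t+1}."""
    n, m = R.shape
    X = np.column_stack([np.ones(n), F])            # X = (1_n, F)
    A_inv = np.linalg.inv(A)
    P = X.T @ X + A_inv                             # X'X + A^{-1}
    B_bar = np.linalg.solve(P, X.T @ R + A_inv @ B0)
    resid = R - X @ B_bar
    S = resid.T @ resid + (B_bar - B0).T @ A_inv @ (B_bar - B0)
    nu_star = n + nu0 - m + 1
    f_star = np.concatenate([[1.0], f_next])        # f* = (1, f')'
    scale = 1.0 + f_star @ np.linalg.solve(P, f_star)
    Sigma_star = scale / nu_star * (S + Lambda0)
    mean = B_bar.T @ f_star                         # alpha_bar + Gamma_bar' f
    cov = nu_star / (nu_star - 2.0) * Sigma_star    # covariance of the t-distribution
    return B_bar, S, nu_star, mean, cov

# Simulated example (all numbers are illustrative assumptions)
rng = np.random.default_rng(0)
n, m, p = 60, 3, 2
F = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), F])
R = X @ rng.normal(size=(p + 1, m)) + rng.normal(scale=0.5, size=(n, m))
B_bar, S, nu_star, mean, cov = empirical_bayes_predictive(
    R, F, f_next=np.array([0.1, -0.3]),
    B0=np.zeros((p + 1, m)), A=1e5 * np.eye(p + 1),
    Lambda0=np.eye(m), nu0=10)
```

With a very diffuse prior (large entries in `A`), `B_bar` is numerically close to the least-squares estimate, which is a convenient sanity check.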

Notice that the portfolio is selected at the end of period $t$, while the factors $f_{t+1}$ are still unknown. In other words, the result is conditional on the factors $f_{t+1}$ (Bawa et al., 1979). In the Bayesian framework, it is thus natural to compute the expectation value of $U_{Bayes}(W_{t+1})$ taking into account not only the predictive distributions of asset returns but also the predictive distributions of the factors $f_{t+1}$ (e.g. Pastor (2000)). The resulting portfolio considers the uncertainty in returns due to future values of the factors. Pastor (2000) used extensive computations to implement this double expectation. If there are too few trials, however, the Monte Carlo approximation errors are non-trivial.

Let $\mu_f$ and $\Sigma_f$ be the mean vector and covariance matrix for future values of the factor return $f_{t+1}$. Nakatsuma (2006) explicitly obtained the empirical Bayes estimator for the mean vector and covariance matrix for future returns, $\hat{\mu}_{Bayes}$ and $\hat{\Sigma}_{Bayes}$, by integrating out unknown future values of the factor return. In our model, the mean and covariance matrix of $r_{t+1}$ have the following forms:

\[
\hat{\mu}_{Bayes} = E_{f_{t+1}}\!\left[ \hat{\mu}_{Bayes}|f_{t+1} \right] = \bar{\alpha} + \bar{\Gamma}' \mu_f,
\]

\[
\begin{aligned}
\hat{\Sigma}_{Bayes} &= E_{f_{t+1}}\!\left[ \hat{\Sigma}_{Bayes}|f_{t+1} \right] + V_{f_{t+1}}\!\left[ \hat{\mu}_{Bayes}|f_{t+1} \right] \\
&= \frac{\nu^*}{\nu^* - 2}\, \Sigma^*_{\mu_f} + \frac{1}{\nu^* - 2} \left[ \mathrm{tr}\{ \Sigma_f (X'X + A^{-1})^{-1} \} \right] (S + \Lambda_0) + \bar{\Gamma}' \Sigma_f \bar{\Gamma}.
\end{aligned}
\]
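The two formulas above are an application of the law of total expectation and total variance, and they translate directly into code. The sketch below is hedged in two ways: the function name `marginal_bayes_moments` and all inputs are hypothetical, and because $\Sigma_f$ is $p \times p$ while $(X'X + A^{-1})^{-1}$ is $(p+1) \times (p+1)$, the trace is taken over the factor block of the latter (intercept row and column dropped), which is an assumption about how the formula is meant to be read.

```python
import numpy as np

def marginal_bayes_moments(B_bar, S_plus_Lambda, P_inv, nu_star, mu_f, Sigma_f):
    """Moments of r_{t+1} after integrating out f_{t+1} with mean mu_f, covariance Sigma_f.

    B_bar is (p+1) x m with the intercept in row 0; P_inv = (X'X + A^{-1})^{-1}.
    """
    alpha_bar, Gamma_bar = B_bar[0], B_bar[1:]            # Gamma_bar is p x m
    mu_star = np.concatenate([[1.0], mu_f])               # (1, mu_f')'
    mean = alpha_bar + Gamma_bar.T @ mu_f
    # nu*/(nu*-2) * Sigma*_{mu_f}: conditional covariance evaluated at f = mu_f
    at_mean = (1.0 + mu_star @ P_inv @ mu_star) / (nu_star - 2.0) * S_plus_Lambda
    # Assumption: the trace uses the factor block of P_inv
    spread = np.trace(Sigma_f @ P_inv[1:, 1:]) / (nu_star - 2.0) * S_plus_Lambda
    cov = at_mean + spread + Gamma_bar.T @ Sigma_f @ Gamma_bar
    return mean, cov

# Illustrative inputs (hypothetical)
rng = np.random.default_rng(1)
n, p, m = 60, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
P_inv = np.linalg.inv(X.T @ X + 1e-5 * np.eye(p + 1))
B_bar = rng.normal(size=(p + 1, m))
S_plus_Lambda = 50.0 * np.eye(m)
nu_star = n + 10 - m + 1
mu_f = np.array([0.1, -0.2])
Sigma_f = np.array([[1.0, 0.3], [0.3, 1.0]])
mean, cov = marginal_bayes_moments(B_bar, S_plus_Lambda, P_inv, nu_star, mu_f, Sigma_f)
```

Setting `Sigma_f` to a zero matrix recovers the conditional covariance evaluated at $f_{t+1} = \mu_f$, so the last two terms can be read as the extra risk created by not knowing the future factor values.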

Although these formulas still require the values of $\mu_f$ and $\Sigma_f$, we don't have to express them in the form of a probability density. It is common and very simple to use the sample average $\sum_{t=1}^{n} f_t / n$ as a proxy for the future value of asset pricing factors $f_{t+1}$. In finance, however, some


researchers have started to introduce predictive regression. Given that the factors vary over time, it makes sense to estimate $f_{t+1}$ using an exponentially weighted mean $\{\sum_{t=1}^{n} \exp(-\zeta t)\}^{-1} \sum_{t=1}^{n} \exp(-\zeta t) f_t$. This estimator is often utilized by financial practitioners, and we adopt it here as well. In this paper, $\zeta$ is set to 0.1. In the same way, the covariance matrix $\Sigma_f$ is also estimated with a prior belief.

3.3. Bayesian model averaging approach for the multifactor models

This section describes how to apply Bayesian model averaging to the multifactor model previously introduced. Suppose that an asset returns model $M_j$ is given by the multifactor model

\[
r_t = \alpha_j + \Gamma_j' f_{jt} + \varepsilon_{jt}, \quad t = 1, \ldots, n,
\]

where $f_{jt} = (f_{1t}, \ldots, f_{p_j t})'$ is a $p_j$-dimensional vector of factors, and $\varepsilon_{jt} = (\varepsilon_{j1t}, \ldots, \varepsilon_{jmt})'$ is an $m$-dimensional noise vector with mean 0 and covariance matrix $\Sigma_j$. The vector $\alpha_j = (\alpha_{j1}, \ldots, \alpha_{jm})'$ and matrix $\Gamma_j = (\beta_{j1}, \ldots, \beta_{jm})$ consist of unknown parameters, and $\beta_{jk} = (\beta_{jk1}, \ldots, \beta_{jkp_j})'$ is a $p_j$-dimensional vector of factor loadings. We again use the matricvariate normal distribution $N(B_j|B_{j0}, \Sigma_j, A_j)$ and inverted Wishart distribution $W(\Sigma_j|\Lambda_{j0}, \nu_{j0})$ for $B_j = (\alpha_j, \Gamma_j')'$ and $\Sigma_j$ respectively. Using the arguments given in Section 3.1, the predictive distribution of model $j$ becomes the multivariate t-distribution $Mt(\hat{r}_{j,t+1}, \Sigma_j^*, \nu_j^*)$, with

\[
\begin{aligned}
\hat{r}_{j,t+1} &= \left\{ (X_j'X_j + A_j^{-1})^{-1} (X_j'R + A_j^{-1}B_{j0}) \right\}' f^*_{j,t+1}, \\
\Sigma_j^* &= \frac{1 + f^{*\prime}_{j,t+1} (X_j'X_j + A_j^{-1})^{-1} f^*_{j,t+1}}{n + \nu_{j0} - m + 1}\, (S_j + \Lambda_{j0}), \\
S_j &= (R - X_j\bar{B}_j)'(R - X_j\bar{B}_j) + (\bar{B}_j - B_{j0})' A_j^{-1} (\bar{B}_j - B_{j0}), \\
\bar{B}_j &= (X_j'X_j + A_j^{-1})^{-1} (X_j'R + A_j^{-1}B_{j0}), \\
\nu_j^* &= n + \nu_{j0} - m + 1.
\end{aligned}
\]

Here $X_j = (1_n, F_j)$ and $F_j = (f_{j1}, \ldots, f_{jn})'$. The posterior probabilities of model $M_j$ are given by

\[
\pi(M_j|D) = \frac{\pi(D|M_j)\, \pi(M_j)}{\sum_{k=1}^{J} \pi(D|M_k)\, \pi(M_k)}, \quad j = 1, \ldots, J.
\]
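In practice the marginal likelihoods are usually available only on the log scale, and exponentiating them directly underflows. The following sketch normalizes the posterior model probabilities in log space; the function name `posterior_model_probs` and the three log marginal likelihood values are hypothetical.

```python
import numpy as np

def posterior_model_probs(log_marglik, prior=None):
    """pi(M_j|D) from log pi(D|M_j) and prior pi(M_j), computed stably in log space."""
    log_marglik = np.asarray(log_marglik, dtype=float)
    if prior is None:
        # Uniform prior over the candidate models
        prior = np.full(log_marglik.shape, 1.0 / log_marglik.size)
    log_post = log_marglik + np.log(prior)
    log_post -= log_post.max()          # subtract the maximum to avoid underflow
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical log marginal likelihoods for three models
probs = posterior_model_probs([-1050.2, -1047.9, -1061.3])
```

Subtracting the largest log value before exponentiating leaves the normalized probabilities unchanged, since the common factor cancels in the ratio.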

Here $\pi(M_j)$ is the prior probability of model $M_j$, and $\pi(D|M_j)$ is its marginal likelihood:

\[
\pi(D|M_j) = \int L(D|\theta_j, M_j)\, \pi(\theta_j|M_j)\, d\theta_j = \frac{L(D|\theta_j, M_j)\, \pi(\theta_j|M_j)}{\pi(\theta_j|D, M_j)},
\]

where $\theta_j = (\mathrm{vec}(B_j)', \mathrm{vech}(\Sigma_j)')'$ is the parameter vector of model $M_j$. $L(D|\theta_j, M_j)$ is the likelihood function of model $M_j$, while $\pi(\theta_j|M_j)$ and $\pi(\theta_j|D, M_j)$ are its prior and posterior distributions respectively.

In order to compute the marginal likelihood of a model, one can utilize the harmonic mean estimator (Newton & Raftery, 1994). However, the presence of a small number of outliers with small likelihood values can have a large effect on this estimate. This is because the inverse likelihood does not possess a finite variance (Chib, 1995). By rearranging the posterior distribution, Chib (1995) evaluated the marginal likelihood as follows:

\[
\pi(D|M_j) = \frac{L(D|\theta_j^*, M_j)\, \pi(\theta_j^*|M_j)}{\pi(\theta_j^*|D, M_j)}, \quad j = 1, \ldots, J, \tag{2}
\]

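for any value of $\theta_j^*$. Since every term on the right-hand side of Eq. (2) is available, we can easily evaluate the marginal likelihood:

\[
\pi(D|M_j) = \pi^{-\frac{nm}{2}} \times \frac{|\Lambda_{j0}|^{\frac{\nu_{j0}}{2}} \times \Gamma_m\!\left(\frac{\nu_{j0} + n}{2}\right)}{|\Lambda_{j0} + S_j|^{\frac{\nu_{j0} + n}{2}} \times \Gamma_m\!\left(\frac{\nu_{j0}}{2}\right) \times |X_j'X_j + A_j^{-1}|^{\frac{p_j + 1}{2}} \times |A_j|^{\frac{p_j + 1}{2}}}, \quad j = 1, \ldots, J.
\]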

We employ Occam's window (Madigan & Raftery, 1994), which first identifies the model with the largest marginal likelihood, then excludes models that are unlikely a posteriori. In other words, all models $M_j$ satisfying $\max_k \pi(M_k|D)/\pi(M_j|D) \geq C$ are excluded. This results in a set of $K$ models to be included in the predictive distribution. It is obvious that $K$ depends on the size of Occam's window $C$. In this paper, $C$ is optimized using $\mathrm{BPIC}_{BMA}$ (see Eq. (5) in Section 4.2).

The predictive mean is an average of the predictive means from each model, weighted by the posterior


model probabilities. The predictive covariance is a weighted average of the predictive covariance matrices and the additional term associated with the individual predictive means, across all models $M_j$ (Leamer, 1978):

\[
\hat{\mu}_{BMA}|f_{t+1} = \sum_{j=1}^{K} \pi(M_j|D) \times \bar{\mu}_j, \qquad (\bar{\mu}_j = \bar{\alpha}_j + \bar{\Gamma}_j' f_{j,t+1}),
\]

\[
\hat{\Sigma}_{BMA}|f_{t+1} = \sum_{j=1}^{K} \pi(M_j|D) \times \frac{\nu_j^*}{\nu_j^* - 2}\, \Sigma_j^* + \sum_{j=1}^{K} \pi(M_j|D)\, (\bar{\mu}_j - \hat{\mu}_{BMA})(\bar{\mu}_j - \hat{\mu}_{BMA})'.
\]
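The Occam's window rule and the two mixture formulas above can be sketched as follows. The function names `occam_window` and `bma_moments` and the example numbers are hypothetical, and renormalizing the posterior probabilities over the retained models is an assumption made here so that the weights sum to one; the text does not state how this is handled.

```python
import numpy as np

def occam_window(post_probs, C):
    """Indices of retained models: those with max_k pi(M_k|D) / pi(M_j|D) < C."""
    post_probs = np.asarray(post_probs, dtype=float)
    return np.flatnonzero(post_probs.max() / post_probs < C)

def bma_moments(post_probs, means, covs):
    """Mixture mean and covariance over the retained models."""
    w = np.asarray(post_probs, dtype=float)
    w = w / w.sum()                                   # renormalize (assumption)
    means = np.asarray(means, dtype=float)            # K x m predictive means
    covs = np.asarray(covs, dtype=float)              # K x m x m predictive covariances
    mu_bma = w @ means
    dev = means - mu_bma
    within = np.einsum("k,kij->ij", w, covs)          # model-specific uncertainty
    between = np.einsum("k,ki,kj->ij", w, dev, dev)   # model uncertainty
    return mu_bma, within + between

# Hypothetical posterior probabilities and predictive moments for four models
probs = np.array([0.60, 0.30, 0.09, 0.01])
keep = occam_window(probs, C=20)
means = np.array([[0.010, 0.020], [0.030, 0.000], [0.020, 0.010], [0.500, 0.500]])
covs = np.stack([0.04 * np.eye(2)] * 4)
mu_bma, cov_bma = bma_moments(probs[keep], means[keep], covs[keep])
```

In this example the fourth model falls outside the window because its posterior probability is 60 times smaller than that of the best model, so its extreme predictive mean does not enter the average.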

The term $\{\nu_j^*/(\nu_j^* - 2)\}\Sigma_j^*$ is the covariance matrix of model $M_j$, and $K$ is the number of models included in the predictive distribution. In the formula for $\hat{\Sigma}_{BMA}|f_{t+1}$, the first and second terms represent the uncertainty contributions due to a specific model and the model uncertainty, respectively. Just as in the empirical Bayes case, the mean and covariance matrix of $r_{t+1}$ have a closed analytical form.

Although the Occam's window approach (Madigan & Raftery, 1994) identifies a set of good models, it is still unclear how best to select the size of the window. This problem will be investigated in the next section.

4. Model selection and evaluation

The previous sections have described the development of Bayesian models in general. One of the most crucial issues in portfolio selection is finding a specific model which adequately expresses the dynamics of asset returns. If we implement portfolio selection based on the multifactor model, for example, we must be able to select a set of factors that captures the asset return distribution. Furthermore, in the Bayesian model averaging method one must also select the number of models to be included. This section introduces the Bayesian predictive information criterion (BPIC) for evaluating predictive distributions (Ando, 2007, 2008). Note, however, that if decision-makers have a strong prior belief about the optimal combination of factors, then these preferences can be used instead.

4.1. The Bayesian predictive information criterion (BPIC)

Taking a Bayesian approach to information theory, Ando (2007) considered the expected value of the log-likelihood:

\[
\eta = \int \left[ \int \log L(D^*|\theta)\, \pi(\theta|D)\, d\theta \right] g(D^*)\, dD^*,
\]

where $\pi(\theta|D)$ is the posterior density function and $D^* = \{r_1^*, \ldots, r_n^*\}$ is the unseen observation generated from a true model. The best predictive distributions can then be selected by comparing the posterior mean of $\eta$ for each statistical model.

The posterior mean of the expected log-likelihood depends on the unknown true model. The main problem is therefore one of accurately estimating the posterior mean. A natural estimator is the posterior mean of the log-likelihood itself,

\[
\hat{\eta} = \frac{1}{n} \sum_{t=1}^{n} \int \log f(r_t|\theta)\, \pi(\theta|D)\, d\theta.
\]

It is well known that $\hat{\eta}$ generally carries a positive bias $\hat{b}$ with respect to $\eta$. This arises because the same data are used to construct the posterior distributions and evaluate the posterior mean of the expected log-likelihood.

Consider a situation where the prior is assumed to be dominated by the likelihood as $n$ increases, and where one of the specified parametric models is actually the true model (or very close to the true model). In this case, as is shown by Ando (2007), the asymptotic bias

\[
n\hat{b} = \int \left( \hat{\eta} - \eta \right) g(D)\, dD
\]

reduces to the dimension of $\theta$. The asymptotic bias in $\hat{\eta}$ can thus be corrected by subtracting $\dim\{\theta\}/n$. We therefore obtain a tailor-made version of the BPIC (Ando, 2007):

\[
\mathrm{BPIC} = -2 \int \log L(D|\theta)\, \pi(\theta|D)\, d\theta + 2 \dim\{\theta\}. \tag{3}
\]

We can select among predictive distributions by minimizing the BPIC. Practically speaking, BPIC and other model selection criteria (such as the deviance information criterion) can be used to select the sampling


density, priors for parameters and models, and useful combinations of the available factors. In this paper, we apply BPIC to select the combination of factors.

4.2. Using BPIC to evaluate averaged Bayesian models

To determine the best predictive distribution among various statistical models, we follow Ando (2008) by maximizing the posterior mean of the expected log likelihood:

\[
\eta = \sum_{j=1}^{K} \pi(M_j|D) \times \int \left[ \int \log L(D^*|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, d\theta_j \right] g(D^*)\, dD^*,
\]

where $K$ is the number of averaged models. By the arguments of the previous section, a natural estimator of $\eta$ is the log likelihood itself:

\[
\hat{\eta} = \frac{1}{n} \sum_{j=1}^{K} \pi(M_j|D) \times \int \log L(D|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, d\theta_j.
\]

According to Ando's (2008) Theorem 1, the bias in $\hat{\eta}$ is approximately

\[
\begin{aligned}
n b_{BMA} \approx{} & \sum_{j=1}^{K} \pi(M_j|D) \times \int \log\{ L(D|\theta_j, M_j)\, \pi(\theta_j|M_j) \}\, \pi(\theta_j|D, M_j)\, d\theta_j \\
& - \sum_{j=1}^{K} \pi(M_j|D) \log\{ L(D|\hat{\theta}_{jn}, M_j)\, \pi_j(\hat{\theta}_{jn}|M_j) \} \\
& + \sum_{j=1}^{K} \pi(M_j|D)\, \mathrm{tr}\left\{ C_{jn}^{-1}(\hat{\theta}_{jn})\, Q_{jn}(\hat{\theta}_{jn}) \right\} + \sum_{j=1}^{K} \pi(M_j|D)\, q_j/2,
\end{aligned} \tag{4}
\]

where $q_j$ is the dimension of $\theta_j$ and $\hat{\theta}_{jn} = \mathrm{argmax}_{\theta_j}\, \pi(\theta_j|D)$ is the posterior mode of model $M_j$. The matrices $Q_{jn}(\theta)$ and $C_{jn}(\theta)$ are given by

\[
Q_{jn}(\theta_j) = \frac{1}{n} \sum_{t=1}^{n} \left\{ \frac{\partial \eta_{jn}(r_t, \theta_j)}{\partial \theta_j} \frac{\partial \eta_{jn}(r_t, \theta_j)}{\partial \theta_j'} \right\}, \qquad
C_{jn}(\theta_j) = -\frac{1}{n} \sum_{t=1}^{n} \left\{ \frac{\partial^2 \eta_{jn}(r_t, \theta_j)}{\partial \theta_j\, \partial \theta_j'} \right\}.
\]

Here $\eta_{jn}(r_t, \theta_j) = \log f(r_t|\theta_j, M_j) + \log \pi(\theta_j|M_j)/n$, and the prior distribution $\pi(\theta_j|M_j)$ may depend on $n$ as long as $\lim_{n\to\infty} n^{-1} \log \pi(\theta_j|M_j)$ is finite.

Assume that the prior is dominated by the likelihood as $n$ increases. It can be shown that (Spiegelhalter, Best, Carlin, & van der Linde, 2002, p. 591)

\[
\log L(D|\hat{\theta}_{jn}, M_j) - \int \log L(D|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, d\theta_j \approx \frac{q_j}{2}.
\]

If we further assume that one of the specified parametric models is the true model (or very close to the true model), then $Q_{jn}(\theta_j) \simeq C_{jn}(\theta_j)$, so that the trace term in Eq. (4) equals $q_j$. In this special case, as shown by Ando (2008), the asymptotic bias estimate $n\hat{b}$ simplifies to

\[
n\hat{b}_{BMA} \approx \sum_{j=1}^{K} \pi(M_j|D) \times q_j.
\]

This can be used to correct for the asymptotic bias in $\hat{\eta}$. The BPIC of a model constructed by the Bayesian model averaging method is therefore

\[
\begin{aligned}
\mathrm{BPIC}_{BMA} &= \sum_{j=1}^{K} \pi(M_j|D) \times \left\{ -2 \int \log L(D|\theta_j, M_j)\, \pi(\theta_j|D, M_j)\, d\theta_j + 2 q_j \right\} \\
&= \sum_{j=1}^{K} \pi(M_j|D) \times \mathrm{BPIC}_j.
\end{aligned} \tag{5}
\]
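Given posterior draws of the parameters, the simplified BPIC of Eq. (3) and the weighted score of Eq. (5) reduce to a Monte Carlo average plus a penalty. The sketch below is a toy illustration under stated assumptions: the function names `bpic` and `bpic_bma` are hypothetical, and the example uses a univariate normal model with known unit variance and a flat prior (not the multifactor model) so that the posterior draws are easy to generate.

```python
import numpy as np

def bpic(loglik_draws, dim_theta):
    """BPIC = -2 * (posterior mean of log L(D|theta)) + 2 * dim(theta)."""
    return -2.0 * np.mean(loglik_draws) + 2.0 * dim_theta

def bpic_bma(post_probs, bpic_scores):
    """BPIC_BMA = sum_j pi(M_j|D) * BPIC_j over the averaged models."""
    w = np.asarray(post_probs, dtype=float)
    return float(w @ np.asarray(bpic_scores, dtype=float))

# Toy model: y_i ~ N(theta, 1) with a flat prior, so theta | y ~ N(mean(y), 1/n)
rng = np.random.default_rng(0)
y = rng.normal(loc=0.5, scale=1.0, size=50)
n = y.size
theta = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=10_000)
# Log-likelihood of the full sample evaluated at each posterior draw
loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)
score = bpic(loglik, dim_theta=1)
```

To choose the window size $C$, one could compute `bpic_bma` for the models retained under each candidate value of $C$ and keep the value with the smallest score.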

We can select the predictive distribution that minimizes $\mathrm{BPIC}_{BMA}$. Note that in the most general case the models $M_j$ may have different dimensions, sampling densities, priors for parameters, prior model probabilities, and different combinations of the factors. In this paper, we only consider differences with regard to the combination of factors.

When we search for the optimal size of Occam's window $C$, we also prepare various values of $C$ and compute the $\mathrm{BPIC}_{BMA}$ score for each. The optimal value of $C$ that achieves the minimum $\mathrm{BPIC}_{BMA}$ score among the candidates is then selected. The number of


models included in the final model, $K$, is automatically determined from the selected value of $C$.

5. Simulations

Monte Carlo simulations are conducted to investigate the performance of the proposed portfolio selection methods. The two problems considered here are (1) identifying the set of factors that truly affect asset returns, and (2) achieving a portfolio selection with good performance. It is well known that the mean-variance method is much more sensitive to the mean vector than the variance matrix. The identification of true factors will therefore significantly improve the method's estimation of asset returns.

5.1. Simulation settings

Repeated random samples $\{r_t;\ t = 1, \ldots, n\}$ were generated from the multifactor model

\[
\begin{pmatrix} r_{1t} \\ r_{2t} \\ \vdots \\ r_{6t} \end{pmatrix}
=
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_6 \end{pmatrix}
+
\begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \vdots & \vdots \\ \beta_{61} & \beta_{62} \end{pmatrix}
\begin{pmatrix} f_{1t} \\ f_{2t} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{6t} \end{pmatrix},
\tag{6}
\]

where $f_t = (f_{1t}, f_{2t})'$ is a two-dimensional vector of factors generated from the multivariate normal distribution with mean 0 and variance matrix $V = (v_{ij})$ ($v_{ij} = 5$ for $i \neq j$, $v_{ij} = 1$ for $i = j$). The six-dimensional noise vector $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{6t})'$ is generated from a normal distribution with mean 0 and variance matrix $\Sigma = (\sigma_{ij})$ ($\sigma_{ij} = u_{ij}$ for $i \neq j$, $\sigma_{ij} = 2$ for $i = j$). Here $u_{ij}$ is a uniform random variable that ranges between $-2$ and 2. The unknown parameters $\alpha_j$ are uniform random variables between 1 and 2. The unknown factor loadings $\beta_j$ are uniform random variables between $-2$ and 2.

We also generated a three-dimensional set of factors $f_t = (f_{3t}, f_{4t}, f_{5t})'$ from the same distribution, and a five-dimensional set by combining the factors $(f_{1t}, f_{2t}, f_{3t}, f_{4t}, f_{5t})$. The latter serves as a pool of alternative models; since the true factor vector is $f = (f_1, f_2)$, the BPIC should be able to identify the correct pair out of $2^5 = 32$ possible combinations.
To assess the performance of the models, weights are determined based on a window of n = 60 months.

The investment proportions are determined by the mean-variance method, where the target return is set to 1.02 × (the averaged expected return). The returns from holding these portfolios are then calculated for the next month. We continue the process for 100 months. The procedure used in this paper can be summarized as follows:

1. For a given pair of factors, construct predictive distributions using $n = 60$ observations.
2. Calculate the optimal portfolio $w$ using the mean-variance method.
3. Apply the calculated portfolio weights $w$ to the actual returns observed in the next period to obtain the actual portfolio return.
4. Roll the sample forward by one period and repeat steps 1 through 3.

We compared the following three methods:

(a) Mean-variance portfolio selection, using the AIC (Akaike, 1974) for factor optimization (MV-AIC). This method ignores the estimation and model specification uncertainties.
(b) Empirical Bayes mean-variance portfolio selection with the multifactor model, using the BPIC for factor optimization (BMV-BPIC). In this method the model specification uncertainty is ignored.
(c) Bayesian model averaging mean-variance portfolio selection, where only the size of Occam's window $C$ needs to be optimized by the BPIC (BMAMV-BPIC).

When calculating the posterior mean of the log-likelihood, the number of Monte Carlo iterations is 10,000. As for parameters included in the priors $\pi(B|B_0, \Sigma, A)$ and $\pi(\Sigma|\Lambda_0, \nu_0)$, we set $\nu_0 = 10$ and $A = 10^5 \times I_p$ ($I_p$ is the $p \times p$ identity matrix). Because the true parameters $B$ and $\Sigma$ are not time-varying, we use constant parameters for $B_0$ and $\Lambda_0$. We further assume that all models are equally probable a priori ($P(M_i) = 1/32$, $i = 1, \ldots, 32$). The same settings are applied in the Bayesian model averaging method. The trading cost is set to 2%.

5.2. Results

The Sharpe ratio (Sharpe, 1994) is the mean excess return of the portfolio divided by the standard deviation of the portfolio return.
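The rolling procedure in steps 1 through 4 and the Sharpe ratio can be sketched as follows. This is a simplified stand-in, not the paper's experiment: plain sample moments replace the predictive moments, trading costs and factor selection are omitted, the data are simulated noise, and the target return is read as 1.02 times the cross-asset average of the estimated expected returns, which is one interpretation of the setting above. All function names are hypothetical.

```python
import numpy as np

def target_return_weights(mu, sigma, target):
    """Minimum-variance weights subject to w'mu = target and w'1 = 1."""
    G = np.column_stack([mu, np.ones_like(mu)])      # constraint matrix (mu, 1)
    SinvG = np.linalg.solve(sigma, G)                # Sigma^{-1} G
    return SinvG @ np.linalg.solve(G.T @ SinvG, np.array([target, 1.0]))

def rolling_backtest(returns, window=60):
    """Re-estimate on each window, then apply the weights to next-period returns."""
    realized = []
    for t in range(window, returns.shape[0]):
        sample = returns[t - window:t]                               # step 1
        mu, sigma = sample.mean(axis=0), np.cov(sample, rowvar=False)
        w = target_return_weights(mu, sigma, 1.02 * mu.mean())       # step 2
        realized.append(w @ returns[t])                              # step 3
    return np.array(realized)                                        # step 4 is the loop

def sharpe_ratio(portfolio_returns, risk_free=0.0):
    """Mean excess return divided by the standard deviation of excess returns."""
    excess = portfolio_returns - risk_free
    return excess.mean() / excess.std(ddof=1)

# Simulated returns: 160 months, 6 assets (illustrative only)
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.01, scale=0.05, size=(160, 6))
realized = rolling_backtest(returns, window=60)
sr = sharpe_ratio(realized)
```

With 160 months of data and a 60-month window, the loop produces 100 out-of-sample portfolio returns, matching the length of the evaluation period described above.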
The results are as follows: (a) MV-AIC, 0.8534 (0.3659); (b) BMV-BPIC, 0.8711 (0.4074); and (c) BMAMV-BPIC, 0.9328 (0.3999). These figures are averaged over 100 Monte Carlo trials; the numbers in parentheses are standard deviations. The Bayesian model averaging method achieved the largest Sharpe ratio. The standard mean-variance portfolio selection method is inferior to both Bayesian methods.

We also investigate variations in the portfolio over time, expressed as $\sum_{t=2}^{n} \{w_t - w_{t-1}\}^2$. By this measure, we found that the portfolio selected by the standard mean-variance method fluctuates more than that selected by the proposed Bayesian methods. From a practical perspective, it is better to manage a portfolio whose variation is small in order to limit trading costs.

The factors selected by the BPIC score were also accurate; the criterion found the correct pair over the entire period considered. Bayesian model averaging also allows us to evaluate the posterior probability of inclusion for each factor. Following Viallefont, Raftery, and Richardson (2001), we simply sum the posterior probabilities of the models that include the factor. The probability is

$$\Pr(\text{factor } f_k \text{ is included} \mid D) = \sum_{j=1}^{K} \pi(j \mid D) \times \delta(M_j, k),$$

where $\delta(M_j, k) = 1$ if the factor $f_k$ is included in $M_j$ and zero otherwise. The posterior probabilities of inclusion for the true factors were close to one, while those for the remaining factors were close to zero. We therefore expect that the proposed method will work effectively in real data analysis.

Table 1
The twelve industry portfolios.

 1  Consumer non-durables (food, tobacco, textiles, apparel, leather, toys)
 2  Consumer durables (cars, TVs, furniture, household appliances)
 3  Manufacturing (machinery, trucks, planes, paper, com printing)
 4  Energy (oil, gas, and coal extraction and products)
 5  Chemicals and allied products
 6  Business equipment (computers, software, and electronic equipment)
 7  Telephone and television transmission
 8  Utilities
 9  Wholesale, retail, and some services (laundries, repair shops)
10  Healthcare, medical equipment, and drugs
11  Finance
12  Other (mines, hotels, entertainment, etc.)
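The posterior inclusion probability of Section 5.2 and the portfolio variation measure can be sketched in a few lines. This is a minimal illustration: the representation of a model as a tuple of factor indices and the function names are assumptions made for the example, and the posterior model probabilities are taken as given rather than computed.

```python
import numpy as np

def inclusion_probabilities(models, post_probs, n_factors):
    """Sum posterior model probabilities over the models containing each factor.

    models     : list of tuples of factor indices, one tuple per model M_j
                 (a hypothetical encoding chosen for this sketch)
    post_probs : posterior probability of each model given the data D
    """
    post = np.asarray(post_probs, dtype=float)
    post = post / post.sum()          # renormalise in case of rounding
    incl = np.zeros(n_factors)
    for factors, p in zip(models, post):
        for k in factors:             # delta(M_j, k) = 1 exactly for these k
            incl[k] += p
    return incl

def weight_variation(weights):
    """Sum over t >= 2 of the squared change in portfolio weights."""
    diffs = np.diff(np.asarray(weights, dtype=float), axis=0)
    return float((diffs ** 2).sum())
```

For instance, with three candidate models `[(0, 1), (0,), (1, 2)]` and posterior probabilities 0.7, 0.2 and 0.1, factor 0 is included with probability 0.7 + 0.2 and factor 2 with probability 0.1.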

Table 2
Summary statistics for the industry portfolios listed in Table 1. These statistics refer to the monthly returns on each index. The sample period is January 1990–December 2004.

Portfolio   Mean     Std. dev.   Skewness   Kurtosis
 1          1.0109   4.0860      −0.1803    0.6516
 2          0.9674   6.0429      −0.3067    0.5853
 3          1.1376   4.7408      −0.5570    0.9632
 4          1.0194   4.6417       0.4315    0.5404
 5          1.0128   4.2820      −0.3020    0.3308
 6          1.3503   8.2185      −0.3024    0.4501
 7          0.6188   5.5854      −0.1287    1.0944
 8          0.8331   4.1334      −0.2809    0.2534
 9          1.0508   4.9159      −0.1836    0.2197
10          1.1313   4.8068      −0.0395    0.0871
11          1.3538   5.2051      −0.3986    1.7104
12          0.6644   4.7452      −0.6158    1.2977

6. Real data application

6.1. Data description

As an empirical illustration, we consider allocating wealth among 12 equity indices provided by the CRSP database, namely 12 Industry Portfolios. Each NYSE, AMEX, and NASDAQ stock is assigned to a single industrial sector based on its four-digit SIC code. Table 1 lists the activities found in each industry portfolio. Monthly returns from January 1990 to December 2004 are used. Summary statistics for the twelve return series are provided in Table 2.

This paper considers a six-factor model. The first two factors, HML and SMB, were said by Fama and French (1993) to capture variations in stock returns. The third factor is the excess return on the market (ER), which is the value-weighted return on all NYSE, AMEX, and NASDAQ stocks minus the one-month Treasury bill rate. Following Chen, Roll, and Ross (1986), the following three economic variables are also considered: Moody’s Baa corporate bond yield minus the long-term U.S. government bond yield (Baa-GB), the growth rate of the S&P’s common stock price index (SP), and the growth rate of the U.S. consumer price index (CPI). Both SP and CPI are growth rates expressed as an annual percentage. Table 3 shows the summary statistics for these factors.

Table 3
Summary statistics for six factors: Size (SMB), book-to-market (HML), excess return (ER), Moody’s Baa corporate bond yield minus the long-term U.S. government bond yield (Baa-GB), growth of the S&P’s common stock price index (SP), and growth of the US consumer price index (CPI). The sample period is January 1990 to December 2004.

Factor   Mean     Std. dev.   Skewness   Kurtosis
SMB      0.2278    3.8593      0.7911     6.9401
HML      0.3403    3.5486      0.1346     2.2823
ER       0.6583    4.2915     −0.6648     0.8343
Baa-GB   1.6561    0.4562      0.9070    −0.4940
SP       9.1786   40.9821     −0.3685     1.6784
CPI      2.6962    2.7759      0.1238     0.3796

6.2. Results

As in Section 5, we consider the following portfolio selection methods: (a) a standard mean-variance portfolio, which ignores estimation errors, (b) a portfolio based on empirical Bayes estimators, which ignores model specification error, and (c) a portfolio based on Bayesian model averaging.

The prior settings are as follows. To add some dynamic structure to the mean asset returns, we use the maximum likelihood estimate $\hat{B}$ as a proxy for the prior mean of B. Although the amount of information in the prior would seem to be the sample size n, we know that $\log \pi(B \mid B_0, \Sigma, A) = O_p(1)$. Thus, the current data D used in the likelihood contain stronger information than the prior. We set the prior variance to be $A = 10^5 \times I_p$, which also implies that the prior information is weak. We set $\Lambda_0 = \hat{\Sigma} = (R - X\bar{B})'(R - X\bar{B})/n$. We choose $\nu_0 = 15$, and assume equal prior model probabilities $P(M_i) = 1/64$. These settings apply to all three methods. The trading cost is set to 2%.
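The data-dependent prior settings of Section 6.2 can be illustrated as follows. This sketch assumes that the coefficient matrix plugged into the residual covariance is the least-squares estimate (which is also the Gaussian maximum likelihood estimate in a multivariate regression) and that p is the number of columns of the factor matrix X; both points, and the function names, are assumptions of the example rather than statements from the paper.

```python
import numpy as np

def ml_estimates(R, X):
    """Least-squares / Gaussian ML estimates for the regression R = X B + E.

    R : n x m matrix of asset returns
    X : n x p matrix of factors (include a column of ones for an intercept)
    """
    B_hat, *_ = np.linalg.lstsq(X, R, rcond=None)
    resid = R - X @ B_hat
    sigma_hat = resid.T @ resid / R.shape[0]   # divide by n, as in the text
    return B_hat, sigma_hat

def prior_settings(R, X, nu0=15, scale=1e5):
    """Collect the hyperparameters described in Section 6.2 (assumed layout)."""
    B_hat, sigma_hat = ml_estimates(R, X)
    p = X.shape[1]                             # assumption: p = number of columns of X
    return {
        "B0": B_hat,                           # prior mean proxied by the ML estimate
        "A": scale * np.eye(p),                # weak prior: A = 10^5 x I_p
        "Lambda0": sigma_hat,                  # prior scale set to the residual covariance
        "nu0": nu0,
    }
```

Because the window rolls forward each month, these quantities would be recomputed on every 60-month sample, which is how the prior mean acquires the dynamic structure mentioned in the text.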

Fig. 1. Overall performance of the three portfolios. Dotted line: mean-variance method (classical-MV), dashed line: empirical Bayes method, solid line: Bayesian model averaging (BMA).

In all frameworks, the portfolio weights are based on n = 60 months of past information. These weights are then used to calculate returns in the next month. The resulting out-of-sample period spans January 1995–December 2004. The investment proportions are determined by setting the target return to 1.02 × (the expected return). To obtain the actual portfolio returns, we apply these proportions to the actual returns observed in the next month. We then roll the sample forward by one month and repeat the procedure.

Table 4 reports the out-of-sample mean of monthly excess returns for each portfolio strategy. The standard deviation, skewness, kurtosis and Sharpe ratio are also calculated. The mean returns and Sharpe ratios in particular indicate that the proposed Bayesian methods outperformed the standard mean-variance method. Fig. 1 plots the cumulative returns. The final wealth values for each method, starting from an initial wealth $W_0 = 1$, are (a) 1.8397, (b) 1.8702, and (c) 1.8948, respectively. The Bayesian model averaging method again achieves the best performance. This is to be expected, as it attempts to account for all sources of uncertainty.

Table 4
The out-of-sample mean monthly returns (%) for each portfolio selection strategy. The standard deviation, skewness, and kurtosis of the out-of-sample monthly excess returns, as well as the Sharpe ratio (SR), are also reported.

Period                 Mean      Std. dev.   Skewness   Kurtosis   SR

Portfolio performance based on the standard method
Jan. 1995–Dec. 1997     0.0163   0.0289      −0.7763     0.2737     0.5655
Jan. 1998–Dec. 2000     0.0044   0.0491      −0.8403     1.2237     0.0909
Jan. 2001–Dec. 2003    −0.0033   0.0520      −0.2122    −0.9557    −0.0644
Jan. 2004–Dec. 2004     0.0078   0.0230      −0.3742    −0.4499     0.3396
Jan. 1995–Dec. 2004     0.0060   0.0431      −0.7785     0.8970     0.1398

Portfolio performance based on empirical Bayes estimators
Jan. 1995–Dec. 1997     0.0159   0.0288      −0.7801     0.2491     0.5529
Jan. 1998–Dec. 2000     0.0046   0.0479      −0.7959     1.1337     0.0977
Jan. 2001–Dec. 2003    −0.0028   0.0514      −0.2170    −0.9299    −0.0555
Jan. 2004–Dec. 2004     0.0079   0.0230      −0.3682    −0.4175     0.3472
Jan. 1995–Dec. 2004     0.0061   0.0424      −0.7508     0.8206     0.1445

Portfolio performance based on Bayesian model averaging
Jan. 1995–Dec. 1997     0.0162   0.0288      −0.7968     0.3008     0.5616
Jan. 1998–Dec. 2000     0.0047   0.0480      −0.7906     1.1006     0.0997
Jan. 2001–Dec. 2003    −0.0027   0.0515      −0.2218    −0.9289    −0.0534
Jan. 2004–Dec. 2004     0.0078   0.0234      −0.3865    −0.4630     0.3348
Jan. 1995–Dec. 2004     0.0062   0.0425      −0.7525     0.8084     0.1469

Fig. 2 shows the time series of portfolio weights for each asset under the Bayesian model averaging method. We can also calculate $\Pr(\text{factor } f_k \text{ is included} \mid D)$, the sum of the posterior probabilities of the models that include the corresponding factor. Fig. 3 shows this probability for each factor as a function of time. While four of the factors are stable, two are time-varying. This result indicates that it might not be ideal to construct a portfolio that always uses the same factors over a long period. Similar results were obtained by Nakatsuma (2006).

We next compute the predictive distribution and compare it to the ex-post empirical distribution. Because accurate forecasting of each asset return implies an accurate prediction of the optimal portfolio return, we shall investigate the forecasting accuracy for each asset return separately. Let $r_t = (r_{1t}, \ldots, r_{mt})'$ be the observed return and $\hat{r}_t = (\hat{r}_{1t}, \ldots, \hat{r}_{mt})'$ be the forecast obtained by a particular method. To compare the forecasting accuracy of each method, we calculated the out-of-sample mean squared forecasting error (MSFE), mean forecasting error (MFE) and mean absolute error (MAE) of each method. Specifically, these three measures are computed as

$$\mathrm{MSFE} = \frac{1}{m(t_e - t_o + 1)} \sum_{t=t_o}^{t_e} (r_t - \hat{r}_t)'(r_t - \hat{r}_t),$$

$$\mathrm{MFE} = \frac{1}{m(t_e - t_o + 1)} \sum_{t=t_o}^{t_e} \sum_{j=1}^{m} (r_{jt} - \hat{r}_{jt}),$$

$$\mathrm{MAE} = \frac{1}{m(t_e - t_o + 1)} \sum_{t=t_o}^{t_e} \sum_{j=1}^{m} |r_{jt} - \hat{r}_{jt}|.$$
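The three accuracy measures can be computed along the following lines. This is a minimal sketch that assumes the realized returns and the forecasts are stored as arrays with one row per out-of-sample month and one column per asset; the function name is illustrative.

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Out-of-sample MSFE, MFE and MAE averaged over all months and assets.

    actual, forecast : arrays of shape (number of months, m)
    """
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return {
        "MSFE": float((err ** 2).mean()),   # mean of squared errors over months and assets
        "MFE": float(err.mean()),           # signed errors, so a measure of bias
        "MAE": float(np.abs(err).mean()),   # mean of absolute errors
    }
```

Since every measure divides by the number of assets times the number of months, each one reduces to a plain mean over all entries of the error array.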

Here $t_o$ is Jan. 1995 and $t_e$ is Dec. 2004. Table 5 gives the out-of-sample MSFE, MFE and MAE for the three methods. Among the three methods, the Bayesian model averaging method predicts future returns best, as it has the smallest out-of-sample MSFE in each sub-period. The performance of the empirical Bayes method is slightly better than that of the standard method. Similar results were obtained with regard to the mean absolute error (MAE). The MFE indicates that Bayesian model averaging gives the least biased results.

Fig. 2. Portfolio weights from the Bayesian model averaging method. Panels: (a) Consumer non-durables; (b) Consumer durables; (c) Manufacturing; (d) Energy; (e) Chemicals and allied products; (f) Business equipment; (g) Telephone and television transmission; (h) Utilities; (i) Wholesale, retail, and some services; (j) Healthcare, medical equipment, and drugs; (k) Finance; (l) Other.

Fig. 4 shows the portfolio index of the business equipment sector from January 1995 through December 2004. Also plotted are the approximate 95% confidence intervals provided by each of the three methods. In the empirical Bayes case, these are obtained using the mean $\hat{\mu}_{Bayes} = (\mu_j)$, the covariance matrix $\hat{\Sigma}_{Bayes} = (\sigma_{ij})$, and a large sample property. The 95% confidence intervals for $r_{j,t+1}$ can then be approximated as $[\hat{\mu}_j - 2\sigma_{jj}, \hat{\mu}_j + 2\sigma_{jj}]$, j = 1, ..., m.
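The interval approximation can be sketched as follows. The sketch rests on one explicit assumption: the half-width is taken to be twice the predictive standard deviation, that is, twice the square root of the corresponding diagonal element of the covariance matrix. If $\sigma_{jj}$ in the text already denotes a standard deviation, the square root should be dropped. The function names are illustrative.

```python
import numpy as np

def approx_intervals(mu, sigma, z=2.0):
    """Approximate 95% intervals mu_j +/- z * sd_j for each asset j.

    Assumption: sd_j is the square root of the j-th diagonal element of the
    predictive covariance matrix `sigma`.
    """
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(np.diag(np.asarray(sigma, dtype=float)))
    return mu - z * sd, mu + z * sd

def coverage(actual, lower, upper):
    """Share of realized returns that fall inside their intervals."""
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((actual >= lower) & (actual <= upper)))
```

Comparing the empirical coverage of each method with the nominal 95% level is one simple way to check whether the intervals are too narrow or too wide.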

Note that the 95% confidence intervals of the Bayesian model averaging method are slightly wider than those of the empirical Bayes method, which in turn are wider than those of the standard method. This is to be expected, as each method in this ordering takes into account fewer sources of uncertainty than the one before it. We can also see that the 95% confidence intervals vary over time, indicating that the asset return volatility is not constant. This result is consistent with the empirical literature on ARCH (Engle, 1982) and stochastic volatility (Taylor, 1982).

Fig. 3. Posterior probability of inclusion. Panels: (a) SMB; (b) HML; (c) ER; (d) Baa-GB; (e) SP; (f) CPI.

Fig. 4. 95% confidence intervals for each of three methods, and actual returns for the business equipment sector.

Table 5
The out-of-sample mean squared forecast error (MSFE), mean forecasting error (MFE), and mean absolute error (MAE) of the three methods. The standard deviations are given in parentheses. MV-AIC: the mean-variance portfolio selection method, using the AIC for factor optimization (Akaike, 1974). BMV-BPIC: the empirical Bayes mean-variance method with a multifactor model, using BPIC. BMAMV-BPIC: the Bayesian model averaging method, where only the size of Occam’s window is optimized by BPIC.

MSFE                   MV-AIC              BMV-BPIC            BMAMV-BPIC
Jan. 1995–Dec. 1997     0.0011 (0.0018)     0.0011 (0.0018)     0.0010 (0.0016)
Jan. 1998–Dec. 2000     0.0027 (0.0042)     0.0027 (0.0042)     0.0027 (0.0041)
Jan. 2001–Dec. 2003     0.0014 (0.0020)     0.0014 (0.0020)     0.0013 (0.0019)
Jan. 2004–Dec. 2004     0.0009 (0.0012)     0.0009 (0.0012)     0.0009 (0.0013)
Jan. 1995–Dec. 2004     0.0016 (0.0028)     0.0016 (0.0028)     0.0015 (0.0027)

MFE                    MV-AIC              BMV-BPIC            BMAMV-BPIC
Jan. 1995–Dec. 1997     0.0000 (0.0339)     0.0000 (0.0339)     0.0017 (0.0335)
Jan. 1998–Dec. 2000    −0.0081 (0.0523)    −0.0081 (0.0523)    −0.0093 (0.0525)
Jan. 2001–Dec. 2003    −0.0039 (0.0382)    −0.0039 (0.0382)     0.0000 (0.0376)
Jan. 2004–Dec. 2004    −0.0016 (0.0317)    −0.0016 (0.0317)    −0.0041 (0.0311)
Jan. 1995–Dec. 2004    −0.0038 (0.0409)    −0.0038 (0.0409)    −0.0027 (0.0408)

MAE                    MV-AIC              BMV-BPIC            BMAMV-BPIC
Jan. 1995–Dec. 1997     0.0256 (0.0217)     0.0256 (0.0217)     0.0254 (0.0206)
Jan. 1998–Dec. 2000     0.0421 (0.0312)     0.0421 (0.0312)     0.0420 (0.0302)
Jan. 2001–Dec. 2003     0.0293 (0.0243)     0.0293 (0.0243)     0.0291 (0.0233)
Jan. 2004–Dec. 2004     0.0245 (0.0187)     0.0245 (0.0187)     0.0240 (0.0189)
Jan. 1995–Dec. 2004     0.0316 (0.0261)     0.0316 (0.0261)     0.0315 (0.0254)

6.3. Robustness check

In this section we examine the implications of alternative prior distributions, the length of horizon n, and the target return by modifying the settings used in the previous section. In particular, we try changing the prior settings to $A = 10^7 \times I_p$ and $\nu_0 = 20$, while the target return is still 1.02 × (the averaged expected return). The trading cost is set to 2.5%. We then applied the three portfolio selection methods over two horizons: n = 50 and n = 60.

The results are shown in Table 6. For ease of comparison, Table 6 also includes the mean and standard deviation of the out-of-sample monthly returns and the Sharpe ratio for each portfolio strategy. Although the ranking of the methods is unchanged, Table 6 shows that the quality of their results does depend on the length of the time horizon n. From the out-of-sample mean squared forecast errors (MSFE), we can see that the predictive performance is better when n = 60 than when n = 50. Overall, the Bayesian model averaging method performs best.

Table 6
The out-of-sample mean of monthly excess returns (%) for each portfolio strategy is reported for the whole prediction period, Jan. 1995–Dec. 2004. The standard deviation (in parentheses), Sharpe ratio (SR), and out-of-sample mean squared forecasting error (MSFE) are also reported. See Table 5 for the model abbreviations.

n     Mean              SR       MSFE

(a): MV-AIC
50    0.0041 (0.0437)   0.0937   0.0017 (0.0029)
60    0.0053 (0.0433)   0.1245   0.0016 (0.0028)

(b): BMV-BPIC
50    0.0040 (0.0432)   0.0938   0.0017 (0.0029)
60    0.0053 (0.0426)   0.1261   0.0016 (0.0028)

(c): BMAMV-BPIC
50    0.0045 (0.0431)   0.1048   0.0016 (0.0028)
60    0.0058 (0.0426)   0.1373   0.0016 (0.0027)

7. Conclusion

This paper proposed two Bayesian portfolio selection methods based on a multifactor model and the Bayesian predictive information criterion. Both require several parameters to be optimized within the asset return prediction model. In the empirical Bayes method, the multifactor model requires a set of adequate factors that contribute most to the asset return prediction. In Bayesian model averaging, one must also select a set of adequate models to include in the predictive distribution. Factor and model selection is still an open problem. To address this issue, we introduced the Bayesian predictive information criterion (Ando, 2007, 2008). This measure gives useful advice regarding which factors and models should be given close attention. As is demonstrated by the numerical examples, the proposed Bayesian methods perform very well when the BPIC is used.

This research can be extended in a number of directions. First, it is known that asset returns are not i.i.d. normal; in particular, they might be fat-tailed. We could extend the standard multifactor model to allow for correlated and fat-tailed errors. Second, the empirical analysis of this paper focused on the Fama-French three-factor model and three other economic variables. The proposed methods should be tested on other datasets, and other pricing factors could be considered as well.


Finally, this work has focused on stock market portfolios. It would be worthwhile to examine other markets: cash, bonds, currency exchange, options, futures, commodities, real estate, mutual funds, and hedge funds. We would like to investigate these problems in a future paper.

Acknowledgements

I am grateful to an Associate Editor and a referee for helpful comments on an earlier draft that greatly improved this paper. The author also would like to thank Professors Stephen Brown, Joseph Pagliari and Arnold Zellner, and Dr. John Guerard for helpful discussions and suggestions. The author expresses deep gratitude to Professor Takao Kobayashi for his helpful advice. This study was supported in part by a Grant-in-Aid for Young Scientists (B) (No.18700273).

References

Aguilar, O., & West, M. (2000). Bayesian dynamic factor models and variance matrix discounting for portfolio allocation. Journal of Business and Economic Statistics, 18, 338–357.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC–19, 716–723.
Ando, T. (2007). Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika, 94, 443–458.
Ando, T. (2008). Bayesian model averaging and Bayesian predictive information criterion for model selection. Journal of the Japan Statistical Society, 38(2), 243–257.
Bawa, V., Brown, S., & Klein, R. (1979). Estimation risk and optimal portfolio choice. New York: North-Holland.
Brown, S. (1976). Optimal portfolio choice under uncertainty: A Bayesian approach. Ph.D. dissertation. University of Chicago.
Carvalho, C. M., & West, M. (2007). Dynamic matrix-variate graphical models. Bayesian Analysis, 2, 69–98.
Chen, N. F., Roll, R., & Ross, S. A. (1986). Economic forces and the stock market. Journal of Business, 59, 383–403.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313–1321.
Dumas, B., & Jacquillat, B. (1990). Performance of currency portfolios chosen by a Bayesian technique: 1967–1985. Journal of Banking and Finance, 14, 539–558.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987–1008.
Fama, E., & French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.
Frost, A., & Savarino, J. (1986). An empirical Bayes approach to efficient portfolio selection. Journal of Financial and Quantitative Analysis, 21, 293–305.
Greyserman, A., Jones, D. H., & Strawderman, W. E. (2006). Portfolio selection using hierarchical Bayesian analysis and MCMC methods. Journal of Banking and Finance, 30, 669–678.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
Jobson, J. D., & Korkie, B. (1980). Estimation for Markowitz efficient portfolios. Journal of the American Statistical Association, 75, 544–554.
Jorion, P. (1986). Bayes–Stein estimation for portfolio analysis. Journal of Financial and Quantitative Analysis, 21, 279–291.
Leamer, E. E. (1978). Specification searches: Ad hoc inference with non-experimental data. New York: Wiley.
Madigan, D., & Raftery, A. E. (1994). Model selection and accounting for model uncertainty in graphical models using Occam’s window. Journal of the American Statistical Association, 89, 1535–1546.
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7, 77–91.
Nakatsuma, T. (2006). A Bayesian model averaging approach for portfolio selection. In Proceeding of international workshop on Bayesian statistics and applied econometrics (pp. 229–266).
Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56, 3–48.
Pastor, L. (2000). Portfolio selection and asset pricing models. Journal of Finance, 55, 179–223.
Pastor, L., & Stambaugh, R. F. (2000). Comparing asset pricing models: An investment perspective. Journal of Financial Economics, 56, 335–381.
Polson, N., & Tew, B. (2000). Bayesian portfolio selection: An empirical analysis of the S&P500 index 1970–1996. Journal of Business and Economic Statistics, 18, 164–173.
Putnam, B., & Quintana, J. (1994). New Bayesian statistical approaches to estimating and evaluating models of exchange rates determination. In Proceedings of the ASA section on Bayesian statistical science. American Statistical Association.
Quintana, J. (1992). Optimal portfolios of forward currency contracts. In J. Berger, et al., (Eds.), Bayesian statistics IV (pp. 753–762). Oxford: Oxford University Press.
Quintana, J., & Putnam, B. (1996). Debating currency markets efficiency using multiple-factor models. In Proceedings of the ASA section on Bayesian statistical science. American Statistical Association.
Quintana, J., & West, M. (1987). Multivariate time series analysis: New techniques applied to international exchange rate data. The Statistician, 36, 275–281.
Rossi, P., Allenby, G., & McCulloch, R. (2005). Bayesian statistics and marketing. New York: Wiley.
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19, 425–442.
Sharpe, W. F. (1994). The Sharpe ratio. Journal of Portfolio Management, 20, 49–58. Fall.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583–639.
Taylor, S. J. (1982). Financial returns modelled by the product of two stochastic processes – A study of the daily sugar prices 1961–75. In O. D. Anderson (Ed.), Time series analysis: Theory and practice 1 (pp. 203–226). Amsterdam: North-Holland.
Viallefont, V., Raftery, A. E., & Richardson, S. (2001). Variable selection and Bayesian model averaging in case-control studies. Statistics in Medicine, 20, 3215–3230.
Xia, Y. (2001). Learning about predictability: The effects of parameter uncertainty on dynamic asset allocation. Journal of Finance, 56, 205–246.
Zellner, A., & Chetty, K. (1965). Prediction and decision problems in regression models from the Bayesian point of view. Journal of the American Statistical Association, 60, 608–616.