Finance Research Letters xxx (xxxx) xxxx
Contents lists available at ScienceDirect
Finance Research Letters journal homepage: www.elsevier.com/locate/frl
Solving the index tracking problem based on a convex reformulation for cointegration
Leonardo Riegel Sant'Anna a,⁎, Alan Delgado de Oliveira a, Tiago Pascoal Filomena a, João Frois Caldeira b
a School of Business, Federal University of Rio Grande do Sul, 855 Washington Luis Street, Porto Alegre/RS, ZIP Code 90010-460, Brazil
b Department of Economics, Federal University of Santa Catarina, Campus Universitário Reitor João David Ferreira Lima, Florianópolis/SC, ZIP Code 88040-900, Brazil
ARTICLE INFO
Keywords: Mixed-integer non-linear optimization; Cointegration; Index tracking
JEL classification: C58; C61; G11
ABSTRACT
This paper derives a mixed-integer non-linear optimization (MINLP) problem from the cointegration methodology and checks its convexity. We apply this approach to solve the index tracking (IT) problem using datasets from two distinct stock markets. The MINLP reformulation incorporates the stock selection procedure and is solved with a branch-and-cut algorithm. Compared with traditional cointegration-based IT portfolios, the resulting portfolios show lower turnover, which implies lower transaction costs over time, and lower in-sample and out-of-sample tracking error in most instances.
1. Introduction

Index tracking (IT) is a passive investment strategy that aims to mimic the performance of a market benchmark, such as the S&P 100 or the Dow Jones Industrial Average (DJIA) in the US market. In selecting a tracking portfolio, the general goal is to minimize the tracking error (TE), i.e., the difference between index and portfolio returns over time. The most straightforward approach is full replication of the target index, i.e., buying all the stocks that compose the index at their respective weights. Nonetheless, such a strategy incurs higher transaction and management costs, especially for broader indexes (for example, the S&P 100), which comprise a large number of stocks. Therefore, a key feature in solving the IT optimization problem is to impose a cardinality limit on the number of stocks in the tracking portfolio. This cardinality constraint enables us to build portfolios with fewer stocks that perform similarly to the target index, thereby reducing the associated costs.

Among the several approaches that have been used to solve IT, one of the most explored is cointegration (for instance, Alexander and Dimitriu, 2002; 2005; Dunis and Ho, 2005). Alexander and Dimitriu (2005) outlined two main benefits of the cointegration methodology: (i) it is appropriate for modeling the long-run dynamics of asset prices; (ii) when applied to portfolio selection, it usually produces portfolios with enhanced weight stability, which would be explained by the capacity of this method (relative to other approaches) to extract the information contained in stock prices to define each portfolio (Huang et al., 2015). Furthermore, as the TE has a mean-reverting feature because the price difference between the market benchmark and the portfolio must be stationary,
⁎ Corresponding author. E-mail addresses: [email protected] (L.R. Sant'Anna), [email protected] (A.D. de Oliveira), [email protected] (T.P. Filomena), [email protected] (J.F. Caldeira).
https://doi.org/10.1016/j.frl.2019.101356 Received 5 September 2018; Received in revised form 15 October 2019; Accepted 10 November 2019 1544-6123/ © 2019 Elsevier Inc. All rights reserved.
Please cite this article as: Leonardo Riegel Sant'Anna, et al., Finance Research Letters, https://doi.org/10.1016/j.frl.2019.101356
cointegrated IT portfolios gain relevance for IT investing strategies: their composition is based on the long-run relationship between the benchmark and the prices of the stocks included in the portfolio. Cointegration properties have accordingly been applied to several distinct investment strategies (Avellaneda and Lee, 2010), including index tracking, long-short, and pairs trading (Arshanapalli and Nelson, 2008; Caldeira and Moura, 2013; Jacobs and Weber, 2015; Bowen and Hutchinson, 2016). Hence, the versatility of cointegration, combined with its capacity to exploit temporary price distortions and create self-financed market-neutral portfolios, makes this method relevant for both academics and investors.

Nevertheless, cointegration applied to portfolio selection cannot perform variable selection to define the portfolio components, which forces an ex-ante choice of the stocks that comprise each portfolio. This stock selection could be made, for instance, by using the stocks with the largest weights in the index composition (Alexander and Dimitriu, 2005; Dunis and Ho, 2005). Still, even though such a choice can be practical for smaller indexes, such as the DJIA (formed by 30 stocks), it might be a tricky approach for broader indexes such as the S&P 100 (composed of approximately 100 stocks), which have a considerably lower concentration.

So, in this paper we derive a convex mixed-integer non-linear optimization (MINLP) reformulation of the IT cointegration methodology. We then embed a cardinality constraint endogenously in the reformulated model and solve the optimization with a hybrid outer-approximation-based branch-and-cut algorithm (Bonami et al., 2008). Thus, the stock selection procedure is determined through a cardinality constraint that limits the size of the tracking portfolios.
Furthermore, we present an empirical analysis to compare the results of the MINLP formulation vis-à-vis a random selection process applied to portfolio selection based on cointegration. We choose two market benchmarks for the empirical tests: the S&P 100, which is one of the main benchmarks in the US market, and the Ibovespa index, which is the primary index for the emerging Brazilian stock market. The empirical experiments verify the good-quality solutions provided by the MINLP formulation (in comparison with the random selection process) regarding in-sample and out-of-sample tracking error. In our analysis, the MINLP model produced portfolios with lower average monthly turnover in most cases, which implies lower transaction costs over time and supports the use of this formulation for a passive investment strategy such as IT (since passive strategies focus on maintaining lower costs, especially in the long run).

This article is organized as follows. Section 2 describes the use of cointegration for IT. Section 3 presents the convex reformulation for cointegration and describes the MINLP model. Finally, Section 4 reports the empirical analysis, and Section 5 concludes the study.

2. Cointegration approach for index tracking

Cointegration is a statistical property whereby a set of time series integrated of order 1, i.e., I(1), can be linearly combined to produce a single time series that is stationary, I(0). Formally, if $S_{1t}, S_{2t}, \ldots, S_{nt}$ is a set of I(1) time series, and if there are nonzero real numbers $\beta_1, \beta_2, \ldots, \beta_n$ such that $\beta_1 S_{1t} + \beta_2 S_{2t} + \ldots + \beta_n S_{nt}$ is an I(0) series, then $S_{1t}, S_{2t}, \ldots, S_{nt}$ are said to be cointegrated (Engle and Granger, 1987). To solve the IT problem using cointegration, we estimate the following linear regression:

$$\log(P_t) = \beta_0 + \sum_{i=1}^{N} \beta_i \log(p_{i,t}) + \epsilon_t \tag{1}$$
where $P_t$ is the price of the market benchmark on the $t$-th day, $t \in 1..T$, $p_{i,t}$ is the price of the $i$-th stock on day $t$, $i \in 1..N$, and $\epsilon_t$ is a zero-mean error term. After estimating Eq. (1), the tracking portfolio is obtained by normalizing the coefficients $\beta_i$, $i \in 1..N$, to sum to one.

The first step to estimate each portfolio consists of estimating Eq. (1). The second step is to test the residuals $\hat{\epsilon}_t$ of that regression for a unit root, which can be done using the Augmented Dickey–Fuller test. If the residual series $\hat{\epsilon}_t$ is found to be I(0), we confirm that the variables are cointegrated, and the portfolio obtained is a valid tracking portfolio.

As mentioned in the Introduction, a drawback of cointegration is the ex-ante stock selection needed to define the portfolio components, since the OLS approach does not perform variable selection. To mitigate this limitation and select each cointegrated portfolio, we follow a random selection approach.¹ Given a dataset with $N$ stocks for each in-sample time frame (as described in Section 4), we first randomly select $s$ stocks (where $s$ is the size of each portfolio) and then calculate the cointegrated portfolio for this subset of $s$ stocks using the methodology described in this section. This process is repeated $M$ times, i.e., we form $M$ distinct candidate portfolios based on cointegration for each in-sample time frame. We then choose the portfolio whose estimation of Eq. (1) resulted in the smallest mean squared error.

To estimate the cointegrated portfolios, we use non-negative least squares (NNLS) instead of OLS, thereby avoiding negative coefficients, i.e., short positions, since short selling is often associated with liquidity and transaction cost issues (Kim et al., 2016).² The same no-short-selling constraint is also applied in the MINLP model, as described in Section 3.1.
¹ For more detail on this approach, we refer the reader to Sant’Anna et al. (2017).
² In contrast with the random selection process for each cointegrated portfolio, all portfolios estimated using the MINLP model have their components endogenously selected, according to the description of the model in Section 3.1.
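The random selection procedure of Section 2 can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the authors' Matlab implementation: the function names (`fit_nnls`, `random_selection`), the simulated price paths, and the small values of `s` and `M` are assumptions made for the example. The non-negative fit uses `scipy.optimize.nnls`; as a further assumption, the intercept $\beta_0$ is left unconstrained by centering the data before the fit. The Augmented Dickey–Fuller check on the residuals is not included here.

```python
import numpy as np
from scipy.optimize import nnls


def fit_nnls(log_index, log_stocks):
    """Fit Eq. (1) with beta_i >= 0 and a free intercept beta_0 (assumption).

    Centering both sides removes beta_0 from the least-squares problem;
    it is recovered afterwards from the sample means.
    """
    x_mean = log_stocks.mean(axis=0)
    y_mean = log_index.mean()
    beta, _ = nnls(log_stocks - x_mean, log_index - y_mean)
    beta0 = y_mean - x_mean @ beta
    residuals = log_index - beta0 - log_stocks @ beta
    return beta0, beta, float(np.mean(residuals ** 2))


def random_selection(log_index, log_stocks, s, M, seed=0):
    """Draw M random subsets of s stocks; keep the fit with the smallest MSE."""
    rng = np.random.default_rng(seed)
    n_stocks = log_stocks.shape[1]
    best = None
    for _ in range(M):
        subset = np.sort(rng.choice(n_stocks, size=s, replace=False))
        beta0, beta, mse = fit_nnls(log_index, log_stocks[:, subset])
        if best is None or mse < best["mse"]:
            best = {"subset": subset, "beta0": beta0, "beta": beta, "mse": mse}
    total = best["beta"].sum()
    if total <= 0:
        raise ValueError("all coefficients are zero; no portfolio can be formed")
    # Tracking-portfolio weights: coefficients normalized to sum to one.
    best["weights"] = best["beta"] / total
    return best


# Synthetic example (hypothetical data): six random-walk log prices and an
# "index" driven by stocks 0 and 1 plus small stationary noise.
rng = np.random.default_rng(42)
T, N = 300, 6
log_stocks = np.log(50.0) + np.cumsum(rng.normal(0.0, 0.01, size=(T, N)), axis=0)
log_index = 0.2 + 0.5 * log_stocks[:, 0] + 0.5 * log_stocks[:, 1]
log_index = log_index + rng.normal(0.0, 0.001, size=T)

best = random_selection(log_index, log_stocks, s=2, M=200)
print(best["subset"], best["weights"].round(3), best["mse"])
```

With only 15 possible pairs, 200 draws almost surely visit every subset; with 101 stocks and portfolios of 15, the number of subsets is astronomically larger, which is why the random procedure only samples the search space.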
3. Optimization model: Convex reformulation

In this section, we define a convex mixed-integer non-linear optimization (MINLP) reformulation of the IT cointegration methodology. We show the convexity of our reformulation and describe the algorithmic strategy used for its optimization.

3.1. Mixed-integer Non-linear Model (MINLP)

To obtain the convex formulation, we rearrange Eq. (1) so that the random term $\epsilon_t$ is determined as follows:

$$\epsilon_t = -\beta_0 + \log(P_t) - \sum_{i=1}^{N} \beta_i \log(p_{i,t}) \tag{2}$$
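Then, squaring both sides and summing over all time points, we obtain:

$$
\begin{aligned}
\sum_{t=1}^{T} \epsilon_t^2 &= \sum_{t=1}^{T} \Big( -\beta_0 + \log(P_t) - \sum_{i=1}^{N} \beta_i \log(p_{i,t}) \Big)^2 \\
&= \sum_{t=1}^{T} \Big[ \beta_0^2 + \log(P_t)^2 + \Big( \sum_{i=1}^{N} \beta_i \log(p_{i,t}) \Big)^2 - 2\beta_0 \log(P_t) + 2\beta_0 \sum_{i=1}^{N} \beta_i \log(p_{i,t}) - 2\log(P_t) \sum_{i=1}^{N} \beta_i \log(p_{i,t}) \Big]
\end{aligned}
\tag{3}
$$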
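Finally, we define the MINLP optimization as the minimization of the sum of squared residuals in Eq. (3) subject to Constraints (4)–(7):

$$\beta_i \le \mu_i, \quad \forall i \in 1..N \tag{4}$$

$$\sum_{i=1}^{N} \mu_i \le K \tag{5}$$

$$\mu_i \in \{0, 1\}, \quad \forall i \in 1..N \tag{6}$$

$$\beta_i \in \mathbb{R}_{+}, \quad \forall i \in 1..N \tag{7}$$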
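where $\mu_i$ is a binary variable equal to one if the $i$-th stock is included in the portfolio, the parameter $K$ sets the maximum number of stocks in each portfolio, and we define $\beta_i \ge 0$ to avoid short positions in every portfolio (similarly, we impose the same constraint on the cointegration analysis by using NNLS).

3.2. Convexity analysis

Writing Eq. (3) in matrix form (Boyd and Vandenberghe, 2004), we obtain:

$$
y = \begin{bmatrix} \log(P_1) \\ \log(P_2) \\ \vdots \\ \log(P_T) \end{bmatrix}, \quad
A = \begin{bmatrix}
1 & \log(p_{1,1}) & \log(p_{2,1}) & \cdots & \log(p_{N,1}) \\
1 & \log(p_{1,2}) & \log(p_{2,2}) & \cdots & \log(p_{N,2}) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \log(p_{1,T}) & \log(p_{2,T}) & \cdots & \log(p_{N,T})
\end{bmatrix}, \quad
x = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_N \end{bmatrix}, \quad
b = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_T \end{bmatrix}
$$

which can be written as:

$$y = Ax + b \tag{8}$$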
where the log-price matrix $\log(p_{i,t})$, $i \in 1..N$, $t \in 1..T$, augmented with a column of ones for the intercept, forms the matrix $A_{T \times (N+1)}$, $x_{(N+1) \times 1}$ is a vector of unknown parameters, and $b_{T \times 1}$ is a vector of unobserved disturbances. Reorganizing Eq. (8), we have:

$$\|b\|_2^2 = \|y - Ax\|_2^2 \tag{9}$$

Therefore, we obtain the first and second derivatives $\nabla \|b\|_2^2 = -2A^\top (y - Ax)$ and $\nabla^2 \|b\|_2^2 = 2A^\top A$. Hence, because a twice-differentiable function with convex domain is convex if and only if its Hessian is positive semidefinite, i.e., $\nabla^2 \|b\|_2^2 \succeq 0$ for all $x$ in the domain, we have a convex optimization model for any $A$, since $A^\top A \succeq 0$.
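As a sanity check on the formulation and on the convexity argument, a tiny instance of the model can be solved by brute force. The sketch below is not the branch-and-cut procedure used in this paper: it enumerates every support of size $K$ (which fixes the binary variables $\mu_i$), solves the remaining convex least-squares subproblem with bounds, and keeps the best support. It assumes that Constraint (4) reads $\beta_i \le \mu_i$, so that selected coefficients lie in [0, 1], and it treats $\beta_0$ as unconstrained. The function names and the synthetic data are hypothetical; the bounded fit uses `scipy.optimize.lsq_linear`.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import lsq_linear


def fit_support(y, X, support):
    """Continuous subproblem for a fixed support (mu_i = 1 on `support`).

    Assumes 0 <= beta_i <= 1 on the support and a free intercept beta_0,
    which is eliminated by centering and recovered from the means.
    """
    Xs = X[:, list(support)]
    x_mean, y_mean = Xs.mean(axis=0), y.mean()
    result = lsq_linear(Xs - x_mean, y - y_mean, bounds=(0.0, 1.0))
    beta = result.x
    beta0 = y_mean - x_mean @ beta
    sse = float(np.sum((y - beta0 - Xs @ beta) ** 2))
    return beta0, beta, sse


def solve_by_enumeration(y, X, K):
    """Enumerate all supports of size K and return the one with the lowest SSE.

    Size-K supports also cover smaller portfolios, because a selected stock
    may still receive beta_i = 0.
    """
    best = None
    for support in combinations(range(X.shape[1]), K):
        beta0, beta, sse = fit_support(y, X, support)
        if best is None or sse < best[3]:
            best = (support, beta0, beta, sse)
    return best


# Hypothetical data: six random-walk log prices; the index loads on stocks 1 and 4.
rng = np.random.default_rng(7)
T, N, K = 250, 6, 2
X = np.log(30.0) + np.cumsum(rng.normal(0.0, 0.01, size=(T, N)), axis=0)
y = 0.1 + 0.6 * X[:, 1] + 0.4 * X[:, 4] + rng.normal(0.0, 0.001, size=T)

# Convexity check from Section 3.2: the Hessian 2 A^T A has no negative
# eigenvalues (up to floating-point round-off).
A = np.column_stack([np.ones(T), X])
eigenvalues = np.linalg.eigvalsh(2.0 * A.T @ A)
hessian_is_psd = eigenvalues.min() >= -1e-8 * eigenvalues.max()

support, beta0, beta, sse = solve_by_enumeration(y, X, K)
print(support, beta.round(3), round(float(beta0), 3), bool(hessian_is_psd))
```

Enumeration is only workable for toy sizes: choosing 15 stocks out of 101 already yields far too many supports to visit, which motivates a dedicated MINLP algorithm.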
3.3. Algorithm procedure

To solve the MINLP optimization, we use the Bonmin solver, which is an open-source code for general MINLP problems with convex formulation, along with the B-Hyb approach, which is a hybrid outer-approximation/nonlinear-programming-based branch-and-cut algorithm. The method consists of performing outer approximations and subproblem relaxations of Eqs. (3)–(7) to compute lower and upper bounds in a flexible branch-and-cut framework. The outer approximation algorithm rests on linearizations of (3)–(7) at
different points, thus defining a relaxed MILP (mixed-integer linear problem) version of the problem. This strategy is employed on selected nodes of the branch-and-cut search tree (for more detail, see Bonami and Lee, 2007; Bonami et al., 2008). We also use most-fractional selection to guide our branching strategy, together with a warm-start algorithm with interior points and an initial outer-approximation decomposition to improve the possible initial solutions.

4. Empirical results

In the empirical application, we consider two indexes: the S&P 100 (US market; dataset with 101 stocks) and the Ibovespa (main Brazilian benchmark; dataset with 55 stocks). Both datasets contain daily closing stock prices adjusted for splits and dividends, from January 2010 to September 2017, for a total of 1921 data points (each data point refers to a trading day). The tracking portfolios are limited to 15 stocks for the S&P 100 and 12 stocks for the Ibovespa and are estimated using 480 in-sample data points (Alexander and Dimitriu, 2005). Additionally, we consider quarterly, semi-annual, and annual updating frequencies, i.e., portfolios are re-estimated every 60, 120, and 240 trading days in a rolling-horizon framework.³ Portfolios using optimization are estimated via the minimization of Eq. (3) constrained by (4)–(7). Meanwhile, cointegrated portfolios are estimated as described in Section 2, with parameter M = 15,000.⁴ To evaluate the results, we define the tracking error (TE) as follows (Beasley et al., 2003):
$$TE = \frac{1}{T} \left( \sum_{t=1}^{T} \left( r_t^p - R_t \right)^2 \right)^{1/2} \tag{10}$$
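The tracking error measure can be illustrated with a short Python sketch. It assumes that Eq. (10) takes the Beasley et al. (2003) form with exponent 2, i.e., the square root of the summed squared return differences divided by $T$, with $r_t^p$ read as the portfolio return and $R_t$ as the index return on day $t$. The function name and the sample returns are hypothetical.

```python
import math


def tracking_error(portfolio_returns, index_returns):
    """Tracking error as read from Eq. (10).

    Square root of the summed squared return differences, divided by the
    number of observations T (assumed form, following Beasley et al., 2003).
    """
    if len(portfolio_returns) != len(index_returns):
        raise ValueError("return series must have the same length")
    T = len(index_returns)
    if T == 0:
        raise ValueError("return series must not be empty")
    squared = sum((rp - R) ** 2 for rp, R in zip(portfolio_returns, index_returns))
    return math.sqrt(squared) / T


# Hypothetical daily returns for a four-day out-of-sample window.
portfolio = [0.010, -0.020, 0.015, 0.000]
index = [0.012, -0.018, 0.010, 0.001]
te = tracking_error(portfolio, index)
print(f"TE = {te:.6f}")
```

Under this reading, a fixed typical daily deviation gives a TE that shrinks roughly with the square root of the window length, so values computed over 60, 120, and 240 trading days are not directly comparable across updating frequencies.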
The results are presented in Table 1 and Figs. 1 and 2 and are analyzed regarding (1) Average Annual Return, (2) Cumulative Return, (3) Portfolios’ Average TE, (4) Annual Volatility, (5) Correlation, and (6) Average Monthly Turnover, defined as follows:

$$\sum_{p=2}^{np} \sum_{i=1}^{N} \frac{\left| x_i^p - x_i^{p-1} \right|}{2} \times \frac{1}{f} \tag{11}$$
where $np$ is the number of portfolios estimated per portfolio size and updating frequency (we form 24 portfolios with quarterly frequency, 12 with semi-annual frequency, and 6 with annual frequency), $p$ and $p - 1$ are the time instants at which sequential rebalancings were carried out, and $f$ equals 3 for quarterly rebalancing, 6 for semi-annual rebalancing, and 12 for annual rebalancing.

First, Table 1 presents the descriptive results per portfolio selection strategy (cointegration and optimization) and portfolio updating frequency, where “optimization” refers to the results obtained using the MINLP formulation and “cointegration” refers to the results obtained with the cointegration methodology from Section 2. Overall, both strategies yield similar average annual and cumulative returns, regardless of the index adopted. Nonetheless, the two methods differ in average monthly turnover: optimization resulted in lower turnover in all cases (except for tracking the Ibovespa with annual updating frequency). Therefore, although both methods generate portfolios with similar performance, the optimization approach resulted in lower turnover, implying lower transaction costs over time, which is an advantage for passive strategies such as IT.

Furthermore, because a total of 24 portfolios can be formed with quarterly updating frequency, we evaluate the TE in-sample and out-of-sample for each of the 24 portfolios built per numerical approach (optimization and cointegration). The results are in Figs. 1 and 2, which show that the average tracking error using optimization was smaller than the TE obtained using cointegration in all cases. In the case of the S&P 100, optimization resulted in smaller TE in-sample in 18 portfolios (75%) and smaller TE out-of-sample in 14 portfolios (58.3%).
For the Ibovespa, the optimization method resulted in 19 portfolios (79.2%) with smaller TE in-sample and 18 portfolios (75%) with smaller TE out-of-sample. These findings indicate that the initial solutions provided by a numerical optimization model tend to be more efficient than those of a random selection process for the traditional cointegration approach.

³ Due to the lack of information regarding the historical composition of each of the two indexes, we could not reconstruct them back to 2010. Therefore, the experimental analysis carried out in this paper was based on the structure of each index in September 2017. As a result of this empirical setting, the tracking performance of each portfolio tends to be harmed. Nonetheless, we emphasize that our focus is on comparing a method that selects each portfolio by random choice with a numerical approach with endogenous portfolio selection. Thus, we understand that not reconstructing the indexes does not alter our central conclusions in the empirical tests.
⁴ All tests were computed using an Intel® Xeon™ E5-2670 @ 2.6GHz 8-core processor with 64GB RAM and Linux Fedora. The tests using optimization were computed using the AMPL programming language and the Bonmin solver. The tests using cointegration were computed using Matlab®. All portfolios were estimated with a maximum computing time of 60 seconds. Regarding cointegration, we chose M = 15,000, as this was the largest number of simulations we could compute in 60 seconds.

5. Conclusions

In this article, we explored the cointegration approach by deriving a MINLP optimization model and used both cointegration and optimization approaches to solve the index tracking problem. We described cointegration, introduced the MINLP optimization, and
Table 1
Descriptive results per index and portfolio selection method.¹

Index: S&P 100

| Descriptive Results | Index | Optimization, Quarterly | Optimization, Semiannual | Optimization, Annual | Cointegration, Quarterly | Cointegration, Semiannual | Cointegration, Annual |
|---|---|---|---|---|---|---|---|
| Average Annual Return | 11.1% | 9.7% | 11.3% | 13.8% | 12.6% | 13.0% | 14.5% |
| Cumulative Return | 106.0% | 88.0% | 108.0% | 143.0% | 124.7% | 131.6% | 152.3% |
| Portfolios’ Avg Tracking Error | – | 0.033% | 0.022% | 0.015% | 0.035% | 0.025% | 0.018% |
| Annual Volatility | 12.5% | 13.2% | 13.2% | 13.2% | 13.5% | 13.5% | 13.7% |
| Correlation | – | 0.948 | 0.956 | 0.960 | 0.943 | 0.945 | 0.942 |
| Average Monthly Turnover | – | 16.6% | 9.0% | 12.0% | 25.1% | 13.0% | 13.9% |

Index: Ibovespa

| Descriptive Results | Index | Optimization, Quarterly | Optimization, Semiannual | Optimization, Annual | Cointegration, Quarterly | Cointegration, Semiannual | Cointegration, Annual |
|---|---|---|---|---|---|---|---|
| Average Annual Return | 4.8% | 6.3% | 7.0% | 6.9% | 8.7% | 7.0% | 6.9% |
| Cumulative Return | 25.3% | 32.5% | 45.7% | 40.9% | 56.4% | 40.7% | 36.6% |
| Portfolios’ Avg Tracking Error | – | 0.057% | 0.037% | 0.027% | 0.062% | 0.045% | 0.032% |
| Annual Volatility | 23.0% | 24.5% | 24.2% | 24.6% | 24.1% | 23.7% | 24.1% |
| Correlation | – | 0.952 | 0.961 | 0.960 | 0.944 | 0.940 | 0.945 |
| Average Monthly Turnover | – | 13.4% | 7.9% | 10.4% | 20.6% | 12.0% | 9.3% |

¹ Average Annual Return refers to the average of the cumulative returns for each year from 2011 to 2017. Cumulative Return refers to the return calculated cumulatively during the entire out-of-sample period. Portfolios’ Average TE refers to the average of the tracking error calculated for each portfolio according to Eq. (10). Annual Volatility refers to $\sigma \times \sqrt{252}$, where $\sigma$ is the standard deviation of daily returns observed during the entire out-of-sample period. Correlation refers to the correlation between daily returns of each strategy and daily returns of the index during the entire out-of-sample period. Average Monthly Turnover is calculated according to Eq. (11).
Fig. 1. Tracking error per portfolio – S&P 100.
verified its convexity. Then, the empirical analysis was carried out with two market benchmarks (the S&P 100 in the US and the Ibovespa index in Brazil), and the results showed the capacity of the MINLP optimization to produce good-quality solutions concerning tracking error in- and out-of-sample, compared with the random process employed with cointegration.

Overall, from a theoretical viewpoint, our findings indicate that, although a random selection process for tracking portfolio components can generate satisfactory results, an optimization model that makes the portfolio selection endogenous to the solving process tends to produce better results consistently. Also, given that previous literature shows cointegration to be widely used in the financial markets, our results indicate that, even though this method can be valuable in general, its challenge concerning portfolio selection can be overcome through the use of a simplified optimization model, which solves this difficulty while maintaining the qualities of the cointegration methodology.
Fig. 2. Tracking error per portfolio – Ibovespa.
Declaration of Competing Interest

We confirm that there are no interests to declare regarding this manuscript. Declarations of interest: none.

Acknowledgments

We kindly acknowledge that the Finance Research Letters has a Double Blind Peer Review policy, thus assuring that the author’s name will not be disclosed to the reviewer. Therefore, we have removed any identifying information, such as authors’ names or affiliations, from the manuscript before submission. João F. Caldeira gratefully acknowledges the support provided by CNPq under grants 430192/2016-9 and 306886/2018-9, and Tiago P. Filomena gratefully acknowledges the support provided by CNPq under grant 302777/2017-2.

References

Alexander, C., Dimitriu, A., 2002. The cointegration alpha: enhanced index tracking and long-short equity market neutral strategies. ISMA Discussion Papers in Finance 8.
Alexander, C., Dimitriu, A., 2005. Indexing and statistical arbitrage: tracking error or cointegration? J. Portfolio Manage. 31 (2), 50–63.
Arshanapalli, B., Nelson, W., 2008. Cointegration and its application in finance. In: Fabozzi, F.J. (Ed.), Handbook of Finance. Vol. 3. John Wiley & Sons, pp. 701–710. Ch. 61.
Avellaneda, M., Lee, J.H., 2010. Statistical arbitrage in the US equities market. Quant. Finance 10 (7), 761–782.
Beasley, J.E., Meade, N., Chang, T.J., 2003. An evolutionary heuristic for the index tracking problem. Eur. J. Oper. Res. 148 (3), 621–643.
Bonami, P., Biegler, L.T., Conn, A.R., Cornuéjols, G., Grossmann, I.E., Laird, C.D., Lee, J., Lodi, A., Margot, F., Sawaya, N., et al., 2008. An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optim. 5 (2), 186–204.
Bonami, P., Lee, J., 2007. Bonmin user’s manual. Numer. Math. 4, 1–32. https://projects.coin-or.org/
Bowen, D.A., Hutchinson, M.C., 2016. Pairs trading in the UK equity market: risk and return. Eur. J. Finance 22 (14), 1363–1387.
Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press. https://web.stanford.edu/~boyd/cvxbook/
Caldeira, J.F., Moura, G.V., 2013. Selection of a portfolio of pairs based on cointegration: a statistical arbitrage strategy. Braz. Rev. Finance 11 (1), 49–80.
Dunis, C.L., Ho, R., 2005. Cointegration portfolios of European equities for index tracking and market neutral strategies. J. Asset Manage. 6 (1), 33–52.
Engle, R.F., Granger, C., 1987. Cointegration and error correction: representation, estimation and testing. Econometrica 55, 251–276.
Huang, T.-C., Tu, Y.-C., Chou, H.C., 2015. Long memory and the relation between options and stock prices. Finance Res. Lett. 12, 77–91.
Jacobs, H., Weber, M., 2015. On the determinants of pairs trading profitability. J. Financ. Markets 23, 75–97.
Kim, J.H., Kim, W.C., Fabozzi, F.J., 2016. Portfolio selection with conservative short-selling. Finance Res. Lett. 18, 363–369.
Sant’Anna, L.R., Filomena, T.P., Caldeira, J.F., 2017. Index tracking and enhanced indexing using cointegration and correlation with endogenous portfolio selection. Q. Rev. Econ. Finance 65, 146–157.