Idiosyncratic returns and relative value in the US Treasury market


Journal of Empirical Finance 44 (2017) 125–144

Contents lists available at ScienceDirect

Journal of Empirical Finance journal homepage: www.elsevier.com/locate/jempfin

Youngju Nielsen a, Raunaq S. Pungaliya b,*

a Department of Economics, Sungkyunkwan University, International Hall Suite 311, 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul, 110-745, Republic of Korea
b SKK Graduate School of Business, Sungkyunkwan University, International Hall Suite 339, 25-2 Sungkyunkwan-ro, Jongno-gu, Seoul, 110-745, Republic of Korea

Article info

JEL classification: C5, E4, G1
Keywords: Treasury securities; Anomaly; Trading strategy; Nelson–Siegel

Abstract

This paper documents a simple and implementable security selection strategy that has generated a significantly positive risk-adjusted alpha in the US Treasury bond market from 1990 to 2015. The strategy identifies relatively misvalued securities based on the Nelson–Siegel (1987) curve, while controlling for unobserved bond-specific factors that may lead to persistent value effects. These results are surprising, as the liquidity and depth of the US Treasury market substantially reduce barriers to arbitrage. Our findings are robust to controls for duration, known risk factors, and a placebo ‘‘random’’ selection strategy. © 2017 Elsevier B.V. All rights reserved.

1. Introduction

The study of market efficiency, and by extension the study of market anomalies, is inextricably linked to the growth of modern finance. Over time, the academic literature has focused on identifying whether seemingly anomalous returns represent premia for unidentified risks or result from barriers to arbitrage (Shleifer and Vishny, 1997). Most documented anomalies relate to the equity market, where barriers to arbitrage can be significant; for example, several equity anomaly returns depend on small, low-price, low-volume stocks with high idiosyncratic risk, while others can be explained by existing risk factors (Fama and French, 1996; Mashruwala et al., 2006). In this study, we focus on one of the most liquid asset markets in the world, the US Treasury market, and show that even in a setting with relatively low barriers to arbitrage, a reasonably simple security selection strategy generates significantly positive risk-adjusted returns. In doing so, we provide a direct example of how sophisticated fixed income relative value (FI-RV) investors may have been able to exploit pricing differentials (Huggins and Schaller, 2013). The strategy we propose identifies securities that are mispriced, according to the Nelson–Siegel yield curve model (Nelson and Siegel, 1987; Diebold and Li, 2006), relative to both other securities and the security's own historical value. Our study uses daily data and a comprehensive sample of 1,037 US Treasury bonds and notes traded from 1990 to 2015, excluding securities with unique pricing features. As the security selection strategy relies only on historical prices (and yields) of the underlying securities, our results shed new light on weak-form market efficiency in this market. Our methodology starts with the three-factor (level, slope, curvature) Nelson–Siegel model to fit the shape of the yield curve.
We use the fitted curve to identify the idiosyncratic, or unexplained, portion of the yield of each security. Specifically, we subtract the model-implied yield from the actual yield of the security to compute residuals from the Nelson–Siegel fit. This provides a cross-sectional view of securities that are not priced on the Nelson–Siegel curve. However, certain securities may remain persistently priced off the curve, either because of known factors, such as illiquidity due to on-the-run/off-the-run effects, or because of unknown factors, such as unobserved heterogeneity. Thus, we also take a time-series view and standardize each security's residual into a z-score: we subtract the mean of the security's residual over the past three months and scale the difference by its standard deviation over the same period. The value of this standardized z-score provides a cross-sectional and time-series signal of potentially misvalued Treasury securities. Bonds with low z-scores indicate overvaluation, while those with high z-scores indicate undervaluation.1 Thus, long portfolios focus on the bonds with the highest z-score values, while short portfolios are constructed from bonds with the lowest z-score values. We next implement and study a trading strategy in out-of-sample tests by creating both long only and long short portfolios of 1, 2, 5, and 10 bonds on the basis of the z-score signal as of the portfolio formation date. For example, the long only 10 bond portfolio includes the bonds with the 10 highest z-scores on the rebalancing date. Since long only and long short strategies each have their own strengths and weaknesses, we study the characteristics of both in the paper. The long only strategy has an average duration of about 5.8 years, which is close to the average duration of 5.2 years in the overall bond sample. We also study a long short duration neutral strategy, which is structured to have a duration of zero. The portfolios are rebalanced weekly based on updated z-score values and then rolled over. Our analysis shows that the 10 bond long only equally weighted strategy provides an average annualized return of 8.06% over the sample period, with a Sharpe ratio of 1.5, while a 10 bond long short duration neutral strategy provides a return of 2.45%, with a Sharpe ratio of 1.6. As expected, given a similar Sharpe ratio, the lower returns of the long short portfolio are accompanied by lower volatility (1.6%) than those of the long only strategy (5.4%).

* Corresponding author.
E-mail addresses: [email protected] (Y. Nielsen), [email protected] (R.S. Pungaliya).
https://doi.org/10.1016/j.jempfin.2017.09.003
Received 31 January 2017; Received in revised form 14 July 2017; Accepted 20 September 2017; Available online 10 October 2017
0927-5398/© 2017 Elsevier B.V. All rights reserved.
Thus, in both the long only and the long short case, the returns from the selection strategy cannot be attributed to heightened duration exposure. Finally, we note that since all securities are issued by the US government, they all carry the same credit risk.2 To isolate whether our results are driven by our z-score based selection method, we conduct a placebo simulation in which we randomly pick 10 long securities while keeping all other parameters of the test, including duration, the same. We repeat the random selection procedure 1000 times to generate a distribution of possible returns and then examine its properties. Our tests show that the relative value strategy portfolio return (8.06%) is significantly higher than the mean placebo portfolio return (5.34%) at the 1% significance level. Accounting for risk gives similar inferences: the Sharpe ratio of the strategy is 1.5, versus a mean of 0.95 for the placebo portfolio, a difference significant at the 1% level. The placebo (random portfolio) tests indicate that our findings are not an artifact of either the test design or random chance. Transaction costs are an important consideration in implementing any strategy. More importantly, high transaction costs can lead to market inefficiency and create barriers to arbitrage. Our base measure of transaction costs assumes a naïve rebalancing strategy in which all bonds indicated by the model are rebalanced on the weekly rebalancing date.3 With these naïve transaction costs, the return of the 10-bond long only strategy is reduced by about 200 bp but remains positive and significant. We next compute risk-adjusted alphas for transaction-cost-adjusted returns by controlling for known systematic risk factors following Fama and French (1993) and Jostova et al. (2013).
Specifically, we run an OLS regression with Newey–West standard errors that includes risk factors for both stocks (market excess returns, size, value, momentum) and bonds (term premium, default premium). Our results suggest that the strategy generates a significant risk-adjusted alpha of approximately 11.86 basis points per week after transaction costs, which translates into an annual alpha of approximately 6.35%. Anomalies often disappear after they are revealed, as informed traders push prices to their efficient levels. The modified version of the Nelson–Siegel model used in this study was published by Diebold and Li in 2006; a Google Scholar search indicates that the earliest working paper version was online in 2000. To test whether returns to the z-score based selection strategy have attenuated over time, we split the sample into three periods (1990–1999, 2000–2009, and 2010–2015) and compute weekly risk-adjusted alphas again. We find that while the alpha of the strategy remained steady in the 1990–1999 and 2000–2009 subperiods, at 5.45% and 6.03% respectively, it declined to 3.56% in the more recent 2010–2015 period. In spite of the decline, the alpha remains economically large and statistically significant. Past studies in the bond market have documented anomalous returns, most notably momentum profits. For example, Asness et al. (2013) show the presence of equity value and momentum effects in the international government bond market. However, unlike Asness et al., who study the pricing of an entire asset class by examining asset indices, our study focuses on the security level: we identify relatively misvalued individual securities and then group them into a strategy portfolio. In a related paper, Jostova et al.
(2013) find significant evidence of individual bond momentum in speculative-grade corporate bonds, but no evidence of momentum in investment-grade bonds, consistent with Gebhardt et al. (2005). Unlike these studies, however, we find the anomaly to be present in the highest-rated (AAA US Treasury) investment-grade asset class. Different forms of fixed income relative value investing have been popular on Wall Street trading desks for quite some time (Choudhary, 2006; Huggins and Schaller, 2013). For example, Merrill Lynch, JP Morgan, and UBS, amongst others, offer daily ‘‘rich/cheap’’ reports on individual securities to market practitioners, where richness or cheapness is measured relative to some benchmark curve. In a related study, Sercu and Wu (1997) examine the information content of residuals from the Vasicek (1977), Cox et al. (1985), and spline (McCulloch, 1971) bond models in the Belgian bond market. Our study differs from Sercu and Wu (1997) in two important ways. First, we compute residuals using the Nelson–Siegel approach, which is not studied in their paper. Second, as our study encompasses a

1 The average and median z-score across all bonds is 0, with 1st and 3rd quartile values of −0.75 and 0.76, respectively.
2 The credit risk of US Treasury securities is generally assumed to be zero in most models. This assumption has been called into question following the global financial crisis. We abstract from this issue in the paper as it has little bearing on our analysis.
3 This measure overestimates transaction costs because, in practice, transaction costs can be lowered by rebalancing less frequently and by choosing not to rebalance if the resulting portfolio is not substantially different in expected exposure.
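The placebo comparison described in the Introduction can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function names are hypothetical, weekly returns are assumed to be a weeks-by-bonds array with NaN for bonds not available in a given week, and, unlike the paper, the sketch does not hold portfolio duration fixed when drawing random bonds. The one-sided p-value is the share of random portfolios whose mean return is at least as high as the strategy's.

```python
import numpy as np

def placebo_distribution(bond_returns, n_bonds=10, n_sims=1000, seed=0):
    """Mean weekly return of randomly selected, equally weighted portfolios.

    bond_returns: (weeks x bonds) array of weekly returns, NaN where a bond
    is unavailable. Each week, n_bonds available bonds are drawn without
    replacement (no duration matching, a simplification of the paper's test).
    """
    rng = np.random.default_rng(seed)
    R = np.asarray(bond_returns, dtype=float)
    sim_means = np.empty(n_sims)
    for s in range(n_sims):
        weekly = np.empty(R.shape[0])
        for t, row in enumerate(R):
            available = np.flatnonzero(~np.isnan(row))
            pick = rng.choice(available, size=n_bonds, replace=False)
            weekly[t] = row[pick].mean()          # equal weights
        sim_means[s] = weekly.mean()
    return sim_means

def placebo_p_value(strategy_mean, placebo_means):
    """Share of placebo portfolios doing at least as well as the strategy."""
    return np.mean(np.asarray(placebo_means) >= strategy_mean)
```

Comparing the strategy's mean weekly return with the resulting distribution gives the kind of significance statement reported above; the same loop could record Sharpe ratios instead of means.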


Table 1 Summary statistics. In this table we present at-issue summary statistics for the sample of Treasury bonds used in this study. The study includes all US Treasury securities in the CRSP Treasury files that fit the following criteria: (1) the security is traded during our study period, i.e., from March 1990 to July 2015, (2) its maturity at issue is greater than 2 years and less than 30 years, (3) it has more than 1 year to maturity, (4) it has a coupon rate greater than 6%, and (5) it is not a callable or a flower bond. These criteria result in a final sample of 1037 bonds. Panel A presents the original maturity (in years) and coupon rate (%) for the overall sample, while Panel B presents the number of bonds and their average coupon rate sorted into standard maturity buckets.

Panel A: Summary statistics for the overall sample

                            N       Mean   Std. dev.   Q1     Median   Q3
Original maturity (years)   1,037   6.66   7.14        2.00   5.00     7.00
Coupon (%)                  1,037   4.60   3.02        2.00   4.50     6.50

Panel B: Average coupon rate (%) by original maturity bucket

Original maturity   N     Average coupon (%)
<= 2 years          307   3.83
2 to 5 years        387   3.87
5 to 7 years        130   4.97
7 to 10 years       118   6.31
10 to 20 years       11   9.57
>= 20 years          84   7.15

much longer time series (1990–2015 versus 1990–1991) and a larger, more liquid market (US versus Belgium), our paper highlights the long-term sustainability of a relative value strategy in a seemingly informationally efficient marketplace with low barriers to arbitrage. Finally, our study is also related to Gatev et al. (2006), who study the performance of a relative value ‘‘pairs trading’’ Wall Street arbitrage rule. They show that their trading rule yields excess returns of about 11% for a self-financing portfolio in the equity market. Our study is similar to theirs, as both papers focus on a relative value statistical arbitrage strategy with roots on Wall Street. Further, both pairs trading in equity markets and the Treasury bond trading strategy proposed in this paper are simple to implement and rely only on past prices, and thus raise fundamental questions about market efficiency or the standard asset pricing models used in the literature.

The rest of the paper is organized as follows. Section 2 presents the data and methodology. Section 3 describes our results. Section 4 examines potential economic drivers of strategy returns. Section 5 concludes.

2. Data and methodology

2.1. Data

We start with all US Treasury securities in the CRSP Treasury files that fit the following criteria: (1) the security is traded during our study period, i.e., from March 1990 to July 2015 (1,266 weeks), (2) its maturity at issue is greater than 2 years and less than 30 years, (3) it has more than 1 year to maturity, (4) it has a coupon rate greater than 6%, and (5) it is not a callable or a flower bond. These criteria clean the dataset and eliminate short-term T-bills, bonds with special liquidity problems, and bonds with unique option-like features that can affect the price. After filtering, we have a final sample of 1,037 bonds. Our analysis uses the end-of-day price quote, computed as the average of the bid and ask prices for the security. The standard holding period considered in the paper is 1 week, or 5 trading days. The full unbalanced panel consists of 177,810 bond weeks. Unlike previous studies that use benchmark bonds or the imputed constant maturity curve for fitting, we use all outstanding U.S. Treasury issues (Diebold and Li, 2006).4

Table 1 presents summary statistics for the sample of Treasury bonds used in the study. Panel A describes the distribution of coupon rates and maturity at issue of the bonds. The average (median) bond has a maturity at issue of 6.66 (5.00) years and a coupon rate of 4.60 (4.50)%. Table 1 Panel B shows the number of bonds and the average coupon rate in six predefined buckets based on maturity at issue (up to 2 years, 2 to 5 years, 5 to 7 years, 7 to 10 years, 10 to 20 years, and 20 years or more). Related to Panel B, Fig. 1 shows the number of securities in the sample by year of issue. The sample is skewed towards bonds with less than 5 years to maturity at issue, reflecting the actual distribution of bonds issued. Specifically, the sample has 694 bonds with less than 5 years to maturity and 343 bonds with more than 5 years to maturity at issue.

2.2. Modeling the yield curve

There are two standard approaches to modeling the yield curve, namely cubic splines and the Nelson–Siegel method. We use the Nelson and Siegel (1987) functional form as described in Diebold and Li (2006), as it is convenient and parsimonious and also has an economic interpretation. The yield curve assumes a variety of shapes through time, including upward sloping, downward sloping, humped, and inverted. In the Nelson–Siegel framework, the yield curve can assume all of these shapes through variation in the three beta parameters. The Diebold and Li characterization of the Nelson–Siegel model entails a three-factor exponential approximation of the yield curve:
$$y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau}\right) \qquad (1)$$

4 We repeat our main tests using the constant maturity Treasury curve and find that the results are qualitatively similar. These results are discussed in the robustness section and are available from the authors upon request.
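As a hedged illustration of Eq. (1), the sketch below evaluates the three factor loadings and the fitted yield for a given set of betas. It assumes maturities measured in years and uses the fixed λ = 0.6 adopted in Section 2.2; the function names `ns_loadings` and `ns_yield` are hypothetical and not part of any library or of the authors' code.

```python
import numpy as np

def ns_loadings(tau, lam=0.6):
    """Level, slope and curvature loadings of Eq. (1).

    tau: maturity in years (scalar or array, strictly positive).
    lam: exponential decay rate lambda (0.6 is the value fixed in Section 2.2).
    """
    x = lam * np.asarray(tau, dtype=float)
    slope = -np.expm1(-x) / x            # (1 - e^{-x}) / x: starts at 1, decays to 0
    curvature = slope - np.exp(-x)       # starts at 0, humps, then decays to 0
    level = np.ones_like(x)              # constant loading of 1
    return np.stack([level, slope, curvature], axis=-1)

def ns_yield(tau, betas, lam=0.6):
    """Fitted yield y(tau) = beta0*level + beta1*slope + beta2*curvature."""
    return ns_loadings(tau, lam) @ np.asarray(betas, dtype=float)

# Mean level, slope and curvature coefficients reported in Table 2 (percent).
mean_betas = [5.890, -2.557, -2.740]
print(ns_yield(np.array([0.25, 2.0, 5.0, 10.0, 30.0]), mean_betas))
```

With these average coefficients the fitted curve starts near 𝛽0 + 𝛽1 ≈ 3.33% at very short maturities and approaches 𝛽0 ≈ 5.89% at very long maturities, which is the loading pattern described in the text that follows.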


Fig. 1. Bond issue years. This figure presents the number of Treasury securities in the sample by original year of issue.

Table 2 Estimated Nelson–Siegel coefficients. In this table we present summary statistics for the Nelson–Siegel model factors (β0 − level, β1 − slope, β2 − curvature) estimated on each trading day of our sample period. The Nelson–Siegel model is given by

$$\hat{y}_t(\tau) = \hat{\beta}_{0t} + \hat{\beta}_{1t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau}\right) + \hat{\beta}_{2t}\left(\frac{1-e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau}\right).$$

We also include the percentage of days on which the given coefficient was significant at the 5% level. The last three columns of the table show autocorrelation coefficients of the factors over 1 day, 5 trading day (weekly), and 20 trading day (monthly) horizons.

Factor    Mean     Std. Dev.   Q1       Median   Q3       % days significant at 5%   ρ̂(1)    ρ̂(5)    ρ̂(20)
β̂0         5.890   1.586        4.950    5.810    6.950   99.97%                     0.999   0.994   0.978
β̂1        −2.557   1.834       −4.087   −2.755   −0.979   87.55%                     0.998   0.993   0.975
β̂2        −2.740   2.430       −4.412   −2.179   −0.939   61.10%                     0.996   0.985   0.946
Adj. R2    0.974   0.059        0.976    0.994    0.998

In the Nelson–Siegel model, 𝛽0𝑡, 𝛽1𝑡, and 𝛽2𝑡 can be interpreted as latent dynamic factors. The loading on 𝛽0𝑡 is 1 and does not decay, while the loading on 𝛽1𝑡 starts at 1 but decays monotonically and quickly to 0. The loading on 𝛽2𝑡 starts at 0, increases, and then decays to 0. Diebold and Li (2006) show that the three factors, while related to the long, short, and medium term, can also be interpreted as the level, slope, and curvature of the yield curve. Finally, the 𝜆𝑡 parameter determines the exponential decay rate: an increase in 𝜆𝑡 increases the rate of decay. The tradeoff is that low values of 𝜆𝑡 (slow decay) give a better fit of the curve at long maturities, while large values of 𝜆𝑡 work better at short maturities. Following Diebold and Li (2006), we fix 𝜆𝑡 = 0.6 for our analysis, which, with maturity measured in years, places the maximum loading on 𝛽2𝑡 at a maturity of about 3 years. We fit the Nelson–Siegel model on each trading day using yields on all available bonds that fit the criteria described above. This gives us a time series of 𝛽0𝑡, 𝛽1𝑡, 𝛽2𝑡 parameters that define the Nelson–Siegel curve, and a corresponding panel of residuals for each bond on each trading day. Specifically, the residual for bond 𝑖 on trading day 𝑡 is 𝑅𝑖𝑡 = 𝑦𝑖𝑡 − 𝑦̂𝑖𝑡, where 𝑦𝑖𝑡 is the actual yield to maturity of the bond and 𝑦̂𝑖𝑡 is its implied yield based on the Nelson–Siegel model. As we use the full set of available bonds, extreme outliers can influence the entire fit of the yield curve. To reduce the impact of outliers, we follow a two-step process. In the first step, we fit the Nelson–Siegel model with the entire data and mark as extreme outliers those bonds with a residual greater than two standard deviations of the residual distribution. In the second step, we fit the regression again excluding the marked outliers to obtain the final estimate of the Nelson–Siegel betas.
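The two-step daily fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes maturities in years, estimates the betas by ordinary least squares with λ held fixed, and reads ‘‘greater than two standard deviations’’ as an absolute residual above twice the standard deviation of the first-step residuals. The helper `curvature_peak` checks numerically where λ = 0.6 places the hump of the curvature loading. All function names are hypothetical.

```python
import numpy as np

def ns_design(tau, lam=0.6):
    """Regressors [1, slope loading, curvature loading] of Eq. (1); tau in years."""
    x = lam * np.asarray(tau, dtype=float)
    slope = -np.expm1(-x) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def fit_ns_two_step(tau, y, lam=0.6, k=2.0):
    """Two-step cross-sectional Nelson-Siegel fit for one trading day.

    Step 1: OLS on all bonds; flag bonds whose absolute residual exceeds
    k standard deviations of the step-1 residuals (our reading of the rule).
    Step 2: refit on the unflagged bonds.
    Returns the final betas, the residuals R = y - y_hat for every bond
    (flagged bonds included), and the boolean outlier flags.
    """
    X = ns_design(tau, lam)
    y = np.asarray(y, dtype=float)
    beta_1, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_1 = y - X @ beta_1
    keep = np.abs(resid_1) <= k * resid_1.std(ddof=1)
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta, y - X @ beta, ~keep

def curvature_peak(lam=0.6):
    """Maturity (years) at which the curvature loading is largest, by grid search."""
    grid = np.linspace(0.01, 30.0, 30_000)
    return grid[np.argmax(ns_design(grid, lam)[:, 2])]
```

Because λ is held fixed, each daily fit reduces to a linear least-squares problem, which keeps re-estimating the curve on every trading day cheap. `curvature_peak(0.6)` returns a maturity just under 3 years, in line with the 3-year mark mentioned above.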
In Table 2, we present statistics describing the daily fit, with the estimates of 𝛽0𝑡, 𝛽1𝑡, 𝛽2𝑡 obtained after excluding outliers. We report the mean, standard deviation, Q1, median, and Q3 of the time series of beta estimates over the sample period. We find that 𝛽̂0, 𝛽̂1, and 𝛽̂2 are significant at the 5% level on 99.97%, 87.55%, and 61.10% of days, respectively. Furthermore, the yield curve shape generally persists, as seen in the autocorrelation coefficients of the three parameters over 1, 5 (week), and 20 (month) trading days. An additional issue with using all bonds is that bond maturities are not regularly spaced, as they are on a constant maturity curve. Thus, our procedure implicitly weights the active region of the yield curve more heavily when fitting the model. For the same reason, it is harder to fit the yield curve at 30 years than at 10 years, as the long maturity range of the curve is sparsely populated. We repeat our main analysis using a constant maturity curve and find that our results are qualitatively unchanged. Fig. 2 shows a time series plot of the Nelson–Siegel beta parameters over the sample period from 1990 to 2015, along with their respective time series autocorrelations over a long horizon. In general, we note that the level of the yield curve has seen a mostly


Fig. 2. Time series evolution of Nelson–Siegel Betas. This figure presents, in the left panel, the time series evolution of betas estimated daily from March 1990 to July 2015 using the Nelson–Siegel model $y_t(\tau) = \beta_{0t} + \beta_{1t}\left(\frac{1-e^{-\lambda_t\tau}}{\lambda_t\tau}\right) + \beta_{2t}\left(\frac{1-e^{-\lambda_t\tau}}{\lambda_t\tau} - e^{-\lambda_t\tau}\right)$. In addition to the fitted beta values, the figure also presents their respective time series autocorrelation and decay in the right panel.

secular decline, as seen in the downward trend of the 𝛽0 parameter. The 𝛽1, or slope, parameter fluctuates at a lower frequency. Its moves can generally be traced to the economic cycle, with the parameter rising in ‘‘good’’ times (1992–2000, 2004–2007) and falling in ‘‘bad’’ times (2001, 2007–2010). The 𝛽2 parameter is more difficult to interpret, but it plays a significant role in characterizing the curvature of the yield curve at any given time. Fig. 3 shows examples of the daily fit on six randomly chosen days in our sample, illustrating different yield curve shapes fitted by the Nelson–Siegel model along with actual bond yields (circles).

2.3. Residual from the fitted curve

In the previous section, we outlined how we fit the Nelson–Siegel model, using an average of about 150 U.S. Treasury notes and bonds on any given day, and computed the residual for each security as the difference between its actual yield and the fitted yield. The residual represents the portion of the yield that the three Nelson–Siegel beta factors cannot explain. Table 3 presents the distribution (minimum, Q1, median, mean, Q3, and maximum) of the residuals for bonds of varying time to maturity. Residuals in the table are multiplied by 100 for ease of exposition (so a residual of 0.50 refers to 0.50%). While the median residual across all maturities is close to zero at −0.007%, the interquartile range is wider, running from −0.041% to 0.025%. We now ask whether residual changes are related to yield changes. If they are, we can use the predicted residual change as a factor to explain bond returns. Furthermore, as residual changes can be computed and predicted, a finding of return predictability could be used to implement a trading strategy.
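We run the following regression for each bond in the sample with more than 300 days to maturity:

$$\Delta y_{it} = \alpha_i + \beta_i \, \Delta R_{it} + \varepsilon_{it} \qquad (2)$$

where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\Delta R_{it} = R_{it} - R_{i,t-1}$, and $R_{it} = y_{it} - \hat{y}_{it}$.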
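A minimal per-bond version of regression (2) is sketched below with NumPy. It is an illustration only: the function name is hypothetical, the inputs are assumed to be aligned daily series of one bond's yield 𝑦𝑖𝑡 and residual 𝑅𝑖𝑡, and it reports conventional OLS t-statistics because the text does not specify the error structure. The usage example is synthetic: when changes in the fitted yield are independent of residual changes, the slope is close to 1 and the R2 is roughly the share of yield-change variance coming from the residual, so a slope near 1 can coexist with a low R2.

```python
import numpy as np

def residual_change_regression(y, r):
    """OLS of daily yield changes on residual changes for one bond, Eq. (2).

    y, r: aligned daily yields y_it and Nelson-Siegel residuals R_it.
    Returns (alpha, beta, t_beta, adj_r2) with conventional OLS standard errors.
    """
    dy = np.diff(np.asarray(y, dtype=float))   # Δy_it = y_it - y_i,t-1
    dr = np.diff(np.asarray(r, dtype=float))   # ΔR_it = R_it - R_i,t-1
    X = np.column_stack([np.ones_like(dr), dr])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    e = dy - X @ coef
    n, k = X.shape
    s2 = e @ e / (n - k)                       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # OLS covariance of coefficients
    t_beta = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - (e @ e) / np.sum((dy - dy.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return coef[0], coef[1], t_beta, adj_r2

# Synthetic example (not data from the paper): fitted-yield changes and
# residual changes are drawn independently, in percentage points.
rng = np.random.default_rng(0)
d_fit = rng.normal(0.0, 0.05, 20_000)
d_res = rng.normal(0.0, 0.01, 20_000)
y = np.cumsum(d_fit + d_res)   # yield = fitted yield + residual
r = np.cumsum(d_res)           # residual path
alpha, beta, t_beta, adj_r2 = residual_change_regression(y, r)
```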


Fig. 3. Examples of daily fitting. In this figure we show examples of the Nelson–Siegel yield curve fit on six randomly chosen days in our sample. The X axis shows the time to maturity in days and the Y axis shows the yield in decimals. The red crosses and blue squares represent bonds classified into the short and long position buckets, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 4 reports a summary of the regression results. Panel A shows results for the overall sample, and Panel B shows results for subsamples sorted on the original maturity of the bond. We see that 𝛽̂ is significant at the 1% level both in the overall sample and in subsamples based on original maturity. Overall, we find that the residual change is significantly related to the contemporaneous yield change.

2.4. Introducing z-score: A measure of richness vs. cheapness in Treasury bonds

Relative value arbitrage in fixed income trading entails taking long positions in cheap and short positions in rich fixed income securities. In general, richness and cheapness are determined by comparison with other issues on the same credit curve. In the case of U.S. Treasury bonds, all issues have the same credit, that of the US government. Therefore, the relationship of a particular Treasury security to the yield curve can be a good and simple indicator of its richness or cheapness. A positive residual


Table 3 Nelson–Siegel yield curve residuals. We fit the 3-factor Nelson–Siegel model as presented in Table 2 and compute residuals as the difference between the actual yield and the yield implied by the Nelson–Siegel model. In this table, we present the distribution of yield curve residuals for bonds of varying maturity. In addition, we report the residual autocorrelations at weekly (5 trading days), biweekly (10 trading days), and monthly (20 trading days) horizons.

Maturity   Min      Q1       Median   Mean     Q3       Max
1          −0.195   −0.037   −0.015   −0.020    0.007   0.776
2          −0.107   −0.011    0.007    0.006    0.023   0.078
3          −0.190   −0.017    0.014    0.008    0.034   0.184
4          −0.242    0.000    0.013    0.002    0.024   0.096
5          −0.303   −0.031   −0.015   −0.021    0.004   0.077
6          −0.317   −0.063   −0.037   −0.037    0.002   0.093
7          −0.535   −0.076   −0.042   −0.053   −0.009   0.261
8          −0.927   −0.082   −0.054   −0.075   −0.029   0.349
9          −1.006   −0.087   −0.060   −0.068   −0.024   0.221
10         −0.471   −0.127   −0.078   −0.092   −0.040   0.208
11         −0.204   −0.091   −0.031   −0.032    0.007   0.518
12         −0.270   −0.072   −0.015   −0.015    0.023   0.487
13         −0.193   −0.035    0.000    0.007    0.045   0.445
14         −0.196   −0.014    0.020    0.021    0.050   0.302
15         −0.198    0.006    0.031    0.025    0.050   0.236
16         −0.369    0.002    0.029    0.014    0.046   0.133
17         −0.252    0.006    0.035    0.026    0.049   0.107
18         −0.253    0.020    0.035    0.032    0.053   0.141
19         −0.293    0.016    0.036    0.029    0.050   0.130
20         −0.372    0.015    0.038    0.023    0.055   0.122
21         −0.457    0.001    0.031    0.012    0.057   0.135
22         −0.529   −0.013    0.028    0.011    0.052   0.174
23         −0.147   −0.010    0.019    0.020    0.056   0.153
24         −0.095   −0.008    0.017    0.022    0.055   0.132
25         −0.118   −0.018    0.008    0.014    0.049   0.180
26         −0.280   −0.042   −0.010   −0.007    0.039   0.160
27         −1.041   −0.063   −0.019   −0.034    0.026   0.220
28         −1.080   −0.096   −0.024   −0.054    0.015   0.194
29         −1.118   −0.131   −0.060   −0.079   −0.001   0.205
30         −1.142   −0.190   −0.107   −0.108   −0.019   0.235
Average    −0.430   −0.041   −0.007   −0.014    0.025   0.225

Residual autocorrelations
𝜌̂(5):  0.901 0.921 0.957 0.961 0.955 0.946 0.960 0.971 0.951 0.975 0.980 0.981 0.977 0.967 0.964 0.962 0.933 0.942 0.950 0.956 0.972 0.969 0.980 0.970 0.963 0.969 0.979 0.981 0.983 0.973
𝜌̂(10): 0.871 0.868 0.925 0.928 0.915 0.900 0.922 0.943 0.904 0.954 0.960 0.965 0.957 0.947 0.938 0.933 0.890 0.900 0.906 0.918 0.946 0.943 0.965 0.946 0.934 0.942 0.955 0.960 0.964 0.934
𝜌̂(20): 0.813 0.775 0.857 0.866 0.846 0.820 0.841 0.890 0.807 0.918 0.932 0.941 0.928 0.916 0.883 0.869 0.821 0.816 0.802 0.822 0.880 0.873 0.941 0.904 0.885 0.901 0.901 0.912 0.921 0.849

Table 4 Nelson–Siegel residuals and yield changes. This table presents regressions of yield changes on changes in Nelson–Siegel residuals. Specifically, we model

$$\Delta y_{it} = \alpha_i + \beta_i \, \Delta R_{it} + \varepsilon_{it},$$

where $\Delta y_{it} = y_{it} - y_{i,t-1}$, $\Delta R_{it} = R_{it} - R_{i,t-1}$, and $R_{it} = y_{it} - \hat{y}_{it}$; $y_{it}$ is the Treasury yield at time t and $\hat{y}_{it}$ is the fitted yield from the Nelson–Siegel model in Eq. (1). Regressions are run separately for each security with more than 300 days to maturity, and the output is summarized below.

Panel A: Regression summary for the overall sample (N = 700 regressions)

              Mean    Std. dev.   Q1      Median   Q3
β̂             1.057   1.116       0.732   1.039    1.311
t-statistic   7.49    7.32        4.45    6.87     10.62
Adjusted R2   0.073   0.093       0.025   0.047    0.080

Panel B: Mean statistics by original maturity bucket

                             2 to 5 years   5 to 7 years   7 to 10 years   10 to 20 years   >= 20 years
β̂                            0.933          1.938          1.699           0.910            −0.605
t-statistic                  6.02           10.06          12.52           11.10            2.02
% issues significant at 5%   90.4           100.0          100.0           100.0            97.1
% issues significant at 1%   86.0           100.0          100.0           100.0            94.1
Adjusted R2                  0.055          0.110          0.097           0.068            0.075
N                            386            105            113             28               68

indicates that the issue is cheaply priced in comparison to others (its yield lies above the fitted curve), while a negative residual indicates that a security is rich relative to similar others. However, some U.S. Treasury issues always look cheap or rich. These are usually issues with very limited liquidity, and they also show extreme residual values relative to the fitted Nelson–Siegel curve. Autocorrelation of daily returns is an important issue for the Nelson–Siegel model (Diebold and Li, 2006). This autocorrelation can show up in the Nelson–Siegel betas but, more importantly, in the residuals, whose transformed version forms the basis of our z-score signal. The second step of our z-score construction, the time series correction, accounts for this persistence by standardizing each security's residual: we subtract the mean of the security's residual over the past 60 trading days (approximately 3 months) and scale the difference by its standard deviation over the same period. In particular, the standardized residual is

$$R\_z_{it} = \frac{R_{it} - E(R_i)}{\sigma(R_i)},$$

where $E(R_i)$ and $\sigma(R_i)$ are the mean and standard deviation of bond i's residual over that 60-day window. This standardized value is the z-score that forms


the basis for our analysis. Without the standardization and time series correction performed in step two, the z-score signal would have little meaning. The standardization is repeated for each security on each day on a rolling basis and acts in a similar manner to a fixed effect, accounting for unobserved security characteristics.

2.5. Portfolio construction

We now outline a trading strategy that uses the z-score computed above to identify bonds that are potentially richly or cheaply priced. We generate portfolios that contain the U.S. Treasury issues with the largest and smallest standardized residuals 𝑅_𝑧𝑖𝑡 after a cross-sectional comparison on each day t. To understand the tradeoffs involved in increasing the number of bonds in the portfolio, we study portfolio sizes from 1 bond to 10 bonds. On the one hand, a smaller portfolio ensures that only the most potentially mispriced bonds are picked, maximizing the chance of a higher return. On the other hand, a larger portfolio increases diversification and reduces the risk of misidentifying a particular bond's value. This comes at a cost, as the bond with the 10th highest or lowest z-score may not be as richly or cheaply priced, reducing potential absolute returns. If the z-score metric fails to identify richly or cheaply priced securities, returns to the strategy portfolio should be in line with the benchmark and not be abnormally large or significant. We begin with a simple strategy of assigning exactly the same weight to each security; once the number of securities has been decided, more sophisticated portfolio construction techniques could be used to determine individual position weights. In the base case, a long only 5-bond portfolio has an equal weight of 0.2 in each Treasury security. A long short portfolio with 5 bonds has a weight of 0.2 for each of the 5 long securities and −0.2 for each of the 5 short securities.
As a result, each long short portfolio is quite close to being dollar neutral (zero-cost or self-financing), but it is not duration neutral. We also study an alternate weighting scheme in which we assign weights across long and short securities so that the overall portfolio is duration neutral. Here, we use a quadratic programming solver to compute the duration neutral weights that maximize the portfolio z-score. At the security level, returns are calculated from the selected security's dirty price, that is, the sum of the clean price and accrued interest.

2.6. Strategy summary

The trading signal proposed above is the z-score, computed using a transformation of the residuals of the Nelson–Siegel model. The strategy uses only historical price data, and all tests are conducted out of sample. In summary, the procedure involves the following steps:

1. Clean and filter bond data for use (Section 2.1). This removes non-standard bonds and bonds with special pricing features from our pool.
2. Fit the Nelson–Siegel model using the filtered set of bonds on each trading date. Remove bonds whose residuals exceed two standard deviations, and fit the Nelson–Siegel model again (Section 2.2). These residuals are used to construct the z-score.
3. Apply a time series correction to standardize the above residuals by subtracting the security's mean residual over the last 60 trading days and dividing by the standard deviation of its residual over the same period (Section 2.4). The standardized residuals form the z-score signal, which is available daily. The z-score signal has a mean value of 0, with high values indicating undervaluation and low values indicating overvaluation.
4. Use the z-score signal to form portfolios, assigning high z-score bonds to long portfolios and low z-score bonds to short portfolios.
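Steps 2 and 3 of the summary above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it fits the Nelson–Siegel curve directly to yields by ordinary least squares with the decay parameter held fixed at Diebold and Li's (2006) value of 0.0609 for maturities in months, rescaled here to maturities in years; it defines a residual as observed minus fitted yield, so that a positive residual marks a bond that is cheap relative to the curve; and it standardizes each day's residual by the mean and sample standard deviation of the bond's 60 preceding residuals. The function names are hypothetical.

```python
import numpy as np

# Assumed decay parameter: Diebold and Li's (2006) 0.0609 for maturities in
# months, rescaled so that maturities can be given in years.
LAMBDA = 0.0609 * 12


def ns_loadings(tau, lam=LAMBDA):
    """Level, slope and curvature loadings of the Nelson–Siegel curve at maturities tau (years)."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])


def ns_residuals(tau, yields, n_sd=2.0):
    """Step 2: fit by OLS, drop bonds beyond n_sd residual standard deviations, refit.

    Returns residuals (observed minus fitted yield) for all bonds from the second
    fit, so a positive residual marks a bond yielding more than the curve (cheap).
    """
    X = ns_loadings(tau)
    y = np.asarray(yields, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    first = y - X @ beta
    keep = np.abs(first) <= n_sd * first.std()
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return y - X @ beta


def rolling_zscore(resid_history, window=60):
    """Step 3: standardize today's residual by the bond's trailing mean and std.

    resid_history holds one bond's daily residuals, oldest first; the last entry
    is today's residual and the `window` entries before it form the trailing window.
    """
    r = np.asarray(resid_history, dtype=float)
    if r.size < window + 1:
        return float("nan")  # not enough history yet
    past = r[-window - 1:-1]
    return float((r[-1] - past.mean()) / past.std(ddof=1))
```

Applying `ns_residuals` to each trading date's cross-section and then `rolling_zscore` to each bond's residual history yields one z-score per bond per day, the input to the portfolio rules in step 4.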

3. Results

3.1. Risk and return statistics

Table 5 documents risk and return characteristics of long only, equally weighted US Treasury bond portfolios constructed on the basis of the z-score, that is, the standardized Nelson–Siegel residual. The table shows the average annualized return, volatility, Sharpe ratio, and maximum drawdown for 1, 2, 5, and 10 bond portfolios over the sample period from 3∕15∕1990 to 7∕30∕2015. The holding period for all the portfolios is one week, after which the list of bonds in the portfolio is updated and rebalanced based on current z-score values. We find that the long only equally weighted strategy portfolio earns annualized returns between 8.06% and 9.58% over the sample period, with both returns and the Sharpe ratio decreasing slightly as the number of bonds in the portfolio increases. However, the annualized volatility of the 1, 2, 5 and 10 bond portfolios is quite similar, ranging from 5.21% to 5.36%. Thus, increasing the size of the portfolio seems to hurt both absolute and risk adjusted performance in the equally weighted case. To provide a visual sense of the performance of the strategy, we plot the cumulative return of the various portfolios over the entire sample period in Fig. 4. Panel A represents a portfolio with 1 long position, and Panel B represents an equally weighted portfolio with 10 long positions. The figure also shows the weekly return and the drawdown over time.
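The statistics reported in Table 5 can be computed from a series of weekly portfolio returns along the lines below. Two choices here are assumptions rather than details taken from the paper: returns are annualized geometrically over 52 weeks per year, and volatility uses the sample standard deviation. The Sharpe ratio is computed as annualized return divided by annualized volatility, because that ratio reproduces the reported values (for example, 9.58/5.29 ≈ 1.81 and 8.06/5.36 ≈ 1.50). The function name is hypothetical.

```python
import math
import statistics

WEEKS_PER_YEAR = 52  # assumed annualization factor for weekly returns


def performance_summary(weekly_returns):
    """Annualized return, volatility, return/volatility ratio and max drawdown (in %)."""
    n = len(weekly_returns)
    wealth, peak, max_dd = 1.0, 1.0, 0.0
    for r in weekly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        # Drawdown: percentage decline of wealth from its running peak.
        max_dd = max(max_dd, 1.0 - wealth / peak)
    ann_return = wealth ** (WEEKS_PER_YEAR / n) - 1.0  # geometric annualization
    ann_vol = statistics.stdev(weekly_returns) * math.sqrt(WEEKS_PER_YEAR)
    return {
        "return_pct": 100 * ann_return,
        "volatility_pct": 100 * ann_vol,
        "sharpe": ann_return / ann_vol,  # no risk-free rate, matching Table 5's arithmetic
        "max_drawdown_pct": 100 * max_dd,
    }
```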


Fig. 4. Portfolio return and drawdown over time. In this figure we plot the cumulative return, the weekly return, and the drawdown for each week the portfolio is rebalanced from 3∕15∕1990 to 7∕30∕2015. The figure is split into two panels representing variations on the investment strategy. Panel A represents a portfolio with 1 long position, and Panel B represents an equally weighted portfolio with 10 long positions.


Table 5
Risk and return statistics: Long only portfolios. This table documents risk and return characteristics of US Treasury bond portfolios (long only) of various sizes (1, 2, 5, and 10 bonds) generated based on deviations from the Nelson–Siegel curve as discussed in Section 2.4. Panel A presents statistics related to the return, volatility, Sharpe ratio and maximum drawdown of the various long only portfolios. Panel B describes the duration distribution of the portfolios generated in Panel A.

Panel A: Long only portfolio risk and return summary (equally weighted)
                           1 bond   2 bond   5 bond   10 bond
Return (annualized %)      9.58     9.16     8.33     8.06
Volatility (annualized %)  5.29     5.21     5.24     5.36
Sharpe ratio               1.811    1.760    1.589    1.503
Maximum drawdown (%)       6.51     6.18     6.86     7.75

Panel B: Long only portfolio duration summary (equally weighted)
                           1 bond   2 bond   5 bond   10 bond
1st quartile (years)       2.39     2.81     3.18     3.30
Median duration (years)    4.38     4.69     4.97     5.02
Mean duration (years)      5.94     5.88     5.83     5.82
3rd quartile (years)       7.84     7.89     7.62     7.49

Table 6
Risk and return statistics: Long only quintile portfolios. This table is similar to Table 5 with one key change. The strategy portfolios considered here are broader and formed using bonds in the top and bottom quintiles ranked by z-score. The middle portfolio consists of all the bonds in the 2nd–4th quintiles. For comparison, we also include risk and return statistics for the top 10 bond (base model) and bottom 10 bond long only portfolios.

                           Bottom 10   Bottom quintile   Middle quintiles   Top quintile   Top 10   All
Return (annualized %)      4.59        5.14              6.91               7.22           8.06     6.62
Volatility (annualized %)  4.35        4.45              4.60               5.19           5.36     4.56
Sharpe ratio               1.05        1.15              1.50               1.40           1.50     1.45
Maximum drawdown (%)       8.18        7.84              7.08               8.22           7.75     7.31

3.2. Placebo test returns

To test whether the z-score based selection drives the positive returns, we create a placebo portfolio in which the portfolio size, holding period, and bond universe are held constant, but bonds are selected at random rather than according to the z-score on the rebalancing date. We ensure that the duration of the randomly selected portfolio matches that of our strategy portfolio (see Fig. 5).⁵ We repeat this random selection process 1000 times and study the characteristics of the resulting distribution of placebo portfolios. We find that returns to the strategy portfolio are both economically and statistically significantly greater than those of the “random” or placebo portfolio. The distribution of the random portfolio return has a mean of 5.34% and a 99% confidence interval between 5.15% and 5.53%, versus 8.06% for our 10-bond strategy portfolio. The average volatility of the random portfolio (5.61%) is also slightly higher than that of the strategy portfolio (5.36%). Taken together, the Sharpe ratio of the strategy, at 1.50, dominates the average Sharpe ratio of the random portfolio, which stands at 0.951 (with a standard error of 0.004).
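The duration-matched random selection described above, using the duration buckets listed in footnote 5, might look like the following sketch. It is illustrative only: bond identifiers are hypothetical, durations under 2 years are assumed to fall into the first bucket, and the sketch matches the number of bonds per bucket, which for equally weighted portfolios also matches their weight. Candidates are drawn from the full universe, including bonds the strategy holds; the text does not say either way.

```python
import bisect
import random
from collections import defaultdict

# Upper edges (years) of the footnote 5 buckets: below 2, 2-5, 5-7, 7-10,
# 10-15, and 15 or more. Folding durations under 1 year into the first
# bucket is an assumption.
BUCKET_EDGES = [2, 5, 7, 10, 15]


def duration_bucket(duration):
    """Index of the duration bucket (0 = shortest, 5 = 15 years and above)."""
    return bisect.bisect_right(BUCKET_EDGES, duration)


def placebo_portfolio(strategy_durations, universe, rng):
    """Draw random bonds that match the strategy portfolio bucket by bucket.

    strategy_durations: durations of the bonds in the z-score portfolio.
    universe: dict mapping a (hypothetical) bond identifier to its duration.
    rng: a random.Random instance, so each replication is reproducible.
    """
    need = defaultdict(int)
    for d in strategy_durations:
        need[duration_bucket(d)] += 1
    pools = defaultdict(list)
    for bond, d in universe.items():
        pools[duration_bucket(d)].append(bond)
    picks = []
    for bucket, count in need.items():
        # random.sample draws without replacement and raises ValueError if a
        # bucket holds fewer bonds than the strategy portfolio needs.
        picks.extend(rng.sample(pools[bucket], count))
    return picks
```

Repeating the draw 1000 times, each with its own seeded `random.Random`, and recomputing the placebo portfolio's return series for each draw gives a distribution against which the strategy's return and Sharpe ratio can be compared.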

3.3. Dispersion of returns across z-scores: A test using quintile portfolios

The portfolios considered above focus on the 1 to 10 bonds with the highest z-scores. An important consideration in understanding the z-score and implementing the strategy is whether returns are dispersed across the z-score range. To this end, we form quintile portfolios based on the z-score and examine their properties. For comparison, we keep the holding period the same at 5 trading days (1 week). These results are presented in Table 6 and are broadly consistent with our expectations. The top quintile portfolio has an annualized return of 7.22%, compared to 5.14% for the bottom quintile portfolio, a difference of 208 basis points. This difference in returns between the top and bottom quintiles is not only statistically but also economically significant. Returns of the portfolio of middle quintiles (quintiles 2 to 4) lie in between, at 6.91%. Furthermore, the more extreme portfolio of the bottom 10 bonds ranked by z-score performs even worse than the bottom quintile, at 4.59%. Similarly, the portfolio of the top 10 bonds performs better than the top quintile, at 8.06%. Cumulative returns of the various quintile portfolios over our sample period are presented in Fig. 6 and show consistent results. In the figure, q100 represents the top quintile, while q20 represents the bottom quintile. The q100 portfolio far outperforms the q20 portfolio in cumulative returns over time. An aggregate index constructed by equally weighting “all” bonds in the sample is also included for comparison. These tests thus document the dispersion in returns of portfolios across the z-score range.
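Sorting bonds into z-score quintiles and computing the equally weighted return of the bottom, middle (quintiles 2 to 4) and top groups on a single rebalancing date can be sketched as follows. The function names and dictionary inputs are hypothetical, and the quintile cut-offs are simply equal-count splits of the ranked bonds, which is an assumption about how uneven counts are handled.

```python
def quintile_labels(z_scores):
    """Assign each bond to a z-score quintile: 1 = lowest z-scores, 5 = highest.

    z_scores: dict mapping a (hypothetical) bond identifier to its z-score.
    Bonds are ranked by z-score and cut into five groups that are as equal
    in size as the number of bonds allows.
    """
    ranked = sorted(z_scores, key=z_scores.get)
    n = len(ranked)
    return {bond: 1 + (5 * i) // n for i, bond in enumerate(ranked)}


def quintile_returns(z_scores, next_returns):
    """Equally weighted holding-period return of the bottom, middle (2-4) and top groups.

    next_returns: dict mapping each bond to its return over the holding period.
    """
    labels = quintile_labels(z_scores)
    groups = {"bottom": [1], "middle": [2, 3, 4], "top": [5]}
    out = {}
    for name, quintiles in groups.items():
        members = [b for b, q in labels.items() if q in quintiles]
        out[name] = sum(next_returns[b] for b in members) / len(members)
    return out
```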

5 The procedure is as follows. First, we sort our equally weighted portfolio constituents into duration buckets of 1–2 years, 2–5 years, 5–7 years, 7–10 years, 10–15 years, and 15 years and above. We then randomly select exactly the same value of notes and bonds within these buckets for the placebo portfolio.


Fig. 5. Duration of random versus equally weighted portfolio. In this figure we plot the duration of the z-score strategy portfolio (equally weighted, 10 long bonds), from Table 5, versus the duration of the matched random portfolio for our placebo test.

Fig. 6. Cumulative portfolio returns for quintile portfolios based on z-scores. In this figure we plot the cumulative return for quintile portfolios based on z-scores from 3∕15∕1990 to 7∕30∕2015. In addition, we also include an aggregate index which represents returns from an equally weighted portfolio of “all” bonds in the sample.

3.4. The effect of holding periods

The holding period used in the tests above is one week (5 trading days), after which the list of bonds in the portfolio is updated and rebalanced based on current z-score values. This holding period was chosen to balance the quality of the z-score signal against potential transaction costs from rebalancing. As we increase the holding period, we find that returns to the strategy decline, indicating not only that the z-score signal becomes stale over time, but also that bond prices move toward their expected levels. Specifically, compared with the average annualized return of 8.06% for the 10-bond long only portfolio with a 5-day holding period, the 10, 15, and 20 day holding period returns are 7.28%, 6.69%, and 5.97%, respectively. For the rest of the paper, we maintain the holding period at 5 days.


Fig. 7. Cumulative portfolio returns for portfolios formed using lagged z-scores. In this figure we plot the cumulative return for the 10-bond long only portfolios based on lagged z-scores from 3∕15∕1990 to 7∕30∕2015. Portfolios are formed using 1 day to 5 day lagged z-score values.

3.5. Value weighting versus equal weighting

For simplicity, all of our portfolios are created by equally weighting the selected bonds. A case could be made for value weighting, either by the outstanding value of the bond or by the bond's z-score. Weighting by the outstanding value of the bond focuses the portfolio on larger issues. These may be more readily available, but our model and z-score signal are agnostic to issue size. Weighting by z-score would focus the portfolio on bonds with larger deviations in terms of the z-score, and could potentially increase returns. However, this would largely depend on the differences in z-scores among the top 10 bonds considered. We examine both value weighting schemes and find that they do not materially affect our results. Compared with an annual return of 8.06% and a standard deviation of 5.36% for the equally weighted long only 10-bond portfolio, portfolios weighted by the outstanding value of the bond perform similarly, with a slightly lower return of 7.78% and a standard deviation of 5.3%. Portfolios weighted by the bonds' z-scores (overweighting higher z-score bonds and underweighting lower z-score bonds) produce similar, slightly higher returns of 8.16% with a standard deviation of 5.31%. For simplicity and ease of exposition, we continue to use equally weighted portfolios in our analysis.
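The three weighting schemes compared above can be written as a single normalization step. This sketch is an assumption about the mechanics, since the text does not spell out the formulas: raw weights are 1 for equal weighting, the amount outstanding for issue-size weighting, or the z-score itself for z-score weighting, and each set is rescaled to sum to one. Function and argument names are hypothetical. Z-score weighting as written requires positive z-scores, which is typical for the top-ranked bonds (Table 9 reports a mean z-score of 1.75 for the strategy portfolio) but is not guaranteed.

```python
def portfolio_weights(bonds, scheme="equal", amount_outstanding=None, z_scores=None):
    """Long-only weights for the selected bonds under three hypothetical schemes.

    bonds: identifiers of the selected bonds (e.g. the 10 highest z-scores).
    scheme: "equal", "outstanding" (by issue size) or "zscore".
    """
    if scheme == "equal":
        raw = {b: 1.0 for b in bonds}
    elif scheme == "outstanding":
        raw = {b: float(amount_outstanding[b]) for b in bonds}
    elif scheme == "zscore":
        # Assumes all selected bonds have positive z-scores.
        if any(z_scores[b] <= 0 for b in bonds):
            raise ValueError("z-score weighting needs positive z-scores")
        raw = {b: float(z_scores[b]) for b in bonds}
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw.values())
    # Normalize so the long-only weights sum to one.
    return {b: w / total for b, w in raw.items()}
```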

3.6. Signal timeliness: Using lagged z-scores

An important consideration in any practical implementation of an anomaly is the value relevance, and eventual decay, of the information content of the underlying signal. We study this by constructing portfolios using z-scores lagged by 0 to 5 days. As our holding period is 5 days and the Treasury market is highly liquid, delays of 1 to 5 days can be considered very large. In modern trading environments, up-to-the-second z-score computation, automated portfolio formation, and high frequency trading are commonplace. Nevertheless, examining portfolios formed using lagged z-scores helps us understand not only the timeliness of the signal and its relevance in practical implementation, but also whether bond prices move in the expected directions. For comparison, we keep the holding period fixed at 5 days, as in our main tests. We expect the z-score signal to become less informative with increasing lags, reducing returns to the strategy portfolio. The cumulative return plots in Fig. 7 show that this is the case. Specifically, for the long only 10 bond portfolio, returns fall from 8.06% for the portfolio with no lag to 6.67%, 6.33%, 6.06%, 5.54%, and 5.47% for portfolios with 1, 2, 3, 4, and 5 day lags, respectively. This suggests two important things. First, the z-score signal is informative, as seen in the monotonic decay. If the signal contained no information about future returns and were random, one would not observe a pattern in the lagged returns. Second, the decay suggests that market participants (and thus bond prices) react to the public z-score signal and move in the expected direction, reducing the profitability of the strategy over time. However, the adjustment is not immediate, as 1-day lag returns are greater than 5-day lag returns, suggesting inefficiencies in the process. So, even if a trader formed portfolios with a 1-day old z-score, she would still profit.
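The lag and holding-period experiments above amount to changing two parameters of a simple backtest loop. The sketch below is a hypothetical, simplified version: it ignores transaction costs, uses plain lists of daily z-score and daily bond-return dictionaries, and treats each holding period as buy-and-hold on equal initial weights. Setting `lag` to 1 through 5 mirrors the lagged-signal test, while varying `hold` mirrors the holding-period comparison in Section 3.4.

```python
def lagged_backtest(z_by_day, returns_by_day, k=10, hold=5, lag=0):
    """Per-period returns of an equally weighted top-k portfolio formed on a lagged signal.

    z_by_day[t]: dict of z-scores computed at the close of day t.
    returns_by_day[t]: dict of each bond's return from the close of day t-1
        to the close of day t.
    The portfolio is formed at the close of day t from the z-scores of day
    t - lag and held over days t+1 .. t+hold, then rebalanced.
    """
    period_returns = []
    t = lag  # first day on which a lagged signal exists
    while t + hold < len(returns_by_day):
        z = z_by_day[t - lag]
        longs = sorted(z, key=z.get, reverse=True)[:k]
        bond_returns = []
        for bond in longs:
            growth = 1.0
            for day in range(t + 1, t + hold + 1):
                growth *= 1.0 + returns_by_day[day][bond]  # compound daily returns
            bond_returns.append(growth - 1.0)
        # Equal initial weights, held without rebalancing within the period.
        period_returns.append(sum(bond_returns) / len(bond_returns))
        t += hold
    return period_returns
```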


Table 7
Risk and return statistics: Long short and duration neutral portfolios. This table documents risk and return characteristics of US Treasury bond portfolios (long/short) of various sizes (1, 2, 5, and 10 bonds) generated based on deviations from the Nelson–Siegel curve as discussed in Section 2.4. Panel A presents statistics related to the return, volatility, Sharpe ratio and maximum drawdown of the various equally weighted long/short portfolios. Panel B shows the same for duration neutral long/short portfolios.

Panel A: Long/short portfolio (equally weighted)
                           1 bond   2 bond   5 bond   10 bond
Return (annualized %)      5.95     5.23     4.26     3.33
Volatility (annualized %)  4.13     3.66     3.18     2.89
Sharpe ratio               1.442    1.429    1.340    1.152
Maximum drawdown (%)       11.04    8.88     8.18     6.35

Panel B: Long/short portfolio (duration neutral)
                           1 bond   2 bond   5 bond   10 bond
Return (annualized %)      1.46     2.48     2.56     2.46
Volatility (annualized %)  3.38     2.51     1.92     1.58
Sharpe ratio               0.431    0.986    1.334    1.561
Maximum drawdown (%)       20.09    11.38    12.44    6.47

3.7. Long short and duration neutral portfolio returns

We next consider both equally weighted and duration neutral long short portfolios in Table 7. Long short portfolios have several distinct advantages. First, the short side of the exposure can help finance the long position. Second, the capital required for a leveraged position is significantly reduced, as the cost of broker financing is related predominantly to the duration exposure. Third, a long short position may exhibit lower volatility than a long only exposure. Fourth, long short portfolios provide a ready comparison with traditional studies in the equity anomaly literature, which have focused on long short zero-cost portfolios. Panel A of Table 7 presents results for equally weighted long short portfolios, while Panel B presents results for duration neutral long short portfolios. We make the following observations.

1. Both the returns and the volatility of the equally weighted long short portfolios (Table 7 Panel A) are lower than those of the equally weighted long only portfolios (Table 5 Panel A). This need not be an issue if the Sharpe ratio is maintained. However, the Sharpe ratio of each equally weighted long short portfolio is also lower than that of the long only portfolio of the same size.
2. Significantly, the 10-bond z-score optimized duration neutral long short portfolio (Table 7 Panel B) has a better Sharpe ratio, at 1.56, than both the equally weighted long short portfolio (1.15) and the equally weighted long only portfolio (1.50). Unlike for equally weighted portfolios, the Sharpe ratio of duration neutral portfolios increases with portfolio size. Moreover, the 10 bond duration neutral long short portfolio exhibits markedly lower volatility, at 1.58%, compared to 2.89% for the equally weighted long short portfolio and 5.36% for the corresponding long only portfolio.
3. The maximum drawdown of the long short portfolio depends on portfolio size. The maximum drawdown of the equally weighted long short portfolio falls from 11.04% to 6.35% as we move from a 1 bond portfolio to a 10 bond portfolio. Similarly, the drawdown falls from 20.09% to 6.47% in the case of the duration neutral long short portfolio.
4. As the Sharpe ratios and maximum drawdowns of the duration neutral long short 10 bond portfolio and the long only 10 bond portfolio are similar, the final choice between the two approaches would rest on the cost of financing and the intended leverage. It would be incorrect to focus solely on the higher return of the long only portfolio and decide in its favor.
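Both weighting schemes in Table 7 can be sketched in a few lines. The first function gives the equally weighted long short portfolio described in Section 2.5 (+1/k on the k highest z-scores and −1/k on the k lowest). The paper obtains duration neutral weights from a quadratic program that maximizes the portfolio z-score; that optimizer is not reproduced here. As a simpler stand-in, the second function keeps the long weights and scales every short weight by one common factor so that weighted durations net to zero. Function names and inputs are hypothetical.

```python
def equal_weight_long_short(z_scores, k):
    """Weight +1/k on the k highest z-scores and -1/k on the k lowest."""
    if k < 1 or len(z_scores) < 2 * k:
        raise ValueError("need at least 2k bonds")
    ranked = sorted(z_scores, key=z_scores.get, reverse=True)
    weights = {b: 1.0 / k for b in ranked[:k]}
    weights.update({b: -1.0 / k for b in ranked[-k:]})
    return weights


def rescale_to_duration_neutral(weights, durations):
    """Scale every short weight by one common factor so that
    sum(weight * duration) over the whole portfolio is zero.

    A simple stand-in for the paper's z-score-maximizing quadratic program;
    the long leg is left unchanged, so the result is not dollar neutral.
    """
    long_dd = sum(w * durations[b] for b, w in weights.items() if w > 0)
    short_dd = sum(w * durations[b] for b, w in weights.items() if w < 0)
    if short_dd >= 0:
        raise ValueError("portfolio needs a short leg with positive durations")
    scale = -long_dd / short_dd  # positive, since short_dd < 0
    return {b: (w * scale if w < 0 else w) for b, w in weights.items()}


z = {"A": 2.1, "B": 1.4, "C": 0.2, "D": -0.9, "E": -1.7}
dur = {"A": 2.0, "B": 8.0, "C": 5.0, "D": 4.0, "E": 2.0}
ew = equal_weight_long_short(z, 2)  # {'A': 0.5, 'B': 0.5, 'D': -0.5, 'E': -0.5}
dn = rescale_to_duration_neutral(ew, dur)
print(sum(w * dur[b] for b, w in dn.items()))  # approximately 0
```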

3.8. Transaction costs

As previously noted, the US Treasury market is one of the most liquid markets in the world, with large issue sizes and narrow bid–ask spreads (Chakravarty and Sarkar, 1999). More recently, the exponential growth of automated trading in the US Treasury market has arguably improved liquidity through increased order flow and competition (New York Fed, 2015). Engle et al. (2013) and Adrian (2013) show that bid–ask spreads in the US Treasury market were around 0.6/32nds, 0.3/32nds, and 0.25/32nds from 2006 to 2012, except for a brief spike during the global financial crisis. We first compute transaction cost adjusted returns using one half of the end-of-day bid–ask spread of each issue on the rebalancing date. We find that the return (Sharpe ratio) of the long only equally weighted portfolio is 6.13% (1.15) after transaction costs, compared to 8.06% (1.50) before transaction costs, a cost of approximately 200 basis points annually. Thus, accounting for transaction costs results in lower, but still positive, returns. The Sharpe ratio falls from 1.50 to 1.15, but is still higher than that of a buy and hold strategy. In practice, a trader might not always trade on a signal if he believes that the replacement bond has an exposure close to that of a bond already in the portfolio, so that replacing it would only add transaction costs. Thus, transaction costs implied by the “naïve” procedure above have an upward bias, as the procedure assumes a forced rebalancing of all bonds as dictated by the model. Consequently, these transaction cost adjusted returns represent a lower bound for the strategy. This is an important consideration for the long/short strategy, as it entails twice as many securities as the long only strategy. In response, we also analyze our results using a second transaction cost measure that improves upon the naïve rebalancing strategy by assuming a “smart” trader who does not rebalance if securities in the new portfolio have the same z-score compared to those in the existing portfolio.⁶ We find that the return (Sharpe ratio) of the long short duration neutral portfolio is 0.98% (0.63) after accounting for transaction costs, compared to 2.46% (1.56) before transaction costs. While the smart transaction cost proxy can be considered a practical improvement over the naïve measure, as changes in exposure are an important consideration for rebalancing, one could argue that this proxy also has limitations, as a trader may look at other factors and security characteristics before deciding to buy or sell. Thus, to be conservative, the alpha tests in the next section use only the naïve transaction cost measure, which implies relatively higher transaction costs. Finally, a trader could vary the number of securities in the portfolio to trade off diversification benefits against transaction costs. However, such an analysis is beyond the scope of the paper.

3.9. Systematic risk adjusted portfolio alphas

A simple regression of the weekly strategy portfolio return after transaction costs on the Barclays US Treasury Index yields a beta of 0.76 on the index and an alpha of 10.8 basis points, both significant at the 1% level. Furthermore, a non-parametric test that tallies weekly performance over the sample period indicates that our strategy portfolio beats the Barclays Treasury index return with odds much greater than chance, with a win-loss ratio of 60:40.⁷ In this section, we examine whether transaction cost adjusted returns to the strategy represent alpha or are compensation for known systematic risks or pricing factors. For this analysis, we follow Jostova et al. (2013), who study momentum in corporate bond portfolios. Their procedure uses an OLS regression with Newey–West adjusted standard errors to estimate alpha, and combines both stock and bond market factors (Fama and French, 1989; Gebhardt et al., 2005):

$$ r_{pt} = \alpha_p + \boldsymbol{\beta}_p' \boldsymbol{F}_t + \varepsilon_{pt} \qquad (3) $$

Here, $r_{pt}$ is the weekly excess return on the test portfolio $p$, and $\boldsymbol{F}_t$ is the vector of systematic risk factors identified in the prior literature. These factors include the market excess return (MKT), the size and value factors (SMB, HML), momentum (UMD), and changes in the term spread (TERM) and the default spread (DEFAULT). Table 8 presents the results using the 10-bond equally weighted long only portfolio. Model (8.1) includes only the market excess return, model (8.2) adds the Fama–French size and book-to-market factors and the Carhart momentum factor, model (8.3) includes only bond market factors, i.e., changes in the term and default spreads, while model (8.4) combines the stock and bond factors of (8.2) and (8.3). Finally, model (8.5) uses alternate scaled proxies for the term and default spreads (mTERM and mDEFAULT) following Jostova et al. (2013). The mTERM proxy is equal to $\Delta\mathrm{TERM}/(1+\mathrm{TERM}_{t-1})$, and the mDEFAULT proxy is equal to $\Delta\mathrm{DEFAULT}/(1+\mathrm{DEFAULT}_{t-1})$. We make the following observations.

1. The Intercept row reports the weekly alpha in basis points. After controlling for systematic risk in model (8.4), the weekly alpha is highly significant and on the order of 11.86 basis points per week over the entire sample, which corresponds to an annualized alpha of 6.36%.
2. The coefficient on the market excess return is negative and significant at the 10% level in model (8.1), but loses significance when the other factors (SMB, HML, UMD) are added in model (8.2).
3. The size factor is significantly negatively related, while the momentum factor is significantly positively related, to portfolio returns. Interestingly, although the strategy studied in the paper is based solely on Treasury bonds, systematic risk factors that price equities are also significant in explaining its returns.
4. The value factor is insignificant across all specifications.
5. Of the two bond specific factors, the coefficient on changes in the term spread is negative and highly significant, while the coefficient on changes in the default spread is insignificant. This is expected, as US Treasury security returns are likely unrelated to changes in the default premium (BAA–AAA) but correlated with changes in the slope of the term structure as given by the term spread (10 year – 2 year CMT).
6. Models (8.6)–(8.8) repeat the analysis of model (8.4) for 1990–1999, 2000–2009, and 2010–2015, respectively. The results indicate that the alpha of the strategy has decreased from about 5.45% per year in the 1990s and 6.03% in the 2000s to 3.56% in the 2010–2015 period.

The results in this section indicate that our US Treasury bond based z-score strategy has generated significant alpha after accounting for systematic risks.

6 Specifically, we first compute the exposure change for different maturity buckets (0–2 years, 2–5 years, 5–10 years, 10–20 years, and 20–30 years). We do this because transaction costs may vary at different points of the yield curve. For example, transaction costs for securities in the 5–10 year bucket are almost always below 3 bps, but can be higher at the long end. We next calculate the average transaction cost for each sector on the day of the rebalancing and multiply it by the exposure change.

7 The sample period for our analysis using the broad Barclays US Treasury Index begins in March 1994 and ends in July 2015, a total of 1070 weeks.

4. Examining potential economic drivers of strategy returns

The construction of the z-score is relatively simple, as it relies on a signal generated using publicly available information from a well-studied yield curve model. As such, it is not clear why, in the context of a highly liquid market, the strategy should generate


Table 8
Alphas of bond portfolios after transaction costs. In this table, we compute alpha by estimating time series regressions of 10 bond long only portfolio excess returns after transaction costs on systematic risk factors. Transaction cost estimates are conservative and assume a naïve rebalancing strategy in which the portfolio is rebalanced weekly as per the model output. The procedure to compute transaction costs is based on one half of the bid–ask spread and is detailed in Section 3.8. The table shows estimated coefficients along with their associated t-statistics (in parentheses) based on Newey–West standard errors. MKT refers to the excess return on the market portfolio, while SMB, HML and UMD are the size, value, and momentum factors, respectively. TERM and DEFAULT refer to the term spread (10 year – 2 year) and the default spread (BAA – AAA), respectively. ΔTERM is $\mathrm{TERM}_t - \mathrm{TERM}_{t-1}$, while ΔDEFAULT is $\mathrm{DEFAULT}_t - \mathrm{DEFAULT}_{t-1}$. Finally, mTERM is $\Delta\mathrm{TERM}/(1+\mathrm{TERM}_{t-1})$ and mDEFAULT is $\Delta\mathrm{DEFAULT}/(1+\mathrm{DEFAULT}_{t-1})$. For convenience, the last row shows the annualized alpha after transaction costs in percent, computed from the weekly alpha reported in the Intercept row in basis points.

Dependent variable: Long only (equally weighted, 10 bonds)

Models (8.1)–(8.5)
                          (8.1)        (8.2)        (8.3)        (8.4)        (8.5)
Intercept (weekly α, bp)  12.125***    11.488***    12.023***    11.856***    11.844***
                          (5.94)       (5.71)       (5.82)       (5.74)       (5.74)
MKT                       −0.025*      −0.009                    −0.013       −0.012
                          (−1.94)      (−0.67)                   (−0.94)      (−0.93)
SMB                                    −0.068***                 −0.063***    −0.063***
                                       (−3.84)                   (−3.49)      (−3.49)
HML                                    −0.004                    0.001        0.002
                                       (−0.22)                   (0.08)       (0.09)
UMD                                    0.038***                  0.035***     0.035***
                                       (3.50)                    (2.87)       (2.87)
ΔTERM                                               −2.309***    −2.283***
                                                    (−4.84)      (−4.91)
ΔDEFAULT                                            0.083        −0.294
                                                    (0.14)       (−0.46)
mTERM                                                                         −2.259***
                                                                              (−4.94)
mDEFAULT                                                                      −0.292
                                                                              (−0.46)
N                         1,265        1,265        1,265        1,265        1,265
Adjusted R² (%)           0.55         2.85         7.06         9.56         9.62
Annualized α (%)          6.50         6.15         6.45         6.36         6.35

Models (8.6)–(8.8), subperiods
                          (8.6)        (8.7)        (8.8)
                          1990–1999    2000–2009    2010–2015
Intercept (weekly α, bp)  10.206***    11.262***    6.724**
                          (3.24)       (3.54)       (2.19)
MKT                       0.082***     −0.065***    0.024
                          (2.63)       (−4.45)      (0.61)
SMB                       −0.086***    −0.032       0.042
                          (−2.62)      (−1.21)      (1.70)
HML                       0.051        0.013        −0.014
                          (1.35)       (0.57)       (−0.38)
UMD                       0.056***     0.014        −0.020
                          (2.68)       (1.00)       (−0.88)
ΔTERM                     −0.733       −1.527***    −5.763***
                          (−1.27)      (−2.55)      (−6.83)
ΔDEFAULT                  −0.044       −0.284       −0.828
                          (−0.03)      (−0.30)      (−1.06)
N                         489          499          277
Adjusted R² (%)           10.06        8.54         45.69
Annualized α (%)          5.45         6.03         3.56
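The alpha estimates in Table 8 come from regressions of the form in Eq. (3) with Newey–West standard errors. A minimal NumPy sketch of that estimator, together with the annualization used in the last row of the table, is below. The Bartlett-kernel lag length is an assumption (the text does not report it), the function names are hypothetical, and no small-sample correction is applied. Compounding the weekly intercept over 52 weeks reproduces the reported figures; for example, 11.856 basis points per week gives about 6.36% per year.

```python
import numpy as np


def newey_west_ols(y, factors, lags=4):
    """OLS of y on a constant plus factors, with Newey–West (Bartlett) t-statistics.

    y: length-T array of weekly excess returns.
    factors: T x K array (or length-T array for one factor) of factor values.
    Returns (coefficients, t_statistics); the first entry is the intercept (alpha).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    scores = X * (y - X @ beta)[:, None]  # row t is x_t * u_t
    S = scores.T @ scores
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett kernel weight
        gamma = scores[lag:].T @ scores[:-lag]
        S += weight * (gamma + gamma.T)
    cov = xtx_inv @ S @ xtx_inv  # sandwich covariance
    return beta, beta / np.sqrt(np.diag(cov))


def annualized_alpha_pct(weekly_alpha_bp, periods=52):
    """Compound a weekly alpha in basis points into an annual percentage."""
    return 100.0 * ((1.0 + weekly_alpha_bp / 1e4) ** periods - 1.0)
```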


Table 9
Do bond characteristics matter? Panel A of this table lists average values and standard deviations (in parentheses) of bond characteristics for bonds included and not included in the 10-bond long only strategy portfolio at each rebalancing date from March 1990 to July 2015. The strategy portfolio considered in this table is formed by picking the bonds with the 10 highest z-scores each week from our aggregate weekly bond panel, which has 177,810 observations spread across 1,266 weeks. Of these, 12,660 weekly bond observations are in the strategy portfolio (1,266 weeks × 10 bonds per week) and 165,150 observations are in the all other bonds subsample. The last column shows the difference between the two subsamples, and stars indicate significance in difference t-tests at standard levels. In Panel B, we use the same broad sample of 177,810 bond-week observations to run a logit regression with a dummy variable that equals 1 for bonds included in the strategy and 0 otherwise. The logit uses standard errors clustered two ways, at the bond and rebalancing date (time) levels. Model 1 excludes duration because of its built-in relation with the coupon rate and time to maturity, while Model 2 includes it.

Panel A: Summary statistics
                        Strategy portfolio   All other bonds   Difference
                        (N = 12,660)         (N = 165,150)
Z-score                 1.75 (0.41)          −0.13 (0.90)      1.88***
Duration (in years)     5.51 (3.94)          5.23 (4.02)       0.28***
Coupon rate (%)         6.18 (2.87)          5.88 (3.03)       0.30***
Bid–ask spread          0.05 (0.03)          0.05 (0.03)       0.00***
Bond age                5.54 (5.84)          5.44 (5.73)       0.10*
Years to maturity       8.13 (7.90)          7.58 (7.71)       0.56***

Panel B: Logit regression to explain inclusion in the strategy portfolio
                        Model 1            Model 2
Bid–ask spread          6.15 (7.34)***     6.32 (7.50)***
Bond age                −0.02 (−5.04)***   −0.02 (−5.60)***
Duration                                   0.05 (2.21)***
Coupon rate             0.08 (11.29)***    0.09 (11.21)***
Time to maturity        0.01 (3.65)***     −0.01 (−1.06)
Constant                −2.71 (−69.25)***  −2.81 (−50.56)***
Pseudo R²               0.0044             0.0047

abnormal returns. While puzzling and concerning in the context of expected market efficiency, simple models have been documented to consistently beat the stock market; for instance, in the case of the post earnings announcement drift (PEAD) and the accrual pricing anomalies (Bernard and Thomas, 1989, 1990; Sloan, 1996). In both these cases the underlying models are simple, reliant on publicly available information, and have generated positive alpha for extended periods. In this section, we take a first step in trying to understand the economic drivers of the strategy and make sense of the returns. 4.1. Bond characteristics We approach this important issue from two directions: bottom-up and top-down. First, in the bottom-up approach, we test whether there are any bond-level characteristics that can explain our findings. For instance, if z-scores (the variable we use to sort and select bonds) were consistently higher for illiquid bonds, we would be more likely to hold illiquid bonds in our strategy portfolio. The returns to this portfolio could then be seen as compensation for higher illiquidity risk. To this end, we carefully examine key bond-level characteristics such as the bid–ask rate (liquidity proxy, smaller is more liquid), age (liquidity, younger is more liquid), duration, coupon rate, and time to maturity between our long-only 10-bond strategy portfolio and the rest of our Treasury bond universe in Table 9 Panel A. Our aggregate weekly bond panel has 177,810 observations spread across 1,266 weeks. Of these, there are 12,660 weekly bond observations in the strategy portfolio (1,266 weeks x 10 bonds per week) and 165,150 observations in the all other bonds subsample. Univariate tests suggest that there is significant statistical difference between all bond characteristics in the two groups. 
Given this evidence, one could argue that bond characteristics matter, with the strategy portfolio exhibiting slightly higher duration, coupon rate, bond age, and years to maturity. Economically, however, the differences in means and standard deviations between the two groups are quite small, and the statistical significance could simply reflect the large N. Another way to examine the relevance of bond characteristics to the strategy is through time series plots. In Fig. 8, we present the average values of the various bond characteristics for the strategy portfolio and for all other bonds at each rebalancing date. The average characteristics of the strategy portfolio vary considerably over time, with no clearly discernible pattern of lying above or below the all-other-bonds average. Furthermore, their volatility around the sample average suggests no significant persistence in bond characteristics for the strategy portfolio. Together, these results suggest that the strategy returns are likely not driven by overweighting or underweighting a fundamental bond characteristic.

4.2. Business cycles

It is well known that both market liquidity and risk premia are affected in economic downturns. Limited speculative capital not only raises the expected cost of capital but also directly affects traders' risk appetite and risk-bearing ability (Shleifer and Vishny, 1997; Naes et al., 2011). In this situation, some assets may face persistent demand, as investors tend to hold specific issues and would-be borrowers may be unable to borrow them. Thus, in addition to low market liquidity, traders may also face a lack of liquidity in specific issues such as off-the-run securities (Pasquariello and Vega, 2009). Consequently, expensive (overpriced) assets will remain expensive, while cheap (underpriced) ones will remain cheap, and bond prices will move toward fair value more slowly than expected.
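The link between slow convergence to fair value and the z-score is easier to see with a concrete signal. The following is a minimal sketch of one way such a signal could be built, not the paper's exact specification. It fits the three-factor Nelson–Siegel curve on each date by OLS, takes each bond's yield residual (observed minus fitted, so positive means cheap), and standardizes the latest residual against that bond's own past residuals. The fixed decay parameter (Diebold and Li's 0.0609 per month, restated per year), the one-year lookback, the sign convention, the function names, and all data are assumptions. If cheap bonds stay cheap, the residual stays elevated and the z-score does not fall back toward zero.

```python
import numpy as np

# Assumption: decay parameter fixed at Diebold and Li's (2006) 0.0609 per month,
# restated per year so that maturities can be given in years.
LAMBDA = 0.0609 * 12


def ns_loadings(tau: np.ndarray, lam: float = LAMBDA) -> np.ndarray:
    """Nelson-Siegel level, slope and curvature loadings for maturities tau (years)."""
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])


def ns_residuals(tau: np.ndarray, yields: np.ndarray, lam: float = LAMBDA) -> np.ndarray:
    """Fit the three factors by OLS on one date; return observed minus fitted yields.

    A positive residual means the bond yields more than the curve implies (cheap).
    """
    X = ns_loadings(tau, lam)
    beta, *_ = np.linalg.lstsq(X, yields, rcond=None)
    return yields - X @ beta


def residual_zscore(resid_history: np.ndarray) -> np.ndarray:
    """Standardize each bond's latest residual against its own earlier residuals.

    resid_history has shape (dates, bonds); the last row is the current date.
    Benchmarking a bond against its own history nets out persistent bond-specific
    rich/cheap effects (an assumed definition, not necessarily the paper's).
    """
    past, today = resid_history[:-1], resid_history[-1]
    return (today - past.mean(axis=0)) / past.std(axis=0, ddof=1)


# Hypothetical example: 7 bonds observed for 52 weeks on a fixed curve (yields in %).
rng = np.random.default_rng(42)
tau = np.array([2.0, 3.0, 5.0, 7.0, 10.0, 20.0, 30.0])
curve = ns_loadings(tau) @ np.array([4.0, -1.5, 1.0])  # level, slope, curvature
panel = curve + rng.normal(scale=0.01, size=(52, tau.size))
panel[-1, 3] += 0.15  # in the final week the 7-year bond cheapens by 15 bp

resid = np.vstack([ns_residuals(tau, y) for y in panel])
z = residual_zscore(resid)
ranked = np.argsort(z)[::-1]  # highest z-score (cheapest) first
print("bonds ranked by z-score:", ranked)
```

A long-only portfolio in this setting would hold the first few entries of `ranked` and rebalance each week; that selection rule is likewise an illustrative assumption.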
In terms of our model, the bond's z-score will not mean-revert as expected, which significantly reduces the profitability of the strategy. We examine whether business cycles can explain the returns to the strategy. We compute the difference between the returns of the top-quintile portfolio of bonds formed by z-score and the returns of an equally weighted index portfolio formed using all bonds in the sample (the


Fig. 8. Comparison of bond characteristics of the strategy portfolio with all other bonds. In this figure we plot the average bond characteristics of the z-score strategy portfolio (equally weighted, long-only, 10 bonds) along with the average bond characteristics of the rest of the bond sample at each weekly rebalancing date from March 1990 to July 2015.

Table 10
GDP growth and strategy returns. This table, based on the data in Fig. 9, shows the correlations of the strategy's abnormal returns with GDP growth at the quarterly level. The first row reports results for the 10-bond long-only strategy. The second row expands the bond portfolio to the top quintile based on z-score. The index portfolio is constructed using equally weighted returns of all bonds in our Treasury bond universe.

                               GDP growth   Lead GDP growth   Lag GDP growth
Strategy returns – Index       0.260        0.302             0.043
Top quintile returns – Index   0.305        0.274             0.044

aggregate portfolio), and find that this difference has a statistically significant positive correlation with GDP growth. We plot this series against GDP growth in Fig. 9. The figure indicates a clear correlation between the index-adjusted (or abnormal) returns of the strategy and GDP growth.8 The strategy outperforms the aggregate portfolio in most years, but most markedly in good economic states. On the downside, recession years such as 2001 and 2008 see a marked dip in the profitability of the strategy. Thus, one potential explanation for the high returns in good states could be the risk of disaster in bad ones.

5. Conclusion

It is common for traditional fixed income traders to take specific positions in individual securities in the US Treasury market. These positions can be based on the trader's view of the evolution of the yield curve, or on his or her view of the relative value of an

8 We compute correlations between abnormal returns and GDP growth at the quarterly level, because GDP growth data are available at most at a quarterly frequency, and present the results in Table 10. The correlations are statistically significant and provide some evidence that returns to the strategy are correlated with the business cycle. For robustness, we also run this analysis at the annual level using annual GDP growth and annual strategy returns. Despite being limited by data to only 26 data points (one for each year from 1990 to 2015), the correlation of strategy returns minus index returns (Strategy returns – Index) with GDP growth increases from 0.26 to 0.39.



Fig. 9. Top quintile strategy outperformance through the business cycle. This figure plots the index-adjusted annual returns of the broader strategy portfolio, based on the top quintile of bonds, against the US GDP growth rate. The broader strategy portfolio consists of long-only positions in the top quintile of bonds ranked by z-score. The holding period of the portfolio is 1 week, and returns are annualized for presentation. The index is formed by equally weighting all bonds in the sample.

individual issue in comparison to others.9 This paper focuses on the latter idea and presents evidence of significant abnormal returns, with a risk-adjusted 𝛼 of 6.36% per year, in a dynamic portfolio of US Treasury securities from 1990 to 2015. Our methodology relies on (1) isolating the idiosyncratic component of individual bond yields after accounting for the three factors (level, slope, and curvature) of the Nelson and Siegel (1987) yield curve model, and (2) creating focused portfolios (long-only or long–short) based on the derived valuation signal (z-score). The sample of Treasury notes and bonds used in the paper is comprehensive, apart from standard exclusions for optionality and short maturity. Moreover, the algorithm used to identify undervalued securities is simple and implementable, and it delivers significant abnormal returns for both long-only and long–short portfolios. This matters because the two strategies have different strengths. On the one hand, long-only strategies do not require short positions, which may be constrained by a lack of security availability or a shortage of risk capital. On the other hand, long–short strategies can be designed to be both low cost, with the short side helping to fund the long position, and duration neutral, which reduces the cost of a broker-financed leveraged position.

Is the performance of the strategy an anomaly? The US Treasury market is one of the most liquid asset markets in the world, where the textbook limits to arbitrage arising from illiquidity and capital constraints are less likely to apply (Shleifer and Vishny, 1997). Moreover, our 𝛼 results are based on transaction-cost-adjusted returns and are robust to simultaneous controls for traditional risk factors such as the market excess return, Fama–French size and value, and momentum, as well as bond-specific risk factors such as changes in the term and default premia (Fama and French, 1993; Carhart, 1997).
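The robustness check described above amounts to a time-series regression of portfolio excess returns on factor returns, with alpha as the intercept. The sketch below shows that mechanic under stated assumptions: weekly data, simulated factor series standing in for the market, size, value, momentum, term and default factors (the column labels in `FACTORS` and the function name `factor_alpha` are hypothetical), plain OLS standard errors with no autocorrelation correction, and simple ×52 annualization. It is not the paper's estimation code, and the numbers it produces are not the paper's results.

```python
import numpy as np

FACTORS = ["MKT", "SMB", "HML", "MOM", "dTERM", "dDEF"]  # hypothetical column labels


def factor_alpha(excess_ret: np.ndarray, factors: np.ndarray,
                 periods_per_year: int = 52) -> tuple[float, float]:
    """OLS time-series regression r_t = alpha + b'f_t + e_t.

    Returns the annualized alpha (per-period intercept times periods_per_year)
    and its conventional OLS t-statistic; no autocorrelation correction is applied.
    """
    T = excess_ret.size
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    sigma2 = resid @ resid / (T - X.shape[1])   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS coefficient covariance
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    return float(coef[0] * periods_per_year), float(t_alpha)


# Simulated weekly data (not the paper's): 1,266 weeks, true alpha of 0.10% per week.
rng = np.random.default_rng(7)
T = 1_266
f = rng.normal(scale=0.01, size=(T, len(FACTORS)))
betas = np.array([0.05, 0.0, 0.02, -0.01, 0.8, 0.3])
r = 0.001 + f @ betas + rng.normal(scale=0.005, size=T)

alpha_ann, t_alpha = factor_alpha(r, f)
print(f"annualized alpha = {alpha_ann:.2%}, t = {t_alpha:.1f}")
```

Including all factors in one regression is what "simultaneous controls" means here: the intercept measures the average return left unexplained after every factor exposure is accounted for at once.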
A significantly positive risk-adjusted alpha suggests either that our algorithm successfully identifies mispriced Treasury securities, or that our risk model is incomplete and fails to account for all sources of systematic risk relevant to our portfolio. Further tests suggest that our returns are likely unrelated to specific bond characteristics but may be related to the business cycle. The strategy generates profits in most years. However, we find some evidence that its abnormal profits (in excess of the index) are correlated with GDP growth.

References

Adrian, T., 2013. Treasury market liquidity: An overview (presentation). In: 3rd US Treasury Roundtable on Treasury Markets and Debt Management & 13th IMF Public Debt Management Forum, June 19–20, 2013.
Asness, C., Moskowitz, T., Pedersen, L., 2013. Value and momentum everywhere. J. Financ. 68 (3), 929–984.
Bernard, V., Thomas, J.K., 1989. Post-earnings-announcement drift: Delayed price response or risk premium? J. Account. Res. 27, 1–36.
Bernard, V., Thomas, J.K., 1990. Evidence that stock prices do not fully reflect the implications of current earnings for future earnings. J. Account. Econ. 13 (4), 305–340.
Carhart, M., 1997. On persistence in mutual fund performance. J. Financ. 52 (1), 57–82.
Chakravarty, S., Sarkar, A., 1999. Liquidity in US fixed income markets: A comparison of the bid–ask spread in corporate, government, and municipal bond markets. FRB of New York Staff Report No. 73.
Choudhry, M., 2006. Appendix B: Relative value analysis: Bond spreads. In: The Futures Bond Basis, 2nd ed. Wiley Finance.
Cox, J., Ingersoll, J., Ross, S., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53 (2), 363–384.
Diebold, F., Li, C., 2006. Forecasting the term structure of government bond yields. J. Econometrics 130, 337–364.
Engle, R., Fleming, M., Ghysels, E., Nguyen, G., 2013. Liquidity, volatility, and flights to safety in the US Treasury market: Evidence from a new class of dynamic order book models. FRB of New York Staff Report No. 590.
Fama, E., French, K., 1989. Business conditions and expected returns on stocks and bonds. J. Financ. Econ. 25, 23–49.
Fama, E., French, K., 1993. Common risk factors in the returns on stocks and bonds. J. Financ. Econ. 33, 3–56.
Fama, E., French, K., 1996. Multifactor explanations of asset pricing anomalies. J. Financ. 51 (1), 55–84.
Gatev, E., Goetzmann, W., Rouwenhorst, K., 2006. Pairs trading: Performance of a relative-value arbitrage rule. Rev. Financ. Stud. 19 (3), 797–827.

9 According to our conversation with an anonymous market practitioner who has worked on Wall Street fixed income desks, relative value traders often select one or two specific bonds that they deem 'mispriced' to go long or short, rather than following a portfolio approach.



Gebhardt, W., Hvidkjaer, S., Swaminathan, B., 2005. The cross-section of expected corporate bond returns: Betas or characteristics? J. Financ. Econ. 75, 85–114.
Huggins, D., Schaller, C., 2013. Fixed Income Relative Value Analysis. Bloomberg Press.
Jostova, G., Nikolova, S., Philipov, A., Stahel, C., 2013. Momentum in corporate bond returns. Rev. Financ. Stud. 26 (7), 1649–1693.
Mashruwala, C., Rajgopal, S., Shevlin, T., 2006. Why is the accrual anomaly not arbitraged away? The role of idiosyncratic risk and transaction costs. J. Account. Econ. 42 (1–2), 3–33.
McCulloch, J., 1971. Measuring the term structure of interest rates. J. Bus. 44, 19–31.
Naes, R., Skjeltorp, J., Odegaard, B., 2011. Stock market liquidity and the business cycle. J. Financ. 66 (1), 139–176.
Nelson, C., Siegel, A., 1987. Parsimonious modeling of yield curves. J. Bus. 60, 473–489.
New York Fed (Treasury Market Practices Group), 2015. Automated Trading in Treasury Markets.
Pasquariello, P., Vega, C., 2009. The on-the-run liquidity phenomenon. J. Financ. Econ. 92, 1–24.
Sercu, P., Wu, X., 1997. The information content in bond market residuals: An empirical study on the Belgian bond market. J. Bank. Financ. 21 (5), 685–720.
Shleifer, A., Vishny, R.W., 1997. The limits of arbitrage. J. Financ. 52 (1), 35–55.
Sloan, R., 1996. Do stock prices fully reflect information in accruals and cash flows about future earnings? Account. Rev. 71 (3), 289–315.
Vasicek, O., 1977. An equilibrium characterization of the term structure. J. Financ. Econ. 5, 177–188.
