SR-BLITS: Sharpe Ratio’s Backward-Looking Improvement as a Trading Strategy
Accepted Manuscript
Mohammed Shahid Abdulla
PII: S0970-3896(16)30188-4
DOI: https://doi.org/10.1016/j.iimb.2019.07.005
Reference: IIMB 338
To appear in: IIMB Management Review
Received date: 29 December 2016
Revised date: 30 January 2018
Accepted date: 15 July 2019
Please cite this article as: Mohammed Shahid Abdulla, SR-BLITS: Sharpe Ratio’s Backward-Looking Improvement as a Trading Strategy, IIMB Management Review (2019), doi: https://doi.org/10.1016/j.iimb.2019.07.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Abstract
A common and trivial strategy with respect to a single security or a tradeable asset is to simply buy-and-hold. In contrast, a trading strategy named SR-BLITS is proposed that takes a position based on buy and sell signals which are calculated at each decision index $T$. These signals are derived from the maximization of Sharpe Ratio (SR), a measure of risk-adjusted returns, which is calculated using values of the past $(T-1)$ returns. At index $T$, a vector of ideal SR-maximising positions, for all indices $t < T$ thus far, is calculated, accounting for payments made to change the existing vector of positions. This purchase (or sale), to effectively correct all past positions taken at indices $t < T$, is assumed to be performed at the current price of the security or asset. The computation for these signals involves solving at most 2 systems of linear equations at each $T$, and only 1 if transaction cost is not considered. However, the matrix size to be inverted increases with $T$, requiring the algorithm to be restricted to episodes of some size $M \geq T$. Numerical experiments on simulated Geometric Brownian Motion (GBM) series using a range of parameters, as well as NSE and NASDAQ indices, are conducted. With transaction costs considered, these reveal more than 30 percent improvement in average SR for an episode of trading in GBM and NSE, when compared to a buy-and-hold strategy.

1 Introduction and Problem Description
A trader in an asset attempts to optimize or maximize a suitable internal measure, such as profit, economic utility or risk-adjusted return. A trading system computes the optimum input variables into such a measure, and thus infers a trading decision that the trader executes in the market. In this paper, we propose the use of a novel metric called a backward-looking Sharpe Ratio (SR) and maximize this internal quantity. It is empirically observed that the trading decisions recommended by such a trading system (called SR-BLITS) result in positions that have a net SR favorable over the buy-and-hold (BH) strategy. SR-BLITS is currently applicable to trading systems that have been designed to trade in a single security or asset.
Consider a time-series of asset prices $\{\rho_1, \rho_2, \ldots, \rho_T, \rho_{T+1}, \ldots, \rho_M\}$ where $M$ is an episode upper-limit. We assume that the overall sequence of an asset’s or security’s price can be composed of $M$-length arrays in order to apply a trading strategy. At a time $T$, where $T \leq M$, after the price $\rho_T$ of the asset is observed, a $(T-1)$-sized returns sequence $\{r_t\}_{t=1}^{T-1}$ can be inferred where $r_t \triangleq \rho_{t+1} - \rho_t$, for $t < T$. Note that these returns are not normalized, since this application pertains to a single price series $\{\rho_t\}_{t=1}^{M}$. Consider $R_t^T := r_t$, since $r_t$ would be the return encountered in a BH strategy. In further discussion, we will use a different definition of returns sequence $\{R_t^T\}_{t=1}^{T-1}$. Thus,

$$S^T \triangleq \frac{A^T}{\sqrt{B^T - (A^T)^2}}, \quad \text{where } A^T = \frac{\sum_{t=1}^{T-1} R_t^T}{T-1}, \text{ and } B^T = \frac{\sum_{t=1}^{T-1} (R_t^T)^2}{T-1}, \tag{1}$$

is the Sharpe Ratio (SR) defined with these returns. Consider an alternate definition of SR that we shall use as short-hand later:

$$S_a^T(\{x_t\}_{t=1}^{T-1}, \{\rho_t\}_{t=1}^{T}), \quad R_t^T = x_t(\rho_{t+1} - \rho_t) = x_t \cdot r_t.$$
The $S^T$ defined in (1) can be rewritten as $S^T = S_a^T(\{1\}, \{\rho_t\}_{t=1}^{T})$, where $\{1\}$ is a $(T-1)$-size vector of 1s indicating a conventional BH strategy. The 1s indicate that the asset has been bought at $t = 1$ at corresponding price $\rho_1$, held for each $1 < t \leq T$, and has not been sold at any intervening index. The vector of positions is s.t. $0 \leq x_t^T \leq 1$, with $x_t^T = 1.0$ implying that $\eta$ units of the asset are held at $t$, some of which may be purchased at price $\rho_t$ (depending on what the position $x_{t-1}^T$ was). Also, the position size $\eta$ is a positive integer large enough such that $x_t^T \cdot \eta$ is a positive integer, indicating the actual number of shares of the asset held. Assume that no short positions are taken at any $t$ and also that, temporarily, no transaction costs are due for any purchase or sale of the asset made at index $T$. The latter assumption will be relaxed later in §4.
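As an illustration of (1) and of the short-hand $S_a^T$, the following is a minimal Python sketch, assuming NumPy is available. The function names `sharpe_ratio` and `sa`, and the five prices used, are hypothetical choices made for this example; they are not taken from the code accompanying the paper.

```python
import numpy as np

def sharpe_ratio(returns):
    """S^T of (1): A / sqrt(B - A^2), with A the mean return and B the mean squared return."""
    returns = np.asarray(returns, dtype=float)
    a = returns.mean()
    b = np.mean(returns ** 2)
    # Undefined (division by zero) when all returns are identical.
    return a / np.sqrt(b - a ** 2)

def sa(positions, prices):
    """S_a^T: SR of the returns R_t = x_t * (rho_{t+1} - rho_t), t = 1..T-1."""
    r = np.diff(np.asarray(prices, dtype=float))      # un-normalized returns r_t
    return sharpe_ratio(np.asarray(positions, dtype=float) * r)

# Hypothetical price series with T = 5, and the buy-and-hold vector {1} of size T - 1.
prices = [100.0, 101.0, 100.5, 102.0, 103.0]
bh = np.ones(len(prices) - 1)
print(sa(bh, prices))
```

Passing a vector of positions other than `bh` gives the SR of the correspondingly scaled returns $x_t \cdot r_t$.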
An insight used in this work is a correction for past positions, provided that such corrections are made at current price $\rho_T$. Suppose that against causality, it is actually possible to ‘correct’ position $x_{T-k}$ to a more suitable $x'_{T-k}$, the latter prescribed by some optimization algorithm. We must thus request another asset-holding entity, one who exclusively engages in BH, to transfer $(x'_{T-k} - x_{T-k})$ units of the asset that she held at index $(T-k)$. We pay her the amount $(x'_{T-k} - x_{T-k}) \cdot \rho_T$ so that she can make good the $(x'_{T-k} - x_{T-k})$ units transferred from her holding to us. Our return at $(T-k)$ has to be adjusted to incorporate the amount paid, thus

$$R_{T-k} := r_{T-k} \cdot x'_{T-k} - \rho_T \cdot (x'_{T-k} - x_{T-k}).$$

However, this return does not accrue to us in any real sense: it is only an element of the $T$-length time-series for which we calculate the internal or backward-looking SR. The finding from simulating SR-BLITS is that the decision $x_T$ of (4), a purchase or sale made at each $T$ by the above BH-only entity, is empirically observed to have good SR. This outcome occurs because these purchase/sale actions are made against optimization of the internal SR, which proposes a vector of new positions like $x'_{T-k}$. Note also that the shares purchased from the BH-only entity for index $(T-k)$ are not exactly $(x'_{T-k} - x_{T-k})$ in number, since this is only an illustration. The number is discounted by quantities that depend on $x'_{T-k-1}$ and $x_{T-k-1}$, as explained in (2) below.

2 Review of Literature
Technical trading strategies based on price, momentum and volume exist and are regularly reviewed, e.g. (Kwon & Kish, 2002). There is some analogous work in the literature which also optimizes SR before taking a trading decision. Systems that optimize trading-related SR in a single security, using a specific machine-learning technique named Recurrent Reinforcement Learning (RRL), were considered in (Moody & Saffell, 1998). In particular, the strategy in (Moody & Saffell, 1998) was able to outperform BH on the S&P 500 index, considered over 25 years, with each month treated as time $t$. A shortcoming in that work was the ‘constant magnitude’ assumption, whereby the position in the security is one of $\{-1, 0, 1\}$, with $-1$ indicating being $\eta$ shares ‘short’ in the security. In contrast, the algorithm proposed here assumes the portfolio $x_t^T \cdot \eta$ of the tradeable asset takes values in $\{0, 1, 2, \ldots, \eta\}$, for an $\eta \gg 1$ at each instant $t$. This gives a larger number of discrete actions to choose from at each instant, and also enables better risk control. Techniques from Reinforcement Learning (RL) and optimal control have been applied to the single-asset trading problem, e.g. (Li & Chan, 2006). Hence, one may also draw justification for this situation using a common observation in RL. This observation is that compact action set control problems have better optimal policies over similar problems with discrete action sets, e.g. (Abdulla & Bhatnagar, 2015). Thus, control in terms of constant magnitude positions like $\{-1, 0, +1\}$ is less preferable to control via actions in the interval $[-1, 1]$.
The model in (Moody & Saffell, 1998) assumes that a series of external variables $\{y_1, y_2, \ldots, y_{T-1}, y_T\}$ is also available to the trader during optimization. There are 84 different input series to calculate a single trading decision, which in turn is based on the maximization of a non-linear function using the stochastic gradient ascent algorithm. The above work uses multi-input neural networks, while we do not use any variables beyond the two series of asset prices $\{\rho_s\}$ and internal positions $\{x_s^{T-1}\}$ in this work. On the one hand, this keeps the computational load lighter, as and when the proposed algorithm is ported to high-frequency trading in a highly liquid asset. On the other, the internal measure in our case is optimized in closed-form, and since it is an SR it accommodates only returns-related terms. To further compare with the model used in (Moody & Saffell, 1998), one may assume that returns from the risk-free asset $r_t^f$ are 0 for each $t$ in our case, due to trading over small-length episodes of size $M$. An algorithm that chooses a portfolio by estimating the highest SR can also be seen in more recent work such as (Shen, Wang, Jiang, & Zha, 2015).
Using a variation of the RRL was the aim in (Gorse, 2011), which also considered daily buy-sell signals to evaluate the modified algorithm’s efficacy over simple BH. However, in (Gorse, 2011), the algorithm considers maximisation of returns alone, and does not consider an objective based on risk-adjusted returns like the SR. An easy-to-follow student project based on (Moody & Saffell, 1998), which also doesn’t consider risk-free assets explicitly (but optimizes the SR), is to be found at (Molina, 2006). This project employs past returns $r_t$ observed on a security as the inputs to a learning technique, but applies only one decision $x_T^T$ out of an entire vector $\{x_t^T\}_{t=1}^{T}$ of learned output decisions. This is natural in the sense that the remaining decisions $\{x_t^T\}_{t=1}^{T-1}$ pertain to the past, where decisions $\{x_1^1, x_2^2, \ldots, x_{T-1}^{T-1}\}$ have already been taken by the algorithm. The optimization of the SR in (Molina, 2006) relies on output variables $\{x_t^T\}_{t=1}^{T}$ being interdependent. Hence, applying only the one such output $x_T^T$, while neglecting $\{x_t^T\}_{t=1}^{T-1}$, seems insufficient.
A similar SR-related maximization is attempted, but with respect to Foreign Exchange prices, in (Gold, 2003). After noticing that the performance of the algorithm in (Gold, 2003) has become poorer in Forex markets, modifications were suggested in (Dempster & Leemans, 2007). Also to be noted is that the above references employ Machine-Learning techniques or compare with algorithms that do (Reinforcement Learning, Genetic Algorithms) where differing types of inputs are required. In our case, we employ a different tack whereby we directly infer a sequence of positions $\{x_t^T\}_{t=1}^{T}$ that maximises the backward-looking SR, and then make a purchase at the current price to justify changes in the reference sequence of positions, termed $\{x_t^{T-1}\}_{t=1}^{T-1}$. The notation of the problem, and a specific explanation of the method SR-BLITS, follows.

3 SR-BLITS without transaction costs
At decision index $T$ there are earlier ‘tics’ or indices $1 \leq t \leq T$, for which we wish to design an internal sequence of positions $\{x_s^T\}_{s=1}^{T}$, such that a suitable internal SR is maximised. To describe this internal SR proposed above: assume a reference sequence of positions in the asset, calculated at earlier decision index $T-1$, as $\{x_s^{T-1}\}_{s=1}^{T-1}$. Now define a new returns sequence for $1 \leq t \leq T-1$ as below:

$$R_t^T = r_t x_t^T - (\rho_T - \rho_t) P_t^T, \quad \text{where } P_t^T = (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}), \tag{2}$$

in which $P_t^T$ indicates the purchase to be made at the $T$-th index to correct the earlier internal position $x_t^{T-1}$ for the $t$-th index onto its new, adjusted, value $x_t^T$. This adjustment is done since it is attractive due to the asset being currently available at $\rho_T$, even though the return at the $t$-index will be adjusted for the difference between $\rho_{t+1}$ and $\rho_t$. The form of $P_t^T$ is explained next: $P_t^T$ is not just assigned the difference $x_t^T - x_t^{T-1}$ between the two positions $x_t^T$ and $x_t^{T-1}$ from decision arrays $\{x_s^T\}$ and $\{x_s^{T-1}\}$, respectively. Instead, it is $(x_t^T - x_{t-1}^T)$ (i.e. the extra purchase needed at $t$ vis-a-vis the already corrected position $x_{t-1}^T$ at $t-1$), further discounted by the previous profile’s additional purchase for epoch $t$, i.e. $(x_t^{T-1} - x_{t-1}^{T-1})$. The only exception to this description is when $t = 1$, where the purchase needed is simply $P_1^T = x_1^T - x_1^{T-1}$. For such cases, the alternate definition of SR that we shall use as short-hand has three parameters:

$$S_b^T(\{x_s^T\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}, \{x_s^{T-1}\}_{s=1}^{T-1}), \tag{3}$$

where a new set of $T-1$ positions $x_s^T$ are calculated with reference to $T$ asset prices and earlier sequence of positions $x_s^{T-1}$.
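The definitions in (2) and (3) can be sketched in Python as follows, assuming NumPy. The function names and the numerical inputs are hypothetical and serve only to show how $P_t^T$, $R_t^T$ and $S_b^T$ follow from the two position vectors and the price series; this is not the implementation released with the paper.

```python
import numpy as np

def purchases(x_new, x_ref):
    """P_t^T of (2) for t = 1..T-1, including the exception P_1^T = x_1^T - x_1^{T-1}."""
    x_new = np.asarray(x_new, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    # prepend=0.0 makes the first difference equal to x_1 itself, which gives the t = 1 case.
    return np.diff(x_new, prepend=0.0) - np.diff(x_ref, prepend=0.0)

def corrected_returns(x_new, x_ref, prices):
    """R_t^T = r_t x_t^T - (rho_T - rho_t) P_t^T for t = 1..T-1."""
    prices = np.asarray(prices, dtype=float)
    r = np.diff(prices)                  # r_t = rho_{t+1} - rho_t
    gap = prices[-1] - prices[:-1]       # rho_T - rho_t
    return r * np.asarray(x_new, dtype=float) - gap * purchases(x_new, x_ref)

def sb(x_new, x_ref, prices):
    """S_b^T of (3): the SR of (1) applied to the corrected returns."""
    R = corrected_returns(x_new, x_ref, prices)
    a, b = R.mean(), np.mean(R ** 2)
    return a / np.sqrt(b - a ** 2)

# Hypothetical inputs: T = 5 prices, a BH reference profile, and a candidate new profile.
prices = [100.0, 101.0, 100.5, 102.0, 103.0]
x_ref = np.ones(4)
x_new = np.array([1.0, 0.8, 0.9, 0.9])
print(purchases(x_new, x_ref), corrected_returns(x_new, x_ref, prices))
```

When `x_new` equals `x_ref` all purchases vanish and `sb` reduces to the SR of the reference profile's own returns.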
Neither $S_a^T$ nor $S_b^T$ are true SRs: the purchase $P^T$ will dictate the true, effective, position $x_T$ as follows:

$$x_T := x_{T-1} + P^T, \quad B + 1 \leq T \leq M. \tag{4}$$

Also note that a positive integer $B \ll M$ exists, which is a bootstrap index. Thus, even upon discovery of price $\rho_B$ at index $B$, the position in the asset is regular buy-and-hold (BH), and buy-sell signals are calculated only for later indices $T > B$. This is done so that the metric $S_b^T$ for optimization, from which the buy-sell signals $x_T$ are inferred, can become stable and not be overly sensitive to minor changes in the internal positions that SR-BLITS recommends. The default position in the asset during the bootstrap period will be BH, i.e. $x_T = 1.0$, $T \leq B$, in order for the SR $S_a(\{1\}, \{\rho_s\}_{s=1}^{B+1})$ to have stabilized. Thus, the $\{x_s^B\}_{s=1}^{B}$ profile that will be used as reference for computing $\{x_s^{B+1}\}_{s=1}^{B+1}$ would simply be $x_s^B = 1.0$, $\forall s \leq B$.
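The bootstrap rule and the update (4) amount to a simple recursion, sketched below in plain Python. The container `aggregate_purchases`, mapping each index $T$ to its purchase $P^T$, is a hypothetical interface assumed for this example; how $P^T$ is obtained is the subject of the results that follow.

```python
def effective_positions(aggregate_purchases, B, M):
    """Effective positions x_1..x_M: BH (1.0) up to the bootstrap index B, then (4)."""
    x = [1.0] * B                                  # x_T = 1.0 for T <= B
    for T in range(B + 1, M + 1):
        x.append(x[-1] + aggregate_purchases[T])   # x_T := x_{T-1} + P^T
    return x

# Hypothetical purchases for a short episode with B = 2 and M = 5.
print(effective_positions({3: -0.2, 4: 0.0, 5: 0.1}, B=2, M=5))
```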
The rationale is that SR-maximising purchases $P^T$ at each $T$ will also bring about an enhanced SR of the effective positions $x_T$ that result from this strategy. The final comparison, therefore, is between the SR for positions $\{x_T\}_{T=1}^{M}$ calculated by SR-BLITS and the BH position $\{1\}_{T=1}^{M}$. Note that other measures of the buy/sell suitability of current asset price $\rho_T$ vis-a-vis earlier prices exist, e.g. the moving average indicator $\frac{1}{T-1}\sum_{s=1}^{T-1}\rho_s$. The moving average indicator is however so generic as to not be path-dependent, the latter property being crucial to an algorithmic trading system (Moody & Saffell, 1998).
As part of preliminary results, a property of $A^T$ used above in (1) will be employed in later results:
Proposition 1. For $1 \leq t \leq T-1$, $\frac{dA^T}{dx_t^T} = 0$.
Proof: Recall that $A^T = \frac{1}{T-1}\sum_{t=1}^{T-1} R_t^T$. Now note that $\frac{dR_t^T}{dx_t^T} = r_t - (\rho_T - \rho_t)$ and $\frac{dR_{t+1}^T}{dx_t^T} = (\rho_T - \rho_{t+1})$, so we have that $\frac{dR_t^T}{dx_t^T} + \frac{dR_{t+1}^T}{dx_t^T} = 0$, since $r_t = \rho_{t+1} - \rho_t$. The result thus holds true for $x_t^T$ with $1 \leq t \leq T-2$, since $x_t^T$ does not participate in any $R_s^T$ terms within $A^T$ other than $R_{t+1}^T$ and $R_t^T$. Further, since $x_{T-1}^T$ is a variable only in term $R_{T-1}^T$, we have that $R_{T-1}^T = -(\rho_T - \rho_{T-1})(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1}))$ due to $r_{T-1} = (\rho_T - \rho_{T-1})$, and thus $\frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$.
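Proposition 1 can be checked numerically. The sketch below (Python with NumPy; the random prices and positions are hypothetical test inputs) evaluates $A^T$ from (2) for several candidate vectors $\{x_s^T\}$ against one fixed reference profile. Since $A^T$ is linear in the positions and every partial derivative vanishes, the value should not depend on the candidate vector.

```python
import numpy as np

def mean_corrected_return(x_new, x_ref, prices):
    """A^T: mean over t = 1..T-1 of R_t^T = r_t x_t^T - (rho_T - rho_t) P_t^T."""
    prices = np.asarray(prices, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    r = np.diff(prices)                                            # r_t
    gap = prices[-1] - prices[:-1]                                 # rho_T - rho_t
    p = np.diff(x_new, prepend=0.0) - np.diff(x_ref, prepend=0.0)  # P_t^T
    return float(np.mean(r * x_new - gap * p))

rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(size=12))   # hypothetical prices, T = 12
x_ref = rng.uniform(0.0, 1.0, size=11)            # hypothetical reference profile

# A^T when nothing is corrected, and A^T for five random candidate profiles.
base = mean_corrected_return(x_ref, x_ref, prices)
values = [mean_corrected_return(rng.uniform(0.0, 1.0, size=11), x_ref, prices)
          for _ in range(5)]
print(base, values)
```

In this sketch all printed values agree up to floating-point error, which matches the statement that changing $x_s^T$ does not change the average backward-looking return.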
Since $S_b^T$ has to be maximized over the $(T-1)$-sized vector $\{x_s^T\}$, we calculate an expression for $\frac{dS^T}{dx_t^T}$ for each position $x_t^T$, $t \leq T-1$. Though $x_T^T$ is part of $\{x_s^T\}_{s=1}^{T}$ by notation, it does not currently contribute to any component in $S_b^T$. However, it will contribute to the returns sequence in next index $T+1$ (after price $\rho_{T+1}$ is discovered). Thus we have a sufficient condition to characterize the maximizing vector $\{x_s^T\}_{s=1}^{T-2}$.
Consider these definitions:

$$\begin{aligned}
a_t^{T,1} &= r_{t+1}(\rho_T - \rho_{t+1}) - (\rho_T - \rho_{t+1})^2, \\
a_t^{T,2} &= r_t^2 - 2r_t(\rho_T - \rho_t) + (\rho_T - \rho_t)^2 + (\rho_T - \rho_{t+1})^2, \\
a_t^{T,3} &= r_t(\rho_T - \rho_t) - (\rho_T - \rho_t)^2, \quad 1 \leq t \leq T-2, \\
b_t^T &= ((\rho_T - \rho_t)^2 - r_t(\rho_T - \rho_t))(x_t^{T-1} - x_{t-1}^{T-1}) - (\rho_T - \rho_{t+1})^2 (x_{t+1}^{T-1} - x_t^{T-1}), \quad \forall t \text{ s.t. } 2 \leq t \leq T-2, \\
b_1^T &= ((\rho_T - \rho_1)^2 - r_1(\rho_T - \rho_1))\, x_1^{T-1} - (\rho_T - \rho_2)^2 (x_2^{T-1} - x_1^{T-1}).
\end{aligned}$$

Theorem 1. Assuming a non-singular matrix of coefficients and $x_{T-1}^T := x_{T-2}^T$, the system of $T-2$ linear equations,

$$\begin{aligned}
a_1^{T,1} x_2^T + a_1^{T,2} x_1^T &= b_1^T, \\
a_t^{T,1} x_{t+1}^T + a_t^{T,2} x_t^T + a_t^{T,3} x_{t-1}^T &= b_t^T, \quad \forall t \text{ s.t. } 2 \leq t \leq T-2,
\end{aligned}$$

has a solution which is a candidate maximum (stationary point) for $S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})$.
Proof: Using the quotient rule of differentiation, and Proposition 1 with regard to $A^T$, we have for $2 \leq t \leq T-2$:

$$\frac{dS_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})}{dx_t^T} = \frac{-A^T}{(T-1)(B^T - (A^T)^2)^{3/2}} \left\{ R_t^T \frac{dR_t^T}{dx_t^T} + R_{t+1}^T \frac{dR_{t+1}^T}{dx_t^T} \right\}, \tag{5}$$

equate the RHS to 0, and substitute for $R_t^T$, $R_{t+1}^T$, $\frac{dR_t^T}{dx_t^T}$ and $\frac{dR_{t+1}^T}{dx_t^T}$ from (2) and the proof of Proposition 1, respectively. Appropriate substitution for $t = 1$ also yields the first equation, $a_1^{T,1} x_2^T + a_1^{T,2} x_1^T = b_1^T$.
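The linear system of Theorem 1 is tridiagonal and can be assembled directly from the coefficient definitions. The following Python sketch (assuming NumPy, and at least three prices) is one way to do so; `solve_theorem1` is a hypothetical name, the pegging of $x_{T-1}^T$ to $x_{T-2}^T$ is folded into the last row, and no truncation to $[0, 1]$ is applied.

```python
import numpy as np

def solve_theorem1(prices, x_ref):
    """Solve the T-2 equations of Theorem 1; returns x_1^T..x_{T-1}^T with x_{T-1}^T = x_{T-2}^T."""
    p = np.asarray(prices, dtype=float)       # rho_1..rho_T
    x_ref = np.asarray(x_ref, dtype=float)    # x_1^{T-1}..x_{T-1}^{T-1}
    n = len(p) - 2                            # number of equations and free unknowns
    r = np.diff(p)                            # r[i] = r_{i+1}
    c = p[-1] - p                             # c[i] = rho_T - rho_{i+1}
    d = np.diff(x_ref, prepend=0.0)           # d[i] = x_{i+1}^{T-1} - x_i^{T-1}, d[0] = x_1^{T-1}
    A = np.zeros((n, n))
    b = np.zeros(n)
    for t in range(1, n + 1):                 # equation index t as in the text
        a1 = r[t] * c[t] - c[t] ** 2
        a2 = r[t - 1] ** 2 - 2 * r[t - 1] * c[t - 1] + c[t - 1] ** 2 + c[t] ** 2
        a3 = r[t - 1] * c[t - 1] - c[t - 1] ** 2
        A[t - 1, t - 1] += a2                 # coefficient of x_t
        if t >= 2:
            A[t - 1, t - 2] += a3             # coefficient of x_{t-1}
        if t < n:
            A[t - 1, t] += a1                 # coefficient of x_{t+1}
        else:
            A[t - 1, t - 1] += a1             # x_{T-1} := x_{T-2} in the last equation
        b[t - 1] = (c[t - 1] ** 2 - r[t - 1] * c[t - 1]) * d[t - 1] - c[t] ** 2 * d[t]
    x = np.linalg.solve(A, b)                 # raises LinAlgError if A is singular
    return np.append(x, x[-1])

# Hypothetical example: T = 5 prices and a BH reference profile.
print(solve_theorem1([100.0, 101.0, 100.5, 102.0, 103.0], np.ones(4)))
```

For longer episodes a banded solver would avoid forming the dense matrix; the dense form is kept here only for readability.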
Note that the $(T-2)$ equations are however in $(T-1)$ variables, hence the assignment $x_{T-1}^T := x_{T-2}^T$ was necessary. In the following, we establish a result regarding the second derivative $\frac{d^2 S^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})}{d(x_t^T)^2}$, and show that it is verifiably negative.
Theorem 2. If, for a solution $\{x_s^T\}_{s=1}^{T-1}$ obtained from Theorem 1, the corresponding $S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\}) > 0$, then $\{x_s^T\}_{s=1}^{T-1}$ is the maximum.
Proof: Applying shorthand $S^T$, and using the quotient rule on (5) above:

$$\begin{aligned}
\frac{d^2 S^T}{d(x_t^T)^2} &= \frac{-(S^T)^3}{(T-1)(A^T)^2} \left( \left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2 + R_t^T \frac{d^2 R_t^T}{d(x_t^T)^2} + R_{t+1}^T \frac{d^2 R_{t+1}^T}{d(x_t^T)^2} \right) \\
&\quad + \left\{ R_t^T \frac{dR_t^T}{dx_t^T} + R_{t+1}^T \frac{dR_{t+1}^T}{dx_t^T} \right\} \cdot \frac{-3(S^T)^2 \frac{dS^T}{dx_t^T} (A^T)^2 + (S^T)^3\, 2A^T \frac{dA^T}{dx_t^T}}{(T-1)(A^T)^4} \\
&= \frac{-(S^T)^3}{(T-1)(A^T)^2} \left( \left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2 \right).
\end{aligned} \tag{6}$$

The second equality is obtained from (5), which equates to 0, as well as $\frac{d^2 R_t^T}{d(x_t^T)^2} = 0$ and $\frac{d^2 R_{t+1}^T}{d(x_t^T)^2} = 0$, due to constants inside the terms $\frac{dR_t^T}{dx_t^T}$ and $\frac{dR_{t+1}^T}{dx_t^T}$. In this second equality, the term $\left(\left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2\right) > 0$, hence if $S^T \equiv S_b^T(\{x_s^T\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}, \{x_s^{T-1}\}_{s=1}^{T-1}) > 0$, then $\frac{-(S^T)^3}{(T-1)(A^T)^2} < 0$, which is the second-order condition for the maximum.
Notice that the reference sequence of positions $\{x_s^{T-1}\}_{s=1}^{T-1}$ had $(T-1)$ position values, and thus there must be an additional position in $\{x_s^T\}$, that of $x_T^T$. This position $x_T^T$ is purely a position in the internally maintained sequence $\{x_s^T\}_{s=1}^{T}$. The aggregate purchase at index $T$, $P^T = \sum_{t=1}^{T-1} P_t^T$, is made at price $\rho_T$, thus

$$P^T = (x_1^T - x_1^{T-1}) + \sum_{t=2}^{T-1} \left( (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}) \right) = x_{T-1}^T - x_{T-1}^{T-1}. \tag{7}$$

The sign of $P^T$ would determine if it is a buy or a sell, respectively. Thus we have that:

$$x_T^T = x_{T-1}^T + P^T, \quad \text{therefore,} \quad x_T^T = 2x_{T-1}^T - x_{T-1}^{T-1}.$$

In Theorem 1, we chose that $x_{T-1}^T = x_{T-2}^T$, which additionally ensures that the first component $(x_{T-1}^T - x_{T-2}^T)$ of $P_{T-1}^T = (x_{T-1}^T - x_{T-2}^T) - (x_{T-1}^{T-1} - x_{T-2}^{T-1})$ is 0 (i.e. of itself $P_{T-1}^T$ requires no purchase to be made). At the same time, the constraint of $x_s^T \in [0, 1]$, $1 \leq s \leq T$ has to be respected, hence the projection, or truncation, of the solution from Theorem 1:

$$x_s^T \in [0, 1], \; 1 \leq s \leq T-3, \qquad x_{T-2}^T \in \left[ \frac{x_{T-1}^{T-1}}{2}, \frac{1 + x_{T-1}^{T-1}}{2} \right]. \tag{8}$$
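The telescoping in (7) and the truncation in (8) can be illustrated with a short Python sketch, assuming NumPy. The function names are hypothetical, and the untruncated input vector is an illustrative value rather than output reported in the paper.

```python
import numpy as np

def per_index_purchases(x_new, x_ref):
    """P_t^T of (2), t = 1..T-1."""
    return (np.diff(np.asarray(x_new, dtype=float), prepend=0.0)
            - np.diff(np.asarray(x_ref, dtype=float), prepend=0.0))

def aggregate_purchase(x_new, x_ref):
    """P^T of (7): the per-index purchases telescope to x_{T-1}^T - x_{T-1}^{T-1}."""
    return float(x_new[-1] - x_ref[-1])

def truncate(x_new, x_ref):
    """Projection (8), keeping x_{T-1}^T pegged to x_{T-2}^T as in Theorem 1."""
    x = np.clip(np.asarray(x_new, dtype=float), 0.0, 1.0)   # x_s^T in [0, 1]
    lo, hi = x_ref[-1] / 2.0, (1.0 + x_ref[-1]) / 2.0       # limits for x_{T-2}^T
    x[-2] = np.clip(x[-2], lo, hi)
    x[-1] = x[-2]
    return x

x_ref = np.ones(4)                                   # hypothetical BH reference profile
x = truncate([1.125, 0.6, 0.75, 0.75], x_ref)        # hypothetical untruncated solution
p_total = aggregate_purchase(x, x_ref)               # negative value means a sale
x_T = x[-1] + p_total                                # equals 2 x_{T-1}^T - x_{T-1}^{T-1}
print(x, p_total, x_T)
```

The limits on $x_{T-2}^T$ are exactly those that keep $x_T^T = 2x_{T-1}^T - x_{T-1}^{T-1}$ inside $[0, 1]$ once $x_{T-1}^T$ is pegged to $x_{T-2}^T$.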
The level of control in this system is not the best possible, since $x_{T-1}^T$ has to be forcefully pegged to $x_{T-2}^T$, when there may be a high difference between asset prices $\rho_{T-1}$ and $\rho_T$. There is thus a situation of pegging $x_{T-1}^T$ to $x_{T-2}^T$, which itself is bound by constraint (8). It would be preferable to equate $x_{T-1}^T$ to an independent value designated by SR-BLITS, and this is indeed the case when transaction cost is not 0. Thus, we discuss next the situation when transaction costs are non-trivial.
Also note that, in sum, a $(T-2)$ vector $x_s^T$ is obtained to maximize the function $S_b^T$ of (3) in the proposed SR-BLITS algorithm. It would be instructive to examine whether an equivalent change of the $x_s^T$ vector helps in maximizing the numerator $A^T$ of $S_b^T$, cf. (1). In particular, $A^T$ being the average backward-looking return, such an optimization would boost returns directly and not merely the returns-to-risk ratio that SR $S_b^T$ represents. However, Proposition 1 above indicates $\frac{dA^T}{dx_s^T} = 0$, $1 \leq s \leq T-1$, and hence any change in $x_s^T$ does not change returns.

4 SR-BLITS with transaction cost
Let us consider transaction costs for the purchase $P^T$ at each $T$, which we express as $\delta\rho_T |P^T|$ where $\delta$ is the rate of commission (e.g. 10 basis points or BPs would mean $\delta = 0.001$). For now, we modify only $R_{T-1}^T$ from (2) such that:

$$R_{T-1}^T = -r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) - \delta\rho_T \left|x_{T-1}^T - x_{T-1}^{T-1}\right|.$$

Thus, the return element $R_{T-1}^T$ asymmetrically takes the burden of the transaction cost on the entire $P^T$ purchase (or sale). However, in a departure from $\frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$ in the proof of Proposition 1 we now have:

$$\frac{dR_{T-1}^T}{dx_{T-1}^T} = -\delta\rho_T \cdot \mathrm{sign}(x_{T-1}^T - x_{T-1}^{T-1}),$$

with substitution in a sufficient condition derived from Proposition 1: $R_{T-1}^T \frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$. Thus we solve for $x_{T-1}^T$ and $x_{T-2}^T$ by constructing an equation:

$$\left( r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta\rho_T \left|x_{T-1}^T - x_{T-1}^{T-1}\right| \right) \cdot \delta\rho_T \cdot \mathrm{sign}(x_{T-1}^T - x_{T-1}^{T-1}) = 0.$$

It would however be ideal to keep the equation linear in the two variables $x_{T-2}^T$ and $x_{T-1}^T$ (adding to the $(T-2)$ linear equations from Theorem 1), and thus we consider two situations. In the first, eventually we may have a solution $x_{T-1}^T$ such that $x_{T-1}^T - x_{T-1}^{T-1} \geq 0$, and thus solve for $x_{T-1}^T$, $x_{T-2}^T$ in the below:

$$\delta\rho_T\, r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta^2\rho_T^2 \,(x_{T-1}^T - x_{T-1}^{T-1}) = 0.$$

In the second case, if $x_{T-1}^T - x_{T-1}^{T-1} < 0$ then this gives rise to:

$$-\delta\rho_T\, r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta^2\rho_T^2 \,(x_{T-1}^T - x_{T-1}^{T-1}) = 0.$$

Consequently, with the $T-2$ equations from Theorem 1, this variation in the $(T-1)$-th equation results in 2 matrix inversions of size $(T-1) \times (T-1)$ to produce two candidate solutions $\{x_s^{T,1}\}_{s=1}^{T-1}$ (resp. $\{x_s^{T,2}\}_{s=1}^{T-1}$), which are then processed as follows:
i. choose a variant based on whether the assumption $(x_{T-1}^T - x_{T-1}^{T-1}) \geq 0$ (resp. $(x_{T-1}^T - x_{T-1}^{T-1}) < 0$) is satisfied,
ii. check for this variant whether the requirement on $S_b^T$ in Theorem 2 is satisfied, and finally
iii. truncate to similar limits as in (8), i.e. $x_{T-1}^T \in \left[ \frac{x_{T-1}^{T-1}}{2}, \frac{1 + x_{T-1}^{T-1}}{2} \right]$.
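Steps i to iii above, together with the cost term $\delta\rho_T|P^T|$, can be sketched in plain Python as below. This is a sketch under stated assumptions: the two candidate vectors are taken as already computed, `sb_of` is a hypothetical callable returning $S_b^T$ for a candidate, and trying the first case before the second when both sign assumptions hold is a choice made here, since the text does not prescribe a tie-break.

```python
def transaction_cost(delta, price, purchase):
    """delta * rho_T * |P^T|, e.g. delta = 0.001 for 10 basis points."""
    return delta * price * abs(purchase)

def choose_variant(cand_case1, cand_case2, x_ref_last):
    """Step i: keep the candidate whose sign assumption on x_{T-1}^T - x_{T-1}^{T-1} holds."""
    if cand_case1[-1] - x_ref_last >= 0:
        return cand_case1
    if cand_case2[-1] - x_ref_last < 0:
        return cand_case2
    return None   # neither assumption is satisfied

def post_process(cand_case1, cand_case2, x_ref_last, sb_of):
    """Steps i-iii; returns the truncated positions, or None if no candidate qualifies."""
    x = choose_variant(cand_case1, cand_case2, x_ref_last)
    if x is None or sb_of(x) <= 0:          # step ii: Theorem 2 requires S_b^T > 0
        return None
    x = [min(max(v, 0.0), 1.0) for v in x]  # step iii: positions kept in [0, 1]
    lo, hi = x_ref_last / 2.0, (1.0 + x_ref_last) / 2.0
    x[-1] = min(max(x[-1], lo), hi)         # limits on x_{T-1}^T
    return x

# Hypothetical numbers: 10 BPs commission on selling 0.2 of the position at price 100.
print(transaction_cost(0.001, 100.0, -0.2))
```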
A discussion of how to make the transaction cost’s distribution fairer is due here. We could apportion the transaction cost into each $R_t^T$ as follows:

$$D_t^T \triangleq \delta\rho_T \left( |P^T| - \Big| \sum_{s=1, s \neq t}^{T-1} P_s^T \Big| \right),$$

being the additional transaction cost that position $x_t^T$ has caused to be incurred (it may even be negative and therefore a credit). This component of the modified returns sequence $\bar{R}_t^T = R_t^T + D_t^T$ behaves as follows:

$$\begin{aligned}
D_t^T &:= \delta\rho_T \left( |x_{T-1}^T - x_{T-1}^{T-1}| - |x_{T-1}^T - x_{T-1}^{T-1} - P_t^T| \right) \\
&= \delta\rho_T \left( |x_{T-1}^T - x_{T-1}^{T-1}| - \left|x_{T-1}^T - x_{T-1}^{T-1} - \left((x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1})\right)\right| \right), \\
\frac{dD_t^T}{dx_t^T} &= \delta\rho_T \cdot \mathrm{sign}\left(x_{T-1}^T - x_{T-1}^{T-1} - \left((x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1})\right)\right),
\end{aligned}$$

for which we may consider two cases as earlier. But such a situation would hold for each $t$, resulting in an exponential number of matrices to be inverted. For example, distributing transaction cost over the last 3 steps $T-1$, $T-2$ and $T-3$ would result in $2 \times 2 \times 2 = 8$ matrix inversions.
A similar problem would arise if the Downside Deviation Ratio, a returns-sensitive substitute for the SR (cf. (Moody & Saffell, 2001, §II.E)), were to be employed. Also note the presence of variable $x_{T-1}^T$ in each expression of $\frac{dR_t^T}{dx_t^T}$. Eventual simplification (or any other insights contributing to low computational complexity) in this method would enable a suitable, fair, transaction cost arrangement that would have solved this temporal credit assignment problem.

5 Numerical Experiments
We first undertook an experiment with Geometric Brownian Motion (GBM) simulations of a security’s price with varying, though typical, parameters. A GBM simulation has volatility in the price trajectory of the asset (even if it is of i.i.d. origin) and thus the aim was to see if a trading operation based on SR-BLITS would result in better effective SR. With parameters $M = 30$ and $B = 10$ for the algorithm, we simulated 2000 episodes of GBM with randomly chosen growth-rate $\mu$ and variability $\sigma$ values. These $\mu$ and $\sigma$ were drawn from respective intervals $[0.05, 0.15]$, representing a growth-rate of between 5 and 15 per cent, and $[0.1, 0.3]$ as volatility. The $\rho_T$ values thus correspond to simulated daily prices for approximately a one month period. The experiment pertained to the variant proposed in Section 4, with transaction cost pegged at 3 basis points (BPs), i.e. $\delta = 0.0003$. Note that transaction cost of 2 BPs in the Foreign Exchange market is not uncommon, e.g. (Dempster & Leemans, 2007). All GBM trajectories began with $\rho_1 = 100.0$, and the trading strategy began calculating purchases $P^T$ (as also the effective position $x_T$) only for $T \geq B + 1$ with $x_T = 1.0$ for $T \leq B$. An SR-maximising purchase $P^T$ was engaged in, only if the SRs satisfied

$$S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\}) \geq 2 \times S_a^{T-1}(\{x_s^{T-1}\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}),$$

alongside a modification if the RHS term is negative, where 2 is replaced by $\frac{1}{2}$. A summary of the results is in Table 1.

Table 1: SR-BLITS for GBM, M=30, B=10
Metric      Episode SR   Cash-Flow
SR-BLITS    0.037        0.517
BH          0.022        0.603

The average SR for each episode shows a roughly 70 percent improvement (0.037 versus 0.022). The average cash-flow in an episode considers outflow to purchase (net of transaction costs) and inflow when a sale occurs, including liquidating the position at the end of an episode. The mean values of both metrics are justified for reporting here vide the p-value 0.05 threshold for the one-sided $t$-test statistic. With $M = 60$ and $B = 20$, we found similar statistically-significant results (Table 2).

Table 2: SR-BLITS for GBM, M=60, B=20
Metric      Episode SR   Cash-Flow
SR-BLITS    0.035        1.377
BH          0.026        1.659

To explain these results, note that the SR may be favourable (indicating a better risk-return stance) but the cash-flow may well be worse (since it is a purely returns-related, i.e. $R_T$-related, quantity). Indeed, it is clarified in (Moody & Saffell, 2001, (17)) that SR applies a penalty to returns greater than a certain threshold, and this is contrary to typical notions of risk and reward. Since the index $T$ is increasing, tending towards the episode upper-index $M$, it is best to restrict $M$ to a suitably small number e.g. between 30 and 100. In works such as (Moody & Saffell, 2001) and (Molina, 2006), the simulated results employ a model of random walks with autoregressive trend processes.
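The GBM setup and the acceptance rule described above can be sketched in Python as follows, assuming NumPy. Two points are assumptions of this sketch rather than statements from the text: $\mu$ and $\sigma$ are treated as annualised with a daily step of `dt = 1/252`, and $\mu$, $\sigma$ are drawn uniformly from their intervals. The function names are hypothetical, and the code released with the paper (Abdulla, 2018) is hosted on MathWorks, so this Python version is an independent illustration only.

```python
import numpy as np

def simulate_gbm(rho1, mu, sigma, M, dt, rng):
    """One GBM episode of M prices starting at rho1, using the exact log-normal step."""
    z = rng.standard_normal(M - 1)
    log_steps = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return rho1 * np.exp(np.concatenate(([0.0], np.cumsum(log_steps))))

def accept_purchase(sb_new, sa_prev):
    """Engage in P^T only if S_b^T >= 2 * S_a^{T-1}; the factor is 1/2 when the RHS is negative."""
    factor = 2.0 if sa_prev >= 0 else 0.5
    return sb_new >= factor * sa_prev

rng = np.random.default_rng(0)
M, dt = 30, 1.0 / 252.0            # dt is an assumed daily step for annualised mu, sigma
mu = rng.uniform(0.05, 0.15)       # growth-rate drawn per episode (uniform draw assumed)
sigma = rng.uniform(0.1, 0.3)      # volatility drawn per episode (uniform draw assumed)
episode = simulate_gbm(100.0, mu, sigma, M, dt, rng)
print(episode[:3], accept_purchase(0.5, 0.2))
```

Repeating the draw and simulation 2000 times would reproduce the shape of the experiment, though not necessarily its reported numbers.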
For the NSE index called Nifty, we used a daily 360-tick record (one per minute of 6 trading hours), spread over 132 consecutive days between 2014 and 2015. As in GBM above, we kept $M = 30$, $B = 10$ and $\delta = 0.0003$ (3 BPs), to observe the following:

Table 3: SR-BLITS for NSE Nifty 2014-2015
Metric      Episode SR   Cash-Flow
SR-BLITS    0.004        -5.864
BH          0.001        -5.984

The NSE series had maximum and minimum values 9109.15 and 7965.25, respectively, making the above cash-flow values very small in comparison. The difference between cash-flows per-episode is not statistically significant at the $p = 0.05$ level (though Episode SR is). It is worth noting that the cash-flow for BH would be more favourable than the current, episodic, mean if buy-and-hold were to occur over the entire 132 day horizon. However, a buy-and-hold that lasts over multiple episodes is not a fair comparison with SR-BLITS, which is essentially a trading strategy.
For the NASDAQ series, we similarly used the 360-tick record, spread over 132 consecutive days in 2014-2015. Here, SR-BLITS gave a per-episode average SR of 0.014 vs. BH 0.013 (with no statistical significance at $p = 0.05$ level, either). Comparable outcome SRs in ‘QTrader’ of (Moody & Saffell, 1998), for example, are higher at 0.63 vs the correspondingly higher BH SR of 0.34. Note that this is with profits being reinvested while trading, although there is also a transaction cost of 50 BPs. The code used here is made available for open-source use via (Abdulla, 2018).

6 Conclusions
Optimization of a Sharpe-Ratio-based measure that compensates for past buy/sell decisions is described here in the form of the algorithm SR-BLITS. The simulation model that we adopt to test SR-BLITS, Geometric Brownian Motion (GBM), indicates a 50 percent improvement in the achieved true Sharpe Ratio of returns. In terms of simulation models of a security’s price: random walk, GARCH, or ARMA simulation, adopted by other publications, was however not adopted by us. Further experiments, with varied settings, could indicate the suitability of SR-BLITS to those models. The RRL algorithm in (Moody & Saffell, 2001) also considers a risk-free asset in a 2-asset portfolio, which we do not consider here. Another extension that appears feasible is to further adapt the Orthogonal Bandit Algorithm of changing portfolio weights in (Shen et al., 2015). It should be possible to choose a portfolio weight vector at each decision index $T$ based on not just the recommendation of the algorithm in (Shen et al., 2015), but also in terms of whether past losses are optimally corrected for, as in SR-BLITS.

References

Abdulla, M. S. (2018). SR-BLITS Code. https://in.mathworks.com/matlabcentral/fileexchange/66220-sr-blits. MathWorks.
Abdulla, M. S., & Bhatnagar, S. (2015). A Transitions-only algorithm for Compact Action Set Markov Decision Processes. In Proceedings of the Indian Control Conference ICC, IEEE, Chennai, 5-7 Jan.
Dempster, M., & Leemans, V. (2007). Design of an FX Trading System using Adaptive Reinforcement Learning. In 3rd Annual CARISMA Seminar, 26-27 June, London.
Gold, C. (2003). FX Trading Via Recurrent Reinforcement Learning. In IEEE International Conference on Computational Intelligence for Financial Engineering.
Gorse, D. (2011). Application of stochastic recurrent reinforcement learning to index trading. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.
Kwon, K.-Y., & Kish, R. J. (2002). A comparative study of technical trading strategies and return predictability: an extension of Brock, Lakonishok, and LeBaron (1992) using NYSE and NASDAQ indices. The Quarterly Review of Economics and Finance, 42, 611–631.
Li, J., & Chan, L. (2006). Reward Adjustment Reinforcement Learning for Risk-averse Asset Allocation. In International Joint Conference on Neural Networks (IJCNN), Vancouver, 16-21 Jul.
Molina, G. (2006). Stock Trading with Recurrent Reinforcement Learning (RRL). In CS 229 Application Project, Stanford University. http://cs229.stanford.edu/proj2006/Molina-StockTradingWithRecurrentReinforcementLearning.pdf.
Moody, J., & Saffell, M. (1998). Reinforcement Learning for Trading. In Neural Information Processing Systems (NIPS).
Moody, J., & Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Transactions on Neural Networks, 12(4).
Shen, W., Wang, J., Jiang, Y.-G., & Zha, H. (2015). Portfolio Choices with Orthogonal Bandit Learning. In 24th International Joint Conference on Artificial Intelligence (IJCAI).