SR-BLITS: Sharpe Ratio’s Backward-Looking Improvement as a Trading Strategy


Accepted Manuscript

Mohammed Shahid Abdulla

PII: S0970-3896(16)30188-4
DOI: https://doi.org/10.1016/j.iimb.2019.07.005
Reference: IIMB 338

To appear in: IIMB Management Review

Received date: 29 December 2016
Revised date: 30 January 2018
Accepted date: 15 July 2019

Please cite this article as: Mohammed Shahid Abdulla, SR-BLITS: Sharpe Ratio’s Backward-Looking Improvement as a Trading Strategy, IIMB Management Review (2019), doi: https://doi.org/10.1016/j.iimb.2019.07.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Abstract

A common and trivial strategy with respect to a single security or a tradeable asset is to simply buy-and-hold. In contrast, a trading strategy named SR-BLITS is proposed that takes a position based on buy and sell signals which are calculated at each decision index $T$. These signals are derived from the maximization of the Sharpe Ratio (SR), a measure of risk-adjusted returns, which is calculated using values of the past $(T-1)$ returns. At index $T$, a vector of ideal SR-maximising positions, for all indices $t < T$ thus far, is calculated, accounting for payments made to change the existing vector of positions. This purchase (or sale), made to effectively correct all past positions taken at indices $t < T$, is assumed to be performed at the current price of the security or asset. The computation for these signals involves solving at most 2 systems of linear equations at each $T$, and only 1 if transaction cost is not considered. However, the size of the matrix to be inverted increases with $T$, requiring the algorithm to be restricted to episodes of some size $M \geq T$. Numerical experiments are conducted on simulated Geometric Brownian Motion (GBM) series using a range of parameters, as well as on NSE and NASDAQ indices. With transaction costs considered, these reveal more than 30 percent improvement in average SR for an episode of trading in GBM and NSE, when compared to a buy-and-hold strategy.

1 Introduction and Problem Description

A trader in an asset attempts to optimize or maximize a suitable internal measure, such as profit, economic utility or risk-adjusted return. A trading system computes the optimum input variables into such a measure, and thus infers a trading decision that the trader executes in the market. In this paper, we propose the use of a novel metric called a backward-looking Sharpe Ratio (SR) and maximize this internal quantity. It is empirically observed that the trading decisions recommended by such a trading system (called SR-BLITS) result in positions that have a net SR favorable over the buy-and-hold (BH) strategy. SR-BLITS is currently applicable to trading systems that have been designed to trade in a single security or asset.

Consider a time-series of asset prices $\{\rho_1, \rho_2, \ldots, \rho_T, \rho_{T+1}, \ldots, \rho_M\}$ where $M$ is an episode upper-limit. We assume that the overall sequence of an asset's or security's price can be composed of $M$-length arrays in order to apply a trading strategy. At a time $T$, where $T \leq M$, after the price $\rho_T$ of the asset is observed, a $(T-1)$-sized returns sequence $\{r_t\}_{t=1}^{T-1}$ can be inferred where $r_t = \rho_{t+1} - \rho_t$, for $t < T$. Note that these returns are not normalized, since this application pertains to a single price series $\{\rho_t\}_{t=1}^{M}$. Consider $R_t^T := r_t$, since $r_t$ would be the return encountered in a BH strategy. In further discussion, we will use a different definition of returns sequence $\{R_t^T\}_{t=1}^{T-1}$. Thus,

$$
S^T \triangleq \frac{A^T}{\sqrt{B^T - (A^T)^2}}, \quad \text{where} \quad
A^T = \frac{\sum_{t=1}^{T-1} R_t^T}{T-1}, \quad \text{and} \quad
B^T = \frac{\sum_{t=1}^{T-1} (R_t^T)^2}{T-1}
\tag{1}
$$

is the Sharpe Ratio (SR) defined with these returns. Consider an alternate definition of SR that we shall use as short-hand later:

$$
S_a^T(\{x_t\}_{t=1}^{T-1}, \{\rho_t\}_{t=1}^{T}), \quad R_t^T = x_t(\rho_{t+1} - \rho_t) = x_t \cdot r_t.
$$

The $S^T$ defined in (1) can be rewritten as $S^T = S_a^T(\{1\}, \{\rho_t\}_{t=1}^{T})$, where $\{1\}$ is a $(T-1)$-size vector of 1s indicating a conventional BH strategy. The 1s indicate that the asset has been bought at $t = 1$ at corresponding price $\rho_1$, held for each $1 < t \leq T$, and has not been sold at any intervening index. The vector of positions is s.t. $0 \leq x_t^T \leq 1$, with $x_t^T = 1.0$ implying that $\eta$ units of the asset are held at $t$, some of which may be purchased at price $\rho_t$ (depending on what the position $x_{t-1}^T$ was). Also, the position size $\eta$ is a positive integer large enough such that $x_t^T \cdot \eta$ is a positive integer, indicating the actual number of shares of the asset held. Assume that no short positions are taken at any $t$ and also that, temporarily, no transaction costs are due for any purchase or sale of the asset made at index $T$. The latter assumption will be relaxed later in §4.
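As an illustration of definition (1) and the short-hand $S_a^T$, the following is a minimal sketch in Python. The function names `sharpe_ratio` and `s_a` and the five-point price series are hypothetical and chosen only for this example; they are not taken from the author's released code.

```python
import math

def sharpe_ratio(returns):
    """S^T = A / sqrt(B - A^2), with A the mean return and B the mean squared return."""
    n = len(returns)
    a = sum(returns) / n
    b = sum(r * r for r in returns) / n
    return a / math.sqrt(b - a * a)

def s_a(positions, prices):
    """S_a^T: SR of the returns R_t = x_t * (rho_{t+1} - rho_t), t = 1..T-1."""
    returns = [x * (prices[t + 1] - prices[t]) for t, x in enumerate(positions)]
    return sharpe_ratio(returns)

# Hypothetical prices rho_1..rho_T with T = 5 (illustrative values only).
prices = [100.0, 101.0, 103.0, 102.0, 105.0]

# Buy-and-hold corresponds to a (T-1)-vector of 1s.
bh = s_a([1.0] * (len(prices) - 1), prices)

# Holding half the position throughout scales every return equally,
# so the ratio is unchanged.
half = s_a([0.5] * (len(prices) - 1), prices)
```

Since the returns are unnormalized price differences, a uniform scaling of the position vector leaves $S_a^T$ unchanged; only changes in the shape of the position vector affect it.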

An insight used in this work is a correction for past positions, provided that such corrections are made at current price $\rho_T$. Suppose that against causality, it is actually possible to 'correct' position $x_{T-k}$ to a more suitable $x'_{T-k}$, the latter prescribed by some optimization algorithm. We must thus request another asset-holding entity, one who exclusively engages in BH, to transfer $(x'_{T-k} - x_{T-k})$ nos. of the asset that she held at index $(T-k)$. We pay her the amount $(x'_{T-k} - x_{T-k}) \cdot \rho_T$ so that she can make good the $(x'_{T-k} - x_{T-k})$ nos. transferred from her holding to us. Our return at $(T-k)$ has to be adjusted to incorporate the amount paid, thus

$$
R_{T-k} := r_{T-k} \cdot x'_{T-k} - \rho_T \cdot (x'_{T-k} - x_{T-k}).
$$

However, this return does not accrue to us in any real sense: it is only an element of the $T$-length time-series for which we calculate the internal or backward-looking SR. The finding from simulating SR-BLITS is that the decision $x^T$ of (4), a purchase or sale made at each $T$ by the above BH-only entity, is empirically observed to have good SR. This outcome occurs because these purchase/sale actions are made against optimization of the internal SR, which proposes a vector of new positions like $x'_{T-k}$. Note also that the shares purchased from the BH-only entity for index $(T-k)$ are not exactly $(x'_{T-k} - x_{T-k})$ in number, since this is only an illustration. The number is discounted by quantities that depend on $x'_{T-k-1}$ and $x_{T-k-1}$, as explained in (2) below.

2 Review of Literature

Technical trading strategies based on price, momentum and volume exist and are regularly reviewed, e.g. (Kwon & Kish, 2002). There is some analogous work in the literature which also optimizes SR before taking a trading decision. Systems that optimize trading-related SR in a single security, using a specific machine-learning technique named Recurrent Reinforcement Learning (RRL), were considered in (Moody & Saffell, 1998). In particular, the strategy in (Moody & Saffell, 1998) was able to outperform BH on the S&P 500 index, considered over 25 years, with each month treated as time $t$. A shortcoming in that work was the 'constant magnitude' assumption, whereby the position in the security is one of $\{-1, 0, 1\}$, with $-1$ indicating being $\eta$ shares 'short' in the security. In contrast, the algorithm proposed here assumes the portfolio $x_t^T \cdot \eta$ of the tradeable asset takes values in $\{0, 1, 2, \ldots, \eta\}$, for an $\eta \gg 1$ at each instant $t$. This gives a larger number of discrete actions to choose from at each instant, and also enables better risk control as stated in. Techniques from Reinforcement Learning (RL) and optimal control have been applied to the single-asset trading problem, e.g. (Li & Chan, 2006). Hence, one may also draw justification for this situation using a common observation in RL. This observation is that compact action set control problems have better optimal policies than similar problems with discrete action sets, e.g. (Abdulla & Bhatnagar, 2015). Thus, control in terms of constant magnitude positions like $\{-1, 0, +1\}$ is less preferable to control via actions in the interval $[-1, 1]$.

The model in (Moody & Saffell, 1998) assumes that a series of external variables $\{y_1, y_2, \ldots, y_{T-1}, y_T\}$ is also available to the trader during optimization. There are 84 different input series to calculate a single trading decision, which in turn is based on the maximization of a non-linear function using the stochastic gradient ascent algorithm. The above work uses multi-input neural networks, while we do not use any variables beyond the two series of asset prices $\{\rho_s\}$ and internal positions $\{x_s^{T-1}\}$ in this work. On the one hand, this keeps the computational load lighter, as and when the proposed algorithm is ported to high-frequency trading in a highly liquid asset. On the other, the internal measure in our case is optimized in closed-form, and since it is an SR it accommodates only returns-related terms. To further compare with the model used in (Moody & Saffell, 1998), one may assume that returns from the risk-free asset $r_t^f$ are 0 for each $t$ in our case, due to trading over small-length episodes of size $M$. An algorithm that chooses a portfolio by estimating the highest SR can also be seen in more recent work such as (Shen, Wang, Jiang, & Zha, 2015).

Using a variation of the RRL was the aim in (Gorse, 2011), which also considered daily buy-sell signals to evaluate the modified algorithm's efficacy over simple BH. However, in (Gorse, 2011), the algorithm considers maximisation of returns alone, and does not consider an objective based on risk-adjusted returns like the SR. An easy-to-follow student project based on (Moody & Saffell, 1998), which also doesn't consider risk-free assets explicitly (but optimizes the SR), is to be found at (Molina, 2006). This project employs past returns $r_t$ observed on a security as the inputs to a learning technique, but applies only one decision $x_T^T$ out of an entire vector $\{x_t^T\}_{t=1}^{T}$ of learned output decisions. This is natural in the sense that the remaining decisions $\{x_t^T\}_{t=1}^{T-1}$ pertain to the past, where decisions $\{x_1^1, x_2^2, \ldots, x_{T-1}^{T-1}\}$ have already been taken by the algorithm. The optimization of the SR in (Molina, 2006) relies on output variables $\{x_t^T\}_{t=1}^{T}$ being interdependent. Hence, applying only the one such output $x_T^T$, while neglecting $\{x_t^T\}_{t=1}^{T-1}$, seems insufficient.

A similar SR-related maximization is attempted, but with respect to Foreign Exchange prices, in (Gold, 2003). After noticing that the performance of the algorithm in (Gold, 2003) has become poorer in Forex markets, modifications were suggested in (Dempster & Leemans, 2007). Also to be noted is that the above references employ Machine-Learning techniques or compare with algorithms that do (Reinforcement Learning, Genetic Algorithms) where differing types of inputs are required. In our case, we employ a different tack whereby we directly infer a sequence of positions $\{x_t^T\}_{t=1}^{T}$ that maximises the backward-looking SR, and then make a purchase at the current price to justify changes in the reference sequence of positions, termed $\{x_t^{T-1}\}_{t=1}^{T-1}$. The notation of the problem, and a specific explanation of the method SR-BLITS, follows:

3 SR-BLITS without transaction costs

At decision index $T$ there are earlier 'tics' or indices $1 \leq t \leq T$, for which we wish to design an internal sequence of positions $\{x_s^T\}_{s=1}^{T}$, such that a suitable internal SR is maximised. To describe this internal SR proposed above: assume a reference sequence of positions in the asset, calculated at earlier decision index $T-1$, as $\{x_s^{T-1}\}_{s=1}^{T-1}$. Now define a new returns sequence for $1 \leq t \leq T-1$ as below:

$$
\begin{aligned}
R_t^T &= r_t x_t^T - (\rho_T - \rho_t) P_t^T, \quad \text{where} \\
P_t^T &= (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}),
\end{aligned}
\tag{2}
$$

in which $P_t^T$ indicates the purchase to be made at the $T$-th index to correct the earlier internal position $x_t^{T-1}$ for the $t$-th index onto its new, adjusted, value $x_t^T$. This adjustment is done since it is attractive due to the asset being currently available at $\rho_T$, even though the return at the $t$-th index will be adjusted for the difference between $\rho_{t+1}$ and $\rho_t$. The form of $P_t^T$ is explained next: $P_t^T$ is not just assigned the difference $x_t^T - x_t^{T-1}$ between the two positions $x_t^T$ and $x_t^{T-1}$ from decision arrays $\{x_s^T\}$ and $\{x_s^{T-1}\}$, respectively. Instead, it is $(x_t^T - x_{t-1}^T)$ (i.e. the extra purchase needed at $t$ vis-a-vis the already corrected position $x_{t-1}^T$ at $t-1$), further discounted by the previous profile's additional purchase for epoch $t$, i.e. $(x_t^{T-1} - x_{t-1}^{T-1})$. The only exception to this description is when $t = 1$, where the purchase needed is simply $P_1^T = x_1^T - x_1^{T-1}$. For such cases, the alternate definition of SR that we shall use as short-hand has three parameters:

$$
S_b^T(\{x_s^T\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}, \{x_s^{T-1}\}_{s=1}^{T-1}),
\tag{3}
$$

where a new set of $T-1$ positions $x_s^T$ are calculated with reference to $T$ asset prices and the earlier sequence of positions $x_s^{T-1}$.
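The returns sequence of (2) and the short-hand $S_b^T$ of (3) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `purchases`, `adjusted_returns` and `s_b`, the 0-based list layout, and the sample prices are hypothetical and not part of the original algorithm description.

```python
import math

def purchases(x_new, x_ref):
    """P_t^T of (2) for t = 1..T-1; both lists have length T-1 (0-based)."""
    p = [x_new[0] - x_ref[0]]                      # t = 1: P_1^T = x_1^T - x_1^{T-1}
    for t in range(1, len(x_new)):
        p.append((x_new[t] - x_new[t - 1]) - (x_ref[t] - x_ref[t - 1]))
    return p

def adjusted_returns(x_new, prices, x_ref):
    """R_t^T = r_t * x_t^T - (rho_T - rho_t) * P_t^T, with r_t = rho_{t+1} - rho_t."""
    rho_T = prices[-1]
    p = purchases(x_new, x_ref)
    return [
        (prices[t + 1] - prices[t]) * x_new[t] - (rho_T - prices[t]) * p[t]
        for t in range(len(x_new))
    ]

def s_b(x_new, prices, x_ref):
    """S_b^T: the ratio A / sqrt(B - A^2) of (1) applied to the adjusted returns."""
    rets = adjusted_returns(x_new, prices, x_ref)
    n = len(rets)
    a = sum(rets) / n
    b = sum(r * r for r in rets) / n
    return a / math.sqrt(b - a * a)

# Hypothetical prices (T = 5) and a buy-and-hold reference profile.
prices = [100.0, 101.0, 103.0, 102.0, 105.0]
x_ref = [1.0, 1.0, 1.0, 1.0]

# With no correction (new profile equal to the reference) every P_t^T is zero
# and S_b^T coincides with the buy-and-hold S_a^T.
unchanged = s_b(x_ref, prices, x_ref)

# The per-index purchases telescope to x_{T-1}^T - x_{T-1}^{T-1}.
p = purchases([0.8, 0.6, 0.9, 0.7], x_ref)
```

In this layout `x_new[t]` holds $x_{t+1}^T$ and `prices[t]` holds $\rho_{t+1}$, so the special case $t = 1$ of the text corresponds to list index 0.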

Neither $S_a^T$ nor $S_b^T$ are true SRs: the purchase $P^T$ will dictate the true, effective, position $x^T$ as follows:

$$
x^T := x^{T-1} + P^T, \quad B + 1 \leq T \leq M.
\tag{4}
$$

Also note that a positive integer $B \ll M$ exists, which is a bootstrap index. Thus, even upon discovery of price $\rho_B$ at index $B$, the position in the asset is regular buy-and-hold (BH), and buy-sell signals are calculated only for later indices $T > B$. This is done so that the metric $S_b^T$ for optimization, from which the buy-sell signals $x^T$ are inferred, can become stable and not be overly sensitive to minor changes in the internal positions that SR-BLITS recommends. The default position in the asset during the bootstrap period will be BH, i.e. $x^T = 1.0$, $T \leq B$, in order for the SR $S_a^{B+1}(\{1\}, \{\rho_s\}_{s=1}^{B+1})$ to have stabilized. Thus, the $\{x_s^B\}_{s=1}^{B}$ profile that will be used as reference for computing $\{x_s^{B+1}\}_{s=1}^{B+1}$ would simply be $x_s^B = 1.0$, $\forall s \leq B$.

The rationale is that SR-maximising purchases $P^T$ at each $T$ will also bring about an enhanced SR of the effective positions $x^T$ that result from this strategy. The final comparison, therefore, is between the SR for positions $\{x^T\}_{T=1}^{M}$ calculated by SR-BLITS and the BH position $\{1\}_{T=1}^{M}$. Note that other measures of the buy/sell suitability of current asset price $\rho_T$ vis-a-vis earlier prices exist, e.g. the moving average indicator $\frac{1}{T-1}\sum_{s=1}^{T-1}\rho_s$. The moving average indicator is however so generic as to not be path-dependent, the latter property being crucial to an algorithmic trading system (Moody & Saffell, 1998).

As part of preliminary results, a property of $A^T$ used above in (1) will be employed in later results:

Proposition 1. For $1 \leq t \leq T-1$, $\frac{dA^T}{dx_t^T} = 0$.

Proof: Recall that $A^T = \frac{1}{T-1}\sum_{t=1}^{T-1} R_t^T$. Now note that $\frac{dR_t^T}{dx_t^T} = r_t - (\rho_T - \rho_t)$ and $\frac{dR_{t+1}^T}{dx_t^T} = (\rho_T - \rho_{t+1})$, so we have that $\frac{dR_t^T}{dx_t^T} + \frac{dR_{t+1}^T}{dx_t^T} = 0$, since $r_t = \rho_{t+1} - \rho_t$. The result thus holds true for $x_t^T$ with $1 \leq t \leq T-2$, since there is no participation in any $R_s^T$ terms within $A^T$ other than $R_t^T$ and $R_{t+1}^T$. Further, since $x_{T-1}^T$ is a variable only in term $R_{T-1}^T$, we have that $R_{T-1}^T = -(\rho_T - \rho_{T-1})(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1}))$ due to $r_{T-1} = (\rho_T - \rho_{T-1})$, and thus $\frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$. ∎
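Proposition 1 can be checked numerically. The sketch below, in Python, evaluates the mean $A^T$ of the adjusted returns of (2) for several randomly drawn position vectors and a fixed reference profile; the function name `mean_adjusted_return`, the random seed and the sample prices are assumptions made only for this illustration.

```python
import random

def mean_adjusted_return(x_new, prices, x_ref):
    """A^T: the mean of R_t^T = r_t x_t^T - (rho_T - rho_t) P_t^T over t = 1..T-1."""
    rho_T = prices[-1]
    total = 0.0
    for t in range(len(x_new)):
        r_t = prices[t + 1] - prices[t]
        if t == 0:
            p_t = x_new[0] - x_ref[0]
        else:
            p_t = (x_new[t] - x_new[t - 1]) - (x_ref[t] - x_ref[t - 1])
        total += r_t * x_new[t] - (rho_T - prices[t]) * p_t
    return total / len(x_new)

random.seed(0)                                   # fixed seed, for repeatability only
prices = [100.0, 101.0, 103.0, 102.0, 105.0]     # hypothetical rho_1..rho_5
x_ref = [1.0, 1.0, 1.0, 1.0]                     # buy-and-hold reference profile

# A^T for five arbitrary candidate position vectors in [0, 1].
means = [
    mean_adjusted_return([random.random() for _ in x_ref], prices, x_ref)
    for _ in range(5)
]
```

Under these assumptions every entry of `means` agrees up to floating-point rounding, which is the content of the proposition: changing the positions reshapes the individual returns $R_t^T$ but not their average.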

Since $S_b^T$ has to be maximized over the $(T-1)$-sized vector $\{x_s^T\}$, we calculate an expression for $\frac{dS^T}{dx_t^T}$ for each position $x_t^T$, $t \leq T-1$. Though $x_T^T$ is part of $\{x_s^T\}_{s=1}^{T}$ by notation, it does not currently contribute to any component in $S_b^T$. However, it will contribute to the returns sequence in the next index $T+1$ (after price $\rho_{T+1}$ is discovered).

Thus we have a sufficient condition to characterize the maximizing vector $\{x_s^T\}_{s=1}^{T-2}$.

Consider these definitions:

$$
\begin{aligned}
a_t^{T,1} &= r_{t+1}(\rho_T - \rho_{t+1}) - (\rho_T - \rho_{t+1})^2, \\
a_t^{T,2} &= r_t^2 - 2 r_t(\rho_T - \rho_t) + (\rho_T - \rho_t)^2 + (\rho_T - \rho_{t+1})^2, \\
a_t^{T,3} &= r_t(\rho_T - \rho_t) - (\rho_T - \rho_t)^2, \quad 1 \leq t \leq T-2, \\
b_t^T &= ((\rho_T - \rho_t)^2 - r_t(\rho_T - \rho_t))(x_t^{T-1} - x_{t-1}^{T-1}) - (\rho_T - \rho_{t+1})^2 (x_{t+1}^{T-1} - x_t^{T-1}), \quad \forall t \text{ s.t. } 2 \leq t \leq T-2, \\
b_1^T &= ((\rho_T - \rho_1)^2 - r_1(\rho_T - \rho_1))\, x_1^{T-1} - (\rho_T - \rho_2)^2 (x_2^{T-1} - x_1^{T-1}).
\end{aligned}
$$

Theorem 1. Assuming a non-singular matrix of coefficients and $x_{T-1}^T := x_{T-2}^T$, the system of $T-2$ linear equations,

$$
\begin{aligned}
a_1^{T,1} x_2^T + a_1^{T,2} x_1^T &= b_1^T, \\
a_t^{T,1} x_{t+1}^T + a_t^{T,2} x_t^T + a_t^{T,3} x_{t-1}^T &= b_t^T, \quad \forall t \text{ s.t. } 2 \leq t \leq T-2,
\end{aligned}
$$

has a solution which is a candidate maximum (stationary point) for $S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})$.

Proof: Using the quotient rule of differentiation, and Proposition 1 with regard to $A^T$, we have for $2 \leq t \leq T-2$:

$$
\frac{dS_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})}{dx_t^T} = \frac{-A^T}{(T-1)(B^T - (A^T)^2)^{3/2}} \left\{ R_t^T \frac{dR_t^T}{dx_t^T} + R_{t+1}^T \frac{dR_{t+1}^T}{dx_t^T} \right\},
\tag{5}
$$

equate the RHS to 0, and substitute for $R_t^T$, $R_{t+1}^T$, $\frac{dR_t^T}{dx_t^T}$ and $\frac{dR_{t+1}^T}{dx_t^T}$ from (2) and the proof of Proposition 1, respectively. Appropriate substitution for $t = 1$ also yields the first equation, $a_1^{T,1} x_2^T + a_1^{T,2} x_1^T = b_1^T$. ∎
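The linear system of Theorem 1 is tridiagonal, so one way to solve it is sketched below in Python. The helper names `build_system`, `solve_tridiagonal` and `first_order_residuals`, the use of the Thomas algorithm (which assumes no zero pivot is met), and the seven-point price series are assumptions of this illustration rather than a description of the author's released implementation.

```python
def build_system(prices, x_ref):
    """Coefficients of the T-2 equations of Theorem 1 (list index i = t-1)."""
    rho_T = prices[-1]
    n = len(prices) - 2                       # unknowns x_1^T .. x_{T-2}^T
    lower, diag, upper, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        r_t = prices[i + 1] - prices[i]
        r_next = prices[i + 2] - prices[i + 1]
        c_t = rho_T - prices[i]
        c_next = rho_T - prices[i + 1]
        a1 = r_next * c_next - c_next ** 2                       # multiplies x_{t+1}
        a2 = r_t ** 2 - 2 * r_t * c_t + c_t ** 2 + c_next ** 2   # multiplies x_t
        a3 = r_t * c_t - c_t ** 2                                # multiplies x_{t-1}
        prev_ref = x_ref[i - 1] if i > 0 else 0.0                # b_1 uses x_1^{T-1} alone
        rhs[i] = ((c_t ** 2 - r_t * c_t) * (x_ref[i] - prev_ref)
                  - c_next ** 2 * (x_ref[i + 1] - x_ref[i]))
        diag[i] = a2
        if i > 0:
            lower[i] = a3
        if i < n - 1:
            upper[i] = a1
        else:
            diag[i] += a1          # x_{T-1}^T := x_{T-2}^T folds a1 into the diagonal
    return lower, diag, upper, rhs

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: forward elimination, then back substitution."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def first_order_residuals(x, prices, x_ref):
    """Bracketed term of (5) for t = 1..T-2, given x_1^T..x_{T-1}^T."""
    rho_T = prices[-1]
    rets = []
    for i in range(len(x)):
        p_t = x[i] - x_ref[i] if i == 0 else (x[i] - x[i - 1]) - (x_ref[i] - x_ref[i - 1])
        rets.append((prices[i + 1] - prices[i]) * x[i] - (rho_T - prices[i]) * p_t)
    out = []
    for i in range(len(x) - 1):
        d_own = (prices[i + 1] - prices[i]) - (rho_T - prices[i])   # dR_t/dx_t
        d_next = rho_T - prices[i + 1]                               # dR_{t+1}/dx_t
        out.append(rets[i] * d_own + rets[i + 1] * d_next)
    return out

prices = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0, 110.0]   # hypothetical, T = 7
x_ref = [1.0] * 6                                            # buy-and-hold reference
sol = solve_tridiagonal(*build_system(prices, x_ref))
residuals = first_order_residuals(sol + [sol[-1]], prices, x_ref)
```

In this toy example the solved positions make the adjusted returns nearly equal, so $B^T - (A^T)^2$ is close to zero and evaluating $S_b^T$ directly would be numerically delicate; the sketch therefore verifies the first-order condition (the bracketed term of (5)) rather than the ratio itself. The solution shown is the unprojected one, before any truncation to $[0, 1]$.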

Note that the $(T-2)$ equations however are in $(T-1)$ variables, hence the assignment $x_{T-1}^T := x_{T-2}^T$ was necessary. In the following, we establish a result regarding the second derivative $\frac{d^2 S^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\})}{d(x_t^T)^2}$, and show that it is verifiably negative.

Theorem 2. If, for a solution $\{x_s^T\}_{s=1}^{T-1}$ obtained from Theorem 1, the corresponding $S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\}) > 0$, then $\{x_s^T\}_{s=1}^{T-1}$ is the maximum.

Proof: Applying the shorthand $S^T$, so that $(B^T - (A^T)^2)^{-3/2} = (S^T)^3 / (A^T)^3$ and the factor multiplying the braces in (5) becomes $\frac{-(S^T)^3}{(T-1)(A^T)^2}$, and using the quotient rule on (5) above:

$$
\begin{aligned}
\frac{d^2 S^T}{d(x_t^T)^2}
&= \frac{-(S^T)^3}{(T-1)(A^T)^2} \left( \left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2 + R_t^T \frac{d^2 R_t^T}{d(x_t^T)^2} + R_{t+1}^T \frac{d^2 R_{t+1}^T}{d(x_t^T)^2} \right) \\
&\quad + \left\{ R_t^T \frac{dR_t^T}{dx_t^T} + R_{t+1}^T \frac{dR_{t+1}^T}{dx_t^T} \right\} \cdot \frac{-3 (S^T)^2 \frac{dS^T}{dx_t^T} (A^T)^2 + (S^T)^3 \, 2 A^T \frac{dA^T}{dx_t^T}}{(T-1)(A^T)^4} \\
&= \frac{-(S^T)^3}{(T-1)(A^T)^2} \left( \left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2 \right).
\end{aligned}
\tag{6}
$$

The second equality is obtained from (5), which equates to 0, as well as $\frac{d^2 R_t^T}{d(x_t^T)^2} = 0$ and $\frac{d^2 R_{t+1}^T}{d(x_t^T)^2} = 0$, due to constants inside the terms $\frac{dR_t^T}{dx_t^T}$ and $\frac{dR_{t+1}^T}{dx_t^T}$. In this second equality, the term $\left(\left(\frac{dR_t^T}{dx_t^T}\right)^2 + \left(\frac{dR_{t+1}^T}{dx_t^T}\right)^2\right) > 0$, hence if $S^T \equiv S_b^T(\{x_s^T\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}, \{x_s^{T-1}\}_{s=1}^{T-1}) > 0$, then $\frac{-(S^T)^3}{(T-1)(A^T)^2} < 0$, which is the second-order condition for the maximum. ∎



Notice that the reference sequence of positions $\{x_s^{T-1}\}_{s=1}^{T-1}$ had $(T-1)$ position values, and thus there must be an additional position in $\{x_s^T\}$, that of $x_T^T$. This position $x_T^T$ is purely a position in the internally maintained sequence $\{x_s^T\}_{s=1}^{T}$. The aggregate purchase at index $T$, $P^T = \sum_{t=1}^{T-1} P_t^T$, is made at price $\rho_T$, thus

$$
\begin{aligned}
P^T &= (x_1^T - x_1^{T-1}) + \sum_{t=2}^{T-1} \left( (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}) \right) \\
&= x_{T-1}^T - x_{T-1}^{T-1}.
\end{aligned}
\tag{7}
$$

The sign of $P^T$ would determine if it is a buy or a sell, respectively. Thus we have that:

$$
x_T^T = x_{T-1}^T + P^T, \quad \text{therefore,} \quad x_T^T = 2 x_{T-1}^T - x_{T-1}^{T-1}.
$$

In Theorem 1, we chose that $x_{T-1}^T = x_{T-2}^T$, which additionally ensures that the first component $(x_{T-1}^T - x_{T-2}^T)$ of $P_{T-1}^T = (x_{T-1}^T - x_{T-2}^T) - (x_{T-1}^{T-1} - x_{T-2}^{T-1})$ is 0 (i.e. of itself $P_{T-1}^T$ requires no purchase to be made). At the same time, the constraint of $x_s^T \in [0, 1]$, $1 \leq s \leq T$ has to be respected, hence the projection, or truncation, of the solution from Theorem 1:

$$
\begin{aligned}
x_s^T &\in [0, 1], \quad 1 \leq s \leq T-3, \\
x_{T-2}^T &\in \left[ \frac{x_{T-1}^{T-1}}{2}, \frac{1 + x_{T-1}^{T-1}}{2} \right].
\end{aligned}
\tag{8}
$$
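The truncation (8), the pegging $x_{T-1}^T := x_{T-2}^T$ and the aggregate purchase (7) can be combined in a short routine. The following Python sketch uses a hypothetical function name `project_and_purchase` and made-up input values; it illustrates the bookkeeping only and does not reproduce the author's code.

```python
def clip(value, low, high):
    """Truncate value to the interval [low, high]."""
    return max(low, min(high, value))

def project_and_purchase(x_sol, x_ref):
    """x_sol: x_1^T..x_{T-2}^T from the linear system of Theorem 1.
    x_ref: x_1^{T-1}..x_{T-1}^{T-1}, the reference profile.
    Returns the full internal profile x_1^T..x_T^T and the purchase P^T."""
    last_ref = x_ref[-1]
    x = [clip(v, 0.0, 1.0) for v in x_sol[:-1]]                        # s <= T-3
    x.append(clip(x_sol[-1], last_ref / 2.0, (1.0 + last_ref) / 2.0))  # s = T-2, eq. (8)
    x.append(x[-1])                                   # x_{T-1}^T := x_{T-2}^T
    purchase = x[-1] - last_ref                       # eq. (7): P^T
    x.append(x[-1] + purchase)                        # x_T^T = 2 x_{T-1}^T - x_{T-1}^{T-1}
    return x, purchase

# Hypothetical unprojected solution for T = 7 with a buy-and-hold reference.
profile, p_T = project_and_purchase([0.9, 0.95, 0.6, 0.7, 0.55], [1.0] * 6)

# A solution that violates the bounds is truncated before the purchase is formed.
profile2, p_T2 = project_and_purchase([1.4, -0.2, 3.0], [1.0, 0.8, 0.6, 0.4])
```

The bounds on $x_{T-2}^T$ in (8) are exactly those that keep $x_T^T = 2x_{T-1}^T - x_{T-1}^{T-1}$ inside $[0, 1]$. A negative `p_T` is a sale and a positive one a buy, and the effective position then follows from (4) as $x^T := x^{T-1} + P^T$.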

The level of control in this system is not the best possible, since $x_{T-1}^T$ has to be forcefully pegged to $x_{T-2}^T$, when there may be a high difference between asset prices $\rho_{T-1}$ and $\rho_T$. There is thus a situation of pegging $x_{T-1}^T$ to $x_{T-2}^T$, which itself is bound by constraint (8). It would be preferable to equate $x_{T-1}^T$ to an independent value designated by SR-BLITS, and this is indeed the case when transaction cost is not 0. Thus, we discuss next the situation when transaction costs are non-trivial.

Also note that, in sum, a $(T-2)$ vector $x_s^T$ is obtained to maximize the function $S_b^T$ of (3) in the proposed SR-BLITS algorithm. It would be instructive to examine whether an equivalent change of the $x_s^T$ vector helps in maximizing the numerator $A^T$ of $S_b^T$, cf. (1). In particular, $A^T$ being the average backward-looking return, such an optimization would boost returns directly and not merely the returns-to-risk ratio that the SR $S_b^T$ represents. However, Proposition 1 above indicates $\frac{dA^T}{dx_s^T} = 0$, $1 \leq s \leq T-1$, and hence any change in $x_s^T$ does not change the average return $A^T$.

4 SR-BLITS with transaction cost

Let us consider transaction costs for the purchase $P^T$ at each $T$, which we express as $\delta \rho_T |P^T|$ where $\delta$ is the rate of commission (e.g. 10 basis points or BPs would mean $\delta = 0.001$). For now, we modify only $R_{T-1}^T$ from (2) such that:

$$
R_{T-1}^T = -r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) - \delta \rho_T \left| x_{T-1}^T - x_{T-1}^{T-1} \right|.
$$

Thus, the return element $R_{T-1}^T$ asymmetrically takes the burden of the transaction cost on the entire $P^T$ purchase (or sale). However, in a departure from $\frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$ in the proof of Proposition 1 we now have:

$$
\frac{dR_{T-1}^T}{dx_{T-1}^T} = -\delta \rho_T \cdot \mathrm{sign}(x_{T-1}^T - x_{T-1}^{T-1}),
$$

with substitution in a sufficient condition derived from Proposition 1: $R_{T-1}^T \frac{dR_{T-1}^T}{dx_{T-1}^T} = 0$. Thus we solve for $x_{T-1}^T$ and $x_{T-2}^T$ by constructing an equation:

$$
\left( r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta \rho_T \left| x_{T-1}^T - x_{T-1}^{T-1} \right| \right) \cdot \delta \rho_T \cdot \mathrm{sign}(x_{T-1}^T - x_{T-1}^{T-1}) = 0.
$$
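The cost-adjusted return $R_{T-1}^T$ and its derivative can be checked against a finite difference. The Python sketch below uses hypothetical function names and made-up values for the prices and positions; only the two formulas themselves come from the text above.

```python
def last_return_with_cost(x_last, x_prev, x_ref_last, x_ref_prev, r_last, rho_T, delta):
    """R_{T-1}^T with the transaction cost delta * rho_T * |P^T| charged to it.
    x_last = x_{T-1}^T, x_prev = x_{T-2}^T,
    x_ref_last = x_{T-1}^{T-1}, x_ref_prev = x_{T-2}^{T-1}."""
    base = -r_last * (-x_prev - (x_ref_last - x_ref_prev))
    return base - delta * rho_T * abs(x_last - x_ref_last)

def sign(value):
    """Sign function returning -1, 0 or +1."""
    return (value > 0) - (value < 0)

def derivative_wrt_last(x_last, x_ref_last, rho_T, delta):
    """dR_{T-1}^T / dx_{T-1}^T = -delta * rho_T * sign(x_{T-1}^T - x_{T-1}^{T-1})."""
    return -delta * rho_T * sign(x_last - x_ref_last)

# Hypothetical values: 3 BPs commission, current price 110, a sale from 1.0 to 0.6.
args = dict(x_prev=0.5, x_ref_last=1.0, x_ref_prev=1.0, r_last=3.0, rho_T=110.0, delta=0.0003)
h = 1e-6
finite_difference = (
    last_return_with_cost(0.6 + h, **args) - last_return_with_cost(0.6 - h, **args)
) / (2 * h)
analytic = derivative_wrt_last(0.6, args["x_ref_last"], args["rho_T"], args["delta"])
```

Away from the kink at $x_{T-1}^T = x_{T-1}^{T-1}$, the return is linear in $x_{T-1}^T$, so the central difference and the analytic expression agree up to rounding. For a sale, as in this example, the derivative is positive, since reducing the size of the sale reduces the cost.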

It would however be ideal to keep the equation linear in the two variables $x_{T-2}^T$ and $x_{T-1}^T$ (adding to the $(T-2)$ linear equations from Theorem 1), and thus we consider two situations. In the first, eventually we may have a solution $x_{T-1}^T$ such that $x_{T-1}^T - x_{T-1}^{T-1} \geq 0$, and thus solve for $x_{T-1}^T$, $x_{T-2}^T$ in the below:

$$
\delta \rho_T r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta^2 \rho_T^2 (x_{T-1}^T - x_{T-1}^{T-1}) = 0.
$$

In the second case, if $x_{T-1}^T - x_{T-1}^{T-1} < 0$ then this gives rise to:

$$
-\delta \rho_T r_{T-1}\left(-x_{T-2}^T - (x_{T-1}^{T-1} - x_{T-2}^{T-1})\right) + \delta^2 \rho_T^2 (x_{T-1}^T - x_{T-1}^{T-1}) = 0.
$$

Consequently, with the $T-2$ equations from Theorem 1, this variation in the $(T-1)$-th equation results in 2 matrix inversions of size $(T-1) \times (T-1)$ to produce two candidate solutions $\{x_s^{T,1}\}_{s=1}^{T-1}$ (resp. $\{x_s^{T,2}\}_{s=1}^{T-1}$) which may then be processed as follows:

i. choose a variant based on whether the assumption $(x_{T-1}^T - x_{T-1}^{T-1}) \geq 0$ (resp. $(x_{T-1}^T - x_{T-1}^{T-1}) < 0$) is satisfied,

ii. check for this variant whether the requirement on $S_b^T$ in Theorem 2 is satisfied, and finally

iii. truncate to similar limits as in (8), i.e. $x_{T-1}^T \in \left[ \frac{x_{T-1}^{T-1}}{2}, \frac{1 + x_{T-1}^{T-1}}{2} \right]$.
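Steps i and iii above can be sketched in Python under a simplifying assumption: $x_{T-2}^T$ is treated as already known, so that each of the two linear variants reduces to a single equation in $x_{T-1}^T$ instead of a joint $(T-1) \times (T-1)$ solve. The function name `candidates_for_last_position` and the input values are hypothetical, and step ii (the check on $S_b^T$ from Theorem 2) is omitted.

```python
def clip(value, low, high):
    """Truncate value to the interval [low, high]."""
    return max(low, min(high, value))

def candidates_for_last_position(x_prev, x_ref_last, x_ref_prev, r_last, rho_T, delta):
    """Solve both sign variants for x_{T-1}^T, with x_{T-2}^T = x_prev given.
    Returns (label, truncated x_{T-1}^T) for every variant whose sign
    assumption on u = x_{T-1}^T - x_{T-1}^{T-1} holds."""
    k = r_last * (-x_prev - (x_ref_last - x_ref_prev))
    scale = delta * rho_T
    u_case1 = -k / scale      # from  scale*k + scale^2 * u = 0, assumes u >= 0
    u_case2 = k / scale       # from -scale*k + scale^2 * u = 0, assumes u <  0
    low, high = x_ref_last / 2.0, (1.0 + x_ref_last) / 2.0    # limits as in (8)
    kept = []
    if u_case1 >= 0:                                          # step i, first variant
        kept.append(("case 1", clip(x_ref_last + u_case1, low, high)))   # step iii
    if u_case2 < 0:                                           # step i, second variant
        kept.append(("case 2", clip(x_ref_last + u_case2, low, high)))   # step iii
    return kept

# Hypothetical inputs: rising last price (r_{T-1} = 3), 3 BPs commission.
rising = candidates_for_last_position(0.5, 1.0, 1.0, 3.0, 110.0, 0.0003)

# Same inputs with a falling last price (r_{T-1} = -3).
falling = candidates_for_last_position(0.5, 1.0, 1.0, -3.0, 110.0, 0.0003)
```

Under this simplification the two variants give values of $u$ with opposite signs, so the sign test may keep both variants or neither. With a small $\delta$ the unprojected values are far outside the admissible interval, which makes the truncation of step iii decisive in this example.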

A discussion of how to make the transaction cost's distribution fairer is due here. We could apportion the transaction cost into each $R_t^T$ as follows:

$$
D_t^T = \delta \rho_T \left( |P^T| - \left| \sum_{s=1, s \neq t}^{T-1} P_s^T \right| \right),
$$

being the additional transaction cost that position $x_t^T$ has caused to be incurred (it may even be negative and therefore a credit). This component of the modified returns sequence $\mathbf{R}_t^T = R_t^T + D_t^T$ behaves as follows:

$$
\begin{aligned}
D_t^T &:= \delta \rho_T \left( |x_{T-1}^T - x_{T-1}^{T-1}| - |x_{T-1}^T - x_{T-1}^{T-1} - P_t^T| \right) \\
&= \delta \rho_T \left( |x_{T-1}^T - x_{T-1}^{T-1}| - \left| x_{T-1}^T - x_{T-1}^{T-1} - \left( (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}) \right) \right| \right), \\
\frac{dD_t^T}{dx_t^T} &= \delta \rho_T \cdot \mathrm{sign}\left( x_{T-1}^T - x_{T-1}^{T-1} - \left( (x_t^T - x_{t-1}^T) - (x_t^{T-1} - x_{t-1}^{T-1}) \right) \right),
\end{aligned}
$$

for which we may consider two cases as earlier. But such a situation would hold for each $t$, resulting in an exponential number of matrices to be inverted. For example, distributing transaction cost over the last 3 steps $T-1$, $T-2$ and $T-3$ would result in $2 \times 2 \times 2 = 8$ matrix inversions.

A similar problem would arise if the Downside Deviation Ratio, a returns-sensitive substitute for the SR (cf. (Moody & Saffell, 2001, §II.E)), were to be employed. Also note the presence of variable $x_{T-1}^T$ in each expression of $\frac{d\mathbf{R}_t^T}{dx_t^T}$. Eventual simplification (or any other insights contributing to low computational complexity) in this method would enable a suitable, fair, transaction cost arrangement that would solve this temporal credit assignment problem.

5 Numerical Experiments

We first undertook an experiment with Geometric Brownian Motion (GBM) simulations of a security's price with varying, though typical, parameters. A GBM simulation has volatility in the price trajectory of the asset (even if it is of i.i.d. origin) and thus the aim was to see if a trading operation based on SR-BLITS would result in better effective SR. With parameters $M = 30$ and $B = 10$ for the algorithm, we simulated 2000 episodes of GBM with randomly chosen growth-rate $\mu$ and volatility $\sigma$ values. These $\mu$ and $\sigma$ were drawn from respective intervals $[0.05, 0.15]$, representing growth-rate of between 5 and 15 per cent, and $[0.1, 0.3]$ as volatility.
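A GBM episode of the kind described above could be generated as in the following Python sketch. Several details are assumptions of this illustration and are not stated in the text: $\mu$ and $\sigma$ are treated as annualised, the step is one trading day with `dt = 1/252`, the exact log-normal update is used, and the function name `simulate_gbm_episode` and the seed are hypothetical.

```python
import math
import random

def simulate_gbm_episode(rng, m=30, rho_1=100.0, dt=1.0 / 252.0):
    """Return M simulated prices rho_1..rho_M of one GBM episode.
    mu and sigma are drawn per episode from the intervals given in the text;
    dt = 1/252 (daily steps, annualised parameters) is an assumption."""
    mu = rng.uniform(0.05, 0.15)        # growth-rate
    sigma = rng.uniform(0.1, 0.3)       # volatility
    prices = [rho_1]
    for _ in range(m - 1):
        z = rng.gauss(0.0, 1.0)         # standard normal shock
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(step))
    return prices

rng = random.Random(0)                  # fixed seed, for repeatability only
episodes = [simulate_gbm_episode(rng) for _ in range(2000)]
```

Each episode starts at $\rho_1 = 100.0$ and has length $M = 30$, matching the setting reported for the first experiment; changing `m` to 60 gives the episode length of the second setting.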

The $\rho_T$ values thus correspond to simulated daily prices for approximately a one month period. The experiment pertained to the variant proposed in Section 4, with transaction cost pegged at 3 basis points (BPs), i.e. $\delta = 0.0003$. Note that transaction cost of 2 BPs in the Foreign Exchange market is not uncommon, e.g. (Dempster & Leemans, 2007). All GBM trajectories began with $\rho_1 = 100.0$, and the trading strategy began calculating purchases $P^T$ (as also the effective position $x^T$) only for $T \geq B + 1$ with $x^T = 1.0$ for $T \leq B$. An SR-maximising purchase $P^T$ was engaged in, only if the SRs satisfied

$$
S_b^T(\{x_s^T\}, \{\rho_s\}, \{x_s^{T-1}\}) \geq 2 \times S_a^{T-1}(\{x_s^{T-1}\}_{s=1}^{T-1}, \{\rho_s\}_{s=1}^{T}),
$$

alongside a modification if the RHS term is negative, where 2 is replaced by $\frac{1}{2}$. A summary of the results is in this table:

Table 1: SR-BLITS for GBM, M=30, B=10

Metric      Episode SR   Cash-Flow
SR-BLITS    0.037        0.517
BH          0.022        0.603

The average SR for each episode shows a 70 percent improvement. The average cash-flow in an episode considers outflow to purchase (net of transaction costs) and inflow when a sale occurs, including liquidating the position at the end of an episode. The mean values of both metrics are justified for reporting here vide the p-value 0.05 threshold for the one-sided $t$-test statistic. With $M = 60$ and $B = 20$, we found similar statistically-significant results.

Table 2: SR-BLITS for GBM, M=60, B=20

Metric      Episode SR   Cash-Flow
SR-BLITS    0.035        1.377
BH          0.026        1.659

To explain these results, note that the SR may be favourable (indicating a better risk-return stance) but the cash-flow may well be worse (since it is a purely returns-related, i.e. $R^T$-related, quantity). Indeed, it is clarified in (Moody & Saffell, 2001, (17)) that SR applies a penalty to returns greater than a certain threshold, and this is contrary to typical notions of risk and reward. Since the index $T$ is increasing, tending towards the episode upper-index $M$, it is best to restrict $M$ to a suitably small number, e.g. between 30 and 100. In works such as (Moody & Saffell, 2001) and (Molina, 2006), the simulated results employ a model of random walks with autoregressive trend processes.
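The acceptance test applied before each purchase in the GBM experiment above can be written as a small predicate. The Python sketch below uses a hypothetical function name `purchase_accepted`; it encodes only the stated rule, with the factor 2 replaced by 1/2 when the right-hand side is negative.

```python
def purchase_accepted(s_b_new, s_a_prev):
    """True if the optimized internal SR clears the threshold on the reference SR.
    s_b_new:  S_b^T of the newly optimized internal positions.
    s_a_prev: S_a^{T-1} of the reference positions.
    The factor is 2 for a non-negative reference SR and 1/2 for a negative one."""
    factor = 2.0 if s_a_prev >= 0 else 0.5
    return s_b_new >= factor * s_a_prev

# Illustrative values only.
checks = [
    purchase_accepted(0.5, 0.2),     # 0.5 >= 0.4
    purchase_accepted(0.3, 0.2),     # 0.3 <  0.4
    purchase_accepted(-0.1, -0.3),   # -0.1 >= -0.15
    purchase_accepted(-0.2, -0.3),   # -0.2 <  -0.15
]
```

Replacing 2 by 1/2 for a negative reference keeps the threshold above the reference value in both regimes, so a purchase is made only when the internal SR improves on the reference by a margin.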

For the NSE index called Nifty, we used a daily 360-tick record (one per minute of 6 trading hours), spread over 132 consecutive days between 2014 and 2015. As in GBM above, we kept $M = 30$, $B = 10$ and $\delta = 0.0003$ (3 BPs), to observe the following:

Table 3: SR-BLITS for NSE Nifty 2014-2015

Metric      Episode SR   Cash-Flow
SR-BLITS    0.004        -5.864
BH          0.001        -5.984

The NSE series had maximum and minimum values of 9109.15 and 7965.25, respectively, making the above cash-flow values very small in comparison. The difference between cash-flows per-episode is not statistically significant at the $p = 0.05$ level (though Episode SR is). It is worth noting that the cash-flow for BH would be more favourable than the current, episodic, mean if buy-and-hold were to occur over the entire 132 day horizon. However, a buy-and-hold that lasts over multiple episodes is not a fair comparison with SR-BLITS, which is essentially a trading strategy.
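The per-episode cash-flow metric reported in the tables can be illustrated with the following Python sketch. The exact accounting is not spelled out in the text, so several points are assumptions of this example: one unit of the asset ($\eta = 1$), commission charged on every trade including the initial purchase and the final liquidation, and the hypothetical function name `episode_cash_flow`.

```python
def episode_cash_flow(positions, prices, delta):
    """Net cash-flow of one episode for effective positions x^1..x^{M-1}.
    positions[t] is the fraction held after trading at prices[t];
    the remaining holding is liquidated at the last price prices[-1].
    A commission delta * price * |trade| is charged on every trade (assumption)."""
    cash = 0.0
    held = 0.0
    for x, rho in zip(positions, prices):
        trade = x - held                              # > 0 is a purchase, < 0 a sale
        cash -= trade * rho + delta * rho * abs(trade)
        held = x
    cash += held * prices[-1] - delta * prices[-1] * abs(held)   # final liquidation
    return cash

prices = [100.0, 101.0, 103.0, 102.0, 105.0]          # hypothetical episode, M = 5
delta = 0.0003                                        # 3 BPs

# Buy-and-hold within the episode: buy at rho_1, liquidate at rho_M.
bh_cash = episode_cash_flow([1.0, 1.0, 1.0, 1.0], prices, delta)

# Staying out of the market entirely produces no cash-flow.
flat_cash = episode_cash_flow([0.0, 0.0, 0.0, 0.0], prices, delta)
```

Under these assumptions the episodic BH cash-flow is $\rho_M(1 - \delta) - \rho_1(1 + \delta)$, which depends only on the first and last price of the episode. This is consistent with the remark above that BH held across the whole horizon is a different quantity from the episodic mean.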

For the NASDAQ series, we similarly used the 360-tick record, spread over 132 consecutive days in 2014-2015. Here, SR-BLITS gave a per-episode average SR of 0.014 vs. BH 0.013 (with no statistical significance at the $p = 0.05$ level, either). Comparable outcome SRs in 'QTrader' of (Moody & Saffell, 1998), for example, are higher at 0.63 vs the correspondingly higher BH SR of 0.34. Note that this is with profits being reinvested while trading, although there is also a transaction cost of 50 BPs. The code used here is made available for open-source use via (Abdulla, 2018).

6 Conclusions

Optimization of a Sharpe-Ratio-based measure that compensates for past buy/sell decisions is described here in the form of the algorithm SR-BLITS. The simulation model that we adopt to test SR-BLITS, Geometric Brownian Motion (GBM), indicates a 50 percent improvement in the achieved true Sharpe Ratio of returns. In terms of simulation models of a security's price: random walk, GARCH, or ARMA simulation, adopted by other publications, was however not adopted by us. Further experiments, with varied settings, could indicate the suitability of SR-BLITS to those models. The RRL algorithm in (Moody & Saffell, 2001) also considers a risk-free asset in a 2-asset portfolio, which we do not consider here. Another extension that appears feasible is to further adapt the Orthogonal Bandit Algorithm of changing portfolio weights in (Shen et al., 2015). It should be possible to choose a portfolio weight vector at each decision index $T$ based on not just the recommendation of the algorithm in (Shen et al., 2015), but also in terms of whether past losses are optimally corrected for, as in SR-BLITS.

References

Abdulla, M. S. (2018). SR-BLITS Code. https://in.mathworks.com/matlabcentral/fileexchange/66220-sr-blits. MathWorks.

Abdulla, M. S., & Bhatnagar, S. (2015). A Transitions-only algorithm for Compact Action Set Markov Decision Processes. In Proceedings of the Indian Control Conference ICC, IEEE, Chennai, 5-7 Jan.

Dempster, M., & Leemans, V. (2007). Design of an FX Trading System using Adaptive Reinforcement Learning. In 3rd Annual CARISMA Seminar, 26-27 June, London.

Gold, C. (2003). FX Trading Via Recurrent Reinforcement Learning. In IEEE International Conference on Computational Intelligence for Financial Engineering.

Gorse, D. (2011). Application of stochastic recurrent reinforcement learning to index trading. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.

Kwon, K.-Y., & Kish, R. J. (2002). A comparative study of technical trading strategies and return predictability: an extension of Brock, Lakonishok, and LeBaron (1992) using NYSE and NASDAQ indices. The Quarterly Review of Economics and Finance, 42, 611–631.

Li, J., & Chan, L. (2006). Reward Adjustment Reinforcement Learning for Risk-averse Asset Allocation. In International Joint Conference on Neural Networks (IJCNN), Vancouver, 16-21 Jul.

Molina, G. (2006). Stock Trading with Recurrent Reinforcement Learning (RRL), CS 229 Application Project, Stanford University. In http://cs229.stanford.edu/proj2006/Molina-StockTradingWithRecurrentReinforcementLearning.pdf.

Moody, J., & Saffell, M. (1998). Reinforcement Learning for Trading. In Neural Information Processing Systems (NIPS).

Moody, J., & Saffell, M. (2001). Learning to Trade via Direct Reinforcement. IEEE Transactions on Neural Networks, 12(4).

Shen, W., Wang, J., Jiang, Y.-G., & Zha, H. (2015). Portfolio Choices with Orthogonal Bandit Learning. In 24th International Joint Conference on Artificial Intelligence (IJCAI).