Testing the profitability of technical trading rules on stock markets

ANDREI ANGHEL
Finance, Insurance, Banking and Stock Exchange*
andrei.anghel@strade.ro

CRISTIANA TUDOR
International Business and Economics*
cristiana.tudor@net.ase.ro

MARIA TUDOR
Applied Mathematics*
maria.tudor@net.ase.ro

*The Bucharest University of Economic Studies
6, Piata Romana, Bucharest 010374
ROMANIA

ISBN: 978-960-474-325-4

Abstract: This empirical study continues previous research [2] that investigated the profitability of simple technical trading rules. We have added the Athens Stock Exchange Composite Index to our universe of three previously researched indices: FTSE in Great Britain, S&P in the USA and DAX in Germany. We have applied a new test based on Hansen's SPA Test [6] with the primary goal of rejecting the hypothesis of data mining resulting from a possible survivorship bias of trading rules; we have also provided a detailed explanation of our bootstrap implementation of the test. We did manage to find lower p-values for the new test, which compared favorably to White's Reality Check [10], but we were still not able to reject the null hypothesis that data mining could partially be responsible for the apparent superior performance of some of the trading rules that we have tested on the DAX and S&P indices. On the other hand, we are more confident that data mining cannot be ruled out as a cause for perceived superior performance for any of the 100 rules tested on the FTSE index. Finally, we have discovered not only that all the 100 tested strategies manage to beat the buy & hold benchmark on the Greek composite index, but also that they explain that market's dynamics better than a random walk with drift, and that it is highly unlikely that data mining could have distorted these findings, which are important both for academics and investment professionals. We suspect that the superior performance of these simple rules is influenced by (1) the inclusion of dividends in the researched index and (2) the degree of maturity of the market.

Key-Words: technical trading strategy, bootstrapping, data mining, reality check, superior predictive ability

1 Introduction and related literature

Technical trading rules are one of the first tools used for analyzing stock markets and for taking investment decisions. As the current investment environment is filled with electronic trading platforms that promise to deliver high-speed order execution, technical rules are also (apparently) the easiest tool for the average investor to come up with some documented investment decisions in real time. Due to the aggressive marketing of electronic platform vendors, technical analysis is however being promoted among inexperienced investors much more than its actual merits would justify. It is exactly these merits that our paper tries to analyze and check for consistency on recent data. Our endeavor is not unique and there are a number of papers debating the subject. One of the first attempts to shed light on the controversy over the usability of technical trading was the paper of Sidney Alexander, Price Movements in Speculative Markets: Trends or Random Walks (1961): the author seemed to have found compelling evidence that the simplest of trading rules can turn a profit.
The study was later rebutted by Fama and Blume (1966), as it was based on an oversimplification of returns (the author had computed returns by using a clever mathematical model that unfortunately did not account for fast markets). However, Brock, Lakonishok and LeBaron (1992) took another look at some of the simplest trading rules and, using bootstrapping, found strong evidence that simple moving averages were actually better at modeling market dynamics than some popular null models: random walk with drift, AR(1) and GARCH models. The simple but computer-intensive bootstrapping test did not however account for the danger of spurious results due to data mining; the authors themselves acknowledged this fact and tried to mitigate the problem by presenting the results of all the tests that they had performed. However, data mining might be the result not only of one researcher's misguided effort, but also of the trials of multiple researchers and practitioners over the many years that technical trading has been in use: it might very well be that the parameters used by the authors were the result of a survivorship bias that took place over the years, in exactly the same way that the average return of hedge funds is upwardly biased due to the unintentional and hardly avoidable exclusion of poorly performing or failed hedge funds.
Facing this actual danger, brought upon by the sheer quantity of information and computing power, White (2000) developed the Reality Check for data snooping, a test that compares the alleged performance of a trading rule with the supremum of performance over all the rules considered, for each of the bootstrapped samples. Anghel and Tudor (2013) applied the test to investigate the profitability of a collection of 100 trading rules on some market indices and found that, between 1990 and 2013, none of the trading rules managed to beat the buy and hold benchmark policy on the FTSE and only one rule beat the benchmark on the S&P index. Some of the rules, however, explained market dynamics better than a simple random walk with drift model, for S&P and DAX. For the DAX index 7 out of the 100 tested trading rules managed to beat the benchmark; the Reality Check p-value was however too large to reject the null that the apparent superiority of the best model was due to data mining. The large p-value was an indication that unintended data mining might be partially responsible for the apparently superior performance of some trading rules; it could also be the case that the Reality Check test for data snooping was in fact too weak to reject the hypothesis of data mining.

Hansen (2005) improved the Reality Check by studentizing the statistic and by invoking a sample-dependent null distribution, thus reducing the influence of erratic forecasts. It is this methodology that we use in this paper, in a slightly modified form, to check whether the assumption of superior performance for some trading rules is seriously undermined in the presence of data mining. We provide a detailed description of a bootstrap implementation of the Superior Predictive Ability test and its results applied on recent data, on four market indices: FTSE, S&P, DAX and Athex Composite.

The remainder of the paper is organized as follows. In Section 2, the data and methodology are presented.
The empirical results are reported in Section 3, while Section 4 concludes the study.

2 Data and method

Daily index closing values are obtained for the four market indices mentioned above: FTSE 100 in the United Kingdom, S&P 500 in the United States, DAX in Germany and Athex Composite in Greece. The data start on November 26th, 1990 and end on May 17th, 2013. The length of the samples is different for each of the markets, depending on the actual number of trading days.

We follow White's notation for testing whether a financial market trading strategy yields returns superior to a benchmark: the performance of each rule is measured against that of the benchmark, and the signal functions of the rule and of the benchmark take one of two permissible values (0 and 1), converting indicators and parameters into a market position. We then compute the average excess returns for 100 trading rules of our choice. We use Simple Moving Averages, defined by two parameters that represent the lengths of the short and of the long moving average, respectively, in the SMA(short, long) strategy. More details on exactly the same implementation of the Simple Moving Average (SMA) strategies can be found in Anghel and Tudor (2013). We only note that the best performing strategy was SMA(10,500) for FTSE, SMA(10,500) for S&P and SMA(1,300) for DAX. A short summary of the best trading strategies and of the results of the simple bootstrapping and Reality Check tests for them is presented in Table 1.

Table 1 Best SMA strategies and their corresponding p-values

Index   Best rule      Bootstrap p-value   Reality Check p-value
FTSE    SMA(10,500)    0.21                0.62
S&P     SMA(10,500)    0.04                0.21
DAX     SMA(1,300)     0.05                0.22

As of this point, none of the 100 trading rules that we have tested seems to work on empirical data after we take into account data snooping, be it erroneously introduced by researchers or by a survivorship bias. However, since we have tried to choose our coefficients randomly, and since for the DAX index a rather large number of strategies historically outperformed the benchmark (7 out of 100), we are inclined to believe that the Reality Check might incorrectly throw away some otherwise good investment strategies, which leads us to compute Hansen's Test for Superior Predictive Ability, as follows.

We start by re-sampling the original return series and chain-linking the re-sampled returns back into a pseudo-time series, keeping the same fixed initial price. The re-sampling scheme that we use is a simple sampling with replacement of n-1 returns, where n is the length of the original price series (there is no return corresponding to the first price, hence there are only n-1 returns to be re-sampled). Our re-sampling scheme is different from that used in Hansen (2005), which employed the stationary bootstrap of Politis and Romano (1994), and from the recommended implementation of the block bootstrap of Künsch (1989).
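The re-sampling step described above, together with a 0/1 moving-average position rule, can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names `resample_prices` and `sma_signal`, the use of simple (rather than log) returns, and the crossover convention (in the market when the short average is above the long one, out of the market otherwise) are assumptions made for the example.

```python
import numpy as np


def resample_prices(prices, rng):
    """Build one pseudo price series by re-sampling returns with replacement.

    Assumption: simple returns are used; the source does not say which
    return definition was applied.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0            # n-1 returns from n prices
    draws = rng.integers(0, returns.size, size=returns.size)  # indices, with replacement
    boot_returns = returns[draws]
    # Chain-link the re-sampled returns, starting from the same initial price.
    path = prices[0] * np.cumprod(1.0 + boot_returns)
    return np.concatenate(([prices[0]], path))


def sma_signal(prices, short_window, long_window):
    """Return a 0/1 position per day for an SMA(short, long) rule.

    Assumed convention: position 1 when the short moving average is above
    the long one, 0 otherwise (and 0 until the long average is available).
    """
    prices = np.asarray(prices, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(prices)))   # prefix sums for fast averages
    signal = np.zeros(prices.size, dtype=int)
    for t in range(long_window - 1, prices.size):
        sma_short = (csum[t + 1] - csum[t + 1 - short_window]) / short_window
        sma_long = (csum[t + 1] - csum[t + 1 - long_window]) / long_window
        signal[t] = int(sma_short > sma_long)
    return signal
```

Applying `sma_signal` to each pseudo-series returned by `resample_prices` would give one bootstrap replication of a rule's market positions.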
Instead of a block scheme, we rely on Sullivan, Timmermann and White (1999), who found little sensitivity to the choice of q (the smoothing parameter for the two schemes), and we continue by using standard re-sampling with replacement. The result of the selection step, the bootstrap p-value and the Reality Check p-value are provided for comparison purposes in Table 1 and are detailed in Anghel and Tudor (2013).

The test statistic that we want to compute requires an estimate of the variance of the average excess return of each strategy k, with k = 1, ..., m, where m is the number of strategies that we test (100). We implement the earlier version of the estimator described by Hansen (2005) and make sure we use a large number of iterations for the bootstrap process (1000), thus obtaining an estimator that is consistent for the true variance, as in Goncalves and de Jong (2003). The test statistic is then studentized with these estimates.

We seek the distribution of the test statistic under the null hypothesis, so we impose the null as described in Hansen (2005) by recentering around a sample-dependent mean whose definition involves the indicator function. We also provide lower and upper bounds for the corresponding p-values, which are obtained from two alternative recenterings. We will approximate the distribution of the test statistic under the null by the empirical
distribution obtained from the bootstrapped samples: we calculate the statistic on each of them, and the three bootstrap p-values (the SPA p-value together with its lower and upper bounds) follow from this empirical distribution. The null hypothesis ("Evidence about superior performance might be the result of data mining") is rejected for small p-values.

3 Empirical results

The inclusion of the Athex Composite Index resulted in the discovery of some very interesting empirical properties of the returns associated with SMA strategies. The most notable finding is that all of the 100 strategies that we have tested manage to beat the buy & hold strategy for the Greek market. In Table 2 we present the annualized daily returns in excess of the benchmark for all the tested strategies, for Athex Composite.

Table 2 Annualized returns (%) in excess of buy & hold for Athex Composite (1st column: short MA, 1st row: long MA)

Short \ Long    25   50   75  100  150  200  250  300  350  500
 1              36   28   21   23   18   17   14   15   13   11
 2              29   26   18   22   18   18   14   15   12   11
 3              27   24   17   21   16   15   14   14   12   11
 4              25   21   16   20   16   18   12   14   11   13
 5              24   19   14   20   14   17   13   14   12   12
 6              20   19   15   19   14   18   14   14   14   12
 7              19   17   14   20   14   19   13   12   13   12
 8              16   18   15   18   14   17   12   12   12   13
 9              13   19   17   18   13   17   10   13   13   13
10              13   21   19   18   13   16   10   11   13   13

Annualized returns were derived from the daily returns using a formula in which a fraction captures the average percentage of trading days for the period. As can be seen from Table 2, the best strategy for Athex Composite (the one yielding the highest excess return) is SMA(1,25), the fastest among the 100 strategies that we have tested. This strategy would have been able to provide 36% annually above the market since November 1999. This strong result, coupled with the fact that all the tested strategies proved to be profitable, would entitle us to expect data mining bias to be strongly ruled out. Indeed, both the Reality Check and the newly applied SPA test manage to reject the null hypothesis, as can be seen from Table 3, which summarizes the empirical results of our study for the four markets.

Table 3 Excess returns and test results for the four indices

Index   Best rule      Excess return (%)   Bootstrap p-value   Reality Check   SPA
FTSE    SMA(10,500)    -2.05               0.21                0.62            0.99
S&P     SMA(10,500)    0.46                0.04                0.21            0.20
DAX     SMA(1,300)     1                   0.05                0.22            0.21
ATC     SMA(1,25)      36.1                0.00                0.00            0.00

In these four cases, we can see that the SPA test is able to provide stronger evidence than the Reality Check, with a larger value for the obviously random results (0.99 vs. 0.62 for the FTSE index) and a slightly smaller value for the weaker results of the Reality Check (0.20 vs. 0.21 for S&P, and 0.21 vs. 0.22 for DAX). Both tests manage to reject the null for the Greek market, strongly suggesting that the superior performance of the SMA rules has nothing to do with data mining. We are not able however to reject the null ("Superior performance might be related to data mining") for the historically best performing rules for the S&P and DAX indices even after we employ the stronger SPA test, although we manage to obtain slightly improved (lower) p-values. The lower and upper bounds for the computed SPA values are presented in Table 4 and give additional insights related to the strategies included in our study.

Table 4 SPA test and lower and upper bounds

Index   Best rule      Lower bound   SPA    Upper bound
FTSE    SMA(10,500)    0.99          0.99   0.99
S&P     SMA(10,500)    0.20          0.20   0.95
DAX     SMA(1,300)     0.21          0.21   0.61
ATC     SMA(1,25)      0.00          0.00   0.00
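Three p-values of the kind reported in Table 4 could be computed along the following lines. The sketch follows the formulas published in Hansen (2005), combined with a bootstrap variance estimate; how closely this matches the adapted version used in the study is an assumption. The function name `spa_pvalues` and its inputs are hypothetical: `d_bar` holds the average excess return of each rule over the benchmark on the original sample, `boot_means` holds the same averages on each bootstrap replication, and `n` is the sample length.

```python
import numpy as np


def spa_pvalues(d_bar, boot_means, n):
    """Lower, consistent and upper SPA p-values (after Hansen, 2005).

    d_bar      : shape (m,), average excess return of each of the m rules
    boot_means : shape (B, m), average excess returns on B bootstrap samples
    n          : number of observations behind each average (n >= 3)
    Assumes every rule has a non-zero bootstrap variance.
    """
    d_bar = np.asarray(d_bar, dtype=float)
    boot_means = np.asarray(boot_means, dtype=float)
    root_n = np.sqrt(n)

    # Bootstrap estimate of the standard deviation of sqrt(n) * d_bar_k.
    omega = np.sqrt(np.mean((root_n * (boot_means - d_bar)) ** 2, axis=0))

    # Studentized statistic: best standardized rule, truncated at zero.
    t_obs = max(np.max(root_n * d_bar / omega), 0.0)

    # Rules that are not "significantly bad" keep their sample mean as centre.
    threshold = -np.sqrt(2.0 * np.log(np.log(n)))
    keep = root_n * d_bar / omega >= threshold

    centres = {
        "lower": np.maximum(d_bar, 0.0),          # most liberal recentering
        "consistent": np.where(keep, d_bar, 0.0),  # sample-dependent null
        "upper": d_bar,                            # most conservative recentering
    }

    pvalues = {}
    for name, centre in centres.items():
        z = root_n * (boot_means - centre) / omega
        t_boot = np.maximum(z.max(axis=1), 0.0)   # statistic on each replication
        pvalues[name] = float(np.mean(t_boot > t_obs))
    return pvalues
```

By construction the three values are ordered: lower <= consistent <= upper.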
The lower and upper bounds for the SPA test further support our findings about FTSE and Athex Composite. However, the larger discrepancies between the lower and the upper limit for the SPA test computed for DAX and S&P suggest that the result might be influenced by the inclusion of poorly performing models, and this is more of a concern for the S&P family of strategies than for those applied on DAX.

4 Conclusions

We have attempted to reject the null hypothesis of data mining for 100 simple trading strategies applied on three mature markets and Greece by using an adapted version of Hansen's test for Superior Predictive Ability. We have provided a detailed description of our bootstrap implementation of the test. The results more strongly support our previous findings: that there might be other models (for example, random walk with drift) that fit the FTSE index better than the SMA rules, and that data mining cannot be ruled out as a possible cause for any apparent superior performance of the rules that we have tested on the FTSE. We could not however reject the null hypothesis of data mining for the performance of the SMA rules on S&P and DAX, although we did manage to find lower p-values by applying the SPA test. Thus we are forced to conclude that unintended data mining could have influenced our findings for the two markets: the apparent predictive ability of the rules could have been the result of survivorship bias manifested amongst trading rules. Of course, there is also a possibility that the SPA test penalizes our tested strategies too harshly, though it is less susceptible to type II errors than White's Reality Check. Surprisingly, all the SMA rules seem to exploit some inefficiencies of a less mature market, as is the case of the Athens Stock Exchange, and all the tests that we have applied, including the SPA test, suggest that this is not the spurious result of data mining.
Consequently, we are more confident that the rules we have tested behave remarkably differently when applied to different markets. We suspect that the inclusion of dividends in an index makes it more suitable for technical trading strategies (as is the case for DAX in comparison with S&P or FTSE), but also that the maturity of a market might influence the profitability of these strategies (as they seem to behave remarkably better on the Athens Stock Exchange than on the other three analyzed indices). Of course, it remains to be seen which of the actual traits of a mature market make it un-exploitable for such simple strategies.

Acknowledgements

This research was supported by CNCS-UEFISCDI, Project number IDEI 303, code PN-II-ID-PCE-2011-3-0593.

References:

[1] Alexander, S. (1961) Price Movements in Speculative Markets: Trends or Random Walks, Industrial Management Review, 2:2, 7-26
[2] Anghel, A. and Tudor, C. (2013) The profitability of technical trading rules: empirical application on mature stock markets, 8th Annual London Business Research Conference Proceedings
[3] Brock, W., Lakonishok, J. and LeBaron, B. (1992) Simple Technical Trading Rules and the Stochastic Properties of Stock Returns, Journal of Finance, Volume 47, Issue 5, 1731-1764
[4] Fama, E. and Blume, M. (1966) Filter Rules and Stock Market Trading, The Journal of Business, Volume 39, Issue 1, Part 2, 226-241
[5] Goncalves, S. and de Jong, R. (2003) Consistency of the Stationary Bootstrap under Weak Moment Conditions, Economics Letters, 81, 273-278
[6] Hansen, P. (2005) A Test for Superior Predictive Ability, Journal of Business & Economic Statistics, 23:4, 365-380
[7] Künsch, H.R. (1989) The Jackknife and the Bootstrap for General Stationary Observations, Annals of Statistics, Volume 17, Number 3, 1217-1241
[8] Sullivan, R., Timmermann, A. and White, H. Dangers of Data-Driven Inference: The Case of Calendar Effects in Stock Returns, University of California San Diego Discussion Paper 98-31
[9] Tudor, C. (2008) An empirical study on risk-return tradeoff using GARCH-class models: evidence from Bucharest Stock Exchange, Proceedings of the ICBE International Conference, Spiru Haret University, Constanta, Ed. Muntenia
[10] White, H. (2000) A Reality Check for Data-Snooping, Econometrica, vol. 68, No. 5, 1097-1126