Linear and Nonlinear Trading Models with Gradient Boosted Random Forests and Application to Singapore Stock Market *



Similar documents
Modified Line Search Method for Global Optimization

Data Analysis and Statistical Behaviors of Stock Market Fluctuations

Automatic Tuning for FOREX Trading System Using Fuzzy Time Series

CHAPTER 3 THE TIME VALUE OF MONEY

PSYCHOLOGICAL STATISTICS

GCSE STATISTICS. 4) How to calculate the range: The difference between the biggest number and the smallest number.

1 Correlation and Regression Analysis

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

INVESTMENT PERFORMANCE COUNCIL (IPC)

Lesson 17 Pearson s Correlation Coefficient

Chapter 5 Unit 1. IET 350 Engineering Economics. Learning Objectives Chapter 5. Learning Objectives Unit 1. Annual Amount and Gradient Functions

How to read A Mutual Fund shareholder report

Trading rule extraction in stock market using the rough set approach

INVESTMENT PERFORMANCE COUNCIL (IPC) Guidance Statement on Calculation Methodology

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

LECTURE 13: Cross-validation

Hypothesis testing. Null and alternative hypotheses

Volatility of rates of return on the example of wheat futures. Sławomir Juszczyk. Rafał Balina

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Output Analysis (2, Chapters 10 &11 Law)

Soving Recurrence Relations

Evaluating Model for B2C E- commerce Enterprise Development Based on DEA

Lesson 15 ANOVA (analysis of variance)

Confidence Intervals. CI for a population mean (σ is known and n > 30 or the variable is normally distributed in the.

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

Department of Computer Science, University of Otago

I. Chi-squared Distributions

A Fuzzy Model of Software Project Effort Estimation

Determining the sample size

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

5 Boolean Decision Trees (February 11)

The Forgotten Middle. research readiness results. Executive Summary

1 Computing the Standard Deviation of Sample Means

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Installment Joint Life Insurance Actuarial Models with the Stochastic Interest Rate

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

Institute of Actuaries of India Subject CT1 Financial Mathematics

Domain 1: Designing a SQL Server Instance and a Database Solution

Subject CT5 Contingencies Core Technical Syllabus

BENEFIT-COST ANALYSIS Financial and Economic Appraisal using Spreadsheets

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

5: Introduction to Estimation

Bond Valuation I. What is a bond? Cash Flows of A Typical Bond. Bond Valuation. Coupon Rate and Current Yield. Cash Flows of A Typical Bond

Inference on Proportion. Chapter 8 Tests of Statistical Hypotheses. Sampling Distribution of Sample Proportion. Confidence Interval

Review: Classification Outline

Chapter 7: Confidence Interval and Sample Size

Hypergeometric Distributions

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Forecasting. Forecasting Application. Practical Forecasting. Chapter 7 OVERVIEW KEY CONCEPTS. Chapter 7. Chapter 7

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

JJMIE Jordan Journal of Mechanical and Industrial Engineering

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Confidence Intervals for One Mean

A Test of Normality. 1 n S 2 3. n 1. Now introduce two new statistics. The sample skewness is defined as:

1. C. The formula for the confidence interval for a population mean is: x t, which was

Estimating Probability Distributions by Observing Betting Practices

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

CHAPTER 3 DIGITAL CODING OF SIGNALS

Biology 171L Environment and Ecology Lab Lab 2: Descriptive Statistics, Presenting Data and Graphing Relationships

Systems Design Project: Indoor Location of Wireless Devices

HCL Dynamic Spiking Protocol

CONTROL CHART BASED ON A MULTIPLICATIVE-BINOMIAL DISTRIBUTION

FM4 CREDIT AND BORROWING

Now here is the important step

Chapter 7 Methods of Finding Estimators

ADAPTIVE NETWORKS SAFETY CONTROL ON FUZZY LOGIC

Investing in Stocks WHAT ARE THE DIFFERENT CLASSIFICATIONS OF STOCKS? WHY INVEST IN STOCKS? CAN YOU LOSE MONEY?

A probabilistic proof of a binomial identity

Lecture 2: Karger s Min Cut Algorithm

Your organization has a Class B IP address of Before you implement subnetting, the Network ID and Host ID are divided as follows:

Research Method (I) --Knowledge on Sampling (Simple Random Sampling)

A Guide to the Pricing Conventions of SFE Interest Rate Products

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Ordinal Classification Method for the Evaluation Of Thai Non-life Insurance Companies

France caters to innovative companies and offers the best research tax credit in Europe

THE ROLE OF EXPORTS IN ECONOMIC GROWTH WITH REFERENCE TO ETHIOPIAN COUNTRY

THE ACCURACY OF SIMPLE TRADING RULES IN STOCK MARKETS

Convention Paper 6764

Measures of Spread and Boxplots Discrete Math, Section 9.4

Statistical inference: example 1. Inferential Statistics

Learning objectives. Duc K. Nguyen - Corporate Finance 21/10/2014

Reliability Analysis in HPC clusters

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Page 1. Real Options for Engineering Systems. What are we up to? Today s agenda. J1: Real Options for Engineering Systems. Richard de Neufville

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Research Article Sign Data Derivative Recovery

ODBC. Getting Started With Sage Timberline Office ODBC

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

Time Value of Money. First some technical stuff. HP10B II users

Transcription:

Joural of Itelliget Learig Systems ad Applicatios, 2013, 5, 1-10 http://dx.doi.org/10.4236/jilsa.2013.51001 Published Olie February 2013 (http://www.scirp.org/joural/jilsa) 1 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to Sigapore Stock Market * Qi Qi, Qig-Guo Wag, Ji Li, Shuzhi Sam Ge Departmet of Electrical ad Computer Egieerig, Natioal Uiversity of Sigapore, Sigapore, Sigapore. Email: elewqg@us.edu.sg Received October 1 st, 2012; revised November 6 th, 2012; accepted November 13 th, 2012 ABSRAC his paper presets ew tradig models for the stock market ad test whether they are able to cosistetly geerate excess returs from the Sigapore Exchage (SGX). Istead of covetioal ways of modelig stock prices, we costruct models which relate the market idicators to a tradig decisio directly. Furthermore, ulike a reversal tradig system or a biary system of buy ad sell, we allow three modes of trades, amely, buy, sell or stad by, ad the stad-by case is importat as it caters to the market coditios where a model does ot produce a strog sigal of buy or sell. Liear tradig models are firstly developed with the scorig techique which weights higher o successful idicators, as well as with the Least Squares techique which tries to match the past perfect trades with its weights. he liear models are the made adaptive by usig the forgettig factor to address market chages. Because stock markets could be highly oliear sometimes, the Radom Forest is adopted as a oliear tradig model, ad improved with Gradiet Boostig to form a ew techique Gradiet Boosted Radom Forest. All the models are traied ad evaluated o ie stocks ad oe idex, ad statistical tests such as radomess, liear ad oliear correlatios are coducted o the data to check the statistical sigificace of the iputs ad their relatio with the output before a model is traied. Our empirical results show that the proposed tradig methods are able to geerate excess returs compared with the buy-ad-hold strategy. Keywords: Stock Modelig; Scorig echique; Least Square echique; Radom Forest; Gradiet Boosted Radom Forest 1. Itroductio No matter how difficult it is, how to forecast stock price accurately ad trade at the right time is obviously of great iterest to both ivestors ad academic researchers. he methods used to evaluate stocks ad make ivestmet decisios fall ito two broad categories fudametal aalysis ad techical aalysis [1]. Fudametalist examies the health ad potetial to grow a compay. hey look at fudametal iformatio about the compay, such as the icome statemet, the balace sheet, the cash-flow statemet, the divided records, the ews release, ad the policies of the compay. echical aalysis, i cotrast, does t study the value of a compay but the statistics geerated by market motios, such as historical prices ad volumes. Ufortuately oe of * his paper is a exteded versio of paper: Wag Qig-Guo, Ji Li, Qi Qi Shuzhi Sam Ge, Liear, Adaptive ad Noliear radig Models for with Radom Forests, he 9th IEEE Iteratioal Coferece o Cotrol & Automatio, Satiago, Chile, December 19-21, 2011. them will always work due to the certai radomess ad o-statioary behavior of the market. Hece, a icreasig umber of studies have bee carried out i effort to costruct a adaptive tradig method to suit the ostatioary market. Soft Computig is becomig a popular applicatio i predictio of stock price. Soft computig methods exploit quatitative iputs, such as fudametal idicators ad techical idicators, to simulate the behavior of stock market ad automate market tred aalysis. Oe method is Geetic Programmig GA. Dempster ad Joes [2] use GP ad build a profitable tradig system that cosists of a combiatio of multiple idicator-based rules. Aother popular approach is Artificial Neural Network (ANN). Geerally, a ANN is a multilayered etwork cosistig of the iput layer, hidde layer ad output layer. Oe of the well-recogized eural etwork models is the Back Propagatio (BP) or Hopfield etwork. Sutheebajard ad Premchaiswadi [3] use a BPNN model to forecast the Stock Exchage of hai-

2 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to lad Idex (SE Idex) ad experimets show that the systems work with less tha 2% error. Vicezo ad Michele [4] compare differet architectures of ANN i predictig the credit risk of a set of Italia maufacture compaies. hey coclude that it is difficult to produce cosistet results based o ANNs with differet architectures. Vicezo ad Vitoatoio [5] use ANN model based o geetic algorithm to forecast the tred of a Forex pair (Euro/USD). he experimets show that the traied ANN model ca well predict the exchage rate of three days. Vicezo [6] compare ANN model with ARCH model ad GARCH model i predictig exchage rate of Euro/USD. It is foud that ARCH ad GARCH models ca produce better results i predictig the dyamics of Euro/USD. A extesio of ANN is Adaptive Neuro-Fuzzy Iferece System (ANFIS) [7]. ANFIS is a combiatio of eural etworks ad fuzzy logic, which, itroduced by Zadeh [8], is able to aalyze qualitative iformatio without precise quatitative iputs. Agrawal, Jidal ad Pillai [9] propose mometum aalysis based o stock market predictio usig ANFIS ad the success rate of their system is aroud 80% with the highest 96% ad lowest 76%. I additio, some weight adjustmet techiques are also developed to select tradig rules adaptively. Begtsso ad Ekma [10] discuss three weighig techiques, amely Scorig techique, Least-Squares echique ad Liear Programig echique. All the three techiques are used to dyamically adjust the weights of the rules so that the system will act accordig to the rules that perform best at the momet. he model is tested o Stockholm Stock Exchage i Swede. As a result, it is able to cosistetly geerate risk-adjusted returs. Istead of fidig out the weightig of predictors, Decisio ree [11] adopts a alterative approach by idetifyig various ways of splittig a data set ito brachlike segmets. Decisio ree is further improved by the Radom Forest algorithm [12], which cosists of multiple decisio trees. Radom forest is prove to be efficiet i prevetig over-fittig. Maragoudakis ad Serpaos [13] apply radom forest to stock market predictios. he portfolio budget for their proposed ivestig strategy outperforms the buy-ad-hold ivestmet strategy by a mea factor of 12.5% to 26% for the first 2 weeks ad from 16% to 48% for the remaiig oes. Besides soft computig methods which use past data to forecast future tred, there is a alterative area of study which does t predict the future at all but oly follow the historical tred. hat is kow as red-followig (F). Fog ad ai [14] desig a F model with both static ad adaptive versios. Both yield impressive results: mothly ROI is 67.67% ad 75.63% respectively for the two versios. his paper presets various tradig models for the stock market ad test whether they are able to cosistetly geerate excess returs from the Sigapore Exchage (SGX). Istead of covetioal ways of modelig stock prices, we costruct models which relate the market idicators to a tradig decisio directly. Furthermore, ulike a reversal tradig system or a biary system of buy ad sell, we allow three modes of trades, amely, buy, sell or stad by, ad the stad-by case is importat as it caters to the market coditios where a model does ot produce a strog sigal of buy or sell. Liear tradig models are firstly developed with the scorig techique which weights higher o successful idicators, as well as with the Least Squares techique which tries to match the past perfect trades with its weights. he liear models are the made adaptive by usig the forgettig factor to address market chages. Because stock markets could be highly oliear sometimes, the decisio trees with the Radom Forest method are fially employed as a oliear tradig model [15]. Gradiet Boostig [16,17] is a techique i the area of machie learig. Give a series of iitial models with weak predictio power, it produces a fial model based o the esemble of those models with iterative learig i gradiet directio. We apply Gradiet Boostig to each tree of Radom Forest to form a ew techique Gradiet Boosted Radom Forest for performace ehacemet. All the models are traied ad evaluated o ie stocks ad oe idex, ad statistical tests such as radomess, liear ad oliear correlatios are coducted o the data to check the statistical sigificace of the iputs ad their relatio with the output before a model is traied. Our empirical results show that the proposed tradig methods are able to geerate excess returs compared with the buy-ad-hold strategy. he rest of this paper is orgaized as: the iput selectio is preseted i Sectio 2. I Sectio 3, we describe the liear model ad adaptatio. he decisio tree ad radom forest are give i Sectio 4. he ew method gradiet boosted radom forest is preseted i Sectio 5. he Sectio 6 shows the simulatio studies ad the coclusio is i the last sectio. 2. Iput Selectio 2.1. Basic Rules he model is built upo a fixed set of basic rules. All the rules use historical stock data as iputs ad geerate buy, sell or o actio sigals. Rules [18] adopted i this paper are some popular, wide-recogized ad published rules as summarized below. Simple Movig Average (SMA). SMA is the average price of a stock over a specific period. It smoothes out the day-to-day fluctuatios of the price ad gives a clearer view of the tred. A buy sigal is geerated if the SMA chages directio from dowwards to

Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 3 upwards, ad vice versa. I this paper, 5-day, 10-day, 20-day, 30-day, 60-day ad 120-day SMAs are studied. Expoetial Movig Average (EMA). EMA is similar to SMA, but EMA assigs higher weight to the more recet prices. hus, EMA is more sesitive to recet prices. he expressio is: 2 EMAm i P close i EMAmi 1 m 1 (1) EMA i 1 m where m is the umber of days i oe period, Pclose i is the closig price of day i, EMAm i is the m-day EMA of day i, EMA m i 1 is the m- day EMA of the previous day. Movig Average Covergece/Divergece (MACD). MACD measures the differece betwee a fast (shortperiod) ad slow (log-period) EMA. Whe the MACD is positive, i other words, whe the shortperiod EMA crosses over the log-period EMA, it sigals upward mometum, ad vice versa. wo sigal lies are used zero lie ad 9-day SMA. A buy sigal is geerated if the MACD moves up ad crosses over the sigal lie, ad vice versa. he expressio is: MACD i EMA i EMA i (2) where m. he most commo movig average values used i the calculatio are the 26-day ad 12- day EMA, so m = 12 ad = 26. Relative Stregth Idex (RSI). RSI sigals overbought ad oversold coditios. Its value rages from 0 to 100. A readig below 30 suggests a oversold coditio ad thus a buy sigal is geerated; a readig above 70 usually suggests a overbought coditio ad thus a sell sigal is geerated; take o actio, otherwise. Stochastics Oscillator. Stochastics Oscillator is a wellrecogized mometum idicator. he idea behid this idicator is that price close to the highs of the tradig period sigals upward mometum i the security, ad vice versa. It s also useful to idetify overbought ad oversold coditios. It cotais two lies the %K lie ad the %D lie. he latter is a 3-day SMA of the former. Its value rages from 0 to 100. A buy sigal is geerated if the %D value is below 20; a sell sigal is geerated if the %D value is above 80; take o actio, otherwise. Bolliger Bads. Bolliger Bads are used to idetify overbought ad oversold coditios. It cotais a ceter lie ad two outer bads. he ceter lie is a expoetial movig average; the outer bads are the stadard deviatios above ad below the ceter lie. m he stadard deviatios measure the volatility of a stock. It sigals oversold whe the price is below the lower bad ad thus a buy sigal is geerated; it sigals overbought whe the price is above the upper bad ad thus a sell sigal is geerated; take o actio, otherwise. Accumulatio/Distributio Lie. A/D lie measures volume ad the flow of moey of a stock. May times before a stock advaces, there will be period of icreased volume just prior to the move. Whe the A/D Lie forms a positive divergece whe the idicator moves higher while the stock is decliig it gives a bullish sigal. herefore, a buy sigal is geerated whe the idicator icreases while the stock is decliig; a sell sigal is geerated whe the idicator decreases while the stock is risig; take o actio, otherwise. he expressio is: AD Pclose i Plow i Phigh i Pclose i (3) V i P i P i high low where Plow i is the lowest price of day i, Phigh i is the highest price of day i, Pope i is the ope price of day i, ad V i is the tradig volume of day i. 2.2. Statistics ests Statistical tests are carried out for iput selectio. Firstly, it is useful to test whether the iput data are radom or ot. If the iput data are radom, it is rather difficult to make predictios based o radom data. Secodly, it is ecessary to test whether the iput data are related to the output data. If a correlatio betwee iput data ad output data caot be foud, the output of the model may be radomly geerated istead of beig a result of the iput data. herefore, oly whe the iputs are statistically sigificat, they are used to build models. hree statistical tests are itroduced the Wilcoxo- Ma-Whitey test for the radomess of sigs, the Durbi-Watso est for liear iteractio effect betwee iput ad output, ad the Spearma s correlatio [19] for oliear correlatio betwee iput ad output. he Wilcoxo-Ma-Whitey est. his test performs a test of the ull hypothesis that the occurrece of + ad sigs i a data set is ot radom, agaist the alterative that the + ad sigs comes i radom sequeces. Mathematically, the test works i the followig way. Let 1 be the umber of or + sigs, whichever is larger ad 2 be umber of opposite sigs. N = 1 + 2. From the order/idices of the sigs i the data sequece, the rak sum R of the smallest umber of sigs is determied. Let R =

4 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 2 N 1 R. he miimum of R ad R is used as the test statistics. I this project, it is desired to test whether the iput data are radom or ot. If iput data are already radom, it is rather difficult to make predictios based o radom iput. Note that the test oly takes care of + ad sigs. hus, for ay sequece of data, a sequece of + ad sigs ca be easily geerated by takig the differece of two cosecutive data poits. Durbi-Watso est. his test is used to check if the residuals are ucorrelated, agaist the alterative that there is autocorrelatio amog them. his test is based o the differece betwee adjacet residuals ad is give by d e 2 t et 1 t 2 2 et t 1 where e t is the regressio residual for period t, ad is the umber of time periods used i fittig the regressio model. Correlatios are useful because they ca idicate a predictive relatioship that ca be exploited i practice. I this project, it is desired to fid the iteractive effect of the iput data R o the output data Y. he error terms e t are the residuals of the liear regressio of the resposes i Y o the predictors i R. Spearma s Correctios. Previously, the Durbi-Watso test is used to measure the stregth of the liear associatio betwee two variables. However, it is less appropriate whe the poits o a scatter graph seem to follow a curve or whe there are outliers (or aomalous values) o the graph [19]. Spearma s correlatio coefficiet measures oliear correlatio. o fid the Spearma s correlatio coefficiet of two data sequeces X ad Y, first rak each data vector i either ascedig or descedig order. he calculate the Spearma s correlatio coefficiet by the followig formula: 2 6 d rs 1 (5) 2 1 (4) where is the umber of data pairs, ad d = rak x i - rak y i. 3. Liear Model ad Adaptatio 3.1. radig Method A fixed set of the basic rules are used as the iputs to our models. Each basic rule returs a iteger to represet its recommeded tradig actios 1 represets a buy sigal; 1 represets a sell sigal; ad 0 represets o actio. he model works i two steps traiig ad testig. Durig the traiig period, each rule is evaluated o a set of the kow historical data ad assiged a certai weight usig some rule-weightig algorithms. wo weightig algorithms are proposed ad described later. After traiig is the testig period, durig which the model predicts tradig decisios based o ukow data ad its performace is evaluated. O a daily basis, the model collects tradig recommedatios 1 or 1 or 0 geerated by those rules. hese recommedatios are the fed ito a weightig filter. I the weightig filter, each rule s recommedatio is multiplied by its correspodig weight, which is previously determied by the traiig process. Fially, the weighted rules are summed up as a overall weighted recommedatio. he greater the value of the sum, the stroger the recommedatio is. he weighted recommedatio is iterpreted as buy, or sell, or o actio depedig o the stregth of the recommedatio ad whether the stock is curretly owed or ot. Mathematically, 1 1 2 2 3 3 i i i i i s rw r w rw rw r w rw (6) where s is the fial tradig decisio, r i is the output value of each rule ( r i takes the value 1, or 1, or 0), w i is the adjusted weight of each rule obtaied from the traiig period. A larger s idicates a stroger sigal ad thus the system is more cofidet about the derived tradig decisio. herefore, oly whe the resultig sigal is above a certai threshold, it is regarded as a buy sigal; similarly, oly whe it is below a certai threshold, it is regarded as a sell sigal; if it is betwee the upper ad lower thresholds, o actio is take. 3.2. Rule-Weightig Algorithms At the begiig all weights are equal with a value of 0. As already metioed, the weight of each rule is adjusted durig traiig. he higher the weight, the stroger the tradig recommedatio geerated by the rule. After adjustig the weights, bad rules will ed up with small or eve egative weights. wo rule-weightig algorithms the scorig ad the least-squares techiques are described as follows. Scorig echique: he scorig techique grades the rules accordig to how well they have performed. Durig the traiig period wheever a rule gives a buy or sell recommedatio, the weight of the rule is adjusted weight is icreased by 1 if the recommedatio is correct; otherwise, weight is decreased by 1. No actio sigal does t trigger weight adjustmet. Here, a correct buy recommedatio

Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 5 refers to the situatio that the close price ext day is a certai amout higher tha the close price today whe the recommedatio is give; while a correct sell recommedatio is similarly defied; ad if oe of the above is true, o recommedatio is give. Least-Squares echique: he secod rule-weightig algorithm is the least-squares techique with the liear regressio. Firstly, whe it is kow that the price is icreased or decreased the ext day durig the traiig period, it is also kow what tradig actio should be take o that day. Secodly, the tradig recommedatios of each rule are also kow. he least-squares techique is a algorithm that fids a set of weights that maps the tradig recommedatios by all the rules to the correct tradig actios to be take. Mathematically, if there are rules ad m traiig days i total, r1,1 r2,1 r,1 r1,2 r2,2 r,2 R r1, m r2, m r, m W w1 w2 w S s1 s2 sm where R is a m matrix that stores the recommedatios r i,j give by each rule every day, W is a 1 matrix that stores the weight of each rule w j. S is a m 1 matrix that stores the correct tradig decisio s i which should be take every day. s i is determied by the followig reasoig. O day i, 1) If price icreases more tha a threshold the ext day, the the correct sigal is buy ad thus s i = 1. 2) If price decreases more tha a threshold the ext day, the the correct sigal is sell ad thus s i = 1. 3) If price oly fluctuates withi more tha a threshold, the correct sigal is o actio ad thus s i = 0. Ideally, S = RW. here are as may equatios as the umber of days i the traiig period, ad ukow variables the weights for the rules. With S ad R kow, if W ca be solved, the W is the desired weights that ca give correct sigals. However, there are more traiig days tha the umber of rules ad thus more equatios tha ukow variables, so the solutio is over-specified. hus, the objective ow becomes to miimize the error e = S RW. Defie the cost fuctio as E w 1e e 1SRW SRW (7) 2 2 At the optimal poit w *, hus, * E w R SRW 0 (8) 1 W * R R R S (9) Hece, the least-squares techique fids the optimal weights to match the correct tradig decisios as much as possible. he Least-Squares techique ca be further improved by the itroducig oliear terms ad forgettig factors. First, ay two of the rules are combied together by multiplicatio to form a ew rule the o-liear term. Now the tradig decisio s becomes s rw 1 1r2w2 rw (10) rr w r r w 1 2 12 1 1, By itroducig the forgettig factor, as its ame suggests, more recet data have more ifluece o the fial decisio. Besides, this adaptive model is implemeted by usig the Recursive Least Squares algorithm, which updates the weight o a daily basis. Hece, Recursive Least Squares method does t require the etire traiig data set for each iteratio. Every day, a ew set of data comes i ad the weights are updated based o the ew set of data. he calculatio is summarized below. Give data set R1, R2,, Rm ad s1, s2,, sm, 1, Iitialize W0 0, P0 I 2, For each time istat, k 1,2,, m, compute ek sk Wk 1Rk Pk 1Rk Gk RP k k1rk 1 Pk Pk1GkRkPk1 W k W G k 1 ek k where is the forgettig factor with a value ragig from 0 to 1. Here is set to be 0.95 i this paper. 4. Decisio ree ad Radom Forest 4.1. Decisio ree Ulike the previous tradig models, which derive the fial tradig decisio by summig up the weighted rules, decisio tree [2] adopts a differet approach to arrive at the fial decisio. As its ame suggests, decisio tree idetifies various ways of splittig a data set ito brachlike segmets (Figure 1). hese segmets form a tree-like structure that origiates with a root ode at the top of the tree with the etire set of data. A termial ode, or a leaf, represets a value of the output variable give the values of the iput variables represeted by the path from the root to the leaf. A simple decisio tree is illustrated below. Growig the tree: he goal is to build a tree that dis-

6 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to tiguishes a set of iputs amog the classes. he process starts with a traiig set for which the target classificatio is kow. For this project, we wat to make tradig decisios based o various idicators. I other words, the tree model should be built i a way that distiguish each day s rule data to either buy, sell or o actio. he iput data set is split ito braches. o choose the best splittig criterio at a ode, the algorithm cosiders each iput variable. Every possible split is tried ad cosidered, ad the best split is the oe which produces the largest decrease i diversity of the classificatio label withi each partitio. he process is cotiued at the ext ode ad, i this maer, a full tree is geerated. Pruig the tree: For a large data set, the tree ca grow ito may small braches. I that case, pruig is carried out to remove some small braches of the tree i order to reduce over-fittig. Durig the traiig process, the tree is built startig from the root ode where there is plety of iformatio. Each subsequet split has smaller ad less represetative iformatio with which to work. owards the ed, the iformatio may reflect patters that are oly related to certai traiig records. hese patters ca become meaigless ad sometimes harmful for predictio if you try to exted rules based o them to larger populatios. herefore, pruig effectively reduces overfittig by removig smaller braches that fail to geeralize. 4.2. Radom Forest Radom forest [7], as its ame suggests, is a esemble of multiple decisio trees. For a give data set, istead of oe sigle huge decisio tree, multiple trees are plated as a forest. Radom forest is prove to be efficiet i prevetig over-fittig. Oe way of growig such a esemble is baggig [21]. he mai idea is summarized below: Give a traiig set D = {y, x}, oe wats to make predictio of y for a observatio of x. 1) Radomly select observatios from D to form Sample B-data sets. 2) Grow a Decisio ree of each B data set. 3) For every observatio, each tree gives a classificatio, ad we say the tree votes for that class. he fial output classifier is the oe with the highest votes. I the process of growig a tree (step 2 above), radomly select a fixed-size subset of iput variables istead of usig all the iput variables. he best split is used to split the tree. For example, if there are te iput variables, oly select three of them at radom for each decisio split. Each tree is grow to the largest extet possible without pruig. Radom selectio of features reduces Root Node otal data set Splittig criterio Node 1 Node 2 Splittig Segmet formed by splittig rule criterio Node 3 Node 4 Figure 1. Decisio tree. correlatio betwee ay two trees i the esemble ad thus icreases the overall predictive power. I this project, the values of techical idicators are used as the iput data set of the radom forest ad the output is the vector of percetages of votes for the tradig actios for 1 (sell), 0 (o actio) or 1 (buy). Compared to other data miig techologies itroduced before, Decisio ree ad Radom Forest algorithms have a umber of advatages. ree methods are oparametric ad oliear. he fial results of usig tree methods for classificatio or regressio ca be summarized i a series of logical if-the coditios (tree odes). herefore, there is o implicit assumptio that the uderlyig relatioships betwee the predictor variables ad the target classes follow certai liear or oliear fuctios. hus, tree methods are particularly well suited for data miig tasks, where there is ofte little kowledge or theories or predictios regardig which variables are related ad how. Just like stock data, there are great ucertaities ad usure relatioships betwee techical idicators ad the desired tradig decisio. 5. Gradiet Boosted Radom Forest 5.1. Gradiet Boostig Gradiet boostig is a techique i the field of machie learig. his techique was origially used for regressio problems. With the help of loss fuctio, classificatio problems ca also be solved by this techique. his techique works by producig a esemble of models with weak predictio power. Suppose a traiig set of x1, y 1, x 2, y 2,, x, y ad a differetiable loss fuctio, G y f x, ad the iteratio umber M. Firstly, a model is iitialized from: f 0 arg mi Gy, (11) i1 For the p-th iteratio till p = M, calculate: z ip G yi, f xi f xi f x f x i p1 i 1, 2,,. (12)

Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 7 Let N be the umber of termial odes of a decisio tree. Correspodigly, the iput space is divided ito N discoected regios by the tree, which are represeted as r 1p, r 2 p,, r Np at the p-th iteratio. he we ca use the traiig data x1, z 1, x 2, z 2,, x, z to trai a basic learer: N p ip ip i1 l x a I xr (13) where aip is the predicted value i regio rip. he approximatio f x ca be expressed as: ad, N 1 f x f x I xr ip (14) p p ip i1 arg mi, ip 5.2. Size of rees x G y fp 1. (15) xrip As metioed above, N represets the umber of leaves of decisio trees. Hastie et al. [22] show that it works well whe 4 N 8 ad the results will ot chage much whe the N is i this rage. N beyod this rage is ot recommeded. 5.3. Gradiet Boosted Radom Forest Firstly, we set the size of the decisio trees ad create a radom forest with equal size of its trees. Next, we set the iteratio umber for gradiet boostig ad apply it to each of the above trees. As a result, the radom forest with each tree boosted is formed ad called a gradiet boosted radom forest. 6. Simulatio Studies 6.1. Data his paper focuses o the Sigapore stock market, ad our simulatio tests ie represetative stocks traded o SGX as well as the Straits imes Idex (SI) as a bechmark as listed i able 1. I order to test how the proposed tradig strategy performs i differet ecoomic sectors, the ie stocks are from various sectors of the Sigapore ecoomy. For all te stocks, a set of five-year historical data ragig from September 1, 2005 to August 31, 2010 are studied, ad the first three years data are used for traiig while the last two years for testig. Historical data are retrieved from Yahoo Fiace (Sigapore). Historical data provided here are daily data icludig daily ope, close, high, low prices ad volume. No pre-processig for chagig the distributio of the classes is performed. I fact, differet stocks have differet distributios of classes. We use the real data to do able 1. List of stocks tested. Compay Symbol Sector Capitalad C31 Properties DBS D05 Fiace UOB U11 Fiace SGX (Sigapore Exchage) S68 Fiace Starhub CC3 Commuicatios Sigtel Z74 Commuicatios Semb Corp U96 Multi-idustry SMR S53 rasport SIA (Sigapore Airlie) C6L rasport SI (Straits imes Idex) SI Idex the experimet. hus, the results are realistic without artificial distortio. ake the stock Sigtel for example: the data have three classes: A, B ad C. he correspodig distributio is A, 11.26%, B, 65.39% ad C, 23.35%. We use the data directly i the experimet ad o chage for the distributio is doe. his is true for all the stocks. 6.2. Statistical ests Results Before actually traiig ay model, statistical tests are coducted. Firstly, Wilcoxo-Ma-Whitey test is applied o historical price, techical idicators ad tradig rules. As a result, the statistics show o-radom data. he the Durbi-Watso test is performed i order to fid the iteractive effect of the iput data o the output data. However, little liear correlatio is observed. hus, Spearma s Correlatio Coefficiets for iput ad output data are calculated ad as a result, some oliear correlatio ca be foud. herefore, it is proved statistically that iput data are ot radom ad iput ad output are oliearly correlated. Now, it is meaigful to evaluate the modelig results. 6.3. Performace Evaluatio A tradig strategy is typically evaluated by comparig to the buy & hold strategy, i which the stock is bought o the first day of the testig period ad held all the time util the last day of testig period. At the begiig of the testig period, assume that we iitially have $10,000 cash o had. At the ed of the testig period, all holdig stocks are sold for cash. I order to replicate the real world as close as possible, trasactio cost must be take ito cosideratio. I Sigapore, the trasactio cost is aroud 0.34% [23]. o esure fair evaluatio, i the proposed tradig method, cash is ivested ito risk free asset whe it is ot use to buy ay stock [24].

8 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 6.4. Performace est For the techiques of Scorig ad Least-Squares, we test the sesitivity of threshold. It is foud that the performace is ot chaged obviously whe differet threshold values are adopted. However, whe the threshold value is equal to 0.55, most cases ca produce good results. I this regard, we use 0.55 as threshold value to proceed. For the techiques of Radom Forest ad Gradiet Boosted Radom Forest, we set the forest size as 50. Each tree will give a predictio ad the fial output will be based o the majority of the trees votes. We applied each techique to each stock ad obtaied their wealth dyamics. ake the stock Sigtel for example, ad its wealth time resposes over five techiques are show i Figure 2. For ivestmet performace evaluatio, able 2 shows the simulatio results of the proposed tradig methods scorig, least squares, radom forest ad gradiet boosted radom forest agaist the buy & hold strategy. he proposed tradig methods are able to geerate higher returs tha the buy & hold strategy. O average, the gradiet boosted radom forest algorithm yields the highest total retur followed by the radom forest algorithm, scorig techique ad the least-squares techique, while the buy ad hold strategy makes a loss at 1.94%. Despite the positive results, the directioal accuracy of the model is low, ad the percet positive trade is lower tha 50%. From the able 2, we ca see that for the Gradiet Boosted Radom Forest, the total retur of is 25.14%. he performace is slight better tha the performace of Radom Forest. he other two methods which are Score ad Least Square produce worse results. his results show that usig Gradiet Boosted Radom Forest ca help gai more profit i the stock market tha buy & hold strategy. Although the system geerates higher total returs, the directioal accuracy is geerally lower tha 50%. As metioed earlier, directioal accuracy is oly measured for buy ad sell tradig sigals. I additio, there are Performace Measuremets otal Retur (%) Number of rade Best rade (max daily retur %) Worst rade (mi daily retur %) % Positive rade Directio Accuracy (%) Max Drawdow (%) Sharpe Ratio able 2. Compariso results. radig echique Average Buy & Hold 1.94 Score 21.71 Least Square 16.77 Radom Forest 24.32 Gradiet Boosted Radom Forest 25.14 Score 13.30 Least Square 12.20 Radom Forest 19.10 Gradiet Boosted Radom Forest 18.60 Score 0.17 Least Square 0.95 Radom Forest 0.95 Gradiet Boosted Radom Forest 0.95 Score 1.16 Least Square 0.59 Radom Forest 10.62 Gradiet Boosted Radom Forest 9.54 Score 19.12 Least Square 21.97 Radom Forest 26.38 Gradiet Boosted Radom Forest 30.25 Score 24.12 Least Square 35.16 Radom Forest 26.27 Gradiet Boosted Radom Forest 24.87 Buy & Hold 47.34 Score 10.20 Least Square 6.66 Radom Forest 12.05 Gradiet Boosted Radom Forest 10.16 Buy & Hold 0.01 Score 0.02 Least Square 0.03 Radom Forest 0.02 Gradiet Boosted Radom Forest 0.03 Figure 2. Wealth curve. ime Holdig Stocks i Market (%) Buy & Hold 100.00 Score 52.83 Least Square 36.32 Radom Forest 70.05 Gradiet Boosted Radom Forest 66.21

Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 9 oly 12 to 17 tradig actios o average for a stock i the two years time ad o most days o actio is take. he tradig frequecy is iflueced by the threshold set for the system. If the threshold is high, oly strog tradig sigals are put ito effect. Furthermore, fewer tha half of the trades geerate positive returs, however, the total retur ca be positive. It is due to the great icrease i several trades. his is also reflected i the maximum drawdow. Sharpe ratio measures the excess retur per uit risk geerated by the proposed tradig method compared to ivestig ito a risk free asset. Some of the te stocks result i egative sharpe ratios. I other words, tradig those two stocks fails to geerate excess returs as compared to risk free asset. I additio to five-year data, te-year data of three stocks are evaluated the traiig period is exteded from two years to eight years while the testig period remais two years. Nevertheless, o obvious improvemet is obtaied. Comparisos betwee the liear model ad the recursive oliear least squares model are also studied, but agai o obvious improvemet is foud. he reaso may be that makig the algorithm too adaptive may over-fit the model. Besides, stock data itself are rather difficult to model. 7. Coclusio his paper presets various tradig models for the stock market ad test whether they are able to cosistetly geerate excess returs from the Sigapore Exchage (SGX). Istead of covetioal ways of modelig stock prices, we costruct models which relate the market idicators to a tradig decisio directly. Furthermore, ulike a reversal tradig system or a biary system of buy ad sell, we allow three modes of trades, amely, buy, sell or stad by, ad the stad-by case is importat as it caters to the market coditios where a model does ot produce a strog sigal of buy or sell. Liear tradig models are firstly developed with the scorig techique which weights higher o successful idicators, as well as with the least squares techique which tries to match the past perfect trades with its weights. Sice stock markets could be highly oliear sometimes, the radom forest method is the employed as a oliear tradig model. Gradiet boostig is a techique i the area of machie learig. Give a series of iitial models with weak predictio power, it produces a fial model based o the esemble of those models with iterative learig i gradiet directio. We apply gradiet boostig to each tree of radom forest to form a ew techique Gradiet Boosted Radom Forest for performace ehacemet. All the models are traied ad evaluated ad statistical tests such as radomess, liear ad oliear correla- tios are coducted o the data to check the statistical sigificace of the iputs ad their relatio with the output before a model is traied. Our empirical results show that the proposed tradig methods are able to geerate excess returs compared with the buy-ad-hold strategy. I particular, for the experimetal data, the gradiet boosted radom forest yields highest total retur with a value of 25.14%, followed by radom forest whose total retur is 24.32%, while the buy ad hold strategy makes a loss at 1.94% every year. REFERENCES [1] R. D. Edwards, J. Magee ad W. H. C. Bassetti, echical Aalysis of Stock reds, Chapter 1. he echical Approach to radig ad Ivestig, 9th Editio, CRC Press, Boca Rato, 2007, pp. 3-7. [2] M. A. H. Dempster ad C. M. Joes, A Real-ime Adaptive radig System Usig Geeric Programmig, Quatitative Fiace, Vo. 1, No. 4, 2001, pp. 397-413. doi:10.1088/1469-7688/1/4/301 [3] P. Sutheebajard ad W. Premchaiswadi, Stock Exchage of hailad Idex Predictio Usig Back Propagatio Neural Networks, 2d Iteratioal Coferece o Computer ad Network echology, Bagkok, 23-25 April 2010. [4] V. Pacelli ad M. Azzollii, A Artificial Neural Network Approach for Credit Risk Maagemet, Joural of Itelliget Learig Systems ad Applicatios, Vol. 3, No. 2, 2011, pp. 103-112. [5] V. Pacelli, V. Bevilacqua ad M. Azzollii, A Artificial Neural Network Model to Forecast Exchage Rates, Joural of Itelliget Learig Systems ad Applicatios, Vol. 3, No. 2, 2011. pp. 57-69. [6] V. Pacelli, Forecastig Exchage Rates: A Comparative Aalysis, Iteratioal Joural of Busiess ad Social Sciece, Vol. 3, No. 10, 2012, pp. 145-156. [7] Wikipedia. http://e.wikipedia.org/wiki/adaptive_euro_fuzzy_ifer ece_system [8] A. La Zadeh, Outlie of a New Approach to the Aalysis of Complex Systems ad Decisio Processes, IEEE rasactios o Systems, Ma ad Cyberetics, Vol. 3, No. 1, 1973, pp. 28-44. doi:10.1109/smc.1973.5408575 [9] S. Agrawal, M. Jidal ad G. N. Pillai, Mometum Aalysis Based Stock Market Predictio Usig Adaptive Neuro-Fuzzy Iferece System (ANFIS), Proceedigs of the Iteratioal Multi Coferece of Egieers ad Computer Scietists, Hog Kog, 17-19 March 2010. [10] M. Ekma ad M. Begtsso, Adaptive Rule-Based Stock radig: A est of the Efficiet Market Hypothesis, MS hesis, Uppsala Uiversity, Uppsala, 2004. [11] J. M. Athoy, N. F. Robert, L. Yag, A. W. Nathaiel, ad D. B. Steve, A Itroductio to Decisio ree Modelig, Joural of Chemometrics, Vol. 18, No. 6,

10 Liear ad Noliear radig Models with Gradiet Boosted Radom Forests ad Applicatio to 2004, pp. 275-285. doi:10.1002/cem.873 [12] L. Breima, Radom Forests, Machie Learig Joural, Vol. 45, No. 1, 2001, p. 532. [13] M. Maragoudakis ad D. Serpaos, owards Stock Market Data Miig Usig Eriched Radom Forests from extual Resources ad echical Idicators, Artificial Itelligece Applicatios ad Iovatios IFIP Advaces i Iformatio ad Commuicatio echology, Laraca, 6-7 October 2010, pp. 278-286. [14] S. Fog ad J. ai, he Applicatio of red Followig Strategies i Stock Market radig, 5th Iteratioal Joit Coferece o INC, IMS ad IDC, Seoul, 25-27 August 2009. [15] Q.-G. Wag, J. Li, Q. Qi ad S. Z. Sam Ge, Liear, Adaptive ad Noliear radig Models for Sigapore Stock Market with Radom Forests, Proceedigs of 9th IEEE Iteratioal Coferece o Cotrol ad Automatio, Satiago, 19-21 December 2011, pp. 726-731. [16] J. H. Friedma, Greedy Fuctio Approximatio: A Gradiet Boostig Machie, echical Report, Staford Uiversity, Staford, 1999. [17] J. H. Friedma, Stochastic Gradiet Boostig, echical Report, Staford Uiversity, Staford, 1999. [18] Chart School, 2010. http://stockcharts.com/school/doku.php?id=chart_school:t echical_idicators [19] Schoolworkout Math, Spearma s Rak Correlatio Coefficiet, 2011. http://www.schoolworkout.co.uk/documets/s1/spearma s%20correlatio%20coefficiet.pdf [20] J. M. Athoy, N. F. Robert, L. Yag, A. W. Nathaiel ad D. B. Steve, A Itroductio to Decisio ree Modelig, Joural of Chemometrics, Vol. 18, No. 6, 2004, pp. 275-285. doi:10.1002/cem.873 [21] L. Breima, Baggig Predictors, Machie Learig, Vol. 26, No. 2, 1996, pp. 123-140. doi:10.1007/bf00058655 [22] Wikipedia. http://e.wikipedia.org/wiki/adaptive_euro_fuzzy_ifer ece_system [23] Fees: Schedules, DBS Vikers Olie, 2010. http://www.dbsvolie.com/eglish/pfee.asp?sessioid= &RefNo=&AccoutNum=&Alterate=&Level=Public& Residetype= [24] Moetary Authority of Sigapore, Fiacial Database- Iterest Rate of Baks ad Fiace Compaies, 2010. http://secure.mas.gov.sg/msb/iterestratesofbaksadfi acecompaies.aspx