Machine Learning in Statistical Arbitrage




Xing Fu, Avinash Patra

December 11, 2009

Abstract

We apply machine learning methods to obtain an index-arbitrage strategy. In particular, we employ linear regression and support vector regression (SVR) on the prices of an exchange-traded fund and a stream of stocks. Using principal component analysis (PCA) to reduce the dimension of the feature space, we observe the benefits, and note the issues, of applying SVR. To generate trading signals, we model the residuals from the preceding regression as a mean-reverting process. Finally, we show how our trading strategies beat the market.

1 Introduction

In the field of investment, statistical arbitrage refers to attempting to profit from pricing inefficiencies identified through mathematical models. The basic assumption is that prices will move towards a historical average. The most common and simplest case of statistical arbitrage is pairs trading. If stocks P and Q are in the same industry or have similar characteristics, we expect the returns of the two stocks to track each other. Accordingly, if P_t and Q_t denote the corresponding price time series, we can model the system as

dP_t / P_t = α dt + β (dQ_t / Q_t) + dX_t,

where X_t is a mean-reverting Ornstein-Uhlenbeck stochastic process. In many cases of interest, the drift α is small compared to the fluctuations of X_t and can therefore be neglected. The model suggests a contrarian investment strategy: go long one dollar of stock P and short β dollars of stock Q if X_t is small and, conversely, go short P and long Q if X_t is large. Opportunities for this kind of pairs trading depend upon the existence of similar pairs of assets and are therefore naturally limited. Here, we use a natural extension of pairs trading called index arbitrage, as described in [1].
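Since the paper's own code is MATLAB, the following is only a Python (NumPy) sketch of the residual dynamics assumed above: it simulates a discretized Ornstein-Uhlenbeck process with arbitrary illustrative parameters (not estimated from any market data) and checks that the path hovers near its long-run mean m rather than near its starting point.

```python
import numpy as np

# Discretized Ornstein-Uhlenbeck process:
#   X_{t+1} = X_t + kappa * (m - X_t) * dt + sigma * sqrt(dt) * N(0, 1)
# All parameter values below are arbitrary, for illustration only.
def simulate_ou(kappa=5.0, m=0.0, sigma=0.1, x0=0.5, dt=1 / 252, n=2520, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        drift = kappa * (m - x[t - 1]) * dt
        x[t] = x[t - 1] + drift + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

x = simulate_ou()
# The sample mean should sit near the long-run level m = 0,
# well away from the starting point x0 = 0.5.
print(x.mean(), x.std())
```

Under the contrarian rule described above, one would go long P when such an X_t is well below its mean and short when it is well above.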
The trading system we develop tries to exploit discrepancies between a target asset, the iShares FTSE/MACQ fund traded on the London Stock Exchange, and a synthetic asset represented by a data stream obtained as a linear combination of a possibly large set of explanatory streams assumed to be correlated with the target stream. In our case, we use the stock prices of the 100 constituents of the FTSE 100 Index to reproduce the target asset. We start with linear regression on the 100 constituents, taking as our window the 101 days from April to September 2009. After extracting significant factors by principal component analysis, we calibrate the model with the window size cut down to 60 days. Closing prices of each asset are used. The emphasis here is to decompose the stock data into systematic and idiosyncratic components and to model the idiosyncratic part statistically. We apply both linear regression and support vector regression, and collect the residual that remains after the decomposition is done. Finally, we model the mean reversion in the residuals with an autoregression to generate trading signals for our target asset.

2 Linear Regression

We consider the data of 101 days from Apr. 28 to Sep. 18, 2009. We choose this time period because, given 100 feature variables, we need at least 101 observations to train the 101 parameters of the linear regression model; to avoid overfitting the parameters, we use only these 101 training examples. Denote by P_t the target asset and by Q_it the i-th stock constituent at time t. The linear model can be written as

P_t = θ_0 + Σ_{i=1}^{100} θ_i Q_it.

Implementing the ordinary least squares algorithm in MATLAB, we obtain the parameters θ and the training errors, i.e. the residuals.

[Figure 1: Residuals of linear regression over 100 constituents]

From Figure 1, we see that the empirical error is acceptable. However, when we test this linear model on 30 examples not in the training set, i.e. the prices of each asset from Sep. 21 to Oct. 30, 2009, the generalization error is far from satisfactory, as shown in Figure 2. This implies that we are facing an overfitting problem, even though we have used the smallest possible training set: the linear model simply has too many parameters. Hence, we apply principal component analysis to reduce the dimension of the model.

[Figure 2: Generalization error on 30 testing days]

3 Principal Component Analysis (PCA)

Now we apply PCA to analyze the 100 stocks. The estimation window for the correlation matrix is 101 days. The situation becomes evident on looking at the eigenvalues of the correlation matrix in Figure 3. The eigenvalues at the top of the spectrum, isolated from the bulk, are clearly significant; the eigenvalues within the first twenty reveal almost all the information in the matrix.

[Figure 3: Eigenvalues of the correlation matrix]

Next we use validation to find how many principal components give the least generalization error. Since the dimension of the model is now reduced, we reset the window size to 60 days to avoid overfitting. After running multivariate linear regression on up to the first 20 components using the 60-day training set, we find that the first 12 components give the smallest generalization error on the 30 testing days. The out-of-sample residual is shown in Figure 4.

[Figure 4: Generalization error on the first 12 components over 30 out-of-sample days]

We take a quick look back at the empirical error of these 12 components over the 60 training days. From Figure 5, we see that the residuals are not as small in magnitude as those in Figure 1, but they successfully reproduce the pattern of the residuals obtained using all 100 constituents. Thus, by applying PCA to reduce the dimension of the model, we avoid overfitting the parameters. We will use the in-sample residuals from the regression on principal components to generate trading signals in a later section.

[Figure 5: Empirical error on the first 12 components over 60 training days]

4 Support Vector Regression (SVR)

We apply SVR to the 12 feature attributes obtained through PCA. We use a Gaussian kernel and empirically decide on the kernel bandwidth, cost, and epsilon (slack) parameters. (We used the SVM and Kernel Methods MATLAB toolbox, available online, for the implementation.) We do not notice any improvement with this approach, as seen in the plots of training error and test error, Figures 6 and 7. A main issue here is being able to determine appropriate SVR parameters.

5 Mean-Reverting Process

We want to model the price of the target asset such that it accounts for a drift which measures systematic deviations from the sector and a price fluctuation that is mean-reverting to the overall industry level.
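A fluctuation of this kind is modeled as an Ornstein-Uhlenbeck process fitted by a lag-1 autoregression on the residual series. As a sketch of that estimation step, the following Python (NumPy) example simulates a residual series with arbitrary "true" parameters (the paper instead fits 60 days of regression residuals in MATLAB), fits the AR(1) model by least squares, and inverts the fit to recover κ, m, and the equilibrium standard deviation σ_eq, using the same relations as in the text; the last line computes the s-score used for signal generation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a residual series from the exact AR(1) form of the OU process:
#   X_{t+1} = a + b X_t + eps,  a = m (1 - e^{-kappa dt}),  b = e^{-kappa dt}.
# The "true" parameters are arbitrary, for illustration only.
kappa_true, m_true, sigma_true, dt, n = 20.0, 0.05, 0.3, 1 / 252, 3000
b_true = np.exp(-kappa_true * dt)
a_true = m_true * (1 - b_true)
eps_sd = sigma_true * np.sqrt((1 - b_true ** 2) / (2 * kappa_true))
X = np.empty(n)
X[0] = m_true
for t in range(n - 1):
    X[t + 1] = a_true + b_true * X[t] + eps_sd * rng.standard_normal()

# Lag-1 autoregression X_{t+1} = a + b X_t + eps, fitted by least squares.
A = np.column_stack([np.ones(n - 1), X[:-1]])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, X[1:], rcond=None)
var_eps = float((X[1:] - A @ np.array([a_hat, b_hat])).var())

# Invert to the OU parameters.
kappa_hat = -np.log(b_hat) / dt
m_hat = a_hat / (1 - b_hat)
sigma_eq = np.sqrt(var_eps / (1 - b_hat ** 2))

# s-score: distance from equilibrium per unit stationary std dev.
s = (X[-1] - m_hat) / sigma_eq
print(kappa_hat, m_hat, sigma_eq, s)
```

With enough observations the fitted κ, m, and σ_eq land close to the simulation's true values, which is a useful sanity check before running the same inversion on real residuals.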
[Figure 6: Empirical error on the first 12 components over 60 training days using SVR]

[Figure 7: Generalization error on the first 12 components over 30 testing days using SVR]

We construct a trading strategy for when significant fluctuations from equilibrium are observed. We introduce a parametric mean-reverting model for the asset, the Ornstein-Uhlenbeck process

dX(t) = κ(m − X(t)) dt + σ dW(t),  κ > 0.

The term dX(t) is assumed to be the increment of a stationary stochastic process which models price fluctuations corresponding to idiosyncratic fluctuations in the prices that are not reflected in the industry sector, i.e. the residuals from the linear regression on principal components in the previous section. Note that the increment dX(t) has unconditional mean zero and conditional mean

E{dX(t) | X(s), s ≤ t} = κ(m − X(t)) dt.

The conditional mean, i.e. the forecast of expected daily returns, is positive or negative according to the sign of m − X(t). This process is stationary and can be estimated by an autoregression with lag 1. We use the residuals on a window of length 60 days, assuming the parameters are constant over the window. The model then takes the form

X_{t+1} = a + b X_t + ε_{t+1},  t = 1, 2, …, 59,

where

a = m(1 − e^{−κ Δt}),
b = e^{−κ Δt},
Var(ε) = σ² (1 − e^{−2κ Δt}) / (2κ).

Inverting these relations, we get the estimates

κ = −log(b) / Δt = 178.2568,
m = a / (1 − b) = 0.0609,
σ = √(2κ · Var(ε) / (1 − b²)) = 135.9869,
σ_eq = √(Var(ε) / (1 − b²)) = 7.2021.

6 Signal Generation

We define a scalar variable called the s-score,

s = (X(t) − m) / σ_eq.

The s-score measures the distance of the cointegrated residual from equilibrium, per unit standard deviation, i.e. how far away a given stock is from the theoretical equilibrium value associated with our model. According to empirical studies in this field, our basic trading signal based on mean reversion is

1. Buy iShares FTSE if s < −1.25
2. Sell iShares FTSE if s > +1.25
3. Close the short position in iShares FTSE if s < +0.75
4. Close the long position in iShares FTSE if s > −0.50

The rationale for opening trades only when the s-score s is far from equilibrium is to trade only when we think we have detected an anomalous excursion of the cointegration residual. We then need to consider when to close trades. Closing trades when the s-score is near zero makes sense, since we expect most stocks to be near equilibrium most of the time. Thus, our trading rule detects stocks with large excursions and trades on the assumption that these excursions will revert to the mean. Figure 8 shows the signals we generate over 60 days.

[Figure 8: Trading signals over 60 days]

7 Conclusion

We note that PCA helped in getting rid of the overfitting problem with 100 feature attributes when conducting linear regression. However, we see that to apply support vector regression effectively, a technique for learning the SVR parameters might have to be developed. We see on testing (over 60 days of data) that using our strategy based on the trading signals clearly makes a trading profit, as on average we purchase iShares FTSE when it is low and sell it when it is high. On the other hand, in the future, idiosyncratic factors behaving erratically (such as the occurrence of some specific unforeseen event) might lead to the index systematically under-performing, or doing much better than, the significant factors found from PCA, and this could seriously undermine the effectiveness of our approach. To achieve a systematic method, online learning would be a good way to try, in order to update our feature set with the latest information.

References

[1] G. Montana, K. Triantafyllopoulos and T. Tsagaris, "Flexible least squares for temporal data mining and statistical arbitrage," Expert Systems with Applications, Vol. 36, Issue 2, Part 2, pp. 2819-2830, 2009.

[2] M. Avellaneda and J.-H. Lee, "Statistical Arbitrage in the U.S. Equities Market," Social Science Research Network, 2008.

[3] T. Fletcher, "Support Vector Machines Explained," 2009.