Stock Trading with Recurrent Reinforcement Learning (RRL) CS229 Application Project Gabriel Molina, SUID 5055783




I. INTRODUCTION

One relatively new approach to financial trading is to use machine learning algorithms to predict the rise and fall of asset prices before they occur. An optimal trader would buy an asset before the price rises, and sell the asset before its value declines.

For this project, an asset trader is implemented using recurrent reinforcement learning (RRL). The algorithm and its parameters are from a paper written by Moody and Saffell [1]. It is a gradient ascent algorithm which attempts to maximize a utility function known as Sharpe's ratio. By choosing an optimal parameter $w$ for the trader, we attempt to take advantage of asset price changes. Test examples of the asset trader's operation, both real-world and contrived, are illustrated in the Example section.

III. UTILITY FUNCTION: SHARPE'S RATIO

One commonly used metric in financial engineering is Sharpe's ratio. For a time series of investment returns, Sharpe's ratio can be calculated as:

$$S_T = \frac{\text{Average}(R_t)}{\text{Standard Deviation}(R_t)} \quad \text{for interval } t = 1, \ldots, T$$

where $R_t$ is the return on investment for trading period $t$. Intuitively, Sharpe's ratio rewards investment strategies that rely on less volatile trends to make a profit.

IV. TRADER FUNCTION

The trader will attempt to maximize Sharpe's ratio for a given price time series. For this project, the trader function takes the form of a neuron:

$$F_t = \tanh(w^T x_t)$$

where $M$ is the number of time series inputs to the trader, the parameter $w \in \mathbb{R}^{M+2}$, the input vector $x_t = [1, r_t, \ldots, r_{t-M+1}, F_{t-1}]$, and the return $r_t = p_t - p_{t-1}$.

Note that $r_t$ is the difference in value of the asset between the current period $t$ and the previous period. Therefore, $r_t$ is the return on one share of the asset bought at time $t-1$.

Also, the function $F_t \in [-1, 1]$ represents the trading position at time $t$. There are three types of positions that can be held: long, short, or neutral.

A long position is when $F_t > 0$. In this case, the trader buys an asset at price $p_t$ and hopes that it appreciates by period $t+1$.

A short position is when $F_t < 0$. In this case, the trader sells an asset which it does not own at price $p_t$, with the expectation to produce the shares at period $t+1$. If the price at $t+1$ is higher, then the trader is forced to buy at the higher $t+1$ price to fulfill the contract. If the price at $t+1$ is lower, then the trader has made a profit.

[1] J. Moody and M. Saffell, "Learning to Trade via Direct Reinforcement," IEEE Transactions on Neural Networks, Vol. 12, No. 4, July 2001.
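The Sharpe ratio of Section III and the neuron-style trader function of Section IV can be sketched in Python with NumPy. This is a minimal illustration, not the author's code: the function names `sharpe_ratio` and `trader_positions`, the neutral starting position $F = 0$ before the first decision, and the use of the population standard deviation are all assumptions made for the sketch.

```python
import numpy as np


def sharpe_ratio(R):
    """Average(R_t) / Standard Deviation(R_t) over the whole interval.

    Assumption: population standard deviation (ddof=0).
    """
    R = np.asarray(R, dtype=float)
    return R.mean() / R.std()


def trader_positions(w, r, M):
    """Positions F_t = tanh(w . x_t) with x_t = [1, r_t, ..., r_{t-M+1}, F_{t-1}].

    w : weight vector of length M + 2 (bias, M lagged returns, previous position)
    r : price changes r_t = p_t - p_{t-1}
    M : number of lagged returns fed to the neuron (M >= 1)
    """
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    F_prev = 0.0  # assumption: the trader starts in a neutral position
    positions = []
    for t in range(M - 1, len(r)):
        lags = r[t - M + 1:t + 1][::-1]           # r_t, r_{t-1}, ..., r_{t-M+1}
        x = np.concatenate(([1.0], lags, [F_prev]))
        F_prev = np.tanh(w @ x)                   # always inside [-1, 1]
        positions.append(F_prev)
    return np.array(positions)
```

Given a price array `p`, the returns would be obtained as `np.diff(p)`; positive entries of the result correspond to long positions, negative entries to short positions, and zeros to neutral ones.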

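A neutral position is when $F_t = 0$. In this case, the outcome at time $t+1$ has no effect on the trader's profits. There will be neither gain nor loss.

Thus, $F_t$ represents holdings at period $t$. That is, $n_t = \mu \cdot F_t$ shares are bought (long position) or sold (short position), where $\mu$ is the maximum possible number of shares per transaction. The return at time $t$, considering the decision $F_{t-1}$, is:

$$R_t = \mu \left( F_{t-1} \, r_t - \delta \, |F_t - F_{t-1}| \right)$$

where $\delta$ is the cost for a transaction at period $t$. If $F_t = F_{t-1}$ (i.e. no change in our investment this period) then there will be no transaction penalty. Otherwise the penalty is proportional to the difference in shares held.

The first term ($\mu \, F_{t-1} \, r_t$) is the return resulting from the investment decision from the period $t-1$. For example, if $\mu = 20$ shares, the decision was to buy half the maximum allowed ($F_{t-1} = 0.5$), and each share increased $r_t = 8$ price units, this term would be $20 \times 0.5 \times 8 = 80$, the total return profit (ignoring transaction penalties incurred during period $t$).

V. GRADIENT ASCENT

Maximizing Sharpe's ratio requires a gradient ascent. First, we define our utility function using basic formulas from statistics for mean and variance. We have

$$S_T = \frac{E[R_t]}{\sqrt{E[R_t^2] - (E[R_t])^2}} = \frac{A}{\sqrt{B - A^2}}, \quad \text{where } A = \frac{1}{T}\sum_{t=1}^{T} R_t \text{ and } B = \frac{1}{T}\sum_{t=1}^{T} R_t^2$$

Then we can take the derivative of $S_T$ using the chain rule:

$$\frac{dS_T}{dw} = \frac{d}{dw}\left\{\frac{A}{\sqrt{B - A^2}}\right\} = \frac{dS_T}{dA}\frac{dA}{dw} + \frac{dS_T}{dB}\frac{dB}{dw} = \sum_{t=1}^{T}\left\{\frac{dS_T}{dA}\frac{dA}{dR_t} + \frac{dS_T}{dB}\frac{dB}{dR_t}\right\} \cdot \left\{\frac{dR_t}{dF_t}\frac{dF_t}{dw} + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{dw}\right\}$$

The necessary partial derivatives of the return function are:

$$\frac{dR_t}{dF_t} = \frac{d}{dF_t}\left\{\mu\left(F_{t-1} \, r_t - \delta \, |F_t - F_{t-1}|\right)\right\} = -\mu\delta \, \text{sgn}(F_t - F_{t-1})$$

$$\frac{dR_t}{dF_{t-1}} = \frac{d}{dF_{t-1}}\left\{\mu\left(F_{t-1} \, r_t - \delta \, |F_t - F_{t-1}|\right)\right\} = \mu \, r_t + \mu\delta \, \text{sgn}(F_t - F_{t-1})$$

Then, the partial derivatives $dF_t/dw$ and $dF_{t-1}/dw$ must be calculated: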

$$\frac{dF_t}{dw} = \frac{d}{dw}\tanh(w^T x_t) = \left(1 - \tanh^2(w^T x_t)\right) \cdot \frac{d}{dw}\left\{w^T x_t\right\} = \left(1 - \tanh^2(w^T x_t)\right) \cdot \left\{x_t + w_{M+2} \, \frac{dF_{t-1}}{dw}\right\}$$

Note that the derivative $dF_t/dw$ is recurrent and depends on all previous values of $dF_t/dw$. This means that to train the parameters, we must keep a record of $dF_t/dw$ from the beginning of our time series. Because stock data is in the range of 1000-2000 samples, this slows down the gradient ascent but does not present an insurmountable computational burden. An alternative is to use online learning and to approximate $dF_t/dw$ using only the previous $dF_{t-1}/dw$ term, effectively making the algorithm a stochastic gradient ascent as in Moody & Saffell's paper. However, my chosen approach is to instead use the exact expressions as written above.

Once the $dS_T/dw$ term has been calculated, the weights are updated according to the gradient ascent rule $w_{i+1} = w_i + \rho \, dS_T/dw$, where $\rho$ is the learning rate. The process is repeated for $N_e$ iterations, where $N_e$ is chosen to assure that Sharpe's ratio has converged.

VI. TRAINING

The most successful method in my exploration has been the following algorithm:

1. Train parameters $w \in \mathbb{R}^{M+2}$ using a historical window of size $T$.
2. Use the optimal policy $w$ to make real-time decisions from $t = T+1$ to $t = T + N_{predict}$.
3. After $N_{predict}$ predictions are complete, repeat step one.

Intuitively, the stock price has underlying structure that is changing as a function of time. Choosing $T$ large assumes the stock price's structure does not change much during $T$ samples. In the random process example below, $T$ and $N_{predict}$ are large because the structure of the process is constant. If long-term trends do not appear to dominate stock behavior, then it makes sense to reduce $T$, since shorter windows can be a better solution than training on large amounts of past history. For example, IBM data for the years 1980-2006 might not lead to a good strategy for use in Dec. 2006. A more accurate policy would likely result from training with data from 2004-2006.

VII. EXAMPLE

Figure 1. Training results for autoregressive random process, $T = 1000$, $N_e = 75$. (Top panel: price $p(t)$; bottom panel: Sharpe's ratio versus training iteration.)

The first example of training a policy is executed on an autoregressive random process (randomness by injecting Gaussian noise into coupled equations). In Figure 1, the top graph is the generated price series. The bottom graph is Sharpe's ratio on the time series using the parameter $w$ for each iteration of training. So, as training progresses, we find better values of $w$ until we have achieved an optimum Sharpe's ratio for the given data.
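A curve of Sharpe's ratio against training iteration, like the one described above, comes from repeating the gradient ascent update of Section V. The following Python sketch shows one way to compute the exact recurrent gradient and run the update loop. It is an illustration under stated assumptions rather than the author's implementation: the names `sharpe_and_gradient` and `train`, the neutral starting position, the default values of `mu`, `delta`, `rho`, and the small random initialization of $w$ are all hypothetical choices. It uses the closed forms $dS_T/dA = B/(B - A^2)^{3/2}$, $dS_T/dB = -A/\left(2(B - A^2)^{3/2}\right)$, $dA/dR_t = 1/T$, and $dB/dR_t = 2R_t/T$, which follow directly from $S_T = A/\sqrt{B - A^2}$.

```python
import numpy as np


def sharpe_and_gradient(w, r, M, mu=1.0, delta=0.001):
    """Return Sharpe's ratio S_T and the exact gradient dS_T/dw.

    w : weights of length M + 2; r : price changes; M : number of lags (M >= 1).
    mu and delta are illustrative defaults, not values from the paper.
    """
    n = len(r)
    T = n - M + 1                          # number of trading periods used
    R = np.empty(T)                        # returns R_t
    dR = np.empty((T, M + 2))              # total derivatives dR_t/dw
    F_prev = 0.0                           # assumption: start neutral
    dF_prev = np.zeros(M + 2)              # dF/dw before the first decision
    for k, t in enumerate(range(M - 1, n)):
        lags = r[t - M + 1:t + 1][::-1]    # r_t, ..., r_{t-M+1}
        x = np.concatenate(([1.0], lags, [F_prev]))
        F = np.tanh(w @ x)
        # Recurrent derivative: (1 - tanh^2) * (x_t + w_{M+2} * dF_{t-1}/dw)
        dF = (1.0 - F ** 2) * (x + w[-1] * dF_prev)
        s = np.sign(F - F_prev)
        R[k] = mu * (F_prev * r[t] - delta * abs(F - F_prev))
        # dR_t/dF_t * dF_t/dw + dR_t/dF_{t-1} * dF_{t-1}/dw
        dR[k] = -mu * delta * s * dF + (mu * r[t] + mu * delta * s) * dF_prev
        F_prev, dF_prev = F, dF
    A, B = R.mean(), (R ** 2).mean()
    var = B - A ** 2
    S = A / np.sqrt(var)
    dS_dA = B / var ** 1.5
    dS_dB = -0.5 * A / var ** 1.5
    per_period = dS_dA / T + dS_dB * 2.0 * R / T   # dS/dA*dA/dR_t + dS/dB*dB/dR_t
    grad = (per_period[:, None] * dR).sum(axis=0)
    return S, grad


def train(r, M, n_epochs=100, rho=0.1, seed=0, **kwargs):
    """Gradient ascent w <- w + rho * dS_T/dw for n_epochs iterations."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=M + 2)  # small random start avoids R_t = 0 for all t
    history = []
    for _ in range(n_epochs):
        S, grad = sharpe_and_gradient(w, r, M, **kwargs)
        history.append(S)                  # Sharpe's ratio at each iteration
        w = w + rho * grad
    return w, history
```

Plotting `history` against the iteration index gives a training curve of the kind shown in the bottom panel of Figure 1. A fixed step size `rho` does not guarantee a monotone increase, so in practice it may need tuning for a given series.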

Then, we use this optimal $w$ parameter to form a prediction for the next $N_{predict}$ data samples, shown below:

Figure 2. Prediction performance using the optimal policy from training, over $N_{predict}$ samples.

As is apparent from the above graph, the trader is making decisions based on the $w$ parameter. Of course, $w$ is suboptimal for the time series over this predicted interval, but it does better than a monkey. After [...] intervals our return would be [...]%.

The next experiment, presented in the same format, is to predict real stock data with some precipitous drops (Citigroup):

Figure 3. Training $w$ on Citigroup stock data, $T = 600$. (Top panel: price series $p_t$; bottom panel: Sharpe's ratio versus training iteration, $N_e$.)

Figure 4. $r_t$ (top), $F_t$ (middle), and percentage profit (cumulative) for Citigroup. Note that although the general policy is good, the precipitous drop in price (downward spike in $r_t$) wipes out our gains around $t = 750$.

The recurrent reinforcement learner seems to work best on stocks that are constant on average, yet fluctuate up and down. In such a case, there is less worry about a precipitous drop like in the above example. With a relatively constant mean stock price, the reinforcement learner is free to play the ups and downs.

The recurrent reinforcement learner seems to work, although it is tricky to set up and verify. One important trick is to properly scale the return series data to mean zero and variance one [2], or the neuron cannot separate the resulting data points.

VIII. CONCLUSIONS

The primary difficulties with this approach rest in the fact that certain stock events do not exhibit structure. As seen in the second example above, the reinforcement learner does not predict precipitous drops in the stock price and is just as vulnerable as a human. Perhaps it would be more effective if combined with a mechanism to predict such precipitous drops. Other changes to the model might include stock volumes as features that could help in predicting rises and falls.

Additionally, it would be nice to augment the model to incorporate fixed transaction costs, as well as less frequent transactions. For example, a model could be created that learns from long periods of data, but only periodically makes a decision. This would reflect the case of a casual trader that participates in smaller volume trades with fixed transaction costs. Because it is too expensive for small-time investors to trade every period with fixed transaction costs, a model with a periodic trade strategy would be more financially feasible for such users. It would probably be worthwhile to try adapting this model to this sort of periodic trading and see the results.

[2] C. Gold, "FX Trading via Recurrent Reinforcement Learning," Computational Intelligence for Financial Engineering, 2003. Proceedings. 2003 IEEE International Conference on, pp. 363-370, March 2003.

Special thanks to Carl for email advice on algorithm implementation.