ALGORITHMIC TRADING WITH MARKOV CHAINS




June 16, 2010

HENRIK HULT AND JONAS KIESSLING

Abstract. An order book consists of a list of all buy and sell offers, represented by price and quantity, available to a market agent. The order book changes rapidly, within fractions of a second, due to new orders being entered into the book. The volume at a certain price level may increase due to limit orders, i.e. orders to buy or sell placed at the end of the queue, or decrease because of market orders or cancellations. In this paper a high-dimensional Markov chain is used to represent the state and evolution of the entire order book. The design and evaluation of optimal algorithmic strategies for buying and selling is studied within the theory of Markov decision processes. General conditions are provided that guarantee the existence of optimal strategies. Moreover, a value-iteration algorithm is presented that enables finding optimal strategies numerically. As an illustration, a simple version of the Markov chain model is calibrated to high-frequency observations of the order book in a foreign exchange market. In this model, using an optimally designed strategy for buying one unit provides a significant improvement, in terms of the expected buy price, over a naive buy-one-unit strategy.

1. Introduction

The appearance of high-frequency observations of the limit order book in order-driven markets has radically changed the way traders interact with financial markets. With trading opportunities existing only for fractions of a second, it has become essential to develop effective and robust algorithms that allow for instantaneous trading decisions. In order-driven markets there are no centralized market makers; rather, all participants have the option to provide liquidity through limit orders. An agent who wants to buy a security is therefore faced with an array of options. One option is to submit a market order, obtaining the security at the best available ask price.
Another alternative is to submit a limit order at a price lower than the ask price, hoping that this order will eventually be matched against a market sell order. What is the best alternative? The answer will typically depend both on the agent's view of current market conditions and on the current state of the order book. With new orders being submitted at a very high frequency, the optimal choice can change in a matter of seconds or even fractions of a second. In this paper the limit order book is modelled as a high-dimensional Markov chain, where each coordinate corresponds to a price level and the state of the Markov chain represents the volume of limit orders at every price level. For this model many tools from applied probability are available to design and evaluate the performance of different trading strategies. Throughout the paper the emphasis will be on what we call buy-one-unit strategies and making-the-spread strategies. In the first case an

agent wants to buy one unit of the underlying asset. Here one unit can be thought of as an order of one million EUR on the EUR/USD exchange. In the second case an agent is looking to earn the difference between the buy and sell price, the spread, by submitting a limit buy order and a limit sell order simultaneously, hoping that both orders will be matched against future market orders. Consider strategies for buying one unit. A naive buy-one-unit strategy is executed as follows. The agent submits a limit buy order and waits until either the order is matched against a market sell order or the best ask level reaches a predefined stop-loss level. The probability that the agent's limit order is executed, as well as the expected buy price, can be computed using standard potential theory for Markov chains. It is, in general, not optimal to follow the naive buy-one-unit strategy. For instance, if the order book moves to a state where the limit order has a small probability of being executed, the agent would typically like to cancel and replace the limit order, either by a market order or by a new limit order at a higher level. Similarly, if the market is moving in favor of the agent, it might be profitable to cancel the limit order and submit a new limit order at a lower price level. Such more elaborate strategies are naturally treated within the framework of Markov decision processes. We show that, under certain mild conditions, optimal strategies always exist and that the optimal expected buy price is unique. In addition, a value-iteration algorithm is provided that is well suited to finding and evaluating optimal strategies numerically. Sell-one-unit strategies can of course be treated in precisely the same way as buy-one-unit strategies, so only the latter will be treated in this paper. In the final part of the paper we apply the value-iteration algorithm to find close-to-optimal buy strategies in a foreign exchange market.
This provides an example of the proposed methodology, which consists of the following steps: 1) parametrize the generator matrix of the Markov chain representing the order book, 2) calibrate the model to historical data, 3) compute optimal strategies for each state of the order book, 4) apply the model to make trading decisions. The practical applicability of the method depends on the possibility of making sufficiently fast trading decisions. As market conditions vary there is a need to recalibrate the model regularly. For this reason it is necessary to have fast calibration and computational algorithms. In the simple model presented in Sections 5 and 6 the calibration, step 2), is fast, and the speed is largely determined by how fast the optimal strategy is computed. In this example the buy-one-unit strategy is studied, and the computation of the optimal strategy, step 3), took roughly ten seconds on an ordinary notebook, using Matlab. Individual trading decisions, step 4), can then be made in a few milliseconds. Today there is an extensive literature on order book dynamics. In this paper the primary interest is in short-term predictions based on the current state and recent history of the order book. The content of this paper is therefore quite similar in spirit to [4] and [2]. This is somewhat different from studies of market impact and its relation to the construction of optimal execution strategies for large market orders through a series of smaller trades; see for instance [13] and [1]. Statistical properties of the order book are a popular topic in the econophysics literature. Several interesting studies have been written over the years, two of

which we mention here. In the enticingly titled paper What really causes large price changes? [7], the authors claim that large changes in share prices are not due to large market orders. They find that statistical properties of prices depend more on fluctuations in revealed supply and demand than on their mean behavior, highlighting the importance of models taking the whole order book into account. In [3], the authors study certain static properties of the order book. They find that limit order prices follow a power law around the current price, suggesting that market participants believe in the possibility of very large price variations within a rather short time horizon. It should be pointed out that the mentioned papers study limit order books for stock markets. Although the theory presented in this paper is quite general, the applications provided here concern a particular foreign exchange market. There are many similarities between order books for stocks and exchange rates, but there are also some important differences. For instance, orders of unit size (e.g. one million EUR) keep the volume at rather small levels in absolute terms compared to stock markets. In stock market applications of the techniques provided here one would have to bundle shares by selecting an appropriate unit size of orders. We are not aware of empirical studies, similar to those mentioned above, of order books in foreign exchange markets. Another approach to studying the dynamical aspects of limit order books is by means of game theory. Each agent is thought to take into account the effect of the order placement strategies of other agents when deciding between limit and market orders. Some of the systematic properties of the order book may then be explained as properties of the resulting equilibrium; see e.g. [10] and [14] and the references therein.
In contrast, our approach assumes that the transitions of the order book are given exogenously as transitions of a Markov chain. The rest of this paper is organized as follows. Section 2 contains a detailed description of a general Markov chain representation of the limit order book. In Section 3 some discrete potential theory for Markov chains is reviewed and applied to evaluate a naive buy strategy and an elementary strategy for making the spread. The core of the paper is Section 4, where Markov decision theory is employed to study optimal trading strategies. A proof of existence of optimal strategies is presented together with an iteration scheme to find them. In Section 5, a simple parameterization of the Markov chain is presented together with a calibration technique. For this particular choice of model, limit order arrival rates depend only on the distance from the opposite best quote, and market order intensities are assumed independent of outstanding limit orders. The concluding Section 6 contains some numerical experiments on data from a foreign exchange (EUR/USD) market. The simple model from Section 5 is calibrated on high-frequency data and three different buy-one-unit strategies are compared. It turns out that there is a substantial amount to be gained from using more elaborate strategies than the naive buy strategy.

2. Markov chain representation of a limit order book

We begin with a brief description of order driven markets. An order driven market is a continuous double auction where agents can submit limit orders. A limit order, or quote, is a request to buy or sell a certain quantity together with a worst allowable price, the limit. A limit order is executed immediately if there are

outstanding quotes of opposite sign with the same (or better) limit. Limit orders that are not immediately executed are entered into the limit order book. An agent with an outstanding limit order in the order book can cancel this order at any time. Limit orders are executed using time priority at a given price and price priority across prices. Following [7], orders are decomposed into two types: an order resulting in an immediate transaction is an effective market order, and an order that is not executed, but stored in the limit order book, is an effective limit order. For the rest of this paper effective market orders and effective limit orders will be referred to simply as market orders and limit orders, respectively. As a consequence, the limit of a limit buy (sell) order is always lower (higher) than the best available sell (buy) quote. For simplicity it is assumed that the limit of a market buy (sell) order is precisely equal to the best available sell (buy) quote. Note that it is not assumed that the entire market order will be executed immediately. If there are fewer quotes of opposite sign at the level where the market order is entered, then the remaining part of the order is stored in the limit order book.

2.1. Markov chain representation. A continuous time Markov chain $X = (X_t)_{t \ge 0}$ is used to model the limit order book. It is assumed that there are $d \in \mathbb{N}$ possible price levels in the order book, denoted $\pi_1 < \dots < \pi_d$. The Markov chain $X_t = (X_t^1, \dots, X_t^d)$ represents the volume at time $t \ge 0$ of buy orders (negative values) and sell orders (positive values) at each price level. It is assumed that $X_t^j \in \mathbb{Z}$ for each $j = 1, \dots, d$. That is, all volumes are integer valued. The state space of the Markov chain is denoted $S \subset \mathbb{Z}^d$. The generator matrix of $X$ is denoted $Q = (Q_{xy})$, where $Q_{xy}$ is the transition intensity from state $x = (x^1, \dots, x^d)$ to state $y = (y^1, \dots, y^d)$. The matrix $P = (P_{xy})$ is the transition matrix of the jump chain of $X$.
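This state convention is straightforward to mirror in code. The following sketch (with hypothetical names, not code from the paper) stores the book as an integer vector, negative entries for bid volume and positive entries for ask volume, and reads off the best quote levels defined below:

```python
import numpy as np

def best_levels(x):
    """Given a book state x (negative = buy volume, positive = sell volume),
    return (j_B, j_A): the highest bid level and lowest ask level (0-based)."""
    bids = np.flatnonzero(x < 0)
    asks = np.flatnonzero(x > 0)
    j_B = bids.max()   # boundary assumption of the model: x[0] < 0, so non-empty
    j_A = asks.min()   # boundary assumption of the model: x[-1] > 0, so non-empty
    assert j_B < j_A   # bid side always strictly below ask side
    return j_B, j_A

# Book over d = 8 price levels; levels between j_B and j_A hold zero volume.
x = np.array([-3, -2, -1, 0, 0, 2, 1, 4])
j_B, j_A = best_levels(x)
spread = j_A - j_B
```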
Let us point out already here that for most of our results only the jump chain will be needed; it will also be denoted $X = (X_n)_{n=0}^{\infty}$, where $n$ is the number of transitions from time 0. For each state $x \in S$ let

$$j_B = j_B(x) = \max\{j : x^j < 0\}, \qquad j_A = j_A(x) = \min\{j : x^j > 0\},$$

be the highest bid level and the lowest ask level, respectively. For convenience it will be assumed that $x^d > 0$ for all $x \in S$; i.e. there is always someone willing to sell at the highest possible price. Similarly $x^1 < 0$ for all $x \in S$; someone is always willing to buy at the lowest possible price. It is further assumed that the highest bid level is always below the lowest ask level, $j_B < j_A$. This will be implicit in the construction of the generator matrix $Q$ and transition matrix $P$. The bid price is defined to be $\pi_B = \pi_{j_B}$ and the ask price is $\pi_A = \pi_{j_A}$. Since there are no limit orders at levels between $j_B$ and $j_A$, it follows that $x^j = 0$ for $j_B < j < j_A$. The distance $j_A - j_B$ between the best ask level and the best bid level is called the spread. See Figure 1 for an illustration of the state of the order book. The possible transitions of the Markov chain $X$ defining the order book are given as follows. Throughout the paper $e_j = (0, \dots, 0, 1, 0, \dots, 0)$ denotes the vector in $\mathbb{Z}^d$ with 1 in the $j$th position. Limit buy order. A limit buy order of size $k$ at level $j$ is an order to buy $k$ units at price $\pi_j$. The order is placed last in the queue of orders at price $\pi_j$. It may be

[Figure 1. State of the order book. The negative volumes to the left indicate limit buy orders and the positive volumes indicate limit sell orders. In this state $j_A = 44$, $j_B = 42$, and the spread is equal to 2.]

interpreted as $k$ orders of unit size arriving instantaneously. Mathematically it is a transition of the Markov chain from state $x$ to $x - k e_j$, where $j < j_A$ and $k \ge 1$. That is, a limit buy order can only be placed at a level lower than the best ask level $j_A$. See Figure 2.

[Figure 2. Left: Limit buy order of size 1 arrives at level 42. Right: Limit sell order of size 2 arrives at level 45.]

Limit sell order. A limit sell order of size $k$ at level $j$ is an order to sell $k$ units at price $\pi_j$. The order is placed last in the queue of orders at price $\pi_j$. It may be interpreted as $k$ orders of unit size arriving instantaneously. Mathematically it is a transition of the Markov chain from state $x$ to $x + k e_j$, where $j > j_B$ and $k \ge 1$. That is, a limit sell order can only be placed at a level higher than the best bid level $j_B$. See Figure 2.

Market buy order. A market buy order of size $k$ is an order to buy $k$ units at the best available price. It corresponds to a transition from state $x$ to $x - k e_{j_A}$. Note that if $k \ge x^{j_A}$ the market order will knock out all the sell quotes at $j_A$, resulting in a new lowest ask level. See Figure 3.

Market sell order. A market sell order of size $k$ is an order to sell $k$ units at the best available price. It corresponds to a transition from state $x$ to $x + k e_{j_B}$. Note that if $k \ge |x^{j_B}|$ the market order will knock out all the buy quotes at $j_B$, resulting in a new highest bid level. See Figure 3.

[Figure 3. Left: Market buy order of size 2 arrives and knocks out level 44. Right: Market sell order of size 2 arrives.]

Cancellation of a buy order. A cancellation of a buy order of size $k$ at level $j$ is an order to instantaneously withdraw $k$ limit buy orders at level $j$ from the order book. It corresponds to a transition from $x$ to $x + k e_j$, where $j \le j_B$ and $1 \le k \le |x^j|$. See Figure 4.

[Figure 4. Left: A cancellation of a buy order of size 1 arrives at level 40. Right: A cancellation of a sell order of size 2 arrives at level 47.]

Cancellation of a sell order. A cancellation of a sell order of size $k$ at level $j$ is an order to instantaneously withdraw $k$ limit sell orders at level $j$ from the order book. It corresponds to a transition from $x$ to $x - k e_j$, where $j \ge j_A$ and $1 \le k \le x^j$. See Figure 4.

Summary. To summarize, the possible transitions are such that $Q_{xy}$ is non-zero if and only if $y$ is of the form

$$y = \begin{cases} x + k e_j, & j > j_B(x),\ k \ge 1, \\ x - k e_j, & j < j_A(x),\ k \ge 1, \\ x - k e_j, & j \ge j_A(x),\ 1 \le k \le x^j, \\ x + k e_j, & j \le j_B(x),\ 1 \le k \le |x^j|. \end{cases} \qquad (1)$$

To fully specify the model it remains to specify the non-zero transition intensities. The computational complexity of the model does not depend heavily on the specific choice of the non-zero transition intensities, but rather on the dimensionality of the transition matrix. In Section 5 a simple model is presented which is easy and fast to calibrate.

3. Potential theory for evaluation of simple strategies

Consider an agent who wants to buy one unit. There are two alternatives. The agent can place a market buy order at the best ask level $j_A$ or place a limit buy order at a level less than $j_A$.
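The admissible transitions summarized in (1) can be mirrored directly in code. The sketch below (illustrative event encoding, not from the paper) validates and applies each event type to a state vector; for simplicity a market order here is capped at the volume available at the best quote, whereas in the model the remainder of a larger order would spill to the next level:

```python
import numpy as np

def apply_event(x, kind, j, k):
    """Apply one order-book event to state x (negative = bids, positive = asks).
    Returns a new state; raises if the event is not admissible under (1).
    The level j is ignored for market orders, which hit the best quote."""
    x = x.copy()
    j_B = np.flatnonzero(x < 0).max()
    j_A = np.flatnonzero(x > 0).min()
    if kind == "limit_buy":        # x -> x - k e_j, j < j_A
        assert j < j_A and k >= 1
        x[j] -= k
    elif kind == "limit_sell":     # x -> x + k e_j, j > j_B
        assert j > j_B and k >= 1
        x[j] += k
    elif kind == "market_buy":     # consumes sell volume at the best ask
        assert 1 <= k <= x[j_A]
        x[j_A] -= k
    elif kind == "market_sell":    # consumes buy volume at the best bid
        assert 1 <= k <= -x[j_B]
        x[j_B] += k
    elif kind == "cancel_buy":     # x -> x + k e_j, j <= j_B
        assert j <= j_B and 1 <= k <= -x[j]
        x[j] += k
    elif kind == "cancel_sell":    # x -> x - k e_j, j >= j_A
        assert j >= j_A and 1 <= k <= x[j]
        x[j] -= k
    return x

x = np.array([-3, -2, 0, 0, 2, 4])
x1 = apply_event(x, "limit_buy", 2, 1)        # new bid inside the spread
x2 = apply_event(x1, "market_buy", None, 2)   # knocks out the best ask level
```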
In the second alternative the buy price is lower but

there is a risk that the order will not be executed, i.e. matched by a market sell order, before the price starts to move up. Then the agent may be forced to buy at a price higher than $\pi_{j_A}$. It is therefore of interest to compute the probability that a limit buy order is executed before the price moves up, as well as the expected buy price resulting from a limit buy order. These problems are naturally addressed within the framework of potential theory for Markov chains. First, a standard result on potential theory for Markov chains will be presented. A straightforward application of the result enables the computation of the expected price of a limit buy order and the expected payoff of a simple strategy for making the spread.

3.1. Potential theory. Consider a discrete time Markov chain $X = (X_n)$ on a countable state space $S$ with transition matrix $P$. For a subset $D \subset S$ the set $\partial D = S \setminus D$ is called the boundary of $D$ and is assumed to be non-empty. Let $\tau$ be the first hitting time of $\partial D$, that is, $\tau = \inf\{n : X_n \in \partial D\}$. Suppose a running cost function $v_C = (v_C(s))_{s \in D}$ and a terminal cost function $v_T = (v_T(s))_{s \in \partial D}$ are given. The potential associated to $v_C$ and $v_T$ is defined by $\phi = (\phi(s))_{s \in S}$, where

$$\phi(s) = E\Big[ \sum_{n=0}^{\tau - 1} v_C(X_n) + v_T(X_\tau) I\{\tau < \infty\} \,\Big|\, X_0 = s \Big].$$

The potential $\phi$ is characterized as the solution to a linear system of equations.

Theorem 3.1 (e.g. [12], Theorem 4.2.3). Suppose $v_C$ and $v_T$ are non-negative. Then $\phi$ satisfies

$$\begin{cases} \phi = P\phi + v_C & \text{in } D, \\ \phi = v_T & \text{on } \partial D. \end{cases} \qquad (2)$$

Theorem 3.1 is all that is needed to compute the success probability of buying/selling a given order and the expected value of simple buy/sell strategies. Details are given in the following section.

3.2. Probability that a limit order is executed. In this section $X = (X_n)$ denotes the jump chain of the order book described in Section 2, with possible transitions specified by (1). Suppose the initial state of the order book is $X_0$. The agent places a limit buy order at level $J_0$.
The order is placed instantaneously at time 0, and after the order is placed the state of the order book is $X_0 - e_{J_0}$. Consider the probability that the order is executed before the best ask level is at least $J_1 > j_A(X_0)$. As the order book evolves it is necessary to keep track of the position of the agent's buy order. For this purpose, an additional variable $Y_n$ is introduced, representing the number of limit orders at level $J_0$ that are in front of the agent's order, including the agent's order, after $n$ transitions. Then $Y_0 = X_0^{J_0} - 1$, and $Y_n$ can only move up towards 0, and does so whenever there is a market order at level $J_0$ or an order in front of the agent's order is cancelled. The pair $(X_n, Y_n)$ is also a Markov chain, with state space $\tilde S \subset \mathbb{Z}^d \times \{0, -1, -2, \dots\}$ and transition matrix denoted $\tilde P$. The state space is partitioned into two disjoint sets, $\tilde S = D \cup \partial D$, where

$$\partial D = \{(x, y) \in \tilde S : y = 0, \text{ or } x^j \le 0 \text{ for all } J_0 < j < J_1\}.$$
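On a truncated (finite) state space, Theorem 3.1 reduces computing the potential to one linear solve: restrict (2) to $D$ and solve $(I - P_{DD})\phi_D = P_{D,\partial D} v_T + v_C$. A minimal self-contained sketch (a hypothetical finite chain, not the calibrated order-book model), checked against a gambler's-ruin walk where the hitting probability is known in closed form:

```python
import numpy as np

def potential(P, D_mask, v_C, v_T):
    """Solve system (2): phi = P phi + v_C on D, phi = v_T on the boundary.
    P is the transition matrix over a finite state space; D_mask marks D."""
    D = np.flatnonzero(D_mask)
    B = np.flatnonzero(~D_mask)                 # boundary = complement of D
    A = np.eye(len(D)) - P[np.ix_(D, D)]
    b = P[np.ix_(D, B)] @ v_T[B] + v_C[D]
    phi = v_T.astype(float).copy()
    phi[D] = np.linalg.solve(A, b)
    return phi

# Check: simple random walk on {0,1,2,3}, absorbed at 0 and 3, v_T = 1 at
# state 3 and v_C = 0, so phi(s) is the probability of hitting 3 before 0.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
D_mask = np.array([False, True, True, False])
phi = potential(P, D_mask, np.zeros(4), np.array([0.0, 0.0, 0.0, 1.0]))
```

The execution probability of Section 3.2 has exactly this structure, with the pair $(X_n, Y_n)$ in place of the walk and the boundary set $\partial D$ above.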

Define the terminal cost function $v_T : \partial D \to \mathbb{R}$ by

$$v_T(x, y) = \begin{cases} 1 & \text{if } y = 0, \\ 0 & \text{otherwise}, \end{cases}$$

and let $\tau$ denote the first time $(X_n, Y_n)$ hits $\partial D$. The potential $\phi = (\phi(s))_{s \in \tilde S}$ given by

$$\phi(s) = E[v_T(X_\tau, Y_\tau) I\{\tau < \infty\} \mid (X_0, Y_0) = s],$$

is precisely the probability that the agent's limit order is executed before the best ask moves to or above $J_1$, conditional on the initial state. To compute the desired probability all that remains is to solve (2) with $v_C = 0$.

3.3. Expected price for a naive buy-one-unit strategy. The probability that a limit buy order is executed is all that is needed to compute the expected price of a naive buy-one-unit strategy. The strategy is implemented as follows: 1) Place a unit size limit buy order at level $J_0$. 2) If the best ask moves to level $J_1$, cancel the limit order and buy at level $J_1$. This assumes that there will always be limit sell orders available at level $J_1$. If $p$ denotes the probability that the limit buy order is executed (from the previous subsection), then the expected buy price becomes

$$E[\text{buy price}] = p\,\pi_{J_0} + (1 - p)\,\pi_{J_1}.$$

Recall that, at the initial state, the agent may select to buy at the best ask price $\pi_{j_A(X_0)}$. This suggests that it is better to follow the naive buy-one-unit strategy than to place a market buy order whenever $E[\text{buy price}] < \pi_{j_A(X_0)}$. In Section 4 more elaborate buy strategies will be evaluated using the theory of Markov decision processes.

3.4. Making the spread. We now proceed to calculate the expected payoff of another simple trading strategy. The aim is to earn the difference between the bid and the ask price, the spread. Suppose the order book starts in state $X_0$. Initially an agent places a limit buy order at level $j_0$ and a limit sell order at level $j_1 > j_0$. In case both are executed, the profit is the price difference between the two orders. The orders are placed instantaneously at $n = 0$, and after the orders are placed the state of the order book is $X_0 - e_{j_0} + e_{j_1}$.
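The decision rule of Section 3.3 is a one-line computation once $p$ is known. A sketch with illustrative numbers (not calibrated values from the paper):

```python
def naive_buy_price(p_exec, pi_J0, pi_J1):
    """Expected buy price of the naive strategy: fill at pi_J0 with
    probability p_exec, otherwise stop out with a market buy at pi_J1."""
    return p_exec * pi_J0 + (1.0 - p_exec) * pi_J1

# Illustrative: a 70% fill probability one tick below an ask of 1.4302,
# with a stop-loss level at 1.4304.
expected = naive_buy_price(0.7, 1.4301, 1.4304)
best_ask = 1.4302
prefer_limit = expected < best_ask   # place the limit order iff this holds
```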
Let $J_0$ and $J_1$ be stop-loss levels such that $J_0 < j_0 < j_1 < J_1$. The simple making-the-spread strategy proceeds as follows. 1) If the buy order is executed first and the best bid moves to $J_0$ before the sell order is executed, cancel the limit sell order and place a market sell order at $J_0$. 2) If the sell order is executed first and the best ask moves to $J_1$ before the buy order is executed, cancel the limit buy order and place a market buy order at $J_1$. This strategy assumes that there will always be limit buy orders available at $J_0$ and limit sell orders at $J_1$. It will be necessary to keep track of the positions of the agent's limit orders. For this purpose two additional variables $Y_n^0$ and $Y_n^1$ are introduced that represent the number of limit orders at levels $j_0$ and $j_1$ that are in front of and including the agent's orders, respectively.
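Once the two fill probabilities are available, the bookkeeping for this strategy reduces to evaluating a terminal payoff at the first time one of the two orders is filled, as derived below. A sketch with a hypothetical helper; the fill probabilities $p_B$ and $p_A$ would come from the computation in Section 3.2, and the prices are illustrative:

```python
def spread_terminal_payoff(y0, y1, p_B, p_A, pi_j0, pi_j1, pi_J0, pi_J1):
    """Terminal payoff of the making-the-spread strategy at the boundary,
    where one of the agent's two orders has just been filled."""
    if y1 == 0:   # sell order filled first: income pi_j1, expected buy cost
        return pi_j1 - (p_B * pi_j0 + (1 - p_B) * pi_J1)
    if y0 == 0:   # buy order filled first: expected sell income, cost pi_j0
        return p_A * pi_j1 + (1 - p_A) * pi_J0 - pi_j0
    raise ValueError("not a boundary state: one order must be filled")

# Sell leg filled; buy leg fills with prob. 0.6 at 99, else stop-loss at 103.
v = spread_terminal_payoff(y0=-2, y1=0, p_B=0.6, p_A=0.5,
                           pi_j0=99, pi_j1=101, pi_J0=97, pi_J1=103)
```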

It follows that $Y_0^0 = X_0^{j_0} - 1$ and $Y_0^1 = X_0^{j_1} + 1$, that $Y_n^0$ is non-decreasing, and that $Y_n^1$ is non-increasing. The agent's buy (sell) order has been executed when $Y_n^0 = 0$ ($Y_n^1 = 0$). The triplet $(X_n, Y_n^0, Y_n^1)$ is also a Markov chain, with state space $\tilde S \subset \mathbb{Z}^d \times \{0, -1, -2, \dots\} \times \{0, 1, 2, \dots\}$. Let $\tilde P$ denote its transition matrix. The state space $\tilde S$ is partitioned into two disjoint subsets, $\tilde S = D \cup \partial D$, where

$$\partial D = \{(x, y^0, y^1) \in \tilde S : y^0 = 0, \text{ or } y^1 = 0\}.$$

Let $p_B(x, y^0)$ denote the probability that a limit buy order placed at $j_0$ is executed before the best ask moves to $J_1$. This probability is computed in Section 3.2. If the sell order is executed first, so $y^1 = 0$, then there will be a positive income of $\pi_{j_1}$. The expected expense in state $(x, y^0, y^1)$ for buying one unit is $p_B(x, y^0)\pi_{j_0} + (1 - p_B(x, y^0))\pi_{J_1}$. Similarly, let $p_A(x, y^1)$ denote the probability that a limit sell order placed at $j_1$ is executed before the best bid moves to $J_0$. This can be computed in a similar manner. If the buy order is executed first, so $y^0 = 0$, then this will result in an expense of $\pi_{j_0}$. The expected income in state $(x, y^0, y^1)$ for selling one unit is $p_A(x, y^1)\pi_{j_1} + (1 - p_A(x, y^1))\pi_{J_0}$. The above argument leads us to define the terminal cost function $v_T : \partial D \to \mathbb{R}$ by

$$v_T(x, y^0, y^1) = \begin{cases} \pi_{j_1} - p_B(x, y^0)\pi_{j_0} - (1 - p_B(x, y^0))\pi_{J_1} & \text{for } y^1 = 0, \\ p_A(x, y^1)\pi_{j_1} + (1 - p_A(x, y^1))\pi_{J_0} - \pi_{j_0} & \text{for } y^0 = 0. \end{cases}$$

Let $\tau$ denote the first time $(X_n, Y_n^0, Y_n^1)$ hits $\partial D$. The potential $\phi = (\phi(s))_{s \in \tilde S}$ defined by

$$\phi(s) = E[v_T(X_\tau, Y_\tau^0, Y_\tau^1) I\{\tau < \infty\} \mid (X_0, Y_0^0, Y_0^1) = s],$$

is precisely the expected payoff of this strategy. It is a solution to (2) with $v_C = 0$.

4. Optimal strategies and Markov decision processes

The framework laid out in Section 3 is too restrictive for many purposes, as it does not allow the agent to change the initial position.
In this section it will be demonstrated how Markov decision theory can be used to design and analyze more flexible trading strategies. The general results on Markov decision processes are given in Section 4.1, and the applications to buy-one-unit strategies and strategies for making the spread are explained in the following sections. The general results that are of greatest relevance to the applications are the last statement of Theorem 4.3 and Theorem 4.5, which lead to Algorithm 4.1.

4.1. Results for Markov decision processes. First the general setup will be described. We refer to [12] for a brief introduction to Markov decision processes and to [6] or [9] for more details. Let $(X_n)_{n=0}^{\infty}$ be a Markov chain in discrete time on a countable state space $S$ with transition matrix $P$. Let $A$ be a finite set of possible actions. Every action can be classified as either a continuation action or a termination action. The set of continuation actions is denoted $C$ and the set of termination actions $T$. Then $A = C \cup T$, where $C$ and $T$ are disjoint. When a termination action is selected the Markov chain is terminated. Not every action is available in every state of the chain. Let $A : S \to 2^A$ be a function associating a non-empty set of actions $A(s)$ to each state $s \in S$. Here $2^A$ is the power set consisting of all subsets of $A$. The set of continuation actions

10 H. HULT AND J. KIESSLING

available in state s is denoted C(s) = A(s) ∩ C and the set of termination actions T(s) = A(s) ∩ T. For each s, s' ∈ S and a ∈ C(s), the transition probability from s to s' when selecting action a is denoted P_{ss'}(a).

For every action there are associated costs. The cost of continuation is denoted v_C(s, a); it can be non-zero only when a ∈ C(s). The cost of termination is denoted v_T(s, a); it can be non-zero only when a ∈ T(s). It is assumed that both v_C and v_T are non-negative and bounded.

A policy α = (α_0, α_1, …) is a sequence of functions α_n : S^{n+1} → A such that α_n(s_0, …, s_n) ∈ A(s_n) for each n ≥ 0 and (s_0, …, s_n) ∈ S^{n+1}. If after n transitions the Markov chain has visited (X_0, …, X_n), then α_n(X_0, …, X_n) is the action to take when following policy α. In the sequel we often encounter policies where the nth decision α_n is defined as a function S^{k+1} → A for some 0 ≤ k ≤ n. In that case the corresponding function from S^{n+1} to A is understood as a function of the last k + 1 coordinates: (s_0, …, s_n) ↦ α_n(s_{n−k}, …, s_n).

The expected total cost starting in X_0 = s and following a policy α until termination is denoted by V(s, α). In the applications to come it can be interpreted as the expected buy price. The purpose of Markov decision theory is to analyze optimal policies and optimal (minimal) expected costs. A policy α* is called optimal if, for all states s ∈ S and all policies α, V(s, α*) ≤ V(s, α). The optimal expected cost V* is defined by

V*(s) = inf_α V(s, α).

Clearly, if an optimal policy α* exists, then V*(s) = V(s, α*). It is proved in Theorem 4.3 below that, if all policies terminate in finite time with probability 1, an optimal policy α* exists and the optimal expected cost is the unique solution to a Bellman equation. Furthermore, the optimal policy α* is stationary. A stationary policy is a policy that does not change with time, that is, α = (α, α, …) with α : S → A, where α denotes both the policy and each individual decision function.
The termination time τ_α of a policy α is the first time an action is taken from the termination set. That is,

τ_α = inf{n ≥ 0 : α_n(X_0, …, X_n) ∈ T(X_n)}.

The expected total cost V(s, α) is given by

V(s, α) = E[ Σ_{n=0}^{τ_α − 1} v_C(X_n, α_n(X_0, …, X_n)) + v_T(X_{τ_α}, α_{τ_α}(X_0, …, X_{τ_α})) | X_0 = s ].

Given a policy α = (α_0, α_1, …) and a state s ∈ S, let θ_s α be the shifted policy θ_s α = (α'_1, α'_2, …), where α'_n : S^n → A with α'_n(s_0, …, s_{n−1}) = α_n(s, s_0, …, s_{n−1}).

Lemma 4.1. The expected total cost of a policy α satisfies

V(s, α) = I{α_0(s) ∈ C(s)} ( v_C(s, α_0(s)) + Σ_{s'} P_{ss'}(α_0(s)) V(s', θ_s α) ) + I{α_0(s) ∈ T(s)} v_T(s, α_0(s)).   (3)
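For a stationary policy the recursion (3) reduces to a linear system, which can be checked numerically. The following sketch is purely illustrative (the three-state chain, its transition probabilities and its costs are invented, not taken from the paper): it evaluates a stationary policy by solving that linear system.

```python
import numpy as np

# Hypothetical 3-state chain: states 0 and 1 continue (cost 0.5 per step),
# state 2 is where the policy terminates, with termination cost 2.0.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.0, 0.0, 1.0]])
v_C = np.array([0.5, 0.5])   # continuation costs in states 0, 1
v_T2 = 2.0                   # termination cost in state 2

# By (3), V(s) = v_C(s) + sum_{s'} P_{ss'} V(s') on the continuation
# region {0, 1}, and V(2) = v_T2.  Solve the linear system for V(0), V(1).
Q, R = P[:2, :2], P[:2, 2]
V01 = np.linalg.solve(np.eye(2) - Q, v_C + R * v_T2)
V = np.append(V01, v_T2)
```

Each component of V then satisfies the right-hand side of (3) exactly, which is what Lemma 4.1 asserts for this fixed policy.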

Proof. The claim follows from a straightforward calculation:

V(s, α) = Σ_{a∈C(s)} ( v_C(s, a) + E[ Σ_{n=1}^{τ_α − 1} v_C(X_n, α_n(X_0, …, X_n)) + v_T(X_{τ_α}, α_{τ_α}(X_0, …, X_{τ_α})) | X_0 = s ] ) I{α_0(s) = a} + Σ_{a∈T(s)} v_T(s, α_0(s)) I{α_0(s) = a}
= Σ_{a∈C(s)} ( v_C(s, a) + E[ V(X_1, θ_s α) | X_0 = s ] ) I{α_0(s) = a} + Σ_{a∈T(s)} v_T(s, α_0(s)) I{α_0(s) = a}
= Σ_{a∈C(s)} ( v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s', θ_s α) ) I{α_0(s) = a} + Σ_{a∈T(s)} v_T(s, a) I{α_0(s) = a}.

A central role is played by the function V_n, the minimal expected cost when termination is forced no later than time n. It is defined recursively as

V_0(s) = min_{a∈T(s)} v_T(s, a),
V_{n+1}(s) = min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_n(s') ], min_{a∈T(s)} v_T(s, a) ),   (4)

for n ≥ 0. It follows by induction that V_{n+1}(s) ≤ V_n(s) for each s ∈ S. To see this, note first that V_1(s) ≤ V_0(s) for each s ∈ S. Suppose V_n(s) ≤ V_{n−1}(s) for each s ∈ S. Then

V_{n+1}(s) ≤ min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_{n−1}(s') ], min_{a∈T(s)} v_T(s, a) ) = V_n(s),

which proves the induction step. For each s ∈ S the sequence (V_n(s))_{n≥0} is non-increasing and bounded below by 0, hence convergent. Let V(s) denote its limit.

Lemma 4.2. V satisfies the Bellman equation

V(s) = min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s') ], min_{a∈T(s)} v_T(s, a) ).   (5)

Proof. This follows by taking limits. Indeed,

V(s) = lim_n V_{n+1}(s)
= min( min_{a∈C(s)} [ v_C(s, a) + lim_n Σ_{s'} P_{ss'}(a) V_n(s') ], min_{a∈T(s)} v_T(s, a) )
= min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s') ], min_{a∈T(s)} v_T(s, a) ),

where the last step follows by monotone convergence.
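The recursion (4) is easy to iterate numerically. The following sketch runs it on a hypothetical two-state chain (transition probabilities and costs invented for illustration): each state has a single continuation action with cost 0 and a single termination action.

```python
# Hypothetical two-state example of the recursion (4).  P[s] gives the
# transition probabilities of the single continuation action; v_T gives
# the cost of the single termination action.
P = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.2, 1: 0.8}}
v_T = {0: 3.0, 1: 1.0}

def bellman_step(V):
    # V_{n+1}(s) = min( v_C + sum_{s'} P_{ss'} V_n(s'), v_T(s) ), with v_C = 0
    return {s: min(sum(p * V[t] for t, p in P[s].items()), v_T[s]) for s in P}

V = dict(v_T)            # V_0(s) = v_T(s)
for _ in range(50):
    V = bellman_step(V)  # (V_n) is non-increasing and converges to the limit V
```

The iterates decrease monotonically, as the induction argument above guarantees; in this toy example they converge to V(0) = V(1) = 1.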

The following theorem states that there is a collection of policies ℵ for which V is optimal. Furthermore, if all policies belong to ℵ, which is quite natural in applications, then V is in fact the expected cost of a stationary policy α*.

Theorem 4.3. Let ℵ be the collection of policies α that terminate in finite time, i.e. P[τ_α < ∞ | X_0 = s] = 1 for each s ∈ S. Let α* = (α*, α*, …) be a stationary policy where α*(s) is a minimizer of

min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s') ], min_{a∈T(s)} v_T(s, a) ).   (6)

The following statements hold.
(a) For each α ∈ ℵ, V(s, α) ≥ V(s).
(b) V is the optimal expected cost for ℵ. That is, V(s) = inf_{α∈ℵ} V(s, α).
(c) If α* ∈ ℵ, then V(s) = V(s, α*).
(d) Suppose that W is a solution to the Bellman equation (5) and let α_w denote the minimizer of (6) with V replaced by W. If α_w, α* ∈ ℵ, then W = V.

In particular, if all policies belong to ℵ, then V is the unique solution to the Bellman equation (5). Moreover, V is the optimal expected cost and is attained by the stationary policy α*.

Remark 4.4. It is quite natural that all policies belong to ℵ. For instance, suppose that P_{ss'}(a) = P_{ss'} does not depend on a ∈ A and that the set {s : C(s) = ∅} is non-empty. Then the chain terminates as soon as it hits this set. It follows that all policies belong to ℵ if P[τ < ∞ | X_0 = s] = 1 for each s ∈ S, where τ is the first hitting time of {s : C(s) = ∅}.

Proof. (a) Take α ∈ ℵ. Let T_n α = (α_0, α_1, …, α_{n−1}, α_T) be the policy α terminated at time n, where α_T(s) is a minimizer of min_{a∈T(s)} v_T(s, a), i.e. an optimal termination action. That is, the policy T_n α follows α until time n − 1 and then terminates. In particular, P[τ_{T_n α} ≤ n | X_0 = s] = 1 for each s ∈ S. We claim that

(i) V(s, T_n α) ≥ V_n(s) for each policy α and each s ∈ S, and
(ii) lim_n V(s, T_n α) = V(s, α).

Then (a) follows, since V(s, α) = lim_n V(s, T_n α) ≥ lim_n V_n(s) = V(s).

Statement (i) follows by induction. First note that V(s, T_0 α) = min_{a∈T(s)} v_T(s, a) = V_0(s). Suppose V(s, T_n α) ≥ V_n(s) for each policy α and each s ∈ S.
Then

V(s, T_{n+1} α) = Σ_{a∈T(s)} v_T(s, a) I{α_0(s) = a} + Σ_{a∈C(s)} ( v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s', θ_s T_{n+1} α) ) I{α_0(s) = a}.

Since θ_s T_{n+1} α = (α'_1, …, α'_n, α_T) = T_n θ_s α, it follows by the induction hypothesis that V(s', θ_s T_{n+1} α) ≥ V_n(s'). The expression in the last display is then greater than or

equal to

Σ_{a∈T(s)} v_T(s, a) I{α_0(s) = a} + Σ_{a∈C(s)} ( v_C(s, a) + Σ_{s'} P_{ss'}(a) V_n(s') ) I{α_0(s) = a}
≥ min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_n(s') ], min_{a∈T(s)} v_T(s, a) ) = V_{n+1}(s).

Proof of (ii). Note that one can write

V(s, α) = E[ Σ_{t=0}^{τ_α − 1} v_C(X_t, α_t(X_0, …, X_t)) | X_0 = s ] + E[ v_T(X_{τ_α}, α_{τ_α}(X_0, …, X_{τ_α})) | X_0 = s ]

and

V(s, T_n α) = E[ Σ_{t=0}^{τ_α ∧ n − 1} v_C(X_t, α_t(X_0, …, X_t)) | X_0 = s ] + E[ v_T(X_{τ_α ∧ n}, α_{τ_α ∧ n}(X_0, …, X_{τ_α ∧ n})) | X_0 = s ].

From monotone convergence it follows that

E[ Σ_{t=0}^{τ_α − 1} v_C(X_t, α_t(X_0, …, X_t)) | X_0 = s ]
= E[ lim_n Σ_{t=0}^{τ_α ∧ n − 1} v_C(X_t, α_t(X_0, …, X_t)) | X_0 = s ]
= lim_n E[ Σ_{t=0}^{τ_α ∧ n − 1} v_C(X_t, α_t(X_0, …, X_t)) | X_0 = s ].

Let C ∈ (0, ∞) be an upper bound of v_T. It follows that

E[ | v_T(X_{τ_α}, α_{τ_α}(X_0, …, X_{τ_α})) − v_T(X_{τ_α ∧ n}, α_{τ_α ∧ n}(X_0, …, X_{τ_α ∧ n})) | | X_0 = s ] ≤ 2C E[ I{τ_α > n} | X_0 = s ] = 2C P[τ_α > n | X_0 = s].

By assumption α ∈ ℵ, so P[τ_α > n | X_0 = s] → 0 as n → ∞, for each s ∈ S. This shows that lim_n V(s, T_n α) = V(s, α), as claimed.

(b) Note that by (a), inf_{α∈ℵ} V(s, α) ≥ V(s). From the first part of Theorem 4.5 below it follows that there is a sequence of policies α_{n:0} with V(s, α_{n:0}) = V_n(s). Since V_n(s) → V(s), it follows that inf_{α∈ℵ} V(s, α) ≤ V(s). This proves (b).

(c) Take s ∈ S. Suppose first α*(s) ∈ T(s). Then V(s) = min_{a∈T(s)} v_T(s, a) = V(s, α*). It follows that V(s) = V(s, α*) for each s ∈ {s : α*(s) ∈ T(s)}. Take instead s ∈ S such that α*(s) ∈ C(s). Then

V(s) = v_C(s, α*(s)) + Σ_{s'} P_{ss'}(α*(s)) V(s'),

14 H. HULT AND J. KIESSLING and V s, α )=v C s, α s)) + P ss α s))v s,α ). It follows that V s) V s, α ) P ss α s)) V s ) V s,α ) = s :α s ) Cs ) P ss α s)) V s ) V s,α ) = E[ V X 1 ) V X 1,α ) I{τ α > 1} X 0 = s] = E[E[ V X 2 ) V X 2,α ) I{τ α > 2} X 1 ] X 0 = s]... E[ V X n ) V X n,α ) I{τ α >n} X 0 = s] 2CP[τ α >n X 0 = s], where n 1 is arbitrary. Since P[τ α < X 0 = s] = 1 the last expression converges to 0 as n. This completes the proof of c). Finally to prove d), let W be a solution to 5). That is, W satisfies W s) = min min v Cs, a)+ ) P ss a)w s ), min v T s, a). a Cs) a Ts) Proceeding as in the proof of c) it follows directly that W s) =V s, α w ). By a) it follows that W s) V s). Consider the termination regions {s : α w s) Ts)} and {s : α s) Ts)} of α w and α. Since W s) V s), and both are solutions to 5) it follows that {s : α s) Ts)} { s : α w s) Ts)}, and V s) = min a Ts) v T s, a) =W s) on {s : α s) Ts)}. To show equality for all s S it remain to consider the continuation region of α. Take s {s : α s) Cs)}. As in the proof of c) one writes W s) min v Cs, a)+ P ss a)w s ) a Cs) = min v Cs, a)+ P ss a)w s ) V s )) + P ss a)v s ) a Cs) = V s)+ P ss a)w s ) V s )) = V s)+ s :α s ) Cs ) P ss a)w s ) V s )) [ = V s)+e W X 1 ) V X 1 ))I{τ α > 1} ] X0 = s [ = V s)+e W X n ) V X n ))I{τ α >n} ] X0 = s. Since E[W X n ) V X n ))I{τ α >n} X 0 = s] 0 as n 0 it follows that W s) V s) on {s : α s) Cs)}. This implies W s) =V s) for all s S and the proof is complete.

In practice the optimal expected total cost V may be difficult to find, and hence also the policy α* that attains it. However, it is easy to come close. Since V_n(s) converges to V(s), a close-to-optimal policy is obtained by finding one whose expected cost is at most V_n(s) for some large n. For s ∈ S, let α_0(s) be a minimizer of a ↦ v_T(s, a) over T(s) and, for n ≥ 1, let α_n(s) be a minimizer of

min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_{n−1}(s') ], min_{a∈T(s)} v_T(s, a) ).   (7)

Theorem 4.5. The policy α_{n:0} = (α_n, α_{n−1}, …, α_0) has expected total cost V(s, α_{n:0}) = V_n(s). Moreover, if the stationary policy α_n = (α_n, α_n, …) satisfies α_n ∈ ℵ, then the expected total cost of α_n satisfies V_n(s) ≥ V(s, α_n) ≥ V(s).

Proof. Note that α_0 is a termination action and V(s, α_0) = V_0(s). The first claim then follows by induction. Suppose V(s, α_{n:0}) = V_n(s). Then

V(s, α_{n+1:0}) = Σ_{a∈C(s)} ( v_C(s, a) + Σ_{s'} P_{ss'}(a) V(s', α_{n:0}) ) I{α_{n+1}(s) = a} + Σ_{a∈T(s)} v_T(s, a) I{α_{n+1}(s) = a}
= Σ_{a∈C(s)} ( v_C(s, a) + Σ_{s'} P_{ss'}(a) V_n(s') ) I{α_{n+1}(s) = a} + Σ_{a∈T(s)} v_T(s, a) I{α_{n+1}(s) = a}
= min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_n(s') ], min_{a∈T(s)} v_T(s, a) )
= V_{n+1}(s),

and the induction proceeds.

The proof of the second statement proceeds as follows. For n ≥ 0 and k ≥ 0 let

α^k_{n:0} = (α_n, …, α_n, α_{n−1}, …, α_0),

with α_n repeated k times. Then α^0_{n:0} = α_{n−1:0}. By induction it follows that V(s, α^k_{n:0}) ≥ V(s, α^{k+1}_{n:0}). Indeed, note first that

V(s, α^0_{n:0}) − V(s, α^1_{n:0}) = V_{n−1}(s) − V_n(s) ≥ 0.

Suppose V(s, α^{k−1}_{n:0}) − V(s, α^k_{n:0}) ≥ 0. If s is such that α_n(s) ∈ T(s), then

V(s, α^k_{n:0}) − V(s, α^{k+1}_{n:0}) = v_T(s, α_n(s)) − v_T(s, α_n(s)) = 0.

If α_n(s) ∈ C(s), then

V(s, α^k_{n:0}) − V(s, α^{k+1}_{n:0}) = Σ_{s'} P_{ss'}(α_n(s)) ( V(s', α^{k−1}_{n:0}) − V(s', α^k_{n:0}) ) ≥ 0.

This completes the induction step and the induction proceeds. Since α_n ∈ ℵ it follows that V(s, α_n) = lim_k V(s, α^k_{n:0}). Indeed,

| V(s, α_n) − V(s, α^k_{n:0}) | ≤ C P[τ_{α_n} > k | X_0 = s] → 0,

as k → ∞. Finally, by Theorem 4.3,

V(s) ≤ V(s, α_n) = lim_k V(s, α^k_{n:0}) ≤ V(s, α^1_{n:0}) = V_n(s),

and the proof is complete.

From the above discussion it is clear that the stationary policy α_n converges to an optimal policy and that V_n provides an upper bound for the expected cost of following this strategy. In light of this, Algorithm 4.1 determines, in the limit, the optimal cost and an optimal policy.

Algorithm 4.1 Optimal trading strategies
Input: Tolerance TOL, transition matrix P, state space S, continuation actions C, termination actions T, continuation cost v_C, termination cost v_T.
Output: Upper bound V_n of the optimal cost and almost optimal policy α_n.
Let V_0(s) = min_{a∈T(s)} v_T(s, a), for s ∈ S. Let n = 1 and d > TOL.
while d > TOL do
  Put, for s ∈ S,
    V_n(s) = min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_{n−1}(s') ], min_{a∈T(s)} v_T(s, a) ),
  and d = max_{s∈S} | V_{n−1}(s) − V_n(s) |.
  n = n + 1.
end while
Define α : S → C ∪ T as a minimizer of
  min( min_{a∈C(s)} [ v_C(s, a) + Σ_{s'} P_{ss'}(a) V_{n−1}(s') ], min_{a∈T(s)} v_T(s, a) ).

Algorithm 4.1 is an example of a value iteration algorithm. There are other methods that can be used to solve Markov decision problems, such as policy iteration algorithms. See Chapter 3 in [16] for an interesting discussion of algorithms in Markov decision theory. Typically, value iteration algorithms are well suited to Markov decision problems whose state space is large.

4.2. The keep-or-cancel strategy for buying one unit. In this section a buy-one-unit strategy is considered. It is similar to the buy strategy outlined in Section 3.3, except that the agent has the additional option of early cancellation and submission of a market order. Only the jump chain of (X_t) is considered. Recall that the jump chain, denoted (X_n)_{n=0}^∞, is a discrete-time Markov chain where X_n is the state of the order book after n transitions. Suppose the initial state is X_0.
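A direct transcription of Algorithm 4.1 might look as follows. This is an illustrative sketch, not the authors' code: the state space, action sets and costs are passed in as plain Python mappings and callables, and the toy instance at the bottom is invented.

```python
def value_iteration(S, C, T, P, v_C, v_T, tol=1e-9):
    """Algorithm 4.1: value iteration giving an upper bound V_n of the
    optimal cost and an (almost) optimal stationary policy alpha."""
    V = {s: min(v_T(s, a) for a in T[s]) for s in S}             # V_0
    d = tol + 1.0
    while d > tol:
        V_new = {}
        for s in S:
            cont = [v_C(s, a) + sum(p * V[t] for t, p in P[s][a].items())
                    for a in C[s]]
            term = [v_T(s, a) for a in T[s]]
            V_new[s] = min(cont + term)                          # recursion (4)
        d = max(abs(V[s] - V_new[s]) for s in S)                 # sup-norm change
        V = V_new
    # extract a minimizer of (7) as the stationary policy
    alpha = {}
    for s in S:
        best_c = min(((v_C(s, a) + sum(p * V[t] for t, p in P[s][a].items()), a)
                      for a in C[s]), default=(float("inf"), None))
        best_t = min((v_T(s, a), a) for a in T[s])
        alpha[s] = best_c[1] if best_c[0] < best_t[0] else best_t[1]
    return V, alpha

# Toy instance (invented): states 0, 1 may wait (cost 0); state 2 has no
# continuation action; terminating in state s costs price[s].
S = [0, 1, 2]
C = {0: ["wait"], 1: ["wait"], 2: []}
T = {s: ["stop"] for s in S}
P = {0: {"wait": {1: 1.0}}, 1: {"wait": {0: 0.5, 2: 0.5}}, 2: {}}
price = {0: 5.0, 1: 4.0, 2: 2.0}
V, alpha = value_iteration(S, C, T, P, lambda s, a: 0.0, lambda s, a: price[s])
```

In the toy instance the optimal cost is 2.0 from every state: waiting is optimal in states 0 and 1 because the chain eventually reaches the cheap termination state 2.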
An agent wants to buy one unit and places a limit buy order at level j_0 < j_A(X_0). After each market transition the agent has two choices: either keep the limit order, or cancel it and submit a market buy order at the best available ask level j_A(X_n). It is assumed that the cancellation

ALGORITHMIC TRADING WITH MARKOV CHAINS 17 and submission of the market order is processed instantaneously. It will also be assumed that the agent has decided upon a maximum price level J > j A X 0 ). If the agent s limit buy order has not been processed and j A X n )=J, then the agent will immediately cancel the buy order and place a market buy order at level J. It will be implicitly assumed that there always are limit sell orders available at level J. Buying at level J can be thought of as a stop-loss. From a theoretical point of view assuming an upper bound J for the price level is not a serious restriction as it can be chosen very high. From a practical point of view, though, it is convenient not to take J very high because it will significantly slow down the numerical computation of the solution. This deficiency may be compensated by defining π J appropriately large, say larger than π J 1 plus one price tick. Recall the Markov chain X n,y n ) defined in Section 3.2. Here X n represents the order book after n transitions and Y n is negative with Y n being the number of quotes in front and including the agent s order, at level j 0. The state space in this case is S Z d {..., 2, 1, 0}. Let s =x, y) S. Suppose y<0 and j A x) <Jso the agent s order has not been executed and the stop-loss has not been reached. Then there is one continuation action Cs) ={0} representing waiting for the next market transition. The continuation cost v C s) is always 0. If j A x) =J stop-loss is hit) or y =0 limit order executed) then Cs) = so it is only possible to terminate. There are are two termination actions T = { 2, 1}. If y<0 the only termination action available is 1 T, representing cancellation of the limit order and submission of a market order at the ask price. If y = 0 the Markov chain always terminates since the limit order has been executed. This action is represented by 2 T. The termination costs are v T s, 1) = π j Ax) v T s, 2) = π j0. 
The expected total cost may, in this case, be interpreted as the expected buy price. In a state s = (x, y) with j_A(x) < J, following a stationary policy α = (α, α, …), it is given by (see Lemma 4.1)

V(s, α) =
  Σ_{s'} P_{ss'} V(s', α),  for α(s) = 0,
  π_{j_A(x)},               for α(s) = −1,   (8)
  π_{j_0},                  for α(s) = −2.

When s = (x, y) is such that j_A(x) = J, then V(s, α) = π_J. It follows immediately that π_{j_0} ≤ V(s, α) ≤ π_J for all s ∈ S and all policies α.

The motivation for the expression (8) for the expected buy price is as follows. If the limit order is not executed, so y < 0, there is no cost of waiting; this is the case α(s) = 0. The cost of cancelling and placing a market buy order is π_{j_A(x)}, the current best ask price. When the limit order is executed (y = 0) the incurred cost is π_{j_0}, the price level of the limit order.

The collection ℵ of policies with P[τ_α < ∞ | X_0 = s] = 1 for each s ∈ S contains the only reasonable policies: it is not desirable to risk waiting an infinite amount of time to buy one unit.
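The trade-off in (8) can be seen in a one-step toy computation (all prices and probabilities below are invented for illustration): keeping the limit order gambles between filling at π_{j_0} and being stopped out at π_J, while cancelling pays the current ask with certainty.

```python
# Hypothetical prices: limit-order level, current best ask, stop-loss level.
pi_j0, pi_ask, pi_J = 9.9, 10.1, 10.5

# Suppose that from the current state the limit order fills with
# probability 0.4 before the ask reaches J (probability 0.6).
p_fill = 0.4
cost_keep = p_fill * pi_j0 + (1 - p_fill) * pi_J   # wait, alpha(s) = 0
cost_cancel = pi_ask                               # market order, alpha(s) = -1
V_s = min(cost_keep, cost_cancel)                  # optimal choice in this state
```

Here cost_keep = 10.26 exceeds cost_cancel = 10.1, so cancelling is optimal in this state; in the full model this comparison is made in every state by Algorithm 4.1, and the bound π_{j_0} ≤ V(s, α) ≤ π_J is visible in both candidate costs.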

By Theorem 4.3 an optimal keep-or-cancel strategy for buying one unit is the stationary policy α*, with expected buy price V satisfying (see Lemma 4.2)

V(s) = min( min_{a∈C(s)} Σ_{s'} P_{ss'} V(s'), min_{a∈T(s)} v_T(s, a) )
     =
  min( Σ_{s'} P_{ss'} V(s'), π_{j_A(x)} ),  for j_A(x) < J, y < 0,
  π_J,                                      for j_A(x) = J, y < 0,
  π_{j_0},                                  for y = 0.

The stationary policy α_n in Theorem 4.5 provides a useful numerical approximation of an optimal policy, and V_n(s) in (4) provides an upper bound of the expected buy price. Both V_n and α_n can be computed by Algorithm 4.1.

4.3. The ultimate buy-one-unit strategy. In this section the keep-or-cancel strategy considered above is extended so that the agent may at any time cancel and replace the limit order. Suppose the initial state of the order book is X_0 and an agent wants to buy one unit. After n transitions of the order book, if the agent's limit order is located at level j, then j_n = j represents the level of the limit order, and Y_n represents the outstanding orders in front of and including the agent's order at level j_n. This defines the discrete Markov chain (X_n, Y_n, j_n).

It will be assumed that the agent has decided upon a best price level J_0 and a worst price level J_1, where J_0 < j_A(X_0) < J_1. The agent is willing to buy at level J_0 but will not place limit orders at levels lower than J_0. The level J_1 is the worst-case buy price, or stop-loss: if j_A(X_n) = J_1 the agent is committed to cancelling the limit buy order immediately and placing a market order at level J_1. It is assumed that it is always possible to buy at level J_1. The state space in this case is S ⊂ Z^d × {…, −2, −1, 0} × {J_0, …, J_1 − 1}.

The set of possible actions depends on the current state (x, y, j). In each state where y < 0 the agent has three options:
(1) Do nothing and wait for a market transition.
(2) Cancel the limit order and place a market buy order at the best ask level j_A(x).
(3) Cancel the existing limit buy order and place a new limit buy order at any level j' with J_0 ≤ j' < j_A(x). This action results in the transition to j_n = j', X_n = x + e_j − e_{j'} and Y_n = x_{j'} − 1.

In a given state s = (x, y, j) with y < 0 and j_A(x) < J_1, the set of continuation actions is

C(x, y, j) = {0, J_0, …, j_A(x) − 1}.

Here a = 0 represents the agent being inactive, awaiting the next market transition, and each action j' with J_0 ≤ j' < j_A(x) corresponds to cancelling the outstanding order and submitting a new limit buy order at level j'. The cost of continuation is always 0: v_C(s, 0) = v_C(s, j') = 0. If y = 0 or j_A(x) = J_1, then C(s) = ∅ and only termination is possible. As in the keep-or-cancel strategy there are two termination actions, T = {−2, −1}. If y < 0 the only termination action available is −1, representing cancellation of the limit order and submission of a market order at the ask price. If

y = 0 the Markov chain always terminates, since the limit order has been executed; this action is represented by −2.

The expected buy price V(s, α) from a state s = (x, y, j) with j_A(x) < J_1, following a stationary policy α = (α, α, …), is

V(s, α) =
  Σ_{s'} P_{ss'} V(s', α),  for α(s) = 0,
  V(s_{j'}, α),             for α(s) = j', J_0 ≤ j' < j_A(x),
  π_{j_A(x)},               for α(s) = −1,
  π_j,                      for α(s) = −2.

In the second line s_{j'} refers to the state (x', y', j') where x' = x + e_j − e_{j'} and y' = x_{j'} − 1. If s = (x, y, j) with j_A(x) = J_1, then V(s, α) = π_{J_1}. Since the agent never buys below level J_0 and it is assumed that it is always possible to buy at level J_1, it follows immediately that π_{J_0} ≤ V(s, α) ≤ π_{J_1} for all s ∈ S and all policies α.

By Theorem 4.3 an optimal buy strategy is the stationary policy α*, with expected buy price V satisfying (see Lemma 4.2)

V(s) = min( min_{a∈C(s)} Σ_{s'} P_{ss'} V(s'), min_{a∈T(s)} v_T(s, a) ),

which implies that

V(s) = min( Σ_{s'} P_{ss'} V(s'), V(s_{J_0}), …, V(s_{j_A(x)−1}), π_{j_A(x)} ),  for j_A(x) < J_1, y < 0,

and

V(s) =
  π_{J_1},  for j_A(x) = J_1, y < 0,
  π_j,      for y = 0.

The stationary policy α_n in Theorem 4.5 provides a useful numerical approximation of an optimal policy, and V_n(s) in (4) provides an upper bound of its expected buy price. Both α_n and V_n can be computed by Algorithm 4.1.

4.4. Making the spread. In this section a strategy aimed at earning the difference between the bid and ask prices, the spread, is considered. An agent submits two limit orders, one buy and one sell. If both are executed, the profit is the price difference between the two orders. For simplicity it is assumed at first that, before one of the orders has been executed, the agent has only two options after each market transition: cancel both orders, or wait until the next market transition. The extension that allows cancellation and resubmission of both limit orders at new levels is presented at the end of this section. Suppose X_0 is the initial state of the order book.
The agent places the limit buy order at level j^0 and the limit sell order at level j^1 > j^0. The orders are placed instantaneously, and after they are placed the state of the order book is X_0 − e_{j^0} + e_{j^1}. Consider the extended Markov chain (X_n, Y_n^0, Y_n^1, j_n^0, j_n^1). Here X_n represents the order book after n transitions, and Y_n^0 and Y_n^1 represent the limit buy (negative) and sell (positive) orders at levels j_n^0 and j_n^1 that are in front of and including the agent's orders, respectively. It follows that Y_0^0 = X_{j_0^0}^0 − 1 and Y_0^1 = X_{j_0^1}^0 + 1, where

Y_n^0 is non-decreasing and Y_n^1 is non-increasing. The agent's buy (sell) order has been executed when Y_n^0 = 0 (Y_n^1 = 0).

Suppose the agent has decided on a best buy level J_{B0} < j_A(X_0) and a worst buy level J_{B1} > j_A(X_0). The agent will never place a limit buy order at a level lower than J_{B0} and will not buy at a level higher than J_{B1}; it is assumed that it is always possible to buy at level J_{B1}. Similarly, the agent has decided on a best sell level J_{A1} > j_B(X_0) and a worst sell level J_{A0} < j_B(X_0). The agent will never place a limit sell order at a level higher than J_{A1} and will not sell at a level lower than J_{A0}; it is assumed that it is always possible to sell at level J_{A0}. The state space of this Markov chain is

S ⊂ Z^d × {…, −2, −1, 0} × {0, 1, 2, …} × {J_{B0}, …, J_{B1} − 1} × {J_{A0} + 1, …, J_{A1}}.

The possible actions are:
(1) Before either order has been executed, the agent can wait for the next market transition or cancel both orders.
(2) When one of the orders has been executed, say the sell order, the agent has an outstanding limit buy order. The agent then proceeds according to the ultimate buy-one-unit strategy presented in Section 4.3.

Given a state s = (x, y^0, y^1, j^0, j^1) of the Markov chain, the optimal value function V is interpreted as the optimal expected payoff. Note that for making-the-spread strategies it is more natural to regard V as a payoff than as a cost, and this is how it will be interpreted. The general results in Section 4.1 still hold, since the value functions are bounded from below and above.

The optimal expected payoff can be computed as follows. Let V^B(x, y, j) denote the optimal (minimal) expected buy price in state (x, y, j) for buying one unit, with best buy level J_{B0} and worst buy level J_{B1}. Similarly, let V^A(x, y, j) denote the optimal (maximal) expected sell price in state (x, y, j) for selling one unit, with best sell level J_{A1} and worst sell level J_{A0}.
The optimal expected payoff is then given by

V(s) =
  max( Σ_{s'} P_{ss'} V(s'), 0 ),   for y^0 < 0, y^1 > 0,
  π_{j^1} − V^B(x, y^0, j^0),       for y^1 = 0, y^0 < 0,
  V^A(x, y^1, j^1) − π_{j^0},       for y^0 = 0, y^1 > 0.

The term Σ_{s'} P_{ss'} V(s') is the value of waiting, and 0 is the value of cancelling both orders.

In the extended version of the making-the-spread strategy it is also possible to replace the two limit orders before the first one has been executed. The possible actions are then as follows:
(1) Before either order has been executed, the agent can wait for the next market transition, cancel both orders, or cancel both orders and resubmit them at new levels k^0 and k^1.
(2) When one of the orders has been executed, say the sell order, the agent has an outstanding limit buy order. The agent then proceeds according to the ultimate buy-one-unit strategy presented in Section 4.3.
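The three cases of the payoff above can be sketched directly. In the sketch below the price levels and the one-unit values V^B, V^A are invented placeholders; in practice they come from solving the two buy- and sell-one-unit problems of Section 4.3.

```python
# Hypothetical price levels and one-unit values, for illustration only.
pi = {4: 99.0, 6: 101.0}   # pi[j0] and pi[j1] with j0 = 4, j1 = 6
V_B = 99.4                 # optimal expected buy price after the sell fills
V_A = 100.7                # optimal expected sell price after the buy fills

payoff_sell_first = pi[6] - V_B   # case y1 = 0, y0 < 0
payoff_buy_first = V_A - pi[4]    # case y0 = 0, y1 > 0
```

Before either order fills, the value of the position is max(Σ_{s'} P_{ss'} V(s'), 0): the comparison between waiting for the next transition and cancelling both orders.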