Learning Exercise Policies for American Options
Yuxi Li, Csaba Szepesvári, Dale Schuurmans
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8

Appearing in Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach, Florida, USA. Volume 5 of JMLR: W&CP 5.

Abstract

Options are important instruments in modern finance. In this paper, we investigate reinforcement learning (RL) methods, in particular least-squares policy iteration (LSPI), for the problem of learning exercise policies for American options. We develop finite-time bounds on the performance of the policy obtained with LSPI and compare LSPI and the fitted Q-iteration algorithm (FQI) with the Longstaff-Schwartz method (LSM), the standard least-squares Monte Carlo algorithm from the finance community. Our empirical results show that the exercise policies discovered by LSPI and FQI gain larger payoffs than those discovered by LSM, on both real and synthetic data. Furthermore, we find that for all methods the policies learned from real data generally gain payoffs similar to the policies learned from simulated data. Our work shows that solution methods developed in machine learning can advance the state of the art in an important and challenging application area, while demonstrating that computational finance remains a promising area for future applications of machine learning methods.

1 Introduction

A call (put) option gives the holder the right, but not the obligation, to buy (respectively, sell) an underlying asset, for example a share of a stock, by a certain date (the maturity date) for a certain price (the strike price). An American option can be exercised at any time up to the maturity date. A challenging problem that arises in connection with trading options is to determine the option's fair price, called option pricing. This problem can be solved by finding an optimal exercise policy, i.e., a feedback rule that determines when to exercise the option, given a strike price, so as to maximize the expected total discounted return. This is an optimal stopping problem that belongs to the general class of Markov decision processes (MDPs) (Puterman 1994; Bertsekas 1995). In theory, assuming perfect knowledge of the MDP, dynamic programming can be used to find the optimal policy (Bertsekas and Tsitsiklis 1996). When the size of an MDP is large, one encounters the curse of dimensionality, which calls for effective approximate, sampling-based techniques. In cases when the MDP is unknown and no simulator is available, only sample trajectories can be used to infer a good policy. In this paper we look into reinforcement learning (RL) methods (Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998) that are able to address these issues, and we study their suitability for learning exercise policies for option pricing.

This paper makes two contributions. On the one hand, we derive a high-probability, finite-time bound on the performance of the least-squares policy iteration (LSPI) algorithm of Lagoudakis and Parr (2003) for option pricing, following the work of Antos et al. (2008).
This bound complements the results of Tsitsiklis and Van Roy (2001), who showed that it is crucial to use the distribution of price trajectories in order to avoid an exponential blow-up of the approximation error with the time horizon of the problem when considering one version of fitted Q-iteration. Here we also find that it is crucial to use this distribution, and we demonstrate a very mild dependence on the horizon. In comparison with the asymptotic results of Tsitsiklis and Van Roy (2001), our results make the dependence on both the approximation and the estimation errors (the latter due to the finiteness of the sample) explicit.
The second contribution is an empirical comparison of LSPI, fitted Q-iteration (FQI), proposed under the name of approximate value iteration by Tsitsiklis and Van Roy (2001), and the Longstaff-Schwartz method (LSM) (Longstaff and Schwartz 2001), the latter being a standard approach from the finance literature for pricing American options. Our results show that the RL techniques are superior to the standard approach: exercise policies discovered by LSPI and FQI can achieve larger payoffs than those found by LSM. Furthermore, for LSPI, FQI and LSM, policies discovered from sample paths composed directly from real data gain similar payoffs, when tested on separate data, to policies discovered from sample paths generated by simulation models whose parameters were estimated from the same dataset.

There is a vast literature on option pricing. For example, Hull (2006) provides an introduction to options and other financial derivatives and their pricing methods; Broadie and Detemple (2004) survey option pricing methods; and Glasserman (2004) provides a book-length treatment of Monte Carlo methods. However, most previous work focuses on obtaining the option price, not the exercise policy.

The remainder of this paper is organized as follows. First, we introduce basic concepts and notation for MDPs. Next, we present the option pricing problem, followed by the algorithms LSPI, FQI and LSM. We then develop finite-time bounds on the performance of the policy obtained with LSPI. After this we study the performance of the algorithms empirically, followed by our conclusions.

2 Markov decision processes

The problem of sequential decision making is common in economics, science and engineering. Many of these problems can be modeled as MDPs. A (discounted) MDP is defined by the 5-tuple $(S, A, P, r, \gamma)$: $S$ is a set of states; $A$ is a set of actions; $P$ is the transition probability kernel, with $P(\cdot \mid s, a)$ specifying the distribution of the next state when the process is in state $s$ and action $a$ is taken; $r$ is a reward function, with $r(s' \mid s, a)$ being the reward received when transitioning to state $s'$ from state $s$ after taking action $a$; and $\gamma \in (0, 1)$ is a discount factor.

A (stationary) policy $\pi$ is a rule for selecting actions based on the last state visited: $\pi(s, a)$ specifies the probability of selecting action $a$ in state $s$ while following policy $\pi$. We shall also use the notation $\pi(a \mid s)$ for these quantities. When $\pi(a \mid s) = 1$ for some action $a$ in each state $s$, the policy $\pi$ is called deterministic; in this case we identify it with a mapping from states to actions. An optimal policy maximizes the expected total discounted reward obtained over the long run, $\sum_{t=0}^{\infty} \gamma^t r_t$, where $r_t$ is the reward obtained in the $t$th time step. The action-value function of a policy $\pi$, $Q^\pi(s, a) = E[\sum_{t=0}^{\infty} \gamma^t r_t \mid s_0 = s, a_0 = a]$, assigns to the state-action pair $(s, a)$ the expected total discounted reward achieved when the initial state is $s$, the first action is $a$, and policy $\pi$ is followed thereafter, the expectation being taken with respect to the measure generated by $\pi$ and $P$. The optimal action-value function satisfies $Q^*(s, a) = \sup_\pi Q^\pi(s, a)$. A policy that in every state $s$ chooses an action maximizing $Q^*(s, a)$ is optimal in the sense that, for any state, no other policy can achieve a higher expected total discounted reward. $Q^*$ can be found by either policy iteration or value iteration, at least in finite, known MDPs (Bertsekas and Tsitsiklis 1996).
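To make the dynamic-programming background concrete, the following minimal sketch (not part of the original paper; the array shapes and the tolerance are assumptions of this example) computes $Q^*$ by value iteration for a small finite MDP with known transition kernel and rewards.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Compute Q* for a finite MDP.

    P: array of shape (S, A, S), P[s, a, s'] = transition probability.
    R: array of shape (S, A), expected immediate reward for (s, a).
    Returns the optimal action-value function Q* of shape (S, A).
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) max_a' Q(s',a')
        Q_new = R + gamma * P @ Q.max(axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q
```

A policy that is greedy with respect to the returned array (e.g. `Q.argmax(axis=1)`) is then optimal for that MDP.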
Value-function-based reinforcement learning techniques mimic these basic algorithms to learn $Q^*$ (or $Q^\pi$), but perform the calculations from sampled trajectories instead of exact calculations and employ function approximation to generalize to unseen states. One function approximation method that is both simple and powerful is a linear architecture, i.e., a linear combination of basis functions: $Q(s, a; w) = \sum_{i=1}^{d} \phi_i(s, a) w_i$, where $d$ is the number of basis functions, $\phi_i(\cdot, \cdot)$ is the $i$th basis function and $w_i \in \mathbb{R}$ is its weight, $w = (w_1, \ldots, w_d)^T$. If we define $\phi(s, a) = (\phi_1(s, a), \ldots, \phi_d(s, a))^T$, then we can write $Q(s, a; w) = w^T \phi(s, a)$.
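As an illustrative sketch only (the feature map `phi` is an assumed, user-supplied function, not something specified in the paper), a linear action-value architecture and the associated greedy action choice can be written in a couple of lines:

```python
import numpy as np

def q_value(w, phi, s, a):
    """Linear action-value estimate Q(s, a; w) = w^T phi(s, a)."""
    return w @ phi(s, a)

def greedy_action(w, phi, s, actions):
    """Action maximizing the linear action-value estimate in state s."""
    return max(actions, key=lambda a: q_value(w, phi, s, a))
```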
3 Learning exercise policies for American options

The problem of learning an exercise policy (in discrete time) can be formulated as an MDP, more specifically as a finite-horizon optimal stopping problem, as follows. (Since we discretize time, strictly speaking the options become Bermudan; as the number of time steps approaches infinity, the solution approaches that of an American option.) Let $T$ denote the number of stages, indexed by $t = 0, 1, \ldots, T-1$. We assume that the stock price $S_t$ at time $t$ underlying the option evolves according to a Markov process: $S_{t+1} \sim P_t(\cdot \mid S_t)$ for $t = 0, 1, \ldots, T-2$. Here $S_t \in \mathcal{S}_t$ and $S_0 \sim P_0(\cdot)$, where $P_0(\cdot)$ is a distribution over $\mathcal{S}_0$ and, for $S_t \in \mathcal{S}_t$, $P_t(\cdot \mid S_t)$ is a stochastic kernel over $\mathcal{S}_{t+1}$. Let $\mathcal{S} = \bigcup_{t=0}^{T-1} \mathcal{S}_t \times \{t\}$ and $\bar{\mathcal{S}} = \{e\} \cup \mathcal{S}$, where $e$ is a special state, called the exit state. Any $f$ with domain $\mathcal{S}$ can also be viewed as a sequence $(f_0, \ldots, f_{T-1})$ of functions, where the domain of $f_t$ is $\mathcal{S}_t$ and $f_t(S_t) = f((S_t, t))$; this equivalent representation will be used when convenient.

The exercise policy is a mapping $\pi$ from $\mathcal{S}$ to the two-element action set $A = \{0, 1\}$. (For simplicity we deal with deterministic Markov policies only, as this class of policies is large enough to contain an optimal policy.) The policy $\pi = (\pi_0, \ldots, \pi_{T-1})$ determines when to exercise the option: exercising happens when $\pi_t(S_t) = 1$. In this case the next state becomes the exit state $e$, which is absorbing. Otherwise, the next state is obtained from the stochastic kernel $P_t(\cdot \mid S_t)$. The rewards $r : \mathcal{S} \times A \to \mathbb{R}$ are determined as follows: when the option is exercised at time step $t$ in state $s \in \mathcal{S}_t$, the reward is $r((s, t), 1) = g(s)$; in all other cases the reward is zero. For an American stock put option, the payoff is $g(s) = \max(0, K - s)$, where $K > 0$ is the strike price. The returns are discounted with a discount factor $\gamma \in (0, 1)$.

4 Algorithms

We now present the algorithms LSPI, FQI and LSM.

4.1 LSPI for learning exercise policies

Policy iteration is a method for discovering an optimal solution of an MDP. LSPI combines the data efficiency of the least-squares temporal difference (LSTD) method (Bradtke and Barto 1996) with the policy-search efficiency of policy iteration. The algorithm uses a linear architecture for representing action-value functions, as described in Section 2. The weights are initialized by choosing $w^{(0)}$ arbitrarily. In iteration $i \ge 0$ we are given $w^{(i)}$, which implicitly defines $\pi^{(i)}$, the policy that selects in each state $s$ the action $a$ maximizing $Q^{(i)}(s, a) = (w^{(i)})^T \phi(s, a)$. Then LSTD is used to compute an approximation to the action-value function of this policy. This is done as follows. The data is assumed to be a trajectory (or a set of trajectories) of the form $(s_1, a_1, R_1, s_2, a_2, R_2, \ldots)$. Given the data, one computes $A^{(i)} = \sum_t \phi(s_t, a_t)\big(\phi(s_t, a_t) - \gamma\, \phi(s_{t+1}, \pi^{(i)}(s_{t+1}))\big)^T$ and $b = \sum_t \phi(s_t, a_t) R_t$ (note that $b$ can be computed once, before the iterations begin), and then solves $A^{(i)} w^{(i+1)} = b$ to obtain the next weight vector. In this way LSTD finds the solution of the TD(0) equilibrium equation $\sum_t \big(R_t + \gamma Q(s_{t+1}, \pi^{(i)}(s_{t+1}); w) - Q(s_t, a_t; w)\big)\, \phi(s_t, a_t) = 0$. The iteration is typically continued until the changes in the weight vector become small. The boundedness of LSPI with respect to the $L^\infty$-norm was established by Lagoudakis and Parr (2003). More recently, tighter finite-time bounds for continuous state spaces were obtained by Antos et al. (2008), who gave an interpretation of LSTD in terms of empirical loss minimization. Below we derive a finite-time bound for our proposed algorithm using the techniques developed in Antos et al. (2008).

We need to account for the peculiarities of option pricing when applying LSPI to it. Since the problem is an episodic optimal stopping problem, we incorporate time as a component of the state. The action space has two elements: exercise (stop) and continue. The state-action value of exercising the option, that is, the intrinsic value of the option, can be calculated exactly; hence, we only need to learn the state-action value function of continuation. Since the states in this case are price-time pairs of the form $(s, t)$, we let $Q((s, t), 0; w) = w^T \phi(s, t)$ and $Q((s, t), 1; w) = g(s)$. Let us assume that the data consists of sample trajectories $(S_t^{(j)};\ j = 1, \ldots, N,\ t = 0, \ldots, T-1)$, where $S_t^{(j)}$ is the value of the stock price at time $t$ in the $j$-th trajectory. Solving the TD(0) equilibrium equation for the weights of the redefined action-value function, we find that $w^{(i+1)}$ is defined by $A^{(i)} w^{(i+1)} = b^{(i)}$, where
$A^{(i)} = \sum_{t,j} \phi(S_t^{(j)}, t)\big(\phi(S_t^{(j)}, t) - \gamma\, I_{\{\pi^{(i)}(S_{t+1}^{(j)}) = 0\}}\, \phi(S_{t+1}^{(j)}, t+1)\big)^T$
and
$b^{(i)} = \gamma \sum_{t,j} \phi(S_t^{(j)}, t)\, I_{\{\pi^{(i)}(S_{t+1}^{(j)}) = 1\}}\, g(S_{t+1}^{(j)})$.
Here we exploited the fact that the immediate rewards are zero until the option is exercised. Further, $I_{\{L\}}$ denotes the indicator function, which is 1 when the argument $L$ is true and 0 otherwise.
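The update above can be sketched in code as follows. This is only an illustration under assumptions not stated in the paper: `phi(s, t)` is the continuation-value feature map, `g` is the intrinsic value (e.g. the put payoff), and the treatment of the final stage and of a possibly singular linear system is simplified.

```python
import numpy as np

def lspi_exercise_policy(paths, phi, g, gamma, n_iter=15, tol=1e-6):
    """LSPI for the optimal-stopping formulation: learn weights of the continuation value.

    paths: array of shape (N, T) holding prices S_t^(j) of N sample trajectories.
    Returns w such that Q((s, t), 0; w) = w^T phi(s, t); Q((s, t), 1) = g(s) is known exactly.
    """
    N, T = paths.shape
    d = len(phi(paths[0, 0], 0))
    w = np.zeros(d)
    for _ in range(n_iter):
        A = np.zeros((d, d))
        b = np.zeros(d)
        for j in range(N):
            for t in range(T - 1):
                s, s_next = paths[j, t], paths[j, t + 1]
                f, f_next = phi(s, t), phi(s_next, t + 1)
                # Greedy policy pi^(i) at the successor state: exercise iff the intrinsic
                # value is at least the current continuation-value estimate.
                if g(s_next) >= f_next @ w:
                    A += np.outer(f, f)              # successor features vanish on exercise
                    b += gamma * f * g(s_next)       # discounted payoff collected at t+1
                else:
                    A += np.outer(f, f - gamma * f_next)
        w_new, *_ = np.linalg.lstsq(A, b, rcond=None)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

The loop mirrors the stopping criterion used in the experiments: it ends when successive weight vectors (and hence policies) barely change, or after 15 iterations.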
4.2 FQI for learning exercise policies

In Section 6 of their paper, Tsitsiklis and Van Roy (2001) proposed the following algorithm. Let $Q^{(i)}((s, t), 0) = (w^{(i)})^T \phi(s, t)$ be the estimate of the value of the continuation action obtained in the $i$th step of the algorithm, where $\phi(s, t), w^{(i)} \in \mathbb{R}^d$. Let $A = \sum_{t,j} \phi(S_t^{(j)}, t)\, \phi(S_t^{(j)}, t)^T$ and $b^{(i)} = \gamma \sum_{t,j} \phi(S_t^{(j)}, t)\, \max\big(g(S_{t+1}^{(j)}),\, Q^{(i)}((S_{t+1}^{(j)}, t+1), 0)\big)$. Then $w^{(i+1)} = A^{-1} b^{(i)}$. In fact, $w^{(i+1)}$ is the solution of the least-squares problem
$\min_w \sum_{t,j} \big[\,Q((S_t^{(j)}, t), 0; w) - \gamma \max_{a \in A} Q^{(i)}((S_{t+1}^{(j)}, t+1), a)\,\big]^2$,
where we use $Q^{(i)}((s, t), 1) = g(s)$ for the sake of uniformity. Hence, this algorithm can be seen to implement a least-squares approach to fitted Q-iteration.
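A corresponding sketch of the least-squares fitted Q-iteration update (again illustrative, not code from the paper; `phi` and `g` are as before, with `g` assumed to accept NumPy arrays elementwise):

```python
import numpy as np

def fqi_exercise_policy(paths, phi, g, gamma, n_iter=15):
    """Fitted Q-iteration for the continuation value: one least-squares solve per iteration."""
    N, T = paths.shape
    # Features of all (price, time) pairs that have a successor, and of their successors.
    X = np.array([phi(paths[j, t], t) for j in range(N) for t in range(T - 1)])
    X_next = np.array([phi(paths[j, t + 1], t + 1) for j in range(N) for t in range(T - 1)])
    S_next = np.array([paths[j, t + 1] for j in range(N) for t in range(T - 1)])
    A = X.T @ X                        # this matrix does not change across iterations
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Backed-up target: discounted value of the better of exercising or continuing at t+1.
        y = gamma * np.maximum(g(S_next), X_next @ w)
        w, *_ = np.linalg.lstsq(A, X.T @ y, rcond=None)
    return w
```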
4.3 Least-squares Monte Carlo

LSM (Longstaff and Schwartz 2001) follows a backward-recursive dynamic programming approach. Let $\phi: \mathbb{R} \to \mathbb{R}^d$ be the basis functions used. The continuation value at stage $t$ is estimated using $Q((s, t), 0; w) = w^T \phi(s)$, except for the last stage $t = T-1$, where we set $Q((s, T-1), 0; w) = g(s)$. The continuation value at stage $t < T-1$ is then estimated by finding the minimizer $w_t$ of
$L_t(w) = \sum_{j=1}^{M} \big[\,Q((S_t^{(j)}, t), 0; w) - \max\big(g(S_{t+1}^{(j)}),\, \gamma\, Q((S_{t+1}^{(j)}, t+1), 0; w_{t+1})\big)\,\big]^2$.
Note that this is an ordinary least-squares problem in $w$, and the method can indeed be interpreted as solving the backward-recursive dynamic programming equations. While Longstaff and Schwartz (2001) obtained asymptotic results as the number of basis functions is increased, Tsitsiklis and Van Roy (2001) obtained (in the first part of their paper) asymptotic error bounds for this algorithm when the number of samples grows to infinity while the number of basis functions is kept fixed.
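For comparison, a sketch of the backward recursion described above, fitting a separate weight vector per stage (illustrative only; `phi_price` returns the price-only basis and `g` is assumed to be vectorized):

```python
import numpy as np

def lsm_exercise_policy(paths, phi_price, g, gamma):
    """Backward-recursive least-squares Monte Carlo: one weight vector per stage."""
    N, T = paths.shape
    d = len(phi_price(paths[0, 0]))
    weights = np.zeros((T - 1, d))
    # At the last stage the value is taken to be the intrinsic value g(s).
    cont_next = g(paths[:, T - 1])
    for t in range(T - 2, -1, -1):
        X = np.array([phi_price(s) for s in paths[:, t]])
        # Regression target: the better of exercising at t+1 or continuing from t+1,
        # following the loss L_t(w) in the text above.
        y = np.maximum(g(paths[:, t + 1]), gamma * cont_next)
        weights[t], *_ = np.linalg.lstsq(X, y, rcond=None)
        # Fitted continuation value at stage t, reused in the target one stage earlier.
        cont_next = X @ weights[t]
    return weights
```

Unlike the LSPI and FQI sketches above, LSM regresses on price features only and stores one weight vector per time step, which is exactly the representational difference discussed in Section 6.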
5 Theoretical results

In this section we develop finite-time bounds on the performance of the policy obtained with LSPI. Similar bounds can be developed for the other algorithms with similar arguments; this is left as future work.

Let $Q_{\max} = R_{\max}/(1-\gamma)$, where $R_{\max}$ is an upper bound on the reward function $r$. Let $\nu_t$ denote the distribution of prices at stage $t$ and let $\nu = (\nu_0, \ldots, \nu_{T-1})$. We assume that the training sample consists of a number of independently sampled price trajectories (correlation between subsequent trajectories is not considered for the sake of simplicity, but can be handled without any difficulties). Let $N$ be the number of these trajectories; hence, the total number of training samples is $NT$.

The bound will depend on the so-called total inherent Bellman error associated with the function space $\mathcal{F} = \{\, h : \bar{\mathcal{S}} \times A \to \mathbb{R} : h(s, 1) = r(s, 1),\ h(s, 0) = \theta^T \phi(s),\ \theta \in \mathbb{R}^d,\ h(e, \cdot) = 0 \,\}$, where $\phi: \mathcal{S} \to \mathbb{R}^d$ is the feature extraction method used in the procedure and $e$ is the exit state of the process. Let $\pi_Q$ be the policy that is greedy with respect to a bounded action-value function $Q$ (when multiple such policies exist, we choose one in a systematic way). Define
$E_\infty(\mathcal{F}) = \sup_{Q \in \mathcal{F}} \inf_{Q' \in \mathcal{F}} \| Q' - T^{\pi_Q} Q' \|_\nu$ and $E_1(\mathcal{F}) = \sup_{Q, Q' \in \mathcal{F}} \inf_{Q'' \in \mathcal{F}} \| Q'' - T^{\pi_Q} Q' \|_\nu$,
where $\|\cdot\|_\nu$ denotes the $L^2(\nu)$ norm. In Antos et al. (2008) the first quantity is called the inherent Bellman error of $\mathcal{F}$, while the second is called the inherent one-step Bellman error. They characterize the suitability of the features for representing (i) the fixed points of the policies that can be obtained in a policy iteration procedure (the set of accessible policies) and (ii) the one-step Bellman images of the available functions under the Bellman operators of accessible policies. We let $E(\mathcal{F}) = (E_\infty^2(\mathcal{F}) + E_1^2(\mathcal{F}))^{1/2}$. Following an argument in Munos and Szepesvári (2008), one can show that when the power of the feature set $\phi$ is increased, the total inherent Bellman error converges to zero, provided the kernels $P_t(\cdot \mid S_t)$ depend smoothly on $S_t$.

In the result below we need $S_c = \{(s, 0) : s \in \mathcal{S}\} \subset \bar{\mathcal{S}} \times A$. Our main result is as follows.

Theorem 1 (Bound on the performance of LSPI for pricing). Execute LSPI with a training set of size $n = NT$. Consider the policy $\pi_{M,n}$ that is greedy with respect to the action-value function obtained by LSPI after $M > 0$ iterations. Then, for any $0 < \delta < 1$, with probability at least $1 - \delta$,
$\big\| (Q^* - Q^{\pi_{M,n}})|_{S_c} \big\|_\nu \;\le\; \frac{2\gamma}{(1-\gamma)^2}\Big( E(\mathcal{F}) + \epsilon_n + \gamma^{M/2} R_{\max} \Big)$.
Here $\epsilon_n = C\,\big(\Lambda \max(\Lambda, 1)/n\big)^{1/4}$, where $\Lambda = d \log(n/(1-\gamma)) + \log(M/\delta) + T$ and $C > 0$ is a universal constant.

The theorem follows by some adjustments to the proof of Theorem 4 of Antos et al. (2008) and by some calculations in which we estimate the constants in that bound. In the following we explain the bound; highlights of the proof are given in the Appendix. First, note that when $n \to \infty$,
$\big\| (Q^* - Q^{\pi_{M,\infty}})|_{S_c} \big\|_\nu \;\le\; \frac{2\gamma}{(1-\gamma)^2}\Big( E(\mathcal{F}) + \gamma^{M/2} R_{\max} \Big)$.

Here we see the factor $2\gamma/(1-\gamma)^2$ that is usual in results that use contraction arguments to bound the performance of a policy that is greedy with respect to an approximate value function (see Bertsekas and Tsitsiklis (1996)). The term $E(\mathcal{F})$, as explained above, depends only on the features and bounds the approximation error. The two terms contributing to $E(\mathcal{F})$ arise from rewriting the LSTD fixed-point equation in terms of an empirical loss function that has two corresponding terms. The supremum over $Q$ in the definition of these quantities arises because, after the first iteration, all we can know about the policy to be evaluated is that it is greedy with respect to some function in our chosen function set; the reason for the supremum over $Q'$ in the definition of $E_1^2(\mathcal{F})$ is similar. Finally, $\gamma^{M/2} R_{\max}$ bounds the error due to running the algorithm for only a finite number of iterations. We also see that the rate of convergence as a function of the number of samples is $n^{-1/4}$, which, considering the squared error, gives the familiar rate $n^{-1/2}$.

The nice property of this bound, compared to the general bound of Antos et al. (2008), is that in their bound a so-called concentration coefficient (denoted there by $C_{\rho,\nu}$) appears in a multiplicative manner.
This quantity depends on the MDP and is typically difficult to control. Its role is to bound the error resulting from changing measures, which arises since the distribution of the training samples typically differs both from the test distribution and from the stationary distributions of the policies encountered during the iterations of the algorithm. Interestingly, in our case we are able to bound this constant by 1. This is because of the special nature of stopping problems: consider a stochastic policy $\pi$ and the distribution $\mu_\pi$ defined over the state-action pairs that is obtained, for a given $t$, by sampling the prices from their underlying distribution (not caring about whether $\pi$ could have stopped before) and then sampling the actions from $\pi$. Imagine that we follow the process for one step from $\mu_\pi$ (thus the action probabilities come from $\pi$) and, upon reaching the next state, we generate actions at random from some other stochastic policy $\pi'$. Let $\mu_{\pi,\pi'}$ be the resulting distribution. Then, since $\pi$ either exits or stays in, i.e., it can only decrease the probability mass carried to the next stage, the marginal of $\mu_{\pi,\pi'}$ on the price-time pairs can be upper bounded by the corresponding marginal of $\mu_{\pi'}$. Hence, if the distributions used to sample the prices and to evaluate the performance of the policy are the same, as in all the procedures described above, the concentration coefficient can be upper bounded by 1.

Note that our result, just like the one for the LSM algorithm by Tsitsiklis and Van Roy (2001), bounds an $L^2$ error. However, our bound is for the value function of the learned policy, while the bound in Tsitsiklis and Van Roy (2001) is for the action-value function estimate. Further, the bound in Tsitsiklis and Van Roy (2001) considers only the approximation error (i.e., the case $n \to \infty$), and the proof is considerably simpler given that the solutions in LSM are defined by a backward recursion. In fact, the bounding technique in Tsitsiklis and Van Roy (2001) could possibly be used to bound $E(\mathcal{F})$.

6 Empirical study

We study empirically the performance of LSPI, FQI and LSM on plain American put stock options. We focus on at-the-money options, that is, the strike price equals the initial stock price. For simplicity, we assume the risk-free interest rate $r$ is constant and the stocks are non-dividend-paying. We assume 252 trading days in each year. We study options with quarterly, semi-annual and annual maturity terms, of 63, 126 and 252 days duration respectively. Each time step is one trading day, that is, 1/252 of a trading year. We set the discount factor to $\gamma = e^{-r/252}$, corresponding to the daily interest rate. LSPI and FQI iterate on the sample paths until the difference between two successive policies is sufficiently small, or until 15 iterations have been run (LSPI and FQI usually converge in 4 or 5 iterations). We obtain five years of daily stock prices, from January 2002 to December 2006, for the Dow Jones 30 companies from WRDS (Wharton Research Data Services).

6.1 Simulation models

In our experiments, when a simulation model is used, synthetic data may be generated from either a geometric Brownian motion (GBM) model or a generalized autoregressive conditional heteroskedasticity (GARCH) model, two of the most widely used models for stock price movement (Hull 2006).
Geometric Brownian motion model. Suppose $S_t$, the stock price at time $t$, follows a GBM: $dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$, where $\mu$ is the risk-neutral expected stock return, $\sigma$ is the stock volatility and $W$ is a standard Brownian motion. For a non-dividend-paying stock, $\mu = r$, the risk-free interest rate. In practice it is usually more accurate to simulate $\ln S_t$. Using Itô's lemma, the process followed by $\ln S_t$ is $d \ln S_t = (\mu - \sigma^2/2)\, dt + \sigma\, dW_t$. We can obtain the following discretized version and use it to generate stock price sample paths: $S_{t+1} = S_t \exp\{(\mu - \sigma^2/2)\,\Delta t + \sigma \sqrt{\Delta t}\, \varepsilon\}$, where $\Delta t$ is a small time step and $\varepsilon \sim N(0, 1)$, the standard normal distribution. To estimate the constant $\sigma$ from real data, we use maximum likelihood estimation (MLE).

GARCH model. In a GBM the volatility is assumed to be constant; in reality, the volatility may itself be stochastic. We use GARCH(1,1) as a stochastic volatility model: $\sigma_t^2 = \omega + \alpha u_{t-1}^2 + \beta \sigma_{t-1}^2$, where $u_t = \ln(S_t/S_{t-1})$, and $\alpha$ and $\beta$ are the weights of $u_{t-1}^2$ and $\sigma_{t-1}^2$ respectively. Stability of GARCH(1,1) requires $\alpha + \beta < 1$. The constant $\omega$ is related to the long-term average volatility $\sigma_L$ by $\omega = (1 - \alpha - \beta)\sigma_L^2$. The discretized price process is $S_{t+1} = S_t \exp\{(\mu - \sigma_t^2/2)\,\Delta t + \sigma_t \sqrt{\Delta t}\, \varepsilon\}$. To estimate the parameters of a GARCH model and to generate sample paths, we use the MATLAB GARCH toolbox functions garchfit and garchsim.
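As an illustration only (not code from the paper; the argument names and shape conventions are assumptions of this sketch), the GBM discretization above can be turned into a path simulator in a few lines:

```python
import numpy as np

def simulate_gbm_paths(s0, r, sigma, n_paths, n_steps, dt=1.0 / 252.0, seed=0):
    """Generate price paths via S_{t+1} = S_t * exp((r - sigma^2/2) dt + sigma sqrt(dt) eps).

    Returns an array of shape (n_paths, n_steps + 1) whose first column equals s0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_paths, n_steps))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * eps
    log_paths = np.cumsum(log_increments, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
```

GARCH training paths would be generated analogously, with the constant sigma replaced by the recursively updated $\sigma_t$ (the paper relies on the MATLAB functions garchfit and garchsim for this).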
6.2 Basis functions

LSPI, FQI and LSM need basis functions to approximate the expected continuation value. As suggested in Longstaff and Schwartz (2001), we use $\phi_0(S) = 1$ and the following Laguerre polynomials to generalize over the stock price: $\phi_1(S') = \exp(-S'/2)$, $\phi_2(S') = \exp(-S'/2)(1 - S')$, and $\phi_3(S') = \exp(-S'/2)(1 - 2S' + S'^2/2)$. We use $S' = S/K$ instead of $S$ in the basis functions, where $K$ is the strike price, since the function $\exp(-S/2)$ goes to zero quickly. LSPI and FQI also generalize over time $t$. We use the following functions of time: $\phi_0^t(t) = \sin(t\pi/(2T) + \pi/2)$, $\phi_1^t(t) = \ln(T - t)$ and $\phi_2^t(t) = (t/T)^2$, guided by the observation that the optimal exercise boundary for an American put option is a monotonically increasing function, as shown in Duffie (2001).

American put options. As already noted, the intrinsic value of an American stock put option is $g(S) = \max(0, K - S)$. LSM uses the functions $\phi_0(S)$, $\phi_1(S)$, $\phi_2(S)$ and $\phi_3(S)$, and computes a different weight vector for these basis functions at each time step. LSPI and FQI use the functions $\phi_0(S, t) = \phi_0(S)$, $\phi_1(S, t) = \phi_1(S)$, $\phi_2(S, t) = \phi_2(S)$, $\phi_3(S, t) = \phi_3(S)$, $\phi_4(S, t) = \phi_0^t(t)$, $\phi_5(S, t) = \phi_1^t(t)$ and $\phi_6(S, t) = \phi_2^t(t)$, and determine a single weight vector over all time steps to calculate the continuation value. Thus LSM uses only the 4 features over prices (giving altogether $4(T-2)$ weights), while LSPI and FQI use both sets of basis functions (i.e., 7 weights). Note that neither LSPI nor FQI models correlations between time and stock prices directly, i.e., the decomposition of the continuation values corresponds to the first level of an ANOVA decomposition.
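A direct transcription of these basis functions as a sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def price_features(S, K):
    """Laguerre-polynomial features of the normalized price S' = S/K (used by all methods)."""
    x = S / K
    e = np.exp(-x / 2.0)
    return np.array([1.0, e, e * (1.0 - x), e * (1.0 - 2.0 * x + 0.5 * x * x)])

def time_features(t, T):
    """Features of time used by LSPI and FQI to generalize across time steps."""
    return np.array([np.sin(t * np.pi / (2.0 * T) + np.pi / 2.0),
                     np.log(T - t),
                     (t / T) ** 2])

def phi(S, t, K, T):
    """Seven-dimensional price-time feature vector for LSPI/FQI, as described in the text."""
    return np.concatenate([price_features(S, K), time_features(t, T)])
```

With the strike $K$ and horizon $T$ fixed (e.g. via functools.partial), this yields the two-argument `phi(s, t)` assumed in the earlier sketches.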
6.3 Results for American put options

For real data, a pricing method can learn an exercise policy either (1) from sample paths generated by a simulation model, or (2) from sample paths composed directly from real data. The testing sample paths are taken from real data and are separate from the data used in training. We scale the stock prices so that, for each company, the initial price of each training path and each testing path equals the first price of the company's whole price series.

In the first approach, the simulation model for the underlying stock process follows a GBM model or a GARCH model, with parameters estimated from real data. For options with quarterly, semi-annual and annual maturities respectively, the first 662, 625 and 751 stock prices are used for estimating parameters. Then LSPI, FQI and LSM learn exercise policies with (the same) 50,000 sample paths. We refer to these variants, which generate sample paths from a simulation model with parameters estimated from real data, as LSPI-gbm, LSPI-garch, FQI-gbm, FQI-garch, LSM-gbm and LSM-garch, respectively.

In the second approach, a pricing method learns the exercise policy from sample paths composed directly from real data. Due to the scarcity of real data, as there is only a single trajectory of stock prices for each company, we construct multiple trajectories with a windowing technique. For each company, for the quarterly, semi-annual and annual maturity terms, we obtain 600, 500 and 500 training paths, each with a duration of 63, 126 and 252 prices respectively. The first path consists of the first duration days of stock prices; we then move one day ahead to obtain the second path, and so on. LSPI, FQI and LSM then learn exercise policies on these training paths. We refer to these variants, which generate sample paths directly from real data, as LSPI-data, FQI-data and LSM-data, respectively.

After the exercise policies are found, we compare their performance on the test paths. For each company, for the quarterly, semi-annual and annual maturity terms, we obtain 500, 450 and 250 testing paths, each with a duration of 63, 126 and 252 prices, as follows: the first path consists of the last duration days of stock prices; we then move one day back to obtain the second path, and so on.

For each maturity term of each of the Dow Jones 30 companies, we average payoffs over the testing paths, and then average these per-company averages over the 30 companies. Table 1 presents the average results. They show that LSPI and FQI gain larger average payoffs than LSM; these differences are (highly) significant when tested by a paired two-tailed t-test. The differences between the performances of LSPI and FQI are not significant. A likely explanation for why LSPI and FQI gain larger payoffs than LSM is that LSPI and FQI generalize across all time steps and use a parsimonious representation, whereas LSM uses many weights and does not attempt to generalize across time. Since the training data is typically limited, the parsimonious representation can reduce the estimation error, even though the approximation error of this representation might be larger (see Theorem 1 for the trade-off between the two terms).

The results in Table 1 also show that LSPI-data slightly outperforms both LSPI-gbm and LSPI-garch, and the same holds for FQI. However, a paired t-test indicates that the observed differences are not significant (for the annual maturity term the difference between the results obtained on the MLE data and the real data is marginal, but still not significant). The differences in the performance obtained with LSM when it is trained on real data and when it is trained on model-generated data are closer to significant in all cases (p ≈ 0.25). Hence, in this case it seems that the models do not introduce significant biases, or that these biases are compensated by the larger number of trajectories available for training the exercise policies.

Table 1: Average payoffs of LSPI, FQI and LSM on real data for the Dow Jones 30 companies (rows: quarterly, semi-annual and annual maturities; columns: each method trained on GBM-simulated, GARCH-simulated and real-data sample paths).
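A hedged sketch of the kind of evaluation loop behind these averages (assuming the learned weight vector `w` and feature map `phi` from the earlier sketches; the exact tie-breaking and in-the-money condition are our assumptions, not details given in the paper):

```python
import numpy as np

def average_payoff(test_paths, w, phi, g, gamma):
    """Average discounted payoff of the greedy exercise policy on a set of test paths."""
    N, T = test_paths.shape
    total = 0.0
    for j in range(N):
        for t in range(T):
            s = test_paths[j, t]
            continuation = phi(s, t) @ w if t < T - 1 else 0.0  # no continuation at maturity
            if g(s) >= continuation and g(s) > 0.0:
                total += gamma**t * g(s)   # exercise: collect the discounted intrinsic value
                break                      # option exercised; stop scanning this path
    return total / N
```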
Figure 1: Exercise boundaries discovered by LSPI, FQI and LSM (stock price versus time in trading days) for the real data of Intel, with semi-annual maturity.

In Figure 1 we present the exercise boundaries discovered by LSPI, FQI and LSM for the real data of Intel, with semi-annual maturity. The optimal exercise boundary for an American put option is a monotonically increasing function, as shown in Duffie (2001). Figure 1 shows that, unlike the boundary discovered by LSM, the exercise boundaries discovered by LSPI and FQI are smooth and (mostly) respect monotonicity. The scarcity of sample paths may explain the non-monotonicity. The boundary of FQI is lower than that of LSPI, which is typical in our experience; we are currently investigating the cause. With 50,000 training paths (with simulated data), the exercise boundaries discovered by LSM become much smoother, but remain choppy.

We also evaluate LSPI, FQI and LSM on synthetic sample paths, with the parameters for the GBM model and the GARCH model estimated from real data. For each company, we generate 50,000 training paths and 10,000 testing paths. The results in Table 2 indicate that LSPI and FQI gain larger payoffs than LSM, both in the GBM model and in the GARCH model (the interest rate is r = 0.03). In the GBM model the difference between LSM and the two RL methods is smaller.

Table 2: Average payoffs on synthetic data with parameters estimated from real data (rows: quarterly, semi-annual and annual maturities; columns: LSPI, FQI and LSM under the GBM and GARCH models).

Besides American put stock options, we also study American Asian stock options, which are complex, exotic, path-dependent options whose payoffs are determined by the average of past stock prices. The results are similar.

7 Conclusions

Options are important financial instruments, and pricing complex options is a non-trivial problem. Our empirical results show that methods developed in RL can advance the state of the art in this challenging application area and demonstrate that computational finance remains a promising area for future applications of RL methods. However, much remains to be done. The theoretical analysis presented leaves open the question of how to choose the function approximation technique. When data is scarce, controlling both the estimation and the approximation error becomes important. This calls for model-selection techniques. A simple solution is to generate policies with different function approximators and compare their performance. Unlike in a general MDP, this is possible here because, given the price trajectories, one can simulate any policy on real data. Hence, separating out some test data should make it possible to do model selection in an efficient way. This is left as future work.

Appendix

Here we sketch the proof of Theorem 1, which refines the proof of Theorem 4 in Antos et al. (2008) for the LSPI option pricing algorithm. First note that, in light of Proposition 2 of Antos et al. (2008), that theorem can indeed be applied to LSPI. The major modification is that, due to the finite-horizon nature of the problem, all the work has to be done with sequences of sub-probability measures (of length $T$) instead of probability measures over $\mathcal{S}$. Accordingly, the stochastic kernels $P^\pi$ associated with policies must be redefined. In particular, $P^\pi$, as a left-linear operator, maps a sub-probability measure sequence $(\mu_0, \ldots, \mu_{T-1})$ to a sub-probability measure sequence $(\mu'_0, \ldots, \mu'_{T-1})$ as follows (here the measures $\mu_t, \mu'_t$ are over $\mathcal{S}_t \times A$): first, let $\bar{\mu}_t = \mu_t / \mu_t(\mathcal{S}_t \times A)$, and let $U \subset \mathcal{S}_{t+1} \times A$ be an arbitrary measurable set. Then $\mu'_{t+1}(U) = P((S'_t, A'_t) \in U)$.
Here the random variables $S'_t, A'_t$ are defined as follows: let $(S_t, A_t) \sim \bar{\mu}_t(\cdot)$, let $S'_t \sim P_t(\cdot \mid S_t)$ if $A_t = 0$ and $S'_t = e$ otherwise, and let $A'_t = \pi(S'_t)$. Further, $\mu'_0(U) = P((S_0, A_0) \in U)$, where $S_0 \sim P_0(\cdot)$ and $A_0 = \pi(S_0)$. The kernel $P^\pi$ is also interpreted as a right-linear operator acting on value functions.
For this, first define the integral of $Q \in B(\bar{\mathcal{S}} \times A)$ with respect to $\mu = (\mu_0, \ldots, \mu_{T-1})$ as above: $\mu Q = (\mu_0 Q_0, \ldots, \mu_{T-1} Q_{T-1})$, where, as usual, for a measure $\mu_t$ over $\mathcal{S}_t \times A$ and a function $Q_t$ over the same domain, $\mu_t Q_t = \int Q_t(s, a)\, \mu_t(ds, da)$. Then $Q' = P^\pi Q \in B(\bar{\mathcal{S}} \times A)$ is defined by $Q'(e, a) = 0$ and $Q'((s, t), a) = (\delta_{(s,a)} P^\pi) Q$ for $s \in \mathcal{S}_t$, $t = 0, \ldots, T-1$, where $\delta_{(s,a)}$ denotes the point mass at $(s, a)$ placed at stage $t$. These definitions allow us to repeat the proof of Theorem 4 of Antos et al. (2008) (in particular, they make Lemma 12 of Antos et al. (2008) hold). One more change that is required is to redefine the $m$-step future-state concentrability coefficients $c_{\rho,\nu}(m)$ (needed in the proof of Lemma 12 and influencing the bound of Theorem 4 of Antos et al. (2008)). The modified definition is
$c_{\rho,\nu}(m) = \sup_{\pi_1, \ldots, \pi_m} \big\| \tfrac{d}{d\nu}\big(\rho P^{\pi_1} \cdots P^{\pi_m}\big)\big|_{S_c} \big\|_\infty$,
where $\rho$ is a distribution sequence over the state-action pairs, $\nu$ is a distribution sequence over $\mathcal{S}$, and $S_c = \{(s, 0) : s \in \mathcal{S}\} \subset \bar{\mathcal{S}} \times A$, which can be (and is) identified with the set $\mathcal{S}$.

The main observation is the following. Let $\pi$ be a policy and let $\hat{\mu}_\pi$ be such that $\hat{\mu}_{\pi,t}(U) = P((\tilde{S}_t, \tilde{A}_t) \in U)$, where $\tilde{S}_t$ is the price process obtained with the kernels $(P_t)_{t \ge 0}$ and $\tilde{A}_t = \pi(\tilde{S}_t)$. Then, for any policies $\pi, \pi'$, it follows directly from the definitions that $\hat{\mu}_\pi P^{\pi'} \le \hat{\mu}_{\pi'}$, where for sequences the comparison is componentwise and, for measures $\nu, \rho$ defined over a common domain, $\nu \le \rho$ means $\nu(U) \le \rho(U)$ for every measurable set $U$. Hence, if $\nu = \rho = \hat{\mu}_{\pi_c}$, where $\pi_c$ always selects action 0, then $\rho P^{\pi_1} = \hat{\mu}_{\pi_c} P^{\pi_1} \le \hat{\mu}_{\pi_1}$, and thus $\rho P^{\pi_1} P^{\pi_2} \le \hat{\mu}_{\pi_1} P^{\pi_2} \le \hat{\mu}_{\pi_2}$. Continuing this way we get $\rho P^{\pi_1} \cdots P^{\pi_m} \le \hat{\mu}_{\pi_m}$, and hence $(\rho P^{\pi_1} \cdots P^{\pi_m})|_{S_c} \le \hat{\mu}_{\pi_m}|_{S_c} \le \hat{\mu}_{\pi_c}$. With this choice, therefore, $c_{\rho,\nu}(m) \le 1$, which gives $C_{\rho,\nu} = (1-\gamma)^2 \sum_{m \ge 1} m \gamma^{m-1} c_{\rho,\nu}(m) \le 1$. Further, one can show that the $\beta$-mixing condition of Theorem 4 is satisfied by choosing $\kappa = 1$, $b = 1$ and $\beta$ with $\log(1/\beta) = T$ (this is the origin of the $T$ term in $\Lambda$). The pseudo-dimension of the function spaces involved and their VC-crossing dimension (by Proposition 3 of Antos et al. (2008)) are equal to $d$, which again appears in $\Lambda$.

References

Andras Antos, Csaba Szepesvari, and Remi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71:89-129, 2008.

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Massachusetts, USA, 1995.

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Massachusetts, USA, 1996.

Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.

Mark Broadie and Jerome B. Detemple. Option pricing: valuation models and applications. Management Science, 50(9), 2004.

Darrell Duffie. Dynamic Asset Pricing Theory. Princeton University Press, 2001.

Paul Glasserman. Monte Carlo Methods in Financial Engineering. Springer-Verlag, New York, 2004.

John C. Hull. Options, Futures and Other Derivatives (6th edition). Prentice Hall, 2006.

Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4, 2003.

Francis A. Longstaff and Eduardo S. Schwartz. Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies, 14(1), 2001.
R. Munos and Cs. Szepesvári. Finite time bounds for fitted value iteration. Journal of Machine Learning Research, 9, 2008.

Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, 1994.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

John N. Tsitsiklis and Benjamin Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks (special issue on computational finance), 12(4), 2001.