Monte-Carlo Methods for Risk Management


IEOR E4602: Quantitative Risk Management, Spring 2010
© 2010 by Martin Haugh

In these lecture notes we discuss Monte-Carlo (MC) techniques that are particularly useful in risk-management applications. We focus on importance sampling and stratified sampling, both of which are variance reduction techniques that can be very useful in estimating risk measures associated with rare events. Because applications to risk management tend to be quite involved, we will only discuss one such application in these lecture notes. But see Chapter 9 of Glasserman's Monte-Carlo Methods in Financial Engineering (2004) for further applications of importance and stratified sampling to credit risk and the estimation of risk measures in both light- and heavy-tailed settings. We also discuss the simulation of stochastic differential equations (SDEs) and low-discrepancy sequences (LDS), both of which can also be used in risk-management applications. The simulation of SDEs is often necessary in pricing applications and in evaluating model risk. As we discussed in the Model Risk lecture notes, we should generally price exotic derivatives using a range of different but plausible models. In many instances, pricing can only be accomplished by simulating the underlying SDEs. Low-discrepancy sequences have also become quite popular in MC applications, particularly when regular MC methods are too demanding from a computational viewpoint. As risk-management applications are often very computationally demanding, LDS methods can also be of use here. We assume that readers are already familiar with Monte-Carlo simulation and know, in particular, how to generate random variables and analyze simulation output. We also assume that readers have had some exposure to variance reduction methods.

1 Importance Sampling

Suppose we wish to estimate θ := P(X ≥ 20) = E[I_{X ≥ 20}] where X ~ N(0, 1). The raw estimator given by the usual Monte-Carlo approach is obtained as follows:

1. Generate X_1, ..., X_n IID N(0, 1)
2. Set I_j = I_{X_j ≥ 20} for j = 1, ..., n
3. Set θ̂_n = Σ_{j=1}^n I_j / n
4. Compute approximate CIs

For this problem, however, the usual approach would be completely inadequate, since approximating θ to any reasonable degree of accuracy would require n to be inordinately large. For example, on average we would have to set n ≈ 1/θ in order to obtain just one non-zero value of I_j (see Example 1). Clearly this is impractical, and a much smaller value of n would have to be used. Using a much smaller value of n, however, would almost inevitably result in an estimate θ̂_n = 0 and an approximate confidence interval [L, U] = [0, 0]! So the naive Monte-Carlo approach does not work. Before proceeding any further, it is not unreasonable to ask why such a problem would be important. After all, if you want to estimate θ = P(X ≥ 20), isn't it enough to know that θ is very close to 0? Put another way, who cares whether θ = 10^{-89} or θ = 10^{-90}? For many problems this is a valid objection, and indeed for such

problems the answer is that we don't care. However, for many other problems it is very important to know θ to a much greater level of accuracy. For example, suppose we are designing a nuclear power plant and we want to estimate the probability, θ, of a meltdown occurring sometime in the next 100 years. We would expect θ to be very small, even for a poorly designed power plant. However, this is not enough: should a meltdown occur, the consequences could be catastrophic, and so we need to know θ to a high degree of accuracy. For another example, suppose we want to price a deep out-of-the-money option using simulation. Then the price of the option will be very small, perhaps lying between 0.1 cents and 10 cents. Clearly a bank is not going to suffer if it misprices an option and sells it for 0.1 cents when the correct value is 10 cents. But what if the bank sells 1 million of these options? And what if the bank makes similar trades several times a week? Then it becomes very important to price the option correctly. A particularly rich source of examples can be found in risk management, where we seek to estimate risk measures such as the VaR_α or ES_α of a given portfolio. Note that these examples require estimating the probability of a rare event. Even though the events are rare, they are very important because when they do occur their impact can be very significant. We will study importance sampling, a variance reduction technique that can be invaluable when estimating rare-event probabilities and expectations.[2]

The Importance Sampling Estimator

Suppose we wish to estimate θ = E_f[h(X)] where X has PDF[3] f. Let g be another PDF with the property that g(x) ≠ 0 whenever f(x) ≠ 0. That is, g has the same support as f. Then

θ = E_f[h(X)] = ∫ h(x)f(x) dx = ∫ [h(x)f(x)/g(x)] g(x) dx = E_g[h(X)f(X)/g(X)]

where E_g[·] denotes an expectation with respect to the density g. This has very important implications for estimating θ.
The original simulation method is to generate n samples of X from the density f and set θ̂_n = Σ_{j=1}^n h(X_j)/n. An alternative method, however, is to generate n values of X from the density g and set

θ̂_{n,is} = Σ_{j=1}^n h(X_j) f(X_j) / (n g(X_j)).

θ̂_{n,is} is then[4] an importance sampling estimator of θ. We often define

h*(X) := h(X) f(X) / g(X)

so that θ = E_g[h*(X)]. We refer to f and g as the original and importance sampling densities,[5] respectively. We also refer to f/g as the likelihood ratio.

Example 1  Consider again the problem where we want to estimate θ = P(X ≥ 20) = E[I_{X ≥ 20}] when X ~ N(0, 1). We may then write

θ = E[I_{X ≥ 20}]
  = ∫ I_{x ≥ 20} (1/√(2π)) e^{-x^2/2} dx
  = ∫ I_{x ≥ 20} [e^{-x^2/2} / e^{-(x-µ)^2/2}] (1/√(2π)) e^{-(x-µ)^2/2} dx
  = ∫ I_{x ≥ 20} e^{-µx + µ^2/2} (1/√(2π)) e^{-(x-µ)^2/2} dx
  = E_µ[ I_{X ≥ 20} e^{-µX + µ^2/2} ]

[2] It can of course also be used for estimating non-rare-event probabilities and expectations, but it is not as useful in such circumstances. An exception to this statement is when importance sampling is easier than the regular simulation method.
[3] Or PMF, if X is a discrete random variable.
[4] We will see later where the name importance sampling comes from.
[5] Obviously if X were discrete we would use mass functions in place of density functions.

where now X ~ N(µ, 1) and e^{-µX + µ^2/2} is the likelihood ratio. If we set µ = 20, for example, and use n = 1 million samples, then we find an approximate 95% confidence interval for θ is given by [2.75 × 10^{-89}, 2.778 × 10^{-89}]. We can of course also estimate expectations using importance sampling.

Example 2  Suppose we wish to estimate θ = E[X^4 e^{X^2/4} I_{X ≥ 2}] where X ~ N(0, 1). Then the same argument as before implies that we may also write θ = E_µ[X^4 e^{X^2/4} e^{-µX + µ^2/2} I_{X ≥ 2}] where now X ~ N(µ, 1).

The General Formulation

Let X = (X_1, ..., X_n) be a random vector with joint PDF f(x_1, ..., x_n) and suppose we wish to estimate θ = E_f[h(X)]. Let g(x_1, ..., x_n) be another PDF such that g(x) ≠ 0 whenever f(x) ≠ 0. Then

θ = E_f[h(X)] = ∫_{x_1} ... ∫_{x_n} h(x_1, ..., x_n) f(x_1, ..., x_n) dx_1 ... dx_n
  = ∫_{x_1} ... ∫_{x_n} h(x_1, ..., x_n) [f(x_1, ..., x_n)/g(x_1, ..., x_n)] g(x_1, ..., x_n) dx_1 ... dx_n
  = E_g[h*(X)]

where h*(X) := h(X)f(X)/g(X). Again we have two methods for estimating θ: the original method, where we simulate with respect to the density f, and the importance sampling method, where we simulate with respect to the density g.

Example 3  Suppose we wish to estimate θ = P(Σ_{i=1}^n X_i^2 ≥ 50) where the X_i's are IID N(0, 1). Then θ = E[h(X)] where h(x) := I_{Σ x_i^2 ≥ 50} and X := (X_1, ..., X_n). We could estimate θ using importance sampling as follows:

θ = E[h(X)]
  = ∫_{x_1} ... ∫_{x_n} [e^{-x_1^2/2}/√(2π)] ... [e^{-x_n^2/2}/√(2π)] I_{Σ x_i^2 ≥ 50} dx_1 ... dx_n
  = ∫_{x_1} ... ∫_{x_n} σ^n e^{-(x_1^2/2)(1 - 1/σ^2)} ... e^{-(x_n^2/2)(1 - 1/σ^2)} I_{Σ x_i^2 ≥ 50} [e^{-x_1^2/2σ^2}/(√(2π)σ)] ... [e^{-x_n^2/2σ^2}/(√(2π)σ)] dx_1 ... dx_n
  = E_g[ σ^n e^{-(1/2)(1 - 1/σ^2) Σ_{i=1}^n X_i^2} I_{Σ X_i^2 ≥ 50} ] = E_g[h*(X)]

where E_g[·] denotes expectation under the multivariate normal distribution with X ~ MVN(0, σ^2 I_n).
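In code, the mean-shift estimator of Example 1 might be sketched as follows. The shift µ = 20 centres the sampling density at the barrier; the sample size is an illustrative choice.

```python
import math, random

# Sketch of the mean-shifted estimator: estimate theta = P(X >= 20) for
# X ~ N(0,1) by sampling X ~ N(mu,1) and weighting each sample that clears
# the barrier by the likelihood ratio e^{-mu X + mu^2/2}.

def tail_prob_is(mu, n, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)                         # X ~ N(mu, 1)
        if x >= 20.0:
            total += math.exp(-mu * x + 0.5 * mu * mu)  # likelihood ratio
    return total / n

theta_hat = tail_prob_is(mu=20.0, n=200000)
# theta_hat should be close to the true value, roughly 2.75e-89; the naive
# estimator would return 0 for any feasible sample size.
```

Note that the weight e^{-µx + µ^2/2} is astronomically small on the event {X ≥ 20}; the estimator works precisely because almost every sample now lands in the region that matters.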
Thus far we have not addressed the issue of how to choose a good sampling density, g, so that we obtain a variance reduction when we sample from g instead of f. We will address this question in the next two sections and also explain the term importance sampling.

Obtaining a Variance Reduction

As before, suppose we wish to estimate θ = E_f[h(X)] where X is a random vector with joint PDF f. We will assume that h(x) ≥ 0. Now let g be another density with support equal to that of f. Then we know

θ = E_f[h(X)] = E_g[h*(X)]

and this gives rise to two estimators:

1. h(X) where X ~ f, and
2. h*(X) where X ~ g.

The variance of the importance sampling estimator is given by

Var_g(h*(X)) = ∫ h*(x)^2 g(x) dx - θ^2 = ∫ h(x)^2 [f(x)/g(x)] f(x) dx - θ^2

while the variance of the original estimator is given by

Var_f(h(X)) = ∫ h(x)^2 f(x) dx - θ^2.

So the reduction in variance[6] is then given by

Var_f(h(X)) - Var_g(h*(X)) = ∫ h(x)^2 (1 - f(x)/g(x)) f(x) dx.    (1)

In order to achieve a variance reduction, the integral in (1) should be positive. For this to happen, we would like

1. f(x)/g(x) > 1 when h(x)^2 f(x) is small, and
2. f(x)/g(x) < 1 when h(x)^2 f(x) is large.

Now the important part of the density f could plausibly be defined to be that region, A say, in the support of f where h(x)f(x) is large. But, by the above observation, we would like to choose g so that f(x)/g(x) is small whenever x is in A. That is, we would like a density g that puts more weight on A: hence the name importance sampling. Note that when h involves a rare event, so that h(x) = 0 over most of the state space, it can be particularly valuable to choose g so that we sample often from that part of the state space where h(x) ≠ 0. This is why importance sampling is most useful for simulating rare events. Further guidance on how to choose g is obtained from the following observation. As we are free to choose g, let's suppose we choose[7] g(x) = h(x)f(x)/θ. Then it is easy to see that

Var_g(h*(X)) = θ^2 - θ^2 = 0

so that we have a zero-variance estimator! This means that if we sample with respect to this particular choice of g, then we would only need one sample, and this sample would equal θ with probability one.[8] Of course this is not feasible in practice.
After all, since it is θ that we are trying to estimate, it does not seem likely that we could simulate a random variable whose density is given by g(x) = h(x)f(x)/θ. However, all is not lost, and this observation can often guide us towards excellent choices of g that lead to extremely large variance reductions.

[6] A negative reduction means a variance increase.
[7] Note that this choice of g is valid since ∫ g(x) dx = 1 and we have assumed h is non-negative.
[8] Recall that with this choice of g, h*(x) = h(x)f(x)/g(x) = θ.
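In simple cases the zero-variance density can actually be simulated. For h(x) = I_{x ≥ a} and f exponential, g(x) = h(x)f(x)/θ is the exponential density conditioned on {X ≥ a}, which by memorylessness is just a + Exp(λ). The sketch below (with illustrative values of λ and a) confirms that every weighted sample then equals θ = e^{-λa} exactly:

```python
import math, random

# Zero-variance check: estimate theta = P(X >= a) for X ~ Exp(lam) using
# g(x) = h(x)f(x)/theta, i.e. X conditioned on {X >= a}. By memorylessness
# we can sample from g as a + Exp(lam). Every weighted sample
# h(x) f(x)/g(x) then equals theta = e^{-lam a}.

lam, a = 1.0, 10.0
theta = math.exp(-lam * a)
rng = random.Random(5)

values = []
for _ in range(1000):
    x = a + rng.expovariate(lam)      # X ~ g, i.e. Exp(lam) given X >= a
    f_x = lam * math.exp(-lam * x)    # original density f
    g_x = f_x / theta                 # g(x) = h(x) f(x) / theta on x >= a
    values.append(1.0 * f_x / g_x)    # h(x) f(x) / g(x), with h(x) = 1 here
# every entry of `values` equals theta, so the estimator has zero variance
```

Of course this only works because θ = e^{-λa} is known in closed form here; the point is to see the zero-variance phenomenon concretely.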

How to Choose a Good Sampling Distribution

We saw above that if we could choose g(x) = h(x)f(x)/θ, then we would obtain the best possible estimator of θ, that is, one that has zero variance. In general we cannot do this, but it does suggest that if we can choose g(·) so that it is similar to h(·)f(·), then we might reasonably expect to obtain a large variance reduction. What does the phrase "similar" mean? One obvious approach is to choose g(·) so that it has a similar shape to h(·)f(·). In particular, we could try to choose g so that g(x) and h(x)f(x) both take on their maximum values at the same point, x* say. When we choose g this way, we say that we are using the maximum principle. Of course this only partially defines g, since there are infinitely many density functions that take their maximum value at x*. Nevertheless, this is often enough to obtain a significant variance reduction, and in practice we often take g to be from the same family of distributions as f. For example, if f is multivariate normal, then we might also take g to be multivariate normal but with a different mean vector and/or variance-covariance matrix.[9]

Example 4  Returning to Example 2, recall that we wished to estimate θ = E[h(X)] = E[X^4 e^{X^2/4} I_{X ≥ 2}] where X ~ N(0, 1). If we sample from a PDF, g, that is also normal with variance 1 but mean µ, then we know that g takes its maximum value at x = µ. Therefore, a good choice of µ might be

µ = arg max_x h(x)f(x) = arg max_{x ≥ 2} x^4 e^{-x^2/4} = √8.

Then θ = E_g[X^4 e^{X^2/4} e^{-√8 X + 4} I_{X ≥ 2}] where g(·) denotes the N(√8, 1) PDF.

Example 5 (Pricing an Asian call option)  For the purpose of option pricing, we assume as usual that S_t ~ GBM(r, σ^2), where S_t is the price of the stock at time t and r is the risk-free interest rate.
Suppose now that we want to price an Asian call option whose payoff at time T is given by

h(S) := max( 0, (1/m) Σ_{i=1}^m S_{iT/m} - K )    (2)

where S := {S_{iT/m} : i = 1, ..., m}, T is the expiration date and K is the strike price. The price of this option is then given by C_a = E[e^{-rT} h(S)]. Now we can write

S_{iT/m} = S_0 e^{(r - σ^2/2) iT/m + σ √(T/m) (X_1 + ... + X_i)}

where the X_i's are IID N(0, 1). This means that if f is the joint PDF of X = (X_1, ..., X_m), then (with a mild abuse of notation) we may write C_a = E_f[h(X_1, ..., X_m)]. Now if K is large relative to S_0, so that the option is deep out-of-the-money, then pricing the option using simulation amounts to performing a rare-event simulation, particularly so when K is very large relative to S_0. As a result, estimating C_a using importance sampling will often result in a very large variance reduction. In order to apply importance sampling, we need to choose the sampling density, g(·). For this, we could take g(·) to be multivariate normal with variance-covariance matrix equal to the identity, I_m, and mean vector µ. That is, we shift f(x) by µ. As before, a good value of µ might be µ = arg max_x h(x)f(x), which can be found using numerical methods.

[9] We note that it is not necessary that f and g come from the same family of distributions. In fact sometimes it is necessary to choose g from a different family of distributions. This might occur, for example, if it is difficult or inefficient to simulate from the family of distributions to which f belongs. In that case, our reason for using importance sampling in the first place is so that we can simulate from an easier distribution, g.
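A sketch of the mean-shifted estimator for the Asian call might look as follows. The parameter values and the constant shift vector µ are illustrative choices only (in practice µ would come from the optimization just described); the average likelihood ratio is reported as a sanity check, since E_g[f(X)/g(X)] = 1.

```python
import math, random

# Sketch of importance sampling for an out-of-the-money Asian call
# (Example 5). We shift the IID N(0,1) driving variables X_1,...,X_m by a
# mean vector mu and reweight by the likelihood ratio
#     f(X)/g(X) = exp(-mu'X + mu'mu/2).
# All parameter values and the simple constant shift are illustrative.

def asian_is_price(S0, K, r, sigma, T, m, mu, n, seed=0):
    rng = random.Random(seed)
    dt = T / m
    drift = (r - 0.5 * sigma * sigma) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-r * T)
    total, lr_total = 0.0, 0.0
    for _ in range(n):
        x = [rng.gauss(mu_i, 1.0) for mu_i in mu]       # X ~ N(mu, I_m)
        # likelihood ratio f(x)/g(x)
        lr = math.exp(-sum(m_i * x_i for m_i, x_i in zip(mu, x))
                      + 0.5 * sum(m_i * m_i for m_i in mu))
        s, path_sum = S0, 0.0                           # build the path
        for x_i in x:
            s *= math.exp(drift + vol * x_i)
            path_sum += s
        payoff = max(path_sum / m - K, 0.0)
        total += disc * payoff * lr
        lr_total += lr
    return total / n, lr_total / n   # price estimate, average likelihood ratio

price, avg_lr = asian_is_price(S0=100.0, K=150.0, r=0.05, sigma=0.2,
                               T=1.0, m=4, mu=[0.5] * 4, n=100000)
# avg_lr should be close to 1 since E_g[f(X)/g(X)] = 1.
```

Under the shift, a sizable fraction of paths finish in-the-money, whereas under f almost none would; the likelihood ratio corrects for the oversampling.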

Potential Problems with the Maximum Principle

Sometimes applying the maximum principle to choose g(·) will be difficult. For example, it may be the case that there are multiple or even infinitely many solutions to µ = arg max_x h(x)f(x). Even when there is a unique solution, it may be very difficult to find. In such circumstances, an alternative method for choosing g is to scale f. We will demonstrate this by example.

Example 6 (Using Scaling to Select g(·))  Assume in Example 3 that n = 2. Then θ = P(X_1^2 + X_2^2 ≥ 50) = E[I_{X_1^2 + X_2^2 ≥ 50}] where X_1, X_2 are IID N(0, 1), and

h(x)f(x) = I_{x_1^2 + x_2^2 ≥ 50} e^{-(x_1^2 + x_2^2)/2} / (2π).

Therefore h(x)f(x) = 0 inside the circle x_1^2 + x_2^2 = 50, and it takes on its maximum value at every point on that circle. As a result, it is not possible to apply the maximum principle. Before choosing a sampling density, g, recall that we would like g to put more weight on those parts of the sample space where h(x)f(x) is large. One way to achieve this is by scaling the density of X = (X_1, X_2) so that X is more dispersed. For example, we could take g to be multivariate normal with mean vector 0 and variance-covariance matrix

Σ_g = [ σ^2  0 ]
      [ 0  σ^2 ]

where σ^2 > 1. Note that this simply means that under g, X_1 and X_2 are IID N(0, σ^2). Furthermore, when σ^2 > 1, more probability mass is given to the region X_1^2 + X_2^2 ≥ 50, as desired. We could choose the value of σ using heuristic methods. One method would be to choose σ so that E_g[X_1^2 + X_2^2] = 50, which in this case would imply that σ = 5. Why? Then[10]

θ = E[I_{X_1^2 + X_2^2 ≥ 50}] = E_g[ σ^2 exp( -(X_1^2/2)(1 - 1/σ^2) - (X_2^2/2)(1 - 1/σ^2) ) I_{X_1^2 + X_2^2 ≥ 50} ].

For the more general case where n > 2, we could proceed by again choosing σ so that E_g[Σ_{i=1}^n X_i^2] = 50.

Tilted Densities

Suppose f is light-tailed so that it has a moment generating function (MGF). Then a common way of generating the sampling density, g, from the original density, f, is to use the MGF of f.
We use M_X(t) to denote the MGF; it is defined by M_X(t) = E_f[e^{tX}]. Then a tilted density of f is given by

f_t(x) = e^{tx} f(x) / M_X(t)

for -∞ < t < ∞. The tilted densities are useful since a random variable with density f_t(·) tends to be larger than one with density f when t > 0, and smaller when t < 0. This means, for example, that if we want to sample more often from the region where X tends to be large, we might use a tilted density with t > 0 as our sampling density g. Similarly, if we want to sample more often from the region where X tends to be small, then we might use a tilted density with t < 0.

Example 7  Suppose X is an exponential random variable with mean 1/λ. Then f(x) = λe^{-λx} for x ≥ 0, and it is easy to see that f_t(x) = C e^{-(λ-t)x}, where C is the constant that makes the density integrate to 1. Indeed C = λ - t (for t < λ), so that f_t is itself an exponential density with mean 1/(λ - t).

[10] This is the same expression we had earlier in Example 3, except that n = 2.
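Example 7 is easily turned into a rare-event estimator. The sketch below estimates P(X ≥ a) for X ~ Exp(λ) by sampling from the tilted density Exp(λ - t). The values of λ and a, and the rule of matching the tilted mean to a, are illustrative choices; the exact answer e^{-λa} is available as a check.

```python
import math, random

# Tilted-density importance sampling for an exponential tail probability:
# theta = P(X >= a) for X ~ Exp(lam). The tilted density f_t is Exp(lam - t)
# and the weight is f(x)/f_t(x) = M_X(t) e^{-t x} with M_X(t) = lam/(lam - t).

def tail_prob_tilted(lam, a, n, seed=1):
    t = lam - 1.0 / a              # makes E_t[X] = 1/(lam - t) = a
    rate_t = lam - t               # tilted density is Exp(lam - t)
    mgf = lam / (lam - t)          # M_X(t) for the exponential
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(rate_t)
        if x >= a:
            total += mgf * math.exp(-t * x)   # likelihood ratio f(x)/f_t(x)
    return total / n

est = tail_prob_tilted(lam=1.0, a=20.0, n=100000)
exact = math.exp(-20.0)            # P(X >= 20) = e^{-20} for Exp(1)
```

With λ = 1 and a = 20, roughly a third of the tilted samples land beyond the barrier, compared with about one in 10^8 under f.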

Example 8  Suppose X_1, ..., X_n are independent random variables, where X_i has density f_i(·). Let S_n := Σ_{i=1}^n X_i and suppose we want to estimate θ := P(S_n ≥ a) for some constant a. If a is large, so that we are dealing with a rare event, we should use importance sampling to estimate θ. Since S_n is large when the X_i's are large, it makes sense to sample each X_i from its tilted density function, f_{i,t}(·), for some value of t > 0. Then we may write

θ = E[I_{S_n ≥ a}] = E_t[ I_{S_n ≥ a} Π_{i=1}^n f_i(X_i)/f_{i,t}(X_i) ] = E_t[ I_{S_n ≥ a} ( Π_{i=1}^n M_i(t) ) e^{-tS_n} ]

where E_t[·] denotes expectation with respect to the X_i's under the tilted densities, f_{i,t}(·), and M_i(t) is the moment generating function of X_i. If we write M(t) := Π_{i=1}^n M_i(t), then it is easy to see that the importance sampling estimator, θ̂_{n,is}, satisfies

θ̂_{n,is} ≤ M(t) e^{-ta}.    (3)

Therefore a good choice of t would be the value that minimizes the bound in (3). Why is this? We can minimize it by minimizing log(M(t)e^{-ta}) = log(M(t)) - ta. It is straightforward to check that the minimizing value of t satisfies µ_t = a, where µ_t := E_t[S_n].

If we define the stopping time τ_a := min{n ≥ 0 : S_n ≥ a}, then P(τ_a < ∞) is the probability that S_n ever exceeds a. If E[X_1] > 0 and the X_i's are IID with MGF M_X(t), then this probability equals one. The case of interest is then when E[X_1] ≤ 0. A similar argument to that of Example 8 implies

θ = P(τ_a < ∞) = E[I_{τ_a < ∞}] = E_t[ I_{τ_a < ∞} M_X(t)^{τ_a} e^{-tS_{τ_a}} ] = E_t[ I_{τ_a < ∞} e^{-tS_{τ_a} + τ_a ψ(t)} ]

where ψ(t) := log(M_X(t)) is the cumulant generating function. Note that if E_t[X_1] > 0 then τ_a < ∞ almost surely, and so we obtain θ = E_t[ e^{-tS_{τ_a} + τ_a ψ(t)} ]. In fact, this is an example where importance sampling can be used to ensure that the simulation stops almost surely. In Assignment 6 we discuss how to choose a good value of t based on the cumulant generating function.
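For IID Exp(1) summands the recipe of Example 8 can be tested against a closed form, since S_n ~ Gamma(n, 1). The sketch below uses illustrative values n = 5 and a = 30, chooses t from µ_t = a, and also respects the bound (3):

```python
import math, random

# Example 8 with IID Exp(1) summands: estimate theta = P(S_n >= a) by
# exponentially tilting each X_i. For Exp(1) the tilted density is
# Exp(1 - t), and mu_t = a gives t = 1 - n/a. Since S_n ~ Gamma(n, 1),
# P(S_n >= a) = e^{-a} sum_{k<n} a^k/k! is available exactly as a check.

def sum_tail_tilted(n, a, n_samples, seed=2):
    t = 1.0 - n / a                # solves E_t[S_n] = n/(1 - t) = a
    rate_t = 1.0 - t               # tilted density of each X_i is Exp(1 - t)
    log_mgf = -math.log(1.0 - t)   # psi(t) = log M_X(t) = -log(1 - t)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = sum(rng.expovariate(rate_t) for _ in range(n))
        if s >= a:
            total += math.exp(n * log_mgf - t * s)   # M(t) e^{-t S_n}
    return total / n_samples

est = sum_tail_tilted(n=5, a=30.0, n_samples=100000)
exact = math.exp(-30.0) * sum(30.0 ** k / math.factorial(k) for k in range(5))
```

Every sample's weight is at most M(t)e^{-ta}, so the estimate automatically satisfies the bound (3) as well.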
Note that this example has direct applications to the estimation of ruin probabilities in the context of insurance risk. For example, suppose X_i := Y_i - cT_i where Y_i is the size of the i-th claim, T_i is the inter-arrival time between claims, c is the premium received per unit time and a is the initial reserve. Then θ is the probability that the insurance company ever goes bankrupt. Only in very simple models is it possible to calculate θ analytically; in general, Monte-Carlo approaches are required.

Estimating Conditional Expectations

Importance sampling can also be very useful for computing conditional expectations when the event being conditioned upon is a rare event. For example, suppose we wish to estimate θ = E[h(X) | X ∈ A] where A is a rare event and X is a random vector with PDF f. The density of X, given that X ∈ A, is

f(x | x ∈ A) = f(x) / P(X ∈ A)  for x ∈ A,

so

θ = E[h(X) I_{X ∈ A}] / E[I_{X ∈ A}].

Now since A is a rare event, we would be much better off if we could simulate using a sampling density, g, that makes A more likely to occur. Then, as usual, we would have

θ = E_g[h(X) I_{X ∈ A} f(X)/g(X)] / E_g[I_{X ∈ A} f(X)/g(X)].

So to estimate θ using importance sampling, we would generate X_1, ..., X_n with density g(·) and set

θ̂_{n,is} = [ Σ_{i=1}^n h(X_i) I_{X_i ∈ A} f(X_i)/g(X_i) ] / [ Σ_{i=1}^n I_{X_i ∈ A} f(X_i)/g(X_i) ].

In contrast to our usual estimators, θ̂_{n,is} is no longer an average of n IID random variables; instead, it is the ratio of two such averages. This has implications for computing approximate confidence intervals for θ. In particular, confidence intervals should now be estimated using bootstrapping techniques. An obvious application of this methodology in risk management is the estimation of ES or CVaR.

Difficulties with Importance Sampling

The most difficult aspect of importance sampling is choosing a good sampling density, g. In general, one needs to be very careful, for it is possible to choose g according to some good heuristic such as the maximum principle, but then find that g results in a variance increase. In fact, it is possible to choose a g that results in an importance sampling estimator with an infinite variance! This situation would typically occur when g puts too little weight relative to f on the tails of the distribution.

An Application to Portfolio Credit Risk

We consider[11] a portfolio loss of the form L = Σ_{i=1}^m e_i Y_i, where e_i is the deterministic and positive exposure to the i-th credit and Y_i is the default indicator with corresponding default probability p_i. We assume that Y follows a Bernoulli mixture model, which we now define.

Definition 1  Let p < m and let Ψ = (Ψ_1, ..., Ψ_p) be a p-dimensional random vector. Then we say the random vector Y = (Y_1, ..., Y_m) follows a Bernoulli mixture model with factor vector Ψ if there are functions p_i : R^p → [0, 1], 1 ≤ i ≤ m, such that, conditional on Ψ, the components of Y are independent Bernoulli random variables satisfying P(Y_i = 1 | Ψ = ψ) = p_i(ψ).

We are interested in the problem of estimating θ := P(L ≥ c), where c is substantially larger than E[L].
Note that a good importance sampling distribution for θ should also yield a good importance sampling distribution for computing risk measures associated with the α-tail of the loss distribution where q_α(L) ≈ c. We begin with the case where the default indicators are independent.

Independent Default Indicators

We define Ω to be the state space of Y, so that Ω = {0, 1}^m. Then

P({y}) = Π_{i=1}^m p_i^{y_i} (1 - p_i)^{1 - y_i},  y ∈ Ω.

The moment generating function of L satisfies

M_L(t) = E[e^{tL}] = Π_{i=1}^m E[e^{t e_i Y_i}] = Π_{i=1}^m ( p_i e^{t e_i} + 1 - p_i ).

Let Q_t be the corresponding tilted probability measure, so that

Q_t({y}) = [ e^{t Σ_i e_i y_i} / M_L(t) ] P({y}) = Π_{i=1}^m [ e^{t e_i y_i} / (p_i e^{t e_i} + 1 - p_i) ] p_i^{y_i} (1 - p_i)^{1 - y_i} = Π_{i=1}^m q_{t,i}^{y_i} (1 - q_{t,i})^{1 - y_i}

[11] This application is described in Section 8.5 of MFE, but is based on the original paper of Glasserman and Li (2005).

where q_{t,i} := p_i e^{t e_i} / (p_i e^{t e_i} + 1 - p_i) is the Q_t probability of the i-th credit defaulting. Note that the default indicators remain independent Bernoulli random variables under Q_t. Since q_{t,i} → 1 as t → ∞ and q_{t,i} → 0 as t → -∞, it is clear that we can shift the mean of L to any value in (0, Σ_{i=1}^m e_i). The same argument that was used at the end of Example 8 suggests that we take t equal to the value that solves

Σ_{i=1}^m q_{t,i} e_i = c.

This value can be found easily using numerical methods.

Dependent Default Indicators

Suppose now that there is a p-dimensional factor vector, Ψ, such that the default indicators are independent with default probabilities p_i(ψ) conditional on Ψ = ψ. Suppose also that Ψ ~ MVN_p(0, Σ). The Monte-Carlo scheme for estimating θ is to first simulate Ψ and to then simulate Y conditional on Ψ. We can apply importance sampling to the second step using our discussion of independent default indicators above. However, we can also apply importance sampling to the first step. A natural way to do this is to simulate Ψ from the MVN_p(µ, Σ) distribution for some µ ∈ R^p. The corresponding likelihood ratio, r_µ(Ψ) say (f/g in our earlier notation), is given by the ratio of the two multivariate normal densities and satisfies

r_µ(Ψ) = exp( -½ Ψ'Σ^{-1}Ψ ) / exp( -½ (Ψ - µ)'Σ^{-1}(Ψ - µ) ) = exp( -µ'Σ^{-1}Ψ + ½ µ'Σ^{-1}µ ).

How Do We Choose µ?

Recall that the quantity of interest is θ := P(L ≥ c) = E[P(L ≥ c | Ψ)]. We know from our earlier discussion of importance sampling that we would like to choose the importance sampling density, g*(Ψ) say, so that

g*(Ψ) ∝ P(L ≥ c | Ψ) exp( -½ Ψ'Σ^{-1}Ψ ).    (4)

Of course this is not possible, since we do not know P(L ≥ c | Ψ), the very quantity that we wish to estimate. The maximum principle applied to the MVN_p(µ, Σ) distribution would then suggest taking µ equal to the value of Ψ that maximizes the right-hand side of (4).
Again it is not possible to solve this problem exactly, as we do not know P(L ≥ c | Ψ), but numerical methods can be used to find good approximate solutions. See Glasserman and Li (2005) for further details.

The Importance Sampling Algorithm for Estimating θ = P(L ≥ c)

1. Generate Ψ_1, ..., Ψ_n independently from the MVN_p(µ, Σ) distribution.
2. For each Ψ_i, estimate P(L ≥ c | Ψ = Ψ_i) using the importance sampling distribution that we described in our discussion of independent default indicators. Let θ̂_{n_1}^{IS}(Ψ_i) be the corresponding estimator based on n_1 samples.
3. The full importance sampling estimator is then given by

θ̂_n^{IS} = (1/n) Σ_{i=1}^n r_µ(Ψ_i) θ̂_{n_1}^{IS}(Ψ_i).

3 Stratified Sampling

Consider a game show where contestants first pick a ball at random from an urn and then receive a payoff, Y. The payoff is random and depends on the color of the selected ball, so that if the color is c then Y is drawn from the PDF f_c. The urn contains red, green, blue and yellow balls, and each of the four colors is equally likely to be chosen. The producer of the game show would like to know how much a contestant will win on average when he plays the game. To answer this question, she decides to simulate the payoffs of n contestants and take their average payoff as her estimate. The payoff, Y, of each contestant is simulated as follows:

1. Simulate a random variable, I, where I is equally likely to take any of the four values r, g, b and y.
2. Simulate Y from the density f_I(y).

The average payoff, θ := E[Y], is then estimated by

θ̂_n := Σ_{j=1}^n Y_j / n.

Now suppose n = 1000, and that a red ball was chosen 246 times, a green ball 270 times, a blue ball 226 times and a yellow ball 258 times.

Question: Would this influence your confidence in θ̂_n? What if f_g tended to produce very high payoffs and f_b tended to produce very low payoffs? Is there anything that we could have done to avoid this type of problem occurring?

The answer is yes. We know that each ball color should be selected 1/4 of the time, so we could force this to be true by conducting four separate simulations, one each to estimate E[Y | I = c] for c = r, g, b, y. Note that

E[Y] = E[E[Y | I]] = (1/4) E[Y | I = r] + (1/4) E[Y | I = g] + (1/4) E[Y | I = b] + (1/4) E[Y | I = y]

so that an unbiased estimator of θ is obtained by setting

θ̂_{st,n} := (1/4) θ̂_{r,n_r} + (1/4) θ̂_{g,n_g} + (1/4) θ̂_{b,n_b} + (1/4) θ̂_{y,n_y}    (5)

where θ̂_{c,n_c} estimates θ_c := E[Y | I = c] for c = r, g, b, y.[12] How does the variance of θ̂_{st,n} compare with the variance of θ̂_n, the original raw simulation estimator? To answer this question, assume for now that n_c = n/4 for each c, and that Y_c is a sample from the density f_c. Then a fair[13] comparison of Var(θ̂_n) with Var(θ̂_{st,n}) should compare

Var(Y_1 + Y_2 + Y_3 + Y_4)  with  Var(Y_r + Y_g + Y_b + Y_y)    (6)

where Y_1, Y_2, Y_3 and Y_4 are IID samples from the original simulation algorithm (i.e., where we first select the ball randomly and then receive the payoff), and the Y_c's are independent with density f_c(·), for c = r, g, b, y. Now recall the conditional variance formula, which states

Var(Y) = E[Var(Y | I)] + Var(E[Y | I]).    (7)

Each term on the right-hand side of (7) is non-negative, so this implies
Var(Y) ≥ E[Var(Y | I)] = (1/4) Var(Y | I = r) + (1/4) Var(Y | I = g) + (1/4) Var(Y | I = b) + (1/4) Var(Y | I = y) = (1/4) Var(Y_r + Y_g + Y_b + Y_y),

which implies

Var(Y_1 + Y_2 + Y_3 + Y_4) = 4 Var(Y) ≥ Var(Y_r + Y_g + Y_b + Y_y).    (8)

As a result, we may conclude that using θ̂_{st,n} instead of θ̂_n leads to a variance reduction. This variance reduction will be substantial if I accounts for a large fraction of the variance of Y. Note also that the computational requirements for computing θ̂_{st,n} are similar[14] to those required for computing θ̂_n. We call θ̂_{st,n} a stratified sampling estimator of θ, and we say that I is the stratification variable.

[12] θ̂_{c,n_c} is an estimate of θ_c using n_c samples. θ̂_{st,n} is an estimate of θ using n samples, so it is implicitly assumed in (5) that n_r + n_g + n_b + n_y = n.
[13] Fair here means that each estimator is based on the same total number of samples.
[14] For this example, the stratified estimator will actually require less work, but it is also possible in general for it to require more work.
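The game-show comparison can be carried out numerically. In the sketch below the four payoff densities are illustrative normal distributions with common variance; the stratified estimator forces n/4 samples per color, and its estimated variance drops the Var(E[Y | I]) term, exactly as the analysis above predicts:

```python
import random, statistics

# Game-show example: four equally likely colors, payoff drawn from a
# color-specific density. Compare the raw estimator with the stratified
# estimator that forces n/4 samples per color. The normal payoff
# distributions are illustrative; the true mean is 2.5.

MEANS = {"r": 1.0, "g": 2.0, "b": 3.0, "y": 4.0}   # payoff means by color

def raw_estimate(n, rng):
    draws = []
    for _ in range(n):
        color = rng.choice("rgby")                  # pick a ball at random
        draws.append(rng.gauss(MEANS[color], 1.0))  # payoff Y ~ f_color
    return statistics.fmean(draws), statistics.variance(draws) / n

def stratified_estimate(n, rng):
    est, var = 0.0, 0.0
    for color in "rgby":                            # n/4 samples per stratum
        n_c = n // 4
        draws = [rng.gauss(MEANS[color], 1.0) for _ in range(n_c)]
        est += 0.25 * statistics.fmean(draws)
        var += 0.25 ** 2 * statistics.variance(draws) / n_c
    return est, var

rng = random.Random(3)
raw_mean, raw_var = raw_estimate(40000, rng)
st_mean, st_var = stratified_estimate(40000, rng)
# st_var should be clearly below raw_var: here Var(E[Y|I]) = 1.25 while
# E[Var(Y|I)] = 1, so stratification removes more than half the variance.
```

Both estimators are unbiased; only their variances differ.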

The Stratified Sampling Algorithm

We will now formally describe the stratified sampling algorithm. Suppose as usual that we wish to estimate θ := E[Y] where Y is a random variable. Let W be another random[15] variable that satisfies the following two conditions:

Condition 1: For any subset Δ ⊆ R, P(W ∈ Δ) can be easily computed.
Condition 2: It is easy to generate (Y | W ∈ Δ), i.e., Y given W ∈ Δ.

Now divide R into m non-overlapping subintervals, Δ_1, ..., Δ_m, such that p_j := P(W ∈ Δ_j) > 0 and Σ_{j=1}^m p_j = 1. Note that if W can take any value in R, then the first interval should be [-∞, a] and the final interval should be [b, ∞] for some finite a and b.

Notation

1. Let θ_j := E[Y | W ∈ Δ_j] and σ_j^2 := Var(Y | W ∈ Δ_j).
2. We define the random variable I by setting I := j if W ∈ Δ_j.
3. Let Y^(j) denote a random variable with the same distribution as (Y | W ∈ Δ_j) ≡ (Y | I = j).

Our notation then implies θ_j = E[Y | I = j] = E[Y^(j)] and σ_j^2 = Var(Y | I = j) = Var(Y^(j)). In particular we have

θ = E[Y] = E[E[Y | I]] = p_1 E[Y | I = 1] + ... + p_m E[Y | I = m] = p_1 θ_1 + ... + p_m θ_m.

Note that to estimate θ we only need to estimate the θ_j's since, by Condition 1 above, the p_j's are easily computed. Furthermore, we know how to estimate the θ_j's by Condition 2. If we use n_j samples to estimate θ_j, then an estimate of θ is given by

θ̂_{st,n} = p_1 θ̂_{1,n_1} + ... + p_m θ̂_{m,n_m}.

It is clear that θ̂_{st,n} will be unbiased if, for each j, θ̂_{j,n_j} is an unbiased estimate of θ_j.

Obtaining a Variance Reduction

How does the stratification estimator compare with the usual raw simulation estimator? As was the case with the game-show example, to answer this question we would like to compare Var(θ̂_n) with Var(θ̂_{st,n}). First we need to choose n_1, ..., n_m such that n_1 + ... + n_m = n. That is, we need to determine the number of samples, n_j, that will be used to estimate each θ_j, but in such a way that the total number of samples is equal to n.
Clearly, the optimal approach would be to choose the n_j's so as to minimize Var(θ̂_{st,n}). Consider for now, however, the sub-optimal allocation where we set n_j := n p_j for j = 1, ..., m. Then

Var(θ̂_{st,n}) = Var(p_1 θ̂_{1,n_1} + ... + p_m θ̂_{m,n_m}) = p_1^2 σ_1^2/n_1 + ... + p_m^2 σ_m^2/n_m = Σ_{j=1}^m p_j σ_j^2 / n.

On the other hand, the usual simulation estimator has variance σ^2/n where σ^2 := Var(Y). Therefore, we need only show that Σ_{j=1}^m p_j σ_j^2 ≤ σ^2 to prove that the non-optimized[16] stratification estimator has a lower variance

[15] Y and W must be dependent to achieve a variance reduction.
[16] The optimized stratification estimator refers to the estimator where we choose the n_j's to minimize the variance of θ̂_{st,n}.

than the usual raw estimator.[17] The proof that Σ_{j=1}^m p_j σ_j^2 ≤ σ^2 is precisely the same as that used for the game-show example. In particular, equation (7) implies

σ^2 = Var(Y) ≥ E[Var(Y | I)] = Σ_{j=1}^m p_j σ_j^2

and the proof is complete!

Optimizing the Stratified Estimator

We know

θ̂_{st,n} = p_1 (1/n_1) Σ_{i=1}^{n_1} Y_i^(1) + ... + p_m (1/n_m) Σ_{i=1}^{n_m} Y_i^(m)

where, for a fixed j, the Y_i^(j)'s are IID copies of Y^(j). This then implies

Var(θ̂_{st,n}) = p_1^2 σ_1^2/n_1 + ... + p_m^2 σ_m^2/n_m = Σ_{j=1}^m p_j^2 σ_j^2 / n_j.    (9)

Therefore, to minimize Var(θ̂_{st,n}) we must solve the following constrained optimization problem:

min_{n_j} Σ_{j=1}^m p_j^2 σ_j^2 / n_j  subject to  n_1 + ... + n_m = n.    (10)

We can easily solve (10) using a Lagrange multiplier. The optimal solution is given by

n_j* = [ p_j σ_j / Σ_{k=1}^m p_k σ_k ] n    (11)

and the minimized variance is given by Var(θ̂_{st,n}) = ( Σ_{j=1}^m p_j σ_j )^2 / n. Note that the solution in (11) makes intuitive sense: if p_j is large, then, other things being equal, it makes sense to expend more effort simulating from stratum j, i.e., the region where W ∈ Δ_j. Similarly, if σ_j^2 is large then, other things again being equal, it makes sense to simulate more often from stratum j so as to get a more accurate estimate of θ_j.

Remark 1  It is interesting to note the following connection between stratified sampling and importance sampling. We know that when we importance sample, we like to sample more often from the important region. The choice of n_j in (11) also means that we simulate more often from the important region when we use optimized stratified sampling.

Remark 2  Note also the connection between stratified sampling and conditional Monte-Carlo. Both methods rely on the conditional variance formula to prove that they lead to a variance reduction. The difference between the two methods can best be explained as follows. Suppose we wish to estimate θ := E[Y] using simulation, and we do this by first generating a random variable, W, and then generating Y given W.
In the conditional expectation method, we simulate W first, but then compute E[Y W ] analytically. In the stratified sampling method, we effectively generate W analytically, and then simulate Y given W. Advantages and Disadvantages of Stratified Sampling The obvious advantage of stratified sampling is that it leads to a variance reduction which can be very substantial if the stratification variable, W, accounts for a large fraction of the variance of Y. The main disadvantage of stratified sampling is that typically we do not know the σj s so it is impossible to compute the optimal n j s exactly. Of course we can overcome this problem by first doing m pilot simulations to estimate each σ j. If we let N p denote the total number of pilot simulations, then a good heuristic is to use N p /m runs for each individual pilot simulation. In order to obtain a reasonably good estimate of σj, a useful rule-of-thumb 17 The optimized stratification estimator would then of course achieve an even greater variance reduction.

is that N_p/m should be greater than 30. If m is large, however, and each simulation run is computationally expensive, then it may be the case that a lot of effort is expended in trying to estimate the optimal n_j's. One method of overcoming this problem is to abandon the pilot simulations and simply use the sub-optimal allocation where n_j = n p_j. We saw earlier that this allocation still results in a variance reduction, which can sometimes be substantial. In practice, both methods are used, and the decision to conduct pilot simulations should depend on the problem at hand. For example, if you have reason to believe that the σ_j's will not vary too much, then the optimal allocation and the sub-optimal allocation will be very similar, and it is probably not worth doing the pilot simulations. On the other hand, if the σ_j's vary considerably, then conducting the pilot runs may be worthwhile. Of course, a combination of the two is also possible, where only a subset of the pilot simulations is conducted.

The stratified simulation algorithm is given below. We assume that the pilot simulations have already been completed, or that it has been decided not to conduct them at all; either way, the n_j's have been computed. We also show how the estimate, θ̂_{st,n}, and the estimated variance, σ̂²_{st,n}, can be computed without having to store all the generated samples. That is, we simply keep track of Σ_i Y_i^{(j)} and Σ_i (Y_i^{(j)})² for each j, since these quantities are all that is required[18] to compute θ̂_{st,n} and σ̂²_{st,n}.

Stratified Simulation Algorithm for Estimating θ

set θ̂_{st,n} = 0; σ̂²_{st,n} = 0
for j = 1 to m
    set sum_j = 0; sum_squares_j = 0
    for i = 1 to n_j
        generate Y_i^{(j)}
        set sum_j = sum_j + Y_i^{(j)}
        set sum_squares_j = sum_squares_j + (Y_i^{(j)})²
    end for
    set θ̂_j = sum_j / n_j
    set σ̂²_j = (sum_squares_j − sum_j²/n_j) / (n_j − 1)
    set θ̂_{st,n} = θ̂_{st,n} + p_j θ̂_j
    set σ̂²_{st,n} = σ̂²_{st,n} + σ̂²_j p_j² / n_j
end for
set approx. 100(1−α)% CI = θ̂_{st,n} ± z_{1−α/2} σ̂_{st,n}

Financial Applications

Most of our financial examples relate to option pricing in a geometric Brownian motion (GBM) setting. Since GBM is a very poor model of asset price dynamics, these examples are not directly applicable in practice. However, the specific techniques and results that we discuss certainly are applicable. For example, we know that a multivariate t random vector can be generated using a multivariate normal random vector together with an independent chi-squared random variable. Therefore, the techniques we discuss below for multivariate normal random vectors can also be used[19] for multivariate t random vectors. They are also applicable more generally to

Footnote 18: Recall that θ̂_{st,n} = Σ_{j=1}^m p_j (Σ_{i=1}^{n_j} Y_i^{(j)})/n_j and Var(θ̂_{st,n}) = Σ_{j=1}^m p_j² Var((Σ_{i=1}^{n_j} Y_i^{(j)})/n_j), and each Var((Σ_{i=1}^{n_j} Y_i^{(j)})/n_j) can be estimated knowing just Σ_i Y_i^{(j)} and Σ_i (Y_i^{(j)})². As stated previously, any simulation study that requires a large number of samples should only keep track of these quantities, thereby avoiding the need to store every sample.
Footnote 19: See Chapter 9 of Glasserman (2004) for a risk management application in a multivariate t setting.
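As a concrete illustration, the algorithm above might be sketched in Python as follows. This is a toy sketch rather than code from these notes: it estimates θ = E[h(X)] for X ~ N(0, 1), stratifying on X itself with m equiprobable strata, estimating the σ_j's with pilot runs, and then applying the optimal allocation (11). The function name and default parameters are our own.

```python
import numpy as np
from scipy.stats import norm

def stratified_estimate(h, m=10, n=100_000, n_pilot=50, rng=None):
    """Stratified estimate of E[h(X)], X ~ N(0,1), using m equiprobable
    strata on X and the optimal allocation n_j proportional to p_j * sigma_j."""
    rng = np.random.default_rng(rng)
    p = np.full(m, 1.0 / m)  # equiprobable strata: p_j = 1/m

    def sample_stratum(j, size):
        # Stratum j corresponds to U in [j/m, (j+1)/m); X = Phi^{-1}(U).
        u = rng.uniform(j / m, (j + 1) / m, size)
        return h(norm.ppf(u))

    # Pilot runs (n_pilot per stratum, > 30 per the rule-of-thumb) to estimate sigma_j.
    sigma = np.array([sample_stratum(j, n_pilot).std(ddof=1) for j in range(m)])
    # Optimal allocation (11): n_j = n * p_j sigma_j / sum_k p_k sigma_k
    # (rounded, so sum of n_j is only approximately n).
    n_j = np.maximum(2, np.round(n * p * sigma / (p * sigma).sum()).astype(int))

    theta, var = 0.0, 0.0
    for j in range(m):
        y = sample_stratum(j, n_j[j])
        theta += p[j] * y.mean()
        var += p[j] ** 2 * y.var(ddof=1) / n_j[j]
    return theta, np.sqrt(var)  # estimate and its standard error
```

For example, with h(x) = x² the true value is E[X²] = 1, and the reported standard error should be noticeably smaller than the raw-estimator value of √(Var(X²)/n) = √(2/n).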

normal-mixture distributions, Gaussian and t copulas, as well as the simulation of SDEs. As stated earlier, Chapter 9 of Glasserman (2004) should be sufficient to convince any reader of the general applicability of stratified sampling (as well as importance sampling) to risk management.

Example 9 (Pricing a European Call Option)
Suppose that we wish to price a European call option where we assume, as usual, that S_t ~ GBM(r, σ²). Then

C_0 = E[e^{−rT} max(0, S_T − K)] = E[Y]

where Y = h(X) = e^{−rT} max(0, S_0 e^{(r − σ²/2)T + σ√T X} − K) for X ~ N(0, 1). While we know how to compute C_0 analytically, it is worthwhile seeing how we could estimate it using stratified simulation. Let W = X be our stratification variable. To see that we can stratify using this choice of W, note the following:

(1) Computing P(W ∈ Δ): For Δ ⊆ R, P(W ∈ Δ) can easily be computed. Indeed, if Δ = [a, b], then P(W ∈ Δ) = Φ(b) − Φ(a), where Φ(·) is the CDF of a standard normal random variable.

(2) Generating (Y | W ∈ Δ): (h(X) | X ∈ Δ) can easily be generated. We do this by first generating X̃ := (X | X ∈ Δ) and then taking h(X̃). We generate X̃ as follows. First note that if X ~ N(0, 1), then we can generate an X using the inverse transform method by setting X = Φ^{−1}(U). The problem with such an X is that it may not lie in Δ = [a, b]. However, we can overcome this problem by simply generating Ũ ~ U(Φ(a), Φ(b)) and then setting X̃ = Φ^{−1}(Ũ). It is then straightforward to check that X̃ ~ (X | X ∈ [a, b]).

It is therefore clear that we can estimate C_0 using X as a stratification variable.

Example 10 (Pricing an Asian Call Option)
Recall that the discounted payoff of an Asian call option is given by

Y := e^{−rT} max(0, (Σ_{i=1}^m S_{iT/m})/m − K)  (12)

and that its price is given by C_a = E[Y], where we assume S_t ~ GBM(r, σ²). Now each S_{iT/m} may be expressed as

S_{iT/m} = S_0 exp((r − σ²/2) iT/m + σ √(T/m) (X_1 + ... + X_i))  (13)

where the X_i's are IID N(0, 1). This means that we may then write C_a = E[h(X_1,..., X_m)], where the function h(·) is given implicitly by equations (12) and (13). So to estimate C_a using our standard simulation algorithm, we would simply generate sample values of h(X_1,..., X_m) and take their average as our estimate. We can also, however, estimate C_a using stratified sampling.[20] To do so, we must first choose a stratification variable, W. One possible choice would be to set W = X_j for some j. However, this is unlikely to capture much of the variability of h(X_1,..., X_m). A much better choice would be to set W = Σ_{j=1}^m X_j. Of course, we need to show that such a choice is possible. That is, we need to show that P(W ∈ Δ) is easily computed and that (Y | W ∈ Δ) is easily generated.

(1) Computing P(W ∈ Δ): Since X_1,..., X_m are IID N(0, 1), we immediately have that W ~ N(0, m). If Δ = [a, b], then

P(W ∈ Δ) = P(N(0, m) ∈ Δ) = P(a ≤ N(0, m) ≤ b)

Footnote 20: The method we now describe is also useful for pricing other path-dependent options. See Glasserman (2004) for further details.

= P(a/√m ≤ N(0, 1) ≤ b/√m) = Φ(b/√m) − Φ(a/√m).

Similarly, if Δ = [b, ∞), then P(W ∈ Δ) = 1 − Φ(b/√m), and if Δ = (−∞, a], then P(W ∈ Δ) = Φ(a/√m).

(2) Generating (Y | W ∈ Δ): We need two results from the theory of multivariate normal random variables. The first result is well known to us:

Result 1: Suppose X = (X_1,..., X_m) ~ MVN(0, Σ). If we wish to generate a sample vector X, we first generate Z ~ MVN(0, I_m)[21] and then set

X = C^T Z  (14)

where C^T C = Σ. One possibility, of course, is to let C be the Cholesky decomposition of Σ, but in fact any matrix C that satisfies C^T C = Σ will do.

Result 2: Let a = (a_1, a_2, ..., a_m) satisfy ||a|| = 1, i.e. a_1² + ... + a_m² = 1, and let Z = (Z_1,..., Z_m) ~ MVN(0, I_m). Then

((Z_1,..., Z_m) | Σ_{i=1}^m a_i Z_i = w) ~ MVN(w a^T, I_m − a^T a).

Therefore, to generate ((Z_1,..., Z_m) | Σ_{i=1}^m a_i Z_i = w), we just need to generate a vector, V, where V ~ MVN(w a^T, I_m − a^T a) = w a^T + MVN(0, I_m − a^T a). Generating such a V is very easy since (I_m − a^T a)^T (I_m − a^T a) = I_m − a^T a. That is, Σ̃^T Σ̃ = Σ̃ where Σ̃ := I_m − a^T a, so we can take C = Σ̃ in (14).

Now we can return to the problem of generating (Y | W ∈ Δ). Since Y = h(X_1,..., X_m), we can clearly generate (Y | W ∈ Δ) if we can generate [(X_1,..., X_m) | Σ_{i=1}^m X_i ∈ Δ]. To do this, suppose again that Δ = [a, b]. Then

[(X_1,..., X_m) | Σ_{i=1}^m X_i ∈ [a, b]] ≡ [(X_1,..., X_m) | (1/√m) Σ_{i=1}^m X_i ∈ [a/√m, b/√m]].

We can now generate [(X_1,..., X_m) | Σ_{i=1}^m X_i ∈ Δ] in two steps:

Step 1: Generate [(1/√m) Σ_{i=1}^m X_i | (1/√m) Σ_{i=1}^m X_i ∈ [a/√m, b/√m]]. This is easy to do since (1/√m) Σ_{i=1}^m X_i ~ N(0, 1), so we just need to generate (N(0, 1) | N(0, 1) ∈ [a/√m, b/√m]), which we can do using the method described in Example 9. Let w be the generated value.

Step 2: Now generate [(X_1,..., X_m) | (1/√m) Σ_{i=1}^m X_i = w], which we can do by the second result above and the comments that follow it.

Footnote 21: That is, we generate m IID N(0, 1) random variables.
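The two-step procedure of Example 10 can be sketched as follows. This is our own helper, assuming Δ = [a, b]; the arguments `a_lo` and `b_hi` denote the interval endpoints, renamed to avoid a clash with the unit vector a = (1,..., 1)/√m.

```python
import numpy as np
from scipy.stats import norm

def sample_X_given_sum_in(a_lo, b_hi, m, rng):
    """Generate (X_1,...,X_m) IID N(0,1) conditional on sum(X) in [a_lo, b_hi]:
    Step 1 draws w = sum(X)/sqrt(m) from a truncated standard normal (as in
    Example 9); Step 2 applies Result 2 to draw the vector given that sum."""
    a = np.ones(m) / np.sqrt(m)                      # unit vector, ||a|| = 1
    # Step 1: w ~ N(0,1) conditioned on [a_lo/sqrt(m), b_hi/sqrt(m)],
    # via the inverse transform on a uniform restricted to (Phi(lo), Phi(hi)).
    lo, hi = norm.cdf(a_lo / np.sqrt(m)), norm.cdf(b_hi / np.sqrt(m))
    w = norm.ppf(rng.uniform(lo, hi))
    # Step 2: (Z | a.Z = w) ~ MVN(w a, I - a a^T). Since I - a a^T is
    # symmetric and idempotent, we can use it directly as C in (14).
    C = np.eye(m) - np.outer(a, a)
    z = rng.standard_normal(m)
    return w * a + C @ z

rng = np.random.default_rng(0)
x = sample_X_given_sum_in(1.0, 2.0, m=5, rng=rng)
# sum(x) lies in [1, 2] by construction
```

A quick sanity check: a · (w a + C z) = w, so the scaled sum (1/√m) Σ X_i always equals the truncated-normal draw w from Step 1.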

Example 11 (Pricing a Barrier Option)
Recall again the problem of pricing an option that has payoff

h(X) = max(0, S_T − K_1) if S_{T/2} ≤ L, and h(X) = max(0, S_T − K_2) otherwise,

where X = (S_{T/2}, S_T). We can write the price of the option as

C_0 = E[e^{−rT} (max(0, S_T − K_1) I_{S_{T/2} ≤ L} + max(0, S_T − K_2) I_{S_{T/2} > L})]

where, as usual, we assume that S_t ~ GBM(r, σ²). Using conditional Monte-Carlo, we can write (why?) C_0 = E[Y] where

Y := e^{−rT/2} (c(S_{T/2}, T/2, K_1, r, σ) I_{S_{T/2} ≤ L} + c(S_{T/2}, T/2, K_2, r, σ) I_{S_{T/2} > L})  (15)

and where c(x, t, k, r, σ) is the price of a European call option with strike k, interest rate r, volatility σ, time to maturity t, and initial stock price x.

Question 1: Having conditioned on S_{T/2}, could we now also use stratified sampling?
Question 2: Could we use importance sampling?
Question 3: What about using importance sampling before doing the conditioning?

4 Simulating Stochastic Differential Equations

We now discuss the simulation of stochastic differential equations (SDEs). This is an important computational tool that is useful for the pricing of exotic securities as well as for evaluating model risk. Consider, for example, the down-and-out barrier option that was discussed in the Model Risk lecture notes. This option was priced using seven different models, all of which had been calibrated to the same implied volatility surface. Option prices for the seven different models were plotted in Figure 4 as a function of the barrier level, and all of these prices were obtained by simulating the underlying SDEs. Brigo et al. (2007) also discuss the simulation and estimation of SDEs from a risk-management perspective and provide Matlab code for their various algorithms. Brigo et al. (2007) and Glasserman (2004) can be consulted for further details and references.
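Before moving on, the conditional Monte-Carlo estimator (15) from Example 11 might be rendered as follows, with the Black-Scholes formula playing the role of c(x, t, k, r, σ). The function names, parameter values, and sample size below are our own choices for illustration.

```python
import numpy as np
from scipy.stats import norm

def bs_call(x, t, k, r, sigma):
    """Black-Scholes price c(x, t, k, r, sigma) of a European call."""
    d1 = (np.log(x / k) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return x * norm.cdf(d1) - k * np.exp(-r * t) * norm.cdf(d2)

def barrier_price_cmc(S0, K1, K2, L, r, sigma, T, n=100_000, seed=0):
    """Conditional MC for Example 11: simulate S_{T/2} under GBM(r, sigma^2),
    then average the conditional expectation Y in (15)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    S_half = S0 * np.exp((r - 0.5 * sigma**2) * T / 2
                         + sigma * np.sqrt(T / 2) * z)
    # Strike K1 applies when S_{T/2} <= L, strike K2 otherwise.
    y = np.where(S_half <= L,
                 bs_call(S_half, T / 2, K1, r, sigma),
                 bs_call(S_half, T / 2, K2, r, sigma))
    y = np.exp(-r * T / 2) * y   # discount [0, T/2]; c() discounts [T/2, T]
    return y.mean(), y.std(ddof=1) / np.sqrt(n)
```

Note that only the first half of the path is simulated; the second half has been integrated out analytically, which is exactly where the variance reduction comes from.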
Simulating a 1-Dimensional SDE: The Euler Scheme

Let us assume that we are faced with an SDE of the form

dX_t = µ(t, X_t) dt + σ(t, X_t) dW_t  (16)

and that we wish to simulate values of X_T but do not know its distribution. (This could be because we cannot solve (16) to obtain an explicit solution for X_T, or because we simply cannot determine the distribution of X_T even though we do know how to solve (16).) When we simulate an SDE, what we mean is that we simulate a discretized version of the SDE. In particular, we simulate a discretized process, {X̃_h, X̃_2h, ..., X̃_mh}, where m is the number of time steps, h is a constant, and mh = T. The smaller the value of h, the closer our discretized path will be to the continuous-time path we wish to simulate. Of course, this will be at the expense of greater computational effort. While there are a number of discretization schemes available, we will focus on the simplest and perhaps most common scheme, the Euler scheme. The Euler scheme is intuitive, easy to implement, and satisfies

X̃_kh = X̃_{(k−1)h} + µ((k−1)h, X̃_{(k−1)h}) h + σ((k−1)h, X̃_{(k−1)h}) √h Z_k  (17)

where the Z_k's are IID N(0, 1). If we want to estimate θ := E[f(X_T)] using the Euler scheme, then for a fixed number of paths, n, and discretization interval, h, we have the following algorithm.

Using the Euler Scheme to Estimate θ = E[f(X_T)] When X_t Follows a 1-Dimensional SDE

for j = 1 to n
    set t = 0; X̃ = X_0
    for k = 1 to m = T/h
        generate Z ~ N(0, 1)
        set X̃ = X̃ + µ(t, X̃) h + σ(t, X̃) √h Z
        set t = t + h
    end for
    set f_j = f(X̃)
end for
set θ̂_n = (f_1 + ... + f_n)/n
set σ̂²_n = Σ_{j=1}^n (f_j − θ̂_n)²/(n − 1)
set approx. 100(1−α)% CI = θ̂_n ± z_{1−α/2} σ̂_n/√n

Remark 3 Observe that even though we only care about X_T, we still need to generate the intermediate values, X̃_ih, if we are to minimize the discretization error. Because of this discretization error, θ̂_n is no longer an unbiased estimator of θ.

Remark 4 If we wished to estimate θ = E[f(X_{t_1},..., X_{t_p})], then in general we would need to keep track of (X̃_{t_1},..., X̃_{t_p}) inside the inner for-loop of the algorithm.

Exercise 1 Can you think of a derivative where the payoff depends on (X_{t_1},..., X_{t_p}), but where it would not be necessary to keep track of (X̃_{t_1},..., X̃_{t_p}) on each sample path?

Simulating a Multidimensional SDE

In the multidimensional case, X_t, W_t and µ(t, X_t) in (16) are now vectors, and σ(t, X_t) is a matrix. This situation arises when we have a system of SDEs in our model, which can occur in a number of financial engineering contexts. Some examples include:

(1) Modeling the evolution of multiple stocks. This might be necessary if we are trying to price derivatives whose values depend on multiple stocks or state variables, or if we are studying the properties of some portfolio strategy with multiple assets.

(2) Modeling the evolution of a single stock where we assume that the volatility of the stock is itself stochastic. Such a model is termed a stochastic volatility model.

(3) Modeling the evolution of interest rates. For example, if we assume that the short rate, r_t, is driven by a number of factors which themselves are stochastic and satisfy SDEs, then simulating r_t amounts to simulating the SDEs that drive the factors. Examples include the multi-factor Gaussian and CIR models. Such models also occur in HJM and LIBOR market models.

In all of these cases, whether or not we will have to simulate the SDEs will depend on the model in question and on the particular quantity that we wish to compute. If we do need to discretize the SDEs and simulate their discretized versions, then it is very straightforward. If there are n correlated Brownian motions driving the SDEs, then at each time step, t_i, we must generate n IID N(0, 1) random variables. We would then use the

Cholesky decomposition to generate X̃_{t_{i+1}}. This is exactly analogous to our method of generating correlated geometric Brownian motions. In the context of simulating multidimensional SDEs, however, it is more common to use independent Brownian motions, as any correlations between components of the vector, X_t, can be induced through the matrix, σ(t, X_t).

Applications to Financial Engineering

Example 12 (Option Pricing Under Stochastic Volatility)
Suppose the evolution of the stock price, S_t, under the risk-neutral probability measure is given by

dS_t = r S_t dt + √V_t S_t dW_t^{(1)}  (18)
dV_t = α (b − V_t) dt + σ √V_t dW_t^{(2)}.  (19)

If we want to price a European call option on the stock with expiration T and strike K, then the price is given by C_0 = exp(−rT) E[max(S_T − K, 0)]. We could estimate C_0 by simulating n sample paths of {S_t, V_t} up to time T, and taking the average of exp(−rT) max(S_T − K, 0) over the n paths as our estimated call option price, Ĉ_0.

Exercise 2 Write out the details of the algorithm that you would use to estimate C_0 in Example 12.

Remark 5 It is certainly worth mentioning that the Euler scheme can perform poorly in practice when applied to the Heston stochastic volatility model. See the discussion of implementation risk below.

Example 13 (The CIR Model with Time-Dependent Parameters)
We assume the Q-dynamics of the short rate, r_t, are given by

dr_t = α[µ(t) − r_t] dt + σ √r_t dW_t  (20)

where µ(t) is a deterministic function of time. This generalized CIR model is used when we want to fit a CIR-type model to the initial term structure. Suppose now that we wish to price a derivative security maturing at time T with payoff C_T(r_T). Then its time-0 price, C_0, is given by

C_0 = E_0[e^{−∫_0^T r_s ds} C_T(r_T)].  (21)

The distribution of r_T is not available in an easy-to-use closed form, so perhaps the easiest way to estimate C_0 is by simulating the dynamics of r_t. Towards this end, we could either use (20) and simulate r_t directly, or alternatively we could simulate X_t := f(r_t), where f(·) is an invertible transformation. Note that because of the discount factor in (21), it is also necessary to simulate the process, Y_t, given by Y_t = exp(−∫_0^t r_s ds).

Exercise 3 Describe in detail how you would estimate C_0 in Example 13. Note that there are alternative ways to do this. Which way do you prefer?

Exercise 4 Suppose we wish to simulate the known dynamics of a zero-coupon bond. How would you ensure that the simulated process satisfies 0 < Z_t^T < 1?
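One direct approach to Exercise 3 is an Euler scheme for (20) that accumulates the integrated short rate path by path. The sketch below is one possible implementation, not the notes' prescription: the function name is ours, negative excursions of r are floored at zero before square roots are taken (an ad hoc fix, discussed later under implementation risk), and ∫_0^t r_s ds is approximated by a left-endpoint rule.

```python
import numpy as np

def cir_price(payoff, alpha, sigma, mu, r0, T, m=200, n=50_000, seed=0):
    """Estimate C_0 = E[exp(-int_0^T r_s ds) * payoff(r_T)] by an Euler
    discretization of dr = alpha*(mu(t) - r) dt + sigma*sqrt(r) dW."""
    rng = np.random.default_rng(seed)
    h = T / m
    r = np.full(n, r0)
    integral = np.zeros(n)            # running left-rule estimate of int r_s ds
    for k in range(m):
        t = k * h
        integral += r * h
        z = rng.standard_normal(n)
        r = (r + alpha * (mu(t) - r) * h
               + sigma * np.sqrt(np.maximum(r, 0.0)) * np.sqrt(h) * z)
        r = np.maximum(r, 0.0)        # floor negative excursions at zero
    y = np.exp(-integral) * payoff(r)
    return y.mean(), y.std(ddof=1) / np.sqrt(n)
```

As a sanity check, taking payoff ≡ 1 prices a zero-coupon bond; with µ(t) ≡ r_0 and σ very small, the estimate should be close to e^{−r_0 T}.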

Variance Reduction Techniques for Simulating SDEs

Simulating SDEs is a computationally intensive task, as we need to do a lot of work for each sample that we generate. As a result, variance reduction techniques are often very useful in this context. We give one example based on stratified sampling and the Brownian bridge. Note that these ideas can be applied very generally to many different models and risk management applications.

Example 14 (The Brownian Bridge and Stratified Sampling)
Consider a short rate model of the form

dr_t = µ(t, r_t) dt + σ(t, r_t) dW_t.

When pricing a derivative that matures at time T using an Euler scheme, it is necessary to generate the path (W_h, W_2h, ..., W_mh = W_T). It will often be the case, however, that the value of W_T will be particularly significant in determining the payoff. As a result, we might want to stratify using the random variable W_T. This is easy to do for the following two reasons: (i) W_T ~ N(0, T), so we can easily generate a sample of W_T, and (ii) we can easily generate (W_h, W_2h, ..., W_{T−h} | W_T) by computing the relevant conditional distributions and then simulating from them. For example, it is straightforward to see that

(W_t | W_s = x, W_v = y) ~ N( ((v−t)x + (t−s)y)/(v−s), (v−t)(t−s)/(v−s) ) for s < t < v  (22)

and we can use this result to generate (W_h | W_0, W_T). More generally, we can use (22) to successively simulate (W_h | W_0, W_T), (W_2h | W_h, W_T), ..., (W_{T−h} | W_{T−2h}, W_T). We can in fact simulate the points on the sample path in any order we like. In particular, to simulate W_v we use (22) and condition on the two closest sample points before and after v, respectively, that have already been sampled. This method of pinning the beginning and end points of the Brownian motion is known as the Brownian bridge.

Exercise 5 If we are working with a multi-dimensional correlated Brownian motion, W_t, (e.g. in the context of a multi-factor model of the short rate), is it still easy to use the Brownian bridge construction where we first generate the random vector W_T?

Allocation of Computational Resources

An important issue that arises when simulating SDEs is the allocation of computational resources. In particular, we need to determine how many sample paths, n, to generate and how many steps, m, to simulate on each sample path. A smaller value of m will result in greater bias and numerical error, whereas a smaller value of n will result in greater statistical noise. If there is a fixed computational budget, then it is important to choose n and m in an optimal manner. We now discuss this issue.

Suppose dX_t = µ(t, X_t) dt + σ(t, X_t) dB_t and that we wish to estimate θ := E[f(X_T)] using an Euler approximation scheme. Let θ̂_n^h be an estimate of θ based upon simulating n sample paths using a total of m = T/h discretization points per path. In particular, we have

θ̂_n^h = (f(X̃_1^h) + ... + f(X̃_n^h))/n

where X̃_i^h is the value of X̃_T^h on the i-th path. Under some technical conditions on the process, X_t, it is known[22] that the Euler scheme has a weak order of convergence equal to 1, so that

|E[f(X_T)] − E[f(X̃_T^h)]| ≤ a m^{−1}  (23)

Footnote 22: See Glasserman (2004) for further details.

for some constant a and all m greater than some fixed m_0. Suppose now that we have a fixed computational budget, C, and that each simulation step costs c. We must therefore have n = C/(mc). We would like to choose the optimal values of m (and therefore n) as a function of C. We do this by minimizing the mean squared error (MSE), which is the sum of the squared bias and the variance, v/n, where v denotes the variance of f(X̃_T^h) on a single path. In particular, (23) implies

MSE ≈ a²/m² + v/n  (24)

for sufficiently large m. Substituting n = C/(mc) into (24), it is easy to see that it is optimal (for sufficiently large C) to take

m ∝ C^{1/3}  (25)
n ∝ C^{2/3}.  (26)

When it comes to estimating θ, (25) and (26) provide guidance as follows. We begin by using n_0 paths and m_0 discretization points per path to compute an initial estimate, θ̂_0, of θ. If we then compute a new estimate, θ̂_1, by setting m_1 = 2m_0, then (25) and (26) suggest we should set n_1 = 4n_0. We may then continue to compute new estimates, θ̂_i, in this manner until the estimates and their associated confidence intervals converge. In general, if we increase m by a factor of 2, then we should increase n by a factor of 4. Although estimating θ in this way requires additional computational resources, it is not usually necessary to perform more than two or three iterations, provided we begin with sufficiently large values of m_0 and n_0.

Implementation Risk

There are other important issues that arise when simulating SDEs. For example, while we have only described the Euler scheme, there are other, more sophisticated discretization schemes that can also be used. They include, for example, predictor-corrector schemes and the Milstein scheme. In a sense that we will not define, these schemes have superior convergence properties to the Euler scheme. However, they are sometimes more difficult to implement, particularly in the multi-dimensional setting.

It is also certainly worth mentioning that the Euler scheme can perform poorly in practice when applied to the Heston stochastic volatility[23] model of Example 12. In particular, depending on the model parameters, α, b and σ, it can perform poorly when applied to the variance process in (19), despite the fact that theoretical convergence is guaranteed. For example, Andersen reports results for pricing an at-the-money 10-year call option when r = q = 0. He takes α = 0.5, V_0 = b = 0.04, σ = 0.1, S_0 = 100 and ρ := Corr(W_t^{(1)}, W_t^{(2)}) = 0.9. Using one million sample paths and a sticky zero or reflection assumption[24], he obtains the estimates displayed in Table 1 for the option price as a function of m, the number of discretization points.

[Table 1: Call Option Price Estimates Using Euler Scheme in Heston's Stochastic Volatility Model. Columns: Time Steps, Sticky Zero, Reflection. The numerical entries did not survive transcription.]

One therefore needs to be very careful when applying an Euler scheme to this SDE. Note that these convergence problems would be easily identified if we followed the procedure outlined in the paragraph immediately following (26). But for this particular process, one should use a better scheme, such as that proposed by Andersen (2007).

Footnote 23: The volatility process in the Heston model is also known as the CIR model in term-structure applications.
Footnote 24: The sticky zero assumption simply means that any time the variance process, V_t, goes negative in the Monte-Carlo, it is replaced by 0. The reflection assumption replaces V_t with |V_t|. In the limit as m → ∞, the variance will stay non-negative with probability 1, so both assumptions are unnecessary in the limit.
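A stripped-down version of such an experiment is sketched below. This is our own code, with arbitrary parameters rather than Andersen's values and far fewer paths; the `fix` argument switches between the sticky zero and reflection repairs of a negative variance, and re-running across several values of m exposes the scheme's sensitivity.

```python
import numpy as np

def heston_call_euler(S0, K, r, V0, alpha, b, sigma, rho, T,
                      m, n=100_000, fix="sticky", seed=0):
    """Euler scheme for the Heston model (18)-(19), simulating log S and V,
    repairing negative variance via 'sticky' (V -> max(V,0)) or
    'reflection' (V -> |V|)."""
    rng = np.random.default_rng(seed)
    h = T / m
    logS = np.full(n, np.log(S0))
    V = np.full(n, V0)
    for _ in range(m):
        z1 = rng.standard_normal(n)
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        logS += (r - 0.5 * V) * h + np.sqrt(V * h) * z1
        V += alpha * (b - V) * h + sigma * np.sqrt(V * h) * z2
        V = np.maximum(V, 0.0) if fix == "sticky" else np.abs(V)
    payoff = np.maximum(np.exp(logS) - K, 0.0)
    return np.exp(-r * T) * payoff.mean()
```

Comparing `fix="sticky"` against `fix="reflection"` for small m illustrates, qualitatively, the kind of discrepancy discussed above: the two repairs can disagree noticeably even though both converge as m grows.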


More information

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem Time on my hands: Coin tosses. Problem Formulation: Suppose that I have

More information

Gaussian Conjugate Prior Cheat Sheet

Gaussian Conjugate Prior Cheat Sheet Gaussian Conjugate Prior Cheat Sheet Tom SF Haines 1 Purpose This document contains notes on how to handle the multivariate Gaussian 1 in a Bayesian setting. It focuses on the conjugate prior, its Bayesian

More information

5 Numerical Differentiation

5 Numerical Differentiation D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

Example 1. Consider the following two portfolios: 2. Buy one c(s(t), 20, τ, r) and sell one c(s(t), 10, τ, r).

Example 1. Consider the following two portfolios: 2. Buy one c(s(t), 20, τ, r) and sell one c(s(t), 10, τ, r). Chapter 4 Put-Call Parity 1 Bull and Bear Financial analysts use words such as bull and bear to describe the trend in stock markets. Generally speaking, a bull market is characterized by rising prices.

More information

Mathematical Finance

Mathematical Finance Mathematical Finance Option Pricing under the Risk-Neutral Measure Cory Barnes Department of Mathematics University of Washington June 11, 2013 Outline 1 Probability Background 2 Black Scholes for European

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

More information

Pricing of an Exotic Forward Contract

Pricing of an Exotic Forward Contract Pricing of an Exotic Forward Contract Jirô Akahori, Yuji Hishida and Maho Nishida Dept. of Mathematical Sciences, Ritsumeikan University 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan E-mail: {akahori,

More information

LogNormal stock-price models in Exams MFE/3 and C/4

LogNormal stock-price models in Exams MFE/3 and C/4 Making sense of... LogNormal stock-price models in Exams MFE/3 and C/4 James W. Daniel Austin Actuarial Seminars http://www.actuarialseminars.com June 26, 2008 c Copyright 2007 by James W. Daniel; reproduction

More information

3. Monte Carlo Simulations. Math6911 S08, HM Zhu

3. Monte Carlo Simulations. Math6911 S08, HM Zhu 3. Monte Carlo Simulations Math6911 S08, HM Zhu References 1. Chapters 4 and 8, Numerical Methods in Finance. Chapters 17.6-17.7, Options, Futures and Other Derivatives 3. George S. Fishman, Monte Carlo:

More information

e.g. arrival of a customer to a service station or breakdown of a component in some system.

e.g. arrival of a customer to a service station or breakdown of a component in some system. Poisson process Events occur at random instants of time at an average rate of λ events per second. e.g. arrival of a customer to a service station or breakdown of a component in some system. Let N(t) be

More information

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS

Sensitivity Analysis 3.1 AN EXAMPLE FOR ANALYSIS Sensitivity Analysis 3 We have already been introduced to sensitivity analysis in Chapter via the geometry of a simple example. We saw that the values of the decision variables and those of the slack and

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

INTEREST RATES AND FX MODELS

INTEREST RATES AND FX MODELS INTEREST RATES AND FX MODELS 8. Portfolio greeks Andrew Lesniewski Courant Institute of Mathematical Sciences New York University New York March 27, 2013 2 Interest Rates & FX Models Contents 1 Introduction

More information

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform MATH 433/533, Fourier Analysis Section 11, The Discrete Fourier Transform Now, instead of considering functions defined on a continuous domain, like the interval [, 1) or the whole real line R, we wish

More information

Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

More information

Article from: Risk Management. June 2009 Issue 16

Article from: Risk Management. June 2009 Issue 16 Article from: Risk Management June 2009 Issue 16 CHAIRSPERSON S Risk quantification CORNER Structural Credit Risk Modeling: Merton and Beyond By Yu Wang The past two years have seen global financial markets

More information

How To Price A Call Option

How To Price A Call Option Now by Itô s formula But Mu f and u g in Ū. Hence τ θ u(x) =E( Mu(X) ds + u(x(τ θ))) 0 τ θ u(x) E( f(x) ds + g(x(τ θ))) = J x (θ). 0 But since u(x) =J x (θ ), we consequently have u(x) =J x (θ ) = min

More information

Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin

Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin Final Mathematics 51, Section 1, Fall 24 Instructor: D.A. Levin Name YOU MUST SHOW YOUR WORK TO RECEIVE CREDIT. A CORRECT ANSWER WITHOUT SHOWING YOUR REASONING WILL NOT RECEIVE CREDIT. Problem Points Possible

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

General Sampling Methods

General Sampling Methods General Sampling Methods Reference: Glasserman, 2.2 and 2.3 Claudio Pacati academic year 2016 17 1 Inverse Transform Method Assume U U(0, 1) and let F be the cumulative distribution function of a distribution

More information

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

More information

Computational Finance Options

Computational Finance Options 1 Options 1 1 Options Computational Finance Options An option gives the holder of the option the right, but not the obligation to do something. Conversely, if you sell an option, you may be obliged to

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS Contents 1. Moment generating functions 2. Sum of a ranom number of ranom variables 3. Transforms

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Math 4310 Handout - Quotient Vector Spaces

Math 4310 Handout - Quotient Vector Spaces Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Finite Differences Schemes for Pricing of European and American Options

Finite Differences Schemes for Pricing of European and American Options Finite Differences Schemes for Pricing of European and American Options Margarida Mirador Fernandes IST Technical University of Lisbon Lisbon, Portugal November 009 Abstract Starting with the Black-Scholes

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

CS 522 Computational Tools and Methods in Finance Robert Jarrow Lecture 1: Equity Options

CS 522 Computational Tools and Methods in Finance Robert Jarrow Lecture 1: Equity Options CS 5 Computational Tools and Methods in Finance Robert Jarrow Lecture 1: Equity Options 1. Definitions Equity. The common stock of a corporation. Traded on organized exchanges (NYSE, AMEX, NASDAQ). A common

More information

The Heston Model. Hui Gong, UCL http://www.homepages.ucl.ac.uk/ ucahgon/ May 6, 2014

The Heston Model. Hui Gong, UCL http://www.homepages.ucl.ac.uk/ ucahgon/ May 6, 2014 Hui Gong, UCL http://www.homepages.ucl.ac.uk/ ucahgon/ May 6, 2014 Generalized SV models Vanilla Call Option via Heston Itô s lemma for variance process Euler-Maruyama scheme Implement in Excel&VBA 1.

More information

6.041/6.431 Spring 2008 Quiz 2 Wednesday, April 16, 7:30-9:30 PM. SOLUTIONS

6.041/6.431 Spring 2008 Quiz 2 Wednesday, April 16, 7:30-9:30 PM. SOLUTIONS 6.4/6.43 Spring 28 Quiz 2 Wednesday, April 6, 7:3-9:3 PM. SOLUTIONS Name: Recitation Instructor: TA: 6.4/6.43: Question Part Score Out of 3 all 36 2 a 4 b 5 c 5 d 8 e 5 f 6 3 a 4 b 6 c 6 d 6 e 6 Total

More information

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies

Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Hedging Illiquid FX Options: An Empirical Analysis of Alternative Hedging Strategies Drazen Pesjak Supervised by A.A. Tsvetkov 1, D. Posthuma 2 and S.A. Borovkova 3 MSc. Thesis Finance HONOURS TRACK Quantitative

More information

More Exotic Options. 1 Barrier Options. 2 Compound Options. 3 Gap Options

More Exotic Options. 1 Barrier Options. 2 Compound Options. 3 Gap Options More Exotic Options 1 Barrier Options 2 Compound Options 3 Gap Options More Exotic Options 1 Barrier Options 2 Compound Options 3 Gap Options Definition; Some types The payoff of a Barrier option is path

More information

The Black-Scholes pricing formulas

The Black-Scholes pricing formulas The Black-Scholes pricing formulas Moty Katzman September 19, 2014 The Black-Scholes differential equation Aim: Find a formula for the price of European options on stock. Lemma 6.1: Assume that a stock

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Lectures. Sergei Fedotov. 20912 - Introduction to Financial Mathematics. No tutorials in the first week

Lectures. Sergei Fedotov. 20912 - Introduction to Financial Mathematics. No tutorials in the first week Lectures Sergei Fedotov 20912 - Introduction to Financial Mathematics No tutorials in the first week Sergei Fedotov (University of Manchester) 20912 2010 1 / 1 Lecture 1 1 Introduction Elementary economics

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 18. A Brief Introduction to Continuous Probability CS 7 Discrete Mathematics and Probability Theory Fall 29 Satish Rao, David Tse Note 8 A Brief Introduction to Continuous Probability Up to now we have focused exclusively on discrete probability spaces

More information

WHERE DOES THE 10% CONDITION COME FROM?

WHERE DOES THE 10% CONDITION COME FROM? 1 WHERE DOES THE 10% CONDITION COME FROM? The text has mentioned The 10% Condition (at least) twice so far: p. 407 Bernoulli trials must be independent. If that assumption is violated, it is still okay

More information

The Exponential Distribution

The Exponential Distribution 21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

More information

2WB05 Simulation Lecture 8: Generating random variables

2WB05 Simulation Lecture 8: Generating random variables 2WB05 Simulation Lecture 8: Generating random variables Marko Boon http://www.win.tue.nl/courses/2wb05 January 7, 2013 Outline 2/36 1. How do we generate random variables? 2. Fitting distributions Generating

More information

Lecture 6 Black-Scholes PDE

Lecture 6 Black-Scholes PDE Lecture 6 Black-Scholes PDE Lecture Notes by Andrzej Palczewski Computational Finance p. 1 Pricing function Let the dynamics of underlining S t be given in the risk-neutral measure Q by If the contingent

More information

Lecture 12: The Black-Scholes Model Steven Skiena. http://www.cs.sunysb.edu/ skiena

Lecture 12: The Black-Scholes Model Steven Skiena. http://www.cs.sunysb.edu/ skiena Lecture 12: The Black-Scholes Model Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794 4400 http://www.cs.sunysb.edu/ skiena The Black-Scholes-Merton Model

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Lecture. S t = S t δ[s t ].

Lecture. S t = S t δ[s t ]. Lecture In real life the vast majority of all traded options are written on stocks having at least one dividend left before the date of expiration of the option. Thus the study of dividends is important

More information

Lecture 6: Discrete & Continuous Probability and Random Variables

Lecture 6: Discrete & Continuous Probability and Random Variables Lecture 6: Discrete & Continuous Probability and Random Variables D. Alex Hughes Math Camp September 17, 2015 D. Alex Hughes (Math Camp) Lecture 6: Discrete & Continuous Probability and Random September

More information

Random variables, probability distributions, binomial random variable

Random variables, probability distributions, binomial random variable Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

More information

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let Copyright c 2009 by Karl Sigman 1 Stopping Times 1.1 Stopping Times: Definition Given a stochastic process X = {X n : n 0}, a random time τ is a discrete random variable on the same probability space as

More information

Stock Price Dynamics, Dividends and Option Prices with Volatility Feedback

Stock Price Dynamics, Dividends and Option Prices with Volatility Feedback Stock Price Dynamics, Dividends and Option Prices with Volatility Feedback Juho Kanniainen Tampere University of Technology New Thinking in Finance 12 Feb. 2014, London Based on J. Kanniainen and R. Piche,

More information

A SNOWBALL CURRENCY OPTION

A SNOWBALL CURRENCY OPTION J. KSIAM Vol.15, No.1, 31 41, 011 A SNOWBALL CURRENCY OPTION GYOOCHEOL SHIM 1 1 GRADUATE DEPARTMENT OF FINANCIAL ENGINEERING, AJOU UNIVERSITY, SOUTH KOREA E-mail address: gshim@ajou.ac.kr ABSTRACT. I introduce

More information

Valuation of the Surrender Option Embedded in Equity-Linked Life Insurance. Brennan Schwartz (1976,1979) Brennan Schwartz

Valuation of the Surrender Option Embedded in Equity-Linked Life Insurance. Brennan Schwartz (1976,1979) Brennan Schwartz Valuation of the Surrender Option Embedded in Equity-Linked Life Insurance Brennan Schwartz (976,979) Brennan Schwartz 04 2005 6. Introduction Compared to traditional insurance products, one distinguishing

More information

Section 5.1 Continuous Random Variables: Introduction

Section 5.1 Continuous Random Variables: Introduction Section 5. Continuous Random Variables: Introduction Not all random variables are discrete. For example:. Waiting times for anything (train, arrival of customer, production of mrna molecule from gene,

More information

Likewise, the payoff of the better-of-two note may be decomposed as follows: Price of gold (US$/oz) 375 400 425 450 475 500 525 550 575 600 Oil price

Likewise, the payoff of the better-of-two note may be decomposed as follows: Price of gold (US$/oz) 375 400 425 450 475 500 525 550 575 600 Oil price Exchange Options Consider the Double Index Bull (DIB) note, which is suited to investors who believe that two indices will rally over a given term. The note typically pays no coupons and has a redemption

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2014 Timo Koski () Mathematisk statistik 24.09.2014 1 / 75 Learning outcomes Random vectors, mean vector, covariance

More information

Aggregate Loss Models

Aggregate Loss Models Aggregate Loss Models Chapter 9 Stat 477 - Loss Models Chapter 9 (Stat 477) Aggregate Loss Models Brian Hartman - BYU 1 / 22 Objectives Objectives Individual risk model Collective risk model Computing

More information

Notes on Continuous Random Variables

Notes on Continuous Random Variables Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

More information

7: The CRR Market Model

7: The CRR Market Model Ben Goldys and Marek Rutkowski School of Mathematics and Statistics University of Sydney MATH3075/3975 Financial Mathematics Semester 2, 2015 Outline We will examine the following issues: 1 The Cox-Ross-Rubinstein

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information