Math 493 - Fall 2013 - HW 4 Solutions
Renato Feres - Wash. U.

Preliminaries

We have up to this point ignored a central aspect of the Monte Carlo method: how to estimate errors? Clearly, the larger the sample size used in approximating the expected value of a random variable $X$ by the sample mean, the greater the precision of the approximation. In other words, according to the (weak) law of large numbers, which will be stated in detail below, if $x_1, x_2, \dots, x_N$ are sample values of an independent, identically distributed sequence of random variables $X_1, X_2, \dots, X_N$ having the same probability distribution as $X$, then
$$E(X) \approx \frac{x_1 + x_2 + \cdots + x_N}{N}.$$
The question then is: for a desired precision of the approximation, how large should $N$ be? And in what precise sense does the sample mean converge to $E(X)$ as $N$ goes to infinity? This is a problem in statistical estimation, applied to computer-simulated data (as opposed to real-world data). The necessary theoretical tools are developed in chapters 4 and 5 of the textbook, the most important of which is the Central Limit Theorem. In this assignment we will take a quick look at some of those tools and apply them to a few Monte Carlo simulations.

Chebyshev's inequality and the weak law of large numbers. Chebyshev's inequality is proved in section 1.10, pages 68 and 69 of the textbook.

Theorem 1 (Chebyshev's inequality). If $X$ is a random variable having mean $\mu$ and variance $\sigma^2$ then, for any positive constant $k$,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}. \qquad (1)$$

The weak law of large numbers (theorem 5.1.1, page 290 of the textbook) is a simple consequence of Chebyshev's inequality. Before describing what it is, let us stop for a moment to consider a notion of limit of random variables implicit in inequality 1.

Definition 1 (Convergence in probability). We say that a sequence $X_1, X_2, \dots$ of random variables converges in probability towards $X$ if for all $\epsilon > 0$
$$\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0.$$
In words: the probability that $X_n$ differs from $X$ by more than (an arbitrarily small positive number) $\epsilon$ goes to zero as $n$ grows towards infinity. The weak law of large numbers says that the sample mean of a sequence of i.i.d. random variables $X_1, X_2, \dots, X_N$, which is the random variable defined by
$$\bar X_N = \frac{X_1 + \cdots + X_N}{N},$$
converges in probability to the common mean value $\mu = E(X_i)$. It also gives a (somewhat crude) estimate of the error.
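Chebyshev's inequality is easy to sanity-check numerically. The sketch below is written in Python rather than R, purely as an illustration; the seed, sample size, and choice of a uniform random variable are arbitrary. It draws samples and compares the empirical tail probability with the bound in inequality 1:

```python
import math
import random

random.seed(1)

# X uniform on [0, 1]: mu = 1/2, sigma^2 = 1/12
mu, sigma = 0.5, math.sqrt(1 / 12)

N = 100_000
xs = [random.random() for _ in range(N)]

for k in [1.5, 2.0, 3.0]:
    # empirical P(|X - mu| >= k*sigma)
    p = sum(abs(x - mu) >= k * sigma for x in xs) / N
    # Chebyshev's bound (1): the empirical tail never exceeds 1/k^2
    assert p <= 1 / k**2
    print(f"k={k}: empirical {p:.4f} <= bound {1/k**2:.4f}")
```

Note that for $k \ge 2$ the bound is far from tight here: $k\sigma > 1/2$, so the empirical tail probability is exactly zero while the bound is still $1/k^2$.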
Theorem 2 (The weak law of large numbers). Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables having mean $\mu$ and finite variance $\sigma^2 < \infty$. Then, for any $\epsilon > 0$,
$$P(|\bar X_N - \mu| \ge \epsilon) \le \frac{\sigma^2}{N\epsilon^2}. \qquad (2)$$
It follows, in particular, that the sample mean $\bar X_N$ converges in probability to $\mu$.

Proof. The following fact corresponds to Example 2.8.1 of the textbook. It will also be discussed in class before too long. (This is also shown at the beginning of Math 3200.) If $X_1, \dots, X_N$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then
$$E(\bar X_N) = \mu, \qquad \mathrm{Var}(\bar X_N) = \frac{\sigma^2}{N}.$$
Therefore, Chebyshev's inequality applied to $\bar X_N$ implies
$$P\left(|\bar X_N - \mu| \ge \frac{k\sigma}{\sqrt N}\right) \le \frac{1}{k^2}. \qquad (3)$$
Now choose $k$ as follows: $k = \sqrt N\,\epsilon/\sigma$. Substituting this value of $k$ into inequality 3 gives the desired result.

The weak law of large numbers justifies the idea of approximating an expected value by a finite sample mean, which is at the basis of the Monte Carlo method. It also provides a way to estimate errors, namely inequality 2. It says, in essence, that if we want a high probability $1 - \delta$ that the sample mean does not deviate from the actual mean by more than a small $\epsilon$, then it is enough to choose $N \ge \sigma^2/(\delta\epsilon^2)$.

The following figure illustrates the convergence of a sequence of sample values of $\bar X_j$, for $j = 1, \dots, N$.

[Figure: scatter plot of the points $(j, x_j)$, vertical axis from $-2$ to $2$, horizontal axis from 0 to 500, with the line plot of the partial means superimposed.]

The scatter plot in the background shows the points $(j, x_j)$, $j = 1, \dots, N$ (for $N = 500$), where each $x_j$ is a sample value of an independent uniform random number $X_j$ between $-2$ and $2$. Superimposed on this graph is the line plot of the partial means: points on this second graph have coordinates $(j, \bar x_j)$, where $\bar x_j = (x_1 + \cdots + x_j)/j$ (and the points are connected by lines). I used the following R script to obtain this graph. Note the use of cumsum to obtain the partial means. The point to note about the graph is that the partial means fluctuate less and less about the mean, and it clearly
appears to converge to the expected value 0. (Graphs like this one actually illustrate a stronger sense of convergence than that implied by the weak law of large numbers.) The graph was generated by the following script.

N=c(1:500)
X=runif(length(N),-2,2)
plot(N,X,pch=".",ylim=c(-2,2))
points(N,cumsum(X)/N,type="l")

An example. To illustrate how the weak law of large numbers can be used to estimate errors, consider the following very simple example. Let $X_1, X_2, \dots$ be i.i.d. random numbers in $[0,1]$ with the uniform distribution. It is an easy calculus exercise to check that $\mathrm{Var}(X_i) = 1/12$. Suppose that we want our estimate $\bar X_N$ of the mean $\mu$ to differ from the exact value of the mean by no more than 0.01: $|\bar X_N - \mu| \le 0.01$. We can never be 100% sure that this will happen, no matter how big $N$ is, but we can ask to be, say, 99.9% sure. This means that we want
$$P(|\bar X_N - \mu| \ge 0.01) \le 0.001.$$
How big should $N$ be, then? Here $\epsilon = 0.01$, and by inequality 2 we may require $\sigma^2/(N\epsilon^2) \le 0.001$. Therefore, $N$ should satisfy $N \ge 10^7/12 \approx 10^6$. This essentially solves the problem we had set out to solve. There are two issues, though. One is that we have used the explicitly computed value for $\sigma^2$; but we may not know this value in an actual problem any more than the value of $\mu$, which is what we want to obtain in the first place. (Of course, in this simple example we know that $\mu = 1/2$.) We will later consider the (easy to resolve) issue of estimating $\sigma^2$ from the data. Another problem is that the sample size we obtain from Chebyshev's inequality is very inefficient. In other words, we can often achieve the same precision with $N$ much smaller than what Chebyshev's theorem would suggest. Here is the actual simulation, with $N = 10^6$:

> N=10^6
> X=runif(N) #Choose N random numbers between 0 and 1
> p=sum(X)/N #Obtain the empirical mean
> p
[1] 0.5005443
> abs(p-0.5) #Compare empirical mean with the exact mean
[1] 0.0005442686

Note how the precision obtained here, $|p - 0.5| \approx 5 \times 10^{-4}$, is so much better than 0.01.
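The Chebyshev sample-size bound used above, $N \ge \sigma^2/(\delta\epsilon^2)$, can be checked in a couple of lines. A minimal sketch (Python used here simply as a calculator):

```python
# Chebyshev bound (2): P(|Xbar - mu| >= eps) <= sigma^2 / (N * eps^2) <= delta
# Solving for the sample size gives N >= sigma^2 / (delta * eps^2).
sigma2 = 1 / 12            # variance of Uniform(0, 1)
eps, delta = 0.01, 0.001   # precision and allowed failure probability

N_min = sigma2 / (delta * eps**2)
print(N_min)  # 10^7 / 12, a little over 8 * 10^5
```

As the text notes, this is a pessimistic requirement; the CLT-based estimates developed below give much smaller sample sizes for the same confidence.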
Let us estimate how likely it is to get such a good approximation. (I could have been extremely lucky here!) The following program repeats the same experiment 1000 times and counts how many times the error is less than $5 \times 10^{-4}$.
M=10^3 #Number of times the empirical mean is computed
N=10^6 #Sample size
a=0 #Initialize the number of times |m-0.5| < 5*10^(-4)
for (i in 1:M) {
X=runif(N)
p=sum(X)/N
a=a+(abs(p-0.5)<5*10^(-4))
}
a/M #Relative frequency of getting as good or
#better precision than |m-0.5| < 5*10^(-4).

The value obtained for a/M was 0.923. This means that we should get an approximation as good as the one I first got, or better, about 90% of the time.

There is another way of obtaining much better (smaller) values of $N$, by using the centrally important Central Limit Theorem. As we will see, the central limit theorem says that the probability distribution of the sample mean $\bar X_N$ can be approximated by a normal distribution when $N$ is large. So before we turn to the CLT, we need to introduce normal random variables.

Normal random variables. A random variable $X$ is said to be normally distributed with mean $\mu$ and variance $\sigma^2$ if its probability density function is given by
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} \qquad \text{for } -\infty < x < \infty.$$
By a simple integral calculation (see section 3.4 in the text) you can show that $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. A useful fact to observe is that if $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$, then
$$Z = \frac{X - \mu}{\sigma} \qquad (4)$$
is also a normal random variable, with mean 0 and variance 1. Therefore, $Z$ has pdf
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
Thus the cumulative distribution function of $Z$ takes the form
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-s^2/2}\, ds.$$
We say that the normal random variable $Z$ with mean 0 and variance 1 is a standard normal random variable. Knowledge of the cumulative distribution function of $Z$ makes it possible to compute probabilities for an arbitrary normal random variable $X$, by noting that
$$F_X(x) = P(X \le x) = P\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) = P\left(Z \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right).$$
Values of $\Phi(z)$ can be obtained by looking them up in a table, which is the traditional method. They are also easily obtained in R. The main R functions associated to the normal distribution are:
dnorm density function
pnorm cumulative distribution function
qnorm quantile function
rnorm random variable

The following examples illustrate the use of each of these four functions.

#dnorm is the pdf of a normal r.v.
#It has the following form, where the below given values
#for mean and sd are the default values of
#mu and sigma (the standard deviation):
#dnorm(x, mean = 0, sd = 1)
#For example (omitting the mean and standard deviation),
> dnorm(0)
[1] 0.3989423
> dnorm(3,mean=3,sd=1)
[1] 0.3989423
> dnorm(10,mean=3,sd=1)
[1] 9.13472e-12
#The above number is very small since x=10 is far into the
#right tail of the density function.
#We can use dnorm to draw a graph
#of the normal density. This is done next.
#
#Plot of the density curve of a normal distribution:
x=seq(from=-3,to=3,length.out=100) #Set of points on the x-axis
y=dnorm(x) #Values of the normal density on those x values
plot(x,y,main="Standard Normal Distribution",type="l",ylab="density",xlab="z")
abline(h=0) #Adds a horizontal straight line at y=0
#We want to shade the region under the graph over the
#interval [1,2].
region.x=x[1<=x & x<=2]
region.y=y[1<=x & x<=2]
region.x=c(region.x[1],region.x,tail(region.x,1))
region.y=c(0,region.y,0)
polygon(region.x,region.y,density=10)

The graph is shown in the next figure. Areas under the density plot indicate probabilities. For example, the shaded area in the graph represents the probability $P(1 \le Z \le 2)$, where $Z$ is a standard normal random variable. This interpretation of the p.d.f. graph is, of course, general and does not only apply to normal random variables. The function pnorm is the cumulative distribution function. In particular, pnorm(z, mean = 0, sd = 1) is the same as $\Phi(z)$. It will become very useful soon. Here are some examples to show the usage.

#The main parameters of the function are indicated here:
[Figure: "Standard Normal Distribution" -- plot of the standard normal density over $-3 \le z \le 3$, with the region under the curve between $z = 1$ and $z = 2$ shaded.]

#pnorm(x, mean = 0, sd = 1)
#Here x is any real number, positive, negative, or zero.
#For example,
> pnorm(0)
[1] 0.5
> pnorm(1,mean=0,sd=2)
[1] 0.6914625
> 1-pnorm(-1,mean=0,sd=2)
[1] 0.6914625
#Note that pnorm(x) is greater than 1/2 if x>0 and
#less than 1/2 if x<0.

The quantile function qnorm is the inverse function of pnorm. Therefore, its argument has to be a number between 0 and 1.

> qnorm(0.5,mean=0,sd=2)
[1] 0
> pnorm(qnorm(.75))
[1] 0.75
> qnorm(pnorm(3))
[1] 3
> qnorm(pnorm(3,mean=0,sd=2.5),mean=1,sd=1)
[1] 2.2

Finally, the function rnorm is the random variable itself. This is what you use to generate normally distributed random numbers. For example, suppose we wish to generate 10000 normally distributed random numbers with $\mu = 0.5$ and $\sigma = 2$, then plot a histogram. The two lines

> x=rnorm(10000,mean=0.5,sd=2)
> hist(x,25)

generate the graph shown in the next figure. It is apparent that the histogram is an approximation of the density function graph.
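If R is not at hand, $\Phi$ can be evaluated through the standard relation $\Phi(z) = \tfrac12\bigl(1 + \operatorname{erf}(z/\sqrt 2)\bigr)$. The following Python sketch mimics pnorm and reproduces the values from the R session above (the function name pnorm is borrowed from R for the sake of the analogy):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pnorm(x, mean=0.0, sd=1.0):
    # General normal CDF by standardizing: F_X(x) = Phi((x - mean)/sd),
    # exactly as in the identity for F_X above.
    return Phi((x - mean) / sd)

print(pnorm(0))                 # 0.5
print(pnorm(1, mean=0, sd=2))   # 0.6914625, matching the R output
```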
[Figure: "Histogram of x" -- histogram of the 10000 values from rnorm with mean 0.5 and sd 2, frequency scale.]

The difference between histogram plots and density plots. A histogram is often used to describe the probability distribution of empirical (real-world or computer-simulated) random data, whereas the graph of a probability density function is often used to describe the theoretical model of the source of that data. The next graph superimposes on a histogram plot obtained from 1000 values generated by rnorm (the empirical distribution) the plot of the theoretical distribution describing that data, which is the graph of the R function dnorm.

[Figure: "Histogram of x" -- density-scale histogram over $-10 \le x \le 10$, with the normal density curve (mean 0, sd 2) drawn on top.]

The figure was generated by the following code. Note, in particular, the use of lines. Once something has been plotted, the R command lines can be used to add new features to the displayed graph; in this case it drew the density plot over the histogram.

x=rnorm(1000,mean=0,sd=2)
hist(x,breaks=seq(-10,10,by=0.5),freq=FALSE)
z=seq(-10,10,0.1)
lines(z,dnorm(z,mean=0,sd=2))
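The same agreement between histogram and density can be checked numerically, without a plot: the fraction of samples landing in an interval should approach the corresponding area under the density curve. A small Python sketch of this idea (seed, sample size, and the interval $[-2,2]$ are arbitrary choices):

```python
import math
import random

random.seed(42)
mu, sd, N = 0.0, 2.0, 100_000
xs = [random.gauss(mu, sd) for _ in range(N)]

# Empirical probability of the interval [-2, 2]
emp = sum(-2 <= x <= 2 for x in xs) / N

# Theoretical probability: Phi((2-mu)/sd) - Phi((-2-mu)/sd)
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
theo = Phi((2 - mu) / sd) - Phi((-2 - mu) / sd)

print(emp, theo)  # both close to 0.68
```

For this mean and standard deviation the interval is one standard deviation on each side of the mean, so the theoretical value is $2\Phi(1) - 1 \approx 0.6827$.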
The Central Limit Theorem. We now turn to the main theorem of this assignment.

Theorem 3 (The Central Limit Theorem). Let $X_1, X_2, \dots$ be a sequence of independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2$. Let $\bar X_N$ be the mean of the first $N$ random variables in the sequence. Then
$$\lim_{N\to\infty} P\left(\frac{\bar X_N - \mu}{\sigma/\sqrt N} \le z\right) = \Phi(z).$$

The theorem says that the sequence of random variables $(\bar X_N - \mu)/(\sigma/\sqrt N)$, where $\sigma/\sqrt N$ is the standard deviation of $\bar X_N$, converges in distribution to a standard normal random variable $Z$. In other words, for large values of $N$,
$$P\left(|\bar X_N - \mu| \le \frac{k\sigma}{\sqrt N}\right) \approx P(|Z| \le k).$$
Note the following area relations:
$$P(|Z| \le k) = 1 - 2P(Z \ge k) = 1 - 2[1 - P(Z < k)] = 2P(Z < k) - 1 = 2\Phi(k) - 1.$$
Therefore,
$$P\left(|\bar X_N - \mu| \le \frac{k\sigma}{\sqrt N}\right) \approx 2\Phi(k) - 1. \qquad (5)$$
The proof of Theorem 3 is in chapter 5 of the textbook. I will make a few comments on it below.

A concrete illustration of the CLT. Let $X$ be a random variable and $X_1, X_2, \dots$ independent random variables having the same distribution as $X$. The central limit theorem essentially says that $X_1 + \cdots + X_n$ is approximately normally distributed for large $n$, regardless of how $X$ is distributed. As an example, suppose that $X$ is a uniformly distributed random number between 1 and 2. Its mean is $3/2$ and its variance is $\sigma^2 = 1/12$. Note that $\bar X_n$ has mean $3/2$ and standard deviation $1/\sqrt{12n}$. In the four graphs below we compare, for each $n = 1, 2, 3, 10$, the graph of the standard normal density (dashed line) and a histogram of $10^5$ values of $Z_n$, which is defined by
$$Z_n = \frac{\bar X_n - \frac{3}{2}}{1/\sqrt{12n}} = \sqrt{12n}\left(\bar X_n - \frac{3}{2}\right).$$
Note that each $Z_n$ has mean 0 and standard deviation 1. By the central limit theorem, $Z_n$ should converge in distribution to a standard normal random variable. This can be seen reasonably clearly in the graphs below. Here is the code I used to produce these graphs:

par(mfrow=c(2,2)) #Creates a 2x2 plotting area
N=10^5 #Sample size
#############################################
n=1
[Figure: four panels, for n = 1, 2, 3, 10; each shows a density-scale histogram of $Z_n$ over $-4 \le x \le 4$ with the standard normal density dashed on top.]

x=matrix(0,1,N)
for (i in 1:n){
x=x+runif(N,1,2)
}
x=x/n
x=(x-3/2)*sqrt(12*n) #Subtract the mean, divide by the standard deviation
hist(x,breaks=seq(from=-4.5,to=4.5,by=0.2),freq=FALSE,
xlim=range(c(-4.5,4.5)),ylim=range(c(0,0.42)),
main="n=1",xlab="x",ylab="density")
z=seq(-4.5,4.5,0.1)
lines(z,dnorm(z,mean=0,sd=1),type="l",lty="dashed")
abline(h=0)
grid()
#############################################
n=2
x=matrix(0,1,N)
for (i in 1:n){
x=x+runif(N,1,2)
}
x=x/n
x=(x-3/2)*sqrt(12*n) #Subtract the mean, divide by the standard deviation
hist(x,breaks=seq(from=-4.5,to=4.5,by=0.2),freq=FALSE,
xlim=range(c(-4.5,4.5)),ylim=range(c(0,0.42)),
main="n=2",xlab="x",ylab="density")
z=seq(-4.5,4.5,0.1)
lines(z,dnorm(z,mean=0,sd=1),type="l",lty="dashed")
abline(h=0)
grid()
#############################################
n=3
x=matrix(0,1,N)
for (i in 1:n){
x=x+runif(N,1,2)
}
x=x/n
x=(x-3/2)*sqrt(12*n) #Subtract the mean, divide by the standard deviation
hist(x,breaks=seq(from=-4.5,to=4.5,by=0.2),freq=FALSE,
xlim=range(c(-4.5,4.5)),ylim=range(c(0,0.42)),
main="n=3",xlab="x",ylab="density")
z=seq(-4.5,4.5,0.1)
lines(z,dnorm(z,mean=0,sd=1),type="l",lty="dashed")
abline(h=0)
grid()
#############################################
n=10
x=matrix(0,1,N)
for (i in 1:n){
x=x+runif(N,1,2)
}
x=x/n
x=(x-3/2)*sqrt(12*n) #Subtract the mean, divide by the standard deviation
hist(x,breaks=seq(from=-4.5,to=4.5,by=0.2),freq=FALSE,
xlim=range(c(-4.5,4.5)),ylim=range(c(0,0.42)),
main="n=10",xlab="x",ylab="density")
z=seq(-4.5,4.5,0.1)
lines(z,dnorm(z,mean=0,sd=1),type="l",lty="dashed")
abline(h=0)
grid()
#############################################

Obtaining error estimates from the CLT (confidence intervals). Let us now return to the problem of obtaining error bounds in our Monte Carlo estimation of the mean of a random variable. Recall that good precision means that the absolute difference between the estimated mean $\bar X_N$ and the actual mean $\mu$ should be small with high probability.

Problem. Suppose that $N$ independent coin flips are simulated, where $N = 2.5 \times 10^5$. So we consider i.i.d. random variables $X_1, \dots, X_N$ such that $X_i \in \{0,1\}$, $p(0) = p(1) = 0.5$. An error tolerance value $\epsilon$ is set for estimating the mean $\mu$ by the sample mean $\bar X_N$.

1. Find the mean $\mu = E(X_i)$ and variance $\mathrm{Var}(X_i)$.

Solution. The mean is
$$\mu = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 1 = \tfrac{1}{2}$$
and the variance is
$$\sigma^2 = \tfrac{1}{2}\left(0 - \tfrac{1}{2}\right)^2 + \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)^2 = \tfrac{1}{4}.$$

2. Suppose that $\epsilon = 10^{-3}$. Find the probability $P\left(|\bar X_N - \mu| \le \epsilon\right)$.

Solution. We need the approximate identity 5. Note that the identity can be written as follows:
$$P(|\bar X_N - \mu| \le \epsilon) \approx 2\Phi\left(\frac{\epsilon\sqrt N}{\sigma}\right) - 1. \qquad (6)$$
Now,
$$\frac{\epsilon\sqrt N}{\sigma} = \frac{10^{-3}\sqrt{2.5 \times 10^5}}{1/2} = 1.$$
Therefore, the probability we want is $p = 2\Phi(1) - 1$. We can calculate $p$ using the R cumulative distribution function pnorm:

p=2*pnorm(1)-1

which gives the value $p = 0.6826895$. Therefore, the probability that $|\bar X_N - \mu| \le 10^{-3}$ is approximately 68%. Said differently, if we compute many values of the sample mean $\bar X_N$, for $N = 2.5 \times 10^5$, then about 68% of the time the resulting value of $\bar X_N$ will be that close to the exact mean.

3. What $N$ should we choose if we want $|\bar X_N - \mu| \le 10^{-3}$ with 90% confidence?

Solution. Using the approximate identity 6 once again, the problem now is to find $N$ so that
$$2\Phi\left(\frac{\epsilon\sqrt N}{\sigma}\right) - 1 = 0.9$$
for $\epsilon = 10^{-3}$. Note that
$$\frac{\epsilon\sqrt N}{\sigma} = \frac{10^{-3}\sqrt N}{1/2} = 2 \times 10^{-3}\sqrt N.$$
Therefore, we need to solve for $N$ in
$$\Phi\left(2 \times 10^{-3}\sqrt N\right) = (1 + 0.9)/2 = 0.95.$$
Now we need the inverse function of $\Phi$. This inverse is the quantile function, qnorm. Using R we obtain qnorm(0.95)=1.644854. So $2 \times 10^{-3}\sqrt N = 1.645$. Finally,
$$N = \left(\frac{1.645}{2 \times 10^{-3}}\right)^2 = 676506.2 \approx 7 \times 10^5.$$

What if we don't know $\sigma$? The point of using the Monte Carlo method is to have a way of computing $\mu$ when a direct evaluation of $E[X]$ may not be feasible. In those cases, the direct evaluation of $\sigma^2$ is likely also not feasible. Therefore, $\sigma^2$ should also be estimated from the data. This can be done as follows. Define the sample variance, $S^2$, by
$$S^2 = \frac{1}{N-1}\sum_{i=1}^N \left(X_i - \bar X_N\right)^2.$$
In one of the problems in this homework you are going to prove the following:
$$\sigma^2 = E\left(S^2\right).$$
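The unbiasedness claim $\sigma^2 = E(S^2)$ can be checked empirically for the coin-flip example: averaging many independent computations of $S^2$ should give a value close to $\sigma^2 = 1/4$. A Python sketch (seed and repetition counts are arbitrary choices):

```python
import random

random.seed(7)
N, M = 1000, 2000   # sample size per experiment, number of repetitions

s2_values = []
for _ in range(M):
    x = [random.randint(0, 1) for _ in range(N)]     # N fair coin flips
    m = sum(x) / N                                   # sample mean
    s2 = sum((xi - m) ** 2 for xi in x) / (N - 1)    # sample variance S^2
    s2_values.append(s2)

avg_s2 = sum(s2_values) / M
print(avg_s2)  # close to the true variance 1/4
```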
So it seems reasonable, and will be justified later (possibly in Math 494), that for a large enough sample size the estimated value of $\sigma^2$ given by the value of $S^2$ computed from the data can be used in identity 6. Let us compare $\sigma^2$ and $S^2$ for the previous example. We already know from part (1) that $\sigma^2 = 0.25$. To find a sample value of $S^2$:

N = 1000
#N sample values of a fair coin flip (0 or 1):
x = sample(c(0,1),N,prob=c(0.5,0.5),replace=TRUE)
#This is the sample mean:
m = sum(x)/N
#This is the sample variance:
s2 = (N-1)^(-1)*sum((x-m)^2)

The value I got for the variance estimator was 0.250025. This seems sufficiently close to confirm our claim that we can substitute $S$ for $\sigma$ when we do not know $\sigma$.

Moral of the story. The Monte Carlo method, generally speaking, amounts to expressing the solution to a problem as the expected value $\mu$ of some random variable $X$. By the law of large numbers, this expected value can be approximated by the values of the sample mean $\bar X_N$, for some large $N$. Approximation should be understood in the sense of convergence in probability, as explained earlier. In order to decide how large an $N$ to choose for a desired level of precision, we can use the conclusion of the CLT, now substituting $S$ for $\sigma$:
$$P\left(|\bar X_N - \mu| \le \frac{Sz}{\sqrt N}\right) \approx 2\Phi(z) - 1 \qquad (7)$$
for large $N$. Here, $N \ge 100$ will typically be enough for this approximation to be acceptable. This means that, when determining the appropriate $N$ for a given precision, you should assume $N$ is at least 100, so that equation 7, which is the main tool for estimating $N$, is applicable. The procedure can be summarized by the following algorithm:

1. Choose a level of confidence, say $a = 0.99$. Find the value of $z$ by solving the equation $2\Phi(z) - 1 = a$. Recall that the R function qnorm is the inverse function of $\Phi$;
2. Choose a precision level $\epsilon$;
3. Generate at least 100 data values;
4. Continue to generate additional data values until you obtain $S/\sqrt N < \epsilon/z$;
5. The estimate of $\mu$, with the given precision $\epsilon$ and confidence level $a$, is then $\bar X_N$.
By this procedure, $100a\% = 99\%$ of the time, the obtained value of $\bar X_N$ will be no more than $\epsilon$ away from the true value $\mu$.
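The five-step procedure can be sketched as a small program. Here is one reasonable implementation in Python, applied to the fair-coin example with $\mu = 1/2$; the running-sum update and the choice $\epsilon = 0.01$ are illustrative assumptions, not part of the algorithm itself:

```python
import math
import random
from statistics import NormalDist

random.seed(0)

a, eps = 0.99, 0.01                       # confidence level and precision
z = NormalDist().inv_cdf((1 + a) / 2)     # solves 2*Phi(z) - 1 = a (step 1)

N, A, sum_sq = 0, 0.0, 0.0                # count, running sum, running sum of squares
while True:
    x = random.randint(0, 1)              # one coin flip (one new data value)
    N += 1
    A += x
    sum_sq += x * x
    if N >= 100:                          # step 3: require at least 100 values
        mean = A / N
        # sample standard deviation from the running sums
        S = math.sqrt((sum_sq - N * mean**2) / (N - 1))
        if S / math.sqrt(N) < eps / z:    # step 4: stopping rule
            break

print(N, mean)  # estimate of mu = 0.5, precision eps at confidence level a
```

Since $S \approx 1/2$ here, the stopping rule exits when $\sqrt N > Sz/\epsilon \approx 129$, i.e. after roughly $1.7 \times 10^4$ flips.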
Problems

1. Sample variance. Let $X_1, X_2, \dots, X_N$ be a random sample (that is, a sequence of i.i.d. random variables), with mean $\mu$ and variance $\sigma^2$. Define the sample variance as follows:
$$S^2 = \frac{1}{N-1}\sum_{i=1}^N \left(X_i - \bar X_N\right)^2.$$
Prove that $S^2$ is an unbiased estimator of $\sigma^2$. That is, show that $E(S^2) = \sigma^2$. (Hint: This is Example 2.8.2 in the textbook. Keep in mind the following fact, which will be discussed later in class: if $X$ and $Y$ are independent random variables with finite expectations, then $E(XY) = E(X)E(Y)$.)

Solution. Note that $X_i - \mu$ and $X_j - \mu$ are independent random variables if $i \ne j$, and their expectations are equal to 0. Therefore,
$$E\left[(X_i - \mu)(X_j - \mu)\right] = E(X_i - \mu)\,E(X_j - \mu) = 0 \quad \text{for } i \ne j.$$
This remark is used in the transition from the third to the fourth term below:
$$E\left[\left(\bar X_N - \mu\right)^2\right] = E\left[\left(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\right)^2\right] = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N E\left[(X_i - \mu)(X_j - \mu)\right] = \frac{1}{N^2}\sum_{i=1}^N E\left[(X_i - \mu)^2\right] = \frac{1}{N^2}\left(N\sigma^2\right) = \frac{\sigma^2}{N}.$$
By a very similar, but simpler, argument you obtain
$$E\left[(X_i - \mu)\left(\bar X_N - \mu\right)\right] = \frac{\sigma^2}{N}.$$
Now observe that
$$\left(X_i - \bar X_N\right)^2 = \left(X_i - \mu + \mu - \bar X_N\right)^2 = (X_i - \mu)^2 + 2(X_i - \mu)\left(\mu - \bar X_N\right) + \left(\mu - \bar X_N\right)^2.$$
By taking expectations, and using the previous facts, we obtain
$$E\left[\left(X_i - \bar X_N\right)^2\right] = \sigma^2 - \frac{2\sigma^2}{N} + \frac{\sigma^2}{N} = \frac{N-1}{N}\,\sigma^2.$$
Finally,
$$E\left(S^2\right) = \frac{1}{N-1}\sum_{i=1}^N E\left[\left(X_i - \bar X_N\right)^2\right] = \sigma^2.$$
But this is what we wanted to show.

2. Approaching the mean. The approach of the sample means of a sequence of i.i.d. random variables to the actual mean $\mu$ as the sample size increases was observed in the graph given above, after the statement of the weak law of large numbers. That graph shows a scatter plot of 500 points $(j, x_j)$ together with the line plot of the partial means $\bar x_j$.

(a) Produce a similar graph for the Cauchy random variable $X$. (The Cauchy distribution is defined in Exercise 1.8.10, page 57, of the textbook. Cauchy distributed random numbers can be generated in R with the function rcauchy.)
Solution. Here is the graph:

[Figure: scatter plot of 500 Cauchy sample values $(j, x_j)$ with the line plot of the partial means superimposed; vertical axis from $-10$ to $10$, horizontal axis from 0 to 500.]

It was generated using the script

N=c(1:500)
X=rcauchy(length(N))
plot(N,X,pch=".",ylim=c(-10,10))
points(N,cumsum(X)/N,type="l")

(b) Does your graph lend support to the weak law of large numbers theorem? If not, what may be wrong with applying that theorem to this case?

Solution. The graph does not seem to indicate convergence. There are big jumps at random steps that do not seem to decrease. In fact, the law of large numbers does not apply to Cauchy random variables. One requirement for that theorem to apply is that the mean and variance should be finite, but this is not the case for the Cauchy distribution, as we have shown in class.

3. Computing areas. Consider the following random experiment, whose goal is to approximate $\pi$ by a Monte Carlo simulation. Let $P_1, P_2, \dots, P_n$ be a sequence of independent, uniformly distributed random points in the square $S = [-1,1] \times [-1,1]$, and let $m$ be the number of those points that fall into the disc $D = \{(x,y) : x^2 + y^2 \le 1\}$. Let $X_i$ be the random variable with values in $\{0,1\}$, such that $X_i = 1$ if $P_i$ lies in the disc and 0 if not.

(a) What is the expected value $\mu$ and variance $\sigma^2$ of $X_i$?

Solution. Let $1_D$ be the indicator function of the disc. Then $X_i = 1_D(P_i)$; so $X_i$ is a discrete random variable with possible values 0 and 1. Since the $P_i$ are uniformly distributed over the square, the probability that $P_i$ lies in the disc is proportional to the area of the disc. Therefore $X_i$ has pmf
$$P(X_i = 1) = \frac{\mathrm{Area}(D)}{\mathrm{Area}(S)} = \frac{\pi}{4}, \qquad P(X_i = 0) = 1 - \frac{\pi}{4}.$$
The mean value of $X_i$ is then
$$\mu = E(X_i) = 1 \cdot p(1) + 0 \cdot p(0) = \frac{\pi}{4}.$$
The second moment of $X_i$ is $E(X_i^2) = 1^2 \cdot p(1) + 0^2 \cdot p(0) = p(1) = \pi/4$. Therefore,
$$\sigma^2 = E\left(X_i^2\right) - \mu^2 = \frac{\pi}{4} - \left(\frac{\pi}{4}\right)^2 = \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right).$$

(b) Explain (by citing the appropriate theorem) why, for large values of $n$, the ratio $m/n$ approximates $\pi/4$.

Solution. This is precisely what the law of large numbers implies. If we regard $m$ as a random variable, then $X_1 + \cdots + X_n = m$, so
$$\bar X_n = \frac{m}{n}.$$
The weak law of large numbers says that $m/n$ converges in probability to $\mu = \pi/4$.

(c) If $n = 5 \times 10^5$, find the probability that the error $|\bar X_n - \mu|$ is no greater than 0.0005, where $\bar X_n$ is the sample mean. (Here and below, use the estimate 5 or 7, obtained from the central limit theorem, rather than Chebyshev's inequality, for the estimation of errors.)

Solution. The main identity we need is
$$P\left(|\bar X_n - \mu| \le \frac{z\sigma}{\sqrt n}\right) \approx 2\Phi(z) - 1.$$
Letting $\epsilon = 0.0005$, then
$$z = \frac{\epsilon\sqrt n}{\sigma} = \frac{0.0005\sqrt{5 \times 10^5}}{\sqrt{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}} \approx 0.86.$$
Recall that $\Phi(z)$ in R is pnorm(z). The probability we want is
$$2\Phi(0.86) - 1 \approx 0.61.$$
Therefore, $|\bar X_n - \mu| \le 0.0005$ with probability a little greater than 0.6.

(d) Do a simulation of the experiment of the previous item. Give a few (say, 10) sample values of the approximation of $\pi$ you obtain in this way.

Solution. I used the following script:

n=5*10^5
#The x and y coordinates of the random points are
X=2*runif(n)-1
Y=2*runif(n)-1
#The number of random points in the disc is
m=sum(X^2+Y^2<1)
#The approximate probability times 4 is
4*m/n

Here are a few values: 3.140752, 3.1424, 3.139712, 3.138544, 3.141736, 3.13968, 3.13976, 3.142752, 3.144528, 3.143544.

The following simple program checks empirically how often we get the asked-for precision $\epsilon$.

a=0*c(1:100)
for (i in 1:100){
n=5*10^5
#The x and y coordinates of the random points are
X=2*runif(n)-1
Y=2*runif(n)-1
#The number of random points in the disc is
u=(X^2+Y^2<1)
m=sum(u)
#The approximate probability is
a[i]=m/n
}
sum(abs(a-pi/4)<0.0005)/100

The fraction of times the sample mean gave a value less than 0.0005 away from $\pi/4$ was 0.6, as predicted. Note that the precision for the estimation of $\pi$ is lower, since multiplying by 4 magnifies the error. In fact, about 60% of the time we get that $|4\bar X_n - \pi| < 4 \times 0.0005 = 0.002$.

(e) How large a sample size $n$ would be needed to ensure that $|\bar X_n - \mu| \le 0.0005$ happens 90% of the time? (This percentage should be interpreted as follows: if you obtain $k$ independent sample values of $\bar X_n$, then the inequality should hold for approximately $0.9k$ of those $k$ values.)

Solution. We use again the identity
$$P\left(|\bar X_n - \mu| \le \frac{z\sigma}{\sqrt n}\right) \approx 2\Phi(z) - 1.$$
Now the probability is 0.9, so the value of $z$ we need is the solution to
$$2\Phi(z) - 1 = 0.9 \iff \Phi(z) = 1.9/2 = 0.95.$$
We can solve for $z$ using the quantile function: z=qnorm(0.95). We get $z = 1.645$. The problem now is to solve $z = \epsilon\sqrt n/\sigma$ for $n$:
$$n = \left(\frac{z\sigma}{\epsilon}\right)^2 = \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)\left(\frac{1.645}{0.0005}\right)^2 = 1824379 \approx 2 \times 10^6.$$
As before, keep in mind that this is the precision for estimating $\mu = \pi/4$. The error for estimating $\pi$ is 4 times greater.

4. Monte Carlo approach to the Buffon needle problem. In problem 2 of homework assignment 3, you proved that the probability for Buffon's needle to cross a line is $p = 2l/\pi a$. Let us assume now that $a = 1$ and $l = 1/2$, so that $p = 1/\pi$. Using the algorithm described at the end of the above tutorial, write a program to compute $1/\pi$ by Monte Carlo simulation. Use precision $\epsilon = 0.01$ and confidence level $a = 0.99$. How large an $N$ did your program require? (Hint: Notice that $N$ is now a random variable; you do not choose it in advance.
One way to do it would be to perform the needle experiment inside a while loop; the condition for exiting the loop would be $S$ (the square root of the sample variance) being less than some appropriate number given in terms of $\epsilon$. At each step, choose a random $x$ and a random $\theta$ as described in the previous assignment; then compute the vertical coordinates of the head and tip of the needle: $x$ and $x + l\sin\theta$. If either 0 or 1 lies between these two values, count one more crossing. The ratio of the number of crossings to the total number of trials approximates the probability $p$.)

Solution. Let $X_1, X_2, \dots, X_N$ be independent random variables with values in $\{0,1\}$ such that $X_i = 1$ describes the event "needle crosses a line at the $i$th step" and $X_i = 0$ its negation.
A simple algebraic derivation using the fact that $X_i^2 = X_i$ yields the following simplification of the expression for the sample variance:
$$S^2 = \frac{1}{N-1}\sum_{i=1}^N \left(X_i - \bar X_N\right)^2 = \frac{N}{N-1}\,\bar X_N\left(1 - \bar X_N\right).$$
It is convenient to write $A_N = X_1 + \cdots + X_N = N\bar X_N$. Here is a simple way to update the sample mean and variance at the $(N+1)$st step from their values at step $N$:
$$A_{N+1} = A_N + X_{N+1}, \qquad \bar X_{N+1} = \frac{A_{N+1}}{N+1}, \qquad S^2_{N+1} = \frac{N+1}{N}\,\bar X_{N+1}\left(1 - \bar X_{N+1}\right).$$
My implementation of the Monte Carlo algorithm for this problem is here:

a=0.99 #Choose confidence level
epsilon=0.01 #Choose precision
z=qnorm((1+a)/2) #Find the value of z for the chosen a
N=0 #Initialize variable to count number of steps
Sbound=0 #Simulation will run until the sample variance is
#less than Sbound. This quantity will be updated inside the
#while loop for each new N.
A=0 #Initialize sum X1 + ... + XN
S=0
while (S>=Sbound | N<100) {
x=runif(1)
theta=2*pi*runif(1)
x1=x+0.5*sin(theta)
cross=(x1<0 | x1>1)
N=N+1
A=A+cross
#M is the sample mean
M=A/N
S=(N/(max(N-1,1)))*M*(1-M)
Sbound=sqrt(N)*epsilon/z
}
M #This is the sample mean that approximates 1/pi
N #This is the number of steps
abs(M-1/pi) #This is the error in estimating the mean

One run of this gave me the following values: $\bar x = 0.3113949$, $|\bar x - 1/\pi| = 0.007$, $N = 3054$, $1/\bar x = 3.211356$. It is a poor approximation of $\pi$, to be sure, but it only used a little over 3000 steps. For better precision we can use smaller values of $\epsilon$.
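For comparison with the adaptive-$N$ run above, the same needle experiment can be run with a fixed sample size. A Python sketch (the sample size $N = 2 \times 10^5$ and the seed are arbitrary choices; the geometry matches the R code above, with line spacing $a = 1$ and needle length $l = 1/2$):

```python
import math
import random

random.seed(3)

# Buffon's needle with a = 1 and l = 1/2, so the crossing probability
# is p = 2*l/(pi*a) = 1/pi.
N = 200_000
crossings = 0
for _ in range(N):
    x = random.random()                  # height of one end of the needle
    theta = 2 * math.pi * random.random()
    x1 = x + 0.5 * math.sin(theta)       # height of the other end
    if x1 < 0 or x1 > 1:                 # the needle crosses a line
        crossings += 1

p_hat = crossings / N
print(p_hat, 1 / p_hat)  # estimates of 1/pi and of pi
```

With $N = 2 \times 10^5$ fixed in advance, the standard error of $\hat p$ is about $\sqrt{p(1-p)/N} \approx 0.001$, so the estimate of $1/\pi$ is typically good to about three decimal places.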