Exercise 2: Numerical Analysis and Simulation using Matlab


Econ 353 Spring 2006

Part I Time costs of computing (10 marks)

Write a Matlab script that performs the following:

1. Using the rand command, generate a vector of 500 observations uniformly distributed between 0 and 10000; call this vector a.
2. Again using the rand command, generate another vector of 500 observations uniformly distributed between 0 and 1; call this vector b.
3. Repeat the following operations for 10000 iterations:
   i. The sum of a and b
   ii. The difference of a and b
   iii. The scalar product of a and b
   iv. The scalar division of a and b
   v. The square root of a
   vi. The exponential function of a
   vii. The sine of a
   viii. The tangent of a

Look up the profile command in the Matlab help files. Use the profile command to compare the computing time of these operations. Express the time cost of each operation as a ratio relative to operation i.; report these ratios in a table.

Suggested solution

The M-file ops.m executes steps 1 to 3 as follows:

    T = 10000;
    n = 500;
    rand('state',1);
    a = 10000*rand(n,1);
    b = rand(n,1);
    for i = 1:T
        a+b;
        a-b;
        a.*b;
        a./b;
        sqrt(a);
        exp(a);
        sin(a);
        tan(a);
    end

The Matlab profiler traces the sequence of calls to functions and scripts and analyses how much time is spent executing specific operations. The profile command includes an option for reporting the detail for built-in functions and operators like addition and multiplication. Running the commands

    >> profile on -detail operator
    >> ops
    >> profile report opsreport

produces a report in HTML format. The report states how much time was spent on the following lines:

     7: for i = 1:T
     8:     a+b;
     9:     a-b;
    10:     a.*b;
    11:     a./b;
    12:     sqrt(a);
    13:     exp(a);
    14:     sin(a);
    15:     tan(a);
    16: end

Expressing the computing times as ratios:

    Operation   Index
    a+b         1.00
    a-b         1.58
    a.*b        1.08
    a./b        1.92
    sqrt(a)
    exp(a)
    sin(a)
    tan(a)

Thus we can see that the last 4 operations are an order of magnitude more costly than the elementary arithmetic operations.
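As a rough cross-check on the profiler, the same ratios can be approximated with tic and toc. The following is a minimal sketch, not part of ops.m; the names ops_list, labels, elapsed, and ratios are illustrative assumptions.

```matlab
% Sketch: time each operation with tic/toc and report ratios relative to a+b.
% ops_list, labels, elapsed, and ratios are illustrative names (not from ops.m).
T = 10000; n = 500;
rand('state',1);                 % legacy seeding syntax, as in ops.m
a = 10000*rand(n,1);
b = rand(n,1);

% Each operation wrapped as a function handle of a and b
ops_list = {@(a,b) a+b, @(a,b) a-b, @(a,b) a.*b, @(a,b) a./b, ...
            @(a,b) sqrt(a), @(a,b) exp(a), @(a,b) sin(a), @(a,b) tan(a)};
labels = {'a+b','a-b','a.*b','a./b','sqrt(a)','exp(a)','sin(a)','tan(a)'};

elapsed = zeros(numel(ops_list),1);
for m = 1:numel(ops_list)
    f = ops_list{m};
    tic;
    for i = 1:T
        c = f(a,b);              % evaluate the operation once per iteration
    end
    elapsed(m) = toc;            % total wall-clock time for T evaluations
end

ratios = elapsed / elapsed(1);   % time cost relative to a+b
for m = 1:numel(ops_list)
    fprintf('%-8s %6.2f\n', labels{m}, ratios(m));
end
```

Each call goes through a function handle, which adds a fixed overhead to every operation, so the ratios from this sketch will tend to be compressed towards 1 compared with the profiler's per-line figures. Timings also vary from run to run and across machines.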

Part II Stopping rules in iterative methods (15 marks)

In Lab 1, you looked at a Matlab implementation of the simple Walrasian iterative. Modify the code in walras1.m to consider two different stopping rules:

Rule 1: Stop if $|p_k - p_{k+1}| / (1 + |p_k|) \le \varepsilon$

Rule 2: Stop if $|p_k - p_{k+1}| \le \varepsilon (1 - \beta^*)$, where $\beta^* = \max_{j=1,\dots,k} |p_{k+1-j} - p_{k+1}| / |p_{k-j} - p_k|$

(Here, $|\cdot|$ denotes the absolute value.)

For each rule, report the final value of the iterative and the number of iterations for $\varepsilon = 10^{-2}, 10^{-4}, 10^{-6}, 10^{-8}$. Define a suitable accuracy measure and use it to evaluate each rule-$\varepsilon$ combination: which is the most accurate? Based on your results, provide an estimate for the iterative's rate of (linear) convergence.

Suggested solution

See the M-files walras_rule1.m and walras_rule2.m for the suggested Matlab implementations. The main loops of these M-files appear below.

[Marking guide: I would suggest allocating 10 marks to the quality of the Matlab implementation and reported results, 5 marks for the analysis.]

Results for Rule 1:

    Value of ε | Final no. of iterations, k | Final value of the iterative, p_k | Excess demand evaluated at final p_k

[walras_rule1.m]

    for k=1:maxit
        if k>maxit
            maxit_reached = 1;
            break
        end
        E_k = 0.5*p_k^(-0.2) + 0.5*p_k^(-0.5)-1;   % excess demand at p(k)
        p_k1 = p_k + lambda * E_k;
        if abs(p_k-p_k1) <= tol*(1+abs(p_k))
            break
        else
            p_k = p_k1;
        end
    end

Results for Rule 2:

    Value of ε | Final no. of iterations, k | Final value of the iterative, p_k | Excess demand evaluated at final p_k

[walras_rule2.m]

    for k=1:maxit
        if k>maxit
            maxit_reached = 1;
            break
        end
        E_k = 0.5*p(k)^(-0.2) + 0.5*p(k)^(-0.5)-1;   % excess demand at p(k)
        p(k+1) = p(k) + lambda * E_k;
        beta_star = 0;
        beta = zeros(1,k);
        if k>1
            for j=1:k-1
                beta(k+1-j) = abs(p(k+1-j)-p(k+1))/abs(p(k-j)-p(k));
            end
            beta_star = max(beta);
        end
        if abs(p(k)-p(k+1)) <= tol*(1-beta_star)
            break
        end
    end

As discussed in class, a natural way to measure accuracy is the difference of the excess demand from zero; this is an example of forward error analysis. By this measure, it is clear that Rule 2 with $\varepsilon = 10^{-8}$ produces the most accurate result. Alternatively, we could use the fact that for the given excess demand function the analytical solution is $p^* = 1$, and calculate the true error as $|p_k - 1|$.

For a linearly convergent iterative, we defined the rate of convergence as the limit of the sequence $\beta_k = |p_{k+1} - p^*| / |p_k - p^*|$ as $k \to \infty$, where $p^*$ is the limiting value of the iterative. In walras_rule2.m, estimation of $\beta^*$ produces a vector of values $|p_{k+1-j} - p_{k+1}| / |p_{k-j} - p_k|$, which we can view as estimates of $\beta_k$, $j = 1,\dots,k$. If $p_k$ is close to $p^*$, these $\beta_k$ estimates should approach the limiting rate of convergence as j decreases. A plot of the $\beta_k$ estimates for $\varepsilon = 10^{-8}$, k = 48 suggests that the limiting rate of convergence is approximately 0.65:

[Figure: estimates of beta (vertical axis) plotted against iteration (horizontal axis)]
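The rate can also be estimated from the ratio of successive step sizes and compared with the analytical value. For the update $p_{k+1} = p_k + \lambda E(p_k)$ with the excess demand function above, $E'(1) = -0.1 - 0.25 = -0.35$, so the linear rate at $p^* = 1$ is $|1 - 0.35\lambda|$. The sketch below assumes $\lambda = 1$ and a starting price of 2; neither value is given in this document, and the names p0, K, rate_est, and rate_theory are illustrative.

```matlab
% Sketch: estimate the linear convergence rate of the Walrasian iteration.
lambda = 1;      % ASSUMED step size (not given in this document)
p0     = 2;      % ASSUMED starting price (not given in this document)
K      = 40;     % number of iterations to run

% Excess demand function, as in walras_rule1.m
E = @(p) 0.5*p.^(-0.2) + 0.5*p.^(-0.5) - 1;

p = zeros(K+1,1);
p(1) = p0;
for k = 1:K
    p(k+1) = p(k) + lambda*E(p(k));
end

step     = abs(diff(p));                    % |p(k+1) - p(k)|
rate_est = step(2:end) ./ step(1:end-1);    % ratio of successive step sizes

rate_theory = abs(1 - 0.35*lambda);         % |1 + lambda*E'(1)|, E'(1) = -0.35
fprintf('estimated rate (last ratio): %.4f\n', rate_est(end));
fprintf('analytical rate at p* = 1:   %.4f\n', rate_theory);
```

Under the assumed $\lambda = 1$ the analytical rate is 0.65, which agrees with the value read off the plot; a different step size would give a different rate.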

Part III A Monte Carlo experiment (20 marks)

One of the main tasks of econometrics is to study the sampling distribution of an estimator or test statistic. For instance, we may be interested in the bias (the expected difference from the population mean) of an estimator in repeated samples. There are powerful theorems (e.g. central limit theorems) that characterize the sampling distributions of estimators as the sample size tends to infinity. With finite samples, however, these asymptotic theorems are often uninformative or misleading.

Monte Carlo simulation is a method for studying the finite-sample distribution of an estimator or test statistic. It uses simulated experimental data to evaluate the performance of the estimator or test procedure. The basic steps in a Monte Carlo experiment are the following:

1. Specify a model for the data-generating process (DGP), i.e. make assumptions about functional relationships among the variables, probability distributions, and the true values of the associated parameters.
2. Generate R data sets (samples) by simulating R random draws from the DGP. Typically, the sampling process is based on a computer-generated sequence of pseudo-random numbers.
3. Calculate the statistic of interest for each data set. The R calculated values represent the sampling distribution of the statistic.
4. Calculate the desired sampling measure for the statistic. For example, the bias of the statistic is estimated as the average deviation from the true mean taken over the simulated sampling distribution.

Your task is to write a Matlab program that performs Monte Carlo experiments. In developing your program, observe the following coding guidelines:

- Declare and initialize the main variables before performing operations on them.
- Where possible, use parameters rather than hard-coded numbers.
- Use vectorized code rather than loops when it is efficient to do so.
- Include concise and informative comments throughout.

A. Preparation

Consider the following DGP:

    $y_t = \beta + v_t$;  $t = 1, 2, \dots, T$

where $\beta$ is a constant and $v_t$ is independently and identically distributed (i.i.d.) as Normal with mean 0 and variance $\sigma^2$.

1. Write the pseudo-code for a Monte Carlo experiment that estimates the bias of a given estimator of $\beta$. Use the following notation:

    R         number of samples generated by the experiment
    T         number of observations in each sample
    y_rt      t-th observation in sample r
    b_r       value of the estimator for sample r
    B(y)      the estimator for β as a function of the input vector y
    randn()   a function that returns a value drawn at random from a Normal distribution with mean 0 and variance 1

The bias estimate is to be computed as the average of $b_r - \beta$ over the R samples. Assume the estimates $b_r$ are stored in an R-by-1 vector named BDist. You may declare additional scalar, vector, or matrix variables as needed. Use the symbol ← to denote assignment and use = to denote a test for equality. (Refer to the pseudo-code example from Session 3 as a style guide.)

Suggested solution

[Marking guide: I would suggest allocating 4 marks for the pseudo-code, 8 marks for the Matlab implementation, 8 marks for the analysis. The bonus question is worth 5 marks.]

    For r = 1, 2, …, R:
        For t = 1, 2, …, T:
            y_rt ← β + σ·randn()        // need to multiply by σ to get Var(v_rt) = σ^2
        y_sample ← (y_r1, y_r2, …, y_rT)   // collect the observations for sample r in a vector
        b_r ← B(y_sample)
    bias ← (Σ_{r=1,…,R} b_r)/R − β

2. Implement your pseudo-code in Matlab. Begin by creating two M-files:

    mc.m   A script file that implements the Monte Carlo simulation steps.
    B.m    A function file that implements the estimator B(y). The function should take a vector as input and return a scalar value. (Leave B.m as an empty stub for now; the specific estimation rules will be defined later.)

Your main script should call the function B.m to obtain estimates of β. Use the built-in Matlab function randn() to draw values of $v_t$. Include the following line of code to initialize the Matlab random number generator:

    randn('state',1);

Suggested solution:

    % mc.m
    %
    % A simple Monte Carlo simulation for estimating the mean.
    % Assumes a normal distribution for the error term.
    %
    % Ming Kang
    % February 2006

    clear all;

    % Set parameters
    R = 100;
    T = round(10^4.5);
    beta = 2;
    sigma2 = 4;

    % Initialize variables
    BDist = zeros(R,1);
    y = zeros(T,1);

    % Set random-number generator
    randn('state',1);

    % Simulation step
    for r = 1:R
        y = beta + sqrt(sigma2)*randn(T,1);
        BDist(r) = B(y);
    end

    % Calculate sampling statistics
    mu = mean(BDist);
    bias = mu - beta;
    mse = mean((BDist - beta).^2);
    hist(BDist,20);

    % B.m
    %
    function b = B(y)
    b = mean(y);
    % b = median(y);
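Following the guideline on vectorized code, the simulation loop in mc.m could also be written without a loop by drawing all R samples at once. This is a minimal sketch under stated assumptions: it calls mean directly instead of B.m, uses T = 25 for illustration, and the names Y and se_bias are not from the original script.

```matlab
% Sketch: vectorized Monte Carlo step (all R samples drawn in one call).
R = 100; T = 25;          % illustrative sizes, not the values set in mc.m
beta = 2; sigma2 = 4;

randn('state',1);         % same legacy seeding call used in mc.m

% Each column of Y holds one sample of T observations
Y = beta + sqrt(sigma2)*randn(T,R);

% Column-wise sample means, stored as an R-by-1 vector
BDist = mean(Y,1)';

% Sampling statistics
bias    = mean(BDist) - beta;
mse     = mean((BDist - beta).^2);
se_bias = std(BDist)/sqrt(R);   % Monte Carlo standard error of the bias estimate
fprintf('bias = %.4f (s.e. %.4f), mse = %.4f\n', bias, se_bias, mse);
```

Replacing mean(Y,1) with median(Y,1) gives the corresponding sketch for the sample median. The standard error line shows one way to see the role of R: the uncertainty in the bias estimate shrinks in proportion to 1/sqrt(R). The T-by-R matrix must fit in memory, so for very large T the loop version may be preferable.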

B. Analysis

1. Explain how the Monte Carlo algorithm is affected by each of the three types of numerical error discussed in class. Which do you think is the greatest source of error and why? Describe the effect of R on the accuracy of the algorithm.

2. Estimate β with the sample mean: that is, B.m should compute the average of the values in the input vector. Set $\beta = 2$, $\sigma^2 = 4$, R = 100, T = 25 and run the Monte Carlo experiment. Use the hist command to plot a histogram of BDist (use bins of size 1) and comment on its shape: is the distribution symmetric, smooth, bell-shaped, etc.? Report the mean and variance of BDist.

3. Repeat Question 2 but this time estimate β with the sample median; that is, B.m should compute the median (i.e. the 50th percentile) of the values in the input vector. Compare your results with Question 2 and comment on any differences.

4. Repeat the experiments in Questions 2 and 3 for $T = 10^k$; $k = 2.0, 2.5, 3.0, \dots, 5.0$. For each value of T, estimate the mean-squared error (MSE) of each estimator: that is, calculate the average value of $(b_r - \beta)^2$ over the R estimates. Use the loglog command to plot the MSE of both estimators against T and comment on any trend that may be apparent. Compute the ratio of the MSE for the sample median to the MSE for the sample mean; record the values for each T in a table. What happens to the MSE ratio as T grows large?

5. (Bonus question) Consider the same DGP but now suppose $v_t$ is drawn from the Student's t-distribution instead of the Normal distribution. The Student's t-distribution depends on a degrees-of-freedom parameter, $d = 1, 2, \dots$. The procedure for sampling from the Student's t-distribution with d degrees of freedom is the following:

   i. Sample d+1 values from the standard Normal distribution (mean 0, variance 1); call these $z_n$; $n = 1, 2, \dots, d+1$
   ii. Compute the ratio: $t = z_{d+1} \Big/ \sqrt{\tfrac{1}{d} \sum_{n=1}^{d} z_n^2}$

   Modify your code for mc.m so that the $v_t$'s are generated using this procedure. Repeat the steps in Question 4 and fill in the table below with values for the MSE ratio. Compare your results for T = 25 with those from Question 4; comment on the effects of d and T.

    Degrees of freedom (d) | No. of obs. (T)

Suggested solution:

1. The three sources of numeric error discussed in class were modeling error, approximation error, and round-off error. With Monte Carlo experiments, modeling error is negligible by design: the experimenter chooses the parameters, distributions, etc. associated with the data-generating process and thus there is an exact correspondence between the statistical model and the truth. (Whether the model generates realistic data is not of direct interest in this problem.) Approximation error is present because the simulated sampling distribution is an estimate; we only observe R discrete values of the sample statistic where in fact the statistic has a continuous distribution over an infinity of values. (There is also approximation error in how the normal random variable is generated from a sequence of pseudo-random numbers but this is less significant.) Round-off error is probably small given that the range of computed numbers is not extreme and also averaging tends to smooth out these errors. Therefore, approximation error seems to be the most significant source of numeric error. The parameter R controls the number of replicated draws from the sampling distribution; the higher R is, the better the discrete approximation to the true sampling distribution.

2. Results for the sample mean with T = 25: Here, BDist has mean and variance As illustrated in the histogram, the sampling distribution is rough but it appears to be single-peaked and not noticeably skewed.

3. Results for the sample median with T = 25: Now BDist has mean and variance Thus the sample median appears to have greater bias and higher variance. The distribution as illustrated in the histogram is noticeably choppier and more spread out than before.

4. Results:

    log10(T) | MSE for the mean | MSE for the median | MSE ratio (median/mean)

For both estimators, the loglog plot suggests $\log_{10}(\mathrm{MSE})$ is a linear function of $\log_{10}(T)$ with negative slope, which suggests that MSE declines as a power of T (a straight line on a log-log plot corresponds to a power law rather than exponential decay). The position of the trend is lower for the sample mean; that is, the MSE for the sample median is always greater than for the sample mean. The MSE ratio is greater than one for all T; the ratio fluctuates as T increases, and no increasing or decreasing trend is apparent.

[Figure: loglog plot of MSE against T; MSE for the mean (solid), MSE for the median (dashed)]

5. Results for R = 100:

    Degrees of freedom (d) | No. of obs. (T)

Notice that the median has lower MSE than the mean in some cases, which contrasts with the earlier result that the MSE ratio was always greater than one. No clear trends are apparent except for the case of T = 25, in which the MSE ratio increases with d; this makes sense as the t-distribution looks more like the normal distribution as d gets large. Repeating the experiment (not required) with R = 1000 confirms this result:

(R = 1000)

    Degrees of freedom (d) | No. of obs. (T)

The new code:

    % mc2.m
    %
    % A simple Monte Carlo simulation for estimating the mean.
    % Assumes a t-distribution for the error term.
    %
    % Ming Kang
    % February 2006

    clear all;

    % Set parameters
    R = 1000;
    T = 100;
    beta = 2;
    d = 3;

    % Initialize variables
    BDist1 = zeros(R,1);   % sample mean
    BDist2 = zeros(R,1);   % sample median
    y = zeros(T,1);

    % Set random-number generator
    randn('state',1);

    % Simulation step
    for r = 1:R
        numer = randn(T,1);
        % divide the sum of squares by d so that z follows Student's t
        % with d degrees of freedom (the MSE ratio is unaffected by this scaling)
        denom = sqrt(sum(randn(T,d).^2,2)/d);
        z = numer./denom;
        y = beta + z;
        BDist1(r) = mean(y);
        BDist2(r) = median(y);
    end

    % Calculate sampling statistics
    mse1 = mean((BDist1 - beta).^2);
    mse2 = mean((BDist2 - beta).^2);
    mse_r = mse2/mse1
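The MSE comparison across sample sizes in Question 4 can be sketched in the same style. The block below is an illustration only, under Normal errors: it uses a shorter grid of T than the exercise asks for, and the names kgrid, Tgrid, mse_mean, mse_med, c_mean, and c_med are assumptions. Since the variance of the sample mean is $\sigma^2/T$, the fitted log-log slope for the mean should come out close to -1.

```matlab
% Sketch: MSE of the sample mean and sample median over a grid of T,
% with a straight-line fit on the log-log scale. Names are illustrative.
R = 100; beta = 2; sigma2 = 4;
kgrid = 2:0.5:4;                  % shorter grid than the exercise, to keep it quick
Tgrid = round(10.^kgrid);

randn('state',1);                 % legacy seeding call, as in mc.m
mse_mean = zeros(size(Tgrid));
mse_med  = zeros(size(Tgrid));
for m = 1:numel(Tgrid)
    Y = beta + sqrt(sigma2)*randn(Tgrid(m),R);     % one sample per column
    mse_mean(m) = mean((mean(Y,1)   - beta).^2);   % MSE of the sample mean
    mse_med(m)  = mean((median(Y,1) - beta).^2);   % MSE of the sample median
end

% Slope of log10(MSE) against log10(T); a power law MSE = c*T^s has slope s
c_mean = polyfit(log10(Tgrid), log10(mse_mean), 1);
c_med  = polyfit(log10(Tgrid), log10(mse_med), 1);
fprintf('log-log slope: mean %.2f, median %.2f\n', c_mean(1), c_med(1));

% Columns: log10(T), MSE mean, MSE median, MSE ratio (median/mean)
disp([kgrid' mse_mean' mse_med' (mse_med./mse_mean)']);
```

With only R = 100 replications the estimated MSE values are themselves noisy, so the ratio column can fluctuate noticeably from one T to the next; raising R reduces that noise at the cost of run time.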