Exercise 2: Numerical Analysis and Simulation using Matlab

Econ 353, Spring 2006

Part I: Time costs of computing (10 marks)

Write a Matlab script that performs the following:

1. Using the rand command, generate a vector of 500 observations uniformly distributed between 0 and 10000; call this vector a.
2. Again using the rand command, generate another vector of 500 observations uniformly distributed between 0 and 1; call this vector b.
3. Repeat the following operations for 10000 iterations:
   i.    The sum of a and b
   ii.   The difference of a and b
   iii.  The element-wise product of a and b
   iv.   The element-wise division of a by b
   v.    The square root of a
   vi.   The exponential function of a
   vii.  The sine of a
   viii. The tangent of a

Look up the profile command in the Matlab help files. Use the profile command to compare the computing time of these operations. Express the time cost of each operation as a ratio relative to operation i., and report these ratios in a table.

Suggested solution

The M-file ops.m executes steps 1 to 3 as follows:

    T = 10000;
    n = 500;
    rand('state',1);
    a = 10000*rand(n,1);
    b = rand(n,1);
    for i = 1:T
        a+b;
        a-b;
        a.*b;
        a./b;
        sqrt(a);
        exp(a);
        sin(a);
        tan(a);
    end
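A side note, not part of the original solution: rand('state',1) is the legacy seeding syntax. In newer Matlab releases the recommended way to seed the generator is rng, as sketched below; note that rng uses a different default generator, so the exact draws will differ from those produced by the legacy 'state' method.

    % Sketch: modern equivalent of the seeding step (draws differ from the legacy generator).
    rng(1);
    a = 10000*rand(500,1);
    b = rand(500,1);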

The Matlab profiler traces the sequence of calls to functions and scripts and analyses how much time is spent executing specific operations. The profile command includes an option for reporting the detail for built-in functions and operators like addition and multiplication. Running the commands

    >> profile on -detail operator
    >> ops
    >> profile report opsreport

produces a report in HTML format. The report states how much time was spent on the following lines:

    7:  for i = 1:T    0.12   1%
    8:  a+b;           0.19   2%
    9:  a-b;           0.13   1%
    10: a.*b;          0.23   3%
    11: a./b;          1.88  21%
    12: sqrt(a);       2.54  29%
    13: exp(a);        1.57  18%
    14: sin(a);        2.09  24%
    15: tan(a);        0.04   0%
    16: end

Expressing the computing times as ratios:

    Operation   Index
    a+b          1.00
    a-b          1.58
    a.*b         1.08
    a./b         1.92
    sqrt(a)     15.67
    exp(a)      21.17
    sin(a)      13.08
    tan(a)      17.42

Thus we can see that the last four operations (square root, exponential, sine, and tangent) are an order of magnitude more costly than the elementary arithmetic operations.
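As a cross-check on the profiler ratios, one could also time each operation directly. The sketch below is not part of the original solution; it assumes the vectors a and b from ops.m are already in the workspace and normalizes each timing by that of a+b.

    % Sketch: direct timing of each operation with tic/toc, expressed as ratios to a+b.
    T = 10000;
    ops    = {@() a+b, @() a-b, @() a.*b, @() a./b, ...
              @() sqrt(a), @() exp(a), @() sin(a), @() tan(a)};
    labels = {'a+b','a-b','a.*b','a./b','sqrt(a)','exp(a)','sin(a)','tan(a)'};
    times  = zeros(1,numel(ops));
    for j = 1:numel(ops)
        tic;
        for i = 1:T
            ops{j}();
        end
        times(j) = toc;
    end
    ratios = times/times(1);
    for j = 1:numel(ops)
        fprintf('%-8s %6.2f\n', labels{j}, ratios(j));
    end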

Part II: Stopping rules in iterative methods (15 marks)

In Lab 1, you looked at a Matlab implementation of the simple Walrasian iterative. Modify the code in walras1.m to consider two different stopping rules:

Rule 1: Stop if |p_k - p_{k+1}| / (1 + |p_k|) ≤ ε

Rule 2: Stop if |p_k - p_{k+1}| ≤ ε (1 - β*), where β* = max over j = 1,...,k of |p_{k+1-j} - p_{k+1}| / |p_{k-j} - p_k|

(Here |.| denotes the absolute value.)

For each rule, report the final value of the iterative and the number of iterations for ε = 10^-2, 10^-4, 10^-6, 10^-8. Define a suitable accuracy measure and use it to evaluate each rule-ε combination: which is the most accurate? Based on your results, provide an estimate of the iterative's rate of (linear) convergence.

Suggested solution

See the M-files walras_rule1.m and walras_rule2.m for the suggested Matlab implementations. The main loops of these M-files appear below. [Marking guide: I would suggest allocating 10 marks to the quality of the Matlab implementation and reported results, and 5 marks to the analysis.]

Results for Rule 1:

    Value of ε   Final no. of iterations, k   Final value of the iterative, p_k   Excess demand evaluated at final p_k
    10^-2        10                           1.05197659091279                    -0.01755008852421
    10^-4        21                           1.00048080455497                    -0.00016822440116
    10^-6        32                           1.00000420959831                    -0.00000147335502
    10^-8        42                           1.00000005667300                    -0.00000001983555

[walras_rule1.m]

    for k=1:maxit
        if k>maxit
            maxit_reached = 1;
            break
        end
        E_k = 0.5*p_k^(-0.2) + 0.5*p_k^(-0.5) - 1;   % excess demand at p(k)
        p_k1 = p_k + lambda * E_k;
        if abs(p_k-p_k1) <= tol*(1+abs(p_k))
            break
        else
            p_k = p_k1;
        end
    end
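The table above can be generated programmatically by looping over the tolerance values. The sketch below assumes a hypothetical wrapper function of the form [p_final, k] = walras_rule1(tol) that returns the final iterate and iteration count; the walras_rule1.m supplied with the solution is a script, so this wrapper is not part of the original files.

    % Sketch: tabulate Rule 1 results over the four tolerances.
    % Assumes a hypothetical wrapper [p_final, k] = walras_rule1(tol).
    for tol = [1e-2 1e-4 1e-6 1e-8]
        [p_final, k] = walras_rule1(tol);
        E_final = 0.5*p_final^(-0.2) + 0.5*p_final^(-0.5) - 1;   % excess demand at p_final
        fprintf('eps = %g: k = %3d, p = %.14f, E(p) = %.14f\n', tol, k, p_final, E_final);
    end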

Results for Rule 2:

    Value of ε   Final no. of iterations, k   Final value of the iterative, p_k   Excess demand evaluated at final p_k
    10^-2        16                           1.00412743509784                    -0.00144039997406
    10^-4        26                           1.00005581309872                    -0.00001953381360
    10^-6        37                           1.00000048843772                    -0.00000017095314
    10^-8        48                           1.00000000427421                    -0.00000000149598

[walras_rule2.m]

    for k=1:maxit
        if k>maxit
            maxit_reached = 1;
            break
        end
        E_k = 0.5*p(k)^(-0.2) + 0.5*p(k)^(-0.5) - 1;   % excess demand at p(k)
        p(k+1) = p(k) + lambda * E_k;
        beta_star = 0;
        beta = zeros(1,k);
        if k>1
            for j=1:k-1
                beta(k+1-j) = abs(p(k+1-j)-p(k+1))/abs(p(k-j)-p(k));
            end
            beta_star = max(beta);
        end
        if abs(p(k)-p(k+1)) <= tol*(1-beta_star)
            break
        end
    end

As discussed in class, a natural way to measure accuracy is the deviation of the excess demand from zero; this is an example of forward error analysis. By this measure, it is clear that Rule 2 with ε = 10^-8 produces the most accurate result. Alternatively, we could use the fact that for the given excess demand function the analytical solution is known to be p* = 1, and calculate the true error as |p_k - 1|.

For a linearly convergent iterative, we defined the rate of convergence as the limit of the sequence β_k = |p_{k+1} - p*| / |p_k - p*| as k → ∞, where p* is the limiting value of the iterative. In walras_rule2.m, estimation of β* produces a vector of values |p_{k+1-j} - p_{k+1}| / |p_{k-j} - p_k|, which we can view as estimates of β_k, j = 1,...,k. If p_k is close to p*, these β_k estimates should approach the limiting rate of convergence as j decreases. A plot of the β_k estimates for ε = 10^-8, k = 48 suggests that the limiting rate of convergence is approximately 0.65:

[Figure: plot of the β_k estimates (beta, vertical axis) against the iteration number (Iteration, horizontal axis) for ε = 10^-8.]
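A rough cross-check, not part of the original solution: for a fixed-point iteration of the form p_{k+1} = p_k + λ E(p_k), the asymptotic linear rate is |1 + λ E'(p*)|. With E(p) = 0.5 p^(-0.2) + 0.5 p^(-0.5) - 1 and p* = 1, we have E'(1) = -0.35, so the rate is |1 - 0.35 λ|. Assuming the step size λ = 1 (the value used in walras1.m is not shown in this handout), this gives 0.65, consistent with the plotted β_k estimates.

    % Sketch: theoretical linear rate of the Walrasian iteration, assuming lambda = 1.
    lambda = 1;                            % assumed step size (not shown in this handout)
    Eprime = 0.5*(-0.2) + 0.5*(-0.5);      % E'(p*) evaluated at p* = 1
    rate   = abs(1 + lambda*Eprime)        % displays 0.6500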

Part III: A Monte Carlo experiment (20 marks)

One of the main tasks of econometrics is to study the sampling distribution of an estimator or test statistic. For instance, we may be interested in the bias of an estimator, that is, its expected difference from the population mean in repeated samples. There are powerful theorems (e.g. central limit theorems) that characterize the sampling distributions of estimators as the sample size tends to infinity. With finite samples, however, these asymptotic theorems are often uninformative or misleading.

Monte Carlo simulation is a method for studying the finite-sample distribution of an estimator or test statistic. It uses simulated experimental data to evaluate the performance of the estimator or test procedure. The basic steps in a Monte Carlo experiment are the following:

1. Specify a model for the data-generating process (DGP), i.e. make assumptions about functional relationships among the variables, probability distributions, and the true values of the associated parameters.
2. Generate R data sets (samples) by simulating R random draws from the DGP. Typically, the sampling process is based on a computer-generated sequence of pseudo-random numbers.
3. Calculate the statistic of interest for each data set. The R calculated values represent the sampling distribution of the statistic.
4. Calculate the desired sampling measure for the statistic. E.g. the bias of the statistic is estimated as the average deviation from the true mean taken over the simulated sampling distribution.

Your task is to write a Matlab program that performs Monte Carlo experiments. In developing your program, observe the following coding guidelines:

- Declare and initialize the main variables before performing operations on them.
- Where possible, use parameters rather than hard-coded numbers.
- Use vectorized code rather than loops when it is efficient to do so.
- Include concise and informative comments throughout.

A. Preparation

Consider the following DGP:

    y_t = β + v_t,   t = 1, 2, ..., T

where β is a constant and v_t is independently and identically distributed (i.i.d.) as Normal with mean 0 and variance σ^2.

1. Write the pseudo-code for a Monte Carlo experiment that estimates the bias of a given estimator of β. Use the following notation:

    R          number of samples generated by the experiment
    T          number of observations in each sample
    y_rt       t-th observation in sample r
    b_r        value of the estimator for sample r
    B(y)       the estimator for β as a function of the input vector y
    randn()    a function that returns a value drawn at random from a Normal distribution with mean 0 and variance 1

The bias estimate is to be computed as the average of b_r - β over the R samples. Assume the estimates b_r are stored in an R-by-1 vector named BDist. You may declare additional scalar, vector, or matrix variables as needed. Use the symbol ← to denote assignment and use = to denote a test for equality. (Refer to the pseudo-code example from Session 3 as a style guide.)

Suggested solution

[Marking guide: I would suggest allocating 4 marks for the pseudo-code, 8 marks for the Matlab implementation, and 8 marks for the analysis. The bonus question is worth 5 marks.]

    For r = 1, 2, ..., R:
        For t = 1, 2, ..., T:
            y_rt ← β + σ · randn()             // need to multiply by σ to get Var(v_rt) = σ^2
        End
        y_sample ← (y_r1, y_r2, ..., y_rT)     // collect the observations for sample r in a vector
        b_r ← B(y_sample)
    End
    bias ← ( Σ_{r=1,...,R} b_r ) / R - β

2. Implement your pseudo-code in Matlab. Begin by creating two M-files:

    mc.m    A script file that implements the Monte Carlo simulation steps.
    B.m     A function file that implements the estimator B(y). The function should take a vector as input and return a scalar value. (Leave B.m as an empty stub for now; the specific estimation rules will be defined later.)

Your main script should call the function B.m to obtain estimates of β. Use the built-in Matlab function randn() to draw values of v_t. Include the following line of code to initialize the Matlab random number generator:

    randn('state',1);

Suggested solution:

    % mc.m
    %
    % A simple Monte Carlo simulation for estimating the mean.
    % Assumes a normal distribution for the error term.
    %
    % Ming Kang
    % February 2006

    clear all;

    % Set parameters
    R = 100;
    T = round(10^4.5);
    beta = 2;
    sigma2 = 4;

    % Initialize variables
    BDist = zeros(R,1);
    y = zeros(T,1);

    % Set random-number generator
    randn('state',1);

    % Simulation step
    for r = 1:R
        y = beta + sqrt(sigma2)*randn(T,1);
        BDist(r) = B(y);
    end

    % Calculate sampling statistics
    mu = mean(BDist);
    bias = mu - beta;
    mse = mean((BDist - beta).^2);
    hist(BDist,20);

    % B.m
    %
    function b = B(y)
    b = mean(y);
    % b = median(y);
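A possible extension, not part of the original solution: the simulation error in the bias estimate can itself be quantified, since with R independent replications the Monte Carlo standard error of mean(BDist) is approximately std(BDist)/sqrt(R). The lines below could be appended to mc.m after the sampling statistics are computed.

    % Sketch: Monte Carlo standard error of the bias estimate (append to mc.m).
    mc_se = std(BDist)/sqrt(R);
    fprintf('bias = %.4f  (MC standard error approx. %.4f)\n', bias, mc_se);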

B. Analysis

1. Explain how the Monte Carlo algorithm is affected by each of the three types of numerical error discussed in class. Which do you think is the greatest source of error, and why? Describe the effect of R on the accuracy of the algorithm.

2. Estimate β with the sample mean: that is, B.m should compute the average of the values in the input vector. Set β = 2, σ^2 = 4, R = 100, T = 25 and run the Monte Carlo experiment. Use the hist command to plot a histogram of BDist (use bins of size 1) and comment on its shape: is the distribution symmetric, smooth, bell-shaped, etc.? Report the mean and variance of BDist.

3. Repeat Question 2 but this time estimate β with the sample median; that is, B.m should compute the median (i.e. the 50th percentile) of the values in the input vector. Compare your results with Question 2 and comment on any differences.

4. Repeat the experiments in Questions 2 and 3 for T = 10^k; k = 2.0, 2.5, 3.0, ..., 5.0. For each value of T, estimate the mean-squared error (MSE) of each estimator: that is, calculate the average value of (b_r - β)^2 over the R estimates. Use the loglog command to plot the MSE of both estimators against T and comment on any trend that may be apparent. Compute the ratio of the MSE for the sample median to the MSE for the sample mean; record the values for each T in a table. What happens to the MSE ratio as T grows large?

5. (Bonus question) Consider the same DGP but now suppose v_t is drawn from the Student's t-distribution instead of the Normal distribution. The Student's t-distribution depends on a degrees-of-freedom parameter, d = 1, 2, .... The procedure for sampling from the Student's t-distribution with d degrees of freedom is the following:

   i.  Sample d+1 values from the standard Normal distribution (mean 0, variance 1); call these z_n, n = 1, 2, ..., d+1.
   ii. Compute the ratio:

           t = z_1 / sqrt( (1/d) Σ_{n=2,...,d+1} z_n^2 )

   Modify your code for mc.m so that the v_t's are generated using this procedure. Repeat the steps in Question 4 and fill in the table below with values for the MSE ratio. Compare your results for T = 25 with those from Question 4; comment on the effects of d and T.

                        Degrees of freedom (d)
    No. of obs. (T)      3         6         9
    10
    25
    100

Suggested solution:

1. The three sources of numeric error discussed in class were modeling error, approximation error, and round-off error. With Monte Carlo experiments, modeling error is negligible by design: the experimenter chooses the parameters, distributions, etc. associated with the data-generating process, and thus there is an exact correspondence between the statistical model and the truth. (Whether the model generates realistic data is not of direct interest in this problem.) Approximation error is present because the simulated sampling distribution is an estimate; we only observe R discrete values of the sample statistic, whereas in fact the statistic has a continuous distribution over an infinity of values. (There is also approximation error in how the normal random variable is generated from a sequence of pseudo-random numbers, but this is less significant.) Round-off error is probably small, given that the range of computed numbers is not extreme and averaging tends to smooth out these errors. Therefore, approximation error seems to be the most significant source of numeric error. The parameter R controls the number of replicated draws from the sampling distribution; the higher R is, the better the discrete approximation to the true sampling distribution.

2. Results for the sample mean with T = 25:

[Figure: histogram of BDist for the sample mean (T = 25).]

Here, BDist has mean 1.9514 and variance 0.1644. As illustrated in the histogram, the sampling distribution is rough, but it appears to be single-peaked and not noticeably skewed.
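A quick sanity check on these numbers, not part of the original solution: for the sample mean with i.i.d. Normal errors, the theoretical sampling variance is σ^2/T = 4/25 = 0.16, close to the simulated variance 0.1644, and the theoretical bias is zero, so the deviation of the simulated mean 1.9514 from β = 2 reflects simulation noise.

    % Sketch: theoretical check on the Question 2 results.
    sigma2 = 4;  T = 25;
    var_theory = sigma2/T          % 0.16, vs. the simulated variance 0.1644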

3. Results for the sample median with T = 25:

[Figure: histogram of BDist for the sample median (T = 25).]

Now BDist has mean 1.9274 and variance 0.2550. Thus the sample median appears to have greater bias and higher variance. The distribution, as illustrated in the histogram, is noticeably choppier and more spread out than before.

4. Results:

    log10(T)   MSE for the mean   MSE for the median   MSE ratio (median/mean)
    2.0        0.0442581          0.0694461            1.56912
    2.5        0.0111912          0.0196407            1.75502
    3.0        0.0043804          0.0068140            1.55557
    3.5        0.0013336          0.0022508            1.68768
    4.0        0.0003182          0.0005183            1.62867
    4.5        0.0001036          0.0001838            1.77490
    5.0        0.0000382          0.0000565            1.47814

For both estimators, the loglog plot suggests that log10(MSE) is a linear function of log10(T) with negative slope; that is, the MSE falls roughly in proportion to a power of T (approximately 1/T). The trend line for the sample mean lies lower; that is, the MSE for the sample median is always greater than that for the sample mean. The MSE ratio exceeds one for all T; the ratio fluctuates as T increases, and no increasing or decreasing trend is apparent.
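To quantify the slope, one could fit a line to the log-log values from the table above with polyfit; a slope near -1 means the MSE falls roughly in proportion to 1/T. This is a sketch, not part of the original solution.

    % Sketch: estimate the log-log slope of MSE against T from the tabulated values.
    logT       = 2.0:0.5:5.0;
    mse_mean   = [0.0442581 0.0111912 0.0043804 0.0013336 0.0003182 0.0001036 0.0000382];
    mse_median = [0.0694461 0.0196407 0.0068140 0.0022508 0.0005183 0.0001838 0.0000565];
    p_mean     = polyfit(logT, log10(mse_mean),   1);
    p_median   = polyfit(logT, log10(mse_median), 1);
    fprintf('slope for mean: %.2f, slope for median: %.2f\n', p_mean(1), p_median(1));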

[Figure: loglog plot of MSE against T for the sample mean (solid line) and sample median (dashed line).]

5. Results for R = 100:

                        Degrees of freedom (d)
    No. of obs. (T)      3          6          9
    10                   0.63276    1.07076    0.96472
    25                   0.90520    0.96347    1.22535
    100                  0.58819    1.29718    1.19873

Notice that the median has lower MSE than the mean in some cases, which contrasts with the earlier result that the MSE ratio always exceeded one. No clear trends are apparent except for the case of T = 25, in which the MSE ratio increases with d; this makes sense, as the t-distribution looks more like the normal distribution as d gets large. Repeating the experiment (not required) with R = 1000 confirms this result:

    (R = 1000)          Degrees of freedom (d)
    No. of obs. (T)      3          6          9
    10                   0.58435    1.03613    1.20269
    25                   0.68058    1.12398    1.30109
    100                  0.58292    1.17798    1.25101

The new code:

    % mc2.m
    %
    % A simple Monte Carlo simulation for estimating the mean.
    % Assumes a t-distribution for the error term.
    %
    % Ming Kang
    % February 2006

    clear all;

    % Set parameters
    R = 1000;
    T = 100;
    beta = 2;
    d = 3;

    % Initialize variables
    BDist1 = zeros(R,1);    % sample mean
    BDist2 = zeros(R,1);    % sample median
    y = zeros(T,1);

    % Set random-number generator
    randn('state',1);

    % Simulation step
    for r = 1:R
        numer = randn(T,1);
        denom = sqrt(sum(randn(T,d).^2,2)/d);   % divide by d per the t-ratio formula
        z = numer./denom;
        y = beta + z;
        BDist1(r) = mean(y);
        BDist2(r) = median(y);
    end

    % Calculate sampling statistics
    mse1 = mean((BDist1 - beta).^2);
    mse2 = mean((BDist2 - beta).^2);
    mse_r = mse2/mse1
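A possible sanity check on the sampling procedure, not part of the original solution; it assumes the Statistics Toolbox functions trnd and quantile are available. The ratio construction used in mc2.m should produce draws whose quantiles match Matlab's built-in Student's t generator.

    % Sketch: compare quantiles of manual t-ratio draws with trnd (Statistics Toolbox).
    d = 3;  N = 1e5;
    z = randn(N, d+1);
    t_manual  = z(:,1) ./ sqrt(sum(z(:,2:end).^2, 2) / d);
    t_builtin = trnd(d, N, 1);
    p = [0.05 0.25 0.50 0.75 0.95];
    q_manual  = quantile(t_manual,  p);
    q_builtin = quantile(t_builtin, p);
    disp([q_manual(:)'; q_builtin(:)']);   % the two rows should be close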