M1 in Economics and Economics and Statistics Applied multivariate Analysis - Big data analytics Worksheet 1 - Bootstrap



Similar documents

Beta Distribution. Paul Johnson and Matt Beverlin June 10, 2013

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

Univariate Regression

Chapter 5 Discrete Probability Distribution. Learning objectives

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Simple Linear Regression Inference

1 Prior Probability and Posterior Probability

Package SHELF. February 5, 2016

How To Test For Significance On A Data Set

Basics of Statistical Machine Learning

Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni

Package mcmcse. March 25, 2016

Sample Size Calculation for Longitudinal Studies

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March Due:-March 25, 2015.

MBA 611 STATISTICS AND QUANTITATIVE METHODS

Package EstCRM. July 13, 2015

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

Supplement to Call Centers with Delay Information: Models and Insights

Estimation of σ 2, the variance of ɛ

1.3. DOT PRODUCT If θ is the angle (between 0 and π) between two non-zero vectors u and v,

Lecture 8. Confidence intervals and the central limit theorem

Tutorial 5: Hypothesis Testing

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Overview Classes Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16

Part 2: One-parameter models

An Internal Model for Operational Risk Computation

Notes on the Negative Binomial Distribution

Multiple Linear Regression

PHASE ESTIMATION ALGORITHM FOR FREQUENCY HOPPED BINARY PSK AND DPSK WAVEFORMS WITH SMALL NUMBER OF REFERENCE SYMBOLS

Week 5: Multiple Linear Regression

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Standard errors of marginal effects in the heteroskedastic probit model

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

Chapter 3 RANDOM VARIATE GENERATION

1 Sufficient statistics

Least Squares Estimation

Sales forecasting # 2

Math Practice Exam 2 with Some Solutions

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

Bootstrapping Big Data

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Cyber-Security Analysis of State Estimators in Power Systems

Maximum Likelihood Estimation

Part 2: Analysis of Relationship Between Two Variables

Hypothesis testing - Steps

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

12.5: CHI-SQUARE GOODNESS OF FIT TESTS

Chapter 4 Statistical Inference in Quality Control and Improvement. Statistical Quality Control (D. C. Montgomery)

Quantile Regression under misspecification, with an application to the U.S. wage structure

Nonlinear Regression:

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Handling missing data in Stata a whirlwind tour

Applied Multivariate Analysis - Big data analytics

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

How To Check For Differences In The One Way Anova

YASAIw.xla A modified version of an open source add in for Excel to provide additional functions for Monte Carlo simulation.

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan , Fall 2010

Statistical Models in R

Final Exam Practice Problem Answers

3F3: Signal and Pattern Processing

Generalized Linear Models

Chapter 3 Quantitative Demand Analysis

Confidence Intervals for Cp

Bootstrap Example and Sample Code

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

R 2 -type Curves for Dynamic Predictions from Joint Longitudinal-Survival Models

The Variability of P-Values. Summary

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia

SAS Syntax and Output for Data Manipulation:

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Principle of Data Reduction

Chapter 16, Part C Investment Portfolio. Risk is often measured by variance. For the binary gamble L= [, z z;1/2,1/2], recall that expected value is

Dongfeng Li. Autumn 2010

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Mobile App Design Project #1 Java Boot Camp: Design Model for Chutes and Ladders Board Game

START Selected Topics in Assurance

Introduction and usefull hints for the R software

From the help desk: Bootstrapped standard errors

Chapter 4: Statistical Hypothesis Testing

sin(θ) = opp hyp cos(θ) = adj hyp tan(θ) = opp adj

Sample size and power calculations

Complex Numbers. w = f(z) z. Examples

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

You Are What You Bet: Eliciting Risk Attitudes from Horse Races

Supplementary PROCESS Documentation

VI. Real Business Cycles Models

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Finance 400 A. Penati - G. Pennacchi Market Micro-Structure: Notes on the Kyle Model

Transcription:

Nathalie Villa-Vialanei Année 2015/2016 M1 in Economics and Economics and Statistics Applied multivariate Analsis - Big data analtics Worksheet 1 - Bootstrap This worksheet illustrates the use of nonparametric bootstrap to estimate confidence intervals and variance of estimators. The simulations use R. An introduction to R, to basic use of packages and of the RStudio environment and useful references to obtain help can be found at http://www.nathalievilla.org/teaching/tide. In particular, an time ou do not understand how to use a function, ou must use the help: help(rgamma) to have the help page for the function rgamma and carefull read it. Eercice 1 M first bootstrap estimation This eercise s aim is to program a boostratp estimate of the confidence interval of the mean. 1. Using the function rgamma, generate a random sample from the Γ distribution with parameters k = 2 and θ = 1, having size n = 15. We recall that the densit of Γ(k, θ) is: k 1 ep( /θ) f k,θ () = Γ(k) θ k with Γ the Γ function Γ(k) = (k 1)!, k N. What is the mean of the Γ(k, θ) distribution? What is the empirical mean of our sample? 2. Using the function sample, create a function that: takes a given sample (vector) for input; generates a bootstrap sample from the previous sample; outputs its (empirical) mean. Test this function to obtain the mean from a bootstrap sample b using the sample generated in the previous question. 3. Use this function in a for loop to obtain 1000 values of mean from bootstrap samples. What is the bootstrap estimate of the mean? The bootstrap estimate of the 2.5% and 97.5% of the quantiles of this distribution? 4. Draw the distribution of these means and identif the true mean, the empirical estimateof the mean, the bootstrap estimate of the mean and the bootstrap 95% confidence interval of the mean. I {>0}

Distribution of the empirical means for 1000 bootstrap samples Frequenc 0 20 40 60 80 true mean empirical estimation of the mean bootstrap CI of the mean 1.0 1.5 2.0 2.5 mean Eercice 2 Bootstrap using the package boot In this eercise, the package boot is used to calculate a bootstrap estimate of the variance of the empirical estimate. The computational time used b the package boot is compared to the use of a for loop. 1. Using the function rbeta, generate a random sample from the Beta(α, β) distribution with parameter α = 1 and β = 2 1, having size n = 20. What is the epected mean of the Beta(1, 2) distribution? What is the empirical estimate,, of this mean for the generated sample? 2. What is the variance of the estimator? 3. Using the sample generated in the previous question and a for loop, find a bootstrap estimate for the variance of with B = 1000 bootstrap samples. Use the function sstem.time to obtain the computational time required to compute the 10000 bootstrap estimates of the mean. 4. Load the package boot with the command line: librar(boot) The function boot requires an argument statistic which is a function having two arguments, the original sample and a vector of indees for a subsample which calculates the empirical estimate (here the mean) from the bootstrap sample. Use?boot to see eamples of such functions and create a function boot.mean <- function(a.sample, sub.indees) {... } which calcultes the empirical mean from the bootstrap sample with observations sub.indees. 5. Use the boot function to generate 1000 estimates of the mean from bootstrap samples. Store this results in a R object named res.boot. Print res.boot and use names(res.boot) to see which information is included in this object (also use the help page of the function). How to obtain a bootstrap estimate of the variance of from res.boot? 6. Compare the computational time required b the boot function to the computational time required b the method used in question 3 for 10000 bootstrap estimates. 1 We recall that the densit of the Beta(α, β) distribution is αβ (αβ) 2 (αβ1). Γ(αβ) Γ(α)Γ(β) α 1 (1 ) β 1 I [0,1] () whose mean is equal to αβ α and whose variance is

Eercice 3 Bootstrap in linear models This eercise illustrates how the bootstrap can be used in the contet of a linear model to obtain confidence intervals for the parameters and confidence bounds for the prediction. 1. Generate 20 i.i.d. observations from X U[0, 1] where U[0, 1] is the uniform distribution in [0, 1]. Then generate from this sample, 20 observations from Y, which Y fitting the model Y = 2X 1 0.5(ɛ 1), where ɛ χ 2 (1) and ɛ and X are independant. Note that, for this model, the law of the least squares estimates of the parameter (α = 2 and β = 1) of the model is not the usual normal distribution. Make a graphic with the i.i.d. observations of the couple (X, Y) and add the true linear model on it (i.e., the line = 2 1). i.i.d. sample of size 20 from a linear model 2. Use the function lm to find the least squares estimates of the parameters, ˆα and ˆβ of the model. Add the estimated linear model to the figure of the previous questions.

i.i.d. sample of size 20 from a linear model true model least squares estimate 3. Use the function boot to find a 95% confidence interval for ˆα and ˆβ with 5000 bootstrap samples. 4. Use the function boot to obtain 95% confidence intervals for the estimations of the predictions associated to = 0, 0.01, 0.02,..., 1. Add the bounds of the predictions to the figure obtained in question 2.

i.i.d. sample of size 20 from a linear model true model least squares estimate confidence interval for the prediction