The Bootstrap and Jackknife Methods for Data Analysis




Rainer W. Schiel
Institut für Theoretische Physik, Universität Regensburg
December 21, 2011

A very simple example
Zero-dimensional data (only one quantity measured) from an experiment or from a Monte Carlo simulation.
[Figure: the 1000 measured values, ranging from about 5 to 25, plotted against the measurement index.]

Plot the histogram:
[Figure: histogram of the measured values, which range from about 5 to 25.]

Average: $\bar{x} = \frac{1}{N} \sum_i x_i$ (= 14.9043...)
Standard deviation: $\sigma = \sqrt{\frac{1}{N-1} \sum_i (x_i - \bar{x})^2}$ (= 3.2298...)
Standard deviation of the mean: $\sigma_{\text{mean}} = \frac{1}{\sqrt{N}} \sigma$ (= 0.1021...)
Result: $x = \bar{x} \pm \sigma_{\text{mean}}$ (= 14.90 ± 0.10)
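The three formulas above can be sketched in a few lines of Python. The data here is a hypothetical stand-in (1000 Gaussian draws), not the measurements from the slides, and the function name `mean_and_error` is chosen only for illustration.

```python
import math
import random

def mean_and_error(data):
    """Return (mean, standard deviation, standard deviation of the mean)."""
    n = len(data)
    mean = sum(data) / n
    # Standard deviation with the 1/(N-1) normalisation used above.
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Standard deviation of the mean: sigma / sqrt(N).
    sigma_mean = sigma / math.sqrt(n)
    return mean, sigma, sigma_mean

# Assumption: synthetic data standing in for the 1000 measurements.
rng = random.Random(0)
data = [rng.gauss(15.0, 3.2) for _ in range(1000)]

mean, sigma, sigma_mean = mean_and_error(data)
print(f"x = {mean:.2f} +/- {sigma_mean:.2f}")
```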

Standard deviation of the mean: an alternative?
Repeat the whole experiment (i.e. taking 1000 data points) many (say 2500) times.
Compute the mean of each experiment (giving 2500 means).
[Figure: histogram of the 2500 means, which lie between about 14.7 and 15.3.]
The standard deviation of these means is $\sigma_{\text{mean}}$ (compare to 14.90 ± 0.10).
The method works but is impractical, so use re-sampling techniques instead.

The Bootstrap
A loop sewn at the top rear or sometimes on each side of a boot to facilitate pulling it on.
It is said that Baron von Münchhausen pulled himself (and his horse) up by his own bootstraps.

The bootstrap re-sampling technique
Do the experiment once (here: 1000 measurements).
Use this data to repeat the experiment over and over again (here: 2500 times).
Each repetition consists of randomly picking, with replacement, 1000 measurements from the original data.
Compute the mean of each sample.
[Figure: histogram of the 2500 bootstrap sample means, which lie between about 14.6 and 15.2.]

The mean of the bootstrap sample means reproduces the mean of the original data set, up to resampling noise.
The standard deviation of the sample means is an estimate of $\sigma_{\text{mean}}$.
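A minimal sketch of the bootstrap procedure, assuming synthetic Gaussian data in place of the real measurements. The names `bootstrap_error` and `mean` are hypothetical helpers introduced for this example; `random.choices` is the standard-library call for drawing with replacement.

```python
import math
import random

def bootstrap_error(data, statistic, n_resamples=2500, seed=0):
    """Return (mean, standard deviation) of `statistic` over bootstrap samples."""
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_resamples):
        # Draw n measurements with replacement from the original data.
        sample = rng.choices(data, k=n)
        values.append(statistic(sample))
    center = sum(values) / n_resamples
    spread = math.sqrt(
        sum((v - center) ** 2 for v in values) / (n_resamples - 1)
    )
    return center, spread

def mean(xs):
    return sum(xs) / len(xs)

# Assumption: synthetic data standing in for the 1000 measurements.
rng = random.Random(1)
data = [rng.gauss(15.0, 3.2) for _ in range(1000)]

center, spread = bootstrap_error(data, mean)
print(f"bootstrap: x = {center:.2f} +/- {spread:.2f}")
```

Because `statistic` is passed in as a function, the same routine can be reused for any other quantity computed from the sample, which is the point of the error-propagation argument on the next slide.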

Advantages of the Bootstrap method
No need to propagate errors... never again! Just compute the final quantity that you're interested in on all samples; the errors are automatically propagated to the final result.
Handle data with skewed distributions: the data of the samples is just as skewed as the original data.
Perform more complicated data analysis. For example, consider two noisy data sets $f_i(x)$ and $g_i(x)$, where you are interested in a fit to $f(x)/g(x)$. With traditional methods this can be difficult (division by zero!), but it is no problem with the bootstrap method.
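As a simplified illustration of the ratio example (a ratio of two means rather than a fit), the sketch below bootstraps $\bar{f}/\bar{g}$ directly. It assumes the two data sets are paired by index $i$ and therefore resamples both with the same indices; the data and the name `bootstrap_ratio` are hypothetical.

```python
import math
import random

def bootstrap_ratio(f, g, n_resamples=1000, seed=0):
    """Bootstrap estimate and error of mean(f) / mean(g) for paired data."""
    rng = random.Random(seed)
    n = len(f)
    ratios = []
    for _ in range(n_resamples):
        # Assumption: f[i] and g[i] belong together, so resample indices jointly.
        idx = [rng.randrange(n) for _ in range(n)]
        f_mean = sum(f[i] for i in idx) / n
        g_mean = sum(g[i] for i in idx) / n
        # The ratio is taken of the sample means, not measurement by measurement.
        ratios.append(f_mean / g_mean)
    center = sum(ratios) / n_resamples
    spread = math.sqrt(
        sum((r - center) ** 2 for r in ratios) / (n_resamples - 1)
    )
    return center, spread

# Assumption: synthetic noisy data with true means 6 and 2.
rng = random.Random(2)
f = [rng.gauss(6.0, 1.0) for _ in range(500)]
g = [rng.gauss(2.0, 0.5) for _ in range(500)]

ratio, ratio_err = bootstrap_ratio(f, g)
print(f"f/g = {ratio:.3f} +/- {ratio_err:.3f}")
```

No error-propagation formula for the quotient is written down anywhere: the spread of the resampled ratios serves as the error estimate.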

Disadvantages of the Bootstrap method
Computationally intensive, though usually not a problem with today's computers.
You might have to write your own code, but you might have to do that for other data analysis methods, too.
A reasonably large original data sample is needed (> O(100)).

The Jackknife
According to www.artofmanliness.com, the jackknife is a pocket knife that has a simple hinge at one end, and may have more than one blade.

The single-elimination jackknife re-sampling technique
Somewhat similar to the bootstrap re-sampling technique.
Do the experiment once (here: 1000 measurements).
Make 1000 samples of 999 measurements each, with the $i$-th measurement eliminated in the $i$-th sample.
Compute the mean (or the wanted quantity) of each sample.
[Figure: histogram of the 1000 jackknife sample means, which lie between about 14.895 and 14.915.]

The mean of the means is the mean of the original data set.
The standard deviation of the means is approximately $\sigma_{\text{mean}} / \sqrt{N}$, where $N$ is the number of measurements (here: 1000).
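A minimal sketch of the single-elimination jackknife. Since the spread of the jackknife means is much narrower than $\sigma_{\text{mean}}$, it has to be rescaled; the sketch uses the standard jackknife error formula $\sqrt{\frac{N-1}{N} \sum_i (\theta_i - \bar{\theta})^2}$, which is not stated on the slides and is included here as an assumption about how the rescaling would usually be done. The name `jackknife_error` is hypothetical.

```python
import math

def jackknife_error(data, statistic):
    """Return (mean of jackknife values, jackknife error) for `statistic`."""
    n = len(data)
    # The i-th sample omits the i-th measurement.
    values = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(values) / n
    # Standard jackknife error: the (N-1)/N factor undoes the narrowing
    # caused by the samples sharing all but one measurement.
    error = math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in values))
    return center, error

def mean(xs):
    return sum(xs) / len(xs)

# Tiny hypothetical data set, small enough to check by hand.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
center, error = jackknife_error(data, mean)
print(f"jackknife: x = {center:.3f} +/- {error:.3f}")
```

For the mean, this error coincides with $\sigma / \sqrt{N}$ from the direct formula; for other statistics, only the `statistic` argument changes.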

Bootstrap vs. Jackknife
The bootstrap method handles skewed distributions better.
The jackknife method is suitable for smaller original data samples.

Conclusions
The bootstrap and jackknife methods are powerful tools for data analysis.
They are very well suited to analyzing lattice data.