The Bootstrap and Jackknife Methods for Data Analysis

The Bootstrap and Jackknife Methods for Data Analysis Rainer W. Schiel Institut für Theoretische Physik, Universität Regensburg December 21, 2011

A very simple example Zero-dimensional data (only one quantity measured) from an experiment or from a Monte-Carlo simulation 25 20 15 10 5 200 400 600 800 1000 Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 2 / 15

A very simple example Plot the Histogram: 140 120 100 80 60 40 20 5 10 15 20 25 Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 3 / 15

A very simple example 140 120 100 80 60 40 20 5 10 15 20 25 Average: x = 1 N i x i (= 14.9043...) Standard deviation: σ = 1 N 1 i (x i x) 2 (= 3.2298...) Standard deviation of the mean: σ mean = 1 N σ (= 0.1021...) Result: x = x ±σ mean (= 14.90±0.10) Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 4 / 15

Standard deviation of the mean: an Alternative? Repeat the whole experiment (i.e. taking 1000 data points) many (say 2500) times Compute the means of each experiment ( 2500 means) 200 150 100 50 14.7 14.8 14.9 15.0 15.1 15.2 15.3 Standard deviation of these means is σ mean (compare to 14.90±0.10) Method works, but is nonpractical use re-sampling techniques Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 5 / 15

The Bootstrap A loop sewn at the top rear or sometimes on each side of a boot to facilitate pulling it on. It is said that Baron von Münchhausen pulled himself (and his horse) up by his own bootstraps. Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 6 / 15

The bootstrap re-sampling technique Do experiment once (here: 1000 measurements) Use this data to repeat the experiment over and over again (here: 2500 times) Repeat the experiment by randomly picking (with replacement) 1000 measurements Do this 2500 times Compute the mean of each sample 200 150 100 50 14.6 14.7 14.8 14.9 15.0 15.1 15.2 Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 7 / 15

The bootstrap re-sampling technique 200 150 100 50 14.6 14.7 14.8 14.9 15.0 15.1 15.2 The mean of the means is the mean of the original data set The standard deviation of the means of the samples is σ mean Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 8 / 15

Advantages of the Bootstrap method No need to propagate errors... never again! just compute the final quantity that you re interested in on all samples the errors are automatically propagated to the final result Handle data with skewed distributions the data of the samples is just as skewed as the original data Perform more complicated data analysis e.g. consider two noisy data sets f i (x) and g i (x) you are interested in a fit to f(x) g(x) with traditional methods, this can be difficult (division by zero!) no problem with the bootstrap method, though Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 9 / 15

Disadvantages of the Bootstrap method Computationally intensive though usually not a problem with today s computers You might have to write your own code but you might have to do that for other data analysis methods, too A reasonably large original data sample is needed (> O(100)) Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 10 / 15

The Jackknife According to www.artofmanliness.com, the Jackknife is a pocket knife that has a simple hinge at one end, and may have more than one blade. Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 11 / 15

The single-elimination jackknife re-sampling technique Somewhat similar to the bootstrap re-sampling technique Do experiment once (here: 1000 measurements) Make 1000 samples of 999 measurements each with the i th measurement eliminated in the i th sample Compute the mean (or the wanted quantity) of each sample 120 100 80 60 40 20 14.895 14.900 14.905 14.910 14.915 Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 12 / 15

The single-elimination jackknife re-sampling technique 120 100 80 60 40 20 14.895 14.900 14.905 14.910 14.915 The mean of the means is the mean of the original data set The standard deviation of the means is σmean N measurements (here: 1000)) (where N is the number of Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 13 / 15

Bootstrap vs. Jackknife The bootstrap method handles skewed distributions better The jackknife method is suitable for smaller original data samples Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 14 / 15

Conclusions The bootstrap and jackknife methods are powerful tools for data analysis They are very well suited to analyze lattice data Rainer W. Schiel (Regensburg) Bootstrap and Jackknife December 21, 2011 15 / 15