# Problem 1: Calculate and plot the global mean temperature anomaly.

## Transcription

ATMO: Meteorological Research Methods, Spring 2010

PROBLEM SET 2 (SOLUTIONS)

Data:

Gridded [5° × 5°] monthly surface temperature anomalies for all months of January 1958 to December

Time series of ENSO index for all months of January 1958 to December

Note: The complete code to produce these results is contained in the IDL file kcw_hw2.pro.

Problem 1: Calculate and plot the global mean temperature anomaly.

Figure 1: Time series of monthly global mean temperature anomaly for all months of Jan to Dec

Figure 1 shows my result for this time series. Now for some details on how I did the calculation.

The global temperature data file is arranged somewhat inconveniently, i.e., with all the temperatures (all lats and lons) in a single row for each month. To make this and all other calculations much easier, the first thing I did was reformat the global temperature grid from its native [nlon × nlat, ntime] format to a more intuitive 3-D form, i.e., [nlon, nlat, ntime].

To calculate the global mean temperature for each month, I first calculated the mean temperature at each latitude for that month, i.e., the simple mean of the temperatures at all longitudes at a given latitude:

$$\overline{T}_{lat} = \frac{1}{nlon} \sum_{i=1}^{nlon} T_{lon,i}$$
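The zonal mean above, together with the cosine weighting described in the next paragraph, can be sketched in a few lines. This is a minimal standard-library Python sketch, not the author's IDL code: the function names, the `grid[lat][lon]` layout for a single month, and the toy values are all assumptions made for illustration.

```python
import math

def zonal_means(grid):
    """Mean over longitude for each latitude row, ignoring NaN values.

    grid[j][i] is the anomaly at latitude index j and longitude index i
    (hypothetical layout for one month). A row with no valid data gives NaN.
    """
    means = []
    for row in grid:
        valid = [t for t in row if not math.isnan(t)]
        means.append(sum(valid) / len(valid) if valid else math.nan)
    return means

def global_mean(grid, lats_deg):
    """Cosine-weighted mean of the zonal means, skipping empty latitudes."""
    num = 0.0
    den = 0.0
    for t_lat, lat in zip(zonal_means(grid), lats_deg):
        if math.isnan(t_lat):
            continue  # drop this latitude from BOTH the sum and the weights
        w = math.cos(math.radians(lat))
        num += w * t_lat
        den += w
    return num / den if den > 0 else math.nan

nan = math.nan
lats = [-60.0, 0.0, 60.0]
grid = [
    [1.0, nan, 3.0],   # zonal mean 2.0, weight cos(60 deg) = 0.5
    [1.0, 1.0, 1.0],   # zonal mean 1.0, weight 1.0
    [nan, nan, nan],   # no data at this latitude: excluded entirely
]
result = global_mean(grid, lats)
print(round(result, 4))
```

In this toy case the empty latitude contributes nothing to either sum, so the result is (0.5 × 2 + 1 × 1) / 1.5. Filling the empty row with zeros instead would divide by 2.0 and pull the answer toward zero.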

Note that the zonal mean does not require any special weighting because all values are at the same latitude. However, I had to disregard any NaN values when computing this mean.

Once we have the mean at each latitude, take the cosine-weighted mean of those, giving the mean global temperature for this month:

$$\overline{T}_{month} = \frac{\sum_{j=1}^{nlat} \cos(lat_j)\,\overline{T}_{lat,j}}{\sum_{j=1}^{nlat} \cos(lat_j)}$$

However, when I did this weighted mean, I also accounted for the possibility that entire latitudes might have no data. Those latitudes should be excluded both from the temperature sum in the numerator and from the sum of weights in the denominator. Simply setting the temperatures at those missing latitudes to zero and then computing the cosine-weighted mean would be incorrect, because the weights of the empty latitudes would still count in the denominator.

From Figure 1, it is clear that there is quite a bit of month-to-month variation, but there is also clearly a positive trend.

Problem 2: Investigate the relationship between global mean temperature and ENSO.

I first restricted both the global mean temperature time series (from Problem 1) and the ENSO index time series to only the months of Dec, Jan, and Feb. After this restriction, I standardized the ENSO time series:

$$z = \frac{ENSO - \overline{ENSO}}{\sigma_{ENSO}} \tag{1}$$

By definition, this standardized ENSO time series now has a mean of zero and a standard deviation (and variance) of one. This gives us a more useful measure of the behavior of ENSO during only the winter months. For example, we have now essentially defined what a positive (1σ) ENSO event is for any given winter. I did not standardize the temperature time series, though, because we want our regression coefficient to be in units of temperature (degrees C) per unit standard deviation of ENSO index.

Figure 2 shows the resulting time series. Just eyeballing these two time series suggests that there is some relationship between temperature and ENSO. The regression quantifies this relationship. Figure 3 shows the scatter plot of the two time series along with the best-fit line and fit parameters.
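For completeness, here is what the regression does. Let y = temperature and z = standardized ENSO. Overbars denote time means and primes denote deviations from those means.

Linear regression:

$$\hat{y} = a_0 + a_1 z \tag{2}$$

Regression coefficient, the covariance of (z, y) per unit variance of z:

$$a_1 = \frac{\overline{z'y'}}{\overline{z'^2}} \tag{3}$$

Correlation coefficient, similar to the covariance but normalized by the variance of both z and y:

$$r = a_1 \frac{\sqrt{\overline{z'^2}}}{\sqrt{\overline{y'^2}}} \tag{4}$$

But since the ENSO index is standardized ($\overline{z'^2} = 1$), the regression and correlation coefficients simplify to:

$$a_1 = \overline{z'y'}, \qquad r = \frac{a_1}{\sqrt{\overline{y'^2}}}$$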
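As a numerical check on these identities, here is a minimal standard-library Python sketch (not the author's IDL). The helper names and the two short series are made up for illustration, and the sketch assumes the N - 1 form of the sample standard deviation throughout.

```python
import math

def mean(x):
    return sum(x) / len(x)

def stddev(x):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))

def standardize(x):
    """Subtract the mean and divide by the standard deviation, as in Eq. (1)."""
    m, s = mean(x), stddev(x)
    return [(v - m) / s for v in x]

def regress_on_standardized(z, y):
    """Return (a0, a1, r) for y regressed on an already standardized z."""
    n = len(z)
    ybar = mean(y)
    # Covariance of z and y; var(z) = 1, so this is also the slope a1.
    a1 = sum(zi * (yi - ybar) for zi, yi in zip(z, y)) / (n - 1)
    a0 = ybar - a1 * mean(z)   # mean(z) is ~0 after standardizing
    r = a1 / stddev(y)         # simplified form of Eq. (4)
    return a0, a1, r

enso = [-1.2, 0.4, 2.1, -0.3, 0.9, -1.6, 0.2, 1.1]          # made-up index values
temp = [-0.10, 0.02, 0.15, 0.05, 0.01, -0.12, 0.08, 0.03]   # made-up anomalies, deg C
z = standardize(enso)
a0, a1, r = regress_on_standardized(z, temp)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}, r = {r:.4f}")
```

Because z has unit variance, `a1` is just the covariance, and dividing it by the standard deviation of the temperatures gives the ordinary Pearson correlation of the two original series.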

Figure 2: Time series of (black) global mean temperature anomaly and (red) standardized ENSO index for winter months of Jan to Dec The ENSO index is plotted against the right axis.

The units of $a_1$ are degrees C per unit standard deviation of ENSO index. The fraction of the variance in temperature that is explained by ENSO is simply $r^2$. This best fit was accomplished by simply doing this (in IDL):

    fit = poly_fit(z, y, 1)   ; first-degree least-squares fit of y on z
    a0 = fit[0]               ; intercept
    a1 = fit[1]               ; regression coefficient (slope)
    r = a1/stddev(y)          ; correlation; valid because stddev(z) = 1
    yhat = z*a1 + a0          ; fitted temperatures

Despite the eyeballed impression of a good correlation between ENSO and global mean temperature that we get from the time series in Figure 2, the correlation between the two is not very large. It is only about r ≈ 0.21, which means that only about 4.5% of the variance in global temperature is explained by ENSO. (Although, as we'll see in the next homework, this correlation is a bit different if we lag the temperature relative to ENSO.) The regression coefficient ($a_1$) tells us that we would expect about a 0.05 °C change in global temperature per unit standard deviation change in ENSO index.

It was useful to standardize ENSO first for a couple of reasons: (1) it makes the calculations easier, and (2) it gives a more useful result. The correlation coefficient does not depend on the amplitude of ENSO (regardless of whether or not ENSO is standardized), but the regression coefficient does depend on that amplitude. A raw (not standardized) ENSO index isn't really very meaningful (what are the units, for example?).

By contrast, the standardized ENSO index gives us a better feel for what a positive or negative value means. For example, z = 1 means that ENSO is one standard deviation above its long-term winter mean.

Figure 3: Scatterplot of global mean temperature anomaly versus standardized ENSO index for winter months of Jan to Dec The line shows the best-fit linear regression between the two. The numbers in the bottom-right list the number of samples (N) in the time series, the regression coefficient (a1), the correlation coefficient (r) and the square of the correlation coefficient (r2).

Assessing Significance

To assess the significance of the correlation between temperature and ENSO index, I used the Student's t-statistic. We get our sample value for the t-statistic via:

$$t = \frac{r\,(N-2)^{1/2}}{(1-r^2)^{1/2}} \tag{5}$$

where r = 0.209 is the sample correlation we just computed and N is the number of independent samples. N - 2 is the number of degrees of freedom.

The Null Hypothesis we are testing here is that the true correlation (ρ) between ENSO and temperature is zero. The t-distribution describes how the statistic t, and hence the sample correlation r, would scatter about zero if the true correlation were zero (it is almost identical to the normal distribution in this case due to the large number of degrees of freedom). There is some finite probability that our sample correlation (r) would be different from zero even though the true correlation is zero. So, basically, we are using this t-distribution to quantify just how confident we can be that a non-zero sample correlation is not due to chance.

We will strictly define rejection of the null hypothesis as "the correlation is not due to chance with 95% confidence." This means that our value for t in Equation (5) must exceed the two-sided value for 95% confidence from the t-distribution. I'll call this the confidence value. I'm using the two-sided confidence value because I have no a priori reason to expect either a positive or negative correlation. So we will consider our r to be significant if it lies beyond 95% of the t-distribution on either side. In short, this amounts to using the 2.5% value (0.025) for our confidence value.

Now then, how many degrees of freedom (ν = N - 2) do we have? If we think that all the months in our time series are independent, then ν = 153 - 2 = 151. In that case our confidence value is: t

It is more likely that not all months are independent. It would be more appropriate to consider only that each winter is independent, so ν = 51 - 2 = 49. In that case our confidence value is: t

The change of ν didn't affect our confidence value much. It's about 2 in both cases, i.e., like a 2σ of a normal distribution. However, this change of ν does affect the t-stat we calculate using Equation (5) above.

For ν = 151, our t-stat is

For ν = 49, our t-stat is

So we could reject the Null Hypothesis if we treated all months as independent, but we could not reject it if we treated only the winters as independent.

(As a side note: when I did this calculation about 8 years ago, I found a much higher correlation, and I also found that I could reject the Null Hypothesis regardless of whether I treated all months or just all winters as independent. I guess the last 8 years of temperature and ENSO and/or refinements to the data record changed something.)

Composite Analysis

Composite analysis is an alternate (or perhaps supplemental) way to assess the relationship between ENSO and global temperature. The time series and scatterplots of Figures 2 and 3 show a tendency for winter temperature to be warmer during positive ENSO phases and colder during negative ENSO phases. Hence, a reasonable thing to do is to grab sub-sets of the temperature data from each phase.
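Returning briefly to the significance test: Equation (5) is easy to evaluate directly. The following is a small Python sketch rather than the author's IDL. The function name is hypothetical, and the critical value of 2.0 is only the rough "about 2" figure quoted in the text, not an exact table value.

```python
import math

def t_from_r(r, n_independent):
    """Equation (5): t-statistic for a sample correlation r."""
    dof = n_independent - 2
    return r * math.sqrt(dof) / math.sqrt(1.0 - r * r)

r = 0.209
t_months = t_from_r(r, 153)   # every winter month treated as independent
t_winters = t_from_r(r, 51)   # one independent sample per winter
t_crit = 2.0                  # rough two-sided 95% value, per the text
reject_months = abs(t_months) > t_crit
reject_winters = abs(t_winters) > t_crit
print(f"months:  t = {t_months:.2f}, reject = {reject_months}")
print(f"winters: t = {t_winters:.2f}, reject = {reject_winters}")
```

With the rounded r = 0.209 this gives a t-statistic of roughly 2.6 for 151 degrees of freedom and roughly 1.5 for 49, which reproduces the conclusion above: the null hypothesis is rejected only if every month is treated as independent.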
Specifically, I will define a warm/cold phase as being when our standardized ENSO index is > 1 or < -1, respectively. So we have our two composite categories:

1. Global mean temperature anomalies in winter months when ENSO > 1.
2. Global mean temperature anomalies in winter months when ENSO < -1.

Are the statistics of temperature in these two categories significantly different from the total population (i.e., all winter months)? Are they different from each other?

Let's start with the statistics of the total population. Using all 153 winter months, I get the following sample mean ($\overline{T}$) and sample standard deviation ($s_T$) for mean global temperature anomalies:

$\overline{T}$ = °C

$s_T$ = °C

I'll use the Student's t-statistic to place some confidence intervals around the sample mean. The t-statistic in this case is of the form:

$$t = \frac{\overline{T} - \mu}{s_T}\sqrt{N-1} \tag{6}$$

where µ is the true population mean, and N is the number of independent samples. Assuming that we have only independent winters, N = 51. For 95% confidence intervals, our t-stat should lie in the range:

$$-t_{0.025} < \frac{\overline{T} - \mu}{s_T}\sqrt{N-1} < t_{0.025} \tag{7}$$

where $t_{0.025}$ is the two-sided 95% confidence value. So, invert this to find the confidence intervals for our µ:

$$\left(\overline{T} - t_{0.025}\frac{s_T}{\sqrt{N-1}}\right) < \mu < \left(\overline{T} + t_{0.025}\frac{s_T}{\sqrt{N-1}}\right) \tag{8}$$

or more simply:

$$\mu = \overline{T} \pm t_{0.025}\frac{s_T}{\sqrt{N-1}} = \overline{T} \pm \delta T \tag{9}$$

Using N - 1 = 50, our t-stat is t , our δT = So our confidence intervals on the population mean are: °C < µ < °C

Now, what do we get from our two composite categories? I computed the mean, standard deviations and confidence intervals of temperature for the two composite categories. The table below summarizes the results:

| Sub-set | N | $\overline{T}$ | $s_T$ | δT | $\overline{T}$ - δT | $\overline{T}$ + δT |
|---|---|---|---|---|---|---|
| All | 51 | | | | | |
| ENSO > 1 | 23 | | | | | |
| ENSO < -1 | 19 | | | | | |
| ENSO > 1 | 13 | | | | | |
| ENSO < -1 | 10 | | | | | |

The first line gives the results for all winter months, with only 51 independent winters. The next set of two lines gives the results for the composites, assuming all months were independent. So, for example, I found 23 months when ENSO > 1 and 19 months when ENSO < -1. However, these months spanned only 13 and 10 different winters, respectively. The last set of two lines gives the results for composites assuming only independent winters.

The mean temperature for the ENSO > 1 subset is °C which is beyond the range of our estimate for the population mean, i.e., it is > °C. The mean temperature for the ENSO < -1 subset is °C which is within the range of our estimate of the population mean. So far, our results suggest ENSO's effect on temperature is significantly more pronounced during positive ENSO phases.

Note, however, that there is overlap in the confidence ranges (the $\overline{T}$ ± δT columns of the table). That is, the > 1 ranges overlap the < -1 ranges, and both overlap with the population ranges.

Finally, I tested the Null Hypothesis that the > 1 and < -1 sub-sets have the same mean (i.e., they are from the same distribution). The t-statistic in this case has the form:

$$t = \frac{\overline{T}_1 - \overline{T}_2}{\hat{s}\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}} \tag{10}$$

where

$$\hat{s} = \sqrt{\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}} \tag{11}$$

The $\overline{T}$, N, s are the mean, number of samples and sample standard deviations of each composite category, i.e., the N and $s_T$ values from the table above. The quantity ν = $N_1 + N_2 - 2$ is the number of degrees of freedom.
Assuming all months are independent, I get ν = 40 and t

Assuming only winters are independent, I get ν = 21 and t

Both of these t-stat values are much less than the two-tailed Student's t-statistic for 95% confidence. That value is approximately 2. So, we cannot reject the Null Hypothesis.

In the end, this problem has shown us that there is certainly some sort of positive correlation between ENSO and global mean temperature. There seems to be a greater temperature response to ENSO during the warm phase (i.e., when the standardized ENSO index > 1). However, the relationship does not appear to be very significant, nor does it explain much of the variance in temperature.
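The composite calculations, Equations (9) to (11), can be sketched as follows. This is an illustrative Python version, not the author's IDL. It assumes that s is computed with N in the denominator (which is what makes the N - 1 in Equation (9) and the N s² terms in Equation (11) consistent), the two short series are made up, and 2.776 is the standard two-sided 95% t value for 4 degrees of freedom, chosen only to match the toy sample size.

```python
import math

def mean(x):
    return sum(x) / len(x)

def std_n(x):
    """Standard deviation with N in the denominator (assumed convention)."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def mean_confidence_interval(x, n_independent, t_crit):
    """Equation (9): mean +/- t_crit * s / sqrt(N - 1)."""
    delta = t_crit * std_n(x) / math.sqrt(n_independent - 1)
    m = mean(x)
    return m - delta, m + delta

def two_sample_t(x1, x2, n1, n2):
    """Equations (10)-(11): t for the difference of two composite means."""
    s1, s2 = std_n(x1), std_n(x2)
    s_hat = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / (s_hat * math.sqrt(1.0 / n1 + 1.0 / n2))

warm = [0.30, 0.10, 0.25, 0.05, 0.20]    # made-up anomalies for ENSO > 1
cold = [0.10, -0.05, 0.15, 0.00, 0.05]   # made-up anomalies for ENSO < -1
lo, hi = mean_confidence_interval(warm, len(warm), t_crit=2.776)
t_stat = two_sample_t(warm, cold, len(warm), len(cold))
print(f"warm mean CI: [{lo:.3f}, {hi:.3f}], two-sample t = {t_stat:.2f}")
```

Passing the number of independent samples separately from the data is a deliberate choice here: it lets the same values be tested once with N equal to the number of months and once with N equal to the number of winters, as in the table.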

Problem 3: Investigate the relationship between surface temperature and ENSO at all spatial grid points.

Figure 4: Map showing the number of useable winter months at each grid point.

Here we basically apply the same methodology as Problem 2, i.e., regress a time series of temperature against a time series of ENSO index. However, in this problem we are grabbing a time series of temperature at each grid point, calculating regression and correlation with ENSO, then summarizing the results by plotting these coefficients on a map.

As in Problem 2, I first restricted both ENSO and the gridded temperature to the winter months. I then simply looped over the lat, lon grid to extract a time series at each point, which I then regressed against ENSO. This results in a grid of r and a1 values. For example:

$$a1_{i,j} = \overline{z'\,T'_{i,j}}, \qquad r_{i,j} = \frac{a1_{i,j}}{s_{i,j}}$$

where z is the standardized ENSO, $T_{i,j}$ is temperature at each grid point, $s_{i,j}$ is the standard deviation of temperature at each grid point, and primes denote deviations from the time mean.

One detail here: not all the grid points will have the same number of useable months in their time series. To account for that, I simply disregarded any NaN values. That is, at each grid point, I removed months that had NaN in the T. I removed these months from both T and ENSO, then standardized ENSO using only the good months. So, theoretically, I might use a different z in the calculation at each grid point. I'm not entirely convinced that this was the best way to do it: do I use the same z for all grid points, then just remove values from it where the T has NaN? Or do I calculate a different z for each grid point? I couldn't decide, so I did the latter.

While computing my grid of r and a1, I also kept track of how many good samples were used in the calculation at each grid point. Figure 4 shows how many useable months were at each grid point. Most of the domain has at least 100 useable samples. The polar regions are obvious exceptions to this.
I didn't do the regression calculation at a grid point unless there were at least 10 samples. Initially, I planned to account for the variability of N so that I could use the appropriate number for N when assessing significance. However, in the end, this didn't affect things very much, so I used the same value for N at all points. I used N = 51, i.e., all winters (not all months) were independent.
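The per-grid-point procedure described above (drop NaN months from both series, re-standardize ENSO on the surviving months, skip points with fewer than 10 samples) might look like this in Python. It is a sketch of the described steps, not the author's IDL. The function names, the `temps[lat][lon]` layout, and the toy data are assumptions.

```python
import math

def regress_point(enso, t_series, min_samples=10):
    """Regress one grid point's temperatures on ENSO, skipping NaN months.

    Returns (a1, r, n_used); a1 and r are NaN if too few valid months remain
    or if either series is constant.
    """
    pairs = [(e, t) for e, t in zip(enso, t_series) if not math.isnan(t)]
    n = len(pairs)
    if n < min_samples:
        return math.nan, math.nan, n
    e_mean = sum(e for e, _ in pairs) / n
    t_mean = sum(t for _, t in pairs) / n
    e_std = math.sqrt(sum((e - e_mean) ** 2 for e, _ in pairs) / (n - 1))
    t_std = math.sqrt(sum((t - t_mean) ** 2 for _, t in pairs) / (n - 1))
    if e_std == 0.0 or t_std == 0.0:
        return math.nan, math.nan, n
    # ENSO is re-standardized using only the months that survive at this point.
    a1 = sum(((e - e_mean) / e_std) * (t - t_mean) for e, t in pairs) / (n - 1)
    return a1, a1 / t_std, n

def regress_grid(enso, temps):
    """temps[j][i] is the list of monthly values at lat index j, lon index i."""
    return [[regress_point(enso, series) for series in row] for row in temps]

nan = math.nan
enso = [0.5, -1.0, 1.5, 0.0, -0.5, 2.0, -1.5, 1.0, 0.2, -0.7, 0.9, -0.1]
good = [0.1 * e + 0.3 for e in enso]   # perfectly correlated toy point
good[3] = nan                          # one missing month
sparse = [nan] * 10 + [0.1, 0.2]       # only two valid months
results = regress_grid(enso, [[good, sparse]])
```

Each entry of `results` also carries the number of months actually used, which corresponds to the sample-count map in Figure 4.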

Figure 5 shows the resulting maps of regression and correlation coefficients. The results in the Pacific are reassuring: positive ENSO phase (i.e., an El Niño) is clearly related to significantly warmer temperatures in the eastern Pacific and colder temperatures in the west Pacific. So the strongest ENSO signal is clearly in the Pacific. No surprise there since that's the region where the ENSO index measurements are done. There doesn't appear to be much of a significant relationship to ENSO in the interior of the continents (with the interesting exception of southern Africa), but there are some interesting teleconnection features in the ocean basins and along the coasts. Wouldn't it be fun to now re-do this analysis using precipitation instead of temperature?

Problem 4: Estimate the incidence of extreme cold temperature events.

Figure 6 illustrates the concept of this problem. The three thin lines in the figure are just the standard normal distribution f(z):

$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) \tag{12}$$

The black is f(z) for a mean of zero; the red and blue are shifted right and left by 0.2, respectively. The thick dashed line is the cumulative distribution function:

$$F(z) = \int_{-\infty}^{z} f(t)\,dt \tag{13}$$

Assuming the winter temperatures in Lubbock are normally distributed, the black f(z) curve in Figure 6 is essentially the population distribution of Lubbock temperatures. In this context, the probability of having an extreme cold (-2σ) event is simply F(-2).

Now, the question is: how would a correlation between temperature and ENSO alter this probability? Specifically, how much more probable is an extreme cold event during cold versus warm ENSO phases?
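Equations (12) and (13) can be evaluated with nothing more than Python's `math` module. The sketch below is illustrative only. The function names are hypothetical, and the 0.2 shift is simply the value used for the red and blue curves in Figure 6.

```python
import math

def normal_pdf(z, mean=0.0):
    """Equation (12), optionally shifted to a non-zero mean (unit variance)."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(z, mean=0.0):
    """Equation (13), evaluated with the complementary error function."""
    return 0.5 * math.erfc(-(z - mean) / math.sqrt(2.0))

p_neutral = normal_cdf(-2.0)             # zero-mean distribution
p_cold = normal_cdf(-2.0, mean=-0.2)     # mean shifted left by 0.2
p_warm = normal_cdf(-2.0, mean=0.2)      # mean shifted right by 0.2
print(f"neutral: {p_neutral:.4f}  cold: {p_cold:.4f}  warm: {p_warm:.4f}")
```

For the unshifted distribution F(-2) is about 0.023. Shifting the mean left by 0.2 raises the probability of falling below -2 and shifting it right lowers it, which is the effect the rest of this problem quantifies.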
Well, if both temperature and ENSO are standardized (i.e., mean is zero, variance is one), then the linear regression between the two would give us a correlation coefficient of:

$$r = a_1 = \overline{ENSO \cdot T}$$

and thus our best-fit line would be of the form:

$$\hat{T} = r \cdot ENSO = \left(\overline{ENSO \cdot T}\right) ENSO$$

So, a change of ENSO by some amount δ will correspond to a temperature change of rδ. When ENSO is exactly ±1σ from its long-term mean, that would correspond to a standardized temperature change of ±r. Hence, the mean of our standardized temperature distribution would shift by ±r, as illustrated in Figure 6.

Naturally, if the correlation between temperature and ENSO is positive, then extreme cold events are more probable when δ < 0 and less probable when δ > 0. This will scale with r. That is, the greater the correlation with ENSO, the more probable cold events will be when δ < 0.

To estimate the ratio η of the probability of -2σ extreme cold days during cold (δ = -1) versus warm (δ = +1) ENSO phases, we are essentially just computing the ratio of the cumulative distribution at -2 + r versus -2 - r, i.e.,

$$\eta = \frac{Pr\{cold\}}{Pr\{warm\}} = \frac{F(-2 + r)}{F(-2 - r)}$$

So, all we have to do is solve for r given a value for η that is 2, or 3, or 4. This is difficult to do analytically because F(z) has no closed-form expression. Hence, we have to try something else. Here are some options:

1. Use trial and error with a table of values for F(z).
2. Use a numerical method like Newton's Method.
3. Use a graphical method.

All of these options are fine. I was hoping someone would try using Newton's Method. For example, on page 91, Wilks gives a close approximation to the cumulative distribution F(z) which has the form:

$$F(z) \approx \frac{1}{2}\left[1 \pm \sqrt{1 - \exp\left(\frac{-2z^2}{\pi}\right)}\right] \tag{14}$$

where the positive root applies for z > 0 and the negative root for z < 0. So, one could use this form of F(z) in Newton's method to solve for the root, i.e., solve for r such that:

$$F(-2 + r) - \eta\,F(-2 - r) = 0$$

Instead of using Newton's Method, I solved this graphically. Figure 7 shows the result. To get this plot, I used IDL's built-in function called GAUSS_PDF, which gives the exact value of F(z) for any z. So, all I did here was compute and plot F(-2 + r) and η F(-2 - r) as a function of r, then see where they intersect. The values of r where they intersect are the r values we are looking for. I get the following:

| η | r |
|---|---|
| 2 | |
| 3 | |
| 4 | |

Interesting things to note here:

If r = 0, the probability of an extreme cold event is obviously the same in both warm and cold phases of ENSO.

It doesn't take too much of an increase in r to make cold days 2, 3, or 4 times more probable in the cold versus warm phase of ENSO.

We could only solve the problem this way if we define opposing phases of ENSO as situations in which the ENSO index is 1 standard deviation from its long-term mean, because that gives a specific value for F(z) in our computed ratio. If we defined opposing phases as situations in which the ENSO index exceeds its long-term mean, then we have no specific value for z.
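For readers who want to try the Newton's Method route mentioned above, here is one possible Python sketch. It is not the author's solution (which was graphical). It uses the exact F(z) via `math.erfc` for the iteration and includes the approximation of Equation (14) only for comparison. The function names, starting guess and tolerance are arbitrary assumptions.

```python
import math

def F(z):
    """Exact standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def f(z):
    """Standard normal PDF, which is the derivative of F."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def wilks_F(z):
    """Approximation in the form of Eq. (14): '+' for z > 0, '-' for z < 0."""
    root = math.sqrt(1.0 - math.exp(-2.0 * z * z / math.pi))
    return 0.5 * (1.0 + math.copysign(root, z))

def solve_r(eta, r0=0.1, tol=1e-12, max_iter=50):
    """Newton's method for g(r) = F(-2 + r) - eta * F(-2 - r) = 0."""
    r = r0
    for _ in range(max_iter):
        g = F(-2.0 + r) - eta * F(-2.0 - r)
        dg = f(-2.0 + r) + eta * f(-2.0 - r)   # dg/dr, always positive
        step = g / dg
        r -= step
        if abs(step) < tol:
            break
    return r

roots = {eta: solve_r(eta) for eta in (1, 2, 3, 4)}
for eta, r in roots.items():
    print(f"eta = {eta}: r = {r:.3f}")
```

Because g(r) increases monotonically with r, the iteration converges quickly from a small positive starting guess. The case η = 1 is included as a sanity check: it should return r = 0, in line with the first note above.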

Figure 5: Map of (left) regression and (right) correlation coefficients between surface temperature and standardized ENSO index for winter months of Jan to Dec Contour interval is 0.2 with zero contour omitted. Positive values are in red. Negative values are in blue. The units of the regression coefficient are °C per unit standard deviation of ENSO index. The gray-shaded areas indicate where the correlation exceeds the two-tailed 95% confidence level assuming 51 independent winters (49 degrees of freedom) at each grid point.

Figure 6: The standard normal distribution. The probability distribution functions f(z) are plotted against the left axis. The cumulative probability F(z) is plotted against the right axis. The thin black line is f(z) for a mean of zero. The thin red and blue lines are f(z - 0.2) and f(z + 0.2) (i.e., a mean of 0.2 and -0.2, respectively). The thick, black dashed line is F(z). The green line at z = -2 indicates the 2-sigma extreme cold event based on the zero-mean distribution.

Figure 7: Probability of a -2σ event for 1σ cold and warm phases of ENSO as a function of the correlation coefficient. The blue line corresponds to cold ENSO, i.e., F(-2 + r). The black line is the warm ENSO, i.e., F(-2 - r). The three red lines are F(-2 - r) multiplied by 2, 3, and 4. The points of intersection with the blue line indicate the value of the correlation coefficient required to make a -2σ cold event 2, 3, and 4 times more probable during the cold ENSO phase relative to the warm ENSO phase. Note that a cold event is equally probable in both warm and cold ENSO phases if r = 0.


### Probability and Statistics

CHAPTER 2: RANDOM VARIABLES AND ASSOCIATED FUNCTIONS 2b - 0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### Point and Interval Estimates

Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

### Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency ) plot /13/2004

Graphs, and measures of central tendency and spread 9.07 9/13/004 Histogram If discrete or categorical, bars don t touch. If continuous, can touch, should if there are lots of bins. Sum of bin heights

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 000: Page 1: DESCRIPTIVE STATISTICS - FREQUENCY DISTRIBUTIONS AND AVERAGES: Inferential and Descriptive Statistics: There are four

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Scholar: Elaina R. Barta. NOAA Mission Goal: Climate Adaptation and Mitigation

Development of Data Visualization Tools in Support of Quality Control of Temperature Variability in the Equatorial Pacific Observed by the Tropical Atmosphere Ocean Data Buoy Array Abstract Scholar: Elaina

### Pooling and Meta-analysis. Tony O Hagan

Pooling and Meta-analysis Tony O Hagan Pooling Synthesising prior information from several experts 2 Multiple experts The case of multiple experts is important When elicitation is used to provide expert

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Introduction to Diophantine Equations

Introduction to Diophantine Equations Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles September, 2006 Abstract In this article we will only touch on a few tiny parts of the field

### Regression analysis in practice with GRETL

Regression analysis in practice with GRETL Prerequisites You will need the GNU econometrics software GRETL installed on your computer (http://gretl.sourceforge.net/), together with the sample files that

### Predictability of Average Inflation over Long Time Horizons

Predictability of Average Inflation over Long Time Horizons Allan Crawford, Research Department Uncertainty about the level of future inflation adversely affects the economy because it distorts savings

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### Data Envelopment Analysis: A Primer for Novice Users and Students at all Levels

Data Envelopment Analysis: A Primer for Novice Users and Students at all Levels R. Samuel Sale Lamar University Martha Lair Sale Florida Institute of Technology In the three decades since the publication

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

### Pearson s Correlation

Pearson s Correlation Correlation the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the

### OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

### Linear Models in STATA and ANOVA

Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

### Comparing two groups (t tests...)

Page 1 of 33 Comparing two groups (t tests...) You've measured a variable in two groups, and the means (and medians) are distinct. Is that due to chance? Or does it tell you the two groups are really different?

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Introductory Statistics Notes

Introductory Statistics Notes Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August

### Probability of rejecting the null hypothesis when

Sample Size The first question faced by a statistical consultant, and frequently the last, is, How many subjects (animals, units) do I need? This usually results in exploring the size of the treatment

### COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

### THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

### Plots, Curve-Fitting, and Data Modeling in Microsoft Excel

Plots, Curve-Fitting, and Data Modeling in Microsoft Excel This handout offers some tips on making nice plots of data collected in your lab experiments, as well as instruction on how to use the built-in

### Inferences About Differences Between Means Edpsy 580

Inferences About Differences Between Means Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Inferences About Differences Between Means Slide

### Chapter 9. Section Correlation

Chapter 9 Section 9.1 - Correlation Objectives: Introduce linear correlation, independent and dependent variables, and the types of correlation Find a correlation coefficient Test a population correlation

### Lecture 3 : Hypothesis testing and model-fitting

Lecture 3 : Hypothesis testing and model-fitting These dark lectures energy puzzle Lecture 1 : basic descriptive statistics Lecture 2 : searching for correlations Lecture 3 : hypothesis testing and model-fitting

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Examination 110 Probability and Statistics Examination

Examination 0 Probability and Statistics Examination Sample Examination Questions The Probability and Statistics Examination consists of 5 multiple-choice test questions. The test is a three-hour examination

### Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA

Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### SPSS on two independent samples. Two sample test with proportions. Paired t-test (with more SPSS)

SPSS on two independent samples. Two sample test with proportions. Paired t-test (with more SPSS) State of the course address: The Final exam is Aug 9, 3:30pm 6:30pm in B9201 in the Burnaby Campus. (One

### Domain of a Composition

Domain of a Composition Definition Given the function f and g, the composition of f with g is a function defined as (f g)() f(g()). The domain of f g is the set of all real numbers in the domain of g such

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.