# Regression analysis in the Assistant fits a model with one continuous predictor and one continuous response and can fit two types of models:


## Transcription

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software.

The simple regression procedure in the Assistant fits linear and quadratic models with one continuous predictor (X) and one continuous response (Y) using least squares estimation. The user can select the model type or allow the Assistant to select the best-fitting model. In this paper, we explain the criteria the Assistant uses to select the regression model.

Additionally, we examine several factors that are important for obtaining a valid regression model. First, the sample must be large enough to provide enough power for the test and enough precision for the estimate of the strength of the relationship between X and Y. Next, it is important to identify unusual data that may affect the results of the analysis. We also consider the assumption that the error term follows a normal distribution and evaluate the impact of nonnormality on the hypothesis tests of the overall model and the coefficients. Finally, to ensure that the model is useful, it is important that the type of model selected accurately reflects the relationship between X and Y.

Based on these factors, the Assistant automatically performs the following checks on your data and reports the findings in the Report Card:

- Amount of data
- Unusual data
- Normality
- Model fit

In this paper, we investigate how these factors relate to regression analysis in practice and we describe how we established the guidelines to check for these factors in the Assistant.

Regression analysis in the Assistant fits a model with one continuous predictor and one continuous response and can fit two types of models:

- Linear: $F(x) = \beta_0 + \beta_1 X$
- Quadratic: $F(x) = \beta_0 + \beta_1 X + \beta_2 X^2$

The user can select the model before performing the analysis or can allow the Assistant to select the model. There are several methods that can be used to determine which model is most appropriate for the data. To ensure that the model is useful, it is important that the type of model selected accurately reflects the relationship between X and Y.

We wanted to examine the different methods that can be used for model selection to determine which one to use in the Assistant. We examined three methods that are typically used for model selection (Neter et al., 1996). The first method identifies the model in which the highest order term is significant. The second method selects the model with the highest $R^2_{adj}$ value. The third method selects the model in which the overall F-test is significant. For more details, see Appendix A.

To determine the approach in the Assistant, we examined the methods and compared their calculations to one another. We also gathered feedback from experts in quality analysis. Based on our research, we decided to use the method that selects the model based on the statistical significance of the highest order term in the model.

The Assistant first examines the quadratic model and tests whether the square term ($\beta_2$) in the model is statistically significant. If that term is not significant, then it drops the quadratic term from the model and tests the linear term ($\beta_1$). The model selected through this approach is presented in the Model Selection Report. Additionally, if the user selected a model that is different from the one selected by the Assistant, we report that in the Model Selection Report and the Report Card.

We chose this method in part because of feedback from quality professionals, who said they generally prefer simpler models that exclude terms that are not significant. Additionally, based on our comparison of the methods, using the statistical significance of the highest order term in the model is more stringent than the method that selects the model based on the highest $R^2_{adj}$ value. For more details, see Appendix A.
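The selection procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Minitab's implementation: the function names `fit_polynomial` and `select_model`, the significance level of 0.05, and the `"none"` outcome when neither term is significant are assumptions introduced here, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def fit_polynomial(x, y, degree):
    """Least squares fit of y on 1, x, ..., x**degree.

    Returns the estimated coefficients and the two-sided p-value of the
    t-test for the highest order term. (Hypothetical helper.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (n - p)                  # error variance estimate
    cov = mse * np.linalg.inv(X.T @ X)             # covariance of the estimates
    t_stat = beta[-1] / np.sqrt(cov[-1, -1])
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)
    return beta, p_value


def select_model(x, y, alpha=0.05):
    """Pick the model whose highest order term is significant.

    alpha=0.05 is an assumed threshold for this sketch.
    """
    _, p_quad = fit_polynomial(x, y, 2)
    if p_quad < alpha:
        return "quadratic"
    _, p_lin = fit_polynomial(x, y, 1)
    # "none" is an illustrative label for the case where no term is significant.
    return "linear" if p_lin < alpha else "none"
```

Testing the square term first and falling back to the linear term mirrors the preference for simpler models stated above: the quadratic term is kept only when the data support it.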

Although we use the statistical significance of the highest order term to select the model, we also present the $R^2_{adj}$ value and the overall F-test for the model in the Model Selection Report. To see the status indicators presented in the Report Card, see the Model fit data check section below.

Regression can detect relationships between X and Y fairly easily. Therefore, if you find a statistically significant relationship, you should also evaluate the strength of the relationship using $R^2_{adj}$. We found that if the sample size is not large enough, $R^2_{adj}$ is not very reliable and can vary widely from sample to sample. However, with a sample size of 40 or more, we found that $R^2_{adj}$ values are more stable and reliable. With a sample size of 40, you can be 90% confident that the observed value of $R^2_{adj}$ will be within 0.20 of $\rho^2_{adj}$ regardless of the actual value and the model type (linear or quadratic). For more detail on the results of the simulations, see Appendix B.

Based on these results, the Assistant displays the following information in the Report Card when checking the amount of data:

In the Assistant Regression procedure, we define unusual data as observations with large standardized residuals or large leverage values. These measures are typically used to identify unusual data in regression analysis (Neter et al., 1996). Because unusual data can have a strong influence on the results, you may need to correct the data to make the analysis valid. However, unusual data can also result from the natural variation in the process. Therefore, it is important to identify the cause of the unusual behavior to determine how to handle such data points.

We wanted to determine how large the standardized residuals and leverage values need to be to signal that a data point is unusual. We developed our guidelines for identifying unusual observations based on the standard Regression procedure in Minitab (Stat > Regression > Regression).

The standardized residual equals the value of a residual, $e_i$, divided by an estimate of its standard deviation. In general, an observation is considered unusual if the absolute value of the standardized residual is greater than 2. However, this guideline is somewhat conservative. You would expect approximately 5% of all observations to meet this criterion by chance (if the errors are normally distributed). Therefore, it is important to investigate the cause of the unusual behavior to determine if an observation truly is unusual.

Leverage values are related only to the X value of an observation and do not depend on the Y value. An observation is determined to be unusual if the leverage value is more than 3 times the number of model coefficients (p) divided by the number of observations (n), that is, greater than $3p/n$. Again, this is a commonly used cut-off value, although some textbooks use $2p/n$ (Neter et al., 1996).

If your data include any high leverage points, consider whether they have undue influence over the type of model selected to fit the data. For example, a single extreme X value could result in the selection of a quadratic model instead of a linear model. You should consider whether the observed curvature in the quadratic model is consistent with your understanding of the process. If it is not, fit a simpler model to the data or gather additional data to more thoroughly investigate the process.

When checking for unusual data, the Assistant Report Card displays the following status indicators:

A typical assumption in regression is that the random errors ($\varepsilon$) are normally distributed. The normality assumption is important when conducting hypothesis tests of the estimates of the coefficients ($\beta$). Fortunately, even when the random errors are not normally distributed, the test results are usually reliable when the sample is large enough.
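The two unusual-data rules described earlier in this section (an absolute standardized residual greater than 2, and a leverage value greater than 3p/n) can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `flag_unusual` and the returned dictionary keys are hypothetical, and the residual is standardized as the residual divided by the square root of MSE × (1 − leverage), which is one common estimate of its standard deviation.

```python
import numpy as np


def flag_unusual(x, y, degree=1):
    """Return indices of observations flagged by the two rules.

    degree=1 fits the linear model, degree=2 the quadratic model.
    (Hypothetical helper for illustration.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    leverage = np.diag(H)                          # depends on X only
    resid = y - H @ y
    mse = resid @ resid / (n - p)
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))
    return {
        "large_residual": np.flatnonzero(np.abs(std_resid) > 2).tolist(),
        "high_leverage": np.flatnonzero(leverage > 3 * p / n).tolist(),
    }
```

Because the leverage values come from the hat matrix alone, a point can be flagged for high leverage even when it lies exactly on the fitted line, which is why the two rules are reported separately.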

We wanted to determine how large the sample needs to be to provide reliable results based on the normal distribution. We wanted to determine how closely the actual test results matched the target level of significance (alpha, or Type I error rate) for the test; that is, whether the test incorrectly rejected the null hypothesis more often or less often than expected for different nonnormal distributions.

To estimate the Type I error rate, we performed multiple simulations with skewed, heavy-tailed, and light-tailed distributions that depart substantially from the normal distribution. We conducted simulations for the linear and quadratic models using a sample size of 15. We examined both the overall F-test and the test of the highest order term in the model.

For each condition, we performed 10,000 tests. We generated random data so that for each test, the null hypothesis is true. Then, we performed the tests using a target significance level of 0.05. We counted the number of times out of 10,000 that the tests actually rejected the null hypothesis, and compared this proportion to the target significance level. If the test performs well, the Type I error rates should be very close to the target significance level. See Appendix C for more information on the simulations.

For both the overall F-test and the test of the highest order term in the model, the probability of finding statistically significant results does not differ substantially for any of the nonnormal distributions. The Type I error rates are all between and 0.059, very close to the target significance level of 0.05.

Because the tests perform well with relatively small samples, the Assistant does not test the data for normality. Instead, the Assistant checks the size of the sample and indicates when the sample is less than 15. The Assistant displays the following status indicators in the Report Card for Regression:

You can select the linear or quadratic model before performing the regression analysis or you can choose for the Assistant to select the model. Several methods can be used to select an appropriate model.

We wanted to examine the different methods used to select a model type to determine which approach to use in the Assistant. We examined three methods that are typically used for model selection. The first method identifies the model in which the highest order term is significant. The second method selects the model with the highest $R^2_{adj}$ value. The third method selects the model in which the overall F-test is significant. For more details, see Appendix A.

To determine the approach used in the Assistant, we examined the methods and how their calculations compared to one another. We also gathered feedback from experts in quality analysis.

We decided to use the method that selects the model based on the statistical significance of the highest order term in the model. The Assistant first examines the quadratic model and tests whether the square term in the model ($\beta_2$) is statistically significant. If that term is not significant, then it tests the linear term ($\beta_1$) in the linear model. The model selected through this approach is presented in the Model Selection Report. Additionally, if the user selected a model that is different from the one selected by the Assistant, we report that in the Model Selection Report and the Report Card. For more information, see the Regression method section above.

Based on our findings, the Assistant Report Card displays the following status indicator:

Neter, J., Kutner, M.H., Nachtsheim, C.J., & Wasserman, W. (1996). Applied linear statistical models. Chicago: Irwin.

A regression model relating a predictor X to a response Y is of the form:

$$Y = f(x) + \varepsilon$$

where the function $f(x)$ represents the expected value (mean) of Y given X. In the Assistant, there are two choices for the form of the function $f(x)$:

- $\beta_0 + \beta_1 X$
- $\beta_0 + \beta_1 X + \beta_2 X^2$

The values of the coefficients $\beta$ are unknown and must be estimated from the data. The method of estimation is least squares, which minimizes the sum of squared residuals in the sample:

$$\min \sum_{i=1}^{n} \left( Y_i - \hat{f}(X_i) \right)^2$$

A residual is the difference between the observed response $Y_i$ and the fitted value $\hat{f}(X_i)$ based on the estimated coefficients. The minimized value of this sum of squares is the SSE (error sum of squares) for a given model.

To determine the method used in the Assistant to select the model type, we evaluated three options:

- Significance of the highest order term in the model
- The overall F-test of the model
- Adjusted $R^2$ value ($R^2_{adj}$)

In the first approach (significance of the highest order term), the Assistant starts with the quadratic model. The Assistant tests the hypotheses for the square term in the quadratic model:

$$H_0: \beta_2 = 0$$

$$H_1: \beta_2 \neq 0$$

In this section we consider how n, the number of observations, affects the power of the overall model test and the precision of $R^2_{adj}$, the estimate of the strength of the model.

To quantify the strength of the relationship, we introduce a new quantity, $\rho^2_{adj}$, as the population counterpart of the sample statistic $R^2_{adj}$. Recall that

$$R^2_{adj} = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)}$$

Therefore, we define

$$\rho^2_{adj} = 1 - \frac{E(SSE \mid X)/(n-p)}{E(SSTO \mid X)/(n-1)}$$

The operator $E(\cdot \mid X)$ denotes the expected value, or the mean of a random variable given the value of X. Assuming the correct model is $Y = f(x) + \varepsilon$ with independent identically distributed $\varepsilon$, we have

$$\frac{E(SSE \mid X)}{n-p} = \sigma^2 = \mathrm{Var}(\varepsilon)$$

$$\frac{E(SSTO \mid X)}{n-1} = \frac{\sum_{i=1}^{n} \left( f(x_i) - \bar{f} \right)^2}{n-1} + \sigma^2$$

where $\bar{f} = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$. Hence,

$$\rho^2_{adj} = 1 - \frac{\sigma^2}{\sum_{i=1}^{n} \left( f(x_i) - \bar{f} \right)^2/(n-1) + \sigma^2} = \frac{\sum_{i=1}^{n} \left( f(x_i) - \bar{f} \right)^2/(n-1)}{\sum_{i=1}^{n} \left( f(x_i) - \bar{f} \right)^2/(n-1) + \sigma^2}$$

When testing the statistical significance of the overall model, we assume that the random errors $\varepsilon$ are independent and normally distributed. Then, under the null hypothesis that the mean of Y is constant ($f(x) = \beta_0$), the F-test statistic has an $F(p-1, n-p)$ distribution. Under the alternative hypothesis, the F-statistic has a noncentral $F(p-1, n-p, \theta)$ distribution with noncentrality parameter:

$$\theta = \frac{\sum_{i=1}^{n} \left( f(x_i) - \bar{f} \right)^2}{\sigma^2} = \frac{(n-1)\,\rho^2_{adj}}{1 - \rho^2_{adj}}$$
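The noncentrality formula above translates directly into a power calculation. The sketch below assumes SciPy is available and uses its central and noncentral F distributions; the function name `overall_f_power` and the default significance level of 0.05 are choices made for this illustration, not part of the original analysis.

```python
from scipy import stats


def overall_f_power(rho2_adj, n, p, alpha=0.05):
    """Power of the overall F-test for a given population adjusted R-squared.

    rho2_adj : population strength of the relationship (0 < rho2_adj < 1)
    n        : number of observations
    p        : number of model coefficients (2 for linear, 3 for quadratic)
    """
    theta = (n - 1) * rho2_adj / (1.0 - rho2_adj)  # noncentrality parameter
    dfn, dfd = p - 1, n - p
    f_crit = stats.f.ppf(1.0 - alpha, dfn, dfd)    # rejection threshold under H0
    # Probability that a noncentral F variate exceeds the threshold.
    return float(stats.ncf.sf(f_crit, dfn, dfd, theta))
```

For example, `overall_f_power(0.65, n=15, p=2)` gives the power for the linear model at n = 15, and passing `p=3` gives the corresponding value for the quadratic model.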

The probability of rejecting $H_0$ increases with the noncentrality parameter, which is increasing in both n and $\rho^2_{adj}$. Using the formula above, we calculated the power of the overall F-tests for a range of $\rho^2_{adj}$ values when n = 15 for the linear and quadratic models. See Table 2 for the results.

Table 2. Power for linear and quadratic models with different $\rho^2_{adj}$ values with n = 15

Overall, we found that the test has high power when the relationship between X and Y is strong and the sample size is at least 15. For example, when $\rho^2_{adj}$ = 0.65, Table 2 shows that the probability of finding a statistically significant linear relationship at $\alpha = 0.05$ is The failure to detect such a strong relationship with the F-test would occur in less than 0.5% of samples. Even for a quadratic model, the failure to detect the relationship with the F-test would occur in less than % of samples. Thus, when the test fails to find a statistically significant relationship with 15 or more observations, it is a good indication that the true relationship, if there is one at all, has a $\rho^2_{adj}$ value lower than 0.65. Note that $\rho^2_{adj}$ does not have to be as large as 0.65 to be of practical interest.

We also wanted to examine the power of the overall F-test when the sample size was larger (n = 40). We determined that the sample size n = 40 is an important threshold for the precision of $R^2_{adj}$ (see Strength of the relationship below), and we wanted to evaluate power values for this sample size. We calculated the power of the overall F-tests for a range of $\rho^2_{adj}$ values when n = 40 for the linear and quadratic models. See Table 3 for the results.

Table 3. Power for linear and quadratic models with different $\rho^2_{adj}$ values with n = 40

We found that the power was high, even when the relationship between X and Y was moderately weak. For example, even when $\rho^2_{adj}$ = 0.5, Table 3 shows that the probability of finding a statistically significant linear relationship at $\alpha = 0.05$ is With 40 observations,

The regression models used in the Assistant are all of the form:

$$Y = f(x) + \varepsilon$$

The typical assumption about the random terms $\varepsilon$ is that they are independent and identically distributed normal random variables with mean zero and common variance $\sigma^2$. The least squares estimates of the $\beta$ parameters are still the best linear unbiased estimates, even if we forgo the assumption that the $\varepsilon$ are normally distributed. The normality assumption only becomes important when we try to attach probabilities to these estimates, as we do in the hypothesis tests about $f(x)$.

We wanted to determine how large n needs to be so that we can trust the results of a regression analysis based on the normality assumption. We performed simulations to explore the Type I error rates of the hypothesis tests under a variety of nonnormal error distributions.

Table 4 below shows the proportion of 10,000 simulations in which the overall F-test was significant at $\alpha = 0.05$ for various distributions of $\varepsilon$ for the linear and quadratic models. In these simulations, the null hypothesis, which states that there is no relationship between X and Y, was true. The X values were evenly spaced over an interval. We used a sample size of n = 15 for all tests.

Table 4. Type I error rates for overall F-tests for linear and quadratic models with n = 15 for nonnormal distributions
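The simulation design described in this appendix can be sketched as follows for the overall F-test of the linear model. This is an illustration under stated assumptions rather than a reproduction of the original study: the specific error distributions (exponential for skewed, t with 3 degrees of freedom for heavy-tailed, uniform for light-tailed), the interval [0, 1] for X, the seed, and the names `type1_error_rate`, `skewed`, `heavy`, and `light` are all choices made here. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import stats


def type1_error_rate(draw_errors, n=15, n_sims=10_000, alpha=0.05, seed=0):
    """Share of simulated null data sets in which the overall F-test rejects.

    draw_errors(rng, size) must return an array of random errors.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)             # evenly spaced X values (assumed interval)
    xc = x - x.mean()
    y = draw_errors(rng, (n_sims, n))        # null is true: Y does not depend on X
    yc = y - y.mean(axis=1, keepdims=True)
    ssto = (yc ** 2).sum(axis=1)             # total sum of squares per data set
    ssr = (yc @ xc) ** 2 / (xc @ xc)         # regression sum of squares (1 df)
    sse = ssto - ssr
    f_stat = ssr / (sse / (n - 2))
    f_crit = stats.f.ppf(1.0 - alpha, 1, n - 2)
    return float(np.mean(f_stat > f_crit))


# Illustrative nonnormal error distributions (assumptions, not from the source).
skewed = lambda rng, size: rng.exponential(1.0, size)
heavy = lambda rng, size: rng.standard_t(3, size)
light = lambda rng, size: rng.uniform(-1.0, 1.0, size)
```

Calling `type1_error_rate(skewed)`, `type1_error_rate(heavy)`, and `type1_error_rate(light)` returns the observed rejection proportion for each distribution, which can then be compared with the target level of 0.05.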

Next, we examined the test of the highest order term used to select the best model. For each simulation, we considered whether the square term was significant. For cases where the square term was not significant, we considered whether the linear term was significant. In these simulations, the null hypothesis was true, the target $\alpha$ was 0.05, and n = 15.

Table 5. Type I error rates for tests of highest order term for linear or quadratic models with n = 15 for nonnormal distributions

The simulation results show that for both the overall F-test and the test of the highest order term in the model, the probability of finding statistically significant results does not differ substantially for any of the error distributions. The Type I error rates are all between and

Minitab, Quality. Analysis. Results. and the Minitab logo are registered trademarks of Minitab, Inc., in the United States and other countries. Additional trademarks of Minitab Inc. can be found at All other marks referenced remain the property of their respective owners. © 2014 Minitab Inc. All rights reserved.

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Design of Experiments (DOE)

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Design

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### Variables Control Charts

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. Variables

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Statistical Inference and t-tests

1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value

### Prediction and Confidence Intervals in Regression

Fall Semester, 2001 Statistics 621 Lecture 3 Robert Stine 1 Prediction and Confidence Intervals in Regression Preliminaries Teaching assistants See them in Room 3009 SH-DH. Hours are detailed in the syllabus.

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Multiple Hypothesis Testing: The F-test

Multiple Hypothesis Testing: The F-test Matt Blackwell December 3, 2008 1 A bit of review When moving into the matrix version of linear regression, it is easy to lose sight of the big picture and get lost

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

### Standard Deviation Calculator

CSS.com Chapter 35 Standard Deviation Calculator Introduction The is a tool to calculate the standard deviation from the data, the standard error, the range, percentiles, the COV, confidence limits, or

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Lecture Notes Module 1

Lecture Notes Module 1 Study Populations A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific

### Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

### UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates

UCLA STAT 13 Statistical Methods - Final Exam Review Solutions Chapter 7 Sampling Distributions of Estimates 1. (a) (i) µ µ (ii) σ σ n is exactly Normally distributed. (c) (i) is approximately Normally

### Measuring the Power of a Test

Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### SIMPLE REGRESSION ANALYSIS

SIMPLE REGRESSION ANALYSIS Introduction. Regression analysis is used when two or more variables are thought to be systematically connected by a linear relationship. In simple regression, we have only two

### Standard Deviation Estimator

CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

Using Your TI-NSpire Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

### t-tests and F-tests in regression

t-tests and F-tests in regression Johan A. Elkink University College Dublin 5 April 2012 Johan A. Elkink (UCD) t and F-tests 5 April 2012 1 / 25 Outline 1 Simple linear regression Model Variance and R

### Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

### Statistics 2014 Scoring Guidelines

AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### DEPARTMENT OF ECONOMICS. Unit ECON 12122 Introduction to Econometrics. Notes 4 2. R and F tests

DEPARTMENT OF ECONOMICS Unit ECON 11 Introduction to Econometrics Notes 4 R and F tests These notes provide a summary of the lectures. They are not a complete account of the unit material. You should also

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Power and Sample Size Determination

Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 Power 1 / 31 Experimental Design To this point in the semester,

### Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario

Statistics 112 Regression Cheatsheet Section 1B - Ryan Rosario I have found that the best way to practice regression is by brute force That is, given nothing but a dataset and your mind, compute everything

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### AP Statistics 2012 Scoring Guidelines

AP Statistics 2012 Scoring Guidelines The College Board The College Board is a mission-driven not-for-profit organization that connects students to college success and opportunity. Founded in 1900, the

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### Statistical Significance and Bivariate Tests

Statistical Significance and Bivariate Tests BUS 735: Business Decision Making and Research 1 1.1 Goals Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions,

### Extending Hypothesis Testing. p-values & confidence intervals

Extending Hypothesis Testing p-values & confidence intervals So far: how to state a question in the form of two hypotheses (null and alternative), how to assess the data, how to answer the question by

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Spearman's correlation

Spearman's correlation Introduction Before learning about Spearman's correlation it is important to understand Pearson's correlation which is a statistical measure of the strength of a linear relationship

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### Sampling Distribution of a Normal Variable

Ismor Fischer, 5/9/01 5.-1 5. Formal Statement and Examples Comments: Sampling Distribution of a Normal Variable Given a random variable. Suppose that the population distribution of is known to be normal,

### Notes on Applied Linear Regression

Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

### NEW TABLE AND NUMERICAL APPROXIMATIONS FOR KOLMOGOROV-SMIRNOV/LILLIEFORS/VAN SOEST NORMALITY TEST

NEW TABLE AND NUMERICAL APPROXIMATIONS FOR KOLMOGOROV-SMIRNOV/LILLIEFORS/VAN SOEST NORMALITY TEST PAUL MOLIN AND HERVÉ ABDI Abstract. We give new critical values for the Kolmogorov-Smirnov/Lilliefors/Van

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp. 1041-1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Chapter 14: 1-6, 9, 12; Chapter 15: 8 Solutions When is it appropriate to use the normal approximation to the binomial distribution?

Chapter 14: 1-6, 9, 12; Chapter 15: 8 Solutions 14-1 When is it appropriate to use the normal approximation to the binomial distribution? The usual recommendation is that the approximation is good if np

### Chapter 8: Introduction to Hypothesis Testing

Chapter 8: Introduction to Hypothesis Testing We're now at the point where we can discuss the logic of hypothesis testing. This procedure will underlie the statistical analyses that we'll use for the remainder

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there's no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Student Lecture Notes 13- Chapter 13 Introduction to Linear Regression and Correlation Analysis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaying

### One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### 5. Linear Regression

5. Linear Regression Outline Simple linear regression Linear model

### Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

### STA 4163 Lecture 10: Practice Problems

STA 4163 Lecture 10: Practice Problems Problem.0: A study was conducted to determine whether a student's final grade in STA406 is linearly related to his or her performance on the MATH ability test before

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### Using JMP with a Specific

1 Using JMP with a Specific Example of Regression Ying Liu 10/21/2009 Objectives 2 Exploratory data analysis Simple linear regression Polynomial regression How to fit a multiple regression model How to

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Instrumental Variables & 2SLS

Instrumental Variables & 2SLS y 1 = β 0 + β 1 y 2 + β 2 z 1 +... β k z k + u y 2 = π 0 + π 1 z k+1 + π 2 z 1 +... π k z k + v Economics 20 - Prof. Schuetze 1 Why Use Instrumental Variables? Instrumental

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Using Minitab for Regression Analysis: An extended example

Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### Chapter Additional: Standard Deviation and Chi- Square

Chapter Additional: Standard Deviation and Chi- Square Chapter Outline: 6.4 Confidence Intervals for the Standard Deviation 7.5 Hypothesis testing for Standard Deviation Section 6.4 Objectives Interpret

### Hypothesis Testing I

Hypothesis Testing I The testing process: 1. Assumption about population(s) parameter(s) is made, called null hypothesis, denoted. 2. Then the alternative is chosen (often just a negation of the null hypothesis),

### Nonparametric Statistics

1 14.1 Using the Binomial Table Nonparametric Statistics In this chapter, we will survey several methods of inference from Nonparametric Statistics. These methods will introduce us to several new tables

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### AP STATISTICS 2009 SCORING GUIDELINES (Form B)

AP STATISTICS 2009 SCORING GUIDELINES (Form B) Question 5 Intent of Question The primary goals of this question were to assess students' ability to (1) state the appropriate hypotheses, (2) identify and

### Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA

Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected

### Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

### UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

### Don't forget the degrees of freedom: evaluating uncertainty from small numbers of repeated measurements

Don't forget the degrees of freedom: evaluating uncertainty from small numbers of repeated measurements Blair Hall b.hall@irl.cri.nz Talk given via internet to the 35th ANAMET Meeting, October 20, 2011.

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### UNDERSTANDING THE ONE-WAY ANOVA

UNDERSTANDING The One-way Analysis of Variance (ANOVA) is a procedure for testing the hypothesis that K population means are equal, where K >. The One-way ANOVA compares the means of the samples or groups

### Hypothesis Testing Summary

Hypothesis Testing Summary Hypothesis testing begins with the drawing of a sample and calculating its characteristics (aka, statistics ). A statistical test (a specific form of a hypothesis test) is an

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA We have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables