The Philosophy of Hypothesis Testing, Questions and Answers 2006 Samuel L. Baker

Save this PDF as:

Size: px
Start display at page:

Transcription

1 HYPOTHESIS TESTING PHILOSOPHY 1 The Philosophy of Hypothesis Testing, Questions and Answers 2006 Samuel L. Baker Question: So I'm hypothesis testing. What's the hypothesis I'm testing? Answer: When you're testing a hypothesis on a regression coefficient, the hypothesis you're testing is that the true coefficient is equal to some value that you specify. For example, you often test the hypothesis that the true value of a coefficient is zero. Question: Wait a minute. I just did a regression and found that the slope coefficient, the coefficient of my X variable, is Isn't that the true value? Answer: Probably not, according to the classical philosophy of statistics. A classical statistician would say that the data you used in your regression were just one of many possible data sets that the true relationship might have produced. Even if the true value of the coefficient were , you'd have to be extraordinarily lucky to find a data set that would give you that number when you ran your regression. Question: I'd like to test the hypothesis that the true value of X's coefficient is zero. I find that the number in my regression output for the t-statistic is bigger than the number in the t table in the 0.05 column at the number of degrees of freedom my model has. Does this mean that I'm 95% sure that the true coefficient is not zero? Answer: It depends on what you mean by "95% sure." Question: Does it mean that the probability is 95% that the true coefficient is not zero? Some books t tables label the column you use to do a twotailed test at the 95% confidence level "0.025." Such a table is for a one-tailed t test. The critical value for a one-tailed test at the 97.5% level is the same as the critical value of a two-tailed test at the 95% level. Answer: Here's that fine point of statistical philosophy again. There's nothing probabilistic or random about the true value. For example, suppose that we want to know the average age of students in the Public Health School. That's a specific non-random number. We don't know what it is, though. If we estimate based on a random sample of students, then it's our estimated value that's random, because we get a different number depending on who happens to be in our sample. Question: That's OK for a random sample from a big population, but what if we have the whole population in our data? Suppose my dependent (Y) variable is the number of infant deaths in each county in South Carolina. Suppose my independent (X) variables are social and economic factors. Your theory has me treat my Y variable as if it were random. But, the number of infant deaths last year in Charleston County, for instance, is not a random number. DHEC counts it carefully, and it's based on the whole population of that county, not a sample. Answer: You have a point there. It's strange to think of reality as random while maintaining that something that exists only in theory, the true value, is absolute. But, that's the classical statistical philosophy. Plato would like it. Question: OK, classical statistical philosopher, what do I mean when I say that I'm 95% confident that some coefficient is not zero?

2 HYPOTHESIS TESTING PHILOSOPHY 2 Answer: You mean this: If the true value of the coefficient were zero, the probability is at least 95% that a regression would have given an estimated value for that coefficient that was closer to zero than the estimated value that you got. Question: That's quite a mouthful. Answer: Yes, and it's stated that way because it's the estimated value that's considered random, not the true value. Question: Still, it takes some work to follow the logic. Answer: Suppose you could run lots of regressions on different data sets, each of which was independently generated by the same true relationship in which the true coefficient of X is 0. (This means that truly there is no relationship.) The more regressions you run, the more likely it is that 95% of the t- statistics you get will be closer to 0 than the critical value in the t table. Question: By the way, what's so special about 95%? Why does everybody use it? Answer: It's just the conventional confidence level. It's reasonably conservative for many purposes. There are two types of errors of inference that you can make testing hypotheses. A Type I error (that's what it's actually called) is rejecting a hypothesis that is really true. A typical Type I error is declaring that a coefficient is not zero, but really it is. The 95% confidence level means that the probability of a Type I error is only 5%. Question: Why not choose a 99% confidence level, or a % confidence level, to make the probability of a Type I error very low? Answer: The lower the probability of a Type I error, the higher the probability of a Type II error. A Type II error is refusing to reject a hypothesis that is really false. This happens if the true coefficient is not zero, but you are unwilling to say it's not zero because your estimated coefficient was not far enough from zero to satisfy you. There is an unavoidable tradeoff here. The 95% confidence level is the usual compromise. Sometimes you'll use the 99% confidence level, if you're a lot more worried about a Type I error than a Type II error. This concern is not just academic. Consider the testing of the safety and efficacy of a new drug. Many lives and many dollars can ride on the selection of an appropriate confidence level.

3 HYPOTHESIS TESTING PHILOSOPHY 3 This table summarizes what Type I and Type II errors are. We do not reject the hypothesis. We say We do reject the hypothesis. Hypothesis is really True We are correct. Type I error False Type II error We are correct. The hypothesis can be true or false. We can refuse to reject the hypothesis, or we can reject the hypothesis. The table above shows the four combinations of those possibilities. If what we say matches the truth, then we are correct. Otherwise, we have made either a Type I or a Type II error. Question: Suppose I want to test the hypothesis that a coefficient equals some number other than zero? Answer: Subtract the hypothesized value from the estimated coefficient you got on your regression output. Divide that difference by the estimated standard error of that coefficient. Compare the quotient (ignoring the minus sign if you have one) with the critical value in the t table. If the quotient is bigger, reject the hypothesis. Question: What's a "confidence interval"? Answer: It's a short cut. It lets you avoid having to repeat the above calculation over and over. Once you compute the confidence interval, you know that you'd reject any hypothesized value that's outside the confidence interval, but not reject any hypothesized value inside the interval. Incidentally, you'll find that the 99% confidence interval is wider than the 95% confidence interval. The bigger confidence interval means a smaller chance of a Type I error, but a bigger chance of a Type II error. Question: Is the probability 95% that the true value lies inside the 95% confidence interval? Answer: Again, the answer is no, not according to classical statistics. The 95% confidence interval you calculate is considered random, because it depends on your data, which are presumed to be one of many possible realities. If, across lots of studies of different things, you make a practice of rejecting hypothesized values because they are outside the 95% confidence intervals, then the tendency is that you'll be right 95% of the time. Question: Classical statistical philosophy is awfully convoluted. Answer: Yes. There is an alternative approach, called Bayesian. This does allow you to apply the concept of probability to true values. However, the classical approach is by far the predominant one, so you should try to understand it so you can converse with other statisticians. Question: I can grind through the mechanics of hypothesis testing, but whole thing seems mysterious. What am I really doing?

4 HYPOTHESIS TESTING PHILOSOPHY 4 Answer: If you accept the idea that your data are the result of a random process, then any number ( statistic ) that you calculate from your data is also the result of the random process. If you make certain assumptions ( hypotheses ) about the random process that generated your data, then you can calculate the distribution of your statistic. You can calculate the probability that your statistic will take on any particular value or be inside any particular range of values. If your statistic is far from its expected value, such that being that far from the expected value is very unlikely, then you reject your hypothesis about how the data were generated. Here's how that works with a coin toss experiment. Flip the coin six times. You can then calculate a statistic, the total number of times it comes up heads in six flips. (This is not a complex calculation. You just count the number of heads and that's your statistic.) Now, hypothesize that the probability that the coin will come up heads is 0.5, and that the flips are independent. Based on your hypothesis -- those assumptions -- you can calculate the distribution of your statistic. It is: Possible values of the statistic, the number of heads in six coin tosses Probability that the statistic is one of those values, assuming that the hypothesis that the coin is fair is true 0 1/ / / / / / / All 1 1 Probability expressed as decimal This distribution has a fat middle, in the sense that the most likely values are towards the middle (2, 3, and 4). The least likely values are towards the tails (0 and 6). If our hypothesis is that the coin is fair, we want to reject that if our statistic is too far out towards either tail. That is, we want to be able to reject the hypothesis if we get either too many heads or too few. To do a two-tailed test, we combine the tails, like this:

5 HYPOTHESIS TESTING PHILOSOPHY 5 Possible values of the statistic, the number of heads in six coin tosses Table for a two-tailed test that the coin is fair Probability that the statistic is one of those values, assuming that the hypothesis that the coin is fair is true 0 or 6 2/64 = Probability that the statistic (the number of heads) is this far or further from its expected value (3). 1 or 5 12/64 = = or 4 30/64 = = the expected value 20/64 = All 1 1 Compare the value of your statistic (number of heads in your actual six tosses) with the table. If your statistic is 0 or 6, you have an outcome which is quite unlikely to happen -- less than a 0.05 probability -- if the hypothesis that the coin is fair is true. You therefore reject the hypothesis and say the coin is not fair. If your statistic is 1 or 5, you cannot reject the hypothesis, because your outcome is not unlikely enough. The probability of being that far from the expected value, 3 heads, is the probability of 1 or 5 plus the probability of 0 or 6, which is , for a total of This is greater than 0.05, your cutoff for rejecting an hypothesis. If you say that the coin is not fair, there is a probability that you are mistaken. That is too high a risk of a mistake for most researchers. The t-test calculation is more complicated, but the idea is the same. You hypothesize that the true coefficient is 0 (or whatever number you want) and you assume that the errors have the normal distribution. The hypothesis, plus the very strong assumption about the errors being normal, imply that the statistic you calculate from the data and call t has a known distribution that you can look up in a table. If, according to the table, the t value you got is very unlikely, then you can conclude that hypothesis is wrong. You reject the hypothesis. Question: We made two statements: (1) that the true coefficient is 0 (or whatever I wanted to test) and (2) that the errors have the normal distribution. When we reject a hypothesis, we always reject (1) and not (2). Why is that? If you aren t sure about (2) then you shouldn t use the t statistic for your hypothesis test. There are other tests that can help you determine whether (2) is appropriate.

6 T TESTS 6 More on T Tests You calculate a special number, called the t-statistic. You calculate it based on the assumption that a certain hypothesis is true. The hypothesis is usually that the true value of a certain one of your regression s coefficients is zero. What does it mean if the coefficient of a variable in a regression is zero? It means that you can leave that variable out of your equation without losing any predictive power. Consider a simple regression, of the form Y = " + \$X + error. If \$ is zero, the \$X term disappears, and the reduces to Y = " + error. Y does not depend on X. Knowing what X is does not help predict what Y is. To calculate the t-statistic for testing whether a particular \$ is zero, we divide our estimate of \$ by its standard error. This measures how steep the regression line is, relative to how far the observed Y values are from the regression line and how spread out the X values are. The t-statistic has no units of measurement. This means that the t-statistic comes out the same regardless of how X and Y are measured. For example, if X and Y are heights of people, as in Francis Galton s original study (to be mentioned in class), we get the same number for the t-statistic if we measure heights in inches or centimeters. Better than that, the t-statistic has a known distribution if the hypothesis that you are testing is true. The printed t-table shows what that distribution is. In any data set with two variables, X and Y, the data for the two variables will almost always correlate at least a little bit, even if there is no true underlying relationship. Suppose you have two rooms. In each room, you dump 1000 coins on the floor. Most likely, one of the rooms will have more heads than the other. A t-test can tell you if the difference is large enough to mean that there s really a difference between the rooms, or if the difference is just by luck. When we compare our calculated t-statistic with the critical value in the t-table, we are asking if the slope estimate we got might have happened just by luck. This could happen if the random errors correlate with X by luck. (Correlate means that there is a linear relationship.) When the errors correlate with X, then the Y values correlate with X, too. We can get fooled into thinking X and Y are related. If there is really no true relationship between X and Y, then random errors will probably work out so that the t-statistic will be close to zero. T-statistics far from zero are unlikely, unless the assumption about there being no true relationship between X and Y is wrong. If the data are such that higher X values are associated with higher Y values, then, when you do a regression with the equation Y = " + \$X + error, the t-statistic for the estimate of \$ will be positive. If lower X values are associated with higher Y values, then the t-statistic will be. The more that the X and Y data are correlated, and the steeper the line is that they seem to lie along, the further the t-statistic will be from zero, either positively or negatively. The table of t-statistic critical values has lots of rows. Theoretically, it could have an infinite number of rows, but my table shows only what will fit on one page. The multitude of rows in the table means that

7 T TESTS 7 there is a family of t distributions, one t distribution for each number of degrees of freedom. (The degrees of freedom is the number of observations minus the number of parameters in the regression equation). Each of these t distributions has a bell shape. When the number of degrees of freedom is small, the bell is wide, so the critical value numbers near the top of the t table are higher. When the number of degrees of freedom is large, the bell is narrower, and the critical value numbers in the t table are smaller. If the number of degrees of freedom is very large, the t distribution becomes indistinguishable from the normal distribution. The t distribution with 50 degrees of freedom. Shaded are the two tails that together have 5% of the total area under the curve. Reject the null hypothesis if the t-statistic is in either tail. Here is a t-distribution bell curve for 50 degrees of freedom. Strictly speaking, this curve is the density function for the t-distribution. T values are on the horizontal axis (here labeled ). The height of the bell curve over any t value is proportional to how likely it is that you would get something close to that t value purely by luck, with no true relationship between X and Y. The curve is highest above where the horizontal axis is 0, indicating that t-statistics near 0 are the most likely to happen by chance. As you move away from zero, in either the + or the - direction, the curve gets lower down. If you move far enough, you get to where a t value that far from zero would happen by luck only 5 times out of 100. Those places are the critical values of the t distribution at the 5% level. They are shown on this graph as the inside edges of the shaded areas. Because the t-statistic s bell curve is symmetrical left and right, The two places have the same number, except that one is positive and one is negative. This is the number that you find in a statistics book s t table, or in our t table in the downloadable file of tables. The portions of the t distribution that are to the right of the critical value in the positive direction, and to the left of the negative of the critical value in the negative direction, are called the tails of the distribution. These are shaded in the diagram above. The t distribution has two tails, one positive and one negative. We use two-tailed t tests in this course. We reject the null hypothesis that \$ is 0 if our calculated t statistic is in either the positive tail or the negative tail. This is appropriate whenever you are not sure in advance whether X is positively or negatively related to Y, so you want to allow for both possibilities. To put it another way, in a two-tailed test we reject the idea that the true \$ = 0 if our t statistic is greater than the critical t value or if it is less than the negative of the critical t value. That is the same as saying that we reject the hypothesis that \$ is 0 if the absolute value of our t-statistic is greater than the critical value. The t-test can be used to test for other possible coefficient values besides 0. The general formula is t-statistic = (\$^ - \$ test )/(standard error of \$^), where \$ test stands for the \$ value we want to test.

8 T TESTS 8 The null hypothesis is that the true \$ is \$ test. If this t-statistic is larger (in absolute value) than the critical value from the t table, reject the hypothesis that the true \$ is \$ test.

Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

Independent samples t-test. Dr. Tom Pierce Radford University

Independent samples t-test Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of

Chapter 7 Part 2. Hypothesis testing Power

Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship

How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

Multiple Hypothesis Testing: The F-test

Multiple Hypothesis Testing: The F-test Matt Blackwell December 3, 2008 1 A bit of review When moving into the matrix version of linear regression, it is easy to lose sight of the big picture and get lost

Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

Chapter 6: t test for dependent samples

Chapter 6: t test for dependent samples ****This chapter corresponds to chapter 11 of your book ( t(ea) for Two (Again) ). What it is: The t test for dependent samples is used to determine whether the

Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

Pearson's Correlation Tests

Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

Hypothesis Testing - Relationships

- Relationships Session 3 AHX43 (28) 1 Lecture Outline Correlational Research. The Correlation Coefficient. An example. Considerations. One and Two-tailed Tests. Errors. Power. for Relationships AHX43

Week 4: Standard Error and Confidence Intervals

Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.

Statistical Significance and Bivariate Tests

Statistical Significance and Bivariate Tests BUS 735: Business Decision Making and Research 1 1.1 Goals Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions,

Simple Regression Theory I 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY I 1 Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

Confidence intervals, t tests, P values

Confidence intervals, t tests, P values Joe Felsenstein Department of Genome Sciences and Department of Biology Confidence intervals, t tests, P values p.1/31 Normality Everybody believes in the normal

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript JENNIFER ANN MORROW: Welcome to "Introduction to Hypothesis Testing." My name is Dr. Jennifer Ann Morrow. In today's demonstration,

Example Hypotheses. Chapter 8-2: Basics of Hypothesis Testing. A newspaper headline makes the claim: Most workers get their jobs through networking

Chapter 8-2: Basics of Hypothesis Testing Two main activities in statistical inference are using sample data to: 1. estimate a population parameter forming confidence intervals 2. test a hypothesis or

Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

Simple Linear Regression in SPSS STAT 314

Simple Linear Regression in SPSS STAT 314 1. Ten Corvettes between 1 and 6 years old were randomly selected from last year s sales records in Virginia Beach, Virginia. The following data were obtained,

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

HYPOTHESIS TESTING AND TYPE I AND TYPE II ERROR

HYPOTHESIS TESTING AND TYPE I AND TYPE II ERROR Hypothesis is a conjecture (an inferring) about one or more population parameters. Null Hypothesis (H 0 ) is a statement of no difference or no relationship

Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

Inferential Statistics. Probability. From Samples to Populations. Katie Rommel-Esham Education 504

Inferential Statistics Katie Rommel-Esham Education 504 Probability Probability is the scientific way of stating the degree of confidence we have in predicting something Tossing coins and rolling dice

Descriptive statistics; Correlation and regression

Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

About Hypothesis Testing TABLE OF CONTENTS About Hypothesis Testing... 1 What is a HYPOTHESIS TEST?... 1 Hypothesis Testing... 1 Hypothesis Testing... 1 Steps in Hypothesis Testing... 2 Steps in Hypothesis

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

Statistical Inference

Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

Nonparametric Statistics

1 14.1 Using the Binomial Table Nonparametric Statistics In this chapter, we will survey several methods of inference from Nonparametric Statistics. These methods will introduce us to several new tables

Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

Hypothesis Testing. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006

Hypothesis Testing Lecture 4 Hypothesis Testing Hypothesis testing is about making decisions Is a hypothesis true or false? Are women paid less, on average, than men? Principles of Hypothesis Testing The

11. Logic of Hypothesis Testing

11. Logic of Hypothesis Testing A. Introduction B. Significance Testing C. Type I and Type II Errors D. One- and Two-Tailed Tests E. Interpreting Significant Results F. Interpreting Non-Significant Results

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

6 3 The Standard Normal Distribution

290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

Extending Hypothesis Testing. p-values & confidence intervals

Extending Hypothesis Testing p-values & confidence intervals So far: how to state a question in the form of two hypotheses (null and alternative), how to assess the data, how to answer the question by

Describing Populations Statistically: The Mean, Variance, and Standard Deviation

Describing Populations Statistically: The Mean, Variance, and Standard Deviation BIOLOGICAL VARIATION One aspect of biology that holds true for almost all species is that not every individual is exactly

Comparing Means in Two Populations

Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

Correlational Research

Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

9.1 Basic Principles of Hypothesis Testing

9. Basic Principles of Hypothesis Testing Basic Idea Through an Example: On the very first day of class I gave the example of tossing a coin times, and what you might conclude about the fairness of the

4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

Crosstabulation & Chi Square

Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among

AP Statistics 2002 Scoring Guidelines

AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner)

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) The Exam The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40 multiple choice questions, and the second

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

1. How different is the t distribution from the normal?

Statistics 101 106 Lecture 7 (20 October 98) c David Pollard Page 1 Read M&M 7.1 and 7.2, ignoring starred parts. Reread M&M 3.2. The effects of estimated variances on normal approximations. t-distributions.

Sampling Distribution of the Mean & Hypothesis Testing

Sampling Distribution of the Mean & Hypothesis Testing Let s first review what we know about sampling distributions of the mean (Central Limit Theorem): 1. The mean of the sampling distribution will be

Choosing The More Likely Hypothesis. Richard Startz. revised October 2014. Abstract

Choosing The More Likely Hypothesis Richard Startz revised October 2014 Abstract Much of economists statistical work centers on testing hypotheses in which parameter values are partitioned between a null

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

HypoTesting. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: HypoTesting Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A Type II error is committed if we make: a. a correct decision when the

Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

Contrasts ask specific questions as opposed to the general ANOVA null vs. alternative

Chapter 13 Contrasts and Custom Hypotheses Contrasts ask specific questions as opposed to the general ANOVA null vs. alternative hypotheses. In a one-way ANOVA with a k level factor, the null hypothesis

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

Regression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between

Measuring the Power of a Test

Textbook Reference: Chapter 9.5 Measuring the Power of a Test An economic problem motivates the statement of a null and alternative hypothesis. For a numeric data set, a decision rule can lead to the rejection

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

Statistical Foundations:

Statistical Foundations: Hypothesis Testing Psychology 790 Lecture #9 9/19/2006 Today sclass Hypothesis Testing. General terms and philosophy. Specific Examples Hypothesis Testing Rules of the NHST Game

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty