How to Conduct a Hypothesis Test

The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask: is the event due to chance alone, or is there some cause that we should be looking for? We need a way to differentiate between events that easily occur by chance and those that are highly unlikely to occur randomly. Such a method should be streamlined and well defined so that others can replicate our statistical experiments.

There are a few different methods used to conduct hypothesis tests. One of these is known as the traditional method, and another involves what is known as a p-value. The steps of these two most common methods are identical up to a point, then diverge slightly. Both the traditional method and the p-value method are outlined below.

The Traditional Method

The traditional method is as follows:

1. Begin by stating the claim or hypothesis that is being tested. Also form a statement for the case that the hypothesis is false.
2. Express both of the statements from the first step in mathematical symbols. These statements will use symbols such as inequalities and equals signs.
3. Identify which of the two symbolic statements does not have equality in it. This could simply be a "not equals" sign, but could also be an "is less than" sign (<). The statement containing inequality is called the alternative hypothesis, and is denoted H1 or Ha.
4. The statement from the first step that says a parameter equals a particular value is called the null hypothesis, denoted H0.
5. Choose the significance level that we want. A significance level is typically denoted by the Greek letter alpha. Here we should consider Type I errors. A Type I error occurs when we reject a null hypothesis that is actually true. If we are very concerned about this possibility, then our value for alpha should be small. There is a trade-off here: the smaller the alpha, the more costly the experiment. The values 0.05 and 0.01 are common values used for alpha, but any positive number between 0 and 0.50 could be used for a significance level.
6. Determine which statistic and distribution we should use. The type of distribution is dictated by features of the data. Common choices include the z score, the t score and chi-squared.
7. Find the test statistic and the critical value for this statistic. Here we will have to consider whether we are conducting a two-tailed test (typically when the alternative hypothesis contains an "is not equal to" symbol) or a one-tailed test (typically when an inequality is involved in the statement of the alternative hypothesis).
8. From the type of distribution, confidence level, critical value and test statistic we sketch a graph.
9. If the test statistic is in our critical region, then we must reject the null hypothesis. The alternative hypothesis stands. If the test statistic is not in our critical region, then we fail to reject the null hypothesis. This does not prove that the null hypothesis is true. It means only that the sample does not provide enough evidence against it.
10. We now state the results of the hypothesis test in such a way that the original claim is addressed.
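Steps 7 and 9 of the list above can be sketched in code. The following is a minimal illustration, assuming the test statistic is a z score so that critical values come from the standard normal distribution. The function name `critical_region_decision`, the `tail` labels and the example statistic 2.1 are hypothetical choices made for this sketch, not part of the method as stated.

```python
from statistics import NormalDist


def critical_region_decision(z_stat, alpha=0.05, tail="two"):
    """Return (critical_value, reject) for a z test using the critical-region rule.

    tail is "two", "right" or "left" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # Split alpha evenly between the two tails.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_stat) >= critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z_stat >= critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z_stat <= critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return critical, reject


# Hypothetical test statistic, evaluated at two common significance levels.
crit_05, reject_05 = critical_region_decision(2.1, alpha=0.05, tail="two")
crit_01, reject_01 = critical_region_decision(2.1, alpha=0.01, tail="two")
print(f"alpha=0.05: critical value {crit_05:.3f}, reject H0: {reject_05}")
print(f"alpha=0.01: critical value {crit_01:.3f}, reject H0: {reject_01}")
```

With the same statistic, the smaller alpha pushes the critical value further into the tail, so the null hypothesis is harder to reject.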

The p-value Method

The p-value method is nearly identical to the traditional method. The first six steps are the same. For step seven we find the test statistic and p-value. We then reject the null hypothesis if the p-value is less than or equal to alpha. We fail to reject the null hypothesis if the p-value is greater than alpha. We then wrap up the test as before, by clearly stating the results.

An Example of a Hypothesis Test

Mathematics and statistics are not for spectators. To truly understand what is going on, we should read through and work through several examples. If we know the ideas behind hypothesis testing and have seen an overview of the method, then the next step is to see an example. The following shows an example of both the traditional method of a hypothesis test and the p-value method.

A Statement of the Problem

Suppose that a doctor claims that 17 year olds have an average body temperature that is higher than the commonly accepted average human temperature of 98.6 degrees Fahrenheit. A simple random statistical sample of 25 people, each of age 17, is selected. The average temperature of the 17 year olds is found to be 98.9 degrees, with a standard deviation of 0.6 degrees.

The Null and Alternative Hypotheses

The claim being investigated is that the average body temperature of 17 year olds is greater than 98.6 degrees. This corresponds to the statement x > 98.6, where x denotes the population average. The negation of this is that the population average is not greater than 98.6 degrees. In other words, the average temperature is less than or equal to 98.6 degrees. In symbols this is x ≤ 98.6.

One of these statements must become the null hypothesis, and the other should be the alternative hypothesis. The null hypothesis contains equality. So for the above, the null hypothesis is H0 : x = 98.6. It is common practice to state the null hypothesis only in terms of an equals sign, and not a "greater than or equal to" or "less than or equal to". The statement that does not contain equality is the alternative hypothesis, or H1 : x > 98.6.
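As a hedged sketch of how the remaining steps might be carried out for this problem, the code below treats it as a one-sample, right-tailed t test with 24 degrees of freedom, since the population standard deviation is unknown. The significance level of 0.05 is an assumption made for illustration, as no alpha is specified in the problem. The critical value 1.711 is the standard t-table entry for 24 degrees of freedom with 0.05 in the right tail. The p-value is approximated by integrating the t density with Simpson's rule, and the helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def right_tail_p_value(t_stat, df, steps=2000):
    """Approximate P(T >= t_stat) for t_stat >= 0 using Simpson's rule on [0, t_stat]."""
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric about 0, so half of the probability lies above 0.
    return 0.5 - area_0_to_t


# Values from the problem statement.
n = 25
sample_mean = 98.9
claimed_mean = 98.6
sample_sd = 0.6

alpha = 0.05            # assumed for this sketch; not given in the problem
critical_value = 1.711  # t-table value: 24 degrees of freedom, 0.05 in the right tail

df = n - 1
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

# Traditional method: compare the test statistic with the critical value.
reject_traditional = t_stat > critical_value

# p-value method: compare the p-value with alpha.
p_value = right_tail_p_value(t_stat, df)
reject_p_value = p_value <= alpha

print(f"t = {t_stat:.3f}, critical value = {critical_value}")
print(f"p-value = {p_value:.4f}, alpha = {alpha}")
print(f"reject H0 (traditional): {reject_traditional}, reject H0 (p-value): {reject_p_value}")
```

Under these assumptions both methods lead to the same decision, which is expected because they are two views of the same comparison.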

What is the Difference Between Alpha and P-Values

In conducting a test of significance or hypothesis test there are two numbers that are easy to confuse. One number is called the p-value of the test statistic. The other number of interest is the level of significance, or alpha. These numbers are easily confused because they are both numbers between zero and one, and both are in fact probabilities.

Alpha: The Level of Significance

The number alpha is the threshold value that we measure p-values against. It tells us how extreme observed results must be in order to reject the null hypothesis of a significance test. The value of alpha is associated with the confidence level of our test. The following lists some levels of confidence with their related values of alpha:

For results with a 90% level of confidence, the value of alpha is 1 - 0.90 = 0.10.
For results with a 95% level of confidence, the value of alpha is 1 - 0.95 = 0.05.
For results with a 99% level of confidence, the value of alpha is 1 - 0.99 = 0.01.
In general, for results with a C% level of confidence, the value of alpha is 1 - C/100.

Although in theory and practice many numbers can be used for alpha, the most commonly used is 0.05. The reason for this is both that consensus shows this level is appropriate in many cases and that it has historically been accepted as the standard.
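The relationship between a confidence level and alpha is simple arithmetic, and a small sketch makes it concrete. The function name `alpha_from_confidence` is a hypothetical label for this illustration.

```python
def alpha_from_confidence(confidence_percent):
    """Return alpha = 1 - C/100 for a confidence level of C percent."""
    if not 0 < confidence_percent < 100:
        raise ValueError("confidence level must be strictly between 0 and 100")
    return 1 - confidence_percent / 100


for c in (90, 95, 99):
    # Format to two decimals to hide floating-point rounding noise.
    print(f"{c}% confidence -> alpha = {alpha_from_confidence(c):.2f}")
```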

The alpha value gives us the probability of a Type I error. Type I errors occur when we reject a null hypothesis that is actually true. Thus, in the long run, for a test with a level of significance of 0.05 = 1/20, a true null hypothesis will be rejected about one out of every 20 times.

P-Values (more on p-values below)

The other number that is part of a test of significance is a p-value. A p-value is also a probability, but it comes from a different source than alpha. Every test statistic has a corresponding probability or p-value. This value is the probability, assuming the null hypothesis is true, of obtaining a statistic at least as extreme as the one observed. Since there are a number of different test statistics, there are a number of different ways to find a p-value. For some cases we need to know the probability distribution of the population.

The p-value of the test statistic is a way of saying how extreme that statistic is for our sample data. The smaller the p-value, the more unlikely the observed sample would be if the null hypothesis were true.

Statistical Significance

To determine if an observed outcome is statistically significant, we compare the values of alpha and the p-value. There are two possibilities that emerge:

The p-value is less than or equal to alpha. In this case we reject the null hypothesis. When this happens we say that the result is statistically significant. In other words, we are reasonably sure that there is something besides chance alone that gave us the observed sample.

The p-value is greater than alpha. In this case we fail to reject the null hypothesis. When this happens we say that the result is not statistically significant. In other words, our observed data are consistent with what chance alone could produce.

The implication of the above is that the smaller the value of alpha is, the more difficult it is to claim that a result is statistically significant. On the other hand, the larger the value of alpha is, the easier it is to claim that a result is statistically significant.
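The long-run reading of alpha as the Type I error rate can be checked with a small simulation. This is a sketch under stated assumptions: samples are drawn from a normal population whose mean really is 0 and whose standard deviation is known to be 1, so the null hypothesis is true by construction and a two-tailed z test applies. The function name, sample size, trial count and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha, trials=10000, n=20, seed=0):
    """Fraction of simulated tests that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean = 0, known sd = 1) holds for every sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_stat = (fmean(sample) - 0.0) / (1.0 / math.sqrt(n))
        if abs(z_stat) >= critical:
            rejections += 1
    return rejections / trials


rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
print(f"alpha = 0.05: rejected a true null in {rate_05:.3f} of trials")
print(f"alpha = 0.01: rejected a true null in {rate_01:.3f} of trials")
```

The observed rejection rates should land close to the chosen alpha, and the smaller alpha yields fewer false rejections, which is the sense in which a small alpha makes significance harder to claim.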
A larger value of alpha, however, also carries a higher probability that a result declared significant can in fact be attributed to chance.

What Level of Alpha Determines Statistical Significance

Not all results of hypothesis tests are equal. A hypothesis test or test of statistical significance typically has a level of significance attached to it. This level of significance is a number that is typically denoted with the Greek letter alpha. One question that comes up in statistics class is, "What value of alpha should be used for our hypothesis tests?" The answer to this question, as with many other questions in statistics, is "It depends on the situation." We will explore what we mean by this.

Many journals throughout different disciplines define statistically significant results as those for which alpha is equal to 0.05, or 5%. But the main point to note is that there is not a universal value of alpha that should be used for all statistical tests.

Commonly Used Levels of Significance

The number represented by alpha is a probability, so it can take the value of any nonnegative real number less than one. Although in theory any number between 0 and 1 can be used for alpha, when it comes to statistical practice this is not the case. Of all levels of significance, the values 0.10, 0.05 and 0.01 are the ones most commonly used for alpha. As we will see, there could be reasons for using values of alpha other than the most commonly used numbers.

Level of Significance and Type I Errors

One consideration against a one-size-fits-all value for alpha has to do with what this number is the probability of. The level of significance of a hypothesis test is exactly equal to the probability of a Type I error. A Type I error consists of incorrectly rejecting the null hypothesis when the null hypothesis is actually true. The smaller the value of alpha, the less likely it is that we reject a true null hypothesis.

There are different instances where it is more acceptable to have a Type I error. A larger value of alpha, even one greater than 0.10, may be appropriate when a smaller value of alpha results in a less desirable outcome.

In medical screening for a disease, compare a test that falsely tests positive for a disease with one that falsely tests negative. A false positive will result in anxiety for our patient, but will lead to other tests that will determine that the verdict of our test was indeed incorrect. A false negative will give our patient the incorrect assumption that he does not have a disease when he in fact does. The result is that the disease will not be treated. Given the choice, we would rather have conditions that result in a false positive than a false negative. In this situation we would gladly accept a greater value for alpha if it resulted in a lower likelihood of a false negative.

Level of Significance and P-Values

A level of significance is a value that we set to determine statistical significance. This ends up being the standard by which we measure the calculated p-value of our test statistic. To say that a result is statistically significant at the level alpha just means that the p-value is less than or equal to alpha. For instance, for a value of alpha = 0.05, if the p-value is greater than 0.05, then we fail to reject the null hypothesis.

There are some instances in which we would need a very small p-value to reject a null hypothesis. If our null hypothesis concerns something that is widely accepted as true, then there must be a high degree of evidence in favor of rejecting the null hypothesis. This is provided by a p-value that is much smaller than the commonly used values for alpha.

Conclusion

There is not one value of alpha that determines statistical significance. Although numbers such as 0.10, 0.05 and 0.01 are values commonly used for alpha, there is no overriding mathematical theorem that says these are the only levels of significance that we can use. As with many things in statistics, we must think before we calculate and above all use common sense.

What is a P-Value?

Hypothesis tests or tests of significance involve the calculation of a number known as a p-value. This number is very important to the conclusion of our test. P-values are related to the test statistic and give us a measurement of evidence against the null hypothesis.

Null and Alternative Hypotheses

Tests of statistical significance all begin with a null and an alternative hypothesis. The null hypothesis is the statement of no effect or a statement of a commonly accepted state of affairs. The alternative hypothesis is what we are attempting to prove. The working assumption in a hypothesis test is that the null hypothesis is true.

Test Statistic

We will assume that the conditions are met for the particular test that we are working with. A simple random sample gives us sample data. From this data we can calculate a test statistic. Test statistics vary greatly depending upon what parameters our hypothesis test concerns. Some common test statistics include:

z-statistic for hypothesis tests concerning the population mean, when we know the population standard deviation.
t-statistic for hypothesis tests concerning the population mean, when we do not know the population standard deviation.
t-statistic for hypothesis tests concerning the difference of two independent population means, when we do not know the standard deviation of either of the two populations.
z-statistic for hypothesis tests concerning a population proportion.
Chi-square statistic for hypothesis tests concerning the difference between an expected and actual count for categorical data.

Calculation of P-Values

Test statistics are helpful, but it can be more helpful to assign a p-value to these statistics. A p-value is the probability that, if the null hypothesis were true, we would observe a statistic at least as extreme as the one observed. To calculate a p-value we use the appropriate software or statistical table that corresponds with our test statistic.
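The definition above can be sketched for one common case. The code assumes the test statistic is a z statistic, so the reference distribution is the standard normal. The function name `z_p_value`, the `tail` labels and the example statistics 2.5 and 0.5 are hypothetical choices for this illustration.

```python
from statistics import NormalDist


def z_p_value(z_stat, tail="two"):
    """p-value of a z statistic: probability of a result at least as extreme under H0."""
    std_normal = NormalDist()
    if tail == "two":
        # Extreme in either direction: double the area beyond |z|.
        return 2 * (1 - std_normal.cdf(abs(z_stat)))
    if tail == "right":
        return 1 - std_normal.cdf(z_stat)
    if tail == "left":
        return std_normal.cdf(z_stat)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for z in (2.5, 0.5):
    print(f"z = {z}: two-tailed p-value = {z_p_value(z):.4f}")
```

A statistic far from zero yields a small p-value, while one near zero yields a large p-value.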

When the test statistic is a z statistic, for example, we would use a standard normal distribution. Values of z with large absolute values (such as those over 2.5) are not very common and would give a small p-value. Values of z that are closer to zero are more common, and would give much larger p-values.

Interpretation of the P-Value

As we have noted, a p-value is a probability. This means that it is a real number from 0 to 1. While a test statistic is one way to measure how extreme a statistic is for a particular sample, p-values are another way of measuring this.

When we obtain a given statistical sample, the question that we should always ask is, "Is this sample the way it is by chance alone with a true null hypothesis, or is the null hypothesis false?" If our p-value is small, then this could mean one of two things:

The null hypothesis is true, but we were just very lucky in obtaining our observed sample.
Our sample is the way it is due to the fact that the null hypothesis is false.

In general, the smaller the p-value, the more evidence that we have against our null hypothesis.

How Small Is Small Enough?

How small a p-value do we need in order to reject the null hypothesis? The answer to this is, "It depends." A common rule of thumb is that the p-value must be less than or equal to 0.05, but there is nothing universal about this value.

Typically, before we conduct a hypothesis test, we choose a threshold value. If we have any p-value that is less than or equal to this threshold, then we reject the null hypothesis. Otherwise we fail to reject the null hypothesis. This threshold is called the level of significance of our hypothesis test, and is denoted by the Greek letter alpha. There is no value of alpha that always defines statistical significance.

How to Construct a Confidence Interval for the Population Variance

One of the goals of inferential statistics is to estimate an unknown population parameter from a statistical sample. The estimate that we obtain is an interval of potential values, and is called a confidence interval. Attached to the interval is a level of confidence, indicating the reliability of our estimate. One parameter that we may want to estimate is the variance. The variance is a measurement of variability, or in other words, how spread out a data set is. We will see the steps and the theory behind the construction of a confidence interval for a population variance.

Assumptions

It is always a good idea to clearly state what assumptions we need to make in order to move forward. We assume that we are working with a simple random sample of size n from a normal distribution. Or we assume that our sample size is large enough that we can invoke the central limit theorem.

Chi-Square Random Variable

A variance is always nonnegative, and it is positive whenever there is any variability at all in a random variable. Due to this fact, the sample variance is not distributed normally. Using theory from mathematical statistics, given our assumptions the following is a chi-square random variable with n - 1 degrees of freedom:

$$\frac{(n - 1)s^2}{\sigma^2}$$

Here $s^2$ is the sample variance and $\sigma^2$ is the population variance.

Confidence Interval

For a two-sided 1 - α confidence interval, we locate the row of a chi-square table that corresponds with our number of degrees of freedom. Next we read two numbers from this row. The first, denoted by A, is the table value with probability α/2 to the left. The second, denoted by B, is the table value with probability α/2 to the right. This means that a proportion 1 - α of our chi-square distribution lies between these two numbers. This gives us:

$$A < \frac{(n - 1)s^2}{\sigma^2} < B$$

Since we want an interval for $\sigma^2$, we divide through by $(n - 1)s^2$:

$$\frac{A}{(n - 1)s^2} < \frac{1}{\sigma^2} < \frac{B}{(n - 1)s^2}$$

Taking reciprocals reverses the inequalities and gives us the following confidence interval:

$$\frac{(n - 1)s^2}{B} < \sigma^2 < \frac{(n - 1)s^2}{A}$$

Note on Symmetry

Many other confidence intervals are of the form estimate +/- margin of error. These confidence intervals, such as those for a population mean, are symmetric about the estimate that is used. Confidence intervals for the variance do not have this property. Variances are always nonnegative, and so are the values of a chi-square distribution. Furthermore, a chi-square distribution is not symmetric.
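The construction above can be sketched in code as follows. Because the Python standard library has no chi-square table, this sketch computes the table values A and B itself: the chi-square distribution function is evaluated with the series for the regularized lower incomplete gamma function, and quantiles are found by bisection. All function names are hypothetical, and the sample values (n = 25, sample variance 0.36) are made up for illustration.

```python
import math


def chi2_cdf(x, df):
    """Chi-square distribution function via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, h = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    # Sum h^k / (a (a+1) ... (a+k)) until further terms no longer matter.
    while term > total * 1e-16 and k < 10000:
        term *= h / (a + k)
        total += term
        k += 1
    return total * math.exp(-h + a * math.log(h) - math.lgamma(a))


def chi2_quantile(p, df):
    """Value q with chi2_cdf(q, df) = p, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains q
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def variance_confidence_interval(sample_variance, n, alpha=0.05):
    """Two-sided 1 - alpha interval: ((n-1)s^2 / B, (n-1)s^2 / A)."""
    df = n - 1
    a_value = chi2_quantile(alpha / 2, df)      # A: probability alpha/2 to the left
    b_value = chi2_quantile(1 - alpha / 2, df)  # B: probability alpha/2 to the right
    scaled = df * sample_variance
    return scaled / b_value, scaled / a_value


# Hypothetical sample: n = 25 observations with sample variance 0.36.
lower, upper = variance_confidence_interval(0.36, 25, alpha=0.05)
print(f"95% confidence interval for the variance: ({lower:.4f}, {upper:.4f})")
```

The resulting interval is not centered on the sample variance, which reflects the asymmetry of the chi-square distribution.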


More information

3. Mathematical Induction

3. Mathematical Induction 3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for One Standard Deviation Using Standard Deviation Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

More information

Module 2 Probability and Statistics

Module 2 Probability and Statistics Module 2 Probability and Statistics BASIC CONCEPTS Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The standard deviation of a standard normal distribution

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Confidence intervals

Confidence intervals Confidence intervals Today, we re going to start talking about confidence intervals. We use confidence intervals as a tool in inferential statistics. What this means is that given some sample statistics,

More information

STAT 350 Practice Final Exam Solution (Spring 2015)

STAT 350 Practice Final Exam Solution (Spring 2015) PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects

More information

Business Statistics, 9e (Groebner/Shannon/Fry) Chapter 9 Introduction to Hypothesis Testing

Business Statistics, 9e (Groebner/Shannon/Fry) Chapter 9 Introduction to Hypothesis Testing Business Statistics, 9e (Groebner/Shannon/Fry) Chapter 9 Introduction to Hypothesis Testing 1) Hypothesis testing and confidence interval estimation are essentially two totally different statistical procedures

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

1 The Brownian bridge construction

1 The Brownian bridge construction The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

More information

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular

More information

ELEMENTARY STATISTICS

ELEMENTARY STATISTICS ELEMENTARY STATISTICS Study Guide Dr. Shinemin Lin Table of Contents 1. Introduction to Statistics. Descriptive Statistics 3. Probabilities and Standard Normal Distribution 4. Estimates and Sample Sizes

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

MATH 140 Lab 4: Probability and the Standard Normal Distribution

MATH 140 Lab 4: Probability and the Standard Normal Distribution MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Likelihood: Frequentist vs Bayesian Reasoning

Likelihood: Frequentist vs Bayesian Reasoning "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

6: Introduction to Hypothesis Testing

6: Introduction to Hypothesis Testing 6: Introduction to Hypothesis Testing Significance testing is used to help make a judgment about a claim by addressing the question, Can the observed difference be attributed to chance? We break up significance

More information

1 Error in Euler s Method

1 Error in Euler s Method 1 Error in Euler s Method Experience with Euler s 1 method raises some interesting questions about numerical approximations for the solutions of differential equations. 1. What determines the amount of

More information

Power Analysis for Correlation & Multiple Regression

Power Analysis for Correlation & Multiple Regression Power Analysis for Correlation & Multiple Regression Sample Size & multiple regression Subject-to-variable ratios Stability of correlation values Useful types of power analyses Simple correlations Full

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Mathematical Induction

Mathematical Induction Mathematical Induction In logic, we often want to prove that every member of an infinite set has some feature. E.g., we would like to show: N 1 : is a number 1 : has the feature Φ ( x)(n 1 x! 1 x) How

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Domain of a Composition

Domain of a Composition Domain of a Composition Definition Given the function f and g, the composition of f with g is a function defined as (f g)() f(g()). The domain of f g is the set of all real numbers in the domain of g such

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

Comparing the Means of Two Populations: Independent Samples

Comparing the Means of Two Populations: Independent Samples CHAPTER 14 Comparing the Means of Two Populations: Independent Samples 14.1 From One Mu to Two Do children in phonics-based reading programs become better readers than children in whole language programs?

More information

NPV Versus IRR. W.L. Silber -1000 0 0 +300 +600 +900. We know that if the cost of capital is 18 percent we reject the project because the NPV

NPV Versus IRR. W.L. Silber -1000 0 0 +300 +600 +900. We know that if the cost of capital is 18 percent we reject the project because the NPV NPV Versus IRR W.L. Silber I. Our favorite project A has the following cash flows: -1 + +6 +9 1 2 We know that if the cost of capital is 18 percent we reject the project because the net present value is

More information

Notes on the Negative Binomial Distribution

Notes on the Negative Binomial Distribution Notes on the Negative Binomial Distribution John D. Cook October 28, 2009 Abstract These notes give several properties of the negative binomial distribution. 1. Parameterizations 2. The connection between

More information

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic

A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic A Study to Predict No Show Probability for a Scheduled Appointment at Free Health Clinic Report prepared for Brandon Slama Department of Health Management and Informatics University of Missouri, Columbia

More information

In the past, the increase in the price of gasoline could be attributed to major national or global

In the past, the increase in the price of gasoline could be attributed to major national or global Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

More information

Testing Hypotheses About Proportions

Testing Hypotheses About Proportions Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

More information

Chi Square Tests. Chapter 10. 10.1 Introduction

Chi Square Tests. Chapter 10. 10.1 Introduction Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Mind on Statistics. Chapter 4

Mind on Statistics. Chapter 4 Mind on Statistics Chapter 4 Sections 4.1 Questions 1 to 4: The table below shows the counts by gender and highest degree attained for 498 respondents in the General Social Survey. Highest Degree Gender

More information

Probability Distributions

Probability Distributions CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution

More information

Confidence Intervals for Cpk

Confidence Intervals for Cpk Chapter 297 Confidence Intervals for Cpk Introduction This routine calculates the sample size needed to obtain a specified width of a Cpk confidence interval at a stated confidence level. Cpk is a process

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

AP CALCULUS AB 2007 SCORING GUIDELINES (Form B)

AP CALCULUS AB 2007 SCORING GUIDELINES (Form B) AP CALCULUS AB 2007 SCORING GUIDELINES (Form B) Question 4 Let f be a function defined on the closed interval 5 x 5 with f ( 1) = 3. The graph of f, the derivative of f, consists of two semicircles and

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

AP Statistics 2010 Scoring Guidelines

AP Statistics 2010 Scoring Guidelines AP Statistics 2010 Scoring Guidelines The College Board The College Board is a not-for-profit membership association whose mission is to connect students to college success and opportunity. Founded in

More information

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.

Chapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data. Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Binary Diagnostic Tests Two Independent Samples

Binary Diagnostic Tests Two Independent Samples Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 2 Proofs Intuitively, the concept of proof should already be familiar We all like to assert things, and few of us

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory

Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory LA-UR-12-24572 Approved for public release; distribution is unlimited Statistical Impact of Slip Simulator Training at Los Alamos National Laboratory Alicia Garcia-Lopez Steven R. Booth September 2012

More information

AP STATISTICS (Warm-Up Exercises)

AP STATISTICS (Warm-Up Exercises) AP STATISTICS (Warm-Up Exercises) 1. Describe the distribution of ages in a city: 2. Graph a box plot on your calculator for the following test scores: {90, 80, 96, 54, 80, 95, 100, 75, 87, 62, 65, 85,

More information

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit? ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

More information

Lesson 20. Probability and Cumulative Distribution Functions

Lesson 20. Probability and Cumulative Distribution Functions Lesson 20 Probability and Cumulative Distribution Functions Recall If p(x) is a density function for some characteristic of a population, then Recall If p(x) is a density function for some characteristic

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information