Using Stata for One Sample Tests



Similar documents
In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%

Syntax Menu Description Options Remarks and examples Stored results Methods and formulas References Also see. level(#) , options2

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

Hypothesis testing - Steps

Normal distribution. ) 2 /2σ. 2π σ

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Introduction to Hypothesis Testing

Name: Date: Use the following to answer questions 3-4:

Estimation of σ 2, the variance of ɛ

Using Stata for Categorical Data Analysis

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

How To Test For Significance On A Data Set

Standard Deviation Estimator

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Confidence Intervals for One Standard Deviation Using Standard Deviation

Confidence Intervals for Cp

Two Related Samples t Test

Confidence Intervals

HYPOTHESIS TESTING WITH SPSS:

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Chapter 2 Probability Topics SPSS T tests

Independent t- Test (Comparing Two Means)

MULTIPLE REGRESSION EXAMPLE

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

3.4 Statistical inference for 2 populations based on two samples

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for Cpk

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp , ,

Tests for Two Proportions

NCSS Statistical Software. One-Sample T-Test

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Chapter 2. Hypothesis testing in one population

CALCULATIONS & STATISTICS

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Testing a claim about a population mean

One-Way Analysis of Variance

Nonparametric Two-Sample Tests. Nonparametric Tests. Sign Test

Comparing Means in Two Populations

Pearson's Correlation Tests

Non-Inferiority Tests for Two Proportions

Unit 26 Estimation with Confidence Intervals

22. HYPOTHESIS TESTING

Sample Size Calculation for Longitudinal Studies

Stat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct

2 Precision-based sample size calculations

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Practice problems for Homework 12 - confidence intervals and hypothesis testing. Open the Homework Assignment 12 and solve the problems.

Hypothesis Test for Mean Using Given Data (Standard Deviation Known-z-test)

NCSS Statistical Software

Two-sample hypothesis testing, II /16/2004

Paired T-Test. Chapter 208. Introduction. Technical Details. Research Questions

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

4. Continuous Random Variables, the Pareto and Normal Distributions

Chapter 7 Section 1 Homework Set A

Chapter 7 Notes - Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:

The normal approximation to the binomial

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Tests for One Proportion

Two-sample t-tests. - Independent samples - Pooled standard devation - The equal variance assumption

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Confidence Intervals for Exponential Reliability

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

Analysis of Variance ANOVA

Multicollinearity Richard Williams, University of Notre Dame, Last revised January 13, 2015

Hypothesis Testing. Steps for a hypothesis test:

Basic Statistical and Modeling Procedures Using SAS

Module 2 Probability and Statistics

Guide to Microsoft Excel for calculations, statistics, and plotting data

Chicago Booth BUSINESS STATISTICS Final Exam Fall 2011

16. THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

The normal approximation to the binomial

Introduction. Statistics Toolbox

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Two Correlated Proportions (McNemar Test)

Lecture Notes Module 1

Chapter 7 Section 7.1: Inference for the Mean of a Population

Hypothesis Testing --- One Mean

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

Confidence intervals

Business Statistics, 9e (Groebner/Shannon/Fry) Chapter 9 Introduction to Hypothesis Testing

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

5/31/2013. Chapter 8 Hypothesis Testing. Hypothesis Testing. Hypothesis Testing. Outline. Objectives. Objectives

C. The null hypothesis is not rejected when the alternative hypothesis is true. A. population parameters.

Statistiek I. Proportions aka Sign Tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen.

Binary Diagnostic Tests Two Independent Samples

Need for Sampling. Very large populations Destructive testing Continuous production process

Principles of Hypothesis Testing for Public Health

Normal distributions in SPSS

ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS

2 Sample t-test (unequal sample sizes and unequal variances)

Transcription:

Using Stata for One Sample Tests All of the one sample problems we have discussed so far can be solved in Stata via either (a) statistical calculator functions, where you provide Stata with the necessary summary statistics for means, standard deviations, and sample sizes; these commands end with an i, where the i stands for immediate (but other commands also sometimes end with an i) (b) modules that directly analyze raw data; or (c) both. Some of these solutions require, or may be easier to solve, if you first add the Stataquest menus and commands; see http://www.stata.com/support/faqs/res/quest7.html The commands shown below can all be generated via Stata s pulldown menus if you prefer to use them. A. Single Sample Tests Case I: Sampling distribution of X, Normal parent population (i.e. X is normally distributed), σ is known. Problem. A manufacture of steel rods considers that the manufacturing process is working properly if the mean length of the rods is 8.6. The standard deviation of these rods always runs about 0.3 inches. Suppose a random sample of size n = 36 yields an average length of 8.7 inches. Should the manufacturer conclude the process is working properly or improperly? Use the.05 level of significance. Solution. I don t know of any Stata routine that will do this by directly analyzing raw data. However, the ztesti command (which is installed with Stataquest) will do this when you have the summary statistics. Enter ztesti 36 8.7.3 8.6, level(95) where the 5 parameters are the sample size N, the sample mean, the known population standard deviation, the hypothesized mean (under H 0 ), and the desired CI level (e.g. 95 for 95% confidence interval, 99 for 99% c.i.) The Stata results (which match up perfectly with our earlier analysis) are. ztesti 36 8.7.3 8.6, level(95) Number of obs = 36 Variable Mean Std. Err. z P> z [95% Conf. Interval] x 8.7.05 174 0.0000 8.602002 8.797998 Ho: mean(x) = 8.6 Ha: mean < 8.6 Ha: mean ~= 8.6 Ha: mean > 8.6 z = 2.0000 z = 2.0000 z = 2.0000 P < z = 0.9772 P > z = 0.0455 P > z = 0.0228 Using Stata for one sample tests Page 1

As is typical with many Stata commands, the output gives you the probability levels for both possible 1-tailed alternatives as well as for the 2-tailed alternative. In this case, the 2-tailed probability is.0455, which is less than.05, so we reject the null. Also, the 95% confidence interval does not include the hypothesized value of 8.6, so reject the null. B. Case II: Sampling distribution for the binomial parameter p. Problem. The mayor contends that 25% of the city s employees are black. Various left-wing and right-wing critics have claimed that the mayor is either exaggerating or understating the number of black employees. A random sample of 120 employees contains 18 blacks. Test the mayor s claim at the.01 level of significance. Exact Solution. We ve shown how to get approximate solutions using the normal approximation to the binomial. However, with Stata, there is no need to rely on an approximation, as the bitesti and bitest commands can give you the exact answer, i.e. Stata is smart enough to work with the binomial distribution directly. Using the statistical calculator function bitesti, the format is bitesti 120 18.25 where the parameters are the number of trials, the observed number of successes, and the predicted probability of success. Stata gives you. bitesti 120 18.25 N Observed k Expected k Assumed p Observed p ------------------------------------------------------------ 120 18 30 0.25000 0.15000 Pr(k >= 18) = 0.997208 (one-sided test) Pr(k <= 18) = 0.005645 (one-sided test) Pr(k <= 18 or k >= 43) = 0.011020 (two-sided test) For a two-tailed test, this result is significant at the.011 level. Since that is slightly more than.01, you stick with the null (barely). If the alternative hypotheses were H 0 : p <.25, you would reject the null using the.01 level of significance, since.005645 is less than.01. To get the default Clopper-Pearson (aka exact ) 99% confidence interval, use the cii command: cii 120 18, level(99) where the parameters are the number of trials and the number of successes. The level parameter indicates that you want the 99% c.i.; the default is the 95% c.i.. cii 120 18, level(99) -- Binomial Exact -- 120.15.032596.0771953.2517566 Again, the mayor s claim of.25 falls within the interval, so the null is not rejected. Using Stata for one sample tests Page 2

While widely used, the Clopper-Pearson CI is criticized by some for being too conservative, i.e. it can produces confidence intervals that are very wide. Many therefore prefer Wilson s Confidence Interval or one of the other options (Jeffries, Agresti-Coull) that Stata offers. If you want Wilson s C.I., add the wilson parameter:. cii 120 18, level(99) wilson ------ Wilson ------ 120.15.032596.0845733.2521024 If you had the raw data, you would have 102 cases coded 0 on a variable called black (meaning 102 employees were not black), and 18 cases coded 1 on the variable black (meaning 18 employees were black.). The bitest command analyzes the raw data:. bitest black=.25 Variable N Observed k Expected k Assumed p Observed p -------------+------------------------------------------------------------ black 120 18 30 0.25000 0.15000 Pr(k >= 18) = 0.997208 (one-sided test) Pr(k <= 18) = 0.005645 (one-sided test) Pr(k <= 18 or k >= 43) = 0.011020 (two-sided test) The parameters are variable name = hypothesized probability of success. Likewise, to get the socalled exact (Clopper-Pearson) confidence interval, use the ci command with the binomial parameter:. ci black, binomial level(99) -- Binomial Exact -- black 120.15.032596.0771953.2517566 To get the Wilson interval instead,. ci black, binomial level(99) wilson ------ Wilson ------ black 120.15.032596.0845733.2521024 Note that we get identical results whether we use the immediate or raw data versions of the commands. Using Stata for one sample tests Page 3

Normal Approximation to the Binomial Solution. You can also get the normal approximation to the binomial using the prtesti and prtest commands. The prtesti format is prtesti 120 18.25, count level(99) where the parameters are N (the number of trials), the observed number of successes, and the predicted probability of success. If you didn t include the count parameter, you would say.15 instead of 18, i.e. you d give the observed probability of success rather than the observed number of successes. The confidence interval is only approximate (and in this case wrong, because it does not include.25). The output is. prtesti 120 18.25, count level(99) One-sample test of proportion x: Number of obs = 120 Variable Mean Std. Err. [99% Conf. Interval] - x.15.032596.0660382.2339618 Ho: proportion(x) =.25 Ha: x <.25 Ha: x!=.25 Ha: x >.25 z = -2.530 z = -2.530 z = -2.530 P < z = 0.0057 P > z = 0.0114 P > z = 0.9943 Note that Stata does not do the correction for continuity by default. The bintesti command with the normal option, which comes with Stataquest, lets you make the correction by hand, i.e. you can add or subtract.5 from the observed number of successes as is appropriate.. bintesti 120 18.5.25, normal level(99) Variable Obs Proportion Std. Error ---------+---------------------------------- x 120.1541667.0329645 Ho: p = z = -2.42 Pr > z = 0.0153 99% CI = (0.0693,0.2391) Incidentally, in this case, you actually come slightly closer to the correct result by NOT making the correction for continuity. After having tried several problems, my impression (possibly wrong) is that, when p 0 is close to.5, the correction improves accuracy, but as p 0 gets further and further away from.5 it may actually do more harm than good. The moral is that, if it is a very close call, you probably want to get the exact solution rather than rely on the normal approximation. Indeed, the Stata documentation for prtest says Researchers are advised to use bitest when possible, especially for small samples. With raw data, to get the normal approximation (without correction for continuity I m not sure how you would do the correction for continuity using raw data), use the prtest command with parameters varname=predicted probability of success. Using Stata for one sample tests Page 4

. prtest black =.25, level(99) One-sample test of proportion black: Number of obs = 120 Variable Mean Std. Err. [99% Conf. Interval] - black.15.032596.0660382.2339618 Ho: proportion(black) =.25 Ha: black <.25 Ha: black!=.25 Ha: black >.25 z = -2.530 z = -2.530 z = -2.530 P < z = 0.0057 P > z = 0.0114 P > z = 0.9943 C. Case III: Sampling distribution of X, normal parent population, σ unknown. Problem. The Deans contend that the average graduate student makes $8,000 a year. Zealous administration budget cutters contend that the students are being paid more than that, while the Graduate Student Union contends that the figure is less. A random sample of 6 students has an average income (measured in thousands of dollars) of 6.5 and a sample variance of 2. Using both confidence intervals and significance tests, test the Deans claim at the.10 and.02 levels of significance. Solution. Use the ttesti or the ttest command. If you just have the summary statistics and you want the 90% confidence interval, enter the command ttesti 6 6.5 1.414213562 8, level(90) The parameters are N, the sample mean, the sample sd, the predicted mean, and the CI level. The results are. ttesti 6 6.5 1.414213562 8, level(90) Obs Mean Std. Err. Std. Dev. [90% Conf. Interval] x 6 6.5.5773503 1.414214 5.336611 7.663389 Ho: mean(x) = 8 P < t = 0.0242 P > t = 0.0484 P > t = 0.9758 Using Stata for one sample tests Page 5

To get the 98% c.i., just change the level parameter:. ttesti 6 6.5 1.414213562 8, level(98) Obs Mean Std. Err. Std. Dev. [98% Conf. Interval] x 6 6.5.5773503 1.414214 4.557257 8.442743 Ho: mean(x) = 8 P < t = 0.0242 P > t = 0.0484 P > t = 0.97588 If you just want the confidence interval, use the cii command, where the parameters are N, the sample mean, the sample sd, and the CI level.. cii 6 6.5 1.414213562, level(90) Variable Obs Mean Std. Err. [90% Conf. Interval] 6 6.5.5773503 5.336611 7.663389 If you are analyzing the original raw data, use the ttest command. In this example, pay is the variable containing salary information, and 8 is the hypothesized mean of pay. The level(90) parameter gives us the 90% confidence interval.. ttest pay = 8, level(90) Variable Obs Mean Std. Err. Std. Dev. [90% Conf. Interval] pay 6 6.5.5773503 1.414214 5.336611 7.663389 Ho: mean(pay) = 8 P < t = 0.0242 P > t = 0.0484 P > t = 0.9758 Using Stata for one sample tests Page 6

To get the 98% c.i.,. ttest pay = 8, level(98) Variable Obs Mean Std. Err. Std. Dev. [98% Conf. Interval] pay 6 6.5.5773503 1.414214 4.557257 8.442743 Ho: mean(pay) = 8 P < t = 0.0242 P > t = 0.0484 P > t = 0.9758 If you just wanted to get the 90% confidence interval without the t-test, use the ci command specifying whatever level you want:. ci pay, level(90) Variable Obs Mean Std. Err. [90% Conf. Interval] pay 6 6.5.5773503 5.336611 7.663389 For the 98% c.i.,. ci pay, level(98) Variable Obs Mean Std. Err. [98% Conf. Interval] pay 6 6.5.5773503 4.557257 8.442743 As before, the output from ttest and ttesti is pretty much identical; if you did see any differences, it would be because of rounding error in the summary statistics. Using Stata for one sample tests Page 7