One-sample inference: Categorical Data

1 One-sample inference: Categorical Data October 8

2 One-sample vs. two-sample studies
A common research design is to obtain two groups of people and look for differences between them. We will learn how to analyze these types of two-group, or two-sample, studies in a few weeks. We are going to start, however, with a simpler case: the one-sample study.

3 One-sample inference
For example, a researcher collects a random sample of individuals, measures their heights, and wants to make a generalization about the heights in the population. Or a researcher collects a random sample of individuals, determines whether or not they smoke, and wants to make inferences about the percentage of the population that smokes. These are examples of one-sample inference problems: the first involving continuous data, the second involving categorical data.

4 One-sample inference: categorical data
Today's topic is inference for one-sample categorical data. The object of such inference is percentages:
- What percent of patients survive surgery?
- What percent of women develop breast cancer?
- What percent of people do better on one therapy than another?
Investigators see one percentage in their sample, but what does that tell them about the population percentage? In short, how accurate are percentages?

5 The normal approximation
A percentage is a kind of average: the average number of times an event occurs per opportunity. Thus, one approach is to use the central limit theorem, which tells us that:
- The expected value of the sample percentage is the population percentage
- The standard error of the sample average is equal to the population standard deviation divided by the square root of n
- The shape of the sampling distribution is approximately normal (how accurate this is depends on n)

6 The normal approximation (cont'd)
Statisticians often use $p$ to represent the population proportion and $\hat{p}$ to represent the sample proportion. Thus, if we observe $\hat{p}$ in our sample, the central limit theorem suggests that $\hat{p}$ is a good estimate of $p$. If $\hat{p}$ is a good estimate of the population percentage, then it follows that $\sqrt{\hat{p}(1-\hat{p})}$ is a good estimate of the population standard deviation. Continuing, a good estimate for the SE is
$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

7 The probability that $p$ and $\hat{p}$ are close
If the probability that $\hat{p}$ is within 1 standard error of $p$ is 68%, what is the probability that $p$ is within 1 standard error of $\hat{p}$? Also 68%; it's the same thing, just worded differently. Therefore, if $p$ plus or minus 1.96 standard errors has a 95% chance of containing $\hat{p}$, then $\hat{p}$ plus or minus 1.96 standard errors has a 95% chance of containing $p$.

8 The form of confidence intervals
Thus, x% confidence intervals look like:
$$(\hat{p} - z_{x\%} \, SE, \; \hat{p} + z_{x\%} \, SE)$$
where $\pm z_{x\%}$ contains the middle x% of the standard normal distribution. For 95% confidence intervals, then, $z$ is always 1.96.
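The multiplier $z_{x\%}$ can be looked up numerically rather than from a table. The following is a minimal sketch using only the Python standard library; the function name `z_for_confidence` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return z such that (-z, z) holds the middle `level` of a standard normal."""
    # The middle `level` leaves (1 - level) / 2 of the area in each tail,
    # so z is the quantile at 1 - (1 - level) / 2.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

For a 95% interval this returns approximately 1.96, and for 68% it returns a value close to 1, which is why "plus or minus one standard error" is described as roughly a 68% interval.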

9 Procedure for finding confidence intervals
To sum up, the central limit theorem tells us that we can create x% confidence intervals by:
#1 Calculate the standard error: $SE = \sqrt{\hat{p}(1-\hat{p})/n}$
#2 Determine the values of the normal distribution that contain the middle x% of the data; denote these values $\pm z_{x\%}$
#3 Calculate the confidence interval: $(\hat{p} - z_{x\%} \, SE, \; \hat{p} + z_{x\%} \, SE)$

10 Example: Survival of premature infants
In order to estimate the survival chances of infants born prematurely, researchers at Johns Hopkins surveyed the records of all premature babies born at their hospital in a three-year period. They found 39 babies who were born at 25 weeks gestation, 31 of whom survived at least 6 months. Their best estimate (point estimate) is that 31/39 = 79.5% of all babies (in other hospitals, in future years) born at 25 weeks gestation would survive at least 6 months, but how accurate is that percentage?

11 Example: Survival of premature infants (cont'd)
The standard error of the percentage is
$$SE = \sqrt{\frac{0.795(1-0.795)}{39}} = 0.0647$$
So, one way of expressing the accuracy of the estimated percentage is: 79.5% ± 6.5% (this would be about a 68% confidence interval). Another way would be to calculate the 95% confidence interval:
$$(79.5 - 1.96(6.47), \; 79.5 + 1.96(6.47)) = (66.8\%, 92.2\%)$$
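The arithmetic for this example can be checked with a short script. This is a sketch of the three-step normal-approximation procedure, assuming only the counts given above (31 survivors out of 39); the function name `wald_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Step 1: standard error estimated from the sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)
    # Step 2: z value enclosing the middle `level` of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Step 3: point estimate plus or minus z standard errors
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lower, upper) = wald_interval(31, 39)
print(f"estimate = {p_hat:.1%}, SE = {se:.4f}")
print(f"95% interval = ({lower:.1%}, {upper:.1%})")
```

The result agrees with the interval on the slide to the stated precision.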

12 Problems with the normal approximation
That approach works pretty well, but if you think about it, the distribution of our data isn't normal; it's binomial. The normal approximation works because the binomial distribution looks a lot like the normal distribution when n is large and p isn't close to 0 or 1. Other times, the normal approximation doesn't work as well.
[Figure: binomial probability distributions for n=39, p=0.8 and for n=15, p=0.95; vertical axis: Probability]
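One way to put a number on "doesn't work as well" is to compare the binomial cumulative probabilities with the matching normal curve for the two cases shown in the figure. The sketch below is an illustration, not the lecture's method: the summary measure (largest gap between the two cumulative distributions, with a continuity correction of 0.5) and the name `max_cdf_gap` are assumptions made here.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest gap between the Binomial(n, p) CDF and its normal approximation."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    gap, cumulative = 0.0, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p) ** (n - k)  # P(X <= k)
        # Continuity correction: compare with the normal area below k + 0.5
        gap = max(gap, abs(cumulative - normal.cdf(k + 0.5)))
    return gap

for n, p in ((39, 0.8), (15, 0.95)):
    print(f"n={n}, p={p}: largest CDF gap = {max_cdf_gap(n, p):.3f}")
```

The gap is noticeably larger for n=15, p=0.95 than for n=39, p=0.8, which matches the visual impression that the second distribution is too skewed to look normal.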

13 Example: Survival of premature infants, part II
In their study, the Johns Hopkins researchers also found 29 infants born at 22 weeks gestation, none of whom survived 6 months. The normal approximation is clearly not going to work here, for two reasons:
- The estimated standard deviation will be 0
- Even if it weren't, the confidence interval would be symmetric about 0, so half of it would be negative

14 Using the binomial distribution directly
But why settle for an approximation? The number of infants who survive is going to follow a binomial distribution; why not use that directly? It seems pretty obvious that the lower limit of our confidence interval should be 0, but how can we use the binomial distribution to find an upper limit? The upper limit should be a number $p$ such that there would only be a 2.5% probability of observing 0 infants who survive if the probability of surviving really were $p$.
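When zero survivors are observed, the upper limit described here has a closed form: the probability of 0 survivors out of n is $(1-p)^n$, and setting that equal to 0.025 gives $p = 1 - 0.025^{1/n}$. A minimal sketch, with the function name `upper_limit_zero_successes` chosen for illustration:

```python
def upper_limit_zero_successes(n, alpha=0.05):
    """Upper confidence limit for p when 0 successes are observed in n trials."""
    # P(0 of n survive) = (1 - p)**n; set this equal to alpha / 2 and solve for p
    return 1 - (alpha / 2) ** (1 / n)

upper = upper_limit_zero_successes(29)
print(f"upper limit = {upper:.1%}")
print(f"P(0 of 29 survive) at that p = {(1 - upper) ** 29:.3f}")
```

With n = 29 this gives an upper limit of about 11.9%.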

15 Finding the upper limit for p
[Figure: P(0 out of 29 infants survive) plotted against p]

16 Exact confidence intervals
Thus, the exact confidence interval for the population percentage of infants who survive after being born at 22 weeks is (0%, 11.9%). The exact confidence interval for the population percentage of infants who survive after being born at 25 weeks is (63.5%, 90.7%). Recall that our approximate confidence interval for the population percentage of infants who survive after being born at 25 weeks was (66.8%, 92.2%).
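Both exact intervals can be reproduced by searching for the values of p at which the observed count sits at the edge of a 2.5% tail. The sketch below assumes the exact interval meant here is the usual tail-based (Clopper-Pearson) construction, which is consistent with the 0-out-of-29 reasoning above; the bisection helper and all function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve(f, target, increasing, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the solution lies to the right of mid
        else:
            hi = mid  # the solution lies to the left of mid
    return (lo + hi) / 2

def exact_interval(x, n, level=0.95):
    """Tail-based exact interval: each limit leaves (1 - level) / 2 in one tail."""
    tail = (1 - level) / 2
    # Lower limit: P(X >= x) = tail; this probability increases with p
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), tail, increasing=True)
    # Upper limit: P(X <= x) = tail; this probability decreases with p
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p), tail, increasing=False)
    return lower, upper

for x, n in ((0, 29), (31, 39)):
    lower, upper = exact_interval(x, n)
    print(f"{x}/{n}: ({lower:.1%}, {upper:.1%})")
```

Under that assumption the output should match the two exact intervals quoted on the slide.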

17 Exact vs. approximate intervals
When n is large and p isn't close to 0 or 1, it doesn't really matter whether you choose the approximate or the exact approach. The advantage of the approximate approach is that it's easy to do by hand. In comparison, finding exact confidence intervals by hand is quite time-consuming.

18 Exact vs. approximate intervals (cont'd)
However, we live in an era with computers, which do the work of finding confidence intervals instantly (as we will see in lab). If we can obtain the exact answer easily, there is no reason to settle for the approximate answer. That said, in practice, people use and report the approximate approach all the time. Possibly, this is because the analyst knew it wouldn't matter, but more likely, it's because the analyst learned the approximate approach in their introductory statistics course and doesn't know any other way to calculate a confidence interval.

19 One-sample hypothesis tests
It is relatively rare to have specific hypotheses about population percentages. One important exception is the collection of paired samples. In a paired sampling design, we collect n pairs of observations and analyze the difference between the pairs.

20 Hypothetical example: A sunblock study
Suppose we are conducting a study investigating whether sunblock A is better than sunblock B at preventing sunburns. The first design that comes to mind is probably to randomly assign sunblock A to one group and sunblock B to a different group. There is nothing wrong with this design, but we can do better.

21 Signal and noise
Generally speaking, our ability to make generalizations about the population depends on two factors: signal and noise. Signal is the magnitude of the difference between the two groups; in the present context, how much better one sunblock is than the other. Noise is the variability present in the outcome from all other sources besides the one you're interested in; in the sunblock experiment, this would include factors like how sunny the day was, how much time the person spent outside, how easily the person burns, etc. Confidence intervals and hypothesis tests depend on the ratio of signal to noise: how easily we can distinguish the treatment effect from all other sources of variability.

22 Signal to noise ratio
To get a larger signal-to-noise ratio, we must either increase the signal or reduce the variability. The signal is usually determined by nature and out of our control. Instead, we are going to have to reduce the variability/noise. If our sunblock experiment were controlled, we could attempt such steps as forcing all participants to spend an equal amount of time outside, on the same day, in an equally sunny area, etc.

23 Person-to-person variability
But what can be done about person-to-person variability (how easily certain people burn)? A powerful technique for reducing person-to-person variability is pairing. For each person, we can apply sunblock A to one of their arms and sunblock B to the other arm and, as an outcome, look at the difference between the two arms. In this experiment, the items that we randomly sample from the population are pairs of arms belonging to the same person.

24 Benefits of paired designs
What do we gain from this? As variability goes down,
- Confidence intervals become narrower
- Hypothesis tests become more powerful
How much narrower? How much more powerful? This depends on the fraction of the total variability that comes from person-to-person variability.

25 More examples
Investigators have come up with all kinds of clever ways to use pairing to cut down on variability:
- Before-and-after studies
- Crossover studies
- Split-plot experiments

26 Pairing in observational studies
Pairing is also widely used in observational studies:
- Twin studies
- Matched studies
In a matched study, the investigator will pair up ("match") subjects on the basis of variables such as age, sex, or race, then analyze the difference between the pairs. In addition to increasing power, pairing in observational studies also eliminates (some of the) potential confounding variables.

27 Cystic fibrosis experiment
You may not have known it at the time, but you have already conducted an exact hypothesis test for paired categorical data in your homework. Recall our cystic fibrosis experiment, in which each patient took both drug and placebo and the reduction in their lung function (measured by FVC) over a 25-week period was recorded. This is a crossover study, an example of a paired design.

28 The null hypothesis
The null hypothesis here is that the drug provides no benefit: that whether the patient received drug or placebo has no impact on their lung function. Under the null hypothesis, then, the probability that a patient does better on drug than placebo (let's call this $p$) is 50%. So, another, more compact and mathematical way of writing the null hypothesis is $p_0 = 0.5$ (statisticians like to use a subscript 0 to denote the null hypothesis).

29 The sign test
We can test this null hypothesis by using our knowledge that, under the null hypothesis, the number of patients who do better on the drug than placebo ($x$) will follow a binomial distribution with n = 14 and p = 0.5. This approach to hypothesis testing is called the sign test. All we need to do is calculate the p-value (the probability of obtaining results as extreme or more extreme than the one observed in the data, given that the null hypothesis is true).

30 As extreme or more extreme
The result observed in the data was that 11 patients did better on the drug. But what exactly is meant by "as extreme or more extreme" than 11? It is uncontroversial that 11, 12, 13, and 14 are as extreme or more extreme than 11. But what about 0? Is that more extreme than 11? Under the null, P(11) = 2.2%, while P(0) = 0.006%. So 0 is more extreme than 11, but in a different direction.

31 One-sided vs. two-sided tests
Potentially, then, we have two different approaches to calculating this p-value:
- Find the probability that $x \ge 11$
- Find the probability that $x \ge 11$ or $x \le 3$ (the number that is as far away from the expected value of 7 as 11 is, but in the other direction)
These are both reasonable things to do, and intelligent people have argued both sides of the debate. However, the statistical and scientific community has for the most part come down in favor of the latter, the so-called two-sided test. For this class, all of our tests will be two-sided tests.

32 The sign test
Thus, the p-value of the sign test is
$$\begin{aligned} p &= P(x \le 3) + P(x \ge 11) \\ &= P(x=0) + \cdots + P(x=3) + P(x=11) + \cdots + P(x=14) \\ &= 0.006\% + 0.09\% + 0.6\% + 2.2\% + 2.2\% + 0.6\% + 0.09\% + 0.006\% \\ &= 5.7\% \end{aligned}$$
One might call this result borderline significant: it isn't below .05, but it's close. These results suggest that the drug has potential, but with a sample size of only 14, it's hard to say for sure.
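The sum above can be reproduced directly from the binomial distribution with p = 0.5, where every outcome k has probability C(n, k) / 2^n. This is a sketch of the two-sided sign test as described in these slides (counting every outcome at least as far from n/2 as the observed one); the function name `sign_test_p_value` is illustrative.

```python
from math import comb

def sign_test_p_value(x, n):
    """Two-sided sign test p-value for x 'successes' out of n pairs."""
    expected = n / 2
    distance = abs(x - expected)
    # Add up the null probabilities of all outcomes at least as far
    # from the expected value as the observed count, in either direction
    extreme = sum(comb(n, k) for k in range(n + 1)
                  if abs(k - expected) >= distance)
    return extreme / 2**n

p_value = sign_test_p_value(11, 14)
print(f"p-value = {p_value:.1%}")
```

For 11 out of 14 the outcomes counted are 0 through 3 and 11 through 14, giving 940/16384, or about 5.7%.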

33
Thinking about the sign test, what enabled us to calculate the p-value? How were we able to attach a specific number to the probability that $x$ would take on certain values? We were able to do this because we knew that, under the null, $x$ followed a specific distribution (in that case, the binomial). This is the most common strategy for developing hypothesis tests: to calculate from the data a quantity for which we know its distribution under the null hypothesis. Note that in general, we would not know the distribution of the number of patients who do better on drug than placebo; we know it only under the null hypothesis.

34 Test statistics
This quantity that we know the distribution of under the null hypothesis is called a test statistic. Because we can calculate the test statistic from the data, and because we know its distribution under the null hypothesis, we can calculate the probability of obtaining a result as extreme or more extreme than the observed result (the p-value).

35 The z test statistic
As we did before with confidence intervals, we can use the central limit theorem for this problem, now to create a test statistic. From the central limit theorem, we know that $z$, the number of standard errors away from $p$ that $\hat{p}$ falls, follows (approximately) a standard normal distribution. Our test statistic, then, is
$$z = \frac{\hat{p} - p_0}{SE}$$
Having calculated $z$, we can get p-values from the standard normal distribution. This approach to hypothesis testing is called the z-test.

36 The standard error
What about the standard error? Under the null, the population standard deviation is $\sqrt{p_0(1-p_0)}$, which means that, under the null,
$$SE = \sqrt{\frac{p_0(1-p_0)}{n}}$$

37 Procedure for a z-test
The procedure for a z-test is then:
#1 Calculate the standard error: $SE = \sqrt{p_0(1-p_0)/n}$
#2 Calculate the test statistic: $z = (\hat{p} - p_0)/SE$
#3 Calculate the area under the normal curve outside $\pm z$

38 The z-test for the cystic fibrosis experiment
For the cystic fibrosis experiment, $p_0 = 0.5$. Therefore,
$$SE = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.5(0.5)}{14}} = 0.134$$

39 The z-test for the cystic fibrosis experiment (cont'd)
The test statistic is therefore
$$z = \frac{\hat{p} - p_0}{SE} = \frac{0.786 - 0.5}{0.134} = 2.14$$
where $\hat{p} = 11/14 = 0.786$. The p-value of this test is therefore 2(1.6%) = 3.2%.
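The three-step z-test procedure can be checked numerically for this experiment (11 of 14 patients doing better on the drug, null value 0.5). A minimal sketch using the standard library; the function name `z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion."""
    p_hat = successes / n
    # Step 1: standard error computed under the null hypothesis
    se = sqrt(p0 * (1 - p0) / n)
    # Step 2: number of standard errors between the estimate and p0
    z = (p_hat - p0) / se
    # Step 3: area under the standard normal curve outside +/- z
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = z_test(11, 14, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.1%}")
```

Without intermediate rounding, the p-value comes out a little above 3.2%; the small difference from the slide is due to rounding the tail area to 1.6% before doubling.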

40 Confidence intervals can produce hypothesis tests
It may not be obvious, but there is a close connection between confidence intervals and hypothesis tests. For example, suppose our hypothesis test was to construct a 95% confidence interval and then reject the null hypothesis if $p_0$ was outside the interval. It turns out that this is exactly the same as conducting a hypothesis test with α = 5%.

41 Hypothesis tests can produce confidence intervals
Alternatively, suppose we formed a collection of all the values of $p_0$ for which the p-value of our hypothesis test was above 5%. This would form a 95% confidence interval for $p$. Note, then, that there is a correspondence between hypothesis testing at significance level α and confidence intervals with confidence level 1 − α. It turns out that the z-test corresponds to the approximate interval, and that the sign test corresponds to the exact interval.
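The idea of collecting every $p_0$ that the test does not reject can be sketched with a simple grid search. The example below uses the cystic fibrosis counts (11 of 14) and an exact binomial test. One assumption is made here that is not in the lecture: for null values other than 0.5, the two-sided p-value is taken to be twice the smaller tail probability, which agrees with the sign test p-value when $p_0 = 0.5$. The function names and the grid resolution are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_p_value(x, n, p0):
    """Two-sided exact p-value: twice the smaller tail (an assumption), capped at 1."""
    lower_tail = binom_cdf(x, n, p0)          # P(X <= x)
    upper_tail = 1 - binom_cdf(x - 1, n, p0)  # P(X >= x)
    return min(1.0, 2 * min(lower_tail, upper_tail))

def interval_by_inversion(x, n, alpha=0.05, steps=1000):
    """Keep every grid value of p0 whose p-value is above alpha."""
    accepted = [i / steps for i in range(1, steps)
                if exact_p_value(x, n, i / steps) > alpha]
    return min(accepted), max(accepted)

lower, upper = interval_by_inversion(11, 14)
print(f"p-value at p0 = 0.5: {exact_p_value(11, 14, 0.5):.3f}")
print(f"values of p0 not rejected at 5%: ({lower:.3f}, {upper:.3f})")
```

Because the p-value at $p_0 = 0.5$ is above 5%, the value 0.5 falls inside the resulting interval, so the test and the interval reach the same conclusion.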

42
In general, then, confidence intervals and hypothesis tests always lead to the same conclusion. This is a good thing; it would be confusing otherwise. Furthermore, this is not just true of confidence intervals for one-sample categorical data; it is generally true of all confidence intervals and hypothesis tests. However, the information provided by each technique is different: the confidence interval is an attempt to estimate a parameter, while the hypothesis test is an attempt to measure the evidence against the hypothesis that the parameter is equal to a certain, specific number.


More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Math 251, Review Questions for Test 3 Rough Answers

Math 251, Review Questions for Test 3 Rough Answers Math 251, Review Questions for Test 3 Rough Answers 1. (Review of some terminology from Section 7.1) In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the Republican candidate,

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples

Comparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The

More information

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

Come scegliere un test statistico

Come scegliere un test statistico Come scegliere un test statistico Estratto dal Capitolo 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc. (disponibile in Iinternet) Table

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4

STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate

More information

Sample Size and Power in Clinical Trials

Sample Size and Power in Clinical Trials Sample Size and Power in Clinical Trials Version 1.0 May 011 1. Power of a Test. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size. Test Statistics 3. Variation 4. Significance

More information

Stat 5102 Notes: Nonparametric Tests and. confidence interval

Stat 5102 Notes: Nonparametric Tests and. confidence interval Stat 510 Notes: Nonparametric Tests and Confidence Intervals Charles J. Geyer April 13, 003 This handout gives a brief introduction to nonparametrics, which is what you do when you don t believe the assumptions

More information

p ˆ (sample mean and sample

p ˆ (sample mean and sample Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics

More information

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

More information

Online 12 - Sections 9.1 and 9.2-Doug Ensley

Online 12 - Sections 9.1 and 9.2-Doug Ensley Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics - Ensley Assignment: Online 12 - Sections 9.1 and 9.2 1. Does a P-value of 0.001 give strong evidence or not especially strong

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

7.6 Approximation Errors and Simpson's Rule

7.6 Approximation Errors and Simpson's Rule WileyPLUS: Home Help Contact us Logout Hughes-Hallett, Calculus: Single and Multivariable, 4/e Calculus I, II, and Vector Calculus Reading content Integration 7.1. Integration by Substitution 7.2. Integration

More information

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science

STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science STA 130 (Winter 2016): An Introduction to Statistical Reasoning and Data Science Mondays 2:10 4:00 (GB 220) and Wednesdays 2:10 4:00 (various) Jeffrey Rosenthal Professor of Statistics, University of Toronto

More information

Likelihood: Frequentist vs Bayesian Reasoning

Likelihood: Frequentist vs Bayesian Reasoning "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Elasticity. I. What is Elasticity?

Elasticity. I. What is Elasticity? Elasticity I. What is Elasticity? The purpose of this section is to develop some general rules about elasticity, which may them be applied to the four different specific types of elasticity discussed in

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Chi-square test Fisher s Exact test

Chi-square test Fisher s Exact test Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

6.3 Conditional Probability and Independence

6.3 Conditional Probability and Independence 222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Chapter 19 Operational Amplifiers

Chapter 19 Operational Amplifiers Chapter 19 Operational Amplifiers The operational amplifier, or op-amp, is a basic building block of modern electronics. Op-amps date back to the early days of vacuum tubes, but they only became common

More information

Clocking In Facebook Hours. A Statistics Project on Who Uses Facebook More Middle School or High School?

Clocking In Facebook Hours. A Statistics Project on Who Uses Facebook More Middle School or High School? Clocking In Facebook Hours A Statistics Project on Who Uses Facebook More Middle School or High School? Mira Mehta and Joanne Chiao May 28 th, 2010 Introduction With Today s technology, adolescents no

More information

Binomial Sampling and the Binomial Distribution

Binomial Sampling and the Binomial Distribution Binomial Sampling and the Binomial Distribution Characterized by two mutually exclusive events." Examples: GENERAL: {success or failure} {on or off} {head or tail} {zero or one} BIOLOGY: {dead or alive}

More information

P (B) In statistics, the Bayes theorem is often used in the following way: P (Data Unknown)P (Unknown) P (Data)

P (B) In statistics, the Bayes theorem is often used in the following way: P (Data Unknown)P (Unknown) P (Data) 22S:101 Biostatistics: J. Huang 1 Bayes Theorem For two events A and B, if we know the conditional probability P (B A) and the probability P (A), then the Bayes theorem tells that we can compute the conditional

More information

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394

BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 BA 275 Review Problems - Week 5 (10/23/06-10/27/06) CD Lessons: 48, 49, 50, 51, 52 Textbook: pp. 380-394 1. Does vigorous exercise affect concentration? In general, the time needed for people to complete

More information

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

More information

Lesson 17: Margin of Error When Estimating a Population Proportion

Lesson 17: Margin of Error When Estimating a Population Proportion Margin of Error When Estimating a Population Proportion Classwork In this lesson, you will find and interpret the standard deviation of a simulated distribution for a sample proportion and use this information

More information

Prospective, retrospective, and cross-sectional studies

Prospective, retrospective, and cross-sectional studies Prospective, retrospective, and cross-sectional studies Patrick Breheny April 3 Patrick Breheny Introduction to Biostatistics (171:161) 1/17 Study designs that can be analyzed with χ 2 -tests One reason

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Sample Practice problems - chapter 12-1 and 2 proportions for inference - Z Distributions Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Chapter 1 Introduction to Correlation

Chapter 1 Introduction to Correlation Chapter 1 Introduction to Correlation Suppose that you woke up one morning and discovered that you had been given the gift of being able to predict the future. Suddenly, you found yourself able to predict,

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

PRACTICE PROBLEMS FOR BIOSTATISTICS

PRACTICE PROBLEMS FOR BIOSTATISTICS PRACTICE PROBLEMS FOR BIOSTATISTICS BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION 1. The duration of time from first exposure to HIV infection to AIDS diagnosis is called the incubation period.

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as... HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010 MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information