# Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York

2 Topics 1. Hypothesis Testing 1. t-test for means 2. Chi-Square test of independence 3. Komogorov-Smirnov Test to compare distributions 2. Sampling Methods 1. Random Sampling 2. Stratified Sampling 3. Cluster Sampling 2

3 Motivation If a data mining algorithm generates a potentially interesting hypothesis we want to explore it further Commonly, value of a parameter Is new treatment better than standard one Two variables related in a population Conclusions based on a sample of population 3

4 Classical Hypothesis Testing 1. Define two complementary hypotheses Null hypothesis and alternative hypothesis Often null hypothesis is a point value, e.g., draw conclusions about a parameter θ Null Hypothesis is H 0 : θ = θ 0 Alternative Hypothesis is H 1 : θ θ 0 2. Using data calculate a statistic Which depends on nature of hypotheses Determine expected distribution of chosen statistic Observed value would be one point in this distribution 3. If in tail then either unlikely event or H 0 false More extreme observed value less confidence in null hypothesis

5 Example Problem Hypotheses are mutually exclusive if one is true, other is false Determine whether a coin is fair H 0 : P = 0.5 H a : P 0.5 Flipped coin 50 times: 40 Heads and 10 Tails Inclined to reject H 0 and accept H a Accept/reject decision focuses on single test statistic

6 Test for Comparing two Means Whether population mean differs from hypothesized value Called one-sample t test Common problem Does Your Group Come from a Different Population than the One Specified? One-sample t-test (given sample of one population) Two-sample t-test (two populations)

7 One-sample t test Fix significance level in [0,1], e.g., 0.01, 0.05, 1.0 Degrees of freedom, DF = n - 1 n is no of observations in sample Compute test statistic (t-score) x: sample mean, x: hypothesized mean (H 0 ), s std dev of sample Compute p-value from studentʼs t-distribution reject null hypothesis if p-value < significance level Used when population variances are equal / unequal, and with large or small samples.

8 Test statistic Rejection Region mean score, proportion, t-score, z-score One and Two tail tests Hyp Set Null hyp Alternative hyp No of tails 1 μ = M μ M 2 2 μ > M μ < M 1 3 μ < M μ > M 1 Values outside region of acceptance is region of rejection Equivalent Approaches: p-value and region of acceptance Size of region is significance level 8

9 Power of a Test Compare different test procedures Power of Test Probability it will correctly reject a false null hypothesis (1-β) False Negative Rate Significance of Test Test's probability of incorrectly rejecting the null hypothesis (α) True Negative Rate Accept Null Reject Null Type 1 and Type 2 errors are denoted α and β Null is True 1 α True Positive α True Negative Null is False β False Positive 1-β False Negative

10 Likelihood Ratio Statistic Good strategy to find statistic is to use the Likelihood Ratio Likelihood Ratio Statistic to test hypothesis H 0 :θ = θ 0 H 1 : θ θ 0 is defined as λ = L(θ 0 D) sup ϕ L(ϕ D) where D ={x(1),..,x(n)} i.e., Ratio of likelihood when θ=θ 0 to the largest value of the likelihood when θ is unconstrained Null hypothesis rejected when λ is small Generalizable to when null is not single point

11 Testing for Mean of Normal Given a sample of n points drawn from Normal with unknown mean and unit variance Likelihood under null hypothesis n 1 L(0 x(1),.., x(n)) = p(x(i) 0) = Maximum likelihood estimator is sample mean Ratio simplifies to Rejection region: {λ λ< c} for a suitably chosen c Expression written as exp 1 (x(i) 0)2 2π 2 i=1 n 1 L(µ x(1),.., x(n)) = p(x(i) µ) = exp 1 (x(i) x )2 2π 2 i=1 λ = exp( n(x 0) 2 /2) i x i 2 n lnc Compare sample mean with a constant

12 Types of Tests used Frequently Differences between means Compare variances Compare observed distribution with a hypothesized distribution Called goodness-of-fit test t-test for difference between means of two independent groups 12

13 Two sample t-test Whether two means have the same value x(1),..x(n) drawn from N(µ x,σ 2 ), y(1),..y(n) drawn from N(µ y,σ 2 ) H O : µ x =µ y Likelihood Ratio statistic t = x y s 2 (1/n +1/m) with where s = s x 2 s 2 x = (x x ) 2 /(n 1) t has t-distribution with n+m-2 degrees of freedom Test robust to departures from normal Test is widely used Difference between sample means adjusted by standard deviation of that difference n 1 n + m 2 + s 2 m 1 y n + m 2 weighted sum of sample variances

14 Test for Relationship between Variables Whether distribution of value taken by one variable is independent of value taken by another Chi-squared test Goodness-of-fit test with null hypothesis of independence Two categorical variables x takes values x i, i=1,..,r with probabilities p(x i ) y takes values y j j=1,..,s with probabilities p(y j )

15 Chi-squaredTest for Independence If x and y are independent p(x i,y j )=p(x i )p(y j ) n(x i )/n and n(y i )/n are estimates of probabilities of x taking value x i and y taking value y i If independent estimate of p(x i,y j ) is n(x i )n(y j )/n 2 We expect to find n(x i )n(y j )/n samples in (x i,y j ) cell Number the cells 1 to t where t=r.s E k is expected number in kth cell, O k is observed number Aggregation given by X 2 = k=1,t (E k O k ) 2 If null hyp holds X 2 has χ 2 distrib With (r-1)(s-1) degrees of freedom Found in tables or directly computed E k Chi-squared with k degrees of freedom: Disrib of sum of squares of n values each with std normal dist As n increases density becomes flatter. Special case of Gamma

16 Testing for Independence Example Outcome\Hospital Referral Nonreferral No improvement Partial Improvement Complete Improvement 10 Total for referral= Total with no improvement=90 Overall total=367 Medical data Whether outcome of surgery is independent of hospital type Under independence, top left cell has expected no=82x90/367=20.11 Observed number is 43. Contributes ( ) 2 /20.11 to χ 2 Total value is χ 2 =49.8 Comparing with χ 2 distribution with (3-1)(2-1)=2 degrees of freedom reveals very high degree of significance Suggests that outcome is dependent on hospital type 16

17 Chi-squared Goodness of Fit Test Chi-squared test is more versatile than t-test. Used for categorical distributions Used for testing normal distribution as well E,g. 10 measurements: {x 1, x 2,..., x 10 }. They are supposed to be "normally" distributed with mean µ and standard deviation σ. You could calculate χ 2 = i (x i µ) 2 σ 2 we expect the measurements to deviate from the mean by the standard deviation, so: (x i -µ) is about the same thing as σ. Thus in calculating chi-square we add-up 10 numbers that would be near 1. We expect it to approximately equal k, the number of data points. If chi-square is "a lot" bigger than expected something is wrong. Thus one purpose of chi-square is to compare observed results with expected results and see if the result is likely.

18 Randomization/permutation tests Earlier tests assume random sample drawn, to: Make probability statement about a parameter Make inference about population from sample Consider medical example: Compare treatment and control group H 0 : no effect (distrib of those treated same as those not) Samples may not be drawn independently What difference in sample means would there be if difference is consequence of imbalance of popultns Randomization tests: Allow us to make statements conditioned on input samples 18

19 Distribution-free Tests Other Tests assume form of distribution from which samples are drawn Distribution-free tests replace values by ranks Examples If samples from same distribution: ranks well-mixed If mean is larger, ranks of one larger than other Test statistics, called nonparametric tests, are: Sign test statistic Kolmogorov- Smirnov Test Statistic Rank sum test statistic Wilocoxon test statistic

20 Kolmogorov Smirnov Test Determining whether two data sets have the same distribution To compare two CDFs, S N1 (x) and S N2 (x), Compute the statistic D, the max of absolute difference between them: D = max S N1 (x) S N2 (x) <x< which is mapped to a probability of similarity, P P = Q KS N e N D e where the Q KS function is defined as Q KS (λ) = 2 ( 1) j 1 e 2 j 2 λ 2, Q KS (0) =1, Q KS ( ) = 0 j=1 and N e is the effective number of data points N e = N 1N 2 N where N 1 and N 2 are the no of data points in each 1 + N 2

21 Comparing Hypotheses: Bayesian Perspective Done by comparison posterior probabilities p(h i x) p(x H i )p(h i ) Ratio leads to factorization in terms of prior odds and likelihood ratio p(h 0 x) p(h 1 x) p(h ) 0 p(h 1 ) p(x H ) 0 p(x H 1 ) Some complications: Likelihoods are marginal likelihoods obtained by integrating over unspecified parameters Prior probs zero if parameter has specific value Assign discrete non-zero values to given values of θ

22 Hypothesis Testing in Context In Data mining things are more complicated 1. Data sets are large Expect to obtain statistical significance Slight departures from model will be seen as significant 2. Sequential model fitting is common 3. Results have various implications Many models will be examined e.g., discrete values for mean If m true independent hypotheses are examined at 0.05 probability of incorrect rejection, with 100 hypotheses we are certain to reject one hypothesis If we set family error rate as 0.05, we get α =

23 Simultaneous Test Procedures Address difficulties of multiple hypotheses Bonferroni Inequality p(a 1, a 2,..,a n ) p(a 1 ) + p(a 2 )+.+p(a n ) n + 1 By including other terms in the expansion, we can develop more accurate bounds but require knowledge of dependence relationships 23

24 Sampling in Data Mining Differences between Statistical inference and data mining Data set in data mining may consists of entire population Experimental design in statistics is concerned with optimal ways of collecting data More often database is sample of population Contain records for every object in population but the analysis is based on a sample 24

25 Systematic Sampling Strategy for Representativeness Taking one out of every two records Sampling fraction =0.5 Can lead to unexpected problems when there are regularities in database E.g., data set where records are of married couples, one at a time 25

26 Random Sampling Avoiding regularities Epsem Sampling: Equal probability of selection method Each record has same probability of being chosen Simple random sampling n records are chosen from database of size N such that each set of n records has the same probability of being chosen 26

27 Results of Simple Random Sampling Population True mean =0.5 Samples of size n= 10, 100 and 1000 Repeat procedure 200 times and plot histograms Larger the sample more closely are values of sample mean are distributed about true mean n=10 n=100 n=

28 Variance of Mean of Random Sample If variance of population of size N is σ 2 variance of mean of a simple random sample of size n without replacement is Since second term is small, variance decreases as sample size increases When sample size is doubled standard deviation is is reduced by factor of sqrt(2) Estimate σ 2 from the sample using σ 2 n (1 n N ) (x(i) x) /(n 1)

29 Stratified Random Sampling Split population into non-overlapping subpopulations or strata Advantages: Enables making statements about subpopulations If strata are relatively homogeneous, most of variability is accounted for by differences between strata 29

30 Mean of Stratified Sample Total size of population is N k th stratum has N k elements in it n k are chosen for the sample from this stratum Sample mean within k th stratum is Estimate of Population Mean: x k N k x k N Variance of estimator 30 1 N 2 N 2 k var(x k)

31 Cluster Sampling Hierarchical Data Letters occur in words which lie in sentences grouped into paragraphs occurring in chapters forming books sitting in libraries Simple random sample is difficult To draw a sample of elements Instead draw a sample of units that contain several elements 31

32 Unequal Cluster Sizes Size of random sample from k th cluster is n k Sample mean r x k / n k Overall sampling fraction is f (which is small and ignorable) Variance of r 1 f ( n k ) 2 a 1 a ( s 2 k + r 2 2 n k 2r s k n k ) 32

33 Nothing is certain Conclusion Data mining objective is to make discoveries from data We want to be as confident as we can that our conclusions are correct Fundamental tool is probability Universal language for handling uncertainty Allows us to obtain best estimates even with data inadequacies and small samples 33

### Lecture 13: Kolmogorov Smirnov Test & Power of Tests

Lecture 13: Kolmogorov Smirnov Test & Power of Tests S. Massa, Department of Statistics, University of Oxford 2 February 2016 An example Suppose you are given the following 100 observations. -0.16-0.68-0.32-0.85

### The t Test. April 3-8, exp (2πσ 2 ) 2σ 2. µ ln L(µ, σ2 x) = 1 σ 2. i=1. 2σ (σ 2 ) 2. ˆσ 2 0 = 1 n. (x i µ 0 ) 2 = i=1

The t Test April 3-8, 008 Again, we begin with independent normal observations X, X n with unknown mean µ and unknown variance σ The likelihood function Lµ, σ x) exp πσ ) n/ σ x i µ) Thus, ˆµ x ln Lµ,

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### Lecture 2: Statistical Estimation and Testing

Bioinformatics: In-depth PROBABILITY AND STATISTICS Spring Semester 2012 Lecture 2: Statistical Estimation and Testing Stefanie Muff (stefanie.muff@ifspm.uzh.ch) 1 Problems in statistics 2 The three main

### Null Hypothesis Significance Testing Signifcance Level, Power, t-tests Spring 2014 Jeremy Orloff and Jonathan Bloom

Null Hypothesis Significance Testing Signifcance Level, Power, t-tests 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom Simple and composite hypotheses Simple hypothesis: the sampling distribution is

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### Testing: is my coin fair?

Testing: is my coin fair? Formally: we want to make some inference about P(head) Try it: toss coin several times (say 7 times) Assume that it is fair ( P(head)= ), and see if this assumption is compatible

### Lecture Topic 6: Chapter 9 Hypothesis Testing

Lecture Topic 6: Chapter 9 Hypothesis Testing 9.1 Developing Null and Alternative Hypotheses Hypothesis testing can be used to determine whether a statement about the value of a population parameter should

### Examination 110 Probability and Statistics Examination

Examination 0 Probability and Statistics Examination Sample Examination Questions The Probability and Statistics Examination consists of 5 multiple-choice test questions. The test is a three-hour examination

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### FALL 2005 EXAM C SOLUTIONS

FALL 005 EXAM C SOLUTIONS Question #1 Key: D S ˆ(300) = 3/10 (there are three observations greater than 300) H ˆ (300) = ln[ S ˆ (300)] = ln(0.3) = 1.0. Question # EX ( λ) = VarX ( λ) = λ µ = v = E( λ)

### Sociology 6Z03 Topic 15: Statistical Inference for Means

Sociology 6Z03 Topic 15: Statistical Inference for Means John Fox McMaster University Fall 2016 John Fox (McMaster University) Soc 6Z03: Statistical Inference for Means Fall 2016 1 / 41 Outline: Statistical

### Statistics for Management II-STAT 362-Final Review

Statistics for Management II-STAT 362-Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### MCQ TESTING OF HYPOTHESIS

MCQ TESTING OF HYPOTHESIS MCQ 13.1 A statement about a population developed for the purpose of testing is called: (a) Hypothesis (b) Hypothesis testing (c) Level of significance (d) Test-statistic MCQ

### Chi-Square & F Distributions

Chi-Square & F Distributions Carolyn J. Anderson EdPsych 580 Fall 2005 Chi-Square & F Distributions p. 1/55 Chi-Square & F Distributions... and Inferences about Variances The Chi-square Distribution Definition,

### Lecture 1: t tests and CLT

Lecture 1: t tests and CLT http://www.stats.ox.ac.uk/ winkel/phs.html Dr Matthias Winkel 1 Outline I. z test for unknown population mean - review II. Limitations of the z test III. t test for unknown population

### The Chi-Square Distributions

MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation " of a normally distributed measurement and to test the goodness

### Hypothesis Testing hypothesis testing approach formulation of the test statistic

Hypothesis Testing For the next few lectures, we re going to look at various test statistics that are formulated to allow us to test hypotheses in a variety of contexts: In all cases, the hypothesis testing

### Inferences About Differences Between Means Edpsy 580

Inferences About Differences Between Means Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Inferences About Differences Between Means Slide

### ORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 8 Hypothesis Testing Robert Vanderbei Fall 2015 Slides last edited on December 11, 2015 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

### Statistical foundations of machine learning

Machine learning p. 1/45 Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Machine learning p. 2/45

### I. Basics of Hypothesis Testing

Introduction to Hypothesis Testing This deals with an issue highly similar to what we did in the previous chapter. In that chapter we used sample information to make inferences about the range of possibilities

### Biostatistics Lab Notes

Biostatistics Lab Notes Page 1 Lab 1: Measurement and Sampling Biostatistics Lab Notes Because we used a chance mechanism to select our sample, each sample will differ. My data set (GerstmanB.sav), looks

### Chapter 7. Section Introduction to Hypothesis Testing

Section 7.1 - Introduction to Hypothesis Testing Chapter 7 Objectives: State a null hypothesis and an alternative hypothesis Identify type I and type II errors and interpret the level of significance Determine

### c. The factor is the type of TV program that was watched. The treatment is the embedded commercials in the TV programs.

STAT E-150 - Statistical Methods Assignment 9 Solutions Exercises 12.8, 12.13, 12.75 For each test: Include appropriate graphs to see that the conditions are met. Use Tukey's Honestly Significant Difference

### MAT X Hypothesis Testing - Part I

MAT 2379 3X Hypothesis Testing - Part I Definition : A hypothesis is a conjecture concerning a value of a population parameter (or the shape of the population). The hypothesis will be tested by evaluating

### Math 225. Chi-Square Tests. 1. A Chi-Square Goodness-of-Fit Test. Conditions. Chi-Square Distribution. The Test Statistic

Chi-Square Tests Math 5 Chapter 11 In this chapter, you will learn about two chi-square tests: Goodness of fit. Are the proportions of the different outcomes in this population equal to the hypothesized

### Unit 21 Student s t Distribution in Hypotheses Testing

Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between

### Hypothesis Testing with One Sample. Introduction to Hypothesis Testing 7.1. Hypothesis Tests. Chapter 7

Chapter 7 Hypothesis Testing with One Sample 71 Introduction to Hypothesis Testing Hypothesis Tests A hypothesis test is a process that uses sample statistics to test a claim about the value of a population

### Power and Sample Size Determination

Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 Power 1 / 31 Experimental Design To this point in the semester,

### Factorial Analysis of Variance

Chapter 560 Factorial Analysis of Variance Introduction A common task in research is to compare the average response across levels of one or more factor variables. Examples of factor variables are income

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### AP STATISTICS 2009 SCORING GUIDELINES (Form B)

AP STATISTICS 2009 SCORING GUIDELINES (Form B) Question 5 Intent of Question The primary goals of this question were to assess students ability to (1) state the appropriate hypotheses, (2) identify and

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### Quantitative Biology Lecture 5 (Hypothesis Testing)

15 th Oct 2015 Quantitative Biology Lecture 5 (Hypothesis Testing) Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Classification Errors Statistical significance T-tests Q-values (Traditional)

### Classical Hypothesis Testing and GWAS

Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 15, 2014 Outline General Principles Classical statistical hypothesis testing involves the test of a null hypothesis

### MATH4427 Notebook 3 Spring MATH4427 Notebook Hypotheses: Definitions and Examples... 3

MATH4427 Notebook 3 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MATH4427 Notebook 3 3 3.1 Hypotheses: Definitions and Examples............................

### Nonparametric tests, Bootstrapping

Nonparametric tests, Bootstrapping http://www.isrec.isb-sib.ch/~darlene/embnet/ Hypothesis testing review 2 competing theories regarding a population parameter: NULL hypothesis H ( straw man ) ALTERNATIVEhypothesis

### 1. Chi-Squared Tests

1. Chi-Squared Tests We'll now look at how to test statistical hypotheses concerning nominal data, and specifically when nominal data are summarized as tables of frequencies. The tests we will considered

### Hypothesis Testing. April 21, 2009

Hypothesis Testing April 21, 2009 Your Claim is Just a Hypothesis I ve never made a mistake. Once I thought I did, but I was wrong. Your Claim is Just a Hypothesis Confidence intervals quantify how sure

### Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm.

Homework #3 is due Friday by 5pm. Homework #4 will be posted to the class website later this week. It will be due Friday, March 7 th, at 5pm. Political Science 15 Lecture 12: Hypothesis Testing Sampling

### 1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error.

1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error. 8.5 Goodness of Fit Test Suppose we want to make an inference about

### The Modified Kolmogorov-Smirnov One-Sample Test Statistic

Thailand Statistician July 00; 8() : 43-55 http://statassoc.or.th Contributed paper The Modified Kolmogorov-Smirnov One-Sample Test Statistic Jantra Sukkasem epartment of Mathematics, Faculty of Science,

### Chi-square test Testing for independeny The r x c contingency tables square test

Chi-square test Testing for independeny The r x c contingency tables square test 1 The chi-square distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computer-aided

### Statistical Inference and t-tests

1 Statistical Inference and t-tests Objectives Evaluate the difference between a sample mean and a target value using a one-sample t-test. Evaluate the difference between a sample mean and a target value

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

### Confindence Intervals and Probability Testing

Confindence Intervals and Probability Testing PO7001: Quantitative Methods I Kenneth Benoit 3 November 2010 Using probability distributions to assess sample likelihoods Recall that using the µ and σ from

### Water Quality Problem. Hypothesis Testing of Means. Water Quality Example. Water Quality Example. Water quality example. Water Quality Example

Water Quality Problem Hypothesis Testing of Means Dr. Tom Ilvento FREC 408 Suppose I am concerned about the quality of drinking water for people who use wells in a particular geographic area I will test

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### 1 Confidence intervals

Math 143 Inference for Means 1 Statistical inference is inferring information about the distribution of a population from information about a sample. We re generally talking about one of two things: 1.

### 4. Sum the results of the calculation described in step 3 for all classes of progeny

F09 Biol 322 chi square notes 1. Before proceeding with the chi square calculation, clearly state the genetic hypothesis concerning the data. This hypothesis is an interpretation of the data that gives

### Advanced Techniques for Mobile Robotics Statistical Testing

Advanced Techniques for Mobile Robotics Statistical Testing Wolfram Burgard, Cyrill Stachniss, Kai Arras, Maren Bennewitz Statistical Testing for Evaluating Experiments Deals with the relationship between

### Analysis of numerical data S4

Basic medical statistics for clinical and experimental research Analysis of numerical data S4 Katarzyna Jóźwiak k.jozwiak@nki.nl 3rd November 2015 1/42 Hypothesis tests: numerical and ordinal data 1 group:

### Statistiek (WISB361)

Statistiek (WISB361) Final exam June 29, 2015 Schrijf uw naam op elk in te leveren vel. Schrijf ook uw studentnummer op blad 1. The maximum number of points is 100. Points distribution: 23 20 20 20 17

### Business Statistics. Lecture 8: More Hypothesis Testing

Business Statistics Lecture 8: More Hypothesis Testing 1 Goals for this Lecture Review of t-tests Additional hypothesis tests Two-sample tests Paired tests 2 The Basic Idea of Hypothesis Testing Start

### One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots

1 / 27 One-sample normal hypothesis Testing, paired t-test, two-sample normal inference, normal probability plots Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis

### For eg:- The yield of a new paddy variety will be 3500 kg per hectare scientific hypothesis. In Statistical language if may be stated as the random

Lecture.9 Test of significance Basic concepts null hypothesis alternative hypothesis level of significance Standard error and its importance steps in testing Test of Significance Objective To familiarize

### STT 430/630/ES 760 Lecture Notes: Chapter 6: Hypothesis Testing 1. February 23, 2009 Chapter 6: Introduction to Hypothesis Testing

STT 430/630/ES 760 Lecture Notes: Chapter 6: Hypothesis Testing 1 February 23, 2009 Chapter 6: Introduction to Hypothesis Testing One of the primary uses of statistics is to use data to infer something

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Homework Zero. Max H. Farrell Chicago Booth BUS41100 Applied Regression Analysis. Complete before the first class, do not turn in

Homework Zero Max H. Farrell Chicago Booth BUS41100 Applied Regression Analysis Complete before the first class, do not turn in This homework is intended as a self test of your knowledge of the basic statistical

### Chi-Square Distribution. is distributed according to the chi-square distribution. This is usually written

Chi-Square Distribution If X i are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable is distributed according to the chi-square distribution. This

Math 1 MPS Homework 6 Answers to additional problems 1. What are the two types of hypotheses used in a hypothesis test? How are they related? Ho: Null Hypotheses A statement about a population parameter

### Median of the p-value Under the Alternative Hypothesis

Median of the p-value Under the Alternative Hypothesis Bhaskar Bhattacharya Department of Mathematics, Southern Illinois University, Carbondale, IL, USA Desale Habtzghi Department of Statistics, University

### Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

### TESTING OF HYPOTHESIS

TESTING OF HYPOTHESIS Rajender Parsad I.A.S.R.I., Library Avenue, New Delhi 11 12 rajender@iasri.res.in In applied investigations or in experimental research, one may wish to estimate the yield of a new

### Hypothesis Testing. Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University

Hypothesis Testing Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 AMU / Bon-Tech, LLC, Journi-Tech Corporation Copyright 2015 Learning Objectives Upon successful

### CHAPTER 9 HYPOTHESIS TESTING

CHAPTER 9 HYPOTHESIS TESTING A hypothesis test is a formal way to make a decision based on statistical analysis. A hypothesis test has the following general steps: Set up two contradictory hypotheses.

### Statistics revision. Dr. Inna Namestnikova. Statistics revision p. 1/8

Statistics revision Dr. Inna Namestnikova inna.namestnikova@brunel.ac.uk Statistics revision p. 1/8 Introduction Statistics is the science of collecting, analyzing and drawing conclusions from data. Statistics

### Intro to Parametric Statistics,

Descriptive Statistics vs. Inferential Statistics vs. Population Parameters Intro to Parametric Statistics, Assumptions & Degrees of Freedom Some terms we will need Normal Distributions Degrees of freedom

### NPTEL STRUCTURAL RELIABILITY

NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Power & Effect Size power Effect Size

Power & Effect Size Until recently, researchers were primarily concerned with controlling Type I errors (i.e. finding a difference when one does not truly exist). Although it is important to make sure

### Multiple One-Sample or Paired T-Tests

Chapter 610 Multiple One-Sample or Paired T-Tests Introduction This chapter describes how to estimate power and sample size (number of arrays) for paired and one sample highthroughput studies using the.

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### Hypothesis Testing. Concept of Hypothesis Testing

Quantitative Methods 2013 Hypothesis Testing with One Sample 1 Concept of Hypothesis Testing Testing Hypotheses is another way to deal with the problem of making a statement about an unknown population

### AP Statistics 2008 Scoring Guidelines Form B

AP Statistics 2008 Scoring Guidelines Form B The College Board: Connecting Students to College Success The College Board is a not-for-profit membership association whose mission is to connect students

### Confidence intervals, t tests, P values

Confidence intervals, t tests, P values Joe Felsenstein Department of Genome Sciences and Department of Biology Confidence intervals, t tests, P values p.1/31 Normality Everybody believes in the normal

### Chapter 3: Nonparametric Tests

B. Weaver (15-Feb-00) Nonparametric Tests... 1 Chapter 3: Nonparametric Tests 3.1 Introduction Nonparametric, or distribution free tests are so-called because the assumptions underlying their use are fewer

### COMP6053 lecture: Hypothesis testing, t-tests, p-values, type-i and type-ii errors.

COMP6053 lecture: Hypothesis testing, t-tests, p-values, type-i and type-ii errors jn2@ecs.soton.ac.uk The t-test This lecture introduces the t-test -- our first real statistical test -- and the related

### AP Statistics 2013 Scoring Guidelines

AP Statistics 2013 Scoring Guidelines The College Board The College Board is a mission-driven not-for-profit organization that connects students to college success and opportunity. Founded in 1900, the

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### PASS Sample Size Software. Linear Regression

Chapter 855 Introduction Linear regression is a commonly used procedure in statistical analysis. One of the main objectives in linear regression analysis is to test hypotheses about the slope (sometimes

### Two-Sample T-Test from Means and SD s

Chapter 07 Two-Sample T-Test from Means and SD s Introduction This procedure computes the two-sample t-test and several other two-sample tests directly from the mean, standard deviation, and sample size.

### The Purpose of Hypothesis Testing

Section 8 1A: An Introduction to Hypothesis Testing The Purpose of Hypothesis Testing Seeʼs Candy states that a box of itʼs candy weighs 16 oz. They do not mean that every single box weights exactly 16

### 6.1 The Elements of a Test of Hypothesis

University of California, Davis Department of Statistics Summer Session II Statistics 13 August 22, 2012 Date of latest update: August 20 Lecture 6: Tests of Hypothesis Suppose you wanted to determine

### Hypothesis testing. Null hypothesis H 0 and alternative hypothesis H 1.

Hypothesis testing Null hypothesis H 0 and alternative hypothesis H 1. Hypothesis testing Null hypothesis H 0 and alternative hypothesis H 1. Simple and compound hypotheses. Simple : the probabilistic

### Inference in Regression Analysis

Yang Feng Inference in the Normal Error Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of

### tests whether there is an association between the outcome variable and a predictor variable. In the Assistant, you can perform a Chi-Square Test for

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. In practice, quality professionals sometimes

### Formulas and Tables by Mario F. Triola

Formulas and Tables by Mario F. Triola Copyright 010 Pearson Education, Inc. Ch. 3: Descriptive Statistics x Mean f # x x f Mean (frequency table) 1x - x s B n - 1 Standard deviation n 1 x - 1 x Standard

### 0.1 Solutions to CAS Exam ST, Fall 2015

0.1 Solutions to CAS Exam ST, Fall 2015 The questions can be found at http://www.casact.org/admissions/studytools/examst/f15-st.pdf. 1. The variance equals the parameter λ of the Poisson distribution for

### Two-Variable Regression: Interval Estimation and Hypothesis Testing

Two-Variable Regression: Interval Estimation and Hypothesis Testing Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Confidence Intervals & Hypothesis Testing

### A) to D) to B) to E) to C) to Rate of hay fever per 1000 population for people under 25

Unit 0 Review #3 Name: Date:. Suppose a random sample of 380 married couples found that 54 had two or more personality preferences in common. In another random sample of 573 married couples, it was found