Lecture 2: Statistical Estimation and Testing
1 Bioinformatics: In-depth PROBABILITY AND STATISTICS, Spring Semester 2012. Lecture 2: Statistical Estimation and Testing. Stefanie Muff
2 Problems in statistics
3 The three main questions in statistics are:
Estimation: estimate the unknown value of θ, given observations of X. Question: what is the most likely value of θ?
Testing: test a hypothesis about the unknown value of θ, and base acceptance or rejection on the observation of X. Question: is my hypothesis compatible with the observed data?
Confidence intervals: give an interval of parameter values that explain the data reasonably well. Question: which parameter values would be compatible with my data?
We will concentrate on the first two questions.
5 Given: a probability model $X \sim P_\theta$. For example, $X \sim \mathrm{Bin}(100, p)$, but the probability p is unknown. How can we obtain a guess of p? => Estimation! The collection $x_1, x_2, \dots, x_n$ is called the (observed) sample of $X_1, X_2, \dots, X_n$.
6 Estimator, estimate
7 Examples of Estimators
8 Desirable properties of estimators
10 Likelihood function for discrete RVs
11 Likelihood function for continuous RVs. The likelihood function for continuous random variables can be set equal to the density function, $L(x_1, x_2, \dots, x_n; \hat\theta) = f_X(x_1, x_2, \dots, x_n; \hat\theta)$, where $f_X$ is the joint density of $(X_1, X_2, \dots, X_n)$. If $X_1, X_2, \dots, X_n$ are independent, $L(x_1, x_2, \dots, x_n; \hat\theta) = f_{X_1}(x_1; \hat\theta)\, f_{X_2}(x_2; \hat\theta) \cdots f_{X_n}(x_n; \hat\theta)$.
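A minimal sketch of the product formula, written in Python rather than the R used in the course. It assumes, purely for illustration, that each $X_i$ is normal with a candidate mean `mu` and a fixed standard deviation 0.2; the data values are invented. Summing log densities gives the same quantity on the log scale, which avoids numerical underflow when n is large.

```python
import math
from statistics import NormalDist

def likelihood(xs, mu, sigma):
    """Likelihood of independent N(mu, sigma^2) observations: product of the densities."""
    dist = NormalDist(mu, sigma)
    return math.prod(dist.pdf(x) for x in xs)

def log_likelihood(xs, mu, sigma):
    """The same quantity on the log scale: sum of the log densities."""
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in xs)

xs = [4.9, 5.3, 5.1, 4.7, 5.0]  # hypothetical observations
for mu in (4.5, 5.0, 5.5):       # candidate parameter values
    print(mu, likelihood(xs, mu, 0.2), log_likelihood(xs, mu, 0.2))
```

The candidate `mu = 5.0`, the sample mean, gives the largest likelihood of the three.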
12 Maximum likelihood estimator
13 Maximum likelihood estimate
14 Properties of MLEs
15 Example: ML for the binomial distribution
17 Compare this to the estimators on Slide 7: the ML estimator … is …
18 The Log Likelihood
19 Likelihoods are not just for independent observations!
20 Example: Log likelihood for the binomial distribution. Instead of optimizing the likelihood itself, the log likelihood
$\left[x_1 \log\theta + (100 - x_1)\log(1 - \theta)\right] + \dots + \left[x_n \log\theta + (100 - x_n)\log(1 - \theta)\right] = \log(\theta)\sum_i x_i + \log(1 - \theta)\sum_i (100 - x_i)$
is optimized to obtain the ML estimator. The result is exactly the same as in the non-log case (check as an exercise).
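As a quick numerical check of the exercise, here is a sketch that maximizes the log likelihood above over a fine grid of θ values and compares the result with the value obtained by setting the derivative to zero, $\hat\theta = \sum_i x_i / (100n)$. The counts are hypothetical, and the grid search stands in for a proper optimizer.

```python
import math

def binom_log_likelihood(theta, xs, m=100):
    """Log likelihood of independent Bin(m, theta) counts xs, up to an additive
    constant (the binomial coefficients do not depend on theta)."""
    s = sum(xs)
    return math.log(theta) * s + math.log(1 - theta) * (m * len(xs) - s)

xs = [23, 31, 27, 25]                        # hypothetical counts, each out of m = 100
grid = [i / 10000 for i in range(1, 10000)]  # candidate theta values in (0, 1)
theta_grid = max(grid, key=lambda t: binom_log_likelihood(t, xs))
theta_closed = sum(xs) / (100 * len(xs))     # root of the derivative
print(theta_grid, theta_closed)
```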
21 Example: MLE for a normal distribution. Remember: $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$. Given a set of n independent observations $x_1, x_2, \dots, x_n$, the log likelihood is
$\log f(x_1, \dots, x_n; \mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$
This expression has to be differentiated with respect to µ and σ² separately, and the derivatives set to 0. => We obtain two equations to estimate the two parameters. See the example in the exercises.
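Setting $\partial/\partial\mu$ to zero gives $\sum_i (x_i - \mu) = 0$, and setting $\partial/\partial\sigma^2$ to zero gives $-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_i (x_i - \mu)^2 = 0$. Their solutions are the standard results $\hat\mu = \bar{x}$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$. The sketch below computes them on invented data and checks that nearby parameter values give a lower log likelihood.

```python
import math

def normal_mle(xs):
    """Closed-form ML estimates for independent N(mu, sigma^2) observations."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divisor n, not n - 1
    return mu_hat, sigma2_hat

def normal_log_likelihood(xs, mu, sigma2):
    """The log likelihood from the slide, term by term."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu_hat, sigma2_hat = normal_mle(xs)
print(mu_hat, sigma2_hat)  # 5.0 4.0
```

Note that the ML estimate of σ² divides by n, so it differs slightly from the usual sample variance with divisor n - 1.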
22 MLE in practice. Analytical formulas for the ML estimator can be found only in relatively simple models. In other cases, approximate ML estimators can be found by iterative numerical optimization (Expectation-Maximization algorithm, Newton-Raphson algorithm) or by second-order Taylor approximations. These calculations are left to the computer (R).
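To illustrate the Newton-Raphson idea on a model where the answer is known, the sketch below maximizes the binomial log likelihood from slide 20, $\ell(\theta) = s\log\theta + (N - s)\log(1 - \theta)$ with s successes out of N trials, by repeatedly applying $\theta \leftarrow \theta - \ell'(\theta)/\ell''(\theta)$. The step-halving guard that keeps θ inside (0, 1) is an added assumption, not part of the slides; in practice such a routine would be replaced by a library optimizer.

```python
def newton_raphson_binomial(s, N, theta=0.5, tol=1e-10, max_iter=100):
    """Maximize l(theta) = s*log(theta) + (N - s)*log(1 - theta) by Newton-Raphson.

    s: total number of successes, N: total number of trials (requires 0 < s < N).
    """
    for _ in range(max_iter):
        score = s / theta - (N - s) / (1 - theta)              # l'(theta)
        neg_hessian = s / theta**2 + (N - s) / (1 - theta)**2  # -l''(theta) > 0
        step = score / neg_hessian                             # = -l'/l''
        new_theta = theta + step
        while not 0 < new_theta < 1:  # halve the step if it would leave (0, 1)
            step /= 2
            new_theta = theta + step
        theta = new_theta
        if abs(step) < tol:
            break
    return theta

print(newton_raphson_binomial(106, 400, theta=0.1))  # converges to 106 / 400 = 0.265
```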
23 Statistical Testing
25 Introductory example revisited
gaggattacggtactagattcataaa
cactgacacatcactgcactcgctaa
Two DNA sequences of length 26, with matches at 11 of the 26 positions. Is this sufficient to conclude that the two sequences are evolutionarily related? To answer this question, we have to find out how unlikely it would be to see 11 out of 26 matches by chance. We need to know the probability distribution of the random variable describing this experiment; we can then calculate the probability of the event. This is the essence of statistical testing.
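The match count can be checked by comparing the two sequences position by position. This short sketch assumes the two sequences are aligned without gaps, as on the slide.

```python
seq1 = "gaggattacggtactagattcataaa"
seq2 = "cactgacacatcactgcactcgctaa"

# Ungapped alignment: compare the sequences position by position.
matches = sum(a == b for a, b in zip(seq1, seq2))
print(len(seq1), len(seq2), matches)  # 26 26 11
```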
26 Steps in a statistical test:
1. Formulate the null and alternative hypotheses H0 and H1.
2. Determine a test statistic T.
3. Determine the distribution of T under H0.
4. Choose the significance level α.
5. Calculate the critical value C.
6. Obtain the data and decide.
For illustration, we now go through steps 1-6 for the binomial test.
27 1. Formulate the hypotheses. A hypothesis typically specifies the value of a parameter in a distribution. Here: $X \sim \mathrm{Bin}(26, p)$, but p is not known. The null hypothesis H0 is the default hypothesis: $H_0: X \sim \mathrm{Bin}(26, p),\ p = 0.25$. The alternative hypothesis H1 is the controversial hypothesis; strong evidence is needed to accept it in place of H0: $H_1: X \sim \mathrm{Bin}(26, p),\ p > 0.25$. Aim of a test: to find evidence against H0 in order to reject it.
28 2. Determine a test statistic. A test statistic T is a numerical value that can be determined from the outcome of a chance experiment. Note that, by definition, T is a random variable as well! Here, T = number of matches between the two sequences (= X), and there is only one realization. Usually there is more than one realization in a random sample, and the test statistic depends on all of them. Other examples:
$T(X_1, \dots, X_n) = \frac{X_1 + \dots + X_n}{n} = \bar{X}$ (mean)
$T(X_1, \dots, X_n) = \frac{\bar{X} - \mu_0}{\hat\sigma / \sqrt{n}}$ (t-statistic)
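Both example statistics are simple functions of the sample. The sketch below computes them for invented data, assuming that $\hat\sigma$ is the usual sample standard deviation with divisor n - 1 (what `statistics.stdev` returns) and taking a hypothetical $\mu_0 = 2$.

```python
import math
from statistics import mean, stdev

def t_statistic(xs, mu0):
    """(mean - mu0) / (sigma_hat / sqrt(n)), sigma_hat = sample SD with divisor n - 1."""
    return (mean(xs) - mu0) / (stdev(xs) / math.sqrt(len(xs)))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical realizations X_1, ..., X_n
print(mean(xs), t_statistic(xs, mu0=2.0))
```

Because T is computed from random variables, a new sample would give different values of both statistics.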
29 3. Distribution of T under H0. Under H0 (pure chance alignment), the distribution of T is $T \sim \mathrm{Bin}(26, 0.25)$. (Note that in reality Bin(26, p) is not the right distribution for this problem; we only use it to illustrate the idea of statistical testing.)
30 4. Choose the significance level α. In our example we reject H0 if the number of matches is so high that it is unlikely to happen by chance; α determines what "unlikely" means. Let us choose α = 0.05. The significance level α fixes the probability with which H0 is rejected although it is true. Interpretation: if H0 is true, then in (at most) 5% of the cases (1 out of 20) we will find a value of T so high that we do not believe it has happened by chance, although it did! α = probability of rejecting a true null hypothesis = probability of making a type I error.
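31 5. Calculate the critical value. We now calculate a value C for the test statistic T, at or above which we consider it unlikely that H0 is true: $P(T \geq C \mid H_0) \leq \alpha$, taking the smallest such C (for a discrete T, equality with α usually cannot be reached exactly). In our example, with $H_0: T = X \sim \mathrm{Bin}(26, 0.25)$:
$P(X \geq 7 \mid H_0) = \dots$
$P(X \geq 8 \mid H_0) = \dots$
$P(X \geq 9 \mid H_0) = \dots$
$P(X \geq 10 \mid H_0) = \dots$
$P(X \geq 11 \mid H_0) = \dots$
=> C = 11! (Each tail probability is calculated as $P(X \geq k \mid H_0) = \sum_{i=k}^{26} \binom{26}{i} 0.25^i\, 0.75^{26-i}$.)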
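The tail sums and the search for C can be sketched directly from the formula. This is an illustrative implementation, not the course's own code; in R, the same tail is `pbinom(k - 1, 26, 0.25, lower.tail = FALSE)`.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha, for H0: X ~ Bin(n, p0)."""
    c = 0
    while upper_tail(c, n, p0) > alpha:  # terminates: P(X >= n + 1) = 0
        c += 1
    return c

for k in range(7, 12):
    print(k, round(upper_tail(k, 26, 0.25), 4))
print("C =", critical_value(26, 0.25, 0.05))
```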
32 6. Decide. Only now are we allowed to calculate the value of T. Here, we already know that T = 11, since T = X. From step 5 we have the following rule: reject H0 if T ≥ 11, and do not reject H0 if T < 11. Decision: we reject H0. Thus we do not believe that 11 out of 26 matches happened by chance. We say: there is statistical evidence that the two sequences are evolutionarily related.
33 Statistical significance. Note: the decision to reject H0 on the previous slide depends on the significance level α. We would not have rejected H0 if α < 0.04! Whether the outcome of an experiment is statistically significant or not depends crucially on α. For α = 1 any result is significant (but meaningless). Scientific results that claim statistical significance without stating α should be treated with suspicion.
34 p-values. The p-value is the probability, under H0, of observing something at least as extreme as what was actually observed. It depends on the data. In our example: $P(X \geq 11 \mid H_0) = \dots$ Thus the p-value of our experiment is p = … Many statistics programs (R, SPSS, ...) compute it directly. Your results are then significant if p < α. Interpretation: the p-value tells you for which α your data would be significant.
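A sketch of the p-value for the sequence example and of the "significant if p < α" rule for a few significance levels. The function name is illustrative; in R, `binom.test(11, 26, p = 0.25, alternative = "greater")` reports the same one-sided p-value.

```python
from math import comb

def binomial_p_value(x_obs, n, p0):
    """One-sided p-value P(X >= x_obs | H0) for H0: X ~ Bin(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(x_obs, n + 1))

p = binomial_p_value(11, 26, 0.25)
print(round(p, 4))
for alpha in (0.05, 0.04, 0.01):
    print(alpha, "reject H0" if p < alpha else "do not reject H0")
```

The p-value lies just above 0.04, which is why the previous slide notes that H0 would not have been rejected for α < 0.04.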
35 Type I and type II errors. The type I error depends on the significance level α: it consists of rejecting the null hypothesis although it is true. The probability of a type I error is $P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha$. The type II error is the other kind of false decision: the null hypothesis is not rejected although it is wrong. Its probability is $\beta = P(\text{do not reject } H_0 \mid H_1 \text{ true})$.
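The type I error probability can also be seen by simulation. The sketch below repeats the sequence experiment many times under H0, with each draw from Bin(26, 0.25) built from 26 Bernoulli trials, and records how often the rule "reject H0 if X ≥ 11" fires. The number of repetitions and the seed are arbitrary choices. Because X is discrete, the rejection rate is close to $P(X \geq 11 \mid H_0) \approx 0.04$ rather than exactly α = 0.05.

```python
import random

def simulated_type_one_error(c=11, n=26, p0=0.25, reps=20000, seed=1):
    """Fraction of experiments simulated under H0: X ~ Bin(n, p0) in which
    the rule 'reject H0 if X >= c' rejects, i.e. makes a type I error."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = sum(rng.random() < p0 for _ in range(n))  # one draw from Bin(n, p0)
        if x >= c:
            rejections += 1
    return rejections / reps

print(simulated_type_one_error())  # expected to be near P(X >= 11 | H0), about 0.04
```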
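36 The power of a test. The power $1 - \beta$ is the probability of rejecting H0 when H1 is true. It is typically more complicated to compute, especially if H1 does not specify a single parameter value (such as p > 0.25).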
37 Example
38 BUT if we had chosen α = 0.01, the power $1 - \beta$ would be lower! E.g. $1 - \beta = P(X \geq 11 \mid p = 0.26) = \dots$, $\beta = P(X < 11 \mid p = 0.3) = \dots$
39 Fact: for a fixed sample size, decreasing the type I error probability comes at the expense of increasing the type II error probability, and vice versa. There is a trade-off between a low significance level α and a high power 1 - β.
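The trade-off can be made concrete for the sequence example. The sketch below takes p = 0.3 as a hypothetical alternative, computes the critical value C for α = 0.05 and α = 0.01, and then the power $1 - \beta = P(X \geq C \mid p = 0.3)$. The helper names are illustrative.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c

n, p0, p1 = 26, 0.25, 0.3  # p1 is a hypothetical alternative
for alpha in (0.05, 0.01):
    c = critical_value(n, p0, alpha)
    power = upper_tail(c, n, p1)  # 1 - beta = P(reject H0 | p = p1)
    print(f"alpha={alpha}: C={c}, power={power:.3f}, beta={1 - power:.3f}")
```

Lowering α from 0.05 to 0.01 raises the critical value from 11 to 13, so H0 is rejected less often under the alternative as well, and β grows.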
40 [Figure: Bin(20, 0.25) and Bin(20, 0.3) distributions, f(x) against x. Power if H0: p = 0.25, H1: p = 0.3, α = …]
41 [Figure: Bin(20, 0.25) and Bin(20, 0.6) distributions, f(x) against x. Power if H0: p = 0.25, H1: p = 0.6, α = …]
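The two figures compare the same test against a nearby and a distant alternative. The α used in the figures is not recoverable, so this sketch assumes α = 0.05. It computes the critical value for H0: p = 0.25 with n = 20 and then the power against p = 0.3 and p = 0.6.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | H0) <= alpha."""
    c = 0
    while upper_tail(c, n, p0) > alpha:
        c += 1
    return c

alpha = 0.05  # assumption: the alpha value in the figures is not given
c = critical_value(20, 0.25, alpha)
for p1 in (0.3, 0.6):
    print(f"H1: p = {p1}, C = {c}, power = {upper_tail(c, 20, p1):.3f}")
```

With the same α and sample size, the power is low when H1 lies close to H0 and high when the two distributions barely overlap.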
42 Bayesian Hypothesis Testing. Remember Bayes' theorem: $P(A_j \mid B) = \frac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)}$. Example (from Ewens/Grant): a bag contains 10 coins, of which only 3 are fair. The other 7 show heads with probability $p_h = 0.6$. Take one coin at random and flip it five times. All five flips give heads (event D). Then:
$P(H) = 0.3$ (prior probability that the coin is fair)
$P(H^c) = 0.7$ (prior probability that the coin is unfair)
$P(D \mid H) = 0.5^5$
$P(D \mid H^c) = 0.6^5$
43 Now the posterior probability that the coin was fair, given the outcome, can be calculated:
$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D \mid H)\, P(H) + P(D \mid H^c)\, P(H^c)} = 0.147$
This is lower than the prior probability of H (0.3), so the data are evidence against H. Moreover: $P(H^c \mid D) = 1 - 0.147 = 0.853$. So there is a much higher posterior probability (given the outcome and the prior) that the coin I picked was unfair. The same setup works with multiple hypotheses H1, H2, ..., Hn: identical calculations lead to posterior probabilities, and the hypothesis with the highest posterior is chosen.
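The calculation extends to any finite set of mutually exclusive hypotheses: multiply each prior by its likelihood and normalize by their sum, which is P(D). A minimal sketch of the coin example, with the function name chosen for illustration:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of mutually exclusive hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(D), by the law of total probability
    return [j / total for j in joint]

priors = [0.3, 0.7]             # fair coin, unfair coin
likelihoods = [0.5**5, 0.6**5]  # P(5 heads | fair), P(5 heads | unfair)
post = posterior(priors, likelihoods)
print([round(p, 3) for p in post])  # [0.147, 0.853]
```

Choosing the hypothesis with the largest entry of `post` is the decision rule described on the slide.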
44 Other statistical tests. There is a large variety of statistical tests. The choice of the correct test depends on the type and quality of the data, the assumptions, and the question to be answered. Examples: z-test, t-test, sign test, Wilcoxon test, Mann-Whitney / U-test, χ² goodness-of-fit test, χ² test for independence, ...
45 The z-test. The simplest version of a z-test:
One-sample problem. Situation: given n independent measurements $X_i$, $1 \leq i \leq n$. Question: can the expected value $E[X] = \mu$ be equal to, larger than, or lower than some theoretical value $\mu_0$?
Paired two-sample problem. Situation: given n independent measurements $Y_i$ and $Z_i$, $1 \leq i \leq n$, of the same feature in two different states. E.g., the blood pressure of each person is measured before and after the intake of a special drug. Question: is there a significant difference between the two states? I.e., is the difference $X_i = Y_i - Z_i \neq 0$ (or $< 0$, $> 0$), or equivalently, is $E[X] \neq 0$?
46 Assumptions. In the z-test it is assumed that $X_i \sim N(\mu_X, \sigma^2)$, so the measurements should follow a normal distribution. Moreover, the variance $\sigma^2$ of $X_i$ is known.
47 1. Hypotheses
$H_0: X_i \sim N(\mu_0, \sigma_0^2)$, $1 \leq i \leq n$, independent, with known variance $\sigma_0^2$
$H_1: X_i \sim N(\mu_1, \sigma_0^2)$, $1 \leq i \leq n$, independent, with known variance $\sigma_0^2$, with either $\mu_1 > \mu_0$, $\mu_1 < \mu_0$ or $\mu_1 \neq \mu_0$
2. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma_0 / \sqrt{n}}$
3. Distribution of Z under H0: $Z \sim N(0, 1)$
48 4. Choose the significance level α, e.g. α = 5% (or a lower level if stronger significance is needed).
5. Calculate the critical value. The values can be looked up in a table. The most important ones (for the α = 5% level) are:
$\mu_1 > \mu_0$: c = 1.64 (in R: > qnorm(0.95)) => H0 is rejected if Z > 1.64
$\mu_1 < \mu_0$: c = -1.64 (in R: > qnorm(0.05)) => H0 is rejected if Z < -1.64
$\mu_1 \neq \mu_0$: c = 1.96 (in R: > qnorm(0.975)) => H0 is rejected if |Z| > 1.96
Where do these values come from...?
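The critical values are quantiles of N(0, 1). Python's `statistics.NormalDist.inv_cdf` plays the role of R's `qnorm` here. The sketch also computes Z for invented paired differences with an assumed known $\sigma_0 = 3$ and applies the two-sided rule.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)
alpha = 0.05
c_greater = std_normal.inv_cdf(1 - alpha)        # mu1 > mu0, like qnorm(0.95)
c_less = std_normal.inv_cdf(alpha)               # mu1 < mu0, like qnorm(0.05)
c_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # mu1 != mu0, like qnorm(0.975)

def z_statistic(xs, mu0, sigma0):
    """Z = (mean - mu0) / (sigma0 / sqrt(n)) with known standard deviation sigma0."""
    n = len(xs)
    return (sum(xs) / n - mu0) / (sigma0 / math.sqrt(n))

diffs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical paired differences X_i = Y_i - Z_i
z = z_statistic(diffs, mu0=0.0, sigma0=3.0)
print(round(c_greater, 2), round(c_less, 2), round(c_two_sided, 2))  # 1.64 -1.64 1.96
print(z, "reject H0" if abs(z) > c_two_sided else "do not reject H0")
```

For the two-sided test, α is split equally between the two tails, which is why the 0.975 quantile is used instead of the 0.95 quantile.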
49 One-sided tests ($\mu_1 > \mu_0$ and $\mu_1 < \mu_0$) [Figure: N(0,1) density f(x) against x, with a 5% rejection range in the upper and the lower tail, respectively]
50 Two-sided test ($\mu_1 \neq \mu_0$) [Figure: N(0,1) density f(x) against x, with a 2.5% rejection range in each tail]
More informationHaving a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.
Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal
More informationTests for Two Proportions
Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics
More information1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationThe normal approximation to the binomial
The normal approximation to the binomial The binomial probability function is not useful for calculating probabilities when the number of trials n is large, as it involves multiplying a potentially very
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationStatistiek I. Proportions aka Sign Tests. John Nerbonne. CLCG, Rijksuniversiteit Groningen. http://www.let.rug.nl/nerbonne/teach/statistiek-i/
Statistiek I Proportions aka Sign Tests John Nerbonne CLCG, Rijksuniversiteit Groningen http://www.let.rug.nl/nerbonne/teach/statistiek-i/ John Nerbonne 1/34 Proportions aka Sign Test The relative frequency
More informationLecture 8. Confidence intervals and the central limit theorem
Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of
More informationNonparametric Two-Sample Tests. Nonparametric Tests. Sign Test
Nonparametric Two-Sample Tests Sign test Mann-Whitney U-test (a.k.a. Wilcoxon two-sample test) Kolmogorov-Smirnov Test Wilcoxon Signed-Rank Test Tukey-Duckworth Test 1 Nonparametric Tests Recall, nonparametric
More informationPoint Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationSCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationNonparametric Statistics
Nonparametric Statistics References Some good references for the topics in this course are 1. Higgins, James (2004), Introduction to Nonparametric Statistics 2. Hollander and Wolfe, (1999), Nonparametric
More informationThe Wilcoxon Rank-Sum Test
1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We
More informationSolutions: Problems for Chapter 3. Solutions: Problems for Chapter 3
Problem A: You are dealt five cards from a standard deck. Are you more likely to be dealt two pairs or three of a kind? experiment: choose 5 cards at random from a standard deck Ω = {5-combinations of
More informationHypothesis Testing. Steps for a hypothesis test:
Hypothesis Testing Steps for a hypothesis test: 1. State the claim H 0 and the alternative, H a 2. Choose a significance level or use the given one. 3. Draw the sampling distribution based on the assumption
More information