# NOTES FOR SUMMER STATISTICS INSTITUTE COURSE COMMON MISTAKES IN STATISTICS SPOTTING THEM AND AVOIDING THEM

Save this PDF as:

Size: px
Start display at page:

Download "NOTES FOR SUMMER STATISTICS INSTITUTE COURSE COMMON MISTAKES IN STATISTICS SPOTTING THEM AND AVOIDING THEM"

## Transcription

1 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE COMMON MISTAKES IN STATISTICS SPOTTING THEM AND AVOIDING THEM Day 2: Important Details about Statistical Inference The devil is in the details (anonymous) MAY 23-26, 2016 Instructor: Martha K. Smith III. Overview of Frequentist Hypothesis Testing 11 Basic elements of most frequentist hypothesis tests 11 Illustration: Large sample z-test 13 Common confusions in terminology and concepts 14 Sampling distribution 16 Role of model assumptions 20 Importance of model assumptions in general 25 Robustness 26 IV. Frequentist Confidence Intervals 28 Point vs interval estimates 28 Illustration: Large sample z-procedure 30 Cautions and common confusions 36, 37, 39, 40, 42, 43 Importance of model assumptions 29, 35, 38, 40, 43 Robustness 40 Variations and trade-offs 40, 41, 42 V. More on Frequentist Hypothesis Tests 44 Illustration: One-sided t-test for a sample mean 45 p-values 49 VI. Misinterpretations and Misuses of p-values (as time permits)56 VII. Type I error and significance level (if time permits) 63 VIII. Pros and cons of setting a significance level (if time permits) 68

2 I. MORE PRECISE DEFINITION OF SIMPLE RANDOM SAMPLE In practice in applying statistical techniques, we re interested in random variables defined on the population under study. Recall the examples mentioned yesterday: 1. In a medical study, the population might be all adults over age 50 who have high blood pressure. 2. In another study, the population might be all hospitals in the U.S. that perform heart bypass surgery. 3. If we re studying whether a certain die is fair or weighted, the population is all possible tosses of the die. In these examples, we might be interested in the following random variables: Example 1: The difference in blood pressure with and without taking a certain drug. Example 2: The number of heart bypass surgeries performed in a particular year, or the number of such surgeries that are successful, or the number in which the patient has complications of surgery, etc. Example 3: The number that comes up on the die. 3 Connection with Independent Random Variables: If we take a sample of units from the population, we have a corresponding sample of values of the random variable. In Example 1: The random variable is difference in blood pressure with and without taking the drug. o Call this random variable Y (upper case Y) The sample of units from the population is a sample of adults over age 50 who have high blood pressure. o Call them person 1, person 2, etc. The corresponding sample of values of the random variable will consist of values we will call y 1, y 2,..., y n (lower case y s), where o n = number of people in the sample; o y 1 = the difference in blood pressures (that is, the value of Y) for the first person in the sample; o y 2 = the difference in blood pressures (that is, the value of Y) for the second person in the sample; 4 o etc.

3 5 6 We can look at this another way, in terms of n random variables Y 1, Y 2,..., Y n, described as follows: The random process for Y 1 is pick the first person in the sample ; the value of Y 1 is the value of Y for that person i.e., y 1. The random process for Y 2 is pick the second person in the sample ; the value of Y 2 is the value of Y for that person i.e., y 2. etc. The difference between using the small y's and the large Y's is that when we use the small y's we are thinking of a fixed sample of size n from the population, but when we use the large Y's, we are thinking of letting the sample vary (but always with size n). Note: The Y i s are sometimes called identically distributed, because they have the same probability distribution (in this example, the distribution of Y). Precise definition of simple random sample of a random variable: "The sample y 1, y 2,..., y n is a simple random sample" means that the associated random variables Y 1, Y 2,..., Y n are independent. Intuitively speaking, "independent" means that the values of any subset of the random variables Y 1, Y 2,..., Y n do not influence the probabilities of the values of the other random variables in the list. Recall: We defined a random sample as one that is chosen by a random process. Where is the random process in the precise definition? Note: To emphasize that the Y i s all have the same distribution, the precise definition is sometimes stated as, Y 1, Y 2,..., Y n are independent, identically distributed, sometimes abbreviated as iid.

4 7 8 Connection with the initial definition of simple random sample Recall the preliminary definition (from Moore and McCabe, Introduction to the Practice of Statistics) given in Simple Random Samples, Part 1: "A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected." Recall Example 3 above: We toss a die; the number that comes up on the die is the value of our random variable Y. In terms of the preliminary definition: o The population is all possible tosses of the die. o A simple random sample is n different tosses. The different tosses of the die are independent events (i.e., what happens in some tosses has no influence on the other tosses), which means that in the precise definition above, the random variables Y 1, Y 2,..., Y n are indeed independent: The numbers that come up in some tosses in no way influence the numbers that come up in other tosses. Compare this with example 2: The population is all hospitals in the U.S. that perform heart bypass surgery. Using the preliminary definition of simple random sample of size n, we end up with n distinct hospitals. This means that when we have chosen the first hospital in our simple random sample, we cannot choose it again to be in our simple random sample. Thus the events "Choose the first hospital in the sample; choose the second hospital in the sample;...," are not independent events: The choice of first hospital restricts the choice of the second and subsequent hospitals in the sample. If we now consider the random variable Y = the number of heart bypass surgeries performed in 2008, then it follows that the random variables Y 1, Y 2,..., Y n are not independent. The Bottom Line: In many cases, the preliminary definition does not coincide with the more precise definition. More specifically, the preliminary definition allows sampling without replacement, whereas the more precise definition requires sampling with replacement.

5 The Bad News: The precise definition is the one used in the mathematical theorems that justify many of the procedures of statistical inference. (More detail later.) The Good News: 1. If the population is large enough, the preliminary definition is close enough for all practical purposes. 2. In many cases where the population is not large enough, there are alternate theorems giving rise to alternate procedures using a finite population correction factor that will work. Unfortunately, the question, "How large is large enough?" does not have a one-size-fits-all answer; in particular, it depends on the procedure. However, the answer is relative to sample size we only need to worry for samples that are large relative to population size 3. In many cases, even if the population is not large enough, there are alternate procedures (known as permutation or randomization or resampling tests) that are applicable. Consequent Problems with Small Populations: 1) Using a large population procedure with a small population is a common mistake. 2) One more difficulty in selecting an appropriate sample, which leads to one more source of uncertainty. 9 II. WHY RANDOM SAMPLING IS IMPORTANT Recall the Myth: "A random sample will be representative of the population". A slightly better explanation (partly true but partly Urban Legend): "Random sampling prevents bias by giving all individuals an equal chance to be chosen." The element of truth: Random sampling does eliminate systematic bias. A practical rationale: This explanation is often the best plausible explanation that is acceptable to someone with little mathematical background. However, this statement could easily be misinterpreted as the myth above. An additional, very important, reason why random sampling is important, at least in frequentist statistical procedures, which are those most often taught (especially in introductory classes) and used: The Real Reason: The mathematical theorems that justify most parametric frequentist statistical procedures apply only to truly (suitably) random samples. The next section elaborates. 10

6 11 12 III. OVERVIEW OF FREQUENTIST HYPOTHESIS TESTING Type of Situation where a Hypothesis Test is used: We suspect a certain pattern in a certain situation. But we realize that natural variability or imperfect measurement might produce an apparent pattern that isn t really there. Note: The sampling distribution is the probability distribution of the test statistic, when considering all possible suitably random samples of the same size. (More later.) The exact details of these four elements will depend on the particular hypothesis test. In particular, the form of the sampling distribution will depend on the hypothesis test. Basic Elements of Most Frequentist Hypothesis Tests: Most commonly-used ( parametric ), frequentist hypothesis tests involve the following four elements: i. Model assumptions ii. iii. Null and alternative hypotheses A test statistic This is something that a. Is calculated by a rule from a sample; b. Is a measure of the strength of the pattern we are studying; and c. Has the property that, if the null hypothesis is true, extreme values of the test statistic are rare, and hence cast doubt on the null hypothesis. iv. A mathematical theorem saying, "If the model assumptions and the null hypothesis are both true, then the sampling distribution of the test statistic has a certain particular form."

7 13 14 Illustration: Large Sample z-test for the mean, with two-sided alternative The above elements for this test are: 1. Model assumptions: We are working with simple random samples of a random variable Y that has a normal distribution with known standard deviation. 2. Null hypothesis: The mean of the random variable Y is a certain value! 0. Alternative hypothesis: "The mean of the random variable Y is not! 0." (This is called the two-sided alternative.) 3. Test statistic: y (the sample mean of a simple random sample of size n from the random variable Y). Before discussing item 4 (the mathematical theorem), we first need to: 1. Clarify terminology 2. Discuss sampling distributions 1. Terminology and common confusions: The mean of the random variable Y is also called the expected value or the expectation of Y. o It s denoted E(Y). o It s also called the population mean, often denoted as µ. o It s what we do not know in this example. A sample mean is typically denoted y (read "y-bar"). o It s calculated from a sample y 1, y 2,..., y n of values of Y by the familiar formula y = (y 1 + y y n )/n. The sample mean y is an estimate of the population mean µ, but they are usually not the same. o Confusing them is a common mistake. Note the articles: "the population mean" but "a sample mean". o There is only one population mean associated with the random variable Y. o However, a sample mean depends on the sample chosen. o Since there are many possible samples, there are many possible sample means. Illustration: or https://istats.shinyapps.io/sampdist_cont/

8 Putting this in a more general framework: A parameter is a constant associated with a population. o In our example: The population mean µ is a parameter. o When applying statistics, parameters are usually unknown. o However, the goal is often to gain some information about parameters. To help gain information about (unknown) parameters, we use estimates that are calculated from a sample. o In our example: We calculate the sample mean y as an estimate of the population mean (parameter) µ. Variance is another common example where parameter and estimate might be confused: The population variance (or the variance of Y ) is a parameter, usually called! 2 (or Var(Y)). If we have a sample from Y, we can calculate the sample variance, usually called s 2. A sample variance is an estimate of the population variance. Different samples may give different estimates. Confusing population and sample variance is a common mistake Sampling Distribution: Although we apply a hypothesis test using a single sample, we need to step back and consider all possible suitably random samples of Y of size n, in order to understand the test. In our example: For each simple random sample of Y of size n, we get a value of y. We thus have a new random variable Y n : o The associated random process is pick a simple random sample of size n o The value of Y n is the sample mean y for this sample Note that o Y n stands for the new random variable o y stands for the value of Y n, for a particular sample of size n. o y (the value of Y n ) depends on the sample, and typically varies from sample to sample. The distribution of the new random variable Y n is called the sampling distribution of Y n (or the sampling distribution of the mean). 16 Note: Y n is an example of an estimator: a random variable whose values are estimates.

9 17 18 Now we can state the theorem that the large sample z-test for the mean relies on: 4. The theorem states: If the model assumptions are all true (i.e., if Y is normal and all samples considered are simple random samples), and if in addition the mean of Y is indeed! 0 (i.e., if the null hypothesis is true), then i. The sampling distribution of Y n is normal ii. The sampling distribution of Y n has mean! 0 iii. The sampling distribution of Y n has standard deviation " n, where " is the standard deviation of the original random variable Y. Check that this is consistent with what the simulation shows. (Try sample size n = 25) Also note: " is smaller than " (if n is larger than 1) n The larger n is, the smaller " n is. Why is this nice? More Terminology: " is called the population standard deviation of Y; it is not the same as the sample standard deviation s, although s is an estimate of ". The following chart and picture summarize the conclusion of the theorem and related information: Type of Distribution Mean Standard deviation Considering the Considering one Considering all population sample suitable samples!!! Random variable Y (population distribution) Y has a normal distribution Population mean! (! =! 0 if null hypothesis true) Population standard deviation! Related quantity calculated from a sample y 1, y 2,, y n The sample is from the (normal) distribution of Y Sample mean y = (y 1 + y 2 + y n )/n y is an estimate of the population mean! Sample standard deviation s = 1 n "1 n # i=1 (y " y i ) 2 s is an estimate of the population standard deviation! Random variable Y n (sampling distribution) Y has a n normal distribution Mean of the sampling distribution of y -- it s also!. (! =! 0 if null hypothesis true) Sampling distribution standard deviation " n " From the Theorem

10 19 20 The roles of the model assumptions for this hypothesis test (large sample z-test for the mean): Recall: The theorem has three assumptions: Assumption 1: Y has a normal distribution (a model assumption). Assumption 2: All samples considered are simple random samples (also a model assumption). Assumption 3: The null hypothesis is true (assumption for the theorem, but not a model assumption). Which column of the chart corresponds to the blue distribution? Which column of the chart corresponds to the red distribution? How could you tell without looking at the legend? The theorem also has three conclusions: Conclusion 1: The sampling distribution of Y n is normal Conclusion 2: The sampling distribution of Y n has mean! 0 Conclusion 3: The sampling distribution of Y n has standard deviation ", where " is the standard deviation of the n original random variable Y. The following chart shows which assumptions each conclusion depends upon:

11 21 22 Conclusions about Sampling Distribution (Distribution of Y n )! 1: Normal Assumptions of Theorem! 2: Simple random samples (model assumption) 1. Y normal (model assumption)!! 3: Null hypothesis true (not a model assumption) 2: Mean! 0! 3: Standard deviation " n " Corresponds to third column in table on p. 18 Note that the model assumption that the sample is a simple random sample (in particular, that the Y i s as defined earlier are independent) is used to prove: 1. that the sampling distribution is normal and! Consequences (More detail later): 1. If the conclusion of the theorem is true, the sampling distribution of Y n is narrower than the original distribution of Y In fact, conclusion 3 of the theorem gives us an idea of just how narrow it is, depending on n. This will allow us to construct a useful hypothesis test. 2. The only way we know the conclusion is true is if we know the hypotheses of the theorem (the model assumptions and the null hypothesis) are true. 3. Thus: If the model assumptions are not true, then we do not know that the theorem is true, so we do not know that the hypothesis test is valid. In the example (large sample z-test for a mean), this translates to: If the sample is not a simple random sample, or if the random variable is not normal, then the reasoning establishing the validity of the test breaks down. 2. (even more importantly) that the standard deviation of the sampling distribution is " n. This illustrates a general phenomenon that independence conditions are usually very important in statistical inference.

12 23 24 QUIZ: Would this test be reasonable to use in the following situations? Why or why not? b. 1. The histogram of values of Y for the sample is a Frequency 20 Frequency Reasonable to use test? Why or why not? s Reasonable to use test? Why or why not? Y is a random variable that can only take on values between 0 and 1. Reasonable to use test? Why or why not? 3. The sample consists of people who responded to a request on a website. Reasonable to use test? Why or why not?

13 25 26 Importance of Model Assumptions in General: Different hypothesis tests have different model assumptions. Some tests apply to random samples that are not simple. For many tests, the model assumptions consist of several assumptions. If any one of these model assumptions is not true, we do not know that the test is valid. (The bad news) Robustness: Many techniques are robust to some departures from at least some model assumptions. (The good news) Robustness means that if the particular assumption is not too far from true, then the technique is still approximately valid. For example, the large sample z-test for the mean is somewhat robust to departures from normality. In particular, for large enough sample sizes, the test is very close to accurate. o Unfortunately, how large is large enough depends on the distribution of the random variable Y. (More bad news) Robustness depends on the particular procedure; there are no "one size fits all" rules. o Unfortunately, I have not been able to find easily available compilations of robustness results for a variety of procedures. o Unfortunately, some literature on robustness is incorrect. Caution re terminology: Robust is used in other ways for example, a finding is robust could be used to say that the finding appears to be true in a wide variety of situations, or that it has been established in several ways.

14 A very common mistake in using statistics: Using a hypothesis test without paying attention to whether or not the model assumptions are true and whether or not the technique is robust to possible departures from model assumptions is a. Unfortunately, the process of peer review often does not catch these and other statistical mistakes. This is one of many reasons why the results published in peer-reviewed journals are often false or not well substantiated. Compounding the problem, journals rarely retract articles with mistakes. For examples and discussion, see o Allison et al, A Tragedy of Errors, Nature 530, 4 February 2016, 28 29, o Kamoun, S. and C. Zipfel, Scientific record: Class uncorrected errors as misconduct, Nature 531, 10 March 2016, 173, e.html o Gelman and commenters, Resources for finding (at least some) retractions and other reports of errors include o PubPeer (https://pubpeer.com/) o Retraction Watch (http://retractionwatch.com/) o Authors sometimes post errors or corrections on their own website. o biorxiv (http://biorxiv.org/) is a biology preprint server that also allows moderated public comments on posted papers. It classifies replication studies as Confirmatory or Contradictory. 27 IV: FREQUENTIST CONFIDENCE INTERVALS Before continuing the discussion of hypothesis tests, it will be helpful first to discuss the related concept of confidence intervals. The General Situation: We re considering a random variable Y. We re interested in a certain parameter (e.g., a proportion, or mean, or regression coefficient, or variance) associated with the random variable Y (i.e., associated with the population) We don t know the value of the parameter. Goal 1: We d like to estimate the unknown parameter, using data from a sample. (A point estimate) But since estimates are always uncertain, we also need: Goal 2: We d like to get some sense of how good our estimate is. (Typically achieved by an interval estimate) The first goal is usually easier than the second. Example: If the parameter we re interested in estimating is the mean of the random variable (i.e., the population mean, which we call ), we can estimate it using a sample mean (which we call ). 28

15 The rough idea for achieving the second goal (getting some sense of how good our estimate is): We d like to get a range of plausible values for the unknown parameter. ( What we want. ) We will call this range of plausible values a confidence interval. The usual method for calculating a confidence interval has a lot in common with hypothesis testing: It involves the sampling distribution It depends on model assumptions. A little more specifically: Although we typically have just one sample at hand when we do statistics, the reasoning used in classical frequentist inference depends on thinking about all possible suitable samples of the same size n. Which samples are considered "suitable" will depend on the particular statistical procedure to be used. Each confidence interval procedure has model assumptions that are needed to ensure that the reasoning behind the procedure is sound. The model assumptions determine (among other things) which samples are "suitable." The procedure is applicable only to suitable samples. 29 Illustration: Large Sample z-procedure for a Confidence Interval for a Mean The parameter we want to estimate is the population mean µ = E(Y) of the random variable Y. The model assumptions for this procedure are: The random variable is normal, and samples are simple random samples. o o So in this case, "suitable sample" means simple random sample. For this procedure, we also need to know that Y is normal, so that both model assumptions are satisfied. Notation and terminology: We ll use " to denote the (population) standard deviation of Y. We have a simple random sample, say of size n, consisting of observations y 1, y 2,..., y n. o For example, if Y is "height of an adult American male," we take a simple random sample of n adult American males; y 1, y 2,..., y n are their heights. We use the sample mean y = (y 1 + y y n )/n as our estimate of the population mean µ. o This is an example of a point estimate -- a numerical estimate with no indication of how good the estimate is. 30

16 More details: To get an idea of how good our estimate is, we use the concept of confidence interval. o This is an example of an interval estimate. To understand the concept of confidence interval in detail, we need to consider all possible simple random samples of size n from Y. o In the specific example, we consider all possible simple random samples of n adult American males. o For each such sample, the heights of the men in the sample of people constitute our simple random sample of size n from Y. We consider the sample means y for all possible simple random samples of size n from Y. o This amounts to defining a new random variable, which we will call Y n (read Y-bar sub n, or Y-sub-n- bar ). o We can describe the random variable Y n briefly as "sample mean of a simple random sample of size n from Y", or more explicitly as: "pick a simple random sample of size n from Y and calculate its sample mean". o Note that each value of Y n is an estimate of the population mean µ. # e.g. each simple random sample of n adult American males gives us an estimate of the population mean µ. 31 This new random variable Y n has a distribution, called a sampling distribution (since it arises from considering varying samples). o o o o o The values of Y n are all the possible values of sample means y of simple random samples of size n of Y i.e, the values of our estimates of µ. The sampling distribution (distribution of Y n ) gives us information about the variability (as samples vary) of our estimates of the population mean µ. A mathematical theorem tells us that if the model assumptions are true, then: 1. The sampling distribution is normal 2. The mean of the sampling distribution is also µ. 3. The sampling distribution has standard deviation " n Use these conclusions to compare and contrast the shapes of the distribution of Y and the distribution of Y n # What is the same? # What is different? # How do the standard deviations compare? The chart (best read one column at a time) and picture below summarize some of this information. 32

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Conditional Probability, Hypothesis Testing, and the Monty Hall Problem

Conditional Probability, Hypothesis Testing, and the Monty Hall Problem Ernie Croot September 17, 2008 On more than one occasion I have heard the comment Probability does not exist in the real world, and

### Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

### Understanding Confidence Intervals and Hypothesis Testing Using Excel Data Table Simulation

Understanding Confidence Intervals and Hypothesis Testing Using Excel Data Table Simulation Leslie Chandrakantha lchandra@jjay.cuny.edu Department of Mathematics & Computer Science John Jay College of

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Parametric and Nonparametric: Demystifying the Terms

Parametric and Nonparametric: Demystifying the Terms By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

### Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................

### p ˆ (sample mean and sample

Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics

### A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10 Introduction to Discrete Probability Probability theory has its origins in gambling analyzing card games, dice,

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### The Variability of P-Values. Summary

The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 boos@stat.ncsu.edu August 15, 2009 NC State Statistics Departement Tech Report

### WISE Power Tutorial All Exercises

ame Date Class WISE Power Tutorial All Exercises Power: The B.E.A.. Mnemonic Four interrelated features of power can be summarized using BEA B Beta Error (Power = 1 Beta Error): Beta error (or Type II

### Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure? Harvey Motulsky hmotulsky@graphpad.com This is the first case in what I expect will be a series of case studies. While I mention

### Book Review of Rosenhouse, The Monty Hall Problem. Leslie Burkholder 1

Book Review of Rosenhouse, The Monty Hall Problem Leslie Burkholder 1 The Monty Hall Problem, Jason Rosenhouse, New York, Oxford University Press, 2009, xii, 195 pp, US \$24.95, ISBN 978-0-19-5#6789-8 (Source

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Econometrics and Data Analysis I

Econometrics and Data Analysis I Yale University ECON S131 (ONLINE) Summer Session A, 2014 June 2 July 4 Instructor: Doug McKee (douglas.mckee@yale.edu) Teaching Fellow: Yu Liu (dav.yu.liu@yale.edu) Classroom:

### 8. THE NORMAL DISTRIBUTION

8. THE NORMAL DISTRIBUTION The normal distribution with mean μ and variance σ 2 has the following density function: The normal distribution is sometimes called a Gaussian Distribution, after its inventor,

### Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes Simcha Pollack, Ph.D. St. John s University Tobin College of Business Queens, NY, 11439 pollacks@stjohns.edu

### Basic Concepts in Research and Data Analysis

Basic Concepts in Research and Data Analysis Introduction: A Common Language for Researchers...2 Steps to Follow When Conducting Research...3 The Research Question... 3 The Hypothesis... 4 Defining the

### 4 G: Identify, analyze, and synthesize relevant external resources to pose or solve problems. 4 D: Interpret results in the context of a situation.

MAT.HS.PT.4.TUITN.A.298 Sample Item ID: MAT.HS.PT.4.TUITN.A.298 Title: College Tuition Grade: HS Primary Claim: Claim 4: Modeling and Data Analysis Students can analyze complex, real-world scenarios and

### STA 371G: Statistics and Modeling

STA 371G: Statistics and Modeling Decision Making Under Uncertainty: Probability, Betting Odds and Bayes Theorem Mingyuan Zhou McCombs School of Business The University of Texas at Austin http://mingyuanzhou.github.io/sta371g

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics South Carolina College- and Career-Ready Mathematical Process Standards The South Carolina College- and Career-Ready (SCCCR)

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Lab 11. Simulations. The Concept

Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that

### Name: Date: Use the following to answer questions 3-4:

Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

### MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS

MATH 140 HYBRID INTRODUCTORY STATISTICS COURSE SYLLABUS Instructor: Mark Schilling Email: mark.schilling@csun.edu (Note: If your CSUN email address is not one you use regularly, be sure to set up automatic

### E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Bayesian Tutorial (Sheet Updated 20 March)

Bayesian Tutorial (Sheet Updated 20 March) Practice Questions (for discussing in Class) Week starting 21 March 2016 1. What is the probability that the total of two dice will be greater than 8, given that

### 17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

### Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology Madras Lecture - 41 Value of Information In this lecture, we look at the Value

### Simple Random Sampling

Source: Frerichs, R.R. Rapid Surveys (unpublished), 2008. NOT FOR COMMERCIAL DISTRIBUTION 3 Simple Random Sampling 3.1 INTRODUCTION Everyone mentions simple random sampling, but few use this method for

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### Cell Phone Impairment?

Cell Phone Impairment? Overview of Lesson This lesson is based upon data collected by researchers at the University of Utah (Strayer and Johnston, 2001). The researchers asked student volunteers (subjects)

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### Thesis Proposal Template/Outline 1. Thesis Proposal Template/Outline. Abstract

Thesis Proposal Template/Outline 1 Thesis Proposal Template/Outline Abstract The main purpose of a thesis proposal is to demonstrate that the author: has studied and knows the current status of work in

### AMS 5 Statistics. Instructor: Bruno Mendes mendes@ams.ucsc.edu, Office 141 Baskin Engineering. July 11, 2008

AMS 5 Statistics Instructor: Bruno Mendes mendes@ams.ucsc.edu, Office 141 Baskin Engineering July 11, 2008 Course contents and objectives Our main goal is to help a student develop a feeling for experimental

### Teaching Business Statistics through Problem Solving

Teaching Business Statistics through Problem Solving David M. Levine, Baruch College, CUNY with David F. Stephan, Two Bridges Instructional Technology CONTACT: davidlevine@davidlevinestatistics.com Typical

### Building Qualtrics Surveys for EFS & ALC Course Evaluations: Step by Step Instructions

Building Qualtrics Surveys for EFS & ALC Course Evaluations: Step by Step Instructions Jennifer DeSantis August 28, 2013 A relatively quick guide with detailed explanations of each step. It s recommended

### The Math. P (x) = 5! = 1 2 3 4 5 = 120.

The Math Suppose there are n experiments, and the probability that someone gets the right answer on any given experiment is p. So in the first example above, n = 5 and p = 0.2. Let X be the number of correct

EC151.02 Statistics for Business and Economics (MWF 8:00-8:50) Instructor: Chiu Yu Ko Office: 462D, 21 Campenalla Way Phone: 2-6093 Email: kocb@bc.edu Office Hours: by appointment Description This course

### Chapter 23 Inferences About Means

Chapter 23 Inferences About Means Chapter 23 - Inferences About Means 391 Chapter 23 Solutions to Class Examples 1. See Class Example 1. 2. We want to know if the mean battery lifespan exceeds the 300-minute

### Introduction to time series analysis

Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

Name: University of Chicago Graduate School of Business Business 41000: Business Statistics Special Notes: 1. This is a closed-book exam. You may use an 8 11 piece of paper for the formulas. 2. Throughout

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### Betting on Volatility: A Delta Hedging Approach. Liang Zhong

Betting on Volatility: A Delta Hedging Approach Liang Zhong Department of Mathematics, KTH, Stockholm, Sweden April, 211 Abstract In the financial market, investors prefer to estimate the stock price

### MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

### MATH 140 Lab 4: Probability and the Standard Normal Distribution

MATH 140 Lab 4: Probability and the Standard Normal Distribution Problem 1. Flipping a Coin Problem In this problem, we want to simualte the process of flipping a fair coin 1000 times. Note that the outcomes

### Predicting Defaults of Loans using Lending Club s Loan Data

Predicting Defaults of Loans using Lending Club s Loan Data Oleh Dubno Fall 2014 General Assembly Data Science Link to my Developer Notebook (ipynb) - http://nbviewer.ipython.org/gist/odubno/0b767a47f75adb382246

### 6th Grade Lesson Plan: Probably Probability

6th Grade Lesson Plan: Probably Probability Overview This series of lessons was designed to meet the needs of gifted children for extension beyond the standard curriculum with the greatest ease of use

### Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust

### OUTLIER ANALYSIS. Data Mining 1

OUTLIER ANALYSIS Data Mining 1 What Are Outliers? Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism Ex.: Unusual credit card purchase,

### In the past, the increase in the price of gasoline could be attributed to major national or global

Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

### Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

### Statistical Rules of Thumb

Statistical Rules of Thumb Second Edition Gerald van Belle University of Washington Department of Biostatistics and Department of Environmental and Occupational Health Sciences Seattle, WA WILEY AJOHN

### Lecture 14. Chapter 7: Probability. Rule 1: Rule 2: Rule 3: Nancy Pfenning Stats 1000

Lecture 4 Nancy Pfenning Stats 000 Chapter 7: Probability Last time we established some basic definitions and rules of probability: Rule : P (A C ) = P (A). Rule 2: In general, the probability of one event

### INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS

INTERNATIONAL STANDARD ON AUDITING 530 AUDIT SAMPLING AND OTHER MEANS OF TESTING (Effective for audits of financial statements for periods beginning on or after December 15, 2004) CONTENTS Paragraph Introduction...

### Fixed-Effect Versus Random-Effects Models

CHAPTER 13 Fixed-Effect Versus Random-Effects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Dr. Lisa White lwhite@sfsu.edu

Dr. Lisa White lwhite@sfsu.edu edu Associate Dean College of Science and Engineering San Francisco State University Purpose of a Poster To communicate/publicize to others your research/experiment results

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

MONT 07N Understanding Randomness Solutions For Final Examination May, 00 Short Answer (a) (0) How are the EV and SE for the sum of n draws with replacement from a box computed? Solution: The EV is n times

### STATISTICAL ANALYSIS AND INTERPRETATION OF DATA COMMONLY USED IN EMPLOYMENT LAW LITIGATION

STATISTICAL ANALYSIS AND INTERPRETATION OF DATA COMMONLY USED IN EMPLOYMENT LAW LITIGATION C. Paul Wazzan Kenneth D. Sulzer ABSTRACT In employment law litigation, statistical analysis of data from surveys,

### Handling missing data in Stata a whirlwind tour

Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Internal Quality Assurance Arrangements

National Commission for Academic Accreditation & Assessment Handbook for Quality Assurance and Accreditation in Saudi Arabia PART 2 Internal Quality Assurance Arrangements Version 2.0 Internal Quality

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Mind on Statistics. Chapter 12

Mind on Statistics Chapter 12 Sections 12.1 Questions 1 to 6: For each statement, determine if the statement is a typical null hypothesis (H 0 ) or alternative hypothesis (H a ). 1. There is no difference

### Introduction to Statistics and Quantitative Research Methods

Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.

### Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

ECS20 Discrete Mathematics Quarter: Spring 2007 Instructor: John Steinberger Assistant: Sophie Engle (prepared by Sophie Engle) Homework 8 Hints Due Wednesday June 6 th 2007 Section 6.1 #16 What is the

### Example G Cost of construction of nuclear power plants

1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

### Quantitative Methods for Finance

Quantitative Methods for Finance Module 1: The Time Value of Money 1 Learning how to interpret interest rates as required rates of return, discount rates, or opportunity costs. 2 Learning how to explain

Chapter 11 Testing Hypotheses About Proportions Hypothesis testing method: uses data from a sample to judge whether or not a statement about a population may be true. Steps in Any Hypothesis Test 1. Determine

### PERSONAL LEARNING PLAN- STUDENT GUIDE

PERSONAL LEARNING PLAN- STUDENT GUIDE TABLE OF CONTENTS SECTION 1: GETTING STARTED WITH PERSONAL LEARNING STEP 1: REGISTERING FOR CONNECT P.2 STEP 2: LOCATING AND ACCESSING YOUR PERSONAL LEARNING ASSIGNMENT

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 13. Random Variables: Distribution and Expectation

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 3 Random Variables: Distribution and Expectation Random Variables Question: The homeworks of 20 students are collected

### Data Analysis, Statistics, and Probability

Chapter 6 Data Analysis, Statistics, and Probability Content Strand Description Questions in this content strand assessed students skills in collecting, organizing, reading, representing, and interpreting

### Online 12 - Sections 9.1 and 9.2-Doug Ensley

Student: Date: Instructor: Doug Ensley Course: MAT117 01 Applied Statistics - Ensley Assignment: Online 12 - Sections 9.1 and 9.2 1. Does a P-value of 0.001 give strong evidence or not especially strong

### CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

COMMON DESCRIPTIVE STATISTICS / 13 CHAPTER THREE COMMON DESCRIPTIVE STATISTICS The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance,

### Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................