Lecture Notes Module 1


 Gilbert Jenkins
Study Populations

A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific group of people. Some examples of human study populations are: all UCSC freshmen, all Arizona public school teachers, all spouses of Alzheimer's patients in Minnesota, and all preschool children in Chicago.

Measurement Properties

In addition to specifying the study population of interest, a psychologist will specify some attribute to measure. When studying human populations, the attribute of interest might be a specific type of academic ability, a personality trait, some particular behavior (e.g., hours of TV watching), an attitude, an interest, an opinion, or a physiological measure (e.g., heart rate, blood pressure, blood flow in specific parts of the brain, brain waves). The measurement of the attribute that the psychologist wants to examine is called the response variable.

To measure some attribute of a person's behavior is to assign a numerical value to that person. These measurements can have different properties. A ratio scale measurement has the following three properties: 1) a score of 0 represents a complete absence of the attribute being measured, 2) a ratio of any two scores correctly describes the ratio of attribute quantities, and 3) a difference between two scores correctly describes the difference in attribute quantities. Suppose a person's heart rate is measured. This measurement is a ratio scale measurement because a score of 0 beats per minute (bpm) represents a stopped heart and a heart rate of, say, 100 bpm is twice as fast as a heart rate of 50 bpm. In addition, the difference between two heart rates of, say, 50 and 60 bpm describes the same change in heart rate as the difference between 70 and 80 bpm.
With interval scale measurements, a score of 0 does not represent a complete absence of the attribute being measured and a ratio of two scores does not correctly describe the ratio of attribute quantities, but a difference between two interval scale scores correctly describes the difference in attribute quantities. Most measurements of psychological attributes are not ratio scale measurements but are assumed to be interval scale measurements. For instance, the Beck Depression Inventory (BDI) is scored on a 0 to 63 scale with higher scores representing higher levels of depression. However, a BDI score of 0 does not indicate a complete absence of depression, nor does a BDI score of, say, 40 represent twice the amount of depression as a BDI score of 20. It is assumed that a difference between two BDI scores correctly describes the difference in depression levels, so that a person who initially obtained a BDI score of, say, 30 and then obtained a score of 20 after therapy is assumed to have the same level of improvement as a person who initially scored 25 on the BDI and dropped to 15 after therapy. Ratio and interval scale measurements will be referred to simply as quantitative scores.

Population Parameters

A population parameter is a single unknown numeric value that describes the measurements that could have been assigned to all N people in a specific study population. Psychologists would like to know the value of a particular population parameter because this information could be used to make an important decision or to advance knowledge in some area of research.

The population mean, denoted by the Greek letter μ (mu), is a population parameter that is frequently of interest to psychologists. Imagine every person in a study population of size N being assigned a quantitative score. A population mean (μ) is defined as

\[ \mu = \sum_{i=1}^{N} y_i / N \qquad (1.1) \]

where \( y_i \) is a quantitative score for the i-th person in the study population. The summation notation \( \sum_{i=1}^{N} y_i \) is a more compact way of writing \( y_1 + y_2 + \dots + y_N \).

Consider a study population of 2,450 elementary school teachers in a particular school district. Imagine giving a job burnout questionnaire (scored on a quantitative scale of 1 to 25) to all 2,450 teachers. The population mean job burnout score would be \( \mu = (y_1 + y_2 + \dots + y_{2450})/2450 \) where \( y_i \) is the burnout score for the i-th teacher.

Another important population parameter is the population standard deviation, which is defined as

\[ \sigma = \sqrt{\sum_{i=1}^{N} (y_i - \mu)^2 / N} \]

and describes the variability of the quantitative measurements. Note that σ cannot be negative. Note also that if all N scores are identical (no variability), every \( y_i \) value would equal μ and then σ would be zero.
The squared standard deviation (\( \sigma^2 \)) occurs frequently in statistical formulas and is called the variance.
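The definitions of the population mean, standard deviation, and variance can be sketched in a few lines of Python. This is only an illustration: the five burnout scores are hypothetical and the function names are not from the notes.

```python
import math

def population_mean(scores):
    """Equation 1.1: the sum of all N scores divided by N."""
    return sum(scores) / len(scores)

def population_sd(scores):
    """Population standard deviation: the divisor is N, not N - 1."""
    mu = population_mean(scores)
    return math.sqrt(sum((y - mu) ** 2 for y in scores) / len(scores))

# Hypothetical burnout scores for a tiny study population of N = 5 teachers
burnout = [12, 15, 9, 20, 14]

mu = population_mean(burnout)
sigma = population_sd(burnout)
variance = sigma ** 2  # the squared standard deviation

print(f"mu = {mu}, sigma = {sigma:.3f}, variance = {variance:.1f}")
```

When every score is identical, `population_sd` returns zero, which matches the remark that σ is zero when there is no variability.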
Normal (Gaussian) Curve

A histogram is a graph that visually describes a set of quantitative scores. A histogram is constructed by specifying several equal-length score intervals and counting the number of people who have scores that fall within each interval. An example of a histogram of scores on the Attention Deficit Checklist (ADC) for 4,810 young adults is shown below.

[Figure: histogram of ADC scores (x-axis: ADC score; y-axis: frequency) with a fitted normal curve. Mean = 10.00, N = 4,810.]

Scientists discovered decades ago that histograms for many different types of quantitative scores could be closely approximated by a certain type of symmetric bell-shaped curve called a normal (or Gaussian) curve. The histogram above includes a graph of a normal curve that closely approximates the shape of the histogram in this particular application. If a set of quantitative scores is approximately normal, the scores will have the following characteristics:

- about half of the scores are above the mean and about half are below the mean
- about 68% of the scores are within 1 standard deviation of the mean
- about 95% of the scores are within 2 standard deviations of the mean
- almost all (99.7%) of the scores are within 3 standard deviations of the mean
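The percentages in the list above can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `proportion_within` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def proportion_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {proportion_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation, so the same proportions apply to any approximately normal set of scores.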
A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard unit normal distribution. The symbol \( z_{\alpha/2} \) will be used to denote the point on a standard unit normal distribution for which 100(1 − α)% of the distribution is between the values \( -z_{\alpha/2} \) and \( z_{\alpha/2} \). For instance, it can be shown that 95% of a standard unit normal distribution is between the values −1.96 and 1.96 and so \( z_{\alpha/2} = 1.96 \) for α = .05.

Random Samples and Parameter Estimates

In applications where the study population is large or the cost of measurement is high, the psychologist may not have the necessary resources to measure all N people in the study population. In these applications, the psychologist could take a random sample of n people from the study population of N people. In studies where random sampling is used, the study population is defined as the population from which the random sample was obtained.

A random sample of size n is selected in such a way that every possible sample of size n will have the same chance of being selected. Computer programs are typically used to obtain a random sample of size n from a study population of size N.

A population mean can be estimated from a random sample. The sample mean

\[ \hat{\mu} = \sum_{i=1}^{n} y_i / n \qquad (1.2) \]

is an estimate of μ (some statistics texts use \( \bar{X} \) to denote the sample mean). A standard deviation can be estimated from a random sample. The sample standard deviation

\[ \hat{\sigma} = \sqrt{\sum_{i=1}^{n} (y_i - \hat{\mu})^2 / (n - 1)} \qquad (1.3) \]

is an estimate of σ (some statistics texts use s to denote the sample standard deviation). Squaring Equation 1.3 gives an estimate of the population variance.

Of course, psychologists want to know the exact value of μ but they usually must settle for a sample estimate of μ because the study population size is either too large or the measurement process is too costly. However, the sample mean by itself can be misleading because \( \hat{\mu} - \mu \) will be positive or negative and the direction of the error will be unknown.
In other words, the psychologist will not know if the sample mean has overestimated or underestimated the population mean. Furthermore, the magnitude of \( \hat{\mu} - \mu \) will be unknown. Thus, the value of the sample mean might be too small or too large, and it might be close to or very different from the value of μ.
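As a rough sketch of random sampling and of Equations 1.2 and 1.3, the Python code below draws a simple random sample from a simulated population. The population scores, the seed, and the sample size of 40 are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical study population: N = 2,450 burnout scores on a 1 to 25 scale
population = [rng.randint(1, 25) for _ in range(2450)]

# Simple random sample of n = 40: every possible sample of size 40 is equally likely
sample = rng.sample(population, 40)

mu_hat = statistics.mean(sample)      # Equation 1.2
sigma_hat = statistics.stdev(sample)  # Equation 1.3 (divides by n - 1)
var_hat = sigma_hat ** 2              # estimate of the population variance

# The population mean is known here only because the population is simulated.
mu = statistics.mean(population)
error = mu_hat - mu  # in a real study, the sign and size of this error are unknown

print(f"sample mean = {mu_hat:.2f}, sample SD = {sigma_hat:.2f}, error = {error:+.2f}")
```

Changing the seed gives a different sample and a different error, sometimes positive and sometimes negative, which is the point made in the paragraph above.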
Standard Error

The standard error of a parameter estimate numerically describes the accuracy of an estimate. A small value of the standard error indicates that the parameter estimate is likely to be close to the unknown population parameter value, while a large standard error value indicates that the parameter estimate could be very different from the study population parameter value. A standard error of a parameter estimate can be estimated from a random sample. The estimated standard error of \( \hat{\mu} \) is

\[ SE_{\hat{\mu}} = \sqrt{\hat{\sigma}^2 / n}. \qquad (1.4) \]

From Equation 1.4 it is clear that increasing the sample size (n) will decrease the value of the standard error and increase the accuracy of the sample mean. From Equation 1.4, it also can be seen that variability in the quantitative scores affects the accuracy of the estimate of a population mean, with greater variability in scores leading to less accuracy in the sample mean.

Confidence Interval for μ

We can learn something about the unknown value of μ by using information from a random sample. By using an estimate of μ (Equation 1.2) and its standard error (Equation 1.4), which can be obtained from one random sample, it is possible to say something about the unknown value of μ in the form of a confidence interval. A confidence interval is a range of values that is believed to contain an unknown population parameter value with some specified degree of confidence. A 100(1 − α)% confidence interval for μ is

\[ \hat{\mu} \pm t_{\alpha/2;df} \, SE_{\hat{\mu}} \qquad (1.5) \]

where \( t_{\alpha/2;df} \) is a two-sided critical t-value. The value of \( t_{\alpha/2;df} \) can be found in a table of critical t-values given in most statistics texts. The symbol df refers to degrees of freedom and is equal to n − 1 in this type of application. The value 100(1 − α)% is the confidence level. In psychological studies, it is common to set α = .05 to give a 95% confidence level.

Example 1.1. The EPA estimates that lead in drinking water is responsible for more than 500,000 new cases of learning disabilities in children each year.
Lead-contaminated drinking water is most prevalent in homes built before 1970. A random sample of n = 10 homes was obtained from a listing of about 240,000 pre-1970 homes in the San Francisco area. Drinking water from the 10 homes was tested for lead (the test costs about $25 per house). The legal lead concentration limit for drinking water is 15 ppb. The measured lead concentrations (in ppb) for the 10 homes are given below.

[Table: lead concentrations (ppb) for the 10 sampled homes]

The sample mean, sample variance, and standard error for this sample of 10 homes are computed below.

\[ \hat{\mu} = (y_1 + y_2 + \dots + y_{10})/10 = 24.7 \]

\[ \hat{\sigma}^2 = [(y_1 - 24.7)^2 + (y_2 - 24.7)^2 + \dots + (y_{10} - 24.7)^2]/(10 - 1) = 144.0 \]

\[ SE_{\hat{\mu}} = \sqrt{\hat{\sigma}^2 / n} = \sqrt{144.0/10} = 3.79 \]

With a sample size of 10 homes, df = n − 1 = 9 and \( t_{.05/2;9} = 2.262 \). The 95% lower confidence limit is 24.7 − 2.262(3.79) = 16.2 and the upper 95% limit is 24.7 + 2.262(3.79) = 33.3. We can be 95% confident that the mean lead concentration in the drinking water of the 240,000 older homes is between 16.2 ppb and 33.3 ppb.

Properties of Confidence Intervals

There are two important properties of confidence intervals: increasing the sample size will tend to reduce the width of the confidence interval, and increasing the level of confidence (e.g., from 95% to 99%) will increase the width of the confidence interval. Increasing the level of confidence increases the proportion of all possible samples in which a confidence interval will capture the unknown population parameter value.

These properties are illustrated in an analysis of 50 different random samples of n = 30 from a study population of about 15,000 nurses, all of whom were given an emotional exhaustion questionnaire; their mean score was 22.5. In this hypothetical example, we know that μ = 22.5, but in practice we will not be able to measure all members of the study population and we will estimate μ using the information contained in just one random sample.
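The standard error (Equation 1.4) and the confidence interval (Equation 1.5) can be sketched in Python as follows. The ten scores are hypothetical and are not the lead measurements from Example 1.1; the critical t-values are passed in as arguments because they would normally be read from a t table (2.262 and 3.250 are the two-sided 95% and 99% values for df = 9).

```python
import math
import statistics

def mean_confidence_interval(scores, t_crit):
    """Return (lower, upper, se) for a confidence interval for the population mean."""
    n = len(scores)
    mu_hat = statistics.mean(scores)                 # Equation 1.2
    se = math.sqrt(statistics.variance(scores) / n)  # Equation 1.4
    return mu_hat - t_crit * se, mu_hat + t_crit * se, se  # Equation 1.5

# Hypothetical scores for a random sample of n = 10 (df = 9)
scores = [18, 25, 31, 12, 40, 22, 27, 35, 15, 29]

lower95, upper95, se = mean_confidence_interval(scores, t_crit=2.262)
lower99, upper99, _ = mean_confidence_interval(scores, t_crit=3.250)

print(f"SE = {se:.2f}")
print(f"95% CI: [{lower95:.1f}, {upper95:.1f}]")
print(f"99% CI: [{lower99:.1f}, {upper99:.1f}]")
```

For the same sample, the 99% interval is wider than the 95% interval, which is the second property described above.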
[Table: 95% confidence intervals for μ computed from 50 random samples of n = 30]

The above table displays the results of 95% confidence intervals from 50 different random samples. Note that the 95% confidence intervals for μ failed to capture the actual population mean value of 22.5 in sample 7 and sample 34. The table below displays the results for 99% confidence intervals computed from the same 50 random samples.

[Table: 99% confidence intervals for μ computed from the same 50 random samples]

Note that these confidence intervals are wider (less precise) but all of them have captured the population mean value.

Choosing a Confidence Level

The American Psychological Association recommends using 95% confidence intervals. A 95% confidence interval represents a good compromise between the level of confidence and the confidence interval width, as shown in the following graph. Notice that the confidence interval width increases almost linearly up to a confidence level of about 95% and then the width increases dramatically with increasing confidence. Thus, small increases in the level of confidence beyond 95% lead to relatively large increases in the confidence interval width.

[Figure: confidence interval width (y-axis) as a function of confidence level (x-axis)]
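The trade-off between confidence level and interval width can be explored with a short Python sketch. As a simplifying assumption it uses critical z-values from the standard normal distribution in place of critical t-values, and it expresses the width in units of the standard error; the function names are illustrative only.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z-value for a given confidence level, e.g. 0.95."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def ci_width(confidence, se=1.0):
    """Width (upper limit minus lower limit) of a z-based confidence interval."""
    return 2.0 * z_critical(confidence) * se

for level in (0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"confidence {level:.3f}: width = {ci_width(level):.2f} standard errors")
```

Running the loop over a finer grid of confidence levels shows how the width grows more and more steeply as the level approaches 100%.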
Hypothesis Testing

In some applications, the psychologist simply needs to decide if the population parameter is greater than some value or less than some value. If the parameter is greater than some value, then one course of action will be taken; if the parameter is less than some value, then another course of action will be taken. The following notation is used to specify a set of hypotheses regarding μ

H0: μ = h    H1: μ > h    H2: μ < h

where h is some number specified by the psychologist and H0 is called the null hypothesis. H1 and H2 are called the alternative hypotheses. In virtually all applications, H0 is known to be false (because it is extremely unlikely that μ will exactly equal h) and the psychologist's goal is to decide if H1 or H2 is true.

Consider the following example. If the mean job satisfaction score in a study population of employees is less than 5, then a company will increase year-end bonuses; otherwise, the standard bonus will be given. In this specific application, the set of hypotheses is shown below.

H0: μ = 5    H1: μ > 5    H2: μ < 5

A confidence interval for μ can be used to choose between H1: μ > h and H2: μ < h. If the upper limit of a 100(1 − α)% confidence interval is less than h, then H0 is rejected and H2 is accepted. If the lower limit of a 100(1 − α)% confidence interval is greater than h, then H0 is rejected and H1 is accepted. If the confidence interval includes h, then H0 cannot be rejected. This general hypothesis testing procedure is called a three-decision rule because one of the following three decisions will be made: 1) accept H1, 2) accept H2, or 3) fail to reject H0. A failure to reject H0 is called an inconclusive result.

A test of H0: μ = h is commonly referred to as a one-sample t-test and involves the computation of the test statistic \( t = (\hat{\mu} - h)/SE_{\hat{\mu}} \). Statistical packages such as SPSS or R will compute the p-value that corresponds to the value of the test statistic. The p-value can be used to reject H0: μ = h.
Specifically, H0 is rejected if the p-value is less than α (α is usually set equal to .05).
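The three-decision rule and the one-sample t statistic can be sketched in Python as shown below. The job satisfaction scores are hypothetical, the function name is an assumption, and the critical t-value (2.262 for df = 9 and α = .05) is supplied by hand rather than computed; a statistical package would also report the p-value.

```python
import math
import statistics

def three_decision_test(scores, h, t_crit):
    """Return the one-sample t statistic and the three-decision-rule outcome."""
    n = len(scores)
    mu_hat = statistics.mean(scores)
    se = math.sqrt(statistics.variance(scores) / n)  # Equation 1.4
    t_stat = (mu_hat - h) / se
    lower = mu_hat - t_crit * se                     # Equation 1.5
    upper = mu_hat + t_crit * se
    if lower > h:
        decision = "accept H1: mu > h"
    elif upper < h:
        decision = "accept H2: mu < h"
    else:
        decision = "fail to reject H0 (inconclusive)"
    return t_stat, decision

# Hypothetical job satisfaction scores for a random sample of n = 10 employees
satisfaction = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]

t_stat, decision = three_decision_test(satisfaction, h=5, t_crit=2.262)
print(f"t = {t_stat:.2f}, decision: {decision}")
```

With these made-up scores the upper confidence limit falls below 5, so H2 is accepted; testing h = 3.5 instead gives an inconclusive result because the interval contains 3.5.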
The p-value is related to the sample size, with larger sample sizes leading to smaller p-values. With a sufficiently large sample size, the p-value for a test of H0: μ = h will be less than .05. It is a common practice to report the results of a statistical test as significant if the p-value is less than .05 and nonsignificant if the p-value is greater than .05. It is important to remember that a p-value of less than .05 (a significant result) simply indicates that the sample size was large enough to reject the null hypothesis (which is known to be false in virtually all applications) and does not indicate if the population mean is meaningfully different from the hypothesized value. Also, a p-value greater than .05 does not imply that H0 is true.

In a three-decision rule, a directional error occurs when H1: μ > h has been accepted but μ < h is true, or when H2: μ < h has been accepted but μ > h is true. The probability of making a directional error is at most α/2. For instance, if a 95% confidence interval is used to select H1 or H2, the probability of making a directional error is at most .025. Most social science journals require authors to use α = .05.

Power of a Hypothesis Test

In hypothesis testing applications, the goal is to reject H0: μ = h and then choose either H1: μ > h or H2: μ < h. The power of a test is the probability of rejecting H0. If the power of the test is low, then the probability of an inconclusive result will be high. The power of a test of H0: μ = h depends on the sample size, the absolute value of (μ − h)/σ (the standardized effect size), and the α level. Increasing the sample size will increase the power of the test, as illustrated below for α = .05 and a fixed standardized effect size.

[Figure: power of the test as a function of sample size]
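How power depends on the sample size, the standardized effect size, and α can be explored with the rough Python sketch below. It is an assumption-laden approximation: it treats the test statistic as normally distributed rather than t-distributed, so it will slightly overstate the power of a t-test in small samples. The function name and the input values are illustrative.

```python
import math
from statistics import NormalDist

def approx_power(n, effect_size, alpha=0.05):
    """Normal-approximation power for a two-sided test of H0: mu = h.

    effect_size is the standardized effect size (mu - h) / sigma.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect_size) * math.sqrt(n)
    # Probability that the test statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (10, 20, 30, 60, 100):
    print(f"n = {n:3d}: approximate power = {approx_power(n, 0.5):.3f}")
```

Holding n fixed and varying `alpha` or `effect_size` instead shows the other two relationships: smaller α values lower the power, and larger standardized effect sizes raise it.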
Decreasing α will reduce the probability of a directional error but will also decrease the power of the test, as illustrated in the graph below for n = 30 and (μ − h)/σ = 0.5.

[Figure: power of the test as a function of α]

Note that there is little loss in power for reductions in α down to about .05, with power decreasing more dramatically for α values below .05, which is why α = .05 is a recommended value. For a given sample size and α level, the power of the test increases as the absolute value of (μ − h)/σ increases, as illustrated in the graph below for n = 30 and α = .05.

[Figure: power of the test as a function of the standardized effect size]

Interpreting a Confidence Interval

Consider a 95% confidence interval for μ. If a 95% confidence interval for μ were computed from every possible sample of size n in a given study population, about 95% of these confidence intervals would capture the unknown value of μ. With random sampling, we know that every possible sample of size n has the same chance of being selected. Knowing that a 95% confidence interval for μ will capture μ in about 95% of all possible samples, and knowing that the one sample the psychologist has used to compute the 95% confidence interval is a random sample, we can say that the probability is .95 (or we are 95% confident) that the computed confidence interval includes μ.

Another way to think about confidence intervals is to consider a test of H0: μ = h for many different values of h. For a given value of α, if H0 is tested for all possible values of h, a 100(1 − α)% confidence interval for μ is the set of all values of h for which H0 cannot be rejected. All values of h that are not included in the confidence interval are values for which H0 would have been rejected at the specified α level. For instance, if a 95% confidence interval for μ is [14.2, 18.5], then all tests of H0: μ = h will not reject H0 if h is any value in the range 14.2 to 18.5 but will reject H0 for any value of h that is less than 14.2 or greater than 18.5.

Sample Size Planning

A narrow confidence interval for μ is desirable because it provides a more precise and informative description of μ than a wider confidence interval. It is possible to approximate the sample size that will give the desired width (upper limit minus lower limit) of a confidence interval with a desired level of confidence. The sample size needed to obtain a 100(1 − α)% confidence interval for μ having a desired width of w is approximately

\[ n = 4\tilde{\sigma}^2 (z_{\alpha/2}/w)^2 \qquad (1.6) \]

where \( \tilde{\sigma}^2 \) is a planning value of the response variable variance and \( z_{\alpha/2} \) is a two-sided critical z-value. Planning values are obtained from expert opinion, pilot studies, or previously published research. If the maximum and minimum possible values of the response variable scale are known, \( [(\max - \min)/4]^2 \) provides a crude planning value of the population variance.
Equation 1.6 shows that larger sample sizes are needed with narrower confidence interval widths, greater levels of confidence, and greater variability of the response variable. Round the result of Equation 1.6 up to the nearest integer.

Example 1.2. A psychologist wants to estimate the mean job satisfaction score for a population of 4,782 public school teachers. The psychologist plans to use a job satisfaction questionnaire (measured on a 1 to 10 scale) that has been used in previous studies. A review of the literature suggests that the variance of the job satisfaction scale is about 6.0. The psychologist would like the 95% confidence interval for μ (the mean job satisfaction score for all 4,782 teachers) to have a width of about 1.5. The required sample size is approximately \( n = 4(6.0)(1.96/1.5)^2 = 40.98 \approx 41 \).
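The sample size approximation of Equation 1.6 is easy to sketch in Python. The function names below are assumptions made for illustration; the critical z-value is computed from the standard normal distribution rather than read from a table, and the result is rounded up as the text recommends.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(planning_variance, width, confidence=0.95):
    """Equation 1.6, rounded up to the nearest integer."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical z-value
    return math.ceil(4.0 * planning_variance * (z / width) ** 2)

def crude_planning_variance(min_score, max_score):
    """Crude planning value of the variance from the range of the scale."""
    return ((max_score - min_score) / 4.0) ** 2

# Values from Example 1.2: planning variance 6.0, desired width 1.5, 95% confidence
n_needed = sample_size_for_mean(6.0, 1.5)
print(n_needed)

# With no prior research, the 1 to 10 scale range gives a crude planning value
print(crude_planning_variance(1, 10))
```

Halving the desired width roughly quadruples the required sample size, because the width enters Equation 1.6 as a squared term.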
Note that Equation 1.6 does not include the value of the study population size (N). Actually, the sample size requirement does depend on N according to the formula \( n' = n(1 - n/N) \), where n is given by Equation 1.6 and \( n' \) is the revised sample size requirement. In most applications, n will be a small fraction of N and then \( n' \) will be about the same as n. For instance, if N = 3,000 and Equation 1.6 gives n = 40, then \( n' = 40(1 - 40/3000) \approx 39.5 \).

Sampling in Two Stages

In applications where sample data can be collected in two stages, the confidence interval obtained in the first stage can be used to determine how many more participants should be sampled in the second stage. If the 100(1 − α)% confidence interval width from a first-stage total sample size of n is \( w_0 \), then the number of participants that should be added to the original sample (\( n^{+} \)) in order to obtain a 100(1 − α)% confidence interval width of w is approximately

\[ n^{+} = \left[ \left( \frac{w_0}{w} \right)^2 - 1 \right] n. \qquad (1.7) \]

Example 1.3. In a study with 25 participants, the 95% confidence interval for μ had a width of 4.38. The psychologist suspects that the results of this study are unlikely to be published because the confidence interval is too wide. The psychologist would like to obtain a 95% confidence interval for μ that has a width of 2.0. To achieve this goal, the number of participants that should be added to the initial sample is \( [(4.38/2.0)^2 - 1]25 = 94.9 \approx 95 \).

Target Population

The confidence interval for μ (Equation 1.5) provides information about the study population from which the random sample was taken. In most applications, the study population will be a small subset of some larger and more interesting population called the target population. For instance, a psychologist may take a random sample of 100 undergraduate students from a particular university directory consisting of 12,000 student names because the psychologist has easy access to this directory.
The results of Equation 1.5 will apply only to those 12,000 undergraduate students, but the psychologist is more interested in the value of μ for a target population of all young adults.

It might be possible for the psychologist to make a persuasive argument that the study population mean should be very similar to some target population mean. For instance, suppose the psychologist computed a confidence interval for the mean eye pupil diameter in a small room lit only by a 40-watt light bulb using a random sample from the 12,000 undergraduate students. The psychologist could argue convincingly that the mean eye pupil diameter in the study population of 12,000 undergraduates should be no different from the mean eye pupil diameter of all young adults. As an example where the study population mean would probably not be similar to some target population mean, suppose that the psychologist instead computed a confidence interval for the mean score on an abortion attitude scale using a sample of students from a Jesuit university. In this situation, the psychologist does not believe that the mean abortion attitude in the Jesuit study population is similar to the mean abortion attitude in a target population of all young adults.

Researchers in the physical and biological sciences seldom worry about the distinction between a study population and a target population because the parameter values for many physical or biological attributes (like the eye pupil diameter example) are much less likely to differ across different study populations, and consequently the study population parameter values are almost automatically assumed to generalize to some large target population. In contrast, psychologists who study complex human behavior that can vary considerably across different study populations need to be very cautious about how they interpret their confidence interval and hypothesis testing results. Psychologists should clearly describe the characteristics of the study population so that the statistical results are interpreted in a proper context.

Assumptions for Confidence Intervals and Tests

Confidence intervals and hypothesis tests for μ require three assumptions. One assumption, the random sampling assumption, requires the sample to be a random sample from the study population. A second assumption, the independence assumption, requires the responses from each participant in the sample to be independent of one another.
In other words, no participant in the study should influence the responses of any other participant in the study. A third assumption, the normality assumption, requires the quantitative scores in the study population to have an approximately normal distribution.

Confidence intervals and hypothesis tests for μ will be uninterpretable if the random sampling assumption has been violated. If the independence assumption has been violated, the true probability of a directional error can be greater than α/2, and the true confidence level can be less than 100(1 − α)%. Recall that the interpretation of a confidence interval for μ assumed that a 100(1 − α)% confidence interval would capture the unknown population mean in about 100(1 − α)% of all possible samples of a given size. However, when the independence assumption is violated, the percent of samples in which a 100(1 − α)% confidence interval captures the population parameter can be far less than 100(1 − α)%, and the psychologist's confidence regarding the computed confidence interval result will be mistakenly too high.

Violating the normality assumption will have little effect on the confidence interval and test for μ unless the quantitative scores in the study population are extremely nonnormal and the sample size is small (n < 30). If the sample size is small and the study population quantitative scores are extremely nonnormal, the proportion of all possible 95% confidence intervals that would capture μ can be less than .95, and the psychologist's confidence regarding the computed confidence interval result will be mistakenly too high.

Assessing the Normality Assumption

Recall that the normal distribution is symmetric. If the quantitative scores in the sample exhibit a clear asymmetry, this would suggest a violation of the normality assumption. The asymmetry in a set of quantitative scores can be described using a coefficient of skewness. The skewness coefficient is equal to zero if the scores are perfectly symmetric, positive if the scores are skewed to the right, and negative if the scores are skewed to the left. SPSS and R provide a test of the null hypothesis that the population skewness coefficient is zero. If the p-value is less than .05, the psychologist may conclude that the normality assumption has been violated and that the population scores are skewed, but a p-value greater than .05 does not imply that the normality assumption has been satisfied.

The population distribution of quantitative scores can be nonnormal even if the distribution is symmetric. The coefficient of kurtosis describes the degree to which a distribution is more or less peaked or has shorter or thicker tails than a normal distribution.
SPSS and R provide a test of the null hypothesis that there is no kurtosis in the population distribution of scores. If the p-value is less than .05, the psychologist may conclude that the normality assumption has been violated and that the population scores have kurtosis, but a p-value greater than .05 does not imply that the normality assumption has been satisfied.

Data Transformations

A transformation of the quantitative scores can reduce skewness. When the quantitative score is a frequency count, such as the number of facts that can be recalled or the number of spelling errors in a writing sample, a square root transformation (\( \sqrt{y_i} \)) may reduce nonnormality. When the score is a time-to-event, such as the time required to solve a problem or a reaction time, a natural log transformation (\( \ln(y_i) \)) or a reciprocal transformation (\( 1/y_i \)) may reduce nonnormality.

Example 1.4. A histogram of 80 highly skewed scores is shown below (left). A histogram of the log-transformed scores (right) is much more symmetric.

[Figure: histogram of the 80 original scores (left) and histogram of the log-transformed scores (right)]

Although data transformations may reduce nonnormality, the mean of the transformed scores may then be difficult to interpret. However, in some applications the value of μ could be interpretable after a data transformation. For instance, if the response variable is measured in squared units, such as the brain surface area showing activity measured in squared centimeters, a square root transformation could be interpreted as the size of the activated area. Or if the response variable is reaction time measured in seconds, then a reciprocal transformation could be interpreted as responses per second.
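The effect of a transformation on skewness can be illustrated with a small Python sketch. The reaction times are hypothetical, and the skewness coefficient is computed with a simple moment-based formula (the mean of the cubed standardized scores); statistical packages may use a slightly different, bias-adjusted formula, so their values can differ somewhat.

```python
import math
import statistics

def skewness(scores):
    """Moment-based skewness: mean of cubed z-scores (using the population SD)."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return sum(((y - mu) / sd) ** 3 for y in scores) / len(scores)

# Hypothetical right-skewed reaction times in seconds
times = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 1.0, 1.5, 2.5, 6.0]

logged = [math.log(y) for y in times]  # natural log transformation
rates = [1.0 / y for y in times]       # reciprocal: responses per second

print(f"skewness of raw times:     {skewness(times):.2f}")
print(f"skewness of ln(times):     {skewness(logged):.2f}")
print(f"skewness of 1/times:       {skewness(rates):.2f}")
```

For these made-up scores the log transformation pulls in the long right tail and reduces the skewness coefficient, although it does not remove the skew entirely.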
This is Dr. Chumney. The focus of this lecture is hypothesis testing both what it is, how hypothesis tests are used, and how to conduct hypothesis tests. 1 In this lecture, we will talk about both theoretical
More informationStatistical Inference and ttests
1 Statistical Inference and ttests Objectives Evaluate the difference between a sample mean and a target value using a onesample ttest. Evaluate the difference between a sample mean and a target value
More informationHypothesis Testing or How to Decide to Decide Edpsy 580
Hypothesis Testing or How to Decide to Decide Edpsy 580 Carolyn J. Anderson Department of Educational Psychology University of Illinois at UrbanaChampaign Hypothesis Testing or How to Decide to Decide
More information4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
More informationTwoSample TTests Assuming Equal Variance (Enter Means)
Chapter 4 TwoSample TTests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one or twosided twosample ttests when the variances of
More informationHYPOTHESIS TESTING WITH SPSS:
HYPOTHESIS TESTING WITH SPSS: A NONSTATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER
More informationBiostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
More informationComparing Two Groups. Standard Error of ȳ 1 ȳ 2. Setting. Two Independent Samples
Comparing Two Groups Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test. The
More informationUnit 26 Estimation with Confidence Intervals
Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference
More informationStandard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationIntroduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses
Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the
More informationStat 411/511 THE RANDOMIZATION TEST. Charlotte Wickham. stat511.cwick.co.nz. Oct 16 2015
Stat 411/511 THE RANDOMIZATION TEST Oct 16 2015 Charlotte Wickham stat511.cwick.co.nz Today Review randomization model Conduct randomization test What about CIs? Using a tdistribution as an approximation
More informationAP Statistics 1998 Scoring Guidelines
AP Statistics 1998 Scoring Guidelines These materials are intended for noncommercial use by AP teachers for course and exam preparation; permission for any other use must be sought from the Advanced Placement
More information1 SAMPLE SIGN TEST. NonParametric Univariate Tests: 1 Sample Sign Test 1. A nonparametric equivalent of the 1 SAMPLE TTEST.
NonParametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A nonparametric equivalent of the 1 SAMPLE TTEST. ASSUMPTIONS: Data is nonnormally distributed, even after log transforming.
More informationChapter Additional: Standard Deviation and Chi Square
Chapter Additional: Standard Deviation and Chi Square Chapter Outline: 6.4 Confidence Intervals for the Standard Deviation 7.5 Hypothesis testing for Standard Deviation Section 6.4 Objectives Interpret
More informationChapter 8. Hypothesis Testing
Chapter 8 Hypothesis Testing Hypothesis In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing
More informationIntroduction to Statistics for Computer Science Projects
Introduction Introduction to Statistics for Computer Science Projects Peter Coxhead Whole modules are devoted to statistics and related topics in many degree programmes, so in this short session all I
More informationHypothesis testing S2
Basic medical statistics for clinical and experimental research Hypothesis testing S2 Katarzyna Jóźwiak k.jozwiak@nki.nl 2nd November 2015 1/43 Introduction Point estimation: use a sample statistic to
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationUnit 24 Hypothesis Tests about Means
Unit 24 Hypothesis Tests about Means Objectives: To recognize the difference between a paired t test and a twosample t test To perform a paired t test To perform a twosample t test A measure of the amount
More informationChapter 7. Estimates and Sample Size
Chapter 7. Estimates and Sample Size Chapter Problem: How do we interpret a poll about global warming? Pew Research Center Poll: From what you ve read and heard, is there a solid evidence that the average
More information13 TwoSample T Tests
www.ck12.org CHAPTER 13 TwoSample T Tests Chapter Outline 13.1 TESTING A HYPOTHESIS FOR DEPENDENT AND INDEPENDENT SAMPLES 270 www.ck12.org Chapter 13. TwoSample T Tests 13.1 Testing a Hypothesis for
More informationStatistics for Management IISTAT 362Final Review
Statistics for Management IISTAT 362Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to
More informationInference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
More informationChapter 7 Section 7.1: Inference for the Mean of a Population
Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used
More informationfind confidence interval for a population mean when the population standard deviation is KNOWN Understand the new distribution the tdistribution
Section 8.3 1 Estimating a Population Mean Topics find confidence interval for a population mean when the population standard deviation is KNOWN find confidence interval for a population mean when the
More informationII. DISTRIBUTIONS distribution normal distribution. standard scores
Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationIntroduction to Stata
Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the midrange of how easy it is to use. Other options include SPSS,
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationA POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment
More informationHypothesis Testing Summary
Hypothesis Testing Summary Hypothesis testing begins with the drawing of a sample and calculating its characteristics (aka, statistics ). A statistical test (a specific form of a hypothesis test) is an
More information1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
More informationUnit 21 Student s t Distribution in Hypotheses Testing
Unit 21 Student s t Distribution in Hypotheses Testing Objectives: To understand the difference between the standard normal distribution and the Student's t distributions To understand the difference between
More informationMAT X Hypothesis Testing  Part I
MAT 2379 3X Hypothesis Testing  Part I Definition : A hypothesis is a conjecture concerning a value of a population parameter (or the shape of the population). The hypothesis will be tested by evaluating
More information3.4 Statistical inference for 2 populations based on two samples
3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted
More informatione = random error, assumed to be normally distributed with mean 0 and standard deviation σ
1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationWeek 4: Standard Error and Confidence Intervals
Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.
More informationChapter 16 Multiple Choice Questions (The answers are provided after the last question.)
Chapter 16 Multiple Choice Questions (The answers are provided after the last question.) 1. Which of the following symbols represents a population parameter? a. SD b. σ c. r d. 0 2. If you drew all possible
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 111) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationRegression in SPSS. Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology
Regression in SPSS Workshop offered by the Mississippi Center for Supercomputing Research and the UM Office of Information Technology John P. Bentley Department of Pharmacy Administration University of
More informationChapter 7 Part 2. Hypothesis testing Power
Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 15 scale to 0100 scores When you look at your report, you will notice that the scores are reported on a 0100 scale, even though respondents
More informationWhen σ Is Known: Recall the Mystery Mean Activity where x bar = 240.79 and we have an SRS of size 16
8.3 ESTIMATING A POPULATION MEAN When σ Is Known: Recall the Mystery Mean Activity where x bar = 240.79 and we have an SRS of size 16 Task was to estimate the mean when we know that the situation is Normal
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationAP Statistics 2002 Scoring Guidelines
AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought
More informationTwoSample TTest from Means and SD s
Chapter 07 TwoSample TTest from Means and SD s Introduction This procedure computes the twosample ttest and several other twosample tests directly from the mean, standard deviation, and sample size.
More informationSample Size Determination
Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)
More informationTwoSample TTests Allowing Unequal Variance (Enter Difference)
Chapter 45 TwoSample TTests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one or twosided twosample ttests when no assumption
More information3. Nonparametric methods
3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests
More informationChicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
More information99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm
Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the
More informationIndependent t Test (Comparing Two Means)
Independent t Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent ttest when to use the independent ttest the use of SPSS to complete an independent
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special DistributionsVI Today, I am going to introduce
More informationStandard Deviation Calculator
CSS.com Chapter 35 Standard Deviation Calculator Introduction The is a tool to calculate the standard deviation from the data, the standard error, the range, percentiles, the COV, confidence limits, or
More informationLecture 1: t tests and CLT
Lecture 1: t tests and CLT http://www.stats.ox.ac.uk/ winkel/phs.html Dr Matthias Winkel 1 Outline I. z test for unknown population mean  review II. Limitations of the z test III. t test for unknown population
More informationPower & Effect Size power Effect Size
Power & Effect Size Until recently, researchers were primarily concerned with controlling Type I errors (i.e. finding a difference when one does not truly exist). Although it is important to make sure
More informationGood luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
More informationBA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420
BA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420 1. Which of the following will increase the value of the power in a statistical test
More informationBIOSTATISTICS QUIZ ANSWERS
BIOSTATISTICS QUIZ ANSWERS 1. When you read scientific literature, do you know whether the statistical tests that were used were appropriate and why they were used? a. Always b. Mostly c. Rarely d. Never
More informationHypothesis Testing. Concept of Hypothesis Testing
Quantitative Methods 2013 Hypothesis Testing with One Sample 1 Concept of Hypothesis Testing Testing Hypotheses is another way to deal with the problem of making a statement about an unknown population
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationUNDERSTANDING THE DEPENDENTSAMPLES t TEST
UNDERSTANDING THE DEPENDENTSAMPLES t TEST A dependentsamples t test (a.k.a. matched or pairedsamples, matchedpairs, samples, or subjects, simple repeatedmeasures or withingroups, or correlated groups)
More informationChapter 9, Part A Hypothesis Tests. Learning objectives
Chapter 9, Part A Hypothesis Tests Slide 1 Learning objectives 1. Understand how to develop Null and Alternative Hypotheses 2. Understand Type I and Type II Errors 3. Able to do hypothesis test about population
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More informationWe will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:
MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having
More informationCHISQUARE: TESTING FOR GOODNESS OF FIT
CHISQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationOneSample ttest. Example 1: Mortgage Process Time. Problem. Data set. Data collection. Tools
OneSample ttest Example 1: Mortgage Process Time Problem A faster loan processing time produces higher productivity and greater customer satisfaction. A financial services institution wants to establish
More informationStatistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
More informationStatistical Inference
Statistical Inference Idea: Estimate parameters of the population distribution using data. How: Use the sampling distribution of sample statistics and methods based on what would happen if we used this
More informationUNDERSTANDING THE INDEPENDENTSAMPLES t TEST
UNDERSTANDING The independentsamples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
More informationPaired TTest. Chapter 208. Introduction. Technical Details. Research Questions
Chapter 208 Introduction This procedure provides several reports for making inference about the difference between two population means based on a paired sample. These reports include confidence intervals
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationData Analysis: Describing Data  Descriptive Statistics
WHAT IT IS Return to Table of ontents Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data. Descriptive statistics are most
More informationHomework 6 Solutions
Math 17, Section 2 Spring 2011 Assignment Chapter 20: 12, 14, 20, 24, 34 Chapter 21: 2, 8, 14, 16, 18 Chapter 20 20.12] Got Milk? The student made a number of mistakes here: Homework 6 Solutions 1. Null
More informationThe basics of probability theory. Distribution of variables, some important distributions
The basics of probability theory. Distribution of variables, some important distributions 1 Random experiment The outcome is not determined uniquely by the considered conditions. For example, tossing a
More informationWater Quality Problem. Hypothesis Testing of Means. Water Quality Example. Water Quality Example. Water quality example. Water Quality Example
Water Quality Problem Hypothesis Testing of Means Dr. Tom Ilvento FREC 408 Suppose I am concerned about the quality of drinking water for people who use wells in a particular geographic area I will test
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationExtending Hypothesis Testing. pvalues & confidence intervals
Extending Hypothesis Testing pvalues & confidence intervals So far: how to state a question in the form of two hypotheses (null and alternative), how to assess the data, how to answer the question by
More information