Lecture Notes Module 1


 Gilbert Jenkins
 7 years ago
Study Populations

A study population is a clearly defined collection of people, animals, plants, or objects. In psychological research, a study population usually consists of a specific group of people. Some examples of human study populations are: all UCSC freshmen, all Arizona public school teachers, all spouses of Alzheimer's patients in Minnesota, and all preschool children in Chicago.

Measurement Properties

In addition to specifying the study population of interest, a psychologist will specify some attribute to measure. When studying human populations, the attribute of interest might be a specific type of academic ability, a personality trait, some particular behavior (e.g., hours of TV watching), an attitude, an interest, an opinion, or a physiological measure (e.g., heart rate, blood pressure, blood flow in specific parts of the brain, brain waves). The measurement of the attribute that the psychologist wants to examine is called the response variable. To measure some attribute of a person's behavior is to assign a numerical value to that person. These measurements can have different properties.

A ratio scale measurement has the following three properties: 1) a score of 0 represents a complete absence of the attribute being measured, 2) a ratio of any two scores correctly describes the ratio of attribute quantities, and 3) a difference between two scores correctly describes the difference in attribute quantities. Suppose a person's heart rate is measured. This measurement is a ratio scale measurement because a score of 0 beats per minute (bpm) represents a stopped heart and a heart rate of, say, 100 bpm is twice as fast as a heart rate of 50 bpm. In addition, the difference between two heart rates of, say, 50 and 60 bpm describes the same change in heart rate as the difference between 70 and 80 bpm.
With interval scale measurements, a score of 0 does not represent a complete absence of the attribute being measured and a ratio of two scores does not correctly describe the ratio of attribute quantities, but a difference between two interval scale scores correctly describes the difference in attribute quantities. Most measurements of psychological attributes are not ratio scale measurements but are assumed to be interval scale measurements. For instance, the Beck Depression Inventory (BDI) is scored on a 0 to 63 scale with higher scores representing higher levels of depression. However, a BDI score of 0 does not indicate a complete absence of depression, nor does a BDI score of, say, 40 represent twice the amount
of depression as a BDI score of 20. It is assumed that a difference between two BDI scores correctly describes the difference in depression levels, so that a person who initially obtained a BDI score of, say, 30 and then obtained a score of 20 after therapy is assumed to have the same level of improvement as a person who initially scored 25 on the BDI and dropped to 15 after therapy. Ratio and interval scale measurements will be referred to simply as quantitative scores.

Population Parameters

A population parameter is a single unknown numeric value that describes the measurements that could have been assigned to all N people in a specific study population. Psychologists would like to know the value of a particular population parameter because this information could be used to make an important decision or to advance knowledge in some area of research.

The population mean, denoted by the Greek letter μ (mu), is a population parameter that is frequently of interest to psychologists. Imagine every person in a study population of size N being assigned a quantitative score. A population mean (μ) is defined as

$\mu = \sum_{i=1}^{N} y_i / N$    (1.1)

where $y_i$ is a quantitative score for the i-th person in the study population. The summation notation $\sum_{i=1}^{N} y_i$ is a more compact way of writing $y_1 + y_2 + \cdots + y_N$.

Consider a study population of 2,450 elementary school teachers in a particular school district. Imagine giving a job burnout questionnaire (scored on a quantitative scale of 1 to 25) to all 2,450 teachers. The population mean job burnout score would be $\mu = (y_1 + y_2 + \cdots + y_{2450})/2450$ where $y_i$ is the burnout score for the i-th teacher.

Another important population parameter is the population standard deviation, which is defined as

$\sigma = \sqrt{\sum_{i=1}^{N} (y_i - \mu)^2 / N}$

and describes the variability of the quantitative measurements. Note that σ cannot be negative. Note also that if all N scores are identical (no variability), every $y_i$ value would equal μ and then σ would be zero.
The squared standard deviation ($\sigma^2$) occurs frequently in statistical formulas and is called the variance.
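
The definitions of the population mean (Equation 1.1), the population standard deviation, and the variance can be checked on a small set of numbers. The following is a minimal sketch, assuming a hypothetical study population of only five burnout scores; the function names and the data are illustrative and do not come from the notes.

```python
import math


def population_mean(scores):
    """Equation 1.1: the sum of all N scores divided by N."""
    return sum(scores) / len(scores)


def population_sd(scores):
    """Population standard deviation: divides by N, not N - 1."""
    mu = population_mean(scores)
    return math.sqrt(sum((y - mu) ** 2 for y in scores) / len(scores))


# Hypothetical burnout scores for a tiny study population of N = 5.
burnout = [12, 15, 9, 20, 14]

mu = population_mean(burnout)
sigma = population_sd(burnout)
variance = sigma ** 2  # the squared standard deviation

print(f"mu = {mu}, sigma = {sigma:.3f}, variance = {variance:.1f}")
```

When every score is identical, `population_sd` returns zero, which matches the remark that σ is zero when there is no variability.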
Normal (Gaussian) Curve

A histogram is a graph that visually describes a set of quantitative scores. A histogram is constructed by specifying several equal-length score intervals and counting the number of people who have scores that fall within each interval. An example of a histogram of scores on the Attention Deficit Checklist (ADC) for 4,810 young adults is shown below.

[Figure: histogram of ADC scores (frequency by score) with a fitted normal curve; Mean = 10.00, N = 4,810]

Scientists discovered decades ago that histograms for many different types of quantitative scores could be closely approximated by a certain type of symmetric bell-shaped curve called a normal (or Gaussian) curve. The histogram above includes a graph of a normal curve that closely approximates the shape of the histogram in this particular application.

If a set of quantitative scores is approximately normal, the scores will have the following characteristics: about half of the scores are above the mean and about half are below the mean; about 68% of the scores are within 1 standard deviation of the mean; about 95% of the scores are within 2 standard deviations of the mean; and almost all (99.7%) of the scores are within 3 standard deviations of the mean.
A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard unit normal distribution. The symbol $z_{\alpha/2}$ will be used to denote the point on a standard unit normal distribution for which 100(1 − α)% of the distribution is between the values $-z_{\alpha/2}$ and $z_{\alpha/2}$. For instance, it can be shown that 95% of a standard unit normal distribution is between the values −1.96 and 1.96, and so $z_{\alpha/2}$ = 1.96 for α = .05.

Random Samples and Parameter Estimates

In applications where the study population is large or the cost of measurement is high, the psychologist may not have the necessary resources to measure all N people in the study population. In these applications, the psychologist could take a random sample of n people from the study population of N people. In studies where random sampling is used, the study population is defined as the population from which the random sample was obtained.

A random sample of size n is selected in such a way that every possible sample of size n will have the same chance of being selected. Computer programs are typically used to obtain a random sample of size n from a study population of size N.

A population mean can be estimated from a random sample. The sample mean

$\hat{\mu} = \sum_{i=1}^{n} y_i / n$    (1.2)

is an estimate of μ (some statistics texts use $\bar{X}$ to denote the sample mean). A standard deviation can be estimated from a random sample. The sample standard deviation

$\hat{\sigma} = \sqrt{\sum_{i=1}^{n} (y_i - \hat{\mu})^2 / (n - 1)}$    (1.3)

is an estimate of σ (some statistics texts use s to denote the sample standard deviation). Squaring Equation 1.3 gives an estimate of the population variance.

Of course, psychologists want to know the exact value of μ, but they usually must settle for a sample estimate of μ because the study population size is either too large or the measurement process is too costly. However, the sample mean by itself can be misleading because $\hat{\mu} - \mu$ will be positive or negative and the direction of the error will be unknown.
In other words, the psychologist will not know if the sample mean has overestimated or underestimated the population mean. Furthermore, the magnitude of $\hat{\mu} - \mu$ will be unknown. Thus, the value of the sample mean might be too small or too large, and it might be close to or very different from the value of μ.
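
Random sampling and the estimates in Equations 1.2 and 1.3 can be illustrated with Python's standard library. This is a sketch under stated assumptions: the study population of 5,000 scores is simulated (hypothetical), and the sample size of 25 is arbitrary. `random.sample` draws without replacement so that every possible sample of size n is equally likely.

```python
import math
import random


def sample_mean(sample):
    """Equation 1.2: estimate of the population mean."""
    return sum(sample) / len(sample)


def sample_sd(sample):
    """Equation 1.3: estimate of sigma, dividing by n - 1."""
    m = sample_mean(sample)
    return math.sqrt(sum((y - m) ** 2 for y in sample) / (len(sample) - 1))


random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical study population of N = 5,000 quantitative scores.
population = [random.gauss(50, 10) for _ in range(5000)]

# Simple random sample of n = 25 people, drawn without replacement.
sample = random.sample(population, 25)

mu_hat = sample_mean(sample)
sigma_hat = sample_sd(sample)
mu = sum(population) / len(population)  # known here only because the data are simulated

print(f"sample mean = {mu_hat:.2f}, sample SD = {sigma_hat:.2f}")
print(f"estimation error (mu_hat - mu) = {mu_hat - mu:.2f}")
```

In a real study the last line could not be computed, which is exactly the problem the paragraph above describes: neither the sign nor the size of the error is known from the sample mean alone.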
Standard Error

The standard error of a parameter estimate numerically describes the accuracy of an estimate. A small value of the standard error indicates that the parameter estimate is likely to be close to the unknown population parameter value, while a large standard error value indicates that the parameter estimate could be very different from the study population parameter value. A standard error of a parameter estimate can be estimated from a random sample. The estimated standard error of $\hat{\mu}$ is

$SE_{\hat{\mu}} = \sqrt{\hat{\sigma}^2 / n}$.    (1.4)

From Equation 1.4 it is clear that increasing the sample size (n) will decrease the value of the standard error and increase the accuracy of the sample mean. From Equation 1.4, it also can be seen that variability in the quantitative scores affects the accuracy of the estimate of a population mean, with greater variability in scores leading to less accuracy in the sample mean.

Confidence Interval for μ

We can learn something about the unknown value of μ by using information from a random sample. By using an estimate of μ (Equation 1.2) and its standard error (Equation 1.4), which can be obtained from one random sample, it is possible to say something about the unknown value of μ in the form of a confidence interval. A confidence interval is a range of values that is believed to contain an unknown population parameter value with some specified degree of confidence. A 100(1 − α)% confidence interval for μ is

$\hat{\mu} \pm t_{\alpha/2;df} \, SE_{\hat{\mu}}$    (1.5)

where $t_{\alpha/2;df}$ is a two-sided critical t-value. The value of $t_{\alpha/2;df}$ can be found in a table of critical t-values given in most statistics texts. The symbol df refers to degrees of freedom and is equal to n − 1 in this type of application. The value 100(1 − α)% is the confidence level. In psychological studies, it is common to set α = .05 to give a 95% confidence level.

Example 1.1. The EPA estimates that lead in drinking water is responsible for more than 500,000 new cases of learning disabilities in children each year.
Lead-contaminated drinking water is most prevalent in homes built before 1970. A random sample of n = 10 homes was obtained from a listing of about 240,000 pre-1970 homes in the San Francisco
area. Drinking water from the 10 homes was tested for lead (the test costs about $25 per house). The legal lead concentration limit for drinking water is 15 ppb. The measured lead concentrations (in ppb) for the 10 homes are given below.

[The 10 measured values are not preserved in this transcription.]

The sample mean, sample variance, and standard error for this sample of 10 homes are computed below.

$\hat{\mu} = (\ldots)/10 = 24.7$
$\hat{\sigma}^2 = [\ldots]/(10 - 1) = 144.0$
$SE_{\hat{\mu}} = \sqrt{\hat{\sigma}^2/n} = \sqrt{144.0/10} = 3.79$

With a sample size of 10 homes, df = n − 1 = 9 and $t_{.05/2;9}$ = 2.262. The 95% lower confidence limit is $\hat{\mu} - 2.262(3.79)$ = 16.2 and the 95% upper confidence limit is $\hat{\mu} + 2.262(3.79)$ = 33.3. We can be 95% confident that the mean lead concentration in the drinking water of the 240,000 older homes is between 16.2 ppb and 33.3 ppb.

Properties of Confidence Intervals

There are two important properties of confidence intervals: increasing the sample size will tend to reduce the width of the confidence interval, and increasing the level of confidence (e.g., from 95% to 99%) will increase the width of the confidence interval. Increasing the level of confidence increases the proportion of all possible samples in which a confidence interval will capture the unknown population parameter value. These properties are illustrated in an analysis of 50 different random samples of n = 30 from a study population of about 15,000 nurses who were all given an emotional exhaustion questionnaire and whose mean score was 22.5. In this hypothetical example, we know that μ = 22.5, but in practice we will not be able to measure all members of the study population and we will estimate μ using the information contained in just one random sample.
[Table: 95% confidence intervals for μ from 50 random samples of n = 30]

The above table displays the results of 95% confidence intervals from 50 different random samples. Note that the 95% confidence intervals for μ failed to capture the actual population mean value of 22.5 in sample 7 and sample 34. The table below displays the results for 99% confidence intervals computed from the same 50 random samples. Note that these confidence intervals are wider (less precise) but all of them have captured the population mean value.

[Table: 99% confidence intervals for μ from the same 50 random samples]

Choosing a Confidence Level

The American Psychological Association recommends using 95% confidence intervals. A 95% confidence interval represents a good compromise between the level of confidence and the confidence interval width, as shown in the following graph. Notice that the confidence interval width increases almost linearly up to a confidence level of about 95% and then the width increases dramatically with increasing confidence. Thus, small increases in the level of confidence beyond 95% lead to relatively large increases in the confidence interval width.

[Figure: confidence interval width versus confidence level]
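
The behavior described for the 50 samples of nurses can be imitated with a small simulation. The sketch below rests on several assumptions that are not in the notes: scores are drawn from a normal distribution with μ = 22.5 and a hypothetical σ of 6.0 rather than from the actual population of 15,000 nurses, 2,000 samples are drawn instead of 50, and the critical t-values for df = 29 are hard-coded from a standard t table.

```python
import math
import random

# Two-sided critical t-values for df = 29, taken from a standard t table.
T_CRIT_DF29 = {0.95: 2.045, 0.99: 2.756}


def confidence_interval(sample, t_crit):
    """Equation 1.5: mu_hat +/- t * SE, with SE from Equation 1.4."""
    n = len(sample)
    mu_hat = sum(sample) / n
    var_hat = sum((y - mu_hat) ** 2 for y in sample) / (n - 1)
    se = math.sqrt(var_hat / n)
    return mu_hat - t_crit * se, mu_hat + t_crit * se


def coverage(level, n_samples=2000, n=30, mu=22.5, sigma=6.0, seed=0):
    """Proportion of simulated samples whose interval captures mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = confidence_interval(sample, T_CRIT_DF29[level])
        hits += lower <= mu <= upper
    return hits / n_samples


c95 = coverage(0.95)
c99 = coverage(0.99)
print(f"95% intervals captured mu in {c95:.1%} of samples")
print(f"99% intervals captured mu in {c99:.1%} of samples")
```

Because both calls use the same seed, the 99% intervals are computed from the same samples as the 95% intervals and are always wider, so they capture μ at least as often.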
Hypothesis Testing

In some applications, the psychologist simply needs to decide if the population parameter is greater than some value or less than some value. If the parameter is greater than some value, then one course of action will be taken; if the parameter is less than some value, then another course of action will be taken. The following notation is used to specify a set of hypotheses regarding μ:

H0: μ = h
H1: μ > h
H2: μ < h

where h is some number specified by the psychologist and H0 is called the null hypothesis. H1 and H2 are called the alternative hypotheses. In virtually all applications, H0 is known to be false (because it is extremely unlikely that μ will exactly equal h) and the psychologist's goal is to decide if H1 or H2 is true.

Consider the following example. If the mean job satisfaction score in a study population of employees is less than 5, then a company will increase year-end bonuses; otherwise, the standard bonus will be given. In this specific application, the set of hypotheses is shown below.

H0: μ = 5
H1: μ > 5
H2: μ < 5

A confidence interval for μ can be used to choose between H1: μ > h and H2: μ < h. If the upper limit of a 100(1 − α)% confidence interval is less than h, then H0 is rejected and H2 is accepted. If the lower limit of a 100(1 − α)% confidence interval is greater than h, then H0 is rejected and H1 is accepted. If the confidence interval includes h, then H0 cannot be rejected. This general hypothesis testing procedure is called a three-decision rule because one of the following three decisions will be made: 1) accept H1, 2) accept H2, or 3) fail to reject H0. A failure to reject H0 is called an inconclusive result.

A test of H0: μ = h is commonly referred to as a one-sample t-test and involves the computation of the test statistic $t = (\hat{\mu} - h)/SE_{\hat{\mu}}$. Statistical packages such as SPSS or R will compute the p-value that corresponds to the value of the test statistic. The p-value can be used to reject H0: μ = h.
Specifically, H0 is rejected if the p-value is less than α (α is usually set equal to .05).
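
The three-decision rule and the one-sample t statistic can be written out as two short functions. This is a minimal sketch; the function names are illustrative. As an example it reuses the confidence limits from Example 1.1 (16.2 and 33.3 ppb) and takes the 15 ppb legal limit as the hypothesized value h, which is an illustrative choice rather than a test carried out in the notes.

```python
def t_statistic(mu_hat, h, se):
    """One-sample t statistic: (mu_hat - h) / SE."""
    return (mu_hat - h) / se


def three_decision_rule(lower, upper, h):
    """Choose among H1, H2, or an inconclusive result using a confidence interval."""
    if lower > h:
        return "accept H1: mu > h"
    if upper < h:
        return "accept H2: mu < h"
    return "fail to reject H0 (inconclusive)"


# Example 1.1 summary values, with h = 15 ppb chosen for illustration.
decision = three_decision_rule(lower=16.2, upper=33.3, h=15)
t = t_statistic(mu_hat=24.7, h=15, se=3.79)

print(decision)
print(f"t = {t:.2f}")
```

Here the whole interval lies above 15, so the rule accepts H1; an interval such as [14.2, 18.5] would instead give an inconclusive result for h = 15.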
The p-value is related to the sample size, with larger sample sizes leading to smaller p-values. With a sufficiently large sample size, the p-value for a test of H0: μ = h will be less than .05. It is a common practice to report the results of a statistical test to be significant if the p-value is less than .05 and nonsignificant if the p-value is greater than .05. It is important to remember that a p-value of less than .05 (a significant result) simply indicates that the sample size was large enough to reject the null hypothesis (which is known to be false in virtually all applications) and does not indicate if the population mean is meaningfully different from the hypothesized value. Also, a p-value greater than .05 does not imply that H0 is true.

In a three-decision rule, a directional error occurs when H1: μ > h has been accepted but μ < h is true or when H2: μ < h has been accepted but μ > h is true. The probability of making a directional error is at most α/2. For instance, if a 95% confidence interval is used to select H1 or H2, the probability of making a directional error is at most .025. Most social science journals require authors to use α = .05.

Power of a Hypothesis Test

In hypothesis testing applications, the goal is to reject H0: μ = h and then choose either H1: μ > h or H2: μ < h. The power of a test is the probability of rejecting H0. If the power of the test is low, then the probability of an inconclusive result will be high. The power of a test of H0: μ = h depends on the sample size, the absolute value of (μ − h)/σ (the standardized effect size), and the α level. Increasing the sample size will increase the power of the test, as illustrated below for α = .05 and a fixed standardized effect size.

[Figure: power versus sample size]
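
How power depends on the sample size, the standardized effect size, and α can be explored numerically. The sketch below uses a large-sample normal approximation to the power of the two-sided test, which is an assumption of this illustration and not a formula given in the notes; exact power for a t-test is based on the noncentral t distribution and is somewhat lower for small n.

```python
import math
from statistics import NormalDist


def approx_power(n, effect_size, alpha=0.05):
    """Normal approximation to the power of a two-sided test of H0: mu = h.

    effect_size is the standardized effect size (mu - h) / sigma.
    """
    z = NormalDist()  # standard unit normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * math.sqrt(n)
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 30, 60):
    print(f"n = {n:3d}: approximate power = {approx_power(n, 0.5):.3f}")

print(f"alpha = .01, n = 30: approximate power = {approx_power(30, 0.5, alpha=0.01):.3f}")
```

The printed values rise with n and fall when α is reduced, which is the pattern the section describes.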
Decreasing α will reduce the probability of a directional error but will also decrease the power of the test, as illustrated in the graph below for n = 30 and (μ − h)/σ = 0.5. Note that there is little loss in power for reductions in α down to about .05, with power decreasing more dramatically for α values below .05, which is why α = .05 is a recommended value.

[Figure: power versus α]

For a given sample size and α level, the power of the test increases as the absolute value of (μ − h)/σ increases, as illustrated in the graph below for n = 30 and α = .05.

[Figure: power versus standardized effect size]

Interpreting a Confidence Interval

Consider a 95% confidence interval for μ. If a 95% confidence interval for μ were computed from every possible sample of size n in a given study population, about 95% of these confidence intervals would capture the unknown value of μ. With random sampling, we know that every possible sample of size n has the same
chance of being selected. Knowing that a 95% confidence interval for μ will capture μ in about 95% of all possible samples, and knowing that the one sample the psychologist has used to compute the 95% confidence interval is a random sample, we can say that the probability is .95 (or we are 95% confident) that the computed confidence interval includes μ.

Another way to think about confidence intervals is to consider a test of H0: μ = h for many different values of h. For a given value of α, if H0 is tested for all possible values of h, a 100(1 − α)% confidence interval for μ is the set of all values of h for which H0 cannot be rejected. All values of h that are not included in the confidence interval are values for which H0 would have been rejected at the specified α level. For instance, if a 95% confidence interval for μ is [14.2, 18.5], then all tests of H0: μ = h will not reject H0 if h is any value in the range 14.2 to 18.5 but will reject H0 for any value of h that is less than 14.2 or greater than 18.5.

Sample Size Planning

A narrow confidence interval for μ is desirable because it provides a more precise and informative description of μ than a wider confidence interval. It is possible to approximate the sample size that will give the desired width (upper limit minus lower limit) of a confidence interval with a desired level of confidence. The sample size needed to obtain a 100(1 − α)% confidence interval for μ having a desired width of w is approximately

$n = 4\tilde{\sigma}^2 (z_{\alpha/2}/w)^2$    (1.6)

where $\tilde{\sigma}^2$ is a planning value of the response variable variance and $z_{\alpha/2}$ is a two-sided critical z-value. Planning values are obtained from expert opinion, pilot studies, or previously published research. If the maximum and minimum possible values of the response variable scale are known, $[(\max - \min)/4]^2$ provides a crude planning value of the population variance.
Equation 1.6 shows that larger sample sizes are needed with narrower confidence interval widths, greater levels of confidence, and greater variability of the response variable. Round the result of Equation 1.6 up to the nearest integer.

Example 1.2. A psychologist wants to estimate the mean job satisfaction score for a population of 4,782 public school teachers. The psychologist plans to use a job satisfaction questionnaire (measured on a 1 to 10 scale) that has been used in previous studies. A review of the literature suggests that the variance of the job satisfaction scale is about 6.0. The psychologist would like the 95% confidence interval for μ (the mean job satisfaction score for all 4,782 teachers) to have a width of about 1.5. The required sample size is approximately $n = 4(6.0)(1.96/1.5)^2 = 40.98$, which rounds up to 41.
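
Equation 1.6 and the numbers in Example 1.2 can be checked with a few lines of code. This is a sketch, assuming the critical z-value is taken from Python's `statistics.NormalDist` rather than from a table; the function names are illustrative.

```python
import math
from statistics import NormalDist


def sample_size_for_width(planning_variance, width, confidence=0.95):
    """Equation 1.6, rounded up to the nearest integer."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical z-value
    return math.ceil(4 * planning_variance * (z / width) ** 2)


def crude_planning_variance(scale_min, scale_max):
    """Crude planning value [(max - min)/4]^2 from the scale range."""
    return ((scale_max - scale_min) / 4) ** 2


# Example 1.2: planning variance 6.0, desired 95% CI width 1.5.
n_required = sample_size_for_width(6.0, 1.5)
print(f"required n = {n_required}")

# Crude planning value for a 1 to 10 scale, for comparison with 6.0.
print(f"crude planning variance = {crude_planning_variance(1, 10):.2f}")
```

Halving the desired width roughly quadruples the required sample size, since w enters Equation 1.6 as a squared term in the denominator.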
Note that Equation 1.6 does not include the value of the study population size (N). Actually, the sample size requirement does depend on N according to the formula $n' = n(1 - n/N)$, where n is given by Equation 1.6 and n' is the revised sample size requirement. In most applications, n will be a small fraction of N and then n' will be about the same as n. For instance, if N = 3,000 and Equation 1.6 gives n = 40, then n' = 40(1 − 40/3000) ≈ 39.5, or 40 after rounding up.

Sampling in Two Stages

In applications where sample data can be collected in two stages, the confidence interval obtained in the first stage can be used to determine how many more participants should be sampled in the second stage. If the 100(1 − α)% confidence interval width from a first-stage total sample size of n is $w_0$, then the number of participants that should be added to the original sample ($n^{+}$) in order to obtain a 100(1 − α)% confidence interval width of w is approximately

$n^{+} = [(w_0/w)^2 - 1]\,n$.    (1.7)

Example 1.3. In a study with 25 participants, the 95% confidence interval for μ had a width of 4.38. The psychologist suspects that the results of this study are unlikely to be published because the confidence interval is too wide. The psychologist would like to obtain a 95% confidence interval for μ that has a width of 2.0. To achieve this goal, the number of participants that should be added to the initial sample is $[(4.38/2.0)^2 - 1]25 = 94.9$, or about 95.

Target Population

The confidence interval for μ (Equation 1.5) provides information about the study population from which the random sample was taken. In most applications, the study population will be a small subset of some larger and more interesting population called the target population. For instance, a psychologist may take a random sample of 100 undergraduate students from a particular university directory consisting of 12,000 student names because the psychologist has easy access to this directory.
The results of Equation 1.5 will apply only to those 12,000 undergraduate students, but the psychologist is more interested in the value of μ for a target population of all young adults. It might be possible for the psychologist to make a persuasive argument that the study population mean should be very similar to some target population mean. For instance, suppose the psychologist computed a confidence interval for the mean eye pupil diameter in a small room lit only by a 40-watt light bulb using a
random sample from the 12,000 undergraduate students. The psychologist could argue convincingly that the mean eye pupil diameter in the study population of 12,000 undergraduates should be no different than the mean eye pupil diameter of all young adults. As an example where the study population mean would probably not be similar to some target population mean, suppose that the psychologist instead computed a confidence interval for the mean score on an abortion attitude scale using a sample of students from a Jesuit university. In this situation, the psychologist does not believe that the mean abortion attitude in the Jesuit study population is similar to the mean abortion attitude in a target population of all young adults.

Researchers in the physical and biological sciences seldom worry about the distinction between a study population and a target population because the parameter values for many physical or biological attributes (like the eye pupil diameter example) are much less likely to differ across different study populations, and consequently the study population parameter values are almost automatically assumed to generalize to some large target population. In contrast, psychologists who study complex human behavior that can vary considerably across different study populations need to be very cautious about how they interpret their confidence interval and hypothesis testing results. Psychologists should clearly describe the characteristics of the study population so that the statistical results are interpreted in a proper context.

Assumptions for Confidence Intervals and Tests

Confidence intervals and hypothesis tests for μ require three assumptions. One assumption, the random sampling assumption, requires the sample to be a random sample from the study population. A second assumption, the independence assumption, requires the responses from each participant in the sample to be independent of one another.
In other words, no participant in the study should influence the responses of any other participant in the study. A third assumption, the normality assumption, requires the quantitative scores in the study population to have an approximate normal distribution.

Confidence intervals and hypothesis tests for μ will be uninterpretable if the random sampling assumption has been violated. If the independence assumption has been violated, the true probability of a directional error can be greater than α/2, and the true confidence level can be less than 100(1 − α)%. Recall that the interpretation of a confidence interval for μ assumed that a 100(1 − α)% confidence interval would capture the unknown population mean in about 100(1 − α)% of all possible samples of a given size. However, when the
independence assumption is violated, the percent of samples in which a 100(1 − α)% confidence interval captures the population parameter can be far less than 100(1 − α)% and the psychologist's confidence regarding the computed confidence interval result will be mistakenly too high.

Violating the normality assumption will have little effect on the confidence interval and test for μ unless the quantitative scores in the study population are extremely nonnormal and the sample size is small (n < 30). If the sample size is small and the study population quantitative scores are extremely nonnormal, the proportion of all possible 95% confidence intervals that would capture μ can be less than .95, and the psychologist's confidence regarding the computed confidence interval result will be mistakenly too high.

Assessing the Normality Assumption

Recall that the normal distribution is symmetric. If the quantitative scores in the sample exhibit a clear asymmetry, this would suggest a violation of the normality assumption. The asymmetry in a set of quantitative scores can be described using a coefficient of skewness. The skewness coefficient is equal to zero if the scores are perfectly symmetric, positive if the scores are skewed to the right, and negative if the scores are skewed to the left. SPSS and R provide a test of the null hypothesis that the population skewness coefficient is zero. If the p-value is less than .05, the psychologist may conclude that the normality assumption has been violated and that the population scores are skewed, but a p-value greater than .05 does not imply that the normality assumption has been satisfied.

The population distribution of quantitative scores can be nonnormal even if the distribution is symmetric. The coefficient of kurtosis describes the degree to which a distribution is more or less peaked or has shorter or thicker tails than a normal distribution.
SPSS and R provide a test of the null hypothesis that there is no kurtosis in the population distribution of scores. If the p-value is less than .05, the psychologist may conclude that the normality assumption has been violated and that the population scores have kurtosis, but a p-value greater than .05 does not imply that the normality assumption has been satisfied.

Data Transformations

A transformation of the quantitative scores can reduce skewness. When the quantitative score is a frequency count, such as the number of facts that can be recalled or the number of spelling errors in a writing sample, a square root transformation ($\sqrt{y_i}$) may reduce nonnormality. When the score is a time-to-event, such as the time required to solve a problem or a reaction time, a natural log transformation ($\ln(y_i)$) or a reciprocal transformation ($1/y_i$) may reduce nonnormality.

Example 1.4. A histogram of 80 highly skewed scores is shown below (left). A histogram of the log-transformed scores (right) is much more symmetric.

[Figure: histograms of the original scores (left) and the log-transformed scores (right)]

Although data transformations may reduce nonnormality, the mean of the transformed scores may then be difficult to interpret. However, in some applications the value of μ could be interpretable after a data transformation. For instance, if the response variable is measured in squared units, such as the brain surface area showing activity measured in squared centimeters, a square root transformation could be interpreted as the size of the activated area. Or if the response variable is reaction time measured in seconds, then a reciprocal transformation could be interpreted as responses per second.
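
The effect of a log transformation on skewness can be seen with a small numerical example. The sketch below makes two assumptions that are not in the notes: the skewness coefficient is computed as the third central moment divided by the 1.5 power of the second central moment (one common definition; SPSS and R may apply small-sample adjustments), and the ten reaction times are hypothetical.

```python
import math


def skewness(scores):
    """Moment-based skewness coefficient: m3 / m2**1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((y - mean) ** 2 for y in scores) / n
    m3 = sum((y - mean) ** 3 for y in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical reaction times in seconds, skewed to the right.
reaction_times = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 1.0, 1.5, 2.5, 6.0]

log_times = [math.log(y) for y in reaction_times]   # natural log transformation
responses_per_second = [1 / y for y in reaction_times]  # reciprocal transformation

print(f"skewness of raw scores:        {skewness(reaction_times):.2f}")
print(f"skewness of log scores:        {skewness(log_times):.2f}")
print(f"skewness of reciprocal scores: {skewness(responses_per_second):.2f}")
```

For these data the log-transformed scores are noticeably less skewed than the raw scores, although some right skew remains; which transformation works best depends on the data at hand.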
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationWeek 4: Standard Error and Confidence Intervals
Health Sciences M.Sc. Programme Applied Biostatistics Week 4: Standard Error and Confidence Intervals Sampling Most research data come from subjects we think of as samples drawn from a larger population.
More informationHow To Test For Significance On A Data Set
NonParametric Univariate Tests: 1 Sample Sign Test 1 1 SAMPLE SIGN TEST A nonparametric equivalent of the 1 SAMPLE TTEST. ASSUMPTIONS: Data is nonnormally distributed, even after log transforming.
More information6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
More information1.5 Oneway Analysis of Variance
Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 111) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 15 scale to 0100 scores When you look at your report, you will notice that the scores are reported on a 0100 scale, even though respondents
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More information3.4 Statistical inference for 2 populations based on two samples
3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted
More informationDescriptive Statistics
Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web
More informationChapter 7 Section 7.1: Inference for the Mean of a Population
Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used
More informationHYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION
HYPOTHESIS TESTING: CONFIDENCE INTERVALS, TTESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationTwoSample TTests Allowing Unequal Variance (Enter Difference)
Chapter 45 TwoSample TTests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one or twosided twosample ttests when no assumption
More informationGood luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:
Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours
More informationA POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
CHAPTER 5. A POPULATION MEAN, CONFIDENCE INTERVALS AND HYPOTHESIS TESTING 5.1 Concepts When a number of animals or plots are exposed to a certain treatment, we usually estimate the effect of the treatment
More informationChicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationCHISQUARE: TESTING FOR GOODNESS OF FIT
CHISQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity
More informationStatistics 2014 Scoring Guidelines
AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home
More informationIndependent samples ttest. Dr. Tom Pierce Radford University
Independent samples ttest Dr. Tom Pierce Radford University The logic behind drawing causal conclusions from experiments The sampling distribution of the difference between means The standard error of
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationStatistics. Measurement. Scales of Measurement 7/18/2012
Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does
More informationUNDERSTANDING THE DEPENDENTSAMPLES t TEST
UNDERSTANDING THE DEPENDENTSAMPLES t TEST A dependentsamples t test (a.k.a. matched or pairedsamples, matchedpairs, samples, or subjects, simple repeatedmeasures or withingroups, or correlated groups)
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationUNDERSTANDING THE INDEPENDENTSAMPLES t TEST
UNDERSTANDING The independentsamples t test evaluates the difference between the means of two independent or unrelated groups. That is, we evaluate whether the means for two independent groups are significantly
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationChapter 2 Probability Topics SPSS T tests
Chapter 2 Probability Topics SPSS T tests Data file used: gss.sav In the lecture about chapter 2, only the OneSample T test has been explained. In this handout, we also give the SPSS methods to perform
More informationChapter 7. Oneway ANOVA
Chapter 7 Oneway ANOVA Oneway ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The ttest of Chapter 6 looks
More informationMeans, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special DistributionsVI Today, I am going to introduce
More informationAn Introduction to Statistics using Microsoft Excel. Dan Remenyi George Onofrei Joe English
An Introduction to Statistics using Microsoft Excel BY Dan Remenyi George Onofrei Joe English Published by Academic Publishing Limited Copyright 2009 Academic Publishing Limited All rights reserved. No
More informationPaired TTest. Chapter 208. Introduction. Technical Details. Research Questions
Chapter 208 Introduction This procedure provides several reports for making inference about the difference between two population means based on a paired sample. These reports include confidence intervals
More information2 Sample ttest (unequal sample sizes and unequal variances)
Variations of the ttest: Sample tail Sample ttest (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing
More informationINTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA)
INTERPRETING THE ONEWAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the oneway ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of
More informationOpgaven Onderzoeksmethoden, Onderdeel Statistiek
Opgaven Onderzoeksmethoden, Onderdeel Statistiek 1. What is the measurement scale of the following variables? a Shoe size b Religion c Car brand d Score in a tennis game e Number of work hours per week
More informationTHE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.
THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM
More informationIntroduction to Statistics for Psychology. Quantitative Methods for Human Sciences
Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationConfidence Intervals for Cp
Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process
More informationIntroduction to Hypothesis Testing
I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters  they must be estimated. However, we do have hypotheses about what the true
More informationInference for two Population Means
Inference for two Population Means Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison October 27 November 1, 2011 Two Population Means 1 / 65 Case Study Case Study Example
More informationThis chapter discusses some of the basic concepts in inferential statistics.
Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationBA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420
BA 275 Review Problems  Week 6 (10/30/0611/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394398, 404408, 410420 1. Which of the following will increase the value of the power in a statistical test
More informationPermutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. JaeWan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
More informationIndependent t Test (Comparing Two Means)
Independent t Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent ttest when to use the independent ttest the use of SPSS to complete an independent
More informationHypothesis testing  Steps
Hypothesis testing  Steps Steps to do a twotailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More informationComparing Means in Two Populations
Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we
More informationDescriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
More informationHYPOTHESIS TESTING (ONE SAMPLE)  CHAPTER 7 1. used confidence intervals to answer questions such as...
HYPOTHESIS TESTING (ONE SAMPLE)  CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men
More informationPrinciples of Hypothesis Testing for Public Health
Principles of Hypothesis Testing for Public Health Laura Lee Johnson, Ph.D. Statistician National Center for Complementary and Alternative Medicine johnslau@mail.nih.gov Fall 2011 Answers to Questions
More informationDATA INTERPRETATION AND STATISTICS
PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE
More informationp ˆ (sample mean and sample
Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationTwo Related Samples t Test
Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person
More informationMind on Statistics. Chapter 13
Mind on Statistics Chapter 13 Sections 13.113.2 1. Which statement is not true about hypothesis tests? A. Hypothesis tests are only valid when the sample is representative of the population for the question
More information1 Nonparametric Statistics
1 Nonparametric Statistics When finding confidence intervals or conducting tests so far, we always described the population with a model, which includes a set of parameters. Then we could make decisions
More informationHYPOTHESIS TESTING: POWER OF THE TEST
HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,
More informationCrosstabulation & Chi Square
Crosstabulation & Chi Square Robert S Michael Chisquare as an Index of Association After examining the distribution of each of the variables, the researcher s next task is to look for relationships among
More informationChapter 7 Section 1 Homework Set A
Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the
More informationNonParametric Tests (I)
Lecture 5: NonParametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of DistributionFree Tests (ii) Median Test for Two Independent
More informationChapter 2. Hypothesis testing in one population
Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance
More informationSKEWNESS. Measure of Dispersion tells us about the variation of the data set. Skewness tells us about the direction of variation of the data set.
SKEWNESS All about Skewness: Aim Definition Types of Skewness Measure of Skewness Example A fundamental task in many statistical analyses is to characterize the location and variability of a data set.
More informationCHAPTER 14 NONPARAMETRIC TESTS
CHAPTER 14 NONPARAMETRIC TESTS Everything that we have done up until now in statistics has relied heavily on one major fact: that our data is normally distributed. We have been able to make inferences
More information5.1 Radical Notation and Rational Exponents
Section 5.1 Radical Notation and Rational Exponents 1 5.1 Radical Notation and Rational Exponents We now review how exponents can be used to describe not only powers (such as 5 2 and 2 3 ), but also roots
More informationSCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
More informationDescriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion
Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research
More informationChapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 81 Overview 82 Basics of Hypothesis Testing
Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 81 Overview 82 Basics of Hypothesis Testing 83 Testing a Claim About a Proportion 85 Testing a Claim About a Mean: s Not Known 86 Testing
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly
More informationindividualdifferences
1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,
More informationChapter 7 Notes  Inference for Single Samples. You know already for a large sample, you can invoke the CLT so:
Chapter 7 Notes  Inference for Single Samples You know already for a large sample, you can invoke the CLT so: X N(µ, ). Also for a large sample, you can replace an unknown σ by s. You know how to do a
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrclmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More informationPart 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217
Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing
More informationStatistics Review PSY379
Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses
More informationIntroduction to Analysis of Variance (ANOVA) Limitations of the ttest
Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One Way ANOVA Limitations of the ttest Although the ttest is commonly used, it has limitations Can only
More information