ELEMENTARY STATISTICS


Study Guide
Dr. Shinemin Lin

Table of Contents
1. Introduction to Statistics
2. Descriptive Statistics
3. Probabilities and Standard Normal Distribution
4. Estimates and Sample Sizes
5. Hypothesis Testing
6. Correlations and Regression
7. Analysis of Variance
8. Statistical Process Control

Project 1: Collecting Data

Many factors influence the complexity of written text, such as subject matter, overall length of discussion, choice of vocabulary, and sentence structure. To simplify the question, I propose to look at the length of the sentences in articles written for a national newspaper and a local paper. You are going to collect sentences from local and national newspapers and record the length of each sentence (and the complexity of the sentence). This project requires you to use random sampling or pseudo-random sampling to obtain your sample. Concentrate on the following questions:
1. How will you accomplish this?
2. How will you measure your variables with as much reliability and as little bias as possible?
3. How will you collect your data?
4. Is your data collection plan unbiased?
5. What descriptive statistics do you plan to compute for which variables?
6. What graphs and tables do you plan to display?
7. What inferences do you want to make about your population from the sample you observe?

Chapter 1 Introduction to Statistics

The word STATISTICS has two basic meanings. We sometimes use this word when referring to actual numbers derived from data. A second meaning refers to statistics as a method of analysis. A statistical study usually consists of data collection, data presentation, data analysis, and decision-making. In statistics, we commonly use the terms population and sample. We investigate a sample to predict the population.

A population is the complete collection of elements to be studied.
A sample is a subcollection of elements drawn from a population.
A parameter is a numerical measurement describing some characteristic of a population.
A statistic is a numerical measurement describing some characteristic of a sample.

Nature of Data

Qualitative data can be separated into different categories that are distinguished by some nonnumeric characteristic.
Quantitative data consist of numbers representing counts and measurements.
Discrete data result from either a finite number of possible values or a countable number of possible values. When data represent counts, they are discrete.
Continuous numerical data result from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions. When data represent measurements, they are continuous.

Levels of Measurement of Data

The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme.
Example. Voter distribution: 45 Democrats, 80 Republicans, 90 Independents.

The ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.
Example. Voter distribution: 45 low-income voters, 80 middle-income voters, 90 upper-income voters. An order is determined by 'low, middle, upper'.

The interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of difference between data values can be determined. However, there is no natural zero starting point.
Example. Temperatures of steel rods: 45 F, 80 F, and 90 F. 90 F is not twice as hot as 45 F.

The ratio level of measurement is the interval level modified to include a natural zero starting point. For values at this level, both differences and ratios are meaningful.
Example. Length of steel rods: 45 cm, 60 cm, and 90 cm. 90 cm is twice as long as 45 cm.

Methods of Sampling

Guidelines for data collection
1. Ensure that the sample size is large enough for the required purpose.
2. If you are obtaining measurements of some characteristic from people, you will get better results if you do the measuring instead of asking the subject for the value.
3. When conducting a survey, consider the medium to be used.
4. Ensure that the method used to collect data actually results in a sample that is representative of the population.

Sampling methods
1. Random sampling: members of the population are selected in such a way that each has an equal chance of being selected.
2. Stratified sampling: we subdivide the population into at least two different subpopulations whose members share the same characteristic (such as gender), and then we draw a sample from each subpopulation.
3. Systematic sampling: we choose some starting point and then select every kth element in the population.
4. Cluster sampling: we divide the population area into sections (or clusters), randomly select a few of those sections, and then choose all the members from the selected sections.
5. Convenience sampling: we simply use results that are readily available.

Homework:
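The first four sampling methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered members, the function names, and the seeds are all hypothetical and do not come from the study guide.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Random sampling: every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Systematic sampling: pick a starting point, then every kth element."""
    return population[start::k]

def stratified_sample(population, key, n_per_stratum, seed=None):
    """Stratified sampling: split into subpopulations, sample from each one."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Cluster sampling: pick whole clusters at random, keep all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
```

For instance, `systematic_sample(population, 10, start=4)` selects members 5, 15, 25, and so on, while `cluster_sample(clusters, 2)` returns all 20 members of two randomly chosen clusters.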

Chapter 2 Descriptive Statistics

Summarizing Data

When beginning an analysis of a large set of values, we must often organize and summarize the data by developing tables and graphs. We begin with a frequency table. A frequency table lists categories (or classes) of scores, along with counts (or frequencies) of the number of scores that fall into each category. Constructing a frequency table is not difficult, and many statistics software packages can do it automatically.

Lower class limits are the smallest numbers that can actually belong to the different classes.
Upper class limits are the largest numbers that can actually belong to the different classes.
Class boundaries are the numbers used to separate classes, but without the gaps created by class limits.
Class marks are the midpoints of the classes.
Class width is the difference between two consecutive lower class limits.

Example 1. Construct a frequency table (with 4 classes) for the data 80, 68, 84, 86, 85, 77, 64, 81, 93, 94, 97, 93, 89, 8, 76, 75, 83, 90, 83, 84, 9, 94, 90, 9, 91, 84, 81, 84, 79, 80, 80.

Data | Frequency | Cumulative frequency | Relative frequency (%) | Cumulative percent
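A frequency table with cumulative and relative columns can be built as in the sketch below. The scores, the lower limit of 60, and the class width of 10 are hypothetical choices made for illustration; they are not the data of Example 1.

```python
def frequency_table(data, lower, width, n_classes):
    """Return rows of (lower limit, upper limit, frequency,
    cumulative frequency, relative %, cumulative %) for integer data."""
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width - 1  # upper class limit when the data are whole numbers
        freq = sum(1 for x in data if lo <= x <= hi)
        cumulative += freq
        rows.append((lo, hi, freq, cumulative,
                     100 * freq / len(data), 100 * cumulative / len(data)))
    return rows

# Hypothetical test scores (not taken from the study guide).
scores = [64, 68, 75, 76, 77, 79, 80, 81, 84, 85, 86, 89, 90, 93, 94, 97]
table = frequency_table(scores, lower=60, width=10, n_classes=4)

for lo, hi, freq, cum, rel, cum_pct in table:
    print(f"{lo}-{hi}  {freq:3d}  {cum:3d}  {rel:6.1f}  {cum_pct:6.1f}")
```

The class width here is the difference between consecutive lower class limits (70 - 60 = 10), matching the definition above.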

Pictures of Data
1. Histograms use a frequency table.
2. Pie charts use a relative frequency table.
3. Maps
4. Stem and leaf plots

Another interesting way of summarizing data is to use what is called a stem and leaf plot. To illustrate this procedure, let's consider the grades obtained by two classes as follows.
Class I: 56, 64, 73, 7, 84, 98, 80, 86, 75, 68, 46, 78, 75, 91, 63, 84, 79, 69, 76, 58.
Class II: 99, 81, 50, 64, 76, 63, 71, 78, 81, 9, 87, 79, 74, 60, 68, 9, 84, 86, 65, 78.
The first digit serves as the stem, and the second digit as the leaf. For example, the stem of 46 in Class I is 4, and the leaf is 6. Likewise, 56 and 58 have stems of 5 and leaves of 6 and 8, respectively.

Stem and Leaf Plots

Class I
Stems | Leaves
4 | 6
5 | 6, 8
6 | 4, 8, 3, 9
7 | 3, 5, 8, 5, 9, 6
8 | 4, 0, 6, 4
9 | 8, 1

Class II
Stems | Leaves
Complete the stem and leaf plot for Class II.

Steps in Making a Stem and Leaf Plot
1. Decide on the number of digits in the data to be listed under stems (one-digit, two-digit, ...). Usually only one digit is given under leaves and the other digits are listed under stems.
2. List the stems in a column, from least to greatest.
3. List the remaining digits in each data entry as leaves. (You may wish to order these data from smallest to largest.)

Example. Construct the stem-and-leaf plot for the data 80, 68, 84, 86, 85, 77, 64, 81, 93, 94, 97, 93, 89, 8, 76, 75, 83, 90, 83, 84, 9, 94, 90, 9, 91, 84, 81, 84, 79, 80, 80.
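The three steps for making a stem and leaf plot translate directly into code. The sketch below assumes two-digit whole-number data and uses only the clearly legible two-digit Class I grades; the function name is a hypothetical choice.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit values by tens digit (stem) and list the ones digits (leaves)."""
    plot = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)  # e.g. 46 -> stem 4, leaf 6
        plot[stem].append(leaf)
    # Stems from least to greatest, leaves ordered within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

# Two-digit Class I grades as listed in the text.
class_1 = [56, 64, 73, 84, 98, 80, 86, 75, 68, 46,
           78, 75, 91, 63, 84, 79, 69, 76, 58]

for stem, leaves in stem_and_leaf(class_1).items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```

Unlike the hand-drawn plot, this version sorts the leaves within each stem, which is the optional ordering mentioned in step 3.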

Measures of Central Tendency

A measure of central tendency is a value at the center or middle of a data set. We will mostly measure the mean, weighted mean, mode, median, and midrange, and we will also look at skewness.

Mean = sum / count: $\bar{x} = \frac{\sum x}{n}$
Weighted mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
Median: The median of a set of scores is the middle value of the sorted data.
Mode: The mode of a data set is the score that occurs most frequently.
Midrange = (highest + lowest)/2

Examples
1. Given the data list 5, 5, 5, 3, 1, 1, 5, 4, 3, and 5, find the mean, mode, and median.
2. How do you find the mean, mode, and median if you are given a frequency table?

Skewness: A distribution of data is skewed if it is not symmetric and extends more to one side than the other.
Skewed to the left (mean < median < mode)
Symmetric (mean = mode = median)
Skewed to the right (mode < median < mean)

Measures of Variation

Range = largest - smallest
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
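The measures of central tendency for Example 1 can be checked with Python's standard `statistics` module. The helper functions `weighted_mean` and `mean_from_frequency_table` are hypothetical names introduced here to illustrate the formulas; the frequency table is simply Example 1's data list tallied by value.

```python
import statistics

data = [5, 5, 5, 3, 1, 1, 5, 4, 3, 5]

mean = statistics.mean(data)             # sum / count
median = statistics.median(data)         # middle of the sorted data
mode = statistics.mode(data)             # most frequent score
midrange = (max(data) + min(data)) / 2   # (highest + lowest) / 2

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def mean_from_frequency_table(table):
    """Mean when each score x appears table[x] times (frequencies act as weights)."""
    return weighted_mean(list(table.keys()), list(table.values()))

# The same ten scores, tallied as score -> frequency.
freq_table = {1: 2, 3: 2, 4: 1, 5: 5}

print(mean, median, mode, midrange)
print(mean_from_frequency_table(freq_table))
```

Because there is an even number of scores, the median is the average of the two middle values of the sorted list (4 and 5).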

Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The deviation of a score is the difference between the score and the mean.

Example. Find the variance and SD of the data 6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7.

Calculating SD from a frequency table
Example. Find the (a) mean and (b) standard deviation of the data described below.
X | Frequency

Range Rule of Thumb
The range is close to 4s, and hence s can be approximated by range/4.

Interquartile Range (IQR)
IQR = Q3 - Q1
Example. The following are 16 grades received on a test, arranged in increasing order. Find the mean, Q1, Q3, and IQR.

Boxplots display Q1, the median, and Q3.

Outlier
An outlier is any data point farther than 1.5 IQRs above Q3 or farther than 1.5 IQRs below Q1.
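The variation measures can be sketched for the ten-value example as follows. One caveat is an assumption worth flagging: textbooks and software use slightly different conventions for quartiles, so the Q1 and Q3 that `statistics.quantiles` returns (its default "exclusive" method) may differ a little from values found by hand. The helper `sd_from_frequency_table` is a hypothetical name.

```python
import statistics

data = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]

data_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance, divides by n - 1
sd = statistics.stdev(data)            # square root of the sample variance
sd_rule_of_thumb = data_range / 4      # range rule of thumb: s is roughly range / 4

# Quartiles, IQR, and the 1.5 * IQR outlier fences.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

def sd_from_frequency_table(table):
    """Sample SD when each score x appears table[x] times."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    squared_dev = sum(f * (x - mean) ** 2 for x, f in table.items())
    return (squared_dev / (n - 1)) ** 0.5

print(round(variance, 4), round(sd, 4), round(sd_rule_of_thumb, 2), outliers)
```

For these data the rule of thumb gives a rough estimate (about 0.3) that is in the same ballpark as, but not equal to, the computed standard deviation (about 0.48).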

Measures of Position

z score: $z = \frac{x - \mu}{\sigma}$
The standard score, or z score, is the number of standard deviations that a given value x is above or below the mean.
Percentile = cumulative percent

Example. Two equivalent IQ tests are given to similar groups, but the tests are designed with different scales. The statistics for the tests are listed below. Which is better: a score of 130 on test A or a score of 5 on test B?
Test A: mean = 100, s = 15; Test B: mean = 40, s = 5.
Solution.
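Scores from differently scaled tests are compared by converting each one to a z score. The sketch below shows the conversion for the Test A score only; the function name `z_score` is a hypothetical choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Test A: a score of 130 with mean 100 and standard deviation 15.
z_a = z_score(130, 100, 15)
print(z_a)
```

Applying the same function to the Test B score, mean, and standard deviation gives the second z score; the larger z score is the better relative performance.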

Chapter 3 Probability and Standard Normal Distribution

Probability of a single event
Pr(E) = k/n = number of successes / number of possible outcomes.

Example.
1) If we draw a ball from a bag containing 4 white balls and 6 black balls, what is the probability of
a) getting a white ball?
b) getting a black ball?
c) not getting a white ball?
2) A die is rolled. What is the probability that
a) a 4 will result?
b) an odd number will result?
c) a number bigger than 4 will result?

Sample space
A set that contains all possible outcomes of an experiment is called a sample space. Each element of the sample space is called a sample point, and an event is a subset of the sample space.

Examples.
1) Write the sample space and all events of the example above.
2) Suppose a coin is tossed 3 times. Construct the sample space for the experiment and the event of getting at least 2 heads.
3) Ten blank cards are marked with the numbers 1 to 10. An experiment consists of shuffling the cards and then drawing one card.
a) Determine the sample space for the experiment.
b) How many sample points are in the sample space?
c) What is the event of getting a card with an even number?

Pr(event) = # sample points in the event / # sample points in the sample space

Addition rule: the probability of obtaining any one of several different and distinct outcomes equals the sum of their separate probabilities. The addition rule in this form always assumes that the outcomes being considered are mutually exclusive.

Multiplication rule: the probability of obtaining a combination of independent outcomes equals the product of their separate probabilities.

Examples
Flip two fair coins. What is the probability of getting
a) Two heads

b) One head and one tail

Example 1. Draw one card at random from a standard deck of cards. The sample space S is the collection of the 52 cards. Assume that the probability set function assigns 1/52 to each of these 52 outcomes. Let
A = {x: x is a jack, queen, or king},
B = {x: x is a 9, 10, or jack and x is red},
C = {x: x is a club},
D = {x: x is a diamond, a heart, or a spade}.
Find a) P(A), b) P(A and B), c) P(A or B), d) P(C or D), and e) P(C and D).

Example 2. If P(A) = 0.4, P(B) = 0.5, and P(A and B) = 0.3, find P(A or B) and P(A and B').

Independent and Dependent Events

If the occurrence of one event affects the occurrence of the other, the events are said to be dependent. If the occurrence of one event does not affect the occurrence of the other, the events are called independent. If E and F are any two events, then the probability that both events occur, denoted Pr(EF), is given by Pr(EF) = Pr(E) * Pr(F | E), where Pr(F | E) is the probability that F occurs, given that E has occurred. We call Pr(F | E) a conditional probability.

Examples.
1) Two cards are drawn from a regular deck of cards (without replacement).
a) The probability that the first card is a king is 4/52.
b) The probability that the second card is a king, given that the first card is a king, is 3/51.
c) The probability that both cards are kings is 4/52 * 3/51 = 1/221.
d) The probability of both hearts.
e) The probability of a heart on the first draw and a club on the second draw.
f) The probability of a heart on the first draw and an ace on the second draw.
2) From a deck of 52 cards, two cards are drawn, one after another without replacement. What is the probability that (a) the first will be a king and the second will be a jack? (b) the first will be a king and the second will be a jack of the same suit?

Suppose that we are given 20 tulips that are very similar in appearance and told that 8 tulips will bloom early, 12 will bloom late, 13 will be red, and 7 will be yellow. If a bulb is selected at random, find a) the probability that it will produce a red tulip, b) the probability that it will be red and will bloom early.
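Several of the probability examples above can be checked exactly with fractions. The sketch assumes a standard 52-card deck and fair coins. It also uses the general addition rule P(A or B) = P(A) + P(B) - P(A and B), which extends the mutually exclusive form stated in the text and is what the example with P(A and B) = 0.3 requires.

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping two fair coins: HH, HT, TH, TT.
coin_space = list(product("HT", repeat=2))
p_two_heads = Fraction(
    sum(1 for outcome in coin_space if outcome == ("H", "H")), len(coin_space))
p_one_head_one_tail = Fraction(
    sum(1 for outcome in coin_space if sorted(outcome) == ["H", "T"]), len(coin_space))

# Multiplication rule with a conditional probability: two kings without replacement.
p_first_king = Fraction(4, 52)
p_second_king_given_first = Fraction(3, 51)
p_both_kings = p_first_king * p_second_king_given_first

# General addition rule for events that may overlap.
p_a, p_b, p_a_and_b = Fraction(4, 10), Fraction(5, 10), Fraction(3, 10)
p_a_or_b = p_a + p_b - p_a_and_b
p_a_and_not_b = p_a - p_a_and_b   # the part of A that lies outside B

print(p_two_heads, p_one_head_one_tail, p_both_kings, p_a_or_b, p_a_and_not_b)
```

Using `Fraction` keeps the answers exact, so the product 4/52 * 3/51 reduces automatically to 1/221.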

The Normal Distribution

If we modify some line graphs to indicate probability rather than frequency, the resulting graphs closely approximate a smooth, bell-shaped curve called the normal probability curve.
1. The area under a normal curve is equal to 1.
2. The normal curve is symmetric about a vertical line through the mean of the set of data.
3. The interval extending from 2 SDs to the left of the mean to 2 SDs to the right of the mean contains approximately 95% of all data.
4. If x is a data value from a normally distributed set of data, then the probability that x is greater than a and less than b is the area under the normal curve between a and b.

Finding Probabilities When Given z Scores

Prob(a < z < b) = the probability that the z score is between a and b.
Prob(z > a) = the probability that the z score is greater than a.
Prob(z < a) = the probability that the z score is less than a.

Using the Z-distribution Table

Examples
1. Assume that IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. An IQ score is randomly selected from this population. Find the indicated probability.
a) P(100 < x < 130)
b) P(x < 15)
c) P(x > 85)
d) P(85 < x < 115)
2. If IQ scores are normally distributed with a mean of 100 and a standard deviation of 15, find the probability of randomly selecting a person with an IQ score between 100 and 130.

Finding z Scores When Given Probabilities

Examples
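As an alternative to the z table, the areas under the normal curve can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for checking table lookups, not the method the study guide prescribes; the variable names are hypothetical.

```python
from statistics import NormalDist

# IQ scores: normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

p_100_to_130 = iq.cdf(130) - iq.cdf(100)   # area between 100 and 130
p_above_85 = 1 - iq.cdf(85)                # area to the right of 85
p_85_to_115 = iq.cdf(115) - iq.cdf(85)     # area within 1 SD of the mean

# Going the other way: the z score with 95% of the area to its left.
z_95 = NormalDist().inv_cdf(0.95)

print(round(p_100_to_130, 4), round(p_above_85, 4),
      round(p_85_to_115, 4), round(z_95, 3))
```

`cdf` gives the area to the left of a value, so areas between two values are differences of two `cdf` results, exactly as with the table. `inv_cdf` performs the reverse lookup used when a probability is given and a z score is wanted.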

1. Use the same thermometers, with temperature readings that are normally distributed with a mean of 0 C and a standard deviation of 1 C. Find the temperatures corresponding to the 95th percentile and the 90th percentile.
2. The Chemco Company, which manufactures car tires, finds that the tires last distances that are normally distributed with a mean of ____ miles and a standard deviation of 475 miles. The manufacturer wants to guarantee the tires so that only 3% will be replaced because of failure before the guaranteed number of miles. For how many miles should the tires be guaranteed?

The Central Limit Theorem

As the sample size increases, the sampling distribution of sample means approaches a normal distribution. That is, the mean of the sample means is almost equal to the population mean ($\mu_{\bar{x}} = \mu$), and the standard deviation of the sample means will be $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.

Central Limit Theorem
Given:
1. The random variable x has a distribution with mean µ and standard deviation σ.
2. Samples of size n are randomly selected from this population.
Conclusion:
1. The distribution of the sample means $\bar{x}$ will, as the sample size n increases, approach a normal distribution.
2. The mean of the sample means will be the population mean µ.
3. The standard deviation of the sample means will be $\sigma / \sqrt{n}$.

Practical Rules Commonly Used:
1. For sample sizes n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets better as the sample size n becomes larger.
2. If the original population itself is normally distributed, the sample means will be normally distributed for any sample size n.

Example. Assume that the population of human body temperatures has a mean of 98.6 F, as is commonly believed. Also assume that the population standard deviation is 0.62 F. If a sample of size 106 is randomly selected, find the probability of getting a mean of 98.2 F or lower.

Solution.
$\mu_{\bar{x}} = \mu = 98.6$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.62}{\sqrt{106}} = 0.060$
$z = \frac{98.2 - 98.6}{0.060} = -6.67$
P(z < -6.67) is essentially 0 (far smaller than 0.0001).
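The body temperature calculation can be reproduced in a few lines. The sketch assumes a sample mean of 98.2 F and a population standard deviation of 0.62 F; these two inputs are assumptions chosen because they agree with the worked figures 0.060 and -6.67 shown in the solution.

```python
from math import sqrt
from statistics import NormalDist

mu = 98.6        # assumed population mean (F)
sigma = 0.62     # assumed population standard deviation (F)
n = 106          # sample size
x_bar = 98.2     # assumed sample mean of interest (F)

standard_error = sigma / sqrt(n)       # SD of the sample means, sigma / sqrt(n)
z = (x_bar - mu) / standard_error      # z score of the sample mean
p = NormalDist().cdf(z)                # P(sample mean <= 98.2)

print(round(standard_error, 3), round(z, 2), p)
```

Without rounding the standard error to 0.060 first, z comes out near -6.64 rather than -6.67. Either way the probability is vanishingly small, which suggests that a population mean of 98.6 F is hard to reconcile with such a sample.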

Chapter 4 Estimates and Sample Sizes

A point estimate is a single value used to approximate a population parameter. As an example, the sample mean $\bar{x}$ is the best point estimate of the population mean µ.

A confidence interval (or interval estimate) is a range (or an interval) of values that is likely to contain the true value of the population parameter. A confidence interval is associated with a degree of confidence.

The degree of confidence is the probability 1 - α that the population parameter is contained in the confidence interval. This probability is often expressed as the equivalent percentage value. The degree of confidence is also referred to as the level of confidence or the confidence coefficient. Common choices for the degree of confidence are 95% (α = 0.05) and 99% (α = 0.01).

Notation: $z_{\alpha/2}$ = the positive z value that is at the vertical boundary for the area of α/2 in the right tail of the standard normal distribution.

A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number $z_{\alpha/2}$ is a critical value with the property that the area under the curve bounded by $-z_{\alpha/2}$ and $z_{\alpha/2}$ is 1 - α.

Example. Given a 95% degree of confidence, find the critical value $z_{\alpha/2}$.

When sample data are used to estimate a population mean µ, the margin of error, denoted by E, is the maximum likely (with probability 1 - α) difference between the observed sample mean and the true value of the population mean µ.

$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

If n > 30, we can replace σ by the sample standard deviation s.

The confidence interval for the population mean µ is $\bar{x} - E < \mu < \bar{x} + E$.

Example. For a 95% degree of confidence, find the confidence interval for the population mean given the statistics n = 106, $\bar{x}$ = 98.2, and s = 0.62.
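A large-sample confidence interval for the mean can be sketched as below. The function name is hypothetical, and the inputs n = 106, sample mean 98.2, and s = 0.62 are assumed values for the body temperature example. The critical value comes from `NormalDist().inv_cdf` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample (n > 30) confidence interval for a population mean."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. about 1.96
    margin = z_crit * s / sqrt(n)                  # margin of error E
    return x_bar - margin, x_bar + margin, margin

lower, upper, margin = z_confidence_interval(98.2, 0.62, 106, confidence=0.95)
print(round(lower, 2), round(upper, 2), round(margin, 3))
```

Raising the confidence level to 99% widens the interval, because $z_{\alpha/2}$ grows from about 1.96 to about 2.576.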

Small Sample Cases and the Student t Distribution

If n ≤ 30 and the population standard deviation is unknown, then we cannot use the previous formula to find the confidence interval for the population mean. If this is the case, we can apply the Student t-distribution.

Student t-distribution
If the distribution of a population is essentially normal, then the distribution of
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
is essentially a Student t-distribution for all samples of size n.

Using the t-distribution Table
The number of degrees of freedom for a data set corresponds to the number of scores that can vary after certain restrictions have been imposed on all scores. DF = n - 1.

Important facts about the t-distribution
1. The t-distribution is different for different sample sizes.
2. The t-distribution has the same general symmetric bell shape as the normal distribution, but it reflects the greater variability that is expected with small samples.
3. The t-distribution has a mean of t = 0.
4. The standard deviation of the t-distribution varies with the sample size, but it is greater than 1.
5. As n gets larger, the t-distribution gets closer to the normal distribution.

When do we use the t-distribution?
(1) The sample is small (n ≤ 30);
(2) σ is unknown; and
(3) the population has a distribution that is essentially normal.

The confidence interval will be $\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$.

Example. Suppose that we have only the following 10 randomly selected body temperatures:
98.6, 98.6, 98.0, 99.0, 98.4, 98.4, 98.4, 98.6, 98.4, 98.0
Construct the 95% confidence interval for the mean of all body temperatures. (Assume that body temperatures are normally distributed.)
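The small-sample interval for the ten body temperatures can be sketched as follows. Python's standard library has no t-distribution, so the critical value 2.262 (two-tailed, α = 0.05, DF = 9) is typed in from a standard t table; that lookup and the function name are assumptions of this sketch.

```python
from math import sqrt
import statistics

temps = [98.6, 98.6, 98.0, 99.0, 98.4, 98.4, 98.4, 98.6, 98.4, 98.0]

def t_confidence_interval(data, t_crit):
    """Small-sample confidence interval for the mean; t_crit comes from a t table."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD, divides by n - 1
    margin = t_crit * s / sqrt(n)       # E = t_{alpha/2} * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# t_{alpha/2} for 95% confidence with DF = n - 1 = 9, from a standard t table.
T_CRIT_95_DF9 = 2.262

lower, upper, margin = t_confidence_interval(temps, T_CRIT_95_DF9)
print(round(lower, 2), round(upper, 2))
```

The table value 2.262 is larger than the corresponding normal value of about 1.96, which reflects the extra uncertainty of estimating σ from only ten observations.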

What is the appropriate sample size?

$n = \left[ \frac{z_{\alpha/2} \, \sigma}{E} \right]^2$, rounded up.

If we don't know the population standard deviation, we can go ahead using s instead of σ.

Example. We want to estimate the mean weight of plastic discarded by households in one week. How many households must we randomly select if we want to be 99% sure that the sample mean is within 0.5 lb of the true population mean? Assume that σ = 1.10 lb.

Estimating a Population Variance

In a normally distributed population with variance $\sigma^2$, we randomly select independent samples of size n and compute the sample variance $s^2$ for each sample. The sample statistic $\chi^2 = (n - 1) s^2 / \sigma^2$ has a distribution called the chi-square distribution with DF = n - 1.

Properties of the chi-square distribution
1. The chi-square distribution is not symmetric.
2. The value of chi-square can be zero or positive, but cannot be negative.
3. The chi-square distribution is different for each number of degrees of freedom. As the number of DF increases, the chi-square distribution approaches a normal distribution.

Example. Find the critical values of $\chi^2$ that determine critical regions containing an area of 0.05 in each tail. Assume that the relevant sample size is 10.

The sample variance $s^2$ is the best point estimate of the population variance $\sigma^2$.

The confidence interval for the population variance is
$\frac{(n - 1) s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1) s^2}{\chi^2_L}$.

Question: What is the confidence interval for the population standard deviation?

Example. The following IQ scores are obtained from a randomly selected sample.
a) Find the best point estimate of the population variance.
b) Construct a 95% confidence interval estimate of the population standard deviation.
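The sample size formula and the variance interval can be sketched as two small functions. The function names are hypothetical. The first call uses the figures as printed in the plastic example (99% confidence, E = 0.5 lb, σ = 1.10 lb). The second uses made-up inputs: a sample variance of 4 with n = 10, and the table values 2.700 and 19.023 for the left and right chi-square critical values at DF = 9 with 0.025 in each tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin):
    """n = (z_{alpha/2} * sigma / E)^2, always rounded up to a whole number."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z_crit * sigma / margin) ** 2)

def variance_confidence_interval(n, s_squared, chi2_left, chi2_right):
    """Interval for sigma^2; the critical values come from a chi-square table."""
    lower = (n - 1) * s_squared / chi2_right
    upper = (n - 1) * s_squared / chi2_left
    return lower, upper

n_households = sample_size_for_mean(0.99, sigma=1.10, margin=0.5)

# Hypothetical sample: n = 10, s^2 = 4, table values for DF = 9.
var_low, var_high = variance_confidence_interval(10, 4.0, 2.700, 19.023)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)   # interval for sigma itself

print(n_households, round(var_low, 3), round(var_high, 3))
```

Taking square roots of the two variance limits answers the question posed above: the interval for the standard deviation is the square root of the interval for the variance. Note that the larger critical value goes in the denominator of the lower limit.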

Chapter 5 Hypothesis Testing

In the previous chapter we studied how to use sample statistics to estimate values of population parameters. In this chapter we study how to use sample statistics to test hypotheses made about population parameters. In statistics, a hypothesis is a statement that something is true.

Components of a Formal Hypothesis Test

1. The null hypothesis (H0) is a statement about the value of a population parameter, and it must contain the condition of equality.

2. The alternative hypothesis (H1) is the statement that must be true if the null hypothesis is false.

Hypothesis testing is not simply a matter of being right or wrong. Different types of errors can have dramatically different consequences.

Type I error: The mistake of rejecting the null hypothesis when it is true. This type of error is not a miscalculation or procedural misstep; it is an actual error that can occur when a rare event happens by chance. The probability of rejecting the null hypothesis when it is true is called the significance level; that is, the significance level is the probability of a type I error. The symbol α is used to represent the significance level. The values 0.05 and 0.01 are commonly used.

Type II error: The mistake of failing to reject the null hypothesis when it is false. The symbol β is used to represent the probability of a type II error.

True State of Nature
Decision | H0 is true | H0 is false
Reject H0 | Type I error | Correct decision
Fail to reject H0 | Correct decision | Type II error

3. Test statistic: A sample statistic or a value based on the sample data. A test statistic is used in making the decision about rejection of the null hypothesis.

4. Critical region: The set of all values of the test statistic that would cause us to reject the null hypothesis.

5. Critical value: The value or values that separate the critical region from the values of the test statistic that would not lead to rejection of the null hypothesis. The critical values depend on the nature of the null hypothesis, the relevant sampling distribution, and the level of significance.

6. Conclusion:
a) Fail to reject the null hypothesis H0
b) Reject the null hypothesis H0

Example.
Original claim: A medical researcher claims that the mean body temperature of healthy adults is not equal to 98.6 F.
Hypotheses: H0: µ = 98.6; H1: µ ≠ 98.6
Significance level: α = 0.05
Test statistic: $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}} = \frac{98.2 - 98.6}{0.62 / \sqrt{106}} \approx -6.64$
Critical region: It consists of values of the test statistic less than z = -1.96 or greater than z = 1.96.
Critical values: The critical values are z = -1.96 and z = 1.96.

The following practical considerations may be relevant:
1) For any fixed α, an increase in the sample size n will cause a decrease in β. That is, a larger sample will lessen the chance that you fail to reject a false null hypothesis.
2) For any fixed sample size n, a decrease in α will cause an increase in β. Conversely, an increase in α will cause a decrease in β.
3) To decrease both α and β, increase the sample size.

Summary
Start => Does the original claim contain the condition of equality?
If the answer is yes => the original claim becomes H0. Do you reject H0?
  If yes => There is sufficient evidence to warrant rejection of the claim.
  If no => There is not sufficient evidence to warrant rejection of the claim.
If the answer is no => the original claim becomes H1. Do you reject H0?
  If yes => The sample data support the claim.
  If no => There is not sufficient sample evidence to support the claim.

Two-tailed test: H1 contains ≠
Left-tailed test: H1 contains <
Right-tailed test: H1 contains >

Example. After analyzing 106 body temperatures of healthy adults, a medical researcher makes a claim that the mean body temperature is less than 98.6 degrees F.

a) Express the claim in symbolic form:
b) Identify the null hypothesis:
c) Identify the alternative hypothesis:
d) Identify this test as being two-tailed, left-tailed, or right-tailed:
e) Identify the type I error:
f) Identify the type II error:
g) Assume that the conclusion is to reject the null hypothesis. State the conclusion in nontechnical terms.
h) Assume that the conclusion is failure to reject the null hypothesis. State the conclusion in nontechnical terms.

Testing a Claim about a Mean: Large Samples

Test statistic for claims about µ when n > 30:
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$

Traditional Method of Hypothesis Testing:
1. Identify the specific claim or hypothesis to be tested and put it in symbolic form.
2. Give the symbolic form that must be true when the original claim is false.
3. Of the two symbolic expressions obtained so far, let the null hypothesis H0 be the one that contains the condition of equality; H1 is the other statement.
4. Select the significance level α based on the seriousness of a type I error. Make α small if the consequences of rejecting a true H0 are severe. The values 0.05 and 0.01 are very common.
5. Identify the statistic that is relevant to this test and its sampling distribution.
6. Determine the test statistic, the critical values, and the critical region. Draw a graph and include the test statistic, critical value(s), and critical region.
7. Reject H0 if the test statistic is in the critical region. Fail to reject H0 if the test statistic is not in the critical region.
8. Restate this decision in simple nontechnical terms. Failing to reject H0 is not equivalent to supporting H0.

Example. Using the sample data given at the beginning of the chapter (n = 106, $\bar{x}$ = 98.2, s = 0.62) and a 0.05 significance level, test the claim that the mean body temperature of healthy adults is equal to 98.6 F. Use the traditional method by following the procedure outlined above.
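Steps 6 and 7 of the traditional method can be sketched for a two-tailed large-sample test. The function name is hypothetical, and the inputs (n = 106, sample mean 98.2, s = 0.62, claimed mean 98.6) are assumed values for the body temperature example. With n > 30, s stands in for σ.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu_0, s, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?) for H0: mu = mu_0."""
    z = (x_bar - mu_0) / (s / sqrt(n))             # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical values are -z_crit and +z_crit
    reject = abs(z) > z_crit                       # is z inside the critical region?
    return z, z_crit, reject

z, z_crit, reject = two_tailed_z_test(98.2, 98.6, 0.62, 106, alpha=0.05)
print(round(z, 2), round(z_crit, 2), reject)
```

Here the test statistic falls far inside the left critical region, so under these assumed inputs H0 is rejected. Because the original claim contains equality and so becomes H0, the wording of step 8 would be that there is sufficient evidence to warrant rejection of the claim.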

The p-value Method of Testing Hypotheses

Many professional articles and software packages use another approach to hypothesis testing that is based on the calculation of a probability value, or p-value. A p-value is the probability of getting a value of the sample test statistic that is at least as extreme as the one found from the sample data, assuming that the null hypothesis is true. P-values can be found using Table A3.

P-values measure how confident we are in rejecting a null hypothesis. For example, a p-value of ____ would lead us to reject the null hypothesis, but it would also suggest that the sample results are extremely unusual if the claimed value of µ is in fact correct.

The p-value approach uses most of the same basic procedures as the traditional approach, but steps 6 and 7 are different:
Step 6: Find the p-value.
Step 7: Report the p-value.
Some statisticians prefer to simply report the p-value and leave the conclusion to the reader. Others prefer to use the following decision criterion:
Reject H0 if the p-value is less than or equal to the significance level.
Fail to reject H0 if the p-value is greater than the significance level.

If the conclusion is based on the p-value alone, the following guide may be helpful:
Less than 0.01: Highly statistically significant; very strong evidence against the null hypothesis
0.01 to 0.05: Statistically significant; adequate evidence against the null hypothesis
Greater than 0.05: Insufficient evidence against the null hypothesis

Example. Use the p-value method to test the claim that the mean body temperature of healthy adults is equal to 98.6 F. As before, use a 0.05 significance level and the sample data from the previous example.

Testing a Claim about a Mean: Small Samples

If the sample size is smaller than 30, the population standard deviation is unknown, and the population is essentially normally distributed, then we use the t-distribution to test our hypothesis.

Test statistic: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$

Example. In one part of a test developed by a psychologist, the test subject is asked to form a word by unscrambling the letters 'ciiatttsss'. Given below are the times (in seconds) required by 15 randomly selected persons to unscramble the letters. Test the claim that the mean time is equal to 60 seconds at the 0.05 level of significance.
68.7, 7.4, 6.0, 60.5, 34.6, 61.1, 68.6, 48.4, 43.6, 39.5, 85.3, 6.3, 43.4, 83.7,

Testing a Claim about a Standard Deviation or Variance

In testing a hypothesis made about a population standard deviation or variance, we assume that the population has values that are normally distributed.

Test statistic for testing a hypothesis about a standard deviation or variance:
$\chi^2 = \frac{(n - 1) s^2}{\sigma^2}$,
where n = sample size, $s^2$ = sample variance, and $\sigma^2$ = population variance (given in H0).

Example. With individual lines at its various windows, the Jefferson Bank found that the standard deviation for normally distributed waiting times on Friday afternoon was 6 min. The bank experimented with a single main waiting line and found that for a random sample of 5 customers, the waiting times have a standard deviation of 3.8 min. Based on previous studies, we can assume that the waiting times are normally distributed. At the α = 0.05 significance level, test the claim that a single line causes lower variation among the waiting times.
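The chi-square test statistic for a claim about a standard deviation can be sketched as below. This is an illustration under loud assumptions: the sample size n = 25 is a hypothetical value chosen for the demonstration and should not be read as the bank example's actual sample size, and the left-tail critical value 13.848 (DF = 24, α = 0.05) is typed in from a standard chi-square table. The function name is also hypothetical.

```python
def chi_square_statistic(n, s, sigma_0):
    """Test statistic (n - 1) * s^2 / sigma_0^2 for H0: sigma = sigma_0."""
    return (n - 1) * s ** 2 / sigma_0 ** 2

# Hypothetical inputs: n = 25 is an assumption; s = 3.8 and sigma_0 = 6 follow the text.
n = 25
chi2 = chi_square_statistic(n, s=3.8, sigma_0=6.0)

# Left-tailed test (claim: sigma < 6). Critical value from a chi-square table,
# DF = n - 1 = 24, area 0.05 in the left tail.
CHI2_CRIT_LEFT_DF24 = 13.848
reject_h0 = chi2 < CHI2_CRIT_LEFT_DF24

print(round(chi2, 3), reject_h0)
```

Because the claim is that variation is lower, the critical region is in the left tail, so H0 is rejected when the statistic falls below the table value. A different sample size changes both the statistic and the degrees of freedom used for the lookup.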

Chapter 6 Correlation and Regression

This chapter involves estimating parameters and testing hypotheses, but the methods we will use are different because of the very different issue we will be considering: given paired data, we want to investigate the relationship between the two variables. Specifically, we want to determine whether there is a relationship between the two variables and, if so, identify what the relationship is. We begin by considering the concept of correlation. We also investigate regression analysis.

A correlation exists between two variables when one of them is related to the other in some way.

Minitab provides a scatter diagram, which is a plot of paired (x, y) data with a horizontal x-axis and a vertical y-axis. We can sometimes find the general pattern of the paired data from it.

The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample. Its value is computed by using the formula

$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \, \sqrt{n(\sum y^2) - (\sum y)^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y}$

r is a sample statistic. We might think of r as a point estimate of the population parameter ρ, which is the linear correlation coefficient for all pairs of data in the population.

Example. Using Table 6.1, find the value of the linear correlation coefficient r. (r = 0.84)

Table 6.1 Data from the Garbage Project
x: Plastic (lb)
y: Household size

After calculating r, how do we interpret the result? If r is close to zero, we conclude that there is no significant linear correlation between x and y.

Properties of r
1. r is always between -1 and 1.
2. r does not change if all values of either variable are converted to a different scale.
3. r is not affected by the choice of x or y.
4. r measures the strength of a linear relationship.
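The computational formula for r can be written out directly. The paired data below are hypothetical values made up for illustration (they are not the Garbage Project data), and the function name is an assumption of this sketch.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Linear correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = linear_correlation(x, y)
print(round(r, 4))
```

Swapping the roles of x and y, or rescaling either variable by a positive factor (say, converting pounds to kilograms), leaves r unchanged, which matches properties 2 and 3.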

25 Hypothesis Test of the Significance of r

H0: ρ = 0; H1: ρ ≠ 0

For the test statistic, we use one of the following methods.

Method 1: The test statistic is t

$$t = \frac{r - \mu_r}{s_r} = \frac{r}{\sqrt{\dfrac{1-r^2}{n-2}}}$$

Since we assume that ρ = 0, it follows that $\mu_r = 0$. Also, it can be shown that $s_r$, the standard deviation of linear correlation coefficients, can be expressed as $\sqrt{(1-r^2)/(n-2)}$. Critical values: use Table A-3 with degrees of freedom = n − 2.

Method 2: The test statistic is r. Critical values: refer to Table A-6.

Example. Using the sample data in Table 6.1, test the claim that there is a linear correlation between weights of discarded plastic and household sizes. Use Method 1.

Common Errors Involving Correlation
1. We must be careful to avoid concluding that a significant linear correlation between two variables is proof that there is a cause-effect relationship between them.
2. Another source of potential error arises with data based on rates or averages. Averages suppress the variation among individuals, which may lead to an inflated correlation coefficient.
3. A third error involves the property of linearity. The conclusion that there is no significant linear correlation does not mean that x and y are not related in any way.
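Method 1 reduces to a single expression in r and n. A minimal sketch follows; the function name `t_statistic_for_r` and the values r = 0.6, n = 18 are hypothetical and are not taken from Table 6.1.

```python
import math


def t_statistic_for_r(r, n):
    """Return t = r / sqrt((1 - r^2) / (n - 2)) for testing H0: rho = 0."""
    if n <= 2:
        raise ValueError("need more than two pairs of data")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical sample correlation and sample size.
t = t_statistic_for_r(r=0.6, n=18)
print(f"t = {t:.3f} with {18 - 2} degrees of freedom")
```

The result is compared with the critical t value for n − 2 degrees of freedom; a value of |t| beyond the critical value supports the claim of a linear correlation.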

26 Regression

Our goal in this section is to identify the relationship between variables so that we can predict the value of one variable, given the value of the other variable. Given a collection of paired sample data, the regression equation

$$\hat{Y}_i = b_0 + b_1 X_i$$

describes the relationship between the two variables. Its graph is called the regression line, line of best fit, or least-squares line. The coefficients are

$$b_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$

Notation for the Regression Equation
                                  Population parameter    Point estimate
y-intercept of regression line    β0                      b0
Slope of regression line          β1                      b1
Equation of the line              y = β0 + β1 x           ŷ = b0 + b1 x

Example. Using the Table 6.1 data, find the regression equation of the straight line that relates x and y. (y = x)

Predictions

In predicting a value of y based on some given value of x:
1. If there is not a significant linear correlation, the best predicted y value is ȳ, the mean of the sample y values.
2. If there is a significant linear correlation, the best predicted y value is found by substituting the x value into the regression equation.

Example. Use the previous regression equation y = x to predict the size of a household that discards .50 lb of plastic in a week.
Solution. y = (.50) = 4.5
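The least-squares formulas for the slope and intercept translate directly into code. The sketch below assumes hypothetical data and a hypothetical function name `fit_line`; it does not reproduce the Table 6.1 regression.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (sum of XY - n * Xbar * Ybar) / (sum of X^2 - n * Xbar^2)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    # Intercept: Ybar - b1 * Xbar
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical paired data used only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
b0, b1 = fit_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

Because b0 is defined as Ȳ − b1 X̄, the fitted line always passes through the point (X̄, Ȳ).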

27 Guidelines for Using the Regression Equation
1. If there is no significant linear correlation, don't use the regression equation to make predictions.
2. When using the regression equation for predictions, stay within the scope of the available sample data.
3. A regression equation based on old data is not necessarily valid now.
4. Don't make predictions about a population that is different from the population from which the sample data were drawn.
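The first two guidelines can be enforced mechanically before a prediction is made. The following is one possible sketch, not a prescribed procedure: the function `best_prediction`, its parameters, and the sample values are all hypothetical, and the significance decision is assumed to have been made separately.

```python
def best_prediction(x_new, xs, ys, b0, b1, significant):
    """Predict y for x_new while respecting the guidelines.

    significant -- True if a significant linear correlation was found
    b0, b1      -- intercept and slope of an already fitted regression line
    """
    if not significant:
        # No significant linear correlation: the best prediction is y-bar.
        return sum(ys) / len(ys)
    if not min(xs) <= x_new <= max(xs):
        # Outside the scope of the sample data: refuse to extrapolate.
        raise ValueError("x_new lies outside the range of the sample x values")
    return b0 + b1 * x_new


# Hypothetical data and coefficients for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
print(best_prediction(4.0, xs, ys, b0=1.3, b1=0.9, significant=True))
print(best_prediction(4.0, xs, ys, b0=1.3, b1=0.9, significant=False))
```

The last two guidelines concern how and when the data were collected, so they call for judgment rather than a check in code.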

28 Chapter 7 Analysis of Variance

In Chapter 5 we developed procedures for testing the hypothesis that two population means are equal. In this chapter we will develop a procedure for testing the hypothesis that three or more population means are equal.

Analysis of Variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances.

The ANOVA methods use the F distribution. Assume that two populations are independent of each other, are normally distributed, and have equal variances. If samples of sizes n and m drawn from them have variances $s_1^2$ and $s_2^2$, then

$$F = \frac{s_1^2}{s_2^2}$$

has an F distribution with degrees of freedom n − 1 and m − 1.

Properties of F distributions
1. The F distribution is not symmetric; it is skewed to the right.
2. The values of F can be zero or positive, but they cannot be negative.
3. There is a different F distribution for each pair of degrees of freedom for the numerator and denominator.

In this chapter we assume that
1. The populations have normal distributions.
2. The populations have the same variance.
3. The samples are random and independent of each other.

One-Way ANOVA with Equal Sample Sizes

Notation for One-Way ANOVA with Equal Sample Sizes
n = size of each sample
k = number of samples
$s_{\bar{x}}^2$ = variance of the sample means
$s_p^2$ = pooled variance obtained by calculating the mean of the sample variances

H0: μ1 = μ2 = μ3
H1: at least one of the equalities does not hold.

The variance between samples (variation due to treatment) is an estimate of $\sigma^2$ based on the sample means.

Variance between samples = $n s_{\bar{x}}^2$, where $s_{\bar{x}}^2$ = variance of the sample means.
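The quantities in the notation above can be computed directly from the samples. The sketch below is illustrative only: the function name `anova_equal_sizes` and the three small samples are hypothetical. It returns the ratio of the variance between samples to the pooled variance, which is the F test statistic for equal sample sizes, together with its degrees of freedom k − 1 and k(n − 1).

```python
import statistics


def anova_equal_sizes(samples):
    """Return (F, numerator df, denominator df) for k samples of equal size n."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")
    sample_means = [statistics.mean(s) for s in samples]
    sample_variances = [statistics.variance(s) for s in samples]
    # Variance between samples: n times the variance of the sample means.
    variance_between = n * statistics.variance(sample_means)
    # Variance within samples: pooled variance = mean of the sample variances.
    variance_within = statistics.mean(sample_variances)
    return variance_between / variance_within, k - 1, k * (n - 1)


# Hypothetical data: three samples of size three.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_stat, df_num, df_den = anova_equal_sizes(samples)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

A large F arises when the sample means are spread far apart relative to the variation inside each sample, which is evidence against equal population means.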

29 The variance within samples (variation due to error) is an estimate of $\sigma^2$ based on the sample variances. With all samples of the same size n,

Variance within samples = $s_p^2$ = pooled variance obtained by finding the mean of the sample variances.

Test Statistic for One-Way ANOVA with Equal Sample Sizes

$$F = \frac{n\,s_{\bar{x}}^2}{s_p^2}$$

numerator degrees of freedom = k − 1
denominator degrees of freedom = k(n − 1)
The critical value of F is F(k − 1, k(n − 1)).

Example. Do different age groups have different body temperatures? Table 7-1 lists the body temperatures of 5 randomly selected subjects from each of 3 different age groups. Informal examination of the 3 sample means (97.940, , ) seems to suggest that the 3 samples come from populations with means that are not significantly different. In addition to the values of the 3 sample means, however, we should consider their standard deviations and the sample sizes. We need to conduct a formal hypothesis test to determine whether the sample means are significantly different. Using a significance level of 0.05, we will test the claim that the 3 age-group populations have the same mean body temperature.

Table 7-1 Body Temperature (Categorized by Age)
and older
n1 = 5    n2 = 5    n3 = 5
x̄1 =      x̄2 =      x̄3 =
s1 =      s2 =      s3 = 0.75

Solution.
Step 1 and Step 2: (omitted)
Step 3: H0: μ1 = μ2 = μ3; H1: The three means are not all equal.

30 Step 4: Significance level = 0.05
Step 5: Because we test the claim that 3 or more population means are equal, we use ANOVA with an F test statistic.
Step 6: For one-way ANOVA with equal sample sizes, the test statistic (F = ) is calculated as follows. The critical value of F = is found by referring to the table for which α = 0.05. The degrees of freedom are as follows:
numerator degrees of freedom = k − 1 = 3 − 1 = 2
denominator degrees of freedom = k(n − 1) = 3(5 − 1) = 12
Step 7: Because the test statistic of F = does not fall in the critical region bounded by F = , we fail to reject the null hypothesis that the 3 means are equal.
Step 8: There is not sufficient evidence to warrant rejection of the claim that the 3 populations of different age groups have the same mean body temperature. Perhaps there really is a difference, but the sample size is too small and/or the sample differences are not large enough to justify that conclusion.

One-Way ANOVA with Unequal Sample Sizes

Notation
$\bar{\bar{x}}$ = overall mean (sum of all sample scores divided by the total number of scores)
k = number of population means being compared
$n_i$ = number of values in the ith sample
N = total number of values in all samples combined ($N = n_1 + n_2 + \dots + n_k$)
$\bar{x}_i$ = mean of values in the ith sample
$s_i^2$ = variance of values in the ith sample

Using the preceding notation, we can now express the test statistic as follows:

$$F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{\left[\sum n_i(\bar{x}_i - \bar{\bar{x}})^2\right]/(k-1)}{\left[\sum (n_i - 1)s_i^2\right]/\left[\sum (n_i - 1)\right]}$$

The numerator is really a form of the formula for the sample variance, $s^2 = \sum (x-\bar{x})^2/(n-1)$, applied to the sample means and weighted by the sample sizes.

Key components in our ANOVA method are listed below.

SS(total) = total sum of squares = a measure of the total variation (around the overall mean) in all of the sample data combined.

31 $\text{SS(total)} = \sum (x - \bar{\bar{x}})^2 = \text{SS(treatment)} + \text{SS(error)}$, where $\bar{\bar{x}}$ is the overall mean of all sample values.

SS(treatment) = a measure of the variation between the sample means = SS(between groups) = $\sum n_i(\bar{x}_i - \bar{\bar{x}})^2$

SS(error) = sum of squares representing the variability that is assumed to be common to all the populations being considered = $\sum (n_i - 1)s_i^2$

Example. Table 7-2 includes sample data with movie lengths arranged according to the numbers of stars the movies were given. Use the data in Table 7-2 to find the values of SS(treatment), SS(error), and SS(total).

Table 7-2 Lengths (in minutes) of Movies Categorized by Star Ratings
Poor       Fair       Good       Excellent
Stars      Stars      Stars      4.0 Stars

Solution
k = 4 (number of samples)
mean of all 60 sample scores = 6630/60 = 110.5
SS(treatment) = =
SS(error) = =

32 SS(total) = =

SS(treatment) and SS(error) are both sums of squares, and if we divide each by its corresponding number of degrees of freedom, we get mean squares, as defined below.

MS(treatment) is a mean square for treatment, obtained as follows: MS(treatment) = SS(treatment)/(k − 1)
MS(error) is a mean square for error, obtained as follows: MS(error) = SS(error)/(N − k)
MS(total) is a mean square for the total variation, obtained as follows: MS(total) = SS(total)/(N − 1)

Example. Use the sample data in Table 7-2 to find the values of MS(treatment), MS(error), and MS(total).

Solution.
MS(treatment) = SS(treatment)/(k − 1) = /(4 − 1) =
MS(error) = SS(error)/(N − k) = /(60 − 4) =
MS(total) = SS(total)/(N − 1) = /(60 − 1) =

Test Statistic for ANOVA with Unequal Sample Sizes
H0: All means are equal.
H1: The means are not all equal.
The test statistic is F = MS(treatment)/MS(error).
The critical value is F(k − 1, N − k).

Example. Are bad movies as long as good movies, or does it just seem that way? Refer to the sample data given in Table 7-2. Examination of summary statistics seems to suggest that there are differences in the mean length of movies, with movies rated as excellent tending to be longer. But are those differences significant? Test the claim that the 4 categories of movies have the same mean length. That is, test the claim that μ1 = μ2 = μ3 = μ4.

Solution
H0: μ1 = μ2 = μ3 = μ4
H1: The preceding means are not all equal.
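The sums of squares, mean squares, and F statistic defined in this section feed the entries of an ANOVA table. The sketch below shows one way to compute them for samples of unequal size. The function name `one_way_anova` and the three short samples are hypothetical; they are not the movie data of Table 7-2.

```python
import statistics


def one_way_anova(samples):
    """Return the ANOVA quantities for k samples of possibly unequal size."""
    k = len(samples)
    total_n = sum(len(s) for s in samples)
    overall_mean = sum(sum(s) for s in samples) / total_n
    # SS(treatment): variation of the sample means about the overall mean.
    ss_treatment = sum(
        len(s) * (statistics.mean(s) - overall_mean) ** 2 for s in samples
    )
    # SS(error): variation inside the samples, sum of (n_i - 1) * s_i^2.
    ss_error = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    # SS(total): variation of every value about the overall mean.
    ss_total = sum((x - overall_mean) ** 2 for s in samples for x in s)
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (total_n - k)
    return {
        "SS(treatment)": ss_treatment,
        "SS(error)": ss_error,
        "SS(total)": ss_total,
        "MS(treatment)": ms_treatment,
        "MS(error)": ms_error,
        "F": ms_treatment / ms_error,
        "df": (k - 1, total_n - k),
    }


# Hypothetical samples of sizes 3, 2, and 4.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
for name, value in result.items():
    print(name, value)
```

The returned values also make it easy to confirm the identity SS(total) = SS(treatment) + SS(error) for any data set.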

33 Significance level =
Use the F distribution.

ANOVA Table
Source of Variation    SS    Degrees of Freedom    MS    F
Treatments                   3
Error                        56
Total                        59

Critical value = 2.7581

Because the test statistic of F = does exceed the critical value F = 2.7581, we reject the null hypothesis that the means are equal. There is sufficient sample evidence to warrant rejection of the claim that the 4 population means are equal. It appears that the mean movie length is not the same for poor, fair, good, and excellent movies. It seems that movies rated with 4 stars are longer than other movies, but we need other methods to formally justify this conclusion.

Chapter 8 Nonparametric Statistics

34

MBA 611 STATISTICS AND QUANTITATIVE METHODS

MBA 611 STATISTICS AND QUANTITATIVE METHODS MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 1-11) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

Exercise 1.12 (Pg. 22-23)

Exercise 1.12 (Pg. 22-23) Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) A researcher for an airline interviews all of the passengers on five randomly

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Descriptive Statistics Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion Statistics as a Tool for LIS Research Importance of statistics in research

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability.

A and B This represents the probability that both events A and B occur. This can be calculated using the multiplication rules of probability. Glossary Brase: Understandable Statistics, 10e A B This is the notation used to represent the conditional probability of A given B. A and B This represents the probability that both events A and B occur.

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

More information

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives.

5/31/2013. 6.1 Normal Distributions. Normal Distributions. Chapter 6. Distribution. The Normal Distribution. Outline. Objectives. The Normal Distribution C H 6A P T E R The Normal Distribution Outline 6 1 6 2 Applications of the Normal Distribution 6 3 The Central Limit Theorem 6 4 The Normal Approximation to the Binomial Distribution

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance Principles of Statistics STA-201-TE This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance 2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Basic Probability and Statistics Review. Six Sigma Black Belt Primer

Basic Probability and Statistics Review. Six Sigma Black Belt Primer Basic Probability and Statistics Review Six Sigma Black Belt Primer Pat Hammett, Ph.D. January 2003 Instructor Comments: This document contains a review of basic probability and statistics. It also includes

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information
