Review for FINAL EXAM

Chapter 1

Population: the complete collection of elements (scores, people, measurements, etc.) to be studied
Sample: a sub-collection of elements drawn from a population

The Nature of Data: Definitions
Quantitative data: numbers representing counts or measurements
Qualitative (attribute) data: nonnumeric data that can be separated into different categories (categorical data)
Definitions
Discrete - countable
Continuous - measurements with no gaps

Levels of Measurement
Nominal - names only
Ordinal - names with some order
Interval - differences but no zero
Ratio - differences and a zero

Methods of Sampling
Random
Systematic
Convenience
Stratified
Cluster
Chapters 2, 3

Determine the Definition Values for this Frequency Table
Quiz Scores   Frequency
0-4             2
5-9             5
10-14           8
15-19          11
20-24           7
Identify: classes, lower class limits, upper class limits, class boundaries, class midpoints, class width.

Frequency Tables (Axial Load): regular, relative, and cumulative
Axial Load   Frequency   Relative Frequency   Cumulative Frequency
200-209          9            0.051           Less than 210:    9
210-219          3            0.017           Less than 220:   12
220-229          5            0.029           Less than 230:   17
230-239          4            0.023           Less than 240:   21
240-249          4            0.023           Less than 250:   25
250-259         14            0.080           Less than 260:   39
260-269         32            0.183           Less than 270:   71
270-279         52            0.297           Less than 280:  123
280-289         38            0.217           Less than 290:  161
290-299         14            0.080           Less than 300:  175
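The relative and cumulative columns follow directly from the frequency column. A minimal sketch in Python, using the axial load frequencies above (the variable names are illustrative, not from the slides):

```python
# Axial load class frequencies, copied from the frequency table above.
classes = ["200-209", "210-219", "220-229", "230-239", "240-249",
           "250-259", "260-269", "270-279", "280-289", "290-299"]
freqs = [9, 3, 5, 4, 4, 14, 32, 52, 38, 14]

total = sum(freqs)

# Relative frequency = class frequency / total, rounded to three decimals.
relative = [round(f / total, 3) for f in freqs]

# Cumulative frequency = running total of the class frequencies.
cumulative = []
running = 0
for f in freqs:
    running += f
    cumulative.append(running)

for c, f, r, cf in zip(classes, freqs, relative, cumulative):
    print(f"{c}  {f:3d}  {r:.3f}  {cf:3d}")
```

The last cumulative frequency should equal the total number of values, which is a quick check on the table.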
Histogram of Axial Load Data
[Figure: histogram of frequency (0 to 60) against axial load (pounds), with class boundaries 199.5, 209.5, ..., 299.5.]

Important Distributions
Normal
Uniform
Skewed Right
Skewed Left

Stem-Leaf Plots
Data: 10 11 15 23 27 28 38 38 39 39 40 41 44 45 46 46 52 57 58 65
Stem | Leaves
1    | 015
2    | 378
3    | 8899
4    | 014566
5    | 278
6    | 5
Measures of Center
Mean
Median
Mode
Midrange

Calculator Basics for Statistical Data
1. Put calculator into statistical mode
2. Clear previous data
3. Enter data (and frequency)
4. Select key(s) that calculate $\bar{x}$

Mean for a Frequency Table
Quiz Scores   Midpoint x   Frequency f
0-4               2             2
5-9               7             5
10-14            12             8
15-19            17            11
20-24            22             7
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f} = \frac{476}{33} = 14.4$ (rounded to one more decimal place than the data)
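The frequency-table mean weights each class midpoint by its frequency. A short Python sketch of that calculation for the quiz scores (variable names are illustrative):

```python
# Class midpoints and frequencies from the quiz score table above.
midpoints = [2, 7, 12, 17, 22]
freqs = [2, 5, 8, 11, 7]

n = sum(freqs)                                       # number of scores
weighted_sum = sum(f * x for f, x in zip(freqs, midpoints))
mean = weighted_sum / n                              # sum(f*x) / sum(f)

print(n, weighted_sum, round(mean, 1))
```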
Measure of Variation: Range
range = highest score - lowest score

Measure of Variation: Standard Deviation
a measure of variation of the scores about the mean (roughly, the average deviation from the mean)

Measure of Variation: Variance
standard deviation squared
Same Means ($\bar{x} = 4$), Different Standard Deviations
[Figure: four frequency histograms over the values 1 to 7, with $s = 0$, $s = 0.8$, $s = 1.0$, and $s = 3.0$.]

Estimation of Standard Deviation: Range Rule of Thumb
Range $\approx 4s$, running from $\bar{x} - 2s$ (minimum usual value) to $\bar{x} + 2s$ (maximum usual value)
$s \approx \frac{\text{range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4}$

Rough Estimates of Usual Sample Values
minimum usual value $\approx$ (mean) - 2 (standard deviation) $= \bar{x} - 2s$
maximum usual value $\approx$ (mean) + 2 (standard deviation) $= \bar{x} + 2s$
FIGURE 2-13 The Empirical Rule (applies to bell-shaped distributions)
68% of data are within 1 standard deviation of the mean
95% within 2 standard deviations
99.7% within 3 standard deviations
[Figure: bell curve marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - 1s$, $\bar{x}$, $\bar{x} + 1s$, $\bar{x} + 2s$, $\bar{x} + 3s$; from left to right the bands hold 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, and 0.1% of the data.]

Measures of Position: z score
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round to 2 decimal places

Interpreting Z Scores
Unusual values: $z < -2$
Ordinary values: $-2 \le z \le 2$
Unusual values: $z > 2$
Other Measures of Position
Quartiles and Percentiles

Finding the Value of the kth Percentile (Figure 3-6)
1. Sort the data. (Arrange the data in order of lowest to highest.)
2. Compute $L = \frac{k}{100} \cdot n$, where n = number of values and k = percentile in question.
3. Is L a whole number?
   Yes: the value of the kth percentile is midway between the Lth value and the next value in the sorted set of data. Find $P_k$ by adding the Lth value and the next value and dividing the total by 2.
   No: change L by rounding it up to the next larger whole number. The value of $P_k$ is the Lth value, counting from the lowest.

Example: Find the 75th percentile of
200 201 204 206 206 208 208 209 215 217 218
$L = \frac{75}{100} \cdot 11 = 8.25$, which is not a whole number, so round up to L = 9.
The 75th percentile is the 9th score, or 215.

Final Review. Triola, Essentials of Statistics, Third Edition. Copyright 2008. Pearson Education, Inc.

Quartiles
$Q_1 = P_{25}$
$Q_2 = P_{50}$
$Q_3 = P_{75}$
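The percentile procedure in Figure 3-6 can be sketched as a small Python function. This is one possible reading of the flowchart, assuming k is a whole number strictly between 0 and 100; the function name is illustrative:

```python
def percentile(data, k):
    """Value of the kth percentile using the locator L = (k/100) * n."""
    values = sorted(data)              # step 1: sort lowest to highest
    n = len(values)
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is a whole number: average the Lth value and the next value.
        return (values[whole - 1] + values[whole]) / 2
    # L is not a whole number: round up and take the Lth value.
    L = whole + 1
    return values[L - 1]

scores = [200, 201, 204, 206, 206, 208, 208, 209, 215, 217, 218]
print(percentile(scores, 75))
```

Integer arithmetic (`divmod`) is used to decide whether L is whole, which avoids floating-point comparisons.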
Boxplot
Pulse rates (beats per minute) of smokers:
52 52 60 60 60 60 63 63 66 67 68 69 71 72 73 75 78 80 82 83 88 90
5-number summary
Minimum = 52
First quartile $Q_1$ = 60
Median = 68.5
Third quartile $Q_3$ = 78
Maximum = 90

Boxplot (Box-and-Whisker Diagram)
[Figure: boxplot of pulse rates (beats per minute) of smokers on a scale from 50 to 90, with whiskers ending at 52 and 90 and the box marked at 60, 68.5, and 78.]

Chapters 4 and 5
Fundamentals of Probability

Basic Rules for Computing Probability
Rule 1: Relative Frequency Approximation
Conduct (or observe) an experiment a large number of times, and count the number of times event A actually occurs. Then an estimate of P(A) is
$P(A) \approx \frac{\text{number of times A occurred}}{\text{number of times trial was repeated}}$

Rule 2: Classical Approach (requires equally likely outcomes)
If a procedure has n different simple events, each with an equal chance of occurring, and event A can occur in s of these ways, then
$P(A) = \frac{s}{n} = \frac{\text{number of ways A can occur}}{\text{number of different simple events}}$
Rule 1: Relative frequency approach
Throwing a die 100 times and getting 15 threes: P(3) = 15/100 = 0.150
Rule 2: Classical approach
P(3 on a die) = 1/6 = 0.167

Probability Limits
The probability of an impossible event is 0.
The probability of an event that is certain to occur is 1.
$0 \le P(A) \le 1$
A probability value must be a number between 0 and 1, inclusive.

Complementary Events
The complement of event A, denoted by $\bar{A}$ (read "not A"), consists of all outcomes in which event A does not occur.
Rounding Off Probabilities
Give the exact fraction or decimal, or round the final result to three significant digits.
P(struck by lightning last year) $\approx$ 0.00000143

Compound Event
Any event combining 2 or more events
Notation: P(A or B) = P(event A occurs or event B occurs or they both occur)

Disjoint Events
A = Green ball, B = Blue ball (disjoint events)
$P(A \text{ or } B) = P(A) + P(B) = \frac{4}{8} + \frac{1}{8} = \frac{5}{8}$
Not Disjoint Events
Outcomes: the digits 0 through 9
A = Even number: 0 2 4 6 8
B = Number greater than 5: 6 7 8 9
Overlapping events; 6 and 8 are counted twice.
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = \frac{5}{10} + \frac{4}{10} - \frac{2}{10} = \frac{7}{10}$

Contingency Table
                 Homicide   Robbery   Assault   Totals
Stranger            12        379       727      1118
Acqu. or Rel.       39        106       642       787
Unknown             18         20        57        95
Totals              69        505      1426      2000
Find the probability of randomly selecting one person from this group and getting someone who was robbed or was a stranger.
$P(\text{robbed or a stranger}) = \frac{505}{2000} + \frac{1118}{2000} - \frac{379}{2000} = \frac{1244}{2000} = 0.622$
(NOT disjoint events)

Complementary Events
A and $\bar{A}$ are disjoint events.
All simple events are either in A or in $\bar{A}$.
$P(A) + P(\bar{A}) = 1$
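The addition rule for the contingency table can be checked with exact fractions. A minimal Python sketch using the counts from the table (variable names are illustrative):

```python
from fractions import Fraction

# Counts taken from the contingency table above.
total = 2000
robbed = 505              # Robbery column total
stranger = 1118           # Stranger row total
robbed_and_stranger = 379 # cell counted in both totals

# P(A or B) = P(A) + P(B) - P(A and B); the overlap is subtracted once.
p_either = (Fraction(robbed, total)
            + Fraction(stranger, total)
            - Fraction(robbed_and_stranger, total))

print(p_either, float(p_either))
```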
Finding the Probability of Two or More Selections
Multiple selections: Multiplication Rule

Definitions
Independent Events: Two events A and B are independent if the occurrence of one does not affect the probability of the occurrence of the other.
Dependent Events: If A and B are not independent, they are said to be dependent.

Example: Find the probability of drawing two cards from a shuffled deck of cards such that the first is an Ace and the second is a King. (The cards are drawn without replacement.)
$P(\text{Ace on first card}) = \frac{4}{52}$
$P(\text{King} \mid \text{Ace}) = \frac{4}{51}$
$P(\text{drawing Ace, then a King}) = \frac{4}{52} \cdot \frac{4}{51} = \frac{16}{2652} \approx 0.00603$
DEPENDENT EVENTS
Independent Events
Items: G G D G G (read here as four good items and one defective item)
Two selections, with replacement
$P(\text{both good}) = P(\text{good and good}) = \frac{4}{5} \cdot \frac{4}{5} = \frac{16}{25} = 0.64$

Example: On a TV program it was reported that there is a 60% success rate for those who try to stop smoking through hypnosis. Find the probability that for 8 randomly selected smokers who undergo hypnosis, they all successfully quit smoking.
$P(\text{all 8 quit smoking}) = (0.60)^8 = 0.0168$

Small Samples from Large Populations
If a small sample is drawn from a large population (if $n \le 5\%$ of N), you can treat the events as independent.
Chapter 4

Probability Distribution
x (# of correct)   P(x)
0                  0.05
1                  0.10
2                  0.25
3                  0.40
4                  0.15
5                  0.05
[Figure: probability histogram of P(x) (0.0 to 0.5) against the number of correct answers.]

Requirements for Probability Distribution
$\sum P(x) = 1$, where x assumes all possible values
$0 \le P(x) \le 1$ for every value of x
Mean, Variance and Standard Deviation of a Probability Distribution
Mean: $\mu = \sum [x \cdot P(x)]$
Variance: $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
Standard deviation: $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$

Mean, Standard Deviation and Variance of Probability Distribution
x:     0     1     2     3     4     5
P(x):  0.05  0.10  0.25  0.40  0.15  0.05
$\mu = 2.7$, $\sigma = 1.2$, $\sigma^2 = 1.3$

Binomial Experiment: Definition
1. The procedure must have a fixed number of trials.
2. The trials must be independent. (The outcome of any individual trial doesn't affect the probabilities in the other trials.)
3. Each trial must have all outcomes classified into two categories.
4. The probabilities must remain constant for each trial.
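The three formulas for a probability distribution can be applied to the table of correct answers as follows. This is a sketch with illustrative names; the unrounded results are about 2.65, 1.33, and 1.15, which the slide reports to one decimal place:

```python
from math import sqrt

# P(x) for x = number of correct answers, from the table above.
dist = {0: 0.05, 1: 0.10, 2: 0.25, 3: 0.40, 4: 0.15, 5: 0.05}

# Requirement check: the probabilities must sum to 1.
total_probability = sum(dist.values())

mu = sum(x * p for x, p in dist.items())                  # mean
variance = sum(x**2 * p for x, p in dist.items()) - mu**2 # sum(x^2 P(x)) - mu^2
sigma = sqrt(variance)                                    # standard deviation

print(f"mean={mu:.3f} variance={variance:.3f} sd={sigma:.3f}")
```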
Binomial Probability Formula
$P(x) = \frac{n!}{(n - x)!\, x!} \cdot p^x \cdot q^{n-x}$

Table A-1 Binomial Probability Distribution, for n = 15 and p = 0.10
x         Table A-1 entry   P(x)
0         0.206             0.206
1         0.343             0.343
2         0.267             0.267
3         0.129             0.129
4         0.043             0.043
5         0.010             0.010
6         0.002             0.002
7 to 15   0+                0.000

Example: US Air has 20% of all domestic flights and one year had 4 of 7 consecutive major air crashes in the United States. Assuming that airline crashes are independent and random events, find the probability that when seven airliners crash, at least four of them are from US Air.
According to the definition, this is a binomial experiment.
n = 7, p = 0.20, q = 0.80, x = 4, 5, 6, 7
Table A-1 can be used.
P(4, 5, 6, 7) = 0.029 + 0.004 + 0+ + 0+ = 0.033
Binomial Probability Formula
$P(x) = {}_nC_x \cdot p^x \cdot q^{n-x}$
${}_nC_x$: number of outcomes with exactly x successes among n trials
$p^x \cdot q^{n-x}$: probability of x successes among n trials for any one particular order

Example: Find the probability of getting exactly 3 left-handed students in a class of 20 if 10% of us are left-handed.
This is a binomial experiment where n = 20, x = 3, p = 0.10, q = 0.90.
Table A-1 cannot be used; therefore, we must use the binomial formula.
$P(3) = {}_{20}C_3 \cdot 0.1^3 \cdot 0.9^{17} = 0.190$

For a Binomial Distribution:
Mean: $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot q$
Standard deviation: $\sigma = \sqrt{n \cdot p \cdot q}$
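Both binomial examples can be checked against the formula directly. A minimal sketch using only the Python standard library (the helper name `binom_pmf` is illustrative):

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 3 left-handed students in a class of 20, p = 0.10.
p_left = binom_pmf(3, 20, 0.10)

# At least 4 of 7 crashes from US Air, p = 0.20.
p_usair = sum(binom_pmf(x, 7, 0.20) for x in range(4, 8))

# Mean and standard deviation for n = 7, p = 0.20.
mu = 7 * 0.20
sigma = sqrt(7 * 0.20 * 0.80)

print(round(p_left, 3), round(p_usair, 3), round(mu, 1), round(sigma, 1))
```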
Example: US Air has 20% of all domestic flights. What is considered an unusual number of US Air crashes out of seven randomly selected crashes?
For this binomial distribution (n = 7, p = 0.20), $\mu = 1.4$ crashes and $\sigma = 1.1$ crashes.
$\mu - 2\sigma = 1.4 - 2(1.1) = -0.8$ (or 0)
$\mu + 2\sigma = 1.4 + 2(1.1) = 3.6$
The usual number of US Air crashes out of seven randomly selected crashes should be between -0.8 (or 0) and 3.6. Four crashes would be unusual!

Chapter 6 Normal Probability Distributions

6-2 The Standard Normal Distribution
Because the total area under the density curve is equal to 1, there is a correspondence between area and probability.

Definition
Standard Normal Distribution: a normal probability distribution that has a mean of 0 and a standard deviation of 1, and the total area under its density curve is equal to 1.
[Figure: standard normal curve over z = -3 to 3.]

Table A-2
Designed only for the standard normal distribution
Is on two pages: negative z-scores and positive z-scores
Body of table is a cumulative area from the left up to a vertical boundary
Avoid confusion between z-scores and areas
Z-score hundredths is across the top row

Table A-2 Standard Normal Distribution
Negative z-scores: cumulative from left
Positive z-scores: cumulative from left
Table A-2 Standard Normal Distribution
$\mu = 0$, $\sigma = 1$, so $z = \frac{x - 0}{1} = x$
Area = Probability

Example: Thermometers have a mean reading of 0 degrees and a standard deviation of 1 degree at the freezing point of water. If one thermometer is randomly selected, find the probability that it reads less than 1.58 degrees at the freezing point of water.
$\mu = 0$, $\sigma = 1$
$P(z < 1.58) = 0.9429$
94.29% of the thermometers will read less than 1.58 degrees at the freezing point of water.
Example: If we are using the same thermometers, and if one thermometer is randomly selected, find the probability that it reads (at the freezing point of water) above -1.23 degrees.
$P(z > -1.23) = 0.8907$
The percentage of thermometers with a reading above -1.23 degrees is 89.07%.

Example: A thermometer is randomly selected. Find the probability that it reads (at the freezing point of water) between -2.00 and 1.50 degrees.
$P(z < -2.00) = 0.0228$
$P(z < 1.50) = 0.9332$
$P(-2.00 < z < 1.50) = 0.9332 - 0.0228 = 0.9104$
The probability that the chosen thermometer has a reading between -2.00 and 1.50 degrees is 0.9104.

The Empirical Rule
Standard Normal Distribution: $\mu = 0$ and $\sigma = 1$
68% of data are within 1 standard deviation of the mean
95% within 2 standard deviations
99.7% within 3 standard deviations
[Figure: bell curve marked at $\bar{x} - 3s$ through $\bar{x} + 3s$; from left to right the bands hold 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, and 0.1% of the data.]
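The Table A-2 lookups in the thermometer examples can be reproduced with the standard normal cumulative distribution function. A sketch using `statistics.NormalDist` from the Python standard library; the results agree with the table to about four decimal places:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Cumulative area from the left, as in the body of Table A-2.
p_below = z.cdf(1.58)                   # P(z < 1.58)

# Area to the right is 1 minus the cumulative area from the left.
p_above = 1 - z.cdf(-1.23)              # P(z > -1.23)

# Area between two z scores is the difference of two cumulative areas.
p_between = z.cdf(1.50) - z.cdf(-2.00)  # P(-2.00 < z < 1.50)

print(f"{p_below:.4f} {p_above:.4f} {p_between:.4f}")
```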
Notation
$P(a < z < b)$: between a and b
$P(z > a)$: greater than, at least, more than, not less than
$P(z < a)$: less than, at most, no more than, not greater than

6-3 Applications of Normal Distributions

Converting to Standard Normal Distribution (Figure 6-12)
$z = \frac{x - \mu}{\sigma}$
Probability of Sitting Heights Less Than 38.8 Inches
$\mu = 36.0$, $\sigma = 1.4$
$z = \frac{38.8 - 36.0}{1.4} = 2.00$
$P(x < 38.8 \text{ in.}) = P(z < 2.00) = 0.9772$

6-2, 6-3 Finding Values of Normal Distributions
Procedure for Finding Values Using Table A-2 and Formula 6-2
1. Sketch a normal distribution curve, enter the given probability or percentage in the appropriate region of the graph, and identify the x value(s) being sought.
2. Use Table A-2 to find the z score corresponding to the cumulative left area bounded by x. Refer to the BODY of Table A-2 to find the closest area, then identify the corresponding z score.
3. Using Formula 6-2, enter the values for $\mu$, $\sigma$, and the z score found in step 2, then solve for x.
   $x = \mu + (z \cdot \sigma)$ (another form of Formula 6-2)
   (If z is located to the left of the mean, be sure that it is a negative number.)
4. Refer to the sketch of the curve to verify that the solution makes sense in the context of the graph and the context of the problem.

Find $P_{98}$ for Hip Breadths of Men
$x = \mu + (z \cdot \sigma)$
$x = 14.4 + (2.05 \cdot 1.0)$
$x = 16.45$

Table A-2: Positive Z-scores
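Steps 2 and 3 of the procedure can be sketched with the inverse of the cumulative distribution function, which plays the role of reading the body of Table A-2 backwards. This uses `statistics.NormalDist` from the Python standard library; the hip breadth parameters are the ones used in the example above:

```python
from statistics import NormalDist

mu, sigma = 14.4, 1.0   # hip breadths of men (inches), from the example

# Step 2: z score whose cumulative left area is 0.98 (Table A-2 gives 2.05).
z = NormalDist().inv_cdf(0.98)

# Step 3: x = mu + z * sigma (another form of Formula 6-2).
x = mu + z * sigma

print(round(z, 2), round(x, 2))
```

Software returns a slightly more precise z score than the closest table entry, so the result can differ from a table-based answer in the last digit.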
Find $P_{98}$ for Hip Breadths of Men
The hip breadth of 16.5 in. separates the lowest 98% from the highest 2%.

6-5 The Central Limit Theorem

Central Limit Theorem
Conclusions:
1. The distribution of sample means $\bar{x}$ will, as the sample size increases, approach a normal distribution.
2. The mean of the sample means will be the population mean $\mu$.
3. The standard deviation of the sample means will approach $\sigma / \sqrt{n}$.
Practical Rules Commonly Used:
1. For samples of size n larger than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. The approximation gets better as the sample size n becomes larger.
2. If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size n (not just the values of n larger than 30).

Notation
The mean of the sample means: $\mu_{\bar{x}} = \mu$
The standard deviation of sample means (often called standard error of the mean): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Example: Given the population of men has normally distributed weights with a mean of 172 lb and a standard deviation of 29 lb,
b) if 12 different men are randomly selected, find the probability that their mean weight is greater than 167 lb.
$z = \frac{167 - 172}{29 / \sqrt{12}} = -0.60$
Example (continued): 12 men are randomly selected from a population with normally distributed weights, a mean of 172 lb, and a standard deviation of 29 lb.
The probability that the mean weight of 12 randomly selected men is greater than 167 lb is 0.7257.

Chapter 7 Estimates and Sample Sizes

Definition
Confidence Interval (or Interval Estimate): a range (or an interval) of values used to estimate the true value of the population parameter
Lower # < population parameter < Upper #
As an example: $0.476 < p < 0.544$
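The mean weight example can be checked by computing the standard error and the z score, then taking the area to the right. A sketch using the Python standard library; z is rounded to two decimals to mirror a Table A-2 lookup, which is why the result matches 0.7257 (the unrounded z gives a slightly smaller area):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 172, 29, 12      # population mean, population sd, sample size

# Standard error of the mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# z score for a sample mean of 167 lb, rounded as for a table lookup.
z = round((167 - mu) / standard_error, 2)

# P(sample mean > 167) is the area to the right of z.
p_greater = 1 - NormalDist().cdf(z)

print(z, round(p_greater, 4))
```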
Confidence Interval for Population Proportion
$\hat{p} - E < p < \hat{p} + E$, where $E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

Notation for Proportions
p = population proportion
$\hat{p} = \frac{x}{n}$ = sample proportion (pronounced "p-hat") of x successes in a sample of size n
$\hat{q} = 1 - \hat{p}$ = sample proportion of failures in a sample of size n

Round-Off Rule for Confidence Interval Estimates of p
Round the confidence interval limits to three significant digits.
Procedure for Constructing a Confidence Interval for p
1. Verify that the required assumptions are satisfied. (The sample is a simple random sample, the conditions for the binomial distribution are satisfied, and the normal distribution can be used to approximate the distribution of sample proportions because $np \ge 5$ and $nq \ge 5$ are both satisfied.)
2. Refer to Table A-2 and find the critical value $z_{\alpha/2}$ that corresponds to the desired confidence level.
3. Evaluate the margin of error $E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$.
4. Using the calculated margin of error E and the value of the sample proportion $\hat{p}$, find the values of $\hat{p} - E$ and $\hat{p} + E$. Substitute those values in the general format for the confidence interval: $\hat{p} - E < p < \hat{p} + E$
5. Round the resulting confidence interval limits to three significant digits.

Example: In the Chapter Problem, we noted that 829 adult Minnesotans were surveyed, and 51% of them are opposed to the use of the photo-cop for issuing traffic tickets. Use these survey results.
Find the 95% confidence interval estimate of the population proportion p.
Example (continued): 829 adult Minnesotans were surveyed, and 51% of them are opposed to the use of the photo-cop for issuing traffic tickets.
First, we check for assumptions. We note that $n\hat{p} = 422.79 \ge 5$ and $n\hat{q} = 406.21 \ge 5$.
Next, we calculate the margin of error. We have found that $\hat{p} = 0.51$, $\hat{q} = 1 - 0.51 = 0.49$, $z_{\alpha/2} = 1.96$, and n = 829.
$E = 1.96 \sqrt{\frac{(0.51)(0.49)}{829}} = 0.03403$

Find the 95% confidence interval for the population proportion p.
We substitute our values to obtain:
$0.51 - 0.03403 < p < 0.51 + 0.03403$
$0.476 < p < 0.544$

Based on the results, can we safely conclude that the majority of adult Minnesotans oppose use of the photo-cop?
Based on the survey results, we are 95% confident that the limits of 47.6% and 54.4% contain the true percentage of adult Minnesotans opposed to the photo-cop. The percentage of opposed adult Minnesotans is likely to be any value between 47.6% and 54.4%. However, a majority requires a percentage greater than 50%, so we cannot safely conclude that the majority is opposed (because the entire confidence interval is not greater than 50%).
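The photo-cop interval can be reproduced step by step. A minimal Python sketch following the procedure above, with the critical value 1.96 taken from the example (variable names are illustrative):

```python
from math import sqrt

n = 829          # sample size
p_hat = 0.51     # sample proportion opposed
q_hat = 1 - p_hat
z_crit = 1.96    # critical value for 95% confidence

# Step 1: check n*p_hat >= 5 and n*q_hat >= 5.
assumptions_ok = n * p_hat >= 5 and n * q_hat >= 5

# Step 3: margin of error E = z * sqrt(p_hat * q_hat / n).
E = z_crit * sqrt(p_hat * q_hat / n)

# Step 4: interval limits p_hat - E and p_hat + E.
lower, upper = p_hat - E, p_hat + E

# The claim of a majority is supported only if the whole interval exceeds 0.5.
majority_supported = lower > 0.5

print(round(E, 5), round(lower, 3), round(upper, 3), majority_supported)
```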
Estimating a Population Mean: $\sigma$ Not Known

Confidence Interval for the Estimate of $\mu$ Based on an Unknown $\sigma$ and a Small Simple Random Sample from a Normally Distributed Population
$\bar{x} - E < \mu < \bar{x} + E$, where $E = t_{\alpha/2} \frac{s}{\sqrt{n}}$
$t_{\alpha/2}$ found in Table A-3

Table A-3 t Distribution
Degrees of   .005 (one tail)  .01 (one tail)   .025 (one tail)  .05 (one tail)   .10 (one tail)   .25 (one tail)
freedom      .01 (two tails)  .02 (two tails)  .05 (two tails)  .10 (two tails)  .20 (two tails)  .50 (two tails)
1            63.657           31.821           12.706           6.314            3.078            1.000
2             9.925            6.965            4.303           2.920            1.886             .816
3             5.841            4.541            3.182           2.353            1.638             .765
4             4.604            3.747            2.776           2.132            1.533             .741
5             4.032            3.365            2.571           2.015            1.476             .727
6             3.707            3.143            2.447           1.943            1.440             .718
7             3.500            2.998            2.365           1.895            1.415             .711
8             3.355            2.896            2.306           1.860            1.397             .706
9             3.250            2.821            2.262           1.833            1.383             .703
10            3.169            2.764            2.228           1.812            1.372             .700
11            3.106            2.718            2.201           1.796            1.363             .697
12            3.054            2.681            2.179           1.782            1.356             .696
13            3.012            2.650            2.160           1.771            1.350             .694
14            2.977            2.625            2.145           1.761            1.345             .692
15            2.947            2.602            2.132           1.753            1.341             .691
16            2.921            2.584            2.120           1.746            1.337             .690
17            2.898            2.567            2.110           1.740            1.333             .689
18            2.878            2.552            2.101           1.734            1.330             .688
19            2.861            2.540            2.093           1.729            1.328             .688
20            2.845            2.528            2.086           1.725            1.325             .687
21            2.831            2.518            2.080           1.721            1.323             .686
22            2.819            2.508            2.074           1.717            1.321             .686
23            2.807            2.500            2.069           1.714            1.320             .685
24            2.797            2.492            2.064           1.711            1.318             .685
25            2.787            2.485            2.060           1.708            1.316             .684
26            2.779            2.479            2.056           1.706            1.315             .684
27            2.771            2.473            2.052           1.703            1.314             .684
28            2.763            2.467            2.048           1.701            1.313             .683
29            2.756            2.462            2.045           1.699            1.311             .683
Large (z)     2.575            2.327            1.960           1.645            1.282             .675
Example: A study of 12 Dodge Vipers involved in collisions resulted in repairs averaging $26,227 and a standard deviation of $15,873. Find the 95% interval estimate of $\mu$, the mean repair cost for all Dodge Vipers involved in collisions. (The distribution for the 12 cars appears to be bell-shaped.)
$\bar{x} = 26{,}227$, $s = 15{,}873$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $t_{\alpha/2} = 2.201$ (df = 11)
$E = t_{\alpha/2} \frac{s}{\sqrt{n}} = \frac{(2.201)(15{,}873)}{\sqrt{12}} = 10{,}085.3$
$\bar{x} - E < \mu < \bar{x} + E$
$26{,}227 - 10{,}085.3 < \mu < 26{,}227 + 10{,}085.3$
$\$16{,}141.7 < \mu < \$36{,}312.3$
We are 95% confident that this interval contains the average cost of repairing a Dodge Viper.

End of 7-2 and 7-3
Determining Sample Size Required to Estimate p and $\mu$

Sample Size for Estimating Proportion p
When an estimate $\hat{p}$ is known: $n = \frac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$ (Formula 7-2)
When no estimate $\hat{p}$ is known: $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$ (Formula 7-3)
Example: We want to determine, with a margin of error of four percentage points, the current percentage of U.S. households using e-mail. Assuming that we want 90% confidence in our results, how many households must we survey? A 1997 study indicates 16.9% of U.S. households used e-mail.
$n = \frac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2} = \frac{[1.645]^2 (0.169)(0.831)}{0.04^2} = 237.51965$, rounded up to 238 households
To be 90% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 238 households.

Example: Same question, but there is no prior information suggesting a possible value for the sample percentage.
$n = \frac{[z_{\alpha/2}]^2 (0.25)}{E^2} = \frac{(1.645)^2 (0.25)}{0.04^2} = 422.81641$, rounded up to 423 households
With no prior information, we need a larger sample to achieve the same results with 90% confidence and an error of no more than four percentage points.

Sample Size for Estimating Mean $\mu$
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ (solve for n by algebra)
$n = \left[ \frac{z_{\alpha/2} \, \sigma}{E} \right]^2$ (Formula 7-5)
$z_{\alpha/2}$ = critical z score based on the desired degree of confidence
E = desired margin of error
$\sigma$ = population standard deviation
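Formulas 7-2, 7-3, and 7-5 all end with rounding up to the next whole number. A minimal Python sketch of the three formulas, checked against the e-mail examples; the function names are illustrative and not from the slides:

```python
from math import ceil

def n_for_proportion(z, E, p_hat=None):
    """Sample size for estimating p (Formula 7-2, or 7-3 when p_hat is unknown)."""
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z**2 * pq / E**2)   # always round up

def n_for_mean(z, sigma, E):
    """Sample size for estimating a mean (Formula 7-5)."""
    return ceil((z * sigma / E) ** 2)

# E-mail survey: 90% confidence (z = 1.645), margin of error 0.04.
with_estimate = n_for_proportion(1.645, 0.04, p_hat=0.169)
without_estimate = n_for_proportion(1.645, 0.04)

print(with_estimate, without_estimate)
```

Using 0.25 when no estimate is available gives the largest possible value of $\hat{p}\hat{q}$, which is why that sample size is the larger of the two.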
Example: If we want to estimate the mean weight of plastic discarded by households in one week, how many households must be randomly selected to be 99% confident that the sample mean is within 0.25 lb of the true population mean? (A previous study indicates the standard deviation is 1.065 lb.)
$\alpha = 0.01$, $z_{\alpha/2} = 2.575$, $E = 0.25$, $\sigma = 1.065$
$n = \left[ \frac{z_{\alpha/2} \, \sigma}{E} \right]^2 = \left[ \frac{(2.575)(1.065)}{0.25} \right]^2 = 120.3$, rounded up to 121 households
We would need to randomly select 121 households and obtain the average weight of plastic discarded in one week. We would be 99% confident that this mean is within 1/4 lb of the population mean.

Chapter 8 Hypothesis Testing

Claim: using math symbols
$H_0$: must contain equality
$H_1$: will contain $\ne$, $<$, or $>$
Test Statistic
The test statistic is a value computed from the sample data, and it is used in making the decision about the rejection of the null hypothesis.
Test statistic for proportions: $z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$
Test statistic for mean: $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
Test statistic for standard deviation: $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$
Critical Region
Set of all values of the test statistic that would cause a rejection of the null hypothesis

Critical Value
Any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to a rejection of the null hypothesis
[Figure: curve divided at the critical value (z score) into a "Reject $H_0$" region and a "Fail to reject $H_0$" region.]

Two-tailed, Right-tailed, Left-tailed Tests
The tails in a distribution are the extreme regions bounded by critical values.
Decision Criterion Traditional method: Reject H 0 if the test statistic falls within the critical region. Fail to reject H 0 if the test statistic does not fall within the critical region. 121 Wording of Final Conclusion Figure 8-7 122 Comprehensive Hypothesis Test 123
Example: In 821 crashes of midsize cars equipped with air bags, 46 of the crashes resulted in hospitalization of the drivers. Using the 0.01 significance level, test the claim that the air bag hospitalization rate is lower than the 7.8% rate for cars with automatic safety belts.
Claim: $p < 0.078$
$H_0$: $p = 0.078$
$H_1$: $p < 0.078$
$\alpha = 0.01$, critical value $z = -2.33$ (left-tailed test)
$\hat{p} = 46 / 821 = 0.0560$
$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.056 - 0.078}{\sqrt{\frac{(0.078)(0.922)}{821}}} \approx -2.35$
The test statistic falls in the critical region ($-2.35 < -2.33$), so reject $H_0$.
There is sufficient evidence to support the claim that the air bag hospitalization rate is lower than the 7.8% rate for automatic safety belts.

8-5 Testing a Claim about a Mean: $\sigma$ Not Known

Example: Seven axial load scores are listed below. At the 0.01 level of significance, test the claim that this sample comes from a population with a mean that is greater than 165 lb.
270 273 258 204 254 228 282
n = 7, df = 6, $\bar{x} = 252.7$ lb, $s = 27.6$ lb
Claim: $\mu > 165$ lb
$H_0$: $\mu = 165$ lb
$H_1$: $\mu > 165$ lb (right-tailed test)
$\alpha = 0.01$, critical value $t = 3.143$ (Table A-3, df = 6)
$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}} = \frac{252.7 - 165}{27.6 / \sqrt{7}} = 8.407$
The test statistic falls in the critical region ($8.407 > 3.143$), so reject $H_0$.

Final conclusion: There is sufficient evidence to support the claim that the sample comes from a population with a mean greater than 165 lb.

8-6 Testing a Claim about a Standard Deviation or Variance
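The test statistics for the air bag and axial load examples can be recomputed from the raw numbers. A sketch using the Python standard library, with the critical values -2.33 and 3.143 taken from the slides rather than computed. Because it uses the unrounded mean and standard deviation, the t statistic comes out near 8.40 instead of the 8.407 obtained from the rounded values 252.7 and 27.6:

```python
from math import sqrt
from statistics import mean, stdev

# Air bag example: left-tailed test of p = 0.078.
n_crashes, hospitalized, p0 = 821, 46, 0.078
p_hat = hospitalized / n_crashes
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_crashes)
reject_proportion = z < -2.33          # critical value from the slide

# Axial load example: right-tailed test of mu = 165.
loads = [270, 273, 258, 204, 254, 228, 282]
x_bar = mean(loads)
s = stdev(loads)                       # sample standard deviation (n - 1)
t = (x_bar - 165) / (s / sqrt(len(loads)))
reject_mean = t > 3.143                # Table A-3, df = 6, 0.01 one tail

print(round(z, 2), reject_proportion, round(x_bar, 1), round(s, 1), reject_mean)
```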
Chi-Square Distribution
Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$
n = sample size
$s^2$ = sample variance
$\sigma^2$ = population variance (given in null hypothesis)

Critical Values and P-values for Chi-Square Distribution
Found in Table A-4
Degrees of freedom = n - 1
Based on cumulative areas from the RIGHT

Table A-4: Critical values are found by determining the area to the RIGHT of the critical value.
Example: df = 80, $\alpha = 0.05$, $\alpha/2 = 0.025$
Left critical value (area 0.975 to its right): 57.153
Right critical value (area 0.025 to its right): 106.629
Example: Aircraft altimeters have measuring errors with a standard deviation of 43.7 ft. With new production equipment, 81 altimeters measure errors with a standard deviation of 52.3 ft. Use the 0.05 significance level to test the claim that the new altimeters have a standard deviation different from the old value of 43.7 ft.
Claim: $\sigma \ne 43.7$
$H_0$: $\sigma = 43.7$
$H_1$: $\sigma \ne 43.7$
$\alpha = 0.05$, $\alpha/2 = 0.025$, n = 81, df = 80
Critical values (Table A-4): 57.153 and 106.629
$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(81 - 1)(52.3)^2}{43.7^2} \approx 114.586$
The test statistic falls in the right critical region ($114.586 > 106.629$), so reject $H_0$ and support the claim.
The new production method appears to be worse than the old method. The data support that there is more variation in the error readings than before.
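The altimeter test statistic is a one-line calculation once the critical values are read from Table A-4. A minimal Python sketch, with the critical values taken from the slide rather than computed (variable names are illustrative):

```python
n = 81            # sample size
s = 52.3          # sample standard deviation (ft)
sigma0 = 43.7     # standard deviation stated in the null hypothesis (ft)

# Test statistic: chi-square = (n - 1) * s^2 / sigma^2.
chi_square = (n - 1) * s**2 / sigma0**2

# Two-tailed critical values from Table A-4 with df = 80 and alpha = 0.05.
lower_cv, upper_cv = 57.153, 106.629
reject_h0 = chi_square < lower_cv or chi_square > upper_cv

print(round(chi_square, 3), reject_h0)
```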
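Table 8-3 Hypothesis Tests
Parameter: Proportion
  Conditions: $np \ge 5$ and $nq \ge 5$
  Distribution and test statistic: Normal, $z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$
  Critical and P-values: Table A-2
Parameter: Mean
  Conditions: $\sigma$ not known and normally distributed, or $n > 30$
  Distribution and test statistic: Student t, $t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$
  Critical and P-values: Table A-3
Parameter: Standard Deviation or Variance
  Conditions: Population normally distributed
  Distribution and test statistic: Chi-Square, $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$
  Critical and P-values: Table A-4

Chapter 10 Correlation and Regression

Overview
Paired Data
Is there a relationship?
If so, what is the equation?
Use the equation for prediction.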
Definition
Correlation exists between two variables when one of them is related to the other in some way.

Definition
Scatterplot (or scatter diagram): a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Scatter Diagram of Paired Data: Lengths and Weights of Male Bears
[Figure: scatter diagram of weight (lb, 0 to 500) against length (in., 35 to 75). Labeled points: (37, 34), (53, 80), (67.5, 344), (68.5, 360), (72, 348), (72, 416), (73, 332), (73.5, 262).]
Figure 10-2 Scatter Plots
Positive Linear Correlation: (a) Positive, (b) Strong positive, (c) Perfect positive
Negative Linear Correlation: (d) Negative, (e) Strong negative, (f) Perfect negative
No Linear Correlation: (g) No correlation, (h) Nonlinear correlation
Definition
Linear Correlation Coefficient r: measures the strength of the linear relationship between paired x and y quantitative values in a sample

Definition
Linear Correlation Coefficient r (Formula 10-1)
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2} \, \sqrt{n(\sum y^2) - (\sum y)^2}}$
Calculators can compute r.
$\rho$ (rho) is the linear correlation coefficient for all paired data in the population.

Rounding the Linear Correlation Coefficient r
Round to three decimal places so that it can be compared to critical values in Table A-5.
Use calculator or computer if possible.
Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-5, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

TABLE A-5 Critical Values of the Pearson Correlation Coefficient r
n      $\alpha = .05$   $\alpha = .01$
4      .950             .999
5      .878             .959
6      .811             .917
7      .754             .875
8      .707             .834
9      .666             .798
10     .632             .765
11     .602             .735
12     .576             .708
13     .553             .684
14     .532             .661
15     .514             .641
16     .497             .623
17     .482             .606
18     .468             .590
19     .456             .575
20     .444             .561
25     .396             .505
30     .361             .463
35     .335             .430
40     .312             .402
45     .294             .378
50     .279             .361
60     .254             .330
70     .236             .305
80     .220             .286
90     .207             .269
100    .196             .256

Properties of the Linear Correlation Coefficient r
1. $-1 \le r \le 1$
2. Value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures strength of a linear relationship.
Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables.
Two methods. Both methods let
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation) 151

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: Refer to Table A-5 (no degrees of freedom)
[Figure: number line from -1 to 1. Reject ρ = 0 in the two tails beyond r = -0.811 and r = 0.811; fail to reject ρ = 0 between them. Sample data: r = 0.828.] 152

Is there a significant linear correlation?
Data from the Garbage Project

x Plastic (lb):     0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size:   2     3     3     6     4     2     1     5

n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842 153
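The Method 2 decision rule amounts to a table lookup followed by a comparison. The sketch below assumes only a few rows of Table A-5 are needed; the dictionary `CRITICAL_R` and the function name are hypothetical, and the r values are the ones quoted on the slides.

```python
# Subset of Table A-5: n -> (critical r for alpha = .05, critical r for alpha = .01)
CRITICAL_R = {
    4: (0.950, 0.999),
    5: (0.878, 0.959),
    6: (0.811, 0.917),
    7: (0.754, 0.875),
    8: (0.707, 0.834),
    9: (0.666, 0.798),
    10: (0.632, 0.765),
}

def has_significant_linear_correlation(r, n, alpha=0.05):
    """Return True when |r| exceeds the Table A-5 critical value (reject H0: rho = 0)."""
    if alpha not in (0.05, 0.01):
        raise ValueError("Table A-5 lists only alpha = 0.05 and alpha = 0.01")
    critical = CRITICAL_R[n][0 if alpha == 0.05 else 1]
    return abs(r) > critical

# Garbage Project data: n = 8, r = 0.842, alpha = 0.05
decision = has_significant_linear_correlation(0.842, 8)
print("Reject H0" if decision else "Fail to reject H0")
```

Sample sizes outside the subset raise a `KeyError` here; a fuller version would carry every row of the table.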
Is there a significant linear correlation?
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and r = 0.707 (Table A-5 with n = 8 and α = 0.05) 154

Is there a significant linear correlation?
0.842 > 0.707
The test statistic does fall within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: number line from -1 to 1. Reject ρ = 0 in the two tails beyond r = -0.707 and r = 0.707; fail to reject ρ = 0 between them. Sample data: r = 0.842.] 155

10.3 Regression 156
Definition
Regression Equation: given a collection of paired data, the regression equation
$\hat{y} = b_0 + b_1 x$
algebraically describes the relationship between the two variables.
Regression Line (line of best fit or least-squares line): the graph of the regression equation. 157

The Regression Equation
x is the independent variable (predictor variable).
ŷ is the dependent variable (response variable).
$\hat{y} = b_0 + b_1 x$ (compare $y = mx + b$)
b_0 = y-intercept
b_1 = slope 158

[Figure: Regression Line Plotted on Scatter Plot] 159
Formula for b_1 and b_0
Formula 10-2: $b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$ (slope)
Formula 10-3: $b_0 = \bar{y} - b_1 \bar{x}$ (y-intercept)
Calculators or computers can compute these values. 160

Rounding the y-intercept b_0 and the slope b_1
Round to three significant digits.
If you use Formulas 10-2 and 10-3, try not to round intermediate values, or carry at least six significant digits. 161

Example: Lengths and Weights of Male Bears

x Length (in.):   53.0  67.5  72.0  72.0  73.5  68.5  73.0  37.0
y Weight (lb):    80    344   416   348   262   360   332   34

b_0 = -352 (rounded)
b_1 = 9.66 (rounded)
$\hat{y} = -352 + 9.66x$ 162
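Formulas 10-2 and 10-3 can be sketched in code as below, using the bear data from the example. The function name `least_squares_line` is a hypothetical choice; intermediate values are left unrounded, and only the printed result is shown to three significant digits.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the regression equation y-hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Formula 10-2: slope
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Formula 10-3: y-intercept = mean of y minus slope times mean of x
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

lengths = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]  # x, inches
weights = [80, 344, 416, 348, 262, 360, 332, 34]            # y, pounds

b0, b1 = least_squares_line(lengths, weights)
# Three significant digits, as the rounding rule recommends
print(f"y-hat = {b0:.3g} + {b1:.3g}x")
```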
Scatter Diagram of Paired Data: Lengths and Weights of Male Bears
[Figure: scatterplot of Weight (lb) against Length (in.)] 163

Predictions
In predicting a value of y based on some given value of x:
1. If there is not a significant linear correlation, the best predicted y-value is ȳ (the mean of the sample y-values).
2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation. 164

Guidelines for Using the Regression Equation
1. If there is no significant linear correlation, don't use the regression equation to make predictions.
2. When using the regression equation for predictions, stay within the scope of the available sample data.
3. A regression equation based on old data is not necessarily valid now.
4. Don't make predictions about a population that is different from the population from which the sample data were drawn. 165
Example: Lengths and Weights of Male Bears

x Length (in.):   53.0  67.5  72.0  72.0  73.5  68.5  73.0  37.0
y Weight (lb):    80    344   416   348   262   360   332   34

$\hat{y} = -352 + 9.66x$
What is the best predicted weight of a bear that is 60 inches long?
Since the data do have a significant positive linear correlation, we can use the regression equation for prediction. 166

$\hat{y} = -352 + 9.66(60) = 227.6$ pounds 167

A bear that is 60 inches long will weigh approximately 227.6 pounds. 168
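The prediction rule and the scope guideline can be combined in a short sketch. It assumes the rounded coefficients from the slides; the function name `best_predicted_weight` and the choice to raise an error outside the sample range are illustrative assumptions, not part of the original procedure.

```python
from statistics import mean

lengths = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]  # x, inches
weights = [80, 344, 416, 348, 262, 360, 332, 34]            # y, pounds

B0, B1 = -352, 9.66  # rounded regression coefficients from the example

def best_predicted_weight(length_in, significant_correlation=True):
    """Best predicted weight (lb) for a bear of the given length (in.)."""
    if not significant_correlation:
        # No significant linear correlation: fall back to the mean of the y-values
        return mean(weights)
    if not min(lengths) <= length_in <= max(lengths):
        # Guideline 2: stay within the scope of the available sample data
        raise ValueError("length is outside the scope of the sample data")
    return B0 + B1 * length_in

print(round(best_predicted_weight(60), 1))
print(best_predicted_weight(60, significant_correlation=False))
```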
Example: Lengths and Weights of Male Bears
If there were no significant linear correlation, to predict a weight for any length, use the average of the weights (y-values):
ȳ = 272 lb 169

Chapter 11 Multinomial Experiments and Contingency Tables 170

11-2 Multinomial Experiments 171
Definition
Goodness-of-fit test: used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution. 172

Goodness-of-Fit Test Notation
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or outcomes
n represents the total number of trials 173

Expected Frequencies
If all expected frequencies are equal:
$E = \frac{n}{k}$
the sum of all observed frequencies divided by the number of categories. 174
Expected Frequencies
If the expected frequencies are not all equal:
$E = np$
Each expected frequency is found by multiplying the sum of all observed frequencies by the probability for the category. 175

Key Question
We need to measure the discrepancy between O and E; the test statistic will involve their difference: O - E 176

Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical Values
1. Found in Table A-4 using k - 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed. 177
Multinomial Experiment: Goodness-of-Fit Test
H0: No difference between observed and expected probabilities
H1: At least one of the probabilities is different from the others 178

Categories with Equal Frequencies (Probabilities)
H0: $p_1 = p_2 = p_3 = \dots = p_k$
H1: At least one of the probabilities is different from the others 179

Example: A study was made of 147 industrial accidents that required medical attention. Test the claim that the accidents occur with equal proportions on the 5 workdays.

Frequency of Accidents
Day                  Mon   Tues   Wed   Thurs   Fri
Observed accidents   31    42     18    25      31

Claim: Accidents occur with the same proportion (frequency); that is, $p_1 = p_2 = p_3 = p_4 = p_5$
H0: $p_1 = p_2 = p_3 = p_4 = p_5$
H1: At least 1 of the 5 proportions is different from the others 180
Example: A study was made of 147 industrial accidents that required medical attention. Test the claim that the accidents occur with equal proportions on the 5 workdays.
E = n/k = 147/5 = 29.4

Observed and Expected Frequencies
Day                  Mon    Tues   Wed    Thurs   Fri
Observed accidents   31     42     18     25      31
Expected accidents   29.4   29.4   29.4   29.4    29.4
181

Observed and Expected Frequencies of Industrial Accidents
Day                     Mon      Tues     Wed      Thurs    Fri
Observed accidents      31       42       18       25       31
Expected accidents      29.4     29.4     29.4     29.4     29.4
(O - E)^2/E (rounded)   0.0871   5.4000   4.4204   0.6585   0.0871

Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$ = 0.0871 + 5.4000 + 4.4204 + 0.6585 + 0.0871 = 10.6531
Critical Value: χ² = 9.488 (Table A-4 with k - 1 = 5 - 1 = 4 degrees of freedom and α = 0.05) 182

[Figure: right-tailed chi-square distribution with α = 0.05. Fail to reject $p_1 = p_2 = p_3 = p_4 = p_5$ to the left of χ² = 9.488; reject to the right. Sample data: χ² = 10.653.]
The test statistic falls within the critical region: REJECT the null hypothesis. 183
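The accident calculation can be reproduced with a few lines of code. This is a sketch for the equal-frequencies case only; the variable names are illustrative, and the critical value 9.488 is copied from Table A-4 as quoted on the slide rather than computed.

```python
# Observed industrial accidents, Monday through Friday
observed = [31, 42, 18, 25, 31]

n = sum(observed)   # total number of trials
k = len(observed)   # number of categories
expected = n / k    # equal expected frequencies: E = n / k

# Test statistic: sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed)

critical_value = 9.488  # Table A-4, k - 1 = 4 degrees of freedom, alpha = 0.05
reject_h0 = chi_square > critical_value  # goodness-of-fit tests are right-tailed

print(round(chi_square, 3), "reject H0" if reject_h0 else "fail to reject H0")
```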
The test statistic (χ² = 10.653) falls within the critical region (beyond χ² = 9.488 with α = 0.05): REJECT the null hypothesis.
We reject the claim that the accidents occur with equal proportions (frequency) on the 5 workdays. (Although it appears Wednesday has a lower accident rate, arriving at such a conclusion would require other methods of analysis.) 184

Categories with Unequal Frequencies (Probabilities)
H0: $p_1, p_2, p_3, \dots, p_k$ are as claimed
H1: At least one of the above proportions is different from the claimed value 185

Example: Mars, Inc. claims its M&M candies are distributed with the color percentages of 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. At the 0.05 significance level, test the claim that the color distribution is as claimed by Mars, Inc.
Claim: $p_1 = 0.30,\ p_2 = 0.20,\ p_3 = 0.20,\ p_4 = 0.10,\ p_5 = 0.10,\ p_6 = 0.10$
H0: $p_1 = 0.30,\ p_2 = 0.20,\ p_3 = 0.20,\ p_4 = 0.10,\ p_5 = 0.10,\ p_6 = 0.10$
H1: At least one of the proportions is different from the claimed value. 186
Example: Mars, Inc. claims its M&M candies are distributed with the color percentages of 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. At the 0.05 significance level, test the claim that the color distribution is as claimed by Mars, Inc.

Frequencies of M&Ms
                     Brown   Yellow   Red   Orange   Green   Blue
Observed frequency   33      26       21    8        7       5

n = 100
Brown:  E = np = (100)(0.30) = 30
Yellow: E = np = (100)(0.20) = 20
Red:    E = np = (100)(0.20) = 20
Orange: E = np = (100)(0.10) = 10
Green:  E = np = (100)(0.10) = 10
Blue:   E = np = (100)(0.10) = 10
187

Frequencies of M&Ms
                     Brown   Yellow   Red    Orange   Green   Blue
Observed frequency   33      26       21     8        7       5
Expected frequency   30      20       20     10       10      10
(O - E)^2/E          0.3     1.8      0.05   0.4      0.9     2.5

Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E} = 5.95$
Critical Value: χ² = 11.071 (with k - 1 = 5 degrees of freedom and α = 0.05) 188

[Figure: right-tailed chi-square distribution with α = 0.05. Fail to reject to the left of χ² = 11.071; reject to the right. Sample data: χ² = 5.95.]
The test statistic does not fall within the critical region; fail to reject H0: percentages are as claimed.
There is not sufficient evidence to warrant rejection of the claim that the colors are distributed with the given percentages. 189
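When the claimed probabilities differ across categories, each expected frequency is E = np. The sketch below applies that to the M&M data; the function name `goodness_of_fit_statistic` is hypothetical, and the critical value 11.071 is taken from the slide.

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Chi-square statistic for observed counts against claimed category probabilities."""
    n = sum(observed)
    expected = [n * p for p in probabilities]  # E = n * p for each category
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Brown, yellow, red, orange, green, blue
observed = [33, 26, 21, 8, 7, 5]
claimed = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]

chi_square = goodness_of_fit_statistic(observed, claimed)

critical_value = 11.071  # Table A-4, k - 1 = 5 degrees of freedom, alpha = 0.05
reject_h0 = chi_square > critical_value

print(round(chi_square, 2), "reject H0" if reject_h0 else "fail to reject H0")
```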
11-3 Contingency Tables 190

Definition
Contingency Table (or two-way frequency table): a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.)
Contingency tables have at least two rows and at least two columns. 191

Definition
Test of Independence: tests the null hypothesis that there is no association between the row variable and the column variable. (The null hypothesis is the statement that the row and column variables are independent.) 192
Tests of Independence
H0: The row variable is independent of the column variable
H1: The row variable is dependent on (related to) the column variable
This procedure cannot be used to establish a direct cause-and-effect link between the variables in question. Dependence means only that there is a relationship between the two variables. 193

Test of Independence
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Critical Values
1. Found in Table A-4 using degrees of freedom = (r - 1)(c - 1), where r is the number of rows and c is the number of columns.
2. Tests of independence are always right-tailed. 194

$E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})}$
where the grand total is the total number of all observed frequencies in the table. 195
Is the type of crime independent of whether the criminal is a stranger?

                           Homicide   Robbery   Assault   Row Total
Stranger                   12         379       727       1118
Acquaintance or Relative   39         106       642       787
Column Total               51         485       1369      1905

H0: Type of crime is independent of knowing the criminal
H1: Type of crime is dependent on knowing the criminal 196

Is the type of crime independent of whether the criminal is a stranger?
Expected frequencies are shown in parentheses.

                           Homicide     Robbery        Assault        Row Total
Stranger                   12 (29.93)   379 (284.64)   727 (803.43)   1118
Acquaintance or Relative   39 (21.07)   106 (200.36)   642 (565.57)   787
Column Total               51           485            1369           1905

$E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})}$
E = (1118)(51)/1905 = 29.93
E = (1118)(485)/1905 = 284.64
etc. 197

Is the type of crime independent of whether the criminal is a stranger?
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Each cell shows O, then (E) in parentheses, then [(O - E)^2/E] in brackets.

                           Homicide              Robbery                 Assault
Stranger                   12 (29.93) [10.741]   379 (284.64) [31.281]   727 (803.43) [7.271]
Acquaintance or Relative   39 (21.07) [15.258]   106 (200.36) [44.439]   642 (565.57) [10.329]

Upper left cell: $\frac{(O - E)^2}{E} = \frac{(12 - 29.93)^2}{29.93} = 10.741$ 198
Is the type of crime independent of whether the criminal is a stranger?
Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$ = 10.741 + 31.281 + ... + 10.329 = 119.319 199

Test Statistic: χ² = 119.319
Critical Value: χ² = 5.991 (from Table A-4) with α = 0.05 and (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2 degrees of freedom
H0: The type of crime and knowing the criminal are independent
H1: The type of crime and knowing the criminal are dependent
[Figure: right-tailed chi-square distribution with α = 0.05. Fail to reject independence to the left of χ² = 5.991; reject independence to the right. Sample data: χ² = 119.319, so reject independence.] 200

It appears that the type of crime and knowing the criminal are related. 201
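The full test of independence for the crime table can be sketched as below. The function name `chi_square_independence` is a hypothetical choice. Because the code keeps expected frequencies unrounded, its statistic comes out slightly higher (roughly 119.33) than the slides' 119.319, which appears to be built from expected values rounded to two decimals; the conclusion is the same.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = (row total)(column total) / (grand total)
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, degrees_of_freedom

# Rows: stranger, acquaintance or relative. Columns: homicide, robbery, assault.
crime = [
    [12, 379, 727],
    [39, 106, 642],
]

chi_square, df = chi_square_independence(crime)
critical_value = 5.991  # Table A-4, 2 degrees of freedom, alpha = 0.05
reject_independence = chi_square > critical_value

print(round(chi_square, 2), df, reject_independence)
```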
Definition
Test of Homogeneity: tests the claim that different populations have the same proportions of some characteristics. 202

Example - Test of Homogeneity
Seat Belt Use in Taxi Cabs

Taxi has usable seat belt?   New York   Chicago   Pittsburgh
Yes                          3          42        2
No                           74         87        70

Claim: The 3 cities have the same proportion of taxis with usable seat belts
H0: The 3 cities have the same proportion of taxis with usable seat belts
H1: The proportion of taxis with usable seat belts is not the same in all 3 cities
[Figure: right-tailed chi-square distribution with α = 0.05. Fail to reject homogeneity to the left of χ² = 5.991; reject homogeneity to the right. Sample data: χ² = 42.004.]
There is sufficient evidence to warrant rejection of the claim that the 3 cities have the same proportion of usable seat belts in taxis; it appears from the table that Chicago has a much higher proportion. 203

Final Review. Triola, Essentials of Statistics, Third Edition. Copyright 2008. Pearson Education, Inc. 204
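The slide for the seat belt example reports the test statistic without showing the arithmetic. The sketch below fills in that step under the assumption that the homogeneity test uses the same expected-frequency formula and chi-square statistic as the test of independence, which is consistent with the 2 degrees of freedom behind the critical value 5.991. The dictionary layout and variable names are illustrative.

```python
# (taxis with usable seat belt, taxis without) for each city
seat_belts = {
    "New York": (3, 74),
    "Chicago": (42, 87),
    "Pittsburgh": (2, 70),
}

total_yes = sum(yes for yes, _ in seat_belts.values())
total_no = sum(no for _, no in seat_belts.values())
grand_total = total_yes + total_no

chi_square = 0.0
for yes, no in seat_belts.values():
    city_total = yes + no
    for observed, outcome_total in ((yes, total_yes), (no, total_no)):
        # E = (row total)(column total) / (grand total)
        expected = city_total * outcome_total / grand_total
        chi_square += (observed - expected) ** 2 / expected

critical_value = 5.991  # Table A-4, (2 - 1)(3 - 1) = 2 degrees of freedom, alpha = 0.05
reject_homogeneity = chi_square > critical_value

# Sample proportion of taxis with usable seat belts in each city
proportions = {city: yes / (yes + no) for city, (yes, no) in seat_belts.items()}

print(round(chi_square, 3), reject_homogeneity)
print({city: round(p, 3) for city, p in proportions.items()})
```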