
Business Statistics, 9e (Groebner/Shannon/Fry)
Chapter 13 Goodness-of-Fit Tests and Contingency Analysis
1) A goodness-of-fit test can be used to determine whether a set of sample data comes from a specific hypothesized population distribution.
2) If the test statistic for a chi-square goodness-of-fit test is larger than the critical value, the null hypothesis should be rejected.
3) The logic behind the chi-square goodness-of-fit test is based on determining how far the actual observed frequencies are from the expected frequencies.
4) The goodness-of-fit test is always a one-tail test with the rejection region in the upper tail. Answer: TRUE
5) When the expected cell frequencies are smaller than 5, the cells should be combined in a meaningful way such that the expected cell frequencies do exceed 5.
6) The reason that a decision maker might want to combine groups before performing a goodness-of-fit test is to avoid accepting the null hypothesis due to an inflated value of the test statistic. Answer: FALSE
7) In a goodness-of-fit test, when the null hypothesis is true, the expected value for the chi-square test statistic is zero.
8) The Conrad Real Estate Company recently conducted a statistical test to determine whether the number of days that homes are on the market prior to selling is normally distributed with a mean equal to 50 days and a standard deviation equal to 10 days. The sample of 200 homes was divided into 8 groups to form a grouped data frequency distribution. The degrees of freedom for the test will be 7.
9) The Conrad Real Estate Company recently conducted a statistical test to determine whether the number of days that homes are on the market prior to selling is normally distributed with a mean equal to 50 days and a standard deviation equal to 10 days. The sample of 200 homes was divided into 8 groups to form a grouped data frequency distribution. If a chi-square goodness-of-fit test is to be conducted using alpha = .05, the critical value is 14.0671.
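Items 8, 9, 12, and 20 combine the degrees-of-freedom rule (number of categories minus 1, minus one more for each parameter estimated from the sample, as item 25 states) with a chi-square table lookup. A minimal standard-library sketch of both follows. `gof_degrees_of_freedom`, `chi2_cdf`, and `critical_value` are hypothetical helper names, and the CDF is evaluated with the power series for the regularized lower incomplete gamma function, which is adequate for the small df values in this chapter but is not meant as a production routine. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same numbers.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """P(X <= x) for a chi-square variable: regularized lower gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def critical_value(alpha: float, df: int) -> float:
    """Upper-tail critical value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gof_degrees_of_freedom(k: int, estimated_params: int = 0) -> int:
    """Goodness-of-fit df: k categories minus 1, minus 1 per estimated parameter."""
    return k - 1 - estimated_params


# Conrad Real Estate (items 8-9): the mean and standard deviation are given in H0,
# so nothing is estimated and 8 groups give df = 8 - 1 = 7.
df = gof_degrees_of_freedom(8)
print(df, round(critical_value(0.05, df), 4))  # prints 7 14.0671

# Had the mean and standard deviation been estimated from the sample instead:
print(gof_degrees_of_freedom(8, estimated_params=2))  # prints 5
```

Rounded to four decimals, `critical_value(0.05, 4)` gives 9.4877 (item 12), and `critical_value(0.10, 5)` gives about 9.2364, which item 20 quotes as 9.2363.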
10) A business with 5 copy machines keeps track of how many copy machines need service on a given day. It believes this is binomially distributed with a probability of p = 0.2 of each machine needing service on any given day. It has collected the following data based on a random sample of 100 days.
X           0    1    2    3    4 or 5
Frequency   28   38   22   7    5
Given this information, assuming that all expected values are sufficiently large to use the classes as shown above, the critical value for testing the hypothesis will be based on 5 degrees of freedom. Answer: FALSE
11) Given this information, the expected number of days on which exactly 1 machine breaks down is 40.96.
12) Given this information, assuming that all expected values are sufficiently large to use the classes as shown above, the critical value based on a 0.05 level of significance is 9.4877.
13) It is believed that the number of drivers who are ticketed for speeding on a particular stretch of highway follows a Poisson distribution with a mean of 3.5 per hour. A random sample of 100 hours is selected with the following results:
X           0   1   2   3   4   5   6   7   8   9
Frequency   5   10  20  18  20  15  4   6   1   2
Given this information, and without regard to whether there is a need to combine cells due to expected cell frequencies, the critical value for testing whether the distribution is Poisson with a mean of 3.5 per hour at an alpha level of .05 is χ² = 15.5073. Answer: FALSE
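Items 10 to 12 can be reproduced directly. The sketch below assumes, as item 10 does, that the five classes are used as shown, with the "4 or 5" class taking the upper-tail probability P(X >= 4). It computes the binomial expected frequencies, the chi-square statistic, and df = 5 - 1 = 4 (p = 0.2 is specified in H0, so no parameter is estimated). The last class's expected frequency works out to only 0.672, well under 5, which is why the items have to stipulate that the expected values are large enough.

```python
from math import comb

N_TRIALS, P_SERVICE, DAYS = 5, 0.2, 100
observed = [28, 38, 22, 7, 5]  # days with X = 0, 1, 2, 3, and "4 or 5" machines needing service

# Binomial probabilities for X = 0..3; the last class takes the upper tail P(X >= 4).
probs = [
    comb(N_TRIALS, x) * P_SERVICE**x * (1 - P_SERVICE) ** (N_TRIALS - x)
    for x in range(4)
]
probs.append(1 - sum(probs))
expected = [DAYS * p for p in probs]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # p = 0.2 is specified in H0, so no parameter is estimated
CRITICAL = 9.4877  # alpha = 0.05, df = 4 (item 12)

print([round(e, 3) for e in expected])
print(round(stat, 3), df, "reject H0" if stat > CRITICAL else "fail to reject H0")
```

Almost all of the statistic comes from the tiny "4 or 5" cell, a concrete case of the inflation that combining cells is meant to prevent.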

14) Given this information, it can be seen that the cells will need to be combined since the actual number of occurrences at some levels of x is less than 5. Answer: FALSE
15) If the sample size is large, the standard normal distribution can be used in place of the chi-square distribution in a goodness-of-fit test for testing whether the population is normally distributed. Answer: FALSE
16) By combining cells we guard against having an inflated test statistic that could have led us to incorrectly accept the null hypothesis. Answer: FALSE
17) If any of the observed frequencies are smaller than 5, then categories should be combined until all observed frequencies are at least 5. Answer: FALSE
18) A lube and oil change business believes that the number of cars that arrive for service is the same each day of the week. If the business is open six days a week (Monday through Saturday) and a random sample of n = 200 customers is selected, the expected number that will arrive on Monday is about 33.33.
19) The sum of the expected frequencies over the six days cannot be determined without seeing the actual sample data. Answer: FALSE
20) The critical value for testing the hypothesis using a goodness-of-fit test is χ² = 9.2363 if the alpha level for the test is set at .10.
21) A goodness-of-fit test can decide whether a set of data comes from a specific hypothesized distribution.
22) If the calculated chi-square statistic is large, this is evidence to suggest the fit of the actual data to the hypothesized distribution is not good, and H0 should be rejected.
23) The goodness-of-fit test is essentially determining if the test statistic is significantly larger than zero. Answer: FALSE
24) By combining cells we guard against having an inflated test statistic that could have led us to incorrectly reject H0.
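Items 14, 16, 17, and 24 turn on the rule that cells are combined on the basis of expected, not observed, frequencies. A minimal sketch of one way to do that for the speeding-ticket data of item 13 follows. It assumes the last class (X = 9) is treated as "9 or more" so that the Poisson probabilities sum to 1, and `combine_cells` is a hypothetical helper that merges adjacent classes from left to right, folding any short tail into the previous group. Other groupings are possible, and a textbook solution may combine cells differently.

```python
import math

LAMBDA, HOURS = 3.5, 100
# Observed counts for X = 0..9; the last class is treated as "9 or more" (assumption).
observed = [5, 10, 20, 18, 20, 15, 4, 6, 1, 2]

probs = [math.exp(-LAMBDA) * LAMBDA**x / math.factorial(x) for x in range(9)]
probs.append(1 - sum(probs))  # P(X >= 9)
expected = [HOURS * p for p in probs]


def combine_cells(obs, exp, minimum=5.0):
    """Merge adjacent classes (left to right) until every expected count >= minimum.

    A leftover tail that never reaches the minimum is folded into the previous group.
    """
    groups = []  # list of [observed_total, expected_total]
    cur_o, cur_e = 0, 0.0
    for o, e in zip(obs, exp):
        cur_o += o
        cur_e += e
        if cur_e >= minimum:
            groups.append([cur_o, cur_e])
            cur_o, cur_e = 0, 0.0
    if cur_o or cur_e:
        if groups:
            groups[-1][0] += cur_o
            groups[-1][1] += cur_e
        else:
            groups.append([cur_o, cur_e])
    return [g[0] for g in groups], [g[1] for g in groups]


obs_c, exp_c = combine_cells(observed, expected)
df = len(obs_c) - 1  # lambda = 3.5 is specified, so no parameter is estimated
stat = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
print(len(observed), "classes ->", len(obs_c), "classes; df =", df)
print("observed:", obs_c)
print("expected:", [round(e, 2) for e in exp_c])
```

The merge is driven entirely by the expected column: X = 6 keeps its own class even though only 4 occurrences were observed, because its expected count (about 7.7) is above 5.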
25) If one or more parameters are left unspecified in a goodness-of-fit test, they must be estimated from the sample data, and one degree of freedom is lost for each parameter that must be estimated.
26) The sampling distribution for a goodness-of-fit test is the Poisson distribution. Answer: FALSE
27) Contingency analysis helps to make decisions when multiple proportions are involved.
28) Contingency analysis is used only for numerical data. Answer: FALSE
29) Managers use contingency analysis to determine whether two categorical variables are independent of each other.
30) A survey was recently conducted in which males and females were asked whether they owned a laptop personal computer. The following data were observed:
              Males   Females
Have laptop   120     70
No laptop     50      60
Given this information, the sample size in the survey was 300 people.
31) Given this information, if having a laptop is independent of gender, the expected number of males with laptops in this survey is 150. Answer: FALSE
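For the laptop survey in items 30 to 34, each expected count under the independence hypothesis is (row total × column total) / n. The sketch below uses hypothetical helper names in plain Python. It gives an expected count of about 107.67 males with laptops (item 31) and a test statistic of about 8.89 with 1 degree of freedom, which exceeds the 3.8415 critical value of item 32 but is not the 14.23 of item 34.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_stat(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: have laptop, no laptop; columns: males, females (item 30)
laptops = [[120, 70], [50, 60]]

expected = expected_counts(laptops)
stat = chi_square_stat(laptops)
df = (len(laptops) - 1) * (len(laptops[0]) - 1)

print("expected:", [[round(e, 2) for e in row] for row in expected])
print("sum of expected:", round(sum(map(sum, expected)), 6))
print("chi-square:", round(stat, 3), "df:", df)
```

Because every expected count is built from the observed row and column totals, the expected counts necessarily add up to the same 300 as the observed counts (item 33).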

32) Given this information, if an alpha level of .05 is used, the critical value for testing whether the two variables are independent is χ² = 3.8415.
33) Given this information, if an alpha level of .05 is used, the sum of the expected cell frequencies will be equal to the sum of the observed cell frequencies.
34) Given this information, if an alpha level of .05 is used, the test statistic for determining whether having a laptop is independent of gender is approximately 14.23. Answer: FALSE
35) When the variables of interest are both categorical and the decision maker is interested in determining whether a relationship exists between the two, a statistical technique known as contingency analysis is useful.
36) In conducting a test of independence for a contingency table that has 4 rows and 3 columns, the number of degrees of freedom is 11. Answer: FALSE
37) A study was recently conducted in which people were asked to indicate which news medium was their preferred choice for national news. The following data were observed:
              Radio   Television   Newspaper
Under 21      30      50           5
21-40         20      25           30
41 and over   30      30           50
Given this data, if we wish to test whether the preferred news source is independent of age, the expected frequency in the "under 21 and radio" cell is 30. Answer: FALSE
38) A cell phone company wants to determine if the use of text messaging is independent of age. The following data has been collected from a random sample of their customers.
              Regularly use text messaging   Do not regularly use text messaging
Under 21      82                             38
21-39         57                             34
40 and over   6                              83
Using the data above, in order to test for the independence of age and the use of text messaging, the expected value for the "under 21 and regularly use text messaging" cell is 82. Answer: FALSE
39) A study was recently conducted in which people were asked to indicate which news medium was their preferred choice for national news.
The following data were observed:
              Radio   Television   Newspaper
Under 21      30      50           5
21-40         20      25           30
41 and over   30      30           50
Given this data, if we wish to test whether the preferred news source is independent of age with an alpha equal to .05, the critical value will be a chi-square value with 9 degrees of freedom. Answer: FALSE
40) Given this data, if we wish to test whether the preferred news source is independent of age, the cell with the largest expected cell frequency is also the cell with the largest observed frequency. Answer: FALSE
41) Given this data, if we wish to test whether the preferred news source is independent of age, for an alpha = .05 level, the critical value from the chi-square table is based on 8 degrees of freedom. Answer: FALSE
42) Given this data, if we wish to test whether the preferred news source is independent of age, for an alpha = .05 level, the critical value from the chi-square table is 9.4877.
43) Given this data, if we wish to test whether the preferred news source is independent of age, for an alpha = .05 level, the test statistic is computed to be approximately 40.70.
44) A cell phone company wants to determine if the use of text messaging is independent of age. The following data has been collected from a random sample of customers.
              Regularly use text messaging   Do not regularly use text messaging
Under 21      82                             38
21-39         57                             34
40 and over   6                              83
Using this data, if we wish to test whether the use of text messaging is independent of age using a 0.05 level of significance, the critical value is 5.9915.
45) In order to apply the chi-square contingency methodology for quantitative variables, we must first break the quantitative variable down into discrete categories.
46) A study was recently done in the United States in which car owners were asked to indicate whether their most recent car purchase was a U.S. car, a German car, or a Japanese car. The people in the survey were divided by geographic region in the United States. The following data were recorded.
             U.S.   Japanese   German
East Coast   200    200        50
Central      250    100        20
West Coast   80     300        40
Given this situation, the sample size used in this study was nine. Answer: FALSE
47) Given this situation, the null hypothesis to be tested is that the car origin is dependent on the geographical location of the buyer. Answer: FALSE
48) Given this situation, to test whether the car origin is independent of the geographical location of the buyer, the sum of the expected cell frequencies will equal 1,240.
49) Given this situation, to test whether the car origin is independent of the geographical location of the buyer, the critical value for alpha = .10 is 14.6837. Answer: FALSE
50) Given this situation, to test whether the car origin is independent of the geographical location of the buyer, the expected number of people in the sample who bought a German-made car and who lived on the East Coast is just under 40 people.
51) A cell phone company wants to determine if the use of text messaging is independent of age. The following data has been collected from a random sample of customers.
              Regularly use text messaging   Do not regularly use text messaging
Under 21      82                             38
21-39         57                             34
40 and over   6                              83
To conduct a test of independence, the expected value for the "40 and over and regularly use text messaging" cell is just over 43 people.
52) Contingency analysis can be used when the level of data measurement is nominal or ordinal.
53) To employ contingency analysis, we set up a two-dimensional table called a contingency table.
54) A contingency table and a cross-tabulation table are two separate things and should not be used for the same purpose. Answer: FALSE
55) In a contingency analysis, we expect the actual frequencies in each cell to approximately match the corresponding expected cell frequencies when H0 is true.
56) In a chi-square contingency test, the number of degrees of freedom is equal to the number of cells minus 1. Answer: FALSE
57) In a chi-square contingency analysis application, the expected cell frequencies will be equal in all cells if the null hypothesis is true. Answer: FALSE
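Items 37 to 43 and 46 to 50 apply the same expected-count rule to 3 × 3 tables. The following sketch (hypothetical function name `independence_test`, standard library only) returns the expected counts, the statistic, and df = (r - 1)(c - 1) = 4. For the news-medium data it gives an expected "under 21 and radio" count of about 25.19 and a statistic of about 40.69; the largest expected count (about 42.78) falls in the "41 and over and television" cell, which holds only 30 observations. For the car data, the expected East Coast/German count is about 39.92, and the expected counts sum to 1,240.

```python
def independence_test(table):
    """Return (expected counts, Pearson chi-square statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


# Items 37-43. Rows: under 21, 21-40, 41 and over; columns: radio, television, newspaper
news = [[30, 50, 5], [20, 25, 30], [30, 30, 50]]
# Items 46-50. Rows: East Coast, Central, West Coast; columns: U.S., Japanese, German
cars = [[200, 200, 50], [250, 100, 20], [80, 300, 40]]

for name, table in [("news", news), ("cars", cars)]:
    expected, stat, df = independence_test(table)
    print(name, "df =", df, "chi-square =", round(stat, 2))
    for row in expected:
        print("   ", [round(e, 2) for e in row])
```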

58) Unlike the case of goodness-of-fit testing, with contingency analysis there is no restriction on the minimum size for an expected cell frequency. Answer: FALSE
59) In a contingency analysis, the expected values are based on the assumption that the two variables are independent of each other.
60) If a contingency analysis test is performed with a 4 × 6 design, and if alpha = .05, the critical value from the chi-square distribution is 24.9958.
61) If a contingency analysis test performed with a 4 × 6 design results in a test statistic value of 18.72, and if alpha = .05, the null hypothesis that the row and column variables are independent should be rejected. Answer: FALSE
62) If the null hypothesis is not rejected, you do not need to worry when the expected cell frequencies drop below 5.0.
63) The degrees of freedom for the chi-square goodness-of-fit test are equal to ______, where k is the number of categories.
A) k + 1
B) k - 1
C) k + 2
D) k - 2
64) Which of the following statements is true in the context of a chi-square goodness-of-fit test?
A) The degrees of freedom for determining the critical value will be the number of categories minus 1.
B) The critical value will come from the standard normal table if the sample size exceeds 30.
C) The null hypothesis will be rejected for a small value of the test statistic.
D) A very large test statistic will result in the null not being rejected.
65) A walk-in medical clinic believes that arrivals are uniformly distributed over weekdays (Monday through Friday). It has collected the following data based on a random sample of 100 days.
Day     Frequency
Mon     25
Tue     22
Wed     19
Thu     18
Fri     16
Total   100
Based on this information, how many degrees of freedom are involved in this goodness-of-fit test?
A) 99
B) 100
C) 4
D) 5
66) Assuming that a goodness-of-fit test is to be conducted using a 0.10 level of significance, the critical value is:
A) 9.4877
B) 11.0705
C) 7.7794
D) 9.2363
67) To conduct a goodness-of-fit test, what is the expected value for Friday?
A) 20
B) 25
C) 16
D) 100
68) What is the value of the test statistic needed to conduct a goodness-of-fit test?
A) 8.75
B) 7.7794
C) 2.46
D) 2.50
69) Based on these data, conduct a goodness-of-fit test using a 0.10 level of significance. Which conclusion is correct?
A) Arrivals are not uniformly distributed over the weekdays because (test statistic) > (critical value).
B) Arrivals are uniformly distributed over the weekdays because (test statistic) > (critical value).
C) Arrivals are not uniformly distributed over the weekdays because (test statistic) < (critical value).
D) Arrivals are uniformly distributed over the weekdays because (test statistic) < (critical value).
70) In a chi-square goodness-of-fit test, by combining cells we guard against having an inflated test statistic that could have caused us to:
A) incorrectly reject H0.
B) incorrectly accept H0.
C) incorrectly reject H1.
D) incorrectly accept H1.
71) In a goodness-of-fit test about a population distribution, if one or more parameters are left unspecified in H0, they must be estimated from the sample data. This will reduce the degrees of freedom by ______ for each estimated parameter.
A) 1
B) 2
C) 3
D) None of the above
72) If a sample with n = 60 subjects distributed over 3 categories was selected, a chi-square test for goodness of fit will be used. How many degrees of freedom will be used in determining the chi-square test statistic?
A) 1
B) 2
C) 16
D) 64
73) Consider a goodness-of-fit test with a computed value of chi-square = 1.273 and a critical value = 13.388. The appropriate conclusion would be to:
A) reject H0.
B) fail to reject H0.
C) take a larger sample.
D) take a smaller sample.
74) A researcher is using a chi-square test to determine whether there are any preferences among 4 brands of orange juice. With alpha = 0.05 and n = 30, the critical region for the hypothesis test would have a boundary of:
A) 7.81
B) 8.71
C) 8.17
D) 42.25
75) A chi-square test for goodness of fit is used to test whether or not there are any preferences among 3 brands of peas. If the study uses a sample of n = 60 subjects, then the expected frequency for each category would be:
A) 20
B) 30
C) 60
D) 33
76) We are interested in determining whether the opinions of individuals on gun control (Yes, No, and No Opinion) are uniformly distributed. A sample of 150 was taken and the following data were obtained.
Do you support gun control?   Number of responses
Yes                           40
No                            60
No Opinion                    50
The conclusion of the test with alpha = 0.05 is that the views of people on gun control are:
A) uniformly distributed.
B) not uniformly distributed.
C) inconclusive.
D) None of the above
77) To use contingency analysis for numerical data, which of the following is true?
A) Contingency analysis cannot be used for numerical data.
B) Numerical data must be broken up into specific categories.
C) Contingency analysis can be used for numerical data only if both variables are numerical.
D) Contingency analysis can be used for numerical data only if it is interval data.
78) What does the term observed cell frequencies refer to?
A) The frequencies found in the population being examined
B) The frequencies found in the sample being examined
C) The frequencies computed from H0
D) The frequencies computed from H1
79) What does the term expected cell frequencies refer to?
A) The frequencies found in the population being examined
B) The frequencies found in the sample being examined
C) The frequencies computed from H0
D) The frequencies computed from H1
80) We expect the actual frequencies in each cell to approximately match the corresponding expected cell frequencies when:
A) H0 is false.
B) H0 is true.
C) H0 is falsely accepted.
D) the variables are related to each other.
81) In a contingency analysis, the greater the difference between the actual and the expected frequencies, the more likely:
A) H0 should be rejected.
B) H0 should be accepted.
C) we cannot determine H0.
D) the smaller the test statistic will be.
82) In a chi-square contingency analysis, when expected cell frequencies drop below 5, the calculated chi-square value tends to be inflated and may inflate the true probability of ______ beyond the stated significance level.
A) committing a Type I error
B) committing a Type II error
C) Both A and B
D) All of the above
83) In performing chi-square contingency analysis, to overcome a small expected cell frequency problem, we:
A) combine the categories of the row and/or column variables.
B) increase the sample size.
C) Both A and B
D) None of the above
84) How can the degrees of freedom be found in a contingency table with cross-classified data?
A) df = rows minus columns
B) df = rows multiplied by columns
C) df = (rows minus 1) multiplied by (columns minus 1)
D) df = total number of cells minus 1
85) A cell phone company wants to determine if the use of text messaging is independent of age. The following data has been collected from a random sample of customers.
              Regularly use text messaging   Do not regularly use text messaging
Under 21      82                             38
21-39         57                             34
40 and over   6                              83
Based on the data above, what is the expected value for the "under 21 and regularly use text messaging" cell?
A) 82
B) 50
C) 120
D) 58
86) To conduct a contingency analysis, the number of degrees of freedom is:
A) 6
B) 5
C) 3
D) 2
87) To conduct a contingency analysis using a 0.01 level of significance, the critical value is:
A) 15.0863
B) 5.9915
C) 9.2104
D) 11.0705
88) To conduct a contingency analysis, the value of the test statistic is:
A) 9.2104
B) 88.3
C) 275.02
D) 14.6
89) For a chi-square test involving a contingency table, suppose H0 is rejected. We conclude that the two variables are:
A) curvilinear.
B) linear.
C) related.
D) not related.
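The clinic (items 65 to 69), gun-control (item 76), and text-messaging (items 85 to 88) questions can be checked with a few lines of plain Python. This is a sketch with hypothetical helper names: the uniform expected count is n/k, and the contingency expected counts are (row total × column total) / n. It gives a clinic statistic of 2.5 with 4 df (below the 7.7794 critical value of item 66), a gun-control statistic of 4.0 with 2 df (below the 5.9915 critical value for alpha = .05), and a text-messaging statistic of about 88.33 with 2 df and an expected count of 58 in the "under 21 and regularly use text messaging" cell.

```python
def uniform_gof(observed):
    """Goodness-of-fit against equal expected counts n/k; returns (expected, stat, df)."""
    k = len(observed)
    expected = sum(observed) / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return expected, stat, k - 1


def contingency(table):
    """Test of independence for an r x c table; returns (expected, stat, df)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


clinic = [25, 22, 19, 18, 16]             # Mon..Fri arrivals, n = 100
gun_control = [40, 60, 50]                # Yes, No, No Opinion, n = 150
texting = [[82, 38], [57, 34], [6, 83]]   # rows: under 21, 21-39, 40 and over

for name, data in [("clinic", clinic), ("gun control", gun_control)]:
    e, s, d = uniform_gof(data)
    print(f"{name}: expected {e:g} per category, chi-square {s:.2f}, df {d}")

ex, s, d = contingency(texting)
print(f"texting: expected under-21/regular {ex[0][0]:.2f}, chi-square {s:.2f}, df {d}")
```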
90) When testing for independence in a contingency table with 3 rows and 4 columns, there are ______ degrees of freedom.
A) 5
B) 6
C) 7
D) 12
91) In testing a hypothesis that two categorical variables are independent using the χ² test, the expected cell frequencies are based on assuming:
A) the null hypothesis.
B) the alternative hypothesis.
C) the normal distribution.
D) the variables are related.
92) A study published in the American Journal of Public Health was conducted to determine whether the use of seat belts in motor vehicles depends on ethnic status in San Diego County. A sample of 792 children treated for injuries sustained from motor vehicle accidents was obtained, and each child was classified according to (1) ethnic status (Hispanic or non-Hispanic) and (2) seat belt usage (worn or not worn) during the accident. The number of children in each category is given in the table below.
                      Hispanic   Non-Hispanic
Seat belts worn       31         148
Seat belts not worn   283        330
Referring to these data, which test would be used to properly analyze the data in this experiment?
A) χ² test for independence in a two-way contingency table
B) χ² test for equal proportions in a one-way table
C) ANOVA F test for interaction in a 2 × 2 factorial design
D) χ² goodness-of-fit test
93) Referring to these data, the calculated test statistic is:
A) approximately 0.9991
B) nearly 0.1368
C) about 48.1849
D) approximately 72.8063
94) Referring to these data, which of the following conclusions should be reached if the appropriate hypothesis test is conducted using an alpha = .05 level?
A) The mean value for Hispanics is the same as for non-Hispanics.
B) There is no relationship between whether someone is Hispanic and whether they wear a seat belt.
C) The use of seat belts and whether a person is Hispanic or not are statistically related.
D) None of the above
95) Many companies use well-known celebrities as spokespeople in their TV advertisements. A study was conducted to determine whether brand awareness of female TV viewers and the gender of the spokesperson are independent. Each viewer in a sample of 300 female TV viewers was asked to identify a product advertised by a celebrity spokesperson. The gender of the spokesperson and whether or not the viewer could identify the product were recorded. The numbers in each category are given below.
                     Male celebrity   Female celebrity
Identified product   41               61
Could not identify   109              89
Referring to these sample data, which test would be used to properly analyze the data in this experiment?
A) χ² test for independence in a two-way contingency table
B) χ² test for equal proportions in a one-way table
C) ANOVA F test for main treatment effect
D) χ² goodness-of-fit test
96) Referring to these sample data, if the appropriate hypothesis test is to be conducted using a .05 level of significance, which of the following is the correct critical value?
A) 9.4877
B) 3.8415
C) 1.96
D) 7.8147
97) Referring to these sample data, which of the following values is the correct value of the test statistic?
A) Approximately 9.48
B) Nearly 23.0
C) About 3.84
D) Approximately 5.94
98) Referring to these sample data, if the appropriate null hypothesis is tested using a significance level equal to .05, which of the following conclusions should be reached?
A) There is a relationship between gender of the celebrity and product identification.
B) There is no relationship between gender of the celebrity and product identification.
C) The mean number of products identified for males is different from the mean number for females.
D) Females have higher brand awareness than males.
99) The degrees of freedom for a contingency table with 11 rows and 10 columns is:
A) 11
B) 10
C) 110
D) 90
100) We want to test whether type of car owned (domestic or foreign) is independent of gender. A contingency table is obtained from a sample of 990 people as follows: [table not reproduced]
At the alpha = 0.05 level, we conclude that:
A) χ² = 3.34 and type of car owned is independent of gender.
B) χ² = 3.34 and type of car owned is dependent on gender.
C) χ² = 3.84 and type of car owned is independent of gender.
D) χ² = 3.84 and type of car owned is dependent on gender.
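For 2 × 2 tables such as the seat-belt data (items 92 to 94) and the celebrity data (items 95 to 98), the Pearson statistic without a continuity correction reduces to n(ad - bc)² / (r1·r2·c1·c2), where a, b are the first-row counts, c, d the second-row counts, and r1, r2, c1, c2 the row and column totals. The sketch below (hypothetical helper `chi_square_2x2`) uses that shortcut and gives about 48.18 for the seat-belt table and about 5.94 for the celebrity table, both above the 3.8415 critical value for 1 df. One caution when cross-checking with SciPy: `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so `correction=False` is needed to match these uncorrected values.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table (no continuity correction).

    Uses the shortcut n (ad - bc)^2 / (r1 r2 c1 c2), which is algebraically
    equal to summing (O - E)^2 / E over the four cells.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


CRITICAL_1DF_05 = 3.8415  # chi-square critical value for 1 df, alpha = .05 (item 96)

seat_belts = [[31, 148], [283, 330]]  # rows: worn, not worn; cols: Hispanic, non-Hispanic
celebrity = [[41, 61], [109, 89]]     # rows: identified, not; cols: male, female celebrity

for name, table in [("seat belts", seat_belts), ("celebrity", celebrity)]:
    stat = chi_square_2x2(table)
    decision = "reject H0" if stat > CRITICAL_1DF_05 else "fail to reject H0"
    print(name, round(stat, 4), decision)
```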

101) The billing department of a national cable service company is conducting a study of how customers pay their monthly cable bills. The cable company accepts payment in one of four ways: in person at a local office, by mail, by credit card, or by electronic funds transfer from a bank account. The cable company randomly sampled 400 customers to determine if there is a relationship between the customer's age and the payment method used. The following sample results were obtained: [table not reproduced]
Based on the sample data, can the cable company conclude that there is a relationship between the age of the customer and the payment method used? Conduct the appropriate test at the alpha = 0.01 level of significance.
A) Because χ² = 42.2412 > 21.666, do not reject the null hypothesis. Based on the sample data, conclude that age and type of payment are independent.
B) Because χ² = 42.2412 > 21.666, reject the null hypothesis. Based on the sample data, conclude that age and type of payment are not independent.
C) Because χ² = 50.3115 > 21.666, do not reject the null hypothesis. Based on the sample data, conclude that age and type of payment are independent.
D) Because χ² = 50.3115 > 21.666, reject the null hypothesis. Based on the sample data, conclude that age and type of payment are not independent.
104) Explain why, in performing a goodness-of-fit test, it is sometimes necessary to combine categories.
Answer: Because the chi-square test statistic is computed by squaring the difference between the observed and expected frequencies and dividing by the expected frequency, when the expected frequencies are small (less than 5), the calculated test statistic can become artificially large and therefore may lead to an increased chance of committing a Type I statistical error. That is, a true null hypothesis may be rejected at a higher rate than indicated by the selected significance level. By combining categories, the small expected frequencies are grouped to become larger than five, and thus the issue of inflated Type I error probability dissolves. Note: An alternative to combining categories is to increase the sample size. Large sample sizes result in greater expected cell frequencies in all categories.