Report for Chapter 2 pretest
Exam: Chapter 2 pretest
Category: Organizing and Graphing Data

1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every car traveled right at the speed limit or a little over, but there were some that were 10 mph under, even fewer at 20 mph under, and one car that crept by at just 15 mph. On the basis of the central tendency calculation on our data, we drew conclusions about all drivers on this stretch of road." The proper central tendency value calculated from the data is the
population median; Correct Answer sample median; population mean (μ); sample mean (X̄).

In this situation, the speed (ratio scale) of every fifth vehicle was recorded. The first thing to note, then, is that the researchers generated data from a systematic random sample. We would also note that the distribution was negatively skewed, with a few very slow speeds and most speeds at or slightly over the speed limit. In a negatively skewed distribution, the median is not sensitive to outlying data and thus summarizes driver performance on the targeted stretch of road more clearly than the mean does. Since the data represent a sample, the sample median is the correct response to this question.
Help: none Confident: 36/100 Central tendency measures: statistics - parameters: 67 Mean, Median, and Mode in frequency distributions: 66 Time of problem: 0 minutes seconds

2. Which of the following is not an accurate statement of one of the conventions your text used in establishing class intervals?
Use not fewer than 10 or more than 20 class intervals; Class intervals should start with an odd number or a multiple of 10; The largest scores should be at the top of the list of class intervals; Correct Answer All of the above are correct statements.

All of the statements are correct. There are four general criteria or guidelines for establishing class intervals:
1. The number of intervals should be between 10 and 20.
2. The class interval should be a convenient size. For example, use an odd number such as 3, 5, or 7 for the interval width to make it easy to compute the midpoint of the interval. Use widths of 10, 20, 50, or 100 where appropriate to make it easy for the reader to "understand" the intervals. When displaying test data based on a 100-point scale, 10-point intervals like 50-59, 60-69, 70-79, 80-89, and 90-99 make sense to most folks because they correspond in general to letter-grade distributions.
3. Begin each class interval with a multiple of i (the interval width).
4. The largest scores should go at the top of the distribution.
Confident: 73/100 Frequency Distributions: 67 Time of problem: 0 minutes seconds

3. A frequency distribution with a mean of 100 and a median of 90 is
Correct Answer positively skewed; negatively skewed; neither positive nor negative; cannot be determined from the information given.
Positively skewed is the correct answer. In a symmetrical distribution, the mode = mean = median. If I know that the mean is greater than the median, I know that there must be some high scores in the distribution pulling the mean away from the center of the distribution. This is a situation where outliers are influencing the shape of the distribution. When a few high numbers pull the mean away from the center, the tail of the distribution pointing to the high numbers is stretched in a positive direction; hence, the distribution is positively skewed.
Confident: 69/100 Mean, Median, and Mode (Define): 83 Mean, Median, and Mode in frequency distributions: 66 Time of problem: 0 minutes seconds

4. Which of the following words could legitimately fit into this sentence: "That simple frequency distribution has two ______, 13 and 18."
means; medians; Correct Answer modes; all of the above.

Modes is the correct answer. A distribution can have only one mean and only one median. Means and medians are computed values, whereas the mode reflects the most frequently occurring value(s). In a simple frequency distribution (where scores are presented in ascending or descending order, accompanied by the number of occurrences of each score), the mode is the most frequently appearing score or scores in the distribution. In a grouped frequency distribution, the mode is the midpoint of the interval that contains the most scores. Grouped frequency distributions can also be multimodal if more than one interval has the same highest frequency count.
Confident: 85/100 Mean, Median, and Mode (Define): 83 Mean, Median, and Mode in frequency distributions: 66 Time of problem: 0 minutes 6.77 seconds
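The reasoning in questions 1, 3, and 4 can be checked with a short Python sketch using the standard-library `statistics` module (`multimode` needs Python 3.8 or later). The speeds and scores are made-up illustrative values, not data from the questions, and `skew_direction` is a hypothetical helper: comparing the mean with the median is a rule of thumb for the direction of skew, not a formal test.

```python
import statistics

def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "no skew indicated"

# Question 3: a mean of 100 and a median of 90.
print(skew_direction(100, 90))  # positively skewed

# Question 1 style data (hypothetical speeds in mph for an assumed 45 mph limit):
# most cars at or just over the limit, a few slow cars forming a low tail.
speeds = [45] * 8 + [47] * 6 + [35] * 3 + [25] * 2 + [15]
speed_mean = statistics.mean(speeds)      # pulled down by the slow cars
speed_median = statistics.median(speeds)  # stays with the bulk of the cars
print(speed_mean, speed_median, skew_direction(speed_mean, speed_median))
# 40.6 45.0 negatively skewed

# Question 4: a distribution can have two modes but only one mean and one median.
scores = [12, 13, 13, 13, 15, 16, 18, 18, 18, 20]  # hypothetical scores
print(statistics.multimode(scores))  # [13, 18]
```

The slow tail drags the speed mean about 4 mph below the median, which is why the median is the better summary for the question 1 data.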
5. The U.S. Department of Agriculture reported the total number of bushels harvested of corn, soybeans, wheat, rice, and oats. This is a frequency distribution of a
Correct Answer nominal variable; ordinal variable; interval variable; ratio variable.

Nominal variable is the correct response. The data in this question are nominal; that is, categories are set up according to names like corn, wheat, soybeans, etc. There is no order to the categories, just simple categories and frequencies indicating the total number of bushels harvested in each category. A bar graph would be used to present this information visually. Histograms, frequency polygons, and line graphs are used when the underlying scale of measurement is continuous (interval or ratio). In this case, the underlying scale of measurement is nominal, so we want a presentation device that separates the different categories, and that vehicle is the bar graph. The bar graph allows the reader to compare categories without regard to order.
Confident: 100/100 Frequency graphs: 59 Time of problem: 0 minutes seconds

6. Which of the following is not used to present a frequency distribution?
a bar graph; Correct Answer a scatterplot; a line graph; a histogram.

Scatterplot is the correct response. A scatterplot shows the relationship between two quantitative variables, both measured on either an interval or ratio scale. The bar graph, histogram, and frequency polygon all indicate frequency values on the Y-axis and measurement on one variable on the X-axis. In a scatterplot, a point represents the value on the Y variable that goes with the corresponding value on the
X variable. In a scatterplot, measurement for both the X and Y variables requires underlying scales of measurement that are continuous (interval or ratio).
Confident: 90/100 Frequency graphs: 59 Time of problem: 0 minutes seconds

7. To present a frequency distribution of nominal data you should use
a polygon; a histogram; a line graph; Correct Answer a bar graph.

Bar graph is the correct response. With nominal data, the underlying scale of measurement is categorical, reflecting no specific order or equal distances between categories. Of the choices, only the bar graph makes sense for categorical data. The bars in a bar graph do not touch, emphasizing the separateness of the categories.
Confident: 100/100 Frequency graphs: 59 Time of problem: 0 minutes seconds

8. In a set of scores that ranged from 11 to 50, an acceptable lowest class interval would be
11-13; 11-14; 9-12; Correct Answer 9-11.

9-11 is the correct answer. Remember:
a. the number of intervals should be between 10 and 20;
b. the size of the class interval should be convenient; and
c. each interval should begin with a value that is a multiple of the interval width.
Given the guidelines above, the range of values is 40 [(50-11)+1]. Dividing 40 by 10 (the minimum recommended number of intervals) yields an interval width of 4. But this width is not convenient, because an even width makes the midpoints awkward to compute. A better choice is an interval width of 3, which gives us more than 10 intervals (about 14) and a convenient width to work with when computing midpoints. This eliminates choices b and c, which both have a width of 4. Given an interval width of 3, only the interval 9 to 11 begins with a multiple of 3.
Confident: 26/100 Frequency Distributions: 67 Time of problem: 0 minutes seconds

9. In which situation would the mean be an appropriate measure of central tendency?
Correct Answer Most of the scores are near the minimum, a few are in the middle range, and there are almost none near the maximum; We have frequency data on cows, horses, mules, and goats; You Answered The data categories in the soil analysis are: 0-2 ppm, 3-5 ppm, 6-8 ppm, 9-11 ppm, and over 11 ppm; A few scores are at the minimum of the range, some scores are in the middle range, most scores are near the maximum; None of the above; The mean would be the appropriate measure of center for a, c, and d.

If I'm looking for a single score to communicate how a group of individuals has performed, the mean is the best measure of central tendency when the measurement scale is continuous (interval or ratio) and the distribution of scores is reasonably symmetric. Because the mean is mathematically derived and equals ΣX/N, outliers in the data set can artificially raise or lower the mean by increasing or decreasing ΣX. Thus, in distributions that are skewed, the outlying data could render the mean too high or too low and make the median a more representative and informative measure of central tendency.
When the distribution is symmetric (most of the scores are near the minimum, a few are in the middle range, and there are almost
none near the maximum) and the measurement scale is continuous, the mean is the most appropriate and most commonly used measure of central tendency. Stems b and c feature nominal data (mode only), and stem d describes a negatively skewed distribution, in which case the median would be a better measure of central tendency than the mean.
Confident: 68/100 Central tendency measures: statistics - parameters: 67 Mean, Median, and Mode (Define): 83 Time of problem: 0 minutes seconds

10. Following are final examination scores for 40 students in a basic statistics class. These scores were randomly selected from the records of all students who have taken the course over the past 10 years and have taken the standardized final examination.
a. Create a simple frequency distribution.
b. Create grouped frequency distributions with interval widths of 3 and 5. Include columns for class intervals, exact limits, midpoints, f, cf, %, and c% in your tables.
c. Draw histograms, keeping the scale on the Y axis the same for each of the grouped frequency distributions.
d. Draw a boxplot for this data using the five-number summary [X min,
X max, Q3, Q1, and the Median]. Note: You may skip this part, as we have not yet covered boxplots.
e. Comparing the two histograms, which do you think best represents the distribution and why?
f. Using the grouped frequency distribution (i = 5), compute the percentiles for scores of 63 and 81.
g. Using the grouped frequency distribution (i = 5), what score corresponds to the 15th percentile?
h. Compute the mean and standard deviation for this set of scores using both the deviation and raw score methods. Should you generate population parameters or sample statistics? Why? Note: Compute the mean but leave the standard deviation for later; we haven't covered it yet.
i. From the grouped frequency distribution (i = 5), what are the values for the median and the mode?
j. Given what you know about distributions and measures of center, what can you conclude about this data set?
Your Answer: placeholder
The full explanation for this question, along with tables, histograms, boxplot, summary statistics, and computations, can be viewed and printed by clicking on the following link:
a. The simple frequency distribution can be viewed on the linked page. In ascending order the data are:
58, 60, 61, 63, 63, 65, 65, 68, 68, 70, 72, 72, 73, 74, 75, 75, 75, 76, 76, 79,
80, 80, 80, 81, 82, 82, 82, 82, 82, 84, 84, 86, 86, 88, 89, 89, 90, 91, 94, 96
b. Class Interval Exact Limits Midpoint f cf % c%
c. Class Interval Exact Limits Midpoint f cf % c%
d. See linked page.
e. See linked page.
f. In this distribution, scores range from 58 to 96, a distance of 39 score points [(96-58)+1]. Generally, you are looking to create between 10 and 20 intervals. The interval width should be odd, and it should make sense in terms of the distribution. The lower limit of the first interval should be a multiple of the interval width and include the lowest value in the distribution. Divide 39 by 10 and you get 3.9. The closest whole number to 3.9 is 4, but this value is even, which makes computing the midpoint somewhat awkward. This leads us to consider interval widths of 3 and 5. A width of 3 yields 14 class intervals, starting with 57-59 and ending with 96-98. A width of 5 yields 9 class intervals, starting with 55-59 and ending with 95-99. Look over the histograms. With an interval width of 3, the distribution looks too flat and too sparse; there are not enough cases to adequately populate the 14 class intervals. An interval width of 5, on the other hand, gives a much more cogent picture of the distribution: a slight negative skew, with bunching in the middle of the distribution. Moreover, an interval width of 5 makes sense in that test scores naturally break at points of 90, 80, 70, 60, etc.
g. Percentiles:
h. Percentile Rank:
i. See linked page. The data are a random sample and will be used to describe the general characteristics of the population: all students who have taken the introductory statistics class over the past 10 years. Therefore, you should compute sample statistics and use n-1 in the denominator of the standard deviation formula.
j. In the grouped frequency distribution, the interval from 80 to 84 has the highest frequency (11), so the mode is the midpoint of that interval, 82. The median of the distribution falls at the lower exact limit of that interval (79.5 to 84.5): exactly 20 of the 40 scores are below 79.5, so the c% associated with the lower limit is 50.0, and the c% associated with the upper limit is 77.5; 27.5% (11/40) of the cases are in this interval. To compute the median, subtract 50.0 from 50 to get 0. Dividing 0 by 27.5 and multiplying by the interval width (5) still gives 0, and adding 0 to 79.5 gives 79.5, the median. This agrees with the median of the raw scores, the average of the 20th and 21st scores, (79 + 80)/2 = 79.5.
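The grouped frequency table for question 10 (i = 5), along with the grouped mode, grouped median, mean, and sample standard deviation, can be reproduced from the 40 listed scores with a short Python sketch. The helpers `grouped_table` and `grouped_median` are illustrative names, not part of any library; the median uses the interpolation described above (lower exact limit plus the fraction of the interval needed to reach the middle case, times the interval width), and `statistics.stdev` uses the n-1 denominator.

```python
import statistics

# The 40 final-exam scores listed in question 10.
scores = [58, 60, 61, 63, 63, 65, 65, 68, 68, 70, 72, 72, 73, 74, 75, 75, 75, 76, 76, 79,
          80, 80, 80, 81, 82, 82, 82, 82, 82, 84, 84, 86, 86, 88, 89, 89, 90, 91, 94, 96]

def grouped_table(data, width):
    """Rows (lower, upper, exact_low, exact_high, midpoint, f, cf, pct, cpct),
    lowest interval first; the first lower limit is a multiple of the width."""
    n, top = len(data), max(data)
    lower = (min(data) // width) * width
    rows, cf = [], 0
    while lower <= top:
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)
        cf += f
        rows.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2,
                     f, cf, 100 * f / n, 100 * cf / n))
        lower += width
    return rows

def grouped_median(rows, n):
    """Lower exact limit + (n/2 - cf below the interval) / f * interval width."""
    for _, _, exact_low, exact_high, _, f, cf, _, _ in rows:
        if cf >= n / 2:
            return exact_low + (n / 2 - (cf - f)) / f * (exact_high - exact_low)

rows = grouped_table(scores, 5)
print("Interval  Exact limits  Mid   f  cf      %     c%")
for lo, hi, xlo, xhi, mid, f, cf, pct, cpct in reversed(rows):  # largest scores on top
    print(f"{lo}-{hi}     {xlo}-{xhi}    {mid:4.0f}  {f:2d}  {cf:2d}  {pct:5.1f}  {cpct:5.1f}")

mode_row = max(rows, key=lambda r: r[5])  # interval with the highest frequency
print("mode (midpoint of modal interval):", mode_row[4])     # 82.0
print("grouped median:", grouped_median(rows, len(scores)))  # 79.5
print("mean:", statistics.mean(scores))                      # 77.4
print("sample SD (n - 1):", round(statistics.stdev(scores), 2))
```

Calling `grouped_table(scores, 3)` builds the i = 3 table the same way (14 intervals, starting with 57-59).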
Slight negative skew in the exam score distribution
Scores bunched in the 80 to 85 area
Mean=77.4
25% of the students scored at or below 70 and 25% scored at or above 84.
The score distribution might be interpreted to suggest that students were generally well prepared for the exam. The slight negative skew, with scores bunched in the 80% to 85% area, indicates that the exam was challenging but fair. A shift of the distribution upward might indicate that the exam was too easy, and a shift downward, that it was too hard. Overall, good scores were within the reach of most students in the class.
Confident: 30/100 Central tendency measures: statistics - parameters: 67 Frequency Distributions: 67 Frequency graphs: 59 Graphs of distributions: 100 Mean, Median, and Mode (Define): 83 Mean, Median, and Mode in frequency distributions: 66 Skewness: 88 Time of problem: 0 minutes seconds

11. Your text noted which of the following as a characteristic of the mean?
Correct Answer The sum of the results of squaring the difference between each score and the mean is a minimum; The sum of the results of squaring the difference between each score and the mean is zero; You Answered Both a and b; Neither a nor b.

There are two important properties of the mean:
1. The mean is a balance point in a distribution; therefore, the sum of the deviations about the mean, Σ(X - X̄), equals 0; and
2. If I square each of the deviation scores (to get rid of negative values), the sum of the squared deviation scores, Σ(X - X̄)², is a minimum.
The properties of the mean can be illustrated with the numbers 1, 3, and 5. The mean of these numbers is 3. If I create deviation scores for each value of X, I get (1 - 3) = -2, (3 - 3) = 0, and (5 - 3) = +2. Summing these deviation scores, I get (-2) + 0 + (+2) = 0 (the first property of the mean). If I square and sum each deviation, I get (-2)² + (0)² + (+2)² = 4 + 0 + 4 = 8. If I choose some number other than the mean, say 5, squaring and summing the deviations yields (1 - 5)² + (3 - 5)² + (5 - 5)² = 16 + 4 + 0 = 20. Note that 8 is the lowest value that can be obtained when summing the squared deviations for the data values 1, 3, and 5; try any other value. This is the second property of the mean. The second property is expressed in stem a and represents the correct answer to this question. Stem b cannot be correct because squaring the deviations and summing the values will yield 0 only when all scores in the distribution are the same.
Confident: 14/100 Mean, Median, and Mode (Define): 83 Time of problem: 0 minutes seconds

12. The mean temperature for January was 30º. In February the mean was 25º, and for March the mean was 35º. The overall mean for these months is
30º; Correct Answer greater than 30º; less than 30º; Cannot be determined from the information presented.

At first reading, this would appear to be a simple problem: add 30, 25, and 35 and divide by 3 to get an average temperature of 30º. Certainly this would give you a rough estimate of the average temperature, but it would not be an accurate estimate. This is a weighted mean problem because the three months are not equal in length. January and March have 31 days, but February has only 28 days. This means that there are fewer days with the average temperature at 25º and more days with the average temperature at 35º.
The weighting, then, falls on the above-30º side, meaning that the average temperature over the three months would be slightly more than 30º.
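A minimal Python sketch of the weighted mean behind question 12, assuming each month's mean temperature is weighted by its number of days (February taken as 28 days, as above). The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

monthly_means = [30, 25, 35]  # January, February, March
days = [31, 28, 31]           # non-leap-year month lengths

print(round(weighted_mean(monthly_means, days), 2))  # 30.17
print(sum(monthly_means) / 3)                        # 30.0, the unweighted estimate
```

Weighting by days gives 2715/90, about 30.17º, slightly above the unweighted 30º. The same function applies whenever groups contribute unequal numbers of observations, for example group means weighted by group size.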
Confident: 74/100 Weighted mean: 64 Time of problem: 0 minutes seconds

13. Two investigators tested their friends for memory span. The first tested five people and found a mean of 6.0. The second tested nine people and found a mean of 7.0. The overall mean for the data gathered is
Correct Answer

This is a weighted mean problem. One group has a mean of 6 based on 5 people, and the other group has a mean of 7 based on 9 people. Because there are unequal numbers in the two groups, I need to weight the means proportionally. To determine the overall mean, key in on the formula for the mean, ΣX/N. For the first group, ΣX is equal to 6 × 5, or 30. In the second group, ΣX is equal to 7 × 9, or 63. To find the combined mean, I add the two ΣX terms, 30 and 63, and divide by the total number of people, 5 + 9. The result is 93/14, or about 6.64.
Help: Calculator Confident: 95/100 Weighted mean: 64 Time of problem: 2 minutes seconds

14. Describe the distinguishing characteristics of the histogram, line graph, and frequency polygon. Under what conditions would each be used?
Correct answers: quantitative; two variables; line-curve
Your Answer: placeholder (Incorrect)
a. histogram
Graphing technique appropriate for quantitative data. Class intervals are represented on the X-axis, and the frequency of each class interval is represented by the height of the bar. Midpoints of the class intervals are plotted on the X-axis, and the width of each bar extends to the exact limits of its class interval. The bars in a histogram, then, touch one another. Histograms tend to be easy to read and convey a sense of how scores in the distribution are gathered.
b. line graph
Line graphs picture the relationship between two variables; that is, for every value of X there is a corresponding value of Y. Line graphs are very useful for indicating trends. In fact, the most common use is probably with stock market data, where transaction averages or stock values are plotted on the Y-axis and time (days, months, quarters, or years) is plotted on the X-axis.
c. frequency polygon
The frequency polygon is often referred to as a smooth-line curve and is a variation of the histogram. Points plotted above the midpoints of the class intervals, at heights equal to the interval frequencies, are connected by straight lines. The frequency polygon allows researchers to easily compare, on one set of axes, distributions for two or more groups; histograms comparing groups on the same axes are generally too cluttered. Like the histogram, the frequency polygon represents frequencies for values of quantitative variables. Bar graphs, with spaces between bars, are used to display frequencies of the categories of a qualitative variable.
Confident: 31/100 Frequency graphs: 59 Time of problem: 0 minutes seconds
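The difference between a histogram's bars and a frequency polygon's points can be sketched in Python. The frequencies below are counts of the 40 exam scores from question 10 grouped with i = 5. Closing the polygon with zero-frequency points one interval beyond each end is a common drawing convention assumed here, not something stated above, and the function names are illustrative.

```python
def histogram_bars(intervals):
    """Each bar spans its exact limits (lower - 0.5 to upper + 0.5), so bars touch."""
    return [(lower - 0.5, upper + 0.5, f) for lower, upper, f in intervals]

def polygon_points(intervals):
    """One point per interval at (midpoint, f), plus zero-frequency points one
    interval width beyond each end so the polygon closes on the X-axis."""
    width = intervals[0][1] - intervals[0][0] + 1
    points = [((lower + upper) / 2, f) for lower, upper, f in intervals]
    first_mid, last_mid = points[0][0], points[-1][0]
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

# (lower, upper, f) for the question 10 exam scores, lowest interval first.
exam = [(55, 59, 1), (60, 64, 4), (65, 69, 4), (70, 74, 5), (75, 79, 6),
        (80, 84, 11), (85, 89, 5), (90, 94, 3), (95, 99, 1)]

bars = histogram_bars(exam)
print(all(left[1] == right[0] for left, right in zip(bars, bars[1:])))  # True: bars touch
print(polygon_points(exam)[:3])  # [(52.0, 0), (57.0, 1), (62.0, 4)]
```

Because the polygon is just a list of points, two or more groups can be drawn on one set of axes without the clutter of overlapping bars.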
15. The fact that the middle of a series of items is more difficult to learn than the beginning or the end is known as the
series effect; middling effect; bimodal effect; Correct Answer serial position effect.

Plotting this phenomenon, known as the serial-position effect, results in a good example of a line graph. In trying to learn a list of items, recall tends to be highest for the items that occur at the beginning or at the end of the list. These effects are referred to as primacy and recency effects, respectively. That is, the items you reviewed most recently tend to have the highest recall, followed by those at the beginning of the list, which have been repeated most often in the study process. Items in the middle of the list are the ones that suffer in terms of memory and thus need careful attention in the study process. Use of mnemonic devices can really improve recall of the middle items, and a line curve is a good way to illustrate how the serial position effect impacts recall.
Confident: 100/100 Frequency graphs: 59 Time of problem: 0 minutes seconds

16. Suppose a frequency distribution with a range of 0 to 100 was positively skewed. The greatest frequency of scores would be expected around
Correct Answer 25; 50; 75; any of the above are possible for such a distribution; none of the above are reasonable for such a distribution.
Confident: 34/100 Graphs of distributions: 100 Mean, Median, and Mode in frequency distributions: 66 Time of problem: 0 minutes seconds

17. Identify the skew of the two distributions below.
a. X f
b. X f
Correct answers: negatively skewed; negatively skewed
Your Answer: negatively skewed; negatively skewed (Incorrect)
a. Even without sketching the first distribution, you should be able to see the strong negative skew. Negative skew is present when the tail of the distribution extends or points to the low numbers, and that is exactly the case here. As scores on the X-axis go from 0 to 5, the frequencies move from 1 to 1 to 2 to 6 to 8 to 10. Eighteen of the 28 scores are 4s or 5s; that is, the scores tend to be bunched at the upper end of the distribution. This is characteristic of negatively skewed distributions.
b. The second distribution is also negatively skewed, but not nearly as pronounced as the first. The tail points somewhat to the lower
numbers, and scores bunch somewhat at the higher numbers, but at the highest values there is a drop-off in scoring, and there are no real outlying scores on either end of the distribution. The distribution is not symmetric and it is negatively skewed, but the skew is slight.
Confident: 80/100 Skewness: 88 Time of problem: 0 minutes seconds

18. The appropriate statistic for conveying the central tendency of a nominal variable is
mean; median; Correct Answer mode; any of the above, but the mean is preferable; any of the above, but the median is preferable; any of the above, but the mode is preferable.

The mode is the only appropriate measure of central tendency that can be used with nominal variables. Remember, nominal variables are represented by categories or names that have no inherent order. A bar graph is used to summarize the data, and the bars are separated by space to indicate that their location along the X-axis is arbitrary. Means and medians, the other measures of central tendency, require that the measurement scale along the X-axis be at least ordinal (mode + median) and at best interval or ratio (mode + median + mean).
Confident: 22/100 Mean, Median, and Mode (Define): 83
Mean, Median, and Mode in frequency distributions: 66 Time of problem: 0 minutes seconds

You answered 10 correct out of 15 computer-graded questions. Add the number of written answers (if any were given) that you believe you got correct to the total correct value to determine your score out of 18.