10-3 Measures of Central Tendency and Variation

So far, we have discussed some graphical methods of data description. Now we will investigate how statements of central tendency and variation can be used. A statement of central tendency is an attempt to describe the whole distribution of a data set by reporting one most typical value. That value represents the point about which most of the values in the distribution are centered. Three statements of central tendency are commonly used: the mean, the median, and the mode.

Computing Means

The mean is the average value. Imagine that we have recorded the ages of 11 children with chicken pox:

2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years

One way to represent these 11 ages with one most typical value is to calculate the mean age for the group. To find the mean, we first sum the ages (2 + 4 + 5 + 5 + 6 + 6 + 6 + 7 + 7 + 8 + 10 = 66) and then divide the total by the number of cases (66 ÷ 11 = 6 years). The mean for this age group is 6 years.

To compute the mean, or average, we use the following definition:

Definition of Mean

The mean is the average, the location in the distribution of values at which the deviations above it and the deviations below it are equal.
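The mean calculation and the balance-point definition can be checked with a short Python sketch. The variable names here are illustrative, not part of the original slides; the data are the 11 chicken pox ages.

```python
# Ages of the 11 children with chicken pox (from the example above).
ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

# Mean = sum of the values divided by the number of values.
total = sum(ages)
mean_age = total / len(ages)

# Balance-point check: total distance below the mean vs. above it.
deviations = [age - mean_age for age in ages]
below = sum(-d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print(total)      # 66
print(mean_age)   # 6.0
print(below, above)  # 8.0 8.0
```

The two totals agree, which is what the definition of the mean as a balance point requires.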
We can think of the mean as the balance point. Graphically, the sum of the distances to the data points below the mean equals the sum of the distances to the data points above the mean. In the illustrated example, the distances on one side of the mean total 1 + 4 + 4 + 5 = 14, and the distances on the other side total 1 + 1 + 1 + 1 + 2 + 2 + 2 + 4 = 14.

The mean is sensitive to every value, and especially to exceptional values.

Computing Median

The median is the middle value in a group of ordered values. Look again at the ages of the 11 children with chicken pox:

2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years

Notice that the ages have been ordered from the youngest to the oldest, and the middle value is the sixth value from either end: five values lie to its left and five to its right. The median for this group is 6 years.

In the previous example, there was an odd number of values, so the median was actually one of the values.
If we have an even number of values, the median is found by averaging the two middle values. For example, imagine that the ordered ages of a group of six children are:

1, 2, 3, 4, 5, and 6

In this case, there is no single middle data value. The two middle values are 3 and 4 years. The average of these is (3 + 4) ÷ 2 = 3.5. Therefore, 3.5 years is the median for this group of six ages, even though it is not one of the ages.

In general, to find the median for a set of n numbers:
1. First sort the values in order.
2. If the number of values is odd, the median is the number located in the exact middle of the list.
3. If the number of values is even, the median is found by computing the mean of the two middle numbers.

Recall that the mean is dramatically affected by extreme values, whereas the median is not.

Finding the Modes

The mode is the most frequently occurring value in a group of values. The most frequently occurring age in the group of 11 children with chicken pox:

2, 4, 5, 5, 6, 6, 6, 7, 7, 8, and 10 years

is 6 years. Three of the children were 6 years old, and none of the other ages occurred more than twice.

It is important to remember that a distribution may not have a single most frequently occurring value, and if it has a mode, that mode may not be unique; that is, there may be more than one mode.
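The median procedure and the mode can be sketched in Python as follows. The function name find_median is hypothetical, and the second multimode call uses made-up values purely to show a data set with two modes; statistics.multimode requires Python 3.8 or later.

```python
from statistics import multimode

def find_median(values):
    """Median by the three steps above: sort, then take the middle
    value (odd count) or the mean of the two middle values (even count)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

print(find_median(ages))                # 6
print(find_median([1, 2, 3, 4, 5, 6]))  # 3.5

# multimode returns every most-frequent value, so ties are visible.
print(multimode(ages))             # [6]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]  (illustrative data, two modes)
```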
Choosing the Most Appropriate Average

When we choose the most appropriate statement of central tendency for describing a set of data, two factors must be considered.

The first is the shape of the distribution. If the distribution is symmetrical, the mean, median, and mode will be equal or very close, and each may be used as the most typical representative value. If the distribution is not symmetrical (skewed), the median is the best choice as a measure of central tendency.

The second factor is the scale of measurement. If we are dealing with unordered categories, then our only choice is the mode. For continuous data we may use the mean, median, or mode, depending on symmetry.

Measures of Spread or Dispersion

Consider the following data:

20, 22, 22, 25, 26, 27, 27, 28, 30, 35

Range = upper extreme − lower extreme = 35 − 20 = 15
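As a quick check, the range of the data set above can be computed in Python; the variable names are illustrative only.

```python
# Data set from the range example above.
values = [20, 22, 22, 25, 26, 27, 27, 28, 30, 35]

# Range = upper extreme minus lower extreme.
data_range = max(values) - min(values)

print(data_range)  # 15
```

Because only the two extremes enter the calculation, the range says nothing about how the values in between are spread.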
Box Plots

Box plots are another graphical means of displaying key characteristics of data. The idea is to arrange the data in increasing order and choose three numbers Q1, Q2, and Q3 that divide it into four equal parts. Q2 is the median.

[Figure: box plot marking the minimum data point, Q1, Q2 (the median), Q3, and the maximum data point. The bottom 25% of the data lies between Min and Q1, and the top 25% lies between Q3 and Max.]

Outliers

An outlier is a value that is located very far away from almost all of the other data values. Relative to the other data, an outlier is an extreme value. More precisely, an outlier is any value that is more than 1.5 times the interquartile range (Q3 − Q1) above the upper quartile or below the lower quartile. Outliers are commonly indicated with an asterisk.

[Figure: box plot labeled Min, Q1, Q2, Q3, and Max, with an outlier marked by an asterisk.]
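The quartiles and the 1.5 × IQR outlier rule can be sketched in Python. The slides do not say which quartile convention they use, so this sketch assumes the common "median of each half" method (the median itself is left out of both halves when the count is odd); other conventions can give slightly different quartiles. The function names and the extra value 50 are hypothetical, chosen only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value so both halves have equal size.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def find_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

scores = [20, 22, 22, 25, 26, 27, 27, 28, 30, 35]

print(quartiles(scores))             # (22, 26.5, 28)
print(find_outliers(scores))         # []
print(find_outliers(scores + [50]))  # [50]
```

With the original ten values the fences fall at 13 and 37, so nothing is flagged; adding the illustrative value 50 moves the upper fence to 42 and 50 is reported as an outlier.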
Mean Absolute Deviation

The mean absolute deviation (MAD) uses the absolute value to measure how far each data point is from the mean. The following steps are used to determine the MAD.
1. Find each value's deviation from the mean by subtracting the mean from the data value.
2. Find the absolute value of each difference.
3. Sum those absolute values.
4. Find the mean of the absolute deviations by dividing the sum by the number of scores.

[Figure: bar graph of the ages of the 11 children with chicken pox, children listed numerically, with a segment marking the mean age; the picture illustrates the mean absolute deviation.]

Compute the MAD for the following two sets of data.
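The four MAD steps can be traced in Python for the 11 chicken pox ages. This is a minimal sketch with illustrative variable names, not a worked solution to the exercise above.

```python
# Ages of the 11 children with chicken pox.
ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]
mean_age = sum(ages) / len(ages)

# Step 1: deviation of each value from the mean.
deviations = [age - mean_age for age in ages]
# Step 2: absolute value of each deviation.
abs_deviations = [abs(d) for d in deviations]
# Step 3: sum of the absolute deviations.
total_abs = sum(abs_deviations)
# Step 4: divide by the number of scores.
mad = total_abs / len(ages)

print(total_abs)      # 16.0
print(round(mad, 2))  # 1.45
```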
Variance and Standard Deviation

The variance and the standard deviation are two commonly used statements of dispersion. The variance of a sample may be defined as the sum of the squared deviations from the mean value divided by the number of values. The variance is calculated as follows:
1. Find the deviation of each value in the set from the mean value.
2. Square each of these deviations.
3. Sum all of these squared deviations.
4. Divide the sum of the squared deviations from the mean by the number of values.

Formula for the variance, v:

$$v = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

where $x_1, \dots, x_n$ are the values, $\bar{x}$ is their mean, and $n$ is the number of values.

The standard deviation, s, is the square root of the variance: $s = \sqrt{v}$.

Now, let us calculate the variance and standard deviation of the ages of the 11 children with chicken pox. The mean is 6, so the deviations are −4, −2, −1, −1, 0, 0, 0, 1, 1, 2, and 4, and their squares sum to 16 + 4 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 4 + 16 = 44.

Variance: v = 44 ÷ 11 = 4
Standard deviation: s = √4 = 2 years
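The four variance steps can be followed in Python for the 11 chicken pox ages. The sketch divides by the number of values, as the definition above does; that matches pvariance and pstdev in Python's statistics module, whereas statistics.variance and statistics.stdev divide by n − 1 and would give different results. Variable names are illustrative.

```python
from math import sqrt
from statistics import pvariance, pstdev

ages = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]
n = len(ages)
mean_age = sum(ages) / n

# Steps 1 and 2: deviation of each value from the mean, squared.
squared_deviations = [(age - mean_age) ** 2 for age in ages]
# Step 3: sum the squared deviations.
sum_squares = sum(squared_deviations)
# Step 4: divide by the number of values.
variance = sum_squares / n
std_dev = sqrt(variance)

print(sum_squares)  # 44.0
print(variance)     # 4.0
print(std_dev)      # 2.0

# Cross-check against the standard library's divide-by-n versions.
print(pvariance(ages) == variance, pstdev(ages) == std_dev)  # True True
```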
Normal Distributions

A normal distribution is a frequency distribution with continuous, randomly occurring data plotted on the x-axis and frequency (counts) plotted on the y-axis. This distribution is actually a theoretical distribution, but many real-world situations are close to this ideal. It has the following properties:
1. The normal curve has a bell shape.
2. The curve extends infinitely in both directions and gets closer and closer to the x-axis but never reaches it.
3. The curve is symmetrical about its center point, but not all symmetrical distributions are normal.
4. The three statements of central tendency (mean, median, and mode) all fall in exactly the same place.

On a normal curve, about 68% of the values lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations, and about 99.7% lie within 3 standard deviations.
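The percentages for 1, 2, and 3 standard deviations can be checked against the standard normal distribution. This sketch uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper name fraction_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

Because every normal distribution is a shifted and rescaled copy of the standard one, the same fractions apply for any mean and standard deviation.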
Applications of the Normal Curve

Suppose that cholesterol values for a population have a mean of 200 mg/dl and a standard deviation of 40 mg/dl. In raw cholesterol levels, one standard deviation on either side of the mean spans 160 to 240 mg/dl, and two standard deviations span 120 to 280 mg/dl.

[Figure: normal curve showing the raw cholesterol levels at plus/minus 1 and 2 standard deviations.]

Example 1

Calculate the mean, median, and mode for the following data sets:
a. 2, 5, 7, 5, 8, 9, 5, 10, 8
b. 10, 12, 12, 15, 17, 12, 18, 14, 11, 13
c. 17, 21, 21, 18, 39, 17, 13
Example 2

John's fall-quarter grades are below. Find his grade point average for the term (A = 4, B = 3, C = 2, D = 1, F = 0).

Course     Credits   Grade
Math       5         B
English    3         A
Physics    5         C
German     3         D
Handball   1         A

Example 3

For certain workers, the mean wage is $5.00/hr, with a standard deviation of $0.50. If a worker is chosen at random, what is the probability that the worker's wage is between $4.50 and $5.50? Assume a normal distribution of wages.

Example 4

Ginny's median score on three tests was 90. Her mean score was 92 and her range was 6. What were her three test scores?

10.3 # A-1, 3, 9, 13, 15, 17, 19, 21, 23; B-1, 13, 19, 21