

Unit 2 Part 1: Measures of Central Tendency

A small video recycling business had the following daily sales over a six-day period:

$305, $285, $240, $376, $198, $264

A single number that is, in some sense, representative of this whole set of numbers (a kind of middle value) is a measure of central tendency. There are three measures of central tendency: mean, median, and mode.

2:1 THE MEAN

The mean (also called the arithmetic mean) is the most common measure of central tendency. The mean of a sample is denoted \( \bar{x} \) (read "x bar"), while the mean of a complete population is denoted µ (the lowercase Greek letter mu). For our purposes, data sets will always be considered to be samples, so we will use \( \bar{x} \).

2:1:1 Finding the mean

To find the mean of a set of data items, add up all the items and then divide the sum by the number of items. (The mean is what most people associate with the word "average.") Since adding up, or summing, a list of items is a common procedure in statistics, we use the symbol for summation, Σ (the capital Greek letter sigma), to mean "add them all up." Therefore, the sum of n items, say \( x_1, x_2, \ldots, x_n \), can be denoted

\[ \sum x = x_1 + x_2 + \cdots + x_n \]

The subscripts on the observations \( x_i \) are just a way of keeping the n observations distinct. They do not necessarily indicate order or any other special facts about the data.

The mean of n data items \( x_1, x_2, \ldots, x_n \) is calculated as follows:

\[ \bar{x} = \frac{\sum x}{n} \]

Example 1: We can use this formula to find the central tendency of the daily sales figures above. Add the daily sales, then divide by the number of days.

\[ \bar{x} = \frac{\sum x}{n} = \frac{305 + 285 + 240 + 376 + 198 + 264}{6} = \frac{1668}{6} = 278 \]

The mean value (the average daily sales) for the week is $278.

Example 2: Finding the Mean of a List of Sales Figures

Last year's annual sales for eight different flower shops were as follows:

$374,910  $321,872  $242,943  $351,147  $382,740  $412,111  $334,089  $262,900

Find the mean annual sales for the eight shops.

Solution: Add the sales, then divide by the number of shops.

\[ \bar{x} = \frac{\sum x}{n} = \frac{2,682,712}{8} = 335,339 \]

The mean annual sales amount is $335,339.

**GC: Verify this solution on your graphing calculator using the STATS features discussed in class.
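The flower shop calculation can also be checked with a few lines of Python. This is only a sketch of the "add, then divide" rule; the function name `sample_mean` is an illustrative choice, not something from the course materials.

```python
# Annual sales for the eight flower shops in the example (dollars).
sales = [374_910, 321_872, 242_943, 351_147, 382_740, 412_111, 334_089, 262_900]


def sample_mean(values):
    """Return the mean: the sum of the items divided by the number of items."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


total = sum(sales)          # the sum of all the items
x_bar = sample_mean(sales)  # the sum divided by n = 8

print(f"Sum of sales: {total:,}")
print(f"Mean annual sales: {x_bar:,.0f}")
```

The printed sum and mean should agree with the hand calculation above.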

2:1:2 Finding a Weighted Mean

In words, the weighted mean (or weighted average) of a group of weighted items is the sum of all products of items times weighting factors, divided by the sum of all weighting factors. The weighted mean of n numbers \( x_1, x_2, \ldots, x_n \) that are weighted by the respective factors \( f_1, f_2, \ldots, f_n \) is calculated as follows:

\[ w = \frac{\sum (x \cdot f)}{\sum f} \]

The calculation of a grade-point average is an example of a weighted mean because the grade points for each course grade must be weighted according to the number of units of the course. (For example, five units of A is better than two units of A.) The number of units is called the weighting factor.

Example: The following table shows the units and grades earned by one student. In one common method of defining grade-point average, an A grade is assigned 4 points, with 3 points for B, 2 for C, and 1 for D. Compute the grade-point average as follows.

Step 1 For each course, multiply the number of units by the number of points assigned to the grade.
Step 2 Add these products.
Step 3 Divide by the total number of units.

\[ \text{Grade-point average} = \frac{43}{14} = 3.07 \text{ (rounded)} \]

2:1:3 Finding the mean of a frequency distribution

The weighted mean formula is commonly used to find the mean for a frequency distribution. In this case, the weighting factors are the frequencies.

Example: Find the mean salary for a small company that pays annual salaries to its employees as shown in the frequency distribution to the right. According to the weighted mean formula, we can set up the work as follows:

Mean salary = … (rounded)

**GC: Verify the solution above using your graphing calculator.
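The three steps of the weighted mean can be sketched in Python as below. The student's table is not reproduced in this transcription, so the course list here is hypothetical; it was chosen only so that the totals match the 43 grade points and 14 units used in the example. The function name `weighted_mean` is likewise an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Return sum(x * f) / sum(f) for paired values x and weighting factors f."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weighting factor")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weighting factors must not sum to zero")
    # Steps 1 and 2: multiply each item by its weight and add the products.
    weighted_sum = sum(x * f for x, f in zip(values, weights))
    # Step 3: divide by the sum of the weighting factors.
    return weighted_sum / total_weight


# Hypothetical transcript: grade points (A=4, B=3, C=2) and units per course.
grade_points = [4, 3, 3, 2]
units = [4, 3, 4, 3]

gpa = weighted_mean(grade_points, units)
print(round(gpa, 2))
```

The same function applies to a frequency distribution by passing the distinct values as `values` and their frequencies as `weights`.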

2:1:4 Misleading Means and Extreme Values (Outliers)

For some data sets the mean can be a misleading indicator of average, usually because one value lies far from the rest. Such an extreme value is referred to as an outlier. Since a single outlier can have a significant effect on the value of the mean, we say that the mean is highly sensitive to extreme values.

Consider Barry Matlock, who runs a small business that employs five workers at the following annual salaries:

$16,500  $16,950  $17,800  $19,750  $20,000

The employees, knowing that Barry accrues vast profits to himself, decide to go on strike and demand a raise. To get public support, they go on television and tell about their miserable salaries, pointing out the mean salary in the company.

\[ \bar{x} = \frac{16,500 + 16,950 + 17,800 + 19,750 + 20,000}{5} = \frac{91,000}{5} = 18,200 \]

Mean salary (employees): $18,200

The local television station schedules an interview with Barry to investigate. In preparation, Barry calculates the mean salary of all workers (including his own salary of $188,000).

\[ \bar{x} = \frac{16,500 + 16,950 + 17,800 + 19,750 + 20,000 + 188,000}{6} = \frac{279,000}{6} = 46,500 \]

Mean salary (including Barry's): $46,500

When the TV crew arrives, Barry calmly assures them that there is no reason for his employees to complain since the company pays a generous mean salary of $46,500. The employees, of course, would argue that when Barry included his own salary in the calculation, it caused the mean to be a misleading indicator of average. This was so because Barry's salary is not typical. It lies a good distance away from the general grouping of the items (salaries).

2:1:5 Mean vs. Median and resistant measures

In the example above, adding a single outlier (Barry's salary) raised the mean from $18,200 to $46,500, more than doubling it, while the median moved only from $17,800 to $18,775. Outliers are often considered possible errors in the data, so one should always examine them closely. Because the mean cannot resist the influence of extreme observations, we say that it is not a resistant measure of center. A measure that is resistant does more than limit the influence of outliers. Its value does not respond strongly to changes in a few observations, no matter how large those changes may be. The mean fails this requirement because we can make the mean as large as we wish by making a large enough increase in just one observation.

Because even a single change in the data may cause the mean to change, while the median and mode may not be affected at all, the mean is the most sensitive of the three measures. The median is more resistant than the mean.

In a symmetric distribution, the mean, median, and mode (if a single mode exists) will all be equal. In a non-symmetric distribution, the mean is often unduly affected by relatively few extreme values and, therefore, may not be a good representative measure of central tendency. For example, distributions of salaries, family incomes, or home prices often include a few values that are much higher than the bulk of the items. In such cases, the median is a more useful measure.
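The effect of Barry's salary on the two measures can be checked numerically. The following sketch uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Salaries of the five employees, then the same list with Barry's salary added.
employees = [16_500, 16_950, 17_800, 19_750, 20_000]
everyone = employees + [188_000]

mean_employees = mean(employees)
mean_everyone = mean(everyone)
median_employees = median(employees)
median_everyone = median(everyone)

# How strongly each measure responds to the single outlier.
mean_change = (mean_everyone - mean_employees) / mean_employees
median_change = (median_everyone - median_employees) / median_employees

print(f"Mean:   {mean_employees:,.0f} -> {mean_everyone:,.0f} ({mean_change:.0%} change)")
print(f"Median: {median_employees:,.0f} -> {median_everyone:,.0f} ({median_change:.0%} change)")
```

Replacing 188,000 with an even larger value moves the mean further but leaves the median at the same figure, which is what "resistant" means here.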

2:2 THE MEDIAN

Another measure of central tendency, which is not so sensitive to extreme values, is the median. This measure divides a group of numbers into two parts, with half the numbers below the median and half above it.

2:2:1 Finding the Median M

Find the median of a group of items as follows.

Step 1 Rank the items (that is, arrange them in numerical order from least to greatest).
Step 2 If the number of items is odd, the median is the middle item in the list.
Step 3 If the number of items is even, the median is the mean of the two middle items.

For Barry Matlock's business, all salaries (including Barry's), arranged in numerical order, are shown here.

$16,500, $16,950, $17,800, $19,750, $20,000, $188,000

\[ \text{median} = \frac{17,800 + 19,750}{2} = \frac{37,550}{2} = 18,775 \]

This figure is a representative average, based on all six salaries, that the employees would probably agree is reasonable.

**GC: Verify the solution above using the instructions for your graphing calculator at the end of Part 1. Set up one list for the salaries and one for the frequencies. Use 1-Var Stats to find the median: STAT > CALC > 1-Var Stats L1 ENTER. Note: you will need to scroll down on the results to find the median, Med.

Example: Find the median of each list of numbers (showing your work).

a) 6, 7, 12, 13, 18, 23, 24
b) 17, 15, 9, 13, 21, 32, 41, 7, 12
c) 147, 159, 132, 181, 174, 253

SOLUTIONS

a) This list is already in numerical order. The number of values in the list, 7, is odd, so the median is the middle value, or 13.

b) First, place the numbers in numerical order from least to greatest.

7, 9, 12, 13, 15, 17, 21, 32, 41

The middle number can now be picked out. The median is 15.

c) First write the numbers in numerical order.

132, 147, 159, 174, 181, 253

Since the list contains an even number of items, namely 6, there is no single middle item. Find the median by taking the mean of the two middle items, 159 and 174.

\[ \text{median} = \frac{159 + 174}{2} = \frac{333}{2} = 166.5 \]
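The three-step procedure translates directly into code. Here is a minimal sketch, with the function name `find_median` chosen for illustration, applied to lists a), b), and c) from the example.

```python
def find_median(items):
    """Return the median by ranking the items and taking the middle."""
    if not items:
        raise ValueError("the median of an empty list is undefined")
    ranked = sorted(items)           # Step 1: rank from least to greatest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # Step 2: odd count, take the middle item
        return ranked[middle]
    # Step 3: even count, take the mean of the two middle items
    return (ranked[middle - 1] + ranked[middle]) / 2


list_a = [6, 7, 12, 13, 18, 23, 24]
list_b = [17, 15, 9, 13, 21, 32, 41, 7, 12]
list_c = [147, 159, 132, 181, 174, 253]

for name, data in (("a", list_a), ("b", list_b), ("c", list_c)):
    print(name, find_median(data))
```

Sorting a copy with `sorted` leaves the original list in its recorded order, which matters if the order of collection is needed later.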

2:2:2 Median position in a Frequency Distribution

Locating the middle item (the median) of a frequency distribution is a bit different. First find the total number of items in the set by adding the frequencies. Then the median is the item whose position is given by the following formula.

\[ \text{Position of median} = \frac{n + 1}{2} = \frac{\sum f + 1}{2} \]

Note: this formula gives the position, and not the value, of the median.

Example: Find the medians for the following distributions.

Solution (a) Arrange the work as follows. Tabulate the values and frequencies, and the cumulative frequencies, which tell, for each different value, how many items have that value or a lesser value. Adding the frequencies shows that there are 20 items in total.

\[ \text{Position of median} = \frac{20 + 1}{2} = \frac{21}{2} = 10.5 \]

The median, then, is the average of the tenth and eleventh items. To find these items, make use of the cumulative frequencies. Since the value 4 has a cumulative frequency of 10, the tenth item is 4 and the eleventh item is 5, making the median

\[ \text{median} = \frac{4 + 5}{2} = \frac{9}{2} = 4.5 \]

Solution (b) From the cumulative frequency column, the fourteenth through the twenty-third items are all 6's. This means the eighteenth item is a 6, so the median is 6.
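The cumulative-frequency lookup can be sketched as follows. The distribution tables from the example are not reproduced in this transcription, so the values and frequencies below are hypothetical; they were chosen so that there are 20 items and the value 4 has a cumulative frequency of 10, as in Solution (a). The function name is also an assumption.

```python
from itertools import accumulate


def median_from_frequencies(values, freqs):
    """Median of a frequency distribution.

    `values` must be in increasing order; `freqs[i]` is the count of values[i].
    """
    cumulative = list(accumulate(freqs))  # running totals of the frequencies
    n = cumulative[-1]                    # total number of items

    def item_at(position):
        """Return the item at a 1-based position in the ranked data."""
        for value, cum in zip(values, cumulative):
            if position <= cum:
                return value
        raise IndexError("position is beyond the last item")

    if n % 2 == 1:
        # Position (n + 1) / 2 is a whole number: one middle item.
        return item_at((n + 1) // 2)
    # Position (n + 1) / 2 ends in .5: average the two neighbouring items.
    return (item_at(n // 2) + item_at(n // 2 + 1)) / 2


# Hypothetical distribution: 20 items, cumulative frequency 10 at value 4.
values = [1, 2, 3, 4, 5, 6]
freqs = [1, 3, 2, 4, 6, 4]

print(median_from_frequencies(values, freqs))
```

With these numbers the tenth item is 4 and the eleventh is 5, so the function averages them, mirroring the hand calculation.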

2:3 MODE

The third measure of central tendency is the mode. The mode of a data set is the value that occurs most often.

2:3:1 Pros and cons of using Mode as Measure of Central Tendency

It is traditional to include the mode as a measure of central tendency, because many important kinds of data sets do have their most frequently occurring values centrally located. However, there is no reason the mode cannot be one of the least values in the set or one of the greatest. In such a case, the mode really is not a good measure of central tendency.

The mode is the only measure that must always be equal to one of the data items of the distribution. In fact, more of the data items are equal to the mode than to any other number.

When the data items being studied are nonnumeric, the mode may be the only usable measure of central tendency, as with the most popular brand or the most stylish hat. A fashion shop planning to stock only one hat size for next season would want to know the mode (the most common) of all hat sizes among their potential customers. Likewise, a designer of family automobiles would be interested in the most common family size. In examples like these, designing for the mean or the median might not be right for anyone.

Example 1: Suppose ten students earned the following scores on a business law examination.

74, 81, 39, 74, 82, 80, 100, 92, 74, 85

Notice that more students earned the score 74 than any other score. This data set is unimodal (has one mode) and the mode is 74.

Example 2: Finding Modes for Sets of Data

Find the mode for each set of data.

a) 51, 32, 49, 49, 74, 81, 92
b) 482, 485, 483, 485, 487, 487, 489
c) 10,708, 11,519, 10,972, 17,546, 13,905, 12,182
d) the data in the frequency distribution

Solution

a) 51, 32, 49, 49, 74, 81, 92
The number 49 occurs more often than any other. Therefore, 49 is the mode. The numbers do not need to be in numerical order when looking for the mode.

b) 482, 485, 483, 485, 487, 487, 489
Both 485 and 487 occur twice. This list is said to have two modes, or to be bimodal.

c) No number here occurs more than once. This list has no mode.

d) The frequency distribution shows that the most frequently occurring value (and, thus, the mode) is 22.
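Counting occurrences is easy to automate. The sketch below follows this handout's convention that a list in which no value repeats has no mode; note that some library routines (for example `statistics.multimode` in Python) instead return every value in that case. The function name `find_modes` is illustrative.

```python
from collections import Counter


def find_modes(items):
    """Return the list of modes, or an empty list if no item repeats."""
    counts = Counter(items)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # handout convention: no value occurs more than once
    # Works for nonnumeric data too, as long as the items can be sorted.
    return sorted(item for item, count in counts.items() if count == highest)


scores = [74, 81, 39, 74, 82, 80, 100, 92, 74, 85]
set_a = [51, 32, 49, 49, 74, 81, 92]
set_b = [482, 485, 483, 485, 487, 487, 489]
set_c = [10_708, 11_519, 10_972, 17_546, 13_905, 12_182]

print(find_modes(scores))  # unimodal
print(find_modes(set_a))   # unimodal
print(find_modes(set_b))   # bimodal
print(find_modes(set_c))   # no mode
```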

Questions for Discussion Sections 1, 2, and 3

1. Find the mean and median for each of the following data sets by hand. Use summation notation and x-bar when showing your work.
a) 1, 5, 6
b) 1, 3, 4, 89
c) 1, 3, 789, 9, ,

2. If the mean, median, and mode are all equal for the set (70, 110, 80, 60, x), find the value of x.

Use the U.S. Airline Safety table, Airline Fatalities in the United States, for the following questions. The table pertains to scheduled commercial carriers. Fatalities data include those on the ground except for the September 11, 2001, terrorist attacks.

3. Use a statistical calculator to enter and calculate the mean, median, and mode (if any) for
a) Departures
b) Fatal accidents
c) Fatalities

4. The year 2001 was clearly an anomaly. If the data for that year are reduced by 4 fatal accidents and 265 fatalities, which of the three measures change, and what are the new measures of central tendency for each of the following?
a) Fatal accidents
b) Fatalities

5. Following 2001, in what year did airline departures start to increase again?

6. For the frequency distribution shown, find: (a) mean (to the nearest tenth), (b) median, and (c) mode or modes (if any).

7. Scores on a Biology Exam. The display here represents scores achieved on a 100-point biology exam by the 34 members of the class. Calculate the mean, median, and mode.

8. Calculating a Missing Test Score. Katie Campbell's Business professor lost his grade book, which contained Katie's five test scores for the course. A summary of the scores (each of which was an integer from 0 to 100) indicates the following: The mean was 88. The median was 87. The mode was 92. (The data set was not bimodal.) What is the least possible number among the missing scores?

9. Mean, median, and mode are often requested in the same problem, as in #5. Are they equally useful? Which of the three is the least likely to be useful? What are the advantages of using the mean instead of the median? Of using the median instead of the mean? Give an example in which the mean is useful and the median would not make sense. Give an example in which the median is useful and the mean would not make sense.

10. Below is a college transcript showing 2 years of grades at a 4-year college.
a) Make a frequency distribution table for the graded courses shown.
b) Calculate the grade point average of the two years shown.
c) The college requires 120 credits in order to graduate. If the student takes 30 credits in classes that give grades in their third year, what average class grade must the student earn in order to finish year 3 with a 3.0 GPA?

2:4 Central Tendency in Grouped Frequency Histograms

2:4:1 Mean in grouped frequency data

The mean computed from the raw data and the mean computed from a histogram are close but not identical, although they are based on the same data set. The mean for the raw data is computed using the values of all the observations. The mean of the histogram uses only the midpoints of the classes and the number of observations in each class. For the convenience of summarizing the data we pay a price: we give up some of the information and lose precision.

The mean of a histogram depends upon the width and the number of classes we choose to display the raw data. Although you have been given some general guidelines for building histograms, there is no one best way to display the data. If you change the number or width of the classes, this will change the midpoints and the number of observations within each class. From the expression for the mean of a histogram you should be able to see that the value of the mean might change slightly.

Example: Imagine that we have taken the pulse rates of ten well-conditioned athletes. The data, which have already been rank-ordered, are shown below.

Pulse rates of ten well-conditioned athletes (beats per minute)

a) What is the mean or average for the ten numbers?
b) Make a frequency table of pulse rate classes with their frequencies.
c) Make a grouped frequency histogram of the data.
d) Find the mean of the histogram.

Solutions

a) \( \bar{x} = \frac{\text{sum of the ten pulse rates}}{10} = 38.2 \) beats per minute

b) Frequency Table

Pulse rate classes    Frequency (number of athletes)
30 to 34.99           2
35 to 39.99           5
40 to 44.99           2
45 to 49.99           1
total                 10

There is a quick way to estimate the mean of a histogram. The mean is at the balance point of the histogram. Think of the observations as weights and the x-axis of the histogram as a wooden board. Each class has a stack of weights, one for each observation. Below the wooden board is a steel rod. Move the rod back and forth. Where the board balances is the mean of the histogram.

Let's apply this idea of balance to the pulse rate histogram. At what pulse rate would the histogram balance? It wouldn't balance at 35 beats per minute because there would be too much weight (too many observations) to the right and the histogram would tilt down to the right.

Stop to think: At roughly what pulse rate would the histogram balance? Please do this before reading on.

I hope your estimate of the mean of the histogram was in the upper thirties. You can't do any better than that. After all, the balance point idea only provides a rough estimate of the mean. If you wish to obtain the exact value of the average for the histogram, you will need to compute it. The expression for computing the mean of a histogram is actually derived from the balance point idea.

d) Take the number of observations in the first class and place the weight at the center or midpoint of the class. The midpoint of the first class is \( (30 + 34.99)/2 \approx 32.5 \). Now do it for all the classes. If you close your eyes, you should be able to see a two-pound weight hanging from the midpoint of the first class, a five-pound weight hanging from the midpoint of the second class, a two-pound weight hanging from the midpoint of the third class, and a one-pound weight hanging from the midpoint of the fourth class. Now, in order to compute the exact balance point (the mean), you must multiply the weights by their respective midpoints, sum them, and divide by the total number of observations.

\[ \bar{x} = \frac{(32.5 \times 2) + (37.5 \times 5) + (42.5 \times 2) + (47.5 \times 1)}{10} = \frac{385}{10} = 38.5 \text{ beats per minute} \]

Hopefully, the computed value is close to your quick estimate based on the balance point idea.

Questions for Discussion Section 4 Part 1

1) Now redefine the classes into widths of six (the present widths are five) starting at 30 beats per minute.
2) Redraw the histogram and compute its mean. The value of the mean will change slightly.
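The balance-point calculation is a weighted mean of class midpoints. A short sketch follows; it assumes each class is treated as five beats wide (for example 30 up to, but not including, 35), which gives the same midpoints, rounded, as the text. The function and variable names are illustrative.

```python
def histogram_mean(classes):
    """Mean of a histogram from (lower_bound, width, frequency) triples.

    Each class contributes its midpoint once per observation in the class.
    """
    total_observations = sum(freq for _, _, freq in classes)
    if total_observations == 0:
        raise ValueError("the histogram has no observations")
    weighted_sum = sum((lower + width / 2) * freq for lower, width, freq in classes)
    return weighted_sum / total_observations


# Pulse rate classes: lower bound, assumed width of 5, number of athletes.
pulse_classes = [(30, 5, 2), (35, 5, 5), (40, 5, 2), (45, 5, 1)]

midpoints = [lower + width / 2 for lower, width, _ in pulse_classes]
print(midpoints)
print(histogram_mean(pulse_classes))
```

Changing the widths in `pulse_classes` (and recounting the frequencies from the raw data) is how the discussion question about classes of width six could be explored.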

2:4:2 Median in grouped frequency data

A graphical way to determine the median for tabled data in a histogram is presented next. Let's compute the median for the pulse rate data histogram. The histogram is redrawn below. Here is the procedure:

1) Determine which observation (or between which two observations) is the median value. As there are ten observations, the median is halfway between Observations 5 and 6. Five observations lie below the median and five lie above it.

2) Determine the class that contains the median. Observations 5 and 6 lie in the second class, the one that goes from 35 to 39.99 beats per minute. How do we know this? The first two observations fall into the first class; the next five observations fall into the second class. These include Observations 3, 4, 5, 6, and 7. Thus Observations 5 and 6 fall into this class.

3) Space the observations out evenly within the class. To do this, divide the class that contains the median into subintervals of equal length, one for each observation in the class. In our example the number of observations in the class that contains the median is five. Therefore, subdivide the interval from 35 to 39.99 beats per minute into five equal subintervals. This is shown below. Determine the widths of the subintervals by dividing the width of the interval (39.99 - 35 = 4.99, or about 5) by the number of observations in the interval (5). Thus each subinterval is equal to one beat per minute.

4) Place each of the observations in the class that contains the median observation(s) at the midpoint or center of each subinterval. Place the first observation in the class at the midpoint of the first subinterval and continue until all observations have been placed on the line graph. Locate the median observation(s) and read the corresponding value from the line graph. The median for ten observations lies halfway between Observations 5 and 6. According to the line graph, Observation 5 sits at 37.5 and Observation 6 at 38.5, so halfway between these two observations is 38 beats per minute.

The four-step procedure works every time.
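The four steps can be sketched in code by spacing every observation evenly within its class and then taking the middle of the placed values. As an assumption, each class is treated as exactly five beats wide (the text rounds 4.99 to 5), and the function names are illustrative rather than part of the course materials.

```python
def place_observations(classes):
    """Spread each class's observations evenly across the class.

    `classes` holds (lower_bound, width, frequency) triples in increasing
    order. The k-th of f observations in a class is placed at the midpoint
    of the k-th of f equal subintervals.
    """
    placed = []
    for lower, width, freq in classes:
        if freq == 0:
            continue
        sub_width = width / freq  # step 3: equal subintervals
        for k in range(freq):
            placed.append(lower + (k + 0.5) * sub_width)  # step 4: midpoints
    return placed


def histogram_median(classes):
    """Median of a histogram using the evenly spaced placement."""
    placed = place_observations(classes)
    n = len(placed)
    if n == 0:
        raise ValueError("the histogram has no observations")
    middle = n // 2
    if n % 2 == 1:
        return placed[middle]
    # Steps 1 and 2: the median lies halfway between the two middle observations.
    return (placed[middle - 1] + placed[middle]) / 2


pulse_classes = [(30, 5, 2), (35, 5, 5), (40, 5, 2), (45, 5, 1)]

print(place_observations(pulse_classes)[2:7])  # the five values in the second class
print(histogram_median(pulse_classes))
```

Placing all classes, not only the one containing the median, is a simplification that keeps the code short; only the median class affects the answer.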

While the mean of a histogram is its balance point, the median splits the histogram so that half the observations lie above it and half below it. Here are two more histograms.

Stop to Think: For which histogram will the mean be larger than the median? Take a few minutes to think about your answer before reading below.

Hopefully you concluded that the mean will be larger than the median for Histogram A and the mean will be smaller than the median for Histogram B. Histogram A is called a long right-hand tailed histogram. The mean will be further to the right (larger) than the median because the observations in Class 3 will cause the balance point to shift towards the right. The median will lie within Class 1 between Observations 5 and 6, but the mean will probably lie somewhere in the beginning of Class 2. For Histogram B, the median will lie in Class 3 between Observations 5 and 6. Thus, for a long right-hand tailed histogram, the mean will be larger than the median. For a long left-hand tailed histogram, the mean will be smaller than the median.

Stop to Think: Under what conditions would you expect the mean and median to be about the same? You are correct if you said for a histogram that is neither right-hand nor left-hand tailed. These are called near-symmetric histograms.

2:4:3 Symmetry in Data Sets

The most useful way to analyze a data set often depends on whether the distribution is symmetric or non-symmetric. In a symmetric distribution, as we move out from the central point, the pattern of frequencies is the same (or nearly so) to the left and to the right. In a non-symmetric distribution, the patterns to the left and right are different. Figure 8 shows several types of symmetric distributions, while Figure 9 shows some non-symmetric distributions. Notice that a bimodal distribution may be either symmetric or non-symmetric.

Questions for Discussion Section 4 Part 2

1) At an urban university there are 7,000 undergraduates who are years old, 2,000 undergraduates who are years old, 1,000 undergraduates who are years old, and 1,000 undergraduates who are 36 years old or older. Without doing any math, explain in plain English why the mean will be larger than the median.

2) A group of 12 meteorology students were asked to guess at the average annual temperature in Reno, Nevada. Their guesses are shown in the histogram below.
a) Using the balance point idea, estimate the mean of the histogram.
b) Compute the mean and median of the histogram.
c) What conclusion about the mean and median can you draw for a symmetric histogram?

3) The following data represent the daily high temperatures (in degrees Fahrenheit) for the month of June in a southwestern U.S. city.
a) Make a histogram, choosing appropriate widths and classes.
b) Use the balance point idea to estimate the mean from the histogram.
c) Compute the mean and median from the histogram.
d) Compute the mean and median from the raw data.

4) Below are 50 observations shown as raw data and as a histogram. Which presentation is easier to understand and why?

Unit 2 Part 2: Measures of Spread/Measures of Dispersion

INTRODUCTION

A measure of center alone can be misleading. Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families. A drug with the correct mean concentration of active ingredient is dangerous if some batches are much too high and others much too low. We are interested in the spread or variability of incomes and drug potencies as well as their centers. The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread. A proper measure of spread or variability should use all the data and provide us with information on how close the observations are to the mean or the median.

Measures of Central Tendency versus Measures of Spread

Central tendency and dispersion (or "spread tendency") are different and independent aspects of a set of data. Which one is more critical can depend on the specific situation.

Example: Suppose tomatoes sell by the basket. Each basket costs the same, and each contains one dozen tomatoes. If you want the most fruit possible per dollar spent, you would look for the basket with the highest average weight per tomato (regardless of the dispersion of the weights). On the other hand, if the tomatoes are to be served on an hors d'oeuvre tray where presentation is important, you would look for a basket with uniform-sized tomatoes, that is, a basket with the lowest weight dispersion (regardless of the average of the weights). See the illustration at the side.

Example: Another situation involves target shooting (also illustrated at the side). The five hits on the top target are, on average, very close to the bull's-eye, but the large dispersion (spread) implies that improvement will require much effort. On the other hand, the bottom target exhibits a poorer average, but the smaller dispersion means that improvement will require only a minor adjustment of the gun sights. (In general, consistent errors can be corrected more easily than more dispersed errors.) In this case, good consistency (lesser dispersion) is more desirable than a good average (central tendency).

2:5 THE RANGE

In the past, you have learned about using the range of a data set to measure its spread. The range is a straightforward measure, but we will see that using it as a measure of spread has some serious deficiencies. For any set of data, the range of the set is defined as follows.

Range = (greatest value in the set) - (least value in the set)

You are a newly appointed manager in charge of two production lines. Your goals are to raise the average production rate above 50 units per hour and to achieve a relatively consistent hourly output. After all, you don't want to produce 100 widgets per hour on Monday and only 10 widgets per hour on Tuesday. This would drive your sales people up the wall because they could never be sure how long it would take to fill an order. On your first day, your assistant manager reports that the average production rate on the two production lines is 60 units per hour.

Stop to think: Have you accomplished your goals? Why or why not?

Check your understanding: A mean of 60 units per hour tells you that one of your goals has been accomplished. Can you tell from the mean if you are obtaining a consistent production rate?

The table shows hourly production rates for the two lines over a period of several days. While both data sets have the same mean and median, there is a major difference between the two departments. The hourly production rate in Department 1 is very consistent, whereas the production rate is volatile in Department 2. You can see that the mean or the median can't capture all the information from the raw data. The ranges for the two data sets are 2 and 120, respectively. This indicates that there is very little spread in the production rate of Department 1 and a large amount of spread in Department 2.

[Table: Hourly Production Rates for Two Lines, with rows for Department 1 and Department 2 and columns for x-bar and median]

Stop to think: Can you determine the weaknesses of the range as a measure of spread?

The more obvious weakness is that in computing the range you use only two values from the data set. The rest of the data is ignored. How can the range measure spread in the data if only two observations are used? At best, it is a quick estimate of spread. Also, you can easily be misled by the range.

Example #2: Suppose two other data sets are collected and the ranges in both data sets are the same. Can you conclude that the amount of spread within the data is the same?

[Table: Data Set A and Data Set B, each with a range of 20]

While the range is 20 for both Data Set A and Data Set B, is the amount of spread the same? I think you'll agree that Data Set B has more spread or variability. After all, all the numbers are different. The reason you can be misled by the range is that you only use the smallest and largest values and ignore the other observations.

Example #3: Look at the points scored by Max and Molly on five different quizzes, as shown in the table to the right. The ranges for the two students make it tempting to conclude that Max is more consistent than Molly. However, Molly is actually more consistent, with the exception of one very poor score. That score, 6, is an outlier that, if not actually recorded in error, must surely be due to some special circumstance. (Notice that the outlier does not seriously affect Molly's median score, which is more typical of her overall performance than is her mean score.)

The second weakness of the range is that it ignores the mean and median in calculating the spread in a data set.
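The first weakness of the range can be seen in a small sketch. The original tables are not reproduced in this transcription, so both data sets below are hypothetical; they were made up only so that each has a range of 20 while the interior values behave very differently.

```python
def value_range(values):
    """Range = (greatest value in the set) - (least value in the set)."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical data: same smallest and largest values, different interiors.
data_set_a = [10, 20, 20, 20, 20, 20, 30]
data_set_b = [10, 13, 17, 20, 24, 27, 30]

range_a = value_range(data_set_a)
range_b = value_range(data_set_b)

# Dropping the two extremes shows what the range never looked at.
inner_a = value_range(sorted(data_set_a)[1:-1])
inner_b = value_range(sorted(data_set_b)[1:-1])

print(range_a, range_b)  # identical ranges
print(inner_a, inner_b)  # very different spread among the remaining values
```

Because only `max` and `min` enter the calculation, any change to the interior values leaves the range untouched.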

2:6 PERCENTILES, QUARTILES, DECILES AND THE FIVE-NUMBER SUMMARY

Remember: The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread. Using the median of the data as our measure of center, we can describe the spread or variability of a distribution by giving several percentiles.

2:6:1 Percentiles of a distribution

The pth percentile of a distribution is the value such that p percent of the observations fall at or below that percentile. The median is the 50th percentile, which means 50 percent of the observations fall at or below the median. So the use of percentiles to report spread is particularly appropriate when the median is our measure of center.

Example: When you take the Scholastic Aptitude Test (SAT), or any other standardized test taken by large numbers of students, your raw score usually is converted to a percentile score. If you scored at the eighty-third percentile on the SAT, it means that you outscored approximately 83% of all those who took the test. (It does not mean that you got 83% of the answers correct.)

2:6:2 The Quartiles

The most commonly used percentiles other than the median are the quartiles. The first quartile is the 25th percentile, and the third quartile is the 75th percentile. (The second quartile is the median itself.) To calculate a percentile, arrange the observations in increasing order and count up the required percent from the bottom of the list. Our definition of percentiles is a bit inexact, because there is not always a value with exactly p percent of the data at or below it. We will be content to take the nearest observation for most percentiles, but the quartiles are important enough to require an exact recipe.

To calculate the quartiles:

1. Arrange the observations in increasing order and locate the median M in the ordered list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

Example: Consider the 2001 highway mileages of the 18 gasoline-powered two-seater cars, arranged in increasing order. The median, M, is midway between the center pair of observations (the 9th and 10th). The first quartile is the median of the 9 observations to the left of the position of the median. It is the 5th of these, Q1 = 21. Similarly, the third quartile is the median of the 9 observations to the right of the position of the median. Check that Q3 = 27.

Example: Consider the 2001 highway mileages of the 11 minicompact cars, in order. The median is 25, the 6th observation. The first quartile is the median of the five observations falling to the left of this point in the list, Q1 = 21. Similarly, Q3 = 28.

Graphing Calculators: These values can usually be found in the 1-variable statistics screen if you scroll down.

Note: not all statistical software uses the same rules for calculating the quartiles. The differences are usually too small to affect conclusions based on the data. Just use the values that your software gives you.
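The quartile recipe above can be sketched as follows. The mileage lists are not reproduced in this transcription, so the eleven values used here are hypothetical and serve only to exercise the rule; the function names are illustrative. As the note says, library routines may use different rules and return slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the recipe in the text.

    Q1 is the median of the observations to the left of the median's
    position, Q3 the median of those to its right. When n is odd the
    median observation itself belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    left = ordered[:half]
    right = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(left), median_of_sorted(ordered), median_of_sorted(right)


# Hypothetical data set with 11 observations (not the mileage data).
sample = [1, 3, 4, 6, 7, 8, 10, 12, 13, 15, 18]

q1, m, q3 = quartiles(sample)
print(q1, m, q3)
```

With 18 observations the same function splits the list into two halves of 9, so each quartile is the 5th value of its half, as in the two-seater example.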

2:6:3 THE FIVE-NUMBER SUMMARY

The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is

Minimum   Q1   M   Q3   Maximum

These five numbers offer a reasonably complete description of center and spread. The range is found using the smallest and largest observations. These single observations tell us little about the distribution as a whole, but they give information about the tails of the distribution that is missing if we only know Q1, M, and Q3.

For the highway gas mileages, the quartiles found earlier are Q1 = 21 and Q3 = 27 for two-seaters, and Q1 = 21, M = 25, and Q3 = 28 for minicompacts; the five-number summaries add the minimum and maximum of each group. The median describes the center of the data distribution; the quartiles show the spread of the center half of the data; the minimum and the maximum show the full spread of the data.

2:6:4 THE BOXPLOT (or BOX-AND-WHISKER PLOT)

The five-number summary leads to another visual representation of a distribution, the boxplot. The figure to the right shows boxplots for both city and highway gas mileages for our two groups of cars. Because boxplots show less detail than histograms or stemplots, they are best used for side-by-side comparison of more than one distribution, as in the figure.

A boxplot is a graph of the five-number summary.
A central box spans the quartiles Q1 and Q3.
A line in the box marks the median M.
Lines extend from the box out to the smallest and largest observations.

When you look at a boxplot, first locate the median, which marks the center of the distribution. Then look at the spread. The quartiles show the spread of the middle half of the data, and the extremes (the smallest and largest observations) show the spread of the entire data set. We see at once that city mileages are lower than highway mileages. The median gas mileages for the two groups are very close together, but the two-seaters are more variable. In particular, the two-seater group contains some cars with quite low gas mileages.

Example: Construct a box plot for the weekly study times data from Unit 1. Remember that forty students, selected randomly in the school cafeteria one morning, were asked to estimate the number of hours they had spent studying in the past week (including both in-class and out-of-class time). Find the five-number summary yourself before checking your answers on the next page.
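If you want to check a hand-computed five-number summary, a short Python sketch along the following lines may help. It assumes the median-of-halves rule for quartiles from section 2:6:2; the function name and the sample data are hypothetical (the data are not the Unit 1 study times).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])              # median of the lower half
    q3 = median(ordered[half + n % 2:])      # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# Hypothetical weekly study times in hours.
hours = [12, 18, 24, 25, 30, 36, 41, 45, 58, 72]
minimum, q1, m, q3, maximum = five_number_summary(hours)

print("Box spans", q1, "to", q3, "with the median line at", m)
print("Whiskers reach", minimum, "and", maximum)
```

The box of a boxplot is drawn from Q1 to Q3, the line inside it at M, and the whiskers out to the minimum and maximum.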

Solution: To determine the quartiles and the minimum and maximum values more easily, we use the stem-and-leaf display (with leaves ranked), given in Unit 1. With forty data items, the median is the average of the 20th and 21st data items. From the stem-and-leaf display, there are 20 data items below the median and 20 above it, so the first quartile will be between items 10 and 11 and the third quartile will be between items 30 and 31. Averaging items 10 and 11 gives Q1 = 24; Q3 is found the same way from items 30 and 31. The minimum and maximum items are evident from the stem-and-leaf display. They are 12 and 72.

The box plot is made by using a reference number line that includes the five-number summary and an appropriate scale. The box plot illustrates:
1. central tendency (the location of the median);
2. the location of the middle half of the data (the extent of the box);
3. dispersion (the range is the extent of the whiskers); and
4. skewness (the nonsymmetry of both the box and the whiskers).

2:6:5 The Interquartile Range IQR

The interquartile range IQR is the distance between the first and third quartiles:

IQR = Q3 − Q1

Because it is the range of the center half of the data, the IQR is a more resistant measure of spread than the range. You should be aware, however, that no single numerical measure of spread, such as IQR, is very useful for describing skewed distributions. The two sides of a skewed distribution have different spreads. We can often detect skewness from the five-number summary by comparing how far the first quartile and the minimum are from the median (left tail) with how far the third quartile and the maximum are from the median (right tail). The interquartile range is mainly used as the basis of a rule of thumb for identifying suspected outliers.

2:6:6 The 1.5 × IQR criterion for suspected outliers

Call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
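The 1.5 × IQR criterion is easy to automate. The sketch below is one possible Python version, assuming the median-of-halves quartile rule from section 2:6:2; the function name and the data are hypothetical.

```python
from statistics import median


def suspected_outliers(values):
    """Apply the 1.5 x IQR criterion.

    Returns the IQR, the (low, high) cutoffs, and the flagged observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr       # anything below this is suspect
    high_cutoff = q3 + 1.5 * iqr      # anything above this is suspect
    flagged = [x for x in ordered if x < low_cutoff or x > high_cutoff]
    return iqr, (low_cutoff, high_cutoff), flagged


# Hypothetical data with one unusually large value.
data = [2, 4, 5, 5, 6, 7, 8, 9, 30]
iqr, cutoffs, flagged = suspected_outliers(data)
print("IQR:", iqr, "cutoffs:", cutoffs, "suspected outliers:", flagged)
```

A flagged value is only a suspect. Whether it is truly an outlier still calls for a look at the distribution.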

Example: One of the most striking findings of the 2000 census was the growth of the Hispanic population in the United States. The table below presents the percent of adults (age 18 and over) in each of the 50 states who identified themselves in the 2000 census as Spanish/Hispanic/Latino. The range of the data was divided into classes of equal width. The data ranged from 0.6 to 38.7, so the classes were chosen to run from 0.0 to 40.0 percent Hispanic.

Looking at the distribution, the five-number summary runs from a minimum of 0.6 to a maximum of 38.7, with quartiles Q1 = 2.0 and Q3 = 7.0. The largest observation (New Mexico, 38.7% Hispanic) is an outlier. How shall we describe the spread of this distribution? The range of all the observations depends entirely on the smallest and largest observations and does not describe the spread of the majority of the data. For our data on Hispanics in the states,

IQR = 7.0 − 2.0 = 5.0

The quartiles and the IQR are not affected by changes in either tail of the distribution. They are therefore resistant, because changes in a few data points have no further effect once these points move outside the quartiles.

For the Hispanics data:

1.5 × IQR = 1.5 × 5.0 = 7.5

Any values below 2.0 − 7.5 = −5.5 or above 7.0 + 7.5 = 14.5 are flagged as possible outliers. There are no low outliers, but 7 states are flagged as possible high outliers. This distribution is strongly skewed and has a quite compact middle half (small IQR). The histogram for the data suggests that only New Mexico is truly an outlier in the sense of deviating from the overall pattern of the distribution. The other 6 states are just part of the long right tail. You see that the 1.5 × IQR rule does not remove the need to look at the distribution and use judgment. It is useful mainly when large volumes of data must be scanned automatically.
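The resistance of the IQR can be seen in a small experiment. The Python sketch below uses hypothetical data (not the census figures) and the median-of-halves quartile rule; it moves the largest observation far out into the tail and compares what happens to the range and to the IQR.

```python
from statistics import median


def iqr(values):
    """Interquartile range, Q3 - Q1, with median-of-halves quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return median(ordered[half + n % 2:]) - median(ordered[:half])


def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


# Hypothetical data; the second list pushes the largest value far into the tail.
base = [1, 2, 2, 3, 4, 5, 7, 9, 12]
stretched = base[:-1] + [40]

print("range:", data_range(base), "->", data_range(stretched))
print("IQR:  ", iqr(base), "->", iqr(stretched))
```

The range follows the extreme value, while the IQR does not change, because the moved point was already outside the quartiles.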

2:6:7 Modified boxplots

We can modify boxplots to plot suspected outliers individually. In a modified boxplot, the lines extend out from the central box only to the smallest and largest observations that are not suspected outliers. Observations more than 1.5 × IQR outside the box are plotted as individual points. See the figure below. The modified boxplot, and especially the histogram previously shown, tell us much more about the distribution than the five-number summaries or other numerical measures.

2:6:8 Deciles

Deciles are the nine values (denoted D1, D2, ..., D9) along the scale that divide a data set into ten (approximately) equal-sized parts, and quartiles are the three values (Q1, Q2, Q3) that divide a data set into four (approximately) equal-sized parts. Since deciles and quartiles serve to position particular items within portions of a distribution, they also are measures of position. We can evaluate deciles by finding their equivalent percentiles:

D1 = P10, D2 = P20, D3 = P30, ..., D9 = P90

2:6:9 Summary

The routine methods of statistics compute numerical measures and draw conclusions based on their values. These methods are useful, and we will study them carefully. But they cannot be applied blindly, by feeding data to a computer program, because statistical measures and methods based on them are generally meaningful only for distributions of sufficiently regular shape. This principle will become clearer as we progress, but it is good to be aware at the beginning that quickly resorting to fancy calculations is the mark of a statistical amateur. Look, think, and choose your calculations selectively.
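The relationship between deciles and percentiles (D1 = P10, ..., D9 = P90) can be sketched in Python. The sketch assumes the "nearest-rank" convention, in which the pth percentile is the smallest observation with at least p percent of the data at or below it. This is only one of several conventions in use, and the function names and data are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank pth percentile for a whole-number p between 1 and 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # position counted from the bottom
    return ordered[rank - 1]


def deciles(values):
    """Return [D1, ..., D9] by evaluating P10, P20, ..., P90."""
    return [percentile(values, 10 * k) for k in range(1, 10)]


# Hypothetical scores: the whole numbers 1 through 20.
scores = list(range(1, 21))
print("Deciles:", deciles(scores))
print("P25, P50, P75:", [percentile(scores, p) for p in (25, 50, 75)])
```

Software packages often interpolate between neighboring observations instead, so their percentiles may differ slightly from these.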

Questions for Discussion, Section 6

1) In a national standardized test, Kimberly Austin scored at the ninety-second percentile. If 67,500 individuals took the test, about how many scored higher than Kimberly did?

2) Let the three quartiles (from least to greatest) for a large population of scores be denoted Q1, Q2, Q3.
(a) Is it necessarily true that Q2 − Q1 = Q3 − Q2?
(b) Explain your answer to part (a).

3) Leading U.S. Trade Partners. Countries in the table are ranked by value of 2008 imports to the United States from the countries. Exports are from the United States to the countries. Use this information to answer the questions below.
a) Determine which country occupies
i. the fifteenth percentile in population
ii. the third quartile in exports
iii. the fourth decile in imports
iv. the first quartile in exports
b) Determine which was relatively higher: China in imports or Canada in exports.
c) Give the five-number summary for both imports and exports.
d) Construct box plots for both exports and imports, one above the other in the same drawing.
e) What does your box plot of exports indicate about the following characteristics of the exports data?
i. the central tendency
ii. the dispersion
iii. the location of the middle half of the data items
f) Comparing your two box plots, what can you say about the 2008 trade balance with this group of countries?

2:7 VARIANCE AND STANDARD DEVIATION

Remember: The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread. The five-number summary is not the most common numerical description of a distribution. Using the mean of the data as our measure of center, we can describe the spread or variability of a distribution by using the variance and standard deviation.

Suppose a company had ten employees and the average (mean) salary was $100,000 per year. Your inclination is probably to say "Where do I sign up?" But think about it for a minute. If the average salary is $100,000, that does not guarantee that you'll be getting big bucks. Suppose that nine employees earn $10,000 per year and the big boss earns $910,000 per year. The average for the ten people is $100,000, yet none of the ten salaries is close to the mean. If we had computed a measure of spread which used the mean, its value would have been large. This would have warned you that the ten observations are not all close to the mean. You should not have expected to obtain a salary that is close to the mean salary of $100,000. On the other hand, if the value of the spread had been low, then you could have concluded that all the observations are close to the mean. And if you took a job with the company, your salary would be close to $100,000. With the proper measure of spread, you can tell how much meaning you can attach to the mean and whether it represents the data or not.

2:7:1 Variance of data

The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from their mean $\bar{x}$. In symbols, the variance of n observations $x_1, x_2, \dots, x_n$ is:

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

or in more compact notation:

$$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

The deviations $x_i - \bar{x}$ display the spread of the values $x_i$ about their mean $\bar{x}$. Some of these deviations will be positive and some negative because some observations fall on each side of the mean. In fact, the sum of the deviations of the observations from their mean will always be zero. Squaring the deviations makes them all positive, so that observations far from the mean in either direction have large positive squared deviations. The variance is the average squared deviation. Therefore, $s^2$ and s will be large if the observations are widely spread about their mean, and small if the observations are all close to the mean.

The individual steps involved in this calculation are as follows:
Step 1: Calculate $\bar{x}$, the mean of the numbers
Step 2: Find the deviations from the mean
Step 3: Square each deviation
Step 4: Sum the squared deviations
Step 5: Divide the sum in Step 4 by n − 1

Properties of the variance
1) Unlike the range, all the data are used.
2) Unlike the range, the variance doesn't just measure spread or dispersion, but measures spread or dispersion around the mean.
3) The variance can never be negative because we sum the squared deviations from the mean. When the variance equals zero, this means that all the numbers are equal to the mean; there is no dispersion about the mean. So if you ever compute a negative variance, you can be sure that you made a math error. You can't have less than no dispersion!
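Steps 1 through 5 translate directly into code. The Python sketch below applies them to the ten salaries from the example at the start of this section (nine at $10,000 and one at $910,000) and, as a cross-check, compares the result with the standard library's `statistics.variance`, which also divides by n − 1. The function name `sample_variance` is our own.

```python
import math
from statistics import variance


def sample_variance(values):
    """Sample variance computed with Steps 1-5 from the text."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: the mean
    deviations = [x - mean for x in values]       # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]        # Step 3: square each deviation
    total = sum(squared)                          # Step 4: sum the squared deviations
    return total / (n - 1)                        # Step 5: divide by n - 1


# The salary example: nine employees at $10,000 and the boss at $910,000.
salaries = [10_000] * 9 + [910_000]

s2 = sample_variance(salaries)
matches_library = math.isclose(s2, variance(salaries))

print("mean salary:", sum(salaries) / len(salaries))
print("variance:", s2)
print("agrees with statistics.variance:", matches_library)
```

The variance is in squared dollars, which is hard to interpret directly; its square root is back in dollars and is far larger than the $100,000 mean, which is the warning sign the text describes.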

2:7:2 Standard Deviation, s

The standard deviation measures spread by looking at how far the observations are from their mean. The sample standard deviation, s, is the square root of the variance $s^2$. (The standard deviation of a population, rather than of a sample from the population, is denoted σ, the lowercase Greek letter sigma.)

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$$

Step 6: Take the square root of the quotient in Step 5 above.

The standard deviation is merely the positive square root of the variance. It tells you how far the observations typically are from the mean.

Properties of the Standard Deviation
1) s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
2) s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
3) s, like the mean $\bar{x}$, is not resistant. A few outliers can make s very large.

The use of squared deviations renders s even more sensitive than $\bar{x}$ to a few extreme observations. For example, dropping the Honda Insight from our list of two-seater cars reduces the mean highway mileage from 25.8 MPG to 23.4 MPG. It cuts the standard deviation by more than half, from 11.4 MPG with the Insight to 5.3 MPG without it. Distributions with outliers and strongly skewed distributions have large standard deviations. The number s does not give much helpful information about such distributions.

Working one or two short examples by hand helps you understand how the standard deviation is obtained. In practice you will always use either software or a calculator that will find s from keyed-in data.

Example: Find the variance and standard deviation of the data set: 32, 41, 47, 53, 57

Step 1: Find the mean of the sample, $\bar{x}$. Add these values and divide by the total number of values, 5. The mean is 230 / 5 = 46.

Step 2: Find the deviation from the mean of each observation, $x_i - \bar{x}$. To find the deviations from the mean, subtract 46 from each data value. To check your work, add the deviations. The sum of the deviations for a set of data should be 0.

Step 3: Square each deviation from the mean, $(x_i - \bar{x})^2$. We cannot obtain a measure of dispersion by finding the mean of the deviations, because this number is always 0, since the positive deviations just cancel out the negative ones. To avoid this problem of positive and negative numbers canceling each other, we square each deviation. The chart shows the deviations and their squares for the data set.

Data value    Deviation    Squared deviation
32            −14          196
41            −5           25
47            1            1
53            7            49
57            11           121
Sum           0            392

Step 4: Sum the squares.

$$\sum (x - \bar{x})^2 = 196 + 25 + 1 + 49 + 121 = 392$$

Step 5: Divide by the number of observations n if the data form a population, or by n − 1 if they form a sample.

$$\frac{\sum (x - \bar{x})^2}{n - 1}$$

The variance, the average of the squared deviations, can now be found by dividing their sum by the number of data values n (5 in this case), which we would do if our data values composed a population. However, since we are considering the data to be a sample, we divide by n − 1 instead (this is explained later).

$$s^2 = \frac{392}{4} = 98$$

Step 6: To find the sample standard deviation, take the positive square root of the variance. This makes up, in a way, for squaring the deviations earlier, and gives a kind of average of the deviations from the mean. Continuing our calculations from the chart above, we obtain

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{392}{4}} = \sqrt{98} \approx 9.90$$

For actual calculation purposes, we recommend the use of a scientific calculator, or a statistical calculator, that does all the detailed steps automatically.

Example (graphing calculator): A person's metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Consider the metabolic rates of 7 men who took part in a study of dieting. (The units are calories per 24 hours. These are the same calories used to describe the energy content of foods.) Enter these data into your calculator and verify the mean, $\bar{x}$ = 1600 calories, and the standard deviation s.

The figure below plots these data as dots on the calorie scale, with their mean marked by an asterisk (*). The arrows mark two of the deviations from the mean. If you were calculating s by hand, you would find the first deviation as

$$x_1 - \bar{x} = 1792 - 1600 = 192$$

The idea of the variance is straightforward: it is the average of the squares of the deviations of the observations from their mean. The details we have just presented, however, raise some questions which we will explore further.
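The six steps of the worked example can be replayed in a few lines of Python. This sketch assumes the five data values are 32, 41, 47, 53 and 57, a set consistent with the mean of 46 and the sum of squared deviations of 392 stated in the example. The variable names are our own.

```python
import math

# Assumed data values: consistent with the stated mean (46) and sum of squares (392).
data = [32, 41, 47, 53, 57]
n = len(data)

mean = sum(data) / n                                   # Step 1
deviations = [x - mean for x in data]                  # Step 2
squared_deviations = [d ** 2 for d in deviations]      # Step 3
sum_of_squares = sum(squared_deviations)               # Step 4

sample_variance = sum_of_squares / (n - 1)             # Step 5, data treated as a sample
population_variance = sum_of_squares / n               # Step 5, data treated as a population

s = math.sqrt(sample_variance)                         # Step 6

print("mean:", mean)
print("sum of deviations (should be 0):", sum(deviations))
print("sum of squared deviations:", sum_of_squares)
print("sample variance:", sample_variance)
print("sample standard deviation:", round(s, 2))
```

Dividing by n instead of n − 1 gives a smaller value, which is why it matters whether the data are treated as a sample or as a whole population.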


EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!

EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck! STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

Midterm Review Problems

Midterm Review Problems Midterm Review Problems October 19, 2013 1. Consider the following research title: Cooperation among nursery school children under two types of instruction. In this study, what is the independent variable?

More information

THE BINOMIAL DISTRIBUTION & PROBABILITY

THE BINOMIAL DISTRIBUTION & PROBABILITY REVISION SHEET STATISTICS 1 (MEI) THE BINOMIAL DISTRIBUTION & PROBABILITY The main ideas in this chapter are Probabilities based on selecting or arranging objects Probabilities based on the binomial distribution

More information

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r), Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

More information

Ch. 3.1 # 3, 4, 7, 30, 31, 32

Ch. 3.1 # 3, 4, 7, 30, 31, 32 Math Elementary Statistics: A Brief Version, 5/e Bluman Ch. 3. # 3, 4,, 30, 3, 3 Find (a) the mean, (b) the median, (c) the mode, and (d) the midrange. 3) High Temperatures The reported high temperatures

More information

Chapter 2: Frequency Distributions and Graphs

Chapter 2: Frequency Distributions and Graphs Chapter 2: Frequency Distributions and Graphs Learning Objectives Upon completion of Chapter 2, you will be able to: Organize the data into a table or chart (called a frequency distribution) Construct

More information

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction CA200 Quantitative Analysis for Business Decisions File name: CA200_Section_04A_StatisticsIntroduction Table of Contents 4. Introduction to Statistics... 1 4.1 Overview... 3 4.2 Discrete or continuous

More information

Working with whole numbers

Working with whole numbers 1 CHAPTER 1 Working with whole numbers In this chapter you will revise earlier work on: addition and subtraction without a calculator multiplication and division without a calculator using positive and

More information

Variables. Exploratory Data Analysis

Variables. Exploratory Data Analysis Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. STATISTICS/GRACEY PRACTICE TEST/EXAM 2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Identify the given random variable as being discrete or continuous.

More information

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name: Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

More information

Mean, Median, and Mode

Mean, Median, and Mode DELTA MATH SCIENCE PARTNERSHIP INITIATIVE M 3 Summer Institutes (Math, Middle School, MS Common Core) Mean, Median, and Mode Hook Problem: To compare two shipments, five packages from each shipment were

More information

a. mean b. interquartile range c. range d. median

a. mean b. interquartile range c. range d. median 3. Since 4. The HOMEWORK 3 Due: Feb.3 1. A set of data are put in numerical order, and a statistic is calculated that divides the data set into two equal parts with one part below it and the other part

More information

How To Write A Data Analysis

How To Write A Data Analysis Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

More information

2. Filling Data Gaps, Data validation & Descriptive Statistics

2. Filling Data Gaps, Data validation & Descriptive Statistics 2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)

More information

COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY EXERCISE 8/5/2013. MEASURE OF CENTRAL TENDENCY: MODE (Mo) MEASURE OF CENTRAL TENDENCY: MODE (Mo)

COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY EXERCISE 8/5/2013. MEASURE OF CENTRAL TENDENCY: MODE (Mo) MEASURE OF CENTRAL TENDENCY: MODE (Mo) COMPARISON MEASURES OF CENTRAL TENDENCY & VARIABILITY Prepared by: Jess Roel Q. Pesole CENTRAL TENDENCY -what is average or typical in a distribution Commonly Measures: 1. Mode. Median 3. Mean quantified

More information

Box-and-Whisker Plots

Box-and-Whisker Plots Mathematics Box-and-Whisker Plots About this Lesson This is a foundational lesson for box-and-whisker plots (boxplots), a graphical tool used throughout statistics for displaying data. During the lesson,

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab 1 Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab I m sure you ve wondered about the absorbency of paper towel brands as you ve quickly tried to mop up spilled soda from

More information

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: What do the data look like? Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

More information

Practice#1(chapter1,2) Name

Practice#1(chapter1,2) Name Practice#1(chapter1,2) Name Solve the problem. 1) The average age of the students in a statistics class is 22 years. Does this statement describe descriptive or inferential statistics? A) inferential statistics

More information

Chapter 3 Review Math 1030

Chapter 3 Review Math 1030 Section A.1: Three Ways of Using Percentages Using percentages We can use percentages in three different ways: To express a fraction of something. For example, A total of 10, 000 newspaper employees, 2.6%

More information

Interpreting Data in Normal Distributions

Interpreting Data in Normal Distributions Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,

More information

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #15 Special Distributions-VI Today, I am going to introduce

More information

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data

Introduction to Environmental Statistics. The Big Picture. Populations and Samples. Sample Data. Examples of sample data A Few Sources for Data Examples Used Introduction to Environmental Statistics Professor Jessica Utts University of California, Irvine jutts@uci.edu 1. Statistical Methods in Water Resources by D.R. Helsel

More information

Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

More information

Shape of Data Distributions

Shape of Data Distributions Lesson 13 Main Idea Describe a data distribution by its center, spread, and overall shape. Relate the choice of center and spread to the shape of the distribution. New Vocabulary distribution symmetric

More information

AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

More information

Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

More information

Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

AP STATISTICS REVIEW (YMS Chapters 1-8)

AP STATISTICS REVIEW (YMS Chapters 1-8) AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous

Classify the data as either discrete or continuous. 2) An athlete runs 100 meters in 10.5 seconds. 2) A) Discrete B) Continuous Chapter 2 Overview Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classify as categorical or qualitative data. 1) A survey of autos parked in

More information

Describing and presenting data

Describing and presenting data Describing and presenting data All epidemiological studies involve the collection of data on the exposures and outcomes of interest. In a well planned study, the raw observations that constitute the data

More information

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey):

6. Decide which method of data collection you would use to collect data for the study (observational study, experiment, simulation, or survey): MATH 1040 REVIEW (EXAM I) Chapter 1 1. For the studies described, identify the population, sample, population parameters, and sample statistics: a) The Gallup Organization conducted a poll of 1003 Americans

More information

Probability Distributions

Probability Distributions CHAPTER 5 Probability Distributions CHAPTER OUTLINE 5.1 Probability Distribution of a Discrete Random Variable 5.2 Mean and Standard Deviation of a Probability Distribution 5.3 The Binomial Distribution

More information

Descriptive statistics; Correlation and regression

Descriptive statistics; Correlation and regression Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

More information

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

Unit 9 Describing Relationships in Scatter Plots and Line Graphs Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear

More information

Sta 309 (Statistics And Probability for Engineers)

Sta 309 (Statistics And Probability for Engineers) Instructor: Prof. Mike Nasab Sta 309 (Statistics And Probability for Engineers) Chapter 2 Organizing and Summarizing Data Raw Data: When data are collected in original form, they are called raw data. The

More information

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175)

The Big Picture. Describing Data: Categorical and Quantitative Variables Population. Descriptive Statistics. Community Coalitions (n = 175) Describing Data: Categorical and Quantitative Variables Population The Big Picture Sampling Statistical Inference Sample Exploratory Data Analysis Descriptive Statistics In order to make sense of data,

More information

2 Sample t-test (unequal sample sizes and unequal variances)

2 Sample t-test (unequal sample sizes and unequal variances) Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

More information

3 Describing Distributions

3 Describing Distributions www.ck12.org CHAPTER 3 Describing Distributions Chapter Outline 3.1 MEASURES OF CENTER 3.2 RANGE AND INTERQUARTILE RANGE 3.3 FIVE-NUMBER SUMMARY 3.4 INTERPRETING BOX-AND-WHISKER PLOTS 3.5 REFERENCES 46

More information

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13

CHAPTER THREE COMMON DESCRIPTIVE STATISTICS COMMON DESCRIPTIVE STATISTICS / 13 COMMON DESCRIPTIVE STATISTICS / 13 CHAPTER THREE COMMON DESCRIPTIVE STATISTICS The analysis of data begins with descriptive statistics such as the mean, median, mode, range, standard deviation, variance,

More information

Revision Notes Adult Numeracy Level 2

Revision Notes Adult Numeracy Level 2 Revision Notes Adult Numeracy Level 2 Place Value The use of place value from earlier levels applies but is extended to all sizes of numbers. The values of columns are: Millions Hundred thousands Ten thousands

More information

Section 1.3 Exercises (Solutions)

Section 1.3 Exercises (Solutions) Section 1.3 Exercises (s) 1.109, 1.110, 1.111, 1.114*, 1.115, 1.119*, 1.122, 1.125, 1.127*, 1.128*, 1.131*, 1.133*, 1.135*, 1.137*, 1.139*, 1.145*, 1.146-148. 1.109 Sketch some normal curves. (a) Sketch

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Week 3&4: Z tables and the Sampling Distribution of X

Week 3&4: Z tables and the Sampling Distribution of X Week 3&4: Z tables and the Sampling Distribution of X 2 / 36 The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable, Z N(0, 1 2 ). The distribution of any other normal

More information