Statistics Chapter 2 Frequency Tables A frequency table organizes quantitative data. partitions data into classes (intervals). shows how many data values are in each class. Test Score Number of Students 61-70 4 71-80 8 81-90 15 91-100 7 A frequency table partitions data into classes or intervals and shows how many data values are in each class. The classes or intervals are constructed so that each data falls into exactly one class. If the frequency is converted into percentage of individuals then we have a relative frequency table. Data Classes and Class Frequency Class: an interval of values. Example: 61 x 70 Frequency: the number of data values that fall within a class. Four data fall within the class 61 x 70. Relative Frequency: the proportion of data values that fall within a class. 11.7% of the data fall within the class 61 x 70. Structure of a Data Class A data class is basically an interval on a number line. It has: A lower limit a and an upper limit b. A width. A lower boundary and an upper boundary (integer data). A midpoint. 1 P a g e
Example - If a = 60 and b = 69 for integer data, what is the value of the lower boundary? a). 60 b). 59.5 c). 9 d). 64.5 Constructing Data Classes Find the class width largest data values smallest data values class width Desired number of classes Increase the computed value to the next higher whole number. Find the class limits. The lower limit of the leftmost class is set equal to the smallest value in the data set. The lower class limit is the lowest data values that can fit in a class. The upper class limit is the highest data values that can fit in a class. The class width is the difference between the lower class limit of one class and the lower class limit of the next class. Find the class boundaries (integer data). Subtract 0.5 from the lower class limit and add 0.5 to the upper class limit. Example - For a certain data set, the minimum value is 25 and the maximum value is 58. If you wish to partition the data into 5 classes, what would be the class width? a). 5 b). 6 c). 7 d). 8 Building a Frequency Table Find the class width, class limits, and class boundaries of the data. Use Tally marks to count the data in each class. Record the frequencies (and relative frequencies if desired) on the table. class frequency Relative Frequency total of all frequency 2 P a g e
Example - A task force to encourage car pooling conducted a study of one-way commuting distances of workers in the downtown Dallas area. A random sample of 60 of these workers was taken. The commuting distances of the workers in the sample in miles are as follows. 13 47 10 3 16 20 17 40 4 2 7 25 8 21 19 15 3 17 14 6 12 45 1 8 4 16 11 18 23 12 6 2 14 13 7 15 46 12 9 18 34 13 41 28 36 17 24 27 29 9 14 26 10 24 37 31 8 16 12 16 Make a frequency table for these data with six classes. 3 P a g e
Histograms Histogram graphical summary of a frequency table. Uses bars to plot the data classes versus the class frequencies. A graphical representation of this information can be useful. A histogram uses bars to represent each class, where the width of the bar is the class width and the height of the bar is the class frequency. Making a Histogram Make a frequency table. Place class boundaries on horizontal axis. Place frequencies on vertical axis. For each class, draw a bar with height equal to the class frequency Example Use the data from the commuting distances of workers in the downtown Dallas area, to make a histogram. 4 P a g e
Distribution Shapes Critical Thinking A bimodal distribution shape might indicate that the data are from two different populations. Outliers data values that are very different from other values in the data set. Outliers may indicate data recording errors. Outliers in a data set are data values that are very different from other measurements in the data set. They many indicate that an error occurred or the data may be an actual data point. Do you include Outliers in statistical analysis? It depends, any decision about outliers should include people what are familiar with the field and the purpose of the study. Exploratory Data Analysis EDA is the process of learning about a data set by creating graphs. EDA specifically looks for patterns and trends in the data. EDA also identifies extreme values. Graphical Displays represent the data. induce the viewer to think about the substance of the graphic. should avoid distorting the message of the data. 5 P a g e
Bar Graphs Used for qualitative or quantitative data. Can be vertical or horizontal. Bars are uniformly spaced and have equal widths. Length/height of bars indicate counts or percentages of the variable. Good practice requires including titles and units and labeling axes. Below is an example of a cluster bar graph because there are two bars for year of birth. One bar represents the life expectancy of men and the other represents the life expectancy of women. The height of each bar represents the life expectancy in years. Pareto Charts A bar chart with two specific features: Heights of bars represent frequencies. Bars are vertical and are ordered from tallest to shortest. All graphs need the following: Title, both axis labeled, both axis have a scale. 6 P a g e
Example The data below represents student s responses for reasons for being late for the months of September October. Make a bar Pareto graph showing the causes for lateness. Cause Frequency Snoozing after alarm goes off 15 Car trouble 5 Too long of breakfast 13 Last-minute studying 20 Finding something to wear 8 Talking too long with roommate 9 Other 3 Circle Graphs/Pie Charts Used for qualitative data Wedges of the circle represent proportions of the data that share a common characteristic. Good practice requires including a title and either wedge labels or legend. 7 P a g e
Example The following data comes from a survey reported in USA Today. How long do we spend talking on the phone in the evening (after 5pm)? 500 people were surveyed. Time Number Fractional Part Percentage Number of Degrees <30 Minutes 296 296/500 59.2 0.592 * 360 = 213 30min. 1 hour 83 83/500 16.6 0.166 * 360 = 60 >1 hour 121 Total Fill in the above table, and draw a circle graph representing the data. Time-Series Shows data measurements in chronological order. Data are plotted in order of occurrence at regular intervals over a period of time. Example Suppose you have been in the walking/jogging exercise program for 20 weeks, and for each week you have recorded the distance you covered in 30 minutes. Week 1 2 3 4 5 6 7 8 9 10 Distance 1.5 1.4 1.7 1.6 1.9 2.0 1.8 2.0 1.9 2.0 Make a Time series graph for the above table 8 P a g e
Critical Thinking Which type of graph to use? Bar graphs are useful for quantitative or qualitative data. Pareto charts identify the frequency in decreasing order. Circle graphs display how a total is dispersed into several categories. Time-series graphs display how data change over time. Example - What type of graph would be best for showing the ice cream flavor preferences of a group of 100 children? a). Histogram c). Time series graph b). Pareto graph d). Circle graph Stem and Leaf Plots Displays the distribution of the data while maintaining the actual data values. Each data value is split into a stem and a leaf. A stem-and-leaf display is a method of exploratory data analysis that is used to rank-order and arrange data into groups. Stem-and-leaf displays organize numbers in much the same way alphabetization organizes words. 9 P a g e
Stem and Leaf Plot Construction Critical Thinking- By looking at the stem-and-leaf display sideways, we can see the distribution shape of the data. Large gaps between stems containing leaves, especially at the top or bottom, suggest the existence of outliers. Watch the outliers are they data errors or simply unusual data values? 10 P a g e
Example For the below table of data points from the winning scores of the conference championship games over the last 35 years make a stem-and-leaf plot of the data. 132 118 124 109 104 101 125 83 99 131 98 125 97 106 112 92 120 103 111 117 135 143 112 112 116 106 117 119 110 105 128 112 126 105 102 11 P a g e