Chapter 2 Student Lecture Notes
Department of Quantitative Methods & Information Systems
Business Statistics: Chapter 2
Graphs, Charts, and Tables: Describing Your Data
QMIS 120
Dr. Mohammad Zainal

Chapter Goals
After completing this chapter, you should be able to:
- Construct a frequency distribution both manually and with a computer
- Construct and interpret a histogram
- Create and interpret bar charts, pie charts, and stem-and-leaf diagrams
- Present and interpret data in line charts and scatter diagrams

Raw Data
The ages (in years) of 20 students selected from CBA are reported in the order in which they were collected. The data values are recorded in the following table.

Ages of 20 Students
21 19 24 18 20 19 30 22 24 23
20 21 22 25 23 19 20 18 24 25

Raw Data
The same students were asked about their status. The responses of the sample are recorded in the following table.

Status of 20 Students
J F F J S J F S J J
F F S J J F S S S S

Frequency Distributions
What is a frequency distribution? A frequency distribution is a list or a table containing the values of a variable (or a set of ranges within which the data fall) and the corresponding frequencies with which each value occurs (or the frequencies with which data fall within each range).

Frequency Distributions
Weekly Earnings of 100 Employees of a Company

Weekly Earnings (dollars)    Number of Employees (f)
401 to 600                    9
601 to 800                   22
801 to 1000                  39
1001 to 1200                 15
1201 to 1400                  9
1401 to 1600                  6
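
As a minimal sketch of building a frequency distribution by computer, the Python snippet below tallies the status responses of the 20 students from the raw-data table. It uses only the standard library's collections.Counter; printing the categories in the order F, J, S is an arbitrary choice, and the letter codes are kept exactly as recorded.

```python
from collections import Counter

# Status responses of the 20 students, copied from the raw-data table
status = "J F F J S J F S J J F F S J J F S S S S".split()

# Counter tallies how many times each distinct response occurs
freq = Counter(status)

# Print the frequency distribution in a fixed category order
for category in ("F", "J", "S"):
    print(category, freq[category])
print("Total", sum(freq.values()))
```

The tally gives F = 6, J = 7, and S = 7, which add up to the 20 students in the sample.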

Why Use Frequency Distributions?
- A frequency distribution is a way to summarize data.
- The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data.

Frequency Distribution: Discrete Data
Discrete data: possible values are countable.
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
Raw data:
5, 6, 1, 2, 4, 5, 7, 2, 3, 5, 1, 3, 2, 5, 0, 2, 2, 0, 7, 7, 1, 2, 4, 3, 5, 6, 7, 1, 1, 1, 1, 2, 5, 0, 0, 0, 1, 20, 7, 5, 3, 6, 2, 1, 6, 2, 1, 4, 2, 4, 5, 3, 1, 0, 2, 3, 6, 5, 7, 4, 1, 2, 3, 5, 6, 1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 3, 5, 1, 4, ...

Frequency Distribution: Discrete Data
This is called the single-value approach: each possible value forms its own class.

Number of Days Read    Frequency
0                       44
1                       24
2                       18
3                       16
4                       20
5                       22
6                       26
7                       30
Total                  200

Relative Frequency
Relative frequency: what proportion of the observations is in each category?

Number of Days Read    Frequency    Relative Frequency
0                       44           .22
1                       24           .12
2                       18           .09
3                       16           .08
4                       20           .10
5                       22           .11
6                       26           .13
7                       30           .15
Total                  200          1.00

Example: 44/200 = .22, so 22% of the people in the sample report that they read the newspaper 0 days per week.
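
The relative-frequency column is simply each frequency divided by the total number of observations. A small Python sketch of that arithmetic, using the frequencies from the table, might look like this:

```python
# Frequencies for "number of days read" (0 to 7), from the table
frequencies = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequencies.values())  # 200 customers

# Relative frequency = frequency of the value / total number of observations
relative = {days: f / total for days, f in frequencies.items()}

for days, rf in relative.items():
    print(f"{days} days: {rf:.2f}")
print(f"Total: {sum(relative.values()):.2f}")
```

Because every observation falls in exactly one category, the relative frequencies add up to 1.00.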

Frequency Distribution: Discrete Data
Example: Construct a frequency distribution table for the following data.

Home Runs Hit by Major League Baseball Teams During the 2002 Season
Team                 Home Runs    Team                 Home Runs
Anaheim              152          Milwaukee            139
Arizona              165          Minnesota            167
Atlanta              164          Montreal             162
Baltimore            165          New York Mets        160
Boston               177          New York Yankees     223
Chicago Cubs         200          Oakland              205
Chicago White Sox    217          Philadelphia         165
Cincinnati           169          Pittsburgh           142
Cleveland            192          St. Louis            175
Colorado             152          San Diego            136
Detroit              124          San Francisco        198
Florida              146          Seattle              152
Houston              167          Tampa Bay            133
Kansas City          140          Texas                230
Los Angeles          155          Toronto              187
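
The finished table for this exercise is not reproduced in these notes. Because almost every team has a different total, one reasonable choice is to group the values into classes; the layout below (six classes of width 20, starting at 120) is an assumption for illustration, not the official answer.

```python
# Home runs per team, 2002 season, in the order listed in the table (30 teams)
home_runs = [152, 139, 165, 167, 164, 162, 165, 160, 177, 223,
             200, 205, 217, 165, 169, 142, 192, 175, 152, 136,
             124, 198, 146, 152, 167, 133, 140, 230, 155, 187]

# Assumed class layout: width 20, first class starts at 120 (120-139, ..., 220-239)
start, width, n_classes = 120, 20, 6

counts = [0] * n_classes
for x in home_runs:
    counts[(x - start) // width] += 1  # index of the class that contains x

for i, c in enumerate(counts):
    low = start + i * width
    print(f"{low}-{low + width - 1}: {c}")
print("Total:", sum(counts))
```

With these assumed classes the frequencies come out as 4, 7, 11, 3, 3, and 2, for a total of 30 teams.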

Frequency Distribution: Continuous Data
Continuous data: may take on any value in some interval.
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Temperature is a continuous variable because it could be measured to any degree of precision desired.

Frequency Distribution Example
Data sorted from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Histograms
- The classes or intervals are shown on the horizontal axis.
- Frequency is measured on the vertical axis.
- Bars of the appropriate heights represent the number of observations within each class.
- Such a graph is called a histogram.
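
The bar heights of a histogram are just the class frequencies. The Python sketch below sorts the temperature data and counts the observations in each class, assuming classes of width 10 from 10 to 60 with the lower endpoint included and the upper endpoint excluded; it then draws a rough text histogram with one asterisk per observation.

```python
# Daily high temperatures for the 20 sampled winter days
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)  # ordered array: 12, 13, 17, ..., 58

# Assumed class endpoints: 10 to under 20, 20 to under 30, ..., 50 to under 60
edges = [10, 20, 30, 40, 50, 60]
classes = list(zip(edges[:-1], edges[1:]))

# Lower endpoint included, upper excluded, so each value lands in exactly one class
freq = [sum(1 for x in data if low <= x < high) for low, high in classes]

# Text histogram: bar length equals the class frequency
for (low, high), f in zip(classes, freq):
    print(f"{low:>2} to under {high:>2} | {'*' * f} ({f})")
```

Under these classes the frequencies are 3, 6, 5, 4, and 2.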

Histogram Example
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: Histogram of the temperature data. Horizontal axis: class midpoints and class endpoints; vertical axis: frequency. Bar heights 3, 6, 5, 4, and 2 for the classes 10 to under 20 through 50 to under 60.]
There are no gaps between the bars, since the data are continuous.

Questions for Grouping Data into Classes
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
- These questions are often answered by trial and error, subject to user judgment.
- The goal is to create a distribution that is neither too "jagged" nor too "blocky".
- The goal is to appropriately show the pattern of variation in the data.

How Many Class Intervals?
Many (narrow class intervals):
- may yield a very jagged distribution with gaps from empty classes
- can give a poor indication of how frequency varies across classes
[Figure: Histogram of the temperature data with narrow classes; horizontal axis: temperature (labels are upper class endpoints 4, 8, ..., 60); vertical axis: frequency.]
Few (wide class intervals):
- may compress variation too much and yield a blocky distribution
- can obscure important patterns of variation
[Figure: Histogram of the temperature data with wide classes; horizontal axis: temperature (labels are upper class endpoints 0, 30, 60); vertical axis: frequency.]

General Guidelines
Number of Data Points    Number of Classes
under 50                 5-7
50-100                   6-10
100-250                  7-12
over 250                 10-20
- Class widths can typically be reduced as the number of observations increases.
- Distributions with numerous observations are more likely to be smooth and to have gaps filled, since data are plentiful.
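
As a sketch of how these guidelines combine with the class-width formula W = (largest value - smallest value) / number of classes, the snippet below picks 5 classes for the 20 temperature readings (the "under 50" row suggests 5-7) and rounds the width up to a whole number. Choosing exactly 5 classes and rounding with math.ceil are assumptions for illustration.

```python
import math

# Daily high temperatures for the 20 sampled winter days
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

n = len(temps)  # 20 data points, so the guideline suggests 5-7 classes
k = 5           # assumed number of classes, within that range

# W = (largest value - smallest value) / number of classes
raw_width = (max(temps) - min(temps)) / k  # (58 - 12) / 5

# Round up so that k classes of this width span the whole range of the data
width = math.ceil(raw_width)

print(f"n = {n}, raw width = {raw_width:.1f}, class width used = {width}")
```

Here the raw width is 46 / 5 = 9.2, which rounds up to a class width of 10; five classes of width 10 cover a span of 50, more than the data range of 46.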

Class Width
The class width is the distance between the lowest possible value and the highest possible value for a frequency class. The class width is

$W = \dfrac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of Classes}}$

Histograms in Excel
1. Select the Data tab.
2. Click Data Analysis.
3. Choose Histogram.

Histograms in Excel (continued)
4. Input the data and bin ranges, and select Chart Output.

Ogives
- An ogive is a graph of the cumulative relative frequencies from a relative frequency distribution.
- Ogives are sometimes shown in the same graph as a relative frequency histogram.

Ogives (continued)
Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Ogive Example
[Figure: Histogram of the temperature data (left axis: frequency) with the ogive overlaid (right axis: cumulative frequency, 0% to 100%); horizontal axis: class midpoints and class endpoints.]
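
The points on an ogive are running totals of the class frequencies, expressed as a percentage of all observations and plotted at the upper class endpoints. The Python sketch below assumes the same classes as the histogram example (10 to under 20 through 50 to under 60, frequencies 3, 6, 5, 4, 2) and starts the curve at 0% at the lowest endpoint.

```python
from itertools import accumulate

# Assumed classes 10 to under 20, ..., 50 to under 60 and their frequencies
upper_endpoints = [20, 30, 40, 50, 60]
freq = [3, 6, 5, 4, 2]
n = sum(freq)  # 20 observations

# Cumulative frequency: running total of the class frequencies
cum_freq = list(accumulate(freq))
cum_pct = [100 * c / n for c in cum_freq]

# Each point gives the percentage of observations below an upper endpoint;
# the curve starts at 0% at the lowest class endpoint (10)
points = [(10, 0.0)] + list(zip(upper_endpoints, cum_pct))
for x, p in points:
    print(f"below {x}: {p:5.1f}%")
```

The cumulative percentages come out as 15%, 45%, 70%, 90%, and 100%, so the ogive always ends at 100%.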

Ogives in Excel
Excel will show the ogive graphically if the Cumulative Percentage option is selected in the Histogram dialog box.

Other Graphical Presentation Tools
Categorical data: bar charts, pie charts
Quantitative data: stem-and-leaf diagrams

Bar and Pie Charts
- Bar charts and pie charts are often used for qualitative (categorical) data.
- The height of a bar or the size of a pie slice shows the frequency or percentage for each category.

Bar Chart Example 1
[Figure: Horizontal bar chart "Investor's Portfolio" with bars for Savings, CD, Bonds, and Stocks; horizontal axis: amount in $1000s (0 to 50).]
(Note that bar charts can also be displayed with vertical bars.)

Bar Chart Example 2
Number of Days Read    Frequency
0                       44
1                       24
2                       18
3                       16
4                       20
5                       22
6                       26
7                       30
Total                  200
[Figure: Vertical bar chart "Newspaper readership per week"; horizontal axis: number of days the newspaper is read per week (0 to 7); vertical axis: frequency (0 to 50).]

Pie Chart Example
Current Investment Portfolio
Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100
[Figure: Pie chart of the portfolio: Stocks 42%, Bonds 29%, Savings 15%, CD 14%.]
(The variables are qualitative. Percentages in the chart are rounded to the nearest percent.)
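
Each pie slice's percentage is its amount divided by the total, and its angle is the same share of 360 degrees. A minimal Python sketch of that arithmetic for the portfolio table:

```python
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(portfolio.values())  # 110.0 (thousands of dollars)

# Each slice's share of the total, as a percentage and as an angle out of 360 degrees
percent = {k: 100 * v / total for k, v in portfolio.items()}
angle = {k: 360 * v / total for k, v in portfolio.items()}

for k in portfolio:
    print(f"{k:<8} {percent[k]:6.2f}%  {angle[k]:6.1f} deg")
```

Rounded to two decimals, the percentages match the table (42.27, 29.09, 14.09, 14.55), and the slice angles always add up to 360 degrees.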

Tabulating and Graphing Multivariate Categorical Data
Investment in thousands of dollars
Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32.0          44            19.0           95
CD                     15.5          20            13.5           49
Savings                16.0          28             7.0           51
Total                 110.0         147            67.0          324

Tabulating and Graphing Multivariate Categorical Data (continued)
Side-by-side charts
[Figure: Side-by-side horizontal bar chart "Comparing Investors" for Savings, CD, Bonds, and Stocks, with one bar each for Investor A, Investor B, and Investor C; horizontal axis: 0 to 60.]
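
The margins of a two-way table are row sums and column sums. The sketch below stores the investment table as a list of rows (a layout chosen here for illustration) and computes the category totals, investor totals, and grand total.

```python
categories = ["Stocks", "Bonds", "CD", "Savings"]
investors = ["Investor A", "Investor B", "Investor C"]

# amounts[i][j] = amount (thousands of dollars) of category i held by investor j
amounts = [
    [46.5, 55, 27.5],
    [32.0, 44, 19.0],
    [15.5, 20, 13.5],
    [16.0, 28, 7.0],
]

row_totals = [sum(row) for row in amounts]        # total per investment category
col_totals = [sum(col) for col in zip(*amounts)]  # total per investor
grand_total = sum(row_totals)

print(f"{'Category':<10}" + "".join(f"{name:>12}" for name in investors) + f"{'Total':>8}")
for cat, row, tot in zip(categories, amounts, row_totals):
    print(f"{cat:<10}" + "".join(f"{v:>12.1f}" for v in row) + f"{tot:>8.1f}")
print(f"{'Total':<10}" + "".join(f"{v:>12.1f}" for v in col_totals) + f"{grand_total:>8.1f}")
```

As a check, the row totals and the column totals must both add up to the same grand total (324).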

Side-by-Side Chart Example
Sales by quarter for three sales territories:
         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9
[Figure: Side-by-side vertical bar chart of quarterly sales for East, West, and North; horizontal axis: 1st Qtr to 4th Qtr; vertical axis: 0 to 60.]

Dot Plot
- One of the simplest methods for graphing and understanding quantitative data is to create a dot plot.
- A horizontal axis shows the range of values for the observations.
- Each data point is represented by a dot placed above the axis.

Dot Plot
- Dot plots can help us detect outliers (also called extreme values) in a data set.
- Outliers are values that are extremely large or extremely small with respect to the rest of the data values.

Dot Plot
Example: The following table lists the number of runs batted in (RBIs) during the 2004 Major League Baseball playoffs by members of the Boston Red Sox team with at least one at-bat. Create a dot plot for these data.
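
The RBI table is not reproduced in these notes, so as a stand-in this Python sketch applies the same two-step dot-plot construction (an axis from the minimum to the maximum, then one dot per observation) to the ages of the 20 CBA students from the raw-data table. The text rendering, with dots stacked above each value, is an illustrative choice.

```python
from collections import Counter

# Ages (in years) of the 20 CBA students from the raw-data table
ages = [21, 19, 24, 18, 20, 19, 30, 22, 24, 23,
        20, 21, 22, 25, 23, 19, 20, 18, 24, 25]

counts = Counter(ages)
lo, hi = min(ages), max(ages)  # Step 1: the axis runs from the minimum to the maximum
axis = range(lo, hi + 1)
height = max(counts.values())  # tallest stack of dots

# Step 2: one dot per observation, stacked above its value (top row printed first)
for level in range(height, 0, -1):
    print("".join(" o  " if counts[v] >= level else "    " for v in axis))
print("".join(f"{v:^4}" for v in axis))
```

The single dot at 30 sits well apart from the cluster between 18 and 25, which is the kind of value a dot plot makes easy to flag as a possible outlier.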

Dot Plot
Step 1. Draw a horizontal line that includes the minimum and the maximum values in the data set.
Step 2. Place a dot above the value on the number line that represents each RBI listed in the table.

Stem-and-Leaf Diagram
Another simple way to see distribution details in quantitative data.
Method:
1. Separate the sorted data series into leading digits (the stem) and trailing digits (the leaves).
2. List all stems in a column from low to high.
3. For each stem, list all associated leaves.

Example
Data sorted from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Here, use the 10s digit for the stem unit:
                  Stem    Leaf
12 is shown as    1       2
35 is shown as    3       5

Example
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Completed stem-and-leaf diagram:
Stem    Leaves
1       2 3 7
2       1 4 4 6 7 7
3       0 2 5 7 8
4       1 3 4 6
5       3 8
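
The three-step method above can be sketched in Python for the temperature data, with the tens digit as the stem and the units digit as the leaf. Sorting first keeps the leaves in order, and every stem from the lowest to the highest is listed even if it had no leaves.

```python
from collections import defaultdict

# Daily high temperatures for the 20 sampled winter days
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Stem = tens digit, leaf = units digit; sorting first keeps leaves in order
stems = defaultdict(list)
for x in sorted(temps):
    stems[x // 10].append(x % 10)

# List every stem from low to high, including any stem with no leaves
for stem in range(min(stems), max(stems) + 1):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

The output reproduces the completed diagram, for example the row "2 | 1 4 4 6 7 7" for the values 21 through 27.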

Using Other Stem Units
Using the 100s digit as the stem:
Round off the 10s digit to form the leaves.
                       Stem    Leaf
613 would become       6       1
776 would become       7       8
...
1224 becomes           12      2

Line Charts and Scatter Diagrams
- Line charts show values of one variable vs. time.
- Time is traditionally shown on the horizontal axis.
- Scatter diagrams show points for bivariate data: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis.
- A trend line is a line that provides an approximation of the relationship between the two variables.
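
Returning to the 100s-digit stem unit described above, the rounding can be written as a small helper. The function name hundreds_stem_leaf is hypothetical, and the rule that a 5 in the units place rounds up is an assumption; rounding to the nearest ten before splitting lets a carry move into the stem (for example, 995 becomes stem 10, leaf 0).

```python
def hundreds_stem_leaf(x: int) -> tuple[int, int]:
    """Hypothetical helper: (stem, leaf) for a non-negative integer x, 100s digit as stem.

    x is first rounded to the nearest ten (a 5 in the units place rounds up, an
    assumption); the leaf is the rounded 10s digit and any carry goes into the stem.
    """
    tens = (x + 5) // 10          # x rounded to the nearest ten, counted in tens
    return tens // 10, tens % 10  # stem = hundreds, leaf = rounded tens digit


for value in (613, 776, 1224):
    stem, leaf = hundreds_stem_leaf(value)
    print(f"{value} -> stem {stem}, leaf {leaf}")
```

This reproduces the examples: 613 gives 6 | 1, 776 gives 7 | 8, and 1224 gives 12 | 2.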

Line Chart Example
Year    Inflation Rate (%)
1985    3.56
1986    1.86
1987    3.65
1988    4.14
1989    4.82
1990    5.40
1991    4.21
1992    3.01
1993    2.99
1994    2.56
1995    2.83
1996    2.95
1997    2.29
1998    1.56
1999    2.21
2000    3.36
2001    2.85
2002    1.59
2003    2.27
2004    2.68
2005    3.39
2006    3.24
[Figure: Line chart "U.S. Inflation Rate"; horizontal axis: year (1984 to 2006); vertical axis: inflation rate (%), 0 to 6.]

Scatter Diagram Example
Volume per Day    Cost per Day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200
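
These notes do not say how a trend line is fitted. One common method, swapped in here as an assumption, is ordinary least squares; the Python sketch below fits it by hand to the volume and cost data so that no plotting or statistics library is needed.

```python
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]         # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]  # cost per day

n = len(volume)
mean_x = sum(volume) / n
mean_y = sum(cost) / n

# Least-squares slope: sum of cross-deviations divided by sum of squared x-deviations
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(volume, cost))
sxx = sum((x - mean_x) ** 2 for x in volume)
slope = sxy / sxx
intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)

print(f"trend line: cost = {intercept:.2f} + {slope:.2f} * volume")
```

A positive slope reflects what the scatter diagram shows: days with higher volume tend to have higher cost.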

Types of Relationships
Linear relationships
[Figure: Two scatter plots of Y against X showing linear relationships.]

Types of Relationships (continued)
Curvilinear relationships
[Figure: Two scatter plots of Y against X showing curvilinear relationships.]

Types of Relationships (continued)
No relationship
[Figure: Two scatter plots of Y against X showing no relationship.]

Scatter Diagram Example
Example: Draw a scatter diagram for the following data, which list the total amount spent (in KD) by customers in a restaurant.
x (number of persons)    1    1    2    2    3    3    4    4    5    5
y (KD)                   8    7   14   18   20   22   21   26   29   33

Cross-Tabulation
- A cross tab is a tabular summary of data for two variables, usually presented in a matrix format. This is unlike a frequency distribution, which summarizes only one variable.
- A contingency table describes the distribution of two or more variables simultaneously.
- Each cell shows the number of respondents that gave a specific combination of responses.
- It can be used with any level of data. (What are the levels of data?)

Cross-Tabulation Example
Example: In a survey of quality rating and meal price conducted by a consumer restaurant review agency, the following table was produced:
Restaurant    Quality Rating    Meal Price
1             Good              18
2             Very Good         22
3             Good              28
4             Excellent         38
5             Very Good         33
6             Good              28
...

Cross-Tabulation Example
Quality rating is a qualitative variable with the rating categories good, very good, and excellent.

Cross-Tabulation Example
We can also find the row percentages.

Cross-Tabulation Example
Dividing the totals in the right margin of the cross tab by the grand total provides the relative and percentage frequency distributions for the quality rating variable.

Cross-Tabulation Example
Try it for the meal price (column totals).

Cross-Tabulation Example
Example: The following data are for 30 observations involving two qualitative variables, x (A, B, and C) and y (1 and 2).

Obs.  x  y      Obs.  x  y      Obs.  x  y
1     A  1      11    A  1      21    C  2
2     B  1      12    B  1      22    B  1
3     B  1      13    C  2      23    C  2
4     C  2      14    C  2      24    A  1
5     B  1      15    C  2      25    B  1
6     C  2      16    B  2      26    C  2
7     B  1      17    C  1      27    C  2
8     C  2      18    B  1      28    A  1
9     A  1      19    C  1      29    B  1
10    B  1      20    B  1      30    B  2

1. Construct a cross tabulation for the data.
2. Calculate the row percentages.
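
One way to check answers to this exercise is to let the computer count each (x, y) combination. The Python sketch below, using the standard library's collections.Counter, builds the cross tabulation and then divides each cell by its row total to get row percentages; the printed layout is an illustrative choice.

```python
from collections import Counter

# Observations 1-30 from the table, in order
x = "A B B C B C B C A B A B C C C B C B C B C B C A B C C A B B".split()
y = [1, 1, 1, 2, 1, 2, 1, 2, 1, 1,
     1, 1, 2, 2, 2, 2, 1, 1, 1, 1,
     2, 1, 2, 1, 1, 2, 2, 1, 1, 2]

# Count each (x, y) combination; missing combinations count as 0
table = Counter(zip(x, y))
rows, cols = ["A", "B", "C"], [1, 2]

print(f"{'x':<3}{'y=1':>5}{'y=2':>5}{'Total':>7}")
for r in rows:
    counts = [table[(r, c)] for c in cols]
    print(f"{r:<3}{counts[0]:>5}{counts[1]:>5}{sum(counts):>7}")

# Row percentages: each cell divided by its row total
row_pct = {}
for r in rows:
    total = sum(table[(r, c)] for c in cols)
    row_pct[r] = [100 * table[(r, c)] / total for c in cols]
    print(r, [f"{p:.1f}%" for p in row_pct[r]])
```

The counts are A: 5 and 0, B: 11 and 2, C: 2 and 10, so x = A always occurs with y = 1, while most C observations have y = 2.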

Chapter Summary
Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
- Table
- Graph
Techniques reviewed in this chapter:
- Frequency distributions, histograms, and ogives
- Bar charts and pie charts
- Stem-and-leaf diagrams
- Line charts and scatter diagrams

Copyright
The materials of this presentation were mostly taken from the PowerPoint files accompanying Business Statistics: A Decision-Making Approach, 7e, 2008 Prentice-Hall, Inc.