Chapter 4 Displaying and Describing Categorical Data

Chapter Goals

Learning Objectives

This chapter presents three basic techniques for summarizing categorical data. After completing this chapter you should be able to:

* Create a pivot table in Excel
* Interpret frequency tables for categorical variables
* Construct, customize and interpret bar charts and pie charts
* Construct and interpret contingency tables by making segmented and cluster bar charts

Now that we've learned how to collect data through sampling (Chapter 3), we'll learn how to summarize and interpret categorical data. In the coming chapters, we'll learn different analytical tools to deal with quantitative data.

We summarize categorical data with counts and percents. Once we have a data set that has been summarized (say with a frequency, relative frequency, or contingency table), Excel can make bar and pie charts from it. Sometimes data is presented in summarized format. If not, we can make a pivot table.

4.1 Summarizing a Categorical Variable

The three rules for Data Analysis are: make a picture, make a picture, make a picture. OK, let's begin. In Chapter 15, we will find data that was collected in the Chicago Female Fashion Study to determine characteristics of frequent shoppers at different department stores in the Chicago area. One of the characteristics measured was Age. Open the workbook Market_segmentation.xls from the Chapter 15 datasets. Notice that the data is in two columns that contain only the labels for two categorical variables, Age and Shopping Frequency. This worksheet is an example of un-summarized data, or stacked data. To analyze it, we need to make frequency or contingency tables. These tables are also called pivot tables in Excel.

4.1.1 Frequency Tables with Excel

Excel calls frequency tables Pivot Tables. We'll make a pivot table for Age. From the Insert Tab choose the Pivot Table button from the Tables group on the far left of the ribbon.
Click on the radio button next to Select a table or range, and type in A1:B1001 to specify the data range. I've chosen to have the pivot table appear in a New Worksheet, so I've selected that radio button. Click OK.

The new worksheet opens and looks a bit threatening. As shown in the picture below, the left side of the worksheet area has field areas where you can drag and drop your variable names to create the frequency table or contingency table (discussed in Section 4.3). [Note: If your table does not look like mine, you can fix that by right-clicking on the
table that does appear, and selecting Pivot Table Options. Now click on the Display tab and choose Classic Table Layout.] The right side of the worksheet area is called the Pivot Table Field List. You can drag and drop the names of the variables into these boxes as well.

Drag and drop the variable Age from the Pivot Table Field List on the right into the table on the left, in the box marked Drop Row Fields Here. Drag and drop the variable Age one more time into the table on the left, this time into the box marked Drop Value Fields Here. Notice that the Pivot Table Field List has been updated and should look like the screenshot to the right.

The pivot table, or frequency table, shows just the counts for each category. It is important to also report the relative frequency (although we can do that in our heads easily for this example). Drag and drop the variable Age from the Pivot Table Field List into the bottom right box in the same window, titled Values. The resulting configuration of the Pivot Table Field List should look like the screenshot to the right, and you should now have two Age columns in your pivot table.

Click on the cell in the left pivot table that says Count of Age2. From the Pivot Table Tools Tab, choose the Options sub tab, and click on the Calculations pull-down menu. Choose the Summarize Values By button and select More Options. A dialog box titled Value Field Settings opens. The Custom Name field should read Count of Age2. We'll rename that.
In the field for Custom Name, type in something descriptive, like Percentage Age. Select the Show Values As tab and in the pull-down menu choose % of Grand Total, as shown in the screenshot to the right. Click OK. Your pivot table on the left should now look like the following:

We see that the largest group of shoppers was between the ages of 25 and 44. These women made up 54.8% of the whole group surveyed.

4.1.2 Frequency tables with XLSTAT [1]

Making a frequency table in XLSTAT is a bit easier. From the worksheet, choose the Visualizing data toolbar from the XLSTAT tab as we did in Chapter 2. Choose the menu option Univariate plots as shown to the right.

Begin with the General tab in the dialog box that opens. This time we have Qualitative data, so make sure the box next to that option is checked (and the box next to the Quantitative option is unchecked), as in the screenshot to the right. Our Age data are in cells A1:A1001. You can type in the cells directly, or click in the appropriate field and then select column A in the dataset.

[1] Do you not see the XLSTAT tab? If not, review the troubleshooting tips given in Chapters 2 and 3 of this manual. An easy way to avoid this problem is to always open Excel through the XLSTAT application.
We'll use the Sheet option again and make sure the box next to Sample labels is checked. Now click on the Charts(2) tab. Choose Relative Frequencies to display percents, and select both Bar charts and Pie charts. Click OK. The screenshot is shown to the right.

A new worksheet will open titled Uni1. It contains our frequency table and charts (which we will need later in this chapter). A portion of the output is shown in the screenshot to the right. I found this table on the far right of the table titled Descriptive Statistics. It displays both the frequency and relative frequency tables for Age.

4.2 Displaying a Categorical Variable

The main types of charts for categorical data that we will make in this class are bar charts, pie charts and segmented bar charts (this last chart will be covered in Section 4.3). Charts are essential for understanding and interpreting our data. Remember, make a picture, make a picture, make a picture!

4.2.1 Bar Charts using Excel

In Excel, it is easy to make charts from pivot tables. Click anywhere in our pivot table created in Section 4.1. From the Pivot Table Tools Tab, choose the Options sub tab, and click on the Pivot Chart button in the Tools group. In the dialog window, choose the top left chart option for the simplest bar chart as shown on the next page. Click OK.
The chart that appears displays both of our columns. Choose to display the chart with either counts or percentages, but not both. The chart with our percentages is always the better choice. Right-click on the Count of Age button on the upper left and choose Remove Field.

The chart shows the same information as our relative frequency table, but in a visual form that satisfies the Area Principle. Here we can see how dominant the age group 25 to 44 was in the survey, and that the other three groups are more equally distributed.

4.2.2 Pie Charts using Excel

Pie charts automatically display the relative frequency distribution of a categorical variable. Here we will use Excel to make pie charts from the data in our pivot table. As we did to make a bar chart, we click anywhere in our pivot table created in Section 4.1. From the Pivot Table Tools Tab, choose the Options sub tab, and click on the Pivot Chart button in the Tools group.
In the dialog window, choose the first chart option in the Pie group for the simplest pie chart as shown on the previous page. Click OK.

The chart that pops up on your screen is nice, but we can change the layout and style to add important information. Click anywhere on the chart to access the Pivot Chart Tools tab. Under this tab are three sub tabs: Design, Layout and Format. Choose the Design tab. The Chart Layouts group now becomes available. It contains many styles. I've chosen the third one on the second row to include percentages on the pie, and a legend describing the categories. The modified picture is shown below.

4.2.3 Bar and Pie Charts using XLSTAT

We actually made these charts already, back in Section 4.1.2! If you review your output in the sheet Uni1 and scroll down, you will find a bar chart and a pie chart that look just like the ones we made in Excel. Since these are Excel charts, you can modify them just as we did in the previous sections. To do so, click anywhere on the chart you wish to modify to access the Chart Tools tab.

4.3 Exploring Two Categorical Variables: Contingency Tables

Contingency tables summarize the relationship between two categorical variables, or differences between subgroups of a population. Excel calls these tables pivot tables again. In the last several sections, we have looked at the age distribution of women in a market research survey. We might want to know whether how often women shop has some relationship to their ages.

4.3.1 Contingency Tables and Segmented Bar Charts using Excel

Segmented bar charts compare the distributions of shopping frequency between the different age groups by displaying the conditional distribution of Shopping Frequency for each Age group. Return to the pivot table worksheet.
In the Pivot Table Field List, check the boxes next to both variables, Age and Shopping Frequency. Drag and drop the variable name Age into the pivot table region on the left, into the box marked Drop Row Fields Here. Drag and drop the variable name Shopping Frequency into the pivot table, in the box marked Drop Column Fields Here. Once more, drag and drop the variable Age into the center of the pivot table, marked Drop Value Fields Here. The Pivot Table Field List now looks like the screenshot to the right, and the pivot table gives the contingency table for the counts of individual women in each of 16 different categories.

We'll rename the table and ask for the results in conditional percentages. Right-click on the upper left corner of the pivot table, in the cell titled Count of Age. Choose the option Value Field Settings. (We came to this dialog box before through the Pivot Table Tools tab by choosing the Options tab and then the button Calculations.) See the picture to the right.

The dialog box can be used as before. We can rename the table; I've chosen the name Frequency by Age. This time under the tab Show Values As, choose % of Row Total as our calculation.
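For readers who want to check the row-percentage calculation outside Excel, it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six records and their category labels are invented for the example and are not taken from Market_segmentation.xls, and the function name row_percentages is hypothetical.

```python
from collections import Counter

# Hypothetical stacked data: one (age group, shopping frequency) pair per woman.
# These records are made up for illustration; they are not the survey data.
records = [
    ("18-24", "Never"), ("18-24", "5+ times"),
    ("25-44", "Never"), ("25-44", "Never"),
    ("25-44", "1-2 times"), ("25-44", "5+ times"),
]

def row_percentages(pairs):
    """Return {row: {column: percent of that row's total}}."""
    cell_counts = Counter(pairs)                    # count for each (row, column) cell
    row_totals = Counter(row for row, _ in pairs)   # total count for each row
    table = {}
    for (row, col), n in cell_counts.items():
        table.setdefault(row, {})[col] = 100 * n / row_totals[row]
    return table

table = row_percentages(records)
for age, distribution in table.items():
    print(age, distribution)
```

Each inner dictionary is the conditional distribution of shopping frequency for one age group, so its percentages add up to 100, just as each row of the pivot table does.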
We are now ready to make a chart. Click anywhere in the pivot table. From the Pivot Table Tools Tab, choose the Options sub tab, and click on the Pivot Chart button in the Tools group. In the dialog window, choose the third chart option in the Column group for the simplest segmented bar chart. Click OK.

This chart shows the distribution of Shopping Frequency for each Age group. Two observations can be made from it. The percentage of women shopping Never/Hardly ever is lower among the older women (ages 45 and up). Also, the percentage of women in the 25-44 age group who shop 5 or more times per year is markedly lower than in the other age groups. These observations are easy to spot in the segmented bar chart, and not so easy to spot in a table of numbers.

4.3.2 Cluster bar charts using Excel

Another comparison of conditional percentages can be made with a side-by-side or cluster bar chart. Since Excel's cluster bar charts do not automatically compute percentages (as the segmented bar chart in Section 4.3.1 does), we have to compute the conditional percentages explicitly. No problem! Our pivot table already shows those percentages. Click anywhere in the pivot table. From the Pivot Table Tools Tab, choose the Options sub tab, and click on the Pivot Chart button in the Tools group. In the dialog window, choose the first chart option in the Column group for the simplest cluster bar chart. Click OK.
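To see what a cluster bar chart does with conditional percentages, here is a rough text-only sketch in Python. It is not how Excel draws the chart. The percentages are invented for the example (they are not the survey results), and the function name cluster_bars and its scale parameter are hypothetical.

```python
# Hypothetical conditional percentages (each row sums to 100); invented values,
# not the results from the Chicago survey.
row_pct = {
    "18-24": {"Never": 50.0, "5+ times": 50.0},
    "25-44": {"Never": 75.0, "5+ times": 25.0},
}

def cluster_bars(table, scale=5):
    """Build text bars grouped by row; one '#' stands for `scale` percent."""
    lines = []
    for group, distribution in table.items():
        lines.append(group)                     # group label, e.g. an age range
        for category, pct in distribution.items():
            bar = "#" * round(pct / scale)      # bar length proportional to percent
            lines.append(f"  {category:<10} {bar} {pct:.1f}%")
    return lines

chart = cluster_bars(row_pct)
print("\n".join(chart))
```

Because every bar is drawn to the same scale, bar lengths can be compared both within an age group and across age groups, which is the point of placing the bars side by side.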
4.3.3 Contingency tables with XLSTAT

Return to the original workbook with the data. From the XLSTAT tab choose the Preparing data toolbar and choose the command Create a contingency table.

In the dialog box that opens, first type in our variable cells under the General tab. Our row variable is Age, so type in A1:A1001, or click in the field and then on the column in the worksheet to select it. Our column variable will be Shopping Frequency, so type in B1:B1001 there, or select the column. Complete the box as shown in the screenshot.

Now choose the Outputs tab. Select Contingency table and Percentages/Row as shown above. Click OK. A new worksheet will open titled Contingency Table. In it are our results. See the screenshot below.
We have considered four different kinds of charts in this chapter.

* The basic bar chart and pie chart from Section 4.2 summarize one categorical variable.
* The segmented and cluster charts from Section 4.3 summarize the relationship between two categorical variables.

When using Excel, the advantage of the pie chart and segmented bar chart is that Excel automatically presents results using percentages, not just counts. In both Section 4.1 and Section 4.3, we took a separate step to recalculate and view the counts as percentages. We would not have needed that step if we had only intended to make those two types of charts.
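The separate counts-to-percentages step from Section 4.1 is simple arithmetic, and a short Python sketch makes it concrete. The eight age labels below are invented for the example and are not the survey data, and the function name frequency_table is hypothetical.

```python
from collections import Counter

# Hypothetical stacked column of age labels (invented, not the survey data).
ages = ["25-44", "18-24", "25-44", "45-64", "25-44", "65+", "18-24", "25-44"]

def frequency_table(values):
    """Return a list of (category, count, percent of grand total) rows."""
    counts = Counter(values)     # frequency of each category
    total = len(values)          # grand total
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]

rows = frequency_table(ages)
for category, count, percent in rows:
    print(f"{category:>6}  {count:>3}  {percent:5.1f}%")
```

The count column corresponds to the frequency table and the percent column to the relative frequency table, the same pair of columns the pivot table shows after choosing % of Grand Total.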