Sometimes We Must Raise Our Voices



Similar documents
Quantitative Displays for Combining Time-Series and Part-to-Whole Relationships

Principles of Data Visualization

Visualizing Multidimensional Data Through Time Stephen Few July 2005

Stephen Few, Perceptual Edge Visual Business Intelligence Newsletter November/December 2008

Quantitative vs. Categorical Data: A Difference Worth Knowing Stephen Few April 2005

Introduction to Geographical Data Visualization

Visualization Quick Guide

Dashboard Design for Rich and Rapid Monitoring

Common Mistakes in Data Presentation Stephen Few September 4, 2004

This file contains 2 years of our interlibrary loan transactions downloaded from ILLiad. 70,000+ rows, multiple fields = an ideal file for pivot

Numbers as pictures: Examples of data visualization from the Business Employment Dynamics program. October 2009

Principles of Data Visualization for Exploratory Data Analysis. Renee M. P. Teate. SYS 6023 Cognitive Systems Engineering April 28, 2015

CSU, Fresno - Institutional Research, Assessment and Planning - Dmitri Rogulkin

GRAPHING DATA FOR DECISION-MAKING

Choosing Colors for Data Visualization Maureen Stone January 17, 2006

Copyright 2008 Stephen Few, Perceptual Edge Page of 11

Save the Pies for Dessert

Effective Big Data Visualization

Best Practices for Dashboard Design with SAP BusinessObjects Design Studio

Effectively Communicating Numbers

Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics

Criteria for Evaluating Visual EDA Tools

Top 5 best practices for creating effective dashboards. and the 7 mistakes you don t want to make

Data Visualization. BUS 230: Business and Economic Research and Communication

Expert Color Choices for Presenting Data

Chapter 6: Constructing and Interpreting Graphic Displays of Behavioral Data

Excel Chart Best Practices

top 5 best practices for creating effective campaign dashboards and the 7 mistakes you don t want to make

Unit Charts Are For Kids

Plotting Data with Microsoft Excel

STEPHEN FEW SHOW ME THE NUMBERS

Improve Your Vision and Expand Your Mind with Visual Analytics

Intro to Excel spreadsheets

(1) Organize the data

The basics of storytelling through numbers

Exploring All of Your Options

Common Tools for Displaying and Communicating Data for Process Improvement

Introduction to Dashboards in Excel Craig W. Abbey Director of Institutional Analysis Academic Planning and Budget University at Buffalo

Dashboard Design for Real-Time Situation Awareness. Stephen Few, author of Information Dashboard Design

Data Visualization Techniques

Problems With Using Microsoft Excel for Statistics

Data Visualization Handbook

East Coast Visual Business Intelligence Workshop

A Picture Really Is Worth a Thousand Words

The Power of Visual Analytics

Effective Visualization Techniques for Data Discovery and Analysis

INF2793 Research Design & Academic Writing Prof. Simone D.J. Barbosa simone@inf.puc-rio.br sala 410 RDC. presentation

Tableau Data Visualization Cookbook

This document contains Chapter 2: Statistics, Data Analysis, and Probability strand from the 2008 California High School Exit Examination (CAHSEE):

an introduction to VISUALIZING DATA by joel laumans

Good Graphs: Graphical Perception and Data Visualization

DATA VISUALIZATION 101: HOW TO DESIGN CHARTS AND GRAPHS

MARS STUDENT IMAGING PROJECT

Representing Data with Frequency Graphs

Visualization Software

Analyzing Experimental Data

Using Excel for descriptive statistics

BANA6037 Data Visualization Fall Semester 2014 (14FS) / First Half Session Section 001 S 9:00a- 12:50p Lindner 107

ABSTRACT INTRODUCTION DATA FEEDS TO THE DASHBOARD

Plots, Curve-Fitting, and Data Modeling in Microsoft Excel

Briefing document: How to create a Gantt chart using a spreadsheet

Get to the Point HOW GOOD DATA VISUALIZATION IMPROVES BUSINESS DECISIONS

Visualizing Data from Government Census and Surveys: Plans for the Future

Chapter 4 Creating Charts and Graphs

Graphing Information

The North Carolina Health Data Explorer

Precise Leads White Paper. 6 Best Practices: How Insurance Agents Can Use the Web to Build Long-term Customer Relationships

Interactive Voting System. IVS-Basic IVS-Professional 4.4

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Scatter Plots with Error Bars

Ingo Hilgefort, Visual BI Best Practices for Dashboard Design with SAP BusinessObjects Design Studio Session 2684

Use Cases and Design Best Practices

Choosing a successful structure for your visualization

Basic Understandings

Gestation Period as a function of Lifespan

Unresolved issues with the course, grades, or instructor, should be taken to the point of contact.

Data Visualization for the Practitioner

START TIME: 12 NOON Eastern

Get The Picture: Visualizing Financial Data part 1

Designing Information Displays. Overview

Dashboard Design: Beyond Meters, Gauges, and Traffic Lights

Data Visualization. Introductions

DATA VISUALIZATION PAST, PRESENT, AND FUTURE STEPHEN FEW PERCEPTUAL EDGE

a white paper Marketing in a Mobile World

Atomic Force Microscope and Magnetic Force Microscope Background Information

Regulatory Compliance Needs Process Management

5 Tips for Creating Compelling Dashboards

Demographics of Atlanta, Georgia:

Create a Data Visualization Cookbook Using Excel. Yueyue Fan, Ethan Pan, Cheryl M. Ackerman

SAS BI Dashboard 3.1. User s Guide

Exercise 1: How to Record and Present Your Data Graphically Using Excel Dr. Chris Paradise, edited by Steven J. Price

Infographics in the Classroom: Using Data Visualization to Engage in Scientific Practices

Create Charts in Excel

Creating Bar Charts and Pie Charts Excel 2010 Tutorial (small revisions 1/20/14)

Data Visualization. or Graphical Data Presentation. Jerzy Stefanowski Instytut Informatyki

Exploratory Data Analysis with R

Transcription:

perceptual edge Sometimes We Must Raise Our Voices Stephen Few, Perceptual Edge Visual Business Intelligence Newsletter January/February 9 When we create a graph, we design it to tell a story. To do this, we must fi rst fi gure out what the story is. Next, we must make sure that the story is presented simply, clearly, and accurately, and that the most important parts will demand the most attention. When we communicate verbally, there are times when we need to raise our voices to emphasize important points. Similarly, when we communicate graphically, we must fi nd ways to make the important parts stand out visually. My original thinking about graph design was formed almost entirely by the work of Edward Tufte. I owe him not only for the formative development of my knowledge, but also for inspiring me to pursue this line of work in the fi rst place. I left his one-day seminar over years ago with my mind ablaze and my heart beginning to nourish the kernel of an idea that eventually grew into my current profession. Even after many years of working in the fi eld of data visualization, which has involved a great deal of experience and study that has expanded my expertise into many areas that Tufte hasn t specifi cally addressed, I have only on rare occasions discovered reasons to disagree with any of his principles. The topic that I m addressing in this article, however, deals with one of those rare disagreements. The Data-Ink Ratio In the book The Visual Display of Quantitative Information (193), Tufte s fi rst and perhaps fi nest, he introduced the concept of the data-ink ratio. It s a concept that I teach in my books and courses one with which I heartily agree but for one aspect, which I think Tufte took a bit too far. Here s the concept, as Tufte originally defi ned it: A large share of ink on a graphic should present data-information, the ink changing as the data change. Data-ink is the non-erasable core of a graphic, the non-redundant ink arranged in response to variation in the numbers represented. Then, Data-ink ratio = data-ink total ink used to print the graphic = proportion of a graphic s ink devoted to the non-redundant display of data-information = 1. proportion of a graphic that can be erased without loss of data-information. (The Visual Display of Quantitative Information, Edward R. Tufte, Graphics Press, Cheshire CT, 193, p.93) Minimizing Non-Data Ink Two principles of graph design are closely attached to this concept. The fi rst principle is: Erase non-data ink, within reason. (ibid, p. 9) I embrace this principle wholeheartedly. Any form of decoration and even some standard components of a graph (for example, axis lines) do not represent data, and therefore fi t into the category of non-data ink. Any non-data ink that serves no meaningful and necessary purpose should be erased, for it can only distract from the data. On the other hand, non-data ink that supports the display of data in useful ways, such as the ink that forms a graph s axis lines, should remain, but its visual salience should be reduced until it s visible enough to do its job but not so visible that it competes with the data for attention. Copyright 9 Stephen Few, Perceptual Edge Page 1 of 9

Minimizing Data-Ink The second principle is: Erase redundant data-ink, within reason. (ibid, p. 9) Regarding this principle, although Tufte and I agree in spirit, we part ways in degree. We draw the line differently between redundant data-ink that is useful and should therefore remain, and that which is not and should therefore be removed. Tufte explained this principle using the following example: 35.9 Redundant data-ink depicts the same number over and over. The labeled, shaded bar of the bar chart, for example, unambiguously locates the altitude in six separate ways (any fi ve of the six can be erased and the sixth will still indicate the height): as the (1) height of the left line, () height of shading, (3) height of right line, () position of top horizontal line, (5) position (not content) of number at bar s top, and () the number itself. (ibid, p. 9) ly, this single bar only encodes its value of 35.9 in one rather than six ways the text label that appears above the bar for without other bars to provide points of comparison, all attributes of the bar that rely on its height are meaningless. If we assume, however, that more than one bar like this appears in a graph, then it is true that these six attributes redundantly represent a single value. Even so, must we conclude that all but one should be removed? I believe that some redundancy in a value s graphical display is useful. Tufte further made his case using several examples of graphs with redundant data ink, which he redesigned to remove redundancy. Below, I ve recreated one of his examples: a bar graph. The one on the left is fairly typical and the one on the right is Tufte s improved version. (In my recreation of Tufte s example below, I replaced the data labels with some that are more familiar (actual, budget, etc.), and reduced it to a portion of the original.) Notice that each vertical bar on the left has been reduced to a single vertical line on the right. Copyright 9 Stephen Few, Perceptual Edge Page of 9

Notice that by removing the vertical lines that divide the regions something that the white space between the regions alone does quite well avoidable clutter has been eliminated. I believe that the other reductions, however, have gone too far. Before I make my case, it s worth noting that Tufte s redesign could have been pared down even more. If the original bar graph didn t include confi dence intervals (the short vertical line at the top of each data bar, a.k.a. error bars), then Tufte s version could have been further reduced to something like this: Even now, we re not fi nished, for one more redundancy reduction can be made. In the spirit of Tufte s intention, we could reasonably argue that proximity alone is enough to link the actual and budget pairs of bars, so we could erase the horizontal lines that connect them at the bottom, and we could also get by with showing the and labels only once, resulting in the following: We could, but should we? I don t think so. Speaking Above a Whisper I believe that data-ink usually requires some redundancy for a graph to be easily read and used to make comparisons. Also, I believe that by reducing the size and visual weight of data-encoding objects, such as these bars, we force our eyes to work too hard, which slows us down. Based on several small, minimalist examples in his books, I suspect that Tufte has incredibly good eyesight certainly better than mine, even with my glasses. Most of us benefi t from making text and images a little larger than the minimum that the telecommunications industry decided long ago was acceptable for phone books. Copyright 9 Stephen Few, Perceptual Edge Page 3 of 9

Here s the same actual versus budget information, this time displayed in a way that is easier and faster to read. The message hasn t been altered in any way, but it s more clearly represented. 1 Now, even if the graph were made smaller, we could still easily read and compare the values. The width of the bars gives them additional visual weight without forcing the graph to take up more space on the page and without forcing our eyes to process meaningless visual content. Let s now look at the other examples of ink (both data-ink and non-data-ink) that have been added to this graph to determine if they re useful. First, let s consider the additional numbers along the Y-axis to label the quantitative scale. Tufte s design proposed that the 1-value scale ( through 13) would work as well with labels for only three values:,, and. These numbers are data-ink they provide information but Tufte felt that we could easily fi ll in the missing numbers based on the three that are there, so he deemed the others redundant. It s certainly true that we can figure out the missing numbers, but which is more costly: the existence of more numbers along the scale or the extra time and effort that s required to mentally fi ll them in? I believe that by labeling the beginning and end of the scale (values and 1) and including every other number (,,, etc.), the scale can be read and understood more effi ciently. I chose to only include every other number along the scale because including them all would have produced visual clutter, drawing undue attention to the Y-axis. I believe this design choice results in a better balance between too much and too little information, making the graph as easy as possible to read and interpret without adding so much visual content along the axis to cause visual distraction. What about the axis lines themselves are they useful? I should mention that even Tufte put the X-axis back on his graph after removing it, even though it wasn t strictly necessary, because the data bars seemed to fl oat uncomfortably in space without them. So the only difference that remains regarding axes is whether the vertical Y-axis line is useful. Similar to the odd appearance of data bars when they lack a baseline to ground them at zero, I fi nd that tick marks look odd without an axis line to anchor them, and I like the container for the graph s plot area (the data region) that s formed by the framework of both axes. Whether an axis line that joins the tick marks along a quantitative scale actually assists us or not while reading and interpreting a graph, I m confi dent that a thin light-gray line of non-data ink won t hinder the process. I like the way the actual and budget bars are labeled in this graph, because it eliminates the need for a legend, which defi nitely includes redundancy that we should avoid whenever possible. Unfortunately, we don t always have the space that s required for labels of this type (that is, without resorting to vertically-oriented text, which is hard to read), nor do the products that we use to produce graphs always give us the option of including two Copyright 9 Stephen Few, Perceptual Edge Page of 9

levels of labels along a single axis (in this case, regions and the distinction between actual and budget). When we can t label data objects in this manner, a legend is often unavoidable, as illustrated below. I created my version of the graph using Excel, which wouldn t allow me to place the actual and budget labels directly under the bars (without playing a trick, that is), so a legend was my only labeling option. The small colored squares next to the actual and budget labels are a repetition of the colors that already appear in the bars, and thus redundant, but even if we stick to Excel s limitations, we can at least make it relatively easy to associate the labels with the bars by arranging them side-by-side in the same order as the bars, rather than above and below one another to the right of the graph, which is Excel s default. 1 To further appreciate the usefulness of data-ink with greater than minimal visual weight, look at the following line graph of actual versus budget data by quarter. 1 Qtr 1 Qtr Qtr 3 Qtr We can certainly see the thin actual and budget lines, but when they re rendered this small, we must work harder than necessary to differentiate them, even though their colors are the same two shades of gray that appear in the previous bar graphs. By increasing the stroke weight of the lines in the following example, we ve increased the amount of ink that s being used to display the data, but to a degree that does no harm while clearly making it easier to read and compare the two lines. Copyright 9 Stephen Few, Perceptual Edge Page 5 of 9

1 Qtr 1 Qtr Qtr 3 Qtr We could certainly take this too far, making the lines so thick that we can no longer read and compare them accurately, as I ve done below, but it isn t diffi cult to strike a useful balance between too little and too much data-ink. 1 Qtr 1 Qtr Qtr 3 Qtr Shouting to Emphasize What s Important Not all information is created equal. This is certainly the case when we re trying to communicate a particular message. Key points might need to be emphasized to get the message across. When presenting information graphically, this emphasis must be created visually to express the equivalent of a raised voice and more precise enunciation. In the following graph, which displays a ranked list of the countries with the highest per capita gross domestic product (GDP), each country is displayed in precisely the same manner. Copyright 9 Stephen Few, Perceptual Edge Page of 9

Countries Ranked by GDP per Capita in USD k k k k k k k Liechtenstein Qatar Luxembourg Bermuda Kuwait Norway Jersey Brunei Singapore United States (Source: The World Factbook, published by the Central Intelligence Agency (CIA), based on data collected between and ) This graph tells the story clearly, if you don t wish to focus your audience s attention on any particular country. If, however, you wish to emphasize the fact that the United States comes in last among the top countries when ranked by GDP, it would be helpful to increase the salience of parts or all of the items that represent the United States. Here s an example that increases the salience of the bar, its label, and adds redundant data-ink to display the value as text. Countries Ranked by GDP per Capita in USD k k k k k k k Liechtenstein Qatar Luxembourg Bermuda Kuwait Norway Jersey Brunei Singapore United States $, By leaving everything else alone and increasing the salience of the United States only, we ve raised our voices in an appropriate and useful manner. Copyright 9 Stephen Few, Perceptual Edge Page 7 of 9

Let s look at one fi nal example. The scatterplot below has been designed to tell the story of the strong correlation that exists between the amount of time that salespeople spend on sales calls with customers and the resulting number of sales. Number of Orders There is a strong correlation between the amount of time salespeople interact with customers and the number of orders that customers place. 9 7 5 3 3 5 7 Hours of Sales Interaction (Dark data points represent last year s top customers based on the number of orders placed. The largest data point represents last year s top customer.) A key point is being emphasized in this scatterplot to further illustrate the strength of the correlation: the fact that high sales in the past don t guarantee ongoing high sales, which is demonstrated by the fact that the previous year s top customers were not necessarily the current year s top customers, but instead moved to positions in the current year that were strongly related to the amount of time that they received from the sales team. Without distinguishing the prior year s top customers through extra visual salience, achieved in this case by increasing the data-ink associated with those values (by adding fi ll color, and for the top-ranked customer, by increasing its size as well), this aspect of the story would not have been presented clearly. In Summary While data-ink should never be arbitrarily increased beyond the minimum that s required, we shouldn t hesitate to increase it to a degree that will improve a graph s ability to tell its story clearly and in a way that reduces the audience s effort to read and understand that message to the minimum. Like musicians, we must fi nd the right volume to optimize the audience s listening experience and the right dynamics of soft and loud to touch their hearts and minds in the ways we intend. Copyright 9 Stephen Few, Perceptual Edge Page of 9

About the Author Stephen Few has worked for over 5 years as an IT innovator, consultant, and teacher. Today, as Principal of the consultancy Perceptual Edge, Stephen focuses on data visualization for analyzing and communicating quantitative business information. He provides training and consulting services, writes the monthly Visual Business Intelligence Newsletter, speaks frequently at conferences, and teaches in the MBA program at the University of California, Berkeley. He is the author of two books: Show Me the Numbers: Designing Tables and Graphs to Enlighten and Information Dashboard Design: The Effective Visual Communication of Data. You can learn more about Stephen s work and access an entire library of articles at www.perceptualedge.com. Between articles, you can read Stephen s thoughts on the industry in his blog. Copyright 9 Stephen Few, Perceptual Edge Page 9 of 9