Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics


Why visualize data?
The human eye is extremely sensitive to differences in pattern, color, and format.
[Figure: grid of digits illustrating how differences in pattern, color, and format stand out]
Because we can decipher these differences almost instantly, representing complex data sets with data graphics is an efficient way to communicate what the numbers are saying. The visual display of quantitative information serves as a vehicle for traversing a complex data world. Graphics reveal data.

What is the best way to display the data?
Let the data instruct you. Do not commit to a pre-specified mode of display; do whatever it takes to display the data in the most appropriate way. Design should be content-driven, not methodology-driven.

CONTEXT, CONTEXT, CONTEXT!
Put the data into a human context. What are we comparing the data to?
Previous rounds (historical context): Has the clinic performance rate improved over time?
Other similar clinics: How well is the clinic performing compared to other clinics in the same district/province/region (geographic context), with the same caseload, or with the same resources?
[Figure: flow diagram from care to decisions: Care Provided, Documented, Chart Selected, Data Collected, Data Analyzed, Data Visualized, Data Reported, Data Interpreted, Decisions Made]

Graphical Excellence
Have the audience in mind. What is the purpose of the graphic: description or exploration?
Make large data sets coherent.
Reveal the data at several levels of detail.
Induce the reader to think about the content, not the methodology.
Encourage the eye to compare different pieces of data (spatial orientation, patterns, colors, formatting).
Avoid distorting the data (axes, scaling, labeling).
Keep the graphic clear and easy to read.
Integrate words and numbers with graphics.
Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.

Theory of Data Graphics
Above all else, show the data.
1) Maximize the data-ink ratio. I. Erase non-data-ink. II. Erase redundant data-ink.
2) Remove chart junk. I. Shadows II. 3D rendering III. Other ornaments
3) Avoid optical vibration.
[Figure: before and after versions of the bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year" (y-axis: Performance Rate, 0 to 1; x-axis: Clinic, 1 to 9)]
Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.
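For reference, the data-ink ratio named above is defined in the cited Tufte book roughly as follows. The wording inside the formula is a paraphrase, not text taken from the slides.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \bigl(\text{proportion of the graphic that can be erased without loss of data-information}\bigr)
\]
```

Erasing non-data-ink and redundant data-ink shrinks the denominator, which is why both steps raise the ratio.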


Examples

Bar Charts
Good for comparing a set of categorical values. Best when there are not too many categories and/or variables.
[Figure: bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year" (y-axis: Performance Rate, 0 to 1; x-axis: Clinic, 1 to 9)]
Tips: Ordering the data from largest to smallest may help highlight the data. Keep it simple: do not use shadows or 3D rectangles.
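The ordering tip can be sketched in a few lines of Python. This is a minimal text-only illustration, not the tool used for the slides; the clinic names and rates are hypothetical values made up for the example.

```python
# Hypothetical clinic performance rates (illustrative values, not from the slides).
rates = {"Clinic 1": 0.62, "Clinic 2": 0.91, "Clinic 3": 0.45, "Clinic 4": 0.78}


def sorted_bars(data, width=40):
    """Return text bar-chart lines ordered from the largest to the smallest value."""
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in ordered:
        bar = "#" * round(value * width)  # plain bars: no shadows, no 3D
        lines.append(f"{name:<{label_width}}  {bar} {value:.0%}")
    return lines


for line in sorted_bars(rates):
    print(line)
```

Sorting puts the best and worst performers at the two ends of the chart, so the reader does not have to search for them.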

Too many categories can make bar charts messy. When a bar chart has this many bars, ask yourself whether it is contextually appropriate to compare all of the values on it.
[Figure: bar chart "Clinical visits (2011): Percentage of eligible adult patients who had at least one clinical visit in each half of the year" (y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic, 1 to 28)]

Too many variables per category can also make bar charts messy. Is it appropriate to compare all of the variables within a category?
[Figure: grouped bar chart "Mean Clinic Scores by Indicator (2011)" (y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic, A to E; series: Clinical Visits, TB Screening, CTX, Nutritional Assessment, Prevention Education, Alcohol Screening)]

Pie Charts
Pie charts work well if you want to compare individual slices with the whole pie. It may be difficult to compare different slices of a given pie chart or to compare data across different pie charts. A bar chart (histogram or stacked chart) or a table may be more appropriate in that case.

Too many categories make a pie chart hard to manage. If the underlying variable is numerical, consider using a histogram instead. You can also consider combining categories, but remember that this could hide variation and alter how the data are interpreted.
[Figure: two pie charts titled "CD4 Count Distribution". The first uses many narrow bins: <50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750, 751-800, 801-850, 851-900, 901-950, 951-1000, 1000+. The second uses combined bins: <50, 51-100, 101-200, 201-250, 400+.]
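Combining categories amounts to summing the counts of the bins being merged. The sketch below assumes hypothetical patient counts and a hypothetical grouping; neither comes from the slides.

```python
# Hypothetical patient counts per fine-grained CD4 bin (illustrative only).
fine_counts = {"<50": 12, "51-100": 20, "101-150": 35, "151-200": 41,
               "201-250": 38, "251-300": 30, "301-350": 22, "351+": 52}

# Hypothetical grouping: new label -> list of fine-grained labels it absorbs.
groups = {"<=100": ["<50", "51-100"],
          "101-200": ["101-150", "151-200"],
          "201-350": ["201-250", "251-300", "301-350"],
          "351+": ["351+"]}


def combine_bins(counts, grouping):
    """Sum fine-grained bin counts into the coarser groups given by `grouping`."""
    return {new_label: sum(counts[label] for label in old_labels)
            for new_label, old_labels in grouping.items()}


coarse_counts = combine_bins(fine_counts, groups)
print(coarse_counts)
```

The totals are preserved, but any difference between, say, the 101-150 and 151-200 bins is no longer visible after merging, which is the loss of variation the slide warns about.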

Tables
Tables often work better than bar charts and pie charts when there are too many data points and too many descriptors of those data points. Many people may not consider a table a way to visualize data, but tables still use specific formatting and spatial orientation to communicate the data more easily. In terms of data-ink, every piece of a table is critical information. However, tables may not be good at showing patterns over time.
[Table: "CD4 Monitoring Mean Clinic Scores: Percentage of eligible patients who had at least one CD4 count during the review period"]

Table Formatting Tips
Do not use gridlines. The space between the numbers visually separates categories.
Underline the column headers.
Consider zebra striping: light shading to separate specific groups you want to highlight.

Before: CD4 Monitoring Indicator Results

Clinic   Performance Rate   Denominator
A        60%                100
B        75%                150
C        50%                120

After: CD4 Monitoring Indicator Results

Clinic   Performance Rate (%)   Denominator
A                          60           100
B                          75           150
C                          50           120
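The formatting tips can be applied mechanically. The following sketch uses the clinic values from the table on this slide; the function name and layout choices (three spaces between columns, right-aligned numbers) are assumptions made for the example.

```python
def format_table(headers, rows, gap="   "):
    """Render a plain-text table with no gridlines and an underlined header row."""
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def render(cells):
        # First column (labels) is left-aligned; numeric columns are right-aligned.
        first = str(cells[0]).ljust(widths[0])
        rest = [str(cell).rjust(width) for cell, width in zip(cells[1:], widths[1:])]
        return gap.join([first] + rest)

    underline = gap.join("-" * width for width in widths)
    return "\n".join([render(headers), underline] + [render(row) for row in rows])


headers = ["Clinic", "Performance Rate (%)", "Denominator"]
rows = [["A", 60, 100], ["B", 75, 150], ["C", 50, 120]]
table_text = format_table(headers, rows)
print(table_text)
```

Putting the unit in the header once, instead of repeating "%" in every cell, is a small instance of erasing redundant data-ink.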

Line Charts
Line charts work well for showing trends over intervals of time (time series). The more data points, the better. Note that a line chart draws a continuous line even when the data are discrete.
Tips: Use different colors to differentiate between lines. Remember that our eyes will naturally compare two lines on the same chart; if two sets of data points are not comparable, then maybe they should not be on the same graph. Label the lines directly on the chart instead of using a legend.

Line charts are very prone to distortion.
[Figure: the same series, "Percentage of eligible patients screened for tuberculosis" (x-axis: Jan to June; y-axis: Performance Rate (%)), drawn four ways: Y Axis Scale 0 to 25; Y Axis Scale 0 to 100; Y Axis Scale 15 to 20; and Y Axis Scale 0 to 25 with Height > Width]
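One way to quantify the distortion is to compute how much of the plot height the same series occupies under each y-axis range. The sketch below uses the three axis ranges named in the figure; the monthly values are hypothetical, and the "fraction of plot height" measure is an illustration, not a method from the slides.

```python
# Hypothetical monthly screening rates in percent, Jan to June (illustrative only).
monthly_rates = [16, 17, 17.5, 18, 18.5, 19]


def apparent_change(values, y_min, y_max):
    """Fraction of the plot height spanned by the data for a given y-axis range."""
    return (max(values) - min(values)) / (y_max - y_min)


for y_min, y_max in [(0, 100), (0, 25), (15, 20)]:
    share = apparent_change(monthly_rates, y_min, y_max)
    print(f"y-axis {y_min} to {y_max}: series spans {share:.0%} of the plot height")
```

The same three-point rise looks nearly flat on a 0 to 100 axis and dramatic on a 15 to 20 axis, so the axis range should be chosen and labeled deliberately.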

Box-and-whisker Plots
Box-and-whisker plots are a great way to compare different sets of data. Several descriptive statistics can be compared: maximum, minimum, upper quartile, median, lower quartile, range, and interquartile range.
[Figure: horizontal box-and-whisker plots "Namibia Food Security" (x-axis: Performance Rate (%), 0 to 100; y-axis: Review Period: Jan - Jun 08, Jul - Dec 08, Jan - Jun 09, Jul - Dec 09, Jan - Jun 10, Oct 10 - Mar 11)]
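The statistics a box-and-whisker plot displays can be computed with Python's standard library. This is a minimal sketch with hypothetical clinic scores; note that quartile conventions differ between tools, and the "inclusive" method used here is one choice among several.

```python
import statistics


def five_number_summary(values):
    """Return the statistics a box-and-whisker plot displays for one data set."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "max": max(values),
        "range": max(values) - min(values),
        "interquartile range": q3 - q1,
    }


# Hypothetical clinic performance rates (%) for one review period.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
summary = five_number_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Computing one summary per review period gives the numbers needed to draw the boxes side by side and compare their medians and spreads.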

The next few examples illustrate how important labeling is. Labeling provides more context, allowing for more rigorous and accurate interpretation of the data.
[Figure: bar chart "Mortality Rates of People Actively Playing Popular Sports in 2011" (y-axis: Mortality Rate (# deaths / 1000 people / year), 0 to 12; x-axis: Soccer, Rugby, Cricket, Golf)]
Is playing golf more dangerous than other sports?

[Figure: the same chart, "Mortality Rates of People Actively Playing Popular Sports in 2011", with each bar labeled with the players' average age: Soccer, Average Age = 23; Rugby, Average Age = 20; Cricket, Average Age = 25; Golf, Average Age = 60]

What can we conclude?
[Figure: bar chart "Percent of Adults who received a TB assessment during the review period (Adult, 2008)" (y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic A, Clinic B, Clinic C)]

[Figure: the same chart, "Percent of Adults who received a TB assessment during the review period (Adult, 2008)", with the denominator written on each bar (labels as extracted: n = 2, n = 150, n = 200) and the callout "Clinic C only has 2 eligible patients!"]
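Reporting the denominator next to each percentage can be automated. In the sketch below, the numerators and the threshold of 30 for flagging a small denominator are assumptions chosen for illustration; only the denominators 2, 150, and 200 appear in the slide.

```python
def describe_rate(numerator, denominator, min_n=30):
    """Format a performance rate together with its denominator.

    min_n is a hypothetical cutoff below which the rate is flagged.
    """
    rate = 100 * numerator / denominator
    label = f"{rate:.0f}% (n = {denominator})"
    if denominator < min_n:
        label += "; small denominator, interpret with caution"
    return label


# Hypothetical numerators paired with the denominators shown on the slide.
for numerator, denominator in [(2, 2), (90, 150), (130, 200)]:
    print(describe_rate(numerator, denominator))
```

A clinic with 2 eligible patients can only score 0%, 50%, or 100%, so its bar carries far less information than the others even though it looks the same.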

Write on Graphs: Use words, numbers and graphics in combination
Use words directly on graphs to provide more context. For example, on a clinic-level run chart, use words and arrows to mark when a QI project was implemented. Here's an example from Namibia.

Graph/Table Combinations
Graphs and tables can be used together. The table provides more context and detail, while the graph reveals any patterns in the data. Here's an example using data from Uganda.

Sparklines: Intense, Simple, Word-Sized Graphics
Invented by Edward Tufte, these powerful graphics add tremendously to the meaning of numbers. They provide context. For example, I can say that the current temperature is 30 degrees Celsius. However, if I include a sparkline that shows the temperature during the previous 24 hours, it immediately puts that 30 degrees into context. The sparklines I showed in the previous slide show the spread of the data. Each little tick mark represents an individual clinic's score. The red mark is the mean of those scores. Since I oriented the spreads in the same column, I can quickly see how the spread changes from round to round.
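A rough text approximation of a sparkline can be built from Unicode block characters. This is only a sketch of the idea: true sparklines are small line graphics, and the temperature readings below are hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Map each value to a block character scaled between the series min and max."""
    low, high = min(values), max(values)
    if high == low:
        return BLOCKS[0] * len(values)  # flat series: draw a flat line
    scale = (len(BLOCKS) - 1) / (high - low)
    return "".join(BLOCKS[round((value - low) * scale)] for value in values)


# Hypothetical temperatures (degrees Celsius) sampled over the previous 24 hours.
temperatures = [21, 20, 19, 19, 22, 26, 29, 31, 32, 30]
print(f"Current temperature: {temperatures[-1]} C {sparkline(temperatures)}")
```

Placed next to the number, the word-sized graphic shows at a glance whether 30 degrees is near the day's peak or part of a steady climb.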

Small Multiples
When clinic-level data are aggregated, detail at the clinic level is lost. Individual clinic trends cannot be inferred from longitudinal mean clinic scores. Several visualization techniques encourage the eye to examine both clinic-level and aggregate-level patterns. Small multiples, a series of graphics that show the same combination of variables, are one such technique. Here is an example of what it would look like. Created by Jorge Camoes
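The loss of detail under aggregation, and the way small multiples recover it, can be sketched as follows. The clinic names and scores are hypothetical, and the text panels are a stand-in for real graphics; the key point is that every panel uses the same variables and the same scale.

```python
from statistics import mean

# Hypothetical performance rates (%) for three clinics over four review rounds.
scores = {
    "Clinic A": [40, 50, 60, 70],
    "Clinic B": [70, 60, 50, 40],
    "Clinic C": [55, 55, 55, 55],
}


def aggregate_by_round(series):
    """Mean score across clinics for each round (the aggregate view)."""
    return [mean(round_scores) for round_scores in zip(*series.values())]


def small_multiples(series, scale=5):
    """One text panel per clinic, all drawn with the same variables and scale."""
    panels = []
    for name, values in series.items():
        rows = [f"  round {i}: " + "#" * (value // scale) + f" {value}"
                for i, value in enumerate(values, start=1)]
        panels.append("\n".join([name] + rows))
    return "\n\n".join(panels)


print("Mean by round:", aggregate_by_round(scores))
print()
print(small_multiples(scores))
```

In this example the mean is flat across all four rounds, while the panels show one clinic improving and another declining.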

Heat Maps
Use color to encourage the eye to examine both clinic-level and aggregate-level patterns. In this example, each color represents a range of performance rates. The more red the color, the closer the performance rate is to 0%. The more green the color, the closer the performance rate is to 100%.
[Figure: heat map "Namibia Food Security Indicator Results: Percentage of eligible adult patients assessed for food security by clinic and review period" (rows: Clinic, A to P; columns: review periods; key to swatch colors, Rate (%): 0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80, 81 to 90, 91 to 100)]
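Assigning each rate to a swatch in the key is a simple binning step. The sketch below follows the ten ranges listed in the key; how fractional rates between two ranges (for example 10.5) are handled is an assumption, here resolved by placing them in the higher range.

```python
import math


def swatch_bin(rate):
    """Return the key label ('0 to 10', '11 to 20', ...) for a rate in percent."""
    if not 0 <= rate <= 100:
        raise ValueError("rate must be between 0 and 100")
    if rate <= 10:
        return "0 to 10"
    upper = math.ceil(rate / 10) * 10  # top of the range containing the rate
    return f"{upper - 9} to {upper}"


# Hypothetical clinic rates for one review period.
for clinic, rate in [("A", 8), ("B", 47), ("C", 91), ("D", 100)]:
    print(f"Clinic {clinic}: {rate}% -> swatch {swatch_bin(rate)}")
```

Each label would then be mapped to a color on the red-to-green scale described above.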

Summary
Context is essential for graphical integrity. Provide historical data when available. Label axes properly. Always provide the denominators for percentages.
Do whatever it takes to display the data in the best way, with integrity and clarity. Data visualization should be content-driven, not methodology-driven.
Use combinations of words, numbers and graphics. Combine tables and charts.
Creating an excellent data graphic takes time. Like good writing, it requires revising and editing.