Part 2: Data Visualization How to communicate complex ideas with simple, efficient and accurate data graphics
Why visualize data?
The human eye is extremely sensitive to differences in:
  Pattern
  Colors
  Format
[Figure: two grids of digits illustrating how differences in pattern, color and format stand out, for example the repeated run "1 1 1 1"]
Because of our amazing ability to decipher these differences instantly, representing complex data sets with data graphics is an efficient method to communicate what the numbers are saying. The visual display of quantitative information serves as a vehicle to traverse a complex data world. Graphics reveal data.
What is the best way to display the data? Let the data instruct you. Do not have a pre-specified mode of displaying the data. Do whatever it takes to display data in the most appropriate way. Design should be content-driven, not methodology-driven.
CONTEXT, CONTEXT, CONTEXT!
Put the data into a human context. What are we comparing the data to?
  Previous rounds (historical context): Has the clinic performance rate improved over time?
  Other similar clinics: How well is the clinic performing compared to other clinics:
    In the same district/province/region (geographic context)
    With the same caseload
    With the same resources
[Diagram, stages in order of appearance: Care Provided, Documented, Chart Selected, Data Collected, Data Analyzed, Data Visualized, Data Reported, Data Interpreted, Decisions Made]
Graphical Excellence
Have the audience in mind. What is the purpose of the graphic? Description, exploration
Make large data sets coherent
Reveal the data at several levels of detail
Induce the reader to think about the content, not the methodology
Encourage the eye to compare different pieces of data: spatial orientation, patterns, colors, formatting
Avoid distortion of the data: axes, scaling, labeling
Clear and easy to read
Integrate words and numbers with graphics
Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.
Theory of Data Graphics
Above all else show the data
1) Maximize data-ink ratio.
   I. Erase non-data-ink
   II. Erase redundant data-ink
2) Remove Chart Junk.
   I. Shadows
   II. 3D-rendering
   III. Other ornaments
3) Avoid Optical Vibration
[Figure: before and after versions of the bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year"; x-axis: Clinic (1 to 9); y-axis: Performance Rate (0 to 1)]
Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.
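The data-ink ratio named in point 1 can be written compactly, following Tufte's definition. Ink is rarely measured in practice, so this reads best as a guiding principle rather than a quantity to compute.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \left(\text{proportion of the graphic that can be erased without loss of data-information}\right)
\]
```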
Examples
Bar Charts
Good for comparing a set of categorical values. Best when there are not too many categories and/or variables.
[Figure: bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year"; x-axis: Clinic (1 to 9); y-axis: Performance Rate (0 to 1)]
Tips:
  Organizing data from largest to smallest may be helpful in highlighting data.
  Keep it simple: do not use shadows or 3D rectangles.
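The sorting tip can be sketched in a few lines of Python. This is a minimal illustration, not a recommended charting tool: `text_bar_chart` is a hypothetical helper, and the clinic names and rates are made-up values rather than data from the slides.

```python
def text_bar_chart(rates, width=40):
    """Return a plain-text bar chart with the largest value first.

    rates: dict mapping a category label to a rate between 0 and 1.
    width: number of characters used for a rate of 1.0.
    """
    # Sort from largest to smallest so the ranking is visible at a glance.
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(label) for label, _ in ordered)
    lines = []
    for label, rate in ordered:
        bar = "#" * round(rate * width)
        lines.append(f"{label:<{label_width}}  {bar} {rate:.0%}")
    return "\n".join(lines)


# Illustrative values only (not taken from the slides).
example_rates = {"Clinic 1": 0.62, "Clinic 2": 0.85, "Clinic 3": 0.40}
print(text_bar_chart(example_rates))
```

The bars carry no shadows, borders or 3D effects; each character is data-ink.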
Too many categories can make bar charts messy. When there are this many bars on a bar graph, make sure to ask yourself if it is contextually appropriate to compare all of the values on the bar chart.
[Figure: bar chart "Clinical visits (2011): Percentage of eligible adult patients who had at least one clinical visit in each half of the year"; x-axis: Clinic (1 to 28); y-axis: Performance Rate (%), 0 to 100]
Too many variables per category can also make bar charts messy. Is it appropriate to compare all of the variables within a category?
[Figure: grouped bar chart "Mean Clinic Scores by Indicator (2011)"; x-axis: Clinic (A to E); y-axis: Performance Rate (%), 0 to 100; legend: Clinical Visits, TB Screening, CTX, Nutritional Assessment, Prevention Education, Alcohol Screening]
Pie Charts
Work well if you want to compare individual slices of the pie with the whole pie. It may be difficult to compare different sections of a given pie chart or to compare data across different pie charts. A bar chart (histogram or stack chart) or table may be more appropriate in that case.
Too many variables make a pie chart hard to manage. If the variables are numerical, consider using a histogram instead. You can also consider combining categories, but remember that this could hide variation and alter how the data are interpreted.
[Figure: two pie charts titled "CD4 Count Distribution". The first uses twenty narrow categories ranging from <50 to 1000+; the second uses a few combined categories (labels as extracted: <50, 51-100, 101-200, 201-250, 400+).]
Tables
Tables often work better than bar charts and pie charts when there are too many data points and too many descriptors of those data points. Many people may not consider this as a way to visualize data, but tables still use specific formatting and spatial orientation to communicate the data more easily. In terms of data ink, every piece of a table is critical information. However, tables may not be good at showing patterns over time.
[Table: "CD4 Monitoring, Mean Clinic Scores: Percentage of eligible patients who had at least one CD4 count during the review period"]
Table Formatting Tips
Do not use gridlines. The space between the numbers visually separates categories.
Underline the column headers.
Consider zebra striping: light shading to separate specific groups you want to highlight.

Before: CD4 Monitoring Indicator Results

Clinic   Performance Rate   Denominator
A        60%                100
B        75%                150
C        50%                120

After: CD4 Monitoring Indicator Results

Clinic   Performance Rate (%)   Denominator
A                          60           100
B                          75           150
C                          50           120
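The formatting tips above can be sketched as a small plain-text renderer. `format_table` is a hypothetical helper written for this illustration; it uses whitespace instead of gridlines, underlines the header with a single rule, and keeps the unit in the column header rather than in every cell. The rows are the CD4 monitoring values from the slide.

```python
def format_table(headers, rows):
    """Render rows as a plain-text table without gridlines.

    Whitespace separates the columns and one rule underlines the header.
    """
    cells = [[str(value) for value in row] for row in rows]
    widths = [
        max([len(headers[i])] + [len(row[i]) for row in cells])
        for i in range(len(headers))
    ]

    def render(values):
        # First column is left-aligned text; the rest are right-aligned numbers.
        first = values[0].ljust(widths[0])
        rest = [value.rjust(width) for value, width in zip(values[1:], widths[1:])]
        return "   ".join([first] + rest)

    header_line = render(headers)
    rule = "-" * len(header_line)
    return "\n".join([header_line, rule] + [render(row) for row in cells])


print(format_table(
    ["Clinic", "Performance Rate (%)", "Denominator"],
    [["A", 60, 100], ["B", 75, 150], ["C", 50, 120]],
))
```

Right-aligning the numeric columns lines up the digits, which makes it easier for the eye to compare magnitudes down a column.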
Line Charts
Line charts work well to show trends over intervals of time (time series). The more data points, the better. Line charts show a continuous line even though data may be discrete.
Tips:
  Use different colors to differentiate between different lines.
  Remember that our eyes will naturally compare two different lines on the same chart. If two data points are not comparable, then maybe they should not be on the same graph.
  Label the lines directly on the chart instead of using a legend.
Line charts are very prone to distortion.
[Figure: the line chart "Percentage of eligible patients screened for tuberculosis" (x-axis: Jan to June; y-axis: Performance Rate (%)) drawn in four panels: "Y Axis Scale: 0 to 25"; "Y Axis Scale: 0 to 100"; "Y Axis Scale: 15 to 20"; "Y Axis Scale: 0 to 25, Height > Width"]
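The effect of the y-axis scale can be shown with simple arithmetic: the same change fills a very different share of the plot height depending on the axis range. The sketch below assumes a hypothetical change from 16% to 19% (the actual values in the figure are not recoverable), and `axis_fraction` is a name introduced here for illustration.

```python
def axis_fraction(start, end, axis_min, axis_max):
    """Fraction of the plot height spanned by a change from start to end."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return abs(end - start) / (axis_max - axis_min)


# Hypothetical change in a performance rate, from 16% to 19%,
# drawn on the three y-axis ranges named in the slide.
for axis_min, axis_max in [(0, 100), (0, 25), (15, 20)]:
    share = axis_fraction(16, 19, axis_min, axis_max)
    print(f"y axis {axis_min} to {axis_max}: change fills {share:.0%} of the plot height")
```

Under these assumed values the same three-point change fills 3%, 12% and 60% of the plot height, which is why a truncated axis can make a small change look dramatic.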
Box-and-whisker Plots
A great way to compare different sets of data. Several different descriptive statistics can be compared: max, min, upper quartile, median, lower quartile, range and interquartile range.
[Figure: box-and-whisker plots "Namibia Food Security", one per review period (Jan - Jun 08, Jul - Dec 08, Jan - Jun 09, Jul - Dec 09, Jan - Jun 10, Oct 10 - Mar 11); horizontal axis: Performance Rate (%), 0 to 100; vertical axis: Review Period]
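The statistics a box-and-whisker plot displays can be computed with Python's standard library. This is a sketch under stated assumptions: the clinic scores are made-up values, `five_number_summary` is a hypothetical helper, and quartiles use the "inclusive" method of `statistics.quantiles`; other tools may use slightly different quartile conventions.

```python
import statistics


def five_number_summary(values):
    """Return the statistics shown by a box-and-whisker plot.

    Needs at least two values. Quartiles use the 'inclusive' convention.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "interquartile_range": q3 - q1,
    }


# Illustrative clinic performance rates (%) for one review period.
scores = [40, 55, 60, 72, 90]
for name, value in five_number_summary(scores).items():
    print(f"{name}: {value}")
```

Computing one summary per review period gives the numbers needed to compare how the spread of clinic scores shifts from round to round.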
The next few examples illustrate how important labeling is. Labeling provides more context to the data, allowing for more rigorous and accurate interpretations of the data.
[Figure: bar chart "Mortality Rates of People Actively Playing Popular Sports in 2011"; x-axis: Soccer, Rugby, Cricket, Golf; y-axis: Mortality Rate (# deaths / 1000 people/year), 0 to 12]
Is playing golf more dangerous than other sports?
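[Figure: the same bar chart, "Mortality Rate of People Actively Playing Popular Sports in 2011", now with an average-age label on each bar. Labels in order of appearance: Average Age = 23, Average Age = 20, Average Age = 25, Average Age = 60; categories: Soccer, Rugby, Cricket, Golf; y-axis: Mortality Rate (# deaths / 1000 people/year), 0 to 12]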
What can we conclude?
[Figure: bar chart "Percent of Adults who received a TB assessment during the review period (Adult, 2008)"; x-axis: Clinic A, Clinic B, Clinic C; y-axis: Performance Rate (%), 0 to 100]
[Figure: the same bar chart, "Percent of Adults who received a TB assessment during the review period (Adult, 2008)", now with a denominator label on each bar (n = 2, n = 150, n = 200); x-axis: Clinic A, Clinic B, Clinic C; y-axis: Performance Rate (%), 0 to 100]
Clinic C only has 2 eligible patients!
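One way to make sure a percentage never travels without its denominator is to build the label from the raw counts. The sketch below is illustrative only: `label_with_denominator` is a hypothetical helper, the numerators are made up, and the cut-off of 30 for flagging a small denominator is an arbitrary assumption, not a rule from the slides.

```python
def label_with_denominator(clinic, numerator, denominator, min_n=30):
    """Return a chart label that always shows n and flags small denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate = 100 * numerator / denominator
    label = f"{clinic}: {rate:.0f}% (n = {denominator})"
    if denominator < min_n:
        # min_n is an assumed threshold; choose one that suits the program.
        label += " [small n, interpret with caution]"
    return label


# Denominators follow the slide; numerators are hypothetical.
print(label_with_denominator("Clinic B", 75, 150))
print(label_with_denominator("Clinic C", 2, 2))
```

A rate of 100% from 2 patients and a rate of 50% from 150 patients carry very different weight, and the label makes that visible on the graph itself.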
Write on Graphs: Use words, numbers and graphics in combinations
Use words directly on graphs to provide more context. For example, on a clinic level run chart, use words and arrows to denote when a QI project was implemented. Here's an example from Namibia.
Graph/Table Combinations
Graphs and tables can be utilized together. The table provides more context and detail while the graph reveals any patterns of the data. Here's an example using data from Uganda.
Sparklines: Intense, Simple, Word-Sized Graphics
Invented by Edward Tufte, these powerful graphics add tremendously to the meaning of numbers. They provide context. For example, I can say that the current temperature is 30 degrees Celsius. However, if I include a sparkline that shows the weather during the previous 24 hours, it immediately puts that 30 degrees into context. The sparklines I showed in the previous slide show the spread of the data. Each little tick mark represents an individual clinic's score. The red mark is the mean of those scores. Since I oriented the spreads in the same column, I can quickly see how the spread changes from round to round.
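A word-sized graphic of the time-series kind, as in the temperature example, can be approximated in plain text with Unicode block characters. This is a rough sketch rather than a true sparkline renderer: `sparkline` is a hypothetical helper, the temperature readings are invented, and it does not reproduce the tick-mark spreads described for the clinic scores.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map a sequence of numbers to a word-sized string of block characters."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no variation to show; draw a mid-height line.
        return BLOCKS[3] * len(values)
    top = len(BLOCKS) - 1
    # Scale each value to an index between 0 (lowest) and 7 (highest).
    return "".join(BLOCKS[round((v - lo) / (hi - lo) * top)] for v in values)


# Invented readings (degrees Celsius) leading up to the current value.
temperatures = [21, 20, 19, 21, 24, 27, 29, 30]
print(f"Current temperature: {temperatures[-1]} C {sparkline(temperatures)}")
```

Placed next to the number, the small graphic shows at a glance whether 30 degrees is the peak of a rising trend or an ordinary reading.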
Small Multiples
When clinic level data are aggregated, detail at the clinic level is lost. Looking at longitudinal mean clinic scores, individual clinic trends cannot be inferred. There are several visualization techniques that encourage the eye to examine both clinic level and aggregate level patterns. Small multiples, a series of graphics that show the same combination of variables, is one such technique. Here is an example of what it would look like.
[Figure: small multiples example. Created by Jorge Camoes]
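A tiny numerical example shows why the aggregate hides clinic level detail. The scores below are invented, and `round_means` and `clinic_changes` are hypothetical helper names: one clinic improves, the other declines, and the mean across clinics stays flat.

```python
from statistics import mean


def round_means(scores_by_clinic):
    """Mean score across clinics for each review round."""
    rounds = zip(*scores_by_clinic.values())
    return [mean(round_scores) for round_scores in rounds]


def clinic_changes(scores_by_clinic):
    """Change from the first to the last round for each clinic."""
    return {
        clinic: scores[-1] - scores[0]
        for clinic, scores in scores_by_clinic.items()
    }


# Invented scores (%) over three review rounds.
scores = {"Clinic A": [40, 60, 80], "Clinic B": [80, 60, 40]}
print("Mean per round:", round_means(scores))
print("Change per clinic:", clinic_changes(scores))
```

A chart of the mean alone would show no change, while one small panel per clinic, drawn on the same scale, would show both trends immediately.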
Heat Maps
Use color to encourage the eye to examine both clinic level and aggregate level patterns. In this example, each color represents a range of performance rates. The more red the color, the closer the performance rate is to 0%. The more green the color, the closer the performance rate is to 100%.
[Figure: heat map "Namibia Food Security Indicator Results: Percentage of eligible adult patients assessed for food security by clinic and review period"; rows: Clinic (A to P); columns: review periods. Key to swatch colors, Rate (%): 0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80, 81 to 90, 91 to 100]
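The step behind the color key is binning: each performance rate is assigned to one of the ten ranges, and each range gets one swatch. The sketch below covers only that binning step, using the ranges from the key; `swatch_index` is a hypothetical helper, the sample rates are invented, and the handling of fractional rates (for example, 10.5 falling in "11 to 20") is an assumption.

```python
import math

# The ten ranges listed in the key to swatch colors.
SWATCH_LABELS = ["0 to 10"] + [f"{low} to {low + 9}" for low in range(11, 100, 10)]


def swatch_index(rate):
    """Return the index (0 to 9) of the swatch range containing rate.

    Index 0 is the range nearest 0% (most red); index 9 is nearest 100%
    (most green).
    """
    if not 0 <= rate <= 100:
        raise ValueError("rate must be between 0 and 100")
    if rate <= 10:
        return 0
    return math.ceil(rate / 10) - 1


# Invented rates (%) for one clinic across review periods.
for rate in [8, 35, 60, 92]:
    print(f"{rate}% -> {SWATCH_LABELS[swatch_index(rate)]}")
```

A plotting tool would then map each index to a color running from red to green, so that every cell in a clinic-by-period grid is shaded by its range.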
Summary
Context is essential for graphical integrity.
  Provide historical data when available.
  Label axes properly.
  Always provide denominators to percentages.
Do whatever it takes to display the data in the best way with integrity and clarity.
  Data visualization should be content-driven, not methodology-driven.
Use combinations of words, numbers and graphics.
  Combine tables and charts together.
Creating an excellent data graphic takes time. Like good writing, it requires revising and editing.