Data visualization of social survey analysis: a note for researchers

Tania Burchardt, October 2012

Background

This note arises from a British Academy project (ref MC110753) which involved testing various representations of results from analysis of the Life Opportunities Survey with non-academic audiences, including civil servants and practitioners in the voluntary sector.

The problem

Analysis of large-scale survey data can produce powerful conclusions and is a common component of research in the social sciences. However, the techniques of analysis, including cross-tabulations, significance testing and multiple regression, do not lend themselves to straightforward communication with non-academic audiences, such as policymakers, journalists and campaigners. Often there seems to be a trade-off between representing results accurately and fully on the one hand, and representing results in a way which tells a story to non-specialists on the other. This note explores ways in which these twin objectives may be met.

Considerations

Good visualization means communicating clearly a message based on the results of the data analysis. But different messages may be most relevant for different audiences, so it is necessary in the first instance to think about who our audiences are, secondly to work out which results or messages from the results are most relevant for those audiences, and thirdly to design visualizations which best communicate those particular messages for those specific audiences.

For example, in this project, results on inequalities in participation in leisure and cultural activities were presented to Age UK and to Disability Rights UK. The former were of course particularly interested in contrasts between the over- and under-65s, and in differences among the older population, while the latter were particularly interested in contrasts between disabled and non-disabled people, with age as a secondary consideration.
These contrasts were more clearly communicated in separate graphics, even though the underlying analysis was the same.

As academics we are generally reluctant to spell out key messages too explicitly, preferring to present the data and leave our readers to draw their own conclusions. But feedback from civil servants and voluntary sector practitioners in this project consistently favoured explicit messaging, for example through the title of a slide or graphic: Options A or B, rather than Option C below. Indeed, one civil servant regarded this as essential if the audience was senior policy officials or ministers.
Putting the message in the title helps (A or B preferred to C)

[Figure: Option A. Chart of percent with impairment (vertical axis, 0 to 80) by age group (16 to 24 through 85 and over), with separate series for women and men. Title: "Rates of impairment increase with age and are higher for women than for men in all age groups except the oldest".]

[Figure: Option B. The same chart, titled "Rates of impairment by age and sex", with the message "Rates of impairment increase with age and are higher for women than for men in all age groups except 85 and over" given as accompanying text.]

[Figure: Option C. The same chart, titled "Rates of impairment by age and sex", with no message stated.]
As academics we are also used to demonstrating our credentials by naming the technique we have used (for example, ordered logit regression) and including information to show that we have thoroughly tested the validity of our model. Communicating our expertise and authority may be an important part of the message to our audience, but it is best dealt with separately, for example in an introductory slide giving details of the project team and/or methodology. Generally, technical details detract from, rather than contribute to, graphics intended to communicate to non-specialist audiences (Option D preferred to Option E below).

A number of other contrasts between Options D and E can be observed:

Orientation: chart labels are easier to read if the text is horizontal. This may mean rotating the chart (eg bars rather than columns).

Order of elements: in Option D, the groups of bars conveying the most important information (in this case, the characteristics which have the strongest association with risk of impairment) are at the top of the chart. In addition, the reference category within each group comes first.

Confidence intervals: in Option D, confidence intervals have been removed, and instead the bar representing a category which is not statistically significant at the 95% level is made transparent. Whether or not this is appropriate depends on whether differences between categories within a group are important (for example, if knowing that age 35-54 is significantly different from age 55-64 is part of the message, confidence intervals should be shown). Where confidence intervals are shown, it is considered best practice not to include cross-bars (unlike in Option E), since these draw attention to the ends of the range, which are not particularly important.
In general, civil servants and practitioners interviewed for this project thought that showing which results were significantly different from zero was useful, but that confidence intervals themselves were too fiddly and detracted visually from the key message.

Marginal probabilities: logit or probit coefficients indicate whether the association is positive or negative but their size is not intuitively interpretable. Logistic results are sometimes presented as odds ratios, but interpretation going beyond positive and negative association (above or below 1, respectively, in the case of odds ratios) is difficult for non-specialists. Probit regressions usually produce results very similar to logistic regression, and can be reported as marginal probabilities. This may be preferable, although marginal probabilities must be calculated for a particular set of values of the covariates (usually their means). Whichever metric is used, the interpretation needs to be spelled out as clearly as possible within the graphic.
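The difference between these metrics can be illustrated with a short calculation. The following is a minimal sketch in Python, not the analysis used in the project: the coefficient and baseline values are hypothetical, and the function names are introduced here for illustration only. It converts a logit coefficient to an odds ratio, and converts a probit coefficient for a yes/no characteristic into a change in predicted probability, evaluated at a chosen baseline (for example, with the other covariates at their means).

```python
import math


def odds_ratio(logit_coefficient):
    """Odds ratio implied by a logit coefficient: exp(b)."""
    return math.exp(logit_coefficient)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability_change(baseline_index, coefficient):
    """Change in predicted probability when a 0/1 characteristic switches on.

    baseline_index is the probit linear index with the characteristic set
    to 0 and all other covariates held at chosen values (often their means).
    """
    return normal_cdf(baseline_index + coefficient) - normal_cdf(baseline_index)


# Hypothetical values, for illustration only (not results from the survey).
coefficient = 0.5
for baseline_index in (0.0, -2.0):
    change = probit_probability_change(baseline_index, coefficient)
    print(f"baseline index {baseline_index:+.1f}: "
          f"{100 * change:.1f} percentage points")
```

The same coefficient corresponds to a much smaller change in probability at the lower baseline, which is why the covariate values used for the calculation need to be stated alongside the result.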
Remove technical detail unless it is essential to the message (D preferred to E)

[Figure: Option D]

[Figure: Option E]
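The design choices that distinguish Option D can be expressed as a small data-preparation step before plotting. The sketch below is an illustration under stated assumptions, not the code used to produce the figures: the `Result` record, the `prepare_bars` function, the opacity values and the example estimates are all hypothetical. It orders groups of characteristics by the size of their largest estimate, puts the reference category first within each group, and marks results that are not significant at the 95% level for transparent display.

```python
from dataclasses import dataclass


@dataclass
class Result:
    group: str            # characteristic, e.g. "Age"
    category: str         # category within the group, e.g. "55-64"
    estimate: float       # e.g. a marginal probability
    p_value: float
    is_reference: bool = False


def prepare_bars(results, alpha_level=0.05):
    """Return (label, estimate, opacity) rows in top-to-bottom display order."""
    # Strength of a group = largest absolute estimate among its categories.
    strength = {}
    for r in results:
        strength[r.group] = max(strength.get(r.group, 0.0), abs(r.estimate))

    # Strongest group first; within a group, the reference category first.
    # The sort is stable, so other categories keep their input order.
    ordered = sorted(
        results,
        key=lambda r: (-strength[r.group], r.group, not r.is_reference),
    )
    rows = []
    for r in ordered:
        # Reference categories are not tested against themselves; draw solid.
        significant = r.is_reference or r.p_value < alpha_level
        opacity = 1.0 if significant else 0.3
        rows.append((f"{r.group}: {r.category}", r.estimate, opacity))
    return rows


# Hypothetical estimates, for illustration only (not results from the survey).
example = [
    Result("Sex", "Men", 0.0, 1.0, is_reference=True),
    Result("Sex", "Women", 0.02, 0.20),
    Result("Age", "35-54", 0.10, 0.01),
    Result("Age", "16-34", 0.0, 1.0, is_reference=True),
    Result("Age", "55-64", 0.25, 0.001),
]
for label, estimate, opacity in prepare_bars(example):
    print(f"{label:<12} {estimate:5.2f}  opacity={opacity}")
```

The resulting rows could then be passed to a plotting library's horizontal bar function, which keeps the category labels horizontal.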
Dynamic and interactive presentations

Thus far, we have assumed that the visualization is static. Where dynamic elements can be introduced (for example, in a live presentation or where an electronic resource is being developed), new possibilities open up.

The first is to add elements sequentially, allowing the audience to absorb pieces of information one at a time. For example, the chart shown above as Option A could be built up showing the overall age distribution of impairment first, followed by the distribution for women and finally that for men.

Another possibility is to show an overall graphic with lots of information to give an overall picture (like Option D), and then to unpack it, highlighting the key messages individually: for example, homing in on the results by age and explaining them in more detail. Prezi is a presentation software package designed to enable this to happen (see Useful sources below).

Screencasting is a technique in which conventional PowerPoint-style slides are interspersed with short videos of someone talking (for example, explaining the interpretation of the slide).

In general, reducing the number of messages per graphic will help to make each one clearer, and this is much easier to achieve in a dynamic presentation.

Finally, where interaction is possible, for example through a webpage or other digital publishing, it is possible to filter the information displayed according to the user's interests. A set of simple headline results or eye-catching graphics can be offered as an entry point. Users can click on those which interest them, leading to a more detailed breakdown or fuller description, and this can continue until full technical detail or even the raw data are displayed. The British Social Attitudes Survey 2012 digital edition is an example of this approach (see Useful sources).

Final note

Good visualization is time-consuming: it involves identifying audiences and key messages, creating suitable graphics, and testing and refining them.
Copying and pasting Stata output into a Word document is quick and may be sufficient for some purposes. Using off-the-peg solutions, such as standard charts in Excel or PowerPoint, takes several steps, including reorganising the data and selecting the results to present. But in many cases, a graphic tailored for the specific audience and message will be necessary, and this can take anything from a few minutes to a few hours for each chart. Sufficient allowance of time and resources for visualization needs to be built in at the project planning stage.
Useful sources

Oxford Consultants for Social Inclusion (OCSI) / Communities and Local Government, Improving data visualization for the public sector (dataviz project)
http://www.improving-visualisation.org/
- good discussion of the principles of visualization
- gallery of examples
- links to web-based resources, including chart choosers

Tufte, E (2001) The Visual Display of Quantitative Information. Graphics Press.
- a useful discussion and guide to data visualization

British Social Attitudes 2012 edition
http://www.bsa-29.natcen.ac.uk/
- an example of interactive presentation of data, starting with headlines and clicking through to gradually more detailed breakdowns, analysis and interpretation

Camtasia
http://www.techsmith.com/camtasia.html
- screencasting software

Prezi
www.prezi.com
- presentation software package that enables an overview graphic to be broken down into component parts. Designed mainly for conceptual rather than statistical presentations, but could be adapted.