Graphs and charts often mislead by obscuring the unreliability of their source data. But even if a graph-maker wants to do better, it can be hard to present such information intelligibly, without long or technical sidebars. Here is one approach for visually displaying both the primary data and their reliability in one graph.
Above is a simple bar graph. Let's imagine the data are missile tests conducted by the military: different guidance systems are installed and fired, and the likelihood of their hitting a target is recorded. B appears to be the clear winner, while A and C seem to be complete duds.
We might ask how reliable these data are. Various statistics attempt to measure that -- from standard deviation to p-values -- but a very basic figure remains invaluable: the sample size in each category; that is, the number of data points recorded to create aggregate scores (hence the n= notation).
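To make the idea of an aggregate score and its sample size concrete, here is a minimal sketch that turns a raw test log into a hit rate and an n for each guidance system. The log format (one `(system, hit)` pair per firing) and the `summarize` helper are hypothetical, and the records are invented for illustration; they are not the data behind the graphs above.

```python
from collections import defaultdict


def summarize(trials):
    """Aggregate raw (system, hit) records into {system: (hit_rate, n)}."""
    hits = defaultdict(int)    # number of successful firings per system
    counts = defaultdict(int)  # number of firings per system (the "n")
    for system, hit in trials:
        counts[system] += 1
        hits[system] += int(hit)
    return {s: (hits[s] / counts[s], counts[s]) for s in counts}


# Hypothetical test log: invented records, not the article's data.
trials = [
    ("A", False), ("A", False), ("A", False), ("A", False),
    ("B", True),
    ("C", False), ("C", False),
]

summary = summarize(trials)
print(summary)
```

Note that B's perfect score rests on a single record: the hit rate alone hides that, while the n exposes it.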
You can see from the second graph that we should seriously doubt some of these numbers: half the guidance systems have not been tested very thoroughly; and our outstanding outlier, B, has only had a single test!
How do we best inform a reader about the reliability of the test results?
- We cannot present the reader with raw n-values: he does not know what they are, and will ignore them. Graphing them separately does not fundamentally resolve that problem, and could add confusion by appearing like a second set of tests.
- Statisticians have developed visual apparatus like error bars, but these are far too technical. (And my experience has shown that even scientists believe statistically insignificant results quite readily, so these are no panacea.)
- A very small data set could be shown completely, in a dot or scatter plot, but large datasets become hopelessly messy (and methods to reduce overplotting have their own drawbacks).
One Possible Method
What we need is a visual cue for reliability that is integrated into other graphs, with a form that implies its meaning: that is, there is a visual metaphor at work, so that unreliable results look less trustworthy.
In this graph we essentially have a set of "fuel gauges" where the size indicates how seriously each should be taken, and the fill mark indicates its reading. I think the dubious nature of B becomes clear even as its value remains obviously high, while D and E appear equally reliable despite E's higher value.
What I've done here is combine the two previous graphs. Each category gets a bar, with a filled portion proportional to its percentage value. The size of each bar corresponds to its sample size, scaled so that all the bars fit in the graph. (One could also scale them by some constant or by a significance test.) Vertical position also depends on value: each bar is placed so that the top of its filled portion sits at its value on the y-axis.
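The placement rule above can be sketched as a small geometry calculation. This is a minimal sketch under stated assumptions, not the code behind the graphs: it assumes values are fractions in [0, 1] on a y-axis running from 0 to 1, that only the gauge's height (not its width) scales with sample size, and that heights are scaled relative to the largest n. The `Gauge` class, the `fuel_gauges` function and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gauge:
    label: str
    bottom: float       # y-coordinate of the gauge's lower edge
    height: float       # total gauge height; encodes sample size
    fill_height: float  # filled portion; encodes the value


def fuel_gauges(data, max_height=1.0):
    """data maps label -> (value in [0, 1], sample size n).

    Assumption: height is n / max_n, so the best-tested category gets a
    gauge of max_height. Any other scaling could be swapped in here.
    """
    max_n = max(n for _, n in data.values())
    gauges = []
    for label, (value, n) in data.items():
        height = max_height * n / max_n  # bigger sample -> bigger gauge
        fill = value * height            # filled fraction equals the value
        bottom = value - fill            # puts the top of the fill at y = value
        gauges.append(Gauge(label, bottom, height, fill))
    return gauges


# Hypothetical values and sample sizes, invented for illustration.
data = {
    "A": (0.05, 20),
    "B": (0.90, 1),
    "C": (0.00, 3),
    "D": (0.40, 10),
    "E": (0.60, 10),
}
gauges = fuel_gauges(data)
for g in gauges:
    print(g)
```

With `max_height` at most 1, every gauge stays inside the 0 to 1 axis, and a gauge of full height reduces to an ordinary bar running from 0 to 1. Each gauge can then be drawn as two rectangles sharing the same `bottom`, for example with two calls to matplotlib's `Axes.bar` (one outline bar of `height`, one filled bar of `fill_height`).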
Some will dislike the redundancy here. But it allows simultaneous and direct comparisons to be made between categories' values, and between their reliability scores. Redundancy should also be less of a sin when presenting to a lay audience: repetition sometimes has merit, and we gain the visual metaphor of a filling gauge, which reduces the need for further explanation. (And I for one have never been convinced by Tufte's data-ink ratios.)
The next time you need a graph intended for a general audience, try to encode reliability data as part of it -- and if appropriate, try out this "fuel-gauge" method.