On Thursday 15 September 2016, a discussion took place during class about the reliability and credibility of a news source. One of the students argued: “When there are graphs based on statistics, you can assume that the source is trustworthy”. The reasoning may be a little simplistic, but it is not surprising. Idris, Jackson and Abrahart (2011) showed that the visualisation of data has more impact on the perceived credibility of information than the actual authority of the data source. Pete Warden, an experienced data scientist, agrees:
“The wonderful thing about being a data scientist is that I get all of the credibility
of genuine science, with none of the irritating peer review or reproducibility worries …
I thought I was publishing an entertaining view of some data I’d extracted,
but it was treated like a scientific study”.
The illustration Warden uses is the Facebook friend network visualisation across the United States, which was extremely popular and even cited in the New York Times as evidence for growing social division. Although the graph was treated as solid evidence, Warden clarifies in his article that the clustering process was “produced by me squinting at all the lines, coloring in some areas that seemed more connected in a paint program, and picking silly names for the areas”.
Warden’s visualisation was apparently considered highly credible, even though it was in fact intended as a “bit of fun”. This example illustrates a serious problem: the more visually appealing a data visualisation is, the more credibility it is given, even when its sources are not credible (Idris, Jackson and Abrahart, 2011).
The example of Warden seems quite harmless, but there are many seriously misleading data visualisations that go unnoticed, mostly because they are shown only briefly on a screen, or because the reader is distracted by visual bells and whistles (Cairo, 2015). For example, in December 2015 the White House announced: “Good news: America’s high school graduation rate has increased to an all-time high”. The announcement on Twitter included the following ‘bar’ chart:
It seems like an appropriate graph, but what does it even mean that five books equal 75%, or that sixteen books equal 82%? More importantly: this is a column chart, and column charts must always start the y-axis at zero (Cairo, 2015). Why? See for yourself: below is the same data plotted with an appropriate scale:
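The distortion is easy to quantify with a minimal sketch. The 75% and 82% figures are from the chart; the axis baseline of 70 is an assumed value for illustration, not a figure stated in the source:

```python
# How a truncated y-axis exaggerates a change between two values.

def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for two values a < b."""
    return (b - baseline) / (a - baseline)

# With a zero baseline, the visual difference matches the data:
honest = bar_height_ratio(75, 82)               # ~1.09: the 82% bar is ~9% taller

# With an axis truncated at (say) 70, the same data is exaggerated:
truncated = bar_height_ratio(75, 82, baseline=70)   # 2.4: the 82% bar looks 140% taller

print(f"zero baseline: {honest:.2f}x, truncated: {truncated:.2f}x")
```

A real 9% improvement is drawn as if one bar were almost two and a half times the other, which is exactly why column charts must anchor the y-axis at zero.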
Another good example of this phenomenon is the map of the results of the Scottish independence referendum per region. The green areas voted against independence and the red areas voted for it. As you might have noticed, the colours themselves are already a bit misleading, because our instinct associates red with negative and green with positive. But looking at the following chart, what would you guess the result was: 90 percent against, 80 percent against, or 70 percent against?
Actually, the result was 55 percent against, because the Scottish population is highly concentrated in geographically compact urban areas. We seem to assume that the larger an area on a chart is, the more meaningful it is – which is, of course, not the case for geographical representations.
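The gap between map area and vote share can be made concrete with a toy electorate. The region names and numbers below are invented for illustration (they are not real Scottish referendum data); the point is only that sparse rural regions dominate the map while dense urban regions dominate the vote:

```python
# Hypothetical regions: (name, area in km2, voters, share voting "Yes").
regions = [
    ("Rural North",  25000,  80000, 0.60),
    ("Rural West",   18000,  60000, 0.55),
    ("Urban Centre",  1200, 500000, 0.40),
    ("Urban South",    900, 360000, 0.42),
]

# Land area where "Yes" wins a majority, versus the actual vote totals.
yes_area = sum(area for _, area, _, share in regions if share > 0.5)
total_area = sum(area for _, area, _, _ in regions)

yes_votes = sum(voters * share for _, _, voters, share in regions)
total_votes = sum(voters for _, _, voters, _ in regions)

print(f"'Yes' regions cover {yes_area / total_area:.0%} of the map,")
print(f"but 'Yes' wins only {yes_votes / total_votes:.0%} of the vote.")
```

In this sketch “Yes” regions paint about 95% of the map while winning only about 43% of the actual vote – the same optical illusion as the referendum map, in miniature.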
Rather than an essay against data visualisation, this blog should be read as a caution against blind acceptance. Besides the misleading data visualisations, there are hundreds of reliable visualisations that bring alive data which could otherwise take hours to unpick, since human beings consume information faster when it is expressed in diagrams or graphs than when it is presented as text (Verner, Wainwright and Schoenefeld, 1997).
Nonetheless, it is always important to ask yourself: when is a visualisation credible? Unfortunately, there is no simple answer to this question. In some cases authors present their work as credible even when they have taken liberties in preparing the graphic (Hullman and Diakopoulos, 2011). In the future, authors should be required to state explicitly at the bottom of a graph or illustration whether it contains predictions or other limitations. Until then, be critical!
Cairo, A. (2015). Graphics lies, misleading visuals: Reflections on the challenges and pitfalls of evidence-driven visual communication. In D. Bihanic (Ed.), New challenges for data design (pp. 103-116). Springer-Verlag, London.
Hullman, J. & Diakopoulos, N. (2011). Visualization rhetoric: Framing effects in narrative visualization. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2231-2240.
Idris, N. H., Jackson, M. J. & Abrahart, R. J. (2011). Map mash-ups: What looks good must be good? In: GISRUK Conference 2011, 27-29 April 2011, Portsmouth. http://eprints.utm.my/12576/1/NurulHawaniIdris_GISRUK.pdf
Verner, O. V., Wainwright, R. L., & Schoenefeld, D. A. (1997). Placing Text Labels on Maps and Diagrams using Genetic Algorithms with Masking. INFORMS Journal On Computing, 9(3), 266-275. doi:10.1287/ijoc.9.3.266