5 Examples of Awful Data Visualization


This Friday I’ll be giving a short presentation on data visualization (alongside some top-notch speakers) at an event co-hosted by General Assembly and Keboola. Tickets are still available, and if you’re in Singapore you should stop by. The event starts at 7PM and is free! You can register here.

In anticipation of the event I’ve been thinking a lot about data visualization, design principles and storytelling. I love my job because I get to spend a fair amount of my time thinking about creative ways to communicate through data. To get my creative juices flowing I often look for inspiration in a few different places, including but not limited to Nathan Yau’s Flowing Data and David McCandless’s Information is Beautiful. Yau and McCandless are both leaders in this field who create and curate some of the best examples of data visualization you can find on the web today. But beyond their craft they are also educators who advance a dialogue on best practices and principles for what I like to call empirical storytelling.

Flowing Data and Info is Beautiful can be great sources of inspiration if you're on the lookout for beautiful, creative and cutting-edge data visualization. But sometimes I also like to draw a little inspiration from the worst examples of dataviz. These are the kinds of charts and infographics that ignore every basic rule and design principle when it comes to visualizing data. From the deceptive to the confusing to the downright ugly monstrosities created in the name of statistics, sometimes it’s the lessons you learn from failure that are the most impactful.

Enter WTF Visualizations, a fabulous Tumblr blog that curates a collection of the most sinful dataviz blunders around. It’s as informative as it is amusing, and I thought it would be fun to take a look at a few recent WTF Viz submissions and break down what, exactly, makes them such a strain on both the eyes and the mind.

1 - Misleading labels and headers

What is it?

This is a snippet of a full graphic created by MPH Today, and is based on a recent peer-reviewed article which analyzed “79 studies on the effects of stress and the human body”.

What’s wrong with it?

Of the 5 examples we’ll run through today this is probably the least sinful of the group. Design-wise I actually think the graphic looks OK, though it has a little too much copy for my liking. That said, there is a problem with the section shown above, particularly the column titled Relationship. The horizontal bar chart is showing the volume of something, in this case the occurrence of each symptom relative to the workplace stressor. Sure, there is a relationship between the symptom and the stressor, but labeling the column “Relationship” is both confusing and misleading. The bar chart is showing either the total occurrences (in volume) or the frequency at which the symptom occurs, represented as a % of the total sample. I don’t know which because the graphic doesn’t tell me (and I couldn’t check because the journal article is behind a paywall). But either way, the column title should clearly state the unit of measure (e.g. count, sum of, % of, etc.) so the reader can easily understand what was measured and how to interpret it.

What should they have done?

In this case the horizontal bar chart was the right choice, but always remember to clearly and meaningfully label your chart or table axes and headers.

2 - Charts within charts

What is it?

Now that we’re warmed up let’s jump right into the deep end. Here’s an example of data visualization gone wrong, terribly wrong. This graphic was created by a company named JBH, who, by the way, create infographics for a living. I hate to name and shame, but seriously, if you’re going to tout infographic production as a core offering you need to understand the basic principles of data visualization and design. What this graphic is showing is the “State of Social Media Marketing in 2015”, which includes a range of stats related to social media network usage and behaviour. The full graphic can be viewed here.

What’s wrong with it?

Honestly I had to stare at this graphic for about 5 minutes before I understood what was happening, and I'm still not sure I get it. The most problematic part of the graphic is the section shown above. It’s downright confusing.

The first problem is that they’ve presented a volume metric (Total Users) as a ratio metric (i.e. 99.48%), and it’s unclear to me what the data is showing here. Is this the % of total users who access each app on an Android device? If so, the only interpretation I can derive from this is that 99.48% of users of the YouTube mobile app in the USA are using an Android smartphone. But intuitively this can’t be true. I mean, surely more than 0.52% of YouTube app users in the U.S. are on iOS. Apple has a market share of roughly 43.6% in the U.S. and YouTube mobile is a popular app, so this just doesn’t seem possible. Which means that a) their data is wrong, b) they have twisted the interpretation of this so far that it’s impossible to read, or c) I’m completely misreading this. But with this statement – “According to data for the USA from SimilarWeb, the share of total Android users was” – I’m just not sure how else this graphic can be read.
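For what it’s worth, the back-of-the-envelope check above can be written as a quick sketch. The only inputs are the two figures quoted in this post (99.48% and roughly 43.6%); the function and variable names are my own, and comparing a device market share with an app’s user share is only a rough plausibility test, not a like-for-like comparison.

```python
def implied_remainder(share_pct: float) -> float:
    """Return the percentage left over once share_pct is taken out of 100%."""
    return round(100.0 - share_pct, 2)


# Figure printed on the infographic, read as "share of app users on Android".
android_share_pct = 99.48
# Rough U.S. market share for Apple quoted above (an approximation).
apple_market_share_pct = 43.6

# Everyone who is NOT on Android would have to fit in this remainder.
non_android_pct = implied_remainder(android_share_pct)

# How many times larger Apple's market share is than the room left for it.
gap_factor = apple_market_share_pct / non_android_pct

print(f"Room left for non-Android users: {non_android_pct}%")
print(f"Apple's market share is about {gap_factor:.0f}x larger than that")
```

A gap that large doesn’t prove which of the three explanations is the right one, but it does show why the literal reading of the chart is so hard to believe.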

But the confusion doesn’t end there. The inner circle, which shows the % of active users, is also hugely problematic. My first question is, are active users a subset of total users? It seems logical that this should be true, and if so they’ve actually misinterpreted the data (e.g. that Twitter, Pinterest and LinkedIn have more active audiences). This graphic actually shows that YouTube and Facebook have the highest levels of activity, and I think what they’ve done is incorrectly conclude that the level of active users for Twitter, Pinterest and LinkedIn relative to the % of total users means that they have higher rates of activity, which is totally wrong. Either way, this graphic is poorly constructed and unnecessarily confusing. You shouldn't have to think this much to consume and interpret the meaning of an infographic.

What should they have done?

Honestly, I don’t know where to begin. My first suggestion would be to never create a pie chart within a pie chart, or within any other chart type for that matter. Beyond that, there are tons of other issues with the data they’ve used and how they have presented it (e.g. volume vs ratio metrics). Simply removing the pie within a pie isn’t going to solve this, so my suggestion would be to scrap this graphic completely and start over.

3 - The parts don’t add up to a whole

What is it?

This was created by a U.S.-based storage company named Sparefoot. The graphic above is a snippet of the full infographic, which was based on a combination of U.S. census data and Gallup polls and was intended to show how American society is changing over time with respect to household living arrangements.

What’s wrong with it?

I’m a sucker for flat design and nice typography so I almost gave this one a pass. But the data visualization sin here is common enough that it should never happen. In short, the chart creator has used multiple values that aren’t part of a whole in a single pie chart. If you look at the above graphic you can see that each pie chart is related to a state (e.g. have children, don't have children, etc.), and the charts are supposed to show the change over time between 3 non-adjacent time intervals (1990, 2003 and 2013). Quick tip: if you’re attempting to show change over time, a pie chart is never going to be the right choice. A line or bar chart would be better suited to the task.

Anyway, the main issue here is that the 3 data points (i.e. time intervals) aren’t part of a whole, but they've been presented as if they are. For example, the values attached to the “Have children” pie chart show data from 3 distinct data sets, and these don't combine to make 100% of something. By presenting them in a pie chart, the creator has unintentionally changed the meaning of the numbers. You can see the difference between the actual vs charted values (what the data means in the pie chart) in the table below.

Actual vs Charted Value
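To make the actual vs charted distinction concrete, here is a minimal sketch of what a pie chart does to its inputs: it rescales them so the slices add up to 100%. The three values below are hypothetical placeholders, not the figures from the Sparefoot graphic, and `pie_shares` is just an illustrative helper name.

```python
def pie_shares(values: dict) -> dict:
    """Rescale values the way a pie chart does: each slice as a % of the total."""
    total = sum(values.values())
    return {label: 100.0 * value / total for label, value in values.items()}


# HYPOTHETICAL example: % of households in some category, measured in
# three separate years. Each number is a share of a different whole.
actual = {"1990": 48.0, "2003": 45.0, "2013": 43.0}

# What the reader sees once the three numbers share one pie.
charted = pie_shares(actual)

for year in actual:
    print(f"{year}: actual {actual[year]:.1f}%  ->  charted {charted[year]:.1f}%")
```

Every charted slice comes out smaller than the number it is supposed to represent, because the pie treats three independent percentages as pieces of a single total.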

What should they have done?

Although I mentioned above that line charts are typically better suited to showing change over time, I wouldn’t recommend a line chart here as the time intervals aren’t adjacent (year over year), so a bar chart would be the best way to go.

4 - Failed connections

What is it?

This is actually taken from the same JBH graphic mentioned above (sorry JBH, but this infographic was a doozy).

What’s wrong with it?

This graphic is definitely not as sinful as the first JBH snippet covered above, but it presents the reader with some formatting problems that make it pretty painful to read. In particular, the data series values and labels have been separated from the chart. In fact, there isn’t even a clear legend; the data series labels are embedded within a paragraph of text. This renders the pie/donut chart almost completely useless, as the reader needs to re-associate the labels and values with the visualization in their head. Even more problematic is the colour coding. At first I thought they had synchronized the pie slice colours with the percentages, but then I realized that there are more slices than values (i.e. percentages). For example, there are 4 slices but only 3 values in the top chart, and 6 slices but only 5 values in the bottom chart. On top of that, the colours don’t even match: there is a value highlighted in pink (i.e. 5%) on the bottom chart and this colour is nowhere to be found in the pie. Maybe the pie charts were just generic stock images and have no relation to the numbers in the paragraphs. But if that’s the case, they chose stock imagery that is strikingly close in both the number of data points (i.e. slices) and colour palette. Either way, this one’s a mess.

What should they have done?

The pie chart here is fine, but the lesson is to always include a legend and clear labelling, try to avoid separating things like the data values and labels, and finally, make sure you’re consistent with colour coding.

5 - Meaningless visualization

What is it?

This graphic was created by an agency called Blueberry Labs and shows the most common colours used by brands.

What’s wrong with it?

The author of this graphic was probably just looking for a visually appealing way to represent these numbers as a means to spice up the graphic. Unfortunately, they’ve created a confusing visualization which has 2 core problems. First, the sizes of the bubbles have no relationship with the values within them (e.g. why is 13% bigger than 28%?). Second, the overlap of the bubbles creates an unintentional Venn diagram which can be misleading. The latter issue might sound like I’m being picky but they are showing relational data, so when I see the bubble overlap I ask questions like, is the overlap showing me another relationship? Does the overlap of red and yellow show me the % of top brands that use orange? Again, I know I might sound overly picky here but they have chosen to visualize this data in a graphical way and have employed design choices that have very specific meanings in other applications. So if you’re going to use bubbles that contain a value and have them represented in different sizes, then make the size relative to the value. If you’re going to use semi-transparent overlapping bubbles that have zero relation, well, just don’t.
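One common way to make bubble size relative to the value is to scale the area of the circle, not the radius, so the radius grows with the square root of the value. The sketch below assumes that convention; `bubble_radius` and the 50-pixel maximum are names and numbers I’ve made up for illustration, and the only figures taken from the graphic are 13% and 28%.

```python
import math


def bubble_radius(value: float, max_value: float, max_radius: float = 50.0) -> float:
    """Radius for a bubble whose AREA is proportional to value.

    The largest value gets max_radius; everything else shrinks with the
    square root of its share of max_value.
    """
    return max_radius * math.sqrt(value / max_value)


# The two percentages called out above.
r_13 = bubble_radius(13.0, max_value=28.0)
r_28 = bubble_radius(28.0, max_value=28.0)

# Area ratio of the two bubbles (pi cancels out).
area_ratio = (r_13 ** 2) / (r_28 ** 2)

print(f"13% bubble radius: {r_13:.1f}")
print(f"28% bubble radius: {r_28:.1f}")
print(f"area ratio: {area_ratio:.3f} vs value ratio: {13 / 28:.3f}")
```

Scaling the radius directly would exaggerate the differences, since doubling a radius quadruples the area the eye actually sees.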

What should they have done?

There are 2 different sets of data points here. For the 4 bubbles on the left, you might think that you can use a pie chart, but you’d be wrong. We need to know a little more about how the data was collected and coded, but I can tell right away that the 4 colours were not mutually exclusive (as in, a brand can use more than 1 colour). You can tell this from the graphic because the 4 values don’t add up to 100%. This means that the best way to represent this data would be through a bar chart. It also looks to me like there is data missing (surely there are more than 4 colours), so I would want to plot the full range of colours in a chart to get the full picture.

As for the data points on the right side (i.e. 95% and 5%), these are standalone stats so there isn’t really a way to meaningfully include these on a chart with the data on the left.  
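The pie-or-bar decision for the bubbles on the left boils down to a simple check, sketched here. The four percentages are hypothetical stand-ins (I’m not reproducing the graphic’s exact figures), and `sums_to_whole` and `suggest_chart` are illustrative names, not part of any charting library.

```python
def sums_to_whole(percentages: dict, tolerance: float = 0.5) -> bool:
    """True when the values add up to roughly 100% (allowing for rounding)."""
    return abs(sum(percentages.values()) - 100.0) <= tolerance


def suggest_chart(percentages: dict) -> str:
    """Only consider a pie when the values could be parts of one whole."""
    return "pie" if sums_to_whole(percentages) else "bar"


# HYPOTHETICAL survey result where a brand can be counted under
# more than one colour, so the categories overlap.
brand_colours = {"colour A": 33.0, "colour B": 29.0, "colour C": 28.0, "colour D": 13.0}

print(sum(brand_colours.values()))   # total of the four percentages
print(suggest_chart(brand_colours))  # recommended chart type
```

Adding up to 100% is a necessary condition for a pie chart, not a sufficient one: you still need to know that the categories are mutually exclusive and that none are missing.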

Conclusion

As I’ve mentioned in previous posts, there’s more than one right way to go about visualizing data, but there are many, MANY wrong ways to do it. These are examples of the latter. If your work involves presenting data in visual ways, and almost every job does, then you should make sure you know the basic design principles and do’s and don’ts of chart visualization. That way, you won’t risk ending up on WTF Visualizations. Happy charting!