Should Chart Y-Axis Baselines Always be Zero? Context is Everything

Matt Yglesias and Johnny Harris over at Vox recently made a great video about a rather contentious issue in data storytelling: whether or not you should always set Y-axis baselines in bar or line charts to zero. I’ve written about this a few times in the past – here when I was covering a clear example of someone deceptively using data to tell a biased story, and here when I was discussing the different types of people who use data to tell a story and the challenges we face when it comes to interpretation.

If you haven't seen the video yet, take two and a half minutes to watch it below.

First things first, I want to be clear that I completely agree with everything Yglesias and Harris said in their video. Their main point is that context is everything and, provided you have the best intentions, there are times when not using a zero baseline is acceptable. I’ve stated in the past, perhaps rather hastily, that Y-axis baselines should always be zero without leaving much room for discussion. And I concede that Yglesias and Harris make a good case for why this shouldn't be an inflexible rule. Simply put, the context of your data and the variables you’re working with will determine when (and when not) to use a zero baseline in your charts.

I do, however, want to take a moment to reconcile my point of view with the core message of Yglesias and Harris' video. Yes, I agree that we can all chill out a bit when it comes to enforcing the zero baseline rule. But I also believe that this rule has merit and should be observed, as it helps to ensure we, as data storytellers, are being as truthful as possible.

In my previous article about the meaningful interpretation of data I profiled 3 types of people when it comes to using data to communicate a belief or point of view. You can read the full article for the details, but just to recap, the 3 types of people are:

The Good (aka the Truthful) - Those who understand the fundamentals of data collection, data cleaning and analysis, who adhere to sensible rules of formatting and presentation, who seek to present data in consistent and logical ways, and those who do not manipulate the presentation or interpretation of data as a means to communicate their own view, bias or hypothesis regardless of the outcome.

The Bad (aka the Deceptive) - Regardless of their understanding of data collection, data cleaning and analysis, these types of people knowingly manipulate the presentation, formatting or interpretation of data as a means to communicate their own view or hypothesis even if it's not true.

The Ugly (aka the Ignorant) - Those who have a limited understanding of the fundamentals of data collection, data cleaning, and analysis, and who attempt to present data in a truthful way but through errors in analysis or interpretation they misinterpret or unintentionally communicate an outcome which is not true.

In the video Yglesias and Harris distinguish between those who use non-zero baselines deceptively (the Bad) and those who have the best intentions and use non-zero baselines to help tell their story (the Good). Here’s a great quote from the video on this:

“The truth is that you certainly can use truncated axes to deceive. But you can also use them to illuminate.”

I totally agree. As long as you don’t intentionally deceive your audience, changing an axis baseline can actually help you tell your story in a more impactful way. That said, I think the authors are missing one key point here, as they're drawing a distinction only between storytellers on opposite ends of a spectrum: those who use data to deceive (aka the Bad) vs those who seek to discover and convey truth (aka the Good). But what about the Ugly (aka the Ignorant)? I think those who staunchly call out and enforce the zero baseline principle as an inflexible rule are doing so with the best intentions, particularly because there’s no shortage of people who make crappy, misleading charts out of pure ignorance and not because they are attempting to dupe their audience. Deceptive uses of data and chart formatting are sometimes easy to detect (see Jason Chaffetz's impossible chart or almost any chart on Fox News). But it's the data storytellers who are ignorant of chart formatting best practice that we should be most concerned about, and it's precisely because of these types of people that I think we shouldn't completely throw this rule away.

The zero baseline rule is an important one to follow and one that I will continue to abide by and teach to others. Why? Because it gives those who are less skilled and experienced in the practice of data visualization sensible principles and guidelines that help keep their charts as truthful as possible. It is, however, a rule that is much more flexible in practice than many of us allow it to be. And given the right conditions, it's a rule that can be broken from time to time.

That being said, after watching Yglesias and Harris’ video, two lessons came to mind which I think will help reconcile the different perspectives on this issue. One lesson is for the creators of charts and data visualizations while the other is for the readers (i.e. everyone).

For Chart Creators

When creating a bar or line chart you should always start with a zero baseline before adjusting your axis so you can understand how changing the axis affects your data and the interpretation of it. Also, before switching to a non-zero baseline, ensure that you’ve considered the size of your dataset. For example, are you looking at a long enough time frame to ensure that moving to a non-zero baseline doesn’t change the context of your data? Skip to the 2 minute mark in the video above for a good example of how a time interval can drastically change the context of your chart.
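To make "understand how changing the axis affects your data" a bit more concrete, here is a minimal sketch in Python of the arithmetic behind a truncated bar chart. It is only an illustration, not a method from the video: the helper name `apparent_ratio` and the two values being compared are hypothetical, made up purely for this example. The idea is that a bar's drawn height is its value minus the baseline, so raising the baseline changes how much bigger one bar looks than another.

```python
def apparent_ratio(smaller, larger, baseline=0.0):
    """Return how many times taller the larger bar is drawn than the smaller one.

    A bar's drawn height is (value - baseline), so the visual ratio
    depends on where the Y-axis starts. Hypothetical helper for illustration.
    """
    if baseline >= smaller:
        raise ValueError("baseline must be below the smaller value")
    return (larger - baseline) / (smaller - baseline)


# Hypothetical values, chosen only to illustrate the effect.
value_a, value_b = 35.0, 39.6

# With a zero baseline the bars reflect the real ratio of the values.
true_ratio = apparent_ratio(value_a, value_b, baseline=0.0)

# With the axis starting at 34, the same two values look very different.
truncated_ratio = apparent_ratio(value_a, value_b, baseline=34.0)

# How much the truncated axis exaggerates the difference.
exaggeration = truncated_ratio / true_ratio

print(f"Zero baseline: bar B looks {true_ratio:.2f}x as tall as bar A")
print(f"Baseline at 34: bar B looks {truncated_ratio:.2f}x as tall as bar A")
print(f"Visual exaggeration: about {exaggeration:.1f}x")
```

Running a quick check like this before and after you move the baseline tells you how much of the difference your reader sees comes from the data and how much comes from the axis. Whether that gap is acceptable still depends on the context of your chart.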

For Chart Readers

You should always be aware of the context of the chart and should question whether the data is being conveyed in a way that is accurate and truthful. There are lots of cues that might give away a deceptive chart and a non-zero baseline is one of them. This isn't to say that a chart without a zero baseline is always trying to trick you; rather, you should always be critical of data before accepting it as truth.

Do you agree with Yglesias and Harris or my points above? Let me know in the comments below.