So there’s a chart making the rounds in the media right now that was created by U.S. Rep. Jason Chaffetz (R-UT, chair of the House Oversight Committee) and presented during a hearing of that committee with Planned Parenthood president Cecile Richards. If you haven’t seen the video already, you can watch it here. Just skip to the 1:13 mark; Richards’ response to the chart at the end is pretty awesome.
And for reference, below is the train wreck of a chart Chaffetz attempted to pass off as truth during the hearing:
So Chaffetz created the chart shown above to convey what he believed to be a troubling trend at Planned Parenthood: the number of abortions they perform has skyrocketed in recent years while other services, such as Cancer Screening and Prevention, have dropped. But one look at this graphic and any good analyst should instantly know that something is amiss.
So what’s wrong with it? First and foremost, there’s no y-axis, which is an instant chart formatting sin. But that’s not even close to the biggest issue here. Take a closer look at what’s actually happening in that chart. There’s a statistical anomaly here (actually, there are two) that should be a dead giveaway that this chart has been deceptively formatted to convey its creator’s bias.
The x-axis is a time interval broken down by year, spanning a total of eight years (2006 to 2013). Although the only data labels within the chart itself are displayed for year 1 (2006) and year 8 (2013), the yearly intervals on the x-axis suggest that the plotted lines are based on a data point for every year. Looking at this chart, it’s pretty clear this isn’t the case. Take the abortions data series, for example. For it to have actually followed the trend shown by the line, the number of abortions carried out would have had to increase by exactly 5,321.429 in each of the seven year-over-year steps. The same logic applies to the Cancer Screening & Prevention Services data series: the total would have had to decrease by exactly 153,114 each year. Possible? Sure. Probable? No way.
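The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the endpoint values as they were widely reported from Chaffetz’s chart (289,750 abortions and 2,007,371 cancer screening and prevention services in 2006; 327,000 and 935,573 in 2013), since the post doesn’t restate them. The helper names are hypothetical. A straight line between two endpoints over eight labeled years implies seven identical year-over-year steps, and `interpolated_series` shows the in-between values such a line silently invents.

```python
# Minimal sketch: the constant yearly change implied by drawing a straight line
# between only two data points. Endpoint values are ASSUMED (as reported from
# Chaffetz's chart); they are not restated in the post.

ENDPOINTS = {
    "Abortions": (289_750, 327_000),
    "Cancer Screening & Prevention": (2_007_371, 935_573),
}
START_YEAR, END_YEAR = 2006, 2013


def implied_yearly_step(start_value, end_value, start_year, end_year):
    """Change per year implied by a straight line between two endpoints."""
    steps = end_year - start_year  # 8 labeled years -> 7 year-over-year steps
    return (end_value - start_value) / steps


def interpolated_series(start_value, end_value, start_year, end_year):
    """Values the straight line implies for every year, including unmeasured ones."""
    step = implied_yearly_step(start_value, end_value, start_year, end_year)
    return {
        year: start_value + step * (year - start_year)
        for year in range(start_year, end_year + 1)
    }


if __name__ == "__main__":
    for name, (first, last) in ENDPOINTS.items():
        step = implied_yearly_step(first, last, START_YEAR, END_YEAR)
        print(f"{name}: {step:+,.3f} per year")
```

Run as a script, it prints +5,321.429 per year for abortions and -153,114.000 per year for cancer screening and prevention, matching the figures above. A fractional number of abortions per year is itself a hint that no real yearly data sits behind the line.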
Chaffetz has in fact committed two major and unforgivable data visualization sins in the way he constructed his chart. First, he took the data points for two non-adjacent years and stretched them across an eight-year time frame, making it appear that there has been a consistent upward trend in one data set and a consistent downward trend in the other. When you have only two data points like this, a better way to visualize them would be a bar chart with only two labels, one for year 1 and the other for year 8 (the only two years Chaffetz included).
Although the above is more truthful, this still wouldn’t be my first choice for how to present this data effectively. In my opinion, comparing two intervals in a bar chart like the one above is best suited to cases where the time intervals are equal (e.g. month over month, year over year) and adjacent (e.g. Quarter 1 vs Quarter 2, 2013 vs 2014). A lot can happen during an eight-year time frame, and if you’re going to show change over time I would want to see the intermediate data points to better understand what happened. But more on that in a moment.
The second major sin Chaffetz committed is that he used a multi-axis chart and didn’t indicate this anywhere in the chart labeling. You can already see in the bar chart above that the ranges for Abortions and for Cancer Screening & Prevention Services are on two completely different scales. Some people believe that you should never use multi-axis charts. I don’t agree, but if you do use them, you have to label your axes and make sure that each y-axis baseline is 0.
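To see why unlabeled multiple axes are so misleading, the sketch below maps the same two endpoints onto three kinds of y-axes and checks whether the plotted lines swap vertical order. It again assumes the endpoint values reported from Chaffetz’s chart, and the helper names are hypothetical. The “stretched” case approximates hidden axes fitted tightly to each series; that is an assumption about how such a chart could be built, not a reconstruction of Chaffetz’s actual axis settings.

```python
# Minimal sketch: how the choice of y-axis ranges can manufacture a crossover.
# Endpoint values (2006, 2013) are ASSUMED, as reported from Chaffetz's chart.
abortions = [289_750, 327_000]
screenings = [2_007_371, 935_573]


def axis_position(value, axis_min, axis_max):
    """Height of a value as a fraction of its axis (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


def plot_heights(series, axis_min, axis_max):
    """Heights of every point in a series on one axis."""
    return [axis_position(v, axis_min, axis_max) for v in series]


def lines_cross(first, second):
    """True if two plotted lines swap vertical order between first and last point."""
    return (first[0] - second[0]) * (first[-1] - second[-1]) < 0


# 1) One shared axis starting at 0: heights reflect the raw values.
top = max(abortions + screenings)
shared = (plot_heights(abortions, 0, top), plot_heights(screenings, 0, top))

# 2) Two zero-based axes, each topped at its own series' maximum.
dual_zero = (
    plot_heights(abortions, 0, max(abortions)),
    plot_heights(screenings, 0, max(screenings)),
)

# 3) Two unlabeled axes stretched to each series' own min and max.
stretched = (
    plot_heights(abortions, min(abortions), max(abortions)),
    plot_heights(screenings, min(screenings), max(screenings)),
)

for label, (a, s) in [("shared", shared), ("dual, zero-based", dual_zero),
                      ("dual, stretched", stretched)]:
    print(f"{label}: cross={lines_cross(a, s)}, 2013 gap={a[-1] - s[-1]:+.2f}")
```

On one shared axis the lines never cross, because abortions stay well below screenings in both years. On zero-based dual axes they still cross, and on stretched axes the lines swap from top to bottom, the most dramatic picture possible. That is why the axes must be labeled: a crossover between two different scales only says the series moved in opposite directions.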
I’ve taken the liberty of retrieving the missing data for the years between 2006 and 2013 (which took about 10 minutes; it’s beyond me why Chaffetz’s staff didn’t do the same). It’s worth noting, though, that I couldn’t find the Planned Parenthood annual report for 2008, so one year of data is missing. I’m committing a minor sin here by dropping 2008 from the axis entirely instead of showing it as a blank value, but for the purpose of this post we’ll leave it out.
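The “minor sin” of dropping 2008 is easy to avoid in code: keep every year on the axis and mark the missing one explicitly, so the x-axis stays evenly spaced and the gap is visible rather than bridged. This is a minimal sketch; the helper names and sample values are hypothetical and are not Planned Parenthood’s figures.

```python
# Minimal sketch: keep a missing year on the x-axis as an explicit gap instead of
# dropping it. Helper names and sample values are HYPOTHETICAL.
from typing import Dict, List, Optional, Tuple


def with_gaps(observed: Dict[int, float], first_year: int,
              last_year: int) -> Dict[int, Optional[float]]:
    """Every year in the range, with None wherever no value was found."""
    return {year: observed.get(year) for year in range(first_year, last_year + 1)}


def line_segments(series: Dict[int, Optional[float]]) -> List[List[Tuple[int, float]]]:
    """Split a gapped series into runs of consecutive known values.

    Each run would be drawn as its own line, so nothing is interpolated
    across the missing year.
    """
    segments: List[List[Tuple[int, float]]] = []
    current: List[Tuple[int, float]] = []
    for year, value in sorted(series.items()):
        if value is None:
            if current:
                segments.append(current)
            current = []
        else:
            current.append((year, value))
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Hypothetical values; 2008 is missing.
    observed = {2006: 1.0, 2007: 1.1, 2009: 1.3, 2010: 1.2,
                2011: 1.4, 2012: 1.5, 2013: 1.6}
    series = with_gaps(observed, 2006, 2013)
    print(series[2008])            # None: the year stays on the axis
    print(line_segments(series))   # two runs: 2006-2007 and 2009-2013
```

In practice, plotting libraries such as matplotlib already break a line wherever a value is NaN, so passing `float("nan")` for 2008 has the same effect as splitting the segments by hand.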
Here’s what the chart looks like with 7 years of data and with both data sets on a single axis.
Looks a little different from Chaffetz’s chart, doesn’t it? And just to be thorough, below is the same data on a properly formatted multi-axis line chart.
In the multi-axis chart you can see a crossover in the data series, but it’s not nearly as dramatic as in Chaffetz’s original chart. Also keep in mind that these data series are on different axes (i.e. scales), so abortions don’t actually overtake cancer screening and prevention services. The lines cross only because the two series are trending in opposite directions relative to their own scales, and that is an important distinction.
That said, Planned Parenthood does a lot more than just abortions and cancer screening and prevention, so I’ve pulled the data for the full range of services they provide (again, keep in mind the missing data for 2008). Also, since we’re now looking at six data series across the 2006 to 2013 period, a single-axis line chart probably isn’t the best way to visualize this. There are a few different routes we could take to plot the data, but here’s one way that might put things into perspective.
Above is a 100% stacked vertical bar chart. It doesn’t show the total volume of services; rather, it shows the percentage of total services provided that each category of service represents. For any single year this data would be best visualized with a pie chart, but I sometimes like to use this approach to show relative share across multiple intervals (i.e. years).
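The normalization behind a chart like this is simple: divide each category’s count by that year’s total. This is a minimal sketch with hypothetical category names and counts (not Planned Parenthood data), and `shares_by_year` is a hypothetical helper.

```python
# Minimal sketch: the normalization behind a 100% stacked bar chart.
# Category names and counts are HYPOTHETICAL, not Planned Parenthood data.
from typing import Dict


def shares_by_year(counts: Dict[int, Dict[str, float]]) -> Dict[int, Dict[str, float]]:
    """Convert {year: {category: count}} into {year: {category: % of that year's total}}."""
    shares = {}
    for year, by_category in counts.items():
        total = sum(by_category.values())
        if total == 0:
            raise ValueError(f"no services recorded for {year}")
        shares[year] = {cat: 100 * n / total for cat, n in by_category.items()}
    return shares


if __name__ == "__main__":
    counts = {
        2006: {"Service A": 30, "Service B": 570, "Service C": 400},
        2013: {"Service A": 36, "Service B": 540, "Service C": 624},
    }
    for year, share in shares_by_year(counts).items():
        print(year, {cat: round(pct, 1) for cat, pct in share.items()})
```

In this made-up example, Service A grows from 30 to 36 (+20%) yet stays at 3.0% of the total in both years, because the overall volume grew too. Each year’s bar always sums to 100%, so the chart shows relative share, not volume.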
So now you can see how abortions and cancer screening and prevention services rank among all the services Planned Parenthood actually provided over the course of the seven years. For me, two things stand out in this chart. First, the share of all services taken up by STI/STD testing and treatment has increased fairly consistently over the years. Why did this happen? Who knows, but I’m sure there’s more than one answer. It’s also worth pointing out that there are tons of external factors that could explain why the number of services provided in one category versus another increases or decreases in any given year. Variables like population growth and changing demographics are two obvious ones worth looking into. If Chaffetz were serious about drawing meaningful conclusions from this data, he would understand that getting to the truth would involve accounting for such variables.
Second, and interestingly enough, the stacked bar chart above also shows the relative share of total services that abortions actually represent. Did the number of abortions Planned Parenthood performed increase between 2006 and 2013? Yes. Are abortions becoming a more dominant service relative to everything else Planned Parenthood offers? Nope, not even close. In fact, as a percentage of all services, the share of abortions hasn’t budged: it stayed constant at 3% of total services in every year of available data from 2006 to 2013.
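Accounting for an external variable such as population growth can be as simple as dividing by it before comparing years. The sketch below uses made-up numbers, the function names are hypothetical, and choosing the right denominator (total population, a specific age group, or Planned Parenthood’s patient base) is a judgment call this post doesn’t settle.

```python
# Minimal sketch: normalizing raw counts by population before comparing years.
# All numbers are HYPOTHETICAL; the right denominator is an assumption.


def rate_per(count: float, population: float, per: int = 100_000) -> float:
    """Express a raw count as a rate per `per` people."""
    return count * per / population


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


if __name__ == "__main__":
    count_2006, count_2013 = 50_000, 55_000          # hypothetical service counts
    pop_2006, pop_2013 = 10_000_000, 11_000_000      # hypothetical population

    raw = percent_change(count_2006, count_2013)
    per_capita = percent_change(rate_per(count_2006, pop_2006),
                                rate_per(count_2013, pop_2013))
    print(f"raw count: {raw:+.1f}%, per 100,000 people: {per_capita:+.1f}%")
```

Here the raw count rises 10% while the rate per 100,000 people is flat, which is exactly the kind of difference a two-point chart of raw counts hides.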
There is certainly more than one right way to visualize data, but there are many, MANY wrong ways to do it. Sometimes we convey data poorly out of ignorance, or out of a lack of awareness of the impact design choices can have on the audience’s interpretation. But other times, we choose the wrong path out of pure manipulation. Chaffetz’s case falls into the latter category. He knew what message he wanted to convey regardless of what truth was to be found in the data, and he manipulated the presentation of the data to tell his own story. He knew exactly what he was doing. In fact, he even doubled down on his use of the chart after blowback in the media. So Chaffetz is either oblivious to his error or unwilling to accept the deception he’s been called out on. Either way, I think he deserves recognition for his contribution to the field of statistics and data visualization. From now on, I will personally refer to any line chart created as deceptively as this one as a Chaffetz Curve. What do you think?